TY - GEN
T1 - Lexicographic and depth-sensitive margins in homogeneous and non-homogeneous deep models
AU - Nacson, Mor Shpigel
AU - Gunasekar, Suriya
AU - Lee, Jason D.
AU - Srebro, Nathan
AU - Soudry, Daniel
N1 - Funding Information:
The authors are grateful to C. Zeno and N. Merlis for helpful comments on the manuscript. This research was supported by the Israel Science Foundation (grant No. 31/1031) and by the Taub Foundation. SG and NS were partially supported by NSF awards IIS-1302662 and IIS-1764032.
Publisher Copyright:
© 2019 by the author(s).
PY - 2019
Y1 - 2019
N2 - With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models. To this end, we study the limit of loss minimization with a diverging norm constraint (the "constrained path"), relate it to the limit of a "margin path", and characterize the resulting solution. For non-homogeneous ensemble models, whose output is a sum of homogeneous sub-models, we show that this solution discards the shallowest sub-models if they are unnecessary. For homogeneous models, we show convergence to a "lexicographic max-margin solution", and provide conditions under which max-margin solutions are also attained as the limit of unconstrained gradient descent.
AB - With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models. To this end, we study the limit of loss minimization with a diverging norm constraint (the "constrained path"), relate it to the limit of a "margin path", and characterize the resulting solution. For non-homogeneous ensemble models, whose output is a sum of homogeneous sub-models, we show that this solution discards the shallowest sub-models if they are unnecessary. For homogeneous models, we show convergence to a "lexicographic max-margin solution", and provide conditions under which max-margin solutions are also attained as the limit of unconstrained gradient descent.
UR - http://www.scopus.com/inward/record.url?scp=85077970068&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077970068&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85077970068
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 8224
EP - 8233
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -