TY - GEN
T1 - Implicit regularization in nonconvex statistical estimation
T2 - 35th International Conference on Machine Learning, ICML 2018
AU - Ma, Cong
AU - Wang, Kaizheng
AU - Chi, Yuejie
AU - Chen, Yuxin
N1 - Publisher Copyright:
© The Author(s) 2018.
PY - 2018
Y1 - 2018
N2 - Recent years have seen a flurry of activities in designing provably efficient nonconvex optimization procedures for solving statistical estimation problems. For various problems like phase retrieval or low-rank matrix completion, state-of-the-art nonconvex procedures require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. When it comes to vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in several nonconvex problems: even in the absence of explicit regularization, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on two statistical estimation problems, i.e. solving random quadratic systems of equations and low-rank matrix completion, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent enables optimal control of both entrywise and spectral-norm errors.
AB - Recent years have seen a flurry of activities in designing provably efficient nonconvex optimization procedures for solving statistical estimation problems. For various problems like phase retrieval or low-rank matrix completion, state-of-the-art nonconvex procedures require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. When it comes to vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in several nonconvex problems: even in the absence of explicit regularization, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on two statistical estimation problems, i.e. solving random quadratic systems of equations and low-rank matrix completion, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent enables optimal control of both entrywise and spectral-norm errors.
UR - http://www.scopus.com/inward/record.url?scp=85052983418&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052983418&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85052983418
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 5264
EP - 5331
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
Y2 - 10 July 2018 through 15 July 2018
ER -