Abstract
Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis for adaptive regularization in nonconvex optimization settings. The core of our algorithm, termed GGT, consists of the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show improved iteration-wise convergence rates across synthetic tasks and standard deep learning benchmarks, and that the more carefullypreconditioned steps sometimes lead to a better solution.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 102-110 |
| Number of pages | 9 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 97 |
| State | Published - 2019 |
| Event | 36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States Duration: Jun 9 2019 → Jun 15 2019 |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Efficient Full-Matrix Adaptive Regularization'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver