DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

Ahmed Khaled, Konstantin Mishchenko, Chi Jin

Research output: Contribution to journalConference articlepeer-review

Abstract

This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient-matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters, and universal-automatically adapting to both smooth and nonsmooth problems. While popular algorithms following the AdaGrad framework compute a running average of the squared gradients to use for normalization, DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
Volume36
StatePublished - 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10 2023Dec 16 2023

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method'. Together they form a unique fingerprint.

Cite this