DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

  • Ahmed Khaled
  • Konstantin Mishchenko
  • Chi Jin

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient, matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters, and universal, automatically adapting to both smooth and nonsmooth problems. While popular algorithms following the AdaGrad framework compute a running average of the squared gradients to use for normalization, DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks.
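
The abstract does not state the update rule itself. As a minimal sketch, assume the commonly cited form of the method: keep a non-decreasing distance estimate r_t = max(r_{t-1}, ||x_t - x_0||), accumulate v_t = v_{t-1} + r_t^2 ||g_t||^2, and step with eta_t = r_t^2 / sqrt(v_t). That form, the small initial estimate `r_eps`, and the names `dowg` and `quad_grad` are assumptions made for illustration, not details taken from this record. The sketch is unconstrained and omits any projection step, and it runs on a toy quadratic rather than on the tasks mentioned in the abstract.

```python
import math


def dowg(grad, x0, steps=500, r_eps=1e-4):
    """DoWG-style gradient descent (assumed update form; see the lead-in)."""
    x = list(x0)
    r = r_eps  # running estimate of the distance travelled from x0 (assumed init)
    v = 0.0    # running sum of squared gradient norms, weighted by r**2
    for _ in range(steps):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # already stationary; also avoids dividing by zero
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        r = max(r, dist)            # the distance estimate never shrinks
        v += r * r * g_sq           # distance-weighted gradient sum
        eta = r * r / math.sqrt(v)  # step size, with no tuned learning rate
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


def quad_grad(x, target=(3.0, -1.0)):
    """Gradient of the hypothetical toy objective 0.5 * ||x - target||^2."""
    return [xi - ti for xi, ti in zip(x, target)]


if __name__ == "__main__":
    print(dowg(quad_grad, [0.0, 0.0]))
```

The only input besides the gradient oracle and the starting point is `r_eps`, which seeds the distance estimate; the step size is then derived entirely from quantities observed along the trajectory, which is the sense in which the abstract calls the method parameter-free.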

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: Neural Information Processing Systems Foundation
ISBN (Electronic): 9781713899921
State: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10 2023 to Dec 16 2023

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 36
ISSN (Print): 1049-5258

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 12/10/23 to 12/16/23

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Information Systems
  • Computer Networks and Communications
