Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression

Jackie Lok, Rishi Sonthalia, Elizaveta Rebrova

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix Z between the original features X and a set of new features X̃, in which each feature is modified, in an averaged way, by the mini-batches that appear before it during the learning process. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size under the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By asymptotically comparing Z, a non-commutative polynomial of random matrices, with the sample covariance matrix of X, we demonstrate that batching affects the dynamics by inducing a form of shrinkage on the spectrum.
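As a point of reference, the following minimal NumPy sketch illustrates the algorithm under study: each epoch draws a fresh random permutation of the samples and partitions it into consecutive mini-batches, so every sample is visited exactly once per epoch (this is what distinguishes random reshuffling from with-replacement sampling). The function name minibatch_gd_reshuffle, the loss normalization, and all parameter choices are illustrative assumptions, not the authors' code.

    import numpy as np

    def minibatch_gd_reshuffle(X, y, step_size, batch_size, n_epochs, seed=None):
        """Mini-batch gradient descent with random reshuffling for the
        least squares objective (1/2n) * ||X w - y||^2 (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            perm = rng.permutation(n)  # fresh reshuffle at the start of each epoch
            for start in range(0, n, batch_size):
                idx = perm[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                # Gradient of the mini-batch least squares loss
                grad = Xb.T @ (Xb @ w - yb) / len(idx)
                w -= step_size * grad
        return w

    # Hypothetical usage on synthetic data
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)
    w_hat = minibatch_gd_reshuffle(X, y, step_size=0.1, batch_size=20, n_epochs=50, seed=1)

Under the linear scaling rule mentioned in the abstract, the step size is scaled proportionally to the batch size when comparing runs at different batch sizes against full-batch gradient descent.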

Original language: English (US)
Pages (from-to): 736-770
Number of pages: 35
Journal: Proceedings of Machine Learning Research
Volume: 272
State: Published - 2025
Event: 36th International Conference on Algorithmic Learning Theory, ALT 2025 - Milan, Italy
Duration: Feb 24, 2025 - Feb 27, 2025

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
