Shape Matters: Understanding the Implicit Bias of the Noise Covariance

Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

Research output: Contribution to journal › Conference article › peer-review

14 Scopus citations


The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training over-parameterized models. Prior theoretical work has largely focused on spherical Gaussian noise, whereas empirical studies demonstrate that parameter-dependent noise (induced by mini-batches or label perturbation) is far more effective than Gaussian noise. This paper theoretically characterizes this phenomenon on a quadratically-parameterized model introduced by Vaskevicius et al. and Woodworth et al. We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground truth from an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms. Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
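As a concrete illustration of the mechanism the abstract describes, the following is a minimal numerical sketch of SGD with label noise on a quadratically-parameterized model theta = u * u (elementwise). It is an assumption-laden toy instance: the dimensions, step count, learning rate, noise level, and the helper name `sgd_label_noise` are illustrative choices for this sketch, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the quadratically-parameterized model theta = u * u
# (elementwise). All sizes, step counts, and noise levels below are
# illustrative choices for this sketch, not the paper's exact setup.
d, n = 20, 10                      # over-parameterized: fewer samples than dims
theta_star = np.zeros(d)
theta_star[:3] = 1.0               # sparse ground truth
X = rng.normal(size=(n, d))
y = X @ theta_star

def sgd_label_noise(steps=5000, lr=1e-3, sigma=0.5):
    """SGD on squared loss where each step perturbs the observed label.

    The resulting gradient noise depends on the current parameters,
    which is the parameter-dependent noise shape discussed above.
    """
    u = np.full(d, 0.5)            # arbitrary (dense) initialization
    for _ in range(steps):
        i = rng.integers(n)
        eps = sigma * rng.normal()                 # label perturbation: y_i + eps
        resid = X[i] @ (u * u) - (y[i] + eps)
        u = u - lr * resid * 2.0 * u * X[i]        # chain rule through theta = u*u
    return u * u

theta_hat = sgd_label_noise()
```

Replacing the label perturbation with spherical Gaussian noise added directly to the parameter update would remove the parameter-dependence of the noise covariance, which is the contrast the paper analyzes.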

Original language: English (US)
Pages (from-to): 2315-2357
Number of pages: 43
Journal: Proceedings of Machine Learning Research
State: Published - 2021
Event: 34th Conference on Learning Theory, COLT 2021 - Boulder, United States
Duration: Aug 15 2021 - Aug 19 2021

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability


Keywords

  • Implicit regularization
  • Implicit bias
  • Over-parameterization


