Distributed stochastic variance reduced gradient methods by sampling extra data with replacement

Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang

Research output: Contribution to journal › Article › peer-review

44 Scopus citations


We study the round complexity of minimizing the average of convex functions under a new setting of distributed optimization where each machine can receive two subsets of functions. The first subset is from a random partition and the second subset is randomly sampled with replacement. Under this setting, we define a broad class of distributed algorithms whose local computation can utilize both subsets, and we design a distributed stochastic variance reduced gradient method belonging to this class. When the condition number of the problem is small, our method achieves the optimal parallel runtime, amount of communication, and rounds of communication among all distributed first-order methods, up to constant factors. When the condition number is relatively large, we provide a lower bound on the number of rounds of communication needed by any algorithm in this class. We then present an accelerated version of our method whose number of rounds of communication matches the lower bound up to logarithmic terms, which establishes that this accelerated algorithm has the lowest round complexity among all algorithms in our class under this new setting.
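To give a concrete sense of the variance reduction technique the abstract builds on, here is a minimal single-machine SVRG sketch (in the style of Johnson & Zhang, 2013) for least-squares regression. This is only an illustration of the core update; it is not the paper's distributed algorithm, which additionally exploits a second subset of data sampled with replacement on each machine. All names, step sizes, and loop counts below are illustrative choices, not values from the paper.

```python
import numpy as np

def svrg_least_squares(A, b, eta=0.02, epochs=30, inner=None, seed=0):
    """Minimize f(w) = (1/2n) * ||A w - b||^2 with SVRG.

    Each epoch takes a snapshot w_snap, computes the full gradient mu
    there, then runs inner steps using the variance-reduced gradient
        g = grad_i(w) - grad_i(w_snap) + mu,
    which is unbiased and whose variance shrinks as w, w_snap approach
    the optimum, allowing a constant step size eta.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner if inner is not None else 2 * n
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot: the "anchor" for variance reduction.
        mu = A.T @ (A @ w_snap - b) / n
        for _ in range(inner):
            i = rng.integers(n)
            a = A[i]
            # Per-sample gradient of (1/2)(a.w - b_i)^2 is a*(a.w - b_i).
            g = a * (a @ w - b[i]) - a * (a @ w_snap - b[i]) + mu
            w = w - eta * g
    return w
```

In the paper's distributed setting, the machines hold a random partition of the data plus extra samples drawn with replacement; roughly, the full-gradient ("snapshot") computation is the part that requires a round of communication, while the inner stochastic steps are done locally.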

Original language: English (US)
Pages (from-to): 1-43
Number of pages: 43
Journal: Journal of Machine Learning Research
State: Published - Nov 1 2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability


Keywords
  • Communication complexity
  • Distributed optimization
  • First-order method
  • Lower bound
  • Stochastic variance reduced gradient

