Online learning for matrix factorization and sparse coding

Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro

Research output: Contribution to journal › Article › peer-review

2150 Scopus citations

Abstract

Sparse coding, that is, modelling data vectors as sparse linear combinations of basis elements, is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets.
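
The abstract describes the method only at a high level, so the following is a minimal sketch of one way an online, stochastic-approximation dictionary learner can be organised. It is not the authors' implementation. It assumes an l1-penalised least-squares sparse coding step solved with ISTA (swapped in here for simplicity), running sufficient statistics `A` and `B` accumulated one sample at a time, and a block-coordinate update that keeps each dictionary column inside the unit ball. All function names, parameters, and defaults are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(x, D, lam, n_iter=100):
    """Approximately solve min_a 0.5 * ||x - D a||^2 + lam * ||a||_1 with ISTA.

    ISTA is an assumption of this sketch, chosen because it needs only NumPy.
    """
    a = np.zeros(D.shape[1])
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D, 2) ** 2 + 1e-12
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a


def online_dictionary_learning(X, n_atoms, lam=0.1, n_epochs=1, seed=0):
    """Learn a dictionary D (dim x n_atoms) from the rows of X, one sample at a time."""
    rng = np.random.default_rng(seed)
    n_samples, dim = X.shape

    # Random initial dictionary with unit-norm columns.
    D = rng.standard_normal((dim, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)

    # Running statistics: A accumulates a a^T, B accumulates x a^T.
    A = np.zeros((n_atoms, n_atoms))
    B = np.zeros((dim, n_atoms))

    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):
            x = X[i]
            a = sparse_code(x, D, lam)
            A += np.outer(a, a)
            B += np.outer(x, a)

            # One pass of block-coordinate descent over the columns of D.
            for j in range(n_atoms):
                if A[j, j] < 1e-10:
                    continue  # atom not used so far; leave it unchanged
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                # Project onto the unit ball so column norms stay bounded.
                D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```

Because each sample is touched once per epoch and only the small matrices `A` and `B` are carried between steps, memory use in this sketch does not grow with the number of training samples, which is the property that makes this family of methods attractive for large data sets.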

Original language: English (US)
Pages (from-to): 19-60
Number of pages: 42
Journal: Journal of Machine Learning Research
Volume: 11
State: Published - 2010
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Statistics and Probability
  • Artificial Intelligence

Keywords

  • Basis pursuit
  • Dictionary learning
  • Matrix factorization
  • Nonnegative matrix factorization
  • Online learning
  • Sparse coding
  • Sparse principal component analysis
  • Stochastic approximations
  • Stochastic optimization
