Logistic regression, AdaBoost and Bregman distances

Michael Collins, Robert E. Schapire, Yoram Singer

Research output: Contribution to journal › Article › peer-review

455 Scopus citations


We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms are iterative and can be divided into two types based on whether the parameters are updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms that includes both a sequential- and a parallel-update algorithm as special cases, thus showing how the sequential and parallel approaches can themselves be unified. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with the iterative scaling algorithm. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.
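The sequential-update idea described above can be illustrated with a minimal sketch. It is not the paper's pseudocode: it assumes binary features with values in {-1, +1}, uses the familiar AdaBoost-style closed-form step for such features, and adds a small smoothing constant to avoid division by zero. The names `sequential_boost`, `total_loss`, `M`, and `eps` are hypothetical. The only difference between the two losses is how the per-example weights `q` are computed from the margins.

```python
import math


def total_loss(margins, loss):
    """Exponential loss or logistic loss summed over all examples."""
    if loss == "exp":
        return sum(math.exp(-s) for s in margins)
    return sum(math.log1p(math.exp(-s)) for s in margins)


def sequential_boost(M, rounds, loss="exp", eps=1e-9):
    """Sequential (one-parameter-per-round) updates.

    M[i][j] = y_i * h_j(x_i), assumed to lie in {-1, +1}.
    Returns the weight vector and the loss before/after each round.
    `eps` is a smoothing constant added for this sketch only.
    """
    d = len(M[0])
    lam = [0.0] * d
    history = []
    for _ in range(rounds):
        margins = [sum(l * m for l, m in zip(lam, row)) for row in M]
        history.append(total_loss(margins, loss))
        # Per-example weights: the only place the two losses differ.
        if loss == "exp":
            q = [math.exp(-s) for s in margins]
        else:
            q = [1.0 / (1.0 + math.exp(s)) for s in margins]
        best_j, best_gain, best_delta = 0, -1.0, 0.0
        for j in range(d):
            # Weight on examples feature j gets right (w_pos) and wrong (w_neg).
            w_pos = sum(qi for qi, row in zip(q, M) if row[j] > 0)
            w_neg = sum(qi for qi, row in zip(q, M) if row[j] < 0)
            gain = abs(math.sqrt(w_pos) - math.sqrt(w_neg))
            if gain > best_gain:
                best_j, best_gain = j, gain
                best_delta = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        lam[best_j] += best_delta  # update a single parameter this round
    margins = [sum(l * m for l, m in zip(lam, row)) for row in M]
    history.append(total_loss(margins, loss))
    return lam, history


if __name__ == "__main__":
    # Toy data: each of three features is wrong on exactly one example.
    M = [[1, 1, -1], [1, -1, 1], [-1, 1, 1], [1, 1, 1]]
    for kind in ("exp", "log"):
        lam, hist = sequential_boost(M, 10, kind)
        print(kind, [round(v, 3) for v in lam], round(hist[-1], 4))
```

With `loss="exp"` the loop behaves like AdaBoost over a fixed pool of weak hypotheses; with `loss="log"` the same step is a conservative update for logistic regression. A parallel-update variant would instead adjust every coordinate in each round, which in general needs a stronger bound on the feature values.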

Original language: English (US)
Pages (from-to): 253-285
Number of pages: 33
Journal: Machine Learning
Issue number: 1-3
State: Published - Jul 2002

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence


Keywords

  • AdaBoost
  • Boosting
  • Bregman distances
  • Convex optimization
  • Information geometry
  • Iterative scaling
  • Logistic regression
  • Maximum-entropy methods

