TY - GEN

T1 - A comparison of new and old algorithms for a mixture estimation problem

AU - Helmbold, David P.

AU - Singer, Yoram

AU - Schapire, Robert E.

AU - Warmuth, Manfred K.

N1 - Funding Information:
We thank Jyrki Kivinen for helpful discussions. Yoram Singer acknowledges the Clore Foundation for its support. Part of this research was done while he was visiting AT&T Bell Labs, and the Computer and Information Sciences department at the University of California, Santa Cruz. Manfred K. Warmuth received funding from NSF grant IRI-9123692.
Publisher Copyright:
© 1995 ACM.

PY - 1995/7/5

Y1 - 1995/7/5

N2 - We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms, like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The squared distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EGη). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EMη which, for η = 1, gives the usual EM update. Experimentally, both the EMη-update and the EGη-update for η > 1 outperform the EM algorithm and its variants. We also prove a worst-case polynomial bound on the global rate of convergence of the EGη algorithm.

AB - We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms, like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The squared distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EGη). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EMη which, for η = 1, gives the usual EM update. Experimentally, both the EMη-update and the EGη-update for η > 1 outperform the EM algorithm and its variants. We also prove a worst-case polynomial bound on the global rate of convergence of the EGη algorithm.

UR - http://www.scopus.com/inward/record.url?scp=85019130815&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019130815&partnerID=8YFLogxK

U2 - 10.1145/225298.225306

DO - 10.1145/225298.225306

M3 - Conference contribution

AN - SCOPUS:85019130815

T3 - Proceedings of the 8th Annual Conference on Computational Learning Theory, COLT 1995

SP - 69

EP - 78

BT - Proceedings of the 8th Annual Conference on Computational Learning Theory, COLT 1995

PB - Association for Computing Machinery, Inc.

T2 - 8th Annual Conference on Computational Learning Theory, COLT 1995

Y2 - 5 July 1995 through 8 July 1995

ER -