TY - GEN

T1 - Training algorithms for hidden Markov models using entropy based distance functions

AU - Singer, Yoram

AU - Warmuth, Manfred K.

PY - 1997

Y1 - 1997

N2 - We present new algorithms for parameter estimation of HMMs. By adapting a framework used for supervised learning, we construct iterative algorithms that maximize the likelihood of the observations while also attempting to stay "close" to the current estimated parameters. We use a bound on the relative entropy between the two HMMs as a distance measure between them. The result is new iterative training algorithms which are similar to the EM (Baum-Welch) algorithm for training HMMs. The proposed algorithms are composed of a step similar to the expectation step of Baum-Welch and a new update of the parameters which replaces the maximization (re-estimation) step. The algorithm takes only negligibly more time per iteration and an approximated version uses the same expectation step as Baum-Welch. We evaluate experimentally the new algorithms on synthetic and natural speech pronunciation data. For sparse models, i.e. models with relatively small number of non-zero parameters, the proposed algorithms require significantly fewer iterations.

AB - We present new algorithms for parameter estimation of HMMs. By adapting a framework used for supervised learning, we construct iterative algorithms that maximize the likelihood of the observations while also attempting to stay "close" to the current estimated parameters. We use a bound on the relative entropy between the two HMMs as a distance measure between them. The result is new iterative training algorithms which are similar to the EM (Baum-Welch) algorithm for training HMMs. The proposed algorithms are composed of a step similar to the expectation step of Baum-Welch and a new update of the parameters which replaces the maximization (re-estimation) step. The algorithm takes only negligibly more time per iteration and an approximated version uses the same expectation step as Baum-Welch. We evaluate experimentally the new algorithms on synthetic and natural speech pronunciation data. For sparse models, i.e. models with relatively small number of non-zero parameters, the proposed algorithms require significantly fewer iterations.

UR - http://www.scopus.com/inward/record.url?scp=84899029607&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899029607&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84899029607

SN - 0262100657

SN - 9780262100656

T3 - Advances in Neural Information Processing Systems

SP - 641

EP - 647

BT - Advances in Neural Information Processing Systems 9 - Proceedings of the 1996 Conference, NIPS 1996

PB - Neural information processing systems foundation

T2 - 10th Annual Conference on Neural Information Processing Systems, NIPS 1996

Y2 - 2 December 1996 through 5 December 1996

ER -