TY - GEN

T1 - Learning probabilistic automata with variable memory length

AU - Ron, Dana

AU - Singer, Yoram

AU - Tishby, Naftali

N1 - Funding Information:
We would like to thank Ronitt Rubinfeld and Yoav Freund for their helpful comments. This research has been supported in part by a grant from the Israeli Ministry of Science and Arts and by the Bruno Goldberg endowment fund. Dana Ron would like to thank the support of the Eshkol fellowship. Yoram Singer would like to thank the Clore foundation for its support.
Publisher Copyright:
© 1994 ACM.

PY - 1994/7/16

Y1 - 1994/7/16

N2 - We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in man-machine interaction such as handwriting and speech recognition. Conventionally used fixed memory Markov and hidden Markov models have either severe practical or theoretical drawbacks. Though general hardness results are known for learning distributions generated by sources with similar structure, we prove that our algorithm can indeed efficiently learn distributions generated by our more restricted sources. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made small with high confidence in polynomial time and sample complexity. We demonstrate the applicability of our algorithm by learning the structure of natural English text and using our hypothesis for the correction of corrupted text.

AB - We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in man-machine interaction such as handwriting and speech recognition. Conventionally used fixed memory Markov and hidden Markov models have either severe practical or theoretical drawbacks. Though general hardness results are known for learning distributions generated by sources with similar structure, we prove that our algorithm can indeed efficiently learn distributions generated by our more restricted sources. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made small with high confidence in polynomial time and sample complexity. We demonstrate the applicability of our algorithm by learning the structure of natural English text and using our hypothesis for the correction of corrupted text.

UR - http://www.scopus.com/inward/record.url?scp=85013571397&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013571397&partnerID=8YFLogxK

U2 - 10.1145/180139.181006

DO - 10.1145/180139.181006

M3 - Conference contribution

AN - SCOPUS:85013571397

T3 - Proceedings of the Annual ACM Conference on Computational Learning Theory

SP - 35

EP - 46

BT - Proceedings of the 7th Annual Conference on Computational Learning Theory, COLT 1994

PB - Association for Computing Machinery

T2 - 7th Annual Conference on Computational Learning Theory, COLT 1994

Y2 - 12 July 1994 through 15 July 1994

ER -