TY - CPAPER
T1 - A fully Bayesian approach to unsupervised part-of-speech tagging
AU - Goldwater, Sharon
AU - Griffiths, Thomas L.
PY - 2007
AB - Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
UR - http://www.scopus.com/inward/record.url?scp=84860525845&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860525845&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84860525845
SN - 9781932432862
T3 - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
SP - 744
EP - 751
BT - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
T2 - 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Y2 - 23 June 2007 through 30 June 2007
ER -
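
Note: the sketch below is a minimal illustration (not the authors' code) of the technique the abstract describes: integrating out the HMM parameters under symmetric Dirichlet priors and Gibbs-sampling the tag sequence, with small hyperparameters (alpha, beta < 1) favoring the sparse distributions the abstract mentions. For brevity it uses a bigram HMM rather than the paper's trigram model, fixes the hyperparameters instead of inferring them, and omits the tagging-dictionary constraint; all names and values are illustrative.

import numpy as np

def gibbs_tag(words, T, V, alpha=0.1, beta=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampling for a bigram HMM with symmetric
    Dirichlet(alpha) transition and Dirichlet(beta) emission priors.
    words: array of word ids in [0, V); returns tag ids in [0, T)."""
    rng = np.random.default_rng(seed)
    N = len(words)
    B = T                                   # extra "boundary" tag at the edges
    tags = np.empty(N + 2, dtype=int)
    tags[0] = tags[-1] = B
    tags[1:-1] = rng.integers(0, T, size=N)

    trans = np.zeros((T + 1, T + 1), dtype=int)   # trans[s, t] = #(s -> t)
    emit = np.zeros((T, V), dtype=int)            # emit[t, w] = #(t emits w)
    for i in range(N + 1):
        trans[tags[i], tags[i + 1]] += 1
    for i in range(1, N + 1):
        emit[tags[i], words[i - 1]] += 1

    cand = np.arange(T)
    for _ in range(iters):
        for i in range(1, N + 1):
            w, old = words[i - 1], tags[i]
            prev, nxt = tags[i - 1], tags[i + 1]
            # Remove position i's counts so the conditional excludes them.
            trans[prev, old] -= 1
            trans[old, nxt] -= 1
            emit[old, w] -= 1
            # Posterior predictive factors with parameters integrated out:
            # emission, transition in, transition out. The "+same" terms
            # correct the outgoing-transition counts when the candidate tag
            # equals prev, since the hypothesised prev->t transition then
            # also counts as a transition out of t.
            p_emit = (emit[:, w] + beta) / (emit.sum(axis=1) + V * beta)
            p_in = (trans[prev, :T] + alpha) / (trans[prev].sum() + (T + 1) * alpha)
            same = cand == prev
            p_out = (trans[cand, nxt] + (same & (cand == nxt)) + alpha) \
                    / (trans[:T].sum(axis=1) + same + (T + 1) * alpha)
            p = p_emit * p_in * p_out
            new = rng.choice(T, p=p / p.sum())
            tags[i] = new
            trans[prev, new] += 1
            trans[new, nxt] += 1
            emit[new, w] += 1
    return tags[1:-1]

# Toy usage: 8 "words" over a 4-word vocabulary, 3 hidden tags.
words = np.array([0, 1, 2, 1, 3, 2, 0, 1])
print(gibbs_tag(words, T=3, V=4))

Because the parameters are collapsed out, each resampled tag is scored against the counts of all other tags rather than a single point estimate, which is what makes the learned structure robust over a range of parameter values as the abstract argues; averaging over many samples (or annealing) would bring the sketch closer to the paper's experimental setup.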