Abstract
This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context-free grammars (PCFGs) from terminal strings, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho, demonstrating that, with suitable priors, Bayesian techniques can infer linguistic structure in situations where maximum-likelihood methods such as the Inside-Outside algorithm produce only a trivial grammar.
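To make the contrast between maximum-likelihood and Bayesian estimation concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Dirichlet prior over the rule probabilities of a single nonterminal, which the abstract does not specify. The rule names, the counts, and the concentration value `alpha` are hypothetical. The sketch shows only the step of drawing rule probabilities given rule counts. A full sampler would also have to resample parse trees for the terminal strings, which is omitted here.

```python
import random


def ml_rule_probs(rule_counts):
    """Maximum-likelihood estimate: relative frequency of each rule."""
    total = sum(rule_counts.values())
    return {rule: count / total for rule, count in rule_counts.items()}


def sample_rule_probs(rule_counts, alpha, rng):
    """Draw rule probabilities from Dirichlet(counts + alpha).

    Assumption: a symmetric Dirichlet prior with concentration `alpha`.
    A Dirichlet draw is obtained by normalizing independent Gamma draws.
    """
    draws = {
        rule: rng.gammavariate(count + alpha, 1.0)
        for rule, count in rule_counts.items()
    }
    total = sum(draws.values())
    return {rule: value / total for rule, value in draws.items()}


if __name__ == "__main__":
    # Hypothetical rule counts for one nonterminal, for illustration only.
    counts = {
        "Word -> Stem": 8,
        "Word -> Prefix Stem": 2,
        "Word -> Stem Suffix": 0,
    }
    rng = random.Random(0)
    print("ML estimate:     ", ml_rule_probs(counts))
    print("Posterior sample:", sample_rule_probs(counts, alpha=0.1, rng=rng))
```

With `alpha` below 1, the prior favors distributions that concentrate mass on a few rules, which is one way a prior can encode a preference for sparse grammars. The maximum-likelihood estimate has no such preference.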
Original language | English (US) |
---|---|
Pages | 139-146 |
Number of pages | 8 |
State | Published - 2007 |
Externally published | Yes |
Event | Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 - Rochester, NY, United States |
Duration | Apr 22 2007 → Apr 27 2007 |
Other
Other | Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 |
---|---|
Country/Territory | United States |
City | Rochester, NY |
Period | 4/22/07 → 4/27/07 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language