Bayesian inference for PCFGs via Markov chain Monte Carlo

Mark Johnson, Thomas L. Griffiths, Sharon Goldwater

Research output: Contribution to conference, Paper, peer-review

119 Scopus citations

Abstract

This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context-free grammars (PCFGs) from terminal strings, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho, demonstrating that with suitable priors Bayesian techniques can infer linguistic structure in situations where maximum-likelihood methods such as the Inside-Outside algorithm produce only a trivial grammar.
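
The abstract does not describe the two samplers in detail. As a rough illustration of the kind of computation Bayesian PCFG estimation involves, the sketch below assumes a symmetric Dirichlet prior over each nonterminal's rule probabilities (an assumption; the abstract only mentions "suitable priors") and resamples those probabilities from rule counts collected over a set of sampled parse trees. The function names, the toy rule labels, and the value of alpha are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def resample_rule_probs(rule_counts, alpha, rng):
    """Sample rule probabilities from a Dirichlet posterior.

    rule_counts maps (lhs, rhs) to the number of times that rule is used
    in the currently sampled trees. alpha is a symmetric Dirichlet prior
    parameter (an assumption of this sketch, not a value from the paper).
    """
    # Group the rules by their left-hand-side nonterminal.
    by_lhs = defaultdict(list)
    for (lhs, rhs), count in rule_counts.items():
        by_lhs[lhs].append((rhs, count))

    theta = {}
    for lhs, expansions in by_lhs.items():
        # Conjugacy: the posterior parameter is prior pseudo-count plus count.
        posterior = [alpha + count for _, count in expansions]
        probs = sample_dirichlet(posterior, rng)
        for (rhs, _), p in zip(expansions, probs):
            theta[(lhs, rhs)] = p
    return theta


# Hypothetical toy counts for a morphology-style grammar.
rng = random.Random(0)
counts = {
    ("Word", ("Prefix", "Stem")): 7,
    ("Word", ("Stem",)): 3,
    ("Word", ("Stem", "Suffix")): 0,
}
theta = resample_rule_probs(counts, alpha=0.1, rng=rng)
```

With alpha below 1, the prior favors probability vectors concentrated on a few rules, which is one common way of encouraging a sparse grammar. A full Gibbs-style sampler would alternate a step like this with resampling a parse tree for each string; that step is omitted here.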

Original language: English (US)
Pages: 139-146
Number of pages: 8
State: Published - Dec 1 2007
Externally published: Yes
Event: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 - Rochester, NY, United States
Duration: Apr 22 2007 to Apr 27 2007

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
