Automated reconstruction of ancient languages using probabilistic models of sound change

Alexandre Bouchard-Côté, David Hall, Thomas L. Griffiths, Dan Klein

Research output: Contribution to journal › Article › peer-review

86 Scopus citations

Abstract

One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system's reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis about the relationship between the function of a sound and its probability of changing that was first proposed in 1955.
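The "within one character" figure above can be read as an edit-distance criterion. The following is a minimal sketch of that reading, assuming plain Levenshtein distance over characters; the abstract does not spell out the exact metric, so this is an illustration, not the authors' evaluation code. The function names and the toy word pairs are hypothetical and are not data from the paper.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def fraction_within(pairs: Sequence[Tuple[str, str]], max_dist: int = 1) -> float:
    """Share of (automatic, manual) pairs whose edit distance is at most max_dist."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    hits = sum(1 for auto, manual in pairs if levenshtein(auto, manual) <= max_dist)
    return hits / len(pairs)


# Hypothetical (automatic, manual) reconstruction pairs, invented for illustration.
toy_pairs = [
    ("mata", "mata"),  # distance 0
    ("lima", "rima"),  # distance 1
    ("batu", "watu"),  # distance 1
    ("telu", "tolo"),  # distance 2
]

if __name__ == "__main__":
    print(fraction_within(toy_pairs))  # 0.75 for these toy pairs
```

Under this reading, the reported result would correspond to `fraction_within` returning a value above 0.85 on the real reconstruction pairs with `max_dist=1`.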

Original language: English (US)
Pages (from-to): 4224-4229
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 110
Issue number: 11
State: Published - Mar 12 2013
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General

Keywords

  • Ancestral
  • Computational
  • Diachronic
