Interpolating between types and tokens by estimating power-law generators

Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

87 Scopus citations

Abstract

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
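The two-stage idea in the abstract (a generator that proposes lexical items, plus an adaptor that decides how often each one is reused) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard Chinese-restaurant-style description of the Pitman-Yor process, and the function name `pitman_yor_adaptor`, the parameter names `a` (discount) and `b` (concentration), and the toy uniform generator are all chosen here for illustration.

```python
import random
from collections import Counter


def pitman_yor_adaptor(n_tokens, generator, a=0.5, b=1.0, seed=0):
    """Sample n_tokens from a generator wrapped in a Pitman-Yor adaptor.

    Assumed parameterization (not taken from the source page):
      a: discount, 0 <= a < 1
      b: concentration, b > -a
    `generator` is any callable taking a random.Random and returning an item.
    """
    rng = random.Random(seed)
    table_counts = []  # number of tokens seated at each table
    table_labels = []  # item the generator produced for each table
    tokens = []
    for i in range(n_tokens):  # i tokens have been seated so far
        k = len(table_counts)
        # Open a new table with probability (k*a + b) / (i + b);
        # the first token always opens a table.
        if k == 0 or rng.random() < (k * a + b) / (i + b):
            table_counts.append(1)
            table_labels.append(generator(rng))
            tokens.append(table_labels[-1])
        else:
            # Otherwise join existing table j with weight (n_j - a):
            # frequent items become more frequent.
            weights = [n - a for n in table_counts]
            j = rng.choices(range(k), weights=weights)[0]
            table_counts[j] += 1
            tokens.append(table_labels[j])
    return tokens, table_counts


def toy_generator(rng):
    # Hypothetical stand-in generator: uniform over a large vocabulary.
    return rng.randrange(10**6)


tokens, table_counts = pitman_yor_adaptor(5000, toy_generator, a=0.7, b=1.0)
freqs = sorted(Counter(tokens).values(), reverse=True)
print(len(table_counts), "tables;", "top frequencies:", freqs[:5])
```

Even though the toy generator is uniform, the adaptor's reuse rule yields a heavily skewed frequency profile, with a few items accounting for many tokens and a long tail of rare ones. Plotting `freqs` against rank on log-log axes is one way to inspect that skew.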

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 18 - Proceedings of the 2005 Conference
Pages: 459-466
Number of pages: 8
State: Published - 2005
Externally published: Yes
Event: 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005 - Vancouver, BC, Canada
Duration: Dec 5 2005 to Dec 8 2005

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Other

Other: 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005
Country/Territory: Canada
City: Vancouver, BC
Period: 12/5/05 to 12/8/05

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
