TY - GEN
T1 - Interpolating between types and tokens by estimating power-law generators
AU - Goldwater, Sharon
AU - Griffiths, Thomas L.
AU - Johnson, Mark
PY - 2005
Y1 - 2005
N2 - Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
AB - Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
UR - http://www.scopus.com/inward/record.url?scp=33749240868&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749240868&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33749240868
SN - 9780262232531
T3 - Advances in Neural Information Processing Systems
SP - 459
EP - 466
BT - Advances in Neural Information Processing Systems 18 - Proceedings of the 2005 Conference
T2 - 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005
Y2 - 5 December 2005 through 8 December 2005
ER -