TY - GEN
T1 - Using vocabulary knowledge in Bayesian multinomial estimation
AU - Griffiths, Thomas L.
AU - Tenenbaum, Joshua B.
PY - 2002/1/1
Y1 - 2002/1/1
AB - Estimating the parameters of sparse multinomial distributions is an important component of many statistical learning tasks. Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data.
UR - http://www.scopus.com/inward/record.url?scp=84887129778&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887129778&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84887129778
SN - 0262042088
SN - 9780262042086
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 14 - Proceedings of the 2001 Conference, NIPS 2001
PB - Neural Information Processing Systems Foundation
T2 - 15th Annual Neural Information Processing Systems Conference, NIPS 2001
Y2 - 3 December 2001 through 8 December 2001
ER -