TY - GEN
T1 - Contextual dependencies in unsupervised word segmentation
AU - Goldwater, Sharon
AU - Griffiths, Thomas L.
AU - Johnson, Mark
PY - 2006
Y1 - 2006
AB - Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures.
UR - http://www.scopus.com/inward/record.url?scp=78650869754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650869754&partnerID=8YFLogxK
U2 - 10.3115/1220175.1220260
DO - 10.3115/1220175.1220260
M3 - Conference contribution
AN - SCOPUS:78650869754
SN - 1932432655
SN - 9781932432657
T3 - COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 673
EP - 680
BT - COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING/ACL 2006
Y2 - 17 July 2006 through 21 July 2006
ER -