TY - GEN
T1 - Training restricted Boltzmann machines on word observations
AU - Dahl, George E.
AU - Adams, Ryan P.
AU - Larochelle, Hugo
PY - 2012
Y1 - 2012
AB - The restricted Boltzmann machine (RBM) is a flexible model for complex data. However, using RBMs for high-dimensional multinomial observations poses significant computational difficulties. In natural language processing applications, words are naturally modeled by K-ary discrete distributions, where K is determined by the vocabulary size and can easily be in the hundreds of thousands. The conventional approach to training RBMs on word observations is limited because it requires sampling the states of K-way softmax visible units during block Gibbs updates, an operation that takes time linear in K. In this work, we address this issue with a more general class of Markov chain Monte Carlo operators on the visible units, yielding updates with computational complexity independent of K. We demonstrate the success of our approach by training RBMs on hundreds of millions of word n-grams using larger vocabularies than previously feasible and by using the learned features to improve performance on chunking and sentiment classification tasks, achieving state-of-the-art results on the latter.
UR - http://www.scopus.com/inward/record.url?scp=84867123033&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867123033&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84867123033
SN - 9781450312851
T3 - Proceedings of the 29th International Conference on Machine Learning, ICML 2012
SP - 679
EP - 686
BT - Proceedings of the 29th International Conference on Machine Learning, ICML 2012
T2 - 29th International Conference on Machine Learning, ICML 2012
Y2 - 26 June 2012 through 1 July 2012
ER -