Nonparametric spherical topic modeling with word embeddings

Nematollah Kayhan Batmanghelich, Ardavan Saeedi, Karthik R. Narasimhan, Samuel J. Gershman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

64 Scopus citations

Abstract

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.
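
As an illustration of the von Mises-Fisher distribution mentioned in the abstract, the following is a minimal sketch of its log-density on the unit sphere, written from the standard textbook definition rather than from the authors' code. The function name `vmf_log_density` and its parameters (`mu` for the mean direction, `kappa` for the concentration) are assumptions made for this example; the sketch does not cover the Hierarchical Dirichlet Process or the Stochastic Variational Inference procedure.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def vmf_log_density(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution (hypothetical helper).

    x     : array of shape (d,) or (n, d); rows are normalized to unit length
    mu    : mean direction, shape (d,); normalized to unit length
    kappa : concentration parameter, must be > 0
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = x.shape[-1]

    # Project inputs onto the unit sphere, since the density is defined there.
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    mu = mu / np.linalg.norm(mu)

    v = d / 2.0 - 1.0
    # ive(v, k) = I_v(k) * exp(-k), so log I_v(k) = log(ive(v, k)) + k.
    log_bessel = np.log(ive(v, kappa)) + kappa
    # log C_d(kappa) = v*log(kappa) - (d/2)*log(2*pi) - log I_v(kappa)
    log_norm = v * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) - log_bessel

    # The density depends on x only through the cosine similarity mu . x.
    return log_norm + kappa * (x @ mu)
```

Because the exponent is `kappa` times the cosine similarity between `x` and `mu`, vectors pointing in the mean direction receive the highest density, which is what makes this family a natural fit for unit-normalized word embeddings. For high-dimensional embeddings with a small `kappa`, the Bessel term can underflow, so a more careful numerical treatment would likely be needed in practice.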

Original language: English (US)
Title of host publication: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 537-542
Number of pages: 6
ISBN (Electronic): 9781510827592
State: Published - 2016
Externally published: Yes
Event: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: Aug 7 2016 to Aug 12 2016

Publication series

Name: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers

Other

Other: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/Territory: Germany
City: Berlin
Period: 8/7/16 to 8/12/16

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Artificial Intelligence
  • Linguistics and Language
  • Software
