The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

David M. Blei, Thomas L. Griffiths, Michael I. Jordan

Research output: Contribution to journal › Article › peer-review

473 Scopus citations

Abstract

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
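
As an illustration of the preferential attachment dynamics described above, the following is a minimal sketch of drawing document paths from an nCRP prior. It is not the authors' implementation: the function name `sample_ncrp_paths`, the truncation of the tree to a fixed `depth`, and the concentration parameter `gamma` are assumptions made for this example. At each node, a document follows an existing child with probability proportional to the number of earlier documents that took that child, and opens a new child with probability proportional to `gamma`.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a depth-truncated nCRP.

    Hypothetical helper for illustration only; `gamma` must be positive.
    Each path is a tuple of child indices, one per level below the root.
    """
    rng = random.Random(seed)
    # Maps a node (the tuple of child indices leading to it) to a list whose
    # i-th entry counts the earlier documents that chose child i at that node.
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()  # start at the root
        for _ in range(depth):
            counts = child_counts[node]
            # Chinese restaurant process step: existing child i has weight
            # counts[i]; a brand-new child has weight gamma.
            r = rng.random() * (sum(counts) + gamma)
            choice = len(counts)  # default to opening a new child
            acc = 0.0
            for i, c in enumerate(counts):
                acc += c
                if r < acc:
                    choice = i
                    break
            if choice == len(counts):
                counts.append(0)
            counts[choice] += 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=10, depth=3, gamma=1.0):
        print(path)
```

Documents whose paths share a prefix share the nodes along that prefix, which is how a model built on this prior can share topics at several levels of abstraction. Smaller values of `gamma` tend to concentrate documents on fewer branches; larger values tend to produce bushier trees.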

Original language: English (US)
Article number: 7
Journal: Journal of the ACM
Volume: 57
Issue number: 2
State: Published - Jan 1 2010
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Hardware and Architecture
  • Artificial Intelligence

Keywords

  • Bayesian nonparametric statistics
  • Unsupervised learning
