The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

David M. Blei, Thomas L. Griffiths, Michael I. Jordan

Research output: Contribution to journal › Article › peer-review

468 Scopus citations


We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
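
The preferential attachment behavior described in the abstract can be illustrated with a minimal sketch of drawing document paths from a nested Chinese restaurant process. This is not the authors' implementation: the function name `sample_ncrp_paths`, the concentration parameter `gamma`, and the truncation of the tree to a fixed `depth` are assumptions made here for illustration (the process in the abstract is defined over infinitely deep trees). At each node, a document follows an existing child with probability proportional to the number of earlier documents that took that branch, or opens a new child with probability proportional to `gamma`.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a depth-truncated nCRP.

    Hypothetical helper for illustration only. A node is identified by the
    tuple of child indices leading to it; the root is the empty tuple.
    gamma must be positive.
    """
    rng = random.Random(seed)
    # child_counts[node][k] = number of earlier documents that chose child k of node
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()
        for _ in range(depth):
            counts = child_counts[node]
            # Existing children are weighted by popularity, a new child by gamma.
            weights = counts + [gamma]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(counts):
                counts.append(0)  # open a new branch under this node
            counts[choice] += 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=10, depth=3, gamma=1.0, seed=42):
        print(path)
```

Documents whose paths share a prefix share the nodes, and hence the topics, along that prefix. Smaller values of `gamma` tend to produce fewer, more heavily shared branches. The sketch covers only the prior over paths and does not attempt the posterior inference algorithm mentioned in the abstract.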

Original language: English (US)
Article number: 7
Journal: Journal of the ACM
Issue number: 2
State: Published - Jan 1 2010
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Hardware and Architecture
  • Artificial Intelligence

Keywords


  • Bayesian nonparametric statistics
  • Unsupervised learning

