Bayesian entropy estimation for countable discrete distributions

Evan Archer, Il Memming Park, Jonathan William Pillow

Research output: Contribution to journal › Article › peer-review

53 Scopus citations


We consider the problem of estimating Shannon's entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it provides a natural family of priors for Bayesian entropy estimation, because moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over H, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous measures for mixing Pitman-Yor processes to produce an approximately flat prior over H. We show that the resulting "Pitman-Yor Mixture" (PYM) entropy estimator is consistent for a large class of distributions. Finally, we explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.
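To illustrate what a Bayesian posterior-mean entropy estimate looks like, here is a minimal Python sketch for a much simpler setting than the one treated in the paper: a finite alphabet of assumed known size K with a symmetric Dirichlet prior of concentration alpha. It is not the PYM estimator. It uses the standard closed-form posterior mean of H under a finite Dirichlet prior, and the function names (`digamma`, `plugin_entropy`, `dirichlet_posterior_mean_entropy`) are hypothetical choices made for this example. Entropies are in nats.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy estimate from symbol counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def dirichlet_posterior_mean_entropy(counts, num_symbols, alpha):
    """Posterior mean of H under a symmetric Dirichlet(alpha) prior.

    Assumes a finite alphabet with `num_symbols` symbols (an assumption of
    this sketch; the paper addresses unknown or countably infinite alphabets).
    `counts` lists the counts of the observed symbols only.
    """
    observed = [n for n in counts if n > 0]
    total = sum(observed) + num_symbols * alpha
    mean_h = digamma(total + 1.0)
    for n in observed:
        mean_h -= (n + alpha) / total * digamma(n + alpha + 1.0)
    # Unobserved symbols contribute through the prior pseudo-count alone.
    unseen = num_symbols - len(observed)
    mean_h -= unseen * (alpha / total) * digamma(alpha + 1.0)
    return mean_h


if __name__ == "__main__":
    counts = [5, 3, 1, 1]  # made-up counts for illustration
    print("plug-in:", plugin_entropy(counts))
    for alpha in (0.01, 1.0, 10.0):
        est = dirichlet_posterior_mean_entropy(counts, 50, alpha)
        print("alpha =", alpha, "posterior mean:", est)
```

With few samples relative to the alphabet size, the posterior mean in this sketch shifts substantially as alpha changes, which is the prior sensitivity in the under-sampled regime that the abstract describes for fixed Dirichlet and Pitman-Yor process priors.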

Original language: English (US)
Pages (from-to): 2833-2868
Number of pages: 36
Journal: Journal of Machine Learning Research
State: Published - Oct 1 2014

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence


Keywords

  • Bayesian estimation
  • Bayesian nonparametrics
  • Dirichlet process
  • Entropy
  • Information theory
  • Neural coding
  • Pitman-Yor process

