Abstract
We consider the problem of estimating Shannon's entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian nonparametric statistics and machine learning. Here we show that it provides a natural family of priors for Bayesian entropy estimation, because moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over H, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous measures for mixing Pitman-Yor processes to produce an approximately flat prior over H. We show that the resulting "Pitman-Yor Mixture" (PYM) entropy estimator is consistent for a large class of distributions. Finally, we explore the theoretical properties of this estimator, and show that it performs well both in simulation and in application to real data.
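The abstract notes that posterior moments of H can be computed analytically under a Dirichlet prior. The snippet below is a minimal illustrative sketch (not the authors' PYM code) of the simplest such case: the posterior mean of entropy for a known, finite alphabet under a symmetric Dirichlet(alpha) prior, using the standard analytic formula E[H | n] = psi(B + 1) - sum_i (b_i / B) psi(b_i + 1), where b_i = n_i + alpha and B = sum_i b_i. The function name and default alpha are placeholders chosen for this example.

```python
# Minimal sketch: Bayes least squares estimate of Shannon entropy (in nats)
# under a symmetric Dirichlet(alpha) prior on a finite alphabet.
# This is NOT the PYM estimator from the paper; it illustrates the fixed-prior
# case whose narrowness motivates the Pitman-Yor mixture construction.
import numpy as np
from scipy.special import digamma


def dirichlet_entropy_posterior_mean(counts, alpha=1.0):
    """Posterior mean of entropy given symbol counts and a symmetric
    Dirichlet(alpha) prior. `counts` must cover the full (known, finite)
    alphabet, including zero counts."""
    b = np.asarray(counts, dtype=float) + alpha   # posterior Dirichlet parameters
    B = b.sum()
    return digamma(B + 1.0) - np.sum((b / B) * digamma(b + 1.0))


# Example: counts observed over a 4-symbol alphabet
print(dirichlet_entropy_posterior_mean([10, 5, 3, 0], alpha=0.5))
```

In the under-sampled regime this estimate is dominated by the choice of alpha, which is the behavior the paper addresses by mixing over Pitman-Yor processes to obtain an approximately flat prior on H.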
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2833-2868 |
| Number of pages | 36 |
| Journal | Journal of Machine Learning Research |
| Volume | 15 |
| State | Published - Oct 1 2014 |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence
Keywords
- Bayesian estimation
- Bayesian nonparametrics
- Dirichlet process
- Entropy
- Information theory
- Neural coding
- Pitman-Yor process