Abstract
In this correspondence, we present a new universal entropy estimator for stationary ergodic sources, prove almost sure convergence, and establish an upper bound on the convergence rate for finite-alphabet finite memory sources. The algorithm is motivated by data compression using the Burrows-Wheeler block sorting transform (BWT). By exploiting the property that the BWT output sequence is close to a piecewise stationary memoryless source, we can segment the output sequence and estimate probabilities in each segment. Experimental results show that our algorithm outperforms Lempel-Ziv (LZ) string-matching-based algorithms.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1551-1561 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 50 |
| Issue number | 7 |
| DOIs | |
| State | Published - Jul 2004 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Keywords
- Block sorting
- Burrows-wheelers transform (BWT)
- Entropy estimation
- Piecewise stationary memoryless source
- Tree source