Abstract
In this correspondence, we present a new universal entropy estimator for stationary ergodic sources, prove almost sure convergence, and establish an upper bound on the convergence rate for finite-alphabet finite memory sources. The algorithm is motivated by data compression using the Burrows-Wheeler block sorting transform (BWT). By exploiting the property that the BWT output sequence is close to a piecewise stationary memoryless source, we can segment the output sequence and estimate probabilities in each segment. Experimental results show that our algorithm outperforms Lempel-Ziv (LZ) string-matching-based algorithms.
Original language | English (US) |
---|---|
Pages (from-to) | 1551-1561 |
Number of pages | 11 |
Journal | IEEE Transactions on Information Theory |
Volume | 50 |
Issue number | 7 |
DOIs | |
State | Published - Jul 2004 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Keywords
- Block sorting
- Burrows-wheelers transform (BWT)
- Entropy estimation
- Piecewise stationary memoryless source
- Tree source