Universal entropy estimation via block sorting

Haixiao Cai, Sanjeev R. Kulkarni, Sergio Verdú

Research output: Contribution to journalLetterpeer-review

51 Scopus citations

Abstract

In this correspondence, we present a new universal entropy estimator for stationary ergodic sources, prove almost sure convergence, and establish an upper bound on the convergence rate for finite-alphabet finite memory sources. The algorithm is motivated by data compression using the Burrows-Wheeler block sorting transform (BWT). By exploiting the property that the BWT output sequence is close to a piecewise stationary memoryless source, we can segment the output sequence and estimate probabilities in each segment. Experimental results show that our algorithm outperforms Lempel-Ziv (LZ) string-matching-based algorithms.

Original languageEnglish (US)
Pages (from-to)1551-1561
Number of pages11
JournalIEEE Transactions on Information Theory
Volume50
Issue number7
DOIs
StatePublished - Jul 2004

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Keywords

  • Block sorting
  • Burrows-wheelers transform (BWT)
  • Entropy estimation
  • Piecewise stationary memoryless source
  • Tree source

Fingerprint

Dive into the research topics of 'Universal entropy estimation via block sorting'. Together they form a unique fingerprint.

Cite this