Abstract
This paper studies universal estimation of divergence from the realizations of two unknown finite-alphabet sources. Two algorithms that borrow techniques from data compression are presented. The first divergence estimator applies the Burrows-Wheeler block sorting transform to the concatenation of the two realizations; consistency of this estimator is shown for all finite-memory sources. The second divergence estimator is based on the Context Tree Weighting method; consistency is shown for all sources whose memory length does not exceed a known bound. Experimental results show that both algorithms perform similarly and outperform string-matching and plug-in methods.
Original language | English (US) |
---|---|
Pages (from-to) | 3456-3475 |
Number of pages | 20 |
Journal | IEEE Transactions on Information Theory |
Volume | 52 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2006 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Keywords
- Block sorting
- Burrows-Wheeler transform
- Context tree weighting method
- Divergence estimation
- Information divergence
- Markov sources
- Universal methods