TY - GEN
T1 - How to store a random walk
AU - Viola, Emanuele
AU - Weinstein, Omri
AU - Yu, Huacheng
N1 - Publisher Copyright:
Copyright © 2020 by SIAM
PY - 2020
Y1 - 2020
N2 - Motivated by storage applications, we study the following data structure problem: an encoder wishes to store a collection of jointly-distributed files X := (X1, X2, ..., Xn) ∼ µ which are correlated (Hµ(X) ≪ ∑_i Hµ(Xi)), using as little (expected) memory as possible, such that each individual file Xi can be recovered quickly with few (ideally constant) memory accesses. In the case of independent random files, a dramatic result by Pătrașcu (FOCS'08), subsequently extended by Dodis, Pătrașcu and Thorup (STOC'10), shows that it is possible to store X using just a constant number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each Xi in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least Ω(n/poly lg n) extra bits for constant decoding time, even for “simple” joint distributions µ. We focus on the natural case of compressing Markov chains, i.e., storing a length-n random walk on any (possibly directed) graph G. Denoting by κ(G, n) the number of length-n walks on G, we show that there is a succinct data structure storing a random walk using lg₂ κ(G, n) + O(lg n) bits of space, such that any vertex along the walk can be decoded in O(1) time on a word-RAM. If the graph is strongly connected (e.g., undirected), the space can be improved to only lg₂ κ(G, n) + 5 extra bits. For the harder task of matching the point-wise optimal space of the walk, i.e., the empirical entropy ∑_{i=1}^{n−1} lg(deg(vi)), we present a data structure with O(1) extra bits at the price of O(lg n) decoding time, and show that any improvement on this would lead to an improved solution to the long-standing Dictionary problem. All of our data structures support the online version of the problem with constant update and query time.
AB - Motivated by storage applications, we study the following data structure problem: an encoder wishes to store a collection of jointly-distributed files X := (X1, X2, ..., Xn) ∼ µ which are correlated (Hµ(X) ≪ ∑_i Hµ(Xi)), using as little (expected) memory as possible, such that each individual file Xi can be recovered quickly with few (ideally constant) memory accesses. In the case of independent random files, a dramatic result by Pătrașcu (FOCS'08), subsequently extended by Dodis, Pătrașcu and Thorup (STOC'10), shows that it is possible to store X using just a constant number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each Xi in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least Ω(n/poly lg n) extra bits for constant decoding time, even for “simple” joint distributions µ. We focus on the natural case of compressing Markov chains, i.e., storing a length-n random walk on any (possibly directed) graph G. Denoting by κ(G, n) the number of length-n walks on G, we show that there is a succinct data structure storing a random walk using lg₂ κ(G, n) + O(lg n) bits of space, such that any vertex along the walk can be decoded in O(1) time on a word-RAM. If the graph is strongly connected (e.g., undirected), the space can be improved to only lg₂ κ(G, n) + 5 extra bits. For the harder task of matching the point-wise optimal space of the walk, i.e., the empirical entropy ∑_{i=1}^{n−1} lg(deg(vi)), we present a data structure with O(1) extra bits at the price of O(lg n) decoding time, and show that any improvement on this would lead to an improved solution to the long-standing Dictionary problem. All of our data structures support the online version of the problem with constant update and query time.
UR - http://www.scopus.com/inward/record.url?scp=85084048578&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084048578&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85084048578
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 426
EP - 445
BT - 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020
A2 - Chawla, Shuchi
PB - Association for Computing Machinery
T2 - 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020
Y2 - 5 January 2020 through 8 January 2020
ER -