## Abstract

Motivated by storage applications, we study the following data structure problem: An encoder wishes to store a collection of jointly-distributed files X := (X_{1}, X_{2}, . . ., X_{n}) ∼ µ which are correlated (H_{µ}(X) ≪ ∑ _{i} H_{µ}(X_{i})), using as little (expected) memory as possible, such that each individual file X_{i} can be recovered quickly with few (ideally constant) memory accesses. In the case of independent random files, a dramatic result by Pǎtraşcu (FOCS'08) and subsequently by Dodis, Pǎtraşcu and Thorup (STOC'10) shows that it is possible to store X using just a constant number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each X_{i} in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least Ω(n/poly lg n) extra bits for constant decoding time, even for “simple” joint distributions µ. We focus on the natural case of compressing Markov chains, i.e., storing a length-n random walk on any (possibly directed) graph G. Denoting by κ(G, n) the number of length-n walks on G, we show that there is a succinct data structure storing a random walk using lg_{2} κ(G, n) + O(lg n) bits of space, such that any vertex along the walk can be decoded in O(1) time on a word-RAM. If the graph is strongly connected (e.g., undirected), the space can be improved to only lg_{2} κ(G, n) + 5 extra bits. For the harder task of matching the point-wise optimal space of the walk, i.e., the empirical entropy ∑ ^{n}_{i}_{=1}^{−}^{1} lg(deg(v_{i})), we present a data structure with O(1) extra bits at the price of O(lg n) decoding time, and show that any improvement on this would lead to an improved solution on the long-standing Dictionary problem. All of our data structures support the online version of the problem with constant update and query time.

Original language | English (US) |
---|---|

Title of host publication | 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020 |

Editors | Shuchi Chawla |

Publisher | Association for Computing Machinery |

Pages | 426-445 |

Number of pages | 20 |

ISBN (Electronic) | 9781611975994 |

State | Published - 2020 |

Event | 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020 - Salt Lake City, United States Duration: Jan 5 2020 → Jan 8 2020 |

### Publication series

Name | Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms |
---|---|

Volume | 2020-January |

### Conference

Conference | 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020 |
---|---|

Country/Territory | United States |

City | Salt Lake City |

Period | 1/5/20 → 1/8/20 |

## All Science Journal Classification (ASJC) codes

- Software
- General Mathematics