TY - GEN

T1 - Near-optimal two-pass streaming algorithm for sampling random walks over directed graphs

AU - Chen, Lijie

AU - Kol, Gillat

AU - Paramonov, Dmitry

AU - Saxena, Raghuvansh R.

AU - Song, Zhao

AU - Yu, Huacheng

N1 - Funding Information:
Lijie Chen is supported by an IBM Fellowship. Zhao Song is supported in part by the Schmidt Foundation, the Simons Foundation, NSF, DARPA/SRC, Google, and Amazon AWS.
Publisher Copyright:
© 2021 Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu.

PY - 2021/7/1

Y1 - 2021/7/1

N2 - For a directed graph G with n vertices and a start vertex u_start, we wish to (approximately) sample an L-step random walk over G starting from u_start with minimum space, using an algorithm that makes only a few passes over the edges of the graph. This problem has found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be Θ(n · L). Prior to our work, a better space complexity was only known with Õ(√L) passes. We essentially settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is Θ(n · √L), by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes p, and shows that any p-pass algorithm for this problem uses Ω(n · L^{1/p}) space. In addition, we show a similar Θ(n · √L) bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an L-step random walk from every vertex in the graph.

AB - For a directed graph G with n vertices and a start vertex u_start, we wish to (approximately) sample an L-step random walk over G starting from u_start with minimum space, using an algorithm that makes only a few passes over the edges of the graph. This problem has found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be Θ(n · L). Prior to our work, a better space complexity was only known with Õ(√L) passes. We essentially settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is Θ(n · √L), by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes p, and shows that any p-pass algorithm for this problem uses Ω(n · L^{1/p}) space. In addition, we show a similar Θ(n · √L) bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an L-step random walk from every vertex in the graph.

KW - Random walk sampling

KW - Streaming algorithms

UR - http://www.scopus.com/inward/record.url?scp=85113892206&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85113892206&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ICALP.2021.52

DO - 10.4230/LIPIcs.ICALP.2021.52

M3 - Conference contribution

AN - SCOPUS:85113892206

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021

A2 - Bansal, Nikhil

A2 - Merelli, Emanuela

A2 - Worrell, James

PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing

T2 - 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021

Y2 - 12 July 2021 through 16 July 2021

ER -