TY - JOUR

T1 - A Proof of the Block Model Threshold Conjecture

AU - Mossel, Elchanan

AU - Neeman, Joe

AU - Sly, Allan

N1 - Funding Information:
∗Supported by NSF grant DMS-1106999, NSF grant CCF 1320105 and DOD ONR grant N000141110140. † Supported by NSF grant DMS-1106999 and DOD ONR grant N000141110140. ‡ Supported by an Alfred Sloan Fellowship and NSF grant DMS-1208338.
Publisher Copyright:
© 2017, János Bolyai Mathematical Society and Springer-Verlag GmbH Germany, part of Springer Nature.

PY - 2018/6/1

Y1 - 2018/6/1

N2 - We study a random graph model called the “stochastic block model” in statistics and the “planted partition model” in theoretical computer science. In its simplest form, this is a random graph with two equal-sized classes of vertices, with a within-class edge probability of q and a between-class edge probability of q′. A striking conjecture of Decelle, Krzkala, Moore and Zdeborová [9], based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if q=a/n and q′=b/n, s=(a−b)/2 and d=(a+b)/2, then Decelle et al. conjectured that it is possible to efficiently cluster in a way correlated with the true partition if s2>d and impossible if s22>Cdlnd for sufficiently large C. In a previous work, we proved that indeed it is information theoretically impossible to cluster if s2 ≤ d and moreover that it is information theoretically impossible to even estimate the model parameters from the graph when s2 < d. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when s2>d. A different independent proof of the same result was recently obtained by Massoulié [20].

AB - We study a random graph model called the “stochastic block model” in statistics and the “planted partition model” in theoretical computer science. In its simplest form, this is a random graph with two equal-sized classes of vertices, with a within-class edge probability of q and a between-class edge probability of q′. A striking conjecture of Decelle, Krzkala, Moore and Zdeborová [9], based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if q=a/n and q′=b/n, s=(a−b)/2 and d=(a+b)/2, then Decelle et al. conjectured that it is possible to efficiently cluster in a way correlated with the true partition if s2>d and impossible if s22>Cdlnd for sufficiently large C. In a previous work, we proved that indeed it is information theoretically impossible to cluster if s2 ≤ d and moreover that it is information theoretically impossible to even estimate the model parameters from the graph when s2 < d. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when s2>d. A different independent proof of the same result was recently obtained by Massoulié [20].

UR - http://www.scopus.com/inward/record.url?scp=85028556763&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028556763&partnerID=8YFLogxK

U2 - 10.1007/s00493-016-3238-8

DO - 10.1007/s00493-016-3238-8

M3 - Article

AN - SCOPUS:85028556763

VL - 38

SP - 665

EP - 708

JO - Combinatorica

JF - Combinatorica

SN - 0209-9683

IS - 3

ER -