TY - GEN

T1 - Reconstruction of Markov random fields from samples

T2 - 11th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2008 and 12th International Workshop on Randomization and Computation, RANDOM 2008

AU - Bresler, Guy

AU - Mossel, Elchanan

AU - Sly, Allan

N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.

PY - 2008

Y1 - 2008

N2 - Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on n nodes and maximum degree d given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using Θ(d log n) samples which is optimal up to a multiplicative constant. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit O(dnd+2log n) running time bound. In cases where the measure on the graph has correlation decay, the running time is O(n2log n) for all fixed d. In the full-length version we also discuss the effect of observing noisy samples. There we show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.

AB - Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on n nodes and maximum degree d given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using Θ(d log n) samples which is optimal up to a multiplicative constant. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit O(dnd+2log n) running time bound. In cases where the measure on the graph has correlation decay, the running time is O(n2log n) for all fixed d. In the full-length version we also discuss the effect of observing noisy samples. There we show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.

UR - http://www.scopus.com/inward/record.url?scp=51849099933&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51849099933&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-85363-3_28

DO - 10.1007/978-3-540-85363-3_28

M3 - Conference contribution

AN - SCOPUS:51849099933

SN - 3540853626

SN - 9783540853626

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 343

EP - 356

BT - Approximation, Randomization and Combinatorial Optimization

Y2 - 25 August 2008 through 27 August 2008

ER -