TY - GEN
T1 - Reconstructing Graph Diffusion History from a Single Snapshot
AU - Qiu, Ruizhong
AU - Wang, Dingsu
AU - Ying, Lei
AU - Poor, H. Vincent
AU - Zhang, Yifang
AU - Tong, Hanghang
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/8/6
Y1 - 2023/8/6
N2 - Diffusion on graphs is ubiquitous with numerous high-impact applications, ranging from the study of residential segregation in socioeconomics and activation cascading in neuroscience, to the modeling of disease contagion in epidemiology and malware spreading in cybersecurity. In these applications, complete diffusion histories play an essential role in terms of identifying dynamical patterns, reflecting on precaution actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, explosive search space, and scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require to know true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot (DASH), where we seek to reconstruct the history from only the final snapshot without knowing true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove: (a) estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to estimation error of diffusion parameters. To overcome the inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis-Hastings Markov chain Monte Carlo method (M-H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M-H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method. Our code is available at https://github.com/q-rz/KDD23-DITTO. The appendix can be found at https://arxiv.org/abs/2306.00488.
AB - Diffusion on graphs is ubiquitous with numerous high-impact applications, ranging from the study of residential segregation in socioeconomics and activation cascading in neuroscience, to the modeling of disease contagion in epidemiology and malware spreading in cybersecurity. In these applications, complete diffusion histories play an essential role in terms of identifying dynamical patterns, reflecting on precaution actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, explosive search space, and scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require to know true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot (DASH), where we seek to reconstruct the history from only the final snapshot without knowing true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove: (a) estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to estimation error of diffusion parameters. To overcome the inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis-Hastings Markov chain Monte Carlo method (M-H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M-H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method. Our code is available at https://github.com/q-rz/KDD23-DITTO. The appendix can be found at https://arxiv.org/abs/2306.00488.
KW - graph diffusion
KW - graph neural network (gnn)
KW - history reconstruction
KW - markov chain monte carlo (mcmc)
UR - http://www.scopus.com/inward/record.url?scp=85171375401&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85171375401&partnerID=8YFLogxK
U2 - 10.1145/3580305.3599488
DO - 10.1145/3580305.3599488
M3 - Conference contribution
AN - SCOPUS:85171375401
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1978
EP - 1988
BT - KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
Y2 - 6 August 2023 through 10 August 2023
ER -