Counting Motifs with Graph Sampling

Research output: Contribution to journalConference articlepeer-review

16 Scopus citations

Abstract

Applied researchers often construct a network from data that has been collected from a random sample of nodes, with the goal to infer properties of the parent network from the sampled version. Two of the most widely used sampling schemes are subgraph sampling, where we sample each vertex independently with probability p and observe the subgraph induced by the sampled vertices, and neighborhood sampling, where we additionally observe the edges between the sampled vertices and their neighbors. In this paper, we study the problem of estimating the number of motifs as induced subgraphs under both models from a statistical perspective. We show that: for parent graph G with maximal degree d, for any connected motif h on k vertices, to estimate the number of copies of h in G, denoted by s = s(h, G), with a multiplicative error of ε, • For subgraph sampling, the optimal sampling ratio p is Θk(max{(sε2)− k1dsε k−21 }), which only depends on the size of the motif but not its actual topology. Furthermore, we show that Horvitz-Thompson type estimators are universally optimal for any connected motifs. • For neighborhood sampling, we propose a family of estimators, encompassing and outperforming the Horvitz-Thompson estimator and achieving the sampling ratio Ok(min{(sε d2 ) k−1 1q dsε k−22 }), which again only depends on the size of h. This is shown to be optimal for all motifs with at most 4 vertices and cliques of all sizes. The matching minimax lower bounds are established using certain algebraic properties of subgraph counts. These results allow us to quantify how much more informative neighborhood sampling is than subgraph sampling, as empirically verified by experiments on synthetic and real-world data. We also address the issue of adaptation to the unknown maximum degree, and study specific problems for parent graphs with additional structures, e.g., trees or planar graphs.

Original languageEnglish (US)
Pages (from-to)1966-2011
Number of pages46
JournalProceedings of Machine Learning Research
Volume75
StatePublished - 2018
Externally publishedYes
Event31st Annual Conference on Learning Theory, COLT 2018 - Stockholm, Sweden
Duration: Jul 6 2018Jul 9 2018

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Keywords

  • Graph sampling
  • Horvitz-Thompson estimator
  • graph homomorphism numbers
  • minimax lower bounds
  • network motifs

Fingerprint

Dive into the research topics of 'Counting Motifs with Graph Sampling'. Together they form a unique fingerprint.

Cite this