Graph edge partitioning via neighborhood heuristic

Chenzi Zhang, Fan Wei, Qin Liu, Zhihao Gavin Tang, Zhenguo Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

26 Scopus citations

Abstract

We consider the edge partitioning problem that partitions the edges of an input graph into multiple balanced components, while minimizing the total number of vertices replicated (one vertex might appear in more than one partition). This problem is critical in minimizing communication costs and running time for several large-scale distributed graph computation platforms (e.g., PowerGraph, Spark GraphX). We first prove that this problem is NP-hard, and then present a new partitioning heuristic with polynomial running time. We provide a worst-case upper bound of replication factor for our heuristic on general graphs. To our knowledge, we are the first to provide such bound for edge partitioning algorithms on general graphs. Applying this bound to random power-law graphs greatly improves the previous bounds of expected replication factor. Extensive experiments demonstrated that our partitioning algorithm consistently produces much smaller replication factors on various benchmark data sets than the state-of-the-art. When deployed in the production graph engine, PowerGraph, in average it reduces replication factor, communication, and running time by 54%, 66%, and 21%, respectively.

Original languageEnglish (US)
Title of host publicationKDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages605-614
Number of pages10
ISBN (Electronic)9781450348874
DOIs
StatePublished - Aug 13 2017
Externally publishedYes
Event23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 - Halifax, Canada
Duration: Aug 13 2017Aug 17 2017

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
VolumePart F129685

Conference

Conference23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Country/TerritoryCanada
CityHalifax
Period8/13/178/17/17

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • Distributed graph mining
  • Graph edge partitioning

Fingerprint

Dive into the research topics of 'Graph edge partitioning via neighborhood heuristic'. Together they form a unique fingerprint.

Cite this