A truth discovery approach with theoretical guarantee

Houping Xiao, Jing Gao, Zhaoran Wang, Shiyu Wang, Lu Su, Han Liu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

34 Scopus citations

Abstract

In the information age, people can easily collect information about the same set of entities from multiple sources, among which conflicts are inevitable. This leads to an important task, truth discovery, i.e., to identify true facts (truths) via iteratively updating truths and source reliability. However, the convergence to the truths is never discussed in existing work, and thus there is no theoretical guarantee in the results of these truth discovery approaches. In contrast, in this paper we propose a truth discovery approach with theoretical guarantee. We propose a randomized gaussian mixture model (RGMM) to represent multi-source data, where truths are model parameters. We incorporate source bias which captures its reliability degree into RGMM formulation. The truth discovery task is then modeled as seeking the maximum likelihood estimate (MLE) of the truths. Based on expectation-maximization (EM) techniques, we propose population-based (i.e., on the limit of infinite data) and sample-based (i.e., on a finite set of samples) solutions for the MLE. Theoretically, we prove that both solutions are contractive to an ϵ-ball around the MLE, under certain conditions. Experimentally, we evaluate our method on both simulated and real-world datasets. Experimental results show that our method achieves high accuracy in identifying truths with convergence guarantee.

Original languageEnglish (US)
Title of host publicationKDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1925-1934
Number of pages10
ISBN (Electronic)9781450342322
DOIs
StatePublished - Aug 13 2016
Event22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration: Aug 13 2016Aug 17 2016

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume13-17-August-2016

Other

Other22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country/TerritoryUnited States
CitySan Francisco
Period8/13/168/17/16

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • Asymptotic consistency
  • Mixture model
  • Truth discovery

Fingerprint

Dive into the research topics of 'A truth discovery approach with theoretical guarantee'. Together they form a unique fingerprint.

Cite this