Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

Jiachen T. Wang, Prateek Mittal, Ruoxi Jia

Research output: Contribution to journalConference articlepeer-review

Abstract

This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted K nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from O(NK), the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley’s computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.

Original languageEnglish (US)
Pages (from-to)2557-2565
Number of pages9
JournalProceedings of Machine Learning Research
Volume238
StatePublished - 2024
Event27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024 - Valencia, Spain
Duration: May 2 2024May 4 2024

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Efficient Data Shapley for Weighted Nearest Neighbor Algorithms'. Together they form a unique fingerprint.

Cite this