List-decodeable linear regression

Sushrut Karmalkar, Adam R. Klivans, Pravesh K. Kothari

Research output: Contribution to journal › Conference article › peer-review

42 Scopus citations

Abstract

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. For any α < 1, our algorithm takes as input a sample {(x_i, y_i)}_{i ≤ n} of n linear equations where αn of the equations satisfy y_i = ⟨x_i, ℓ*⟩ + ζ for some small noise ζ, and (1 − α)n of the equations are arbitrarily chosen. It outputs a list L of size O(1/α) - a fixed constant - that contains an ℓ that is close to ℓ*. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution D. As a corollary of our algorithmic result, we obtain a (d/α)^{O(1/α^8)}-time algorithm to find an O(1/α)-size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves a similar guarantee under the promise that ℓ* has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the “identifiability to algorithms” paradigm based on the sum-of-squares method.
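
To make the setting concrete, here is a minimal sketch, not from the paper, of the corrupted data model the abstract describes, assuming standard Gaussian inliers. The function name `list_decodable_sample` and the particular adversary (planting a second regressor) are illustrative choices; the sketch does not implement the paper's sum-of-squares algorithm.

```python
import numpy as np

def list_decodable_sample(n, d, alpha, noise=0.01, seed=None):
    """Draw n linear equations: alpha*n inliers with y_i = <x_i, l_star> + zeta,
    and (1 - alpha)*n responses chosen by a (hypothetical) adversary."""
    rng = np.random.default_rng(seed)
    l_star = rng.standard_normal(d)
    l_star /= np.linalg.norm(l_star)   # unit-norm ground-truth regressor
    # Covariates from N(0, I_d); the standard Gaussian is certifiably
    # anti-concentrated, the property the paper's guarantee needs.
    X = rng.standard_normal((n, d))
    y = X @ l_star + noise * rng.standard_normal(n)
    # One simple adversary: overwrite a (1 - alpha) fraction of the responses
    # so they exactly fit a planted decoy direction instead of l_star.
    m = int(round((1 - alpha) * n))
    decoy = rng.standard_normal(d)
    decoy /= np.linalg.norm(decoy)
    y[:m] = X[:m] @ decoy
    return X, y, l_star

X, y, l_star = list_decodable_sample(n=5000, d=20, alpha=0.3, seed=0)
```

When α ≤ 1/2, the planted decoy is consistent with at least as many equations as ℓ*, so no estimator that outputs a single vector can be guaranteed to recover ℓ*; this is why any algorithm in this regime must output a list, and the paper shows a list of size O(1/α) suffices.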

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 32
State: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: Dec 8, 2019 - Dec 14, 2019

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
