## Abstract

We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt a greater than 1/2 fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i, y_i)\}_{i \leq n}$ of $n$ linear equations, where $\alpha n$ of the equations satisfy $y_i = \langle x_i, \ell^* \rangle + \zeta$ for some small noise $\zeta$ and $(1-\alpha)n$ of the equations are arbitrarily chosen. It outputs a list $L$ of size $O(1/\alpha)$, a fixed constant, that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution $D$. As a corollary of our algorithmic result, we obtain a $(d/\alpha)^{O(1/\alpha^8)}$ time algorithm to find an $O(1/\alpha)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the "identifiability to algorithms" paradigm based on the sum-of-squares method.
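To make the data model concrete, the following is a minimal sketch (not the paper's algorithm) of how a list-decodable regression instance looks: $\alpha n$ inliers follow a noisy linear model over Gaussian covariates, while the remaining $(1-\alpha)n$ labels are arbitrary. All parameter values ($n$, $d$, $\alpha$, the noise scale) are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 1000, 5    # number of equations and dimension (illustrative)
alpha = 0.1       # inlier fraction; note it is well below 1/2
sigma = 0.01      # small noise scale for the inliers

# Ground-truth direction ell* (unit norm for convenience)
ell_star = rng.normal(size=d)
ell_star /= np.linalg.norm(ell_star)

n_in = int(alpha * n)
X = rng.normal(size=(n, d))  # standard Gaussian covariates

y = np.empty(n)
# alpha*n inliers: y_i = <x_i, ell*> + zeta with small noise zeta
y[:n_in] = X[:n_in] @ ell_star + sigma * rng.normal(size=n_in)
# (1 - alpha)*n outliers: labels chosen arbitrarily (here, random junk)
y[n_in:] = rng.uniform(-10.0, 10.0, size=n - n_in)

# A single least-squares fit is dominated by the 90% outliers, which is
# why outputting one vector is hopeless and a list of O(1/alpha)
# candidates containing ell* is the right goal in this regime.
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because a majority of the equations are adversarial, no single-output estimator can succeed, and the list size $O(1/\alpha)$ matches the number of plausible "clusters" an adversary can plant.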

| Original language | English (US) |
|---|---|
| Journal | Advances in Neural Information Processing Systems |
| Volume | 32 |
| State | Published - 2019 |
| Event | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver, Canada. Duration: Dec 8 2019 → Dec 14 2019 |

## All Science Journal Classification (ASJC) codes

- Computer Networks and Communications
- Information Systems
- Signal Processing