## Abstract

We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt a greater than 1/2 fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i, y_i)\}_{i \leq n}$ of $n$ linear equations, where $\alpha n$ of the equations satisfy $y_i = \langle x_i, \ell^* \rangle + \zeta$ for some small noise $\zeta$ and $(1-\alpha)n$ of the equations are arbitrarily chosen. It outputs a list $L$ of size $O(1/\alpha)$, a fixed constant, that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a certifiably anti-concentrated distribution $D$. As a corollary of our algorithmic result, we obtain a $(d/\alpha)^{O(1/\alpha^8)}$ time algorithm to find an $O(1/\alpha)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in regular directions, we give an algorithm that achieves a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the "identifiability to algorithms" paradigm based on the sum-of-squares method.
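To make the data model concrete, the following is a minimal sketch (not the paper's algorithm) of how a list-decodable regression instance looks: $\alpha n$ inliers follow a noisy linear model over Gaussian covariates, while the remaining $(1-\alpha)n$ labels are arbitrary. All parameter values ($n$, $d$, $\alpha$, the noise scale) are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 1000, 5    # number of equations and dimension (illustrative)
alpha = 0.1       # inlier fraction; note it is well below 1/2
sigma = 0.01      # small noise scale for the inliers

# Ground-truth direction ell* (unit norm for convenience)
ell_star = rng.normal(size=d)
ell_star /= np.linalg.norm(ell_star)

n_in = int(alpha * n)
X = rng.normal(size=(n, d))  # standard Gaussian covariates

y = np.empty(n)
# alpha*n inliers: y_i = <x_i, ell*> + zeta with small noise zeta
y[:n_in] = X[:n_in] @ ell_star + sigma * rng.normal(size=n_in)
# (1 - alpha)*n outliers: labels chosen arbitrarily (here, random junk)
y[n_in:] = rng.uniform(-10.0, 10.0, size=n - n_in)

# A single least-squares fit is dominated by the 90% outliers, which is
# why outputting one vector is hopeless and a list of O(1/alpha)
# candidates containing ell* is the right goal in this regime.
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because a majority of the equations are adversarial, no single-output estimator can succeed, and the list size $O(1/\alpha)$ matches the number of plausible "clusters" an adversary can plant.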

| Original language | English (US) |
|---|---|
| Journal | Advances in Neural Information Processing Systems |
| Volume | 32 |
| State | Published - 2019 |
| Event | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver, Canada. Duration: Dec 8 2019 → Dec 14 2019 |

## All Science Journal Classification (ASJC) codes

- Computer Networks and Communications
- Information Systems
- Signal Processing