TY - GEN
T1 - Beyond Moments: Robustly Learning Affine Transformations with Asymptotically Optimal Error
T2 - 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023
AU - Jia, He
AU - Kothari, Pravesh K.
AU - Vempala, Santosh S.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an ϵ-corrupted sample from a distribution D obtained by applying an unknown affine transformation x ↦ Ax + b to the uniform distribution on a d-dimensional hypercube [-1,1]^d, our algorithm constructs Â, b̂ such that the total variation distance of the distribution D̂ from D is O(ϵ) using poly(d) time and samples. Total variation distance is the information-theoretically strongest possible notion of distance in our setting, and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying ϵ. In particular, if the rows of A are normalized to be unit length, our total variation distance guarantee implies a bound on the sum of the ℓ_2 distances between the row vectors of A and Â: ∑_{i=1}^d ||a_i - â_i||_2 = O(ϵ). In contrast, the strongest known prior results only yield an ϵ^{O(1)} (relative) bound on the distance between individual a_i's and their estimates and translate into an O(d·ϵ^{O(1)}) bound on the total variation distance. Prior algorithms for this problem rely on implementing standard approaches [12] for ICA based on the classical method of moments [18], [32] combined with robust moment estimators. We prove that any approach that relies on the method of moments must provably fail to obtain a dimension-independent bound on the total error ∑_i ||a_i - â_i||_2 (and consequently, also in total variation distance). Our key innovation is a new approach to ICA (even to outlier-free ICA) that circumvents the difficulties in the classical method of moments and instead relies on a new geometric certificate of correctness of an affine transformation. Our algorithm, Robust Gradient Descent, is based on a new method that iteratively improves its estimate of the unknown affine transformation whenever the requirements of the certificate are not met.
AB - We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an ϵ-corrupted sample from a distribution D obtained by applying an unknown affine transformation x ↦ Ax + b to the uniform distribution on a d-dimensional hypercube [-1,1]^d, our algorithm constructs Â, b̂ such that the total variation distance of the distribution D̂ from D is O(ϵ) using poly(d) time and samples. Total variation distance is the information-theoretically strongest possible notion of distance in our setting, and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying ϵ. In particular, if the rows of A are normalized to be unit length, our total variation distance guarantee implies a bound on the sum of the ℓ_2 distances between the row vectors of A and Â: ∑_{i=1}^d ||a_i - â_i||_2 = O(ϵ). In contrast, the strongest known prior results only yield an ϵ^{O(1)} (relative) bound on the distance between individual a_i's and their estimates and translate into an O(d·ϵ^{O(1)}) bound on the total variation distance. Prior algorithms for this problem rely on implementing standard approaches [12] for ICA based on the classical method of moments [18], [32] combined with robust moment estimators. We prove that any approach that relies on the method of moments must provably fail to obtain a dimension-independent bound on the total error ∑_i ||a_i - â_i||_2 (and consequently, also in total variation distance). Our key innovation is a new approach to ICA (even to outlier-free ICA) that circumvents the difficulties in the classical method of moments and instead relies on a new geometric certificate of correctness of an affine transformation. Our algorithm, Robust Gradient Descent, is based on a new method that iteratively improves its estimate of the unknown affine transformation whenever the requirements of the certificate are not met.
KW - Affine Transformation
KW - Independent Component Analysis
KW - Method of Moments
KW - Robust Statistics
UR - http://www.scopus.com/inward/record.url?scp=85182401410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182401410&partnerID=8YFLogxK
U2 - 10.1109/FOCS57990.2023.00147
DO - 10.1109/FOCS57990.2023.00147
M3 - Conference contribution
AN - SCOPUS:85182401410
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 2408
EP - 2429
BT - Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023
PB - IEEE Computer Society
Y2 - 6 November 2023 through 9 November 2023
ER -