TY - GEN

T1 - Extractor-Based time-Space lower bounds for learning

AU - Garg, Sumegha

AU - Raz, Ran

AU - Tal, Avishay

N1 - Funding Information:
Research supported by the Simons Collaboration on Algorithms and Geometry and by the National Science Foundation grant No. CCF-1412958. †This work was done when the author was a member at the Institute for Advanced Study, Princeton, NJ. Research supported by the Simons Collaboration on Algorithms and Geometry and by the National Science Foundation grant No. CCF-1412958.
Publisher Copyright:
© 2018 Copyright held by the owner/author(s).

PY - 2018/6/20

Y1 - 2018/6/20

N2 - A matrix M : A× X → {−1, 1} corresponds to the following learning problem: An unknown element x ∈ X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a1,b1), (a2,b2) . . ., where for every i, ai ∈ A is chosen uniformly at random and bi = M(ai,x). Assume that k, ℓ,r are such that any submatrix of M of at least 2−k ·|A| rows and at least 2−ℓ ·|X| columns, has a bias of at most 2−r. We show that any learning algorithm for the learning problem corresponding to M requires either a memory of size at least Ω (k · ℓ), or at least 2Ω(r) samples. The result holds even if the learner has an exponentially small success probability (of 2−Ω(r)). In particular, this shows that for a large class of learning problems, any learning algorithm requires either a memory of size at least Ω ((log |X|) · (log |A|)) or an exponential number of samples, achieving a tight Ω ((log |X|) · (log |A|)) lower bound on the size of the memory, rather than a bound of Ω min (log |X|)2, (log |A|)2 obtained in previous works by Raz [FOCS’17] and Moshkovitz and Moshkovitz [ITCS’18]. Moreover, our result implies all previous memory-samples lower bounds, as well as a number of new applications. Our proof builds on the work of Raz [FOCS’17] that gave a general technique for proving memory-samples lower bounds.

AB - A matrix M : A× X → {−1, 1} corresponds to the following learning problem: An unknown element x ∈ X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a1,b1), (a2,b2) . . ., where for every i, ai ∈ A is chosen uniformly at random and bi = M(ai,x). Assume that k, ℓ,r are such that any submatrix of M of at least 2−k ·|A| rows and at least 2−ℓ ·|X| columns, has a bias of at most 2−r. We show that any learning algorithm for the learning problem corresponding to M requires either a memory of size at least Ω (k · ℓ), or at least 2Ω(r) samples. The result holds even if the learner has an exponentially small success probability (of 2−Ω(r)). In particular, this shows that for a large class of learning problems, any learning algorithm requires either a memory of size at least Ω ((log |X|) · (log |A|)) or an exponential number of samples, achieving a tight Ω ((log |X|) · (log |A|)) lower bound on the size of the memory, rather than a bound of Ω min (log |X|)2, (log |A|)2 obtained in previous works by Raz [FOCS’17] and Moshkovitz and Moshkovitz [ITCS’18]. Moreover, our result implies all previous memory-samples lower bounds, as well as a number of new applications. Our proof builds on the work of Raz [FOCS’17] that gave a general technique for proving memory-samples lower bounds.

KW - Branching programs

KW - Extractors

KW - Learning from samples

KW - Time-space tradeoff

UR - http://www.scopus.com/inward/record.url?scp=85049901585&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049901585&partnerID=8YFLogxK

U2 - 10.1145/3188745.3188962

DO - 10.1145/3188745.3188962

M3 - Conference contribution

AN - SCOPUS:85049901585

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 297

EP - 310

BT - STOC 2018 - Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing

A2 - Henzinger, Monika

A2 - Kempe, David

A2 - Diakonikolas, Ilias

PB - Association for Computing Machinery

T2 - 50th Annual ACM Symposium on Theory of Computing, STOC 2018

Y2 - 25 June 2018 through 29 June 2018

ER -