We prove a general time-space lower bound that applies for a large class of learning problems and shows that for every problem in that class, any learning algorithm requires either a memory of quadratic size or an exponential number of samples. As a special case, this gives a new proof for the time-space lower bound for parity learning [R16].

Our result is stated in terms of the norm of the matrix that corresponds to the learning problem. Let $X$, $A$ be two finite sets. Let $M: A \times X \rightarrow \{-1,1\}$ be a matrix. The matrix $M$ corresponds to the following learning problem: An unknown element $x \in X$ was chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2)\ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$.

Let $\sigma$ be the largest singular value of $M$ and note that always $\sigma \leq |A|^{1/2} \cdot |X|^{1/2}$. We show that if $\sigma \leq |A|^{1/2} \cdot |X|^{1/2 - \epsilon}$, then any learning algorithm for the corresponding learning problem requires either a memory of size quadratic in $\epsilon n$ or number of samples exponential in $\epsilon n$, where $n = \log_2 |X|$.