First, I will address the setting of a common prior knowledge assumption, describing how we settle the question of the sample complexity of learning mixtures of Gaussians.

Second, I will address what can be learnt about unknown distributions when no prior knowledge is assumed. I will describe a surprising result: the independence from set theory of a basic statistical learnability problem. As a corollary, I will show that there can be no combinatorial dimension that characterizes the families of random variables that can be reliably learnt (in contrast with the known VC-dimension-like characterizations of common supervised learning tasks).

Both parts of the talk use novel notions of sample compression schemes as key components.

The first part is based on joint work with Hassan Ashtiani, Nicholas Harvey, Christopher Liaw, Abbas Mehrabian and Yaniv Plan, and the second part on joint work with Shay Moran, Pavel Hrubeš, Amir Shpilka and Amir Yehudayoff.