In this talk we will argue that many traditional notions of 'distance' between source and target distributions (e.g. KL-divergence, extensions of TV such as the D_A discrepancy, density ratios, Wasserstein distance) can yield an overly pessimistic picture of transferability. Instead, we show that new notions of 'relative dimension' between source and target (which we simply term 'transfer-exponents') capture a continuum from easy to hard transfer. Transfer-exponents uncover a rich set of situations where transfer is possible, even at fast rates. They also help answer questions such as the benefit of unlabeled versus labeled target data, indicate which transfer heuristics are optimal and which are suboptimal, and have interesting implications for related problems such as multi-task learning.
Finally, transfer-exponents provide guidance on *how* to efficiently sample target data so as to guarantee improvement over using source data alone. We illustrate these insights through simulations on controlled data and on the popular CIFAR-10 image dataset.
The talk is based on joint work with Guillaume Martinet and on ongoing work with Steve Hanneke.