Many challenging problems in modern applications amount to finding relevant results from an enormous output space of potential candidates, for example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can be in the millions to billions. Moreover, observational or training data is often limited for many of the so-called “long-tail” of items in the output space.
Classical algorithms typically provide "one size fits all" performance, and do not leverage properties or patterns in their inputs. A recent line of work aims to address this issue by developing algorithms that use machine learning predictions to improve their performance. In this talk I will present two examples of this type, in the context of streaming and sketching algorithms.
Suppose you are monitoring discrete events in real time. Can you predict what events will happen in the future, and when? Can you fill in past events that you may have missed? A probability model that supports such reasoning is the neural Hawkes process (NHP), in which the Poisson intensities of K event types at time t depend on the history of past events. This autoregressive architecture can capture complex dependencies. It resembles an LSTM language model over K word types, but allows the LSTM state to evolve in continuous time.
A brief review will be provided first on how deep learning has disrupted speech recognition and language processing industries since 2009. Then connections will be drawn between the techniques (deep learning or otherwise) for modeling speech and language and those for financial markets. Similarities and differences of these two fields will be explored. In particular, three unique technical challenges to financial investment are addressed: extremely low signal-to-noise ratio, extremely strong nonstationarity (with adversarial nature), and heterogeneous big data.
There are three core orthogonal problems in reinforcement learning: (1) Crediting actions (2) generalizing across rich observations (3) Exploring to discover the information necessary for learning. Good solutions to pairs of these problems are fairly well known at this point, but solutions for all three are just now being discovered. I’ll discuss several such results and dive into details on a few of them.
Probably Approximately Correct (PAC) learning has attempted to analyse the generalisation of learning systems within the statistical learning framework. It has been referred to as a ‘worst case’ analysis, but the tools have been extended to analyse cases where benign distributions mean we can still generalise even if worst case bounds suggest we cannot. The talk will cover the PAC-Bayes approach to analysing generalisation that is inspired by Bayesian inference, but leads to a different role for the prior and posterior distributions.
In handling wide range of experiences ranging from data instances, knowledge, constraints, to rewards, adversaries, and lifelong interplay in an ever-growing spectrum of tasks, contemporary ML/AI research has resulted in thousands of models, learning paradigms, optimization algorithms, not mentioning countless approximation heuristics, tuning tricks, and black-box oracles, plus combinations of all above.
Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, e.g. with independent component analysis (ICA) and sparse coding. However, extending ICA to the nonlinear case has proven to be extremely difficult: A straight-forward extension is unidentifiable, i.e. it is not possible to recover those latent components that actually generated the data.