In this talk, I'm going to talk about modularizing the dynamics by learning multiple recurrent modules that are independent by default, but interact sparingly, and show how such an inductive bias gives rise to better out of distribution generalization.
Modularity, Attention and Credit Assignment: Efficient information dispatching in neural computations
April 16, 2020
Physical processes in the world often have a modular structure, with complexity emerging through combinations of simpler subsystems. Machine learning seeks to uncover and use regularities in the physical world. Although these regularities manifest themselves as statistical dependencies, they are ultimately due to dynamic processes governed by physics. These processes are often independent and only interact sparsely..Despite this, most machine learning models employ the opposite inductive bias, i.e., that all processes interact. This can lead to poor generalization (if data is limited) and lack of robustness to changing task distributions.