Three fundamental factors determine the quality of a statistical learning algorithm: expressiveness, optimization, and generalization. The classical strategy for handling these factors is relatively well understood. In contrast, the radically different approach of deep learning, which in the last few years has revolutionized the world of artificial intelligence, is shrouded in mystery. This talk will describe a series of works aimed at unraveling some of the mysteries behind expressiveness and optimization. I will begin by establishing an equivalence between convolutional networks, the most successful deep learning architecture to date, and hierarchical tensor decompositions. This equivalence will be used to answer various questions concerning the expressiveness of convolutional networks. I will then turn to recent work analyzing the optimization of deep linear networks. Surprisingly, and in stark contrast with conventional wisdom, we find that increasing depth can accelerate optimization, despite introducing non-convexity.
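The acceleration effect can be glimpsed in a toy experiment (my own illustrative sketch, not an experiment from the talk): minimizing a quartic loss f(w) = (w - 1)^4 by gradient descent, either on w directly or on an overparameterized depth-2 factorization w = a * b. The factorization does not change the function class and makes the objective non-convex, yet near the minimum it effectively doubles the gradient signal. The learning rate, initialization, and step count below are arbitrary choices for the demo.

```python
# Toy sketch (assumed setup, not from the talk): compare plain gradient
# descent on f(w) = (w - 1)^4 against gradient descent on the
# overparameterized factorization w = a * b, a depth-2 "linear network".

lr, steps = 0.01, 10_000

# 1) Direct parameterization: one scalar weight w.
w = 0.01
for _ in range(steps):
    w -= lr * 4 * (w - 1) ** 3            # df/dw = 4(w - 1)^3

# 2) Depth-2 factorization: w = a * b (same function class, non-convex).
#    Initialization must avoid the saddle point at a = b = 0.
a = b = 0.1
for _ in range(steps):
    g = 4 * (a * b - 1) ** 3              # df/dw evaluated at w = a * b
    a, b = a - lr * g * b, b - lr * g * a # chain rule through each factor

print(abs(w - 1), abs(a * b - 1))
```

In this run the factored version ends up closer to the minimizer than the direct one: near convergence its effective update on w = a * b is roughly twice that of plain gradient descent, which mirrors (in miniature) the preconditioning interpretation of depth discussed in the talk.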