Recent breakthroughs in machine learning, especially deep learning, often involve learning complex and high-dimensional models on massive datasets with non-convex optimization. Though empirically successful, the formal study of such non-convex methods is much less developed. My research aims to develop new algorithmic approaches and analysis tools in these settings.
The talk will showcase a few results. First, we show that matrix completion — a famous problem in machine learning — can be solved by stochastic gradient descent on the straightforward non-convex objective function in polynomial time. (Formally, we show that all local minima of the objective are also global minima.) Then, we will analyze the landscape of the objective functions for linearized recurrent neural nets and residual nets, and demonstrate that over-parameterization and re-parameterization of the models can make the optimization easier.