Special Seminar in Computing & Mathematical Sciences
February 22, 2017
Optimization Challenges in Deep Learning
Professor Benjamin Recht
Dept. of Electrical Engineering and Computer Science
Training large-scale deep neural networks for pattern recognition requires hundreds of hours on clusters of GPUs to achieve state-of-the-art performance. Improved optimization algorithms could enable faster industrial prototyping and make training contemporary models more accessible.
In this talk, I will attempt to distill the key difficulties in optimizing large, deep neural networks for pattern recognition. In particular, I will emphasize that many of the popularized notions of what makes these problems "hard" are not true impediments at all. I will show not only that it is easy to globally optimize neural networks, but that such global optimization remains easy even when fitting completely random data.
I will argue instead that the source of difficulty in deep learning is a lack of understanding of generalization. I will provide empirical evidence of high-dimensional function classes that are able to achieve state-of-the-art performance on several benchmarks without any obvious forms of regularization or capacity control. These experiments reveal that traditional learning theory fails to explain why large neural networks generalize. I will close by proposing some possible paths towards a framework of generalization that explains these experimental findings.