Despite wide empirical success, many of the most commonly used learning approaches lack a clear mathematical foundation and often rely on poorly understood heuristics. Even when theoretical guarantees do exist, they are often too crude or pessimistic to explain the success of these approaches in practical regimes of operation or to serve as a guiding principle for practitioners. Furthermore, in many scenarios, such as those arising in scientific applications, these approaches require significant resources (compute, data, etc.) to work reliably.
The first part of the talk takes a step towards building a stronger theoretical foundation for nonconvex learning. In particular, I will focus on demystifying the generalization and feature learning capability of modern overparameterized learning, where the number of parameters of the learning model (e.g., a neural network) exceeds the size of the training data. Our result is based on an intriguing spectral bias phenomenon for gradient descent, which puts the iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Notably, this analysis overcomes a major theoretical bottleneck in the existing literature and goes beyond the “lazy” training regime, which requires unrealistic hyperparameter choices (e.g., very small step sizes, large initialization, or wide models). In the second part of the talk, I will discuss the challenges and opportunities of using AI for scientific applications, and medical image reconstruction in particular. I will discuss our work on designing new architectures that lead to state-of-the-art performance and report on techniques that significantly reduce the amount of data required for training.
Mahdi Soltanolkotabi is an associate professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. His research focuses on developing the mathematical foundations of modern data science by characterizing the behavior and pitfalls of contemporary nonconvex learning and optimization algorithms, with applications in deep learning, large-scale distributed training, federated learning, computational imaging, and AI for scientific applications. Mahdi is the recipient of the Information Theory Society Best Paper Award, a Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in mathematics, an NSF CAREER award, an Air Force Office of Scientific Research Young Investigator award (AFOSR-YIP), the Viterbi School of Engineering junior faculty research award, and faculty research awards from Google and Amazon.