# Machine Learning Seminar Series: Greg Ongie (UChicago) & Blake Woodworth (TTIC)

**A Function Space View of Overparameterized Neural Networks (Ongie)**

Contrary to classical bias/variance tradeoffs, deep learning practitioners have observed that vastly overparameterized neural networks with the capacity to fit virtually any labels nevertheless generalize well when trained on real data. One possible explanation of this phenomenon is that complexity control is being achieved by implicitly or explicitly controlling the magnitude of the weights of the network. This raises the question: What functions are well-approximated by neural networks whose weights are bounded in norm? In this talk, I will give some partial answers to this question. In particular, I will give a precise characterization of the space of functions realizable as a two-layer (i.e., one hidden layer) neural network with ReLU activations having an unbounded number of units, but where the Euclidean norm of the weights in the network remains bounded. Surprisingly, this characterization is naturally posed in terms of the Radon transform as used in computational imaging, and I will show how tools from Radon transform analysis yield novel insights about learning with two and three-layer ReLU networks.

**The Complexity of Finding Stationary Points in Convex and Non-Convex Optimization (Woodworth)**

Non-convex optimization algorithms typically guarantee convergence to approximate stationary points of the objective. However, the fundamental complexity of finding such points is poorly understood, even in the convex setting, and especially in comparison to our very thorough understanding of the complexity of finding points with near optimal function value in the convex case. In this talk, I will discuss two recent papers in which we tightly bound the stochastic first-order oracle complexity of finding an approximate stationary point, first for the convex case and then for the non-convex case. An important implication of our work is that, in a certain sense, plain SGD is an optimal algorithm for stochastic non-convex optimization.

Remote broadcast at TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

**Host:** *Nati Srebro and Rebecca Willett*

#### Greg Ongie

I am a postdoc affiliated with the Committee on Compuational and Applied Mathematics (CCAM) within the Department of Statistics at the University of Chicago, supported by Rebecca Willett.

Research interests:

Machine learning, optimization, and compressed sensing, with applications to image reconstruction in MRI, CT, and related inverse problems. My recent projects include:

- Learning to solve linear inverse problems in imaging
- Matrix completion with non-linear models
- Online algorithms for dynamic MRI reconstruction
- Continuous domain compressed sensing
- Fast algorithms for large-scale structured low-rank matrix completion
- Generalizations of total variation regularization
- Non-convex optimization for large-scale sparsity regularized inverse problems

#### Blake Woodworth

I am a fourth-year PhD student in computer science at the Toyota Technological Institute at Chicago (TTIC) advised by Nati Srebro. My research is in machine learning and optimization, and I am interested in both the theoretical analysis of learning and optimization algorithms as well as practically efficient methods for learning from data.

Before coming to TTIC I studied at Yale University where I received a B.S. in computer science, advised by Dan Spielman. At Yale, my coursework was spread evenly across the computer science, mathematics, and statistics departments; I was also a peer tutor for several programming-intensive computer science courses.

From September 2017-July 2019 I was supported by a NSF Graduate Research Fellowship. From July 2019, I am supported by a Google PhD Fellowship in machine learning.