Sadhika Malladi (Princeton) - Deep Learning Theory in the Age of Generative AI
Abstract: Large neural networks, like language models (LMs), have demonstrated remarkable success in executing complex tasks, but little is understood about why these models work and how various design choices affect model behavior. Performing thorough empirical ablations to understand modern-day training paradigms is generally computationally infeasible, underscoring the need for theory-driven insights and improvements. However, traditional theoretical analysis of deep networks usually requires restrictive assumptions that are far from practical settings.
In this talk, I will present flexible yet rigorous theoretical frameworks for understanding LM pre-training and fine-tuning, along with their algorithmic implications. For fine-tuning, I present a formal characterization of the process that motivates the design of MeZO, a zeroth-order optimizer that reduces memory consumption by up to 12x while preserving performance. I will also discuss recent work exposing surprising failure modes of preference learning, a specialized form of fine-tuning used to steer LMs to exhibit desired behaviors. In the pre-training regime, I use stochastic differential equations (SDEs) to design principled and efficient hyperparameter selection algorithms for highly distributed training settings. I will conclude by exploring promising directions for co-developing deep learning theory and practice.
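To make the zeroth-order idea concrete, the sketch below shows a two-point (SPSA-style) gradient estimate computed from forward passes only, with the random perturbation regenerated from a seed instead of being stored. This is a minimal illustration under assumed names (zeroth_order_step, loss_fn, batch are hypothetical), not the MeZO algorithm described in the talk.

# Minimal sketch of a two-point zeroth-order (SPSA-style) update.
# Illustrative only; not the MeZO implementation from the talk.
import torch

def zeroth_order_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=0):
    # One update: perturb parameters along a shared random direction z,
    # evaluate the loss at theta + eps*z and theta - eps*z, then step
    # along z scaled by the estimated directional derivative.
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding regenerates the same z without storing it,
        # keeping memory close to inference-only usage.
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(scale * eps * torch.randn_like(p))

    with torch.no_grad():
        perturb(+1.0)                       # theta + eps*z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                       # theta - eps*z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                       # restore theta

        grad_est = (loss_plus - loss_minus) / (2 * eps)  # directional derivative along z

        # theta <- theta - lr * grad_est * z, regenerating z from the seed
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(-lr * grad_est * torch.randn_like(p))

    return loss_plus.item()

Because the estimate needs only two forward passes and no backward pass or optimizer state, the memory savings relative to standard backpropagation-based fine-tuning follow directly; the 12x figure quoted in the abstract refers to the actual MeZO method, not to this sketch.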
Speakers
Sadhika Malladi
Sadhika Malladi is a final-year PhD student in Computer Science at Princeton University advised by Sanjeev Arora. Her research advances deep learning theory to capture modern-day training settings, yielding practical training improvements and meaningful insights into model behavior. She has co-organized multiple workshops, including Mathematical and Empirical Understanding of Foundation Models at ICLR 2024 and Mathematics for Modern Machine Learning (M3L) at NeurIPS 2024. She was named a 2025 Siebel Scholar.