Sadhika Malladi (Princeton) - Deep Learning Theory in the Age of Generative AI
Abstract: Large neural networks, like language models (LMs), have demonstrated remarkable success in executing complex tasks, but little is understood about why these models work and how various design choices affect model behavior. Performing thorough empirical ablations to understand modern-day training paradigms is generally computationally infeasible, underscoring the need for theory-driven insights and improvements. However, traditional theoretical analysis of deep networks usually requires restrictive assumptions that are far from practical settings.
In this talk, I will present flexible yet rigorous theoretical frameworks for understanding LM pre-training and fine-tuning, along with their algorithmic implications. For fine-tuning, I present a formal characterization of the process that motivates the design of MeZO, a zeroth-order optimizer that reduces memory consumption by up to 12x while preserving performance. I will also discuss recent work exposing surprising failure modes of preference learning, a specialized form of fine-tuning used to steer LMs to exhibit desired behaviors. In the pre-training regime, I use stochastic differential equations (SDEs) to design principled and efficient hyperparameter selection algorithms for highly distributed training settings. I will conclude by exploring promising directions for co-developing deep learning theory and practice.
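To make the zeroth-order idea concrete, the sketch below shows a two-point (SPSA-style) gradient estimate computed from forward passes only, with the random perturbation regenerated from a seed instead of being stored. This is a minimal illustration under assumed names (zeroth_order_step, loss_fn, batch are hypothetical), not the MeZO algorithm described in the talk.

# Minimal sketch of a two-point zeroth-order (SPSA-style) update.
# Illustrative only; not the MeZO implementation from the talk.
import torch

def zeroth_order_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=0):
    # One update: perturb parameters along a shared random direction z,
    # evaluate the loss at theta + eps*z and theta - eps*z, then step
    # along z scaled by the estimated directional derivative.
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding regenerates the same z without storing it,
        # keeping memory close to inference-only usage.
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(scale * eps * torch.randn_like(p))

    with torch.no_grad():
        perturb(+1.0)                       # theta + eps*z
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                       # theta - eps*z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                       # restore theta

        grad_est = (loss_plus - loss_minus) / (2 * eps)  # directional derivative along z

        # theta <- theta - lr * grad_est * z, regenerating z from the seed
        torch.manual_seed(seed)
        for p in params:
            p.data.add_(-lr * grad_est * torch.randn_like(p))

    return loss_plus.item()

Because the estimate needs only two forward passes and no backward pass or optimizer state, the memory savings relative to standard backpropagation-based fine-tuning follow directly; the 12x figure quoted in the abstract refers to the actual MeZO method, not to this sketch.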
Speakers
Sadhika Malladi
Sadhika Malladi is a final-year PhD student in Computer Science at Princeton University advised by Sanjeev Arora. Her research advances deep learning theory to capture modern-day training settings, yielding practical training improvements and meaningful insights into model behavior. She has co-organized multiple workshops, including Mathematical and Empirical Understanding of Foundation Models at ICLR 2024 and Mathematics for Modern Machine Learning (M3L) at NeurIPS 2024. She was named a 2025 Siebel Scholar.