Qi Lei (Princeton) – Theoretical Foundations of Pretrained Models

Date(s):
January 24, 2022 at 4:30pm - 5:30pm
Location:
Live Stream
Event Audience:
all

Speaker: Qi Lei, Associate Research Scholar, Princeton University

Qi Lei is an associate research scholar in the ECE department at Princeton University. She received her Ph.D. from the Oden Institute for Computational Engineering & Sciences at UT Austin. She visited the Institute for Advanced Study (IAS)/Princeton for the Theoretical Machine Learning Program from 2019 to 2020. Before that, she was a research fellow at the Simons Institute for the Foundations of Deep Learning program. Her research aims to develop sample- and computationally efficient machine learning algorithms and to bridge the gap between theory and practice in machine learning. Qi has received several awards, including the Outstanding Dissertation Award, the National Initiative for Modeling and Simulation Graduate Research Fellowship, the Computing Innovation Fellowship, and the Simons-Berkeley Research Fellowship.

Abstract: Theoretical Foundations of Pretrained Models

Watch via Live Stream.

A pre-trained model is any model that is trained on broad data at scale and can be adapted (e.g., fine-tuned) to a wide range of downstream tasks. The rise of pre-trained models (e.g., BERT, GPT-3, CLIP, Codex, MAE) is transforming applications in various domains and aligns with how humans learn. Humans and animals first form concepts or impressions from different data domains and modalities. The learned concepts then help them learn specific tasks with minimal external instruction. Accordingly, we argue that, viewed through the lens of deep representation learning, a pre-trained model follows a similar two-step procedure: (1) learn a data representation that filters out irrelevant information from the training tasks; (2) transfer the data representation to downstream tasks with few labeled samples and simple models.
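
As a rough illustration of the two-step procedure described above, and not the speaker's method or results, the following Python sketch assumes a toy setting: linear regression tasks whose weight vectors all lie in a shared low-dimensional subspace. The dimensions, noise level, and names such as `make_task` and `B_hat` are hypothetical choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: input dimension, representation dimension,
# number of pretraining tasks, samples per pretraining task, downstream samples.
d, k, n_tasks, n_pre, n_down = 50, 3, 40, 200, 10

# Assumed ground truth: every task's weight vector lies in the span of B (d x k).
B, _ = np.linalg.qr(rng.standard_normal((d, k)))

def make_task(n):
    """Draw one linear regression task with n noisy samples (toy assumption)."""
    w = B @ rng.standard_normal(k)
    X = rng.standard_normal((n, d))
    y = X @ w + 0.1 * rng.standard_normal(n)
    return X, y, w

# Step 1: learn a representation from the training tasks.
# Fit each pretraining task separately, then keep the top-k left singular
# vectors of the stacked weight estimates as the shared representation.
task_weights = []
for _ in range(n_tasks):
    X, y, _ = make_task(n_pre)
    task_weights.append(np.linalg.lstsq(X, y, rcond=None)[0])
W_hat = np.column_stack(task_weights)            # shape (d, n_tasks)
U, _, _ = np.linalg.svd(W_hat, full_matrices=False)
B_hat = U[:, :k]                                 # learned representation, (d, k)

# Step 2: transfer to a downstream task with few labeled samples.
# Only k coefficients are fit on top of the frozen representation.
X_new, y_new, w_new = make_task(n_down)
coef = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)[0]
w_transfer = B_hat @ coef

# Baseline: fit all d weights from the same few samples (minimum-norm solution).
w_scratch = np.linalg.lstsq(X_new, y_new, rcond=None)[0]

err_transfer = np.linalg.norm(w_transfer - w_new)
err_scratch = np.linalg.norm(w_scratch - w_new)
print("error with learned representation:", err_transfer)
print("error when fitting from scratch:  ", err_scratch)
```

In this toy setting the downstream fit estimates only `k` coefficients instead of `d`, which is why a handful of labeled samples can suffice once the representation has been learned. Whether anything similar holds for real pre-trained models depends on the statistical relation between training and downstream tasks.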

This talk establishes some theoretical understanding of pre-trained models under different settings, ranging from supervised pre-training, meta-learning, and self-supervised learning to domain adaptation and domain generalization. I will discuss the sufficient (and sometimes necessary) conditions for pre-trained models to work, based on the statistical relation between training and downstream tasks. The theoretical analyses partly answer how pre-trained models work and when they fail, guide technical decisions for future work, and inspire new methods for pre-trained models.

Sponsor: DSI/CS/Statistics Joint Seminar

Host: Rebecca Willett

Type: talk