Date & Time:
February 20, 2025 2:00 pm – 3:00 pm
Location:
Crerar 390, 5730 S. Ellis Ave., Chicago, IL

Abstract: The AI research community has become increasingly concerned about risks arising from capable AI systems, ranging from misuse of generative models to misalignment of agents. My research aims to address problems in AI safety by tackling key issues with the interpretability and controllability of large language models (LLMs). In this talk, I present research showing that we are well beyond the point of thinking of AI systems as “black boxes.” AI models, and LLMs especially, are more interpretable than ever. Advances in interpretability have enabled us to control model reasoning and update knowledge in LLMs, among other promising applications. My work has also highlighted challenges that must be solved for interpretability to continue progressing. Building from this point, I argue that we can explain LLM behavior in terms of “beliefs”, meaning that core knowledge about the world determines downstream behavior of models. Furthermore, model editing techniques provide a toolkit for intervening on beliefs in LLMs in order to test theories about their behavior. By better understanding beliefs in LLMs and developing robust methods for controlling their behavior, we will create a scientific foundation for building powerful and safe AI systems.
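To make the model-editing toolkit mentioned in the abstract concrete, here is a minimal illustrative sketch (not a method presented in the talk): rank-one editing approaches such as ROME treat a weight matrix inside an LLM as a linear key-value memory, and rewrite a single stored association while leaving orthogonal ones intact. All names, dimensions, and vectors below are toy assumptions.

```python
# Toy illustration of the rank-one "linear associative memory" update behind
# model-editing methods such as ROME. A weight matrix W maps key vectors
# (subject representations) to value vectors (stored facts); an edit applies
# the minimal rank-one update that rewrites one key-value association.
import numpy as np

rng = np.random.default_rng(0)
d = 8                            # toy hidden dimension (hypothetical)
W = rng.normal(size=(d, d))      # stand-in for an MLP weight inside an LLM

k = rng.normal(size=d)           # key: representation of a subject
v_new = rng.normal(size=d)       # value: representation of the edited fact

# Rank-one update: W' = W + (v_new - W k) k^T / (k^T k).
# By construction W' k = v_new, so the edited "belief" is now stored.
W_edited = W + np.outer(v_new - W @ k, k) / (k @ k)

assert np.allclose(W_edited @ k, v_new)

# Keys orthogonal to k are unaffected, i.e. unrelated knowledge is preserved.
k_other = rng.normal(size=d)
k_other -= ((k_other @ k) / (k @ k)) * k
assert np.allclose(W_edited @ k_other, W @ k_other)
```

In practice, editing methods estimate the key from the model's own activations on the subject and solve for the value that makes the model express the new fact; the update above is only the algebraic core of that idea.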

Speakers

Peter Hase

Resident AI Researcher, Anthropic

Peter Hase is an AI Resident at Anthropic. He recently completed his PhD at the University of North Carolina at Chapel Hill, advised by Mohit Bansal. His research focuses on NLP and AI safety, with the goal of explaining and controlling the behavior of machine learning models. He is a recipient of a Google PhD Fellowship and, before that, a Royster PhD Fellowship. While at UNC, he also worked at Meta, Google, and the Allen Institute for AI.

Related News & Events

UChicago CS News

Federal budget cuts threaten to decimate America’s AI superiority—and other countries are watching

Feb 25, 2025
UChicago CS News

The Hidden Cost of Netflix’s Autoplay: A Study on Viewing Patterns and User Control

Feb 25, 2025
UChicago CS News

Raul Castro Fernandez among six UChicago scientists awarded prestigious Sloan Fellowships in 2025

Feb 18, 2025
UChicago CS News

Quantum Leap: New Research Reveals Secrets of Random Quantum Circuits

Feb 04, 2025
UChicago CS News

Fred Chong from the Department of Computer Science Named ACM Fellow for Contributions to Quantum Computing

Jan 22, 2025
UChicago CS News

Rethinking AI as a Thought Partner: Perspectives on Writing, Programming, and More

Jan 16, 2025
UChicago CS News

UChicago Partners On New National Science Foundation Large-Scale Research Infrastructure For Education

Dec 10, 2024
UChicago CS News

Saturdays with CSIL — How Undergraduates are Transforming CS Education for Local High School Students

Dec 05, 2024
UChicago CS News

UChicago Researchers Receive Google Privacy Faculty Award for Research on AI Privacy Risks

Nov 22, 2024
UChicago CS News

The Climate App Designed to Tackle Chatham’s Flooding Crisis

Nov 21, 2024
In the News

Globus Receives Multiple Honors in 2024 HPCwire Readers’ and Editors’ Choice Awards

Nov 20, 2024
In the News

Argonne Team Breaks New Ground in AI-Driven Protein Design

Nov 15, 2024