Cyrus Rashtchian (UCSD) – Foundations of Data Science: Algorithms, Models, Explainability

Date & Time:

March 31, 2021, 12:30 pm – 1:30 pm (America/Chicago)

Location:

Live Stream

Foundations of Data Science: Algorithms, Models, Explainability

Watch via live stream

Building a theory for data science involves formulating new theoretical frameworks for important applications, as well as developing efficient and reliable solutions for the associated computational challenges. Central themes of my research include new models and algorithms for bioinformatics and trustworthy machine learning. In this talk, I will first describe my work on trustworthy machine learning, presenting a new model for explainable k-means clustering based on small decision trees and a new algorithm for finding a tree-based clustering with provably low cost. This work is the first to identify an unsupervised learning problem where explainable-by-design algorithms do not suffer a large loss in effectiveness. Turning to DNA data storage, I will provide an overview of this exciting, emerging technology, which promises orders-of-magnitude improvements in density and longevity over existing storage media. However, efficiently retrieving data that has been stored in DNA requires solving many interesting theoretical and practical problems. I will survey my contributions in this area, including efficient DNA synthesis methods, a distributed clustering algorithm for edit distance, and new statistical reconstruction algorithms. Next, I will discuss how to reconstruct node-labeled trees when given samples from an appropriately defined deletion channel. This involves new combinatorial and statistical algorithms, and it showcases a difficult model where worst-case reconstruction is possible with a polynomial number of samples. Finally, I will share my plans for future work in the areas of statistical reconstruction, trustworthy machine learning, and applied algorithms more generally.
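To make the idea of a tree-based clustering concrete, here is a minimal sketch in Python. It is not the algorithm from the talk: the tree is built by hand rather than found by any procedure, and the names `Leaf`, `Split`, `assign`, and `kmeans_cost`, as well as the toy data, are assumptions made for illustration. The sketch only shows how a small decision tree with axis-aligned thresholds assigns each point to a cluster (one leaf per cluster), and how the usual k-means cost of that assignment can be computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

Point = Sequence[float]


@dataclass
class Leaf:
    cluster: int  # cluster label attached to this leaf


@dataclass
class Split:
    feature: int      # index of the coordinate being tested
    threshold: float  # go left when point[feature] <= threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def assign(tree: Node, point: Point) -> int:
    """Follow threshold tests from the root down to a leaf; return its cluster."""
    node = tree
    while isinstance(node, Split):
        node = node.left if point[node.feature] <= node.threshold else node.right
    return node.cluster


def kmeans_cost(points: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to the mean of its cluster."""
    groups: Dict[int, List[Point]] = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    cost = 0.0
    for members in groups.values():
        dim = len(members[0])
        mean = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        cost += sum((p[j] - mean[j]) ** 2 for p in members for j in range(dim))
    return cost


# Hypothetical toy data: three well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (5.0, 10.0), (5.0, 12.0)]

# Hand-built tree with three leaves (k = 3): first test y <= 5, then x <= 5.
tree = Split(1, 5.0, Split(0, 5.0, Leaf(0), Leaf(1)), Leaf(2))

labels = [assign(tree, p) for p in points]
cost = kmeans_cost(points, labels)
print(labels, cost)
```

Each cluster in this sketch is described by at most two threshold tests, which is the sense in which such a clustering can be read off directly. How to choose the splits so that the cost stays provably close to that of an unconstrained k-means solution is the subject of the talk and is not attempted here.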

Host: Sanjay Krishnan

Cyrus Rashtchian

Postdoctoral Researcher, University of California, San Diego

Cyrus Rashtchian is currently a postdoc in the Computer Science & Engineering department at the University of California, San Diego. He received his Ph.D. in Computer Science & Engineering from the University of Washington, Seattle, in 2018, advised by Paul Beame, and his B.S. in Computer Science from the University of Illinois, Urbana-Champaign. He has broad research interests in the foundations of data science, including DNA data storage, robust and explainable machine learning, statistical reconstruction, clustering, and distributed algorithms. In general, he applies diverse geometric and algorithmic tools to problems in data science, with a keen eye for new applications. Prior to UCSD, he completed research internships at Facebook Reality Labs, Microsoft Research, and Cray. He has published in top machine learning and theoretical computer science conferences, including SODA, COLT, ITCS, ICML, NeurIPS, and AISTATS, and in journals such as Nature Biotechnology and the Annals of Applied Probability. Personal website: http://www.cyrusrashtchian.com
