New CS/Stats Prof. Rebecca Willett Explores Foundations, Applications of Data Science

In a world increasingly governed by data, it is critical that data be applied properly. As data-driven methods gain popularity in public policy, business operations, hiring, basic science, medical decisions, and virtually every other aspect of our lives, people need to understand the foundations of data science and use it appropriately. Otherwise, bias, spurious correlations, and other statistical landmines can distort results, with potentially grave human and societal consequences.

Joint Computer Science and Statistics Professor Rebecca Willett helps neuroscientists, physicians, astronomers, climate researchers, and even farmers avoid these missteps and maximize the discovery potential of data. Through a combination of fundamental research and diverse interdisciplinary collaboration, Willett has extended the practice of data science into new fields and toward deeper insights. After faculty positions at Duke University and the University of Wisconsin, she joined UChicago in summer 2018 to both continue her work and help build new data science research and education initiatives.

“I was really excited that the university was investing in data science in a large-scale way,” Willett said. “I thought there was a lot of opportunity for growth and building and I was enthusiastic to play a central role in those efforts.”

Shortly after her arrival at UChicago, Willett led a multi-university team that was awarded an NSF grant to find new “El Niño”-like weather patterns, applying data science to growing quantities of climate measurements in order to improve seasonal forecasting. The project is emblematic of her research portfolio, in which her group develops new fundamental methodology and theory inspired by the challenges of domain scientists with difficult, data-intensive questions. These challenges include image analysis, signal processing, and the use of machine learning for prediction and optimization.

For example, she recently helped a neuroscience group develop algorithms to segment and parameterize images of neural tissue, in order to test a new method for controlling the growth of stem cells. In another project, she helped develop the image processing methods within a smartphone app farmers can use to quickly measure corn kernels for dairy cow feed and make critical, on-the-fly decisions about harvesting methods to improve cow nutrition.

Other projects help researchers deal with high-dimensional data, where there is an abundance of features associated with each data point. For example, data science methods help avoid false conclusions from linking medical and genetic data, where the sheer scale of possible connections can create misleading correlations.

“A pervasive theme is how to draw reliable conclusions from data,” Willett said, “especially when data are high-dimensional. For instance, we record vast quantities of data about each patient’s health history, including test results, treatments, demographic information, family history, imaging data, genetic information, and physician notes. Such large numbers of features make it difficult to tease out risk factors for health conditions that were previously unrecognized. It becomes even more challenging as we strive to ensure methods are robust to errors in health records, lab tests not conducted, or treatments that were untried.”

“In general, mitigating the challenges associated with high-dimensional data is a key research thrust in data science, and relies upon developing novel geometric representations of data and incorporating physical models as much as possible.”

Willett’s co-appointment reflects the combination of skills needed to address these questions in a practical way. While many of the methods to analyze data are steeped in statistics, computer science helps her understand how effective methods can be computed in reasonable time, and what could potentially go wrong.

“For some projects, I have developed novel software and tools that practitioners or researchers in other fields can use on their data,” Willett said. “For others I have examined methods that are already in use and developed theory to better characterize whether these methods are reasonable, whether there are some pitfalls we should be aware of, and where there may be room for improvement.”

That philosophy aligns with UChicago initiatives to launch new programming and collaborations in data science that combine efforts from the Departments of Statistics and Computer Science, including a course debuting this fall co-taught by department chairs Dan Nicolae and Michael Franklin.

“The fact that CS and Stats work so well together and the unified vision of the two departments means a lot,” Willett said. “I think that's going to allow us to establish ourselves as a world-class machine learning and data science group.”
