Raul Castro Fernandez (MIT) – Data Discovery: Unleashing the Value of Data

Date & Time:

April 4, 2019 3:30 pm – 4:30 pm

Location:

Crerar 390, 5730 S. Ellis Ave., Chicago, IL

Data Discovery: Unleashing the Value of Data

Organizations use only a small portion of all data they own. Consequently, most of the potential value is untapped. This happens because their analysts suffer a data discovery problem: when solving a task that requires data, analysts spend more time finding the relevant data than solving the task at hand. The core problem is that there is not adequate infrastructure to support the many different discovery problems organizations face. Hence, finding data remains largely a manual and time-consuming process.

In this talk, I'll present Aurum, a system that radically changes how users interact with their organizations' data. With Aurum, users can solve discovery problems in minutes instead of weeks. To achieve this, Aurum has three novel features: 1) it makes data discovery programmable, so users can solve many different discovery problems by writing different programs; 2) it answers data discovery queries fast, so users get results in minutes instead of weeks; 3) it scales to large amounts of data, so no relevant data is left behind. In addition, I'll explain how Aurum handles not only structured data such as tables in databases, data lakes, and spreadsheets, but also unstructured data such as PDF files, Word documents, and even conversations from Slack channels.

I'll conclude with a vision for how to make data easier to work with and to program, a key ingredient needed to exploit all data available in organizations and enable new applications.

Host: Aaron Elmore

Raul Castro Fernandez

Assistant Professor of Computer Science

In my research I build high-performance systems for discovering, preparing, and processing data. I often use techniques from data management, statistics, and machine learning. At MIT I work with professors Sam Madden and Mike Stonebraker. Before MIT, I completed my PhD at Imperial College London with Peter Pietzuch.
