UChicago CS Research Finds New Angle on Database Query Processing with Geometry

When people ask a question of a database, they don’t always want or need an exact answer. In fast-moving industries such as finance or in conditions where the data is always growing and changing, an approximate answer delivered quickly is often good enough. Approximate query processing, or AQP, seeks approaches that balance speed, efficiency, and accuracy in situations where a “perfect” calculation is unnecessary.

A new paper from the research group of UChicago CS assistant professor Sanjay Krishnan addresses this goal with a new solution: JanusAQP. In work accepted for next year’s IEEE International Conference on Data Engineering (ICDE), former postdoctoral researcher Stavros Sintos and PhD graduate Xi Liang joined Krishnan to create a new system that quickly and efficiently delivers query results for dynamic databases.

The work capped three years of research that combined Sintos’ expertise in computational geometry with Liang’s pursuit of new algorithmic approaches for database structures and query processing. Their collaboration produced a fresh perspective on a problem that has faced database researchers for nearly 40 years, Krishnan said.

Sanjay Krishnan, Assistant Professor of Computer Science.

“People have long thought that AQP is a great idea; we want to cut down resources, we want to make things faster,” Krishnan said. “But there are challenges of sparsity and dynamism. Approximations are really good at capturing general properties of your data, but often, you’re interested in outliers, or very rare things that are happening inside your data set. And with real world data, any use case that is large means that it’s usually accumulating data over time. This paper is the closest to a practical AQP system that really addresses these two challenges that you will see out there.”

The project began with a 2020 paper by Liang, now a software engineer at Databricks, with Krishnan, former postdoctoral researcher Zechao Shang, and UChicago CS faculty Aaron Elmore and Michael Franklin. In that paper, accepted at the ACM Special Interest Group on Management of Data (SIGMOD) conference, the team described a framework for dealing with missing data when calculating basic data queries such as SUM, COUNT, and AVG.

Their solution relied upon a partitioned data structure that broke the full dataset down into smaller parts for faster analysis. But the strategy required a high precomputation cost and only worked with static data, Liang said.

Enter Sintos, now an assistant professor at the University of Illinois Chicago, who brought his knowledge in computational geometry to bear on the partitioning approach. A 2021 paper described a new data structure akin to a “tree” of pre-computed data aggregates representing the overall dataset, which can then be quickly sampled for approximate results.

“Data has geometry,” Sintos said. “You can imagine that there are some parts of the space that the data lie on that are more important than others. So instead of just taking uniform samples over everything, we focus on the areas that have more data, or where the value of data is high. We precompute some statistics by building a partition over the data, splitting the data into smaller buckets. Then when we have a query, we can use the samples from these buckets along with the precomputed statistics, and in the end, we combine all of these together to return the final result.”

For the JanusAQP system, the authors further improved these tree partition structures to work with dynamic data, where new information is continuously added. Liang, Sintos, and Krishnan built a “catch-up” process to ingest new data and rebuild the structure while still remaining responsive to queries in real time.

“The system has to somehow place the new data into the data structure that has already been learned,” Liang said. “That catch-up process can be done in the background. Meanwhile, the system can deal with the new data online, while the background can still keep learning from the historic data.”

While Liang and Sintos have moved on to new positions in industry and academia, Krishnan’s group will continue building upon their work. One promising area is blending the AQP approach with machine learning in a mutually beneficial relationship.

“There’s this rich work in computational geometry and data structures that is often ignored, and they’re just a great tool to have,” Krishnan said. “They are space-efficient, they can be updated easily, they’re extremely flexible to capture different types of data and different data types. So how can we use them to make machine learning more resource efficient? I do think that Xi and Stavros’ work has pushed the envelope on approximation very significantly, much more significantly than I would have thought when this project started.”

Why UChicago CS

Our Building

Highlights

Quantum Computing

Data Science

Research Areas

Labs & Groups

Partners

Postdoctoral Researchers

Undergraduate

Master's Programs

PhD Programs

Resources

Community

Faculty

Diversity @ UChicago CS

Our BPC Plan

News

What’s Real and What’s Not? Watermarking to Identify AI-Generated Text

Enhancing Multitasking Efficiency: The Role of Muscle Stimulation in Reducing Mental Workload

From wildfires to bird calls: Sage redefines environmental monitoring

Events

Video

“Machine Learning Foundations Accelerate Innovation and Promote Trustworthiness” by Rebecca Willett

Nightshade: Data Poisoning to Fight Generative AI with Ben Zhao

Ian Foster – Better Information Faster: Programming the Continuum

UChicago CS Research Finds New Angle on Database Query Processing with Geometry

Related News

UChicago Team Wins The NIH Long COVID Computational Challenge

UChicago Assistant Professor Raul Castro Fernandez Receives 2023 ACM SIGMOD Test-of-Time Award

PhD Student Kevin Bryson Receives NSF Graduate Research Fellowship to Create Equitable Algorithmic Data Tools

Computer Science Displays Catch Attention at MSI’s Annual Robot Block Party

UChicago / School of the Art Institute Class Uses Art to Highlight Data Privacy Dangers

UChicago Undergrad Team Places Second Overall In Regionals For World’s Largest Programming Competition

Postdoc Alum John Paparrizos Named ICDE Rising Star

UChicago and NYU Research Team Finds Edtech Tools Could Pose Privacy Risks For Students

Student Spotlight: Gabi Garcia’s Bridge Between CS and Classics

UChicago Launches Transform Accelerator for Data Science & Emerging AI Startups

High School Students Find Their Place in Computing Through Wearables Workshop

New CS and DSI Faculty Haifeng Xu Brings Strategic Intelligence to NeurIPS 2022