When people ask a question of a database, they don’t always want or need an exact answer. In fast-moving industries such as finance or in conditions where the data is always growing and changing, an approximate answer delivered quickly is often good enough. Approximate query processing, or AQP, seeks approaches that balance speed, efficiency, and accuracy in situations where a “perfect” calculation is unnecessary.

A new paper from the research group of UChicago CS assistant professor Sanjay Krishnan addresses this goal with a new solution: JanusAQP. In work accepted for next year’s IEEE International Conference on Data Engineering (ICDE), former postdoctoral researcher Stavros Sintos and PhD graduate Xi Liang joined Krishnan to create a new system that quickly and efficiently delivers query results for dynamic databases.

The work capped three years of research that combined Sintos’ expertise in computational geometry with Liang’s pursuit of new algorithmic approaches for database structures and query processing. Their collaboration produced a fresh perspective on a problem that has faced database researchers for nearly 40 years, Krishnan said.

Sanjay Krishnan, Assistant Professor of Computer Science.
Sanjay Krishnan, Assistant Professor of Computer Science.

“People have long thought that AQP is a great idea; we want to cut down resources, we want to make things faster,” Krishnan said. “But there are challenges of sparsity and dynamism. Approximations are really good at capturing general properties of your data, but often, you’re interested in outliers, or very rare things that are happening inside your data set. And with real world data, any use case that is large means that it’s usually accumulating data over time. This paper is the closest to a practical AQP system that really addresses these two challenges that you will see out there.”

The project began with a 2020 paper by Liang, now a software engineer at Databricks, with Krishnan, former postdoctoral researcher Zechao Shang, and UChicago CS faculty Aaron Elmore and Michael Franklin. In that paper, accepted at the ACM Special Interest Group on Management of Data (SIGMOD) conference, the team described a framework for dealing with missing data when calculating basic data queries such as SUM, COUNT, and AVG.

Their solution relied upon a partitioned data structure that broke the full dataset down into smaller parts for faster analysis. But the strategy required a high precomputation cost and only worked with static data, Liang said.

Enter Sintos, now an assistant professor at the University of Illinois Chicago, who brought his knowledge in computational geometry to bear on the partitioning approach. A 2021 paper described a new data structure akin to a “tree” of pre-computed data aggregates representing the overall dataset, which can then be quickly sampled for approximate results.

“Data has geometry,” Sintos said. “You can imagine that there are some parts of the space that the data lie on that are more important than others. So instead of just taking uniform samples over everything, we focus on the areas that have more data, or where the value of data is high. We precompute some statistics by building a partition over the data, splitting the data into smaller buckets. Then when we have a query, we can use the samples from these buckets along with the precomputed statistics, and in the end, we combine all of these together to return the final result.”

For the JanusAQP system, the authors further improved these tree partition structures to work with dynamic data, where new information is continuously added. Liang, Sintos, and Krishnan built a “catch-up” process to ingest new data and rebuild the structure while still remaining responsive to queries in real time.

“The system has to somehow place the new data into the data structure that has already been learned,” Liang said. “That catch-up process can be done in the background. Meanwhile, the system can deal with the new data online, while the background can still keep learning from the historic data.”

While Liang and Sintos have moved on to new positions in industry and academia, Krishnan’s group will continue building upon their work. One promising area is blending the AQP approach with machine learning in a mutually beneficial relationship.

“There’s this rich work in computational geometry and data structures that is often ignored, and they’re just a great tool to have,” Krishnan said. “They are space-efficient, they can be updated easily, they’re extremely flexible to capture different types of data and different data types. So how can we use them to make machine learning more resource efficient? I do think that Xi and Stavros’ work has pushed the envelope on approximation very significantly, much more significantly than I would have thought when this project started.”

Related News

More UChicago CS stories from this research area.
Garcia sitting in a jet engine
UChicago CS News

Student Spotlight: Gabi Garcia’s Bridge Between CS and Classics

Jan 30, 2023
UChicago CS News

UChicago Launches Transform Accelerator for Data Science & Emerging AI Startups

Jan 19, 2023
Two students looking at a wearable device
UChicago CS News

High School Students Find Their Place in Computing Through Wearables Workshop

Jan 13, 2023
Haifeng Xu
UChicago CS News

New CS and DSI Faculty Haifeng Xu Brings Strategic Intelligence to NeurIPS 2022

Nov 28, 2022
UChicago CS News

Asst. Prof. Aloni Cohen Receives Award For Revealing Flaws in Deidentifying Data

Sep 09, 2022
UChicago CS News

UChicago Hosts NSF Workshop on Frontiers of Quantum Advantage

Aug 15, 2022
UChicago CS News

New 2022-23 Faculty Add Expertise in Linguistics, Visualization, Economics, and Data Science Education

Aug 11, 2022
In the News

UChicago Co-Leads $10 Million NSF Institute on Foundations of Data Science

Aug 09, 2022
UChicago CS News

UChicago CS Faculty Receive Industry Grants From J.P. Morgan, Google

Jul 19, 2022
In the News

Bill Fefferman Comments on New Standards for Quantum-Proof Cryptography

Jul 07, 2022
UChicago CS News

UChicago London Colloquium Features Data Science, Quantum Research

Jul 01, 2022
UChicago CS News

Faculty Bill Fefferman and Chenhao Tan Receive Google Research Scholar Awards

Jun 21, 2022
arrow-down-largearrow-left-largearrow-right-large-greyarrow-right-large-yellowarrow-right-largearrow-right-smallbutton-arrowclosedocumentfacebookfacet-arrow-down-whitefacet-arrow-downPage 1CheckedCheckedicon-apple-t5backgroundLayer 1icon-google-t5icon-office365-t5icon-outlook-t5backgroundLayer 1icon-outlookcom-t5backgroundLayer 1icon-yahoo-t5backgroundLayer 1internal-yellowinternalintranetlinkedinlinkoutpauseplaypresentationsearch-bluesearchshareslider-arrow-nextslider-arrow-prevtwittervideoyoutube