Over the last decade, the bottleneck for data analytics has shifted from the collection of information to the analysis of increasingly massive and unwieldy datasets. The gap between what can be collected and what can be analyzed is only growing as the Internet of Things brings more devices online that are capable of relentless data collection, from smart electric meters to cheap home sensors. Methods designed for analyzing or comparing a format as straightforward as time series data become untenable when applied to thousands or millions of time series, forcing researchers to work with compressed or reduced data.

With a new fellowship from data services company NetApp, UChicago CS postdoctoral researcher John Paparrizos hopes to reduce this compromise, with new approaches that enable multi-faceted analysis on compressed, large-scale data. By making it possible to run clustering, classification, prediction, and other analytic tasks on data while it is still compressed, these approaches can help researchers avoid the headaches of working with raw, overabundant data without sacrificing the fine detail of observations.

Specifically, Paparrizos, a researcher in the laboratory of Liew Family Chair of Computer Science Michael J. Franklin, will build a unified approach to support several different analytic tasks on compressed data: indexing, classification, clustering, sampling, and visualization. Previous work has largely focused on specialized approaches that handle one task at a time and are designed with a particular dataset in mind, making it hard for users to generalize these approaches to different settings and applications.

“For example, when an algorithm requires the use of a particular distance measure to compare time series, you have limitations on what kind of compression method you can use and, therefore, what kind of indexing mechanism you can use to accelerate the computation,” Paparrizos said. “In this project, our goal is to automatically learn to effectively compress time series such that the low-dimensional data are compatible with classic, well-studied, indexing mechanisms and, importantly, preserve the invariance to time-series distortions offered by user-defined comparison methods in the high-dimensional space.”
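To make the idea of index-compatible compression concrete, here is a minimal sketch of one classic technique, Piecewise Aggregate Approximation (PAA), which replaces each equal-width segment of a series with its mean. It is offered only as an illustration of the general problem and is not the method the project will develop. The function names (`paa`, `euclidean`, `paa_lower_bound`) are hypothetical, and the sketch assumes equal-length series whose length is a multiple of the segment count.

```python
import math


def paa(series, segments):
    """Compress a series to `segments` values by averaging equal-width chunks."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(a, b, segments):
    """Distance computed on the compressed series, scaled so that it never
    exceeds the true Euclidean distance between the raw series."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    width = len(a) // segments
    return math.sqrt(width) * euclidean(paa(a, segments), paa(b, segments))


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [2, 2, 2, 2, 9, 9, 9, 9]
    print(paa(a, 2), paa(b, 2))
    print(paa_lower_bound(a, b, 2), euclidean(a, b))
```

Because the compressed distance never overestimates the true Euclidean distance, an index built on the short representation can discard candidates without missing true matches. That guarantee is specific to Euclidean distance; preserving the behavior of other, user-defined comparison methods is the harder problem described in the quote above.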

The project will evaluate the effectiveness of that approach on datasets from two real-world applications — high-resolution energy usage information collected by utility companies from smart meters and image data from satellites capturing Earth’s surface over time. Currently, researchers often need to reduce the dimensionality of these datasets in order to conduct comparisons and other analyses, losing accuracy in the process.

“Most of the highly accurate algorithms are very difficult to scale when you have databases with more than 100,000 time series, so for millions of time series, you need to find better ways to compress the data in order to offer a scalable solution,” Paparrizos said. “The challenge is to demonstrate minimal loss in accuracy while performing analytics on large-scale time-series collections.”

After development and testing, Paparrizos will work to integrate the new methods into popular large-scale analytics software, such as Apache Spark. The NetApp fellowship provides funding for one year of work on the project. To read more about Paparrizos’ fellowship, visit the NetApp website.
