Over the last decade, the bottleneck for data analytics has shifted from the collection of information to the analysis of increasingly massive and unwieldy datasets. The gap is only growing as the Internet of Things brings online more devices capable of relentless data collection, from smart electric meters to cheap home sensors. Methods designed for analyzing or comparing a format as straightforward as time series data become untenable when applied to thousands or millions of time series, forcing researchers to work with compressed or reduced data.

With a new fellowship from data services company NetApp, UChicago CS postdoctoral researcher John Paparrizos hopes to ease this compromise with new approaches that enable multi-faceted analysis on compressed, large-scale data. By making it possible to run clustering, classification, prediction, and other analytic tasks on data while it is still compressed, these approaches can help researchers avoid the headaches of working with raw, overabundant data without sacrificing the fine detail of observations.

Specifically, Paparrizos, a researcher in the laboratory of Liew Family Chair of Computer Science Michael J. Franklin, will build a unified approach to support several different analytic tasks on compressed data: indexing, classification, clustering, sampling, and visualization. Previous papers have largely focused on specialized approaches that handle one task at a time with a particular dataset in mind, making it hard for users to generalize these approaches to different settings and applications.

“For example, when an algorithm requires the use of a particular distance measure to compare time series, you have limitations on what kind of compression method you can use and, therefore, what kind of indexing mechanism you can use to accelerate the computation,” Paparrizos said. “In this project, our goal is to automatically learn to effectively compress time series such that the low-dimensional data are compatible with classic, well-studied, indexing mechanisms and, importantly, preserve the invariance to time-series distortions offered by user-defined comparison methods in the high-dimensional space.”

The project will evaluate the effectiveness of that approach on datasets from two real-world applications — high-resolution energy usage information collected by utility companies from smart meters and image data from satellites capturing Earth’s surface over time. Currently, researchers often need to reduce the dimensionality of these datasets in order to conduct comparisons and other analyses, losing accuracy in the process.
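As a rough illustration of that trade-off, and not the method the project will develop, the sketch below applies Piecewise Aggregate Approximation (PAA), a classic time-series reduction technique, to two made-up series. The function names and sample values are assumptions chosen for illustration. The distance computed on the reduced data never exceeds the true Euclidean distance, but it can badly underestimate it.

```python
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: replace each equal-width
    segment of the series with its mean value."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_distance(a_reduced, b_reduced, original_length):
    """Distance between PAA representations, scaled by the segment width
    so that it lower-bounds the Euclidean distance on the raw series."""
    width = original_length // len(a_reduced)
    return math.sqrt(width) * euclidean(a_reduced, b_reduced)


# Two hypothetical series that differ at every point.
a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
b = [1.0, 0.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0]

exact = euclidean(a, b)                                  # distance on raw data
approx = paa_distance(paa(a, 4), paa(b, 4), len(a))      # distance on reduced data

print(f"exact distance:   {exact:.3f}")
print(f"reduced distance: {approx:.3f}")
```

Here both series reduce to the same four segment means, so the reduced distance is zero even though the raw series differ at every point. That kind of lost detail is what a compression scheme has to keep small.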

“Most of the highly accurate algorithms are very difficult to scale when you have databases with more than 100,000 time series, so for millions of time series, you need to find better ways to compress the data in order to offer a scalable solution,” Paparrizos said. “The challenge is to demonstrate minimal loss in accuracy while performing analytics on large-scale time-series collections.”

After development and testing, Paparrizos will work to integrate the new methods into popular large-scale analytics software, such as Apache Spark. The NetApp fellowship provides funding for one year of work on the project. To read more about Paparrizos’ fellowship, visit the NetApp website.
