Date & Time:
July 18, 2022 3:00 pm – 4:00 pm (America/Chicago)
Crerar 346, 5730 S. Ellis Ave., Chicago, IL
Lydia Lucchesi (ANU) – Smallset Timelines: A Visual Representation of Data Preprocessing Decisions
UChicago HCI Club Seminar

Data preprocessing is a crucial stage in the data analysis pipeline, with both technical and social aspects to consider. Yet it often receives too little attention in research practice and dissemination. We present the Smallset Timeline, a visualisation to help reflect on and communicate data preprocessing decisions. A “Smallset” is a small selection of rows from the original dataset containing instances of dataset alterations. The Timeline is composed of Smallset snapshots representing different points in the preprocessing stage, along with captions describing the alterations visualised at each point. Edits, additions, and deletions to the dataset are highlighted with colour. We develop an R software package, smallsets, that can create Smallset Timelines from R and Python data preprocessing scripts. Constructing the figure asks practitioners to reflect on and revise decisions as necessary, while sharing it aims to make the process accessible to a diverse range of audiences. We present two case studies to illustrate the use of the Smallset Timeline for visualising preprocessing decisions: software defect data, in which preprocessing affects levels of data loss, and income survey benchmark data, in which preprocessing affects group fairness in prediction tasks. We envision Smallset Timelines as a go-to data provenance tool, enabling better documentation and communication of preprocessing tasks at large.


Lydia Lucchesi

PhD Student, Australian National University

Lydia is a PhD Candidate in Computer Science at the Australian National University. She completed a BA in statistics at the University of Missouri, USA, followed by a post-bachelor fellowship at the Institute for Health Metrics and Evaluation. Her current research focuses on the visualisation of data quality. She is a co-developer of the Vizumap R package, a toolkit for visualising uncertainty in spatial data.
