Date & Time:
July 18, 2022 3:00 pm – 4:00 pm (America/Chicago)
Location:
Crerar 346, 5730 S. Ellis Ave., Chicago, IL

Lydia Lucchesi (ANU) – Smallset Timelines: A Visual Representation of Data Preprocessing Decisions
UChicago HCI Club Seminar

Data preprocessing is a crucial stage in the data analysis pipeline, with both technical and social aspects to consider. Yet it often receives too little attention in research practice and dissemination. We present the Smallset Timeline, a visualisation to help reflect on and communicate data preprocessing decisions. A “Smallset” is a small selection of rows from the original dataset containing instances of dataset alterations. The Timeline is composed of Smallset snapshots, which represent different points in the preprocessing stage, and captions that describe the alterations visualised at each point. Edits, additions, and deletions to the dataset are highlighted with colour. We developed the R software package smallsets, which can create Smallset Timelines from R and Python data preprocessing scripts. Constructing the figure asks practitioners to reflect on and revise decisions as necessary, while sharing it aims to make the process accessible to a diverse range of audiences. We present two case studies to illustrate use of the Smallset Timeline for visualising preprocessing decisions. The case studies use software defect data and income survey benchmark data, in which preprocessing affects levels of data loss and group fairness in prediction tasks, respectively. We envision Smallset Timelines as a go-to data provenance tool, enabling better documentation and communication of preprocessing tasks at large.
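To make the idea of snapshots and highlighted alterations concrete, here is a minimal Python sketch. It is not the smallsets package or its API; the function names (`diff_snapshots`, `build_timeline`), the toy rows, and the captions are all hypothetical and chosen only for illustration. It assumes each snapshot is a small set of rows keyed by a stable row id, and it classifies what changed between consecutive snapshots as deletions, additions, or edits, which is the information a Timeline would colour.

```python
from typing import Any

Row = dict[str, Any]
Snapshot = dict[int, Row]  # row id -> column values (illustrative layout)


def diff_snapshots(before: Snapshot, after: Snapshot) -> dict[str, list]:
    """Classify the changes between two consecutive snapshots."""
    before_cols = set().union(*(row.keys() for row in before.values()))
    after_cols = set().union(*(row.keys() for row in after.values()))

    edited_cells = []
    for row_id in sorted(before.keys() & after.keys()):
        old, new = before[row_id], after[row_id]
        # Only compare columns present in both versions of the row.
        for col in sorted(old.keys() & new.keys()):
            if old[col] != new[col]:
                edited_cells.append((row_id, col))

    return {
        "deleted_rows": sorted(before.keys() - after.keys()),
        "added_rows": sorted(after.keys() - before.keys()),
        "added_columns": sorted(after_cols - before_cols),
        "edited_cells": edited_cells,
    }


def build_timeline(steps: list[tuple[str, Snapshot]]) -> list[tuple[str, dict]]:
    """Pair each step's caption with the changes it made to the previous snapshot."""
    timeline = []
    for (_, previous), (caption, current) in zip(steps, steps[1:]):
        timeline.append((caption, diff_snapshots(previous, current)))
    return timeline


# Hypothetical Smallset: three rows followed through two preprocessing steps.
raw = {
    1: {"age": 34, "income": 52000},
    2: {"age": None, "income": 48000},
    3: {"age": 29, "income": -1},
}
dropped = {k: dict(v) for k, v in raw.items() if v["age"] is not None}
recoded = {
    k: {**v, "income": None if v["income"] == -1 else v["income"]}
    for k, v in dropped.items()
}

steps = [
    ("Raw data", raw),
    ("Drop rows with missing age", dropped),
    ("Recode income of -1 as missing", recoded),
]

for caption, changes in build_timeline(steps):
    print(caption, changes)
```

In this sketch each caption sits next to the exact rows and cells it affected, mirroring how the Timeline pairs captions with highlighted snapshots. A real Smallset would also need a rule for choosing which rows to follow, which is left out here.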

Speakers

Lydia Lucchesi

PhD Student, Australian National University

Lydia is a PhD Candidate in Computer Science at the Australian National University. She completed a BA in statistics at the University of Missouri, USA, followed by a post-bachelor fellowship at the Institute for Health Metrics and Evaluation. Her current research focuses on the visualisation of data quality. She is a co-developer of the Vizumap R package, a toolkit for visualising uncertainty in spatial data.
