While data science enables rapid societal advancement, deferring decisions to machines does not automatically avoid egregious equity or privacy violations. Without safeguards throughout the scientific process (from data collection to algorithm design to model deployment), machine learning models can easily inherit or amplify biases and vulnerabilities already present in society. My research focuses on explicitly encoding ethical norms into algorithms and on building frameworks that ensure statistical and machine learning methods are deployed in a socially responsible manner. In particular, I develop theoretically rigorous and empirically validated algorithms that mitigate automated bias and protect individual privacy.
I will illustrate this work through two main contributions:
(1) A new oracle-efficient, convergent algorithm that provably achieves minimax group fairness (fairness measured by the worst-case outcome across groups) in general settings (“Minimax Group Fairness: Algorithms and Experiments,” https://dl.acm.org/doi/10.1145/3461702.3462523).
(2) A framework for constructing a proxy for a sensitive attribute, which allows one to train a fair model even when the original sensitive features are unavailable (“Multiaccurate Proxies for Downstream Fairness,” https://dl.acm.org/doi/10.1145/3531146.3533180).
Emily Diana is a Ph.D. candidate in Statistics and Data Science at the Wharton School of the University of Pennsylvania, where her research focuses on the intersection of ethical algorithm design and socially responsible machine learning. She holds a B.A. in Applied Mathematics from Yale College and an M.S. in Statistics from Stanford University. Before graduate school, she spent two years as a software developer at Lawrence Livermore National Laboratory (LLNL), working on high-performance computing and on government finite-element physics simulation codes. She is honored to be the 2022 recipient of Wharton’s J. Parker Bursk Memorial Prize for Excellence in Research and to have been recognized as both a 2022 Future Leader in Data Science by the Michigan Institute for Data Science and a 2021 Rising Star in EECS by MIT.