UChicago Leads New Research Coordination Network for Promoting CS Reproducibility
A critical but oft-neglected piece of the scientific method is reproducibility: the confirmation of past results by repeating experiments. As science has grown larger and more technically complex, these reproductions have become more difficult and expensive, particularly with the rising use of big data and high-performance computing.
With a $1.5 million grant from the National Science Foundation (NSF), University of Chicago scientists will lead a new research coordination network to incentivize reproducibility in computer science. The network is called REPETO, the Esperanto word for “repeat,” and its leadership also includes the University of California, Santa Cruz and New York University.
UChicago CS affiliates on the grant include lead principal investigator Kate Keahey, Senior Scientist at The University of Chicago Consortium for Advanced Science and Engineering (UChicago CASE) and Argonne National Laboratory, and Haryadi Gunawi, associate professor of computer science.
Keahey emphasizes that the project will focus on promoting “practical reproducibility”: new technologies and incentives that encourage rerunning experiments for education and research.
“For many years, a huge barrier to repeating somebody else’s experiment was that you might not have the same level of access to the same type of hardware that they did their experiment on,” Keahey said. “The fact that the NSF now funds open testbeds supporting a broad range of computer science experiments means that this obstacle goes away – but we still face barriers in the shape of cost of repeating an experiment and lack of community practices. Our objective is to make reproducibility practical by both lowering its cost and aligning it with mainstream research and education activities.”
NSF-funded platforms such as Keahey’s Chameleon (a cloud computing testbed), FABRIC, and PAWR now provide researchers with the resources they need to reproduce large-scale computational experiments. Many computer science conferences and publications now also require scientists to package their research – including data, methods, and the computing environment – for easy replication by outside observers.
But incentivizing the use of those resources in a field that rewards novelty over verification requires new strategies. The REPETO steering committee will gather representatives from NSF testbeds, conference organizers, and educators to promote the use of shared infrastructure and electronic artifacts, create new educational methodologies and programs, and work with existing conferences and workshops to facilitate opportunities for creating reproducible research.
One of these programs is a “Summer of Reproducibility,” scheduled to start in 2023. Inspired by Google’s popular Summer of Code, it will have students work on activities that reproduce published research. The project will also organize hackathons around reproducibility and create new curricula that draw upon testbed resources and make it possible for students to recreate large-scale experiments in their coursework.
“This approach will be a much more engaging way to learn and to explore existing research, as it draws you into the ongoing scientific debate,” Keahey said. “We’re going to produce examples on how to teach using reproducibility, create sets of best practices and methodology guides, and repositories of experiments packaged for reproducibility.”
The REPETO grant was funded as part of the NSF Findable Accessible Interoperable Reusable (FAIR) Open Science Research Coordination Networks (FAIROS RCN) initiative. The $12.5 million program funded 10 new project groups that “advance the means by which investigators can share information and ideas, coordinate ongoing or planned research activities, foster synthesis and new collaborations, develop community standards, and in other ways advance science and education through communication and sharing of research products.”