As the scale of scientific research grows larger, so too do the barriers. Data-intensive and computation-heavy work across all fields can benefit from parallel computing and high-performance resources, but not everyone knows how to scale up from a laptop to a supercomputer.
Since its debut in 2017, Parsl has sought to remove these obstacles, integrating with the popular Python programming language to make it simple to parallelize code and utilize large-scale computer systems around the world. Researchers have used the software to search the universe for dark energy, study traumatic brain injuries in MRI data, and comb through billions of molecules for potential COVID-19 therapeutics.
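To give a rough sense of the style of programming described above, here is a minimal sketch that uses only Python's standard `concurrent.futures` module rather than Parsl itself. It is an illustrative stand-in, not Parsl's API: the function names `count_hits` and `run_parallel` and the sample data are hypothetical, and Parsl's own decorators and configuration differ.

```python
from concurrent.futures import ThreadPoolExecutor


def count_hits(chunk, threshold):
    """Hypothetical analysis step: count values above a threshold."""
    return sum(1 for value in chunk if value > threshold)


def run_parallel(chunks, threshold, max_workers=4):
    """Submit one task per chunk and gather the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns a future right away; the work runs in the background.
        futures = [pool.submit(count_hits, chunk, threshold) for chunk in chunks]
        # result() blocks until the corresponding task has finished.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Made-up sample data, split into independent chunks.
    data = [[1, 5, 9], [2, 8], [7, 7, 7, 1]]
    print(run_parallel(data, threshold=4))
```

The point of the pattern is that the analysis function stays ordinary Python while the choice of where tasks run is kept separate, which is the kind of separation that lets a tool move the same work from a laptop to larger machines.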
Now, with two new grants, the project led by University of Chicago, Argonne National Laboratory, and University of Illinois scientists enters a new phase of sustainability, community, and outreach. Funding from the National Science Foundation and the Chan Zuckerberg Initiative (CZI) will support the growing open-source community around Parsl, integrate the software with popular tools used by disciplines such as biology, astrophysics, and materials science, and expand education and engagement for new users from a variety of backgrounds.
“We want to build up a community-governed project that’s owned and managed by the community,” said Kyle Chard, Research Associate Professor at UChicago CS and lead of Parsl. “We want to see it having an impact pervasively across research, and we want to democratize the ability for researchers to be able to run their analyses, their simulations, their machine learning models at much larger scales than they can do today, enabling them to very easily go from their laptop to a cloud or institutional cluster all the way up to a supercomputer.”
Parsl was among the first recipients of a new class of NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) grants, Transition to Sustainability, aimed at helping established projects develop a sustainability plan for continuing to support science and engineering research. The award will help Parsl build upon an already robust community of GitHub contributors to the project, creating an on-ramp for new code and charter positions for representatives from different scientific domains. These project champions will function as a decentralized advisory and outreach committee for Parsl, help adapt the software for specialized use in new fields, and create online and in-person tutorials to increase the user base.
“We’ve been working on building this software to meet community needs for many years, and we’re excited to have this opportunity to be supported in transitioning the project and its governance to a more community-oriented approach,” said Daniel S. Katz, National Center for Supercomputing Applications (NCSA) Chief Scientist and Parsl co-founder, in an NCSA article on the NSF grant.
The Chan Zuckerberg Initiative grant will focus on new uses of Parsl in biomedical research, an area where the software has already shown great potential. In 2020, researchers from the Department of Energy and the US Department of Veterans Affairs used Parsl to conduct a massive genotyping study of data from nearly 500,000 subjects in the Million Veteran Program. Other groups have used the software to create workflows for analyzing medical images on supercomputers and for studying whole-genome and transcriptome sequences to discover new risk factors and treatments for breast cancer.
In the next phase of the project, the Parsl team hopes to make it even easier for biomedical researchers to use the tool and scale up their research. The CZI award will enable the team to extend the software to work seamlessly with popular workflow languages such as the Workflow Description Language (WDL) and the Common Workflow Language (CWL), so that researchers can create new programs from pre-published building blocks, simplifying the process of scaling research tasks from one computer to many.
“One of the things we’ve proposed is to integrate with tools that are widely used by the biomedical community,” Chard said. “Our hope is to augment that ecosystem with Parsl capabilities to basically enable biomedical workflows to run at much larger scales on increasingly heterogeneous computers. One of the driving philosophies behind Parsl is to meet researchers in their environment. That has underpinned our choice of Python and is influencing the direction here to integrate with common tools used in biomedicine.”
The expansion of the Parsl community was on display in September at Parsl and funcX Fest 2022, a two-day hybrid gathering of researchers, developers, and cyberinfrastructure experts from as far away as New Zealand and the United Kingdom. The event featured many lightning talks (viewable on YouTube) from users of Parsl and funcX, its sister project that provides a platform for delegating computational tasks and data to remote resources.