When any successful project attracts a large community, there will be growing pains. Since its launch in 2015, the experimental cloud computing testbed Chameleon has realized its vision of becoming a shared scientific instrument for computer science research and education. The project, led by Argonne Senior Scientist and UChicago CASE Affiliate Kate Keahey, has supported over 800 projects and 6,000 users from around the world and developed several new applications and deployments.
But until 2020, it was still using its original, basic identity and access management system, which created unnecessary hurdles for adoption and use. A new system would ideally allow for single sign-on across the Chameleon ecosystem as well as federated identity support, similar to the option of using a Google, Facebook, or university account to log into various websites and apps. Changing the sign-on system for a project this large without disruption is no small feat; in a new paper with former Chameleon DevOps lead Jason Anderson, Keahey described it as “tantamount to rebuilding the foundation under a skyscraper with thousands of inhabitants.”
The writeup of this epic moving operation received a Best Paper Award at the 2022 Practice and Experience in Advanced Research Computing (PEARC) conference, one of the leading supercomputing meetings. The honor recognizes that their approach doesn’t just benefit Chameleon and its users, but could help other projects make similar changes with minimal disruption.
“We were five years into operating the system, and our users created thousands of artifacts — images, orchestration templates, datasets — that were tied to their identity,” Keahey said. “We asked, how do we now port it to a completely different identity management system and preferably while the testbed is still operating? We tried a few solutions, but they were all either inefficient or brittle or did not scale.”
The team eventually arrived at a two-tiered architecture, combining a single sign-on solution built using the open source software Keycloak with a federated identity system provided by fellow University of Chicago project Globus. The former lets Chameleon users access the system with a single username and password through multiple routes, including Jupyter notebooks and the command-line interface. The latter lets those login details be linked to the user’s host institution login or to providers such as Google, ORCID, and the project’s original identity system from the Texas Advanced Computing Center.
The system created a smoother experience for established users, as well as an easier on-ramp for new Chameleon users and new “associate sites” deploying and operating their own versions of the Chameleon testbed using the project’s CHI-in-a-box software package. The new approach to Chameleon account management is fundamental to Chameleon becoming a federation rather than just a testbed — and that ultimately lowers the cost of computer science experimentation, Keahey said.
“With the single sign-on and access via federated identity, we can scale to many more sites,” Keahey said. “The fact that other people are now deploying Chameleon in their institutions means that we’ve managed to bring the cost of operating resources for computer science research down sufficiently, so that it is now feasible to operate them without a huge barrier of expertise.”
Building the new system was the first challenge. The second was migrating Chameleon’s thousands of users, most of whom use the testbed only intermittently. Many users work on the cloud testbed intensively for a few weeks while conducting research or taking a class, but then stop using it for several months or years.
So during the initial migration in late 2020, the team chose a multi-phase approach, first asking active users to opt in to the new system when they logged into Chameleon, then later requiring them to opt out of the switch if they still wanted to use their old accounts. Many active users moved over to the new system during this initial window, and others continue to migrate to their new accounts in a steady drip, all without any loss of the assets tied to their original login and without disruption to Chameleon operations.
“The large part of the innovation is architecture that supports this notion of continuous migration,” Keahey said. “It’s not a ‘use it or lose it’ kind of thing, it stays there and accommodates those users who come back after months or years away.”
The PEARC award acknowledges the technical accomplishments of this smooth rollout and spotlights the work for other large systems and research communities facing account management and migration challenges. For Keahey, it’s another satisfying step in constantly building and improving Chameleon as a scientific instrument for the entire computer science community.
“As individual researchers, you can only do so much,” Keahey said. “But if you build a testbed, and there’s thousands of researchers working on it, your work is a multiplier for all the cool stuff that they can do.”