It may not sound like the most exciting role, but the humble “scheduler” holds the key to the future of large-scale computing. Supercomputers, data centers, and even modern personal computers all benefit today from parallel computing, which runs multiple tasks at the same time instead of sequentially. This makes computing tremendously faster, but it also dramatically increases the complexity of deciding the best order in which to run as many as millions of interdependent jobs.

For decades, schedulers got by with a “greedy” algorithm — a short-term strategy which simply grabs the next task in line whenever resources become available. But as high-performance computing and data center operators worry more about energy costs and the end of Moore’s Law, this simple solution may no longer suffice.
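To make the greedy strategy concrete, here is a minimal sketch in Python. It is an illustration only, not the code of any production scheduler: the function name `greedy_schedule`, the task names, and the fixed pool of identical workers are all assumptions made for the example.

```python
import heapq

def greedy_schedule(durations, deps, workers):
    """Greedy list scheduling: start any ready task as soon as a worker is free.

    durations: {task: run time}; deps: {task: set of prerequisite tasks};
    workers: number of identical workers (hypothetical resource model).
    Returns {task: (start, finish)}.
    """
    remaining = {t: set(deps.get(t, ())) for t in durations}
    ready = sorted(t for t, d in remaining.items() if not d)
    running = []  # min-heap of (finish_time, task)
    schedule, now = {}, 0
    while ready or running:
        # Greedy step: fill every free worker with the next task in line.
        while ready and len(running) < workers:
            task = ready.pop(0)
            schedule[task] = (now, now + durations[task])
            heapq.heappush(running, (now + durations[task], task))
        # Advance time to the next completion and release its dependents.
        now, done = heapq.heappop(running)
        for t, d in remaining.items():
            if done in d:
                d.remove(done)
                if not d:
                    ready.append(t)
    return schedule

# Hypothetical workload: "c" needs "a" and "b"; "d" needs "c".
durations = {"a": 3, "b": 1, "c": 2, "d": 1}
deps = {"c": {"a", "b"}, "d": {"c"}}
schedule = greedy_schedule(durations, deps, workers=2)
makespan = max(finish for _, finish in schedule.values())
```

In this toy workload the second worker finishes "b" at time 1 and then sits idle until "a" completes at time 3, because the greedy rule only looks at what is ready right now.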

At the 2018 Supercomputing conference in Dallas, a team of University of Chicago researchers presented an alternative approach to scheduling known as “Divide and Conquer.” The algorithm, developed by graduate students Gokalp Demirci, Ivana Marincic, and David Kim with associate professor of computer science Henry Hoffmann, applies a longer-term perspective and exploits configurable resources to achieve better results while adhering to a strict cap on energy consumption.

The algorithm is the first improvement on an approximation for scheduling with resource and precedence constraints since 1975, Hoffmann said.

“I think that most systems people figured these greedy algorithms work, and it's close to the best thing we can find in most situations, so I’ll just keep using greedy schedulers. That's why nobody's looked at it for 43 years, but now is really the time to solve it,” Hoffmann said. “The combination of power management and exascale means this is the right time.”

A More Efficient Exascale

As high-performance computing reaches for the exascale (systems that can run one billion billion calculations per second), experts emphasize that merely building bigger computers is no longer the solution. If today’s petascale machines were simply scaled up a thousandfold, they would consume 200 megawatts of power, roughly as much as 130,000 homes. The Department of Energy has proposed capping the power consumption of exascale systems at 20 to 40 megawatts, presenting a difficult engineering challenge for both hardware and software.

Improving schedulers could be low-hanging fruit to help meet these goals. While full optimization is prohibitively complex — calculating the best schedule for a supercomputer would require a second supercomputer, Hoffmann said — there’s still plenty of room to improve beyond the simple, greedy algorithm.

Further help comes from the improved ability to fine-tune how a large computing system allocates power. Given a very large task and a very small task that can run concurrently, more resources can be directed to the large task so that the two finish at roughly the same time, freeing up more resources for subsequent jobs.
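The idea of steering power so that concurrent tasks finish together can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' model: it assumes a task's run time is its work divided by the power it receives, and the function name `split_power`, the task names, and the numbers are invented for the example.

```python
def split_power(work, budget):
    """Divide a power budget so all concurrent tasks finish together.

    Assumes run time = work / power (an idealized linear model).
    work: {task: amount of work, > 0}; budget: total power available.
    Returns ({task: power share}, common finish time).
    """
    total = sum(work.values())
    shares = {t: budget * w / total for t, w in work.items()}
    return shares, total / budget

# Hypothetical pair of tasks sharing a budget of 100 power units.
work = {"large": 90.0, "small": 10.0}
shares, finish = split_power(work, budget=100.0)

# For comparison: an even split leaves the large task running long
# after the small one has finished.
even_power = 100.0 / len(work)
even_finish = max(w / even_power for w in work.values())
```

Under these assumptions, the proportional split finishes both tasks at time 1.0, while an even split leaves the large task running until about 1.8, with the small task's half of the budget unused for most of that time.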

That last idea is exploited by the “Divide and Conquer” algorithm, which looks ahead and puts the emphasis on when tasks will end, instead of just starting whatever is available whenever resources are free. For a typical computing job, the algorithm looks at the full workload, commonly depicted as a directed acyclic graph (DAG) of nodes and edges, and recursively divides the problem into subproblems. Those subproblems can then be organized to run concurrently where possible, and assigned different amounts of resources so that they finish together, leaving more resources available to start the next group of subproblems.
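The recursive flavor of this approach can be shown with a toy sketch in Python. It is not the published algorithm, which handles general DAGs and comes with an approximation guarantee. Instead, this sketch assumes the workload has already been written as nested "seq" (one after another) and "par" (side by side) groups, and that run time equals work divided by power. The names `total_work`, `plan`, and the example job are all hypothetical.

```python
def total_work(node):
    """Sum the work in a nested workload.

    A node is either a leaf (name, work) or a group (kind, children)
    where kind is "seq" or "par". Leaf names must not be "seq" or "par".
    """
    kind, body = node
    if kind in ("seq", "par"):
        return sum(total_work(child) for child in body)
    return body

def plan(node, power, start=0.0, out=None):
    """Recursively assign (power, start, finish) to every leaf task.

    "seq" children run one after another at the group's full power.
    "par" children run side by side, each with power proportional to
    its work, so that they finish together. Assumes time = work / power.
    Returns ({leaf name: (power, start, finish)}, finish time).
    """
    out = {} if out is None else out
    kind, body = node
    if kind == "seq":
        for child in body:
            start = plan(child, power, start, out)[1]
        return out, start
    if kind == "par":
        work = total_work(node)
        finish = start
        for child in body:
            share = power * total_work(child) / work
            finish = max(finish, plan(child, share, start, out)[1])
        return out, finish
    finish = start + body / power
    out[kind] = (power, start, finish)
    return out, finish

# Hypothetical job: load, then simulate and log concurrently, then report.
job = ("seq", [
    ("load", 20.0),
    ("par", [("simulate", 90.0), ("log", 10.0)]),
    ("report", 20.0),
])
assignments, finish_time = plan(job, power=10.0)
```

In this example "simulate" receives nine of the ten power units and "log" receives one, so both end at time 12 and "report" can start immediately with the full budget.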

When tested against greedy approaches on DAGs of up to 10,000 nodes in a simulated supercomputer, the Divide and Conquer approach improved performance by as much as 75 percent. That’s more than just a footrace, as the faster performance was achieved using the same amount of power, suggesting significant gains in energy efficiency.

“It's neat because every scheduler of practical use was greedy, and this one is not. And we get a big win by not being greedy,” Hoffmann said. “Divide and Conquer will be harder to implement, so that's going to be an issue, but our first results are really promising and indicate this is actually going to be very valuable in practice as well.”

A Bridge Between Theory and Architecture

The advance was made possible by a productive collaboration across research areas within UChicago CS. Demirci and Kim — graduate students studying theoretical computer science with Professors Janos Simon and Laszlo Babai, respectively — started working on the problem in Hoffmann’s Computer Architecture class, partnering with Hoffmann’s systems computer science student Marincic.

“I always work with theory and machine learning students to figure out how we can relate what they’re doing to computer architecture,” Hoffmann said. “We said, this is what the algorithms are doing, this is what we're doing in systems, and there's a gap, so why don't we see if we can bridge that gap?”

“At Chicago, we have this historically extremely strong theory department and our systems are strong, but newer. I thought this was a great project that brings the old strength and the new strength together.”

The group’s first paper, “Approximation Algorithms for Scheduling with Resource and Precedence Constraints,” was presented at the Symposium on Theoretical Aspects of Computer Science (STACS) in March 2018. The Supercomputing paper, “A Divide and Conquer Algorithm for DAG Scheduling under Power Constraints,” was presented Wednesday, November 14th.
