PNNL Researchers Achieve High-Level Quality of Service
With the computing crunch caused by the generative AI (GenAI) boom, it’s more important than ever for scientific workflows to allocate computing power intelligently.
Now, a research team from Pacific Northwest National Laboratory has successfully tackled the difficult task of scheduling scientific workflows while guaranteeing a certain quality of service (QoS) using machine reasoning and a novel decision tree algorithm. They focus on communication-intensive, data-driven scientific workflows where QoS scheduling can significantly influence end-to-end performance.
Not all computing workloads are created equal. A computational chemistry model may need a concentrated burst of computing power, while a regular application may need steady, predictable performance. Data backup can typically wait, while real-time data analysis cannot. New jobs arrive at odd hours and sometimes without warning. These challenges can stymie automated service providers.
Today, scheduling systems rely on prescribed, human-defined rules and estimates.
In contrast, QoSFlow, the team's scheduling tool, works by partitioning multiple competing workflows into regions. Based on the availability of resources and target cost limits, the algorithm then devises an optimal path for executing each compute job that meets those limits.
“Laboratories are moving toward automation and AI-guided research,” said Nathan Tallent, the PNNL researcher who led the work. “Right now, you have to write custom scripts and plan out how everything is going to be orchestrated in a workflow. But what you'd rather do is be able to input a high-level objective and say, ‘I need to get there as fast as possible,’ or ‘I need to just do it as soon as possible but never use this other resource because it's too expensive.’ This tool provides a path to get to this level of specificity for automated laboratories.”
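To make the idea of a high-level objective concrete, here is a minimal Python sketch of the kind of request Tallent describes: "as fast as possible, but never use this resource." It is not QoSFlow's algorithm, which the article describes only in outline. The `Candidate` type, the `pick_fastest` function, the resource names and all of the numbers are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Collection, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical way to run a workflow (illustrative, not a QoSFlow type)."""
    name: str
    resource: str         # resource the plan depends on (made-up labels)
    est_runtime_s: float  # estimated end-to-end runtime in seconds
    est_cost: float       # estimated cost in arbitrary units


def pick_fastest(candidates: Iterable[Candidate],
                 cost_limit: float,
                 forbidden: Collection[str] = ()) -> Optional[Candidate]:
    """Return the fastest candidate that stays within the cost limit and
    avoids every forbidden resource, or None if nothing qualifies."""
    feasible = [
        c for c in candidates
        if c.est_cost <= cost_limit and c.resource not in forbidden
    ]
    return min(feasible, key=lambda c: c.est_runtime_s, default=None)


# Hypothetical estimates for three ways of running the same workflow.
candidates = [
    Candidate("plan-a", "premium-storage", est_runtime_s=120.0, est_cost=9.0),
    Candidate("plan-b", "shared-storage", est_runtime_s=300.0, est_cost=2.0),
    Candidate("plan-c", "local-disk", est_runtime_s=180.0, est_cost=4.0),
]

# "As fast as possible, but never use premium storage, and spend at most 5 units."
best = pick_fastest(candidates, cost_limit=5.0, forbidden={"premium-storage"})
print(best.name if best else "no feasible plan")
```

In this toy setup the user states only the goal and the limits, and the selection logic rules out anything too expensive or off-limits before choosing the fastest remaining plan. A real scheduler would also have to estimate the runtimes and costs, which this sketch simply takes as given.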
A recent study with three different complex workflows showed that QoSFlow’s recommendations outperform the best-performing benchmark by 27 percent.
The team presented the research at the International Parallel and Distributed Processing Symposium (IPDPS) in New Orleans, LA. The conference is organized annually by the IEEE Computer Society’s Technical Committee on Parallel Processing. The team, led by Nathan Tallent, includes interns from the University of Delaware and Illinois Institute of Technology, along with PNNL colleagues Jesun Firoz, Lenny Guo and Zhen Peng.
Tallent also presented a keynote address, “Automating Performance Optimization of Data Flow Within HPC Workflows.” The presentation centered on DataFlowDrs, a new comprehensive suite of tools for optimizing data flow and storage in high-performance computing workflows.
This research was supported in part by the National Science Foundation and by the Department of Energy, Office of Advanced Scientific Computing Research, through the AT SCALE initiative at PNNL.
Other research outcomes led by Tallent presented at IPDPS include:
- PowerMorph, a solution designed to optimize hardware resources and AI training processes to efficiently manage energy consumption
- CARAT: “Client-Side Adaptive RPC and Cache Co-Tuning” for Parallel File Systems
- Accelerating AI Compression through Lightweight Lossless Encoding and Pipelined Workflows
- Characterizing Dataflow for I/O-Aware Scheduling in HPC Workflows.