All Tags
AWS
ai
algorithm-design
architecture
browser
cloud
cloud-efficiency
cloud-principles
cost-reduction
data-centric
data-compression
data-processing
deployment
design
documentation
edge-computing
email-sharing
energy-efficiency
energy-footprint
enterprise-optimization
green-ai
hardware
libraries
llm
locality
machine-learning
maintainability
management
measured
microservices
migration
mobile
model-optimization
model-training
multi-objective
network-traffic
parameter-tuning
performance
queries
rebuilding
scaling
services
storage-optimization
strategies
tabs
template
testing
workloads
Tactic: [Adopt Malleable Jobs]
Tactic sort:
Awesome Tactic
Type: Architectural Tactic
Category: resource-allocation
Title
[Adopt Malleable Jobs]
Description
Transition from fixed-size job scheduling to malleable jobs that can dynamically adjust their allocated resources (e.g., number of vCPUs or nodes) during execution, based on system load and availability. Malleable jobs let the system expand or shrink a job's execution footprint in real time, reducing energy waste, response time, and waiting time. The tactic is especially effective when paired with resource-aware monitoring and scheduling components such as the Adaptive Batch Scheduler (ABS), which coordinates resource allocation dynamically to optimize performance and to prevent both overprovisioning and underutilization.
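The resize logic described above can be sketched as a small decision function: expand a running job into idle capacity when nothing is waiting, and shrink it toward its minimum footprint when the queue builds up. This is a minimal illustrative sketch, not the ABS algorithm itself; the class and function names, the half-of-headroom shrink policy, and the inputs (idle node count, queue length) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MalleableJob:
    """A running job whose node allocation may change mid-execution."""
    job_id: int
    nodes: int       # currently allocated nodes
    min_nodes: int   # smallest footprint the job tolerates
    max_nodes: int   # largest footprint the job can exploit


def resize_decision(job: MalleableJob, idle_nodes: int, queue_length: int) -> int:
    """Return the new node count for a malleable job.

    Policy (hypothetical, for illustration): grow into idle capacity
    when no jobs are waiting; when jobs are queued, release half of the
    shrinkable headroom so waiting work can start sooner. Allocations
    always stay within [min_nodes, max_nodes].
    """
    if queue_length == 0 and idle_nodes > 0:
        # No contention: expand up to the job's maximum useful size.
        return min(job.max_nodes, job.nodes + idle_nodes)
    if queue_length > 0 and job.nodes > job.min_nodes:
        # Contention: shrink toward the minimum, freeing nodes.
        headroom = job.nodes - job.min_nodes
        return job.nodes - max(1, headroom // 2)
    return job.nodes
```

In a real deployment the new node count would be applied through the batch system (Slurm supports shrinking a running job via `scontrol update`), with the decision loop driven by the runtime resource monitor listed under "Related software artifact".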
Participant
Scientific software developers
Related software artifact
Slurm batch job scheduler, runtime resource monitor.
Context
HPC environments and scientific workflows using traditional batch systems (e.g., Slurm) in which jobs are pre-assigned fixed resource allocations, often leaving computational resources underutilized or idle.
Software feature
Job execution management, parallel task scheduling
Tactic intent
To reduce energy consumption and improve computational efficiency by dynamically adapting job sizes to current hardware resource availability
Target quality attribute
Energy efficiency
Other related quality attributes
< unknown >
Measured impact
Reduced waiting time, response time, and energy usage
