Tactic: Using Adaptive Response for Sustainable LLM Inference
Tactic sort:
Awesome Tactic
Type: Architectural Tactic
Category: green-ml-enabled-systems
Title
Using Adaptive Response for Sustainable LLM Inference
Description
Dynamically adjusts the behavior of the LLM inference process (e.g., precision, token-generation strategy, number of beams, early stopping) based on real-time workload, input length, or user-device constraints, in order to reduce energy usage and improve sustainability under varying conditions (see the illustrative sketch after this card).
Participant
AI engineers
Related software artifact
Inference scheduler, model runtime configuration logic
Context
LLMs deployed in heterogeneous environments such as edge devices, public clouds, or user-facing applications, where load varies over time.
Software feature
Inference scheduling, adaptive computation configuration
Tactic intent
To improve energy efficiency by adapting inference complexity to the needs of the task or the current system status.
Target quality attribute
Energy efficiency
Other related quality attributes
< unknown >
Measured impact
Energy consumption per inference; latency vs. throughput trade-offs (see the measurement sketch below).
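
As an illustration of the tactic, the following minimal sketch adapts decoding settings (beam width, output length, sampling) to the prompt length and a crude system-load probe. It assumes a Hugging Face `transformers` causal LM; the model name, the load probe, and all thresholds are illustrative assumptions, not prescribed by the tactic itself.

```python
# A minimal sketch of the tactic, assuming a Hugging Face `transformers`
# causal LM. The model name, the load probe, and all thresholds are
# illustrative assumptions, not prescribed by the tactic itself.
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def select_generation_profile(prompt_tokens: int, system_load: float) -> dict:
    """Choose cheaper decoding when the system is busy or the input is long,
    and spend more compute (wider beam search) only when the system is idle."""
    if system_load > 0.8 or prompt_tokens > 512:
        # High load or long input: greedy decoding, short output.
        return {"num_beams": 1, "do_sample": False, "max_new_tokens": 64}
    if system_load > 0.5:
        # Moderate load: narrow beam with early stopping.
        return {"num_beams": 2, "early_stopping": True, "max_new_tokens": 128}
    # Low load: wider beam search for higher output quality.
    return {"num_beams": 4, "early_stopping": True, "max_new_tokens": 256}


def adaptive_generate(prompt: str) -> str:
    # Crude load probe: 1-minute load average normalized by CPU count
    # (Unix-only; a real inference scheduler would use richer telemetry).
    load = os.getloadavg()[0] / (os.cpu_count() or 1)
    inputs = tokenizer(prompt, return_tensors="pt")
    profile = select_generation_profile(inputs.input_ids.shape[1], load)
    with torch.inference_mode():
        output = model.generate(
            **inputs, **profile, pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(adaptive_generate("Adaptive inference saves energy because"))
```

The same dispatch point could also switch precision, e.g., routing to a quantized model variant under sustained high load, which is the precision dimension the description mentions.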

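One way to obtain the energy-per-inference figure named under "Measured impact" is a software power meter. The sketch below assumes the codecarbon package (one option among many; RAPL readers or external meters work equally well) and reuses the hypothetical adaptive_generate helper from the sketch above; comparing its reading against a fixed-configuration baseline exposes the latency vs. throughput trade-off.

```python
# A minimal sketch of measuring "energy consumption per inference",
# assuming the codecarbon package as the power meter and reusing
# adaptive_generate from the sketch above.
from codecarbon import EmissionsTracker

prompts = ["Example prompt one", "Example prompt two"]  # illustrative inputs

tracker = EmissionsTracker(measure_power_secs=1, log_level="error")
tracker.start()
for prompt in prompts:
    adaptive_generate(prompt)
tracker.stop()  # returns estimated emissions in kg CO2-eq

energy_kwh = tracker.final_emissions_data.energy_consumed  # total kWh
print(f"Energy per inference: {energy_kwh / len(prompts):.6f} kWh")
```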