Tactic: Using Adaptive Response for Sustainable LLM Inference
Tactic sort:
Awesome Tactic
Type: Architectural Tactic
Category: green-ml-enabled-systems
Title
Using Adaptive Response for Sustainable LLM Inference
Description
Dynamically adjusts the behavior of the LLM inference process (e.g., precision, token-generation strategy, number of beams, early stopping) based on real-time workload, input length, or user-device constraints, in order to reduce energy usage and improve sustainability under varying conditions (see the illustrative sketch after this card).
Participant
AI engineers
Related software artifact
Inference scheduler, model runtime configuration logic
Context
LLMs deployed in heterogeneous environments such as edge devices, public clouds, or user-facing applications, where load varies over time.
Software feature
Inference scheduling, adaptive computation configuration
Tactic intent
To improve energy efficiency by adapting inference complexity to the needs of the task or the current system status.
Target quality attribute
Energy efficiency
Other related quality attributes
< unknown >
Measured impact
Energy consumption per inference; latency vs. throughput trade-offs (see the measurement sketch below).
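
As an illustration of the tactic, the following minimal sketch adapts decoding settings (beam width, output length, sampling) to the prompt length and a crude system-load probe. It assumes a Hugging Face `transformers` causal LM; the model name, the load probe, and all thresholds are illustrative assumptions, not prescribed by the tactic itself.

```python
# A minimal sketch of the tactic, assuming a Hugging Face `transformers`
# causal LM. The model name, the load probe, and all thresholds are
# illustrative assumptions, not prescribed by the tactic itself.
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def select_generation_profile(prompt_tokens: int, system_load: float) -> dict:
    """Choose cheaper decoding when the system is busy or the input is long,
    and spend more compute (wider beam search) only when the system is idle."""
    if system_load > 0.8 or prompt_tokens > 512:
        # High load or long input: greedy decoding, short output.
        return {"num_beams": 1, "do_sample": False, "max_new_tokens": 64}
    if system_load > 0.5:
        # Moderate load: narrow beam with early stopping.
        return {"num_beams": 2, "early_stopping": True, "max_new_tokens": 128}
    # Low load: wider beam search for higher output quality.
    return {"num_beams": 4, "early_stopping": True, "max_new_tokens": 256}


def adaptive_generate(prompt: str) -> str:
    # Crude load probe: 1-minute load average normalized by CPU count
    # (Unix-only; a real inference scheduler would use richer telemetry).
    load = os.getloadavg()[0] / (os.cpu_count() or 1)
    inputs = tokenizer(prompt, return_tensors="pt")
    profile = select_generation_profile(inputs.input_ids.shape[1], load)
    with torch.inference_mode():
        output = model.generate(
            **inputs, **profile, pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(adaptive_generate("Adaptive inference saves energy because"))
```

The same dispatch point could also switch precision, e.g., routing to a quantized model variant under sustained high load, which is the precision dimension the description mentions.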

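One way to obtain the energy-per-inference figure named under "Measured impact" is a software power meter. The sketch below assumes the codecarbon package (one option among many; RAPL readers or external meters work equally well) and reuses the hypothetical adaptive_generate helper from the sketch above; comparing its reading against a fixed-configuration baseline exposes the latency vs. throughput trade-off.

```python
# A minimal sketch of measuring "energy consumption per inference",
# assuming the codecarbon package as the power meter and reusing
# adaptive_generate from the sketch above.
from codecarbon import EmissionsTracker

prompts = ["Example prompt one", "Example prompt two"]  # illustrative inputs

tracker = EmissionsTracker(measure_power_secs=1, log_level="error")
tracker.start()
for prompt in prompts:
    adaptive_generate(prompt)
tracker.stop()  # returns estimated emissions in kg CO2-eq

energy_kwh = tracker.final_emissions_data.energy_consumed  # total kWh
print(f"Energy per inference: {energy_kwh / len(prompts):.6f} kWh")
```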