Tactic: Optimize On-Device LLMs via Shorter Content
Tactic sort: Awesome Tactic
Type: Software Practice
Category: green-ml-enabled-systems
Title: Optimize On-Device LLMs via Shorter Content
Description: Reduce the energy consumption of on-device LLM inference by constraining the length of generated outputs. Developers can achieve shorter responses through prompt design, for example by instructing the model to summarize rather than produce verbose output. This is particularly effective on mobile and embedded devices with limited processing capacity and tight energy budgets.
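A minimal sketch of the tactic, assuming the llama-cpp-python bindings (commonly used for local, on-device inference). It combines both levers: a prompt that asks for a concise summary, and a hard cap on generated tokens. The model file name, word budget, and token cap are illustrative assumptions, not part of the tactic description.

```python
from llama_cpp import Llama

# Hypothetical quantized model suited to on-device inference.
llm = Llama(model_path="models/llama-3.2-1b-q4.gguf")

article_text = "..."  # placeholder for the content to be condensed

# Lever 1: prompt design that explicitly requests a short, summarized answer.
prompt = (
    "Summarize the following text in at most 100 words.\n\n"
    + article_text
)

# Lever 2: a hard cap on generated tokens, so decoding stops early even if
# the model ignores the length instruction. Fewer decode steps translate
# roughly proportionally into less on-device energy use.
response = llm(prompt, max_tokens=150)
print(response["choices"][0]["text"])
```

The token cap acts as a safety net for the prompt instruction: whichever limit is hit first bounds the number of decode steps, and with it the energy cost of the generation.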
Participant: Developers
Related software artifact: AI-powered applications
Context: On-device LLM inference in mobile or embedded systems
Software feature: Content generation using LLMs
Tactic intent: To lower the energy cost of on-device LLM generation by reducing output length
Target quality attribute: Energy efficiency
Other related quality attributes: Performance, User experience
Measured impact: A 1,000-word response consumed about 9× the energy of a 100-word response on device
