Tactic: Optimize On-Device LLMs via Shorter Content
Tactic sort: Awesome Tactic
Type: Software Practice
Category: green-ml-enabled-systems
Title: Optimize On-Device LLMs via Shorter Content
Description: Reduce the energy consumption of on-device LLM inference by constraining the length of generated outputs. Developers can achieve shorter responses through prompt design, for example by instructing the model to summarize rather than produce verbose output. This is particularly effective on mobile and embedded devices with limited processing capacity and tight energy budgets.
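A minimal sketch of the tactic, assuming the llama-cpp-python bindings (commonly used for local, on-device inference). It combines both levers: a prompt that asks for a concise summary, and a hard cap on generated tokens. The model file name, word budget, and token cap are illustrative assumptions, not part of the tactic description.

```python
from llama_cpp import Llama

# Hypothetical quantized model suited to on-device inference.
llm = Llama(model_path="models/llama-3.2-1b-q4.gguf")

article_text = "..."  # placeholder for the content to be condensed

# Lever 1: prompt design that explicitly requests a short, summarized answer.
prompt = (
    "Summarize the following text in at most 100 words.\n\n"
    + article_text
)

# Lever 2: a hard cap on generated tokens, so decoding stops early even if
# the model ignores the length instruction. Fewer decode steps translate
# roughly proportionally into less on-device energy use.
response = llm(prompt, max_tokens=150)
print(response["choices"][0]["text"])
```

The token cap acts as a safety net for the prompt instruction: whichever limit is hit first bounds the number of decode steps, and with it the energy cost of the generation.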
Participant: Developers
Related software artifact: AI-powered applications
Context: On-device LLM inference in mobile or embedded systems
Software feature: Content generation using LLMs
Tactic intent: To lower the energy cost of on-device LLM generation by reducing output length
Target quality attribute: Energy efficiency
Other related quality attributes: Performance, User experience
Measured impact: A 1,000-word response consumed about 9× the energy of a 100-word response on device
