Tactic: Offload LLM Content Generation to Remote Server
Tactic sort: Awesome Tactic
Type: Architectural Tactic
Category: green-ml-enabled-systems
Title
Offload LLM Content Generation to Remote Server
Description
Shift LLM content generation from client devices to remote servers with high-performance hardware. This reduces energy use on client devices, shortens execution time, and provides a smoother user experience when privacy and offline access are not strict requirements.
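A minimal client-side sketch of the pattern, in Python with the requests library. The endpoint URL, request schema, and the privacy/offline fallback logic are illustrative assumptions, not details taken from the tactic itself:

```python
import requests

REMOTE_ENDPOINT = "https://inference.example.com/v1/generate"  # hypothetical server URL

def generate_remote(prompt: str, max_words: int = 500) -> str:
    """Offload generation to a high-performance remote server."""
    resp = requests.post(
        REMOTE_ENDPOINT,
        json={"prompt": prompt, "max_words": max_words},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["text"]

def generate_on_device(prompt: str) -> str:
    # Placeholder for a local, energy-intensive model call on the client.
    raise NotImplementedError("wire up a local model here")

def generate(prompt: str, privacy_sensitive: bool = False, online: bool = True) -> str:
    # Prefer the energy-efficient remote path; fall back to on-device
    # inference only when privacy or offline access is a hard requirement.
    if online and not privacy_sensitive:
        return generate_remote(prompt)
    return generate_on_device(prompt)
```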
Participant
Software architects and developers
Related software artifact
AI-powered applications
Context
Applications integrating LLMs on client devices
Software feature
Content generation using LLMs
Tactic intent
To reduce energy consumption on client devices by performing LLM inference on remote servers
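For completeness, a sketch of the server-side counterpart that receives the offloaded work. FastAPI, the /v1/generate route, and the request schema are assumptions chosen to match the client sketch above; any HTTP framework would do:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_words: int = 500

@app.post("/v1/generate")
def generate(req: GenerateRequest) -> dict:
    # Run inference on the server's high-performance hardware and
    # return only the generated text to the client.
    text = run_server_model(req.prompt, req.max_words)
    return {"text": text}

def run_server_model(prompt: str, max_words: int) -> str:
    # Placeholder for the server-hosted (e.g. GPU-backed) LLM call.
    raise NotImplementedError("wire up the server-hosted model here")
```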
Target quality attribute
Energy efficiency
Other related quality attributes
Performance, User experience, Privacy
Measured impact
Fetching content from a remote server consumed 3.5×–8.9× less energy than on-device generation across 100-, 500-, and 1,000-word outputs
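To gauge the impact in your own setting, one rough replication sketch is to compare the client-side carbon cost of the remote fetch against a local run using the codecarbon library. This is an assumption about methodology, not the original study's measurement setup, and the endpoint is hypothetical:

```python
from codecarbon import EmissionsTracker
import requests

def measure(label: str, workload) -> float:
    """Estimate the client-side carbon cost of running `workload`."""
    tracker = EmissionsTracker(project_name=label)
    tracker.start()
    workload()
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for this run
    print(f"{label}: {emissions_kg:.6f} kg CO2-eq")
    return emissions_kg

measure("remote-fetch", lambda: requests.post(
    "https://inference.example.com/v1/generate",  # hypothetical endpoint
    json={"prompt": "Write 500 words about solar power."},
    timeout=120,
))
# measure("on-device", lambda: generate_on_device("..."))  # your local LLM call
```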
