ovr.news

Solutions that work, including long-horizon plans with outcomes

EnergyLens Predicts LLM Energy Use

arxiv.org · 15 May 2026

Why this is here: EnergyLens achieved mean absolute percentage errors between 9.25% and 13.19% when predicting multi-GPU energy use during LLM prefill and decode.

Researchers present EnergyLens, a framework for predicting energy use during large language model (LLM) inference. The system aims to help practitioners optimize LLMs for sustainability and datacenter efficiency. It uses an einsum-based interface to describe LLM specifications such as fusion and parallelism.
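To make the einsum idea concrete, here is a minimal sketch of how an einsum string can describe an operator well enough to estimate its work. This is not EnergyLens's actual interface: the function name `einsum_macs`, the spec string, and the dimension sizes are all hypothetical, chosen only for illustration.

```python
from math import prod


def einsum_macs(spec, sizes):
    """Count multiply-accumulates for a contraction written as an einsum string.

    Hypothetical helper, not part of EnergyLens. Assumes every index letter
    in `spec` has an entry in `sizes`.
    """
    inputs, output = spec.split("->")
    # Each distinct index is one loop in the naive loop nest,
    # so the MAC count is the product of all distinct index sizes.
    indices = set(inputs.replace(",", "")) | set(output)
    return prod(sizes[i] for i in indices)


# A linear layer: (batch, seq, hidden) x (hidden, out) -> (batch, seq, out)
full = einsum_macs("bsh,hd->bsd", {"b": 2, "s": 4, "h": 8, "d": 16})

# Splitting the output dimension across 2 GPUs (one simple form of
# parallelism) halves the per-GPU work.
per_gpu = einsum_macs("bsh,hd->bsd", {"b": 2, "s": 4, "h": 8, "d": 8})
```

Because the same string names every dimension, a change such as sharding one dimension shows up as a change in a single size, which is one plausible reason an einsum-style description suits modeling parallelism.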

EnergyLens incorporates load-imbalance-aware modeling and a multi-GPU energy model. In validation on Llama3 and Qwen3-MoE, it predicted multi-GPU prefill and decode energy with mean absolute percentage errors (MAPE) of 9.25% to 13.19%. The tool also assessed SM (streaming multiprocessor) allocations with 12.97% error.
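For readers unfamiliar with the metric, mean absolute percentage error averages the relative gap between each prediction and its measured value. The sketch below uses the standard definition. The function name and the sample numbers are illustrative and are not taken from the paper.

```python
def mape(predicted, measured):
    """Mean absolute percentage error, in percent.

    Standard definition: mean of |predicted - measured| / |measured| * 100.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Illustrative energy readings in joules (made-up values).
example = mape([110.0, 90.0], [100.0, 100.0])  # both predictions are off by 10%
```

Under this definition, a MAPE of 9.25% means predictions missed measured energy by 9.25% on average, counting overestimates and underestimates alike.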

Exploring different configurations revealed substantial energy variation: up to 1.47x in prefill and 52.9x in decode. The researchers note that optimizing compute-communication overlap is difficult without data-driven insight. Future work will explore distributed serving strategies and refine the energy models.

Surfaced by the Solutions lens — one of the vital signs ovr.news reads.

How we evaluated this
AI summary

Read the original on arxiv.org for the full story.