EnergyLens Predicts LLM Energy Use
Why this is here: EnergyLens achieved mean absolute percentage errors between 9.25% and 13.19% when predicting multi-GPU energy use during LLM prefill and decode.
Researchers present EnergyLens, a framework for predicting energy use during large language model (LLM) inference. The system aims to help practitioners optimize LLM deployments for sustainability and datacenter efficiency. It uses an einsum-based interface to specify LLM workloads, including choices such as fusion and parallelism.
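To give a feel for why einsum notation suits this kind of modeling, here is a minimal sketch, not EnergyLens's actual interface. It assumes a hypothetical helper, `einsum_flops`, that reads an einsum string and a table of dimension sizes and estimates the floating-point work of one operator. The index names and sizes are illustrative only, and the tensor-parallel split shown in the usage lines is a simplifying assumption.

```python
def einsum_flops(spec: str, dims: dict) -> int:
    """Estimate FLOPs for an einsum such as "bsd,df->bsf".

    Hypothetical helper for illustration; not part of EnergyLens.
    Assumes every distinct index is looped over exactly once and that
    each inner-loop step costs one multiply and one add.
    """
    inputs, output = spec.replace(" ", "").split("->")
    operands = inputs.split(",")
    # Collect every distinct index letter across operands and output.
    indices = set("".join(operands)) | set(output)
    macs = 1
    for idx in indices:
        macs *= dims[idx]
    return 2 * macs  # one multiply plus one add per multiply-accumulate


# Illustrative projection: batch b, sequence s, hidden d, output features f.
full = einsum_flops("bsd,df->bsf", {"b": 1, "s": 128, "d": 4096, "f": 11008})

# Assumed tensor-parallel split: each of 4 GPUs holds a quarter of f.
per_gpu = einsum_flops("bsd,df->bsf", {"b": 1, "s": 128, "d": 4096, "f": 11008 // 4})
```

Because the operator is described by its indices, changing the parallelism amounts to shrinking one dimension in the table; here `per_gpu` is a quarter of `full`. A real energy model would also need memory traffic and communication, which this sketch leaves out.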
EnergyLens incorporates load-imbalance-aware modeling and a multi-GPU energy model. Validation on Llama3 and Qwen3-MoE shows mean absolute percentage errors (MAPE) of 9.25% to 13.19% for multi-GPU prefill and decode energy. The tool also evaluated streaming multiprocessor (SM) allocations with 12.97% error.
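For readers unfamiliar with the metric, mean absolute percentage error averages the relative gap between each prediction and its measurement. The sketch below uses the standard definition; the function name `mape` and the sample energy values are made up for illustration and do not come from the paper.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent.

    Standard definition: the mean of |predicted - measured| / |measured|,
    times 100. Measured values must be non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Made-up energy readings in joules, purely illustrative.
measured_j = [100.0, 200.0]
predicted_j = [110.0, 180.0]
error_pct = mape(measured_j, predicted_j)  # each prediction is off by 10%
```

A MAPE of roughly 9% to 13% therefore means the predicted energy was, on average, within about a tenth of the measured value.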
Exploring different configurations revealed substantial energy variation: up to 1.47x in prefill and 52.9x in decode. The researchers note that optimizing compute-communication overlap is difficult without data-driven insight. Future work will explore distributed serving strategies and refine the energy models.
Read the original on arxiv.org.