EnergyLens Predicts LLM Energy Use

Why this is here: EnergyLens achieved mean absolute percentage errors between 9.25% and 13.19% when predicting multi-GPU energy use during LLM prefill and decode.

Researchers present EnergyLens, a framework that predicts the energy consumed by large language model (LLM) inference. It is meant to help practitioners optimize LLM inference for sustainability and datacenter efficiency. Workloads are described through an einsum-based interface, which can express execution choices such as operator fusion and parallelism.
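Einsum notation gives each tensor dimension a letter, so one string records both the math and the loop bounds. That makes it convenient for cost modeling. The sketch below shows this for dense two-operand contractions, counting one multiply-add as 2 FLOPs. `einsum_flops` and `shard` are hypothetical helpers, not the EnergyLens interface. `shard` approximates tensor parallelism by splitting one dimension evenly across GPUs, and the layer sizes are illustrative.

```python
from math import prod


def einsum_flops(spec: str, dims: dict[str, int]) -> int:
    """FLOPs of a dense two-operand einsum contraction (hypothetical helper).

    Every index in the operands or the output is looped over once, so the
    number of multiply-adds is the product of all distinct index sizes.
    """
    inputs, output = spec.replace(" ", "").split("->")
    indices = set(inputs.replace(",", "")) | set(output)
    return 2 * prod(dims[i] for i in indices)  # 1 multiply-add = 2 FLOPs


def shard(dims: dict[str, int], index: str, ways: int) -> dict[str, int]:
    """Per-GPU dims when `index` is split evenly across `ways` GPUs (hypothetical)."""
    if dims[index] % ways:
        raise ValueError(f"{index}={dims[index]} is not divisible by {ways}")
    return {**dims, index: dims[index] // ways}


# A linear projection: activations [b, s, h] times weight [h, d] (sizes illustrative).
dims = {"b": 1, "s": 2048, "h": 4096, "d": 4096}
total = einsum_flops("bsh,hd->bsd", dims)
per_gpu = einsum_flops("bsh,hd->bsd", shard(dims, "d", 4))
print(total, per_gpu)  # each of the 4 GPUs does a quarter of the work
```

FLOP counts alone ignore memory traffic, which is what fusion mainly changes because fused operators avoid writing intermediates back to memory. An energy model therefore needs more than this sketch.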

EnergyLens accounts for load imbalance and includes an energy model for multiple GPUs. In validation on Llama3 and Qwen3-MoE, its multi-GPU prefill and decode energy predictions had mean absolute percentage errors (MAPE) of 9.25% to 13.19%. It also predicted energy across different streaming multiprocessor (SM) allocations with 12.97% error.
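MAPE averages each prediction's absolute error relative to the measured value: MAPE = (100/n) * sum over i of |measured_i - predicted_i| / |measured_i|. The source does not say how EnergyLens models load imbalance. The toy model below therefore makes its own assumption: GPUs in a synchronized step all wait for the slowest one and draw idle power while they wait. Uneven token routing across experts in an MoE layer is one common cause of such skew. `step_energy_joules`, the wattages, and the busy times are all hypothetical and chosen for illustration.

```python
def step_energy_joules(busy_s, active_watts, idle_watts):
    """Energy of one synchronized multi-GPU step (hypothetical toy model).

    The step ends when the slowest GPU finishes, so each GPU is active for
    its own busy time and then idles for (t_max - busy time).
    """
    t_max = max(busy_s)
    return sum(active_watts * t + idle_watts * (t_max - t) for t in busy_s)


def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    return 100.0 * sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


# The same 40 ms of total GPU work, split evenly vs. unevenly over 4 GPUs.
balanced = step_energy_joules([0.010] * 4, active_watts=600.0, idle_watts=100.0)
skewed = step_energy_joules([0.004, 0.008, 0.012, 0.016], active_watts=600.0, idle_watts=100.0)
print(balanced, skewed)  # about 24.0 J vs 26.4 J: the skewed step adds idle energy

# An imbalance-blind estimate predicts the balanced energy for both cases.
print(mape([balanced, skewed], [balanced, balanced]))  # about 4.5 (percent)
```

Because this toy model includes idle power, the same total work costs more energy once it is skewed. A predictor that ignores imbalance consistently underestimates in that case.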

An exploration of configurations showed that energy varied by up to 1.47x in prefill and 52.9x in decode. The researchers note that optimizing compute-communication overlap is difficult without data-driven insight. Future work will explore distributed serving strategies and refine the energy models.
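A spread such as 1.47x is naturally read as the highest energy divided by the lowest across configurations. The toy sweep below suggests why the best overlap setting is hard to guess. It assumes one GPU whose SMs are split between compute kernels and communication kernels that run fully overlapped. With too few communication SMs, compute waits on communication. With too many, compute slows down and the extra SMs add nothing once communication saturates. Every constant, plus `step_energy` and `sweep`, is hypothetical and does not describe EnergyLens's model.

```python
# Toy sweep: every constant is a hypothetical placeholder, not an EnergyLens parameter.
TOTAL_SMS = 132              # assumed SM count of the example GPU
COMPUTE_MS_ALL_SMS = 10.0    # compute time if every SM ran compute kernels
COMM_MS_ONE_SM = 24.0        # communication time if a single SM drove it
COMM_SM_CAP = 8              # SMs beyond this do not speed up communication
WATTS_PER_BUSY_SM = 4.0      # dynamic power of one busy SM
BASE_WATTS = 150.0           # static power drawn for the whole step


def step_energy(comm_sms: int) -> tuple[float, float]:
    """Return (step_ms, joules) when comm_sms SMs run communication alongside compute."""
    if not 0 < comm_sms < TOTAL_SMS:
        raise ValueError("comm_sms must leave at least one SM for each side")
    # Compute is assumed to scale perfectly with the SMs left to it.
    compute_ms = COMPUTE_MS_ALL_SMS * TOTAL_SMS / (TOTAL_SMS - comm_sms)
    comm_ms = COMM_MS_ONE_SM / min(comm_sms, COMM_SM_CAP)
    step_ms = max(compute_ms, comm_ms)  # fully overlapped: the slower phase sets step time
    busy_sm_ms = (TOTAL_SMS - comm_sms) * compute_ms + comm_sms * comm_ms
    joules = (BASE_WATTS * step_ms + WATTS_PER_BUSY_SM * busy_sm_ms) / 1000.0
    return step_ms, joules


def sweep(comm_sm_choices):
    """Energy per configuration, the lowest-energy choice, and the max/min spread."""
    energies = {c: step_energy(c)[1] for c in comm_sm_choices}
    best = min(energies, key=energies.get)
    spread = max(energies.values()) / min(energies.values())
    return best, spread, energies


best, spread, energies = sweep(range(1, 17))
print(best, round(spread, 2))  # lowest-energy SM split and the energy spread
```

Even this small model has an interior optimum that depends on every constant. Real kernels also contend for memory bandwidth and caches, which makes measured data more valuable than intuition.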