LLM Token Production Energy Cost


Calculation Methodology

Energy Calculation: The calculator uses the active parameter count when estimating compute. Each token triggers roughly $2 \times N_{active}$ floating-point operations, so total energy is $E = \frac{2 \times N_{active} \times T}{\eta}$, where $\eta$ is hardware efficiency in FLOPs/Joule.
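
As a rough sketch of that calculation, the Python below computes the energy estimate and converts it to kWh. The function name and structure are illustrative rather than taken from the calculator, and it takes the raw active-parameter count rather than a value in billions.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules

def inference_energy_kwh(n_active_params: float, tokens: int,
                         flops_per_joule: float = 6.59e11) -> float:
    """Estimate inference energy in kWh: E = 2 * N_active * T / eta."""
    flops = 2.0 * n_active_params * tokens   # ~2 FLOPs per active parameter per token
    joules = flops / flops_per_joule         # eta: hardware efficiency in FLOPs/J
    return joules / JOULES_PER_KWH

# Example: ~70e9 active parameters generating 1,000 tokens
# print(inference_energy_kwh(70e9, 1_000))  # ~5.9e-5 kWh
```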

Overall parameters represent the full model size (all experts for MoE).
Active parameters are the subset actually multiplied for a single token. Dense models have active = overall; MoE models often have active ≪ overall.

This distinction means MoE models show lower compute energy and cost in this calculator than equally sized dense models. We do not currently account for memory bandwidth or expert-routing overhead, so real-world MoE energy can be somewhat higher.
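
Reusing the sketch above, a hypothetical comparison (the parameter counts are invented for illustration) shows why the active/overall split matters:

```python
# Illustrative only: parameter counts below are made up, not from the article.
tokens = 10_000

dense_kwh = inference_energy_kwh(140e9, tokens)  # dense: active = overall = 140B
moe_kwh = inference_energy_kwh(14e9, tokens)     # MoE: 140B overall, 14B active (assumed)

# dense_kwh / moe_kwh == 10: the estimate scales with active parameters only.
# Real MoE energy is somewhat higher once memory bandwidth and routing count.
```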

Carbon Emissions: $E_{\text{kWh}} \times I_{\text{grid}}$, where $I_{\text{grid}}$ is the region-specific carbon intensity (kg CO₂/kWh). Selecting cleaner grids (lower $I_{\text{grid}}$) therefore reduces emissions even when energy use is unchanged.
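
The emissions step is a straight multiplication; a minimal sketch, with grid-intensity values assumed purely for illustration:

```python
def carbon_emissions_kg(energy_kwh: float, grid_intensity_kg_per_kwh: float) -> float:
    """Emissions (kg CO2) = E_kWh * I_grid."""
    return energy_kwh * grid_intensity_kg_per_kwh

# Same energy, different grids (intensity values are assumed for illustration):
# carbon_emissions_kg(1.0, 0.7)   # carbon-heavy grid -> 0.7 kg CO2
# carbon_emissions_kg(1.0, 0.05)  # low-carbon grid   -> 0.05 kg CO2
```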

Hardware Assumptions: Based on NVIDIA H100 specifications (~$6.59 \times 10^{11}$ FLOPs/Joule, a conservative estimate). Precision improvements (FP16/FP8) increase efficiency by 2×/4×, respectively.
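
One possible way to express that precision adjustment, assuming the 2×/4× multipliers apply on top of the conservative baseline figure (the dictionary keys and helper name are illustrative):

```python
# Assumed mapping: the article's baseline figure scaled by its 2x/4x multipliers.
BASELINE_FLOPS_PER_JOULE = 6.59e11
PRECISION_MULTIPLIER = {"baseline": 1.0, "fp16": 2.0, "fp8": 4.0}

def effective_flops_per_joule(precision: str = "baseline") -> float:
    """Hardware efficiency (FLOPs/J) after the precision multiplier."""
    return BASELINE_FLOPS_PER_JOULE * PRECISION_MULTIPLIER[precision]

# Higher efficiency lowers the energy estimate proportionally, e.g.:
# inference_energy_kwh(70e9, 1_000, flops_per_joule=effective_flops_per_joule("fp8"))
```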

Variable definitions: $N_{active}$ – active parameters (billions); $T$ – token count; $E_{\text{kWh}}$ – total energy in kilowatt-hours; $I_{\text{grid}}$ – regional carbon intensity (kg CO₂/kWh); $\eta$ – hardware efficiency (FLOPs/J).

Sources: Özcan et al. (2023), "Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations"