Picture a professional hair dryer. Turned on. Twenty-four hours a day. For three months.
That is one NVIDIA H100 GPU training a frontier model. 700 watts steady, pinned at TDP, never throttling.
Now multiply by 25,000. Then add 30-40% for CPUs, networking and cooling.
Welcome to the basement of AI, where the metric that matters isn't parameter count. It's the megawatt.
The number that should scare you
According to convergent estimates from Goldman Sachs and Epoch AI, training GPT-4 burned roughly 50 GWh over 90-100 days of continuous training. Fifty gigawatt-hours.
Translated: the annual electricity consumption of about 16,000 Italian households (assuming 3 MWh/household/year, per ARERA data).
One single training run. Once. To produce one weights file.
And it isn't even the record. Projections for next-generation models point to 100-500 GWh per run.
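A quick way to sanity-check the 50 GWh figure is to turn it back into average power. The sketch below assumes the cluster draws power continuously for the whole 90-100 days; the function name is made up for illustration and the inputs are the estimates quoted above, not official numbers.

```python
HOURS_PER_DAY = 24

def average_power_mw(energy_gwh: float, days: float) -> float:
    """Average power (MW) implied by spending `energy_gwh` over `days` of nonstop running."""
    energy_mwh = energy_gwh * 1_000
    return energy_mwh / (days * HOURS_PER_DAY)

# Assumption: 50 GWh total, duration somewhere between 90 and 100 days.
for days in (90, 100):
    print(f"{days} days -> {average_power_mw(50, days):.1f} MW average draw")
```

That lands at roughly 21-23 MW of sustained draw, the same ballpark as the 25,000-GPU cluster worked out further down.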
Why a single GPU eats this much
An H100 SXM has a TDP of 700 W. The PCIe variant tops out at 350 W. The previous generation A100 sat at 400 W.
Then Blackwell arrived.
The B200, launched between late 2024 and early 2025, climbs to 1,000 W TDP. A GB200 NVL72 rack — 72 Blackwell GPUs packed into a single cabinet — pulls roughly 120 kW.
A traditional enterprise server rack consumes 8-10 kW. We're talking about twelve to fifteen times the thermal density.
At these levels air cooling is dead. You need direct-to-chip liquid or full immersion. You need reinforced floors to take the weight. You need dedicated medium-voltage feeders straight off the transformer, not regular PDUs.
The rule of thumb that governed enterprise data centers for twenty years, 5-10 kW per rack, has been pulverized. New builds are designed starting at 50-150 kW per rack. Everything changes: piping, generators, UPS, fire suppression.
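To put numbers on the density jump, here is a minimal sketch using only the rack figures quoted above (roughly 120 kW and 72 GPUs for a GB200 NVL72, 5-10 kW for a legacy rack). The constant and function names are illustrative, not anything NVIDIA publishes.

```python
GB200_NVL72_RACK_KW = 120.0  # approximate rack power quoted in the text
GPUS_PER_RACK = 72

def density_ratio(legacy_rack_kw: float) -> float:
    """How many legacy racks' worth of power a single NVL72 rack draws."""
    return GB200_NVL72_RACK_KW / legacy_rack_kw

# Rack power divided evenly across the GPUs it hosts.
per_gpu_kw = GB200_NVL72_RACK_KW / GPUS_PER_RACK

for legacy_kw in (5, 8, 10):
    print(f"vs a {legacy_kw} kW rack: {density_ratio(legacy_kw):.0f}x")
print(f"rack power per GPU: {per_gpu_kw:.2f} kW")
```

The per-GPU share comes out near 1.67 kW, well above the 1,000 W TDP of the chip itself. The gap is everything else that lives in the rack.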
Clusters: where the math turns brutal
A single frontier training job today runs on 10,000 to 100,000 GPUs in parallel, all kept in sync over NVLink inside each node and InfiniBand between nodes.
Math for a 25,000-H100 cluster:
- 25,000 × 700 W = 17.5 MW for GPUs alone
- +30-40% for host CPUs, network switches, storage, cooling: ~25 MW total
Twenty-five megawatts. The average power draw of a small Italian town of 20,000 residents.
Push to 100,000 GPUs and you hit 100 MW. That's the instantaneous demand of a city the size of Lecce.
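The cluster math above fits in a few lines. This is a rough sketch, assuming every GPU sits at its 700 W TDP and that everything else (host CPUs, switches, storage, cooling) is a flat 30-40% on top; `cluster_power_mw` is a hypothetical helper, not a standard formula.

```python
def cluster_power_mw(n_gpus: int, gpu_tdp_w: float = 700.0, overhead: float = 0.35) -> float:
    """Total facility power in MW: GPU draw plus a fractional overhead for everything else."""
    gpu_mw = n_gpus * gpu_tdp_w / 1e6
    return gpu_mw * (1.0 + overhead)

for n in (25_000, 100_000):
    gpus_only = cluster_power_mw(n, overhead=0.0)
    low = cluster_power_mw(n, overhead=0.30)
    high = cluster_power_mw(n, overhead=0.40)
    print(f"{n:>7,} GPUs: {gpus_only:.1f} MW GPUs only, {low:.2f}-{high:.2f} MW all-in")
```

With these assumptions the all-in range is about 22.75-24.5 MW for 25,000 GPUs and 91-98 MW for 100,000, which is where the rounded 25 MW and 100 MW figures come from.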
The race to the gigawatt
In July 2024, xAI brought up Colossus in Memphis: 100,000 H100s operational in 122 days, an industrial record. Elon Musk's stated target is one million GPUs by the end of 2025.
Microsoft and OpenAI announced Stargate, a cluster targeting 5 GW by 2030. Five gigawatts equals five mid-sized nuclear reactors. In one data center.
Meta, Google, Amazon and Oracle are all doubling AI infrastructure CAPEX. Goldman Sachs (April 2024) estimates US data center power demand will grow 160% by 2030, going from 4% to 8% of national consumption.
For perspective: 8% of US electricity demand is more than all residential California and all industrial Florida combined today. Data centers alone. In the US alone.
What models actually cost to train
To give a sense of scale, here are the public (or methodologically transparent) numbers for the best-known training runs:
| Model | Year | Parameters | Training (MWh) | Equivalent |
|---|---|---|---|---|
| BERT base | 2018 | 110M | 1.5 | 0.5 households/yr |
| GPT-3 | 2020 | 175B | 1,287 | 430 households/yr |
| Llama 3 70B | 2024 | 70B | 2,700 | 900 households/yr |
| Llama 3.1 405B | 2024 | 405B | 16,000 | 5,300 households/yr |
| GPT-4 (estimate) | 2023 | ~1.7T MoE | ~50,000 | 16,000 households/yr |
| Frontier 2026+ | 2026+ | ? | 100,000-500,000 | one mid-sized city |
Sources: Patterson et al. 2021 (GPT-3), Meta Llama 3 paper (Llama), Goldman Sachs / Epoch AI (GPT-4). GPT-4 numbers are NOT official — OpenAI doesn't publish them — but they're consistent with known cluster size, duration and energy efficiency.
In three years, from GPT-3 in 2020 to GPT-4 in 2023, the energy cost of a single frontier training run grew by a factor of roughly 40 (from 1,287 MWh to about 50,000 MWh). The curve is exponential. It's not slowing down.
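The household equivalents and the growth factor can be recomputed straight from the table. A minimal sketch, assuming the same 3 MWh per household per year used earlier; the dictionary simply copies the table's energy column, estimates included.

```python
MWH_PER_HOUSEHOLD_YEAR = 3.0  # household assumption used in this article

# (year, training energy in MWh), copied from the table above
TRAINING_RUNS = {
    "BERT base": (2018, 1.5),
    "GPT-3": (2020, 1_287),
    "Llama 3 70B": (2024, 2_700),
    "Llama 3.1 405B": (2024, 16_000),
    "GPT-4 (estimate)": (2023, 50_000),
}

def household_years(mwh: float) -> float:
    """Number of households whose yearly consumption matches `mwh`."""
    return mwh / MWH_PER_HOUSEHOLD_YEAR

def growth_factor(older: str, newer: str) -> float:
    """Ratio of training energy between two runs in the table."""
    return TRAINING_RUNS[newer][1] / TRAINING_RUNS[older][1]

for name, (year, mwh) in TRAINING_RUNS.items():
    print(f"{name:<18} {year}  {mwh:>9,.1f} MWh  {household_years(mwh):>9,.1f} household-years")

print(f"GPT-3 -> GPT-4: {growth_factor('GPT-3', 'GPT-4 (estimate)'):.1f}x")
```

The GPT-3 to GPT-4 ratio comes out just under 39, keeping in mind that the GPT-4 input is itself an estimate.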
Inference: the problem nobody looks at
This is the part that changes everything.
Training GPT-4 cost 50 GWh once. Then the model gets served. For years. To hundreds of millions of users.
Goldman Sachs (2024) estimates a single ChatGPT query consumes about 2.9 Wh. A traditional Google search consumes 0.3 Wh. Ten times more.
Back-of-envelope at ChatGPT scale:
- ~200M active users
- ~50 queries/day per active user
- 2.9 Wh per query
Result: ~29 GWh per day. More than 10 TWh/year just to serve ChatGPT. Almost what the city of Bologna consumes in a year.
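The same back-of-envelope in code, so the inputs are easy to swap. All three numbers are the rough assumptions listed above (the user and query counts in particular are guesses, not measured data), and `daily_inference_gwh` is an illustrative name.

```python
def daily_inference_gwh(active_users: float, queries_per_user: float, wh_per_query: float) -> float:
    """Energy spent serving queries in one day, in GWh."""
    return active_users * queries_per_user * wh_per_query / 1e9

daily_gwh = daily_inference_gwh(active_users=200e6, queries_per_user=50, wh_per_query=2.9)
annual_twh = daily_gwh * 365 / 1_000

print(f"{daily_gwh:.1f} GWh/day, {annual_twh:.2f} TWh/year")
```

Halve the queries per user and the total halves with it: the estimate is only as solid as its least certain input.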
| Operation | Energy | Equivalent |
|---|---|---|
| Google search | 0.3 Wh | 10W LED on for 2 min |
| ChatGPT query (text) | 2.9 Wh | 700 W hair dryer for 15 sec |
| GPT-4 query with image | ~10 Wh | 700 W hair dryer for ~50 sec |
| SDXL image generation | ~30 Wh | Microwave 90 sec |
| Sora-class video generation | 1-3 kWh | One dryer cycle |
Inference is "always-on." On a 3-5 year horizon, total energy spent on inference will far exceed what was spent on training. It's the background current nobody turns off.
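How fast does inference overtake training? A rough sketch using this article's own estimates: 50 GWh for the training run and about 29 GWh per day for serving. It glosses over the fact that the serving figure covers every model behind ChatGPT, not GPT-4 alone, so treat the result as an order of magnitude.

```python
TRAINING_GWH = 50.0           # one-off training estimate from this article
INFERENCE_GWH_PER_DAY = 29.0  # serving estimate from the back-of-envelope above

def breakeven_days(training_gwh: float, inference_gwh_per_day: float) -> float:
    """Days of serving needed to match the one-off training energy."""
    return training_gwh / inference_gwh_per_day

def inference_multiple(years: float,
                       training_gwh: float = TRAINING_GWH,
                       inference_gwh_per_day: float = INFERENCE_GWH_PER_DAY) -> float:
    """Cumulative inference energy over `years`, as a multiple of training energy."""
    return inference_gwh_per_day * 365 * years / training_gwh

print(f"break-even after {breakeven_days(TRAINING_GWH, INFERENCE_GWH_PER_DAY):.1f} days")
print(f"after 3 years: {inference_multiple(3):.0f}x the training energy")
```

On these inputs, serving matches the entire training run in under two days.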
The IEA projection that triggered the alarm
The IEA's Electricity 2024 report is the reference document for anyone who wants credible numbers.
Global data centers consumed 460 TWh in 2022. The IEA projection for 2026 is over 1,000 TWh — a doubling in four years. Of that total, 80-130 TWh will be directly attributable to AI.
For context: Italy's entire annual electricity consumption is about 300 TWh. Global AI, by itself, would consume roughly one-third of what all of Italy uses.
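Two numbers are hiding in those projections: the yearly growth rate implied by going from 460 to 1,000 TWh in four years, and the AI slice measured against Italy. A minimal sketch, taking 1,000 TWh as the 2026 value even though the projection is "over 1,000", and assuming smooth compound growth.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1

ITALY_TWH = 300.0  # approximate annual Italian consumption quoted above

growth = cagr(460.0, 1_000.0, 4)
ai_share_low = 80.0 / ITALY_TWH
ai_share_high = 130.0 / ITALY_TWH

print(f"implied growth: {growth:.1%} per year")
print(f"AI vs Italy: {ai_share_low:.0%} to {ai_share_high:.0%}")
```

That is a little over 21% per year, compounding, with the AI slice landing between roughly a quarter and two-fifths of Italy's consumption.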
The real bottleneck isn't silicon
This is the part few people get.
NVIDIA can produce more chips. TSMC can ramp 3 nm capacity. But you cannot synthesize a gigawatt.
A 1 GW cluster requires:
- 2-3 years to obtain grid interconnection (in the US, depending on ERCOT or PJM queues)
- Environmental, construction and grid permits
- A dedicated high-voltage transmission line
- Custom transformers (themselves in global shortage: 18-24 month lead times)
The chip arrives in three months. The power arrives in three years.
That's why hyperscalers are doing things that would have been unthinkable until yesterday:
- Microsoft signed a 20-year PPA to restart Three Mile Island Unit 1 in Pennsylvania (yes, the same site as the 1979 accident, which hit Unit 2)
- Amazon bought a data center campus powered directly off the Susquehanna nuclear plant
- Stargate will reportedly land in Texas, with on-site natural gas generation to bypass ERCOT interconnection queues
- Google and Oracle are funding Small Modular Reactor (SMR) startups for 2030
They're not buying electricity. They're buying power plants.
What's happening in Italy (and in Puglia)
Italy isn't playing this game at US scale. But it's playing.
The Mezzogiorno AI data center plan leans on renewables in the South — Puglia produces a surplus of solar and wind that today gets partially curtailed because of grid constraints. That same energy, consumed locally by data center loads, becomes a competitive advantage.
The logic is simple: data is light, energy is heavy. Moving tokens is cheaper than moving megawatts.
For latency-sensitive inference workloads (customer-service chatbots, on-premise AI agents, industrial automation systems), having compute within 50 ms network distance of the Italian end user is no longer a nice-to-have. It's a technical requirement. And geographically, the South is where renewable energy is abundant and latency to the average European user stays acceptable.
The uncomfortable truth
AI doesn't have a chip problem. It has an energy problem.
And energy, unlike models, has one unit of measure: the watt. You can't compress it, quantize it, or distill it. It's there or it isn't.
Whoever controls power over the next five years will control AI. Not researchers, not CUDA kernel engineers. Whoever knows how to switch on a gigawatt in the right place at the right time.
That's why, in the next installments of this series, you'll keep running into a word that was unspeakable in tech circles in 2020: nuclear.