The metric the industry uses (WUE) counts a liter of water in dried-out Phoenix the same as a liter in Finland, which never made sense to me. My index, PSWI, fixes that by multiplying each facility's peak yearly water use by how stressed the local water supply actually is, using data from WRI Aqueduct. I ran it on 53 large facilities from 8 companies and found a 4,100x gap in PSWI between the best and worst performers. I also found that moving just 10% of the workload to less stressed regions would cut the whole fleet's water stress by 33%. The paper ends with a proposal for SEC-style rules that would make companies report this every quarter.
Build and test a new metric, the Peak-Stress Water Index (PSWI), that measures the water impact of AI data centers while actually accounting for how scarce water is where each one sits, then run it across the biggest operators to see who carries the real environmental risk.
- Read through the existing water metrics (WUE, WPUe, EWIF) and figured out why they all miss the same thing: where the water actually comes from
- Collected and double-checked the public water use data for 53 large data center and colocation facilities across 8 big tech companies
- Wrote the PSWI calculation pipeline in Python with Pandas and NumPy
- Found the exact location of all 53 facilities and pulled the WRI Aqueduct 4.0 water stress score for each one
- Ran a 1,000-iteration Monte Carlo sensitivity analysis (shifting the water use numbers and stress scores by up to 50% in either direction) to check that the rankings held up
- Wrote the full paper, including a policy proposal for SEC-style rules that would require companies to report PSWI
- Put the whole dataset and all of my code on GitHub so anyone can check it
- How to combine location-based scarcity data with each facility's own numbers to build a metric that shows what a single number like WUE hides
- How to use Monte Carlo simulation to measure uncertainty in a ranking. Getting a Spearman correlation of 0.982 told me the ranking wasn't going to flip around just because my inputs were a little off
- How big the gap is between what companies put in their sustainability reports and what they quietly leave out
- How to build a policy argument inside a research paper so it works for both science reviewers and the people who actually write regulations
- How to stress-test my own conclusions against my own assumptions, and how to be honest about it when I present the results
AI data centers use tens of billions of gallons of water a year just for cooling, and that number keeps going up as AI gets bigger. The problem is that the industry reports it with WUE, a metric that counts a liter of water in Phoenix the same as a liter in Finland. Because WUE ignores location, a company can show an 'improving' efficiency number while it keeps building in places that are already running out of water. I kept seeing this contradiction in the sustainability reports of a few major operators, so I started this project to actually measure it.
I collected the public water data for 53 large facilities and found the exact location of each one so I could pull its WRI Aqueduct 4.0 water stress score. PSWI just multiplies a facility's peak yearly water use by that local stress score. To make sure the ranking wasn't fragile, I ran 1,000 Monte Carlo iterations that shifted both the water numbers and the stress scores by up to 50% in either direction. The Spearman correlation came out to 0.982, which means the ranking barely moves even when the inputs are off. A facility that ranks badly doesn't get to climb out of it just because its reported numbers aren't exact.
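The calculation and the robustness check described above can be sketched in a few lines. This is a minimal illustration, not the pipeline from the repository: the facility names and numbers are invented, the perturbations are assumed to be uniform and independent, the reported figure is assumed to be a mean Spearman correlation between the baseline and perturbed rankings, and the sketch uses only the Python standard library instead of Pandas and NumPy so that it runs on its own.

```python
import random

# Hypothetical facilities: (name, peak yearly water use, stress score).
# These values are invented for illustration; they are NOT from the dataset.
FACILITIES = [
    ("site_a", 1200.0, 4.6),
    ("site_b", 900.0, 0.3),
    ("site_c", 300.0, 3.9),
    ("site_d", 2500.0, 1.1),
    ("site_e", 50.0, 0.1),
]


def pswi(water_use, stress):
    """PSWI as described in the text: peak yearly water use times local stress."""
    return water_use * stress


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes there are no exact ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation using the no-ties formula."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def monte_carlo_stability(facilities, iterations=1000, max_shift=0.5, seed=0):
    """Mean Spearman correlation between baseline and perturbed PSWI rankings.

    Assumption: each input is scaled by an independent uniform factor in
    [1 - max_shift, 1 + max_shift].
    """
    rng = random.Random(seed)
    baseline = [pswi(w, s) for _, w, s in facilities]
    total = 0.0
    for _ in range(iterations):
        perturbed = [
            pswi(
                w * (1 + rng.uniform(-max_shift, max_shift)),
                s * (1 + rng.uniform(-max_shift, max_shift)),
            )
            for _, w, s in facilities
        ]
        total += spearman(baseline, perturbed)
    return total / iterations


if __name__ == "__main__":
    print(f"mean Spearman correlation: {monte_carlo_stability(FACILITIES):.3f}")
```

A value close to 1 means the perturbed rankings almost always match the baseline ranking. With real data that contains tied scores, a tie-aware implementation such as `scipy.stats.spearmanr` would be the safer choice.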
Across all 53 facilities, there is a 4,100x gap in PSWI between the best performer (Google's site in Hamina, Finland) and the worst. When I looked at the whole fleet together, moving just 10% of the workload from high-stress regions to low-stress ones cut the fleet's total water stress by 33%, without giving up any computing capacity. The paper proposes SEC-style rules that would make every company running large data centers report its PSWI every quarter, so this data would just be available instead of something a student has to piece together from scattered reports.
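The arithmetic behind a workload shift can be illustrated with a toy fleet. This is a sketch under stated assumptions, not the model from the paper: fleet water stress is taken to be the sum of facility PSWI values, water use is assumed to scale linearly with workload, the receiving site is assumed to use the same water per unit of compute, and the three facilities are invented. The function name `shift_workload` is hypothetical.

```python
# Hypothetical fleet: (peak yearly water use, local stress score) per facility.
# Invented numbers for illustration only, not values from the dataset.
FLEET = [(100.0, 4.0), (100.0, 2.0), (100.0, 0.5)]


def shift_workload(fleet, fraction):
    """Move `fraction` of total water use from the most stressed sites
    to the least stressed site. Returns (before, after) fleet totals.

    Assumptions: water use follows workload one-to-one, and total water
    use (a stand-in for total compute) stays the same after the move.
    """
    before = sum(w * s for w, s in fleet)
    budget = fraction * sum(w for w, _ in fleet)

    # Drain the most stressed sites first.
    by_stress = sorted(fleet, key=lambda f: f[1], reverse=True)
    lowest_stress = by_stress[-1][1]

    after = before
    for water_use, stress in by_stress[:-1]:
        moved = min(water_use, budget)
        # Each unit moved stops counting at `stress` and counts at `lowest_stress`.
        after -= moved * (stress - lowest_stress)
        budget -= moved
        if budget <= 0:
            break
    return before, after


if __name__ == "__main__":
    before, after = shift_workload(FLEET, 0.10)
    print(f"fleet water stress falls by {100 * (before - after) / before:.1f}%")
```

Because every unit moved is re-weighted from a high stress score to a low one, the percentage drop in fleet water stress can be larger than the percentage of workload moved. How much larger depends on how wide the spread of stress scores is across the fleet.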