Why I Gave Up Trying to Run Gemma 3 27B on AWS P3dn.24xlarge (And You Should Too)


Rohit Khatana


Spoiler: Your V100s won’t cut it, no matter how hard you try

Last week, I spent three days banging my head against what seemed like a straightforward task: getting Google’s shiny new Gemma 3 27B model running on our AWS P3dn.24xlarge instances. With 256GB of total VRAM across eight V100s, I figured we had plenty of headroom.

I was wrong. Dead wrong.

If you’re sitting on a fleet of V100s and eyeing Gemma 3’s impressive benchmarks, let me save you some time and frustration. Here’s what I learned the hard way.

My Setup: The P3dn.24xlarge Beast

On paper, these instances look pretty beefy:

  • 8 Tesla V100s with 32GB each (256GB total VRAM)
  • 96 vCPUs backed by 768GB of system RAM
  • NVLink interconnects and 100 Gbps networking
  • A painful $31/hour that comes straight out of my pocket

Back in 2017, this was cutting-edge hardware. In 2025? Well, that's where things get interesting.

The Math That Got My Hopes Up

I started by doing what any engineer would do — napkin math to see if this would even fit:
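One way to sketch that fit-check in code (my own rule-of-thumb figures, not numbers from any particular serving stack; real usage adds KV cache, activations, and framework overhead on top of raw weights):

```python
# Back-of-envelope VRAM estimate for a 27B-parameter model.
# Rule of thumb: weight memory = parameter count x bytes per parameter.

GIB = 1024 ** 3
PARAMS = 27e9  # Gemma 3 27B

def weights_gib(bytes_per_param: float) -> float:
    """Raw weight memory in GiB, ignoring KV cache and activations."""
    return PARAMS * bytes_per_param / GIB

for name, bpp in [("fp16/bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name:>9}: {weights_gib(bpp):5.1f} GiB of weights")
# fp16 weights alone come to roughly 50 GiB -- comfortably under
# 256 GiB of aggregate VRAM, which is exactly why the napkin math
# looked so promising.
```

On paper, even fp16 weights fit with room to spare across eight 32 GiB GPUs; the catch, as the rest of this post covers, is everything the napkin math leaves out.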