Spoiler: Your V100s won’t cut it, no matter how hard you try
Last week, I spent three days banging my head against what seemed like a straightforward task: getting Google’s shiny new Gemma 3 27B model running on our AWS P3dn.24xlarge instances. With 256GB of total VRAM across eight V100s, I figured we had plenty of headroom.
I was wrong. Dead wrong.
If you’re sitting on a fleet of V100s and eyeing Gemma 3’s impressive benchmarks, let me save you some time and frustration. Here’s what I learned the hard way.
My Setup: The P3dn.24xlarge Beast
On paper, these instances look pretty beefy:
- 8 Tesla V100s with 32GB each (256GB total VRAM)
- 96 vCPUs backed by 768GB of system RAM
- NVLink interconnects and 100 Gbps networking
- A painful $31/hour that comes straight out of my pocket
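Before committing to anything at $31/hour, it's worth confirming what the hardware actually reports. Here's a quick PyTorch check (nothing Gemma-specific, just standard CUDA introspection) that prints each card's memory and compute capability; that last number matters more than I realized at the time:

```python
import torch

# Enumerate every visible GPU and report the two specs that matter
# for large-model inference: memory capacity and compute capability.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(
        f"GPU {i}: {props.name}, "
        f"{props.total_memory / 1024**3:.0f} GB, "
        f"compute capability {props.major}.{props.minor}"
    )

# On a P3dn.24xlarge this prints eight Tesla V100-SXM2-32GB cards at
# compute capability 7.0 -- a generation that has no bfloat16 support.
```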
Back in 2017, when the V100 launched, this was cutting-edge stuff. In 2025? Well, that's where things get interesting.
The Math That Got My Hopes Up
I started by doing what any engineer would do — napkin math to see if this would even fit:
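In rough script form, that first pass looks something like this (27B parameters and 2 bytes per weight at 16-bit precision are the standard figures; KV cache and activation overhead are deliberately ignored here, which turns out to be the least of the problems):

```python
# Back-of-the-envelope VRAM estimate for serving a dense decoder model.
# Weights dominate at this scale; KV cache and activations add on top.

PARAMS = 27e9          # Gemma 3 27B parameter count
BYTES_PER_PARAM = 2    # fp16/bf16: 2 bytes per parameter
TOTAL_VRAM_GB = 256    # 8 x V100 32GB on a P3dn.24xlarge

weights_gb = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"Weights at 16-bit: ~{weights_gb:.0f} GB")        # ~50 GB
print(f"Fits in {TOTAL_VRAM_GB} GB total? {weights_gb < TOTAL_VRAM_GB}")
```

Roughly 50GB of weights against 256GB of VRAM: on paper, a comfortable fit.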