PETaflop cluster

Posted on November 10, 2025  •  3 minutes  • 530 words

Instead of going to therapy I built another Kubernetes cluster.

I got an NVIDIA DGX Spark and wanted to see what it’s capable of. Lots of people have done benchmarks and comparisons, but I needed to see what it practically felt like to build something.

Because it’s a local, powerful AI computer, I had to think of something AI-related, and I wanted some way to show that it runs locally. Of course this could all be faked, but it’s a lot more fun to work with constraints.

If you like this cluster you may also like my Cubernetes cluster built in an Apple G4 Cube.

Hardware list

Software list

Architecture

It’s a pretty basic software stack.

A diagram with block components described in the below paragraph

The ngrok Kubernetes operator runs inside the cluster and provides an ingress to the workload. The IOTA runs the Kubernetes control plane, ngrok operator, and application frontend. The Spark only runs ComfyUI to process the img2img jobs.

I could have run everything on the Spark, but I kept needing to reinstall the Spark for a variety of reasons. I found it easier just to keep the frontend and Kubernetes on a dedicated system and route to the ComfyUI API on a separate machine.
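To make the split concrete, here is a minimal sketch of how a frontend on one machine might queue an img2img job on a ComfyUI instance running on another. It is not the code from this project (that was AI-written and isn't shown here). The hostname `spark.local`, the node ID `"10"`, and the helper names are assumptions for illustration; the `POST /prompt` endpoint, the `prompt_id` in its response, and port 8188 are ComfyUI's standard API and default port.

```python
import json
import urllib.request

# Assumption: the Spark is reachable at this hostname. 8188 is ComfyUI's default port.
COMFYUI_URL = "http://spark.local:8188"


def set_input_image(workflow: dict, node_id: str, filename: str) -> dict:
    """Return a copy of an API-format workflow whose LoadImage node uses `filename`."""
    updated = json.loads(json.dumps(workflow))  # cheap deep copy of JSON-like data
    updated[node_id]["inputs"]["image"] = filename
    return updated


def build_prompt_request(workflow: dict, client_id: str,
                         base_url: str = COMFYUI_URL) -> urllib.request.Request:
    """Build the POST /prompt request that queues a workflow in ComfyUI."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_job(workflow: dict, client_id: str, base_url: str = COMFYUI_URL) -> str:
    """Send the workflow to ComfyUI and return the prompt_id it assigns."""
    request = build_prompt_request(workflow, client_id, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["prompt_id"]
```

With this shape the frontend only needs the ComfyUI base URL, so reinstalling the Spark doesn't touch the Kubernetes side as long as ComfyUI comes back up at the same address.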

The frontend app is 100% AI written. I knew roughly what I wanted with an img2img workflow, but I didn’t know how to implement it locally.

I spent the majority of my time learning ComfyUI, finding random models on Hugging Face, finding broken links, and watching YouTube tutorials. The AI ecosystem is a mess.

I got comfortable enough to understand what I needed and then used Claude to help me figure out how to implement it.

Using the application

I built the backpack so I could use it at KubeCon in Atlanta. The idea was to just wear it all week and start conversations.

a screenshot of a webpage with a webcam with me with my thumb up

I recharged it at the Sidero booth when the battery died, and wore it when it had power. The Spark draws about 50 W at idle, so I estimated I would get about 3 hours of battery life. In reality, I got more than 3 hours of usage, and recharging from 30% took almost 2 hours. The Spark had to be turned off during the recharge.
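The estimate is just capacity divided by draw. The sketch below shows the arithmetic. The battery capacity isn't stated in this post, so the 150 Wh figure is an assumption back-solved from the 50 W and 3 hour numbers, and the function name is made up for illustration.

```python
def runtime_hours(capacity_wh: float, draw_w: float, efficiency: float = 1.0) -> float:
    """Estimate runtime as usable energy (Wh) divided by average draw (W)."""
    if draw_w <= 0:
        raise ValueError("draw_w must be positive")
    return capacity_wh * efficiency / draw_w


# Assumption: ~150 Wh of usable capacity, implied by 50 W idle x 3 hours.
ASSUMED_CAPACITY_WH = 150.0
IDLE_DRAW_W = 50.0  # idle draw quoted in the text

print(f"{runtime_hours(ASSUMED_CAPACITY_WH, IDLE_DRAW_W):.1f} h")
```

Lowering `efficiency` below 1.0 is one way to account for conversion losses between the battery and the Spark.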

People could scan the QR code, take a picture, and get a stylized image back. Because the backpack was on my back I didn’t require them to talk to me, but of course there were lots of questions.

The only major problem I ran into was keeping the backpack connected to the internet. At first I was tethering from my phone hotspot, but it was slow. I switched to the conference wifi, but it was spotty and frequently disconnected when I was walking around. I then switched to USB tethering from my phone, and that was much more stable.

The build was a pretty simple two-node Kubernetes cluster. The hardest part was finding a backpack big enough to show it off.