With Nvidia's GB10 Superchip, I'm Running Serious AI Models in My Living Room

15 points by the_arun 25 days ago · 20 comments


The initial benchmarks I saw didn’t show the DGX Spark (GB10) doing much better in generation throughput than the AMD Strix Halo. On prefill, though, the GB10 does pretty well, much better than the Strix.

Memory bandwidth is 273 GB/s, which is nowhere near a discrete GPU’s. It’s a $4K machine. Personally, I’d rather have two GPUs and run a quantized model. I have two 32GB AMD R9700 cards (cost: $2600). Quantized models get me about 120K of context window, and TPS is about 60% of what I see with the same model on my 4090 (which only has enough VRAM to load the weights and about 6K of context).

Sure, I can’t run a 100B+ model, but neither can a single GB10 unless no context window is what you are going for. So you buy a second $4K machine?
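
For readers who want to check this kind of claim themselves, here is a rough back-of-the-envelope sketch of the memory arithmetic. It is not from the commenter. It assumes a dense transformer, that decode speed is limited by reading all weights once per generated token, and a hypothetical model shape (80 layers, 8 KV heads, head dimension 128, 16-bit KV cache). Real runtimes add overhead, and mixture-of-experts models read only their active parameters per token, so treat the results as loose bounds.

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8


def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


def decode_tps_upper_bound(bandwidth_gb_s: float,
                           bytes_read_per_token: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for a dense model:
    every generated token has to stream the weights from memory once."""
    return bandwidth_gb_s * 1e9 / bytes_read_per_token


if __name__ == "__main__":
    GB = 1e9
    # Hypothetical 100B dense model at 4 bits per weight.
    weights = weight_bytes(100, 4)
    # Hypothetical model shape; 120K tokens of context as in the comment.
    cache = kv_cache_bytes(120_000, n_layers=80, n_kv_heads=8, head_dim=128)
    print(f"weights:  {weights / GB:.1f} GB")
    print(f"KV cache: {cache / GB:.1f} GB")
    print(f"decode ceiling at 273 GB/s: "
          f"{decode_tps_upper_bound(273, weights):.2f} tok/s")
```

Swapping in the real layer count, KV head count, and quantization of the model you care about shows quickly whether weights plus context fit in a given memory budget.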

EnPissant 24 days ago

Strix Halo is pretty useless for inference because the prefill is too slow.
At least this thing is actually useful, and there are $3k variants available.
- Zetaphor 24 days ago
  
  I keep reading comments saying it's useless from people who clearly haven't actually used it.
  I'm using this machine daily to build and run applications with LLMs, TTS, STT, ASR, and image generation.
  - androiddrew 24 days ago
    
    Yeah, there is a lot of advantage to having this machine because the CUDA stack is still king. My two AMD GPUs are suffering when it comes to working with the ROCm stack. I have forks of Ollama and vLLM that took many weekends to figure out.
    
    Zetaphor 24 days ago
    
    If you're on Strix Halo, check out Donato's prebuilt toolboxes for ROCm with RADV or Vulkan:
    https://github.com/kyuz0/amd-strix-halo-toolboxes
    It takes all the work out of it: you just start llama-server in the container context and you're off doing inference without having to figure out dependencies.
    
    androiddrew 24 days ago
    
    Oh yeah, he is doing great things. I'm not on Strix myself, but his dual AMD AI Pro R9700 build is, ironically, the same machine I built.
  - EnPissant 24 days ago
    
    Which, GB10 or Strix Halo?
    
    androiddrew 24 days ago
    
    Pretty sure they mean the GB10
    
    Zetaphor 24 days ago
    
    Nope, I was referring to Strix Halo
    
    androiddrew 24 days ago
    
    Oh, well awesome. Glad to see you are getting so much out of the Strix line. I am eagerly awaiting the next gen. I think that will be a tipping point in AMD’s favor. I am a bit of an AMD nerd, even though they don’t seem to love their developers as much as Nvidia.
    Before anyone gives me grief: my company has a strategic partnership with Nvidia, and I do AMD under the cover of darkness. So I live in both worlds. I’m a bleeding heart for the underdog… if being a $360B market cap company makes you the “underdog”.
    
    Zetaphor 24 days ago
    
    Strix Halo
    
    EnPissant 23 days ago
    
    I don't believe you. It has very poor compute.
    
    Zetaphor 23 days ago
    
    Are you basing that on your informed first hand experience, or based on your assumptions backed by no actual experience using the hardware?
    I don't know what you want me to tell you, you're welcome to believe whatever you want but that doesn't change the reality I experience actually using the thing.
    Benchmark numbers and first hand reviews are readily available if you bothered to look.
    
    EnPissant 22 days ago
    
    I am basing it on benchmark numbers. Its compute is just too poor to be useful for LLMs or image generation.
    For example, with LLMs it's easy to do the math and see how long you will be waiting for 50k input tokens.
- androiddrew 24 days ago
  
  Ah, I see the ASUS machine for $3K. Hopefully in a year or two we can get a better machine with twice the RAM for the same price. Then I’d probably buy one.
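
The "do the math" argument about waiting on 50k input tokens, raised earlier in this sub-thread, can be sketched roughly as follows. This is an illustration and not any commenter's own calculation. It uses the common approximation of about 2 FLOPs per active parameter per token for a forward pass, and it ignores the attention cost, which grows with context length. The 30B parameter count and the 50 TFLOP/s sustained figure are placeholders, not measurements of any machine discussed here.

```python
def prefill_seconds(prompt_tokens: int, active_params_billion: float,
                    effective_tflops: float) -> float:
    """Rough compute-bound prefill time.

    Uses ~2 FLOPs per active parameter per token for the forward pass
    and ignores attention's extra cost at long context, so real
    prefill times will be longer than this estimate.
    """
    total_flops = 2.0 * active_params_billion * 1e9 * prompt_tokens
    return total_flops / (effective_tflops * 1e12)


if __name__ == "__main__":
    # Placeholder inputs: 50k-token prompt, hypothetical 30B dense model,
    # hypothetical 50 TFLOP/s actually sustained during prefill.
    wait = prefill_seconds(50_000, active_params_billion=30,
                           effective_tflops=50)
    print(f"estimated prefill wait: {wait:.0f} s")
```

With those placeholder inputs the estimate comes out at about a minute. Plugging in a sustained throughput figure from a benchmark you trust is what settles the disagreement in either direction.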

BoredPositron 24 days ago

Should have used one of it for the headline.

pyuser583 24 days ago

Is it 128 GB of RAM or VRAM?

wmf 24 days ago

It's unified memory so up to ~120 GB can be used as VRAM.
