AMD MI300X vs. Nvidia H100 LLM Benchmarks
blog.runpod.io

Fascinating: despite the significantly better specs (and far more VRAM) on the AMD MI300X, the Nvidia H100 seems to match its performance at lower batch sizes and only loses out slightly at larger ones. I'm guessing the differentiator is mostly VRAM (192 GB on the MI300X vs. 80 GB on the Nvidia chip).
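Rough arithmetic on why the VRAM gap mostly shows up at larger batches: once the weights are resident, the leftover VRAM bounds how many sequences' KV caches (and so how large a batch) a card can serve at once. A minimal sketch with assumed, illustrative numbers (a 13B-class model in fp16 with 4096-token sequences), not figures from the benchmark:

```python
# Back-of-the-envelope: how much room is left for KV cache (i.e. batch size) on each card.
# All numbers are illustrative assumptions, not taken from the benchmark:
# a 13B-class model in fp16 with 40 layers, 40 KV heads, head_dim 128, 4096-token sequences.
GB = 1e9
bytes_fp16 = 2

n_params = 13e9
n_layers, n_kv_heads, head_dim, seq_len = 40, 40, 128, 4096

weights_gb = n_params * bytes_fp16 / GB                  # ~26 GB of weights
kv_per_seq_gb = (2 * n_layers * n_kv_heads * head_dim    # keys + values, per layer, per token
                 * seq_len * bytes_fp16) / GB            # ~3.4 GB per 4096-token sequence

for name, vram_gb in [("H100", 80), ("MI300X", 192)]:
    headroom = vram_gb - weights_gb
    print(f"{name}: {headroom:.0f} GB of KV headroom -> "
          f"~{int(headroom // kv_per_seq_gb)} concurrent {seq_len}-token sequences")
```

Under those assumptions the MI300X can hold roughly three times as many in-flight sequences, which would only matter once the batch size grows past what the H100 can cache.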
Does anyone know if this is just due to ROCm vs CUDA implementations? Or something else?
I expect the AMD also loses out once multi-GPU becomes necessary (which admittedly happens at much larger model sizes than on the H100, though even a 70B parameter model trained in bf16 will hit multi-GPU memory requirements), since its interconnect is just much slower. Rough numbers in the sketch below.
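To make that concrete: with standard mixed-precision Adam you typically carry bf16 weights and gradients plus fp32 master weights and two fp32 optimizer moments, roughly 16 bytes per parameter before activations. A hedged sketch of the usual bookkeeping, not a measurement from any specific framework:

```python
# Approximate per-parameter memory for mixed-precision Adam training of a 70B model
# (no activations, no gradient/optimizer sharding). Byte counts are the common
# convention, not an exact measurement.
n_params = 70e9

bytes_per_param = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 4  # fp32 master weights
    + 4  # fp32 Adam first moment (m)
    + 4  # fp32 Adam second moment (v)
)

total_gb = n_params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB for weights, grads and optimizer state")        # ~1120 GB
print(f"that's ~{total_gb / 192:.1f}x a 192 GB MI300X, ~{total_gb / 80:.1f}x an 80 GB H100")
```

So even the 192 GB card is several GPUs short for 70B training, and at that point interconnect bandwidth starts to matter.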
Yes, but as far as I understand it, the interconnect isn't all that important for model inference; it matters much more for training.
That depends on whether you can fit the whole model into VRAM. If you can't, you need some form of GPU parallelism, and that means communication between the GPUs. Maybe that messaging is small enough that it doesn't meaningfully slow down inference, though. I'm not sure.
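To put a rough number on that messaging: with tensor parallelism each transformer layer typically does a couple of all-reduces over the hidden-state activations, so per generated token the traffic scales with hidden_size times n_layers, not with the size of the weights. A sketch under assumed numbers (70B-class model, hidden size 8192, 80 layers, fp16 activations); treat it as order-of-magnitude only:

```python
# Rough tensor-parallel communication volume for one decoding step (batch of 1).
# Assumed numbers for a 70B-class model; real traffic depends on the parallelism
# scheme, batch size and the all-reduce algorithm, so this is order-of-magnitude only.
hidden_size = 8192
n_layers = 80
allreduces_per_layer = 2    # typically one after attention, one after the MLP
bytes_fp16 = 2

per_token_bytes = n_layers * allreduces_per_layer * hidden_size * bytes_fp16
print(f"~{per_token_bytes / 1e6:.1f} MB of activations exchanged per generated token")

# Even at 100 tokens/s that is only a few hundred MB/s, well under NVLink or
# Infinity Fabric bandwidth; training, by contrast, moves gradients for every
# parameter each step, which is where the slower interconnect really bites.
```

That would support the view that inference is usually far less interconnect-bound than training, even when the model is split across GPUs.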