Llama 405B 506 tokens/second on an H200

developer.nvidia.com

21 points by moondistance a year ago · 5 comments

EgoIncarnate a year ago

not "an H200", "In the table above, tensor parallelism is compared to pipeline parallelism with each across eight GPUs"

  • FanaHOVA a year ago

    Title on HN is wrong. The article says GPUs and it's referring to one of their 8xH200 boxes.
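
    The exchange above hinges on the difference between tensor parallelism (every layer sharded across all eight GPUs) and pipeline parallelism (different layers placed on different GPUs). A minimal sketch of configuring the former, using vLLM purely for illustration; the library choice and model id are assumptions, since the NVIDIA post benchmarks its own stack:

        # Illustrative only: tensor parallelism across the 8 GPUs of one node.
        # Library (vLLM) and model id are assumed, not taken from the article.
        from vllm import LLM, SamplingParams

        llm = LLM(
            model="meta-llama/Llama-3.1-405B-Instruct-FP8",  # assumed model id
            tensor_parallel_size=8,  # shard each layer's weights across 8 GPUs
            # pipeline_parallel_size=8 would instead split the model by layers
        )

        out = llm.generate(["Tensor vs. pipeline parallelism, briefly:"],
                           SamplingParams(max_tokens=128))
        print(out[0].outputs[0].text)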

7e a year ago

And this is why nobody submits MLPerf against NVIDIA.

moondistanceOP a year ago

Significant further optimizations. FP8!
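
The FP8 remark refers to running weights and activations in 8-bit floating point, which roughly halves memory traffic versus FP16 and uses the Hopper FP8 tensor cores. A hedged sketch of requesting that at load time, again assuming vLLM rather than the stack the article actually benchmarks:

    # Assumed library and model id; quantization="fp8" asks vLLM to
    # dynamically quantize an FP16/BF16 checkpoint to FP8 at load time.
    from vllm import LLM

    llm = LLM(
        model="meta-llama/Llama-3.1-405B-Instruct",  # assumed model id
        tensor_parallel_size=8,
        quantization="fp8",
    )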
