Show HN: WebGPU LLM inference comprehensive benchmark

arxiv.org

2 points by yu3zhou4 a month ago · 2 comments

emanuele-em a month ago

The finding that naive single-op benchmarks overestimate dispatch cost by ~20x is wild. Curious how much the torch-webgpu backend could close the gap with CUDA if you went aggressive on kernel fusion; the 53% improvement on Vulkan is already significant. Any plans to try wgsl-level custom kernels?

  • yu3zhou4OP a month ago

    Honestly there is a lot of room for improvement in torch-webgpu's performance. It needs community involvement, but the opportunities are definitely there
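
For a rough illustration of the kernel-fusion point raised above, here is a minimal PyTorch sketch (the op chain, shapes, and torch.compile path are assumptions for illustration, not the torch-webgpu backend's actual fusion mechanism): fusing adjacent elementwise ops lets a backend issue one GPU dispatch instead of several, which is where the per-dispatch overhead the thread is discussing comes from.

    import torch

    def mul_add_relu(x, w, b):
        # Three elementwise ops; a naive eager backend would issue
        # a separate GPU dispatch for each one.
        return torch.relu(x * w + b)

    # torch.compile can fuse chains like this into a single kernel on backends
    # that support it; whether torch-webgpu does the same is an assumption here.
    fused_mul_add_relu = torch.compile(mul_add_relu)

    x, w, b = (torch.randn(1024) for _ in range(3))
    assert torch.allclose(mul_add_relu(x, w, b), fused_mul_add_relu(x, w, b))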
