Ask HN: Why is LLM training still GPU-hungry despite DeepSeek?

3 points by takinola 5 months ago · 2 comments

When DeepSeek released R1, everyone thought it signaled the end of the GPU-intensive approach to LLM training. It does not appear to have worked out that way: GPU demand continues to grow unabated. What happened? Is the DeepSeek training method unreproducible or impractical in some way?

PaulHoule 5 months ago

See https://en.wikipedia.org/wiki/Jevons_paradox
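
(A toy back-of-the-envelope sketch of that reasoning, with purely hypothetical numbers: even if a DeepSeek-style method cuts the GPU-hours per training run sharply, cheaper training can induce enough extra demand that total GPU consumption still rises.)

    # Hypothetical illustration of Jevons paradox for LLM training (all numbers invented)
    gpu_hours_per_run_before = 1_000_000   # assumed cost of one training run, older methods
    gpu_hours_per_run_after  = 100_000     # assumed 10x efficiency gain, DeepSeek-style

    runs_demanded_before = 50              # assumed number of training runs the market funds
    runs_demanded_after  = 800             # cheaper runs induce far more experiments and models

    total_before = gpu_hours_per_run_before * runs_demanded_before   # 50,000,000 GPU-hours
    total_after  = gpu_hours_per_run_after  * runs_demanded_after    # 80,000,000 GPU-hours

    # Per-run efficiency improved 10x, yet total GPU demand grew ~1.6x
    print(total_after / total_before)      # -> 1.6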

cratermoon 5 months ago

The DeepSeek method requires spending money on very good programmers and giving them the tools and time to build out optimizations. The hype-driven LLM cycle and companies with multi-billion-dollar valuations prioritize time-to-market, throwing money at more and bigger GPUs to solve performance bottlenecks.

It's "impractical" if the goal is to make as much money as possible before the bubble pops.