SOTA Model in 8B Size?

huggingface.co

2 points by ConteMascetti71 10 months ago · 2 comments

ConteMascetti71OP 10 months ago

I think it's not possible to match the knowledge capabilities of larger models... but what about reasoning?

ConteMascetti71OP 10 months ago

From the model card: "...we distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking."
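The quoted distillation is, at its core, supervised fine-tuning on reasoning traces generated by the teacher: the student is trained with next-token cross-entropy on the teacher's chain-of-thought outputs. A minimal toy sketch of that loss (not DeepSeek's actual pipeline; the vocab, logits, and trace below are made-up illustration, while the real recipe uses R1-0528 traces and Qwen3 8B Base):

```python
import math

def sft_loss(student_logits, teacher_token_ids):
    """Average cross-entropy of the student's next-token predictions
    against a teacher-generated trace (plain SFT on distilled data)."""
    total = 0.0
    for logits, target in zip(student_logits, teacher_token_ids):
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[target]
    return total / len(teacher_token_ids)

# Hypothetical teacher trace: token ids of a short reasoning chain
trace = [2, 0, 1]
# Student's logits over a toy 3-token vocabulary at each position
logits = [[0.1, 0.2, 2.0], [1.5, 0.3, 0.2], [0.1, 2.2, 0.3]]
print(round(sft_loss(logits, trace), 4))
```

The student improves exactly to the extent that it places more probability on the tokens the teacher emitted, which is why the resulting 8B model inherits the teacher's reasoning style rather than its factual breadth.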
