Ask HN: Local model experiences with 'high-reasoning distill' finetunes
What are your experiences with the various finetunes of small models (<40B) on those popular datasets? My personal experience is mostly with the 'Opus-Reasoning' ones on Qwen models. Aside from the output looking subjectively better (ASCII charts and all), in actual coding performance every one I've tried becomes a lot more overconfident, writes messier and buggier code, and tries to gaslight me that the task I gave it is impossible when it cannot achieve it.
I have seen them perform better on public benchmarks in some cases, which shouldn't be ignored completely, but in my limited testing that doesn't seem to translate to better output on real work.
What are your observations? Any specific ones that you lean towards, or have had good experiences with?