Show HN: Distilled 0.6B text-to-SQL model

github.com

5 points by maciejgryka 16 days ago · 0 comments · 1 min read

We used our platform to fine-tune a tiny text-to-SQL model using distillation from DeepSeek V3. Repo has instructions for how to replicate this.

This is definitely not the best-performing model of its kind out there! But I was surprised by how much we got out of it: a stone's throw away from a teacher 1000x its size!

We also ran the same thing with the 4B Qwen and matched the teacher's accuracy, though there the size difference is merely 100x :)

I find this pretty cool - obviously our distilled models can only do this one task and don't generalize, but that's often exactly what you want when you're building agentic systems.

Happy to answer any questions!
