Artificial Intelligence is constantly evolving. New LLMs are released almost daily, but only a few represent genuine breakthroughs. DeepSeek is one of them.
DeepSeek represents a notable step forward in this field, offering a series of models designed to enhance reasoning capabilities in large language models (LLMs).
The foundation of this work is DeepSeek-R1-Zero, the initial model in the series. It was trained using large-scale reinforcement learning (RL) without any supervised fine-tuning (SFT). This direct application of RL enabled the model to develop reasoning behaviors such as self-verification, reflection, and chain-of-thought (CoT) problem-solving. However, limitations such as repetitive outputs and occasional language inconsistencies highlighted areas for improvement.
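To make the RL setup more concrete: the DeepSeek-R1 report describes rule-based rewards (an accuracy reward for correct final answers and a format reward encouraging the model to wrap its reasoning in think tags) rather than a learned reward model. The sketch below is purely illustrative, not DeepSeek's actual implementation; the function names, the `\boxed{}` answer convention, and the `<think>` tag convention are assumptions for the example.

```python
import re

def accuracy_reward(completion: str, reference: str) -> float:
    """Toy rule-based accuracy reward: 1.0 if the final boxed
    answer in the completion matches the reference answer."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

def format_reward(completion: str) -> float:
    """Toy format reward: a small bonus when the completion
    contains a <think>...</think> chain-of-thought block."""
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0
```

During RL training, rewards like these would score each sampled completion, and the policy would be updated to make high-reward outputs more likely; because the rewards are simple rules rather than a neural model, they are cheap to compute and hard to reward-hack.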
The next iteration, DeepSeek-R1, addresses these challenges by incorporating a cold-start data phase before RL training. This adjustment improved performance across tasks like math, coding, and reasoning, bringing the model closer to the benchmarks set by OpenAI-o1.
From my own early testing, the model performs well on specialized tasks (like math) but noticeably worse on more general-purpose tasks.
Importantly, the DeepSeek series has been open-sourced, along with several distilled versions, making these advancements more accessible to the…