The path to open-sourcing the DeepSeek inference engine

550 points by Palmik 7 months ago

In March, vLLM picked up some of the improvements from the DeepSeek paper. With these, vLLM v0.7.3's DeepSeek performance jumped to more than 3x what it was before [1].

What's exciting is that there's still so much room for improvement. Using vLLM under high concurrency, we benchmark around 5K total tokens/s with the ShareGPT dataset and 12K total tokens/s with random 2000/100.

DeepSeek-V3/R1 Inference System Overview [2] quotes "Each H800 node delivers an average throughput of 73.7k tokens/s input (including cache hits) during prefilling or 14.8k tokens/s output during decoding."
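A rough way to read these figures: with random 2000/100, "total tokens/s" counts both prompt and generated tokens, so most of it is input. The sketch below splits the 12K figure, assuming every request has exactly 2000 input and 100 output tokens (`split_total_throughput` is a hypothetical helper, not part of vLLM):

```python
from typing import Tuple


def split_total_throughput(total_tps: float, input_len: int, output_len: int) -> Tuple[float, float]:
    """Split an aggregate tokens/s figure into (input, output) tokens/s.

    Hypothetical helper. Assumes every request has exactly `input_len`
    prompt tokens and `output_len` generated tokens, so throughput is
    divided in the fixed ratio input_len : output_len.
    """
    per_request = input_len + output_len
    input_tps = total_tps * input_len / per_request
    output_tps = total_tps * output_len / per_request
    return input_tps, output_tps


if __name__ == "__main__":
    # The ~12K total tokens/s vLLM figure with random 2000/100.
    inp, out = split_total_throughput(12_000, 2000, 100)
    print(f"input  ~ {inp:,.0f} tokens/s")   # input  ~ 11,429 tokens/s
    print(f"output ~ {out:,.0f} tokens/s")   # output ~ 571 tokens/s
```

The hardware for the vLLM run isn't stated here, and DeepSeek's numbers are per H800 node and quoted separately for prefill and decode, so this only shows which quantities line up, not a per-node ratio.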

Yes, DeepSeek deploys a different inference architecture. But this goes to show just how much room there is for improvement. Looking forward to more open source!

[1] https://developers.redhat.com/articles/2025/03/19/how-we-opt...

[2] https://github.com/deepseek-ai/open-infra-index/blob/main/20...

vintagedave - 7 months ago

I really empathised with this part:

> Codebase Divergence: Our engine is based on an early fork of vLLM from over a year ago. Although structurally similar, we’ve heavily customized it for DeepSeek models, making it difficult to extend for broader use cases.

I've been there. Probably a few of us have.

Their approach of splitting out maintainable sublibraries and sharing info directly, even when it isn't integrated, seems a really nice way of working with the community -- i.e., they have obstacles, but they're not letting the obstacles push them onto the easy route of not contributing at all. Someone wanting to use their techniques might prefer working code to a description of the techniques, but this is still knowledge sharing. And again, I think it'd be easier for them not to do it. So kudos to them.

bonoboTP - 7 months ago

Non-runnable code can be really useful. I often wish it were available for some papers, even if I never ran it, just to check what they actually did, because text and equations are often not specific enough.
rvnx - 7 months ago

They customized and optimized vLLM for their use case, so much that it became a different product (e.g. Debian vs Ubuntu).
The fact they share back some of their improvements is great.

avodonosov - 7 months ago

What motivates the commercial AI companies to share their research results and know-how?

Why did Google publish the Transformer architecture instead of keeping it to themselves?

I understand that people may want to do good things for humanity, facilitate progress, etc. But if an action goes against commercial interest, how can company management take it without getting objections from shareholders?

Or is there a commercial logic that motivates sharing information and intellectual property? What logic is that?

nodja - 7 months ago

My understanding is that frontier researchers will work for companies that will let them publish papers and discuss them with their peers.
When you're an engineer at the tier of these AI researchers, winning an extra 100k/year on top of your current 500k (numbers out of my ass) is not worth it vs getting name recognition. Being known as one of the authors who made the transformer, for example, will enable you to work with other bright-minded individuals and create even better things.
So essentially these commercial companies have "we'll let you publish papers when you work for us" as a perk.
- htrp - 7 months ago
  
  > When you're an engineer at the tier of these AI researchers, winning an extra 100k/year on top of your current 500k (numbers out of my ass) is not worth it vs getting name recognition. Being known as one of the authors who made the transformer, for example, will enable you to work with other bright-minded individuals and create even better things.
  Also, instead of an extra 100k a year, you get to raise a billion dollars in VC funds for your next company
  - make3 - 7 months ago
    
    & then sell it back to Google, in this case https://www.axios.com/2024/08/05/google-characterai-venture-...
timClicks - 7 months ago

There are a few commercially valid strategies.
1. Goodwill and mindshare. If you're known as "the best" or "the most innovative", then you'll attract customers.
2. Talent acquisition. Smart people like working with smart people.
3. Becoming the standard. If your technology becomes widely adopted, and you've been using it the longest, then you're suddenly the best placed in your industry to make use of the technology while everyone else retools.
4. Deception. Sometimes you publish work that's "old" internally but is still state of the art. This provides your competition with a false sense of where your research actually is.
5. Freeride on others' work. Maybe experimenting with extending an idea is too expensive/risky to fund internally? Perhaps a wave of startups will try. Acquire one of them that actually makes it work.
6. Undercut the market leader. If your industry has a clear market leader, the others can use open source to cooperate to erode that leadership position.
anon373839 - 7 months ago

> Or there is a commercial logic that motivates sharing of information and intellectual property? What logic is that?
There absolutely is a sound commercial justification to share research: long-term growth through advancement of the field. (Deep learning would never have made the progress it has without open research!)
If this seems quaint, it’s because we’re too accustomed to short-term, transactional, Wall Street thinking.
- ENGNR - 7 months ago
  
  Plus, it's probably going to leak anyway, especially if it's really just an idea and you need to hire people to work on it who will move in and out of your org.
  Might as well get some dubious medium term gain rather than spend a bunch of money on security for nothing.
- esperent - 7 months ago
  
  It's not so much that it seems quaint, it's that we are accustomed to short-term, transactional, Wall Street thinking from companies like Google.
  For very good reason, because that's exactly how they behave in all other areas. The question remains, why do they appear altruistic when it comes to sharing papers?
  I find it hard to believe that it's actual altruism. It's far more likely that it's transactional behavior that just appears altruistic from the outside.
  - robertlagrant - 7 months ago
    
    > it's that we are accustomed to short-term, transactional, Wall Street thinking from companies like Google.
    Out of all of the companies in the world, I wouldn't put Google near the bottom of the list in terms of stuff they've discovered and released to the world.
  - kmacdough - 7 months ago
    
    It's such a rapidly developing field, with much of the progress happening in small labs on the open source models. Eventually, the field will converge and stabilize. For now, the bet is to be open and supportive, to stay close to the progress and be in the best position when the dust settles.
  - anon373839 - 7 months ago
    
    It isn’t altruism! It’s good business: it pursues economic gain through mutual benefit.
  - avodonosov - 7 months ago
    
    People may be altruistic, but in a company setting they may have no room for altruism. A CEO's decisions affect the property of others (the shareholders), so he cannot freely pursue altruistic goals.
    I heard that Dodge v. Ford Motor Co. was an important precedent in the US. https://en.m.wikipedia.org/wiki/Dodge_v._Ford_Motor_Co.
    
    owisd - 7 months ago
    
    These days the courts give wide latitude to companies to offer virtually any plausible reason why superficially altruistic acts are in fact good long term for shareholder value. Anyone wanting to do what Ford did just needs to keep their mouth shut about the real reasons.
    
    avodonosov - 7 months ago
    
    Interesting. Do you know any particular court cases?
    My wikipedia link above in turn links to https://en.m.wikipedia.org/wiki/Shareholder_primacy, which says in the last paragraph: "The doctrine waned in later years."
    This probably confirms what you say, but I'd be interested to learn about specific cases.
  - andai - 7 months ago
    
    Everyone benefits from the gains, everyone gets more customers and more investment.
Der_Einzige - 7 months ago

The ACL, NeurIPS, ICLR and the rest of the AI professional organizations are why this happens. Forced open sourcing of everything. No pay to access. It’s the ideal open academic environment for rapid innovation. We must jealously defend our current system, as it will soon come under attack by those who get angry about democratization of the means of computation.
Also, lots of copyright abolitionists in AI. Many people who work in the space delight in the idea of making information, especially their own, free.
The ghost of Aaron Swartz runs through every researcher in this space.
larodi - 7 months ago

Indeed, isn't there a chance Google didn't properly evaluate what the transformer would eventually be used for or become? It was created as an improvement on seq2seq for translation, right? It was for translation, not for thinking, and to a certain extent it's still about translation. Aren't the other emergent capabilities actually a side effect, only observed later when parameter counts grew?
bcoughlan - 7 months ago

I would guess it comes down to the fact that the best researchers in the world want their work out in the open.
- behnamoh - 7 months ago
  
  Ilya doesn't. He's a strong proponent of closed source and censorship.
  - esperent - 7 months ago
    
    He is just one person. He happens to be the most famous scientist working on this field at the moment it became a gold rush, but it's work built on the shoulders of those who came before, whose discoveries are just as important.
  - Zambyte - 7 months ago
    
    Let's hope Totally Safe Intelligence doesn't end up having the same relationship with their name as Totally Open AI.
lofaszvanitt - 7 months ago

The more people copy your outdated thing, the better for you, because they're always going to lag behind you.
Kholin - 7 months ago

This may be related to Google's business model. Google's main businesses - search engine and advertising - both rely on an open web ecosystem. Therefore, Google has long maintained a friendly attitude toward open source and the open web, such as with Chromium, Noto fonts, Go, Flutter, and others. By providing infrastructure tools that benefit the open web, Google extends the reach of its searchable content and advertising. When the entire Web ecosystem benefits, Google ultimately benefits as well. This model also aligns with the philosophy of the open source community, where everyone is a beneficiary and naturally becomes a contributor.
0x008 - 7 months ago

All of the major labs have one thing in common: they have nearly unlimited data and money, but what they don’t have in unlimited supply is talent and ideas. It’s just a way of progressing without having to "hire every idea".
choonway - 7 months ago

If you don't allow them to publish research work, your greatest talents will leave.
I used to work in such a restrictive environment. Nobody worth their salt stayed long.
bobxmax - 7 months ago

It's worth noting that, while it was a noteworthy paper, nobody at the time really expected the Transformer to be the breakthrough it eventually became.
- nialv7 - 7 months ago
  
  and back then, in 2017, AI hadn't really been productized yet; the people behind the Transformer were researchers, and publishing their results was the norm.
- avodonosov - 7 months ago
  
  Transformer is just an example. We observe a constant stream of information shared by companies, even now, when "AI" is booming.
xwolfi - 7 months ago

Well, DeepSeek's survival also depends on the giant amount of hype they can generate, and they won't get more investor money just by being a one-hit wonder. Becoming deeply integrated into the AI ecosystem with various tools and innovative discoveries will most likely be more beneficial than protecting the secrets of their first success.
- WiSaGaN - 7 months ago
  
  Deepseek doesn't need hype to survive. They are bankrolled by their now billionaire founder.
runeks - 7 months ago

> Why did Google publish the Transformer architecture instead of keeping it to themselves?
Because they make their money from advertisements, not from their AI models. Same for Meta.
Compare that to, e.g., OpenAI, which is trying to make money from its AI models and is thus underbid by Google and Meta.
HH_GU - 7 months ago

As with the name DEEPSEEK, it's a commercial company whose investing is based on AI, but its founder has further goals, ones shared more broadly by humanity. Money is just a number for them; they want to do more, especially with DEEPSEEK.
victorbjorklund - 7 months ago

If Google had never published it (and we pretend it would not have leaked), then we would never have the LLMs we have today (including Google's). Everyone would lose.
buyucu - 7 months ago

Deepseek is not a commercial AI company. They are the hobby of a hedge fund, something they do on the side for fun and glory.

londons_explore - 7 months ago

"We have something that would be of interest to the open source community, but it needs a lot of tidying to even run outside our company, and we don't have the manpower to properly maintain it when released".

Plenty of companies are in this position.

Please just open source anyway with a note saying "we won't be maintaining this, but feel free to fork!"

lolinder - 7 months ago

Unfortunately that's not really feasible in the current state of open source. There are enormous numbers of entitled users out there who become a parasitic drain on any project that is open sourced. Solo maintainers can theoretically just develop a thick skin, but companies can actually find that the damage to their public image from not having their FOSS project in tip-top shape is greater than the benefits of open sourcing it in the first place.
- golergka - 7 months ago
  
  You can always just disable issues and all other feedback channels. I judge companies on the state of their open source libraries, but it really depends on how the company positions them. Facebook pushed for React to become the default framework, so they deserve the scrutiny when it doesn’t hold up. With a clear disclaimer like that, though? I think people would have different expectations.
  - lolinder - 7 months ago
    
    You might, but consumers of FOSS code are extremely unreasonable.
- boredatoms - 7 months ago
  
  Just put a tarball URL on a plain website, then. No community interaction is required to share code.
- 3abiton - 7 months ago
  
  > There are enormous numbers of entitled users out there who become a parasitic drain on any project that is open sourced.
  Been there with AOSP, but that won't be changing anytime soon. I highly doubt noobs will learn the open source etiquette unfortunately.

oldgun - 7 months ago

Nice. We've seen some good engineering work from DeepSeek. Keep it coming.

jimmydoe - 7 months ago

yes, before the USA figures out a way to tariff open source.
- fragmede - 7 months ago
  
  https://www.instagram.com/reel/DIVBmgUvFsN/

holoduke - 7 months ago

I wonder if the large-volume release of open source AI tools, models, etc. is a deliberate strategy by China to counter US dominance. A good thing for the market, imho.

jeffrallen - 7 months ago

What if it turns out DeepSeek is actually the first GenAI, and this is the way forward it has chosen: open sourcing itself?

Kind of like how biological information is always trying to find new places to reproduce itself. Viruses and fungi do not come with ToS and EULAs. :)

animal531 - 7 months ago

I spent the last two or so months using it as an assistant for code and my conclusion is that it is terrible compared to even the free model of ChatGPT.

The incidence of bugs, it not understanding what you're asking or just generating code that is straight up wrong is much worse. Even with guidance it will often be unable to fix issues, leaving you to do all the manual legwork to get things working. Usually you're better off just having done everything yourself from the start.

During those two months they really improved GPT as well: its generation speed is now much, much faster, and the quality of its output has become a lot better.

CrimpCity - 7 months ago

That’s interesting since this has been my exact opposite experience.
What type of coding are you doing? Did you locally roll your own coding assistant with a local model of DeepSeek or are you prompting via the web?
- tvshtr - 7 months ago
  
  Same for me, the reasoning model is really useful.

gizmodo59 - 7 months ago

As much as I want to geek out and run things locally, if I have the money I just want to use a SaaS. I want to spend time creating new applications, not toying around with setup, infrastructure, etc. I’d gladly pay ChatGPT even more if they keep up with features, and they seem to have done that quite well since DeepSeek (frequent new models, best image gen hands down, very fast inference compared to 6 months back, and even small things like memory).

I sometimes feel guilty, though. With all this power, I’m just limited by my lack of ideas and execution.

buyucu - 7 months ago

Deepseek is everything OpenAI claims to be.

wseqyrku - 7 months ago

DeepSeek is the only company that dares to say 'towards AGI' next to OpenAI.

rfoo - 7 months ago

tl;dr "we had our vLLM fork and it's unmaintainable now; guess we are going to rebuild it, in the public this time"

lukeschlather - 7 months ago

I get the impression their setup is very hard to maintain but worth every penny. They've done optimizations that wring incredible performance out of the hardware they have, but they also have specific machine configurations, and I wouldn't be surprised if they have complicated hacks that get 100% speedups for some stuff, where those speedups disappear if you have a slightly different motherboard configuration. There's also a suggestion that they've made firmware hacks, which are worth it at their scale but might be very dangerous and difficult to apply, especially at a small scale. (And some of their hacks might involve both firmware and cluster-level optimizations, which would be useless or counterproductive independently.)
And even if you have somewhat similar hardware, the code might not be that helpful, you might be better off with a sketch of the solution and implementing it yourself. If you've got a large enough cluster it's going to pay for itself anyway.
Havoc - 7 months ago

Unmaintainable seems unduly harsh. There is a big gap between maintainable internally and ready for public consumption.
- rfoo - 7 months ago
  
  > Codebase Divergence: Our engine is based on an early fork of vLLM from over a year ago
  If you are in the same boat, you'll see how much has changed in vLLM compared to a year ago. Also, this means they haven't rebased in over a year. I don't believe that's because they don't want to; it's because they effectively can't.
  Yeah, surely they can maintain it as-is. But it will be increasingly hard to port over anything the community has.
maknee - 7 months ago

They're going to spend time and effort making their optimizations public. Would you rather they kept their changes internal?

nashashmi - 7 months ago

I feel like this is one way to implement censorship.

sampton - 7 months ago

There's an ongoing debate whether LLM should be considered intelligent when it's just generating tokens from latent space. Meanwhile there are humans that are only capable of spitting out the same 5 tokens yet still considered to be "intelligent".
- xwolfi - 7 months ago
  
  hehehe