Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI
fortune.com
I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek v3 seems to be attributable to MoE and FP8 training. DeepSeek R1's improvements are from GRPO-based RL.
Interesting to note - we have no idea how much R1 cost to train. To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
What is different about Deepseek's use of MoE vs all the other MoE models that makes training more efficient?
FP8 training and GRPO make sense to me, but that only gets you a 4x improvement total, right?
They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g. Llama 405B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement. FP8 should be about a ~2x improvement.
We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-r1, and we don’t have o1 numbers to compare against.
There’s definitely lots of misinformation spreading though. The $5.5m number refers to Deepseek-v3, not Deepseek-r1. I don't want to take away from HighFlyer's accomplishment, though. I think a lot of these innovations were forced to work around H800 networking limitations, and it's impressive what they've done.
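For what it's worth, here is a quick back-of-envelope sketch of how the rough multipliers quoted above (~5x for MoE, ~2x for FP8) would stack. It assumes the gains are independent and simply multiply, which is a simplification; the numbers are this thread's estimates, not measurements, and the function and key names are made up for illustration. GRPO is left out because no figure is available for it.

```python
# Back-of-envelope only: multipliers are rough estimates from this thread,
# not measured results. Assumes the gains are independent and multiply.

def combined_speedup(factors):
    """Multiply a set of per-technique efficiency factors together."""
    total = 1.0
    for factor in factors.values():
        total *= factor
    return total


# Hypothetical labels; values are the thread's guesses.
estimates = {
    "moe_vs_dense": 5.0,  # sparse MoE vs. a dense model of similar quality
    "fp8_vs_bf16": 2.0,   # FP8 training vs. 16-bit training
}

total = combined_speedup(estimates)
print(f"combined estimate: ~{total:.0f}x")
```

Under those assumptions the two factors alone come out around 10x, before counting whatever GRPO contributes.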
It's interesting that only having access to less powerful hardware motivated/necessitated more efficient training--like how tariffs can backfire if left in place too long.
What a time to be alive. Chinese companies were copying everything from the west, now it seems the opposite.
(annoying voice) "Hold on to your papers...!"
there's entire books on this phenomenon, it isn't at all unusual. happened to japan, korea, etc. moving up the value chain!
Can anyone explain why Meta's share price was untouched by the DeepSeek announcement? They have spent billions on AI infra.
According to this article they are rattled in some way...
OpenAI and others are valued for expected future revenue of running the models. And they were also valued as having magic "secret sauce" in their closed source models. Investors are now pulling back from this kind of company.
Deepseek is open source, and some of its distilled R1 models are based on Meta's open source Llama models. So Meta can easily run Deepseek on their pipeline.
The revenue model for both Meta and Deepseek is to apply the model to their business, not just sell it as a chatbot or API. That's why they publish it, they benefit from the community improvements and ironing out bugs.
My guess: they're somewhat uniquely positioned for the data. With 'the feeds' they're closer to a source/can withstand more. They plan to monetize another way.
I'm imagining four rooms of candlelight and collective reading of publications. "War room" is executive-speak for "Important/Urgent Panic" or "rearranging deck chairs on the Titanic"
Four war rooms to read a document; so Meta
This interpretation is heavily based on the journalist's choice of words, designed to create drama. If Meta can recreate this success in Llama, they just cut their power bill by 80+%. That deserves jumping on something immediately and not waiting for next half’s planning cycle.
Spun differently - Meta just reacted to take advantage of a new opportunity in just a couple of weeks. Completely reshooting an entire year's worth of work for dozens of engineers. That sounds… appropriate? For an announcement big enough to chop $600B off Nvidia's market cap.
Come to think of it, I wonder how much meta spends on AI power. 80% of that number could be a billion dollars.
They are still a social-media company, and make most of their money from there. AI is like the metaverse bets. And AI being cheaper to create might even be positive for them, if they can figure out a use case.
They make all their money on ads in FB and IG. It's how their stock barely budged despite losing $30b on a VR ghost town.
They are the users of AI, not sellers of AI. Better and cheaper AI would benefit them, no matter who trained it.
i think it's because openai makes a bunch of money off "AI stuff" by being regarded as the best at this game... and guess what, there's a new player that makes "AI stuff" as good as them (or possibly better) and maybe even cheaper. this could be a threat to their source of revenue.
Meta on the other hand makes money off whatsapp, facebook, instagram and threads. for meta an additional provider of "AI stuff" is not a threat to their source of revenue.
Expensive models are AI companies' core business.
Meta can use cheap models to enhance core business.
Meta has been aware of DeepSeek for a long time (as Zuckerberg mentioned the company by name in his podcast with Joe Rogan) and a “war room” is just a meeting room.
My experience is that a "War Room" is just a meeting room, but one where 1. engineers are rounded up to work in (because as we all know, developers type code faster when co-located in a single room under pressure), and 2. where panicked executives occasionally wander in to say things like "How are things going?" and "What's the current status?" and "Do you have an ETA for when we can stop panicking?"
> Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI
"Gentlemen, you can't fight in the war room."