Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI

27 points by zathan a year ago · 20 comments


ahzhou a year ago

I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of DeepSeek v3's cost efficiency seems to be attributable to MoE and FP8 training. DeepSeek R1's improvements come from GRPO-based RL.

Interesting to note: we have no idea how much R1 cost to train. To speculate, maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
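For readers unfamiliar with GRPO (Group Relative Policy Optimization, introduced in the DeepSeekMath paper), its central trick is replacing PPO's learned value network with a group baseline: sample several completions for the same prompt, score them, and normalize each reward against that group's mean and standard deviation. The minimal sketch below computes only those advantages. The function name, the epsilon, and the use of the population standard deviation are assumptions (implementations differ), and the clipped policy-gradient update and KL penalty that consume these advantages are omitted.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Sketch of GRPO-style advantages for one group of sampled completions.

    Each completion's advantage is (r_i - mean(r)) / std(r), computed within
    the group, so no separate critic/value network is needed.
    Assumption: population std (pstdev); some implementations use sample std.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion got the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary correctness rewards for four answers to one prompt
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Two correct and two incorrect answers to the same prompt get advantages of about +1 and -1. If every sample earns the same reward, all advantages are 0 and that prompt contributes no reward-driven update.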

pptr a year ago

What is different about Deepseek's use of MoE vs all the other MoE models that makes training more efficient?
FP8 training and GRPO make sense to me, but that only gets you a 4x improvement total, right?
- ahzhou a year ago
  
  They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g., Llama 405B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement. FP8 should be roughly a ~2x improvement.
  We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-r1, and we don’t have o1 numbers to compare against.
  There’s definitely lots of misinformation spreading, though. The $5.5m number refers to Deepseek-v3, not Deepseek-r1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced by the need to work around H800 networking limitations, and it's impressive what they've done.
  [1] https://arxiv.org/abs/2401.06066
  - karmakaze a year ago
    
    It's interesting that only having access to less powerful hardware motivated (or necessitated) more efficient training, much like how tariffs can backfire if left in place too long.
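To make the dense-vs-MoE point in this exchange concrete, here is a minimal NumPy sketch of generic top-k expert routing with one always-on shared expert, loosely in the spirit of the shared-expert idea in the DeepSeekMoE paper [1]. It is not DeepSeek's code: the function name `moe_layer`, the sizes, and the softmax-over-selected-experts gating are illustrative assumptions, and it omits load balancing, expert capacity limits, and batched execution. What it shows is that each token pays for only `k` routed experts plus the shared one, however many experts the layer stores.

```python
import numpy as np


def moe_layer(x, gate_w, experts, shared_expert, k=2):
    """Hypothetical top-k MoE layer with one shared expert.

    x: (tokens, d) activations; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) matrices; shared_expert: (d, d) matrix.
    Returns the output and the number of expert matmuls actually run.
    """
    logits = x @ gate_w                          # router scores, (tokens, n_experts)
    topk = np.argsort(logits, axis=1)[:, -k:]    # k highest-scoring experts per token
    out = x @ shared_expert                      # shared expert processes every token
    calls = x.shape[0]                           # one shared-expert matmul per token
    for t in range(x.shape[0]):
        sel = topk[t]
        # Assumption: softmax over the selected experts' logits only
        w = np.exp(logits[t, sel] - logits[t, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[t] += weight * (x[t] @ experts[e])
            calls += 1
    return out, calls


rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 16, 4
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
shared = rng.standard_normal((d, d))
out, calls = moe_layer(x, gate_w, experts, shared, k=2)
print(out.shape, calls)
```

With 16 stored experts and `k=2`, each token runs 3 expert matmuls instead of 16, which is where an MoE's per-token FLOP savings come from. The actual ratio for any real model depends on its expert sizes and routing configuration.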

marjann a year ago

What a time to be alive. Chinese companies used to copy everything from the West; now it seems to be the opposite.

tetris11 a year ago

(annoying voice) "Hold on to your papers...!"
floydnoel a year ago

there are entire books on this phenomenon; it isn't at all unusual. it happened to Japan, Korea, etc., moving up the value chain!

bamboozled a year ago

Can anyone explain why Meta's share price was untouched by the DeepSeek announcement? They have spent billions on AI infrastructure.

According to this article, they are rattled in some way...

alecco a year ago

OpenAI and others are valued for expected future revenue of running the models. And they were also valued as having magic "secret sauce" in their closed source models. Investors are now pulling back from this kind of company.
Deepseek is open source and based on Meta's open source Llama models. So Meta can easily run Deepseek on their pipeline.
The revenue model for both Meta and Deepseek is to apply the model to their business, not just to sell it as a chatbot or API. That's why they publish it: they benefit from community improvements and from the community ironing out bugs.
bravetraveler a year ago

My guess: they're somewhat uniquely positioned for the data. With 'the feeds' they're closer to a source/can withstand more. They plan to monetize another way.
I'm imagining four rooms of candlelight and collective reading of publications. "War room" is executive-speak for "Important/Urgent Panic" or "rearranging deck chairs on the Titanic".
Four war rooms to read a document; so Meta.
edmundsauto a year ago

This interpretation is heavily based on the journalist's choice of words, which is designed to create drama. If Meta can recreate this success in Llama, they just cut their power bill by 80+%. That deserves jumping on immediately rather than waiting for next half's planning cycle.
Spun differently: Meta reacted to take advantage of a new opportunity within just a couple of weeks, completely reshuffling an entire year's worth of work for dozens of engineers. That sounds… appropriate? For an announcement big enough to chop $600B off Nvidia's market cap.
Come to think of it, I wonder how much Meta spends on AI power. 80% of that number could be a billion dollars.
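Working backward from that last sentence: if an 80% cut in Meta's AI power spend S were worth $1 billion, the spend itself would have to be about $1.25 billion. This is only arithmetic on the commenter's hypothetical; no actual Meta figure is assumed.

```latex
0.8\,S = \$1\ \text{billion} \quad\Longrightarrow\quad S = \frac{\$1\ \text{billion}}{0.8} = \$1.25\ \text{billion}
```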
Ekaros a year ago

They are still a social media company, and they make most of their money there. AI is like their metaverse bets. And AI being cheaper to create might even be positive for them, if they can figure out a use case.
rchaud a year ago

They make all their money on ads in FB and IG. It's how their stock barely budged despite losing $30b on a VR ghost town.
YetAnotherNick a year ago

They are the users of AI, not sellers of AI. Better and cheaper AI would benefit them, no matter who trained it.
znpy a year ago

i think it's because OpenAI makes a bunch of money off "AI stuff" by being regarded as the best at this game... and guess what, there's a new player that makes "AI stuff" as good as theirs (or possibly better) and maybe even cheaper. This could be a threat to their source of revenue.
Meta, on the other hand, makes money off WhatsApp, Facebook, Instagram and Threads. For Meta, an additional provider of "AI stuff" is not a threat to its source of revenue.
maxglute a year ago

Expensive models are AI companies' core business.
Meta can use cheap models to enhance core business.

OfCounsel a year ago

Meta has been aware of DeepSeek for a long time (Zuckerberg mentioned the company by name in his podcast with Joe Rogan), and a “war room” is just a meeting room.

ryandrake a year ago

My experience is that a "War Room" is just a meeting room, but one where 1. engineers are rounded up to work in (because as we all know, developers type code faster when co-located in a single room under pressure), and 2. where panicked executives occasionally wander in to say things like "How are things going?" and "What's the current status?" and "Do you have an ETA for when we can stop panicking?"

hulitu a year ago

> Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI

"Gentlemen, you can't fight in the war room."
