Groq surpasses 1,200 tokens/sec with Llama 3 8B

43 points by YourCupOTea 2 years ago · 32 comments

LorenDB 2 years ago

Groq is an insane company. SambaNova (discussed yesterday[0]) is also very promising. However, what I really want to see is local AI accelerator chips à la Tenstorrent Grayskull that can boost local generation to hundreds of tokens per second while being more efficient than GPUs.

[0]: https://news.ycombinator.com/item?id=40508797

frozenport 2 years ago

Samba is on gen 4 silicon and still lagging; somebody over there is doing something wrong.
- snhbsqub 2 years ago
  
  How are they lagging? They are running faster than anyone else at full precision and with many, many fewer chips than Groq. Groq is not real.
  - frozenport 2 years ago
    
    Well, I've been using the Groq public API, and it gets approximately the claimed rates.
    Economics and costs are hard to predict. For example, Groq is not using HBM chips, so the cards are probably a lot easier to source.
    It's not clear what the capacity of these systems is in terms of total users, or even tokens per second. Then you factor in cost. Then you realize all vendors will match a competitor's pricing. Then you realize Groq doesn't sell chips.
    ¯\_(ツ)_/¯
    The only thing you have is the public API to benchmark against: https://artificialanalysis.ai/
    
    snhbsqub 2 years ago
    
    - Groq has exactly 0 dollars in revenue
    - Groq requires 576 chips to run a single model
    - Groq can do low-latency inference, but can't handle batches, and can't run a diversity of different models on each deployment
    - Groq quantizes the models, significantly affecting quality to get more speed (and doesn't communicate this to end users, which is very deceptive)
    - Groq can only run inference; you cannot train on their systems
    - SambaNova has real revenue from big customers
    - SambaNova can run any model on a single node at the speed Groq requires
    - SambaNova can do low-latency inference just like Groq, but can also run large batches and host hundreds of models on a single deployment
    - SambaNova does not quantize models unless explicitly stated
    - SambaNova can run training at perf competitive with Nvidia, as well as the fastest inference in the world at full precision
    It really isn't a competition. Groq has done great at garnering hype in recent months, but it is a house of cards.
    
    frozenport 2 years ago
    
    I think SemiAnalysis commented that they have pipelines instead of batches[1].
    So every clock cycle you're doing useful work rather than loading people up into batches. And that's why the architecture will probably win for inference. For training you're basically competing on the software ecosystem and silicon density, AKA NVIDIA can give TSMC more money to get more ALUs on the die.
    I think other places have attempted dataflow (FPGAs, etc.), but they all basically had buffers (due to non-determinism in the network stack and even RAM). SambaNova seems indistinguishable from an FPGA with a few clock cycles of difference. I think they blew their shot with a Series D ($600 million???) where they made more of the same old. Maybe Intel will buy them to augment Altera? Looks like chasing parity with existing strategies.
    I buy the Groq hype because it's something different; certainly the public demo helped. HN is about the future.
    [1] https://www.semianalysis.com/p/groq-inference-tokenomics-spe...

windowshopping 2 years ago

Is groq related to Twitter's grok or is that just a very unfortunate naming coincidence?

porphyra 2 years ago

Unrelated --- groq wrote an angry blog post complaining about Elon's xAI's grok: https://wow.groq.com/hey-elon-its-time-to-cease-de-grok/
- meesles 2 years ago
  
  Not that angry! I appreciate the tone, though they deserve every right to protect their trademark.
  - flutas 2 years ago
    
    > though they deserve every right to protect their trademark.
    And they can, Twitter (why everything gets claimed as his personal work I never know) isn't using their trademark.
    As they (Groq) themselves have said...
    > the difference of one consonant (q, k) only matters to scrabblers and spell checkers
    Grok the term has been around since at least 1961.[0] The fact that a company decided to take a common term (especially in the CS field), change one letter and trademark it doesn't mean nobody can use the original spelling at all.
    Funnily enough, Groq is trying to claim grok and groq are not associated terms in court filings while trying to bully another company with the same name:
    > The word “Groq” essentially did not exist before Ross created it and has no known meaning in any language beyond its intended association with Groq, Inc.
    vs. that company's reply
    > The word “grok” originated in Robert Heinlein’s 1961 novel Stranger in a Strange Land. Merriam Webster defines “grok” as “to understand profoundly and intuitively.” The Oxford English Dictionary defines “grok” as “[t]o understand intuitively or by empathy; to establish rapport with.”
    Once Groq realized their trademark didn't include healthcare data, they tried to trademark...the other companies name.
    [0]: https://en.wikipedia.org/wiki/Grok
    
    verdverm 2 years ago
    
    Trademarks are more nuanced than you are relaying here.
    Groq argues (at the USPTO) that their mark is different from "grok" because one cannot trademark common words. They are applying for plain marks (without font/color/logo), and this is very normal. I went through this with a proper-name trademark.
    In the Groq vs. Grok dispute, they are arguing that the average person will confuse the marks (as can be seen in many HN posts about Groq, like this one). Their argument is that Grok should not be given a trademark in the first place due to this potential confusion. They can also take the case to court should the trademark be granted. Given the common confusion, Groq appears to have good standing to make this argument.
    To call someone defending their own trademarks "bullying" is inaccurate.
    
    flutas 2 years ago
    
    > Their argument is that Grok should not be given a trademark beforehand due to this potential confusion.
    Groq says no such thing. Their two public actions so far are:
    1) a company that rebranded to Groq Healthcare less than 2 years after Groq launched (Groq's trademark at the time had nothing to do with health; they then added it to their trademark and tried to trademark the competitor's name)
    2) a C&D to Twitter over the name
spiderfarmer 2 years ago

I think groq has more users and a better business model.
- snhbsqub 2 years ago
  
  Groq makes zero revenue, needs hundreds of chips to run 1 model, and runs everything at lower precision. SambaNova has a lot of revenue, and runs at that speed at full precision on a single node. It really isn't a competition.
verdverm 2 years ago

Different companies and efforts
Groq - mainly hardware, the LPU (https://wow.groq.com/lpu-inference-engine/)
Grok - Elon's AI endeavor du jour
Me1000 2 years ago

Completely unrelated.
lxgr 2 years ago

They seem to be unrelated, but sharing an etymology: https://arxiv.org/abs/2201.02177
- lxgr 2 years ago
  
  Classic HN – downvotes without explanation.
  I might well be wrong about the etymology here, but I understand "grokking" to be a term for a phenomenon in training neural networks.
  What I'm not sure about is which was there first – AI companies called some version of "grok" or that term.
  - flutas 2 years ago
    
    The term grok came from Robert Heinlein’s 1961 novel Stranger in a Strange Land and got picked up by the CS field heavily around the late 60s.
    https://en.wikipedia.org/wiki/Grok
    
    lxgr 2 years ago
    
    I do know that meaning of "grok", but I always assumed the more specific one in the context of neural networks was what informed these two naming choices, although I really don't know the exact timeline.
    Didn't know about Heinlein coining it though, that's cool!
    
    fieryscribe 2 years ago
    
    Unrelated, but you just reminded me of an old blog: Groklaw. I can't believe it's been over 10 years since it was active.
  - verdverm 2 years ago
    
    From the HN commenting guidelines
    > Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
    
    lxgr 2 years ago
    
    I'm well aware of the "complaints about downvotes beget downvotes" meme and was expecting it here, but sometimes I am genuinely curious about the nature of the disagreement. Here I really just wanted to learn what people think the actual etymology is. I get and appreciate "I don't find this contribution helpful", but I really dislike a "I think you're factually wrong but can't be bothered to correct you" downvote.
    As an aside, I wonder when "please don't make a quote from the HN commenting guidelines the only contribution of your comment" will join that list...
    
    verdverm 2 years ago
    
    > As an aside, I wonder when "please don't make a quote from the HN commenting guidelines the only contribution of your comment" will join that list...
    HN is largely community-driven moderation, helping dang do his job, so I suspect this meta "don't" wouldn't make it.
    I didn't comment on the substance because others already had by then, not sure why they didn't prefer your OG comment...

andy_xor_andrew 2 years ago

When reading Hacker News you develop a signal/noise filter, where lots of headlines make bold claims but you filter them out as embellishment or exaggeration.

My bullshit detector went off when I first saw Groq posted on HN - a startup is making their own chips (doubt) that performs faster than anything Nvidia has for inference (doubt) and accelerates LLMs to hundreds/thousands of tokens per second?? Mega doubt.

But... then I tried their demo, and... yeah, it's that good. Such an amazing company of talented individuals.

saberience 2 years ago

The issue is that their chips need a huge number of server blades, and there's real doubt about whether this model actually scales. That is, how will Groq handle much larger models with a context of hundreds of thousands or millions of tokens? Right now this would require them to deploy a cluster with thousands of chips, versus, say, 10 chips for an Nvidia system.
The other issue they don't mention is power, space, efficiency etc. We want to run larger models with less power, fewer server blades, at lower cost. Not use more server blades, more chips, more power, etc.
- verdverm 2 years ago
  
  Cerebras faces similar challenges with its wafer-scale chips.
  If anything, Google's TPU advancements chart a viable course. I suspect both Groq and Cerebras will overcome the challenges and offer competitive compute options, depending on the context.
  - snhbsqub 2 years ago
    
    SambaNova is the only one of the chip startups that is viable. It surprises me that people don't see this.
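To put rough numbers on saberience's point about long contexts: the key/value cache a transformer keeps per sequence grows linearly with context length, so an SRAM-only design needs more chips as the context grows. The sketch below estimates this for the Llama 3 8B shape (32 layers, 8 key/value heads of dimension 128, 16-bit values). The 230 MB of on-chip SRAM per chip and the ~8e9 parameter count are assumptions for illustration, the helper names are hypothetical, and the estimate ignores activations, replication, quantization, and however a real deployment actually shards the model.

```python
# Back-of-envelope memory arithmetic; all names here are hypothetical helpers.

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int) -> int:
    # Keys and values (factor 2) for every layer and every KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def chips_needed(total_bytes: int, sram_bytes_per_chip: int) -> int:
    # Ceiling division: a partially filled chip still counts as a whole chip.
    return -(-total_bytes // sram_bytes_per_chip)


# Llama 3 8B shape (grouped-query attention with 8 KV heads), 16-bit values.
PER_TOKEN = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128,
                                     bytes_per_value=2)
# ASSUMPTION: 230 MB of on-chip SRAM per accelerator chip, no HBM/DRAM.
SRAM_PER_CHIP = 230 * 10**6
# ASSUMPTION: ~8e9 parameters stored at 2 bytes each.
WEIGHT_BYTES = 8 * 10**9 * 2

if __name__ == "__main__":
    print("KV cache per token:", PER_TOKEN, "bytes")
    print("chips for weights alone:", chips_needed(WEIGHT_BYTES, SRAM_PER_CHIP))
    for ctx in (8_192, 128_000, 1_000_000):
        kv = ctx * PER_TOKEN
        print(f"context {ctx:>9,}: KV cache {kv / 2**30:6.1f} GiB, "
              f"~{chips_needed(kv, SRAM_PER_CHIP)} chips for KV cache alone")
```

Under these assumptions it reports 131,072 bytes (128 KiB) of KV cache per token, about 70 chips for the 16-bit weights, and roughly 5, 73, and 570 additional chips for a single 8K, 128K, or 1M-token sequence. Serving many users at once multiplies the KV-cache term.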
frozenport 2 years ago

8 year old unicorn++ with a public demo sounds credible?

behnamoh 2 years ago

They're not responsive to my questions on Twitter, so I'm asking here:

    When will Groq support a real API (not experimental beta preview)?

    When will Groq support logprobs?!

    When will Groq actually tell us what their rate limit is?!

Until these are answered, many of us can't actually build on Groq.

Edit: It seems I'm getting downvoted by Groq employees...

porphyra 2 years ago

Try asking in the groq discord [0]. Some groq employees are fairly responsive there.
For groqcloud the rate limits are fairly clear [1]. For example, for llama3-8b-8192 you get 30 requests per minute, 14400 per day, and 30000 tokens per minute. That said, it's the beta free tier so it sometimes goes down randomly and the limits may be different once they start charging for it.
I'm not affiliated with groq but I use groqcloud to make some simple chatbots since it's currently free.
[0] https://discord.com/invite/n8KtCjfAug
[1] https://console.groq.com/settings/limits
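Since the free tier enforces several limits at once, a client can track them itself before sending a request. Below is a minimal sketch of a client-side sliding-window limiter, assuming the llama3-8b-8192 numbers quoted above (30 requests/minute, 14,400 requests/day, 30,000 tokens/minute) and assuming the caller can estimate a request's token count up front. The class is hypothetical and not part of any Groq SDK; the server's actual limits and token accounting may differ, so the limits page remains the source of truth.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter for requests/minute, requests/day and tokens/minute."""

    def __init__(self, rpm=30, rpd=14_400, tpm=30_000, clock=time.monotonic):
        # Defaults ASSUME the llama3-8b-8192 free-tier figures quoted in the thread.
        self.rpm, self.rpd, self.tpm = rpm, rpd, tpm
        self.clock = clock
        self.minute = deque()  # (timestamp, tokens) for requests in the last 60 s
        self.day = deque()     # timestamps for requests in the last 86,400 s

    def _evict(self, now):
        # Drop entries that have fallen out of each window.
        while self.minute and now - self.minute[0][0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()

    def try_acquire(self, tokens):
        """Record the request and return True if it fits all three limits, else False."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.minute)
        if (len(self.minute) >= self.rpm
                or len(self.day) >= self.rpd
                or used_tokens + tokens > self.tpm):
            return False
        self.minute.append((now, tokens))
        self.day.append(now)
        return True
```

A caller that gets `False` back can sleep and retry. Passing a fake `clock` makes the behavior easy to test without waiting a real minute.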
