How the new Raspberry Pi AI Hat supercharges LLMs at the edge
blog.novusteck.com

This article is AI generated, and they didn't even fact-check it. An AI module like this can help a lot with processing for certain types of neural networks, but LLMs are not one of them.
LLM inference is basically bottlenecked by RAM bandwidth and RAM capacity. Every token generated requires iterating over the whole model, pulling the weights piece by piece from RAM to the CPU, where some relatively small calculations are applied.
Having a separate NPU like this connected via PCIe makes LLMs much slower, since you're then bottlenecked by a PCIe 3.0 x1 link (~1 GB/s) instead of your full memory bandwidth.
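To put rough numbers on that argument, here's a back-of-envelope bound: if every generated token has to stream all model weights once, then tokens/s is at most bandwidth divided by model size. The Pi 5 DRAM bandwidth and model-size figures below are my own ballpark assumptions, not from the article.

```python
# Back-of-envelope decode-speed bound: every generated token must stream
# (roughly) all model weights once, so tokens/s <= bandwidth / model_bytes.
# The bandwidth and model-size numbers are rough assumptions for illustration.

GB = 1e9

def max_tokens_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when weight streaming is the bottleneck."""
    return bandwidth_bytes_per_s / model_bytes

model = 2e9 * 0.5       # ~2B params at 4-bit quantization, ~1 GB of weights

pi5_dram = 17 * GB      # assumed Pi 5 LPDDR4X bandwidth (ballpark)
pcie3_x1 = 0.985 * GB   # PCIe 3.0 x1 usable bandwidth, ~1 GB/s

print(f"DRAM-bound: {max_tokens_per_s(model, pi5_dram):.1f} tok/s")
print(f"PCIe-bound: {max_tokens_per_s(model, pcie3_x1):.1f} tok/s")
```

Even with generous assumptions, the PCIe-attached path caps out at roughly a token per second, an order of magnitude below running the same model out of DRAM.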
I was expecting to see how they deploy it, the maximum model size, and tokens/s.
Answer: smol, and a fraction.
The 1B to 3B models should be runnable on a Raspberry Pi 5 with 8GB. That seems reasonable. Probably equivalent in performance to the new Apple Intelligence that runs locally on device, if you can use the same model.
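A quick sanity check on why 1B to 3B is about the ceiling for an 8GB board: resident size is roughly parameters times bytes per parameter, plus some overhead for the KV cache and runtime. The bytes-per-parameter and overhead figures below are assumptions for illustration, not measurements.

```python
# Rough memory-footprint check for running a small LLM on an 8GB Pi 5.
# Bytes/param and the overhead factor are assumed values for illustration.

def model_footprint_gb(n_params: float, bytes_per_param: float,
                       overhead: float = 1.2) -> float:
    """Approximate resident size: weights plus ~20% for KV cache/runtime."""
    return n_params * bytes_per_param * overhead / 1e9

for n in (1e9, 3e9, 8e9):
    q4 = model_footprint_gb(n, 0.5)   # 4-bit quantized
    f16 = model_footprint_gb(n, 2.0)  # fp16
    print(f"{n/1e9:.0f}B params: ~{q4:.1f} GB @ 4-bit, ~{f16:.1f} GB @ fp16")
```

Under these assumptions a 3B model fits comfortably quantized (~1.8 GB) and only barely at fp16 (~7.2 GB, before the OS takes its share), while an 8B fp16 model doesn't fit at all.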
You're joking, right? An RPi 5 isn't in the same league as Apple SoCs.
I think the new AI Hats are awesome, and the quoted 37M TOPS is amazing.
But this article is poor. The later part in particular, which lists the benefits of the AI accelerator, reads like it was written by ChatGPT: the tone is formal, it's wordy, and it repeats basic facts already covered earlier in the article.
"cloud" has always been vague. But "edge" has become so wishy-washy, at best meaning "not cloud, but sort of", I consider its journalistic use as incompetence.
"cloud" - somebody else's computer.
"edge" - it's like embedded, but with 5 layers of abstraction and abysmal performance.
hth.
Not to be disparaging, but this reads like it was itself written or padded out by an LLM.
My thoughts exactly. I am glad I am not the only one who recognizes the ChatGPT voice in the article.
What a crap article. The only actual information in it is that there are two models of the accelerator, their prices, and that one is 2x faster than the other. It gives the speed in TOPS, but that is useless since it doesn't say what the operations are.
Others mention the article itself looks AI generated. I didn't spot that, but it would explain some things.