How the new Raspberry Pi AI Hat supercharges LLMs at the edge
blog.novusteck.com

This article is AI generated, and they didn't even fact-check it. An AI module like this can help a lot with processing for certain types of neural networks, but LLMs are not one of them.
LLM inference is basically bottlenecked by RAM bandwidth and RAM capacity. Every token generated requires iterating over the whole model, pulling the weights piece by piece from RAM to the CPU, where some relatively small calculations are applied.
Having a separate NPU like this connected via PCIe makes LLMs much slower, since you're then bottlenecked by a PCIe 3.0 x1 link (~1 GB/s) instead of your full memory bandwidth.
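To put rough numbers on that argument, here's a back-of-envelope bound: if every generated token has to stream all model weights once, then tokens/s is at most bandwidth divided by model size. The Pi 5 DRAM bandwidth and model-size figures below are my own ballpark assumptions, not from the article.

```python
# Back-of-envelope decode-speed bound: every generated token must stream
# (roughly) all model weights once, so tokens/s <= bandwidth / model_bytes.
# The bandwidth and model-size numbers are rough assumptions for illustration.

GB = 1e9

def max_tokens_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when weight streaming is the bottleneck."""
    return bandwidth_bytes_per_s / model_bytes

model = 2e9 * 0.5       # ~2B params at 4-bit quantization, ~1 GB of weights

pi5_dram = 17 * GB      # assumed Pi 5 LPDDR4X bandwidth (ballpark)
pcie3_x1 = 0.985 * GB   # PCIe 3.0 x1 usable bandwidth, ~1 GB/s

print(f"DRAM-bound: {max_tokens_per_s(model, pi5_dram):.1f} tok/s")
print(f"PCIe-bound: {max_tokens_per_s(model, pcie3_x1):.1f} tok/s")
```

Even with generous assumptions, the PCIe-attached path caps out at roughly a token per second, an order of magnitude below running the same model out of DRAM.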
I was expecting to see how they deploy it, the maximum model size, and tokens/s.
Answer: smol, and a fraction.
The 1B to 3B models should be runnable on a Raspberry Pi 5 with 8GB. That seems reasonable. Probably equivalent in performance to the new Apple Intelligence that runs locally on device, if you can use the same model.
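A quick sanity check on why 1B to 3B is about the ceiling for an 8GB board: resident size is roughly parameters times bytes per parameter, plus some overhead for the KV cache and runtime. The bytes-per-parameter and overhead figures below are assumptions for illustration, not measurements.

```python
# Rough memory-footprint check for running a small LLM on an 8GB Pi 5.
# Bytes/param and the overhead factor are assumed values for illustration.

def model_footprint_gb(n_params: float, bytes_per_param: float,
                       overhead: float = 1.2) -> float:
    """Approximate resident size: weights plus ~20% for KV cache/runtime."""
    return n_params * bytes_per_param * overhead / 1e9

for n in (1e9, 3e9, 8e9):
    q4 = model_footprint_gb(n, 0.5)   # 4-bit quantized
    f16 = model_footprint_gb(n, 2.0)  # fp16
    print(f"{n/1e9:.0f}B params: ~{q4:.1f} GB @ 4-bit, ~{f16:.1f} GB @ fp16")
```

Under these assumptions a 3B model fits comfortably quantized (~1.8 GB) and only barely at fp16 (~7.2 GB, before the OS takes its share), while an 8B fp16 model doesn't fit at all.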
You're joking, right? An RPi 5 isn't in the same league as Apple SoCs.
I think the new AI Hats are awesome, and the quoted 37M TOPS is amazing.
But this article is poor. The later part in particular, which lists the benefits of the AI accelerator, reads like it was written by ChatGPT: the tone is formal, it's wordy, and it repeats basic facts already covered earlier in the article.
"cloud" has always been vague. But "edge" has become so wishy-washy, at best meaning "not cloud, but sort of", I consider its journalistic use as incompetence.
"cloud" - somebody else's computer.
"edge" - it's like embedded, but with 5 layers of abstraction and abysmal performance.
hth.
Not to be disparaging, but this reads like it was itself written or padded out by an LLM.
My thoughts exactly. I am glad I am not the only one who recognizes the ChatGPT voice in the article.
What a crap article. The only actual information in it is that there are two models of the accelerator, their prices, and that one is 2x faster than the other. It gives the speed in TOPS, but that is useless since it doesn't say what the operations are.
Others mention the article itself looks AI generated. I didn't spot that, but it would explain some things.