How the new Raspberry Pi AI Hat supercharges LLMs at the edge

blog.novusteck.com

9 points by novusteck 2 years ago · 11 comments


Chordless 2 years ago

This article is AI generated, and they didn't even fact-check it. An AI module like this can help a lot with certain types of neural networks, but LLMs are not one of them.

LLM inference is basically bottlenecked by RAM bandwidth and how much RAM you have. Generating each token requires streaming the entire model, piece by piece, from RAM to the CPU, where relatively small calculations are applied to each piece.

Having a separate NPU like this connected via PCIe makes LLMs much slower, since you're bottlenecked by a PCIe 3.0 x1 link instead of your full memory bandwidth.
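The bandwidth argument above can be sketched as a back-of-the-envelope estimate. The bandwidth and model-size numbers below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: memory-bandwidth-bound decode speed.
# Each generated token streams (roughly) every model weight once,
# so tokens/s is at most usable bandwidth / model size in bytes.

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed for a bandwidth-bound LLM."""
    return bandwidth_gb_s / model_size_gb

pi5_bw = 17.0        # assumed: Pi 5 LPDDR4X, roughly ~17 GB/s theoretical
pcie3_x1_bw = 0.985  # assumed: PCIe 3.0 x1 usable throughput, ~1 GB/s

model = 4.0  # assumed: a ~7B model at 4-bit quantization, about 4 GB

print(f"From system RAM: {tokens_per_second(pi5_bw, model):.2f} tok/s")
print(f"Over PCIe 3 x1:  {tokens_per_second(pcie3_x1_bw, model):.2f} tok/s")
```

Under these assumptions the PCIe link caps generation more than an order of magnitude below what the Pi's own RAM allows, which is the commenter's point.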

visarga 2 years ago

I was expecting to see how they deploy it, the maximum model size, and the tokens/s.

  • telgareith 2 years ago

    Answer: smol, and a fraction.

    • bhouston 2 years ago

      The 1B to 3B models should be runnable on a Raspberry Pi 5 with 8GB. That seems reasonable. Probably equivalent in performance to the new Apple Intelligence models that run locally on device, if you can use the same model.
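The claim that 1B to 3B models fit in 8 GB can be checked with rough arithmetic. The parameter counts and bit widths below are illustrative assumptions; real runtimes also need extra headroom for the KV cache and the OS:

```python
# Rough weight footprint of a quantized model:
# bytes ~= params * bits_per_weight / 8  (ignoring runtime overhead).

def model_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (1, 3, 8):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: {model_gb(params, bits):.1f} GB")
```

By this estimate a 3B model at 4-bit is around 1.5 GB, comfortably inside 8 GB, while an 8B model at 16-bit (~16 GB) would not fit at all.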

bhouston 2 years ago

I think the new AI hats are awesome, and the 37M TOPS figure is amazing.

But this article is poor. Especially the later part, which lists the benefits of the AI accelerator, reads like it was written by ChatGPT: it has a formal tone, it is wordy, and it repeats basic facts already covered earlier in the article.

Eduard 2 years ago

"cloud" has always been vague. But "edge" has become so wishy-washy, at best meaning "not cloud, but sort of", that I consider its journalistic use a sign of incompetence.

  • exe34 2 years ago

    "cloud" - somebody else's computer.

    "edge" - it's like embedded, but with 5 layers of abstraction and abysmal performance.

    hth.

Springtime 2 years ago

Not to be disparaging, but this itself reads like it was written or padded out by an LLM.

  • bhouston 2 years ago

    My thoughts exactly. I am glad I am not the only one who recognizes the ChatGPT voice in the article.

throwaway81523 2 years ago

What a crap article. The only actual info in it is that there are two models of the accelerator, their prices, and that one is 2x faster. It gives the speed in TOPS, but that is useless since it doesn't say what the operations are.

Others mention the article itself looks AI generated. I didn't spot that, but it would explain some things.
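The point about TOPS being meaningless without the operation type can be illustrated with a small sketch. Every figure and ratio below is an assumption for illustration, not taken from any datasheet:

```python
# "X TOPS" depends entirely on which operation is counted: vendors
# usually quote INT8 (or even INT4) throughput, which on the same
# silicon can be a large multiple of the FP16 figure.

int8_tops = 26.0   # assumed headline INT8 figure for some NPU
fp16_ratio = 0.5   # assumed: FP16 at half the INT8 rate
int4_ratio = 2.0   # assumed: INT4 at double the INT8 rate

print(f"INT8: {int8_tops:.0f} TOPS")
print(f"FP16: {int8_tops * fp16_ratio:.0f} TOPS")
print(f"INT4: {int8_tops * int4_ratio:.0f} TOPS")
```

The same chip can thus be marketed with a 4x spread in "TOPS" depending on the precision chosen, which is why the number alone tells you little.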
