Ask HN: Good minimal hardware to run LLMs and general NNs
Premise: I have been mostly out of touch with the news for the past six months.
What is the current state of portable, minimal hardware (not assembled desktop boxes with extra cards) for running LLMs and contemporary NNs in general (Text-to-Image, Speech-to-Text, etc.) at decent speed and with an eye to power efficiency?
I have read about boards with shared memory, SoCs with NPUs, and mini-PCs from leading brands (though those did not run standard Linux). I do not know if any good solutions are on the market, or if we are still waiting for a leap from future contenders.
Edit: not just "power efficiency" but overall cost; the overall "dollar-per-token" value would be interesting.
I am of course more interested in the experience of the community than in rumors. What I have found so far: for power efficiency, move memory as little as possible. So, for flexible power-efficient hardware, use UMA. But UMA (Apple M-series, Strix Halo / Ryzen AI) does not seem comparatively cheap at this stage. Or, for less flexibility, use a GPU with enough VRAM. For smaller solutions in the ARM realm: general-purpose SoCs are neither fast nor efficient, whereas edge mobile devices are promising but not cheap. It becomes a matter of numeric comparison really - there are compromises and no clear winners.
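To make that numeric comparison concrete: since batch-1 LLM decoding is roughly memory-bandwidth-bound (each generated token reads essentially all the weights once), tokens/s on any of these devices can be upper-bounded by bandwidth divided by model size, and "dollar per token" follows from amortizing hardware plus electricity. A rough sketch, where all the figures (an ~5 GB quantized 8B model, a ~120 GB/s UMA board, 40 W draw, 3 years of 8 h/day use) are illustrative assumptions, not measurements:

```python
# Back-of-envelope LLM hardware comparison.
# Assumption: batch-1 decode is memory-bandwidth-bound, so tokens/s is at
# most (memory bandwidth) / (bytes read per token ~= model size on disk).

def tokens_per_sec(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound on decode speed: every token streams all weights once."""
    return bandwidth_gbps / model_gb

def dollars_per_million_tokens(hw_cost: float, watts: float,
                               kwh_price: float, tps: float,
                               lifetime_hours: float) -> float:
    """Amortize hardware cost plus electricity over tokens generated."""
    total_tokens = tps * 3600 * lifetime_hours
    energy_cost = (watts / 1000) * lifetime_hours * kwh_price
    return (hw_cost + energy_cost) / total_tokens * 1e6

# Hypothetical example: ~5 GB Q4 model on an $800, 40 W board with
# ~120 GB/s unified memory, used 8 h/day for 3 years.
tps = tokens_per_sec(120, 5)                      # ~24 tok/s ceiling
cost = dollars_per_million_tokens(hw_cost=800, watts=40,
                                  kwh_price=0.30, tps=tps,
                                  lifetime_hours=3 * 365 * 8)
print(f"{tps:.0f} tok/s ceiling, ${cost:.2f} per 1M tokens")
```

Plugging in a discrete GPU (higher bandwidth, higher price and wattage) versus a cheap SoC (the reverse) into the same two functions makes the compromises visible as plain numbers.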