Ask HN: Is it feasible to run a model on device for complete privacy?

3 points by mazinz 10 days ago · 7 comments · 1 min read


Tried Gemma, Qwen and a few others. I need vision and larger context windows for an application I am working on. Results were quite poor. Gemma 4E2B was probably the best of the ones I tried, but it still fell apart and kept hallucinating at ~5000 tokens. Cloud-based models had no problems; even Gemini 3.1 Flash-Lite and GPT-5.4 mini do a lot better and are way faster.

mc7alazoun 10 days ago

Feasible but too expensive! I get that privacy is a priority for you, but unfortunately if you want quality models you'd probably still have to use frontier closed models.

  • mazinzOP 10 days ago

    No open source model that’s any good?

    • vitalyan1234 10 days ago

      the Gemma you tried is tiny; there are 31B and 26B (A4B) variants. there's also Qwen 3.6 with 27B and 35B (A3B) variants, reportedly pretty good. try them on OpenRouter or something. these require 30-40 GB of memory to run, split between RAM and VRAM, less if quantized further than near-lossless 8-bit.

      there are near-SOTA open models, but they are 1T+ parameters, i.e. at 8 bits per weight they require over a terabyte of memory to run.
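
The memory figures in the reply above follow from simple arithmetic: weights take roughly (parameter count) x (bits per weight) / 8 bytes. Here is a rough sketch of that estimate in Python. The function name `weight_memory_gb` is made up for illustration, and the estimate covers weights only; the KV cache, activations, and runtime overhead come on top and grow with context length, so treat the results as a lower bound.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Assumption: every parameter is stored at the same bit width.
    KV cache, activations, and runtime overhead are NOT included.
    """
    # (params_billion * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Parameter counts are the ones quoted in the thread.
    for name, params_b in [("27B", 27), ("31B", 31), ("35B", 35), ("1T", 1000)]:
        for bits in (8, 4):
            gb = weight_memory_gb(params_b, bits)
            print(f"{name} at {bits}-bit: ~{gb:g} GB for weights")
```

Note that for mixture-of-experts variants such as the A3B/A4B ones, all parameters still have to be held in memory; the smaller active count mainly affects speed per token, not the footprint.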

benoau 10 days ago

It's technically feasible, really just a question of whether this is worth $10,000(s) to you and you're willing to spend it.

  • mazinzOP 10 days ago

    Why financially crippling? It’s free to run on device. The native Apple Intelligence works well for smaller context windows and text only.

    • benoau 10 days ago

      You can get poor results "for free" from your laptop, but the devices you need for the large models are very expensive.
