Ask HN: Best LLM model for a RAG-based Android app across all smartphones?

1 point by swaminarayan 14 days ago · 1 comment · 1 min read


I am developing a RAG-based Android app using llama.cpp. For offline processing I am using the Qwen 1.5 2.5B model with Q4 quantization, offloading computation to the GPU when one is present. However, on low-end Android phones the app either crashes with an OOM error or, when it doesn't crash and no GPU is available, takes a very long time to generate text. I also tried the SmolLM 135M model for low-end devices, but it struggles to follow instructions well.
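As a rough sanity check before loading a model, you can estimate whether a Q4-quantized model fits in a device's RAM from its parameter count. This is my own back-of-the-envelope sketch, not from the post: the ~4.5 bits/weight figure (typical of Q4_K-style quants) and the fixed overhead allowance are assumptions.

```java
// Back-of-the-envelope check: will a Q4-quantized model fit in memory?
// Assumptions (mine, not from the post): ~4.5 effective bits per weight
// for Q4_K-style quants, plus a flat ~512 MB for KV cache and runtime.
public class ModelFit {
    static long estimateBytes(long params) {
        long weights = params * 9 / 16;          // 4.5 bits = 9/16 byte per weight
        long overhead = 512L * 1024 * 1024;      // assumed KV cache + runtime overhead
        return weights + overhead;
    }

    static boolean fits(long params, long availBytes) {
        return estimateBytes(params) < availBytes;
    }

    public static void main(String[] args) {
        long p = 1_500_000_000L; // e.g. a ~1.5B-parameter model
        System.out.println("Estimated footprint: "
                + estimateBytes(p) / (1024 * 1024) + " MB");
    }
}
```

On a phone with 3–4 GB of RAM, of which only a fraction is available to one app, even a ~1.5B model at Q4 is tight once the KV cache grows, which matches the OOM behavior described above.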

Given this, I am considering the OpenAI API for low-end Android phones. For vector storage I am using an in-house developed library: https://github.com/hash-anu/snkv
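That local-vs-cloud split can be expressed as a simple routing decision. This is a sketch under my own assumptions: the RAM thresholds and tier names are hypothetical, not from the post.

```java
// Route inference to a local model or a cloud API based on device capability.
// Threshold values below are illustrative assumptions, not measured cutoffs.
public class BackendRouter {
    enum Backend { LOCAL_QWEN_Q4, LOCAL_SMALL, CLOUD_API }

    static Backend choose(long totalRamMb, boolean hasGpu) {
        if (totalRamMb >= 6144 && hasGpu) return Backend.LOCAL_QWEN_Q4;
        if (totalRamMb >= 4096) return Backend.LOCAL_SMALL;
        return Backend.CLOUD_API; // low-end devices fall back to an API
    }
}
```

On Android, `totalRamMb` can be read via `ActivityManager.getMemoryInfo()` (the `MemoryInfo.totalMem` field); keeping the decision in one pure function like this makes the cutoffs easy to tune and unit-test.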

I am not sure how other people are running LLM models on low-end Android devices; I would appreciate any insights or best practices.

swaminarayan (OP) 14 days ago

Found my answer: https://github.com/google-ai-edge/gallery?tab=readme-ov-file
