Is anyone compressing AI models for the 4B people without GPUs or internet?

3 points by yashpxl 4 months ago · 1 comment · 2 min read

Hey HN,

I'm a 20-year-old solo builder from India.

I got frustrated that every capable AI model assumes you have a GPU, a credit card, or reliable internet. None of those are true for most of the world — including me.

So I started digging into the compression literature to figure out how I could solve this problem.

What I found:
- DeepSeek distilled its 671B reasoning model into a 1.5B model that runs on a laptop.
- TRM (Samsung, 2025) beat DeepSeek R1 on ARC-AGI with 7M parameters by iterating instead of scaling.
- RWKV runs in constant memory regardless of context length, with no quadratic attention cost.
- GRPO lets you specialize a tiny model on a narrow domain in hours on CPU.
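For anyone unfamiliar with GRPO: part of why it is cheap is that it drops the separate value (critic) model used by PPO. For each prompt it samples a group of answers, scores them, and uses each answer's reward relative to the rest of the group as its advantage. The sketch below shows only that normalization step, assuming one scalar reward per answer (e.g. 1.0 if the final answer is correct). The clipped policy-gradient objective and KL penalty of full GRPO are left out, and `group_relative_advantages` is a hypothetical helper name, not a library API.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (reward - group mean) / group std.

    The group mean acts as the baseline, so no learned value model is needed.
    This sketch uses the population std; eps avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 answers sampled for one math prompt:
# 1.0 = correct final answer, 0.0 = wrong.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))

# If every answer gets the same reward, every advantage is 0 (no learning signal).
print(group_relative_advantages([1.0, 1.0, 1.0]))
```

Correct answers get a positive advantage and wrong ones a negative advantage. This is why a simple correctness check can serve as the reward in a narrow domain like math, as long as the group contains a mix of right and wrong answers.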

The techniques exist. What doesn't exist: a systematic effort to apply all of them together, specifically for low-resource languages and low-end hardware, and give the results away free.

I'm building this. Calling it KIRO.

The goal is simple: take every major open-source frontier model, compress it into domain-specific versions under 500MB, and deploy them offline on the cheapest Android hardware available.
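To see what the 500MB target implies, a rough rule of thumb is: weight storage ≈ parameters × bits per weight / 8. This back-of-the-envelope sketch assumes the weights dominate file size and ignores tokenizer files, metadata, and the per-block scale factors that real quantized formats store, so actual files will come out somewhat larger. The helper names are hypothetical.

```python
BUDGET_BYTES = 500 * 1000 * 1000  # 500MB, using decimal megabytes


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone (no format overhead)."""
    return n_params * bits_per_weight / 8


def max_params(budget_bytes: float, bits_per_weight: float) -> float:
    """Largest parameter count whose weights fit in the budget."""
    return budget_bytes * 8 / bits_per_weight


# The two model sizes from the first experiment, at common precisions.
for name, n_params in [("1.5B", 1.5e9), ("7B", 7e9)]:
    for bits in (16, 8, 4):
        size = weight_bytes(n_params, bits)
        verdict = "fits" if size <= BUDGET_BYTES else "over budget"
        print(f"{name} @ {bits}-bit: {size / 1e6:,.0f} MB ({verdict})")

print(f"Max params at 4-bit: {max_params(BUDGET_BYTES, 4) / 1e9:.2f}B")
print(f"Max bits/weight for a 1.5B model: {BUDGET_BYTES * 8 / 1.5e9:.2f}")
```

Under these assumptions, even a 1.5B model needs about 750MB at 4 bits. Hitting 500MB therefore means either roughly 1B parameters at 4 bits, or pushing a 1.5B model below about 2.7 bits per weight.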

Starting with math/physics education because that's the problem I know personally. Expanding to healthcare triage, legal aid, and agricultural advisory.

I'm currently running my first experiment on my i3 machine: R1-1.5B vs Qwen-7B on Hindi math problems. I'll post results when training finishes.

Two honest questions for HN:

1. Is anyone else working on this specific intersection — compression + low-resource languages + offline deployment?

2. What would make this genuinely useful vs just technically interesting to you?

Everything will be open source.

mdritch 4 months ago

I like the idea. About question 2, I think you need some way to publicly benchmark your stripped-down models' performance. Your models probably won't perform well on the standard benchmarks, but the big models will probably be able to handle your custom eval sets, such as those Hindi math problems.

I would publish:
1) your domain-specific eval set
2) your model's results on that eval set
3) a big lab's model's results on that eval set

That would give users a way to determine whether your model is actually capable in that reduced domain.
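A minimal sketch of how that comparison could be scored: exact match on the last number in each model's output, with Devanagari digits (०-९) normalized to ASCII so a Hindi answer like "४२" matches 42. The "last number is the answer" convention is an assumption, `final_number` and `score` are hypothetical helper names, and the eval items and model outputs are made-up placeholders that only exercise the code. They are not real results.

```python
import re
from typing import Optional

# Map Devanagari digits to ASCII so "४२" and "42" compare equal.
DEVANAGARI_TO_ASCII = str.maketrans("०१२३४५६७८९", "0123456789")


def final_number(text: str) -> Optional[str]:
    """Return the last integer or decimal in a model's output, or None."""
    normalized = text.translate(DEVANAGARI_TO_ASCII)
    matches = re.findall(r"-?\d+(?:\.\d+)?", normalized)
    return matches[-1] if matches else None


def score(outputs: list[str], items: list[dict]) -> float:
    """Exact-match accuracy: final number in each output vs. the gold answer."""
    assert len(outputs) == len(items), "need one output per eval item"
    correct = 0
    for output, item in zip(outputs, items):
        pred = final_number(output)
        if pred is not None and float(pred) == float(item["answer"]):
            correct += 1
    return correct / len(items)


# Toy eval set (placeholder, not a real benchmark).
eval_set = [
    {"question": "१२ + ३० = ?", "answer": 42},
    {"question": "७ × ६ = ?", "answer": 42},
    {"question": "१०० - ४५ = ?", "answer": 55},
]

# Made-up outputs; in practice each list would come from running one model
# over the questions in eval_set.
outputs = {
    "compressed-small": ["उत्तर: ४२", "उत्तर: ४८", "उत्तर: ५५"],
    "biglab-baseline": ["12 + 30 = 42", "7 × 6 = 42", "100 - 45 = 55"],
}

for model_name, model_outputs in outputs.items():
    print(f"{model_name}: {score(model_outputs, eval_set):.0%}")
```

Publishing the eval file together with one score line per model would cover all three items in the list above. Exact match on a final number is deliberately strict. It won't credit correct reasoning with a formatting slip, but it is easy for others to reproduce.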
