N0X — The Full AI Stack in One Browser Tab


Zero backend · Zero API keys · 100% private

LLM inference, autonomous agents, document RAG, code execution, image generation — running entirely on your GPU. No server. No account. Your data never leaves your machine.

STREAMING AGENT THOUGHTS

Autonomous ReAct Agent

A full reasoning loop running entirely in your browser. The LLM thinks, picks tools, executes them, reads results, and iterates — with every thought streaming live token-by-token. Watch the model reason in real time. No server. No API. Pure WebGPU autonomy.

Live thought streaming · Multi-tool orchestration · Per-step trace UI · Loop detection + OOM protection
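The think → act → observe cycle above can be sketched as a small control loop. This is a hypothetical illustration, not N0X's actual code: `model` stands in for the streaming WebGPU LLM, and the step cap plays the role of the loop/OOM guard.

```typescript
// Minimal ReAct-style loop: the model "thinks", optionally picks a tool,
// the tool's observation is fed back, and the loop repeats until the model
// emits a final answer or hits the step cap.

type Tool = (input: string) => string;

interface Step {
  thought: string;
  action?: string;   // tool name, if the model chose one
  input?: string;    // tool input
  answer?: string;   // set when the model decides it is done
}

function runAgent(
  model: (history: Step[]) => Step,   // stand-in for the streaming LLM
  tools: Record<string, Tool>,
  maxSteps = 8,                       // loop protection, akin to N0X's guard
): { trace: Step[]; answer: string } {
  const trace: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(trace);        // think: pick a tool or finish
    trace.push(step);
    if (step.answer !== undefined) return { trace, answer: step.answer };
    if (step.action && tools[step.action]) {
      // act, then feed the observation back into the history
      const observation = tools[step.action](step.input ?? "");
      trace.push({ thought: `Observation: ${observation}` });
    }
  }
  return { trace, answer: "Stopped: step limit reached" };
}
```

The per-step trace UI falls out naturally: each entry pushed to `trace` is one renderable step.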

WebGPU Inference

Direct-to-metal execution via MLC WebLLM. 40 open-source models, from a 360 MB download up to 70B parameters, downloaded once and cached in your browser forever.

Llama 3.3 70B · DeepSeek R1 70B · Qwen 2.5 32B · Mistral 7B · Qwen 0.5B · +35 more

Zero Tracking

No server processes your data. Prompts, documents, and memory live in IndexedDB on your device. Disable optional search/image hooks for a fully air-gapped runtime.

Document RAG

Drop PDFs, DOCX, CSVs, or text files. Sentence-boundary chunking with 50% overlap, MiniLM embeddings, and MMR reranking for diverse, accurate retrieval — all in a Web Worker.

PDF · DOCX · TXT · MD · CSV · JSON
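The MMR (maximal marginal relevance) reranking mentioned above balances relevance against redundancy when picking chunks. N0X's reranker is not published; this sketch implements the standard MMR score, λ·sim(query, chunk) − (1−λ)·max sim(chunk, already selected), over embedding vectors such as MiniLM's.

```typescript
// Maximal Marginal Relevance over embedding vectors.

function dot(a: number[], b: number[]): number {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  const na = Math.sqrt(dot(a, a));
  const nb = Math.sqrt(dot(b, b));
  return na && nb ? dot(a, b) / (na * nb) : 0;
}

/** Select k chunk indices, trading query relevance against redundancy. */
function mmr(
  query: number[],
  chunks: number[][],
  k: number,
  lambda = 0.7, // 1 = pure relevance, 0 = pure diversity
): number[] {
  const selected: number[] = [];
  const candidates = new Set(chunks.map((_, i) => i));
  while (selected.length < k && candidates.size > 0) {
    let best = -1;
    let bestScore = -Infinity;
    for (const i of candidates) {
      const relevance = cosine(query, chunks[i]);
      const redundancy = selected.length
        ? Math.max(...selected.map((j) => cosine(chunks[i], chunks[j])))
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    selected.push(best);
    candidates.delete(best);
  }
  return selected;
}
```

With a low λ, a near-duplicate of an already-selected chunk loses to a less similar but novel one, which is what makes the retrieved context diverse.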

Conversation Branching (NEW)

Hover any message and click the branch icon to fork the conversation from that exact point. Explore alternative directions without losing your original thread. Branches are saved automatically.
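Forking without losing the original thread is naturally modeled as a message tree. N0X's storage schema is not published; this is a hypothetical sketch of the observable behavior: each message points at its parent, a branch is just a reply to an earlier message, and a thread is the walk from a leaf back to the root.

```typescript
// Conversations as a tree: branching = appending under an earlier parent.

interface Message {
  id: number;
  parent: number | null; // null for the root message
  text: string;
}

class ConversationTree {
  private messages = new Map<number, Message>();
  private nextId = 0;

  /** Add a message under `parent` (null = start of conversation). */
  append(text: string, parent: number | null): number {
    const id = this.nextId++;
    this.messages.set(id, { id, parent, text });
    return id;
  }

  /** Fork from any earlier message without touching its existing replies. */
  branch(fromId: number, text: string): number {
    return this.append(text, fromId);
  }

  /** Rebuild one thread by walking parent links, root first. */
  thread(leafId: number): string[] {
    const out: string[] = [];
    for (
      let m = this.messages.get(leafId);
      m;
      m = m.parent !== null ? this.messages.get(m.parent) : undefined
    ) {
      out.unshift(m.text);
    }
    return out;
  }
}
```

Because branches only add nodes, persisting the tree (e.g. to IndexedDB) saves every fork automatically.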

More Capabilities

Python Sandbox: Pyodide WASM runtime

Deep Search: DDG + Tavily synthesis

Image Gen: Flux / Stable Horde

Voice I/O: Native STT + TTS

Persistent Memory: Long-term IndexedDB store

5 Personas: Engineer · Writer · Tutor…