🧭 Guided Mode
LocalClaw recommends the best open-source LLM for LM Studio based on your RAM, GPU and use case. 100% private — everything runs in your browser. No data collected. Ever.
Answer simple questions about your machine — OS, RAM level, and use case. We handle the rest.
Know your specs? Select RAM, GPU and priorities directly for instant AI model recommendations.
Paste your system diagnostics output. We auto-detect OS, RAM and GPU to find your perfect model.
GPT-OSS — 20B (OpenAI) New!
DeepSeek V3.2 — 671B MoE
Qwen 3 — 4B, 8B, 14B, 32B
Llama 3.x — 3B, 8B, 70B
Gemma 3 — 1B, 4B, 12B, 27B
DeepSeek R1 (distilled) — 7B, 14B, 32B, 70B
Phi-4 — 3.8B Mini, 14B
GLM 4.7 — 9B Flash, 26B
Trinity Large — 70B MoE
Kimi K2.5 — 1T MoE
Mistral — 7B, 24B
MiniMax M2.1 — 45B MoE
LLaVA / Gemma Vision — 7B, 27B
Qwen 2.5 Coder — 7B, 32B
Text-to-Speech models that run 100% offline on your hardware. Perfect for voice assistants, audiobooks, accessibility, and creative projects.
Qwen3 TTS New!
30+ languages, streaming
MeloTTS
Voice cloning, Chinese/EN
Piper
Raspberry Pi optimized
Coqui XTTS
6s voice cloning
+ 10 more…
Bark, MMS, Fish Speech
⚡ Real-time 🎭 Voice Cloning 🌍 50+ Languages 💻 CPU/GPU/Edge
LM Studio is a free desktop application that lets you run Large Language Models (LLMs) locally on your computer. No internet needed, no data sent anywhere. It provides a chat interface similar to ChatGPT, but everything runs on YOUR hardware.
Quantization is a compression technique that reduces model size while preserving most of the quality. Think of it like JPEG compression for images. Q4 = more compressed (smaller, slightly lower quality), Q8 = less compressed (larger, nearly original quality). Q5_K_M is the sweet spot for most users.
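As a back-of-the-envelope sketch of that trade-off (the bits-per-weight figures below are rough averages for common GGUF quants; real file sizes vary by model and architecture):

```python
# Rough bits-per-weight for common GGUF quantization levels.
# These are approximate averages; real files vary by model.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8}

def file_size_gb(params_billions: float, quant: str) -> float:
    """Estimate file size: parameters x bits per weight / 8 bits per byte."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8

for quant in BITS_PER_WEIGHT:
    print(f"8B model at {quant}: ~{file_size_gb(8, quant):.1f} GB")
# FP16 ~16.0 GB, Q8_0 ~8.5 GB, Q5_K_M ~5.7 GB, Q4_K_M ~4.8 GB
```

So quantizing an 8B model from FP16 to Q4_K_M shrinks it from roughly 16 GB to under 5 GB, which is what makes it fit on an 8 GB machine at all.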
Rule of thumb: the model's file size plus ~2-3 GB for the system. A 5 GB model therefore needs at least 8 GB of RAM. On macOS with Apple Silicon, unified memory makes this more efficient. On Windows/Linux with a GPU, VRAM helps offload the model.
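A minimal sketch of that rule of thumb, assuming a flat ~3 GB system overhead (real headroom also depends on context length and what else is running):

```python
def fits_in_ram(model_file_gb: float, total_ram_gb: float,
                system_overhead_gb: float = 3.0) -> bool:
    """Apply the rule of thumb: model file size + ~2-3 GB for the system."""
    return model_file_gb + system_overhead_gb <= total_ram_gb

print(fits_in_ram(5.0, 8.0))    # True: a 5 GB model just fits in 8 GB of RAM
print(fits_in_ram(20.0, 16.0))  # False: a 20 GB model won't fit in 16 GB
```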
Apple Silicon (M1-M4) uses unified memory, so nearly all of your RAM is available to the model, which is incredibly efficient. NVIDIA GPUs are faster for inference but limited by VRAM (typically 8-24 GB). Both are great choices.
Yes! LocalClaw runs entirely in your browser — zero data is collected or sent anywhere. When using LM Studio with recommended models, everything runs locally on your machine. No cloud, no tracking, no API calls.
For 8 GB RAM: Qwen 3 8B and Llama 3.1 8B offer the best quality. For 16 GB: Qwen 3 14B is king. For 32 GB+: Qwen 3 32B and DeepSeek R1 32B rival GPT-4. For coding: Qwen 2.5 Coder 7B. For vision: Gemma 3 12B. For reasoning: DeepSeek R1 series.
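As a hypothetical sketch, those picks boil down to a simple RAM-tier lookup (LocalClaw's actual logic also weighs your GPU, OS, and priorities; the tiers and defaults below just restate the list above):

```python
# Hypothetical sketch of the RAM-tier -> model mapping described above.
RECOMMENDATIONS = {
    8:  ["Qwen 3 8B", "Llama 3.1 8B"],
    16: ["Qwen 3 14B"],
    32: ["Qwen 3 32B", "DeepSeek R1 (distilled) 32B"],
}
SPECIALTIES = {
    "coding": "Qwen 2.5 Coder 7B",
    "vision": "Gemma 3 12B",
    "reasoning": "DeepSeek R1 series",
}

def recommend(ram_gb: int, use_case: str | None = None) -> list[str]:
    if use_case in SPECIALTIES:
        return [SPECIALTIES[use_case]]
    # Pick the largest tier that fits; fall back to the 8 GB tier.
    tier = max((t for t in RECOMMENDATIONS if t <= ram_gb), default=8)
    return RECOMMENDATIONS[tier]

print(recommend(16))            # ['Qwen 3 14B']
print(recommend(32, "coding"))  # ['Qwen 2.5 Coder 7B']
```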
OpenClaw is a free, open-source, self-hosted AI assistant gateway. It connects your chat surfaces (desktop app, CLI, web UI) and tools to local or remote model backends like LM Studio, Ollama, or any OpenAI-compatible server. It manages conversations, routes prompts, and extends functionality through a skills/plugin system — all 100% offline, with zero telemetry. Get the Full Install Pack to auto-install OpenClaw alongside your AI model.
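To make the "OpenAI-compatible" part concrete: once LM Studio's local server is running (by default at http://localhost:1234), any client, OpenClaw included, can reach it with a plain OpenAI-style chat request. A minimal sketch using only Python's standard library; the model name is a placeholder for whatever you have loaded:

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat-completions API.
# Default address is http://localhost:1234; no API key is required.
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps({
        "model": "qwen3-8b",  # placeholder: use the model you loaded
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, the same request works against Ollama or any other OpenAI-compatible backend by changing only the URL.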