Run Chrome's local Gemini Nano through a Go bridge.
Chrome ships a built-in LLM (Gemini Nano, about 3B parameters, GPU-accelerated). It is gated behind an Origin Trial and exposed to pages as the `window.LanguageModel` JS API. fenster starts headless Chrome Canary, drives it over CDP, sets `userGesture:true` to pass the model-download gate, and exposes the result as a UNIX tool and an OpenAI-compatible HTTP server on `localhost:11434`. Inference stays on-device and requires no API key.
| Mode | Command | What you get |
|---|---|---|
| UNIX tool | `fenster "prompt"` / `echo "text" \| fenster` | Pipe-friendly answers, file attachments, JSON output, exit codes |
| OpenAI-compatible server | `fenster --serve` | Drop-in local `http://localhost:11434/v1` backend for OpenAI SDKs |
`fenster --chat` is an interactive REPL for testing prompts.
Cross-platform sister of apfel (macOS-only, Apple Intelligence). Same wire format. Clients written for one work against the other.
Real run transcript: `EXAMPLE.md` contains live Gemini Nano output generated through fenster's OpenAI-compatible HTTP server by `scripts/generate-examples.sh`.
## How it works

```
                ┌─────────────────────────────────────────────────────────┐
                │ fenster (Go binary, single static executable)           │
                │ ┌──────────────────┐    ┌───────────────────────────┐   │
HTTP client ──> │ │ HTTP/SSE :11434  │    │ Chrome supervisor (CDP)   │   │
(curl, IDE,     │ │ stdlib net/http  │<──>│ chromedp + flock lockfile │   │
 openai SDK)    │ │ /v1/* + /health  │    │ ~/.fenster/run/chrome.json│   │
                │ └──────────────────┘    └────────────┬──────────────┘   │
CLI (UNIX) ───> │ ┌──────────────────┐                 │                  │
                │ │ oneshot / chat   │                 │ spawn (one shared)
                │ │ stdin / -f files │                 │ across N processes
                │ └──────────────────┘                 v                  │
                └─────────────────────────────────────────────────────────┘
                                                       │
                                                       v
                ┌─────────────────────────────────────────────────────────┐
                │ Headless Chrome Canary 149+ (--headless=new)            │
                │ ┌─────────────────────────────────────────────────┐     │
                │ │ Profile: ~/.fenster/profile-canary              │     │
                │ │ Local State pre-bootstrapped with               │     │
                │ │ enabled_labs_experiments to flip on Built-in    │     │
                │ │ AI APIs without --enable-features churn         │     │
                │ └─────────────────────────────────────────────────┘     │
                │ ┌─────────────────────────────────────────────────┐     │
                │ │ Page served from http://127.0.0.1:11434/        │     │
                │ │ (about:blank does NOT expose LanguageModel,     │     │
                │ │ must be a real http origin - Chrome's rule)     │     │
                │ └────────────────────────┬────────────────────────┘     │
                │                          v                              │
                │ ┌─────────────────────────────────────────────────┐     │
                │ │ window.LanguageModel (Chrome's Prompt API)      │     │
                │ │ Runtime.evaluate { userGesture: true } over     │     │
                │ │ CDP synthesizes a user click, so                │     │
                │ │ LanguageModel.create() is allowed to download   │     │
                │ │ the model and run prompts                       │     │
                │ └────────────────────────┬────────────────────────┘     │
                │                          v                              │
                │ ┌─────────────────────────────────────────────────┐     │
                │ │ Gemini Nano (~3B params, ~2.4 GB on disk)       │     │
                │ │ GPU inference (Metal / DirectML / Vulkan)       │     │
                │ │ 16 GB RAM CPU fallback if no GPU                │     │
                │ └─────────────────────────────────────────────────┘     │
                └─────────────────────────────────────────────────────────┘
```
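The two load-bearing tricks in that diagram are the synthesized user gesture and the sentinel session. A minimal chromedp sketch of the pattern, assuming the `__fensterSentinel` name from this README; the helper and its embedded JS are illustrative, not fenster's actual code:

```go
package sketch

import (
	"context"
	"fmt"

	"github.com/chromedp/cdproto/runtime"
	"github.com/chromedp/chromedp"
)

// promptOnce creates the LanguageModel sentinel on first use, then clones it
// per request. userGesture:true is what lets create() start the model
// download from a headless, scripted context.
func promptOnce(ctx context.Context, prompt string) (string, error) {
	js := fmt.Sprintf(`(async () => {
		window.__fensterSentinel ??= await LanguageModel.create();
		const session = await window.__fensterSentinel.clone();
		return session.prompt(%q);
	})()`, prompt)

	var out string
	err := chromedp.Run(ctx, chromedp.Evaluate(js, &out,
		func(p *runtime.EvaluateParams) *runtime.EvaluateParams {
			return p.WithUserGesture(true).WithAwaitPromise(true)
		}))
	return out, err
}
```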
Also shipped (alternative bridge, MV3-blessed path):

```
extension/  ── Chrome MV3 service worker, nativeMessaging permission
internal/nm ── 4-byte LE length prefix + UTF-8 JSON Native Messaging host
```

Currently the CDP path is the default; the extension is wired and ready for cases where pure CDP is not enough (locked-down enterprise builds, etc.).
The Chrome that fenster spawns is invisible. AppKit reports zero windows. No Dock icon. `FENSTER_CHROME_HEADED=1` surfaces it for debugging. Many `fenster --serve` instances on the same machine attach to one shared Chrome via a flock lockfile: no dialog floods, no twenty Chrome icons.
Tech stack: Go 1.22+ stdlib first (`net/http` 1.22 patterns, `log/slog`, `embed.FS`, `context.Context` everywhere), `chromedp` for CDP, `cobra` for the CLI, `golang.org/x/term` for TTY detection. No third-party HTTP router. No mocks of Chrome. Single static binary.
Requirements & Install
Chrome Canary 149+ (Stable does not expose LanguageModel even with --enable-features=PromptAPIForGeminiNano - empirically tested), GPU with >4 GB VRAM (or 16 GB RAM CPU fallback), 22 GB free disk. Building from source needs Go 1.22+.
go install github.com/Arthur-Ficial/fenster/cmd/fenster@latest brew install --cask google-chrome@canary # the Built-in AI Origin Trial requires Canary today fenster doctor # verify your environment, tells you exactly what is missing
fenster doctor reports what is missing and what to do about it. It checks Chrome channel, GPU, disk, profile bootstrap state, and whether the model is downloaded.
## Quick Start

### UNIX tool

Quote prompts containing `!` in single quotes to dodge zsh/bash history expansion: `fenster 'Hello, Chrome!'`.

```bash
# Single prompt
fenster "What is the capital of Austria?"

# Stream output
fenster --stream "Write a haiku about code"

# Pipe input
echo "Summarize: $(cat README.md)" | fenster

# Attach file content to prompt
fenster -f README.md "Summarize this project"

# Attach multiple files
fenster -f old.go -f new.go "What changed between these two files?"

# Combine files with piped input
git diff HEAD~1 | fenster -f CONVENTIONS.md "Review this diff against our conventions"

# JSON output for scripting
fenster -o json "Translate to German: hello" | jq .content

# System prompt
fenster -s "You are a pirate" "What is recursion?"

# Quiet mode for shell scripts
result=$(fenster -q "Capital of France? One word.")
```
### OpenAI-compatible server

```bash
fenster --serve                            # foreground; spawns headless Chrome
FENSTER_TOKEN=$(uuidgen) fenster --serve   # bearer-protected
fenster --serve --token-auto               # auto-generate token, print to stderr
```

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-nano","messages":[{"role":"user","content":"Hello"}]}'
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
resp = client.chat.completions.create(
    model="gemini-nano",
    messages=[{"role": "user", "content": "What is 1+1?"}],
)
print(resp.choices[0].message.content)
```
`fenster --serve` shares one Chrome instance across every server you start. The first one launches Chrome (about 5 minutes on the first-ever boot, which downloads the 2.4 GB Gemini Nano model). Subsequent starts attach to the same browser via the lockfile.
### Chat REPL

`fenster --chat` is a small REPL for testing prompts or MCP servers.

```bash
fenster --chat
fenster --chat -s "You are a helpful coding assistant"
fenster --chat --mcp ./mcp/calculator/server.py   # chat with MCP tools
fenster --chat --debug                            # debug output to stderr
```

Ctrl-C at the prompt exits. Type `quit` or press Ctrl-D to exit cleanly.
## Architecture (longer version)
The design priority order is UNIX tool first, HTTP server second, chat third. Everything else hangs off these.
### 1. UNIX tool path (`fenster "prompt"`, pipes, `-f`, `-o json`)

```
stdin / argv / -f files
        │
        v
cmd/fenster/main.go (cobra)
        │
        v
internal/oneshot ── builds a single ChatCompletionRequest
        │
        v
internal/backend.Backend ── interface: EchoBackend (no Chrome) | ChromeCDPBackend (real)
        │
        v
stdout (text) / stdout (JSON envelope) / stderr (errors)
exit code: 0 success, 1 generic, 2 invalid args, 3 doctor fail, 64 not implemented
```
Pure pipe behavior, no daemon required for one-shots. The `Backend` interface lets the same code path run against `EchoBackend` (deterministic, no Chrome; used for tests and `FENSTER_BACKEND=echo` smoke checks) or `ChromeCDPBackend` (the real model). When `--serve` is on, the CLI process keeps Chrome supervised; when it is a one-shot, fenster connects to an existing shared Chrome via the lockfile or starts one and keeps it for the next call.
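As a sketch of that seam (type and method names here are illustrative, not the exact signatures in `internal/backend`):

```go
package backend

import (
	"context"
	"errors"
)

// Minimal message shapes, for illustration only.
type Message struct{ Role, Content string }
type ChatRequest struct{ Messages []Message }
type ChatResponse struct{ Content string }

// Backend is the seam both the CLI one-shot path and the HTTP server talk
// to; the real interface likely also carries a streaming method.
type Backend interface {
	Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
}

// EchoBackend answers deterministically without Chrome, which is what makes
// FENSTER_BACKEND=echo smoke checks possible.
type EchoBackend struct{}

func (EchoBackend) Chat(_ context.Context, req ChatRequest) (ChatResponse, error) {
	if len(req.Messages) == 0 {
		return ChatResponse{}, errors.New("empty conversation")
	}
	last := req.Messages[len(req.Messages)-1]
	return ChatResponse{Content: "echo: " + last.Content}, nil
}
```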
### 2. HTTP server path (`fenster --serve`)

```
HTTP request (curl, openai-python, IDE)
        │
        v
internal/server (stdlib net/http 1.22 patterns, no router dep)
  ├─ /health              ── liveness (loopback by default; --public-health to flip)
  ├─ /v1/models           ── { "gemini-nano": ... }
  ├─ /v1/chat/completions ── stream + non-stream, OpenAI envelope
  ├─ /v1/completions      ── 501
  ├─ /v1/embeddings       ── 501
  └─ middleware: bearer auth, origin check, CORS, request validation
        │
        v
internal/backend.ChromeCDPBackend
  ├─ initOnce()      ── Runtime.evaluate {userGesture:true} → LanguageModel.create()
  ├─ sentinel session in window.__fensterSentinel ── created once, .clone() per request
  ├─ PreWarm()       ── pays the cold-start tax in the background at server boot
  └─ splitMessages() ── flattens OpenAI tool_calls/tool messages into text history
        │
        v
chromedp (CDP client)
        │
        v
Headless Chrome Canary, page = http://127.0.0.1:11434/ (must be a real http origin)
        │
        v
window.LanguageModel.promptStreaming(history) → Gemini Nano on GPU
        │
        v
SSE chunks back up the stack, OpenAI-shaped, byte-for-byte apfel-compatible
```
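A hedged sketch of that routing and streaming style with nothing but the stdlib; the route patterns are real Go 1.22 `net/http`, while the handler body and its canned chunks are illustrative:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()

	// Go 1.22 method-aware patterns stand in for a third-party router.
	mux.HandleFunc("GET /health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	mux.HandleFunc("POST /v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		// Each model chunk becomes one OpenAI-shaped SSE event.
		for _, chunk := range []string{"Hel", "lo"} {
			fmt.Fprintf(w, "data: {\"choices\":[{\"delta\":{\"content\":%q}}]}\n\n", chunk)
			flusher.Flush()
		}
		fmt.Fprint(w, "data: [DONE]\n\n")
	})

	log.Fatal(http.ListenAndServe("127.0.0.1:11434", mux))
}
```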
Single shared Chrome per machine: `~/.fenster/run/chrome.json` holds the PID and CDP URL, protected by `flock(2)`. Every `fenster --serve` instance attaches to the same browser. The first one launches it; the last one to leave optionally cleans up. Sentinel session reuse drops first-byte latency from ~23s to ~2s on warm starts.
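A sketch of the lockfile handshake, assuming POSIX `flock(2)`; the JSON layout comes from this README, but the code does not mirror `internal/chrome/shared.go`:

```go
//go:build unix

package chrome

import (
	"encoding/json"
	"os"
	"syscall"
)

type chromeInfo struct {
	PID    int    `json:"pid"`
	CDPURL string `json:"cdp_url"`
}

// attachOrLaunch returns the shared Chrome's CDP endpoint. The process that
// wins the exclusive lock launches Chrome and records it; everyone else
// attaches by reading the JSON the winner wrote.
func attachOrLaunch(path string, launch func() (chromeInfo, error)) (chromeInfo, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		return chromeInfo{}, err
	}
	defer f.Close()

	// Non-blocking exclusive lock: one winner, everyone else attaches.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		// Lock held: Chrome is (being) launched by another process. A real
		// implementation retries until the winner has written the JSON.
		var info chromeInfo
		return info, json.NewDecoder(f).Decode(&info)
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	info, err := launch()
	if err != nil {
		return chromeInfo{}, err
	}
	return info, json.NewEncoder(f).Encode(info)
}
```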
### 3. The Chrome extension bridge (alternative path, shipped)

fenster also ships a real MV3 Chrome extension and a Native Messaging host. The extension's only job is to call `LanguageModel.create()` from inside Chrome's extension context (where the API is also exposed) and stream chunks back to a Native Messaging host process over Chrome's stdio framing protocol (4-byte little-endian length prefix + UTF-8 JSON).
```
extension/service-worker.js ── connectNative("com.fullstackoptimization.fenster")
        │
        │ Chrome stdio (4-byte LE prefix + JSON)
        v
internal/nm     ── Native Messaging framing
internal/bridge ── Unix-socket multiplex to fenster supervisor
        │
        v
internal/backend.ChromeBackend (alternative to ChromeCDPBackend)
```
This path is wired and tested but is not the default. The CDP path has fewer deployment requirements: no per-OS Native Messaging manifest install step, no extension ID drift, and no Chrome Web Store packaging. The extension remains available for environments where `Runtime.evaluate {userGesture:true}` is blocked but an installed extension is permitted.
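The framing itself is small. A sketch of the protocol as described above, with every message a 4-byte little-endian length prefix followed by UTF-8 JSON; the function names are illustrative rather than the exact ones in `internal/nm`:

```go
package nm

import (
	"encoding/binary"
	"encoding/json"
	"io"
)

// WriteMsg frames v the way Chrome expects on the host's stdout:
// 4-byte little-endian payload length, then the UTF-8 JSON payload.
func WriteMsg(w io.Writer, v any) error {
	payload, err := json.Marshal(v)
	if err != nil {
		return err
	}
	if err := binary.Write(w, binary.LittleEndian, uint32(len(payload))); err != nil {
		return err
	}
	_, err = w.Write(payload)
	return err
}

// ReadMsg reads one framed message from the host's stdin into v.
func ReadMsg(r io.Reader, v any) error {
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return err
	}
	return json.Unmarshal(buf, v)
}
```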
## Pros and cons of this architecture

### Pros

- Free, on-device, no API key. Gemini Nano runs locally on hardware that already shipped with Chrome. No tokens metered, no rate limits, no telemetry to a vendor.
- OpenAI wire-format compatible. Drop-in for `openai-python`, `openai-node`, LangChain, anything that takes a `base_url`. Same envelope as apfel; clients written for one work against the other unmodified.
- Single static Go binary. `go install` and the binary runs. No Python venv, no Node, no Docker. Cross-compiles to darwin-arm64, darwin-amd64, linux-amd64, linux-arm64, windows-amd64.
- Invisible by default. `--headless=new` plus no AppKit window plus no Dock icon. `FENSTER_CHROME_HEADED=1` surfaces Chrome for debugging.
- One shared Chrome per machine. Lockfile-based supervisor means N `fenster --serve` processes attach to one Chrome; first one launches, others reuse.
- Fast warm path. Sentinel session reuse plus pre-warm at server boot brings first-token latency from ~23s (cold `LanguageModel.create()`) to ~2s on subsequent prompts.
- GPU acceleration via Chrome. Metal on macOS, DirectML on Windows, Vulkan on Linux. No CUDA, no ROCm, no per-OS driver setup.
- Explicit limits. `fenster doctor` reports environment state. 501 responses for `/v1/embeddings` and legacy completions, not silent stubs.
- Stdlib-first Go. ~10k LOC. No third-party HTTP router. No testify. No DI framework. Direct deps: `cobra`, `chromedp`, `golang.org/x/term`.
### Cons

- Chrome Canary 149+ required. Stable Chrome does not expose `LanguageModel` even with `--enable-features=PromptAPIForGeminiNano` (empirically tested). The Built-in AI APIs are on a public Origin Trial; the gate can change between Chrome versions.
- First boot is heavy. ~2.4 GB Gemini Nano model download on first launch. Minutes on a fast connection. The model lives inside Chrome's profile; nothing fenster can do to skip it.
- GPU floor. ~4 GB VRAM minimum. CPU fallback needs ~16 GB RAM and is slow. Realistic target is ChromeOS Plus, Windows 10/11, macOS 13+, modern Linux desktops.
- ~3B parameter model. Gemini Nano is small. Reasoning, math, and long-context tasks are not its strength. Output quality is well below GPT-4-class models.
- Tool calling is shimmed. Chrome's Prompt API does not expose OpenAI-shape tool calls. fenster maps them to `responseConstraint` JSON-schema constraints and parses host-side.
- No embeddings, no fine-tune, no `logit_bias`. What Chrome exposes is the entire surface.
- Origin-trial fragility. Future Chrome versions may change the API surface or pull the gate. fenster tracks upstream as it shifts.
- CDP `userGesture:true` is the trigger mechanism, not a guarantee. Chrome accepts a synthesized user gesture from CDP for the download trigger today. The MV3 extension bridge is the documented fallback if that path closes.
- Multi-step agentic loops drift. For agent work that needs stronger reasoning, a hosted frontier model is a better fit. fenster's fit is local, privacy-sensitive, single-turn Q&A and structured-output tasks.
- A whole browser to run a 3B model. Headless Chrome can crash, hang, or fail to download the model. fenster supervises and restarts, but the process footprint is browser-sized.
## Status (today, April 2026)
v0.0.1, 172 of 233 apfel integration tests pass against fenster with the real Gemini Nano model running headless. All Go unit tests are green and race-clean.
| Wave | Passing | Gain |
|---|---|---|
| baseline (Echo backend, no model) | 84 | |
| security middleware + debug logs | 96 | +12 |
| real Gemini Nano via CDP | 105 | +9 |
| `-f`/`--file` + flat `-o json` | 128 | +23 |
| `--update`/`--release` + USAGE: + exit codes | 139 | +11 |
| man-page lints | 142 | +3 |
| footgun preflight + `/health` on loopback | 146 | +4 |
| `--token-auto` + `--no-origin-check` + WWW-Auth | 151 | +5 |
| ANSI under TTY + chat TUI + tool flatten | 169 | +18 |
| chat `ai›` + tool messages + `--stream`/`--json` | 172 | +3 |
The path to 100% lives in `docs/status.md`. Every remaining task is a GitHub issue.
## Implementation notes
What is actually in the code, file by file:
- Profile Local State bootstrap (`internal/chrome/chrome.go`). fenster writes Chrome's `Local State` JSON with `enabled_labs_experiments` set before Chrome launches. On boot Chrome reads it and the Built-in AI flags are enabled. No `--enable-features` flag string, no manual `chrome://flags` toggling. A sketch of the mechanism follows this list.
- Real `http://127.0.0.1` origin (`internal/backend/chrome_cdp.go`). `about:blank` does not expose `LanguageModel`; the API requires a real http origin. fenster's HTTP server doubles as the page Chrome navigates to.
- Synthesized user gesture over CDP. `Runtime.evaluate {userGesture: true}` causes Chrome to treat the call as user-initiated. `LanguageModel.create()` requires a user gesture for the model-download trigger.
- Sentinel session reuse. One `LanguageModel` session is created at startup and stashed on `window.__fensterSentinel`; every request uses `.clone()`. `LanguageModel.create()` costs ~5-8s, `.clone()` costs ~50ms.
- Pre-warm goroutine at server boot. `PreWarm()` calls `initOnce()` on a background goroutine so the cold-start cost is paid before the first client request.
- Single shared Chrome via lockfile (`internal/chrome/shared.go`). `~/.fenster/run/chrome.json` plus `flock(2)`. Multiple `fenster --serve` instances attach to the same Chrome; first one launches, the rest reuse.
- MV3 extension and Native Messaging host (`extension/`, `internal/nm/`). Service worker, manifest with `nativeMessaging` permission, 4-byte LE length-prefix framing, per-OS manifest installer. Wired and tested. Not the default path; documented fallback for environments where the CDP `userGesture` path is blocked.
- apfel pytest suite vendored (`Tests/integration/`). 233 transport-agnostic tests written for apfel's Swift server, talking HTTP to `localhost:11434`. fenster's Go server passes 172 of them today.
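A sketch of that Local State bootstrap. The `browser.enabled_labs_experiments` key is the one `chrome://flags` itself writes; the flag string below is illustrative, not necessarily the exact entry fenster sets:

```go
package chrome

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// bootstrapLocalState writes the profile's Local State before Chrome ever
// starts, so the Built-in AI flags are on at first launch: the same
// mechanism as toggling chrome://flags by hand, minus the UI.
func bootstrapLocalState(profileDir string) error {
	state := map[string]any{
		"browser": map[string]any{
			"enabled_labs_experiments": []string{
				"prompt-api-for-gemini-nano@1", // illustrative flag entry
			},
		},
	}
	data, err := json.MarshalIndent(state, "", "  ")
	if err != nil {
		return err
	}
	if err := os.MkdirAll(profileDir, 0o700); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(profileDir, "Local State"), data, 0o600)
}
```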
## Architecture decisions you should know
These behavior notes come from the current implementation and test matrix:
- Chrome Canary 149+ is required. Stable Chrome does not expose `LanguageModel` even with `--enable-features=PromptAPIForGeminiNano`. `fenster doctor` will guide you.
- Headless mode works. `--headless=new` plus a bootstrapped `Local State` plus a real `http://127.0.0.1` origin plus `userGesture:true` CDP `Runtime.evaluate` trips the model-download gate.
- One shared Chrome per machine via the `~/.fenster/run/chrome.json` lockfile. Many `fenster --serve` instances attach to the same browser.
- Sentinel session reuse plus pre-warm at startup. First request after `fenster --serve` returns in under 2 seconds because the cold `LanguageModel.create()` tax is paid in the background.
- Ctrl-C semantics ported from apfel. SIGINT at the chat prompt exits 130 with a terminal reset. SIGINT mid-response cancels the request and returns to the prompt. A sketch of this split follows the list.
Performance principles are in `docs/architecture.md`. Chrome flags, and why each one is needed, are in `docs/chrome-flags.md`. Native Messaging framing details are in `docs/native-messaging.md`.
## Build from source

```bash
git clone https://github.com/Arthur-Ficial/fenster
cd fenster
make build       # release binary to bin/fenster
make test-fast   # Go unit + non-model integration in 30 seconds
make test        # full apfel-compat suite, real Gemini Nano, about 5 minutes
```

Modern Go (1.22+), stdlib first. Direct deps: `cobra`, `chromedp`, `golang.org/x/term`. No third-party HTTP router. No mocks of Chrome.
## Sister project
apfel is fenster's macOS Apple-Intelligence twin. Wire format is byte-for-byte compatible. Clients written for apfel work against fenster too.
## Contributing

See the open issues. Issues tagged `up-for-grabs` are well-scoped places to start. The big ones:

- `FEN-201` MCP host-side execute loop (auto-tool dispatch)
- `FEN-202` Chat TUI completeness (arrow keys, JSON mode, MCP integration)
- `FEN-203` cli_e2e text-matching corners
- `FEN-205` Tool-calling shim (`responseConstraint`)
- `FEN-206` Real streaming SSE from `LanguageModel.promptStreaming()`
- `FEN-207` HTTP MCP client (Streamable HTTP)
- `FEN-208` Distribution: Homebrew tap, Scoop bucket, apt deb
## License
MIT. See LICENSE.