GitHub - aosmith/gremlin

A local, browser-native multi-agent coordinator. A TypeScript module routes all inter-agent messages; a Svelte UI lets you watch every conversation and inject human input at any point. No server required.


Getting started

Pick a provider, get a model, and run. Here are links for the most common setups:

| Provider | Type | Get started |
| --- | --- | --- |
| Ollama | Local, free | ollama.com: install, then ollama pull llama3.2 |
| LM Studio | Local, free | lmstudio.ai: download a model, start the local server |
| WebLLM | In-browser, free | No install: select WebLLM in Settings, pick a model, runs via WebGPU (Chrome 113+) |
| OpenRouter | Cloud, free tier | openrouter.ai: create account, copy API key |
| Groq | Cloud, free tier | console.groq.com: create account, copy API key |
| OpenAI | Cloud, paid | platform.openai.com/api-keys |
| Anthropic | Cloud, paid | console.anthropic.com: create API key |
| Google Gemini | Cloud, free tier | aistudio.google.com: create API key |
| Together | Cloud, paid | api.together.xyz: create account, copy API key |

Once you have a provider ready, open GREMLIN → Settings (⚙) → pick your provider → paste your key (if needed) → pick a model → Save. Type a task and hit Run.


Features

| Feature | Description |
| --- | --- |
| 🧠 TypeScript coordinator | In-memory agent state machine with message routing |
| 🤝 Collaborative agents | Agents have individual system prompts and message each other autonomously |
| 🔍 Full visibility | Every inter-agent message is shown in real time; nothing is hidden |
| 💬 Human-in-the-loop | Click any agent during a run to read its thread and inject instructions |
| 🗂 7 built-in modes | General · Engineering · Finance · Industrial · Medicine · Networking · Prediction Markets |
| Dev mode | Engineering agents write real files to your disk via the File System Access API |
| 🔎 Web search | DuckDuckGo out of the box, or configure Brave, Serper, Tavily, SearXNG |
| 🌐 WebLLM | Run quantised models entirely in the browser via WebGPU; no server or key needed |
| 🖼 Multimodal | Attach, paste, or drag-and-drop images into your task prompt |
| 🔌 10 providers | Ollama, LM Studio, WebLLM, OpenRouter, Groq, OpenAI, Anthropic, Gemini, Together, Custom |
| 📦 Single HTML file | dist/index.html has everything inlined: JS, CSS, all assets |
| 🏠 Fully local | No backend, no accounts; only outbound traffic is calls to your chosen LLM |

Quick start

Prerequisites

| Tool | Version | Install |
| --- | --- | --- |
| Node.js | 18+ | nodejs.org |

Run

git clone https://github.com/aosmith/gremlin.git
cd gremlin/web
npm install
npm run dev   # → http://localhost:5173

The Vite dev server is the runtime. It provides a built-in CORS proxy for web search, auto-starts Ollama (if installed), and launches the browser sidecar. Every user runs npm run dev locally — there is no separate production deployment step.

Build (optional)

npm run build   # → dist/index.html (single self-contained file)

The built file can be hosted statically, but loses the CORS proxy and sidecars. For the full experience, use npm run dev.


Modes

Switch modes with the tab bar below the navbar. Each mode loads a different set of agents. Agent configs are saved per-mode.

| Mode | Agents | Use for |
| --- | --- | --- |
| General 🌐 | CEO · Researcher · Analyst · Critic · Writer · TDD Engineer · Editor · Chief of Staff | Research, writing, analysis |
| Engineering | CTO · Frontend Dev · Backend Dev · Full-Stack Dev · DevOps Eng · QA Engineer · Security Eng · Simplicity Eng · TDD Engineer · Editor · Staff Engineer | Software projects |
| Finance 📈 | Capital Allocator · Value Analyst · Quant Analyst · Filings Analyst · Risk Manager · Sector Analyst · TDD Engineer · Editor · Investment Strategist | Investment research |
| Industrial 🏭 | General Manager · Manufacturing Eng · Operations Manager · Supply Chain · Quality Engineer · Commercial Manager · TDD Engineer · Editor · Plant Controller | Manufacturing, operations & supply chain |
| Medicine 🩺 | Attending Physician · Internist · Radiologist · Lab Medicine · Clinical Pharmacist · Nurse Practitioner · Chief of Medicine | Clinical reasoning, diagnosis, treatment |
| Networking 📡 | NOC Director · Transport Engineer · IP/MPLS Engineer · Voice/UC Engineer · RF/Wireless Engineer · Security Analyst · TDD Engineer · Editor · Service Assurance Lead | Telecom NOC, triage, routing |
| Prediction Markets 🔮 | Market Strategist · Probability Modeler · News Scanner · Whale Tracker · Arbitrage Analyst · Risk Assessor · TDD Engineer · Editor · Trade Architect | Forecasting, prediction markets |
| + New Mode | Snapshot of your current agents | Any custom team |

Click + New Mode to save your current agent configuration as a named mode. Custom modes can be deleted; the seven built-in ones cannot.
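
The snapshot-and-delete rules above can be pictured with a small data model. This is a minimal sketch, not GREMLIN's actual code: the AgentConfig and Mode shapes, the builtIn flag, and the createMode and deleteMode helpers are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real types live in web/src/lib/types.ts and may differ.
interface AgentConfig {
  id: string;
  name: string;
  systemPrompt: string;
}

interface Mode {
  name: string;
  builtIn: boolean;
  agents: AgentConfig[];
}

// Save the current agents as a new custom mode. Each agent is copied so that
// later edits to the live agents do not change the saved snapshot.
function createMode(modes: Mode[], name: string, current: AgentConfig[]): Mode[] {
  if (modes.some((m) => m.name === name)) {
    throw new Error(`A mode named "${name}" already exists`);
  }
  const snapshot = current.map((agent) => ({ ...agent }));
  return [...modes, { name, builtIn: false, agents: snapshot }];
}

// Remove a custom mode. Built-in modes are protected.
function deleteMode(modes: Mode[], name: string): Mode[] {
  const target = modes.find((m) => m.name === name);
  if (!target) return modes;
  if (target.builtIn) {
    throw new Error(`Built-in mode "${name}" cannot be deleted`);
  }
  return modes.filter((m) => m.name !== name);
}
```

Both helpers return a new array instead of mutating their input, which tends to sit well with reactive stores; whether GREMLIN does this is not stated here.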


Dev mode (Engineering)

When the Engineering mode is active, a 📁 Open Folder button appears in the toolbar. Select a local project directory and the agents gain file system tools:

| Tool | Description |
| --- | --- |
| write_file(path, content) | Create or overwrite a file (parent directories are created automatically) |
| read_file(path) | Read a file's full content |
| list_directory(path) | List files and subdirectories |

Tool calls appear as messages in the Activity Monitor. Written files show up in a file tree in the left sidebar; click any file to open it in the code viewer panel.

The File System Access API requires Chrome 86+ or Edge 86+. Firefox does not support it.
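
To illustrate how a write_file tool can create parent directories on top of the File System Access API, here is a minimal sketch. It is not the project's filesystem.ts: splitPath and writeFile are hypothetical helpers, and the small *Like interfaces only mirror the subset of the browser API the sketch relies on (getDirectoryHandle, getFileHandle, createWritable).

```typescript
// Structural types covering only the API surface used below, so the sketch
// type-checks without DOM typings. Real browser handles satisfy these shapes.
interface WritableLike {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  createWritable(): Promise<WritableLike>;
}
interface DirHandleLike {
  getDirectoryHandle(name: string, options?: { create?: boolean }): Promise<DirHandleLike>;
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

// Split "src/lib/a.ts" into ["src", "lib", "a.ts"]. Empty paths and ".."
// segments are rejected so a tool call cannot leave the chosen folder.
function splitPath(path: string): string[] {
  const parts = path.split("/").filter((p) => p !== "" && p !== ".");
  if (parts.length === 0 || parts.includes("..")) {
    throw new Error(`Invalid path: ${path}`);
  }
  return parts;
}

// Walk down from the root handle, creating each missing directory, then
// create (or overwrite) the file and write the content.
async function writeFile(root: DirHandleLike, path: string, content: string): Promise<void> {
  const parts = splitPath(path);
  const fileName = parts.pop() as string; // splitPath guarantees at least one part
  let dir = root;
  for (const segment of parts) {
    dir = await dir.getDirectoryHandle(segment, { create: true });
  }
  const file = await dir.getFileHandle(fileName, { create: true });
  const writable = await file.createWritable();
  await writable.write(content);
  await writable.close();
}
```

In a supporting browser the root handle would typically come from window.showDirectoryPicker(), which must be triggered by a user gesture such as clicking the Open Folder button.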


Browser tools

Agents can interact with a live browser when a sidecar server is running at http://127.0.0.1:3131:

| Tool | Description |
| --- | --- |
| browse_navigate(url) | Navigate to a URL; returns page title and HTTP status |
| browse_content(selector?) | Extract text from the current page or a specific element |
| browse_click(selector) | Click an element by CSS selector |
| browse_type(selector, text) | Type into an input field; optionally submit |
| browse_evaluate(script) | Run JavaScript in the page context |
| browse_assert(checks) | Assert conditions about elements; returns PASS/FAIL |
| browse_links() | List all links on the current page |
| browse_wait(selector) | Wait for an element to appear (up to 10s) |

Web search

Agents can search the web and fetch pages during a run. Two tools are available:

| Tool | Description |
| --- | --- |
| web_search(query) | Search the web via the configured provider |
| web_fetch(url) | Fetch and extract text from any URL (max 30k chars) |
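
As a rough picture of the "extract text, max 30k chars" step, the sketch below strips markup with regular expressions and truncates the result. The extractText helper and the MAX_FETCH_CHARS name are assumptions for illustration; the project's search.ts may well use a proper HTML parser instead.

```typescript
// Assumed constant, taken from the "max 30k chars" note in the table above.
const MAX_FETCH_CHARS = 30000;

// Crude HTML-to-text conversion: drop script/style blocks, drop remaining
// tags, collapse whitespace, then cut the result to the character limit.
function extractText(html: string, maxChars: number = MAX_FETCH_CHARS): string {
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.slice(0, maxChars);
}
```

Regex stripping is adequate for a sketch but does not decode HTML entities or handle malformed markup, so treat it as a starting point only.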

DuckDuckGo is the default search provider — no API key needed. For better results, configure a provider in Settings:

| Provider | API key needed | Notes |
| --- | --- | --- |
| DuckDuckGo (default) | No | HTML scrape via CORS proxy |
| Brave Search | Yes | Free tier: 2,000 queries/mo |
| Serper (Google) | Yes | Free tier: 2,500 queries/mo |
| Tavily | Yes | AI-optimised, free tier: 1,000 queries/mo |
| SearXNG | No | Self-hosted; set your instance URL in Settings |
| Cloudflare | Yes (Account ID + API token) | Browser Rendering crawl: renders JS-heavy pages; also powers web_fetch |

WebLLM (in-browser inference)

Select WebLLM 🌐 in Settings to run a quantised model entirely in your browser using WebGPU. No API key or external server needed.

Requirements: Chrome 113+ or Edge 113+ (WebGPU). Firefox and Safari do not support WebGPU by default.
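
A page can check for WebGPU before offering WebLLM. The sketch below is one possible check, not GREMLIN's webllm.ts: hasWebGPU and canRunWebLLM are hypothetical helpers, and the *Like interfaces stand in for the browser's navigator.gpu typings.

```typescript
// Minimal shapes so the sketch type-checks without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Cheap synchronous check: does the browser expose navigator.gpu at all?
function hasWebGPU(nav: NavigatorLike): boolean {
  return nav.gpu !== undefined;
}

// Stronger check: the property can exist while no usable adapter is
// available, so also try to obtain one.
async function canRunWebLLM(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false;
  try {
    return (await nav.gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```

In a browser, pass the global navigator object. Taking it as a parameter keeps the helpers testable outside the browser.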

Workflow:

  1. Open Settings → select WebLLM → pick a model
  2. Click Run — the model downloads to your browser cache on first use (~500 MB – 4 GB depending on model)
  3. A progress bar appears below the navbar during loading; subsequent runs use the cache instantly

Available models (curated list):

| Model | Size (approx) | Notes |
| --- | --- | --- |
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC | ~700 MB | Fastest; good for testing |
| Llama-3.2-1B-Instruct-q4f16_1-MLC | ~800 MB | Small, fast |
| Llama-3.2-3B-Instruct-q4f16_1-MLC | ~2 GB | Good balance (default) |
| Qwen2.5-3B-Instruct-q4f16_1-MLC | ~2 GB | Strong at reasoning |
| Phi-3-mini-4k-instruct-q4f16_1-MLC | ~2 GB | Microsoft's compact model |
| Phi-3.5-mini-instruct-q4f16_1-MLC | ~2 GB | Microsoft's newer compact model |
| gemma-2-2b-it-q4f16_1-MLC | ~1.5 GB | Google's 2B model |
| Qwen2.5-7B-Instruct-q4f16_1-MLC | ~4 GB | Higher quality |
| Mistral-7B-Instruct-v0.3-q4f16_1-MLC | ~4 GB | Strong general-purpose |
| Llama-3.1-8B-Instruct-q4f32_1-MLC | ~4 GB | Best quality locally |

LLM providers

Open ⚙ Settings to pick a provider. The provider grid covers:

| Provider | Kind | Notes |
| --- | --- | --- |
| Ollama | Local | ollama pull llama3.2 · click ↺ Discover |
| LM Studio | Local | Start local server · click ↺ Discover |
| WebLLM | In-browser | WebGPU required; no key needed |
| OpenRouter | Cloud | 200+ models; free tier available |
| Groq | Cloud | Very fast inference; free tier |
| OpenAI | Cloud | GPT-4o and variants |
| Anthropic | Cloud | Claude models |
| Gemini | Cloud | Google Gemini; free tier |
| Together | Cloud | Open-source cloud models |
| Custom | Cloud | Any OpenAI-compatible endpoint |

API keys are stored only in localStorage and never sent anywhere except your chosen endpoint.
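
For the Custom provider, "OpenAI-compatible" generally means the endpoint accepts a POST to /chat/completions with a model name and a messages array. The sketch below builds such a request without sending it. buildChatRequest and the example base URL are assumptions for illustration, not functions from GREMLIN's api.ts.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) a chat completion request. baseUrl is expected to
// include any version prefix, for example "https://api.example.com/v1".
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Local servers usually need no key, so the header is only set when one is given.
  if (apiKey !== "") {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}
```

The result can be passed straight to fetch(request.url, request.init). Because the key only ever appears in the Authorization header of that one request, it is sent to the chosen endpoint and nowhere else.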


Agent communication protocol

Agents respond with JSON. GREMLIN parses and routes the messages automatically:

{
  "analysis": "My reasoning goes here",
  "messages": [
    { "to": "critic",      "content": "Can you verify finding #3?" },
    { "to": "synthesizer", "content": "Final results: …" }
  ],
  "done": false,
  "result": null
}

  • "done": true marks the agent's task as complete.
  • "result" contains the final output (the synthesizer's result is shown to the user).
  • Plain-text responses are accepted as a fallback and treated as a final result.

Agents also have access to protocol tools (send_message, mark_done) for providers that support tool calling.
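
The JSON envelope and the plain-text fallback described above could be handled roughly as follows. This is a sketch under assumptions, not the project's parser: the field names come from the example in this section, while parseAgentResponse, isAgentMessage, and the exact fallback behaviour are illustrative.

```typescript
interface AgentMessage {
  to: string;
  content: string;
}

interface AgentResponse {
  analysis: string;
  messages: AgentMessage[];
  done: boolean;
  result: string | null;
}

function isAgentMessage(value: unknown): value is AgentMessage {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return typeof m.to === "string" && typeof m.content === "string";
}

// Parse a raw model reply. Anything that is not a JSON object is treated as
// a plain-text final result, matching the fallback rule above.
function parseAgentResponse(raw: string): AgentResponse {
  const fallback: AgentResponse = {
    analysis: "",
    messages: [],
    done: true,
    result: raw.trim(),
  };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fallback;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return fallback;
  }
  const obj = parsed as Record<string, unknown>;
  return {
    analysis: typeof obj.analysis === "string" ? obj.analysis : "",
    // Malformed entries are dropped rather than failing the whole reply.
    messages: Array.isArray(obj.messages) ? obj.messages.filter(isAgentMessage) : [],
    done: obj.done === true,
    result: typeof obj.result === "string" ? obj.result : null,
  };
}
```

Being lenient about missing or malformed fields is a deliberate choice in this sketch, since smaller models often produce partially valid JSON.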


Architecture

┌──────────────────────────────────────────────────────────┐
│                         Browser                          │
│                                                          │
│  ┌─────────────────┐    ┌────────────────────────────┐   │
│  │  Svelte 5 UI    │───▶│  Coordinator (TypeScript)  │   │
│  │  (TypeScript)   │◀───│  • Agent registry          │   │
│  │                 │    │  • Message routing         │   │
│  │  Activity       │    │  • Session state           │   │
│  │  Monitor        │    └────────────────────────────┘   │
│  │  File Tree      │                                     │
│  │  Code Viewer    │                                     │
│  └────────┬────────┘                                     │
│           │                                              │
│  ┌────────▼────────┐    ┌────────────────────────────┐   │
│  │  AgentRunner.ts │    │  File System Access API    │   │
│  │  • Calls LLM    │    │  write_file / read_file /  │   │
│  │  • Tool loop    │───▶│  list_directory            │   │
│  │  • Routes msgs  │    │  (Engineering mode only)   │   │
│  │  • Web search   │    └────────────────────────────┘   │
│  └────────┬────────┘                                     │
│           │ fetch()  or  WebGPU (WebLLM)                 │
└───────────┼──────────────────────────────────────────────┘
            ▼
   ┌─────────────────────────────────┐
   │  LLM                            │
   │  Anthropic / OpenAI / Gemini    │
   │  Ollama / Groq / OpenRouter     │
   │  Together / WebLLM (WebGPU)     │
   └─────────────────────────────────┘
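
To make the coordinator box in the diagram more concrete, here is a minimal in-memory sketch of an agent registry with message routing. It is illustrative only: MiniCoordinator, its method names, and the "human" sender id are assumptions, and the real coordinator.ts is not shown in this document.

```typescript
type AgentStatus = "idle" | "working" | "done";

interface RoutedMessage {
  from: string; // an agent id, or "human" for injected input
  to: string;
  content: string;
}

class MiniCoordinator {
  private status = new Map<string, AgentStatus>();
  private inboxes = new Map<string, RoutedMessage[]>();
  // Every delivered message is also kept here so a UI can display it.
  readonly log: RoutedMessage[] = [];

  register(agentId: string): void {
    this.status.set(agentId, "idle");
    this.inboxes.set(agentId, []);
  }

  setStatus(agentId: string, next: AgentStatus): void {
    if (this.status.has(agentId)) this.status.set(agentId, next);
  }

  statusOf(agentId: string): AgentStatus | undefined {
    return this.status.get(agentId);
  }

  // Deliver a message; returns false when the recipient is not registered.
  route(message: RoutedMessage): boolean {
    const inbox = this.inboxes.get(message.to);
    if (!inbox) return false;
    inbox.push(message);
    this.log.push(message);
    return true;
  }

  // Human-in-the-loop input is modelled as just another routed message.
  inject(to: string, content: string): boolean {
    return this.route({ from: "human", to, content });
  }

  // Remove and return everything waiting for an agent.
  drain(agentId: string): RoutedMessage[] {
    const inbox = this.inboxes.get(agentId);
    if (!inbox) return [];
    this.inboxes.set(agentId, []);
    return inbox;
  }
}
```

Keeping a single append-only log alongside the per-agent inboxes is one simple way to get the "every message is visible" property; the actual implementation may organise its state differently.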

Project structure

gremlin/
├── web/
│   ├── src/
│   │   ├── App.svelte              # Main layout, navbar, mode bar, modals
│   │   ├── app.css                 # Glass-morphism design system
│   │   ├── lib/
│   │   │   ├── types.ts            # Types, providers, mode presets, agent configs
│   │   │   ├── coordinator.ts      # In-memory agent state & message router
│   │   │   ├── api.ts              # LLM call functions (all formats + tool loops)
│   │   │   ├── agentRunner.ts      # Multi-agent orchestration engine
│   │   │   ├── store.svelte.ts     # Reactive global state (Svelte 5 runes)
│   │   │   ├── filesystem.ts       # File System Access API wrapper
│   │   │   ├── tools.ts            # Tool definitions + executor (dev/search/protocol)
│   │   │   ├── webllm.ts           # WebLLM / WebGPU wrapper
│   │   │   ├── cleanContent.ts     # Protocol JSON → readable display text
│   │   │   ├── tableCards.ts       # Prose enhancement (tables, callouts)
│   │   │   ├── sanitize.ts         # DOMPurify XSS sanitization wrapper
│   │   │   ├── search.ts           # Web search implementation
│   │   │   ├── cloudflare.ts       # Cloudflare Browser Rendering crawl
│   │   │   ├── teamGenerator.ts    # Dynamic team generation
│   │   │   └── headless.ts         # Headless browser integration
│   │   └── components/
│   │       ├── ActivityMonitor.svelte  # Real-time message feed
│   │       ├── AgentCard.svelte        # Sidebar agent widget
│   │       ├── AgentPanel.svelte       # Agent detail + human input
│   │       ├── AgentEditModal.svelte   # Add/edit agent dialog
│   │       ├── SettingsModal.svelte    # Provider + API settings
│   │       ├── HelpModal.svelte        # Setup guide + onboarding
│   │       ├── SessionHistory.svelte   # Past session browser
│   │       ├── FileTree.svelte         # Project file tree (dev mode)
│   │       ├── CodeViewer.svelte       # File content viewer (dev mode)
│   │       └── NewModeModal.svelte     # Create custom mode dialog
│   ├── scripts/
│   │   └── smoke-test.mjs          # Post-build smoke test
│   ├── index.html
│   ├── package.json
│   └── vite.config.ts
├── proxy/
│   ├── worker.js                   # Cloudflare Worker CORS proxy
│   └── wrangler.toml
├── setup.sh                        # One-shot build script
└── package.json                    # Root npm scripts

Deployment

The recommended way to run GREMLIN is npm run dev on your local machine. The Vite dev server provides a CORS proxy, auto-starts Ollama, and launches the browser sidecar — none of which are available when hosting the built file statically.

Static hosting (advanced)

The built file (web/dist/index.html) can be hosted on Cloudflare Pages, GitHub Pages, Netlify, etc. Most LLM providers (OpenAI, Anthropic, OpenRouter, Gemini, Together) support browser CORS natively, so they work without a proxy. However, web search and some services (Groq, DuckDuckGo, Brave Search) need a CORS proxy.

To deploy your own proxy for static hosting:

cd proxy
npx wrangler login
npx wrangler deploy

Then update the proxy URL in Settings → Advanced.

Local LLM providers (Ollama, LM Studio)

If you host the site on a custom domain (e.g. gremlin.example.com) and want to use a local Ollama instance, you need to allow the origin:

OLLAMA_ORIGINS=https://gremlin.example.com ollama serve

Ollama only accepts requests from localhost by default. LM Studio has a similar CORS setting in its server preferences.


Troubleshooting

API 401 / 403 → Check your API key in Settings.

CORS error or Failed to fetch in browser console →
  • Make sure you are running via npm run dev; the dev server provides the CORS proxy.
  • If using the built HTML file: serve it over HTTP, not file://, and configure a proxy URL in Settings.
  • If using Ollama from a hosted site: set OLLAMA_ORIGINS=https://your-domain.com when starting Ollama.

WebLLM: WebGPU is not available → Use Chrome 113+ or Edge 113+. Firefox and Safari do not support WebGPU by default.

WebLLM: model loads but produces garbled output → Some quantised models struggle with strict JSON. Switch to a larger model or use a server-side provider.

Agents produce plain text instead of JSON → Smaller or older models sometimes ignore the JSON format instruction. Try a larger or more instruction-tuned model.


License

PolyForm Noncommercial 1.0.0 — free for personal and noncommercial use.