Phlox — A full-featured, self-hostable AI platform

Self-hostable · Open source · Apache 2.0

A full-featured AI platform
you actually own.

Phlox is a self-hostable AI platform — an agentic tool-using harness, document RAG, code execution, an OpenAI-compatible gateway, and per-user cost accounting — running over any model provider: AWS Bedrock or any OpenAI-compatible endpoint, including fully local models.

Runs over

AWS Bedrock
OpenAI
Ollama
OpenRouter
vLLM
LiteLLM
LM Studio

localhost:5173

The Phlox chat interface showing a streaming conversation, tool calls, and an artifact panel

~40built-in agent tools

100%self-hostable & offline-capable

8built-in themes

6+model providers, one config

Bring your own model. Or run it all locally.

Named provider profiles cover AWS Bedrock and any OpenAI-compatible endpoint — OpenAI, LiteLLM, or a local runtime. Point Phlox at Ollama, LM Studio, or vLLM and the whole stack — chat and RAG embeddings — runs offline with no cloud API key. Switch profiles live, with a built-in connection tester.

Define as many provider profiles as you like, switch between them instantly
Local embeddings (e.g. nomic-embed-text) keep RAG fully offline
Edit profiles, pricing, and limits live — no server restart required

config.yml

default_profile: local-ollama
profiles:
  local-ollama:
    type: openai
    label: "Ollama (local)"
    endpoint: http://localhost:11434/v1
    api_key: ollama       # ignored by Ollama
    model: qwen3.6:35b
    supports_tools: true

Everything in one app

A complete assistant, not just a chat box

Phlox bundles the pieces you'd otherwise stitch together yourself — each one self-hosted and under your control.

💬

Streaming chat

Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.

🤖

Agentic harness

The model uses tools in a loop — filesystem, shell, Python/Node execution, document search, plus planning, sub-agents, memory, and checkpoints — all in a sandboxed workspace.

🤝

Human-in-the-loop

Pause on sensitive tools, approve or deny, then resume. The run state is persisted, so approvals survive disconnects.

🧰

Code execution & artifacts

Run code with captured output and inline artifacts. A Workspace Files panel lets you browse and download everything the agent created.

📚

Documents & RAG

Upload PDF, DOCX, TXT, MD, or code. Hybrid dense + sparse search over Qdrant with reranking and citations, scoped globally or per conversation. Works offline.

🌐

Opt-in web search

A per-prompt composer toggle exposes web_search (zero-config ddgs or SearXNG) so the agent can discover current sources before fetching pages.

🧠

Cross-conversation memory

Durable facts are saved and semantically recalled across chats, so the assistant remembers you from one conversation to the next.

🖼️

Multimodal

Attach images to messages for vision models, persisted and replayed into the provider as image content parts.

🔌

MCP integration

Connect Model Context Protocol servers from the UI; their tools join the model's toolset automatically, no code required.

🚪

OpenAI-compatible gateway

Mint per-user API keys and call Phlox from any OpenAI SDK via /v1/chat/completions — with the same per-user cost accounting.

💵

Usage & cost accounting

Per-message token and cost in the UI, plus an admin chargeback view by month × user × department × model, with CSV export for finance.

🧮

Spend budgets

Set a monthly cost cap per user or department. Warn at an adjustable threshold, then block priced models once the budget is reached — across chat and the API gateway. Resets each month.

🎨

Theming

Phlox Dark by default, with Light, Fred Hutch, Hutch Night, Sandstone and more — instant switching via a CSS-variable token system.

The agentic core

A real agent, not "chat that calls tools"

Each turn, the model works in a loop — calling tools, planning, and recovering — inside a per-conversation sandboxed workspace you can inspect, snapshot, and roll back.

01 Tool loop

Filesystem (read_file, write_file, edit_file, glob, grep), run_shell, execute_python / execute_node, and search_documents — one unified tool surface the model drives until the task is done.

02 Planning & sub-agents

update_todos keeps a visible plan; spawn_subagent runs a nested, ephemeral agent with a scoped toolset in the same workspace and returns a report.

03 Memory & checkpoints

save_memory persists durable facts across chats. Every workspace is a git repo that auto-snapshots after mutating tools, with one-click restore.

04 Approvals & permissions

Every tool has an auto / ask / deny policy. The loop pauses on ask, persists its state, and resumes statelessly after you decide.

Knowledge & memory

Your documents, searched the right way

Upload PDFs, Office docs, markdown, or source code. Phlox parses, chunks, and embeds them into Qdrant, then retrieves with true hybrid search — a dense semantic vector and a sparse lexical vector per chunk, fused with RRF and reranked, returning numbered citations the model is instructed to cite.

Global knowledge base or per-conversation document scoping
Dependency-free sparse vectors and reranker work fully offline
SQLite stays the source of truth — the index can always be rebuilt
Cross-conversation memory recalls durable facts into every turn

ParsePDF · DOCX · TXT · MD · code

Chunk & embeddense + sparse vectors

Hybrid searchRRF fusion across both vectors

Rerankcross-encoder-ready seam

Citenumbered sources [n]

Built for teams

Multi-user, isolated, and accountable

Auth is on by default, data is scoped strictly per user, and every sensitive tool runs behind a permission gate you control.

🔐

Auth & SSO

Local accounts (bcrypt + JWT) or Microsoft Entra ID SSO. user / admin roles, strict per-user isolation — admins manage accounts but can't read others' content.

📦

Container sandbox

Run agent code in an ephemeral Podman/Docker container with CPU, memory, and PID limits plus network isolation — or a fast local subprocess for trusted single-user use.

🛡️

Per-tool permissions

Each tool is auto, ask, or deny. Mutating and execution tools default to ask; an Agent-mode toggle auto-approves for a single turn.

⚙️

Live admin config

Edit provider profiles, model pricing, resilience, and sandbox limits from an admin panel — applied without a restart. API keys are write-only and masked.

📊

Observability

Per-request structured logs, an optional OpenTelemetry tracing seam, and per-turn token/cost capture in a durable ledger.

💵

Departmental chargeback

A durable usage ledger outlives the accounts it tracks — a departed user's costs stay billable after their account is deleted. Usage by month × user × department × model, CSV-exportable.

🧮

Spend budgets

Cap monthly spend per user or department. Users see a warning as they near the limit; priced models are blocked once it's reached — enforced for both chat and the API gateway, with a monthly reset.

The platform layer

An LLM gateway and cost ledger for the whole team

Beyond chat, Phlox is an OpenAI-compatible gateway with per-user API keys, live model pricing, department-level chargeback, and monthly spend budgets — the governance layer that turns a chat app into shared infrastructure.

Usage and cost dashboard grouped by department, user, and model with per-month totals and CSV export — Usage & cost, grouped by month × department × user × model — exportable to CSV for finance.

Spend budgets

Set a cap. Warn, then enforce.

Give a user or a whole department a monthly cost ceiling. Phlox warns as they approach it and blocks priced models once it's reached — while free, locally-hosted models keep working. Enforcement is shared by chat and the API gateway, and budgets reset each month. Most-restrictive budget wins when both a user and department cap apply.

Admin Budgets panel showing monthly spend caps for departments and users with spent-vs-limit progress bars, editable limit and warning thresholds, and a form to add a budget — Admin budgets — a monthly cap per user or department, with an adjustable warning threshold and live spend bars.

Under the hood

Two clean processes

A FastAPI backend handles LLM orchestration, the agent harness, MCP, RAG, code execution, auth, and SQLite persistence. A React + Vite frontend renders the rich, streaming UI.

Frontend React + Vite + Tailwind

Zustand store — live streaming assembly
SSE stream parser for /api/chat
Tool cards, reasoning, inline artifacts
CSS-variable theme tokens

Backend FastAPI

Resumable agent loop + tool registry
Permission gate — the security seam
Providers: OpenAI-compatible & Bedrock
RAG · sandbox · workspace · MCP
SQLite source of truth + Qdrant index

In dev, Vite proxies /api to FastAPI. In production, FastAPI serves the built SPA from frontend/dist — one command to run the whole thing.

Make it yours

Eight themes, instant switching

A semantic CSS-variable token layer means themes change with no rebuild — and adding your own is two small edits.

Phlox Dark

Phlox Light

Fred Hutch

Hutch Night

Dark

Light

Sandstone

+ your own

Up and running in minutes

Quick start

Prerequisites: Python 3.11+ with uv, Node 18+, and a model provider — a local Ollama is the easiest.

1 · Backend

# from backend/
uv sync
cp config.yml.example config.yml   # set your provider
uv run uvicorn app.main:app --reload --port 8000

2 · Frontend

# from frontend/, separate terminal
npm install
npm run dev
# open http://localhost:5173

On Windows run both with ./scripts/dev.ps1; on macOS/Linux ./scripts/dev.sh. Auth is on by default with a seeded admin / admin — change it and set a real jwt_secret before sharing access.

Self-host your own AI assistant today

Open source under Apache 2.0. Clone it, point it at a model, and run.

A full-featured AI platformyou actually own.

Bring your own model. Or run it all locally.

A complete assistant, not just a chat box

Streaming chat

Agentic harness

Human-in-the-loop

Code execution & artifacts

Documents & RAG

Opt-in web search

Cross-conversation memory

Multimodal

MCP integration

OpenAI-compatible gateway

Usage & cost accounting

Spend budgets

Theming

A real agent, not "chat that calls tools"

01 Tool loop

02 Planning & sub-agents

03 Memory & checkpoints

04 Approvals & permissions

Your documents, searched the right way

Multi-user, isolated, and accountable

Auth & SSO

Container sandbox

Per-tool permissions

Live admin config

Observability

Departmental chargeback

Spend budgets

An LLM gateway and cost ledger for the whole team

Set a cap. Warn, then enforce.

Two clean processes

Eight themes, instant switching

Quick start

Self-host your own AI assistant today

A full-featured AI platform
you actually own.