💬
Streaming chat
Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.
Self-hostable · Open source · Apache 2.0
Phlox is a self-hostable AI platform — an agentic tool-using harness, document RAG, code execution, an OpenAI-compatible gateway, and per-user cost accounting — running over any model provider: AWS Bedrock or any OpenAI-compatible endpoint, including fully local models.
Runs over
localhost:5173
~40built-in agent tools
100%self-hostable & offline-capable
8built-in themes
6+model providers, one config
Named provider profiles cover AWS Bedrock and any OpenAI-compatible endpoint — OpenAI, LiteLLM, or a local runtime. Point Phlox at Ollama, LM Studio, or vLLM and the whole stack — chat and RAG embeddings — runs offline with no cloud API key. Switch profiles live, with a built-in connection tester.
nomic-embed-text) keep RAG fully offlineconfig.yml
default_profile: local-ollama
profiles:
local-ollama:
type: openai
label: "Ollama (local)"
endpoint: http://localhost:11434/v1
api_key: ollama # ignored by Ollama
model: qwen3.6:35b
supports_tools: true
Everything in one app
Phlox bundles the pieces you'd otherwise stitch together yourself — each one self-hosted and under your control.
💬
Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.
🤖
The model uses tools in a loop — filesystem, shell, Python/Node execution, document search, plus planning, sub-agents, memory, and checkpoints — all in a sandboxed workspace.
🤝
Pause on sensitive tools, approve or deny, then resume. The run state is persisted, so approvals survive disconnects.
🧰
Run code with captured output and inline artifacts. A Workspace Files panel lets you browse and download everything the agent created.
📚
Upload PDF, DOCX, TXT, MD, or code. Hybrid dense + sparse search over Qdrant with reranking and citations, scoped globally or per conversation. Works offline.
🌐
A per-prompt composer toggle exposes web_search (zero-config ddgs or SearXNG) so the agent can discover current sources before fetching pages.
🧠
Durable facts are saved and semantically recalled across chats, so the assistant remembers you from one conversation to the next.
🖼️
Attach images to messages for vision models, persisted and replayed into the provider as image content parts.
🔌
Connect Model Context Protocol servers from the UI; their tools join the model's toolset automatically, no code required.
🚪
Mint per-user API keys and call Phlox from any OpenAI SDK via /v1/chat/completions — with the same per-user cost accounting.
💵
Per-message token and cost in the UI, plus an admin chargeback view by month × user × department × model, with CSV export for finance.
🧮
Set a monthly cost cap per user or department. Warn at an adjustable threshold, then block priced models once the budget is reached — across chat and the API gateway. Resets each month.
🎨
Phlox Dark by default, with Light, Fred Hutch, Hutch Night, Sandstone and more — instant switching via a CSS-variable token system.
The agentic core
Each turn, the model works in a loop — calling tools, planning, and recovering — inside a per-conversation sandboxed workspace you can inspect, snapshot, and roll back.
Filesystem (read_file, write_file, edit_file, glob, grep), run_shell, execute_python / execute_node, and search_documents — one unified tool surface the model drives until the task is done.
update_todos keeps a visible plan; spawn_subagent runs a nested, ephemeral agent with a scoped toolset in the same workspace and returns a report.
save_memory persists durable facts across chats. Every workspace is a git repo that auto-snapshots after mutating tools, with one-click restore.
Every tool has an auto / ask / deny policy. The loop pauses on ask, persists its state, and resumes statelessly after you decide.
Knowledge & memory
Upload PDFs, Office docs, markdown, or source code. Phlox parses, chunks, and embeds them into Qdrant, then retrieves with true hybrid search — a dense semantic vector and a sparse lexical vector per chunk, fused with RRF and reranked, returning numbered citations the model is instructed to cite.
1
ParsePDF · DOCX · TXT · MD · code
2
Chunk & embeddense + sparse vectors
3
Hybrid searchRRF fusion across both vectors
4
Rerankcross-encoder-ready seam
5
Citenumbered sources [n]
Built for teams
Auth is on by default, data is scoped strictly per user, and every sensitive tool runs behind a permission gate you control.
🔐
Local accounts (bcrypt + JWT) or Microsoft Entra ID SSO. user / admin roles, strict per-user isolation — admins manage accounts but can't read others' content.
📦
Run agent code in an ephemeral Podman/Docker container with CPU, memory, and PID limits plus network isolation — or a fast local subprocess for trusted single-user use.
🛡️
Each tool is auto, ask, or deny. Mutating and execution tools default to ask; an Agent-mode toggle auto-approves for a single turn.
⚙️
Edit provider profiles, model pricing, resilience, and sandbox limits from an admin panel — applied without a restart. API keys are write-only and masked.
📊
Per-request structured logs, an optional OpenTelemetry tracing seam, and per-turn token/cost capture in a durable ledger.
💵
A durable usage ledger outlives the accounts it tracks — a departed user's costs stay billable after their account is deleted. Usage by month × user × department × model, CSV-exportable.
🧮
Cap monthly spend per user or department. Users see a warning as they near the limit; priced models are blocked once it's reached — enforced for both chat and the API gateway, with a monthly reset.
The platform layer
Beyond chat, Phlox is an OpenAI-compatible gateway with per-user API keys, live model pricing, department-level chargeback, and monthly spend budgets — the governance layer that turns a chat app into shared infrastructure.
Spend budgets
Give a user or a whole department a monthly cost ceiling. Phlox warns as they approach it and blocks priced models once it's reached — while free, locally-hosted models keep working. Enforcement is shared by chat and the API gateway, and budgets reset each month. Most-restrictive budget wins when both a user and department cap apply.
Under the hood
A FastAPI backend handles LLM orchestration, the agent harness, MCP, RAG, code execution, auth, and SQLite persistence. A React + Vite frontend renders the rich, streaming UI.
Frontend React + Vite + Tailwind
/api/chatBackend FastAPI
In dev, Vite proxies /api to FastAPI. In production, FastAPI serves the built SPA from frontend/dist — one command to run the whole thing.
Make it yours
A semantic CSS-variable token layer means themes change with no rebuild — and adding your own is two small edits.
Phlox Dark
Phlox Light
Fred Hutch
Hutch Night
Dark
Light
Sandstone
+ your own
Up and running in minutes
Prerequisites: Python 3.11+ with uv, Node 18+, and a model provider — a local Ollama is the easiest.
1 · Backend
# from backend/
uv sync
cp config.yml.example config.yml # set your provider
uv run uvicorn app.main:app --reload --port 8000
2 · Frontend
# from frontend/, separate terminal
npm install
npm run dev
# open http://localhost:5173
On Windows run both with ./scripts/dev.ps1; on macOS/Linux ./scripts/dev.sh.
Auth is on by default with a seeded admin / admin —
change it and set a real jwt_secret before sharing access.
Open source under Apache 2.0. Clone it, point it at a model, and run.