
LlamaFarm - Edge AI for Everyone

Enterprise AI capabilities on your own hardware. No cloud required.

License: Apache 2.0 | Python 3.10+ | Go 1.24+ | Docs | Discord

LlamaFarm is an open-source AI platform that runs entirely on your hardware. Build RAG applications, train custom classifiers, detect anomalies, and run document processing—all locally with complete privacy.

  • 🔒 Complete Privacy — Your data never leaves your device
  • 💰 No API Costs — Use open-source models without per-token fees
  • 🌐 Offline Capable — Works without internet once models are downloaded
  • ⚡ Hardware Optimized — Automatic GPU/NPU acceleration on Apple Silicon, NVIDIA, and AMD

Desktop App Downloads

Get started instantly — no command line required:

Builds are available for Mac (Universal), Windows, Linux (x86_64), and Linux (ARM64) on the releases page.

What Can You Build?

  • RAG (Retrieval-Augmented Generation) — Ingest PDFs, docs, CSVs and query them with AI
  • Custom Classifiers — Train text classifiers with 8-16 examples using SetFit
  • Anomaly Detection — 12+ algorithms for batch and streaming anomaly detection
  • Tool Calling (MCP) — Connect models to external tools via Model Context Protocol
  • OCR & Document Extraction — Extract text and structured data from images and PDFs
  • Named Entity Recognition — Find people, organizations, and locations
  • Multi-Model Runtime — Switch between Ollama, OpenAI, vLLM, or local GGUF models

Video demo (90 seconds): https://youtu.be/W7MHGyN0MdQ


Quickstart

Option 1: Desktop App

Download the desktop app above and run it. No additional setup required.

Option 2: CLI + Development Mode

  1. Install the CLI

    macOS / Linux:

    curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash

    Windows (PowerShell):

    irm https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.ps1 | iex

    Or download directly from releases.

  2. Create and run a project

    lf init my-project      # Generates llamafarm.yaml
    lf start                # Starts services and opens Designer UI
  3. Chat with your AI

    lf chat                           # Interactive chat
    lf chat "Hello, LlamaFarm!"       # One-off message

The Designer web interface is available at http://localhost:14345.

Option 3: Development from Source

git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm

# Install Nx globally and initialize the workspace
npm install -g nx
nx init --useDotNxInstallation --interactive=false  # Required on first clone

# Start all services (run each in a separate terminal)
nx start server           # FastAPI server (port 14345)
nx start rag              # RAG worker for document processing
nx start universal-runtime # ML models, OCR, embeddings (port 11540)

Architecture

LlamaFarm consists of three main services:

  • Server (port 14345) — FastAPI REST API, Designer web UI, project management
  • RAG Worker — Celery worker for async document processing
  • Universal Runtime (port 11540) — ML model inference, embeddings, OCR, anomaly detection

All configuration lives in llamafarm.yaml—no scattered settings or hidden defaults.


Runtime Options

Universal Runtime (Recommended)

The Universal Runtime provides access to HuggingFace models plus specialized ML capabilities:

  • Text Generation - Any HuggingFace text model
  • Embeddings - sentence-transformers and other embedding models
  • OCR - Text extraction from images/PDFs (Surya, EasyOCR, PaddleOCR, Tesseract)
  • Document Extraction - Forms, invoices, receipts via vision models
  • Text Classification - Pre-trained or custom models via SetFit
  • Named Entity Recognition - Extract people, organizations, locations
  • Reranking - Cross-encoder models for improved RAG quality
  • Anomaly Detection - Isolation Forest, One-Class SVM, Local Outlier Factor, Autoencoders

Example configuration:

runtime:
  models:
    default:
      provider: universal
      model: Qwen/Qwen2.5-1.5B-Instruct
      base_url: http://127.0.0.1:11540/v1
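
Since provider: universal is addressed through an OpenAI-style base_url, a quick smoke test is to list the runtime's models directly. This assumes the runtime implements the standard /v1/models route, which this README does not state explicitly:

import requests

# Assumption: the Universal Runtime serves the usual OpenAI-compatible
# /v1/models listing at the base_url configured above.
print(requests.get("http://127.0.0.1:11540/v1/models").json())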

Ollama

Simple setup for GGUF models with CPU/GPU acceleration:

runtime:
  models:
    default:
      provider: ollama
      model: qwen3:8b
      base_url: http://localhost:11434/v1

OpenAI-Compatible

Works with vLLM, Together, Mistral API, or any OpenAI-compatible endpoint:

runtime:
  models:
    default:
      provider: openai
      model: gpt-4o
      base_url: https://api.openai.com/v1
      api_key: ${OPENAI_API_KEY}

Core Workflows

CLI Commands

  • Initialize project — lf init my-project
  • Start services — lf start
  • Interactive chat — lf chat
  • One-off message — lf chat "Your question"
  • List models — lf models list
  • Use a specific model — lf chat --model powerful "Question"
  • Create dataset — lf datasets create -s pdf_ingest -b main_db research
  • Upload files (auto-processed by default) — lf datasets upload research ./docs/*.pdf
  • Process dataset (if you skipped auto-processing) — lf datasets process research
  • Query RAG — lf rag query --database main_db "Your query"
  • Check RAG health — lf rag health

RAG Pipeline

  1. Create a dataset linked to a processing strategy and database
  2. Upload files (PDF, DOCX, Markdown, TXT) — processing runs automatically unless you pass --no-process
  3. Process manually only when you intentionally skipped auto-processing (e.g., large batches)
  4. Query using semantic search with optional metadata filtering

For example:

lf datasets create -s default -b main_db research
lf datasets upload research ./papers/*.pdf                 # auto-processes by default
# For large batches:
# lf datasets upload research ./papers/*.pdf --no-process
# lf datasets process research
lf rag query --database main_db "What are the key findings?"

Designer Web UI

The Designer at http://localhost:14345 provides:

  • Project management with briefs and quick actions
  • Visual dataset management with drag-and-drop uploads
  • Database & RAG configuration with built-in query testing
  • Prompt engineering with template variables and testing
  • Interactive chat with RAG toggle and retrieved context display
  • Config editor with syntax highlighting, validation, and auto-completion
  • Switch between visual Designer and raw YAML modes in any section

See the Designer Features Guide for details.


Configuration

llamafarm.yaml is the source of truth for each project:

version: v1
name: my-assistant
namespace: default

# Multi-model configuration
runtime:
  default_model: fast

  models:
    fast:
      description: "Fast local model"
      provider: universal
      model: Qwen/Qwen2.5-1.5B-Instruct
      base_url: http://127.0.0.1:11540/v1

    powerful:
      description: "More capable model"
      provider: universal
      model: Qwen/Qwen2.5-7B-Instruct
      base_url: http://127.0.0.1:11540/v1

# System prompts
prompts:
  - name: default
    messages:
      - role: system
        content: You are a helpful assistant.

# RAG configuration
rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: semantic_search
      embedding_strategies:
        - name: default_embeddings
          type: UniversalEmbedder
          config:
            model: sentence-transformers/all-MiniLM-L6-v2
            base_url: http://127.0.0.1:11540/v1
      retrieval_strategies:
        - name: semantic_search
          type: BasicSimilarityStrategy
          config:
            top_k: 5

  data_processing_strategies:
    - name: default
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1000
            chunk_overlap: 100
        - type: MarkdownParser_Python
          config:
            chunk_size: 1000
      extractors: []

# Dataset definitions
datasets:
  - name: research
    data_processing_strategy: default
    database: main_db

Environment Variable Substitution

Use ${VAR} syntax to inject secrets from .env files:

runtime:
  models:
    openai:
      api_key: ${OPENAI_API_KEY}
      # With default: ${OPENAI_API_KEY:-sk-default}
      # From specific file: ${file:.env.production:API_KEY}
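
A .env file next to the project would then supply the value (placeholder shown):

OPENAI_API_KEY=sk-your-key-here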

See the Configuration Guide for complete reference.


REST API

LlamaFarm provides an OpenAI-compatible REST API:

Chat Completions

curl -X POST http://localhost:14345/v1/projects/default/my-project/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false,
    "rag_enabled": true
  }'
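
Because the API is OpenAI-compatible, the official openai Python client should also work when its base_url points at the project endpoint. A minimal sketch: the model name ("fast") is taken from the sample llamafarm.yaml in this README, and passing rag_enabled through extra_body is an assumption based on the curl example above:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:14345/v1/projects/default/my-project",
    api_key="unused",  # assumption: the local server does not check the key
)

response = client.chat.completions.create(
    model="fast",  # a model name defined in llamafarm.yaml (assumed)
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"rag_enabled": True},  # LlamaFarm field from the curl example
)
print(response.choices[0].message.content)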

RAG Query

curl -X POST http://localhost:14345/v1/projects/default/my-project/rag/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the requirements?",
    "database": "main_db",
    "top_k": 5
  }'

See the API Reference for all endpoints.


Specialized ML Capabilities

The Universal Runtime provides endpoints beyond chat:

OCR & Document Extraction

curl -X POST http://localhost:14345/v1/vision/ocr \
  -F "file=@document.pdf" \
  -F "model=surya"

Anomaly Detection

LlamaFarm supports 12+ anomaly detection algorithms via PyOD, with both batch and streaming modes.

# Train on normal data
curl -X POST http://localhost:14345/v1/ml/anomaly/fit \
  -H "Content-Type: application/json" \
  -d '{"model": "sensor-detector", "backend": "ecod", "data": [[22.1], [23.5], ...]}'

# Detect anomalies
curl -X POST http://localhost:14345/v1/ml/anomaly/detect \
  -H "Content-Type: application/json" \
  -d '{"model": "sensor-detector", "data": [[22.0], [100.0], [23.0]], "threshold": 0.5}'

# Streaming detection (handles cold start, auto-retraining, sliding windows)
curl -X POST http://localhost:14345/v1/ml/anomaly/stream \
  -H "Content-Type: application/json" \
  -d '{"model": "live-sensor", "data": {"temperature": 72.5}, "backend": "ecod"}'

Available backends: ecod (recommended), isolation_forest, one_class_svm, local_outlier_factor, autoencoder, hbos, copod, knn, mcd, cblof, suod, loda
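
The same fit-then-detect flow as a short Python script. This is a sketch built only on the endpoints shown above, with synthetic data; the response schema is not documented in this README, so the raw JSON is printed as-is:

import requests

BASE = "http://localhost:14345/v1/ml/anomaly"

# Train on "normal" readings only
normal = [[22.1], [23.5], [22.8], [23.1], [22.6], [23.0]]
requests.post(f"{BASE}/fit", json={
    "model": "sensor-detector", "backend": "ecod", "data": normal,
}).raise_for_status()

# Score a batch containing an obvious outlier (100.0)
resp = requests.post(f"{BASE}/detect", json={
    "model": "sensor-detector", "data": [[22.0], [100.0], [23.0]], "threshold": 0.5,
})
resp.raise_for_status()
print(resp.json())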

Text Classification & NER

See the Models Guide for complete documentation.
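
LlamaFarm's own classification endpoints are documented in the Models Guide. For orientation, this is roughly what few-shot training looks like with the setfit library itself: a standalone sketch, independent of LlamaFarm's API, with an assumed base model:

from datasets import Dataset
from setfit import SetFitModel, Trainer

# Eight labeled examples, in line with the 8-16 examples mentioned above
train_ds = Dataset.from_dict({
    "text": [
        "great product, works perfectly", "support resolved my issue fast",
        "exactly what I needed", "setup was painless",
        "terrible support experience", "broke after one day",
        "refund process is a nightmare", "documentation is misleading",
    ],
    "label": [1, 1, 1, 1, 0, 0, 0, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
trainer = Trainer(model=model, train_dataset=train_ds)
trainer.train()
print(model.predict(["the docs were really helpful"]))  # should predict the positive class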

Tool Calling (MCP)

Give models access to external tools via the Model Context Protocol:

# In llamafarm.yaml
mcp:
  servers:
    - name: filesystem
      transport: stdio
      command: npx
      args: ['-y', '@modelcontextprotocol/server-filesystem', '/data']

runtime:
  models:
    - name: assistant
      provider: ollama
      model: llama3.1:8b
      mcp_servers: [filesystem]

LlamaFarm also exposes its own API as MCP tools for use with Claude Desktop, Cursor, and other MCP clients. See the Tool Calling Guide.


Examples

RAG Examples
  • Large Complex PDFs — Multi-megabyte planning ordinances (examples/large_complex_rag/)
  • Many Small Files — FDA correspondence letters (examples/many_small_file_rag/)
  • Mixed Formats — PDF, Markdown, HTML, text, and code (examples/mixed_format_rag/)
  • Quick Notes — Rapid smoke tests with small files (examples/quick_rag/)

Anomaly Detection
  • Quick Start — Simplest anomaly detection example (examples/anomaly/01_quick_start.py)
  • Fraud Detection — Training, saving, loading models (examples/anomaly/02_fraud_detection.py)
  • Streaming Sensors — IoT monitoring with rolling features (examples/anomaly/03_streaming_sensors.py)
  • Backend Comparison — Compare all 12 algorithms (examples/anomaly/04_backend_comparison.py)

Use Cases
  • FDA Letters Assistant — Regulatory document analysis (examples/fda_rag/)
  • Government Planning — Large ordinance documents (examples/gov_rag/)

See examples/README.md for setup instructions and the full list.


Industry Use Cases

LlamaFarm is used across industries for document analysis, monitoring, and fraud detection.


Development & Testing

# Python server tests
cd server && uv sync && uv run --group test python -m pytest

# CLI tests
cd cli && go test ./...

# RAG tests
cd rag && uv sync && uv run pytest tests/

# Universal Runtime tests
cd runtimes/universal && uv sync && uv run pytest tests/

# Build docs
nx build docs

Extensibility

  • Add runtimes by implementing provider support and updating schema
  • Add vector stores by implementing store backends (Chroma, Qdrant, etc.)
  • Add parsers for new file formats (PDF, DOCX, HTML, CSV, etc.)
  • Add extractors for custom metadata extraction
  • Add CLI commands under cli/cmd/

See the Extending Guide for step-by-step instructions.


Community & Support

Questions or feedback? Join the Discord (linked above) or open an issue on GitHub.

License

Licensed under the Apache 2.0 License. See CREDITS for acknowledgments.


Build locally. Deploy anywhere. Own your AI.