Build AI Applications | Llama Stack


Quick Start

Get up and running with Llama Stack in just a few commands. Build your first RAG application locally.

# Install uv, then start Ollama (keep the model loaded for 60 minutes)
curl -LsSf https://astral.sh/uv/install.sh | sh
ollama run llama3.2:3b --keepalive 60m

# Install server dependencies
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install

# Run Llama Stack server
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter
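
# Sanity-check the server from another terminal once it is up
# (the /v1/health route is an assumption based on the server's default HTTP API)
curl http://localhost:8321/v1/health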

# Try the Python SDK (install with: uv pip install llama-stack-client)
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
  base_url="http://localhost:8321"
)

response = client.chat.completions.create(
  # The model id must match one registered with the server
  # (check with: llama-stack-client models list)
  model="Llama3.2-3B-Instruct",
  messages=[{
    "role": "user",
    "content": "What is machine learning?"
  }]
)
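
The response mirrors the OpenAI chat-completions shape, so reading the reply looks like this (a minimal sketch; the field path assumes the OpenAI-compatible schema):

# Print the assistant's reply (OpenAI-compatible response shape assumed)
print(response.choices[0].message.content)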

πŸ”—

Unified APIs

One consistent interface for all your AI needs - inference, safety, agents, and more.
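
For example, the same client object from the quick start covers more than chat. A minimal sketch of listing the server's registered models (field names assume the llama-stack-client models API):

# One client handles inference, model management, safety, agents, and more
for model in client.models.list():
    print(model.identifier)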

πŸ”„

Provider Flexibility

Swap between providers without code changes. Start local, deploy anywhere.
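
For instance, moving from the local Ollama setup to a hosted provider is a server-side configuration change; the application code stays the same, and only the model id differs (the id below is illustrative, not a guaranteed name):

# Identical call to the quick start; only the model id changed
response = client.chat.completions.create(
  model="together/meta-llama/Llama-3.2-3B-Instruct-Turbo",  # illustrative id
  messages=[{"role": "user", "content": "What is machine learning?"}]
)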

πŸ›‘οΈ

Production Ready

Built-in safety, monitoring, and evaluation tools for enterprise applications.
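
As one example, the safety API can screen messages before they reach a model. A minimal sketch, assuming a Llama Guard shield has been registered on the server; the shield_id is illustrative:

# Screen a user message with a registered shield (shield_id is illustrative)
result = client.safety.run_shield(
  shield_id="llama-guard",
  messages=[{"role": "user", "content": "What is machine learning?"}],
  params={},
)
print(result.violation)  # None if the message passes the shield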

πŸ“±

Multi-Platform

SDKs for Python, Node.js, iOS, Android, and REST APIs for any language.

Llama Stack Ecosystem

Complete toolkit for building AI applications with Llama Stack

πŸ› οΈ

SDKs & Clients

Official client libraries for multiple programming languages

πŸš€

Example Applications

Ready-to-run examples to jumpstart your AI projects

☸️ Kubernetes Operator

Deploy and manage Llama Stack on Kubernetes clusters.