prime-vector/open-agent-spec

Open Agent Spec is a declarative YAML standard and CLI for defining and generating AI agents. One spec, any LLM engine (OpenAI, Anthropic, Grok, Cortex, local, custom).


Define AI agents as contracts, not scattered prompts.


Open Agent Spec lets you define an agent once in YAML, validate inputs and outputs against a schema, and either run it directly with oa run or generate a Python scaffold with oa init.

Why This Exists

Most agent systems are hard to reason about:

  • outputs are not strictly typed
  • behaviour is buried in prompts
  • logic is split across Python, Markdown, and framework abstractions
  • swapping models often breaks things in subtle ways

The Idea

Open Agent Spec treats an agent like infrastructure.

Think OpenAPI or Terraform, but for AI agents.

You define:

  • input schema
  • output schema
  • prompts
  • model configuration

Then OA enforces the boundary:

input -> LLM -> validated output

If the output does not match schema, the task fails fast with a validation error.

A shape mismatch therefore fails immediately, instead of silently breaking downstream systems.

[Diagram: Agents as Code — oa init spec, spec run, LLM execution, tasks executed]
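The fail-fast boundary can be sketched in a few lines. This is an illustrative stand-in, not OA's actual implementation: the schema mirrors the example spec's output schema, and the tiny validator covers only required keys and string types.

```python
# Illustrative sketch of OA's fail-fast output boundary (not OA's code).
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {"response": {"type": "string"}},
    "required": ["response"],
}

def validate(payload: dict, schema: dict) -> None:
    """Tiny subset of JSON Schema: required keys plus string types."""
    for key in schema.get("required", []):
        if key not in payload:
            raise ValueError(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key in payload and rules.get("type") == "string" \
                and not isinstance(payload[key], str):
            raise ValueError(f"field {key} must be a string")

def run_task(llm_output: dict) -> dict:
    # Validate before anything downstream can see the output.
    validate(llm_output, OUTPUT_SCHEMA)
    return llm_output

print(run_task({"response": "Hello Alice!"}))   # passes
try:
    run_task({"greeting": "Hello Alice!"})      # wrong shape
except ValueError as e:
    print("validation error:", e)
```

A real implementation would use a full JSON Schema validator; the point is that malformed output never reaches the caller.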

Super Quick Start

Install (Python 3.10+):

pipx install open-agent-spec
oa init aac
oa validate aac
export OPENAI_API_KEY=your_key_here
oa run --spec .agents/example.yaml --task greet --input '{"name":"Alice"}' --quiet

With OA you can:

  • define tasks, prompts, model config, and expected I/O in YAML
  • run a spec directly without generating code first
  • keep .agents/*.yaml in your repo and call them from CI
  • generate a Python project scaffold when you want to customize implementation
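Calling specs from CI might look like the following GitHub Actions fragment. The workflow name and file path are assumptions for illustration; the oa commands are the ones documented above.

```yaml
# .github/workflows/agents.yml — illustrative; file name and job layout are assumptions
name: validate-agent-specs
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install open-agent-spec
      - run: oa validate aac   # fails the build if any spec in .agents/ is invalid
```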

First Run

Shortest path from install to a working agent:

1. Create the agents-as-code layout (aac = repo-native .agents/ directory):

oa init aac

This creates:

.agents/
├── example.yaml   # minimal hello-world spec
├── review.yaml    # code-review agent that accepts a diff file
├── change.diff    # sample diff for immediate review-agent testing
└── README.md      # quick usage notes

2. Validate the generated specs:

oa validate aac

3. Set an API key for the engine in your spec (OpenAI by default):

export OPENAI_API_KEY=your_key_here

4. Run the example agent:

oa run --spec .agents/example.yaml --task greet --input '{"name":"Alice"}' --quiet

--quiet prints only the task output JSON, which makes it easy to pipe to jq or use in scripts:

{
  "response": "Hello Alice!"
}

Omit --quiet for the full execution envelope with Rich formatting.

5. Run the review agent with the bundled sample diff:

oa run --spec .agents/review.yaml --task review --input .agents/change.diff --quiet

Or review your own change:

git diff > change.diff
oa run --spec .agents/review.yaml --task review --input change.diff --quiet

Write Your Own Spec

Start from this shape:

open_agent_spec: "1.3.0"

agent:
  name: hello-world-agent
  role: chat

intelligence:
  type: llm
  engine: openai
  model: gpt-4o

tasks:
  greet:
    description: Say hello to someone
    input:
      type: object
      properties:
        name:
          type: string
      required: [name]
    output:
      type: object
      properties:
        response:
          type: string
      required: [response]

prompts:
  system: >
    You greet people by name.
  user: "{{ name }}"

Validate first, then run:

oa validate --spec agent.yaml
oa run --spec agent.yaml --task greet --input '{"name":"Alice"}' --quiet
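To make the spec's moving parts concrete, here is a hedged sketch of how a runner could combine the input schema and the prompt templates above at run time. This is illustrative only; OA's actual runtime is not shown here.

```python
# Illustrative sketch: validate input, then render the Jinja-style user template.
import re

task_input = {"name": "Alice"}
input_schema = {"required": ["name"]}          # from the task's input schema
system_prompt = "You greet people by name."    # prompts.system
user_template = "{{ name }}"                   # prompts.user

# Fail fast if required inputs are missing, mirroring the output boundary.
missing = [k for k in input_schema["required"] if k not in task_input]
if missing:
    raise ValueError(f"input failed validation: missing {missing}")

# Substitute {{ name }}-style placeholders with validated input values.
user_prompt = re.sub(
    r"\{\{\s*(\w+)\s*\}\}",
    lambda m: str(task_input[m.group(1)]),
    user_template,
)
print(system_prompt, "|", user_prompt)  # → You greet people by name. | Alice
```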

Generate a Python Scaffold

If you want editable generated code instead of running the YAML directly:

oa init --spec agent.yaml --output ./agent

Generated structure:

agent/
├── agent.py
├── models.py
├── prompts/
├── requirements.txt
├── .env.example
└── README.md
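The scaffold's value is typed task I/O in plain Python. As a hypothetical illustration of that shape (not OA's actual generated code, whose contents may differ):

```python
# Hypothetical shape of a scaffolded models.py/agent.py pair (illustrative only).
from dataclasses import dataclass

@dataclass
class GreetInput:
    name: str

@dataclass
class GreetOutput:
    response: str

def greet(inp: GreetInput) -> GreetOutput:
    # A real scaffold would call the configured LLM engine here;
    # this stub just satisfies the output contract.
    return GreetOutput(response=f"Hello {inp.name}!")

print(greet(GreetInput(name="Alice")).response)  # → Hello Alice!
```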

Core Idea

Most agent projects end up hand-rolling the same pieces:

  • prompt templates
  • model configuration
  • task definitions
  • routing glue
  • runtime wrappers

OA moves those concerns into a declarative spec so they can be reviewed, versioned, and reused.

The intended model is:

  • spec defines the agent contract
  • oa run executes the spec directly
  • oa init generates a starting implementation when you need code
  • external systems can orchestrate multiple specs however they want

OA deliberately does not prescribe:

  • orchestration
  • evaluation
  • governance
  • long-running runtime architecture

Common Commands

| Command | Purpose |
| --- | --- |
| oa init aac | Create .agents/ with starter specs |
| oa validate aac | Validate all specs in .agents/ |
| oa validate --spec agent.yaml | Validate one spec |
| oa run --spec agent.yaml --task greet --input '{"name":"Alice"}' --quiet | Run one task directly from YAML |
| oa init --spec agent.yaml --output ./agent | Generate a Python scaffold |
| oa update --spec agent.yaml --output ./agent | Regenerate an existing scaffold |

More Detail

| Resource | Contents |
| --- | --- |
| openagentspec.dev | Project website |
| docs/REFERENCE.md | Spec structure, engines, templates, .agents/ usage |
| Repository | Source, issues, workflows |

Notes

  • The CLI command is oa (not oas).
  • Python 3.10+ is required.
  • oa run requires the relevant provider API key for the engine in your spec.

About

  • Open Agent Spec was conceived by Andrew Whitehouse in late 2024, with the aim of bringing structure and standardisation to early agent systems.
  • In early 2025, Prime Vector was formed and took over the public-facing project.

License

MIT | see LICENSE.
