incantx

Note: this project is unreleased.

Test agent conversation flows (including tool calls) using transcript-style YAML fixtures.

1. Install

incantx runs on Node.js (recommended) and Bun.

Node.js: >= 18
Bun: >= 1.3

Install globally (choose one):

# Bun
bun add -g incantx

# npm
npm i -g incantx

Or install in a project:

Or run directly without installing:

# Bun
bunx incantx

# npm
npx incantx

2. Writing a fixture

Fixture files are YAML with:

optional file-level system: default system prompt (applies if a fixture transcript has no system: entries)
optional file-level agent: config
fixtures: list, each with:
- id: ...
- transcript: list of chat turns, ending with expect: (required)

Transcript entries are one of:

system: "..."
user: "..."
assistant: "..." or assistant: { content: "...", tool_calls: [...] }
tool: { name: "...", tool_call_id?: "...", json: {...} } (or content: "...")
expect: { ... } (required, must be last)

Single turn example:

system: You are a helpful assistant.

agent:
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]

fixtures:
  - id: weather-requests-tool
    transcript:
      - user: What's the weather in Dublin?
      - expect:
          content_match: exact
          content: ""
          tool_calls_match: contains
          tool_calls:
            - get_weather: { location: Dublin, unit: c }
          tool_results_match: contains
          tool_results:
            - get_weather: { condition: Rain }

Multi-turn example (tool call + follow-up):

agent:
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]

fixtures:
  - id: umbrella-followup
    transcript:
      - system: You are a helpful assistant.
      - user: What's the weather in Dublin?
      - assistant:
          content: ""
          tool_calls:
            - { id: call_1, name: get_weather, args: { location: Dublin, unit: c } }
      - tool:
          tool_call_id: call_1
          name: get_weather
          json: { temp_c: 10, condition: Rain }
      - user: What should I bring?
      - expect:
          content_match: contains
          content: umbrella

Notes:

expect.tool_calls asserts against the assistant message’s tool_calls.
expect.tool_results asserts against tool_messages returned by your agent in the same response (incantx does not execute tools).
To test multi-turn “tool → user followup” flows, include tool: entries in the transcript before the user: followup.

3. Writing the agent process

incantx runs your agent as a subprocess and communicates via JSON Lines over stdin/stdout:

incantx writes one request JSON object per line to stdin
your agent writes one response JSON object per line to stdout

Request shape (OpenAI Chat Completions style):

{
  "messages": [{ "role": "user", "content": "hi" }],
  "tools": [],
  "tool_choice": "auto",
  "model": "optional-model-id"
}

Response shape:

success: { "message": { "role": "assistant", "content": "...", "tool_calls": [...] }, "tool_messages"?: [...] }
error: { "error": { "message": "..." } }

If your agent returns tool_calls, it can also return tool_messages in the same response so fixtures can assert on tool results.
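As an illustration, a next_turn that requests a tool and reports its result in the same response might look like the sketch below. The fake_get_weather tool and its hard-coded result are hypothetical. The fields of each tool_messages entry (role, tool_call_id, name, content) are an assumption modelled on the OpenAI Chat Completions format, not a confirmed schema.

```python
import json
from typing import Any, Dict


def fake_get_weather(location: str, unit: str) -> Dict[str, Any]:
    # Hypothetical tool: a real agent would call an actual weather service here.
    return {"temp_c": 10, "condition": "Rain"}


def next_turn(req: Dict[str, Any]) -> Dict[str, Any]:
    # Hard-coded arguments keep the sketch short; a real agent would derive them
    # from req["messages"].
    args = {"location": "Dublin", "unit": "c"}
    result = fake_get_weather(**args)
    return {
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    # OpenAI-style calls carry their arguments as a JSON string.
                    "function": {"name": "get_weather", "arguments": json.dumps(args)},
                }
            ],
        },
        # Assumed entry shape: one tool message per executed call.
        "tool_messages": [
            {
                "role": "tool",
                "tool_call_id": "call_1",
                "name": "get_weather",
                "content": json.dumps(result),
            }
        ],
    }
```

A response like this is what the single-turn weather fixture in section 2 asserts on through expect.tool_calls and expect.tool_results.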

JavaScript helper

incantx includes a helper that wraps the JSONL stdin/stdout loop, so you only write the nextTurn function:

import { runJsonlAgent } from "incantx/agent";

await runJsonlAgent(async (req) => {
  // your logic here
  return { message: { role: "assistant", content: "..." }, tool_messages: [] };
});

Python example agent script

Create src/incantx_agent.py and put your agent logic in a single function (next_turn below):

#!/usr/bin/env python3
import json
import sys
from typing import Any, Dict


def next_turn(req: Dict[str, Any]) -> Dict[str, Any]:
    # TODO: wire in your own agent logic here.
    #
    # Return either:
    #   { "message": { "role": "assistant", "content": "...", "tool_calls": [...] }, "tool_messages": [...] }
    # or:
    #   { "error": { "message": "..." } }
    messages = req.get("messages", [])
    last = messages[-1] if messages else None
    user_text = last.get("content", "") if isinstance(last, dict) and last.get("role") == "user" else ""
    return {"message": {"role": "assistant", "content": f"hello: {user_text}"}}


def main() -> None:
    for line in sys.stdin:
        raw = line.strip()
        if not raw:
            continue
        try:
            req = json.loads(raw)
            res = next_turn(req)
        except Exception as e:
            res = {"error": {"message": str(e)}}
        sys.stdout.write(json.dumps(res) + "\n")
        sys.stdout.flush()


if __name__ == "__main__":
    main()
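To smoke-test an agent script without incantx, you can drive the same JSONL protocol by hand. The sketch below is a hypothetical helper (send_request is not part of incantx). It starts the agent command, writes one request line to stdin, and parses the first response line from stdout. The 20-second default timeout is chosen only to mirror the documented timeout_ms default.

```python
import json
import subprocess
from typing import Any, Dict, List


def send_request(command: List[str], request: Dict[str, Any], timeout_s: float = 20.0) -> Dict[str, Any]:
    # Write a single JSON line to stdin; closing stdin ends the agent's read loop.
    proc = subprocess.run(
        command,
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    # Expect one JSON object on the first non-empty line of stdout.
    lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
    if not lines:
        raise RuntimeError("agent produced no output")
    return json.loads(lines[0])
```

For example, send_request(["python3", "src/incantx_agent.py"], {"messages": [{"role": "user", "content": "hi"}]}) should return the "hello: hi" message produced by the script above.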

4. Running tests

Run a single fixture file or a directory:

incantx tests/fixtures --judge off

LLM judge modes:

--judge auto (default): run LLM expectations only if OPENAI_API_KEY is set; otherwise mark them SKIP
--judge off: never call an LLM judge
--judge on: require OPENAI_API_KEY (fail if missing)
--judge-model <model>: override judge model (defaults to gpt-4o-mini)

The CLI exits non-zero if any fixture fails.

5. Fixture reference

incantx fixtures are transcript-style YAML files. Each fixture runs the agent once and asserts on the next assistant message returned for that transcript.

File shape

system: You are a helpful assistant. # optional default system prompt for all fixtures in this file

agent:   # optional default agent config for all fixtures in this file
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]

fixtures:
  - id: example
    transcript: [...]

`system` (default system prompt)

If set, system is prepended to each fixture’s transcript only when that fixture transcript contains no system: ... entries.
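The rule above can be sketched in Python as follows. apply_default_system is a hypothetical helper written for illustration, not incantx's actual implementation. It treats the transcript as the parsed YAML list of one-key objects.

```python
from typing import Any, Dict, List, Optional


def apply_default_system(
    transcript: List[Dict[str, Any]], default_system: Optional[str]
) -> List[Dict[str, Any]]:
    # A fixture that already carries any system: entry keeps its transcript unchanged.
    has_system = any("system" in entry for entry in transcript)
    if default_system is None or has_system:
        return list(transcript)
    # Otherwise the file-level prompt becomes the first transcript entry.
    return [{"system": default_system}] + list(transcript)
```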

`agent` (subprocess)

agent can appear at the file level and/or per fixture (fixture-level overrides file-level).

Fields:

type: "subprocess" (optional, defaults to "subprocess")
command: string array, required (e.g. ["python3", "src/incantx_agent.py"])
cwd: string, optional
env: map of strings, optional (values support ${VAR} expansion from the current process env)
timeout_ms: number, optional (defaults to 20000)
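To make the ${VAR} expansion in env concrete, here is a minimal sketch of how such substitution could work. expand_env is a hypothetical function, and leaving unset variables as literal text is an assumption. The source does not say how incantx treats a variable that is missing from the process environment.

```python
import re
from typing import Dict, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(values: Mapping[str, str], environ: Mapping[str, str]) -> Dict[str, str]:
    def expand(value: str) -> str:
        # Assumption: unset variables are left as the literal ${VAR} text.
        return _VAR.sub(lambda m: environ.get(m.group(1), m.group(0)), value)

    return {key: expand(value) for key, value in values.items()}
```

In practice you would pass os.environ as the second argument. Taking it as a parameter keeps the function easy to test.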

`fixtures[]`

Each fixture:

id: string, required
agent: optional (same shape as above)
transcript: array of transcript entries, required

`transcript[]` entries

Each entry is a one-key object. The final entry must be expect: { ... }.

Supported entries:

system: <string>
user: <string>
assistant: <string> (shorthand for { content: "..." })
assistant: { content?: <string>, tool_calls?: [...] }
tool: { name: <string>, tool_call_id?: <string>, json?: <any>, content?: <string> }
expect: { ... } (required, must be last)

Notes:

assistant.content defaults to "" if omitted.
A tool entry must include either json (serialized to a JSON string and sent as content) or content (sent as a raw string).
If a tool entry omits tool_call_id, incantx will infer it only when exactly one tool call id has appeared earlier in the transcript.
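The inference rule for tool_call_id can be sketched as below. infer_tool_call_id is a hypothetical helper. Returning None when the id is ambiguous is an assumption, since the source does not say what incantx does in that case.

```python
from typing import Any, Dict, List, Optional


def infer_tool_call_id(seen_ids: List[str], tool_entry: Dict[str, Any]) -> Optional[str]:
    # An explicit id always wins.
    explicit = tool_entry.get("tool_call_id")
    if explicit is not None:
        return explicit
    # Infer only when exactly one distinct tool call id appeared earlier.
    unique = list(dict.fromkeys(seen_ids))
    return unique[0] if len(unique) == 1 else None
```

With two earlier tool calls (for example call_1 and call_2), each tool: entry therefore needs its own tool_call_id.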

`assistant.tool_calls` (in transcript)

When you include tool calls in an assistant: transcript entry, incantx converts them into OpenAI-style tool_calls with JSON-stringified function.arguments.

Two supported syntaxes:

tool_calls:
  - { id: call_1, name: get_weather, args: { location: Dublin, unit: c } }
  - { name: get_time, args: { tz: Europe/Dublin } }

or the short form:

tool_calls:
  - get_weather: { location: Dublin, unit: c }

If id is omitted, incantx generates call_1, call_2, ...
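As a rough picture of that conversion, the sketch below turns both syntaxes into OpenAI-style tool_calls. normalize_tool_calls is a hypothetical name, and numbering generated ids by position in the list is an assumption. The source only says that ids of the form call_1, call_2, ... are generated.

```python
import json
from typing import Any, Dict, List


def normalize_tool_calls(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    calls = []
    for index, entry in enumerate(entries, start=1):
        if "name" in entry:
            # Long form: { id?, name, args? }
            call_id, name, args = entry.get("id"), entry["name"], entry.get("args", {})
        else:
            # Short form: { tool_name: args }
            name, args = next(iter(entry.items()))
            call_id = None
        calls.append({
            "id": call_id or f"call_{index}",  # assumed numbering scheme
            "type": "function",
            # Arguments are JSON-stringified, as in the OpenAI format.
            "function": {"name": name, "arguments": json.dumps(args)},
        })
    return calls
```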

`expect` (assertions on the next assistant message)

expect asserts on the single assistant message returned by your agent for this transcript, plus any tool_messages your agent returned alongside it.

Fields:

content: string, optional
content_match: "contains" | "exact" (defaults to "contains" when content is provided)
tool_calls: list, optional
tool_calls_match: "contains" | "exact" (defaults to "contains" when tool_calls is provided)
tool_results: list, optional
tool_results_match: "contains" | "exact" (defaults to "contains" when tool_results is provided)
llm: string, optional (LLM-judged expectation; behavior depends on --judge)

`expect.tool_calls`

Matches against the agent’s returned message.tool_calls:

contains: each expected call must be present somewhere; extra calls allowed; order ignored
exact: lengths must match and entries are matched by index

Each expected tool call matches:

name vs tool_calls[].function.name
args as a JSON subset match vs parsed tool_calls[].function.arguments

Supported syntaxes:

tool_calls:
  - { name: get_weather, args: { location: Dublin } }
  - get_weather: { location: Dublin }
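The matching rules above can be illustrated with a small Python sketch. The function names are hypothetical and this is not incantx's implementation. It assumes expected calls are already in the { name, args } form, and that the subset match recurses into objects while comparing lists and scalars by equality.

```python
import json
from typing import Any, Dict, List


def is_subset(expected: Any, actual: Any) -> bool:
    # Every expected key must exist in actual with a matching value; extra keys are fine.
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key]) for key, value in expected.items()
        )
    return expected == actual  # assumption: lists and scalars must be equal


def call_matches(expected: Dict[str, Any], actual: Dict[str, Any]) -> bool:
    fn = actual["function"]
    return expected["name"] == fn["name"] and is_subset(
        expected.get("args", {}), json.loads(fn["arguments"])
    )


def tool_calls_match(
    expected: List[Dict[str, Any]], actual: List[Dict[str, Any]], mode: str = "contains"
) -> bool:
    if mode == "exact":
        # Same length, compared position by position.
        return len(expected) == len(actual) and all(
            call_matches(e, a) for e, a in zip(expected, actual)
        )
    # contains: each expected call appears somewhere; extras and order are ignored.
    return all(any(call_matches(e, a) for a in actual) for e in expected)
```

Under these rules, expecting get_weather with { location: Dublin } passes in contains mode even if the agent also sent unit: c and made a second call, but fails in exact mode because the lengths differ.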

`expect.tool_results`

Matches against the agent’s returned tool_messages (same JSONL response). incantx does not execute tools.

Matching:

contains: each expected result must be present somewhere; extra results allowed; order ignored
exact: lengths must match and entries are matched by index

Each expected tool result can match by:

name and/or tool_call_id
content with optional content_match: contains|exact
content_json as a JSON subset match against parsed tool_messages[].content

Supported syntaxes:

tool_results:
  - { name: get_weather, content_json: { condition: Rain } }
  - get_weather: { condition: Rain }

`expect.llm` and `--judge`

--judge off: any fixture with expect.llm is marked SKIP
--judge auto: uses OpenAI judge only if OPENAI_API_KEY is set; otherwise SKIP
--judge on: requires OPENAI_API_KEY (run fails if missing)
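The three modes reduce to a small decision table, sketched here in Python. judge_action is a hypothetical function used only to summarize the behavior described above. The return values "run", "skip", and "fail" are labels chosen for this example.

```python
from typing import Mapping


def judge_action(mode: str, environ: Mapping[str, str]) -> str:
    has_key = bool(environ.get("OPENAI_API_KEY"))
    if mode == "off":
        return "skip"  # never call the LLM judge
    if mode == "on":
        return "run" if has_key else "fail"  # a missing key is an error
    if mode == "auto":
        return "run" if has_key else "skip"  # a missing key means SKIP
    raise ValueError(f"unknown judge mode: {mode}")
```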
