GitHub - incantx/incantx

Note: this project is unreleased.

Test agent conversation flows (including tool calls) using transcript-style YAML fixtures.

1. Install

incantx runs on Node.js (recommended) and Bun.

  • Node.js: >= 18
  • Bun: >= 1.3

Install globally (choose one):

# Bun
bun add -g incantx

# npm
npm i -g incantx

Or install in a project:

Or run directly:

# Bun
bunx incantx

# npm
npx incantx

2. Writing a fixture

Fixture files are YAML with:

  • optional file-level system: default system prompt (applies if a fixture transcript has no system: entries)
  • optional file-level agent: config
  • fixtures: list, each with:
    • id: ...
    • transcript: list of chat turns, ending with expect: (required)

Transcript entries are one of:

  • system: "..."
  • user: "..."
  • assistant: "..." or assistant: { content: "...", tool_calls: [...] }
  • tool: { name: "...", tool_call_id?: "...", json: {...} } (or content: "...")
  • expect: { ... } (required, must be last)

Single turn example:

system: You are a helpful assistant.

agent:
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]

fixtures:
  - id: weather-requests-tool
    transcript:
      - user: What's the weather in Dublin?
      - expect:
          content_match: exact
          content: ""
          tool_calls_match: contains
          tool_calls:
            - get_weather: { location: Dublin, unit: c }
          tool_results_match: contains
          tool_results:
            - get_weather: { condition: Rain }

Multi-turn example (tool call + follow-up):

agent:
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]

fixtures:
  - id: umbrella-followup
    transcript:
      - system: You are a helpful assistant.
      - user: What's the weather in Dublin?
      - assistant:
          content: ""
          tool_calls:
            - { id: call_1, name: get_weather, args: { location: Dublin, unit: c } }
      - tool:
          tool_call_id: call_1
          name: get_weather
          json: { temp_c: 10, condition: Rain }
      - user: What should I bring?
      - expect:
          content_match: contains
          content: umbrella

Notes:

  • expect.tool_calls asserts against the assistant message’s tool_calls.
  • expect.tool_results asserts against tool_messages returned by your agent in the same response (incantx does not execute tools).
  • To test multi-turn “tool → user follow-up” flows, include tool: entries in the transcript before the follow-up user: entry.

3. Writing the agent process

incantx runs your agent as a subprocess and communicates via JSON Lines over stdin/stdout:

  • incantx writes one request JSON object per line to stdin
  • your agent writes one response JSON object per line to stdout

Request shape (OpenAI Chat Completions style):

{
  "messages": [{ "role": "user", "content": "hi" }],
  "tools": [],
  "tool_choice": "auto",
  "model": "optional-model-id"
}
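To make the line-oriented exchange concrete, here is a minimal Python sketch of the runner side of the protocol: it spawns an agent, writes one request line to stdin, and reads one response line from stdout. This is not incantx's actual implementation; `send_request` and the inline `ECHO_AGENT` script are hypothetical names used only for illustration.

```python
import json
import subprocess
import sys

# Hypothetical stand-in agent: reads one JSON request per line,
# answers with one JSON response per line.
ECHO_AGENT = r"""
import json, sys
for line in sys.stdin:
    if not line.strip():
        continue
    req = json.loads(line)
    n = len(req.get("messages", []))
    res = {"message": {"role": "assistant", "content": f"saw {n} message(s)"}}
    sys.stdout.write(json.dumps(res) + "\n")
    sys.stdout.flush()
"""


def send_request(command, request, timeout_s=20.0):
    """Spawn the agent, send one request line, and parse the first response line."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    try:
        # communicate() writes the line, closes stdin, and waits for the agent to exit.
        out, _ = proc.communicate(json.dumps(request) + "\n", timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()
        raise
    return json.loads(out.splitlines()[0])


response = send_request(
    [sys.executable, "-c", ECHO_AGENT],
    {"messages": [{"role": "user", "content": "hi"}], "tools": [], "tool_choice": "auto"},
)
```

Running an agent by hand this way can be a quick check that it flushes stdout and emits exactly one JSON object per line before you point incantx at it.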

Response shape:

  • success: { "message": { "role": "assistant", "content": "...", "tool_calls": [...] }, "tool_messages"?: [...] }
  • error: { "error": { "message": "..." } }

If your agent returns tool_calls, it can also return tool_messages in the same response so fixtures can assert on tool results.
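As an illustration, the sketch below shows a `next_turn` function that returns a tool call together with its result in one response. The entry shapes are assumptions: `tool_calls` follows the OpenAI style with JSON-stringified `function.arguments`, and each `tool_messages` entry is assumed to be an OpenAI-style tool message (`role`, `tool_call_id`, `name`, `content`). `fake_get_weather` is a hypothetical tool with canned data.

```python
import json
from typing import Any, Dict


def fake_get_weather(location: str, unit: str) -> Dict[str, Any]:
    # Hypothetical tool implementation with canned data, for illustration only.
    return {"location": location, "unit": unit, "temp_c": 10, "condition": "Rain"}


def next_turn(req: Dict[str, Any]) -> Dict[str, Any]:
    # A real agent would derive the arguments from req["messages"];
    # they are hard-coded here to keep the sketch short.
    args = {"location": "Dublin", "unit": "c"}
    call_id = "call_1"
    result = fake_get_weather(**args)
    return {
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": call_id,
                    "type": "function",
                    # Arguments are a JSON string, not a nested object.
                    "function": {"name": "get_weather", "arguments": json.dumps(args)},
                }
            ],
        },
        # Assumed shape: one OpenAI-style tool message per executed call.
        "tool_messages": [
            {
                "role": "tool",
                "tool_call_id": call_id,
                "name": "get_weather",
                "content": json.dumps(result),
            }
        ],
    }
```

Because the agent runs the tool itself, a fixture can then assert on both `tool_calls` and `tool_results` for the same turn.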

JavaScript helper

incantx includes a helper that wraps the JSONL stdin/stdout loop, so you only write the nextTurn function:

import { runJsonlAgent } from "incantx/agent";

await runJsonlAgent(async (req) => {
  // your logic here
  return { message: { role: "assistant", content: "..." }, tool_messages: [] };
});

Python example agent script

Create src/incantx_agent.py and put your agent logic in a single function (next_turn below):

#!/usr/bin/env python3
import json
import sys
from typing import Any, Dict


def next_turn(req: Dict[str, Any]) -> Dict[str, Any]:
    # TODO: wire in your own agent logic here.
    #
    # Return either:
    #   { "message": { "role": "assistant", "content": "...", "tool_calls": [...] }, "tool_messages": [...] }
    # or:
    #   { "error": { "message": "..." } }
    messages = req.get("messages", [])
    last = messages[-1] if messages else None
    user_text = last.get("content", "") if isinstance(last, dict) and last.get("role") == "user" else ""
    return {"message": {"role": "assistant", "content": f"hello: {user_text}"}}


def main() -> None:
    for line in sys.stdin:
        raw = line.strip()
        if not raw:
            continue
        try:
            req = json.loads(raw)
            res = next_turn(req)
        except Exception as e:
            res = {"error": {"message": str(e)}}
        sys.stdout.write(json.dumps(res) + "\n")
        sys.stdout.flush()


if __name__ == "__main__":
    main()

4. Running tests

Run a single fixture file or a directory:

incantx tests/fixtures --judge off

LLM judge modes:

  • --judge auto (default): run LLM expectations only if OPENAI_API_KEY is set; otherwise mark them SKIP
  • --judge off: never call an LLM judge
  • --judge on: require OPENAI_API_KEY (fail if missing)
  • --judge-model <model>: override judge model (defaults to gpt-4o-mini)

The CLI exits non-zero if any fixture fails.

5. Fixture reference

incantx fixtures are transcript-style YAML files. Each fixture runs the agent once and asserts on the next assistant message returned for that transcript.

File shape

system: You are a helpful assistant. # optional default system prompt for all fixtures in this file

agent:   # optional default agent config for all fixtures in this file
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]

fixtures:
  - id: example
    transcript: [...]

system (default system prompt)

If set, system is prepended to each fixture’s transcript only when that fixture transcript contains no system: ... entries.
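The rule can be sketched in a few lines of Python. This is an illustrative reading of the sentence above, not incantx's code; `apply_default_system` is a hypothetical helper, and transcript entries are modeled as the one-key dicts that the YAML would parse into.

```python
from typing import Any, Dict, List, Optional


def apply_default_system(
    transcript: List[Dict[str, Any]], default_system: Optional[str]
) -> List[Dict[str, Any]]:
    """Prepend the file-level system prompt unless the transcript already has one."""
    has_system = any("system" in entry for entry in transcript)
    if default_system is None or has_system:
        return list(transcript)
    return [{"system": default_system}] + list(transcript)


with_default = apply_default_system(
    [{"user": "hi"}, {"expect": {"content": "hello"}}], "You are a helpful assistant."
)
own_system = apply_default_system(
    [{"system": "Be terse."}, {"user": "hi"}], "You are a helpful assistant."
)
```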

agent (subprocess)

agent can appear at the file level and/or per fixture (fixture-level overrides file-level).

Fields:

  • type: "subprocess" (optional, defaults to "subprocess")
  • command: string array, required (e.g. ["python3", "src/incantx_agent.py"])
  • cwd: string, optional
  • env: map of strings, optional (values support ${VAR} expansion from the current process env)
  • timeout_ms: number, optional (defaults to 20000)
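For intuition, `${VAR}` expansion in env values might behave like the Python sketch below. The helper name `expand_env` is hypothetical, and replacing an unset variable with an empty string is an assumption; the text above does not say how incantx treats missing variables.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Matches ${NAME} where NAME looks like a typical environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(
    values: Mapping[str, str], source: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Expand ${VAR} references in each value using the given environment."""
    src = os.environ if source is None else source
    # Assumption: unset variables expand to "" rather than raising an error.
    return {
        key: _VAR.sub(lambda m: src.get(m.group(1), ""), value)
        for key, value in values.items()
    }


expanded = expand_env(
    {"API_KEY": "${MY_KEY}", "BASE_URL": "http://${HOST}:8080", "MODE": "test"},
    {"MY_KEY": "abc123", "HOST": "localhost"},
)
```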

fixtures[]

Each fixture:

  • id: string, required
  • agent: optional (same shape as above)
  • transcript: array of transcript entries, required

transcript[] entries

Each entry is a one-key object. The final entry must be expect: ....

Supported entries:

  • system: <string>
  • user: <string>
  • assistant: <string> (shorthand for { content: "..." })
  • assistant: { content?: <string>, tool_calls?: [...] }
  • tool: { name: <string>, tool_call_id?: <string>, json?: <any>, content?: <string> }
  • expect: { ... } (required, must be last)

Notes:

  • assistant.content defaults to "" if omitted.
  • tool must include either json (encoded as JSON into content) or content (raw string).
  • If a tool entry omits tool_call_id, incantx will infer it only when exactly one tool call id has appeared earlier in the transcript.
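The tool_call_id inference rule in the last note can be sketched as follows. This is a hedged illustration rather than incantx's implementation: `infer_tool_call_id` is a hypothetical helper, and it assumes earlier tool calls are written in the long form with an explicit id.

```python
from typing import Any, Dict, List, Optional


def infer_tool_call_id(entries_so_far: List[Dict[str, Any]]) -> Optional[str]:
    """Return the only tool call id seen so far, or None if there are zero or several."""
    seen: List[str] = []
    for entry in entries_so_far:
        assistant = entry.get("assistant")
        if not isinstance(assistant, dict):
            continue  # string shorthand carries no tool calls
        for call in assistant.get("tool_calls", []):
            call_id = call.get("id") if isinstance(call, dict) else None
            if isinstance(call_id, str) and call_id not in seen:
                seen.append(call_id)
    # Inference is only safe when the id is unambiguous.
    return seen[0] if len(seen) == 1 else None


one_call = [
    {"user": "What's the weather in Dublin?"},
    {"assistant": {"tool_calls": [{"id": "call_1", "name": "get_weather", "args": {}}]}},
]
two_calls = one_call + [
    {"assistant": {"tool_calls": [{"id": "call_2", "name": "get_time", "args": {}}]}},
]
```

With `one_call` the helper returns `"call_1"`; with `two_calls` it returns `None`, which is the case where a tool entry would need an explicit tool_call_id.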

assistant.tool_calls (in transcript)

When you include tool calls in an assistant: transcript entry, incantx converts them into OpenAI-style tool_calls with JSON-stringified function.arguments.

Two supported syntaxes:

tool_calls:
  - { id: call_1, name: get_weather, args: { location: Dublin, unit: c } }
  - { name: get_time, args: { tz: Europe/Dublin } }

or the short form:

tool_calls:
  - get_weather: { location: Dublin, unit: c }

If id is omitted, incantx generates call_1, call_2, ...
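A rough Python sketch of this conversion, covering both syntaxes, is shown below. `normalize_tool_calls` is a hypothetical name, and numbering generated ids by position in the list is an assumption; incantx may number them differently.

```python
import json
from typing import Any, Dict, List


def normalize_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert long-form and short-form fixture tool calls to OpenAI-style entries."""
    normalized = []
    for index, call in enumerate(tool_calls, start=1):
        if isinstance(call.get("name"), str):
            # Long form: { id?, name, args? }
            name, args, call_id = call["name"], call.get("args", {}), call.get("id")
        else:
            # Short form: { tool_name: { ...args } }
            name, args = next(iter(call.items()))
            call_id = None
        normalized.append(
            {
                # Assumption: missing ids are generated from the list position.
                "id": call_id or f"call_{index}",
                "type": "function",
                "function": {"name": name, "arguments": json.dumps(args)},
            }
        )
    return normalized


calls = normalize_tool_calls(
    [
        {"id": "call_1", "name": "get_weather", "args": {"location": "Dublin", "unit": "c"}},
        {"get_time": {"tz": "Europe/Dublin"}},
    ]
)
```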

expect (assertions on the next assistant message)

expect asserts on the single assistant message returned by your agent for this transcript, plus any tool_messages your agent returned alongside it.

Fields:

  • content: string, optional
  • content_match: "contains" | "exact" (defaults to "contains" when content is provided)
  • tool_calls: list, optional
  • tool_calls_match: "contains" | "exact" (defaults to "contains" when tool_calls is provided)
  • tool_results: list, optional
  • tool_results_match: "contains" | "exact" (defaults to "contains" when tool_results is provided)
  • llm: string, optional (LLM-judged expectation; behavior depends on --judge)

expect.tool_calls

Matches against the agent’s returned message.tool_calls:

  • contains: each expected call must be present somewhere; extra calls allowed; order ignored
  • exact: lengths must match and entries are matched by index

Each expected tool call matches:

  • name vs tool_calls[].function.name
  • args as a JSON subset match vs parsed tool_calls[].function.arguments

Supported syntaxes:

tool_calls:
  - { name: get_weather, args: { location: Dublin } }
  - get_weather: { location: Dublin }
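The matching rules for expect.tool_calls can be illustrated with a short Python sketch. It is a plausible reading of the description above, not incantx's code: the function names are hypothetical, lists inside args are compared element by element (an assumption), and in contains mode one actual call may satisfy several expected calls (also an assumption).

```python
import json
from typing import Any, Dict, List


def is_subset(expected: Any, actual: Any) -> bool:
    """True if every key/value in expected also appears in actual (recursively)."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        # Assumption: lists must have equal length and match element-wise.
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(is_subset(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


def call_matches(expected: Dict[str, Any], actual: Dict[str, Any]) -> bool:
    fn = actual["function"]
    return expected["name"] == fn["name"] and is_subset(
        expected.get("args", {}), json.loads(fn["arguments"])
    )


def tool_calls_match(
    expected: List[Dict[str, Any]], actual: List[Dict[str, Any]], mode: str = "contains"
) -> bool:
    if mode == "exact":
        # Same length, compared index by index.
        return len(expected) == len(actual) and all(
            call_matches(e, a) for e, a in zip(expected, actual)
        )
    # contains: every expected call appears somewhere; extras and order are ignored.
    return all(any(call_matches(e, a) for a in actual) for e in expected)


actual_calls = [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_time", "arguments": '{"tz": "Europe/Dublin"}'}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"location": "Dublin", "unit": "c"}'}},
]
expected_calls = [{"name": "get_weather", "args": {"location": "Dublin"}}]
```

Here `expected_calls` passes in contains mode because unit is simply not checked, but fails in exact mode because the agent returned two calls.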

expect.tool_results

Matches against the agent’s returned tool_messages (same JSONL response). incantx does not execute tools.

Matching:

  • contains: each expected result must be present somewhere; extra results allowed; order ignored
  • exact: lengths must match and entries are matched by index

Each expected tool result can match by:

  • name and/or tool_call_id
  • content with optional content_match: contains|exact
  • content_json as a JSON subset match against parsed tool_messages[].content

Supported syntaxes:

tool_results:
  - { name: get_weather, content_json: { condition: Rain } }
  - get_weather: { condition: Rain }
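A single expected tool result could be checked along the lines of the Python sketch below. `result_matches` and `subset` are hypothetical helpers; the `tool_messages` entry shape (`name`, `tool_call_id`, `content`) and the default of contains for a result's content_match are assumptions.

```python
import json
from typing import Any, Dict


def subset(expected: Any, actual: Any) -> bool:
    """Recursive dict subset check; non-dict values must be equal."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and subset(value, actual[key])
            for key, value in expected.items()
        )
    return expected == actual


def result_matches(expected: Dict[str, Any], msg: Dict[str, Any]) -> bool:
    """Check one expected tool result against one returned tool message."""
    for field in ("name", "tool_call_id"):
        if field in expected and expected[field] != msg.get(field):
            return False
    content = msg.get("content", "")
    if "content" in expected:
        # Assumption: content_match defaults to "contains" when omitted.
        if expected.get("content_match", "contains") == "exact":
            if content != expected["content"]:
                return False
        elif expected["content"] not in content:
            return False
    if "content_json" in expected:
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError:
            return False  # content is not JSON, so a JSON subset cannot match
        if not subset(expected["content_json"], parsed):
            return False
    return True


tool_message = {
    "role": "tool",
    "tool_call_id": "call_1",
    "name": "get_weather",
    "content": '{"temp_c": 10, "condition": "Rain"}',
}
```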

expect.llm and --judge

  • --judge off: any fixture with expect.llm is marked SKIP
  • --judge auto: uses OpenAI judge only if OPENAI_API_KEY is set; otherwise SKIP
  • --judge on: requires OPENAI_API_KEY (run fails if missing)
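The three modes reduce to a small decision table, sketched here in Python. `judge_decision` is a hypothetical helper that only mirrors the bullets above, and the outcome labels "run", "skip", and "fail" are illustrative names.

```python
import os


def judge_decision(mode: str, api_key_present: bool) -> str:
    """Decide what happens to a fixture's expect.llm for a given --judge mode."""
    if mode == "off":
        return "skip"  # never call an LLM judge
    if mode == "on":
        return "run" if api_key_present else "fail"  # the key is required
    if mode == "auto":
        return "run" if api_key_present else "skip"  # use the judge only if available
    raise ValueError(f"unknown judge mode: {mode}")


# Example: evaluate the default mode against the current environment.
current = judge_decision("auto", bool(os.environ.get("OPENAI_API_KEY")))
```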