Note: this project is unreleased.
Test agent conversation flows (including tool calls) using transcript-style YAML fixtures.
1. Install
incantx runs on Node.js (recommended) and Bun.
- Node.js: >= 18
- Bun: >= 1.3
Install globally (choose one):
# Bun
bun add -g incantx
# npm
npm i -g incantx
Or install in a project:
Or run directly:
# Bun
bunx incantx
# npm
npx incantx
2. Writing a fixture
Fixture files are YAML with:
- optional file-level system: default system prompt (applies if a fixture transcript has no system: entries)
- optional file-level agent: config
- fixtures: list, each with:
  - id: ...
  - transcript: list of chat turns, ending with expect: (required)
Transcript entries are one of:
- system: "..."
- user: "..."
- assistant: "..." or assistant: { content: "...", tool_calls: [...] }
- tool: { name: "...", tool_call_id?: "...", json: {...} } (or content: "...")
- expect: { ... } (required, must be last)
Single turn example:
system: You are a helpful assistant.
agent:
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]
fixtures:
  - id: weather-requests-tool
    transcript:
      - user: What's the weather in Dublin?
      - expect:
          content_match: exact
          content: ""
          tool_calls_match: contains
          tool_calls:
            - get_weather: { location: Dublin, unit: c }
          tool_results_match: contains
          tool_results:
            - get_weather: { condition: Rain }
Multi-turn example (tool call + follow-up):
agent:
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]
fixtures:
  - id: umbrella-followup
    transcript:
      - system: You are a helpful assistant.
      - user: What's the weather in Dublin?
      - assistant:
          content: ""
          tool_calls:
            - { id: call_1, name: get_weather, args: { location: Dublin, unit: c } }
      - tool:
          tool_call_id: call_1
          name: get_weather
          json: { temp_c: 10, condition: Rain }
      - user: What should I bring?
      - expect:
          content_match: contains
          content: umbrella
Notes:
- expect.tool_calls asserts against the assistant message's tool_calls.
- expect.tool_results asserts against tool_messages returned by your agent in the same response (incantx does not execute tools).
- To test multi-turn "tool → user followup" flows, include tool: entries in the transcript before the user: followup.
3. Writing the agent process
incantx runs your agent as a subprocess and communicates via JSON Lines over stdin/stdout:
- incantx writes one request JSON object per line to stdin
- your agent writes one response JSON object per line to stdout
Request shape (OpenAI Chat Completions style):
{
"messages": [{ "role": "user", "content": "hi" }],
"tools": [],
"tool_choice": "auto",
"model": "optional-model-id"
}
Response shape:
- success: { "message": { "role": "assistant", "content": "...", "tool_calls": [...] }, "tool_messages"?: [...] }
- error: { "error": { "message": "..." } }
If your agent returns tool_calls, it can also return tool_messages in the same response so fixtures can assert on tool results.
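As a rough illustration of that combined response, the sketch below builds one assistant tool call together with the tool result the agent produced for it. The `function.name` and `function.arguments` fields follow the OpenAI-style shape referenced in the fixture reference; the `"type": "function"` field, the `"role": "tool"` field, and the hard-coded weather payload are assumptions for illustration, not confirmed parts of the incantx protocol.

```python
import json
from typing import Any, Dict


def next_turn(req: Dict[str, Any]) -> Dict[str, Any]:
    """Return one assistant tool call plus the tool result produced for it."""
    # Hypothetical stand-in for running a real tool inside your agent.
    result = {"temp_c": 10, "condition": "Rain"}
    call_id = "call_1"
    return {
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": call_id,
                    "type": "function",  # assumed, following OpenAI conventions
                    "function": {
                        "name": "get_weather",
                        # OpenAI-style arguments are a JSON-encoded string.
                        "arguments": json.dumps({"location": "Dublin", "unit": "c"}),
                    },
                }
            ],
        },
        # Returned alongside the message so fixtures can assert on tool results.
        "tool_messages": [
            {
                "role": "tool",  # assumed field
                "tool_call_id": call_id,
                "name": "get_weather",
                "content": json.dumps(result),
            }
        ],
    }
```

A fixture with `tool_results: - get_weather: { condition: Rain }` would then have something to match against, since the tool message content parses to JSON containing that key.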
JavaScript helper
incantx includes a helper that wraps the JSONL stdin/stdout loop, so you only write the nextTurn function:
import { runJsonlAgent } from "incantx/agent";

await runJsonlAgent(async (req) => {
  // your logic here
  return { message: { role: "assistant", content: "..." }, tool_messages: [] };
});
Python example agent script
Create src/incantx_agent.py and put your agent logic in a single function (next_turn below):
#!/usr/bin/env python3
import json
import sys
from typing import Any, Dict


def next_turn(req: Dict[str, Any]) -> Dict[str, Any]:
    # TODO: wire in your own agent logic here.
    #
    # Return either:
    #   { "message": { "role": "assistant", "content": "...", "tool_calls": [...] }, "tool_messages": [...] }
    # or:
    #   { "error": { "message": "..." } }
    messages = req.get("messages", [])
    last = messages[-1] if messages else None
    user_text = last.get("content", "") if isinstance(last, dict) and last.get("role") == "user" else ""
    return {"message": {"role": "assistant", "content": f"hello: {user_text}"}}


def main() -> None:
    for line in sys.stdin:
        raw = line.strip()
        if not raw:
            continue
        try:
            req = json.loads(raw)
            res = next_turn(req)
        except Exception as e:
            res = {"error": {"message": str(e)}}
        sys.stdout.write(json.dumps(res) + "\n")
        sys.stdout.flush()


if __name__ == "__main__":
    main()
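Before pointing incantx at an agent script, it can help to smoke-test the JSONL loop by hand. The helper below is a minimal sketch, not part of incantx: `send_one_request` is a hypothetical name, and it assumes the agent exits once stdin is closed, as the loop above does.

```python
import json
import subprocess
from typing import Any, Dict, List


def send_one_request(
    command: List[str], request: Dict[str, Any], timeout_s: float = 20.0
) -> Dict[str, Any]:
    """Write one JSONL request to the agent's stdin and parse its first response line."""
    proc = subprocess.run(
        command,
        input=json.dumps(request) + "\n",  # one JSON object per line
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
    if not lines:
        raise RuntimeError(f"agent produced no output; stderr: {proc.stderr.strip()}")
    return json.loads(lines[0])
```

With the script above saved as src/incantx_agent.py, calling `send_one_request(["python3", "src/incantx_agent.py"], {"messages": [{"role": "user", "content": "hi"}]})` should return an assistant message whose content is "hello: hi".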
4. Running tests
Run a single fixture file or a directory:
incantx tests/fixtures --judge off
LLM judge modes:
- --judge auto (default): run LLM expectations only if OPENAI_API_KEY is set; otherwise mark them SKIP
- --judge off: never call an LLM judge
- --judge on: require OPENAI_API_KEY (fail if missing)
- --judge-model <model>: override judge model (defaults to gpt-4o-mini)
The CLI exits non-zero if any fixture fails.
5. Fixture reference
incantx fixtures are transcript-style YAML files. Each fixture runs the agent once and asserts on the next assistant message returned for that transcript.
File shape
system: You are a helpful assistant. # optional default system prompt for all fixtures in this file
agent: # optional default agent config for all fixtures in this file
  type: subprocess
  command: ["python3", "src/incantx_agent.py"]
fixtures:
  - id: example
    transcript: [...]
system (default system prompt)
If set, system is prepended to each fixture’s transcript only when that fixture transcript contains no system: ... entries.
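The rule above can be sketched in a few lines. This is an illustration of the described behavior rather than incantx's actual code; `apply_default_system` is a hypothetical name, and transcript entries are modelled as the one-key dicts that the YAML would parse into.

```python
from typing import Any, Dict, List, Optional


def apply_default_system(
    transcript: List[Dict[str, Any]], default_system: Optional[str]
) -> List[Dict[str, Any]]:
    """Prepend the file-level system prompt unless the transcript has its own."""
    has_system = any("system" in entry for entry in transcript)
    if default_system is None or has_system:
        return list(transcript)
    return [{"system": default_system}] + list(transcript)
```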
agent (subprocess)
agent can appear at the file level and/or per fixture (fixture-level overrides file-level).
Fields:
- type: "subprocess" (optional, defaults to "subprocess")
- command: string array, required (e.g. ["python3", "src/incantx_agent.py"])
- cwd: string, optional
- env: map of strings, optional (values support ${VAR} expansion from the current process env)
- timeout_ms: number, optional (defaults to 20000)
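To make the `${VAR}` expansion in env concrete, here is a minimal sketch of one way it could work. The function name `expand_env` is hypothetical, and replacing unset variables with an empty string is an assumption; the behavior incantx applies to missing variables is not stated here.

```python
import re
from typing import Dict, Mapping

# Matches ${NAME} where NAME looks like a shell-style identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(values: Mapping[str, str], process_env: Mapping[str, str]) -> Dict[str, str]:
    """Expand ${VAR} references in each value using the given process environment."""

    def expand(value: str) -> str:
        # Assumption: unset variables expand to an empty string.
        return _VAR.sub(lambda m: process_env.get(m.group(1), ""), value)

    return {key: expand(value) for key, value in values.items()}
```

In practice the second argument would be `os.environ`.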
fixtures[]
Each fixture:
- id: string, required
- agent: optional (same shape as above)
- transcript: array of transcript entries, required
transcript[] entries
Each entry is a one-key object. The final entry must be expect: ....
Supported entries:
- system: <string>
- user: <string>
- assistant: <string> (shorthand for { content: "..." })
- assistant: { content?: <string>, tool_calls?: [...] }
- tool: { name: <string>, tool_call_id?: <string>, json?: <any>, content?: <string> }
- expect: { ... } (required, must be last)
Notes:
- assistant.content defaults to "" if omitted.
- tool must include either json (encoded as JSON into content) or content (raw string).
- If a tool entry omits tool_call_id, incantx will infer it only when exactly one tool call id has appeared earlier in the transcript.
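The inference rule in the last note can be sketched as follows. This is an illustrative reading, not incantx's implementation: `infer_tool_call_id` is a hypothetical name, entries are the parsed one-key dicts, and only explicitly written ids are counted (how generated ids interact with inference is not covered here).

```python
from typing import Any, Dict, List, Optional


def infer_tool_call_id(earlier_entries: List[Dict[str, Any]]) -> Optional[str]:
    """Return the only tool call id seen so far, or None if there are zero or several."""
    seen: List[str] = []
    for entry in earlier_entries:
        assistant = entry.get("assistant")
        if not isinstance(assistant, dict):
            continue  # the string shorthand carries no tool calls
        for call in assistant.get("tool_calls", []):
            call_id = call.get("id")
            if call_id is not None and call_id not in seen:
                seen.append(call_id)
    return seen[0] if len(seen) == 1 else None
```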
assistant.tool_calls (in transcript)
When you include tool calls in an assistant: transcript entry, incantx converts them into OpenAI-style tool_calls with JSON-stringified function.arguments.
Two supported syntaxes:
tool_calls:
  - { id: call_1, name: get_weather, args: { location: Dublin, unit: c } }
  - { name: get_time, args: { tz: Europe/Dublin } }
or the short form:
tool_calls:
  - get_weather: { location: Dublin, unit: c }
If id is omitted, incantx generates call_1, call_2, ...
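A minimal sketch of that conversion, covering both syntaxes, might look like the following. The function name `normalize_tool_calls` is hypothetical, the `"type": "function"` field follows OpenAI conventions rather than anything confirmed here, and numbering generated ids by position in the list is an assumption about how call_1, call_2, ... are assigned.

```python
import json
from typing import Any, Dict, List


def normalize_tool_calls(calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert transcript-style tool calls into OpenAI-style tool_calls."""
    normalized = []
    for index, call in enumerate(calls, start=1):
        if "name" in call:
            # Long form: { id?, name, args }
            name, args, call_id = call["name"], call.get("args", {}), call.get("id")
        else:
            # Short form: the single key is the tool name, the value is its args.
            name, args = next(iter(call.items()))
            call_id = None
        normalized.append(
            {
                "id": call_id or f"call_{index}",  # assumed: numbered by position
                "type": "function",
                "function": {"name": name, "arguments": json.dumps(args)},
            }
        )
    return normalized
```

Note that this sketch would misread a short-form call to a tool literally named `name`; a real implementation would need a stricter way to tell the two forms apart.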
expect (assertions on the next assistant message)
expect asserts on the single assistant message returned by your agent for this transcript, plus any tool_messages your agent returned alongside it.
Fields:
- content: string, optional
- content_match: "contains" | "exact" (defaults to "contains" when content is provided)
- tool_calls: list, optional
- tool_calls_match: "contains" | "exact" (defaults to "contains" when tool_calls is provided)
- tool_results: list, optional
- tool_results_match: "contains" | "exact" (defaults to "contains" when tool_results is provided)
- llm: string, optional (LLM-judged expectation; behavior depends on --judge)
expect.tool_calls
Matches against the agent’s returned message.tool_calls:
- contains: each expected call must be present somewhere; extra calls allowed; order ignored
- exact: lengths must match and entries are matched by index
Each expected tool call matches:
- name vs tool_calls[].function.name
- args as a JSON subset match vs parsed tool_calls[].function.arguments
Supported syntaxes:
tool_calls:
  - { name: get_weather, args: { location: Dublin } }
  - get_weather: { location: Dublin }
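To illustrate how a JSON subset match combines with the contains and exact modes, here is a rough sketch. It is not incantx's code: the function names are hypothetical, expected calls are taken in the long `{ name, args }` form, lists inside args are compared by plain equality, and in contains mode two expected calls may be satisfied by the same actual call. All of these are assumptions.

```python
import json
from typing import Any, Dict, List


def is_subset(expected: Any, actual: Any) -> bool:
    """True if every key in expected appears in actual with a matching value."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key])
            for key, value in expected.items()
        )
    return expected == actual  # assumed: scalars and lists compare by equality


def call_matches(expected: Dict[str, Any], actual: Dict[str, Any]) -> bool:
    """Compare name and args against an OpenAI-style tool call."""
    fn = actual.get("function", {})
    if expected["name"] != fn.get("name"):
        return False
    return is_subset(expected.get("args", {}), json.loads(fn.get("arguments") or "{}"))


def tool_calls_match(
    expected: List[Dict[str, Any]], actual: List[Dict[str, Any]], mode: str = "contains"
) -> bool:
    if mode == "exact":
        # Same length, compared pairwise by index.
        return len(expected) == len(actual) and all(
            call_matches(e, a) for e, a in zip(expected, actual)
        )
    # contains: each expected call appears somewhere; extras and order are ignored.
    return all(any(call_matches(e, a) for a in actual) for e in expected)
```

Under this reading, an expected `{ location: Dublin }` matches an actual call with `{ location: Dublin, unit: c }`, because the extra `unit` key is ignored by the subset check.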
expect.tool_results
Matches against the agent’s returned tool_messages (same JSONL response). incantx does not execute tools.
Matching:
- contains: each expected result must be present somewhere; extra results allowed; order ignored
- exact: lengths must match and entries are matched by index
Each expected tool result can match by:
- name and/or tool_call_id
- content with optional content_match: contains|exact
- content_json as a JSON subset match against parsed tool_messages[].content
Supported syntaxes:
tool_results:
  - { name: get_weather, content_json: { condition: Rain } }
  - get_weather: { condition: Rain }
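The per-result checks listed above could be sketched for a single expected result as follows. This is a hedged illustration, not incantx's implementation: `result_matches` is a hypothetical name, the expected result is taken in the long form, the per-result `content_match` is assumed to default to contains, and content that fails to parse as JSON is treated as a non-match for `content_json`.

```python
import json
from typing import Any, Dict


def is_subset(expected: Any, actual: Any) -> bool:
    """True if every key in expected appears in actual with a matching value."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            k in actual and is_subset(v, actual[k]) for k, v in expected.items()
        )
    return expected == actual


def result_matches(expected: Dict[str, Any], actual: Dict[str, Any]) -> bool:
    """Check one expected tool result against one returned tool message."""
    for key in ("name", "tool_call_id"):
        if key in expected and expected[key] != actual.get(key):
            return False
    content = actual.get("content", "")
    if "content" in expected:
        if expected.get("content_match", "contains") == "exact":  # assumed default
            if content != expected["content"]:
                return False
        elif expected["content"] not in content:
            return False
    if "content_json" in expected:
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError:
            return False  # assumed: unparseable content never matches
        if not is_subset(expected["content_json"], parsed):
            return False
    return True
```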
expect.llm and --judge
- --judge off: any fixture with expect.llm is marked SKIP
- --judge auto: uses OpenAI judge only if OPENAI_API_KEY is set; otherwise SKIP
- --judge on: requires OPENAI_API_KEY (run fails if missing)
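The three modes amount to a small decision table, sketched below. `judge_action` is a hypothetical name, and the "run" / "skip" return values and the raised error are illustrative stand-ins for whatever incantx does internally.

```python
from typing import Mapping


def judge_action(mode: str, env: Mapping[str, str]) -> str:
    """Decide whether an expect.llm check runs or is skipped for a given --judge mode."""
    has_key = bool(env.get("OPENAI_API_KEY"))
    if mode == "off":
        return "skip"
    if mode == "auto":
        return "run" if has_key else "skip"
    if mode == "on":
        if not has_key:
            raise RuntimeError("--judge on requires OPENAI_API_KEY")
        return "run"
    raise ValueError(f"unknown judge mode: {mode}")
```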