Xorq is an executable memory system for tabular data work. Xorq gives agents a catalog of executable pipelines instead of markdown notes. It turns ephemeral agent work, such as pandas scripts, sklearn pipelines, and ad-hoc tables, into durable, composable, executable artifacts that any future agent or human can discover, reproduce, and reuse.
It comes with a CLI for agents, a TUI for humans, and a git-native catalog.

The Problem
Coding agents are great at accomplishing closed-loop tasks, but in the process
they accumulate tech debt and unnecessary complexity. For example, if you ask a
coding agent to build a dashboard, you are more likely than not to get a folder
of one-off Python scripts that import each other in non-obvious ways, an
embedded JSON file holding intermediate state, and a requirements.txt that was
last regenerated two sessions ago. It may well execute end-to-end on your
laptop, but verifying it by reproducing it on another machine, or
productionizing any of it, means rewriting some of it. And every rewrite
introduces more complexity.
| Pain | Symptom |
|---|---|
| Imperative, stateful artifacts | An agent run leaves you with a folder of .py, .json, and .html files. Reproducing the result means re-running them in the right order without a declarative spec. |
| No discoverable, shared index | "Team memory" today is ~/.claude/memory/*.md, with a MEMORY.md index of one-liners pointing to the notes. There's no executable catalog two agents can both pull into context. |
| No lineage graph | Rename a column upstream and a downstream model breaks at runtime. The dependency lived only in chat history, not in a graph that could have flagged it before it shipped. |
| No portable environment | A pipeline that ran in one agent session has no path to another sandbox, your machine, or production. |
Two ways to start
With an agent. Install the Xorq plugin in Claude Code and let it build catalogs for you:
/plugin marketplace add xorq-labs/claude-plugins
/plugin install xorq@xorq-plugins
The plugin adds four slash commands:
- /xorq:init: load CSV or Parquet files as catalog entries
- /xorq:catalog-explore: browse what's already in a catalog
- /xorq:composer: combine entries into new joined/aliased entries
- /xorq:builder: assemble ML pipelines and semantic-layer entries
The agent does the building; you keep the catalog.
Manually. Install the library and start composing expressions in Python:
❯ pip install "xorq[examples]"
❯ xorq init -t penguins
Design choices
| Choice | What it enables |
|---|---|
| Ibis as expression system | Declarative dataframe expressions that compile to many engines. |
| Git for state and storage | The catalog is a git repo of entries, with git-annex support for large files. |
| uv for reproducible environments | Each entry ships with a wheel and a pinned requirements.txt. |
| DataFusion for embedded compute | Pipelines run in-process, with SQL and UDF execution. |
| Arrow for IPC and network | Operators exchange Arrow RecordBatches. |
Supported engines
The same expression can run against any of these backends, and into_backend
moves data between them.
| Category | Engines |
|---|---|
| Embedded | DataFusion, DuckDB, SQLite, pandas |
| Warehouses | Snowflake, Databricks, Trino, Postgres |
| Lakehouse | PyIceberg |
| Arrow Flight | GizmoSQL |
Comparison
A Xorq memory is a computation you reason about by its invariants (schema, lineage, content hash, deterministic execution), the way you reason about a matrix by its properties rather than its entries.
| Approach | Memory item | Answer produced by | Provenance & reproducibility |
|---|---|---|---|
| Agent memory (Mem0, etc.) | Markdown snippets | LLM reading the prompt | None |
| MCP / open context servers | Tool bindings | Tool at runtime; LLM consumes as text | Per-tool |
| dbt | SQL model files | Warehouse executing compiled SQL | manifest.json captures lineage; env (warehouse, packages) pinned externally |
| Xorq | Content-addressed expression + pinned env | Engine executing the expression | expr.yaml + uv-pinned env shipped with the artifact |
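To make "content-addressed" concrete, here is a minimal sketch of deriving a stable ID from a canonical serialization of an expression spec. The helper `content_id`, the use of SHA-256, and the toy spec dictionaries are all assumptions made for illustration; the source does not describe how Xorq actually computes its hashes.

```python
import hashlib
import json


def content_id(spec: dict, length: int = 12) -> str:
    """Return a short, deterministic ID for a JSON-serializable spec.

    Hypothetical helper: the hash function and the ID length are
    assumptions for illustration, not Xorq's real scheme.
    """
    # Sort keys and fix separators so logically equal specs serialize identically.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Toy specs: spec_b has the same content as spec_a in a different key order.
spec_a = {"op": "Aggregate", "by": ["species"], "metrics": {"avg_bill_length": "mean"}}
spec_b = {"metrics": {"avg_bill_length": "mean"}, "by": ["species"], "op": "Aggregate"}
spec_c = {**spec_a, "by": ["island"]}

print(content_id(spec_a) == content_id(spec_b))  # same content, same ID
print(content_id(spec_a) == content_id(spec_c))  # changed content, different ID
```

The point of the design is that the ID is a function of what the computation is, not of when or where it was written, so two agents that build the same expression land on the same entry.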
Benchmark
On DABStep — 450 data-analysis questions over payment transaction data — a Xorq semantic catalog of 33 named expressions takes Haiku from 50% to 84%, 8pp above the Sonnet baseline.
Where the agent looks for context mattered more than which base model it used. Full write-up: Orientation Over Reasoning.
Under the hood
The Expression — declarative Ibis, multi-engine, Arrow-native
Write declarative Ibis expressions that run like a tool. Xorq extends Ibis with
caching, multi-engine execution, and UDFs. Below, xo._ is the Ibis row
reference — xo._.species refers to the species column of the current table.
import xorq.api as xo
from xorq.caching import ParquetCache

penguins = xo.examples.penguins.fetch()

penguins_agg = (
    penguins
    .filter(xo._.species.notnull())
    .group_by("species")
    .agg(avg_bill_length=xo._.bill_length_mm.mean())
)

expr = (
    penguins_agg
    .cache(ParquetCache.from_kwargs())
)
One expression, many engines
expr = penguins.into_backend(xo.sqlite.connect())
expr.ls.backends
(<xorq.backends.sqlite.Backend at 0x107debda0>,
<xorq.backends.xorq_datafusion.Backend at 0x1669002c0>)
Expressions are tools, Arrow is the pipe
Unix pipes text streams between small programs. Xorq pipes Arrow streams between expressions.
unix : programs :: xorq : arrow-transforms
In [6]: expr.to_pyarrow_batches()
Out[6]: <pyarrow.lib.RecordBatchReader at 0x15dc3f570>
Workflows, without state
Xorq executes expressions as Arrow RecordBatch streams — no DAG of tasks to checkpoint, just data flowing through transforms.
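The streaming idea can be illustrated without any Xorq or Arrow dependency. The sketch below is only an analogy built on plain Python generators: `Batch` stands in for an Arrow RecordBatch, the function names are hypothetical, and the four rows are made-up toy values rather than real penguin data. Each transform pulls batches from the one before it, so nothing is checkpointed between steps.

```python
from typing import Iterable, Iterator

Batch = list[dict]  # stand-in for an Arrow RecordBatch (assumption for the analogy)


def read_batches(rows: list[dict], batch_size: int) -> Iterator[Batch]:
    """Yield the rows in fixed-size batches, like a batch reader would."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def drop_null_species(batches: Iterable[Batch]) -> Iterator[Batch]:
    """Filter each batch as it streams through; no intermediate table is stored."""
    for batch in batches:
        yield [row for row in batch if row["species"] is not None]


def mean_bill_length(batches: Iterable[Batch]) -> dict[str, float]:
    """Consume the stream and keep only running sums and counts per species."""
    totals: dict[str, list[float]] = {}
    for batch in batches:
        for row in batch:
            acc = totals.setdefault(row["species"], [0.0, 0])
            acc[0] += row["bill_length_mm"]
            acc[1] += 1
    return {species: total / count for species, (total, count) in totals.items()}


# Made-up toy rows, not taken from the penguins dataset.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.0},
    {"species": None, "bill_length_mm": 40.0},
    {"species": "Gentoo", "bill_length_mm": 47.0},
    {"species": "Adelie", "bill_length_mm": 41.0},
]

result = mean_bill_length(drop_null_species(read_batches(rows, batch_size=2)))
print(result)
```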
Scikit-learn pipelines
Xorq translates scikit-learn Pipeline objects to deferred expressions via
Pipeline.from_instance(sklearn_pipeline). End-to-end sklearn examples live in
xorq-labs/xorq-gallery.
The Catalog — a git repo of build artifacts on the filesystem
The catalog is a git repo of build artifacts on the filesystem. xorq catalog add
packages a build directory (the manifest, expr.yaml + *_metadata.json, plus a
Python environment pinned via uv) into an entry.
Build and add
❯ xorq uv build expr.py
Building wheel...
Successfully built ...
builds/fa2122f6a9e9
❯ xorq catalog -p git-catalogs/penguins init
Initialized catalog at /git-catalogs/penguins
❯ xorq catalog add builds/fa2122f6a9e9/ -a penguins-agg
Added fa2122f6a9e9
Git history
Every catalog operation is a commit you can read:
❯ git -C git-catalogs/penguins reflog
17dd4e9 (HEAD -> main) HEAD@{0}: add: fa2122f6a9e9 (aliases penguins-agg)
9f5d242 HEAD@{1}: add catalog.yaml
9915df3 HEAD@{2}: commit: Switching to main
Catalog layout
❯ tree git-catalogs/penguins
git-catalogs/penguins
├── aliases
│ └── penguins-agg.zip -> ../entries/fa2122f6a9e9.zip
├── entries
│ └── fa2122f6a9e9.zip
├── metadata
│ └── fa2122f6a9e9.zip.metadata.yaml
└── catalog.yaml
Aliases are symlinks, entries are zipped builds, and metadata sidecars are plain YAML. An agent that clones the repo can discover everything with file operations — no service to call, no API to learn:
# List aliased entries
❯ ls git-catalogs/penguins/aliases/

# Find entries that emit an 'avg_bill_length' column
❯ grep -l 'avg_bill_length' git-catalogs/penguins/metadata/*.yaml

# Find entries running on DataFusion
❯ grep -l 'xorq_datafusion' git-catalogs/penguins/metadata/*.yaml

# Find source entries (vs. unbound, expr_builder kinds)
❯ grep -l 'kind: source' git-catalogs/penguins/metadata/*.yaml
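The same discovery can be done from Python with nothing but the standard library. This is a rough sketch, not part of Xorq: `list_aliases` and `find_entries` are hypothetical helpers, and the sidecar content written below is invented so the example runs in a throwaway directory that mirrors the layout shown above.

```python
import os
import tempfile
from pathlib import Path


def list_aliases(catalog: Path) -> dict[str, str]:
    """Map each alias name to the entry file its symlink points at."""
    aliases = {}
    for link in sorted((catalog / "aliases").iterdir()):
        if link.is_symlink():
            aliases[link.stem] = link.resolve().name
    return aliases


def find_entries(catalog: Path, needle: str) -> list[str]:
    """Return entry hashes whose metadata sidecar text contains `needle`."""
    hits = []
    for sidecar in sorted((catalog / "metadata").glob("*.yaml")):
        if needle in sidecar.read_text(encoding="utf-8"):
            # Sidecars are named <hash>.zip.metadata.yaml
            hits.append(sidecar.name.split(".", 1)[0])
    return hits


# Build a throwaway catalog with the same layout so the sketch runs anywhere.
catalog = Path(tempfile.mkdtemp()) / "penguins"
for sub in ("aliases", "entries", "metadata"):
    (catalog / sub).mkdir(parents=True)
(catalog / "entries" / "fa2122f6a9e9.zip").write_bytes(b"")
os.symlink("../entries/fa2122f6a9e9.zip", catalog / "aliases" / "penguins-agg.zip")
# Hypothetical sidecar content; the real sidecar schema is not shown in this README.
(catalog / "metadata" / "fa2122f6a9e9.zip.metadata.yaml").write_text(
    "columns: [species, avg_bill_length]\n", encoding="utf-8"
)

aliases = list_aliases(catalog)
hits = find_entries(catalog, "avg_bill_length")
print(aliases)
print(hits)
```

Pointing `catalog` at a real clone, such as git-catalogs/penguins, should work the same way as long as the layout matches the tree above.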
Inside an entry
A build directory contains the manifest plus everything needed to reproduce it. The zipped build is the entry stored in the catalog.
❯ tree builds/fa2122f6a9e9
├── build_metadata.json
├── expr.yaml
├── expr_metadata.json
├── profiles.yaml
├── requirements.txt
└── xorq-0.3.24-py3-none-any.whl
The manifest (expr.yaml + *_metadata.json) is the content-addressed
specification of the pipeline. The entry packages it with deps and source
for reproducible execution.
# Input-addressed, composable, portable
# Abridged expr.yaml
definitions:
  nodes:
    '@read_b5f228c91f16':
      op: Read
      method_name: read_parquet
      name: penguins
      read_kwargs:
        - [hash_path, .../penguins/20250703T145709Z-c3cde/penguins.parquet]
        - [table_name, penguins]
      schema_ref: schema_f11dda6745cc
    '@filter_fa4a3fde7765':
      op: Filter
      parent: { node_ref: '@read_b5f228c91f16' }
      predicates:
        - { op: NotNull, arg: { op: Field, name: species, ... } }
    '@aggregate_eb3109707390':
      op: Aggregate
      parent: { node_ref: '@filter_fa4a3fde7765' }
      by:
        species: { op: Field, name: species, ... }
      metrics:
        avg_bill_length:
          op: Mean
          arg: { op: Field, name: bill_length_mm, ... }
    '@cachednode_fa2122f6a9e9':
      op: CachedNode
      parent: { node_ref: '@aggregate_eb3109707390' }
      cache:
        type: ParquetCache
        relative_path: parquet
      schema_ref: schema_9271d5e9d443
expression:
  node_ref: '@cachednode_fa2122f6a9e9'
  schema_ref: { schema_ref: schema_9271d5e9d443 }
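Because every node names its parent by reference, lineage can be recovered by following those references from the root expression back to the source. The sketch below hand-copies the node names and ops from the abridged manifest into a plain dict (no YAML parser involved) and assumes each node has at most one parent, which holds for this example but would not for joins. The `lineage` function is a hypothetical helper, not a Xorq API.

```python
# Hand-copied subset of the abridged expr.yaml, reduced to op and parent.
nodes = {
    "@read_b5f228c91f16": {"op": "Read"},
    "@filter_fa4a3fde7765": {
        "op": "Filter",
        "parent": {"node_ref": "@read_b5f228c91f16"},
    },
    "@aggregate_eb3109707390": {
        "op": "Aggregate",
        "parent": {"node_ref": "@filter_fa4a3fde7765"},
    },
    "@cachednode_fa2122f6a9e9": {
        "op": "CachedNode",
        "parent": {"node_ref": "@aggregate_eb3109707390"},
    },
}
root = "@cachednode_fa2122f6a9e9"


def lineage(nodes: dict, root: str) -> list[str]:
    """Follow parent.node_ref links from the root back to the source node."""
    chain = []
    ref = root
    while ref is not None:
        node = nodes[ref]
        chain.append(node["op"])
        # A node without a parent (the Read here) ends the walk.
        ref = node.get("parent", {}).get("node_ref")
    return chain


ops = lineage(nodes, root)
print(" <- ".join(ops))  # prints "CachedNode <- Aggregate <- Filter <- Read"
```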
The Tools — catalog, run, serve
The entry is the unit of executable memory: the manifest plus the environment needed to run it. The tools (catalog, run, serve) are how agents and humans compose with it.
Catalog
Once an entry is published, agents discover it straight from the catalog
filesystem — metadata/*.yaml sidecars sit next to the zipped entries, so
listing, filtering, and lookup-by-alias/hash all work with plain file reads
and git (no service required). Humans open the TUI to preview data,
schema, lineage, and git history side-by-side.
❯ xorq catalog list-aliases
penguins-agg
❯ xorq catalog list
fa2122f6a9e9
Run
❯ xorq run builds/fa2122f6a9e9 -o out.parquet
Additionally, you can serve an unbound expression over Arrow Flight with the xorq serve-* commands.
Learn more
- Quickstart
- Why xorq?
- Claude Code plugin
- Scikit-learn
- A Git-Native Semantic Layer — building a portable semantic catalog with Xorq
- Orientation Over Reasoning — Haiku + Xorq catalog hits 84% on DABStep, above the Sonnet baseline
Pre-1.0. Expect breaking changes with migration guides.
