Ibex
Ibex is a statically typed language for columnar DataFrame and
TimeFrame manipulation. Write concise bracket-pipeline queries —
filter, aggregate, roll, resample, reshape, join, rename — and run them in the fast
interpreter or compile them to standalone C++ binaries. Wire sources and sinks
into real-time Stream pipelines — including WebSocket
servers for live browser dashboards — with no extra syntax.
1.3× faster than Polars on aggregation · 5× faster than Pandas · 10–20× faster on rolling windows
Pipeline operations chain left-to-right inside [ ], in the
order they execute. No SQL keywords, no macro magic, no implicit coercions.
Backtick-quoted names handle columns with dots or spaces.
as_timeframe promotes a DataFrame to a time-indexed structure
with O(n) sort detection. Rolling windows use a two-pointer O(n) scan with
a single result-column allocation per call — no copies, no heap churn.
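The two-pointer scan described above can be sketched in a few lines. This is a Python illustration of the algorithm only, not the Ibex kernel; it assumes a half-open window `(t - window, t]` over a sorted timestamp column.

```python
def rolling_sum(ts, vals, window):
    """Two-pointer O(n) rolling sum over a sorted timestamp column.

    For each row i, sums vals[j] for every j with ts[i] - window < ts[j] <= ts[i].
    `left` only ever moves forward, so the whole scan is linear, with a
    single output-column allocation and no per-row heap work.
    """
    out = [0.0] * len(vals)          # the single result-column allocation
    acc, left = 0.0, 0
    for i, t in enumerate(ts):
        acc += vals[i]               # window gains the new row
        while ts[left] <= t - window:
            acc -= vals[left]        # window drops expired rows
            left += 1
        out[i] = acc
    return out
```

A rolling mean or count falls out of the same scan by tracking a row count alongside the accumulator.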
ibex_compile transpiles any .ibex script to
idiomatic C++ using the ibex::ops::* library. Compiled and
interpreted outputs are behaviour-equivalent; both run near peak throughput.
Optional effects { ... } annotations on fn and
extern fn let Ibex check side-effect assumptions early.
Extern parameters can also declare call semantics with
const, mutable, or consume.
There is no optimizer pass yet; this is currently a safety check.
No implicit Int ↔ Float
coercion. Explicit cast constructors (Int64(x),
Float64(x)) and a five-mode round()
built-in — including banker’s rounding — cover every numeric
conversion. Optional annotations are validated when present.
Eight built-in rand_* functions — covering normal,
uniform, gamma, exponential, Bernoulli, Poisson, Student-t, and integer
distributions — generate one draw per row in a single vectorized
pass with no locking between threads.
Language tour
A complete analytical pipeline in a handful of composable clauses.
Load & Filter
extern fn declares a C++ data-source function as a
first-class Ibex binding. The compiler resolves it at link time;
the REPL loads the corresponding plugin .so at runtime.
Filter expressions support arithmetic, comparisons, and boolean
logic (and, or, not).
Multiple clauses chain in reading order.
Output plugins mirror the input API: write_csv,
write_json, and write_parquet accept a
DataFrame and a path and return the number of rows written.
extern fn read_csv(path: String) -> DataFrame
from "csv.hpp";
extern fn read_json(path: String) -> DataFrame
from "json.hpp";
let prices = read_csv("prices.csv");
let config = read_json("config.json");
// Keep rows where price exceeds 1.0
let active = prices[filter price > 1.0];
// Chain filter with a column projection
prices[
filter price > 1.0,
select { symbol, price, volume }
];
// Write results back to disk
extern fn write_csv(df: DataFrame, path: String) -> Int
from "csv.hpp";
extern fn write_json(df: DataFrame, path: String) -> Int
from "json.hpp";
extern fn write_parquet(df: DataFrame, path: String) -> Int
from "parquet.hpp";
let n = write_csv(active, "out/active.csv");
write_json(active, "out/active.json");
write_parquet(active, "out/active.parquet");
Inline Tables
The Table { … } constructor builds
a DataFrame column-by-column. Each column value is either
an inline array literal [v, v, …]
or any Ibex expression that produces a table — no file or
plugin declaration required.
For expression columns, the runtime extracts the column from the result table: a single-column result is used directly; a multi-column result is matched by column name. Literal and expression columns may be freely mixed.
All columns must have equal row counts. The result is a first-class
DataFrame: chain operations, promote to
as_timeframe, or join with another table.
// Array-literal columns
let t = Table {
symbol = ["AAPL", "GOOG", "MSFT"],
price = [150.0, 140.0, 300.0],
};
// Columns from existing table expressions
let summary = Table {
symbol = prices[select { symbol }],
high = prices[select { high = max(price) }, by symbol],
low = prices[select { low = min(price) }, by symbol],
};
// Mix literals and expressions
let ref = Table {
label = ["open", "close"],
value = ohlc[select { value = mean(price) }],
};
// Promote to TimeFrame
let tf = as_timeframe(
Table { ts = [1000, 2000, 3000], price = [10, 20, 30] },
"ts"
);
Aggregation
select doubles as a projection and an aggregation clause.
Add by to group; omit it for a global aggregate.
Aggregate functions: first, last,
sum, mean, min, max,
count, median, std,
ewma. All skip null rows by default.
// Mean sepal length and row count, by species
iris[select {
mean_sl = mean(`Sepal.Length`),
n = count()
}, by Species];
// OHLC per symbol — all in one pass
prices[select {
open = first(price),
high = max(price),
low = min(price),
close = last(price),
traded = sum(volume)
}, by symbol];
Grouped Update
update + by evaluates each expression
per group and writes the result back to every row in
that group — no row reduction, no separate join step. This is
the equivalent of a SQL window function with
PARTITION BY and no frame clause.
Use it to attach group statistics (mean, z-score, rank) alongside the original columns without losing any rows.
// Attach species mean to every individual row
iris[update { group_mean = mean(`Sepal.Length`) }, by Species];
// Z-score within each symbol group
prices[update {
z = (price - mean(price)) / std(price)
}, by symbol];
// Combine grouped update with filter — keeps all original rows
prices[
update { avg = mean(price) }, by symbol,
filter price > avg
];
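The broadcast semantics of update + by can be sketched in Python (illustrative only, not how the Ibex runtime is written):

```python
from collections import defaultdict

def grouped_mean_update(keys, vals):
    """update { m = mean(v) }, by k: compute one statistic per group,
    then write it back to every row of that group -- no row reduction."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, v in zip(keys, vals):
        sums[k] += v
        counts[k] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return [means[k] for k in keys]   # one output value per original row
```

The output has exactly as many values as input rows, which is why a grouped update can feed a subsequent filter without a join step.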
Derived Columns
update appends named computed columns to every row.
All existing columns pass through untouched. A
tuple left-hand side unpacks a multi-column result
in one assignment. The bare update = expr form merges
all columns from a table-returning expression.
// Enrich every row with return, range, and notional
ohlcv[update {
ret = (close - open) / open,
range = (high - low) / open,
notional = close * volume
}];
// Unpack a multi-column result with a tuple LHS
df[update { (delta, gamma) = compute_greeks() }];
// Merge all columns from a table-returning function
prices[update = gen_prices(symbols)];
// Chain an update with an aggregation
let daily = ohlcv[update { ret = (close - open) / open }];
daily[select {
avg_ret = mean(ret),
n_sessions = count()
}, by sector]
[order { avg_ret desc }];
Rename
rename maps old column names to new ones while keeping
every other column intact. Use the single-name shorthand or the
braced multi-rename form. Column order is preserved.
Because rename runs before select in
canonical order, renamed columns are immediately visible to the
rest of the pipeline.
// Single rename — no braces needed
trades[rename p = price];
// Rename multiple columns at once
trades[rename { p = price, q = volume }];
// Rename then select using the new name
trades[
filter price > 15,
rename p = price,
select { p, volume }
];
// Works with update and aggregation too
trades[
rename { sym = symbol, vol = volume },
select { total_vol = sum(vol) }, by sym
];
Order & Distinct
order accepts a single column, a multi-key block with
asc / desc annotations, or no
argument at all to sort by every column in schema order.
distinct deduplicates on a single column or a set.
// Sort by a single key (ascending by default)
iris[order `Sepal.Length`];
// Multi-key sort with explicit directions
results[order { avg_ret desc, symbol asc }];
// Sort by all columns in schema order
iris[order];
// Unique species names
iris[distinct Species];
// Unique (species, length) pairs
iris[distinct { Species, `Sepal.Length` }];
Joins
Join keys are named with on. When the expression after
on is a comparison or boolean expression rather than a
column name, it becomes a non-equijoin predicate
(theta join) evaluated for every pair of rows.
The as-of join attaches the latest right row at or before each left
timestamp. It is the standard pattern for enriching tick data with
bar data without look-ahead bias.
Both tables must be TimeFrames for an asof join.
// Inner join — drop non-matching rows
let enriched = daily join fund on symbol;
// Left join — preserve all left rows
let with_meta = prices left join metadata on symbol;
// Right join — preserve all right rows
let with_scores = prices right join scores on symbol;
// Outer join — preserve rows from both sides
let unioned = prices outer join metadata on symbol;
// Semi join — keep left rows that have a match
let matched = prices semi join metadata on symbol;
// Anti join — keep left rows that have no match
let missing = prices anti join metadata on symbol;
// Cross join — cartesian product
let cartesian = prices cross join metadata;
// Non-equijoin — pair rows where a column satisfies an inequality
let events = ticks join windows on ts >= start && ts < end;
// As-of join — each tick gets the latest bar at or before ts
let tf = as_timeframe(ticks, "ts");
let bars = as_timeframe(bars_1m, "ts");
tf asof join bars on ts;
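Per left row, the as-of lookup is "latest right timestamp at or before t". A minimal Python sketch using binary search (illustrative; a production implementation would use a single merge scan over both sorted frames):

```python
import bisect

def asof_join(left_ts, right_ts, right_vals):
    """For each left timestamp, take the value of the latest right row
    at or before it; None when no such row exists. Both timestamp
    columns are assumed sorted (i.e. both sides are TimeFrames)."""
    out = []
    for t in left_ts:
        i = bisect.bisect_right(right_ts, t) - 1   # last index with right_ts[i] <= t
        out.append(right_vals[i] if i >= 0 else None)
    return out
```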
Null Handling
Every column carries an Arrow-compatible validity bitmap alongside
its value buffer. Null propagates through arithmetic and
comparisons: null * x = null,
null > 5 = null.
filter silently drops rows where the predicate is null.
Use is null / is not null to inspect the
validity bitmap directly — these predicates always return a
valid Bool, never null. Three-valued boolean operators
follow SQL rules: true OR null = true,
false AND null = false.
Nulls arise from joins with unmatched rows (left join: unmatched
right columns; right join: unmatched left columns; outer join:
both sides), dcast (missing pivot combinations), and certain
aggregate functions (std on fewer than 2 non-null
values, ewma on an empty group,
skew on fewer than 3, kurtosis on
fewer than 4).
// Left join introduces nulls for unmatched rows
let enriched = employees left join departments on dept_id;
// Test the validity bitmap with IS NULL / IS NOT NULL
enriched[filter { dept_name is null }];
enriched[filter { dept_name is not null }];
// Null propagates through arithmetic
// bonus is null wherever budget is null
enriched[select { name, bonus = salary + budget }];
// 3VL: true OR null = true; keeps the row
enriched[filter { dept_name is not null || salary > 80000 }];
// Aggregates skip null rows automatically
enriched[select { avg_sal = mean(salary) }, by dept_name];
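The three-valued rules above fit in a few lines of Python, modelling null as None (a sketch of the semantics, not Ibex internals). Note that filter keeps a row only when the predicate is True; both False and null drop it.

```python
def or3(a, b):
    """SQL three-valued OR: any True wins, then null, then False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def and3(a, b):
    """SQL three-valued AND: any False wins, then null, then True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True
```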
Scalar Functions
Ibex ships a small set of built-in scalar functions that apply
element-wise to a column. Math: abs, log,
sqrt. Date/time extraction: year,
month, day, hour,
minute, second.
All scalar functions operate on Numeric or
Date | Timestamp columns and produce a new
column of the return type. Use them inside update,
select, or filter expressions.
// Math functions
trades[update {
log_ret = log(price / lag(price, 1)),
vol = sqrt(variance),
notional = abs(pnl)
}];
// Date / time extraction
trades[update {
yr = year(date),
mo = month(date),
dy = day(date)
}];
trades[update { hr = hour(timestamp) }];
// Use in filter
trades[filter year(date) = 2024];
Type System
Ibex never coerces Int ↔ Float
implicitly. Use type-name constructors — Int64(x),
Float64(x), etc. — for explicit numeric casts. They
work on scalars and on entire columns.
Float → Int casts require the value to be a
whole number. Use round(x, mode) first when the
value may have a fractional part. Five modes are available:
nearest (ties away from zero), bankers
(ties to even — statistically unbiased), floor,
ceil, trunc.
Type annotations on let bindings and function
parameters are optional. When present, they are validated:
scalar types must match exactly; schema annotations require all
declared columns to be present with the correct types.
// Explicit scalar casts
let n: Int64 = Int64(3.0); // ok — 3.0 is a whole number
let f: Float64 = Float64(42); // ok — Int → Float always succeeds
// Int64(3.9) ← runtime error: fractional part
// Round a Float column to Int
prices[update { vol_int = round(volume_f, nearest) }];
// All five rounding modes
round(3.7, nearest) // → 4 (ties away from zero)
round(2.5, nearest) // → 3 (tie → away from zero)
round(2.5, bankers) // → 2 (tie → nearest even)
round(3.5, bankers) // → 4 (tie → nearest even)
round(3.7, floor) // → 3 (toward −∞)
round(3.7, ceil) // → 4 (toward +∞)
round(3.7, trunc) // → 3 (toward zero)
// Column-level cast after rounding
prices[update { vol_int = Int64(round(volume_f, nearest)) }];
// Annotation validation
let y: Float64 = Float64(42); // explicit cast satisfies annotation
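The five modes map onto standard operations; here is an illustrative Python sketch (Python's built-in round is already banker's rounding, and the `x + 0.5` trick for ties-away-from-zero is fine for a sketch but has float edge cases near representation boundaries):

```python
import math

def round_mode(x, mode):
    """Sketch of the five rounding modes; not the Ibex implementation."""
    if mode == "bankers":   # ties to even -- Python's default round()
        return round(x)
    if mode == "nearest":   # ties away from zero
        return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)
    if mode == "floor":     # toward -inf
        return math.floor(x)
    if mode == "ceil":      # toward +inf
        return math.ceil(x)
    if mode == "trunc":     # toward zero
        return math.trunc(x)
    raise ValueError(f"unknown mode: {mode}")
```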
Vectorized RNG
Eight built-in rand_* functions each produce one
independent draw per row — no row-by-row loop, no
user-space scalar overhead. Each thread has its own
mt19937_64 seeded from std::random_device,
so parallel queries produce independent streams without locking.
Continuous distributions return Float64:
rand_uniform, rand_normal,
rand_student_t, rand_gamma,
rand_exponential.
Discrete distributions return Int64:
rand_bernoulli, rand_poisson,
rand_int.
// Gaussian noise column
df[update { noise = rand_normal(0.0, 1.0) }];
// Uniform weight, biased coin, die roll — all in one pass
df[update {
w = rand_uniform(0.0, 1.0),
flip = rand_bernoulli(0.7),
die = rand_int(1, 6)
}];
// Simulate inter-arrival times and Poisson counts
df[update {
wait = rand_exponential(2.5),
events = rand_poisson(4.0)
}];
rep & Boolean Masks
rep(x) mirrors R’s rep(): it fills
a column by repeating a scalar literal or cycling an existing column.
Named arguments — times, each, and
length_out — control the exact repetition pattern.
Passing a Bool literal produces a first-class
Column<Bool> (a boolean mask), useful as a
computed flag or validity indicator alongside numeric columns.
// Constant-fill: all rows set to zero
df[update { zero = rep(0) }];
// All-true boolean mask
df[update { active = rep(true) }];
// Repeat each element of a column twice
df[update { rep2 = rep(price, each=2) }];
// Named args: times, each, length_out
df[update { flag = rep(flag_col, times=50) }];
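R's rep() semantics, for reference, sketched in Python (a hypothetical mapping, not the Ibex implementation): `each` expands every element consecutively, `times` then tiles the whole sequence, and `length_out` recycles or truncates the result.

```python
from itertools import chain, cycle, islice, repeat

def rep(x, times=1, each=1, length_out=None):
    """Sketch of R-style rep() over a scalar or a list."""
    xs = x if isinstance(x, list) else [x]
    expanded = list(chain.from_iterable(repeat(v, each) for v in xs)) * times
    if length_out is not None:
        expanded = list(islice(cycle(expanded), length_out))  # recycle/truncate
    return expanded
```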
Cumulative Functions
cumsum(col) and cumprod(col) compute
prefix sums and products over any Int or
Float column. Both return the same type as their input
and are valid in select and update blocks
— DataFrame or TimeFrame — with or without a
window clause.
Typical use: cumulative P&L, compounded returns, or an index-rebased price series.
// Valid in select (produces only the cumulative column)
df[select { cum_pnl = cumsum(pnl) }];
// Valid in update (adds column alongside existing ones)
df[update { cum_pnl = cumsum(pnl) }];
// Compounded return: product of (1 + daily_ret)
df[update {
growth = cumprod(1.0 + daily_ret)
}];
// Also valid on a TimeFrame (no window clause needed)
tf[update { running_vol = cumsum(volume) }];
Null Fill
Three built-in functions cover data.table’s
nafill semantics in a single O(n) pass:
fill_null(col, value) — constant fill
fill_forward(col) — LOCF (last observation carried forward)
fill_backward(col) — NOCB (next observation carried backward)
All three work on any column type (Int,
Float, String, …) in
select or update blocks.
Unfillable leading (LOCF) or trailing (NOCB) nulls remain null.
// Constant fill: replace null prices with 0
df[update { price = fill_null(price, 0) }];
// LOCF: carry last known price into gaps
df[update { price = fill_forward(price) }];
// NOCB: fill backwards from next available value
df[update { price = fill_backward(price) }];
// Chain: LOCF then fill remaining leading nulls with 0
df[update { price = fill_forward(price) }]
[update { price = fill_null(price, 0) }];
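All three fills are single linear passes; NOCB is just LOCF run over the reversed column. A Python sketch of the semantics, modelling null as None (not the Ibex kernels):

```python
def fill_forward(col):
    """LOCF in one O(n) pass; leading nulls (None) stay null."""
    out, last = [], None
    for v in col:
        if v is not None:
            last = v
        out.append(last)
    return out

def fill_backward(col):
    """NOCB: next observation carried backward; trailing nulls stay null."""
    return fill_forward(col[::-1])[::-1]
```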
Rolling Windows
as_timeframe validates sort order in O(n) and records the
time-index column. window <dur> sets the
lookback; rolling functions (rolling_sum,
rolling_mean, rolling_count, lag)
use a two-pointer scan with no per-row heap allocation.
Duration literals: 1s, 30s, 1m,
5m, 1h, …
let tf = as_timeframe(ticks, "ts");
// Previous tick's price
tf[update { prev_price = lag(price, 1) }];
// Tick count in the last 60 seconds
tf[window 1m, update { ticks_1m = rolling_count() }];
// Multiple rolling aggregates in one pass
tf[window 5m, update {
sum_5m = rolling_sum(price),
mean_5m = rolling_mean(price)
}];
Resample
resample <dur> floors timestamps into
fixed-width intervals and reduces each bucket to one output row.
Combine with by for per-symbol bars.
The output TimeFrame carries the bucket start time as its
time index, ready for downstream joins or further resampling.
let tf = as_timeframe(ticks, "ts");
// 1-minute OHLC bars
let bars = tf[resample 1m, select {
open = first(price),
high = max(price),
low = min(price),
close = last(price)
}];
// Per-symbol 1-minute bars
tf[resample 1m, select {
open = first(price),
close = last(price)
}, by symbol];
// Enrich ticks with the latest bar's close
tf asof join bars on ts;
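The bucketing rule is plain integer flooring. A Python sketch of resample with a first() aggregate (illustrative only; names are hypothetical):

```python
def bucket_start(ts, dur):
    """Floor a timestamp into its fixed-width bucket; the bucket start
    becomes the output row's time index."""
    return ts - ts % dur

def resample_first(ts_col, val_col, dur):
    """resample dur, select { first(val) } over a sorted timestamp column."""
    out = {}
    for t, v in zip(ts_col, val_col):
        out.setdefault(bucket_start(t, dur), v)  # first value per bucket wins
    return out
```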
Reshape
melt converts a wide DataFrame to long format: id columns
stay fixed while measure columns are unpivoted into
variable / value rows.
Combine with select to choose specific measure columns.
dcast is the inverse — it pivots a long table back to
wide format. Distinct values of the pivot column become new column names.
Combine with by for row keys and select for the
value column. Missing cells are filled with null.
// Wide → long: unpivot all columns except symbol
ohlc[melt symbol];
// → symbol | variable | value
// AAPL | open | 150.0
// AAPL | high | 155.0
// ...
// Only unpivot specific columns
ohlc[melt symbol, select { open, close }];
// Long → wide: pivot back
long[dcast variable, select value, by symbol];
// → symbol | open | high | low | close
// Roundtrip: melt then dcast recovers original data
let long = wide[melt symbol];
let wide2 = long[dcast variable, select value, by symbol];
Scalar & Codegen
scalar pulls one typed value out of a single-cell result
table. Bound with let, the value is available to subsequent expressions.
ibex_compile transpiles a .ibex file to a
self-contained C++ source file. The helper script compiles and links it
in one step.
// Pull a single value from an aggregate
let total = scalar(
prices[select { total = sum(price) }],
total
);
// Use it in subsequent expressions
prices[update { weight = price / total }];
# Compile and run in one step
scripts/ibex-run.sh examples/quant.ibex
Stream
Stream { } wires a source extern, an anonymous
transform block, and a sink extern into a continuous event loop.
The transform is written in plain Ibex — the same clauses used
for batch queries.
When the transform contains resample, the runtime
automatically switches to TimeBucket mode:
it buffers rows and emits one output batch per closed time bucket.
Otherwise it operates PerRow, forwarding each
incoming batch immediately.
In TimeBucket mode the bucket is flushed by whichever trigger fires first: a wall-clock check (the configured duration has elapsed since the bucket opened) or a data-timestamp check (an incoming row carries a later bucket's timestamp). The wall-clock trigger gives prompt delivery for real-time feeds; the data-timestamp trigger handles historical replay correctly.
Source timeout contract: the wall-clock check
runs only when the source call returns. Sources should use a short
internal timeout and return StreamTimeout{} when no
data arrives — the runtime fires the flush check and immediately
calls the source again. Ibex does not buffer data on behalf of the
source; whether messages arriving during this window are preserved
depends on the transport. For OS kernel-backed sockets (UDP, TCP)
the kernel's receive buffer holds packets safely. For user-space
transports use StreamBuffered (see below).
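The two flush triggers reduce to a small predicate the event loop evaluates after every source call. A Python sketch (names and signature are illustrative, not the runtime API):

```python
def should_flush(bucket_start, dur, opened_wall, now_wall, incoming_ts=None):
    """True when the open bucket [bucket_start, bucket_start + dur) must close.

    Data-timestamp trigger: an incoming row belongs to a later bucket
    (handles historical replay correctly). Wall-clock trigger: `dur` has
    elapsed since the bucket opened (prompt delivery for real-time feeds).
    """
    if incoming_ts is not None and incoming_ts - incoming_ts % dur > bucket_start:
        return True
    return now_wall - opened_wall >= dur
```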
Use import to load a library stub that registers
the extern source and sink functions as a plugin.
// Load the UDP plugin (registers udp_recv / udp_send)
import "udp";
// Receive ticks, resample to 1-minute OHLC bars, forward over UDP
let ohlc_stream = Stream {
source = udp_recv(9001),
transform = [resample 1m, select {
open = first(price),
high = max(price),
low = min(price),
close = last(price)
}],
sink = udp_send("127.0.0.1", 9002)
};
// PerRow stream: forward filtered rows with no buffering
let live_filter = Stream {
source = udp_recv(9001),
transform = [filter price > 100.0, rename p = price],
sink = udp_send("127.0.0.1", 9003)
};
// WebSocket sink: push OHLC bars to browser dashboards
import "websocket";
let ws_stream = Stream {
source = udp_recv(9001),
transform = [resample 1m, select {
open = first(price),
high = max(price),
low = min(price),
close = last(price)
}],
// ws_send starts a TCP listener; browsers connect with new WebSocket("ws://…")
sink = ws_send(8080)
};
StreamBuffered
For in-process or user-space transports the plugin needs a
thread-safe queue between its producer and the Ibex event loop.
StreamBuffered provides one out of the box — a lockless
SPSC ring buffer paired with a compatible ExternFn —
so the plugin author doesn't have to implement their own.
Use make_buffered_source(producer_fn) to let the Ibex
query control the capacity: the plugin only supplies the
data-production logic; the ring is lazily initialised on the first
event-loop call using the capacity argument from the
query. A producer thread is started at that point,
calls buf.write(table) (blocking on backpressure), and
signals completion with buf.close(). The event loop
drains the ring, returning StreamTimeout{} when empty
so wall-clock bucket flushes still fire on schedule.
Cache-line-separated std::atomic head and tail indices
keep producer and consumer on independent cache lines — no false
sharing, no mutex on the hot path.
When to use each approach: for UDP/TCP sockets the
OS kernel already manages a receive buffer (SO_RCVBUF)
independently of the application — no application-level queue is
needed. Return StreamTimeout{} directly from
recvfrom with a short socket timeout instead.
StreamBuffered is for in-process queues, shared-memory
channels, or any transport without a kernel-managed receive buffer.
// C++ plugin — only the data-production logic
#include <ibex/runtime/stream_buffered.hpp>
registry.register_table("my_src",
ibex::runtime::make_buffered_source([](ibex::runtime::StreamBuffered& buf) {
for (auto& batch : my_data_source) {
buf.write(batch); // yields if ring full
}
buf.close();
}));
// Ibex query — capacity is a first-class tuning parameter
extern fn my_src(capacity: Int) -> TimeFrame from "plugin.hpp";
extern fn my_sink(df: DataFrame) -> Int from "plugin.hpp";
Stream {
source = my_src(512),
transform = [resample 1s, select { close = last(price) }],
sink = my_sink()
};
Performance
Release build, clang++, WSL2. Polars and data.table run multi-threaded on all cores; Ibex is single-threaded throughout.
Aggregation — 4 M rows, 252 symbols
| Query | Ibex | Polars | Pandas |
|---|---|---|---|
| mean by symbol | 28.4 ms | 40.1 ms | 181 ms |
| OHLC by symbol | 34.9 ms | 48.0 ms | 249 ms |
| count by sym×day | 12.6 ms | 66.2 ms | 328 ms |
| mean by sym×day | 14.0 ms | 76.8 ms | 367 ms |
| OHLC by sym×day | 20.6 ms | 73.9 ms | 400 ms |
| filter simple | 19.5 ms | 8.40 ms | 30.7 ms |
Geometric mean across 10 queries: 1.3× faster than Polars, 5× faster than Pandas, 2.1× faster than data.table, 3.5× faster than dplyr. Filter queries favour Polars, which uses parallel SIMD scans.
TimeFrame — 1 M rows, 1 s spacing
| Operation | Ibex | Polars | data.table |
|---|---|---|---|
| as_timeframe (sort) | 0.28 ms | 4.78 ms | 6.2 ms |
| lag(price, 1) | 0.97 ms | 4.84 ms | 11.0 ms |
| rolling count 1m | 1.12 ms | 16.9 ms | 12.2 ms |
| rolling sum 1m | 1.43 ms | 19.0 ms | 10.9 ms |
| rolling mean 5m | 1.65 ms | 19.7 ms | 9.6 ms |
| resample 1m OHLC | 24.7 ms | 14.6 ms | 20.0 ms |
Rolling operations use a two-pointer O(n) scan with a single result-column
allocation. Resample delegates to the aggregation path and is slower than
Polars’ parallel group_by_dynamic on this query.
Editor support
A TextMate grammar covering keywords, types, built-in functions, duration literals, backtick-quoted column names, and comments.
Install — WSL
cp -r editors/vscode \
/mnt/c/Users/<username>/.vscode/extensions/ibex-language-0.1.0
Install — macOS / native Linux
cp -r editors/vscode \
~/.vscode/extensions/ibex-language-0.1.0
Fully restart VS Code after copying. .ibex files are highlighted automatically.
Get started
Requirements: Clang 17+, CMake 3.26+, Ninja.
1 — Clone and build
# Debug build (ASan + UBSan)
cmake -B build -G Ninja \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Debug \
-DIBEX_ENABLE_SANITIZERS=ON
cmake --build build
# Release build
cmake -B build-release -G Ninja \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-release
2 — Run the test suite
ctest --test-dir build --output-on-failure
3 — Start the REPL
./build-release/tools/ibex --plugin-path ./build-release/libraries
| Command | Description |
|---|---|
| :load examples/quant.ibex | Load and execute an .ibex script |
| :comments [on\|off] | Toggle printing script comments during :load |
| :tables | List all bound table names |
| :schema &lt;table&gt; | Column names and types |
| :head &lt;table&gt; [n] | First n rows (default 10) |
| :describe &lt;table&gt; | Schema + first n rows |
4 — Compile a script to C++
# Transpile, compile, and run in one step
scripts/ibex-run.sh examples/quant.ibex
# Or transpile only
scripts/ibex-build.sh examples/quant.ibex -o quant