A DuckDB extension implementing Julian Hyde's "Measures in SQL" paper (arXiv:2406.00251).
What is this?
Yardstick adds measure-aware SQL to DuckDB. Measures are aggregations that know how to re-aggregate themselves when the query context changes. This enables:
- Percent of total calculations without CTEs or window functions
- Year-over-year comparisons with simple syntax
- Drill-down analytics that automatically adjust aggregation context
Quick Start & Demo
-- Load the extension INSTALL yardstick FROM community; LOAD yardstick; -- Create the sales table CREATE TABLE sales ( id INTEGER PRIMARY KEY, year INTEGER, region VARCHAR(50), amount DECIMAL(10, 2) ); -- Insert sample data INSERT INTO sales (id, year, region, amount) VALUES (1, 2023, 'North', 15000.00), (2, 2023, 'North', 22000.00), (3, 2023, 'South', 18000.00), (4, 2023, 'South', 12000.00), (5, 2023, 'East', 25000.00), (6, 2023, 'West', 19000.00), (7, 2024, 'North', 28000.00), (8, 2024, 'North', 31000.00), (9, 2024, 'South', 21000.00), (10, 2024, 'South', 16000.00), (11, 2024, 'East', 33000.00), (12, 2024, 'East', 29000.00), (13, 2024, 'West', 24000.00), (14, 2024, 'West', 27000.00); -- Create a view with measures CREATE VIEW sales_v AS SELECT year, region, SUM(amount) AS MEASURE revenue, COUNT(*) AS MEASURE order_count FROM sales; -- Query with AGGREGATE() and AT modifiers SELECT year, region, AGGREGATE(revenue) AS revenue, AGGREGATE(revenue) AT (ALL region) AS year_total, AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL region) AS pct_of_year FROM sales_v; -- Variance from the global average SELECT region, AGGREGATE(revenue) AS revenue, AGGREGATE(revenue) AT (ALL) / 4.0 AS expected_if_equal, -- 4 regions AGGREGATE(revenue) - (AGGREGATE(revenue) AT (ALL) / 4.0) AS variance FROM sales_v; -- Nested percentages (% of year, and that year's % of total) SELECT year, region, AGGREGATE(revenue) AS revenue, 100.0 * AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL region) AS pct_of_year, 100.0 * AGGREGATE(revenue) AT (ALL region) / AGGREGATE(revenue) AT (ALL) AS year_pct_of_total FROM sales_v; -- Compare 2024 performance to 2023 baseline for each region SELECT region, AGGREGATE(revenue) AT (SET year = 2024) AS rev_2024, AGGREGATE(revenue) AT (SET year = 2023) AS rev_2023, AGGREGATE(revenue) AT (SET year = 2024) - AGGREGATE(revenue) AT (SET year = 2023) AS growth FROM sales_v; -- Filter to specific segments SELECT year, AGGREGATE(revenue) AS total_revenue, AGGREGATE(revenue) AT (SET region = 'North') AS north_revenue, AGGREGATE(revenue) AT (SET region IN ('North', 'South')) AS north_south_combined FROM sales_v;
Notebooks
Notebooks embed their dependencies via juv metadata.
uv tool install juv
# or run without install
uvx juv run docs/yardstick_adtech_kitchen_sink.ipynb
uvx juv run docs/yardstick_looker_demo.ipynbThese notebooks load the local extension if it exists at:
build/release/extension/yardstick/yardstick.duckdb_extension
and fall back to INSTALL yardstick FROM community.
Syntax
Defining Measures
CREATE VIEW view_name AS SELECT dimension1, dimension2, AGG(expr) AS MEASURE measure_name FROM table;
Yardstick automatically handles the grouping. All DuckDB aggregate functions are supported; non-decomposable aggregates (COUNT(DISTINCT), MEDIAN, PERCENTILE_, QUANTILE_, MODE) are recomputed from base rows at query time and support AT modifiers, but can be more expensive.
Querying Measures
SELECT dimensions, AGGREGATE(measure_name) [AT modifier] FROM view_name;
On DuckDB 1.5+, queries containing AGGREGATE() are automatically intercepted by the extension's parser override, so no special prefix is needed. On older versions, the SEMANTIC prefix is still supported as a fallback.
AT Modifiers
| Modifier | Description | Example |
|---|---|---|
AT (ALL) |
Grand total across all dimensions | AGGREGATE(revenue) AT (ALL) |
AT (ALL dim) |
Total excluding specific dimension | AGGREGATE(revenue) AT (ALL region) |
AT (ALL expr) |
Total excluding ad hoc dimension | AGGREGATE(revenue) AT (ALL MONTH(date)) |
AT (SET dim = val) |
Fix dimension to specific value | AGGREGATE(revenue) AT (SET year = 2022) |
AT (SET dim = expr) |
Fix dimension to expression | AGGREGATE(revenue) AT (SET year = year - 1) |
AT (SET expr = val) |
Fix ad hoc dimension to value | AGGREGATE(revenue) AT (SET MONTH(date) = 6) |
AT (WHERE cond) |
Pre-aggregation filter | AGGREGATE(revenue) AT (WHERE region = 'US') |
AT (VISIBLE) |
Use query's WHERE clause | AGGREGATE(revenue) AT (VISIBLE) |
Building
Prerequisites:
- CMake 3.5+
- C++17 compiler
- Cargo
make # builds Rust library and DuckDB extension make test # runs tests
The extension will be at build/release/extension/yardstick/yardstick.duckdb_extension
Limitations
See LIMITATIONS.md for known issues and workarounds.
Key limitations:
- Window-defined measures with
AT (...)must evaluate to a single value per context, or they error
Testimonials
"I used this to integrate into a copilotkit chat interface serving graphs, works really well for the llm." - JFox, DuckDB Discord
References
- Julian Hyde, "Measures in SQL" (2024). arXiv:2406.00251
- DuckDB Extension Template
License
MIT