GitHub - NavodPeiris/grizzlars: High-performance DataFrame library written in C++ with Python bindings.


A Python DataFrame library backed by a multithreaded C++ engine — built for speed.

Uses more than 6x less memory than polars when loading large CSVs (see the benchmark under Performance).

grizzlars wraps DataFrame, a high-performance C++ DataFrame, with a clean Python API. Columns are stored as typed std::vector<T> buffers — no GIL-bound Python object overhead. Sort, filter, groupby, join, and aggregate operations run in parallel across all CPU cores automatically.


Installation

Requires Python 3.10 or higher


Quick Start

import grizzlars as gl

df = gl.DataFrame({
  "symbol": ["AAPL", "GOOGL", "MSFT", "AMZN", "META"],
  "price":  [189.3,  175.1,   415.2,  185.0,  502.7],
  "volume": [52_000_000, 18_000_000, 22_000_000, 31_000_000, 14_000_000],
  "active": [True, True, True, False, True],
})

print(df)

# Load from CSV
df = gl.read_csv("prices.csv")

Column Types

| Python / NumPy type | grizzlars type | C++ storage |
|---|---|---|
| float / float64 | "double" | std::vector<double> |
| int / int64 | "int64" | std::vector<int64_t> |
| bool | "bool" | std::vector<bool> |
| str | "string" | std::vector<std::string> |

The index is always uint64 and defaults to 0..N-1.
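The mapping in the table can be sketched in plain Python. The helper `infer_dtype` below is hypothetical and only mirrors the table; it is not grizzlars' actual inference code, and promoting a mixed int/float column to "double" is an assumption.

```python
# Hypothetical helper that mirrors the type table; not part of grizzlars.
def infer_dtype(values):
    """Return the grizzlars type name a list of Python values would map to."""
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return "bool"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "int64"
    # Assumption: a column mixing ints and floats is stored as double.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "double"
    if all(isinstance(v, str) for v in values):
        return "string"
    raise TypeError("mixed or unsupported column values")


columns = {
    "symbol": ["AAPL", "GOOGL", "MSFT", "AMZN", "META"],
    "price": [189.3, 175.1, 415.2, 185.0, 502.7],
    "volume": [52_000_000, 18_000_000, 22_000_000, 31_000_000, 14_000_000],
    "active": [True, True, True, False, True],
}
dtypes = {name: infer_dtype(vals) for name, vals in columns.items()}

# The default index described above: 0..N-1.
default_index = list(range(len(columns["symbol"])))
```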


API Reference

I/O

grizzlars.read_csv(path, index_col=None, dtype=None)

Read a CSV file into a DataFrame. Uses a multithreaded native C++ reader by default.

df = gl.read_csv("data.csv")

# Promote a column to the index
df = gl.read_csv("data.csv", index_col="Id")

# Force a column to a specific type (triggers slower Python fallback)
df = gl.read_csv("data.csv", dtype={"code": str})

df.to_csv(path, index=True)

Write the DataFrame to a CSV file.

df.to_csv("output.csv")
df.to_csv("output.csv", index=False)  # omit index column

Construction

grizzlars.DataFrame(data=None, index=None)

Build a DataFrame from a dict of lists or NumPy arrays.

df = gl.DataFrame({
  "x": [1, 2, 3],
  "y": [4.0, 5.0, 6.0],
})

# Custom index
df = gl.DataFrame({"x": [10, 20, 30]}, index=[100, 200, 300])

Inspection

df.shape          # (rows, cols) — tuple
len(df)           # row count
df.columns        # list of column names
df.index          # numpy uint64 array of index values
df.dtypes()       # {"col": "double" | "int64" | "bool" | "string", ...}

Column Access & Mutation

import numpy as np

# Read a column: returns a NumPy array (numeric/bool) or a list (string)
prices = df["price"]

# Add or overwrite a column in-place
df["log_price"] = np.log(df["price"])
df["label"] = ["cheap", "expensive", "mid"]

# Check membership
"price" in df   # True / False

# Non-mutating variants
df2 = df.with_column("log_price", np.log(df["price"]))
df2 = df.assign(log_price=np.log(df["price"]), rank=[1, 2, 3])

# Select a subset of columns
df2 = df.select(["symbol", "price"])

# Rename columns in-place
df.rename({"symbol": "ticker", "price": "close"})

# Drop a column in-place
df.drop("log_price")

Slicing

df.head(10)          # first 10 rows
df.tail(10)          # last 10 rows

df.iloc[0]           # single row as DataFrame
df.iloc[10:50]       # slice (step=1 only)
df.iloc[-1]          # last row

Filtering

filter() is lazy — the boolean mask is stored and data is only copied when a materialising operation is called. len() and .shape are always O(1).

# Mask mode (recommended — compose with numpy operators)
cheap = df.filter(df["price"] < 200)
active = df.filter(df["active"] == True)

# String operator mode
cheap = df.filter("price", "<", 200)
# Operators: ">" ">=" "<" "<=" "==" "!="

# Combine conditions
mask = (df["price"] < 200) & (df["volume"] > 10_000_000)
df.filter(mask)

# len() and shape are free (no materialisation)
print(len(cheap))     # instant
print(cheap.shape)    # instant

# Materialises on first real operation
print(cheap["symbol"])
cheap.sort("price")
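The lazy behaviour described above can be pictured with a small pure-Python model. `LazyFilter` is a hypothetical stand-in, not grizzlars' class: it keeps only the mask and a cached row count until a column is actually requested.

```python
class LazyFilter:
    """Hypothetical model of a lazily filtered frame (not grizzlars' class)."""

    def __init__(self, columns, mask):
        self._columns = columns        # dict: column name -> list of values
        self._mask = list(mask)        # one bool per row
        self._count = sum(self._mask)  # counted once, from the mask alone
        self._cache = None             # filled on first materialisation

    @property
    def materialised(self):
        return self._cache is not None

    def __len__(self):
        # No row data is touched; the count was derived from the mask.
        return self._count

    def __getitem__(self, name):
        if self._cache is None:
            # First real access: copy the surviving rows of every column.
            self._cache = {
                col: [v for v, keep in zip(vals, self._mask) if keep]
                for col, vals in self._columns.items()
            }
        return self._cache[name]


data = {
    "symbol": ["AAPL", "GOOGL", "MSFT", "AMZN", "META"],
    "price": [189.3, 175.1, 415.2, 185.0, 502.7],
}
cheap = LazyFilter(data, [p < 200 for p in data["price"]])

n_cheap = len(cheap)                 # uses the mask only
was_lazy = not cheap.materialised    # still no copy at this point
symbols = cheap["symbol"]            # triggers the copy
```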

Sorting

All sort operations are non-mutating and return a new DataFrame.

df.sort("price")                       # ascending
df.sort("price", ascending=False)      # descending
df.sort_values("volume", ascending=False)  # alias for sort()
df.sort_index()                        # sort by index ascending
df.sort_index(ascending=False)         # sort by index descending
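One way to picture a non-mutating sort is "compute a row permutation, then gather every column through it". The sketch below does that in plain Python; `sort_frame` is a hypothetical name, and this is a sequential illustration rather than the library's parallel implementation.

```python
def sort_frame(columns, by, ascending=True):
    """Return a new dict of columns ordered by the values in columns[by]."""
    n = len(columns[by])
    # Row permutation: the order in which rows must be read to be sorted.
    perm = sorted(range(n), key=lambda i: columns[by][i], reverse=not ascending)
    # Gather each column through the permutation; the input is left untouched.
    return {name: [vals[i] for i in perm] for name, vals in columns.items()}


data = {
    "symbol": ["AAPL", "GOOGL", "MSFT", "AMZN", "META"],
    "price": [189.3, 175.1, 415.2, 185.0, 502.7],
}
by_price = sort_frame(data, "price")
by_price_desc = sort_frame(data, "price", ascending=False)
```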

Statistics

All scalar stats operate on a single column and return a Python float or int.

df.mean("price")         # arithmetic mean
df.std("price")          # sample standard deviation (n-1)
df.sum("price")          # total
df.min("price")          # minimum value
df.max("price")          # maximum value
df.count("price")        # non-null count

df.quantile("price", 0.5)    # median (q in [0, 1])
df.corr("price", "volume")   # Pearson correlation
df.cov("price", "volume")    # sample covariance

df.nunique("symbol")         # number of distinct values
df.unique("symbol")          # sorted array of distinct values
df.n_missing("price")        # count of NaN / empty-string values

# Frequency table — returns DataFrame with ["value", "count"]
df.value_counts("symbol")

df.describe()

Returns a DataFrame with count / mean / std / min / max / sum for every numeric column.

stats = df.describe()
# statistic  |  price  |  volume
# -----------+---------+---------
# count      |  5.0    |  5.0
# mean       |  ...    |  ...
# std        |  ...    |  ...
# min        |  ...    |  ...
# max        |  ...    |  ...
# sum        |  ...    |  ...

GroupBy

groupby() returns a _GroupBy object. Chain .agg() or a shorthand method.

# agg() accepts a dict of {column: function}
# Functions: "mean", "sum", "min", "max", "count", "std"
result = df.groupby("sector").agg({"price": "mean", "volume": "sum"})

# Shorthand methods
df.groupby("sector").mean("price")
df.groupby("sector").sum("volume")
df.groupby("sector").min("price")
df.groupby("sector").max("price")
df.groupby("sector").count("price")
df.groupby("sector").std("price")

GroupBy uses string_view keys internally — zero string copies during bucketing.
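The bucketing step can be sketched in plain Python: group row numbers by key, then reduce each bucket. `groupby_agg` and the sample `sector` data are hypothetical, and groups come out in first-appearance order here, which is an assumption; the library's own group order is not stated.

```python
from collections import defaultdict


def groupby_agg(columns, key, spec):
    """Bucket row numbers by key value, then aggregate per bucket."""
    buckets = defaultdict(list)
    for row, k in enumerate(columns[key]):
        buckets[k].append(row)  # store row numbers, not copies of the key

    funcs = {
        "mean": lambda xs: sum(xs) / len(xs),
        "sum": sum,
        "min": min,
        "max": max,
        "count": len,
    }
    out = {key: list(buckets)}
    for col, fn in spec.items():
        out[col] = [
            funcs[fn]([columns[col][r] for r in rows]) for rows in buckets.values()
        ]
    return out


# Hypothetical sample data with a "sector" column.
frame = {
    "sector": ["tech", "tech", "retail", "retail"],
    "price": [10.0, 20.0, 5.0, 7.0],
    "volume": [1, 2, 3, 4],
}
result = groupby_agg(frame, "sector", {"price": "mean", "volume": "sum"})
```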


Join

Joins operate on the DataFrame index. Load CSVs with index_col= to set the join key.

left  = gl.read_csv("orders.csv",   index_col="order_id")
right = gl.read_csv("products.csv", index_col="order_id")

inner  = left.join(right, how="inner")   # default
left_j = left.join(right, how="left")    # unmatched right → NaN / ""
right_j = left.join(right, how="right")
outer  = left.join(right, how="outer")

The join uses a hash table probe — O(n + m) with parallel column scatter.
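A hash join has two passes: build a table from one index, then probe it with the other. The plain-Python sketch below shows where the O(n + m) comes from for unique keys. `hash_join` is a hypothetical helper that returns row-number pairs; it is not the library's code and omits the parallel column scatter.

```python
def hash_join(left_index, right_index, how="inner"):
    """Return (left_row, right_row) pairs; right_row is None when unmatched."""
    # Build phase: one pass over the right index, O(m).
    table = {}
    for j, key in enumerate(right_index):
        table.setdefault(key, []).append(j)

    # Probe phase: one pass over the left index, O(n) lookups.
    pairs = []
    for i, key in enumerate(left_index):
        matches = table.get(key)
        if matches:
            pairs.extend((i, j) for j in matches)
        elif how == "left":
            pairs.append((i, None))  # caller fills NaN / "" for these rows
    return pairs


left_keys = [1, 2, 3, 4]
right_keys = [2, 4, 5]
inner_pairs = hash_join(left_keys, right_keys, how="inner")
left_pairs = hash_join(left_keys, right_keys, how="left")
```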


Concat

Vertically stack two DataFrames (append rows). The index resets to 0..N-1.

combined = df_a.concat(df_b)

# Stack many frames
from functools import reduce
all_data = reduce(lambda a, b: a.concat(b), frames)

Only columns present in both frames with the same type are kept.
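The column-matching rule can be illustrated with a small model in which each column is a `(dtype, values)` pair. Both the representation and the `concat` function below are hypothetical and serve only to show which columns survive.

```python
def concat(a, b):
    """Stack b under a, keeping columns present in both with the same dtype."""
    out = {}
    for name, (dtype, values) in a.items():
        if name in b and b[name][0] == dtype:
            out[name] = (dtype, values + b[name][1])
    # The index is rebuilt as 0..N-1 rather than carried over.
    n_rows = len(next(iter(out.values()))[1]) if out else 0
    return out, list(range(n_rows))


frame_a = {
    "x": ("int64", [1, 2]),
    "y": ("double", [1.0, 2.0]),
    "z": ("string", ["a", "b"]),
}
frame_b = {
    "x": ("int64", [3]),
    "y": ("string", ["c"]),   # same name, different type: dropped
    "w": ("bool", [True]),    # only in one frame: dropped
}
combined, index = concat(frame_a, frame_b)
```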


Window Functions

All window functions return a NumPy array (not a new DataFrame).

df.rolling_mean("price", window=20)   # 20-period moving average
df.rolling_sum("volume", window=5)
df.rolling_std("price", window=20)
df.rolling_min("price", window=10)
df.rolling_max("price", window=10)

# Generic form
df.rolling("price", window=20, func="mean")
# func: "mean" | "sum" | "std" | "min" | "max"
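As a reference for what a trailing window computes, here is a plain-Python sketch. Filling the first `window - 1` positions with NaN is an assumption borrowed from common DataFrame libraries; the text above does not say how grizzlars handles incomplete windows, and "std" is omitted for brevity.

```python
import math


def rolling(values, window, func="mean"):
    """Trailing-window reduction over a list of numbers."""
    funcs = {
        "mean": lambda xs: sum(xs) / len(xs),
        "sum": sum,
        "min": min,
        "max": max,
    }
    reduce_fn = funcs[func]
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            # Assumption: incomplete windows yield NaN.
            out.append(math.nan)
        else:
            out.append(reduce_fn(values[i + 1 - window : i + 1]))
    return out


series = [1.0, 2.0, 3.0, 4.0]
mean_2 = rolling(series, window=2)
sum_3 = rolling(series, window=3, func="sum")
```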

Cumulative Functions

df.cumsum("volume")    # cumulative sum
df.cumprod("factor")   # cumulative product
df.cummin("price")     # running minimum
df.cummax("price")     # running maximum
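Each cumulative function is a running reduction, which the standard library expresses with `itertools.accumulate`. The values below are made up for illustration.

```python
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5]

cumsum = list(accumulate(values))                 # running total
cumprod = list(accumulate(values, operator.mul))  # running product
cummin = list(accumulate(values, min))            # smallest value so far
cummax = list(accumulate(values, max))            # largest value so far
```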

Shift & Percent Change

df.shift("price", n=1)    # lag by 1 period; NaN at boundary
df.shift("price", n=-1)   # lead by 1 period
df.pct_change("price")    # (price[i] - price[i-1]) / price[i-1]; first element NaN
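The boundary behaviour in the comments above can be written out in a few lines of plain Python. The helpers are hypothetical reference versions working on lists, not the library's functions.

```python
import math


def shift(values, n=1):
    """Lag (n > 0) or lead (n < 0) a list, padding the boundary with NaN."""
    size = len(values)
    if n >= 0:
        k = min(n, size)
        return [math.nan] * k + list(values[: size - k])
    k = min(-n, size)
    return list(values[k:]) + [math.nan] * k


def pct_change(values):
    """(x[i] - x[i-1]) / x[i-1]; the first element is NaN."""
    previous = shift(values, 1)
    # NaN propagates through the arithmetic, so the first result is NaN.
    # A zero previous value would raise ZeroDivisionError in this sketch.
    return [(cur - prev) / prev for cur, prev in zip(values, previous)]


prices = [100.0, 110.0, 99.0]
lagged = shift(prices, n=1)
led = shift(prices, n=-1)
changes = pct_change(prices)
```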

Data Cleaning

# Remove rows with duplicate values in a column (keep first)
df.drop_duplicates("symbol")

# Remove rows where a column is NaN or empty string
df.drop_na("price")

# Fill NaN / empty values in-place (returns self)
df.fillna("price", 0.0)
df.fillna("label", "unknown")
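The cleaning rules can be sketched on a plain dict of lists. The helpers below are hypothetical and treat NaN floats and empty strings as missing, as the comments above describe. Returning new frames from the two drop functions is an assumption; only `fillna` is documented as in-place.

```python
import math


def is_missing(value):
    """True for NaN floats and empty strings."""
    return value == "" or (isinstance(value, float) and math.isnan(value))


def keep_rows(columns, rows):
    return {name: [vals[r] for r in rows] for name, vals in columns.items()}


def drop_duplicates(columns, subset):
    seen, rows = set(), []
    for r, v in enumerate(columns[subset]):
        if v not in seen:  # keep the first occurrence only
            seen.add(v)
            rows.append(r)
    return keep_rows(columns, rows)


def drop_na(columns, subset):
    rows = [r for r, v in enumerate(columns[subset]) if not is_missing(v)]
    return keep_rows(columns, rows)


def fillna(columns, name, value):
    # Mutates the frame and returns it, matching "in-place (returns self)".
    columns[name] = [value if is_missing(v) else v for v in columns[name]]
    return columns


frame = {
    "symbol": ["AAPL", "AAPL", "MSFT"],
    "price": [189.3, math.nan, 415.2],
    "label": ["", "mid", ""],
}
deduped = drop_duplicates(frame, "symbol")
priced = drop_na(frame, "price")
filled = fillna(frame, "label", "unknown")
```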

Threading

grizzlars automatically enables multithreading on import using all logical CPU cores. You can adjust it at runtime.

import grizzlars as gl

gl.set_optimum_thread_level()   # auto-detect (called on import)
gl.set_thread_level(4)          # pin to 4 threads
gl.get_thread_level()           # returns current thread count

Performance

grizzlars is built for analytical workloads on large datasets:

  • CSV load: memory-mapped file read, multithreaded chunk parsing, move semantics for string columns
  • Filter: lazy evaluation; boolean mask stored until a materialising operation; len() is always O(1) via SIMD count_nonzero
  • Sort: string_view comparison keys (zero heap allocation per comparison); parallel permutation scatter
  • GroupBy: unordered_map<string_view> bucketing (zero string copies); parallel aggregation
  • Join: hash table probe O(n + m); parallel column scatter across all cores
  • Aggregate / describe: direct C++ vector reduction, no Python loop overhead

Full benchmark results:

grizzlars is faster than polars in some scenarios (filter, groupby, aggregate) and uses significantly less memory; polars is faster on load, sort, describe, and joins.

===============================================================================
  Customer data benchmark  —  grizzlars vs polars
  Dataset: customers-2000000.csv  (341227 KiB)
===============================================================================

  Rows: 2,000,000    Columns: 12

  ── Load ──────────────────────────────────────────────────────────────
  read_csv (customers)                       polars   253.72 ms   grizzlars   428.60 ms    → polars is 1.69x faster

  ── Memory ────────────────────────────────────────────────────────────
  RSS delta after load                       polars   925.2 MiB   grizzlars   139.8 MiB

  ── Operations ────────────────────────────────────────────────────────
  sort(Last Name asc)                        polars   291.14 ms   grizzlars   502.89 ms    → polars is 1.73x faster
  filter(Index > 50) → 1,999,950 rows        polars    78.67 ms   grizzlars    54.02 ms    → grizzlars is 1.46x faster
  groupby Country → 243 groups               polars   158.51 ms   grizzlars   103.29 ms    → grizzlars is 1.53x faster
  agg(mean/sum/std/min/max)                  polars     8.92 ms   grizzlars     8.24 ms    → grizzlars is 1.08x faster
  describe                                   polars    97.25 ms   grizzlars   255.81 ms    → polars is 2.63x faster

  ── Joins  (customers ⋈ people-100000.csv) ───────────────────────────
  join inner → 100,000 rows                  polars    30.66 ms   grizzlars   117.82 ms    → polars is 3.84x faster
  join left  → 2,000,000 rows (~50 000 unmatched) polars    38.12 ms   grizzlars   277.43 ms    → polars is 7.28x faster

===============================================================================
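The ratios in the report follow directly from the raw numbers. The short sketch below recomputes a few of them, using only figures copied from the table above; the `faster` helper is hypothetical.

```python
# Figures copied from the benchmark report above.
rss_mib = {"polars": 925.2, "grizzlars": 139.8}
load_ms = {"polars": 253.72, "grizzlars": 428.60}
groupby_ms = {"polars": 158.51, "grizzlars": 103.29}

# Memory: how many times more RSS polars used after loading the CSV.
memory_ratio = rss_mib["polars"] / rss_mib["grizzlars"]


def faster(timings):
    """Return (winner, speedup) for a two-entry {library: milliseconds} dict."""
    winner = min(timings, key=timings.get)
    loser = max(timings, key=timings.get)
    return winner, round(timings[loser] / timings[winner], 2)


load_result = faster(load_ms)
groupby_result = faster(groupby_ms)
```

With these inputs the memory ratio comes out at roughly 6.6, which is the basis for the "more than 6x" claim.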

Project Structure

grizzlars/
├── DataFrame/             core C++ library
├── grizzlars/             Python package
│   └── __init__.py        DataFrame class + read_csv
├── src/
│   └── grizzlars_bindings.cpp   pybind11 C++ extension
├── tests/
│   ├── data               data for tests
│   ├── functional         functional tests
│   └── performance        performance tests
├── CMakeLists.txt
└── pyproject.toml