# fast-axolotl

High-performance Rust extensions for Axolotl (no OOM for large datasets) - drop-in acceleration for existing installations.




## Highlights

- **Zero-config acceleration** - just import `fast_axolotl` before `axolotl`
- **77x faster streaming** - Rust-based data loading vs. HuggingFace `datasets`
- **Parallel hashing** - multi-threaded SHA-256 for deduplication
- **Cross-platform** - Linux, macOS, and Windows with Python 3.10-3.12

## Quick Start

```python
import fast_axolotl  # Auto-installs acceleration shim

# Now use axolotl normally - accelerations are active
import axolotl
```
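Zero-config shimming of this kind is typically built on an import hook. The sketch below illustrates the general `sys.meta_path` technique with an invented `demo_target` module and `compute` function; it is not fast-axolotl's actual shim implementation.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile

# Stand-in for the library to be accelerated: a toy module on disk.
_tmp = tempfile.mkdtemp()
with open(os.path.join(_tmp, "demo_target.py"), "w") as fh:
    fh.write("def compute():\n    return 'slow'\n")
sys.path.insert(0, _tmp)

class _PatchingLoader(importlib.abc.Loader):
    """Delegates to the real loader, then patches the imported module."""
    def __init__(self, real_spec):
        self._real_spec = real_spec

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        self._real_spec.loader.exec_module(module)
        module.compute = lambda: "accelerated"  # swap in the fast path

class _ShimFinder(importlib.abc.MetaPathFinder):
    """Intercepts imports of demo_target and wraps its real loader."""
    def find_spec(self, name, path, target=None):
        if name != "demo_target":
            return None
        sys.meta_path.remove(self)  # avoid recursing into ourselves
        try:
            real = importlib.util.find_spec(name)
        finally:
            sys.meta_path.insert(0, self)
        if real is None:
            return None
        return importlib.util.spec_from_loader(name, _PatchingLoader(real))

sys.meta_path.insert(0, _ShimFinder())

import demo_target  # noqa: E402 - resolved through the shim

print(demo_target.compute())  # accelerated
```

Because the finder sits at the front of `sys.meta_path`, it only needs to be installed before the target library is imported - which is why the shim package must be imported first.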

## Benchmark Results

Tested on Linux x86_64, Python 3.11, 16 CPU cores:

| Operation | Data Size | Rust | Python | Speedup |
|---|---|---|---|---|
| Streaming Data Loading | 50,000 rows | 0.009s | 0.724s | 77x |
| Parallel Hashing (SHA256) | 100,000 rows | 0.027s | 0.052s | 1.9x |
| Token Packing | 10,000 sequences | 0.079s | 0.033s | 0.4x* |
| Batch Padding | 10,000 sequences | 0.200s | 0.105s | 0.5x* |

*Token packing and batch padding show overhead for small datasets due to FFI costs. Performance gains are realized with larger datasets typical in LLM training.

See BENCHMARK.md for detailed results.

## Compatibility

All features tested and working:

| Feature | Status |
|---|---|
| Rust Extension Loading | Tested |
| Module Shimming | Tested |
| Streaming (Parquet, JSON, CSV, Arrow) | Tested |
| Token Packing | Tested |
| Parallel Hashing | Tested |
| Batch Padding | Tested |
| Axolotl Integration | Tested |

See COMPATIBILITY.md for full test results.

## Features

### 1. Streaming Data Loading

Memory-efficient streaming for large datasets:

```python
from fast_axolotl import streaming_dataset_reader

for batch in streaming_dataset_reader(
    "/path/to/large_dataset.parquet",
    dataset_type="parquet",
    batch_size=1000,
    num_threads=4
):
    process(batch)
```

Supports: Parquet, Arrow, JSON, JSONL, CSV, Text (with ZSTD/Gzip compression)
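The key to avoiding OOM is yielding fixed-size batches instead of materializing the whole dataset. The same access pattern can be sketched in pure Python for JSONL; `stream_jsonl` is an illustrative name, and the real extension does the parsing and batching in Rust.

```python
import itertools
import json
import tempfile

def stream_jsonl(path, batch_size=1000):
    """Yield lists of parsed rows without loading the whole file."""
    with open(path) as f:
        rows = (json.loads(line) for line in f)  # lazy, line-by-line
        while True:
            batch = list(itertools.islice(rows, batch_size))
            if not batch:
                return
            yield batch

# Demo on a small temporary file of 2,500 rows.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for i in range(2500):
        f.write(json.dumps({"id": i}) + "\n")

batches = list(stream_jsonl(f.name, batch_size=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Peak memory is bounded by one batch rather than the file size, which is what makes this shape viable for datasets larger than RAM.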

### 2. Token Packing

Replaces inefficient `torch.cat()` loops:

```python
from fast_axolotl import pack_sequences

result = pack_sequences(
    sequences=[[1, 2, 3], [4, 5], [6, 7, 8, 9]],
    max_length=2048,
    pad_token_id=0,
    eos_token_id=2
)
# Returns: {'input_ids': [...], 'labels': [...], 'attention_mask': [...]}
```
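As a reference for what packing does, here is a hypothetical pure-Python equivalent using greedy first-fit concatenation. The exact packing strategy, EOS handling, and label masking used by `pack_sequences` are assumptions here, and this sketch returns one dict per packed buffer for clarity rather than the flat dict shown above.

```python
def pack_sequences_py(sequences, max_length, pad_token_id=0, eos_token_id=2):
    """Greedily concatenate sequences (each followed by EOS) into
    fixed-length buffers, padding the remainder of each buffer."""
    packed, buf = [], []
    for seq in sequences:
        item = seq + [eos_token_id]
        if len(buf) + len(item) > max_length:  # start a new buffer
            packed.append(buf)
            buf = []
        buf.extend(item)
    if buf:
        packed.append(buf)

    out = []
    for buf in packed:
        pad = max_length - len(buf)
        out.append({
            "input_ids": buf + [pad_token_id] * pad,
            "attention_mask": [1] * len(buf) + [0] * pad,
            # Padding positions are masked out of the loss with -100.
            "labels": buf + [-100] * pad,
        })
    return out

packs = pack_sequences_py([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_length=12)
print(packs[0]["input_ids"])  # [1, 2, 3, 2, 4, 5, 2, 6, 7, 8, 9, 2]
```

Per-token Python loops like this are exactly where FFI-free Rust wins at scale, which matches the benchmark note about small-dataset overhead.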

### 3. Parallel Hashing

Multi-threaded SHA256 for deduplication:

```python
from fast_axolotl import parallel_hash_rows, deduplicate_indices

hashes = parallel_hash_rows(rows, num_threads=0)  # 0 = auto

# Or get unique indices directly
unique_indices, new_hashes = deduplicate_indices(rows)
```
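A pure-Python sketch of the same idea follows; the `_py` function names are invented, and the return shapes are simplified (the real `deduplicate_indices` also returns the new hashes). Multi-threading helps here because `hashlib` releases the GIL while digesting large inputs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def parallel_hash_rows_py(rows, num_threads=4):
    """SHA-256 each row's UTF-8 bytes across a thread pool.
    pool.map preserves input order in its results."""
    def digest(row):
        return hashlib.sha256(row.encode("utf-8")).hexdigest()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(digest, rows))

def deduplicate_indices_py(rows):
    """Return indices of first occurrences, keyed by content hash."""
    seen, unique = set(), []
    for i, h in enumerate(parallel_hash_rows_py(rows)):
        if h not in seen:
            seen.add(h)
            unique.append(i)
    return unique

idx = deduplicate_indices_py(["a", "b", "a", "c", "b"])
print(idx)  # [0, 1, 3]
```

Keying deduplication on a fixed-size digest rather than the row text keeps the `seen` set small even when individual rows are long.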

### 4. Batch Padding

Efficient sequence padding:

```python
from fast_axolotl import pad_sequences

padded = pad_sequences(
    [[1, 2, 3], [4, 5]],
    target_length=8,
    pad_value=0,
    padding_side="right"
)
```
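For reference, the semantics can be sketched in a few lines of pure Python (the `_py` name is invented, and how the real function treats sequences longer than `target_length` - truncation or error - is not covered here):

```python
def pad_sequences_py(sequences, target_length, pad_value=0, padding_side="right"):
    """Pad each sequence with pad_value up to target_length,
    on the right or the left."""
    out = []
    for seq in sequences:
        pad = [pad_value] * max(0, target_length - len(seq))
        out.append(seq + pad if padding_side == "right" else pad + seq)
    return out

padded = pad_sequences_py([[1, 2, 3], [4, 5]], target_length=8)
# padded[0] == [1, 2, 3, 0, 0, 0, 0, 0]
```

Left padding (`padding_side="left"`) is the usual choice for decoder-only generation, right padding for training batches.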

## Installation

### From PyPI

```bash
uv pip install fast-axolotl
```

### From Source

```bash
git clone https://github.com/neul-labs/fast-axolotl
cd fast-axolotl

# Using uv (recommended)
uv pip install -e .

# Or with pip + maturin
pip install maturin
maturin develop --release
```

## Documentation

### Configuration

Enable features in your Axolotl config:

```yaml
# Enable Rust streaming for large datasets
dataset_use_rust_streaming: true
sequence_len: 32768

# Deduplication uses parallel hashing automatically
dedupe: true
```

## Development

```bash
git clone https://github.com/neul-labs/fast-axolotl
cd fast-axolotl

uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
maturin develop

# Run tests
pytest -v

# Run benchmarks
python scripts/benchmark.py

# Run compatibility tests
python scripts/compatibility_test.py
```

## Support

Questions or bugs? Open an issue on the fast-axolotl GitHub repository.

## Maintainers

Fast-Axolotl is authored by Dipankar Sarkar (me@dipankar.name) and maintained by the team at Neul Labs.

## License

MIT