# MLX Ruby
## About
Ruby bindings for MLX: a NumPy-like array framework for machine learning.
This repository packages:
- A native Ruby extension backed by the upstream C++ MLX runtime.
- Ruby APIs for `MLX::Core`, `MLX::NN`, `MLX::Optimizers`, `MLX::Utils`, distributed helpers, and `MLX::DSL`.
- Graph IR -> ONNX export plus browser harness tooling for WebGPU/wasm paths.
- Parity/contract tooling and benchmark adapters for local models and `mlx-ruby-examples`.
## Highlights
- Lazy arrays and dynamic graph construction.
- Function transforms (`grad`, `value_and_grad`, `vmap`, `jvp`, `vjp`, `compile`, and more).
- Neural-network layers, losses, initialization, and optimizers.
- Ruby DSL model/training primitives (`train_step`, trainer, checkpoints, data pipelines, experiments).
- Device-aware execution (`cpu`/`gpu`, including Metal-backed GPU on Apple silicon when available).
- Graph IR validation, ONNX export, and WebGPU browser harness generation.
- Extensive parity testing (op-level, model fixture, browser harness, and examples-submodule coverage).
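To give a feel for the lazy-array model listed above, here is a toy pure-Ruby analogy: operations build deferred nodes, and nothing computes until explicitly evaluated. This is an illustration of the concept only, not MLX's implementation; the `LazyNode` class is invented for this sketch.

```ruby
# Toy sketch of lazy evaluation: `+` builds a graph node instead of computing,
# and `evaluate` forces (and caches) materialization. Not MLX's actual code.
class LazyNode
  def initialize(&thunk)
    @thunk = thunk
    @evaluated = false
  end

  def +(other)
    LazyNode.new { evaluate + other.evaluate }
  end

  def evaluate
    unless @evaluated
      @value = @thunk.call
      @evaluated = true
    end
    @value
  end
end

a = LazyNode.new { 1.0 }
b = LazyNode.new { 2.0 }
c = a + b          # builds a deferred node; nothing computed yet
puts c.evaluate    # forces materialization => 3.0
```

In MLX proper the same forcing step is `MLX::Core.eval`, as shown in the quickstart below.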
## Requirements
- Core build/runtime:
  - Ruby `>= 3.3` (from `mlx.gemspec`)
  - Git (with submodule support)
  - CMake `>= 3.25`
  - C++20-capable toolchain
  - macOS: Xcode command-line tools + Metal toolchain
  - Linux: standard build tools + BLAS/LAPACK headers (`build-essential cmake libopenblas-dev liblapacke-dev`)
- ONNX export / benchmark helpers:
  - Python 3 with packages from `requirements.txt`
  - `onnx` package available for parity/check utilities in tests and tooling
- Web smoke/harness workflows:
  - Node.js + `npm` (for `playwright` and `onnxruntime-web`)
- Docs build:
  - Python 3 + `pip`
  - `doxygen`
  - `make`
  - Python deps from `requirements.txt`
## Installation
### macOS prerequisite: install MetalToolchain

On macOS, install the Apple Metal toolchain before installing the gem:

```shell
xcode-select --install
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
xcodebuild -downloadComponent MetalToolchain
```
Optional check:
### Install from RubyGems

`gem install mlx`
### Install from source (recommended for development)
```shell
git clone --recurse-submodules https://github.com/skryl/mlx-ruby.git
cd mlx-ruby
bundle install
bundle exec rake build
bundle exec rake test
```
If you already cloned without submodules:

```shell
git submodule update --init --recursive
```
Build and install a local gem:

```shell
gem build mlx.gemspec
gem install ./mlx-*.gem
```

Use from another project via a local path:

```ruby
gem "mlx", path: "/absolute/path/to/mlx-ruby"
```
### Verify installation
```shell
bundle exec ruby -e 'require "mlx"; puts MLX::VERSION; puts "native=#{MLX.native_available?}"'
```
## Examples
Primary end-to-end examples live in `skryl/mlx-ruby-examples`. Web demo model/export scripts in this repo are under `examples/web/`.
## Quickstart
### Arrays and lazy execution
require "mlx" mx = MLX::Core x = mx.array([1.0, 2.0, 3.0], mx.float32) y = mx.sqrt(x + 1.0) mx.eval(y) # force materialization p y.to_a # => [1.414..., 1.732..., 2.0]
### Minimal trainable module
require "mlx" mx = MLX::Core class LinearRegressorDsl < MLX::DSL::Model option :in_dim, default: 3 option :out_dim, default: 1 layer :linear, MLX::NN::Linear, -> { in_dim }, -> { out_dim } def call(x) linear.call(x) end end model = LinearRegressorDsl.new optimizer = MLX::Optimizers::AdamW.new(learning_rate: 1e-2) step = model.train_step(optimizer: optimizer, sync: :step) do |x:, y:| diff = model.call(x) - y mx.mean(diff * diff) end x = mx.array([[1.0, 2.0, 3.0], [2.0, 1.0, 0.0]], mx.float32) y = mx.array([[1.0], [0.0]], mx.float32) 5.times do |iter| loss = step.call(x: x, y: y) puts "step=#{iter} loss=#{loss.item}" end
### Small CNN (single training step)
require "mlx" mx = MLX::Core class SmallCnnDsl < MLX::DSL::Model option :num_classes, default: 10 layer :features do sequential do conv2d 1, 16, 3, padding: 1 relu max_pool2d 2, stride: 2 conv2d 16, 32, 3, padding: 1 relu max_pool2d 2, stride: 2 end end layer :classifier do sequential do fn { |x| MLX::Core.reshape(x, [x.shape[0], 32 * 7 * 7]) } linear 32 * 7 * 7, 64 relu linear 64, num_classes end end def call(x) classifier.call(features.call(x)) end end model = SmallCnnDsl.new(num_classes: 10) optimizer = MLX::Optimizers::Adam.new(learning_rate: 1e-3) step = model.train_step(optimizer: optimizer, sync: :step) do |images:, labels:| logits = model.call(images) MLX::NN.cross_entropy(logits, labels, reduction: "mean") end images = mx.random_uniform([4, 28, 28, 1], 0.0, 1.0, mx.float32) labels = mx.array([1, 3, 4, 7], mx.int32) loss = step.call(images: images, labels: labels) puts "cnn_loss=#{loss.item}"
### Karpathy-style nanoGPT (single training step)
require "mlx" mx = MLX::Core vocab_size = 65 seq_len = 32 batch_size = 4 dims = 128 heads = 4 layers = 2 class NanoGptDsl < MLX::DSL::Model option :vocab_size option :seq_len option :dims option :heads option :layers layer :token_embedding, MLX::NN::Embedding, -> { vocab_size }, -> { dims } layer :pos_embedding, MLX::NN::Embedding, -> { seq_len }, -> { dims } layer :encoder, MLX::NN::TransformerEncoder, -> { layers }, -> { dims }, -> { heads }, mlp_dims: -> { dims * 4 }, dropout: 0.0, norm_first: true layer :head, MLX::NN::Linear, -> { dims }, -> { vocab_size } def call(input_ids) positions = MLX::Core.arange(0, input_ids.shape[1], 1, MLX::Core.int32) hidden = MLX::Core.add(token_embedding.call(input_ids), pos_embedding.call(positions)) mask = MLX::NN::MultiHeadAttention.create_additive_causal_mask(input_ids.shape[1]) head.call(encoder.call(hidden, mask)) end end tokens = Array.new(batch_size) { Array.new(seq_len) { rand(vocab_size) } } targets = tokens.map { |row| row[1..] + [0] } input_ids = mx.array(tokens, mx.int32) target_ids = mx.array(targets, mx.int32) model = NanoGptDsl.new(vocab_size: vocab_size, seq_len: seq_len, dims: dims, heads: heads, layers: layers) optimizer = MLX::Optimizers::AdamW.new(learning_rate: 1e-3) step = model.train_step(optimizer: optimizer, sync: :step) do |input_ids:, target_ids:| logits = model.call(input_ids) logits2d = MLX::Core.reshape(logits, [batch_size * seq_len, vocab_size]) labels1d = MLX::Core.reshape(target_ids, [batch_size * seq_len]) MLX::NN.cross_entropy(logits2d, labels1d, reduction: "mean") end loss = step.call(input_ids: input_ids, target_ids: target_ids) puts "nanogpt_loss=#{loss.item}"
## Device selection
Default device selection runs during `require "mlx"`:

- `MLX_DEFAULT_DEVICE=cpu|gpu|metal`
- fallback: `DEVICE=cpu|gpu|metal`

On systems without Metal-backed GPU support, `gpu`/`metal` requests fall back to CPU.
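The rules above can be sketched in plain Ruby. This is a hypothetical illustration of the documented resolution order, not the gem's actual code; in particular, the `"cpu"` default when neither variable is set and the function name are assumptions.

```ruby
# Hypothetical sketch: MLX_DEFAULT_DEVICE wins, DEVICE is the fallback, and
# gpu/metal requests degrade to cpu when no Metal-backed GPU is available.
# The final "cpu" default is an assumption, not documented behavior.
def resolve_default_device(env, metal_available:)
  requested = env["MLX_DEFAULT_DEVICE"] || env["DEVICE"] || "cpu"
  return "cpu" if %w[gpu metal].include?(requested) && !metal_available
  requested
end

puts resolve_default_device({ "MLX_DEFAULT_DEVICE" => "metal" }, metal_available: true)
# => metal
puts resolve_default_device({ "DEVICE" => "gpu" }, metal_available: false)
# => cpu
```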
Example:

```shell
MLX_DEFAULT_DEVICE=gpu bundle exec ruby your_script.rb
```

## ONNX/WebGPU Support
MLX Ruby exposes Graph IR/ONNX/WebGPU entrypoints on `MLX::ONNX`.
Architecture boundary:

- Public API (`MLX::ONNX`): `export_onnx`, `export_onnx_json`, `export_onnx_compatibility_report`, `export_graph_ir`, `export_graph_ir_json`, `graph_ir_to_onnx`, `graph_ir_to_onnx_json`
- Internal implementation modules: `MLX::ONNX`, `MLX::ONNX::Native`, `MLX::ONNX::WebGPUHarness`
End-to-end flow:

- Export a Graph IR hash with `MLX::ONNX.export_graph_ir` (or a JSON debug payload with `MLX::ONNX.export_graph_ir_json`).
- Generate ONNX JSON debug stubs with `MLX::ONNX.graph_ir_to_onnx_json` or directly with `MLX::ONNX.export_onnx_json`.
- Run ONNX export readiness diagnostics with `MLX::ONNX.export_onnx_compatibility_report` and inspect `unsupported_ops`.
- Export binary ONNX with `MLX::ONNX.graph_ir_to_onnx` or directly with `MLX::ONNX.export_onnx` (`external_data` options are available for large models).
- Package browser harness assets with `MLX::ONNX::WebGPUHarness.export_onnx_webgpu_harness`.
- Verify runtime behavior with `MLX::ONNX::WebGPUHarness.smoke_test_onnx_webgpu_harness`.
Harness artifacts from `MLX::ONNX::WebGPUHarness.export_onnx_webgpu_harness`:

- `model.onnx`
- `harness.manifest.json`
- `inputs.example.json`
- `index.html`
- `harness.js`
- optional external data file (for example `model.data`)
Smoke telemetry from `MLX::ONNX::WebGPUHarness.smoke_test_onnx_webgpu_harness` uses the `onnx_webgpu_telemetry_v1` schema and reports provider selection/fallback details (`selected_provider`, `requested_providers`, `fallback_used`) plus timing fields (`run_timings_ms`, `model_load_latency_ms`, `first_inference_latency_ms`, `steady_state_inference_latency_ms`).
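For orientation, here is a hedged sketch of consuming such a payload. The field names come from the list above; the sample values, the `"schema"` key, and the summary logic are invented for illustration and are not an actual harness payload.

```ruby
require "json"

# Fabricated sample payload using the documented telemetry field names.
payload = <<~JSON
  {
    "schema": "onnx_webgpu_telemetry_v1",
    "selected_provider": "wasm",
    "requested_providers": ["webgpu", "wasm"],
    "fallback_used": true,
    "model_load_latency_ms": 412.0,
    "first_inference_latency_ms": 95.3,
    "steady_state_inference_latency_ms": 18.7,
    "run_timings_ms": [21.0, 18.5, 18.2, 17.9]
  }
JSON

telemetry = JSON.parse(payload)

# Report whether the harness fell back from the first requested provider.
if telemetry["fallback_used"]
  puts "fallback: #{telemetry['requested_providers'].first} -> #{telemetry['selected_provider']}"
end

# Average the per-run timings for a quick steady-state sanity check.
avg = telemetry["run_timings_ms"].sum / telemetry["run_timings_ms"].size
puts format("avg run: %.1f ms", avg)
```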
Operational requirements:

- `MLX::ONNX.export_onnx` and `MLX::ONNX.graph_ir_to_onnx` require a path-like target (not an IO object).
- Browser smoke tests require Node.js + Playwright (`web/`) and optionally local `onnxruntime-web` assets.
- Harness execution providers are `webgpu` and `wasm`.
- `web:assets` exports GPT-2, nanoGPT Shakespeare, and Stable Diffusion assets on each run via Hugging Face checkpoints.
Demo asset workflows:

- Generate browser assets: `bundle exec rake web:assets`
- Start the local demo server: `bundle exec rake web:start` (or `bundle exec rake web:serve`)
Web demo quickstart:

```shell
bundle exec rake web:assets
bundle exec rake web:serve
```
GitHub Pages note: the published demo site includes GPT-2, nanoGPT, and Stable Diffusion assets. The Stable Diffusion page may take longer to initialize due to larger model files.
Then open:

- http://127.0.0.1:3030/
- http://127.0.0.1:3030/demo/gpt2/
- http://127.0.0.1:3030/demo/nanogpt/
- http://127.0.0.1:3030/demo/stable_diffusion/
API reference: `docs/src/ruby/export.rst`
## Development
### Build native extension

### Clean native build artifacts

### Run tests
Test task shortcuts:

- CPU-only: `bundle exec rake "test[cpu]"`
- GPU-only: `bundle exec rake "test[gpu]"`
- Installed gem artifact test: `bundle exec rake test:gem`
Strict mode (per-file timeout):

```shell
MLX_STRICT_TESTS=1 MLX_TEST_TIMEOUT=30 bundle exec rake test
```
### Benchmarks (Ruby vs Python implementations)
List tasks:
Run one benchmark lane:

```shell
bundle exec rake "benchmark:cpu[local]"
```
Run all benchmark suites:

```shell
bundle exec rake "benchmark:all[local,examples]"
```
Install benchmark Python dependencies into your active Python environment (for asdf users, this is the Python selected by your current shell / `.tool-versions`):

```shell
bundle exec rake benchmark:deps
```

Common benchmark environment variables:
| Variable | Default | Purpose |
|---|---|---|
| `DEVICE` | `gpu` | Compute device (`cpu`, `gpu`, or `metal`) |
| `RUNS` | `50` | Timed iterations (`ITERATIONS` is accepted for compatibility) |
| `WARMUP` | `10` | Warmup iterations |
| `BATCH` | `8` | Batch size |
| `SEQUENCE_LENGTH` | `128` | Source sequence length |
| `TARGET_SEQUENCE_LENGTH` | `64` | Target sequence length |
| `DIMENSIONS` | `256` | Model width |
| `HEADS` | `8` | Attention heads |
| `LAYERS` | `4` | Number of layers |
| `PYTHON` | `python3` | Python executable for cross-language comparison |
| `BENCHMARK_DEVICES` | `cpu,gpu` | Devices for top-level `rake benchmark` |
| `EXAMPLES_MODE` | `dsl` | Examples-submodule mode (`dsl` or `no_dsl`) |
| `WEBGPU_TIMEOUT` | `180` | WebGPU harness timeout in seconds |
| `WEBGPU_WARMUP` | benchmark warmup | WebGPU warmup runs |
| `WEBGPU_MEASURE` | benchmark runs | WebGPU measured runs |
| `REQUIRE_WEBGPU` | unset | Fail instead of skip when the WebGPU provider is unavailable |
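The defaults in the table can be sketched as a small resolver. This is a hypothetical helper for illustration, not the repo's rake task code; the function name and hash shape are assumptions, while the variable names, defaults, and the `ITERATIONS` alias come from the table above.

```ruby
# Hypothetical sketch: resolve a subset of the documented benchmark variables
# with their defaults, honoring the ITERATIONS compatibility alias for RUNS.
def benchmark_config(env)
  {
    device: env.fetch("DEVICE", "gpu"),
    runs: Integer(env["RUNS"] || env["ITERATIONS"] || 50),
    warmup: Integer(env.fetch("WARMUP", 10)),
    batch: Integer(env.fetch("BATCH", 8)),
    sequence_length: Integer(env.fetch("SEQUENCE_LENGTH", 128)),
    dimensions: Integer(env.fetch("DIMENSIONS", 256))
  }
end

# Mirrors the quick smoke command below: RUNS=5 WARMUP=1 on CPU.
p benchmark_config({ "RUNS" => "5", "WARMUP" => "1", "DEVICE" => "cpu" })
```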
Quick benchmark smoke command:

```shell
bundle exec rake "benchmark:cpu[local]" RUNS=5 WARMUP=1
```
### Build docs
From the repo root:
```shell
# One-time setup
brew install doxygen  # macOS (or install doxygen via apt on Linux)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
bundle install

# Generate docs
bundle exec rake docs:build
```
Docs are written to `docs/build/html`.

```shell
# Quick local preview
ruby -run -e httpd docs/build/html -p 8000
```

Then open http://localhost:8000/.

The repo's Pages workflow builds docs together with the web demo for deployment.
## Repository layout
- `lib/`: Ruby API surface (`core`, `nn`, `optimizers`, `dsl`, distributed utilities), with the ONNX public facade under `lib/mlx/onnx.rb` and ONNX harness helpers under `lib/mlx-onnx/**`.
- `ext/mlx/`: core native extension build bridge (`extconf.rb`, C++ binding entry).
- `ext/mlx-onnx/`: ONNX native binding layer loaded by the core extension.
- `submodules/mlx/`: upstream MLX submodule.
- `submodules/mlx-onnx/`: extracted ONNX core library submodule used by the Ruby bindings.
- `examples/web/`: web demo model/export helpers (GPT-2, nanoGPT, Stable Diffusion).
- `tasks/`: rake task implementations (`build`, `test`, `docs`, `benchmark`, `web`, training/assets exporters).
- `web/`: static demo site, generated assets, ONNX WebGPU harness templates.
- `test/`: unit/task/parity suites.
- `test/support/parity/`: coverage/report generators.
- `docs/`: Sphinx + Doxygen documentation sources.
## Troubleshooting
- `missing MLX include dir`: initialize submodules (`git submodule update --init --recursive`).
- `mlx/mlx-onnx revision mismatch detected`: sync the pinned submodules:

  ```shell
  git submodule update --init --recursive submodules/mlx submodules/mlx-onnx
  ```

- Native extension does not load: rebuild manually:

  ```shell
  cd ext/mlx
  ruby extconf.rb
  make -j4
  ```

- ONNX binary export fails checker/runtime loading: regenerate with `MLX::ONNX.export_onnx`/`MLX::ONNX.graph_ir_to_onnx` and validate with local `onnx.checker` tooling.
- On Apple silicon, verify the native architecture:

  ```shell
  ruby -e 'require "rbconfig"; puts RbConfig::CONFIG["host_cpu"]'
  ```

- Web smoke fails due to missing runtime dependencies: run `bundle exec rake deps:web` (installs/checks `onnx`, `node`/`npm`/`npx`, `playwright`, and `onnxruntime-web`).
- If CMake configure fails intermittently, rerun `ruby extconf.rb`; the build script already includes a clean-retry path.
## Contributing
- Open pull requests against this repository.
- Keep parity snapshots in `test/support/snapshots/parity/` in sync with contract changes.
- Keep generated parity artifacts under `test/reports/` (not source-controlled paths).
- Follow upstream MLX contributor guidance where applicable: `mlx/CONTRIBUTING.md`.
CI currently runs on ubuntu-22.04 and macos-14 with Ruby 3.4 and 4.0.
## License
The `mlx` gem is distributed under the MIT license.