GitHub - skryl/mlx-ruby: Ruby Bindings for the MLX Framework


MLX Ruby



About

Ruby bindings for MLX: a NumPy-like array framework for machine learning.

This repository packages:

  • A native Ruby extension backed by the upstream C++ MLX runtime.
  • Ruby APIs for MLX::Core, MLX::NN, MLX::Optimizers, MLX::Utils, distributed helpers, and MLX::DSL.
  • Graph IR -> ONNX export plus browser harness tooling for WebGPU/wasm paths.
  • Parity/contract tooling and benchmark adapters for local models and mlx-ruby-examples.

Highlights

  • Lazy arrays and dynamic graph construction.
  • Function transforms (grad, value_and_grad, vmap, jvp, vjp, compile, and more).
  • Neural-network layers, losses, initialization, and optimizers.
  • Ruby DSL model/training primitives (train_step, trainer, checkpoints, data pipelines, experiments).
  • Device-aware execution (cpu/gpu, including Metal-backed GPU on Apple silicon when available).
  • Graph IR validation, ONNX export, and WebGPU browser harness generation.
  • Extensive parity testing (op-level, model fixture, browser harness, and examples-submodule coverage).

Requirements

  • Core build/runtime:
    • Ruby >= 3.3 (from mlx.gemspec)
    • Git (with submodule support)
    • CMake >= 3.25
    • C++20-capable toolchain
    • macOS: Xcode command-line tools + Metal toolchain
    • Linux: standard build tools + BLAS/LAPACK headers (build-essential cmake libopenblas-dev liblapacke-dev)
  • ONNX export / benchmark helpers:
    • Python 3 with packages from requirements.txt
    • onnx package available for parity/check utilities in tests and tooling
  • Web smoke/harness workflows:
    • Node.js + npm (for playwright and onnxruntime-web)
  • Docs build:
    • Python 3 + pip
    • doxygen
    • make
    • Python deps from requirements.txt

Installation

macOS prerequisite: install MetalToolchain

On macOS, install the Apple Metal toolchain before installing the gem:

xcode-select --install
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
xcodebuild -downloadComponent MetalToolchain

Install from RubyGems

gem install mlx

Install from source (recommended for development)

git clone --recurse-submodules https://github.com/skryl/mlx-ruby.git
cd mlx-ruby
bundle install
bundle exec rake build
bundle exec rake test

If you already cloned without submodules:

git submodule update --init --recursive

Build and install a local gem:

gem build mlx.gemspec
gem install ./mlx-*.gem

Use from another project via local path:

gem "mlx", path: "/absolute/path/to/mlx-ruby"

Verify installation

bundle exec ruby -e 'require "mlx"; puts MLX::VERSION; puts "native=#{MLX.native_available?}"'

Examples

Primary end-to-end examples live in skryl/mlx-ruby-examples.

Web demo model/export scripts in this repo are under:

  • examples/web/

Quickstart

Arrays and lazy execution

require "mlx"

mx = MLX::Core
x = mx.array([1.0, 2.0, 3.0], mx.float32)
y = mx.sqrt(x + 1.0)

mx.eval(y)         # force materialization
p y.to_a           # => [1.414..., 1.732..., 2.0]
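The highlights above mention function transforms (grad, value_and_grad, vmap, and so on), which compose with lazy arrays. The sketch below is a minimal, hypothetical example assuming the Ruby bindings mirror Python's mlx.core.grad (a callable in, a gradient function out); check the generated documentation for the exact Ruby signature.

```ruby
# Hypothetical sketch: differentiating a scalar-valued function.
# Assumes MLX::Core.grad mirrors Python's mlx.core.grad and that the
# returned gradient function responds to #call — verify against the docs.
require "mlx"

mx = MLX::Core

f  = ->(x) { mx.sum(mx.sqrt(x + 1.0)) }  # scalar-valued function of x
df = mx.grad(f)                          # function computing df/dx

x = mx.array([1.0, 2.0, 3.0], mx.float32)
g = df.call(x)
mx.eval(g)                               # gradients are lazy too
p g.to_a                                 # elementwise 0.5 / sqrt(x + 1)
```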

Minimal trainable module

require "mlx"

mx = MLX::Core

class LinearRegressorDsl < MLX::DSL::Model
  option :in_dim, default: 3
  option :out_dim, default: 1
  layer :linear, MLX::NN::Linear, -> { in_dim }, -> { out_dim }

  def call(x)
    linear.call(x)
  end
end

model = LinearRegressorDsl.new
optimizer = MLX::Optimizers::AdamW.new(learning_rate: 1e-2)

step = model.train_step(optimizer: optimizer, sync: :step) do |x:, y:|
  diff = model.call(x) - y
  mx.mean(diff * diff)
end

x = mx.array([[1.0, 2.0, 3.0], [2.0, 1.0, 0.0]], mx.float32)
y = mx.array([[1.0], [0.0]], mx.float32)

5.times do |iter|
  loss = step.call(x: x, y: y)
  puts "step=#{iter} loss=#{loss.item}"
end

Small CNN (single training step)

require "mlx"

mx = MLX::Core

class SmallCnnDsl < MLX::DSL::Model
  option :num_classes, default: 10

  layer :features do
    sequential do
      conv2d 1, 16, 3, padding: 1
      relu
      max_pool2d 2, stride: 2
      conv2d 16, 32, 3, padding: 1
      relu
      max_pool2d 2, stride: 2
    end
  end

  layer :classifier do
    sequential do
      fn { |x| MLX::Core.reshape(x, [x.shape[0], 32 * 7 * 7]) }
      linear 32 * 7 * 7, 64
      relu
      linear 64, num_classes
    end
  end

  def call(x)
    classifier.call(features.call(x))
  end
end

model = SmallCnnDsl.new(num_classes: 10)
optimizer = MLX::Optimizers::Adam.new(learning_rate: 1e-3)

step = model.train_step(optimizer: optimizer, sync: :step) do |images:, labels:|
  logits = model.call(images)
  MLX::NN.cross_entropy(logits, labels, reduction: "mean")
end

images = mx.random_uniform([4, 28, 28, 1], 0.0, 1.0, mx.float32)
labels = mx.array([1, 3, 4, 7], mx.int32)

loss = step.call(images: images, labels: labels)
puts "cnn_loss=#{loss.item}"

Karpathy-style nano GPT (single training step)

require "mlx"

mx = MLX::Core
vocab_size = 65
seq_len = 32
batch_size = 4
dims = 128
heads = 4
layers = 2

class NanoGptDsl < MLX::DSL::Model
  option :vocab_size
  option :seq_len
  option :dims
  option :heads
  option :layers

  layer :token_embedding, MLX::NN::Embedding, -> { vocab_size }, -> { dims }
  layer :pos_embedding, MLX::NN::Embedding, -> { seq_len }, -> { dims }
  layer :encoder, MLX::NN::TransformerEncoder, -> { layers }, -> { dims }, -> { heads },
    mlp_dims: -> { dims * 4 },
    dropout: 0.0,
    norm_first: true
  layer :head, MLX::NN::Linear, -> { dims }, -> { vocab_size }

  def call(input_ids)
    positions = MLX::Core.arange(0, input_ids.shape[1], 1, MLX::Core.int32)
    hidden = MLX::Core.add(token_embedding.call(input_ids), pos_embedding.call(positions))
    mask = MLX::NN::MultiHeadAttention.create_additive_causal_mask(input_ids.shape[1])
    head.call(encoder.call(hidden, mask))
  end
end

tokens = Array.new(batch_size) { Array.new(seq_len) { rand(vocab_size) } }
targets = tokens.map { |row| row[1..] + [0] }

input_ids = mx.array(tokens, mx.int32)
target_ids = mx.array(targets, mx.int32)

model = NanoGptDsl.new(vocab_size: vocab_size, seq_len: seq_len, dims: dims, heads: heads, layers: layers)
optimizer = MLX::Optimizers::AdamW.new(learning_rate: 1e-3)

step = model.train_step(optimizer: optimizer, sync: :step) do |input_ids:, target_ids:|
  logits = model.call(input_ids)
  logits2d = MLX::Core.reshape(logits, [batch_size * seq_len, vocab_size])
  labels1d = MLX::Core.reshape(target_ids, [batch_size * seq_len])
  MLX::NN.cross_entropy(logits2d, labels1d, reduction: "mean")
end

loss = step.call(input_ids: input_ids, target_ids: target_ids)
puts "nanogpt_loss=#{loss.item}"

Device selection

Default device selection runs during require "mlx":

  • MLX_DEFAULT_DEVICE=cpu|gpu|metal
  • fallback: DEVICE=cpu|gpu|metal

On systems without Metal-backed GPU support, gpu/metal requests fall back to CPU.

Example:

MLX_DEFAULT_DEVICE=gpu bundle exec ruby your_script.rb
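Devices can also be selected from code rather than the environment. This is a hypothetical sketch assuming the Ruby bindings mirror Python's mx.default_device / mx.set_default_device; the exact method names are an assumption to verify against the docs.

```ruby
# Hypothetical sketch: programmatic device selection.
# Method names (default_device, set_default_device, cpu) are assumed to
# mirror the Python MLX API — confirm in the generated Ruby docs.
require "mlx"

mx = MLX::Core

puts mx.default_device            # device chosen during require "mlx"

mx.set_default_device(mx.cpu)     # pin subsequent ops to the CPU
y = mx.array([1.0, 2.0], mx.float32) * 2.0
mx.eval(y)
p y.to_a
```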

ONNX/WebGPU Support

MLX Ruby exposes Graph IR/ONNX/WebGPU entrypoints on MLX::ONNX.

Architecture boundary:

  • Public API (MLX::ONNX):
    • export_onnx
    • export_onnx_json
    • export_onnx_compatibility_report
    • export_graph_ir
    • export_graph_ir_json
    • graph_ir_to_onnx
    • graph_ir_to_onnx_json
  • Internal implementation modules:
    • MLX::ONNX
    • MLX::ONNX::Native
    • MLX::ONNX::WebGPUHarness

End-to-end flow:

  1. Export Graph IR hash with MLX::ONNX.export_graph_ir (or JSON debug payload with MLX::ONNX.export_graph_ir_json).
  2. Generate ONNX JSON debug stubs with MLX::ONNX.graph_ir_to_onnx_json or directly with MLX::ONNX.export_onnx_json.
  3. Run ONNX export readiness diagnostics with MLX::ONNX.export_onnx_compatibility_report and inspect unsupported_ops.
  4. Export binary ONNX with MLX::ONNX.graph_ir_to_onnx or directly with MLX::ONNX.export_onnx (external_data options are available for large models).
  5. Package browser harness assets with MLX::ONNX::WebGPUHarness.export_onnx_webgpu_harness.
  6. Verify runtime behavior with MLX::ONNX::WebGPUHarness.smoke_test_onnx_webgpu_harness.
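The six steps above can be sketched in Ruby as follows. The entrypoint names come from this README; the argument shapes (a model, an example-input hash, an output path) and the report key are assumptions — consult docs/src/ruby/export.rst for the exact signatures.

```ruby
# Sketch of the export flow using the public MLX::ONNX entrypoints.
# Argument shapes and the :unsupported_ops key are assumptions, not a
# verified API — see docs/src/ruby/export.rst for the real signatures.
require "mlx"

model   = MyModel.new                              # any MLX::DSL::Model
example = { "input" => MLX::Core.zeros([1, 3]) }   # hypothetical input spec

ir = MLX::ONNX.export_graph_ir(model, example)     # 1. Graph IR hash

report = MLX::ONNX.export_onnx_compatibility_report(model, example)  # 3.
abort "unsupported: #{report[:unsupported_ops]}" unless report[:unsupported_ops].empty?

MLX::ONNX.graph_ir_to_onnx(ir, "model.onnx")       # 4. binary ONNX (path-like target)
```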

Harness artifacts from MLX::ONNX::WebGPUHarness.export_onnx_webgpu_harness:

  • model.onnx
  • harness.manifest.json
  • inputs.example.json
  • index.html
  • harness.js
  • optional external data file (for example model.data)

Smoke telemetry from MLX::ONNX::WebGPUHarness.smoke_test_onnx_webgpu_harness uses onnx_webgpu_telemetry_v1 and reports provider selection/fallback details (selected_provider, requested_providers, fallback_used) plus timing fields (run_timings_ms, model_load_latency_ms, first_inference_latency_ms, steady_state_inference_latency_ms).
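The telemetry fields named above can be pictured as a payload shaped like the following. This is an illustration assembled only from the field names in this README, with made-up values; real reports may carry additional keys.

```ruby
# Illustrative onnx_webgpu_telemetry_v1 payload. Only the field names come
# from this README; the values (and the "schema" key) are invented for
# illustration.
telemetry = {
  "schema" => "onnx_webgpu_telemetry_v1",
  "requested_providers" => ["webgpu", "wasm"],
  "selected_provider" => "webgpu",
  "fallback_used" => false,
  "model_load_latency_ms" => 412.0,
  "first_inference_latency_ms" => 35.2,
  "steady_state_inference_latency_ms" => 8.7,
  "run_timings_ms" => [9.1, 8.6, 8.4],
}

puts telemetry["selected_provider"]
```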

Operational requirements:

  • MLX::ONNX.export_onnx and MLX::ONNX.graph_ir_to_onnx require a path-like target (not IO).
  • Browser smoke tests require Node.js + Playwright (web/) and optionally local onnxruntime-web assets.
  • Harness execution providers are webgpu and wasm.
  • web:assets exports GPT-2, nanoGPT Shakespeare, and Stable Diffusion assets on each run, pulling from Hugging Face checkpoints.

Demo asset workflows:

  • Generate browser assets: bundle exec rake web:assets
  • Start local demo server: bundle exec rake web:start (or bundle exec rake web:serve)

Web Demo quickstart:

bundle exec rake web:assets
bundle exec rake web:serve

GitHub Pages note: the published demo site includes GPT-2, nanoGPT, and Stable Diffusion assets. The Stable Diffusion page may take longer to initialize due to larger model files.

Then open:

  • http://127.0.0.1:3030/
  • http://127.0.0.1:3030/demo/gpt2/
  • http://127.0.0.1:3030/demo/nanogpt/
  • http://127.0.0.1:3030/demo/stable_diffusion/

API reference:

  • docs/src/ruby/export.rst

Development

Build native extension

bundle exec rake build

Clean native build artifacts

Run tests

bundle exec rake test
Test task shortcuts:

  • CPU-only: bundle exec rake "test[cpu]"
  • GPU-only: bundle exec rake "test[gpu]"
  • Installed gem artifact test: bundle exec rake test:gem

Strict mode (per-file timeout):

MLX_STRICT_TESTS=1 MLX_TEST_TIMEOUT=30 bundle exec rake test

Benchmarks (Ruby vs Python implementations)

List tasks:

bundle exec rake -T benchmark

Run one benchmark lane:

bundle exec rake "benchmark:cpu[local]"

Run all benchmark suites:

bundle exec rake "benchmark:all[local,examples]"

Install benchmark Python dependencies into your active Python environment (for asdf users, this is the Python selected by your current shell / .tool-versions):

bundle exec rake benchmark:deps

Common benchmark environment variables:

  • DEVICE (default: gpu): compute device (cpu, gpu, or metal)
  • RUNS (default: 50): timed iterations (ITERATIONS is accepted for compatibility)
  • WARMUP (default: 10): warmup iterations
  • BATCH (default: 8): batch size
  • SEQUENCE_LENGTH (default: 128): source sequence length
  • TARGET_SEQUENCE_LENGTH (default: 64): target sequence length
  • DIMENSIONS (default: 256): model width
  • HEADS (default: 8): attention heads
  • LAYERS (default: 4): number of layers
  • PYTHON (default: python3): Python executable for cross-language comparison
  • BENCHMARK_DEVICES (default: cpu,gpu): devices for the top-level rake benchmark task
  • EXAMPLES_MODE (default: dsl): examples-submodule mode (dsl or no_dsl)
  • WEBGPU_TIMEOUT (default: 180): WebGPU harness timeout in seconds
  • WEBGPU_WARMUP (default: the benchmark warmup count): WebGPU warmup runs
  • WEBGPU_MEASURE (default: the benchmark run count): WebGPU measured runs
  • REQUIRE_WEBGPU (default: unset): fail instead of skipping when the WebGPU provider is unavailable

Quick benchmark smoke command:

bundle exec rake "benchmark:cpu[local]" RUNS=5 WARMUP=1

Build docs

From the repo root:

# One-time setup
brew install doxygen                    # macOS (or install doxygen via apt on Linux)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
bundle install

# Generate docs
bundle exec rake docs:build

Docs are written to docs/build/html.

# Quick local preview
ruby -run -e httpd docs/build/html -p 8000

Then open http://localhost:8000/.

The repo’s Pages workflow builds docs together with the web demo for deployment.

Repository layout

  • lib/: Ruby API surface (core, nn, optimizers, dsl, distributed utilities), with ONNX public facade under lib/mlx/onnx.rb and ONNX harness helpers under lib/mlx-onnx/**.
  • ext/mlx/: core native extension build bridge (extconf.rb, C++ binding entry).
  • ext/mlx-onnx/: ONNX native binding layer loaded by the core extension.
  • submodules/mlx/: upstream MLX submodule.
  • submodules/mlx-onnx/: extracted ONNX core library submodule used by Ruby bindings.
  • examples/web/: web demo model/export helpers (GPT-2, nanoGPT, Stable Diffusion).
  • tasks/: rake task implementations (build, test, docs, benchmark, web, training/assets exporters).
  • web/: static demo site, generated assets, ONNX WebGPU harness templates.
  • test/: unit/task/parity suites.
  • test/support/parity/: coverage/report generators.
  • docs/: Sphinx + Doxygen documentation sources.

Troubleshooting

  • missing MLX include dir: initialize submodules (git submodule update --init --recursive).
  • mlx/mlx-onnx revision mismatch detected: sync the pinned submodules:
git submodule update --init --recursive submodules/mlx submodules/mlx-onnx
  • Native extension does not load: rebuild manually:
cd ext/mlx
ruby extconf.rb
make -j4
  • ONNX binary export fails checker/runtime loading: regenerate with MLX::ONNX.export_onnx / MLX::ONNX.graph_ir_to_onnx and validate with local onnx.checker tooling.
  • On Apple silicon, verify native architecture:
ruby -e 'require "rbconfig"; puts RbConfig::CONFIG["host_cpu"]'
  • Web smoke tests fail due to missing runtime dependencies: run bundle exec rake deps:web (installs/checks onnx, node/npm/npx, playwright, and onnxruntime-web).
  • If CMake configure fails intermittently, rerun ruby extconf.rb; the build script already includes a clean-retry path.

Contributing

  • Open pull requests against this repository.
  • Keep parity snapshots in test/support/snapshots/parity/ in sync with contract changes.
  • Keep generated parity artifacts under test/reports/ (not source-controlled paths).
  • Follow upstream MLX contributor guidance where applicable: mlx/CONTRIBUTING.md.

CI currently runs on ubuntu-22.04 and macos-14 with Ruby 3.4 and 4.0.

License

The mlx gem is distributed under the MIT license.