GitHub - skryl/mlx-ruby: Ruby Bindings for the MLX Framework


MLX Ruby



About

Ruby bindings for MLX: a NumPy-like array framework for machine learning.

This repository packages:

  • A native Ruby extension backed by the upstream C++ MLX runtime.
  • Ruby APIs for MLX::Core, MLX::NN, MLX::Optimizers, MLX::Utils, distributed helpers, and MLX::DSL.
  • Graph IR -> ONNX export plus browser harness tooling for WebGPU/wasm paths.
  • Parity/contract tooling and benchmark adapters for local models and mlx-ruby-examples.

Highlights

  • Lazy arrays and dynamic graph construction.
  • Function transforms (grad, value_and_grad, vmap, jvp, vjp, compile, and more).
  • Neural-network layers, losses, initialization, and optimizers.
  • Ruby DSL model/training primitives (train_step, trainer, checkpoints, data pipelines, experiments).
  • Device-aware execution (cpu/gpu, including Metal-backed GPU on Apple silicon when available).
  • Graph IR validation, ONNX export, and WebGPU browser harness generation.
  • Extensive parity testing (op-level, model fixture, browser harness, and examples-submodule coverage).

Requirements

  • Core build/runtime:
    • Ruby >= 3.3 (from mlx.gemspec)
    • Git (with submodule support)
    • CMake >= 3.25
    • C++20-capable toolchain
    • macOS: Xcode command-line tools + Metal toolchain
    • Linux: standard build tools + BLAS/LAPACK headers (build-essential cmake libopenblas-dev liblapacke-dev)
  • ONNX export / benchmark helpers:
    • Python 3 with packages from requirements.txt
    • onnx package available for parity/check utilities in tests and tooling
  • Web smoke/harness workflows:
    • Node.js + npm (for playwright and onnxruntime-web)
  • Docs build:
    • Python 3 + pip
    • doxygen
    • make
    • Python deps from requirements.txt

Installation

macOS prerequisite: install MetalToolchain

On macOS, install the Apple Metal toolchain before installing the gem:

xcode-select --install
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
xcodebuild -downloadComponent MetalToolchain

Install from RubyGems

gem install mlx

Install from source (recommended for development)

git clone --recurse-submodules https://github.com/skryl/mlx-ruby.git
cd mlx-ruby
bundle install
bundle exec rake build
bundle exec rake test

If you already cloned without submodules:

git submodule update --init --recursive

Build and install a local gem:

gem build mlx.gemspec
gem install ./mlx-*.gem

Use from another project via local path:

gem "mlx", path: "/absolute/path/to/mlx-ruby"

Verify installation

bundle exec ruby -e 'require "mlx"; puts MLX::VERSION; puts "native=#{MLX.native_available?}"'

Examples

Primary end-to-end examples live in skryl/mlx-ruby-examples.

Web demo model/export scripts in this repo are under:

  • examples/web/

Quickstart

Arrays and lazy execution

require "mlx"

mx = MLX::Core
x = mx.array([1.0, 2.0, 3.0], mx.float32)
y = mx.sqrt(x + 1.0)

mx.eval(y)         # force materialization
p y.to_a           # => [1.414..., 1.732..., 2.0]
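The highlights above mention function transforms (grad, value_and_grad, vmap, and so on), which compose with lazy arrays. The sketch below is a minimal, hypothetical example assuming the Ruby bindings mirror Python's mlx.core.grad (a callable in, a gradient function out); check the generated documentation for the exact Ruby signature.

```ruby
# Hypothetical sketch: differentiating a scalar-valued function.
# Assumes MLX::Core.grad mirrors Python's mlx.core.grad and that the
# returned gradient function responds to #call — verify against the docs.
require "mlx"

mx = MLX::Core

f  = ->(x) { mx.sum(mx.sqrt(x + 1.0)) }  # scalar-valued function of x
df = mx.grad(f)                          # function computing df/dx

x = mx.array([1.0, 2.0, 3.0], mx.float32)
g = df.call(x)
mx.eval(g)                               # gradients are lazy too
p g.to_a                                 # elementwise 0.5 / sqrt(x + 1)
```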

Minimal trainable module

require "mlx"

mx = MLX::Core

class LinearRegressorDsl < MLX::DSL::Model
  option :in_dim, default: 3
  option :out_dim, default: 1
  layer :linear, MLX::NN::Linear, -> { in_dim }, -> { out_dim }

  def call(x)
    linear.call(x)
  end
end

model = LinearRegressorDsl.new
optimizer = MLX::Optimizers::AdamW.new(learning_rate: 1e-2)

step = model.train_step(optimizer: optimizer, sync: :step) do |x:, y:|
  diff = model.call(x) - y
  mx.mean(diff * diff)
end

x = mx.array([[1.0, 2.0, 3.0], [2.0, 1.0, 0.0]], mx.float32)
y = mx.array([[1.0], [0.0]], mx.float32)

5.times do |iter|
  loss = step.call(x: x, y: y)
  puts "step=#{iter} loss=#{loss.item}"
end

Small CNN (single training step)

require "mlx"

mx = MLX::Core

class SmallCnnDsl < MLX::DSL::Model
  option :num_classes, default: 10

  layer :features do
    sequential do
      conv2d 1, 16, 3, padding: 1
      relu
      max_pool2d 2, stride: 2
      conv2d 16, 32, 3, padding: 1
      relu
      max_pool2d 2, stride: 2
    end
  end

  layer :classifier do
    sequential do
      fn { |x| MLX::Core.reshape(x, [x.shape[0], 32 * 7 * 7]) }
      linear 32 * 7 * 7, 64
      relu
      linear 64, num_classes
    end
  end

  def call(x)
    classifier.call(features.call(x))
  end
end

model = SmallCnnDsl.new(num_classes: 10)
optimizer = MLX::Optimizers::Adam.new(learning_rate: 1e-3)

step = model.train_step(optimizer: optimizer, sync: :step) do |images:, labels:|
  logits = model.call(images)
  MLX::NN.cross_entropy(logits, labels, reduction: "mean")
end

images = mx.random_uniform([4, 28, 28, 1], 0.0, 1.0, mx.float32)
labels = mx.array([1, 3, 4, 7], mx.int32)

loss = step.call(images: images, labels: labels)
puts "cnn_loss=#{loss.item}"

Karpathy-style nano GPT (single training step)

require "mlx"

mx = MLX::Core
vocab_size = 65
seq_len = 32
batch_size = 4
dims = 128
heads = 4
layers = 2

class NanoGptDsl < MLX::DSL::Model
  option :vocab_size
  option :seq_len
  option :dims
  option :heads
  option :layers

  layer :token_embedding, MLX::NN::Embedding, -> { vocab_size }, -> { dims }
  layer :pos_embedding, MLX::NN::Embedding, -> { seq_len }, -> { dims }
  layer :encoder, MLX::NN::TransformerEncoder, -> { layers }, -> { dims }, -> { heads },
    mlp_dims: -> { dims * 4 },
    dropout: 0.0,
    norm_first: true
  layer :head, MLX::NN::Linear, -> { dims }, -> { vocab_size }

  def call(input_ids)
    positions = MLX::Core.arange(0, input_ids.shape[1], 1, MLX::Core.int32)
    hidden = MLX::Core.add(token_embedding.call(input_ids), pos_embedding.call(positions))
    mask = MLX::NN::MultiHeadAttention.create_additive_causal_mask(input_ids.shape[1])
    head.call(encoder.call(hidden, mask))
  end
end

tokens = Array.new(batch_size) { Array.new(seq_len) { rand(vocab_size) } }
targets = tokens.map { |row| row[1..] + [0] }

input_ids = mx.array(tokens, mx.int32)
target_ids = mx.array(targets, mx.int32)

model = NanoGptDsl.new(vocab_size: vocab_size, seq_len: seq_len, dims: dims, heads: heads, layers: layers)
optimizer = MLX::Optimizers::AdamW.new(learning_rate: 1e-3)

step = model.train_step(optimizer: optimizer, sync: :step) do |input_ids:, target_ids:|
  logits = model.call(input_ids)
  logits2d = MLX::Core.reshape(logits, [batch_size * seq_len, vocab_size])
  labels1d = MLX::Core.reshape(target_ids, [batch_size * seq_len])
  MLX::NN.cross_entropy(logits2d, labels1d, reduction: "mean")
end

loss = step.call(input_ids: input_ids, target_ids: target_ids)
puts "nanogpt_loss=#{loss.item}"

Device selection

Default device selection runs during require "mlx":

  • MLX_DEFAULT_DEVICE=cpu|gpu|metal
  • fallback: DEVICE=cpu|gpu|metal

On systems without Metal-backed GPU support, gpu/metal requests fall back to CPU.

Example:

MLX_DEFAULT_DEVICE=gpu bundle exec ruby your_script.rb
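Devices can also be selected from code rather than the environment. This is a hypothetical sketch assuming the Ruby bindings mirror Python's mx.default_device / mx.set_default_device; the exact method names are an assumption to verify against the docs.

```ruby
# Hypothetical sketch: programmatic device selection.
# Method names (default_device, set_default_device, cpu) are assumed to
# mirror the Python MLX API — confirm in the generated Ruby docs.
require "mlx"

mx = MLX::Core

puts mx.default_device            # device chosen during require "mlx"

mx.set_default_device(mx.cpu)     # pin subsequent ops to the CPU
y = mx.array([1.0, 2.0], mx.float32) * 2.0
mx.eval(y)
p y.to_a
```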

ONNX/WebGPU Support

MLX Ruby exposes Graph IR/ONNX/WebGPU entrypoints on MLX::ONNX.

Architecture boundary:

  • Public API (MLX::ONNX):
    • export_onnx
    • export_onnx_json
    • export_onnx_compatibility_report
    • export_graph_ir
    • export_graph_ir_json
    • graph_ir_to_onnx
    • graph_ir_to_onnx_json
  • Internal implementation modules:
    • MLX::ONNX
    • MLX::ONNX::Native
    • MLX::ONNX::WebGPUHarness

End-to-end flow:

  1. Export Graph IR hash with MLX::ONNX.export_graph_ir (or JSON debug payload with MLX::ONNX.export_graph_ir_json).
  2. Generate ONNX JSON debug stubs with MLX::ONNX.graph_ir_to_onnx_json or directly with MLX::ONNX.export_onnx_json.
  3. Run ONNX export readiness diagnostics with MLX::ONNX.export_onnx_compatibility_report and inspect unsupported_ops.
  4. Export binary ONNX with MLX::ONNX.graph_ir_to_onnx or directly with MLX::ONNX.export_onnx (external_data options are available for large models).
  5. Package browser harness assets with MLX::ONNX::WebGPUHarness.export_onnx_webgpu_harness.
  6. Verify runtime behavior with MLX::ONNX::WebGPUHarness.smoke_test_onnx_webgpu_harness.
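The six steps above can be sketched in Ruby as follows. The entrypoint names come from this README; the argument shapes (a model, an example-input hash, an output path) and the report key are assumptions — consult docs/src/ruby/export.rst for the exact signatures.

```ruby
# Sketch of the export flow using the public MLX::ONNX entrypoints.
# Argument shapes and the :unsupported_ops key are assumptions, not a
# verified API — see docs/src/ruby/export.rst for the real signatures.
require "mlx"

model   = MyModel.new                              # any MLX::DSL::Model
example = { "input" => MLX::Core.zeros([1, 3]) }   # hypothetical input spec

ir = MLX::ONNX.export_graph_ir(model, example)     # 1. Graph IR hash

report = MLX::ONNX.export_onnx_compatibility_report(model, example)  # 3.
abort "unsupported: #{report[:unsupported_ops]}" unless report[:unsupported_ops].empty?

MLX::ONNX.graph_ir_to_onnx(ir, "model.onnx")       # 4. binary ONNX (path-like target)
```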

Harness artifacts from MLX::ONNX::WebGPUHarness.export_onnx_webgpu_harness:

  • model.onnx
  • harness.manifest.json
  • inputs.example.json
  • index.html
  • harness.js
  • optional external data file (for example model.data)

Smoke telemetry from MLX::ONNX::WebGPUHarness.smoke_test_onnx_webgpu_harness uses onnx_webgpu_telemetry_v1 and reports provider selection/fallback details (selected_provider, requested_providers, fallback_used) plus timing fields (run_timings_ms, model_load_latency_ms, first_inference_latency_ms, steady_state_inference_latency_ms).
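The telemetry fields named above can be pictured as a payload shaped like the following. This is an illustration assembled only from the field names in this README, with made-up values; real reports may carry additional keys.

```ruby
# Illustrative onnx_webgpu_telemetry_v1 payload. Only the field names come
# from this README; the values (and the "schema" key) are invented for
# illustration.
telemetry = {
  "schema" => "onnx_webgpu_telemetry_v1",
  "requested_providers" => ["webgpu", "wasm"],
  "selected_provider" => "webgpu",
  "fallback_used" => false,
  "model_load_latency_ms" => 412.0,
  "first_inference_latency_ms" => 35.2,
  "steady_state_inference_latency_ms" => 8.7,
  "run_timings_ms" => [9.1, 8.6, 8.4],
}

puts telemetry["selected_provider"]
```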

Operational requirements:

  • MLX::ONNX.export_onnx and MLX::ONNX.graph_ir_to_onnx require a path-like target (not IO).
  • Browser smoke tests require Node.js + Playwright (web/) and optionally local onnxruntime-web assets.
  • Harness execution providers are webgpu and wasm.
  • web:assets exports GPT-2, nanoGPT Shakespeare, and Stable Diffusion assets on each run, pulling from Hugging Face checkpoints.

Demo asset workflows:

  • Generate browser assets: bundle exec rake web:assets
  • Start local demo server: bundle exec rake web:start (or bundle exec rake web:serve)

Web Demo quickstart:

bundle exec rake web:assets
bundle exec rake web:serve

GitHub Pages note: the published demo site includes GPT-2, nanoGPT, and Stable Diffusion assets. The Stable Diffusion page may take longer to initialize due to larger model files.

Then open:

  • http://127.0.0.1:3030/
  • http://127.0.0.1:3030/demo/gpt2/
  • http://127.0.0.1:3030/demo/nanogpt/
  • http://127.0.0.1:3030/demo/stable_diffusion/

API reference:

  • docs/src/ruby/export.rst

Development

Build native extension

bundle exec rake build

Clean native build artifacts

Run tests

bundle exec rake test
Test task shortcuts:

  • CPU-only: bundle exec rake "test[cpu]"
  • GPU-only: bundle exec rake "test[gpu]"
  • Installed gem artifact test: bundle exec rake test:gem

Strict mode (per-file timeout):

MLX_STRICT_TESTS=1 MLX_TEST_TIMEOUT=30 bundle exec rake test

Benchmarks (Ruby vs Python implementations)

List tasks:

bundle exec rake -T benchmark

Run one benchmark lane:

bundle exec rake "benchmark:cpu[local]"

Run all benchmark suites:

bundle exec rake "benchmark:all[local,examples]"

Install benchmark Python dependencies into your active Python environment (for asdf users, this is the Python selected by your current shell / .tool-versions):

bundle exec rake benchmark:deps

Common benchmark environment variables:

  • DEVICE (default: gpu): compute device (cpu, gpu, or metal)
  • RUNS (default: 50): timed iterations (ITERATIONS is accepted for compatibility)
  • WARMUP (default: 10): warmup iterations
  • BATCH (default: 8): batch size
  • SEQUENCE_LENGTH (default: 128): source sequence length
  • TARGET_SEQUENCE_LENGTH (default: 64): target sequence length
  • DIMENSIONS (default: 256): model width
  • HEADS (default: 8): attention heads
  • LAYERS (default: 4): number of layers
  • PYTHON (default: python3): Python executable for cross-language comparison
  • BENCHMARK_DEVICES (default: cpu,gpu): devices for the top-level rake benchmark task
  • EXAMPLES_MODE (default: dsl): examples-submodule mode (dsl or no_dsl)
  • WEBGPU_TIMEOUT (default: 180): WebGPU harness timeout in seconds
  • WEBGPU_WARMUP (default: the benchmark warmup count): WebGPU warmup runs
  • WEBGPU_MEASURE (default: the benchmark run count): WebGPU measured runs
  • REQUIRE_WEBGPU (default: unset): fail instead of skipping when the WebGPU provider is unavailable

Quick benchmark smoke command:

bundle exec rake "benchmark:cpu[local]" RUNS=5 WARMUP=1

Build docs

From the repo root:

# One-time setup
brew install doxygen                    # macOS (or install doxygen via apt on Linux)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
bundle install

# Generate docs
bundle exec rake docs:build

Docs are written to docs/build/html.

# Quick local preview
ruby -run -e httpd docs/build/html -p 8000

Then open http://localhost:8000/.

The repo’s Pages workflow builds docs together with the web demo for deployment.

Repository layout

  • lib/: Ruby API surface (core, nn, optimizers, dsl, distributed utilities), with ONNX public facade under lib/mlx/onnx.rb and ONNX harness helpers under lib/mlx-onnx/**.
  • ext/mlx/: core native extension build bridge (extconf.rb, C++ binding entry).
  • ext/mlx-onnx/: ONNX native binding layer loaded by the core extension.
  • submodules/mlx/: upstream MLX submodule.
  • submodules/mlx-onnx/: extracted ONNX core library submodule used by Ruby bindings.
  • examples/web/: web demo model/export helpers (GPT-2, nanoGPT, Stable Diffusion).
  • tasks/: rake task implementations (build, test, docs, benchmark, web, training/assets exporters).
  • web/: static demo site, generated assets, ONNX WebGPU harness templates.
  • test/: unit/task/parity suites.
  • test/support/parity/: coverage/report generators.
  • docs/: Sphinx + Doxygen documentation sources.

Troubleshooting

  • missing MLX include dir: initialize submodules (git submodule update --init --recursive).
  • mlx/mlx-onnx revision mismatch detected: sync the pinned submodules:
git submodule update --init --recursive submodules/mlx submodules/mlx-onnx
  • Native extension does not load: rebuild manually:
cd ext/mlx
ruby extconf.rb
make -j4
  • ONNX binary export fails checker/runtime loading: regenerate with MLX::ONNX.export_onnx / MLX::ONNX.graph_ir_to_onnx and validate with local onnx.checker tooling.
  • On Apple silicon, verify native architecture:
ruby -e 'require "rbconfig"; puts RbConfig::CONFIG["host_cpu"]'
  • Web smoke tests fail due to missing runtime dependencies: run bundle exec rake deps:web (installs/checks onnx, node/npm/npx, playwright, and onnxruntime-web).
  • If CMake configure fails intermittently, rerun ruby extconf.rb; the build script already includes a clean-retry path.

Contributing

  • Open pull requests against this repository.
  • Keep parity snapshots in test/support/snapshots/parity/ in sync with contract changes.
  • Keep generated parity artifacts under test/reports/ (not source-controlled paths).
  • Follow upstream MLX contributor guidance where applicable: mlx/CONTRIBUTING.md.

CI currently runs on ubuntu-22.04 and macos-14 with Ruby 3.4 and 4.0.

License

The mlx gem is distributed under the MIT license.