GT

An experimental multiplexing tensor framework for distributed GPU computing.

pip install git+https://github.com/bwasti/gt.git
python -c 'import gt; print(gt.randn(2,2))'

General Idea

This project is motivated by a rejection of the clunky lock-step paradigm ML researchers tend to use. GT pulls in ideas from decades of development on multi-core operating systems: it fully embraces dynamic scheduling and heavily asynchronous execution while presenting a familiar eager frontend.

  • Three components
    • N Γ— clients (as many users as you want!)
    • 1 Γ— dispatcher (for coordinating)
    • N Γ— workers (1 per GPU)
  • Everything communicates with a stream of instructions
    • Clients deal with math. They emit (GPU-unaware) pure functional instructions
    • The dispatcher rewrites these instructions on the fly to be GPU-aware and sends them to the workers
    • Workers asynchronously process these instructions, optionally JIT compiling
  • Instruction streams are annotated
    • Clients can send "signals" which allow the dispatcher to more appropriately shard the tensors
    • Dispatchers annotate "hot" paths to give hints to workers about JIT compiling
    • Annotations are supplemented with YAML configs that specify sharding and compilation information
    • Every annotation can be safely ignored, so the same code can run anywhere (just remove the YAML)

Philosophy

  • Small is beautiful.
  • Make each program do one thing well.
  • Build a prototype as soon as possible.
  • Choose portability over efficiency.
  • Store data in flat text files.
  • Use software leverage to your advantage.
  • Use shell scripts to increase leverage and portability.
  • Avoid captive user interfaces.
  • Make every program a filter.

Quick Start

import gt

a = gt.randn(1000, 1000)
b = gt.randn(1000, 1000)
c = a @ b
result = c[:4, :4]
print(result)

It may not look like it, but in the background GT automatically spins up an asynchronous dispatching server and a GPU worker.

Features

  • High-performance transport - ZeroMQ (ZMQ) with automatic message batching and efficient DEALER/ROUTER pattern
  • Autograd support - Tape-based automatic differentiation exclusively at the client layer
  • PyTorch-compatible API - Familiar syntax for tensor operations
  • Signal-based sharding - Declarative YAML configuration for distributed training
  • Real-time monitoring - htop-style visualization of worker activity
  • Instruction logging - Debug distributed execution with timeline visualizations
  • AI-assisted development - Optimized for collaboration with AI coding assistants

Documentation

📚 Read the full documentation

  • Getting Started
  • Client API
  • Distributed Training
  • Workers
  • Contributing

Examples

See the examples/ directory for demonstrations:

  • demo.py - Basic tensor operations
  • signal_demo.py - Signal-based sharding
  • compile_demo.py - Compilation directives
  • debug_demo.py - Debug utilities
  • visualize_demo.py - Instruction tape visualization

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          User Code                              │
│  import gt                                                      │
│  with gt.signal.context('layer1'):                              │
│      x = gt.randn(100, 64)                                      │
│      loss = model(x)                                            │
│      loss.backward()                                            │
└───────────────────────┬─────────────────────────────────────────┘
                        │ PyTorch-like API + Signal Metadata
                        │
┌───────────────────────▼─────────────────────────────────────────┐
│                      gt/client/                                 │
│  ┌──────────────┐  ┌─────────────┐  ┌──────────────┐            │
│  │   Tensor     │  │  Autograd   │  │  nn.Module   │            │
│  │ (Remote Data)│  │   (Tape)    │  │  (Layers)    │            │
│  └──────────────┘  └─────────────┘  └──────────────┘            │
└───────────────────────┬─────────────────────────────────────────┘
                        │ ZMQ (DEALER → ROUTER)
                        │
┌───────────────────────▼─────────────────────────────────────────┐
│                    gt/dispatcher/                               │
│  • ZMQ ROUTER socket handles all connections                    │
│  • Reads signal configs from YAML                               │
│  • Routes operations based on sharding strategy                 │
│  • Logs instruction stream to file                              │
│  • Handles multiple clients concurrently                        │
└───────┬──────────────┬──────────────┬───────────────────────────┘
        │              │              │ ZMQ (DEALER ← ROUTER)
        │              │              │
    ┌───▼────┐    ┌───▼────┐    ┌───▼────┐
    │Worker 0│    │Worker 1│    │Worker N│ (1 per GPU)
    │PyTorch │    │PyTorch │    │PyTorch │
    │  GPU   │    │  GPU   │    │  GPU   │
    └────────┘    └────────┘    └────────┘
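
The DEALER/ROUTER hop is plain ZeroMQ. Below is a minimal pyzmq sketch of the pattern itself, not GT's actual wire protocol; the address, identity, and payload are illustrative.

import zmq

ctx = zmq.Context()

# Dispatcher side: one ROUTER socket multiplexes all clients and workers.
router = ctx.socket(zmq.ROUTER)
router.bind("tcp://127.0.0.1:5555")

# Client side: a DEALER socket with an explicit identity.
dealer = ctx.socket(zmq.DEALER)
dealer.setsockopt(zmq.IDENTITY, b"client-0")
dealer.connect("tcp://127.0.0.1:5555")

dealer.send(b"randn 2 2")              # schematic instruction
ident, msg = router.recv_multipart()   # ROUTER prepends the sender identity
router.send_multipart([ident, b"ok"])  # reply is routed back by identity
print(dealer.recv())                   # b"ok"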

Optimized for AI Development

GT is designed to be understood, modified, and debugged with AI coding assistants:

  • CLAUDE.md - Detailed architecture documentation for AI assistants
  • Declarative YAML configs - Easy for AI to parse and generate
  • Tape-based debugging - Inspect computation graphs with gt.debug.print_tape()
  • Instruction logging - Track every operation with timestamps
  • Comprehensive test suite - 50+ tests serving as executable specifications

Contributing

Contributions welcome! This is a research prototype focused on simplicity and readability.

See the Contributing Guide for the development workflow, testing, code style, and PR guidelines.

License

MIT

See License for details.