Contributors
This repository is developed in collaboration with the following early users, contributors, and reviewers:
Jared Quincy Davis (F, S), Marquita Ellis (I), Diana Arroyo (I), Pravein Govindan Kannan (I), Paul Castro (I), Siddharth Sharma (F, S), Lingjiao Chen (MS), Omar Khattab (D, MT), Alan Zhu (B), Parth Asawa (B), Connor Chow (B), Jason Lee (B), Jay Adityanag Tipirneni (B), Chad Ferguson (B), Kathleen Ge (B), Kunal Agrawal (B), Rishab Bhatia (B), Rohan Penmatcha (B), Sai Kolasani (B), Théo Jaffrelot Inizan (B), Deepak Narayanan (N), Long Fei (F), Aparajit Raghavan (F), Eyal Cidon (F), Jacob Schein (F), Prasanth Somasundar (F), Boris Hanin (F, P), James Zou (S), Alex Dimakis (B), Joey Gonzalez (B), Peter Bailis (G, S), Ion Stoica (A, B, D), Matei Zaharia (D, B)
Affiliations: F = Foundry (MLFoundry), D = Databricks, I = IBM Research, S = Stanford University, B = UC Berkeley, MT = MIT, N = NVIDIA, MS = Microsoft, A = Anyscale, G = Google, P = Princeton
Ember: A Compositional Framework for Compound AI Systems
Ember in a Nutshell
Aspirationally, Ember is to Network-of-Networks (NON) compound AI system development what PyTorch and XLA are to neural network (NN) development. It is a compositional framework with both eager-execution affordances and graph-execution optimization capabilities: users compose complex NONs, and Ember automatically parallelizes and optimizes them.
Ember's vision is to enable development of compound AI systems composed, one day, of millions or billions of inference calls and beyond. Simple constructs, such as best-of-N graphs, verifier-prover structures, and ensembles with voting-based aggregation, work surprisingly well in many regimes.
```python
from ember.api import non

# With Ember's "compact notation", it is one line to build a simple parallel
# system with 101 GPT-4o instances synthesized by Claude
system = non.build_graph(["101:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"])

# Automatically parallelized
result = system(query="What's the most effective climate change solution?")
```
This led us to believe there is a rich architecture space for constructing and optimizing what we call "networks of networks" graphs, or NONs. This is analogous to how neural network architecture research uncovered emergent properties of systems composed of simple artificial neurons. NN research would have been high-friction if every architecture had to be implemented from scratch with for-loops, or with bespoke libraries for vectorization and efficient execution. Similarly, composing NON architectures of many calls remains challenging today, despite the rapidly falling cost per token of intelligence.
Ember's goal is to help unlock research and practice along this new frontier.
Documentation & Examples
- Architecture Overview
- Quick Start Guide
- LLM Specifications
- Model Registry Guide
- Operators Guide
- NON Patterns
- Data Processing
- Configuration
- Examples Directory
Simple Example: Ensemble Reasoning with Automatic Parallelization
```python
from ember.api import non
from ember.api.operators import EmberModel, Operator, Specification
from ember.xcs import jit


class QueryInput(EmberModel):
    query: str


class ConfidenceOutput(EmberModel):
    answer: str
    confidence: float


class ReasonerSpec(Specification):
    input_model = QueryInput
    structured_output = ConfidenceOutput


@jit  # Automatically optimize execution with JIT compilation (e.g., toposort with parallel dispatch)
class EnsembleReasoner(Operator[QueryInput, ConfidenceOutput]):
    specification = ReasonerSpec()

    def __init__(self, width: int = 3):
        self.ensemble = non.UniformEnsemble(
            num_units=width,
            model_name="openai:gpt-4o",
            temperature=0.7,
        )
        self.judge = non.JudgeSynthesis(
            model_name="anthropic:claude-3-5-sonnet",
        )

    def forward(self, *, inputs: QueryInput) -> ConfidenceOutput:
        # These operations are automatically parallelized by Ember's XCS system
        ensemble_result = self.ensemble(query=inputs.query)
        synthesis = self.judge(
            query=inputs.query,
            responses=ensemble_result["responses"],
        )
        return ConfidenceOutput(
            answer=synthesis["final_answer"],
            confidence=float(synthesis.get("confidence", 0.0)),
        )


# Use it like any Python function
compound_system = EnsembleReasoner()
result = compound_system(query="What causes the northern lights?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2f}")

# Alternatively, build the same pipeline with compact notation
pipeline = non.build_graph(["3:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.2"])
result = pipeline(query="What causes the northern lights?")
```
Compact Notation
Ember's compact notation allows expression of complex AI architectures in minimal code:
```python
# Compact notation: "count:type:model:temperature" - each component precisely specified

# BASIC: Single-line systems with automatic parallelization
basic = non.build_graph(["7:E:gpt-4o:0.7"])                                # 7-model ensemble
voting = non.build_graph(["7:E:gpt-4o:0.7", "1:M"])                        # With majority voting
judged = non.build_graph(["7:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"])  # With judge synthesis

# STANDARD API: Equivalent to compact notation but with explicit objects
standard_system = non.Sequential(operators=[
    non.UniformEnsemble(num_units=7, model_name="gpt-4o", temperature=0.7),
    non.JudgeSynthesis(model_name="claude-3-5-sonnet", temperature=0.0),
])

# ADVANCED: Reusable components for complex architectures
components = {  # Define building blocks once, reuse everywhere
    "reasoning": ["3:E:gpt-4o:0.7", "1:V:gpt-4o:0.0"],                       # Verification pipeline
    "research": ["3:E:claude-3-5-sonnet:0.5", "1:V:claude-3-5-sonnet:0.0"],  # Different models
}

# Build a sophisticated multi-branch architecture in just four lines
advanced = non.build_graph([
    "$reasoning",               # First branch: reasoning with verification
    "$research",                # Second branch: research with verification
    "1:J:claude-3-5-opus:0.0",  # Final synthesis of both branches
], components=components)       # Automatically optimized for parallel execution

# HORIZONTAL SCALING: Systematically explore scaling behavior
systems = {
    # Scaling with MostCommon aggregation
    "width_3_voting": non.build_graph(["3:E:gpt-4o:0.7", "1:M"]),
    "width_7_voting": non.build_graph(["7:E:gpt-4o:0.7", "1:M"]),
    "width_11_voting": non.build_graph(["11:E:gpt-4o:0.7", "1:M"]),
    # Scaling with judge synthesis
    "width_3_judge": non.build_graph(["3:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
    "width_7_judge": non.build_graph(["7:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
    "width_11_judge": non.build_graph(["11:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
}

# Execute with full parallelism (XCS optimizes the execution graph automatically)
query = "What's the most effective climate change solution?"
results = {name: system(query=query) for name, system in systems.items()}
```
Core Elements
- Composable Operators with Rigorous Specification: Build reliable compound AI systems from type-safe, reusable components with validated inputs and outputs
- Automatic Parallelization: Independent operations are automatically executed concurrently across a full computational graph
- XCS Optimization Framework: "Accelerated Compound Systems" just-in-time tracing and execution optimization with multiple strategies (trace, structural, enhanced). XCS is inspired by XLA, but it accelerates compound systems rather than linear algebra: it is tuned for models and dicts rather than for vectors and numerical computation
- Multi-Provider Support: Unified API across OpenAI, Anthropic (Claude), Google (Gemini), and more, with standardized usage tracking
- Transformation System: Function transformations for vectorization (vmap), parallelization (pmap), and device sharding (mesh), with a composable interface for building complex transformations
XCS Architecture
The Accelerated Compound Systems (XCS) module provides a computational graph-based system for building, optimizing, and executing complex operator pipelines:
- Unified JIT System: Multiple compilation strategies under a consistent interface:
  - trace: Traditional execution tracing
  - structural: Structure-based analysis
  - enhanced: Improved parallelism detection and code analysis
- Scheduler Framework: Pluggable scheduler implementations for different execution patterns (see the first sketch after this list):
  - sequential: Serial execution for debugging and determinism
  - parallel: Thread-based parallel execution
  - wave: Execution-wave scheduling for optimal parallelism
  - topological: Dependency-based execution ordering
- Transform System: High-level operations for data and computation transformations:
  - vmap: Vectorized mapping for batch processing
  - pmap: Parallel mapping across multiple workers
  - mesh: Device-mesh-based sharding for multi-device execution
- Dependency Analysis: Automatic extraction of dependencies between operations (see the second sketch after this list):
  - Transitive closure calculation for complete dependency mapping
  - Topological sorting with cycle detection
  - Execution wave computation for parallel scheduling
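To illustrate the scheduler framework, here is a minimal sketch of switching schedulers at the call site via the `execution_options` context manager (also used in the Graph Optimization section below). It assumes the scheduler names in the list above are accepted as string options and that `pipeline` is any `@jit`-compiled operator:

```python
from ember.xcs import execution_options

# A minimal sketch; assumes `pipeline` is a @jit-compiled operator and
# that scheduler names mirror the list above.

# Deterministic serial execution, useful when debugging
with execution_options(scheduler="sequential"):
    debug_result = pipeline(query="Why is the sky blue?")

# Wave scheduling: independent operations in the same wave run concurrently
with execution_options(scheduler="wave", max_workers=8):
    fast_result = pipeline(query="Why is the sky blue?")
```

To make the dependency-analysis steps concrete, the following self-contained sketch (plain Python illustrating the idea, not Ember's internal API) computes execution waves with a Kahn-style traversal: each wave collects the nodes whose dependencies are already satisfied, leftover nodes when no wave can form indicate a cycle, and the nodes within one wave can be dispatched to a thread pool concurrently, which is what wave scheduling exploits.

```python
def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves: every node in a wave depends only on earlier waves."""
    remaining = {node: set(d) for node, d in deps.items()}
    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # All nodes whose dependencies are already satisfied form the next wave
        wave = [n for n, d in remaining.items() if d <= done]
        if not wave:  # No progress possible: the remaining nodes form a cycle
            raise ValueError(f"Cycle detected among: {sorted(remaining)}")
        waves.append(wave)
        done.update(wave)
        for n in wave:
            del remaining[n]
    return waves


# Example: op1 and op2 are independent, op3 consumes both, giving two waves
print(execution_waves({"op1": set(), "op2": set(), "op3": {"op1", "op2"}}))
# -> [['op1', 'op2'], ['op3']]
```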
Installation
Ember recommends uv as its package manager for significantly faster installation and dependency resolution.
```bash
# First, install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh  # macOS/Linux
# or
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows
# or
pip install uv  # Any platform

# Quick install using uv (recommended)
uv pip install ember-ai

# Run examples directly with uv (no activation needed)
uv run python -c "import ember; print(ember.__version__)"

# Install from source for development
git clone https://github.com/pyember/ember.git
cd ember
uv pip install -e ".[dev]"

# Traditional pip installation (alternative, slower)
pip install ember-ai
```
For detailed installation instructions, troubleshooting, and environment management, see our Installation Guide.
Model Registry & Provider Integration
Access models from any provider through a unified interface:
```python
from ember import initialize_ember
from ember.api.models import ModelEnum

# Initialize with multiple providers and usage tracking
service = initialize_ember(usage_tracking=True)

# Access models from different providers with the same API
response = service(ModelEnum.gpt_4o, "What is quantum computing?")
print(response.data)

# Track usage across providers
usage = service.usage_service.get_total_usage()
print(f"Total cost: ${usage.cost:.4f}")
```
NON Patterns & Ensembling
Build compound AI system architectures using the Network of Networks (NON) pattern with pre-built components:
```python
from ember.api import non

# Standard API: Create a verification pipeline of ensemble → judge → verifier
pipeline = non.Sequential(operators=[
    # 1. Ensemble of 5 model instances running in parallel
    non.UniformEnsemble(
        num_units=5,
        model_name="openai:gpt-4o-mini",
        temperature=0.7,
    ),
    # 2. Judge to synthesize the ensemble responses
    non.JudgeSynthesis(
        model_name="anthropic:claude-3-5-sonnet",
        temperature=0.2,
    ),
    # 3. Verifier for quality control and fact-checking
    non.Verifier(
        model_name="anthropic:claude-3-5-haiku",
        temperature=0.0,
    ),
])

# Alternatively, create the same pipeline with compact notation
pipeline = non.build_graph([
    "5:E:gpt-4o-mini:0.7",        # Ensemble with 5 instances
    "1:J:claude-3-5-sonnet:0.2",  # Judge synthesis
    "1:V:claude-3-5-haiku:0.0",   # Verification
])

# Build advanced architectures like NestedNetwork from example_architectures.py
# Define a reusable SubNetwork component
components = {
    "sub": ["2:E:gpt-4o:0.0", "1:V:gpt-4o:0.0"]  # Ensemble → Verifier
}

# Create a NestedNetwork with identical structure to the OOP implementation
nested = non.build_graph([
    "$sub",           # First SubNetwork branch
    "$sub",           # Second SubNetwork branch
    "1:J:gpt-4o:0.0"  # Judge to synthesize results
], components=components)

# Extend with custom operator types
custom_registry = non.OpRegistry.create_standard_registry()
custom_registry.register(
    "CE",  # Custom ensemble type
    lambda count, model, temp: non.Sequential(operators=[
        non.UniformEnsemble(num_units=count, model_name=model, temperature=temp),
        non.MostCommon(),  # Auto-aggregation
    ]),
)

# Use custom operators
advanced = non.build_graph(["3:CE:gpt-4o:0.7"], type_registry=custom_registry)

# Execute with a single call
result = pipeline(query="What causes tsunamis?")
```
Graph Optimization & Execution
Ember's XCS system provides JAX/XLA-inspired tracing, transformation, and automatic parallelization:
```python
from ember.xcs import jit, execution_options, vmap, pmap, compose, explain_jit_selection
from ember.api.operators import Operator

# Basic JIT compilation with automatic strategy selection
@jit
class SimplePipeline(Operator):
    ...  # operator implementation

# JIT with explicit mode selection
@jit(mode="enhanced")
class ComplexPipeline(Operator):
    def __init__(self):
        self.op1 = SubOperator1()
        self.op2 = SubOperator2()
        self.op3 = SubOperator3()

    def forward(self, *, inputs):
        # These operations will be automatically parallelized
        result1 = self.op1(inputs=inputs)
        result2 = self.op2(inputs=inputs)

        # Combine the parallel results
        combined = self.op3(inputs={"r1": result1, "r2": result2})
        return combined

# Configure execution parameters
with execution_options(scheduler="wave", max_workers=4):
    result = pipeline(query="Complex question...")

# Get an explanation of the JIT strategy selection
explanation = explain_jit_selection(pipeline)
print(f"JIT strategy: {explanation['strategy']}")
print(f"Rationale: {explanation['rationale']}")

# Vectorized mapping for batch processing
batch_processor = vmap(my_operator)
batch_results = batch_processor(inputs={"data": [item1, item2, item3]})

# Parallel execution across multiple workers
parallel_processor = pmap(my_operator, num_workers=4)
parallel_results = parallel_processor(inputs=complex_data)

# Compose transformations (vectorization + parallelism)
pipeline = compose(vmap(batch_size=32), pmap(num_workers=4))(my_operator)
```
Data Handling & Evaluation
Ember provides a comprehensive data processing and evaluation framework with pre-built datasets and metrics:
```python
from ember.api.data import DatasetBuilder
from ember.api.eval import EvaluationPipeline, Evaluator

# Load a dataset with the builder pattern
dataset = (DatasetBuilder()
    .from_registry("mmlu")  # Use a registered dataset
    .subset("physics")      # Select a specific subset
    .split("test")          # Choose the test split
    .sample(100)            # Random sample of 100 items
    .transform(             # Apply transformations
        lambda x: {"query": f"Question: {x['question']}"}
    )
    .build())

# Create a comprehensive evaluation pipeline
eval_pipeline = EvaluationPipeline([
    # Standard metrics
    Evaluator.from_registry("accuracy"),
    Evaluator.from_registry("response_quality"),
    # Custom evaluation metrics
    Evaluator.from_function(
        lambda prediction, reference: {
            "factual_accuracy": score_factual_content(prediction, reference)
        }
    ),
])

# Evaluate a model or operator
results = eval_pipeline.evaluate(my_model, dataset)
print(f"Accuracy: {results['accuracy']:.2f}")
print(f"Response Quality: {results['response_quality']:.2f}")
print(f"Factual Accuracy: {results['factual_accuracy']:.2f}")
```
License
Ember is released under the MIT License.