Contributors
This repository is developed in collaboration with the following early users, contributors, and reviewers:
Jared Quincy Davis (F, S), Marquita Ellis (I), Diana Arroyo (I), Pravein Govindan Kannan (I), Paul Castro (I), Siddharth Sharma (F, S), Lingjiao Chen (MS), Omar Khattab (D, MT), Alan Zhu (B), Parth Asawa (B), Connor Chow (B), Jason Lee (B), Jay Adityanag Tipirneni (B), Chad Ferguson (B), Kathleen Ge (B), Kunal Agrawal (B), Rishab Bhatia (B), Rohan Penmatcha (B), Sai Kolasani (B), Théo Jaffrelot Inizan (B), Deepak Narayanan (N), Long Fei (F), Aparajit Raghavan (F), Eyal Cidon (F), Jacob Schein (F), Prasanth Somasundar (F), Boris Hanin (F, P), James Zou (S), Alex Dimakis (B), Joey Gonzalez (B), Peter Bailis (G, S), Ion Stoica (A, B, D), Matei Zaharia (D, B)
Affiliations: F = Foundry (MLFoundry), D = Databricks, I = IBM Research, S = Stanford University, B = UC Berkeley, MT = MIT, N = NVIDIA, MS = Microsoft, A = Anyscale, G = Google, P = Princeton
Ember: A Compositional Framework for Compound AI Systems
Ember in a Nutshell
Aspirationally, Ember is to Network-of-Networks (NON) compound AI system development what PyTorch and XLA are to neural network (NN) development. It is a compositional framework with both eager-execution affordances and graph-execution optimization capabilities: users compose complex NONs, and Ember automatically parallelizes and optimizes them.
Ember's vision is to enable development of compound AI systems composed, one day, of millions or billions of inference calls and beyond. Simple constructs, such as best-of-N graphs, verifier-prover structures, and ensembles with voting-based aggregation, work surprisingly well in many regimes.
```python
from ember.api import non

# With Ember's "compact notation", it is one line to build a simple parallel
# system with 101 GPT-4o instances synthesized by Claude
system = non.build_graph(["101:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"])

# Automatically parallelized
result = system(query="What's the most effective climate change solution?")
```
This led us to believe there is a rich architecture space for constructing and optimizing what we call "networks of networks" graphs, or NONs. This is analogous to how neural network architecture research uncovered emergent properties of systems composed of simple artificial neurons. NN research would have been high-friction if every architecture had to be implemented from scratch with for-loops, or with bespoke libraries for vectorization and efficient execution. Similarly, composing NON architectures of many calls remains challenging today, despite the rapidly falling cost per token of intelligence.
Ember's goal is to help unlock research and practice along this new frontier.
Documentation & Examples
- Architecture Overview
- Quick Start Guide
- LLM Specifications
- Model Registry Guide
- Operators Guide
- NON Patterns
- Data Processing
- Configuration
- Examples Directory
Simple Example: Ensemble Reasoning with Automatic Parallelization
```python
from ember.api import non
from ember.api.operators import EmberModel, Operator, Specification
from ember.xcs import jit


class QueryInput(EmberModel):
    query: str


class ConfidenceOutput(EmberModel):
    answer: str
    confidence: float


class ReasonerSpec(Specification):
    input_model = QueryInput
    structured_output = ConfidenceOutput


@jit  # Automatically optimize execution with JIT compilation (e.g., toposort with parallel dispatch)
class EnsembleReasoner(Operator[QueryInput, ConfidenceOutput]):
    specification = ReasonerSpec()

    def __init__(self, width: int = 3):
        self.ensemble = non.UniformEnsemble(
            num_units=width,
            model_name="openai:gpt-4o",
            temperature=0.7,
        )
        self.judge = non.JudgeSynthesis(
            model_name="anthropic:claude-3-5-sonnet",
        )

    def forward(self, *, inputs: QueryInput) -> ConfidenceOutput:
        # These operations are automatically parallelized by Ember's XCS system
        ensemble_result = self.ensemble(query=inputs.query)
        synthesis = self.judge(
            query=inputs.query,
            responses=ensemble_result["responses"],
        )
        return ConfidenceOutput(
            answer=synthesis["final_answer"],
            confidence=float(synthesis.get("confidence", 0.0)),
        )


# Use it like any Python function
compound_system = EnsembleReasoner()
result = compound_system(query="What causes the northern lights?")
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2f}")

# Alternatively, build the same pipeline with compact notation
pipeline = non.build_graph(["3:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.2"])
result = pipeline(query="What causes the northern lights?")
```
Compact Notation
Ember's compact notation allows expression of complex AI architectures in minimal code:
```python
# Compact notation: "count:type:model:temperature" - each component precisely specified

# BASIC: Single-line systems with automatic parallelization
basic = non.build_graph(["7:E:gpt-4o:0.7"])                                # 7-model ensemble
voting = non.build_graph(["7:E:gpt-4o:0.7", "1:M"])                        # With majority voting
judged = non.build_graph(["7:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"])  # With judge synthesis

# STANDARD API: Equivalent to compact notation but with explicit objects
standard_system = non.Sequential(operators=[
    non.UniformEnsemble(num_units=7, model_name="gpt-4o", temperature=0.7),
    non.JudgeSynthesis(model_name="claude-3-5-sonnet", temperature=0.0),
])

# ADVANCED: Reusable components for complex architectures
components = {  # Define building blocks once, reuse everywhere
    "reasoning": ["3:E:gpt-4o:0.7", "1:V:gpt-4o:0.0"],                       # Verification pipeline
    "research": ["3:E:claude-3-5-sonnet:0.5", "1:V:claude-3-5-sonnet:0.0"],  # Different models
}

# Build a sophisticated multi-branch architecture in just four lines
advanced = non.build_graph([
    "$reasoning",               # First branch: reasoning with verification
    "$research",                # Second branch: research with verification
    "1:J:claude-3-5-opus:0.0",  # Final synthesis of both branches
], components=components)       # Automatically optimized for parallel execution

# HORIZONTAL SCALING: Systematically explore scaling behavior
systems = {
    # Scaling with MostCommon aggregation
    "width_3_voting": non.build_graph(["3:E:gpt-4o:0.7", "1:M"]),
    "width_7_voting": non.build_graph(["7:E:gpt-4o:0.7", "1:M"]),
    "width_11_voting": non.build_graph(["11:E:gpt-4o:0.7", "1:M"]),
    # Scaling with judge synthesis
    "width_3_judge": non.build_graph(["3:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
    "width_7_judge": non.build_graph(["7:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
    "width_11_judge": non.build_graph(["11:E:gpt-4o:0.7", "1:J:claude-3-5-sonnet:0.0"]),
}

# Execute with full parallelism (XCS optimizes the execution graph automatically)
query = "What's the most effective climate change solution?"
results = {name: system(query=query) for name, system in systems.items()}
```
Core Elements
- Composable Operators with Rigorous Specification: Build reliable compound AI systems from type-safe, reusable components with validated inputs and outputs
- Automatic Parallelization: Independent operations are automatically executed concurrently across a full computational graph
- XCS Optimization Framework: "Accelerated Compound Systems" just-in-time tracing and execution optimization with multiple strategies (trace, structural, enhanced). XCS is inspired by XLA, but it accelerates compound systems rather than linear algebra: it is tuned for models and dicts rather than for vectors and numerical computation
- Multi-Provider Support: Unified API across OpenAI, Anthropic (Claude), Google (Gemini), and more, with standardized usage tracking
- Transformation System: Function transformations for vectorization (vmap), parallelization (pmap), and device sharding (mesh), with a composable interface for building complex transformations
XCS Architecture
The Accelerated Compound Systems (XCS) module provides a computational graph-based system for building, optimizing, and executing complex operator pipelines:
- Unified JIT System: Multiple compilation strategies under a consistent interface:
  - trace: Traditional execution tracing
  - structural: Structure-based analysis
  - enhanced: Improved parallelism detection and code analysis
- Scheduler Framework: Pluggable scheduler implementations for different execution patterns (see the first sketch after this list):
  - sequential: Serial execution for debugging and determinism
  - parallel: Thread-based parallel execution
  - wave: Execution-wave scheduling for optimal parallelism
  - topological: Dependency-based execution ordering
- Transform System: High-level operations for data and computation transformations:
  - vmap: Vectorized mapping for batch processing
  - pmap: Parallel mapping across multiple workers
  - mesh: Device-mesh-based sharding for multi-device execution
- Dependency Analysis: Automatic extraction of dependencies between operations (see the second sketch after this list):
  - Transitive closure calculation for complete dependency mapping
  - Topological sorting with cycle detection
  - Execution wave computation for parallel scheduling
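To illustrate the scheduler framework, here is a minimal sketch of switching schedulers at the call site via the `execution_options` context manager (also used in the Graph Optimization section below). It assumes the scheduler names in the list above are accepted as string options and that `pipeline` is any `@jit`-compiled operator:

```python
from ember.xcs import execution_options

# A minimal sketch; assumes `pipeline` is a @jit-compiled operator and
# that scheduler names mirror the list above.

# Deterministic serial execution, useful when debugging
with execution_options(scheduler="sequential"):
    debug_result = pipeline(query="Why is the sky blue?")

# Wave scheduling: independent operations in the same wave run concurrently
with execution_options(scheduler="wave", max_workers=8):
    fast_result = pipeline(query="Why is the sky blue?")
```

To make the dependency-analysis steps concrete, the following self-contained sketch (plain Python illustrating the idea, not Ember's internal API) computes execution waves with a Kahn-style traversal: each wave collects the nodes whose dependencies are already satisfied, leftover nodes when no wave can form indicate a cycle, and the nodes within one wave can be dispatched to a thread pool concurrently, which is what wave scheduling exploits.

```python
def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves: every node in a wave depends only on earlier waves."""
    remaining = {node: set(d) for node, d in deps.items()}
    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # All nodes whose dependencies are already satisfied form the next wave
        wave = [n for n, d in remaining.items() if d <= done]
        if not wave:  # No progress possible: the remaining nodes form a cycle
            raise ValueError(f"Cycle detected among: {sorted(remaining)}")
        waves.append(wave)
        done.update(wave)
        for n in wave:
            del remaining[n]
    return waves


# Example: op1 and op2 are independent, op3 consumes both, giving two waves
print(execution_waves({"op1": set(), "op2": set(), "op3": {"op1", "op2"}}))
# -> [['op1', 'op2'], ['op3']]
```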
Installation
Ember recommends uv as its package manager for significantly faster installation and dependency resolution.
```bash
# First, install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh  # macOS/Linux
# or
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows
# or
pip install uv  # Any platform

# Quick install using uv (recommended)
uv pip install ember-ai

# Run examples directly with uv (no activation needed)
uv run python -c "import ember; print(ember.__version__)"

# Install from source for development
git clone https://github.com/pyember/ember.git
cd ember
uv pip install -e ".[dev]"

# Traditional pip installation (alternative, slower)
pip install ember-ai
```
For detailed installation instructions, troubleshooting, and environment management, see our Installation Guide.
Model Registry & Provider Integration
Access models from any provider through a unified interface:
```python
from ember import initialize_ember
from ember.api.models import ModelEnum

# Initialize with multiple providers and usage tracking
service = initialize_ember(usage_tracking=True)

# Access models from different providers with the same API
response = service(ModelEnum.gpt_4o, "What is quantum computing?")
print(response.data)

# Track usage across providers
usage = service.usage_service.get_total_usage()
print(f"Total cost: ${usage.cost:.4f}")
```
NON Patterns & Ensembling
Build compound AI system architectures using the Network of Networks (NON) pattern with pre-built components:
```python
from ember.api import non

# Standard API: Create a verification pipeline of ensemble → judge → verifier
pipeline = non.Sequential(operators=[
    # 1. Ensemble of 5 model instances running in parallel
    non.UniformEnsemble(
        num_units=5,
        model_name="openai:gpt-4o-mini",
        temperature=0.7,
    ),
    # 2. Judge to synthesize the ensemble responses
    non.JudgeSynthesis(
        model_name="anthropic:claude-3-5-sonnet",
        temperature=0.2,
    ),
    # 3. Verifier for quality control and fact-checking
    non.Verifier(
        model_name="anthropic:claude-3-5-haiku",
        temperature=0.0,
    ),
])

# Alternatively, create the same pipeline with compact notation
pipeline = non.build_graph([
    "5:E:gpt-4o-mini:0.7",        # Ensemble with 5 instances
    "1:J:claude-3-5-sonnet:0.2",  # Judge synthesis
    "1:V:claude-3-5-haiku:0.0",   # Verification
])

# Build advanced architectures like NestedNetwork from example_architectures.py
# Define a reusable SubNetwork component
components = {
    "sub": ["2:E:gpt-4o:0.0", "1:V:gpt-4o:0.0"]  # Ensemble → Verifier
}

# Create a NestedNetwork with identical structure to the OOP implementation
nested = non.build_graph([
    "$sub",           # First SubNetwork branch
    "$sub",           # Second SubNetwork branch
    "1:J:gpt-4o:0.0"  # Judge to synthesize results
], components=components)

# Extend with custom operator types
custom_registry = non.OpRegistry.create_standard_registry()
custom_registry.register(
    "CE",  # Custom ensemble type
    lambda count, model, temp: non.Sequential(operators=[
        non.UniformEnsemble(num_units=count, model_name=model, temperature=temp),
        non.MostCommon(),  # Auto-aggregation
    ]),
)

# Use custom operators
advanced = non.build_graph(["3:CE:gpt-4o:0.7"], type_registry=custom_registry)

# Execute with a single call
result = pipeline(query="What causes tsunamis?")
```
Graph Optimization & Execution
Ember's XCS system provides JAX/XLA-inspired tracing, transformation, and automatic parallelization:
```python
from ember.xcs import jit, execution_options, vmap, pmap, compose, explain_jit_selection
from ember.api.operators import Operator

# Basic JIT compilation with automatic strategy selection
@jit
class SimplePipeline(Operator):
    ...  # operator implementation

# JIT with explicit mode selection
@jit(mode="enhanced")
class ComplexPipeline(Operator):
    def __init__(self):
        self.op1 = SubOperator1()
        self.op2 = SubOperator2()
        self.op3 = SubOperator3()

    def forward(self, *, inputs):
        # These operations will be automatically parallelized
        result1 = self.op1(inputs=inputs)
        result2 = self.op2(inputs=inputs)

        # Combine the parallel results
        combined = self.op3(inputs={"r1": result1, "r2": result2})
        return combined

# Configure execution parameters
with execution_options(scheduler="wave", max_workers=4):
    result = pipeline(query="Complex question...")

# Get an explanation of the JIT strategy selection
explanation = explain_jit_selection(pipeline)
print(f"JIT strategy: {explanation['strategy']}")
print(f"Rationale: {explanation['rationale']}")

# Vectorized mapping for batch processing
batch_processor = vmap(my_operator)
batch_results = batch_processor(inputs={"data": [item1, item2, item3]})

# Parallel execution across multiple workers
parallel_processor = pmap(my_operator, num_workers=4)
parallel_results = parallel_processor(inputs=complex_data)

# Compose transformations (vectorization + parallelism)
pipeline = compose(vmap(batch_size=32), pmap(num_workers=4))(my_operator)
```
Data Handling & Evaluation
Ember provides a comprehensive data processing and evaluation framework with pre-built datasets and metrics:
```python
from ember.api.data import DatasetBuilder
from ember.api.eval import EvaluationPipeline, Evaluator

# Load a dataset with the builder pattern
dataset = (DatasetBuilder()
    .from_registry("mmlu")  # Use a registered dataset
    .subset("physics")      # Select a specific subset
    .split("test")          # Choose the test split
    .sample(100)            # Random sample of 100 items
    .transform(             # Apply transformations
        lambda x: {"query": f"Question: {x['question']}"}
    )
    .build())

# Create a comprehensive evaluation pipeline
eval_pipeline = EvaluationPipeline([
    # Standard metrics
    Evaluator.from_registry("accuracy"),
    Evaluator.from_registry("response_quality"),
    # Custom evaluation metrics
    Evaluator.from_function(
        lambda prediction, reference: {
            "factual_accuracy": score_factual_content(prediction, reference)
        }
    ),
])

# Evaluate a model or operator
results = eval_pipeline.evaluate(my_model, dataset)
print(f"Accuracy: {results['accuracy']:.2f}")
print(f"Response Quality: {results['response_quality']:.2f}")
print(f"Factual Accuracy: {results['factual_accuracy']:.2f}")
```
License
Ember is released under the MIT License.