Positron | Generative AI Acceleration


Positron

Accelerating Intelligence with Hardware for Transformer Model Inference

Purpose Built Generative AI Systems

Positron delivers the highest-performance, lowest-power, best-total-cost-of-ownership solution for Transformer model inference.

Atlas — 2025

Titan — 2026

Head to Head Systems Comparison

(Llama 3.1 8B with BF16 compute, no speculation or paged attention)

Positron delivers leading Performance per Dollar and Performance per Watt compared to NVIDIA

Every Transformer Runs on Positron

Supports all Transformer models seamlessly, with zero porting time or effort

Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use

Step 1

Model files

Hugging Face

Develop or procure a model using the HuggingFace Transformers Library

Step 2

Upload or link the trained model file (.pt or .safetensors) to the Positron Model Manager

Step 3

from openai import OpenAI

# Point the standard OpenAI client at Positron's endpoint
# (endpoint host as shown on the original page; api_key is a placeholder)
client = OpenAI(base_url="https://api.positron.ai", api_key="...")

response = client.chat.completions.create(
    model="my_model",
    messages=[{"role": "user", "content": "Hello"}],
)

Update client applications to use Positron's OpenAI API-compliant endpoint