Subquadratic — Efficiency is Intelligence


The first model built for long-context tasks

SubQ is a sub-quadratic LLM built for 12M-token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.

  • Context: 12M-token reasoning
  • Speed: 150 tokens per second
  • Cost: 1/5 the cost of other leading LLMs

Use Cases

All your context. Always available.

Reason across 12M tokens in one prompt: entire repos, months of PRs, and long-running agent state, with room to spare at one-fifth the cost.

~ Approximate token counts.

Architecture

Not just another model. An architectural breakthrough.

SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. Transformer LLMs today spend compute on every pairwise relationship between tokens, a cost that grows quadratically with context length, yet only a small fraction of those relationships matter.

SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.
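The idea can be sketched as a toy top-k sparse attention layer: each query attends only to its highest-scoring keys, so attention compute scales with the number of kept keys rather than the full context (at 12M tokens, keeping on the order of 12k keys per query would correspond to a roughly 1,000× reduction). Everything below is illustrative only; SubQ's actual sparsity mechanism and hyperparameters are not public.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Toy single-head attention where each query attends only to its
    top_k highest-scoring keys. Illustrative sketch, not SubQ's method."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_q, n_k) similarity scores
    # Per-query threshold: the top_k-th largest score in each row.
    kth = scores.shape[-1] - top_k
    thresh = np.partition(scores, kth, axis=-1)[:, kth:].min(axis=-1, keepdims=True)
    # Mask out everything below the threshold, then softmax over what remains.
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 64, 16
q, k, v = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = topk_sparse_attention(q, k, v, top_k=8)
print(out.shape)  # (64, 16)
```

With a fixed `top_k`, the attention cost per query stops growing with context length, which is what changes the scaling curve.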

Technical report (coming soon)

Benchmarks

A leader in long-context retrieval and coding tasks.


Products

Two ways to use SubQ.

API

For developers and teams

The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

  • 12M token context window
  • Streaming + tool use
  • OpenAI-compatible endpoints
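Because the endpoints are OpenAI-compatible, a request is an ordinary chat-completions payload, so existing SDKs can simply point at the SubQ base URL. A minimal sketch using only the standard library; the base URL and model id below are placeholders, not published values.

```python
import json

# Hypothetical base URL and model id, shown only to illustrate the wire format.
BASE_URL = "https://api.subquadratic.example/v1"

payload = {
    "model": "subq-12m",   # placeholder model id
    "stream": True,        # streaming is supported per the feature list
    "messages": [
        {"role": "user", "content": "Map this repository and summarize its modules."}
    ],
}
request_body = json.dumps(payload)

# POST request_body to the chat-completions route:
print(BASE_URL + "/chat/completions")  # https://api.subquadratic.example/v1/chat/completions
```

In practice you would send this body with any OpenAI-compatible client by setting its `base_url` to the SubQ endpoint and supplying your API key.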

Code

For coding agents

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.

  • ~25% lower bill, 10× faster exploration
  • Auto-redirects expensive model turns
  • One-line install

About

We built the architecture the industry said wasn't possible.

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't.

Built by researchers from

  • Meta
  • Google
  • Oxford
  • Cambridge
  • BYU

Early Access

Is your business ready?
Build with us.

Join the private preview.