How Structured Context Improves LLM Trading Decisions
Research findings from 7 controlled backtests. We found a clear hierarchy: structured briefings outperform raw web search, which outperforms stale context, which outperforms no context at all.
The Core Finding
LLMs are language processors, not calculators. When given raw numbers (price: $84,231, hash rate: 612 EH/s), they have no framework for interpretation. When given pre-analyzed context (price is 15% below 200-day MA, hash ribbon in capitulation, regime: risk-off), they can reason about what the data means.
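The difference between raw numbers and pre-analyzed context can be sketched in a few lines. The function below is a hypothetical illustration, not PreReason's actual pipeline: the name `describe_price_vs_ma` and the simple arithmetic moving average are assumptions made for this example. It takes a series of closing prices and returns a statement of the kind quoted above.

```python
from statistics import mean


def describe_price_vs_ma(closes, window=200):
    """Turn raw closing prices into a one-line, pre-analyzed statement.

    Hypothetical helper for illustration; assumes a simple (unweighted)
    moving average over the last `window` closes.
    """
    if len(closes) < window:
        raise ValueError(f"need at least {window} closes, got {len(closes)}")
    moving_average = mean(closes[-window:])
    last_close = closes[-1]
    # Signed percentage distance of the last close from the moving average.
    pct = (last_close - moving_average) / moving_average * 100.0
    side = "above" if pct >= 0 else "below"
    return f"price is {abs(pct):.1f}% {side} {window}-day MA"
```

Given this output, the model reads "price is 14.9% below 200-day MA" rather than a bare price, so the interpretation step is done before the prompt is built.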
The performance hierarchy across 7 runs: Treatment (structured briefings) > Control-WS (web search) > Placebo (stale briefings) > Control (price only).
The Three-Gear System
Gear 1: Briefings (The Context Layer)
There are four modular briefings: btc.energy (mining economics), cross.regime (regime classification), cross.breadth (market breadth), and btc.momentum (trend signals). Each briefing contains pre-computed trend directions, percentile rankings, confidence scores, and signal interpretation. The model receives structured intelligence, not raw data.
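One way to picture a briefing is as a small typed record that is rendered to text before it reaches the model. The sketch below is an assumption about shape only: the class names `Signal` and `Briefing`, the field names, and the rendered layout are hypothetical and are not taken from the PreReason API. The fields mirror the four elements listed above (trend direction, percentile ranking, confidence score, interpretation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    # All field names are hypothetical; they mirror the elements named in the text.
    name: str
    trend: str             # e.g. "rising", "falling", "flat"
    percentile: float      # 0-100 rank against the signal's own history
    confidence: float      # 0-1
    interpretation: str    # plain-language reading of the signal


@dataclass(frozen=True)
class Briefing:
    briefing_id: str       # e.g. "btc.energy"
    signals: tuple         # tuple of Signal

    def render(self):
        """Render the briefing as plain text suitable for a prompt."""
        lines = [f"[{self.briefing_id}]"]
        for s in self.signals:
            lines.append(
                f"- {s.name}: trend={s.trend}, percentile={s.percentile:.0f}, "
                f"confidence={s.confidence:.2f} -> {s.interpretation}"
            )
        return "\n".join(lines)
```

Keeping each briefing as its own record is what makes the set modular: a briefing can be added to or dropped from the prompt without touching the others.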
Gear 2: The Preamble (The Strategy Framework)
The preamble is a signal weighting guide that tells the model how to prioritize signals. It defines three tiers: regime signals (highest weight), structural signals (medium), and tactical signals (lowest). It also contains position sizing rules and risk management guidelines. The preamble turns briefing data into a decision framework.
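In the backtests the preamble is prose given to the model, not code, but the tier ordering can be illustrated numerically. In the sketch below the weights 3, 2, and 1 are placeholders chosen only to respect the ordering regime > structural > tactical; the source does not state numeric weights, and `weighted_score` is a hypothetical helper.

```python
# Placeholder weights: only the ordering (regime > structural > tactical)
# comes from the text; the numbers themselves are assumptions.
TIER_WEIGHTS = {"regime": 3.0, "structural": 2.0, "tactical": 1.0}


def weighted_score(signals):
    """Combine (tier, score) pairs into one score in [-1, 1].

    Each score is a directional reading in [-1, 1], where -1 is fully
    bearish and +1 is fully bullish. Returns 0.0 for an empty input.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for tier, score in signals:
        if tier not in TIER_WEIGHTS:
            raise ValueError(f"unknown tier: {tier!r}")
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        weight = TIER_WEIGHTS[tier]
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

With these placeholder weights, a bearish regime signal (-1.0) outvotes a bullish tactical signal (+1.0): the combined score is -0.5, which is the behavior the tiering is meant to produce.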
Gear 3: Portfolio Execution (The Harness)
The harness simulates trade mechanics with a realistic fee structure (0.045% per transaction, 0.01% funding per 8 hours), a position limit of 80% gross exposure, and deterministic portfolio carryover. The same harness processes all arms identically.
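The fee and limit figures above translate into simple arithmetic. The sketch below uses the three rates from the text; everything else is an assumption made for illustration, including the function names, charging funding only for completed 8-hour periods, treating funding as always paid regardless of direction, and treating every order as adding to gross exposure.

```python
TRANSACTION_FEE_RATE = 0.045 / 100   # 0.045% of traded notional (from the text)
FUNDING_RATE_PER_8H = 0.01 / 100     # 0.01% of open notional per 8 hours (from the text)
MAX_GROSS_EXPOSURE = 0.80            # 80% of equity (from the text)


def transaction_fee(notional):
    """Fee for one fill, charged on the absolute traded notional."""
    return abs(notional) * TRANSACTION_FEE_RATE


def funding_cost(notional, hours_held):
    """Funding over a holding period.

    Assumption: charged once per completed 8-hour period and always paid,
    whichever side the position is on.
    """
    periods = int(hours_held // 8)
    return abs(notional) * FUNDING_RATE_PER_8H * periods


def max_gross_notional(equity):
    """Largest total absolute exposure allowed for a given equity."""
    return equity * MAX_GROSS_EXPOSURE


def clamp_order(requested_notional, current_gross, equity):
    """Shrink an order so gross exposure stays within the limit.

    Assumption: the order adds to gross exposure (it opens or extends a
    position). Positive notional is long, negative is short.
    """
    room = max(0.0, max_gross_notional(equity) - current_gross)
    size = min(abs(requested_notional), room)
    return size if requested_notional >= 0 else -size
```

On a $10,000 position this gives a fee of about $4.50 per fill and about $3.00 of funding per 24 hours held; with $10,000 of equity and $6,000 already deployed, a $5,000 order is cut to $2,000.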
Key Findings
- Briefing evolution matters -- Modular briefings (Run 4+) tripled the treatment delta compared to monolithic briefings (Runs 1-3). Adding btc.energy and btc.momentum to the set had the largest effect.
- The defensive edge -- Treatment arms detected regime shifts 1-2 ticks before crashes. Short entries on Nov 8, 2025 and Feb 3, 2026 captured the majority of the alpha. The edge is concentrated in crash avoidance.
- Model-agnostic -- Both Opus 4.6 and Sonnet 4.5 showed positive treatment deltas with the same briefings. The edge comes from the context, not the model.
- Structured > raw information -- Control-WS (web search) beat Control (price only) by 10pp, and Treatment beat Control-WS by a further 2 to 9pp. Having information helps. Having structured analysis helps more.
Implications
- Context pipeline matters more than the model. The same briefings produced positive deltas on both Opus 4.6 and Sonnet 4.5. Improving the context pipeline (modular briefings, better signals) had a larger impact than switching models.
- Raw information helps, but structured analysis helps more. Web search gave agents access to real-time headlines and data, but without an analytical framework, agents overreacted to noise. PreReason briefings provided the framework.
- Pre-analyzed context outperforms unstructured context. The gap between Treatment and Control-WS is the commercial case for PreReason: the value is not just in having data, but in having it pre-analyzed with trend directions, regime classification, and signal hierarchy.
Explore
- Evidence Hub -- All 7 runs with aggregate results
- Methodology -- 4-arm RCT design and controlled variables
- Tick-by-Tick -- Every decision with full reasoning