# Hyrum's Tests

> "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody." — Hyrum's Law
Hyrum's Tests are automatically generated tests that capture a project's implicit contract with its dependencies. They detect breaking changes before they cause problems in production.
## The Problem
When you upgrade a dependency, you might break your code even if:
- The dependency's public API hasn't changed
- The changelog doesn't mention breaking changes
- Your existing tests pass
This happens because your code depends on behaviors that aren't part of the official contract—return types, error handling, timing, defaults, etc.
## The Solution
Generate tests that mirror exactly how your code uses each dependency:
```python
from flask import jsonify

# DON'T test that the API exists
def test_api_exists():
    assert hasattr(response, 'headers')  # Passes on both versions - useless

# DO test the exact usage pattern
def test_content_length_accessible_before_finalization():
    """Mirror httpbin's exact usage of response.headers['Content-Length']."""
    response = jsonify({'key': 'value'})
    content_len = response.headers['Content-Length']  # FAILS on Flask 0.11!
```
## Quick Start
### 1. Gather data (fast)
```bash
./scripts/prep-analysis.sh https://github.com/your-org/your-project dependency-name ./prep-output
```
### 2. Generate tests (with Claude)
```
Analyze ./prep-output/your-project_dependency-name_prep.json
and generate Hyrum's Tests following AGENT_SPEC.md
```
### 3. Run tests against dependency versions
```bash
# Python
pip install flask==2.3.0 && pytest tests/hyrum/flask/
pip install flask==3.0.0 && pytest tests/hyrum/flask/

# Node.js
npm install ws@7.4.2 && npm test
npm install ws@8.17.1 && npm test

# Go
go test ./tests/hyrum/validator/...
```
## Key Findings

### Breaking Changes Detected

| Ecosystem | Dependency | Change | Detection |
|---|---|---|---|
| Python | Flask 3.x | flask.escape removed | ✅ Caught |
| Python | Werkzeug 3.x | set_digest() removed | ✅ Caught |
| Node.js | ws 8.x | handleProtocols Array→Set | ✅ Caught |
| Node.js | ws 8.x | message String→Buffer | ✅ Caught |
| Node.js | form-data | throws on boolean values | ✅ Caught |
### Security Patches vs Breaking Changes

**Key insight:** CVE fixes within a major version rarely change behavior, but major version bumps often do.
| Change Type | Example | Behavioral Impact |
|---|---|---|
| Security patch | ws 7.4.2→7.4.6 (ReDoS fix) | None for valid inputs |
| Major version | ws 7.x→8.x | Two breaking changes |
## Project Structure
```
hyrums-tests/
├── AGENT_SPEC.md                # Full methodology specification
├── scripts/
│   └── prep-analysis.sh         # Data gathering script
├── analysis/
│   ├── learnings.md             # Accumulated insights
│   ├── retrospective_analysis.md
│   └── ws_experiment.md
└── tests/hyrum/
    ├── httpbin/                 # Python/Flask tests (299 tests)
    │   ├── test_flask_*.py
    │   └── test_werkzeug_*.py
    ├── ws/                      # Node.js/ws tests (52 tests)
    │   ├── server-api.test.js
    │   └── version-specific.test.js
    ├── axios/                   # Node.js/axios tests (87 tests)
    └── gin/                     # Go/Gin tests (~150 tests)
```
## Methodology
### Phase 1: Static Analysis
Find what APIs are imported and how they're used.
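The real data gathering lives in `scripts/prep-analysis.sh`; purely as an illustration of the idea, here is a minimal Python sketch (function name and sample source are ours, not part of the project) that uses the stdlib `ast` module to find names imported from a dependency and count their call sites:

```python
import ast

def find_api_usage(source: str, module: str) -> dict:
    """Map each name imported from `module` to its number of call sites."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == module:
            imported.update(alias.asname or alias.name for alias in node.names)
    counts = {name: 0 for name in imported}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in counts):
            counts[node.func.id] += 1
    return counts

sample = (
    "from flask import jsonify\n"
    "def handler():\n"
    "    return jsonify({'key': 'value'})\n"
)
print(find_api_usage(sample, "flask"))  # {'jsonify': 1}
```

A real pass would also track attribute accesses (`response.headers[...]`) and method chains, since those are exactly the patterns Hyrum's Tests mirror.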
### Phase 2: Runtime Instrumentation
Capture types, patterns, and frequencies of API usage.
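As a hedged sketch of what "runtime instrumentation" can mean here (the wrapper below is our own illustration, not the project's tooling): wrap a dependency's callable so every real call records the argument and return types your code actually exercises.

```python
import functools
import json

def record_usage(fn, log):
    """Wrap fn so every call appends its argument and return types to log."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        log.append({
            "arg_types": [type(a).__name__ for a in args],
            "return_type": type(result).__name__,
        })
        return result
    return wrapper

# Example: observe how our code actually calls json.dumps
calls = []
dumps = record_usage(json.dumps, calls)
dumps({"key": "value"})
print(calls)  # [{'arg_types': ['dict'], 'return_type': 'str'}]
```

The resulting log is what turns "we import this function" into "we pass it dicts and rely on getting a str back" — a behavior worth pinning in a test.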
### Phase 3: PR/Commit Mining
Find historical breaks—especially test-less fix PRs (confirmed breaks without regression tests).
### Phase 4: Test Generation
Write tests that mirror exact usage patterns, not just API existence.
### Phase 5: Validation
Run against multiple dependency versions, prune brittle tests.
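One way to think about the pruning decision — a sketch of our own, assuming each test has been run against an old and a new dependency version:

```python
def classify_test(passes_old: bool, passes_new: bool) -> str:
    """Triage a Hyrum's Test after running it on two dependency versions."""
    if passes_old and passes_new:
        return "keep"             # stable implicit contract
    if passes_old and not passes_new:
        return "breaking-change"  # exactly what these tests exist to catch
    if not passes_old and passes_new:
        return "new-behavior"     # version-specific; gate or document it
    return "brittle"              # fails everywhere; prune it
```

Tests landing in the last bucket usually pinned an accident of the test environment rather than a behavior the downstream code depends on.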
See AGENT_SPEC.md for the complete methodology.
## Test Categories
| Type | Risk | Example |
|---|---|---|
| Type contracts | Low | isinstance(data, bytes) |
| Interface contracts | Medium | hasattr(obj, 'method') |
| Behavioral contracts | Higher | data.isupper() |
| Usage patterns | Best | Mirror exact code from downstream |
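To make the four categories concrete, here is a hypothetical set of assertions against Python's stdlib `json` module (chosen so the example is self-contained; the repo's real tests target Flask, Werkzeug, ws, axios, and Gin):

```python
import json

payload = json.dumps({"key": "value"})

# Type contract (low risk): the return type stays str
assert isinstance(payload, str)

# Interface contract (medium risk): the object still has the method we use
assert hasattr(payload, "encode")

# Behavioral contract (higher risk): a specific observable behavior
assert payload == '{"key": "value"}'  # default separators and key order

# Usage pattern (best): mirror the exact downstream call chain
body = json.dumps({"key": "value"}).encode("utf-8")
assert body == b'{"key": "value"}'
```

Only the last assertion mirrors how downstream code actually composes the calls, which is why usage-pattern tests catch breaks the weaker contracts miss.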
## Commit Message Signals
When reviewing dependency updates, commit messages help prioritize:
**Lower risk (likely safe):**
- "perf:", "performance"
- "ReDoS", "DoS" fixes
- "typo", "docs"
**Higher risk (test carefully):**
- "BREAKING"
- "change default"
- "return type"
- "remove", "deprecate"
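These signals can be applied mechanically when triaging a queue of dependency-update PRs. A minimal sketch of our own (the keyword lists simplify the signals above; higher-risk matches deliberately win over lower-risk ones):

```python
LOWER_RISK = ("perf", "performance", "redos", "typo", "docs")
HIGHER_RISK = ("breaking", "change default", "return type", "remove", "deprecate")

def classify_commit(message: str) -> str:
    """Rough triage of a dependency commit message by risk keywords."""
    msg = message.lower()
    if any(signal in msg for signal in HIGHER_RISK):
        return "higher-risk"
    if any(signal in msg for signal in LOWER_RISK):
        return "lower-risk"
    return "unknown"

print(classify_commit("fix ReDoS in extension parser"))        # lower-risk
print(classify_commit("BREAKING: change default maxPayload"))  # higher-risk
```

Substring matching like this is crude (e.g. "remove dead comment" would flag as higher-risk), so treat it as a prioritization hint, never a substitute for running the tests.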
## Metrics
| Metric | Value |
|---|---|
| Total tests | 351+ |
| Ecosystems | Python, Node.js, Go |
| Breaking changes detected | 11 |
| Detection rate (retrospective) | ~60-70% |
## Integration with Dependabot
This is the primary use case. Run Hyrum's Tests on every dependency update PR:
```yaml
# .github/workflows/hyrums-tests.yml
on:
  pull_request:
    paths:
      - 'package.json'
      - 'requirements.txt'
      - 'go.mod'

jobs:
  hyrums-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/hyrum/  # or npm test, go test
```
## Contributing
- Run the prep script on a new project/dependency
- Generate tests following AGENT_SPEC.md
- Validate against multiple versions
- Document learnings in analysis/learnings.md
- Submit PR with tests and findings
## References
- Hyrum's Law
- AGENT_SPEC.md - Full methodology
- analysis/learnings.md - Accumulated insights
- scripts/README.md - Prep script documentation
## License
MIT