# Hyrum's Tests

> "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody." — Hyrum's Law
Hyrum's Tests are automatically generated tests that capture a project's implicit contract with its dependencies. They detect breaking changes before they cause problems in production.
## The Problem
When you upgrade a dependency, you might break your code even if:
- The dependency's public API hasn't changed
- The changelog doesn't mention breaking changes
- Your existing tests pass
This happens because your code depends on behaviors that aren't part of the official contract—return types, error handling, timing, defaults, etc.
## The Solution
Generate tests that mirror exactly how your code uses each dependency:
```python
from flask import jsonify

# DON'T test that the API exists
def test_api_exists():
    assert hasattr(response, 'headers')  # Passes on both versions - useless

# DO test the exact usage pattern
def test_content_length_accessible_before_finalization():
    """Mirror httpbin's exact usage of response.headers['Content-Length']."""
    response = jsonify({'key': 'value'})
    content_len = response.headers['Content-Length']  # FAILS on Flask 0.11!
```
## Quick Start
### 1. Gather data (fast)
```bash
./scripts/prep-analysis.sh https://github.com/your-org/your-project dependency-name ./prep-output
```
### 2. Generate tests (with Claude)
```
Analyze ./prep-output/your-project_dependency-name_prep.json
and generate Hyrum's Tests following AGENT_SPEC.md
```
### 3. Run tests against dependency versions
```bash
# Python
pip install flask==2.3.0 && pytest tests/hyrum/flask/
pip install flask==3.0.0 && pytest tests/hyrum/flask/

# Node.js
npm install ws@7.4.2 && npm test
npm install ws@8.17.1 && npm test

# Go
go test ./tests/hyrum/validator/...
```
## Key Findings

### Breaking Changes Detected

| Ecosystem | Dependency | Change | Detection |
|---|---|---|---|
| Python | Flask 3.x | flask.escape removed | ✅ Caught |
| Python | Werkzeug 3.x | set_digest() removed | ✅ Caught |
| Node.js | ws 8.x | handleProtocols Array→Set | ✅ Caught |
| Node.js | ws 8.x | message String→Buffer | ✅ Caught |
| Node.js | form-data | throws on boolean values | ✅ Caught |
### Security Patches vs Breaking Changes

**Key insight:** CVE fixes within a major version rarely change behavior, but major version bumps often do.
| Change Type | Example | Behavioral Impact |
|---|---|---|
| Security patch | ws 7.4.2→7.4.6 (ReDoS fix) | None for valid inputs |
| Major version | ws 7.x→8.x | Two breaking changes |
## Project Structure
```
hyrums-tests/
├── AGENT_SPEC.md                # Full methodology specification
├── scripts/
│   └── prep-analysis.sh         # Data gathering script
├── analysis/
│   ├── learnings.md             # Accumulated insights
│   ├── retrospective_analysis.md
│   └── ws_experiment.md
└── tests/hyrum/
    ├── httpbin/                 # Python/Flask tests (299 tests)
    │   ├── test_flask_*.py
    │   └── test_werkzeug_*.py
    ├── ws/                      # Node.js/ws tests (52 tests)
    │   ├── server-api.test.js
    │   └── version-specific.test.js
    ├── axios/                   # Node.js/axios tests (87 tests)
    └── gin/                     # Go/Gin tests (~150 tests)
```
## Methodology
### Phase 1: Static Analysis
Find what APIs are imported and how they're used.
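The real data gathering lives in `scripts/prep-analysis.sh`; purely as an illustration of the idea, here is a minimal Python sketch (function name and sample source are ours, not part of the project) that uses the stdlib `ast` module to find names imported from a dependency and count their call sites:

```python
import ast

def find_api_usage(source: str, module: str) -> dict:
    """Map each name imported from `module` to its number of call sites."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == module:
            imported.update(alias.asname or alias.name for alias in node.names)
    counts = {name: 0 for name in imported}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in counts):
            counts[node.func.id] += 1
    return counts

sample = (
    "from flask import jsonify\n"
    "def handler():\n"
    "    return jsonify({'key': 'value'})\n"
)
print(find_api_usage(sample, "flask"))  # {'jsonify': 1}
```

A real pass would also track attribute accesses (`response.headers[...]`) and method chains, since those are exactly the patterns Hyrum's Tests mirror.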
### Phase 2: Runtime Instrumentation
Capture types, patterns, and frequencies of API usage.
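As a hedged sketch of what "runtime instrumentation" can mean here (the wrapper below is our own illustration, not the project's tooling): wrap a dependency's callable so every real call records the argument and return types your code actually exercises.

```python
import functools
import json

def record_usage(fn, log):
    """Wrap fn so every call appends its argument and return types to log."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        log.append({
            "arg_types": [type(a).__name__ for a in args],
            "return_type": type(result).__name__,
        })
        return result
    return wrapper

# Example: observe how our code actually calls json.dumps
calls = []
dumps = record_usage(json.dumps, calls)
dumps({"key": "value"})
print(calls)  # [{'arg_types': ['dict'], 'return_type': 'str'}]
```

The resulting log is what turns "we import this function" into "we pass it dicts and rely on getting a str back" — a behavior worth pinning in a test.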
### Phase 3: PR/Commit Mining
Find historical breaks—especially test-less fix PRs (confirmed breaks without regression tests).
### Phase 4: Test Generation
Write tests that mirror exact usage patterns, not just API existence.
### Phase 5: Validation
Run against multiple dependency versions, prune brittle tests.
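One way to think about the pruning decision — a sketch of our own, assuming each test has been run against an old and a new dependency version:

```python
def classify_test(passes_old: bool, passes_new: bool) -> str:
    """Triage a Hyrum's Test after running it on two dependency versions."""
    if passes_old and passes_new:
        return "keep"             # stable implicit contract
    if passes_old and not passes_new:
        return "breaking-change"  # exactly what these tests exist to catch
    if not passes_old and passes_new:
        return "new-behavior"     # version-specific; gate or document it
    return "brittle"              # fails everywhere; prune it
```

Tests landing in the last bucket usually pinned an accident of the test environment rather than a behavior the downstream code depends on.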
See AGENT_SPEC.md for the complete methodology.
## Test Categories
| Type | Risk | Example |
|---|---|---|
| Type contracts | Low | isinstance(data, bytes) |
| Interface contracts | Medium | hasattr(obj, 'method') |
| Behavioral contracts | Higher | data.isupper() |
| Usage patterns | Best | Mirror exact code from downstream |
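To make the four categories concrete, here is a hypothetical set of assertions against Python's stdlib `json` module (chosen so the example is self-contained; the repo's real tests target Flask, Werkzeug, ws, axios, and Gin):

```python
import json

payload = json.dumps({"key": "value"})

# Type contract (low risk): the return type stays str
assert isinstance(payload, str)

# Interface contract (medium risk): the object still has the method we use
assert hasattr(payload, "encode")

# Behavioral contract (higher risk): a specific observable behavior
assert payload == '{"key": "value"}'  # default separators and key order

# Usage pattern (best): mirror the exact downstream call chain
body = json.dumps({"key": "value"}).encode("utf-8")
assert body == b'{"key": "value"}'
```

Only the last assertion mirrors how downstream code actually composes the calls, which is why usage-pattern tests catch breaks the weaker contracts miss.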
## Commit Message Signals
When reviewing dependency updates, commit messages help prioritize:
**Lower risk (likely safe):**
- "perf:", "performance"
- "ReDoS", "DoS" fixes
- "typo", "docs"
**Higher risk (test carefully):**
- "BREAKING"
- "change default"
- "return type"
- "remove", "deprecate"
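These signals can be applied mechanically when triaging a queue of dependency-update PRs. A minimal sketch of our own (the keyword lists simplify the signals above; higher-risk matches deliberately win over lower-risk ones):

```python
LOWER_RISK = ("perf", "performance", "redos", "typo", "docs")
HIGHER_RISK = ("breaking", "change default", "return type", "remove", "deprecate")

def classify_commit(message: str) -> str:
    """Rough triage of a dependency commit message by risk keywords."""
    msg = message.lower()
    if any(signal in msg for signal in HIGHER_RISK):
        return "higher-risk"
    if any(signal in msg for signal in LOWER_RISK):
        return "lower-risk"
    return "unknown"

print(classify_commit("fix ReDoS in extension parser"))        # lower-risk
print(classify_commit("BREAKING: change default maxPayload"))  # higher-risk
```

Substring matching like this is crude (e.g. "remove dead comment" would flag as higher-risk), so treat it as a prioritization hint, never a substitute for running the tests.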
## Metrics
| Metric | Value |
|---|---|
| Total tests | 351+ |
| Ecosystems | Python, Node.js, Go |
| Breaking changes detected | 11 |
| Detection rate (retrospective) | ~60-70% |
## Integration with Dependabot
This is the primary use case. Run Hyrum's Tests on every dependency update PR:
```yaml
# .github/workflows/hyrums-tests.yml
on:
  pull_request:
    paths:
      - 'package.json'
      - 'requirements.txt'
      - 'go.mod'

jobs:
  hyrums-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/hyrum/  # or npm test, go test
```
## Contributing
- Run the prep script on a new project/dependency
- Generate tests following AGENT_SPEC.md
- Validate against multiple versions
- Document learnings in analysis/learnings.md
- Submit PR with tests and findings
## References
- Hyrum's Law
- AGENT_SPEC.md - Full methodology
- analysis/learnings.md - Accumulated insights
- scripts/README.md - Prep script documentation
## License
MIT