GitHub - michaelwinser/hyrums-tests

Hyrum's Tests

"With a sufficient number of users of an API, all observable behaviors will be depended on by somebody." — Hyrum's Law

Hyrum's Tests are automatically generated tests that capture a project's implicit contract with its dependencies. They detect breaking changes before they cause problems in production.

The Problem

When you upgrade a dependency, you might break your code even if:

  • The dependency's public API hasn't changed
  • The changelog doesn't mention breaking changes
  • Your existing tests pass

This happens because your code depends on behaviors that aren't part of the official contract: return types, error handling, timing, default values, and so on.

The Solution

Generate tests that mirror exactly how your code uses each dependency:

from flask import Flask, jsonify

app = Flask(__name__)

# DON'T test that the API exists
def test_api_exists():
    with app.test_request_context():  # jsonify needs an active Flask context
        response = jsonify({'key': 'value'})
    assert hasattr(response, 'headers')  # Passes on both versions - useless

# DO test the exact usage pattern
def test_content_length_accessible_before_finalization():
    """Mirror httpbin's exact usage of response.headers['Content-Length']."""
    with app.test_request_context():
        response = jsonify({'key': 'value'})
    content_len = response.headers['Content-Length']  # FAILS on Flask 0.11!

Quick Start

1. Gather data (fast)

./scripts/prep-analysis.sh https://github.com/your-org/your-project dependency-name ./prep-output

2. Generate tests (with Claude)

Analyze ./prep-output/your-project_dependency-name_prep.json
and generate Hyrum's Tests following AGENT_SPEC.md

3. Run tests against dependency versions

# Python
pip install flask==2.3.0 && pytest tests/hyrum/flask/
pip install flask==3.0.0 && pytest tests/hyrum/flask/

# Node.js
npm install ws@7.4.2 && npm test
npm install ws@8.17.1 && npm test

# Go
go test ./tests/hyrum/validator/...
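The Python commands above can also be driven from a small script. The following is a minimal sketch, not part of this repository: `build_commands` and `run_matrix` are hypothetical helper names, and it assumes a pip-based project whose tests run under pytest.

```python
import subprocess
import sys


def build_commands(package, version, test_path):
    """Return the install and test commands for one dependency version."""
    return [
        [sys.executable, "-m", "pip", "install", f"{package}=={version}"],
        [sys.executable, "-m", "pytest", test_path],
    ]


def run_matrix(package, versions, test_path):
    """Install each version in turn and record whether pytest passed.

    Returns a dict mapping version -> True if the test run exited with 0.
    """
    outcome = {}
    for version in versions:
        install, test = build_commands(package, version, test_path)
        subprocess.run(install, check=True)  # stop if the install itself fails
        outcome[version] = subprocess.run(test).returncode == 0
    return outcome
```

Calling `run_matrix("flask", ["2.3.0", "3.0.0"], "tests/hyrum/flask/")` would reproduce the two Python commands shown above. Run it inside a throwaway virtual environment, since it reinstalls the dependency on each iteration.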

Key Findings

Breaking Changes Detected

| Ecosystem | Dependency | Change | Detection |
|-----------|------------|--------|-----------|
| Python | Flask 3.x | flask.escape removed | ✅ Caught |
| Python | Werkzeug 3.x | set_digest() removed | ✅ Caught |
| Node.js | ws 8.x | handleProtocols Array→Set | ✅ Caught |
| Node.js | ws 8.x | message String→Buffer | ✅ Caught |
| Node.js | form-data | throws on boolean values | ✅ Caught |

Security Patches vs Breaking Changes

Key insight: CVE fixes within a major version rarely change behavior, but major version bumps often do.

| Change Type | Example | Behavioral Impact |
|-------------|---------|-------------------|
| Security patch | ws 7.4.2→7.4.6 (ReDoS fix) | None for valid inputs |
| Major version | ws 7.x→8.x | Two breaking changes |
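One way to act on this distinction is to classify each upgrade by which version component changed, then decide how much scrutiny it gets. The sketch below is illustrative only: `parse_version` and `classify_bump` are hypothetical names, and it assumes plain `MAJOR.MINOR.PATCH` version strings with no pre-release tags.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for an upgrade old -> new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    if new_v[2] != old_v[2]:
        return "patch"
    return "none"
```

Applied to the table's examples, `classify_bump("7.4.2", "7.4.6")` yields `"patch"`, while an upgrade from `7.4.2` to `8.17.1` yields `"major"`.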

Project Structure

hyrums-tests/
├── AGENT_SPEC.md              # Full methodology specification
├── scripts/
│   └── prep-analysis.sh       # Data gathering script
├── analysis/
│   ├── learnings.md           # Accumulated insights
│   ├── retrospective_analysis.md
│   └── ws_experiment.md
└── tests/hyrum/
    ├── httpbin/               # Python/Flask tests (299 tests)
    │   ├── test_flask_*.py
    │   └── test_werkzeug_*.py
    ├── ws/                    # Node.js/ws tests (52 tests)
    │   ├── server-api.test.js
    │   └── version-specific.test.js
    ├── axios/                 # Node.js/axios tests (87 tests)
    └── gin/                   # Go/Gin tests (~150 tests)

Methodology

Phase 1: Static Analysis

Find what APIs are imported and how they're used.
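For Python projects, this phase could be approximated with the standard-library `ast` module. The following is a rough sketch and not the approach prescribed by AGENT_SPEC.md: `find_dependency_usage` is a hypothetical helper, and it only handles `import package` and `from package import name` forms, not submodule imports.

```python
import ast
from collections import Counter


def find_dependency_usage(source: str, package: str) -> Counter:
    """Count how often names from `package` are referenced in `source`."""
    tree = ast.parse(source)
    module_aliases = set()  # local names bound by `import package [as x]`
    imported_names = {}     # local name -> "package.name"

    # First pass: collect the local names that refer to the dependency.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == package:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == package:
            for alias in node.names:
                local = alias.asname or alias.name
                imported_names[local] = f"{package}.{alias.name}"

    # Second pass: count references to those names.
    usage = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in imported_names:
            usage[imported_names[node.id]] += 1
        elif (isinstance(node, ast.Attribute)
              and isinstance(node.value, ast.Name)
              and node.value.id in module_aliases):
            usage[f"{package}.{node.attr}"] += 1
    return usage


# Made-up downstream code used only to exercise the helper.
SAMPLE = '''
import flask
from flask import jsonify as to_json

def view():
    resp = to_json({"k": "v"})
    flask.escape("<b>")
    return to_json(resp.headers["Content-Length"])
'''

usage = find_dependency_usage(SAMPLE, "flask")
```

Here `usage` records two references to `flask.jsonify` and one to `flask.escape`. It does not follow `resp.headers`, because tracking values returned by the dependency needs data-flow analysis or the runtime instrumentation in Phase 2.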

Phase 2: Runtime Instrumentation

Capture types, patterns, and frequencies of API usage.
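A lightweight way to capture this data in Python is to wrap the dependency's functions in a recording decorator. This is a sketch under assumptions: `record_usage` and `observations` are hypothetical names, and `encode` is a stand-in for a real dependency function.

```python
import functools
from collections import Counter

# (function name, positional argument type names, return type name) -> count
observations = Counter()


def record_usage(func):
    """Wrap `func` so each call records its argument and return types."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        arg_types = tuple(type(arg).__name__ for arg in args)
        key = (func.__name__, arg_types, type(result).__name__)
        observations[key] += 1
        return result
    return wrapper


@record_usage
def encode(text):
    """Stand-in for a dependency function; not a real library call."""
    return text.encode("utf-8")


encode("a")
encode("b")
```

After the two calls, `observations` holds one entry, `("encode", ("str",), "bytes")`, with a count of 2. That pair of observed types and frequency is the raw material for a type-contract test.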

Phase 3: PR/Commit Mining

Find historical breaks, especially test-less fix PRs: PRs that fix a confirmed break but add no regression test.
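Given PR metadata that has already been fetched, flagging test-less fix PRs can be a simple filter. The sketch below is a heuristic illustration: `PullRequest`, `FIX_KEYWORDS`, and the file-naming rules are all assumptions rather than part of this project's tooling.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumed title keywords that suggest a fix for a dependency break.
FIX_KEYWORDS = ("fix", "compat", "upgrade")


@dataclass
class PullRequest:
    title: str
    changed_files: list


def is_test_file(path: str) -> bool:
    """Guess whether a path is a test file from common naming conventions."""
    parts = PurePosixPath(path).parts
    name = parts[-1]
    in_test_dir = any(part in ("test", "tests") for part in parts[:-1])
    return (in_test_dir
            or name.startswith("test_")
            or name.endswith(("_test.py", "_test.go", ".test.js")))


def is_testless_fix(pr: PullRequest) -> bool:
    """True if the title looks like a fix and no test file was changed."""
    title = pr.title.lower()
    looks_like_fix = any(keyword in title for keyword in FIX_KEYWORDS)
    return looks_like_fix and not any(is_test_file(f) for f in pr.changed_files)


# Made-up examples.
with_test = PullRequest("Fix Flask 3 compat", ["app/core.py", "tests/test_core.py"])
without_test = PullRequest("Fix Flask 3 compat", ["app/core.py"])
```

In this example `is_testless_fix(without_test)` is true and `is_testless_fix(with_test)` is false. Each hit marks a break that was confirmed in practice but is still unguarded, which makes it a strong candidate for a generated test.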

Phase 4: Test Generation

Write tests that mirror exact usage patterns, not just API existence.

Phase 5: Validation

Run the tests against multiple dependency versions and prune brittle ones.
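One possible pruning rule, used here as an assumption rather than the definition in AGENT_SPEC.md, treats a test as brittle if it fails on the baseline version the project is known to work with. The function names and test names below are hypothetical.

```python
def prune_brittle(results, baseline):
    """Split results into kept tests and pruned (brittle) test names.

    `results` maps test name -> {version: passed}. A test that does not
    pass on `baseline` is treated as brittle, since its failure there
    cannot signal a real break.
    """
    kept, pruned = {}, []
    for name, by_version in results.items():
        if by_version.get(baseline, False):
            kept[name] = by_version
        else:
            pruned.append(name)
    return kept, pruned


def detected_breaks(kept):
    """Map each kept test to the sorted versions on which it fails."""
    return {
        name: sorted(v for v, passed in by_version.items() if not passed)
        for name, by_version in kept.items()
        if not all(by_version.values())
    }


# Made-up results for illustration.
results = {
    "test_escape_importable": {"2.3.0": True, "3.0.0": False},
    "test_response_repr": {"2.3.0": False, "3.0.0": False},
    "test_jsonify_returns_response": {"2.3.0": True, "3.0.0": True},
}
kept, pruned = prune_brittle(results, baseline="2.3.0")
breaks = detected_breaks(kept)
```

With this data, `test_response_repr` is pruned because it fails even on the baseline, and `breaks` reports `test_escape_importable` failing on `3.0.0`.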

See AGENT_SPEC.md for the complete methodology.

Test Categories

| Type | Risk | Example |
|------|------|---------|
| Type contracts | Low | isinstance(data, bytes) |
| Interface contracts | Medium | hasattr(obj, 'method') |
| Behavioral contracts | Higher | data.isupper() |
| Usage patterns | Best | Mirror exact code from downstream |
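To make the four categories concrete, here is one test of each kind. `fetch_payload` is a hypothetical stand-in for a dependency call, and the call site mirrored in the last test is invented for illustration.

```python
def fetch_payload() -> bytes:
    """Hypothetical stand-in for a dependency function."""
    return b"HELLO"


def test_type_contract():
    # Type contract: the return value is bytes.
    assert isinstance(fetch_payload(), bytes)


def test_interface_contract():
    # Interface contract: the return value exposes the method we call.
    assert hasattr(fetch_payload(), "decode")


def test_behavioral_contract():
    # Behavioral contract: the payload is upper-case.
    assert fetch_payload().isupper()


def test_usage_pattern():
    # Usage pattern: mirror the (invented) downstream call site exactly.
    header = "X-Payload: " + fetch_payload().decode("ascii").lower()
    assert header == "X-Payload: hello"
```

The usage-pattern test covers the other three implicitly: it fails if the type, the available methods, or the content changes in a way the downstream code would notice.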

Commit Message Signals

When reviewing dependency updates, commit messages help prioritize:

Lower risk (likely safe):

  • "perf:", "performance"
  • "ReDoS", "DoS" fixes
  • "typo", "docs"

Higher risk (test carefully):

  • "BREAKING"
  • "change default"
  • "return type"
  • "remove", "deprecate"
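These signals can be turned into a first-pass triage function. The sketch below uses plain substring matching on the phrases listed above, so it is deliberately crude. `classify_commit` is a hypothetical name, and letting higher-risk signals win when both kinds appear is an assumption.

```python
# Phrases taken from the lists above, lower-cased for matching.
LOWER_RISK = ("perf:", "performance", "redos", "dos", "typo", "docs")
HIGHER_RISK = ("breaking", "change default", "return type", "remove", "deprecate")


def classify_commit(message: str) -> str:
    """Return 'higher', 'lower', or 'unknown' for a commit message.

    Higher-risk signals take precedence, so a message matching both
    lists is still flagged for careful testing.
    """
    text = message.lower()
    if any(signal in text for signal in HIGHER_RISK):
        return "higher"
    if any(signal in text for signal in LOWER_RISK):
        return "lower"
    return "unknown"
```

For example, `classify_commit("fix: ReDoS in header parsing")` returns `"lower"`, while `classify_commit("perf: remove redundant copy")` returns `"higher"` because "remove" outranks "perf:". Substring matching produces false positives (for instance, "dos" inside an unrelated word), so treat the result as a prioritization hint only.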

Metrics

| Metric | Value |
|--------|-------|
| Total tests | 351+ |
| Ecosystems | Python, Node.js, Go |
| Breaking changes detected | 11 |
| Detection rate (retrospective) | ~60-70% |

Integration with Dependabot

This is the primary use case. Run Hyrum's Tests on every dependency update PR:

# .github/workflows/hyrums-tests.yml
on:
  pull_request:
    paths:
      - 'package.json'
      - 'requirements.txt'
      - 'go.mod'

jobs:
  hyrums-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt pytest  # assumes a pip-based Python project
      - run: pytest tests/hyrum/  # or npm test, go test

Contributing

  1. Run the prep script on a new project/dependency
  2. Generate tests following AGENT_SPEC.md
  3. Validate against multiple versions
  4. Document learnings in analysis/learnings.md
  5. Submit PR with tests and findings

License

MIT