
Claude Code Watchdog

AI-powered test failure analysis and automated remediation for GitHub Actions

Meet Artemis, your CI watchdog

Overview

Claude Code Watchdog automatically analyzes test failures in your CI/CD pipeline, providing intelligent insights and automated fixes. Instead of getting overwhelmed by flaky test notifications, get actionable analysis that helps you focus on real issues.

Key capabilities:

  • Intelligent Analysis: AI-powered test failure analysis with pattern recognition
  • Failure Classification: Distinguishes chronic issues from flaky tests based on failure rates
  • Automated Issues: Creates detailed GitHub issues with context and actionable recommendations
  • Self-Healing: Implements fixes for common problems automatically
  • Smart Notifications: Provides severity-based alerts to reduce noise

Quick Start

Add this step to your workflow after your tests:

- name: Test failure analysis
  if: failure()
  uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    test_results_path: 'test-results/**/*.xml'  # Adjust to your test output location

When tests fail, the action will:

  1. Analyze test outputs and failure patterns
  2. Determine severity based on failure frequency
  3. Create or update GitHub issues with detailed analysis
  4. Optionally implement automated fixes via pull requests

Features

Smart Failure Analysis

  • Pattern Recognition: Distinguishes between chronic failures (80%+ fail rate) vs isolated incidents
  • Root Cause Detection: Correlates failures with recent commits and changes
  • Test Output Parsing: Understands JUnit XML, JSON reports, and log files (see the reporter sketch after this list)
  • Historical Context: Analyzes the last 20 workflow runs for trends
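
If your test runner doesn't already emit a machine-readable report, add a reporter before the watchdog step. A minimal sketch for Jest, assuming the jest-junit package is installed (swap in the equivalent reporter for your framework):

- name: Run tests with JUnit output
  run: npx jest --reporters=default --reporters=jest-junit
  env:
    JEST_JUNIT_OUTPUT_DIR: test-results  # must match your test_results_path glob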

Intelligent Issue Management

  • No Duplicates: Updates existing issues instead of creating spam
  • Consistent Naming: Issues are titled "Watchdog [Workflow Name]: Description" for easy filtering (see the search sketch below)
  • Rich Context: Includes failure patterns, recent commits, and actionable recommendations
  • Smart Labels: Automatically tags with severity and failure type
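
Because the titles are predictable, everything Artemis has filed is easy to pull up in bulk. A sketch using the GitHub CLI (standard issue-search qualifiers; adjust the query to taste):

- name: List open Watchdog issues
  run: gh issue list --state open --search 'Watchdog in:title'
  env:
    GH_TOKEN: ${{ github.token }}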

Automatic Fixes (Optional)

  • Safe Fixes: Only implements changes it's confident about
  • Common Patterns: Fixes timeouts, flaky selectors, deprecated APIs
  • PR Creation: Creates branches and PRs with clear descriptions
  • Test Verification: Can re-run tests to verify fixes work

Failure Rate Intelligence

| Pattern | Failure Rate | Artemis Response |
|---------|--------------|------------------|
| 🔴 Chronic | 80%+ | Upgrades severity, immediate attention |
| 🟡 Frequent | 50-79% | Creates high-priority issues |
| 🟠 Intermittent | 20-49% | Standard monitoring and analysis |
| 🟢 Isolated | <20% | May downgrade severity, likely flaky |
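
One way to turn this classification into pipeline policy is to gate a later step on the severity output, for example failing the build only for serious patterns. A sketch, assuming the watchdog step ran with id: watchdog:

- name: Fail only on serious patterns
  if: always() && contains(fromJSON('["high","critical"]'), steps.watchdog.outputs.severity)
  run: exit 1  # surface chronic/frequent failures; let likely-flaky runs pass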

Usage Examples

Basic Integration

Perfect for most CI workflows:

name: CI with Watchdog

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    permissions:
      contents: write      # For creating fix PRs
      issues: write        # For creating issues
      pull-requests: write # For creating PRs
    
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '18'
    
    - name: Install and test
      run: |
        npm ci
        npm test
      # `if: failure()` on the next step lets Artemis run even after this
      # step fails; adding continue-on-error here would mark the job green
      # and prevent failure() from ever matching.
    
    - name: Artemis failure analysis
      if: failure()
      id: watchdog
      uses: cardscan-ai/claude-code-watchdog@v0.2
      with:
        anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
        test_results_path: 'test-results/**/*.xml'
    
    - name: Notify team on critical failures
      if: failure() && steps.watchdog.outputs.severity == 'critical'
      uses: 8398a7/action-slack@v3
      with:
        status: failure
        channel: '#critical-alerts'
        title: '🚨 Critical Test Failure'
        message: |
          Severity: ${{ steps.watchdog.outputs.severity }}
          Action: ${{ steps.watchdog.outputs.action_taken }}
          Issue: #${{ steps.watchdog.outputs.issue_number }}
        mention: 'channel'
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Scheduled API Monitoring

Perfect for health checks and integration tests:

name: API Health Check

on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

jobs:
  health-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Run API tests
      run: |
        # Your API tests (Postman, curl, etc.)
        newman run api-tests.json --reporters json --reporter-json-export results.json
      # `if: failure()` on the next step already lets Artemis run after a
      # failure, and the job still reports the failure correctly.
    
    - name: Artemis analysis
      if: failure()
      uses: cardscan-ai/claude-code-watchdog@v0.2
      with:
        anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
        test_results_path: 'results.json'  # Newman output file
        create_fixes: 'false'  # Just analysis for API tests
        severity_threshold: 'low'  # Monitor everything

Full Auto-Healing

Maximum automation - Artemis tries to fix and verify:

- name: Full auto-healing
  if: failure()
  id: watchdog             # referenced by the step below
  uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    test_results_path: 'test-results/**/*.xml'
    create_fixes: 'true'     # Try to implement fixes
    rerun_tests: 'true'      # Verify fixes work
    severity_threshold: 'low' # Handle all failures

- name: Auto-merge if fixed
  # always() keeps this step eligible even though an earlier step failed
  if: always() && steps.watchdog.outputs.tests_passing == 'true'
  run: gh pr merge ${{ steps.watchdog.outputs.pr_number }} --squash
  env:
    GH_TOKEN: ${{ github.token }}

Configuration

| Input | Description | Default |
|-------|-------------|---------|
| anthropic_api_key | Anthropic API key for Claude | Required |
| test_results_path | Path or glob pattern to test result files (e.g., "test-results/**/*.xml", "cypress/reports/*.json") | Required |
| severity_threshold | Minimum severity to process (ignore/low/medium/high/critical) | medium |
| create_issues | Create GitHub issues for failures | true |
| create_fixes | Attempt to implement fixes automatically | true |
| rerun_tests | Re-run tests to verify fixes work | false |
| debug_mode | Upload debugging artifacts and detailed logs | false |
| safe_mode | Skip potentially risky external content (GitHub issues, PRs, commit messages) | false |
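
For reference, here is a single step with every input spelled out. Values shown are the defaults, except safe_mode, which is flipped on purely to show the syntax:

- name: Watchdog, fully configured
  if: failure()
  uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    test_results_path: 'test-results/**/*.xml'
    severity_threshold: 'medium'
    create_issues: 'true'
    create_fixes: 'true'
    rerun_tests: 'false'
    debug_mode: 'false'
    safe_mode: 'true'  # non-default, shown as an example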

Outputs

| Output | Description |
|--------|-------------|
| severity | Failure severity (ignore/low/medium/high/critical) |
| action_taken | What Artemis did (issue_created/issue_updated/pr_created/etc.) |
| issue_number | GitHub issue number if created/updated |
| pr_number | PR number if fixes were created |
| tests_passing | true if re-run tests passed after fixes |
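
The same outputs can feed a job summary so results are visible without digging through logs. A sketch, again assuming the watchdog step used id: watchdog:

- name: Summarize Artemis results
  if: always() && steps.watchdog.outcome == 'success'
  run: |
    {
      echo "### Watchdog analysis"
      echo "- Severity: ${{ steps.watchdog.outputs.severity }}"
      echo "- Action taken: ${{ steps.watchdog.outputs.action_taken }}"
    } >> "$GITHUB_STEP_SUMMARY"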

Smart Notifications

Use the severity output to control notifications:

- name: Critical failure alerts
  if: failure() && steps.watchdog.outputs.severity == 'critical'
  uses: 8398a7/action-slack@v3
  with:
    status: failure
    channel: '#critical-alerts'
    title: '🚨 Critical Test Failure'
    message: |
      Severity: ${{ steps.watchdog.outputs.severity }}
      Action: ${{ steps.watchdog.outputs.action_taken }}
      Issue: #${{ steps.watchdog.outputs.issue_number }}
    mention: 'channel'
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

- name: Auto-fix success notifications
  # always() is needed: the implicit success() check would skip this step in a failing job
  if: always() && steps.watchdog.outputs.tests_passing == 'true'
  uses: 8398a7/action-slack@v3
  with:
    status: success
    title: '✅ Tests Auto-Fixed'
    message: 'Watchdog automatically resolved test failures'
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Required Permissions

For Analysis Only (create_fixes: false)

permissions:
  contents: read
  issues: write

For Auto-Fixing (create_fixes: true)

permissions:
  contents: write      # Create branches and commits
  issues: write        # Create/update issues
  pull-requests: write # Create PRs with fixes

The action gracefully falls back to analysis-only mode if permissions aren't available.

Setup Instructions

1. Get Your Anthropic API Key

  1. Sign up at console.anthropic.com
  2. IMPORTANT: Set up spending limits and budget alerts for your account
  3. Create an API key with appropriate usage limits
  4. Add it to your repository secrets as ANTHROPIC_API_KEY

2. Add Repository Secrets

Go to your repository → Settings → Secrets and variables → Actions:

  • Name: ANTHROPIC_API_KEY
  • Value: Your Anthropic API key (starts with sk-ant-)

3. Add the Workflow Step

Add the watchdog step to your existing test workflows (see examples above).

4. Set Permissions

Add the required permissions to your workflow (see permissions section).

You're all set!

How It Works

Pre-flight Intelligence Gathering

Before calling Claude, the action automatically gathers:

  • Repository permissions - What actions can be taken
  • Existing issues/PRs - Avoid duplicates and update existing items
  • Workflow run history - Calculate failure rates and patterns
  • Recent commits - Identify potential causes
  • Test output files - Find JUnit XML, JSON reports, logs

Claude Analysis

Claude then:

  • Parses test outputs intelligently across multiple formats
  • Correlates failures with recent changes and patterns
  • Determines severity based on failure rate and impact
  • Makes decisions about issues, fixes, and notifications
  • Implements fixes safely when confident
  • Verifies fixes by re-running tests if requested

Smart Actions

Based on the analysis:

  • Updates existing issues instead of creating duplicates
  • Creates PRs with fixes for automatable problems
  • Provides detailed context for human investigation
  • Sets appropriate severity for intelligent notifications

Common Use Cases

API Integration Testing

  • Scheduled health checks every few hours
  • Contract testing between services
  • Authentication timeout detection and fixing
  • Network failure vs code bug differentiation

End-to-End Testing

  • Flaky selector detection and updating
  • Timing issue identification and retry logic
  • Environment drift detection
  • Test data management issues

Unit Test Maintenance

  • Deprecated API usage updates
  • Assertion modernization
  • Test isolation improvements
  • Performance regression tracking

CI/CD Pipeline Health

  • Build failure pattern analysis
  • Deployment gate reliability monitoring
  • Cross-platform test consistency
  • Security scan failure investigation

Advanced Configuration

Custom Severity Thresholds

# Only handle serious issues
- uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    severity_threshold: 'high'  # Ignore low/medium failures

Read-Only Analysis

# Conservative approach - just create issues
- uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    create_fixes: 'false'
    rerun_tests: 'false'

Full Automation

# Maximum automation
- uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    create_fixes: 'true'
    rerun_tests: 'true'
    severity_threshold: 'low'  # Handle everything

Example Issue Output

# Watchdog [API Tests]: Authentication timeout in user service

**Workflow:** API Tests
**Run:** [#1234](https://github.com/org/repo/actions/runs/1234)
**Severity:** High
**Pattern:** Frequent (67% failure rate over last 20 runs)

## 🔍 Failure Analysis
The user authentication endpoint is consistently timing out after 5 seconds. This started happening 2 days ago after commit abc123 which updated the auth service dependencies.

## 📊 Pattern Analysis
- **Total runs analyzed:** 20
- **Failed runs:** 13
- **Failure rate:** 67%
- **Pattern:** Frequent

This represents a significant reliability issue that's blocking multiple workflows.

## 🔧 Recommendations
- [ ] Investigate auth service performance after recent dependency updates
- [ ] Consider increasing timeout from 5s to 10s as temporary fix
- [ ] Check database connection pool settings
- [ ] Review auth service logs for commit abc123 timeframe

## 📝 Context
- **Commit:** abc123456
- **Actor:** developer-name
- **Event:** schedule

---
*Auto-generated by Claude Code Watchdog*

Analysis Reports and Debugging

Automatic Reports

Every run generates a detailed analysis report uploaded as a GitHub artifact:

watchdog-report-{run-id}/
└── final-report.md    # Comprehensive analysis summary

The report includes:

  • Analysis results (severity, actions taken)
  • Failure patterns and context
  • Issue/PR numbers created
  • Historical data summary
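
If you want the report in a follow-up job, the standard artifact action can fetch it. A sketch that assumes the artifact name follows the pattern above with the current run id:

- uses: actions/download-artifact@v4
  with:
    name: watchdog-report-${{ github.run_id }}
    path: watchdog-report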

Debug Mode

Enable debug mode for detailed troubleshooting:

- uses: cardscan-ai/claude-code-watchdog@v0.2
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    debug_mode: 'true'  # Upload all analysis data

Debug artifacts include:

watchdog-debug-{run-id}/
├── .watchdog/
│   ├── context-summary.json      # Run context
│   ├── failure-analysis.json     # Failure patterns  
│   ├── existing-issues.json      # Related issues
│   ├── recent-runs.json          # Workflow history
│   └── test-files.txt            # Test files found
├── test-results.json             # Your test outputs
├── junit-results.xml             # JUnit files
└── *.log                         # Test logs

Perfect for:

  • Understanding why Claude made specific decisions
  • Debugging pattern recognition
  • Seeing exactly what test data was analyzed
  • Troubleshooting action behavior

Cost Estimation

⚠️ IMPORTANT DISCLAIMER: Cost estimates are approximate and may vary significantly based on your specific use case, test output size, and complexity. CardScan.ai provides NO warranty or guarantee regarding actual costs incurred. Usage costs are your responsibility.

🚨 STRONGLY RECOMMENDED: Set up API key spending limits and budgets you are comfortable with before using this action. Monitor your Anthropic API usage regularly.

Claude Code Watchdog uses the Anthropic API, so each run incurs a cost based on token usage.

Typical Costs Per Run

| Configuration | Input Tokens | Output Tokens | Estimated Cost |
|---------------|--------------|----------------|----------------|
| Analysis only | ~2-3k | ~500-1k | ~$0.20-$0.40 |
| Analysis + Issue creation | ~3-4k | ~1-2k | ~$0.40-$0.60 |
| Analysis + Fixes + PR | ~4-6k | ~2-4k | ~$0.60-$1.20 |
| Complex fixes + Re-run | ~6-8k | ~3-5k | ~$1.00-$1.80 |

Cost Factors

Input tokens (what Claude reads):

  • Context data (runs, commits, issues): ~1-2k tokens
  • Test output files: ~1-3k tokens (varies by test size)
  • Configuration and prompts: ~500 tokens

Output tokens (what Claude generates):

  • Analysis and recommendations: ~500-1k tokens
  • Issue/PR descriptions: ~500-1k tokens
  • Code fixes: ~500-2k tokens (varies by complexity)
  • Multiple fix attempts: Can increase cost

Cost Optimization Tips

  1. Start conservative: Use create_fixes: false initially
  2. Limit scope: Use severity_threshold to avoid low-priority runs
  3. Monitor usage: Check cost estimates in analysis reports and your Anthropic dashboard
  4. Schedule wisely: Run scheduled demo and monitoring workflows less often where possible (e.g., monthly instead of daily)
  5. Debug selectively: Only enable debug_mode when needed
  6. Set spending limits: Configure budget alerts in your Anthropic account
  7. Test cautiously: Start with non-critical workflows to understand actual costs

Monthly Budget Examples

⚠️ These are rough estimates only - your actual costs may be significantly higher or lower

  • Light usage (5 failures/month, analysis only): ~$2-3/month
  • Regular usage (15 failures/month, fixes enabled): ~$8-12/month
  • Heavy usage (30 failures/month, full automation): ~$20-30/month

IMPORTANT: These estimates assume typical test output sizes. Large test suites, verbose logs, or complex codebases can significantly increase token usage and costs.

The action shows actual costs (when available) in console output and detailed breakdowns in analysis reports. Always monitor your Anthropic API usage dashboard for real spending.

Troubleshooting

Common Issues

❌ "GitHub CLI not authenticated"

  • Ensure your workflow has a valid GITHUB_TOKEN
  • Default GITHUB_TOKEN is automatically available in most cases

❌ "Anthropic API key required"

  • Add your API key to repository secrets as ANTHROPIC_API_KEY
  • Verify the secret name matches exactly

❌ "No push permissions - cannot create PRs"

  • Add contents: write and pull-requests: write to your workflow permissions
  • Or set create_fixes: false for analysis-only mode

❌ "No test output files found"

  • Ensure your tests output JUnit XML, JSON reports, or log files
  • Check that test files match the patterns: *test*.xml, *test*.json, etc.
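
A quick way to confirm the files actually exist before the watchdog step runs (the directory here is an example; use your own output path):

- name: Show candidate test outputs
  if: failure()
  run: ls -R test-results || echo "no test-results directory found"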

Getting Help

  1. Check the workflow logs - Artemis provides detailed output about what it's doing
  2. Review permissions - Many issues are permission-related
  3. Validate test outputs - Ensure your tests create parseable output files
  4. Start simple - Begin with create_fixes: false and add features gradually
  5. Use debug mode - Enable debug_mode: true to see exactly what data Claude analyzed

Security Best Practices

SHA Hash Pinning (Recommended for Production)

For maximum security, pin actions to specific commit SHAs instead of using version tags:

# Instead of version tags
- uses: cardscan-ai/claude-code-watchdog@v0.3.2

# Use SHA hash pinning for production
- uses: cardscan-ai/claude-code-watchdog@975fd591cfaa7179bfdedb112558dceca966e87e  # v0.3.2

Why SHA Pinning?

  • Security: Prevents malicious code injection if tags are compromised
  • Immutability: Ensures exact same code runs every time
  • Compliance: Required by many security policies (SLSA, OpenSSF)
  • Reproducibility: Guarantees consistent builds across environments

You can find the SHA for any release on the releases page.

Contributing

We love contributions! Here's how to help:

Reporting Bugs

  • Use the issue template
  • Include workflow logs
  • Describe expected vs actual behavior

Feature Requests

  • Describe your use case
  • Explain how it would help your team
  • Consider if it fits Artemis's core mission

Code Contributions

  • Fork the repository
  • Create a feature branch
  • Add tests for new functionality
  • Ensure all tests pass
  • Submit a pull request

License

MIT License - see LICENSE for details.

About CardScan.ai

This project is maintained by CardScan.ai, makers of AI-powered insurance card scanning and eligibility verification tools.

We built this tool because we run scheduled API tests, WebSocket monitoring, and cross-platform SDK validation that can fail for various reasons. We got tired of waking up to notification storms about flaky tests while real issues got buried in the noise.

Claude Code Watchdog helps us focus on what matters: real bugs and breaking changes, not environment hiccups and timing issues.

Authorship & Development Costs

This entire project was developed using Claude Code, demonstrating the power of AI-assisted software development. No code was written by hand.

Development Statistics:

Total cost:            $21.27
Total duration (API):  1h 38m 6.7s
Total duration (wall): 8h 52m 56.3s
Total code changes:    2064 lines added, 696 lines removed
Token usage by model:
    claude-3-5-haiku:  650.1k input, 20.0k output, 0 cache read, 0 cache write
       claude-sonnet:  1.4k input, 127.6k output, 35.0m cache read, 2.2m cache write

This represents a complete GitHub Action with:

  • Complex GitHub Actions workflow orchestration
  • Node.js scripts for data processing and validation
  • Intelligent duplicate detection and search algorithms
  • Cost monitoring and reporting systems
  • Comprehensive documentation and examples
  • Full error handling and fallback mechanisms

All accomplished through natural language conversations with Claude Code at a cost of $21.27.