
snuper

Asynchronous tooling for collecting sportsbook events, persisting immutable snapshots, and streaming live odds updates.

Overview

  • Asynchronous CLI for gathering sportsbook events and live odds updates.
  • Integrates DraftKings, BetMGM, Bovada, and FanDuel (FanDuel monitoring is still in progress).
  • Targets NBA, NFL, and MLB leagues with an extensible storage interface.
  • Requires Python 3.12 or higher.

Quick Start

The fastest way to get started is using the JSON flatfile sink, which stores data locally without requiring database setup:

# Install snuper
$ pip install snuper

# Scrape today's games and save to JSON files
$ snuper --task scrape --sink fs --fs-sink-dir ./data

# Monitor live odds, streaming updates to JSONL files
$ snuper --task monitor --sink fs --fs-sink-dir ./data

This creates a directory structure like:

data/
├── draftkings/
│   ├── events/
│   │   ├── 20240104-nba.json      # Today's NBA game snapshots
│   │   └── 20240104-nfl.json      # Today's NFL game snapshots
│   └── odds/
│       ├── 20240104-nba-401234567.json  # Live odds stream for game
│       └── 20240104-nfl-401234568.json  # Live odds stream for game
└── betmgm/
    ├── events/
    └── odds/

Sample event snapshot (events/20240104-nba.json):

[
  {
    "event_id": "401234567",
    "league": "nba",
    "event_url": "https://sportsbook.draftkings.com/...",
    "start_time": "2024-01-04T19:30:00",
    "away": ["Lakers", "los-angeles-lakers"],
    "home": ["Celtics", "boston-celtics"],
    "selections": {
      "spread_away": {"id": "s_123", "odds": -110, "spread": 5.5},
      "spread_home": {"id": "s_124", "odds": -110, "spread": -5.5}
    }
  }
]

Sample odds stream (odds/20240104-nba-401234567.json):

{"provider":"draftkings","league":"nba","event":{"event_id":"401234567"},"selection_update":{"s_123":{"odds":-115,"spread":5.5}},"timestamp":"2024-01-04T19:35:22"}
{"provider":"draftkings","league":"nba","event":{"event_id":"401234567"},"selection_update":{"s_124":{"odds":-105,"spread":-5.5}},"timestamp":"2024-01-04T19:35:45"}

Usage

usage: cli.py [-h] [-p PROVIDER] -t {scrape,monitor} [-c CONFIG] [-l LEAGUE]
              [--fs-sink-dir FS_SINK_DIR]
              [--monitor-interval MONITOR_INTERVAL] [--overwrite]
              [--sink {fs,rds,cache}] [--rds-uri RDS_URI]
              [--rds-table RDS_TABLE] [--cache-uri CACHE_URI]
              [--cache-ttl CACHE_TTL] [--cache-max-items CACHE_MAX_ITEMS]
              [--merge-sportdata-games] [--merge-rollinginsights-games]
              [--merge-all-games] [--log-file LOG_FILE]
              [--log-level LOG_LEVEL] [--max-log-filesize MAX_LOG_FILESIZE]
              [--log-stdout] [--early-exit] [--verbose]

Unified Event Monitor CLI

options:
  -h, --help            show this help message and exit
  -p PROVIDER, --provider PROVIDER
                        Comma-separated list of sportsbook providers (omit to
                        run all)
  -t {scrape,monitor}, --task {scrape,monitor}
                        Operation to perform
  -c CONFIG, --config CONFIG
                        Path to the TOML configuration file
  -l LEAGUE, --league LEAGUE
                        Comma-separated list of leagues to limit (omit for
                        all)
  --fs-sink-dir FS_SINK_DIR
                        Base directory for filesystem snapshots and odds logs
  --monitor-interval MONITOR_INTERVAL
                        Refresh interval in seconds for the DraftKings monitor
  --overwrite           Overwrite existing snapshots instead of skipping
  --sink {fs,rds,cache}
                        Destination sink for selection updates (default: fs)
  --rds-uri RDS_URI     Database connection URI when using the rds sink
  --rds-table RDS_TABLE
                        Table name used by the rds sink
  --cache-uri CACHE_URI
                        Cache connection URI when using the cache sink
  --cache-ttl CACHE_TTL
                        Expiration window in seconds for cache sink entries
  --cache-max-items CACHE_MAX_ITEMS
                        Maximum list length per event stored in the cache sink
  --merge-sportdata-games
                        Match and merge Sportdata games with scraped events
                        before saving (requires --task scrape)
  --merge-rollinginsights-games
                        Match and merge Rolling Insights games with scraped
                        events before saving (requires --task scrape)
  --merge-all-games     Match and merge both Sportdata and Rolling Insights
                        games (equivalent to using both --merge-sportdata-
                        games and --merge-rollinginsights-games)
  --log-file LOG_FILE   Path to log file (default: /tmp/snuper-YYYYmmdd.log)
  --log-level LOG_LEVEL
                        Logging level as a string (debug, info, warning,
                        error, critical) or number 0-50 (default: info)
  --max-log-filesize MAX_LOG_FILESIZE
                        Maximum log file size before rotation with FIFO
                        eviction (default: 10MB, accepts formats like '10MB',
                        '5mb', '100Mb')
  --log-stdout          Log to stdout as well as to --log-file
  --early-exit          Exit monitor after 60 minutes of no live games (EOD
                        detection). Without this flag, monitor runs forever.
  --verbose             Enable verbose logging for monitor and sink operations
                        (e.g., log 'not starting monitor' and 'has 0 live
                        games' messages)
  • Providers must be supplied using their full names (e.g., draftkings, betmgm, bovada, fanduel). Omit --provider to run every available scraper or monitor.
  • Two task modes:
    • scrape: Collect today's events and save snapshots
    • monitor: Stream live odds for events in saved snapshots
  • --fs-sink-dir is required when --sink=fs; for other sinks a temporary staging directory is created automatically if you omit the flag.
  • Select a destination with --sink {fs,rds,cache} and supply the matching connection flags (e.g., --rds-uri, --cache-uri).
  • When using --sink=rds, pass a SQLAlchemy-compatible URI via --rds-uri (for example postgresql+psycopg://user:pass@host:5432/snuper) and the destination table name via --rds-table. The sink expects the primary table to provide id, provider, league, event_id, data (JSON/JSONB), and created_at columns; a schema sketch follows this list.
  • Restrict execution with --league nba,mlb for targeted runs.
  • Use --overwrite to replace existing daily snapshots during a rescrape.
  • Use --config to specify a TOML configuration file for API keys and other settings.
  • Use --merge-sportdata-games or --merge-rollinginsights-games (or --merge-all-games for both) to enrich scraped events with official game data from third-party APIs.
  • DraftKings monitors honor --monitor-interval; other providers pace themselves.
  • Configure logging with --log-file (default: /tmp/snuper-YYYYmmdd.log), --log-level (default: info, accepts string levels like debug or numeric levels 0-50), and --max-log-filesize (default: 10MB, accepts formats like 10MB, 5mb, or 100Mb). When the log file reaches the maximum size, earlier logs are evicted (FIFO behavior) to keep the file size under the limit. Use --log-stdout to also output logs to stdout.
  • Use --early-exit to automatically terminate monitors after 60 minutes of no live games (EOD detection). Without this flag, monitors run indefinitely.
  • Use --verbose to enable detailed logging for monitor and sink operations, including "not starting monitor" and "has 0 live games" informational messages.
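
For reference, here is a minimal sketch of a table compatible with the rds sink, written with SQLAlchemy. The column names come from the note above; the concrete types, and the choice of JSONB, are assumptions for a PostgreSQL target:

from sqlalchemy import (BigInteger, Column, DateTime, MetaData, String, Table,
                        create_engine, func)
from sqlalchemy.dialects.postgresql import JSONB

metadata = MetaData()

# Columns match what the rds sink expects: id, provider, league,
# event_id, data (JSON/JSONB), created_at. Types are assumptions.
events = Table(
    "events",  # pass the same name via --rds-table
    metadata,
    Column("id", BigInteger, primary_key=True, autoincrement=True),
    Column("provider", String, nullable=False),
    Column("league", String, nullable=False),
    Column("event_id", String, nullable=False),
    Column("data", JSONB, nullable=False),
    Column("created_at", DateTime, server_default=func.now()),
)

engine = create_engine("postgresql+psycopg://user:pass@host:5432/snuper")
metadata.create_all(engine)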

Examples:

# Scrape today's NBA games from DraftKings to local JSON files
$ snuper --task scrape \
  --provider draftkings \
  --league nba \
  --sink fs \
  --fs-sink-dir ./data

# Scrape all providers and leagues to a PostgreSQL database
$ snuper --task scrape \
  --sink rds \
  --rds-uri postgresql://user:pass@localhost/sports_db \
  --rds-table events \
  --overwrite

# Monitor live odds for all scraped games, save to JSON files
$ snuper --task monitor \
  --sink fs \
  --fs-sink-dir ./data \
  --monitor-interval 30

# Monitor live odds with database persistence
$ snuper --task monitor \
  --sink rds \
  --rds-uri postgresql://user:pass@localhost/sports_db \
  --rds-table events

RDS table naming

When you pass --rds-table, the RDS sink uses that value for both daily snapshots and streaming selection changes. The same --rds-table value must be supplied to both --task scrape and --task monitor; mixing different names means the monitor will look in an empty table and skip all events. Pick a prefix you like (for example snuper_events) and stick with it for every CLI invocation so both tasks stay in sync.

Coverage

Coverage spans the NBA, NFL, and MLB across DraftKings, BetMGM, Bovada, and FanDuel. FanDuel currently supports scraping only; its monitor is not yet implemented (see Providers).

Workflows

scrape

The scrape workflow launches provider-specific collectors that enumerate the day’s playable events, normalize metadata (teams, start times, selections), and write snapshots to <fs_sink_dir>/<provider>/events/YYYYMMDD-<league>.json when --sink=fs. When --sink=rds or --sink=cache, the same payload is persisted to the selected backend instead (see Output), allowing monitors to bootstrap without local files. Each run emits INFO logs describing how many events are queued for persistence and subsequently saved. Snapshots are date-stamped; existing files are skipped on rerun unless --overwrite is supplied.
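
As a rough sketch of the filesystem behavior described above (function names and structure here are illustrative, not snuper's internals):

import json
from datetime import date
from pathlib import Path

def snapshot_path(fs_sink_dir: str, provider: str, league: str) -> Path:
    # <fs_sink_dir>/<provider>/events/YYYYMMDD-<league>.json
    return Path(fs_sink_dir) / provider / "events" / f"{date.today():%Y%m%d}-{league}.json"

def save_snapshot(events: list[dict], path: Path, overwrite: bool = False) -> bool:
    if path.exists() and not overwrite:
        return False  # existing snapshots are skipped unless --overwrite
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(events, indent=2))
    return True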

monitor

The monitor workflow reads the latest scrape snapshot for each provider and league, reuses the stored selection IDs, and streams live odds into JSONL files under <fs_sink_dir>/<provider>/odds/. Runners emit heartbeat entries when the feed is idle so that quiet games remain traceable. When --sink=rds is supplied, the same deltas are persisted into the configured table (tagged with the provider), and snapshots are copied to the <table>_snapshots companion for replaying historical states.

Note that writing JSON files to disk is intended as a local development feature.
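
Conceptually, each monitor reduces to a loop that appends deltas as they arrive and writes a heartbeat when the feed goes quiet. A simplified sketch; the heartbeat record shape, interval, and function names are illustrative assumptions:

import json
import time
from datetime import datetime

def monitor_loop(fetch_updates, out_path, heartbeat_secs=60):
    """fetch_updates() -> list of selection-update dicts (empty when idle)."""
    last_activity = time.monotonic()
    with open(out_path, "a") as f:
        while True:
            for update in fetch_updates():
                f.write(json.dumps(update) + "\n")
                last_activity = time.monotonic()
            if time.monotonic() - last_activity >= heartbeat_secs:
                # Emit a heartbeat so quiet games remain traceable.
                f.write(json.dumps({"heartbeat": datetime.now().isoformat()}) + "\n")
                last_activity = time.monotonic()
            f.flush()
            time.sleep(1)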

Providers

  • DraftKings
    • Uses Playwright to enumerate event URLs, persists the spread selections, and connects to a MsgPack websocket stream. The optional --monitor-interval flag controls how often the monitor refreshes connection state (a discovery sketch follows this list).
  • BetMGM
    • Scrapes its league pages with Playwright, derives team metadata from URLs, and polls the public CDS API on a tight cadence. Odds updates are emitted via DOM snapshots that BaseRunner throttles with heartbeat intervals.
  • Bovada
    • Currently fetches events through HTTP coupon endpoints and ingests live odds via the Bovada websocket. Team filters reuse BetMGM slug helpers to keep league detection consistent.
  • FanDuel
    • Scraping is scaffolded with Playwright discovery, but selection flattening and monitoring are not yet implemented. Scrape runs succeed with placeholder selections; monitor runs log an informational skip.
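
To illustrate the Playwright-based discovery step mentioned for DraftKings and BetMGM, this is roughly what enumerating event URLs looks like. The league page URL and the link selector are assumptions, not snuper's actual values:

from playwright.sync_api import sync_playwright

# Collect candidate event URLs from a league page (URL and selector are guesses).
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://sportsbook.draftkings.com/leagues/basketball/nba")
    hrefs = page.eval_on_selector_all(
        "a[href*='/event/']", "els => els.map(e => e.href)"
    )
    browser.close()

event_urls = sorted(set(hrefs))
print(f"discovered {len(event_urls)} event URLs")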

Output

Scrape and monitor operations share a sink interface that persists snapshots and odds deltas. Each sink stores and reloads data a little differently.
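
One way to picture that contract is as a small interface. In this sketch only load_snapshots is a documented name (it appears in the sink notes below); the other method names are assumptions:

from typing import Protocol

class Sink(Protocol):
    """Shared contract for fs, rds, and cache sinks (names partly assumed)."""

    def save_snapshot(self, provider: str, league: str, events: list[dict]) -> None:
        """Persist the day's event snapshot for a provider/league."""
        ...

    def save_selection_update(self, provider: str, league: str,
                              event_id: str, delta: dict) -> None:
        """Persist one streamed odds delta for a live event."""
        ...

    def load_snapshots(self, provider: str, leagues: list[str]) -> list[dict]:
        """Rehydrate stored events so monitors can resubscribe."""
        ...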

Filesystem sink (--sink=fs)

  • scrape writes snapshot JSON files to <fs_sink_dir>/<provider>/events/YYYYMMDD-<league>.json. Existing files are skipped unless --overwrite is supplied.
  • monitor appends newline-delimited JSON records to <fs_sink_dir>/<provider>/odds/YYYYMMDD-<league>-<event_id>.json, capturing each selection change in order.
  • load_snapshots rehydrates events by reading the snapshot JSON under the provider's events directory.

RDS sink (--sink=rds)

  • scrape inserts one row per event into the primary --rds-table, filling the provider, league, event_id, data, and created_at columns and logging the batch size.
  • monitor inserts each odds delta into the same primary table with the provider annotated and the raw/normalized payload stored in data.
  • load_snapshots fetches the most recent events per league from the primary table (respecting any --league filter) before runners reconnect.

Cache sink (--sink=cache)

  • scrape writes the snapshot payload to Redis keys of the form snuper:snapshots:<provider>:<league> and tracks the available leagues in snuper:snapshots:<provider>:leagues, applying the configured TTL to both.
  • monitor pushes rolling lists of raw messages and normalized selection updates to snuper:<league>:<event_id>:raw and snuper:<league>:<event_id>:selection, trimming them to --cache-max-items while refreshing the TTL.
  • load_snapshots reads the cached snapshot JSON for the requested leagues so monitors can bootstrap without disk or database access.
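
The rolling-list behavior maps directly onto standard Redis list commands. A minimal sketch with redis-py; the key pattern follows the bullets above, while the function name and defaults are assumptions:

import json
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def push_selection_update(league, event_id, delta, ttl=300, max_items=1000):
    # snuper:<league>:<event_id>:selection holds a rolling list of updates.
    key = f"snuper:{league}:{event_id}:selection"
    pipe = r.pipeline()
    pipe.lpush(key, json.dumps(delta))
    pipe.ltrim(key, 0, max_items - 1)  # cap list length (--cache-max-items)
    pipe.expire(key, ttl)              # refresh the TTL (--cache-ttl)
    pipe.execute()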

Development

$ git clone https://github.com/stonehedgelabs/snuper.git
$ cd snuper
  • Clone the repository and enter the project directory.
$ python3 -m venv .venv
$ source .venv/bin/activate
  • Create and activate a Python 3.12 virtual environment.
$ pip install poetry
  • Install Poetry inside the virtual environment.
$ poetry install
$ poetry run playwright install chromium
  • Install project dependencies and fetch the Chromium binary for Playwright.
$ poetry run pytest
  • Execute the test suite before shipping changes (pytest is assumed here; substitute the project's configured runner if it differs).

Glossary

  • Task – One of scrape or monitor. Tasks define whether the CLI is collecting schedules or streaming odds.
  • Scrape – A task that navigates provider frontends or APIs to discover the upcoming schedule, captures team metadata, and stores selections for later monitoring.
  • Monitor – A task that reuses stored selections to ingest live odds via websockets or polling loops, emitting JSONL records with heartbeats for idle games.
  • Provider – A sportsbook integration (draftkings, betmgm, bovada, fanduel). Providers expose both scrape and monitor entry points when implemented.
  • Sink – Destination backend for selection snapshots and odds updates. Choose via --sink (fs, rds, or cache) and pair it with the appropriate connection flags.
  • Filesystem Sink Directory (fs_sink_dir) – Root path used by the filesystem sink for snapshots and odds logs when --sink=fs.
  • Interval – Optional CLI pacing for DraftKings monitoring; other providers manage loop timing internally (e.g., BetMGM reloads every second).
  • Selection – A single wager option returned by a provider (for example, a team’s spread or moneyline). Snapshots record selections so monitors can resubscribe accurately.
  • Odds – Price information attached to each selection. Providers return American odds (e.g., +150 / -110) and often include decimal odds for comparison.
  • Snapshot – A timestamped JSON document containing all events for a league as of the scrape run. Stored under the provider’s events directory and skipped on rescrape unless --overwrite is supplied.
  • Heartbeat – A periodic log entry emitted by monitors to confirm that the connection remains active even when no odds change is detected.