# snuper
Asynchronous tooling for collecting sportsbook events, persisting immutable snapshots, and streaming live odds updates.
## Overview
- Asynchronous CLI for gathering sportsbook events and live odds updates.
- Integrates DraftKings, BetMGM, and Bovada, with FanDuel support in progress.
- Targets NBA, NFL, and MLB leagues with an extensible storage interface.
- Requires Python 3.12 or higher.
## Quick Start
The fastest way to get started is using the JSON flatfile sink, which stores data locally without requiring database setup:
```sh
# Install snuper
$ pip install snuper

# Scrape today's games and save to JSON files
$ snuper --task scrape --sink fs --fs-sink-dir ./data

# Monitor live odds, streaming updates to JSONL files
$ snuper --task monitor --sink fs --fs-sink-dir ./data
```
This creates a directory structure like:
```
data/
├── draftkings/
│   ├── events/
│   │   ├── 20240104-nba.json            # Today's NBA game snapshots
│   │   └── 20240104-nfl.json            # Today's NFL game snapshots
│   └── odds/
│       ├── 20240104-nba-401234567.json  # Live odds stream for a game
│       └── 20240104-nfl-401234568.json  # Live odds stream for a game
└── betmgm/
    ├── events/
    └── odds/
```
Sample event snapshot (`events/20240104-nba.json`):

```json
[
  {
    "event_id": "401234567",
    "league": "nba",
    "event_url": "https://sportsbook.draftkings.com/...",
    "start_time": "2024-01-04T19:30:00",
    "away": ["Lakers", "los-angeles-lakers"],
    "home": ["Celtics", "boston-celtics"],
    "selections": {
      "spread_away": {"id": "s_123", "odds": -110, "spread": 5.5},
      "spread_home": {"id": "s_124", "odds": -110, "spread": -5.5}
    }
  }
]
```

Sample odds stream (`odds/20240104-nba-401234567.json`):
{"provider":"draftkings","league":"nba","event":{"event_id":"401234567"},"selection_update":{"s_123":{"odds":-115,"spread":5.5}},"timestamp":"2024-01-04T19:35:22"}
{"provider":"draftkings","league":"nba","event":{"event_id":"401234567"},"selection_update":{"s_124":{"odds":-105,"spread":-5.5}},"timestamp":"2024-01-04T19:35:45"}Usage
## Usage

```text
usage: cli.py [-h] [-p PROVIDER] -t {scrape,monitor} [-c CONFIG] [-l LEAGUE]
              [--fs-sink-dir FS_SINK_DIR]
              [--monitor-interval MONITOR_INTERVAL] [--overwrite]
              [--sink {fs,rds,cache}] [--rds-uri RDS_URI]
              [--rds-table RDS_TABLE] [--cache-uri CACHE_URI]
              [--cache-ttl CACHE_TTL] [--cache-max-items CACHE_MAX_ITEMS]
              [--merge-sportdata-games] [--merge-rollinginsights-games]
              [--merge-all-games] [--log-file LOG_FILE]
              [--log-level LOG_LEVEL] [--max-log-filesize MAX_LOG_FILESIZE]
              [--log-stdout] [--early-exit] [--verbose]

Unified Event Monitor CLI

options:
  -h, --help            show this help message and exit
  -p PROVIDER, --provider PROVIDER
                        Comma-separated list of sportsbook providers (omit to
                        run all)
  -t {scrape,monitor}, --task {scrape,monitor}
                        Operation to perform
  -c CONFIG, --config CONFIG
                        Path to the TOML configuration file
  -l LEAGUE, --league LEAGUE
                        Comma-separated list of leagues to limit (omit for
                        all)
  --fs-sink-dir FS_SINK_DIR
                        Base directory for filesystem snapshots and odds logs
  --monitor-interval MONITOR_INTERVAL
                        Refresh interval in seconds for the DraftKings monitor
  --overwrite           Overwrite existing snapshots instead of skipping
  --sink {fs,rds,cache}
                        Destination sink for selection updates (default: fs)
  --rds-uri RDS_URI     Database connection URI when using the rds sink
  --rds-table RDS_TABLE
                        Table name used by the rds sink
  --cache-uri CACHE_URI
                        Cache connection URI when using the cache sink
  --cache-ttl CACHE_TTL
                        Expiration window in seconds for cache sink entries
  --cache-max-items CACHE_MAX_ITEMS
                        Maximum list length per event stored in the cache sink
  --merge-sportdata-games
                        Match and merge Sportdata games with scraped events
                        before saving (requires --task scrape)
  --merge-rollinginsights-games
                        Match and merge Rolling Insights games with scraped
                        events before saving (requires --task scrape)
  --merge-all-games     Match and merge both Sportdata and Rolling Insights
                        games (equivalent to using both --merge-sportdata-
                        games and --merge-rollinginsights-games)
  --log-file LOG_FILE   Path to log file (default: /tmp/snuper-YYYYmmdd.log)
  --log-level LOG_LEVEL
                        Logging level as a string (debug, info, warning,
                        error, critical) or number 0-50 (default: info)
  --max-log-filesize MAX_LOG_FILESIZE
                        Maximum log file size before rotation with FIFO
                        eviction (default: 10MB, accepts formats like '10MB',
                        '5mb', '100Mb')
  --log-stdout          Log to stdout as well as to --log-file
  --early-exit          Exit monitor after 60 minutes of no live games (EOD
                        detection). Without this flag, monitor runs forever.
  --verbose             Enable verbose logging for monitor and sink operations
                        (e.g., log 'not starting monitor' and 'has 0 live
                        games' messages)
```
- Providers must be supplied using their full names (e.g., `draftkings,betmgm,bovada,fanduel`). Omit `--provider` to run every available scraper or monitor.
- Two task modes: `scrape` collects today's events and saves snapshots; `monitor` streams live odds for events in saved snapshots.
- `--fs-sink-dir` is required when `--sink=fs`; for other sinks a temporary staging directory is created automatically if you omit the flag.
- Select a destination with `--sink {fs,rds,cache}` and supply the matching connection flags (e.g., `--rds-uri`, `--cache-uri`).
- When using `--sink=rds`, pass a SQLAlchemy-compatible URI via `--rds-uri` (for example `postgresql+psycopg://user:pass@host:5432/snuper`) and the destination table name via `--rds-table`. The sink expects the primary table to provide `id`, `provider`, `league`, `event_id`, `data` (JSON/JSONB), and `created_at` columns; a schema sketch follows this list.
- Restrict execution with `--league nba,mlb` for targeted runs.
- Use `--overwrite` to replace existing daily snapshots during a rescrape.
- Use `--config` to specify a TOML configuration file for API keys and other settings.
- Use `--merge-sportdata-games` or `--merge-rollinginsights-games` (or `--merge-all-games` for both) to enrich scraped events with official game data from third-party APIs.
- DraftKings monitors honor `--monitor-interval`; other providers pace themselves.
- Configure logging with `--log-file` (default: `/tmp/snuper-YYYYmmdd.log`), `--log-level` (default: `info`; accepts string levels like `debug` or numeric levels 0-50), and `--max-log-filesize` (default: `10MB`; accepts formats like `10MB`, `5mb`, or `100Mb`). When the log file reaches the maximum size, earlier entries are evicted (FIFO) to keep the file under the limit. Use `--log-stdout` to also write logs to stdout.
- Use `--early-exit` to automatically terminate monitors after 60 minutes with no live games (EOD detection). Without this flag, monitors run indefinitely.
- Use `--verbose` to enable detailed logging for monitor and sink operations, including "not starting monitor" and "has 0 live games" informational messages.
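As a concrete reference for that column contract, here is a minimal SQLAlchemy sketch of a compatible table. The column names come from the list above; the table name, types, and constraints are assumptions, not snuper's actual schema:

```python
# Hypothetical table definition satisfying the rds sink's documented
# column contract; adjust the name and types to your environment.
from sqlalchemy import BigInteger, Column, DateTime, MetaData, Table, Text, func
from sqlalchemy.dialects.postgresql import JSONB

metadata = MetaData()

snuper_events = Table(
    "snuper_events",  # example name; pass the same value via --rds-table
    metadata,
    Column("id", BigInteger, primary_key=True, autoincrement=True),
    Column("provider", Text, nullable=False),   # e.g., "draftkings"
    Column("league", Text, nullable=False),     # e.g., "nba"
    Column("event_id", Text, nullable=False),   # provider's event identifier
    Column("data", JSONB, nullable=False),      # snapshot payload or odds delta
    Column("created_at", DateTime(timezone=True),
           server_default=func.now(), nullable=False),
)
```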
Examples:

```sh
# Scrape today's NBA games from DraftKings to local JSON files
$ snuper --task scrape \
    --provider draftkings \
    --league nba \
    --sink fs \
    --fs-sink-dir ./data

# Scrape all providers and leagues to a PostgreSQL database
$ snuper --task scrape \
    --sink rds \
    --rds-uri postgresql://user:pass@localhost/sports_db \
    --rds-table events \
    --overwrite

# Monitor live odds for all scraped games, save to JSON files
$ snuper --task monitor \
    --sink fs \
    --fs-sink-dir ./data \
    --monitor-interval 30

# Monitor live odds with database persistence
$ snuper --task monitor \
    --sink rds \
    --rds-uri postgresql://user:pass@localhost/sports_db \
    --rds-table events
```

### RDS table naming
When you pass `--rds-table`, the RDS sink uses that value for both daily snapshots and streaming selection changes. The same `--rds-table` value must be supplied to both `--task scrape` and `--task monitor`; mixing different names means the monitor will look in an empty table and skip every event. Pick a prefix you like (for example `snuper_events`) and stick with it for every CLI invocation so both tasks stay in sync.
## Coverage
| League | DraftKings | BetMGM | Bovada | FanDuel |
|---|---|---|---|---|
| NBA | ✅ | ✅ | ✅ | ❌ |
| NFL | ✅ | ✅ | ✅ | ❌ |
| MLB | ✅ | ✅ | ✅ | ❌ |
## Workflows

### scrape
The scrape workflow launches provider-specific collectors that enumerate the day's playable events, normalize metadata (teams, start times, selections), and write snapshots to `<fs_sink_dir>/<provider>/events/YYYYMMDD-<league>.json` when `--sink=fs`. When `--sink=rds` or `--sink=cache`, the same payload is persisted to the selected backend instead (see Output), allowing monitors to bootstrap without local files. Each run emits INFO logs describing how many events are queued for persistence and subsequently saved. Snapshots are timestamped and left untouched by default; rerun with `--overwrite` to replace an existing daily snapshot.
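To make the hand-off to the monitor concrete, the sketch below loads today's filesystem snapshot and collects the selection IDs a monitor would resubscribe to. It is illustrative only: the path follows the layout above and the field names follow the sample snapshot, but the helper itself is not part of snuper:

```python
import json
from datetime import date
from pathlib import Path

def todays_selection_ids(base: Path, provider: str, league: str) -> set[str]:
    """Gather selection IDs from today's snapshot file (hypothetical helper)."""
    stamp = date.today().strftime("%Y%m%d")
    snapshot = base / provider / "events" / f"{stamp}-{league}.json"
    events = json.loads(snapshot.read_text())
    return {
        selection["id"]
        for event in events
        for selection in event["selections"].values()
    }

print(todays_selection_ids(Path("./data"), "draftkings", "nba"))
```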
### monitor

The monitor workflow reads the latest scrape snapshot for each provider and league, reuses the stored selection IDs, and streams live odds into JSONL files under `<fs_sink_dir>/<provider>/odds/`. Runners emit heartbeat entries when the feed is idle so that quiet games remain traceable. When `--sink=rds` is supplied, the same deltas are persisted into the configured table (tagged with the provider) and snapshots are copied to the `<table>_snapshots` companion table for replaying historical states.

Note that writing JSON files to disk is intended as a local development feature.
## Providers

- DraftKings
  - Uses Playwright to enumerate event URLs, persists the spread selections, and connects to a MsgPack websocket stream. The optional `--monitor-interval` flag controls how often the monitor refreshes connection state.
- BetMGM
  - Scrapes its league pages with Playwright, derives team metadata from URLs, and polls the public CDS API on a tight cadence. Odds updates are emitted via DOM snapshots that BaseRunner throttles with heartbeat intervals.
- Bovada
  - Currently fetches events through HTTP coupon endpoints and ingests live odds via the Bovada websocket. Team filters reuse BetMGM slug helpers to keep league detection consistent.
- FanDuel
  - Scraping is scaffolded with Playwright discovery, but selection flattening and monitoring are not yet implemented. Scrape runs succeed with placeholder selections; monitor runs log an informational skip.
## Output
Scrape and monitor operations share a sink interface that persists snapshots and odds deltas. Each sink stores and reloads data a little differently.
### Filesystem sink (`--sink=fs`)

- `scrape` writes snapshot JSON files to `<fs_sink_dir>/<provider>/events/YYYYMMDD-<league>.json`. Existing files are skipped unless `--overwrite` is supplied.
- `monitor` appends newline-delimited JSON records to `<fs_sink_dir>/<provider>/odds/YYYYMMDD-<league>-<event_id>.json`, capturing each selection change in order.
- `load_snapshots` rehydrates events by reading the snapshot JSON under the provider's `events` directory.
### RDS sink (`--sink=rds`)

- `scrape` inserts one row per event into the primary `--rds-table`, filling the `provider`, `league`, `event_id`, `data`, and `created_at` columns and logging the batch size.
- `monitor` inserts each odds delta into the same primary table with the provider annotated and the raw/normalized payload stored in `data`.
- `load_snapshots` fetches the most recent events per league from the primary table (respecting any `--league` filter) before runners reconnect.
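Because every delta lands in one table ordered by `created_at`, replaying an event's odds history is a single query. A rough sketch with SQLAlchemy (the table name `snuper_events`, the URI, and the parameter values are examples, not snuper internals):

```python
# Hypothetical replay of odds deltas for one event from the primary table.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://user:pass@localhost:5432/snuper")

query = text("""
    SELECT data, created_at
    FROM snuper_events
    WHERE provider = :provider AND league = :league AND event_id = :event_id
    ORDER BY created_at
""")

params = {"provider": "draftkings", "league": "nba", "event_id": "401234567"}
with engine.connect() as conn:
    for row in conn.execute(query, params):
        print(row.created_at, row.data)
```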
### Cache sink (`--sink=cache`)

- `scrape` writes the snapshot payload to Redis keys of the form `snuper:snapshots:<provider>:<league>` and tracks the available leagues in `snuper:snapshots:<provider>:leagues`, applying the configured TTL to both.
- `monitor` pushes rolling lists of raw messages and normalized selection updates to `snuper:<league>:<event_id>:raw` and `snuper:<league>:<event_id>:selection`, trimming them to `--cache-max-items` while refreshing the TTL.
- `load_snapshots` reads the cached snapshot JSON for the requested leagues so monitors can bootstrap without disk or database access.
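A consumer can tail the normalized selection list with redis-py. This is a sketch under stated assumptions: the key pattern follows the list above, but whether the newest entry sits at the head or the tail of the list depends on how the sink pushes, so verify the ordering against your data:

```python
import json

import redis  # redis-py client

r = redis.Redis.from_url("redis://localhost:6379/0", decode_responses=True)

# Key pattern per the cache sink docs: snuper:<league>:<event_id>:selection
key = "snuper:nba:401234567:selection"

# Read up to five entries from the head of the rolling list; the sink trims
# the list to --cache-max-items, so it never grows unbounded.
for raw in r.lrange(key, 0, 4):
    print(json.loads(raw))
```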
## Development

- Clone the repository and enter the project directory.

  ```sh
  $ git clone https://github.com/stonehedgelabs/snuper.git
  $ cd snuper
  ```

- Create and activate a Python 3.12 virtual environment.

  ```sh
  $ python3 -m venv .venv
  $ source .venv/bin/activate
  ```

- Install Poetry inside the virtual environment.
- Install project dependencies and fetch the Chromium binary for Playwright.

  ```sh
  $ poetry install
  $ poetry run playwright install chromium
  ```

- Execute the test suite before shipping changes.
## Glossary

- Task – One of `scrape` or `monitor`. Tasks define whether the CLI is collecting schedules or streaming odds.
- Scrape – A task that navigates provider frontends or APIs to discover the upcoming schedule, captures team metadata, and stores selections for later monitoring.
- Monitor – A task that reuses stored selections to ingest live odds via websockets or polling loops, emitting JSONL records with heartbeats for idle games.
- Provider – A sportsbook integration (`draftkings`, `betmgm`, `bovada`, `fanduel`). Providers expose both scrape and monitor entry points when implemented.
- Sink – Destination backend for selection snapshots and odds updates. Choose via `--sink` (`fs`, `rds`, or `cache`) and pair it with the appropriate connection flags.
- Filesystem Sink Directory (`fs_sink_dir`) – Root path used by the filesystem sink for snapshots and odds logs when `--sink=fs`.
- Interval – Optional CLI pacing for DraftKings monitoring; other providers manage loop timing internally (e.g., BetMGM reloads every second).
- Selection – A single wager option returned by a provider (for example, a team's spread or moneyline). Snapshots record selections so monitors can resubscribe accurately.
- Odds – Price information attached to each selection. Providers return American odds (e.g., +150 / -110) and often include decimal odds for comparison (a conversion sketch follows this list).
- Snapshot – A timestamped JSON document containing all events for a league as of the scrape run. Stored under the provider's `events` directory and left untouched unless `--overwrite` is passed.
- Heartbeat – A periodic log entry emitted by monitors to confirm that the connection remains active even when no odds change is detected.
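For reference, American and decimal odds relate by a standard formula: positive American odds `+A` correspond to decimal `1 + A/100`, and negative odds `-A` to `1 + 100/A`. A minimal helper (generic arithmetic, not a snuper function):

```python
def american_to_decimal(odds: int) -> float:
    """Convert American odds to decimal odds via the standard formula."""
    if odds > 0:
        return 1 + odds / 100       # +150 -> 2.50
    return 1 + 100 / abs(odds)      # -110 -> ~1.909

assert american_to_decimal(150) == 2.5
assert round(american_to_decimal(-110), 3) == 1.909
```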