For years I’ve been running Fathom Lite to track visitors on this website. With track I don’t mean in the creepy way, just some privacy respecting stats on which posts get the most traffic. But, Fathom Lite has been archived for years (I didn’t know). With a recent homelab overhaul (more on that in a later post) I replaced it with Umami. It’s MIT-licensed, self-hosted, and lightweight. The challenge: migrating six years of historical data from Fathom over to Umami.
Full disclosure: Claude Code helped me a lot writing, testing and running the script mentioned below. I use Claude extensively these days to manage almost all aspects of my homelab setup.
The problem
Fathom stores hourly aggregates in SQLite: “42 pageviews on /blog/post/ between 14:00-15:00”. Umami stores individual events: one website_event row per pageview, linked to a session. There’s no aggregate import.
My dataset: 277K hourly rows, 666K pageviews, five sites, dating back to March 2020.
The migration script expands each aggregate back into discrete event rows:
- Creates N
website_eventrows per hourly bucket, timestamps spread randomly within the hour - Generates deterministic session UUIDs for the reported visitor count
- Distributes pageviews across sessions round-robin
- Attaches referrer domains from
referrer_statsproportionally
The result is synthetic but re-aggregates correctly in Umami’s dashboard. Page paths, pageview counts, visitor counts, and referrers are preserved exactly. Browser/OS/device data is NULL because Fathom never tracked it.
Key design decisions
Deterministic UUIDs for idempotency
All IDs use uuid5 with a fixed namespace, so re-running produces identical rows. Combined with ON CONFLICT DO NOTHING, the script is safe to re-run after crashes.
NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
def det_uuid(namespace: str, *parts: str) -> str:
return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))
Streaming batches to avoid OOM
The first version buffered all 666K events in memory and OOM-killed the 1GB LXC container than runs Umami. The fix: stream rows from SQLite, flush to PostgreSQL every 2,000 events. Peak memory dropped from ~500MB to ~50MB.
for row in iter_page_stats(fathom_db, site_id):
# expand row into events, append to batch
if len(event_batch) >= batch_size:
flush_batch(pg_cur, session_batch, event_batch)
pg_conn.commit()
session_batch.clear()
event_batch.clear()
Running it
Prerequisites: Python 3.11+, psycopg2, access to both the Fathom SQLite DB and Umami’s PostgreSQL.
# Grab the Fathom database (mine ran on docker)
scp root@<old-host>:/var/lib/docker/volumes/fathom/_data/fathom.db /tmp/fathom.db
# Explore what you have
sqlite3 /tmp/fathom.db "SELECT * FROM sites;"
sqlite3 /tmp/fathom.db "SELECT MIN(ts), MAX(ts), SUM(pageviews) FROM page_stats WHERE site_id = 1;"
Create a website in Umami’s UI and grab its UUID, then:
# Dry run -- reads data, reports what it would insert
python3 migrate-fathom-to-umami.py \
--fathom-db /tmp/fathom.db \
--fathom-site-id 1 \
--umami-website-id <umami-uuid> \
--umami-dsn "postgresql://umami:<password>@<host>:5432/umami" \
--dry-run
# Run for real
python3 migrate-fathom-to-umami.py \
--fathom-db /tmp/fathom.db \
--fathom-site-id 1 \
--umami-website-id <umami-uuid> \
--umami-dsn "postgresql://umami:<password>@<host>:5432/umami"
Counting rows in /tmp/fathom.db (site_id=1)...
277,894 hourly rows, 666,664 total pageviews to expand
Streaming page_stats -> Umami events...
100,639 events / 90,669 sessions (15,231/277,894 rows)
...
Done! 666,664 events, 611,077 sessions written.
Next, verify in Umami by setting the date range to cover your historical data.
The full script
Takeaways
- Dry-run first when generating 666K INSERTs against a production database.
- Stream, don’t buffer. Hundreds of thousands of Python dicts will OOM small containers.
- Deterministic UUIDs = safe re-runs. Crashes mid-migration are a non-issue.
- Aggregates can be expanded. You lose per-session browsing paths, but pageview/visitor/referrer totals survive the round-trip. Good enough for historical trends.