Migrating from Fathom Lite to Umami

3 min read Original article ↗

For years I’ve been running Fathom Lite to track visitors on this website. With track I don’t mean in the creepy way, just some privacy respecting stats on which posts get the most traffic. But, Fathom Lite has been archived for years (I didn’t know). With a recent homelab overhaul (more on that in a later post) I replaced it with Umami. It’s MIT-licensed, self-hosted, and lightweight. The challenge: migrating six years of historical data from Fathom over to Umami.

Full disclosure: Claude Code helped me a lot writing, testing and running the script mentioned below. I use Claude extensively these days to manage almost all aspects of my homelab setup.

The problem

Fathom stores hourly aggregates in SQLite: “42 pageviews on /blog/post/ between 14:00-15:00”. Umami stores individual events: one website_event row per pageview, linked to a session. There’s no aggregate import.

My dataset: 277K hourly rows, 666K pageviews, five sites, dating back to March 2020.

The migration script expands each aggregate back into discrete event rows:

  1. Creates N website_event rows per hourly bucket, timestamps spread randomly within the hour
  2. Generates deterministic session UUIDs for the reported visitor count
  3. Distributes pageviews across sessions round-robin
  4. Attaches referrer domains from referrer_stats proportionally

The result is synthetic but re-aggregates correctly in Umami’s dashboard. Page paths, pageview counts, visitor counts, and referrers are preserved exactly. Browser/OS/device data is NULL because Fathom never tracked it.

Key design decisions

Deterministic UUIDs for idempotency

All IDs use uuid5 with a fixed namespace, so re-running produces identical rows. Combined with ON CONFLICT DO NOTHING, the script is safe to re-run after crashes.

NAMESPACE_UUID = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def det_uuid(namespace: str, *parts: str) -> str:
    return str(uuid.uuid5(NAMESPACE_UUID, f"{namespace}:{'|'.join(parts)}"))

Streaming batches to avoid OOM

The first version buffered all 666K events in memory and OOM-killed the 1GB LXC container than runs Umami. The fix: stream rows from SQLite, flush to PostgreSQL every 2,000 events. Peak memory dropped from ~500MB to ~50MB.

for row in iter_page_stats(fathom_db, site_id):
    # expand row into events, append to batch
    if len(event_batch) >= batch_size:
        flush_batch(pg_cur, session_batch, event_batch)
        pg_conn.commit()
        session_batch.clear()
        event_batch.clear()

Running it

Prerequisites: Python 3.11+, psycopg2, access to both the Fathom SQLite DB and Umami’s PostgreSQL.

# Grab the Fathom database (mine ran on docker)
scp root@<old-host>:/var/lib/docker/volumes/fathom/_data/fathom.db /tmp/fathom.db

# Explore what you have
sqlite3 /tmp/fathom.db "SELECT * FROM sites;"
sqlite3 /tmp/fathom.db "SELECT MIN(ts), MAX(ts), SUM(pageviews) FROM page_stats WHERE site_id = 1;"

Create a website in Umami’s UI and grab its UUID, then:

# Dry run -- reads data, reports what it would insert
python3 migrate-fathom-to-umami.py \
    --fathom-db /tmp/fathom.db \
    --fathom-site-id 1 \
    --umami-website-id <umami-uuid> \
    --umami-dsn "postgresql://umami:<password>@<host>:5432/umami" \
    --dry-run

# Run for real
python3 migrate-fathom-to-umami.py \
    --fathom-db /tmp/fathom.db \
    --fathom-site-id 1 \
    --umami-website-id <umami-uuid> \
    --umami-dsn "postgresql://umami:<password>@<host>:5432/umami"
Counting rows in /tmp/fathom.db (site_id=1)...
  277,894 hourly rows, 666,664 total pageviews to expand
Streaming page_stats -> Umami events...
  100,639 events / 90,669 sessions (15,231/277,894 rows)
  ...
Done! 666,664 events, 611,077 sessions written.

Next, verify in Umami by setting the date range to cover your historical data.

The full script

Takeaways

  • Dry-run first when generating 666K INSERTs against a production database.
  • Stream, don’t buffer. Hundreds of thousands of Python dicts will OOM small containers.
  • Deterministic UUIDs = safe re-runs. Crashes mid-migration are a non-issue.
  • Aggregates can be expanded. You lose per-session browsing paths, but pageview/visitor/referrer totals survive the round-trip. Good enough for historical trends.