Building a Group TV Recommendation Engine


AI sucks at following instructions; as tokens increase, its accuracy drops. Ultimately, it lets you down. Like a hyped-up kid on too much addy with 17 different distractions, let’s just say it tends to drift a little from your initial intent.

What it’s great at is objectively analyzing against a set of principles.

In practical terms: "go find me a list of TV shows matching these criteria" gets you vastly inferior results to "here's a bunch of metadata about a TV show - render a verdict based on the following criteria."

Watch/Skip verdicts rolling in for this week's popular shows.
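To make the second style of prompt concrete, here is a minimal sketch in Python. It is not the extension's actual code: the function names, the criteria strings, and the WATCH/SKIP reply format are all assumptions made up for illustration. The point is only that each call sees one show's metadata plus a fixed set of criteria, and nothing else.

```python
import json

# Hypothetical criteria for illustration; the real ones live in host.md.
CRITERIA = [
    "Prefer tightly plotted limited series",
    "Skip reality TV",
]


def build_verdict_prompt(show: dict, criteria: list[str]) -> str:
    """Build a prompt asking for a verdict on ONE show, given its metadata."""
    rules = "\n".join(f"- {c}" for c in criteria)
    metadata = json.dumps(show, indent=2, sort_keys=True)
    return (
        "You are judging a single TV show for a group of friends.\n"
        f"Criteria:\n{rules}\n\n"
        f"Show metadata (JSON):\n{metadata}\n\n"
        "Answer WATCH or SKIP on the first line, then one sentence of reasoning."
    )


def parse_verdict(reply: str) -> str:
    """Map the model's reply to 'Watch', 'Skip', or 'Unknown'."""
    lines = reply.strip().splitlines()
    first = lines[0].strip().upper() if lines else ""
    if first.startswith("WATCH"):
        return "Watch"
    if first.startswith("SKIP"):
        return "Skip"
    return "Unknown"
```

Keeping the reply format this rigid is a design choice for the sketch: a short, fixed first line is easy to parse and leaves the model little room to drift.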

Why I built it

I have a Discord group of friends that regularly hangs out together - we play games, stream movies, watch TV. Normal stuff, except we watch a lot, and sometimes it’s difficult to figure out what to watch next. Classic.

There’s so much TV, all coming out on different schedules, on different streaming services. The TV show databases out there only give a generic answer to what’s popular. Asking our homie Claudius directly only gets you info about some hits based on last year’s data - I’ve tried. Not helpful.

What if I could somehow divine our group preferences based on what we’ve watched previously and judge any new TV that comes out against it? So I built it - with swamp.

How it works

Three pieces, one workflow.

The composition

  • The host - the recommendation engine, generated by distilling about 20MB of yappity from the past couple of years. It decides: Watch or Skip?
  • Rotten Tomatoes for the list of what’s new and what’s popular
  • The Movie Database to enrich those listings with metadata

The whole pipeline is one workflow: scrape the browse page, enrich each show via TMDB, fan out one Claude call per show with host.md baked into the prompt, aggregate the verdicts into the table.
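The shape of that workflow can be sketched in plain Python. This is an illustration under assumptions, not how swamp runs it: `run_pipeline` and the `fake_*` stand-ins are hypothetical names, and the thread pool is just one way to fan out one call per show. In the real thing the three callables would be the RT scraper, the TMDB enricher, and the Claude call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_pipeline(
    scrape: Callable[[], list[str]],
    enrich: Callable[[str], dict],
    judge: Callable[[dict, str], str],
    persona: str,
    max_workers: int = 4,
) -> list[dict]:
    """Scrape titles, enrich each, fan out one judge call per show, aggregate."""
    titles = scrape()
    shows = [enrich(title) for title in titles]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so verdicts line up with shows.
        verdicts = list(pool.map(lambda show: judge(show, persona), shows))
    return [
        {"title": show["title"], "verdict": verdict}
        for show, verdict in zip(shows, verdicts)
    ]


# Stand-ins so the sketch runs without network access or API keys.
def fake_scrape() -> list[str]:
    return ["Show A", "Show B"]


def fake_enrich(title: str) -> dict:
    genres = ["Drama"] if title == "Show A" else ["Reality"]
    return {"title": title, "genres": genres}


def fake_judge(show: dict, persona: str) -> str:
    return "Skip" if "Reality" in show["genres"] else "Watch"


if __name__ == "__main__":
    table = run_pipeline(fake_scrape, fake_enrich, fake_judge, "(contents of host.md)")
    for row in table:
        print(f"{row['title']}: {row['verdict']}")
```

Each judge call gets exactly one show plus the persona, which is the whole trick: small, independent evaluations instead of one long instruction-following session.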

The results

The holy grail - an answer! In the shape of a swamp report. Best part? It’s all versioned, query-able data, stored as an artifact of the run. Next time you run Claude you can say - “We watched X, what was left on the list?” And it can just look it up. Or you can.
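Answering "what was left on the list?" is a simple filter once the verdicts are stored as data. The sketch below assumes each report row is a dict with `title` and `verdict` keys; that shape is hypothetical, and the real swamp report schema may differ.

```python
def left_on_list(rows: list[dict], watched: set[str]) -> list[str]:
    """Return titles with a Watch verdict that the group has not watched yet."""
    seen = {title.casefold() for title in watched}
    return [
        row["title"]
        for row in rows
        if row["verdict"] == "Watch" and row["title"].casefold() not in seen
    ]


# Hypothetical report rows, shaped like the assumption above.
report = [
    {"title": "Show A", "verdict": "Watch"},
    {"title": "Show B", "verdict": "Skip"},
    {"title": "Show C", "verdict": "Watch"},
]
remaining = left_on_list(report, watched={"show a"})
```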

Run it yourself

The extension is on Swamp Club as @keeb/mms. You’ll need swamp, Node + Playwright for the RT scraper, a TMDB key, and an Anthropic key.


swamp extension pull @keeb/mms
npm install playwright


swamp vault put homelab tmdb-api-key sk-...
swamp vault put homelab anthropic-api-key sk-ant-...


swamp model create @keeb/rottentomatoes rottentomatoes
swamp model create @keeb/anthropic/claude host-evaluator
swamp model create @keeb/mms/tv-recommender tv-recommender

Then edit each instance’s globalArguments. The scraper needs your TMDB key (it’s a model arg, not just used by the enricher) and runs headless:


type: '@keeb/rottentomatoes'
globalArguments:
  tmdbApiKey: '${{ vault.get("homelab", "tmdb-api-key") }}'
  headless: true

host-evaluator is just @keeb/anthropic/claude pointed at your Anthropic key. Pick the model and cap the tokens — verdicts are short:


type: '@keeb/anthropic/claude'
globalArguments:
  apiKey: '${{ vault.get("homelab", "anthropic-api-key") }}'
  model: claude-sonnet-4-5-20250929
  maxTokens: 400

tv-recommender needs nothing — leave it empty and it uses the host.md bundled with the extension. Point personaPath at a file in your repo to bring your own:


type: '@keeb/mms/tv-recommender'
globalArguments:

Then run it:

swamp workflow run rt-popular --input limit=20
swamp data get --workflow rt-popular report-keeb-mms-tv-recommendations

Swap rt-popular for rt-newest if you’d rather evaluate what just dropped instead of what’s trending. Replace the bundled host.md with your own group’s profile and you’ve got a recommender that knows your people.

What are you watching?