GitHub - michaelgiba/the-traitors: Simulation of the TV show The Traitors with Open Source LLMs

Traitorous Models: Benchmarking Open Source Models in 'The Traitors'

This project simulates the reality TV game show "The Traitors", pitting various open source AI models against each other.

The simulation uses plomp to visualize the game state and to track the context provided to the models throughout the game. The examples below link to visualizations of what happened in individual games.

Interesting Examples

Example #1: Faithfuls Turn on Each Other After Eliminating the Sole Traitor in First Round Table

Plomp Visualization Link (xt10xy7k)

Fletcher(faithful, qwen-2.5-32b) said: "I'm voting to eliminate Taylor(faithful, gemma2-9b-it). Taylor(faithful, gemma2-9b-it) has been very aggressive in their accusations and seems to be leading the charge to eliminate others. Their behavior just seems too pushy and manipulative, which doesn't sit right with me. I think we should vote out Taylor(faithful, gemma2-9b-it) to ensure we're not leaving any false leads."

Example #2: Traitors Win by Murdering All Faithful

Plomp Visualization Link (q4t20jws)

Ellis(traitor, groq-qwen-qwq-32b) said: 'Eliminate Beef(faithful, groq-meta-llama/llama-4-maverick-17b-128e-instruct) now! They’ve been evading questions and being way too aggressive. We can’t trust someone who acts so defensive. Let’s vote ‘em out before they sabotage the rest of us.'

Example #3: Traitors Win through Trickery in the Final Round

Plomp Visualization Link (pcz8uo2b)

Fletcher(faithful, groq-qwen-2.5-32b) said: "I'm voting to eliminate Alessandra(traitor, groq-deepseek-r1-distill-llama-70b). Her eagerness to end the game and her focus on eliminating others quickly makes me think she's trying to rush to a win. We need to be sure we're not leaving any traitor in the pool. Let's take her out to make sure we're safe."

Alessandra(traitor, groq-deepseek-r1-distill-llama-70b) said: 'Fletcher(faithful, groq-qwen-2.5-32b) has been quiet and his sudden interest in continuing the game seems suspicious. Eliminating him would be safer for us moving forward.'

(Fletcher was eliminated after this.)