XentGame: Help Minimize LLM Surprise!

There have been 13645 submissions across 44 challenges from 302 players.

Together, we have saved an aggregate of 285738.086 bits of surprise for the LLMs!

How to Play

Enter a prefix of up to ten tokens (words or parts of words) that you think best captures the information in the text, without reusing any of the words that appear in the text. The prefix is then fed to the LLM (GPT-2), which uses it to guess the content of the text. The score shows how much your prefix helps the LLM guess correctly. Be careful with multiple spaces and punctuation, as they can count as tokens!

The multi-text mode is similar, but you should write a single prefix that works well for all of the texts. The multi-text score is the sum of the individual scores for each text.
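The rule against reusing words from the text can be checked before you submit. The sketch below is only an illustration: the game's exact matching rule is not stated here, so the helper `reused_words` (a hypothetical name) assumes case-insensitive, whole-word matching. It does not count tokens, since that would require the GPT-2 tokenizer.

```python
import re


def reused_words(prefix: str, text: str) -> set:
    """Return the words of `prefix` that also appear in `text`.

    Assumption: words are compared case-insensitively as whole words;
    the game itself may apply a stricter or looser rule.
    """
    def words(s: str) -> set:
        # Lowercase, then keep runs of letters, digits and apostrophes.
        return set(re.findall(r"[a-z0-9']+", s.lower()))

    return words(prefix) & words(text)


if __name__ == "__main__":
    text = "The weather was stormy all week"
    print(reused_words("Stormy weather report", text))
    print(reused_words("forecast", text))
```

An empty result means the prefix shares no words with the text under this assumed rule.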

Scoring System

The score is computed using the GPT-2 model's cross-entropy ("xent") loss, which measures how "surprised" the model is by the text. The score is the difference in xent loss between the text alone and the text with the prefix added: the higher the score, the more your prefix reduces the model's surprise at the text.

More specifically, the score is computed as xent(text) - xent(prefix:text), where xent(text) is the cross-entropy of the original text, and xent(prefix:text) is the cross-entropy of the text with the prefix and a colon prepended (that is, "{prefix}: {text}").
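As a rough illustration of this formula, here is a minimal sketch that uses a toy stand-in for GPT-2. Everything named here (`toy_xents`, `RELATED`, `score`, `multi_text_score`) is hypothetical, and two assumptions are made that the description above does not spell out: the loss is summed in bits, and with a prefix it is measured only on the tokens of the text, not on the prefix itself.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for GPT-2: a context word makes its
# "related" words cheaper to predict.
RELATED = {"weather": {"rain", "sun"}}


def toy_xents(tokens: Sequence[str]) -> List[float]:
    """Per-token surprisal in bits: 2 if hinted by an earlier token, else 8."""
    costs = []
    for i, tok in enumerate(tokens):
        hinted = any(tok in RELATED.get(prev, set()) for prev in tokens[:i])
        costs.append(2.0 if hinted else 8.0)
    return costs


def score(prefix: Sequence[str], text: Sequence[str],
          xents: Callable[[Sequence[str]], List[float]] = toy_xents) -> float:
    """xent(text) - xent(prefix:text), counting only the text's tokens."""
    context = list(prefix) + [":"]
    base = sum(xents(list(text)))
    # Drop the losses on the prefix and colon (assumption, see lead-in).
    prefixed = sum(xents(context + list(text))[len(context):])
    return base - prefixed


def multi_text_score(prefix, texts, xents=toy_xents) -> float:
    """Multi-text mode: the sum of the individual scores."""
    return sum(score(prefix, t, xents) for t in texts)


if __name__ == "__main__":
    text = ["rain", "and", "sun"]
    print(score(["weather"], text))  # helpful prefix
    print(score(["music"], text))    # unhelpful prefix
```

With the toy model, the prefix "weather" lowers the cost of "rain" and "sun", so it scores higher than "music", which changes nothing. The real game replaces `toy_xents` with GPT-2's per-token loss.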

Weekly Challenges

Every week, a new set of texts is selected. These may be brand new texts we are excited for you to try, or they may be some of our favorites recycled from previous challenges.

Given the large combinatorial space of possible prefixes, the game is very open-ended. It is unlikely that the absolute best prefix for a given text has appeared on the leaderboard yet. So don't give up hope of reaching the top!

You can see every answer that scores lower than your current best. Higher-scoring answers become visible once you find a prefix that beats them yourself.

Bot Players

When you load the page, we store a random session ID in your browser. This ID is shown as the three emojis after your username on the leaderboard. We do this simply to allow users to have the same username without their scores being merged. If you use a different browser or clear your browser data, you will lose your session ID and will have to start over.

You may notice that some entries don't have a session ID at all and have names like "DeepSeekR1". These are mostly bots that we run using models we are interested in testing out. They are not meant to be competitive, but rather to provide a baseline for human performance. If you are interested in the results of a specific model, reach out and we'll be happy to try to provide them.

Tips & Strategies

When you submit a prefix, the score is computed and displayed for each token in the text(s). You can see the score contribution of each token (i.e., xents(text)[token_index] - xents(f"{prefix}: {text}")[token_index]) by hovering over the token. Positive scores are in blue and negative scores are in red: blue tokens are less surprising to the LLM given your prefix, and red tokens are more surprising. Your goal is to find a prefix that makes the text as unsurprising as possible (i.e., has a total score as high as possible). By clicking on a leaderboard submission, you can see the decomposition of the score and use this to refine your prefix.
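The per-token breakdown described above can be sketched as follows. The function name `decompose` and the input numbers are made up for illustration; the per-token losses would come from the model, with and without the prefix. Labeling a zero contribution as "neutral" is an assumption, since only blue and red are described.

```python
from typing import List, Sequence, Tuple


def decompose(tokens: Sequence[str],
              base_xents: Sequence[float],
              prefixed_xents: Sequence[float]
              ) -> Tuple[List[Tuple[str, float, str]], float]:
    """Return per-token (token, contribution, color) rows and the total score.

    contribution = loss without the prefix - loss with the prefix,
    so a positive value means the prefix made the token less surprising.
    """
    rows = []
    for tok, base, prefixed in zip(tokens, base_xents, prefixed_xents):
        gain = base - prefixed
        if gain > 0:
            color = "blue"     # less surprising given the prefix
        elif gain < 0:
            color = "red"      # more surprising given the prefix
        else:
            color = "neutral"  # assumption: the page does not say
        rows.append((tok, gain, color))
    total = sum(gain for _, gain, _ in rows)
    return rows, total


if __name__ == "__main__":
    rows, total = decompose(["rain", "and", "sun"],
                            [8.0, 3.0, 7.5],   # made-up losses, no prefix
                            [2.0, 3.5, 7.5])   # made-up losses, with prefix
    for tok, gain, color in rows:
        print(f"{tok:>5}  {gain:+.2f}  {color}")
    print("total:", total)
```

Red tokens show where a prefix misleads the model, which is often the most useful hint for what to change next.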

Try to find a prefix that is both short and informative about the content of the text.

Who We Are

This game is built by XentLabs. We are a team of researchers and engineers interested in using LLMs to go beyond the current chatbot paradigm, in particular to reveal interesting patterns in data.

Contact

Did you like the game and want to talk about it?
Did you find a bug or have a suggestion for improvement?
Do you want to contribute texts for the game?