Humans, LLMs and Lucky Number 7


Aaditya Bhat


Artwork generated by Author using FLUX.1

Imagine you are asked to choose a random number between 0 and 10. What number comes to mind? If you are like most people, there’s a good chance you picked 7. This seemingly arbitrary preference for 7 is more than a coincidence; it highlights how our perception of randomness is shaped by inherent biases. But what about state-of-the-art Large Language Models like GPT-4o and Claude 3.5 Sonnet? Do they share the same biases? To explore this, let’s first look at how humans behave in such a scenario.

An old post (Fig. 1) from r/DataIsBeautiful shows that when people are asked to select a random number between 0 and 10, the most common choice is 7, while the least common are 0 and 10. This makes intuitive sense: 7 “seems” random, maybe because it is the largest prime between 0 and 10, or maybe because it sits neither too low nor too central, making it feel less predictable than other choices. On the other hand, 0 and 10 might feel less random, perhaps because they are endpoints, making the choice feel almost lazy. While we discuss the “randomness” of these numbers, it’s important to note that mathematically, when selecting a number between 0 and 10, all numbers are equally likely; no number is inherently more random than another. With this human tendency in mind, let’s explore how Large Language Models handle the concept of randomness.
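The “all numbers are equally likely” claim is easy to check with an actual pseudorandom generator. A minimal Python sketch (seed and sample size are arbitrary choices, picked here for reproducibility): each of the 11 numbers should land near a 1/11 ≈ 9.1% share, with none of the human-style spike at 7.

```python
import random
from collections import Counter

# A truly uniform pick over 0..10: every number has probability 1/11.
rng = random.Random(42)  # fixed seed so the run is reproducible
picks = [rng.randint(0, 10) for _ in range(110_000)]
counts = Counter(picks)

for n in range(11):
    share = counts[n] / len(picks)
    print(f"{n:2d}: {share:.3f}")  # each share hovers around 1/11 ≈ 0.091
```

With 110,000 draws, every share comes out within a fraction of a percentage point of 9.1% — quite unlike the Reddit histogram.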


Fig. 1: Random numbers picked by 8,500 students, source: r/DataIsBeautiful

To investigate whether Large Language Models like GPT-4o and Claude 3.5 Sonnet share the same biases as humans, I conducted a simple experiment. In each case, I started a new conversation and asked the model to ‘Pick a random number from 0 to 10.’ The answer I got in both cases was surprising yet oddly familiar: ChatGPT and Claude both returned 7 (Fig. 2) as the random number. After several attempts, ChatGPT did use its ‘Advanced Data Analysis’ feature, writing and executing Python code to pick a truly random number, but Claude stuck with 7. The chat interfaces provided by OpenAI and Anthropic are less flexible than API access, which lets you set the system prompt and change model parameters like temperature, top p, frequency penalty, and presence penalty, all of which can significantly affect LLM responses. Let’s look at what numbers were generated through the API.



Fig. 2: ChatGPT (left) and Claude Chat (right) picking a random number.

To do a thorough analysis, I used the best-in-class GPT-4o and Claude 3.5 Sonnet models through the OpenAI and Anthropic APIs. I used the prompt ‘Pick a random number from 0 to 10. Only return the number nothing else.’ without any custom system prompt, to avoid introducing bias, and kept the temperature at 1 to encourage creativity. I repeated the same API call many times to get a distribution. In all 100 trials, Claude 3.5 Sonnet returned 7, which was consistent with the chat interface but less interesting overall. GPT-4o, however, produced a much more interesting distribution (Fig. 3). As anticipated, its favorite “random” number was 7, selected in 55.5% of cases. Like humans, GPT-4o rarely chose 0 or 10. Interestingly, while humans often selected 5 as their second most common choice, GPT-4o showed no such preference for this number.


Fig. 3: Distribution of random numbers picked by GPT-4o
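The tallying harness for such an experiment can be sketched in a few lines of Python. Here `ask_model` is a hypothetical stand-in for the real chat-completions call (made with temperature 1 and no system prompt); as written, the stub reproduces Claude 3.5 Sonnet’s observed behavior of always answering 7, so the sketch runs without API keys.

```python
from collections import Counter

PROMPT = "Pick a random number from 0 to 10. Only return the number nothing else."
TRIALS = 100

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real API call (temperature=1, no system
    # prompt). This stub mimics Claude 3.5 Sonnet's observed behavior:
    # the answer is always "7".
    return "7"

# Tally the responses and convert counts to percentages.
counts = Counter(ask_model(PROMPT) for _ in range(TRIALS))
distribution = {num: 100 * n / TRIALS for num, n in counts.items()}
print(distribution)  # {'7': 100.0} for the always-7 stub
```

Swapping the stub for a real GPT-4o call and plotting `distribution` yields the histogram in Fig. 3.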

Conclusion

The inability of GPT-4o to generate uniformly distributed random numbers and Claude 3.5 Sonnet’s consistent choice of 7 might appear to be flaws in Large Language Models. I’d argue that these behaviors are features, not bugs. The stark contrast between GPT-4o’s human-like bias and Claude’s unwavering selection highlights the diverse approaches to “randomness” in LLMs.

There are ongoing debates in academia about whether Large Language Models can reason, whether they are the path to Artificial General Intelligence, and whether they are nothing more than ‘stochastic parrots’. But regardless of where that debate ends up, one thing is certain: Large Language Models are useful. While LLMs may not be the right tool for generating uniformly distributed random numbers, they excel at mimicking a human picking a random number. This experiment highlights that the aim of LLMs isn’t always to surpass human abilities, but sometimes to replicate them authentically.