Grok 4.20 beats all other AI models in Alpha Arena test

sammyfans.com

13 points by terryds 12 days ago · 2 comments

gavinray 12 days ago

  > Grok 4.20 reportedly uses real-time data like market trends and news to make fast decisions. 
I assumed all of the models were doing that, using at least Web Search tools.

My hunch is that Grok's other model placed in the top 3 because it has access to Tweets, which are a sentiment-analysis gold mine for ticker symbols.

  • ben_w 11 days ago

    > I assumed all of the models were doing that, using at least Web Search tools.

    Sometimes. The other week I was asking ChatGPT about the UK PM, and had to stop the generation early because it started ~"Prime Minister Rishi Sunak…"

    The unreliability is also why techniques as simple as "ask 5 times and have it take a vote of its own answers" boost performance. The same goes for "thinking" modes, which are approximately just replacing the end token with "Wait." and continuing for ten rounds.
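
    A rough sketch of the "ask 5 times and take a vote" idea, in Python. The names `majority_vote` and `make_scripted_model` are made up for illustration, and `ask` stands in for whatever function actually calls a model; nothing here is a real vendor API.

```python
from collections import Counter
from typing import Callable, Iterable


def majority_vote(ask: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Call `ask` n times and return the most common answer.

    Answers are lowercased and stripped before counting, so trivial
    formatting differences do not split the vote. On a tie, the answer
    that appeared first wins (Counter preserves insertion order).
    """
    answers = [ask(prompt).strip().lower() for _ in range(n)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def make_scripted_model(replies: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


if __name__ == "__main__":
    # Two of five samples are "stale"; the vote still picks the majority.
    ask = make_scripted_model(
        ["current answer", "stale answer", "Current answer ",
         "current answer", "stale answer"]
    )
    print(majority_vote(ask, "some question"))
```

    This only helps when the wrong answers are less frequent than the right one; if the model is consistently wrong, voting just confirms the error.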
