xAI: Grok 4.20 Beta vs xAI: Grok 4.20 Multi-Agent Beta vs xAI: Grok 4.1 Fast vs Google: Gemini 3 Flash Preview | AI BENCHY


AI BENCHY Compare

Compared models

Last updated at: 2026-03-21

| Metric | Grok 4.20 Beta (medium) | Grok 4.20 Multi-Agent Beta (medium) | Grok 4.1 Fast (medium) | Gemini 3 Flash Preview (medium) |
| --- | --- | --- | --- | --- |
| Release | 2026-03-12 | 2026-03-12 | 2025-11-19 | 2025-12-17 |
| Score | 7.9 | 6.2 | 6.9 | 10.0 |
| Rank | #23 | #47 | #38 | #1 |
| Consistency | 9.0 | 7.2 | 7.5 | 10.0 |
| Tests Correct | | | | |
| Attempt pass rate | 72.6% | 54.9% | 66.7% | 100.0% |
| Flaky tests | 2 | 6 | 5 | 0 |
| Total Runs | 51 | 51 | 51 | 51 |
| Cost per result | 5.525 | 82.962 | 0.568 | 0.972 |
| Total Cost | $0.608 | $4.978 | $0.052 | $0.166 |
| Input Price | $2.000 / 1M | $2.000 / 1M | $0.200 / 1M | $0.500 / 1M |
| Output Price | $6.000 / 1M | $6.000 / 1M | $0.500 / 1M | $3.000 / 1M |
| Output Tokens | 1,487 | 298,948 | 1,189 | 1,640 |
| Reasoning Tokens | 87,922 | 296,529 | 84,595 | 48,270 |
| Response Time (avg) | 8.54s | 8.64s | 23.91s | 11.39s |
| Response Time (max) | 24.21s | 35.28s | 121.79s | 50.16s |
| Response Time (total) | 145.26s | 129.64s | 239.09s | 113.86s |
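
One way to read the Score and Total Cost rows together is as a score-per-dollar ratio. The sketch below is illustrative only: the scores and costs are copied from the table above, while the `score_per_dollar` helper and the ratio itself are assumptions made for this example, not a metric that AI BENCHY reports.

```python
# Scores and total costs copied from the comparison table.
# The "score per dollar" ratio is a hypothetical derived metric, not one the site publishes.
models = {
    "Grok 4.20 Beta": {"score": 7.9, "total_cost_usd": 0.608},
    "Grok 4.20 Multi-Agent Beta": {"score": 6.2, "total_cost_usd": 4.978},
    "Grok 4.1 Fast": {"score": 6.9, "total_cost_usd": 0.052},
    "Gemini 3 Flash Preview": {"score": 10.0, "total_cost_usd": 0.166},
}


def score_per_dollar(entry):
    """Return benchmark score divided by total benchmark cost in USD."""
    return entry["score"] / entry["total_cost_usd"]


# Sort model names from most to least score per dollar spent.
ranked = sorted(models, key=lambda name: score_per_dollar(models[name]), reverse=True)

for name in ranked:
    print(f"{name}: {score_per_dollar(models[name]):.1f} score points per dollar")
```

On this ratio the cheapest run ranks first even though it does not have the highest score, so the figure is best read alongside the raw Score row rather than in place of it.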

[Charts not reproduced: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens; Category Breakdown]
