Pencil Puzzle Bench

| Rank | Model | Provider | Score 1 | Score 2 | Cost |
|---|---|---|---|---|---|
| 1 | gpt-5.4@xhigh | OpenAI | -- | 70.2% | $8.0758 |
| 2 | gpt-5.2@xhigh | OpenAI | 27.0% | 56.0% | $5.0702 |
| 3 | gpt-5.2@high | OpenAI | 20.7% | 36.7% | $1.6400 |
| 4 | claude-opus-4-6-1m | Anthropic | 0.0% | 36.7% | $1.3548 |
| 5 | claude-opus-4-6@thinking | Anthropic | 27.3% | 33.3% | $3.8474 |
| 6 | gemini-3.1-pro | Google | 20.0% | 33.3% | $3.4593 |
| 7 | claude-opus-4-6 | Anthropic | 0.3% | 30.0% | $1.2089 |
| 8 | claude-sonnet-4-6@thinking | Anthropic | 10.3% | 26.7% | $0.8668 |
| 9 | gpt-5.2-pro | OpenAI | 9.7% | 26.7% | $6.2333 |
| 10 | gpt-5.2@medium | OpenAI | 9.3% | 23.3% | $0.6855 |
| 11 | claude-opus-4-6@max | Anthropic | 0.3% | 23.3% | $1.0277 |
| 12 | claude-sonnet-4-6-1m | Anthropic | 0.3% | 23.3% | $15.4337 |
| 13 | gemini-3-pro@high | Google | 3.3% | 16.7% | $1.2710 |
| 14 | claude-sonnet-4-6 | Anthropic | 0.3% | 16.7% | $0.8372 |
| 15 | gemini-3-pro | Google | 4.3% | 13.3% | $0.8876 |
| 16 | gemini-3-pro@minimal | Google | 4.0% | 10.0% | $1.2281 |
| 17 | gpt-5.2@low | OpenAI | 2.3% | 10.0% | $0.1186 |
| 18 | gpt-5.1@medium | OpenAI | 7.7% | 6.7% | $0.2779 |
| 19 | claude-opus-4-5@thinking | Anthropic | 6.0% | 6.7% | $1.1401 |
| 20 | gemini-3-flash@minimal | Google | 4.7% | 6.7% | $0.1708 |
| 21 | gemini-3-flash@high | Google | 3.0% | 6.7% | $0.1670 |
| 22 | gpt-5@medium | OpenAI | 6.0% | 3.3% | $1.1862 |
| 23 | kimi-k2.5 | Moonshot | 6.0% | 3.3% | $0.3549 |
| 24 | grok-4-1-fast | xAI | 5.7% | 3.3% | $0.0415 |
| 25 | grok-4-1-fast-reasoning | xAI | 5.3% | 0.0% | $0.0563 |
| 26 | o3 | OpenAI | 3.0% | 3.3% | $0.3995 |
| 27 | minimax-m2.5 | Minimax | 0.7% | 3.3% | $0.2405 |
| 28 | claude-opus-4-5-high | Anthropic | 0.3% | 3.3% | $0.8425 |
| 29 | claude-sonnet-4-5 | Anthropic | 0.0% | 3.3% | $1.1334 |
| 30 | claude-sonnet-4-5@thinking | Anthropic | 2.3% | 0.0% | $1.0436 |
| 31 | deepseek-v3.2-speciale | DeepSeek | 2.0% | -- | $0.1012 |
| 32 | deepseek-v3.2 | DeepSeek | 2.0% | 0.0% | $0.1815 |
| 33 | kimi-k2-thinking | Moonshot | 1.3% | 0.0% | $0.2710 |
| 34 | o1 | OpenAI | 0.7% | 0.0% | $0.8292 |
| 35 | qwen3.5-397b-a17b | Qwen | 0.7% | 0.0% | $0.0741 |
| 36 | glm-5 | Zhipu | 0.7% | 0.0% | $0.8609 |
| 37 | gemini-2.5-pro | Google | 0.3% | 0.0% | $0.4337 |
| 38 | gpt-5.2 | OpenAI | 0.3% | 0.0% | $0.0618 |
| 39 | minimax-m2.1 | Minimax | 0.3% | 0.0% | $0.2290 |
| 40 | gpt-oss-120b | OpenAI | 0.3% | -- | $0.0021 |
| 41 | qwen3-235b-a22b-thinking-2507 | Qwen | 0.3% | 0.0% | $0.0780 |
| 42 | qwen3-next-80b-a3b-thinking | Qwen | 0.3% | 0.0% | $0.2464 |
| 43 | qwen3-vl-235b-a22b-thinking | Qwen | 0.3% | -- | $0.0612 |
| 44 | mimo-v2-flash | Xiaomi | 0.3% | 0.0% | $0.0985 |
| 45 | glm-4.7 | Zhipu | 0.3% | 0.0% | $0.2265 |
| 46 | grok-code-fast-1 | xAI | 0.3% | 0.0% | $0.2578 |
| 47 | gpt-3.5-turbo | OpenAI | 0.0% | 0.0% | $0.0015 |
| 48 | gpt-4.1 | OpenAI | 0.0% | 0.0% | $19.5512 |
| 49 | gpt-4o | OpenAI | 0.0% | 0.0% | $5.0630 |
| 50 | devstral-2512 | Mistral | 0.0% | 0.0% | $0.0188 |
| 51 | mistral-large-2512 | Mistral | 0.0% | 0.0% | $3.5163 |
| 52 | qwen3-coder | Qwen | 0.0% | 0.0% | $0.0632 |