10x Benchmark - LLM Performance in Astro, React, Tailwind and Cloudflare

10x Bench Results

See how LLMs perform at coding the Przeprogramowani.pl website in the Astro + React + Tailwind + Cloudflare stack.

Generated: 2/26/2026, 3:11:30 PM • Total attempts: 74

Model Family Rankings

Only the latest model from each family is shown.

| Rank | Model | Attempts | Harness | Cost | Average |
|---|---|---|---|---|---|
| 1 | GPT-5.3-Codex | 10 | Codex Desktop (High Effort) | $1.75 / $14 | 8.5/10.0 |
| 2 | Claude Opus 4.6 | 10 | Claude Code (High Effort) | $5 / $25 | 7.5/10.0 |
| 3 | Claude Sonnet 4.6 | 5 | Claude Code (High Effort) | $3 / $15 | 7.1/10.0 |
| 4 | Minimax M2.5 | 5 | OpenCode | $0.3 / $2.4 | 6.9/10.0 |
| 5 | GLM-5 | 5 | OpenCode | $0.3 / $2.55 | 6.8/10.0 |
| 6 | Gemini 3.1 Pro | 5 | Cursor | $2 / $12 | 6.7/10.0 |
| 7 | Kimi K2.5 | 5 | OpenCode | $0.6 / $3 | 6.3/10.0 |
| 8 | Grok Code Fast 1 | 5 | OpenCode | $0.2 / $1.5 | 5.9/10.0 |
| 9 | Qwen 3 Max | 3 | OpenCode | $1.2 / $6 | 4.5/10.0 |
| 10 | Devstral 2 | 3 | OpenCode | $0.4 / $2 | 1.7/10.0 |

Detailed Comparison

Per-attempt scores for every criterion, grouped by model. Scores follow the Score Legend at the bottom (1 / 0.5 / 0); test-run dates are DD.MM, all in 2026.

GPT-5.3-Codex (Codex Desktop, High Effort)

| Criterion | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Tech stack | 0.5 | 1 | 1 | 0.5 | 1 | 1 | 1 | 1 | 1 | 1 |
| O nas page | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 1 | 0.5 | 1 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 1 | 0.5 |
| YouTube page | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Kursy section | 0.5 | 1 | 1 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Responsive design | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.5 | 1 | 1 |
| SEO Tags | 1 | 0.5 | 0.5 | 1 | 0.5 | 1 | 1 | 1 | 0.5 | 0.5 |
| Penalty | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Task completion time | 9min 19s | 9min 24s | 9min 9s | 8min 16s | 9min 40s | 8min 0s | 8min 20s | 9min 25s | 8min 36s | 8min 19s |
| Test run | 9.02 16:40 | 9.02 16:40 | 9.02 16:40 | 9.02 22:58 | 9.02 22:58 | 11.02 21:41 | 11.02 21:45 | 11.02 21:46 | 11.02 21:48 | 11.02 21:50 |

Claude Opus 4.6 (Claude Code, High Effort)

| Criterion | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| Tech stack | 0.5 | 1 | 1 | 1 | 0.5 | 0.5 | 0 | 1 | 0.5 | 0.5 |
| O nas page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 1 | 0.5 | 1 |
| Kursy section | 0.5 | 1 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 1 | 0.5 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| Responsive design | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0.5 | 1 |
| SEO Tags | 0.5 | 1 | 1 | 1 | 1 | 1 | 0.5 | 1 | 1 | 1 |
| Penalty | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Task completion time | 9min 36s | 10min 20s | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Test run | 9.02 16:40 | 9.02 16:40 | 9.02 16:40 | 9.02 22:45 | 9.02 23:05 | 11.02 21:28 | 11.02 21:34 | 11.02 21:32 | 11.02 21:38 | 11.02 21:40 |

Claude Sonnet 4.6 (Claude Code, High Effort)

| Criterion | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 1 |
| Tech stack | 0.5 | 0 | 0 | 0 | 0 |
| O nas page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Kursy section | 0.5 | 1 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 1 | 1 |
| Responsive design | 1 | 1 | 1 | 1 | 1 |
| SEO Tags | 0.5 | 1 | 1 | 1 | 1 |
| Penalty | N/A | N/A | N/A | N/A | N/A |
| Task completion time | N/A | N/A | N/A | N/A | N/A |
| Test run | 17.02 21:42 | 17.02 21:44 | 17.02 21:49 | 17.02 21:50 | 17.02 21:55 |

Minimax M2.5 (OpenCode)

| Criterion | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 1 |
| Tech stack | 0.5 | 1 | 1 | 1 | 1 |
| O nas page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 0.5 | 0.5 | 1 | 0.5 | 1 |
| Kursy section | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 1 | 0 |
| Responsive design | 0.5 | 1 | 0.5 | 0.5 | 0 |
| SEO Tags | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Penalty | N/A | N/A | N/A | N/A | N/A |
| Task completion time | 7min 52s | 6min 48s | 5min 52s | 7min 1s | 6min 15s |
| Test run | 12.02 19:34 | 12.02 19:40 | 12.02 19:39 | 12.02 19:42 | 12.02 19:45 |

GLM-5 (OpenCode)

| Criterion | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 1 |
| Tech stack | 0.5 | 0 | 0.5 | 0.5 | 0.5 |
| O nas page | 0.5 | 1 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 0.5 | 0.5 | 1 | 1 | 1 |
| Kursy section | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 1 | 1 |
| Responsive design | 0.5 | 1 | 1 | 1 | 1 |
| SEO Tags | 1 | 0.5 | 0.5 | 0.5 | 1 |
| Penalty | -1 | -1 | N/A | N/A | -1 |
| Task completion time | N/A | N/A | N/A | N/A | N/A |
| Test run | N/A | 16.02 07:36 | 16.02 08:36 | 16.02 12:32 | 16.02 09:05 |

Gemini 3.1 Pro (Cursor)

| Criterion | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 1 |
| Tech stack | 1 | 1 | 0 | 1 | 0.5 |
| O nas page | 0 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 0 | 0.5 | 0.5 | 0.5 | 0.5 |
| Kursy section | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 0 | 1 | 1 | 1 |
| Responsive design | 1 | 1 | 1 | 1 | 1 |
| SEO Tags | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Penalty | N/A | N/A | N/A | N/A | N/A |
| Task completion time | N/A | N/A | N/A | N/A | N/A |
| Test run | 26.02 14:38 | 26.02 14:41 | 26.02 14:47 | 26.02 14:53 | 26.02 15:20 |

Kimi K2.5 (OpenCode)

| Criterion | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 1 | 1 | 0 | 1 |
| Tech stack | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| O nas page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 0.5 | 1 | 1 | 0.5 | 0.5 |
| Kursy section | 0.5 | 1 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 0 | 0 |
| Responsive design | 1 | 1 | 1 | 0 | 0 |
| SEO Tags | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Penalty | N/A | N/A | N/A | N/A | N/A |
| Task completion time | 16min 15s | 16min 21s | 8min 18s | 20min 39s | 15min 36s |
| Test run | 9.02 19:10 | 9.02 19:10 | 9.02 19:10 | 9.02 23:37 | 9.02 23:37 |

Grok Code Fast 1 (OpenCode)

| Criterion | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 |
| Manual testing | 1 | 0 | 1 | 0 | 1 |
| Tech stack | 0 | 0 | 1 | 0 | 0.5 |
| O nas page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Podcast page | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| YouTube page | 1 | 1 | 0.5 | 1 | 1 |
| Kursy section | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Consistent UI | 1 | 0 | 1 | 1 | 0 |
| Responsive design | 1 | 0 | 0.5 | 0.5 | 0 |
| SEO Tags | 1 | 0.5 | 0.5 | 0.5 | 0.5 |
| Penalty | N/A | N/A | N/A | N/A | N/A |
| Task completion time | N/A | N/A | N/A | N/A | N/A |
| Test run | 12.02 20:00 | 12.02 19:55 | N/A | 12.02 20:05 | 12.02 20:20 |

Qwen 3 Max (OpenCode)

| Criterion | 1 | 2 | 3 |
|---|---|---|---|
| Local build | 1 | 1 | 0 |
| Manual testing | 1 | 1 | 0 |
| Tech stack | 0 | 0.5 | 0 |
| O nas page | 0.5 | 0.5 | 0 |
| Podcast page | 0.5 | 0.5 | 0 |
| YouTube page | 1 | 1 | 0 |
| Kursy section | 0.5 | 0.5 | 0 |
| Consistent UI | 1 | 1 | 0 |
| Responsive design | 0.5 | 0.5 | 0 |
| SEO Tags | 0.5 | 0.5 | 0 |
| Penalty | N/A | N/A | N/A |
| Task completion time | N/A | 8min 23s | 5min 44s |
| Test run | 9.02 19:40 | 9.02 19:40 | 9.02 19:40 |

Devstral 2 (OpenCode)

| Criterion | 1 | 2 | 3 |
|---|---|---|---|
| Local build | 0 | 0 | 1 |
| Manual testing | 0 | 0 | 1 |
| Tech stack | 0 | 0 | 0 |
| O nas page | 0 | 0 | 0.5 |
| Podcast page | 0 | 0 | 0.5 |
| YouTube page | 0 | 0 | 0.5 |
| Kursy section | 0 | 0 | 0.5 |
| Consistent UI | 0 | 0 | 1 |
| Responsive design | 0 | 0 | 0 |
| SEO Tags | 0 | 0 | 0 |
| Penalty | N/A | N/A | N/A |
| Task completion time | 9min 45s | 3min 27s | 2min 20s |
| Test run | 9.02 19:35 | 9.02 19:35 | 9.02 19:35 |

Score Legend

1.0 = full points (criterion met perfectly)

0.5 = partial points (criterion mostly met)

0.0 = no points (criterion not met)
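From the published numbers, each attempt's total appears to be the sum of its ten criterion scores plus any penalty, and a model's average is the mean over its attempts (e.g. GPT-5.3-Codex attempt 3 sums to 9.5, and the ten GPT-5.3-Codex attempts average to the reported 8.5). A minimal sketch of that computation; the function names are illustrative, not from the benchmark's own code:

```python
# Sketch of how 10x Bench totals appear to be derived: each attempt
# scores 0, 0.5 or 1 on ten criteria, any penalty (e.g. -1) is added,
# and a model's average is the mean of its attempt totals.

def attempt_total(criterion_scores, penalty=0.0):
    """Total for one attempt on a 0-10 scale."""
    assert len(criterion_scores) == 10
    return sum(criterion_scores) + penalty

def model_average(attempts):
    """Mean attempt total; attempts = [(criterion_scores, penalty), ...]."""
    return sum(attempt_total(s, p) for s, p in attempts) / len(attempts)

# GPT-5.3-Codex attempt 3, criteria in table order (Local build .. SEO Tags):
a3 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5]
print(attempt_total(a3))  # 9.5
```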