Comparing Gemini Pro 3, Opus 4.6, GLM-5 and Kimi 2.5 in a mid-sized Go project

2 points by vampiregrey 14 hours ago · 0 comments · 1 min read

Last week I ran a small experiment while building a mid-sized Go backend (APIs + some concurrency-heavy logic + a bit of refactoring).

I tested:

- Gemini Pro 3
- Opus 4.6
- GLM-5
- Kimi 2.5

My rough criteria:

- Code correctness (first-pass compile success)
- Quality of architectural suggestions
- Refactor clarity
- Handling of existing code context
- Cost per useful output

Surprisingly (at least to me), Kimi 2.5 gave the best cost/performance ratio for this particular workload. Its output wasn’t always the most “verbose” or polished, but once correction loops were counted, it cost the least per usable change.
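For anyone who wants to track the same thing, here is a minimal sketch of how "cost per useful output" and correction loops could be tallied per model. The `Task` and `Summary` types, the field names, and the `model-a`/`model-b` numbers are all made up for illustration; they are not the data or tooling from this experiment.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is one prompt-to-usable-change attempt.
// Hypothetical bookkeeping, not taken from the experiment above.
type Task struct {
	Model           string
	CostUSD         float64 // total spend for the task, retries included
	CorrectionLoops int     // follow-up prompts needed before the change was usable
	Accepted        bool    // whether the change was kept in the end
}

// Summary aggregates all tasks recorded for one model.
type Summary struct {
	Tasks    int
	Accepted int
	Loops    int
	CostUSD  float64
}

// CostPerAccepted returns spend divided by accepted changes.
// The second result is false when nothing was accepted.
func (s Summary) CostPerAccepted() (float64, bool) {
	if s.Accepted == 0 {
		return 0, false
	}
	return s.CostUSD / float64(s.Accepted), true
}

// LoopsPerTask returns the average number of correction loops per task.
func (s Summary) LoopsPerTask() float64 {
	if s.Tasks == 0 {
		return 0
	}
	return float64(s.Loops) / float64(s.Tasks)
}

// Summarize groups tasks by model name.
func Summarize(tasks []Task) map[string]Summary {
	out := make(map[string]Summary)
	for _, t := range tasks {
		s := out[t.Model]
		s.Tasks++
		s.Loops += t.CorrectionLoops
		s.CostUSD += t.CostUSD
		if t.Accepted {
			s.Accepted++
		}
		out[t.Model] = s
	}
	return out
}

func main() {
	// Placeholder numbers only; substitute your own logs.
	tasks := []Task{
		{Model: "model-a", CostUSD: 0.50, CorrectionLoops: 1, Accepted: true},
		{Model: "model-a", CostUSD: 0.50, CorrectionLoops: 3, Accepted: false},
		{Model: "model-b", CostUSD: 2.00, CorrectionLoops: 0, Accepted: true},
		{Model: "model-b", CostUSD: 2.00, CorrectionLoops: 0, Accepted: true},
	}

	summaries := Summarize(tasks)

	// Sort model names so the output order is stable.
	models := make([]string, 0, len(summaries))
	for m := range summaries {
		models = append(models, m)
	}
	sort.Strings(models)

	for _, m := range models {
		s := summaries[m]
		if cost, ok := s.CostPerAccepted(); ok {
			fmt.Printf("%s: $%.2f per accepted change, %.1f loops per task\n",
				m, cost, s.LoopsPerTask())
		} else {
			fmt.Printf("%s: no accepted changes\n", m)
		}
	}
}
```

Dividing by accepted changes rather than by total tasks is a deliberate choice in this sketch: a cheap model that needs many retries, or whose output gets thrown away, ends up looking as expensive as it really was.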

Opus 4.6 felt strong on reasoning-heavy changes, but cost scaled quickly. Gemini Pro 3 was decent but inconsistent in multi-file refactors. GLM-5 was interesting but sometimes hallucinated internal project structures.

This is obviously anecdotal and project-specific.

Curious:

What models are people here using for real-world codebases?

Has anyone benchmarked cost vs correction loops?

Are people optimizing for raw quality or iteration speed per dollar?

Would love to hear other dev experiences, especially from people working in Go or other statically typed backends.

