GLM-5 topped the coding benchmarks. Then I used it

5 points by couAUIA 4 months ago · 1 comment

couAUIA (OP) 4 months ago

TL;DR: GLM-5 tops coding benchmarks. I tested it on an unpublished NP-hard optimization problem (KIRO) and on the 89-task Terminal-Bench. Best case: competitive. Typical case: 30% invalid output, every trial timed out, and two identical runs could produce either a valid solution or complete garbage. Zhipu AI reports 56% on Terminal-Bench; I got 40%.
