Square Minus Square – A coding agent benchmark

25 points by Topfi a month ago · 1 comment

wariatus a month ago

Have you tried giving those agents access to a grounded vision model to analyse that image?

In my experience, most models can’t understand such input properly.

I am now experimenting with Molmo2, and it looks promising.
