Square Minus Square – A coding agent benchmark
aedm.net
Have you tried giving those agents access to a grounded vision model to analyse that image?
In my experience, most models can't understand such input properly.
I am now experimenting with Molmo2, and it looks promising.