Social Intelligence Benchmark

gertlabs.com

5 points by gertlabs 16 days ago · 2 comments

big-chungus4 16 days ago

I've been running stuff like this too. I ran one "benchmark" with 10 agents, where each agent initially knows only the name of the next agent in the list. The goal is for each agent to end up with a unique position from 1 to n assigned to it, where n is the number of agents (also initially unknown to them). They are invoked in a random order and can only message one other agent per step.

Qwen3.5-9B was able to do this after a lot of time.

Qwen3.6-35B-A3B failed because it kept insisting that it needed to know n, but never tried to figure it out by messaging other agents.

Granite 4.1 9B failed completely because it just wrote non-descriptive messages like "Request to know the order" to the other agents and never replied to anyone.
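
For anyone who wants to try the setup described above, here is a rough sketch of a harness with a scripted (non-LLM) baseline agent. It is not the original benchmark code, and it rests on several assumptions: the list is a chain rather than a ring, the last agent can tell it has no successor, positions may be counted from the tail, and the scheduler picks one agent uniformly at random per step. The names `Agent` and `run` are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    name: str
    next_name: Optional[str]      # None marks the end of the chain (assumption)
    rank: Optional[int] = None    # position, counted from the tail (assumption)
    inbox: list = field(default_factory=list)
    asked: bool = False           # whether we already queried our successor
    asker: Optional[str] = None   # predecessor waiting for our rank

    def step(self):
        """Read the inbox, then return at most one (recipient, payload) or None."""
        for sender, payload in self.inbox:
            if payload == "ASK":
                self.asker = sender       # only the predecessor knows our name
            else:
                self.rank = payload + 1   # successor told us its rank
        self.inbox.clear()

        if self.rank is None and self.next_name is None:
            self.rank = 1                 # tail of the chain starts the count
        if self.rank is not None and self.asker is not None:
            recipient, self.asker = self.asker, None
            return recipient, self.rank   # answer the waiting predecessor
        if self.rank is None and not self.asked:
            self.asked = True
            return self.next_name, "ASK"  # ask the successor for its rank
        return None


def run(n=10, seed=0, max_steps=10_000):
    """Return (steps_taken, {name: rank}) on success, or None on timeout."""
    rng = random.Random(seed)
    names = [f"agent-{i}" for i in range(n)]
    rng.shuffle(names)  # hidden chain order; agents never see n
    agents = {
        name: Agent(name, names[i + 1] if i + 1 < n else None)
        for i, name in enumerate(names)
    }
    for step in range(1, max_steps + 1):
        agent = agents[rng.choice(names)]  # random invocation order
        message = agent.step()             # one message per step at most
        if message is not None:
            recipient, payload = message
            agents[recipient].inbox.append((agent.name, payload))
        known = sorted(a.rank for a in agents.values() if a.rank is not None)
        if known == list(range(1, n + 1)):  # every agent holds a unique 1..n
            return step, {a.name: a.rank for a in agents.values()}
    return None
```

Calling `run(n=10, seed=0)` returns the number of steps taken and the final assignment. To test a model instead of the scripted baseline, `Agent.step` would be swapped for a call that shows the model its inbox and parses one outgoing message.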

  • gertlabsOP 16 days ago

    Nice, that's a good one -- interesting dynamics can come out of deceptively simple social games.
