Social Intelligence Benchmark

5 points by gertlabs 16 days ago · 2 comments


I've been running stuff like this too. I ran one "benchmark" with 10 agents, where each agent initially knows only the name of the next agent in the list. The goal is for each agent to end up with a unique number from 1 to n assigned to it, where n is the number of agents (also initially unknown to them). They are invoked in a random order and can only message one other agent per step.
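For anyone who wants to try the setup, here is a rough sketch of how such a game could be simulated. It is not the actual harness: the LLM agents are replaced by a hand-written scripted policy, and it assumes the list wraps around (the last agent's "next" is the first), which the description above does not specify. The function name `run_ring_benchmark`, the agent names, and the "smallest name gets number 1" rule are all made up for illustration.

```python
import random

def run_ring_benchmark(names, seed=0, max_steps=10_000):
    """Scripted (non-LLM) baseline for the ordering game; returns (numbers, steps)."""
    rng = random.Random(seed)
    n = len(names)
    # Assumption: the list wraps around, so the last agent's "next" is the first.
    next_of = {names[i]: names[(i + 1) % n] for i in range(n)}
    inbox = {name: [] for name in names}
    # Each agent tracks the chain of predecessors it has heard about, ending with itself.
    chain = {name: [name] for name in names}
    numbers = {}  # agent -> number it has claimed

    for step in range(1, max_steps + 1):
        agent = rng.choice(names)  # agents are invoked in a random order
        for received in inbox[agent]:
            if agent in received:
                # The chain has gone all the way round: the agent now knows
                # every name, and therefore n, without being told.
                ring = received[received.index(agent):]
                chain[agent] = ring[1:] + [agent]
                leader_pos = ring.index(min(ring))  # smallest name gets number 1
                numbers[agent] = (len(ring) - leader_pos) % len(ring) + 1
            elif len(received) + 1 > len(chain[agent]):
                chain[agent] = received + [agent]
        inbox[agent].clear()
        # One message to one other agent per step: forward everything known so far.
        inbox[next_of[agent]].append(list(chain[agent]))
        if len(numbers) == n:
            return numbers, step
    raise RuntimeError("agents did not agree on an ordering within max_steps")


if __name__ == "__main__":
    agents = [f"agent{i}" for i in range(10)]
    numbers, steps = run_ring_benchmark(agents, seed=1)
    print(steps, numbers)
```

The point of the scripted policy is only to show that n is discoverable through messaging alone: an agent learns n the first time a forwarded chain comes back containing its own name.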

Qwen3.5-9B was able to do this after a lot of time.

Qwen3.6-35B-A3B failed because it kept insisting that it needed to know n, but didn't try to figure it out by messaging other agents.

Granite 4.1 9B failed completely because it just wrote non-descriptive messages like "Request to know the order" to the other agents and didn't reply to anyone.

gertlabs (OP) 16 days ago

Nice, that's a good one -- interesting dynamics can come out of deceptively simple social games.
