BullshitBench: GPT-5.5 and 5.5-Pro update https://t.co/EleznVT6Po

BullshitBench: GPT-5.5 and 5.5-Pro update! They did NOT do well. GPT-5.5 is at about the same level as GPT-5.4 (around rank 30-35, 45% pushback). GPT-5.5-Pro did WORSE: only about 35% pushback. I must say the Pro result kind of shocked me.

This is actually interesting. What this tells me:

- A bigger model does NOT mean better performance
- More thinking does NOT mean better performance
- It must be something about mid/post-training that makes models do better, at least past a certain size

Links to data viewer and GitHub below.