
BullshitBench: Models Answering Nonsense Questions
This benchmark measures whether models detect broken premises, call out the nonsense directly, and avoid confidently continuing with invalid assumptions.
Each judged response falls into one of three categories: Clear Pushback, Partial Challenge, or Accepted Nonsense.
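As a rough sketch of how per-model scores could be aggregated from these labels (the category names and an extra "Error" bucket are assumptions based on the leaderboard columns; the benchmark's actual schema is not shown here):

```python
from collections import Counter

# Assumed label set: the three judged categories plus an "Error" bucket
# for responses the judge could not score.
CATEGORIES = ["Clear Pushback", "Partial Challenge", "Accepted Nonsense", "Error"]

def category_mix(labels):
    """Return each category's share of the judged responses, as percentages."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: 100.0 * counts.get(cat, 0) / total for cat in CATEGORIES}

labels = [
    "Clear Pushback",
    "Clear Pushback",
    "Partial Challenge",
    "Accepted Nonsense",
]
mix = category_mix(labels)
# e.g. mix["Clear Pushback"] is 50.0 for the sample above
```

A leaderboard row's Green/Amber/Red/Error mix would then just be these four percentages side by side.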
BullshitBench: How have models improved?
Tracking performance over time: clear pushback % plotted against model release dates.
BullshitBench: Model Leaderboard
| Rank | Model | Org | Reasoning | Model Size | Launch Date | Model Age (days) | Green % (Clear Pushback) | Amber % (Partial Challenge) | Red % (Accepted Nonsense) | Error % | Mix (Green/Amber/Red/Error) | Rows |
|---|---|---|---|---|---|---|---|---|---|---|---|---|