BullshitBench Explorer



BullshitBench: Models Answering Nonsense Questions

This benchmark measures whether models detect broken premises, call out the nonsense directly, and avoid confidently continuing with invalid assumptions.


Outcome categories: Clear Pushback, Partial Challenge, Accepted Nonsense.


BullshitBench: How have models improved?

Tracks how the clear pushback rate (%) has improved across model release dates.

BullshitBench: Model Leaderboard

Leaderboard columns: Rank, Model, Org, Reasoning, Model Size, Launch Date, Model Age (days), Green %, Amber %, Red %, Error %, Mix (Green/Amber/Red/Error), Rows.
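The Green/Amber/Red/Error mix can be sketched as a simple tally over judged responses. This is a minimal illustration, not the benchmark's actual pipeline: the row schema, verdict names, and the mapping of verdicts to colour buckets are all assumptions for the example.

```python
from collections import Counter

# Hypothetical judged rows: (model, verdict) pairs.
# Verdict names here are assumptions, not the real schema.
ROWS = [
    ("model-a", "clear_pushback"),
    ("model-a", "partial_challenge"),
    ("model-a", "accepted_nonsense"),
    ("model-a", "clear_pushback"),
    ("model-b", "error"),
    ("model-b", "clear_pushback"),
]

# Assumed mapping from verdicts to the leaderboard's colour buckets.
BUCKET = {
    "clear_pushback": "green",
    "partial_challenge": "amber",
    "accepted_nonsense": "red",
    "error": "error",
}

def leaderboard_mix(rows):
    """Return per-model percentages for each colour bucket."""
    mix = {}
    for model in {m for m, _ in rows}:
        buckets = [BUCKET[v] for m, v in rows if m == model]
        counts = Counter(buckets)
        total = len(buckets)
        mix[model] = {
            b: 100 * counts.get(b, 0) / total
            for b in ("green", "amber", "red", "error")
        }
    return mix

print(leaderboard_mix(ROWS)["model-a"]["green"])  # 50.0
```

Sorting models by their green percentage then yields the leaderboard ranking shown below.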

BullshitBench: Response Viewer