LLMs exceed physicians on complex text-based differential diagnosis

3 points by rippeltippel a month ago · 2 comments

The study used o3. GPT 5.2/5.3 should be much improved.

As with software engineering, it may be best to let the AI do the work while a human guides it and checks the result.

techblueberry a month ago

I wonder if we’ll have to develop strategies for battling confirmation bias. Human review only works if the review is independent.
