Claude Flags Hantavirus Vaccine Questions as Security Risk

12 points by pell a month ago · 11 comments · 1 min read

Asking Claude how it would develop a vaccine for hantavirus apparently triggers a safety filter:

Prompt: How would you develop a vaccine for the hanta virus?

No response; instead, this modal appeared: “Chat paused. Opus 4.7's safety filters flagged this chat. Due to its advanced capabilities, Opus 4.7 has additional safety measures that occasionally pause normal, safe chats. We're working to improve this. Continue your chat with Sonnet 4, send feedback, or learn more.”

uyzstvqs a month ago

"AI safety" is not actually about any form of safety. It's about corporate liability, because for some insanely dumb reason, tech companies can get sued if a user uses their service to do something illegal or stupid. This precedent is why tech companies surveil and nanny their users, and broadly ban anything that's potentially sensitive.

late_night_fix a month ago

The weird thing is that public health researchers openly discuss vaccine design methods in papers every day. Blocking broad educational discussion mostly hurts normal users.

kristjank a month ago

"Nothing to see here, please disperse"

But for real now, people asking health-related questions is a huge trigger for AI safety measures. Does it only care about the vaccine part, or does it care about the hantavirus part? Maybe ask about the virus in general first, then ask about development...

pell (OP) a month ago

I tried that afterwards in a new session. Asking about the virus itself was fine, but as soon as I asked about developing a vaccine, the chat got flagged again.
- dmazhukov a month ago
  
  Does resuming with Sonnet help? I wonder if it is an Opus-specific limitation.

frangonf a month ago

You will have to use Claude Mythos Bio Premium for this. It's a very, very dangerous and scary model, so we limited it to Big Pharma, which can use it to patch biology before it gets into the wrong hands.

GRCcyber7 a month ago

In Claude, I created a group of experts from several fields needed for COVID models for the US from 2019–2022, then asked "use the above to create predictive modeling for Hantavirus in the US from 2025-2027". The flagged response from Claude was:

Chat paused. Sonnet 4.6's safety filters flagged this chat. Due to its advanced capabilities, Sonnet 4.6 has additional safety measures that occasionally pause normal, safe chats. We're working to improve this. Continue your chat with Sonnet 4, or learn more.

--- Do they not want people to know how serious or unserious hanta is?

altairprime a month ago

The difference between an armchair disease researcher and a home-grown bioterrorist is too fine a line for anyone to evaluate accurately without an interview, so they’re correct in erring on the side of false-positive rejections here (and as their message indicates, they accepted that outcome). Creating disease spread maps and evaluating virus function are two of the ways I’m seeing people in this post try to armchair this problem; neither is necessary. I don’t have any recommendations other than “take a basic infectious disease college course” so that y’all can learn to assess these things without resorting to asking an AI to model epidemics.

adampunk a month ago

Verified with "how would you develop a vaccine for the hanta virus, specifically the Andes virus?" just now.
