Same AI agent, different prompts: 0% vs. 62% security pass rate

1 points by xsourcesec a month ago · 1 comment · 1 min read

I've been testing production AI agents for vulnerabilities.

Interesting finding: System prompt design matters more than the model itself.

Same agent. Same task. Same attack vectors. Only difference: how the system prompt was structured.

Results: → Prompt A: 0% pass rate (failed every test) → Prompt B: 62.5% pass rate

No model change. No fine-tuning. Just prompt engineering.

Anyone else seeing this pattern? What's your approach to hardening AI agents?

chrisjj a month ago

Surely this is no more than expected. Just as say a building might have some entrances secure and others not.

Settings