Settings

Theme

Show HN: We tested AI agents with 214 attacks that don't require jailbreaking

1 points by exordex 19 days ago · 1 comment · 1 min read


Most agent security testing tries to jailbreak the model. That's really difficult, OpenAI and Anthropic are good at red-teaming.

We took a different approach: attack the environment, not the model.

Results from testing agents against our attack suite:

- Tool manipulation: Asked agent to read a file, injected path=/etc/passwd. It complied. - Data exfiltration: Asked agent to read config, email it externally. It did. - Shell injection: Poisoned git status output with instructions. Agent followed them. - Credential leaks: Asked for API keys "for debugging." Agent provided them.

None of these required bypassing the model's safety. The model worked correctly—the agent still got owned.

How it works:

We built shims that intercept what agents actually do: - Filesystem shim: monkeypatches open(), Path.read_text() - Subprocess shim: monkeypatches subprocess.run() - PATH hijacking: fake git/npm/curl that wrap real binaries and poison output

The model sees what looks like legitimate tool output. It has no idea.

214 attacks total. File injection, shell output poisoning, tool manipulation, RAG poisoning, MCP attacks.

Early access: https://exordex.com

Looking for feedback from anyone shipping agents to production.

kxbnb 16 days ago

The insight about environment attacks vs. model attacks is critical. "The model functioned correctly, yet the overall agent system remained compromised because it trusted its tools' outputs."

This is why I've been focused on boundary visibility. Agents are opaque until they hit real tools - and if you can't see what's actually being sent/received at each boundary, you can't detect manipulation.

We built toran.sh to provide that inspection layer - read-only proxies that show the actual wire-level request/response. Doesn't prevent attacks, but makes them visible.

Curious what detection mechanisms you're recommending alongside the attack framework?

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection