Show HN: I built a firewall for agents because prompt engineering isn't security

github.com

7 points by yaront111 22 days ago · 7 comments · 2 min read

Hi HN, I’m the creator of Cordum.

I’ve been working in DevOps and infrastructure for years (currently in the fintech/security space), and as I started playing with AI agents, I noticed a scary pattern. Most "safety" mechanisms rely on system prompts ("Please don't do X") or flimsy Python logic inside the agent itself.

If we treat agents as autonomous employees, giving them root access and hoping they listen to instructions felt insane to me. I wanted a way to enforce hard constraints that the LLM cannot override, no matter how "jailbroken" it gets.

So I built Cordum. It’s an open-source "Safety Kernel" that sits between the LLM's intent and the actual execution.

The architecture is designed to be language-agnostic:

1. *Control Plane (Go/NATS/Redis):* Manages the state and policy.
2. *The Protocol (CAP v2):* A wire format that defines jobs, steps, and results.
3. *Workers:* You can write your agent in Python (using Pydantic), Node, or Go, and they all connect to the same safety mesh.
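To make the protocol idea concrete, here is a minimal sketch of what a CAP v2-style job envelope could look like on the wire. The real format lives in the Cordum repo; the field and class names below (`Job`, `Step`, `to_wire`) are illustrative assumptions, not the actual protocol, and I use stdlib dataclasses in place of Pydantic to keep it dependency-free:

```python
# Hypothetical sketch of a CAP v2-style job envelope. Field names are
# assumptions for illustration, not the actual Cordum wire format.
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Step:
    action: str                                   # e.g. "transfer.create"
    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class Job:
    job_id: str
    agent_id: str
    steps: list[Step] = field(default_factory=list)

    def to_wire(self) -> str:
        """Serialize to JSON for publishing on the message bus."""
        return json.dumps(asdict(self))


job = Job("job-1", "agent-7", [Step("transfer.create", {"amount_usd": 75})])
wire = job.to_wire()
print(wire)
```

Because every worker language only needs to speak this JSON envelope, the same control plane can police Python, Node, and Go agents uniformly.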

Key features I focused on:

- *The "Kill Switch":* Ability to revoke an agent's permissions instantly via the message bus, without killing the host server.
- *Audit Logs:* Every intent and action is recorded (critical for when things go wrong).
- *Policy Enforcement:* Blocking actions based on metadata (e.g., "Review required for any transfer > $50") before they reach the worker.
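The policy-enforcement idea can be sketched as a gate that runs before an intent ever reaches a worker. This is not Cordum's actual policy engine; the function name, rule shape, and threshold are assumptions mirroring the "$50 transfer" example above:

```python
# Illustrative policy gate (not Cordum's real engine): flag any transfer
# over $50 for human review before the intent is dispatched to a worker.
from typing import Any


def check_policy(intent: dict[str, Any]) -> dict[str, str]:
    """Return an allow/review decision for a single intent."""
    if intent.get("action") == "transfer.create" and intent.get("amount_usd", 0) > 50:
        return {
            "status": "review",
            "reason": "transfer_over_limit",
            "message": "Review required for any transfer > $50",
        }
    return {"status": "allowed", "reason": "", "message": ""}


print(check_policy({"action": "transfer.create", "amount_usd": 75})["status"])  # review
print(check_policy({"action": "transfer.create", "amount_usd": 10})["status"])  # allowed
```

The point is that the check keys off structured metadata, not the LLM's wording, so a jailbroken prompt can't talk its way past it.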

It’s still early days (v0.x), but I’d love to hear your thoughts on the architecture. Is a separate control plane overkill, or is this where agentic infrastructure is heading?

Repo: https://github.com/cordum-io/cordum
Docs: [Link to your docs if you have them]

Thanks!

TeamCommet1 21 days ago

Regarding the separate control plane: I don't think it's overkill if you're aiming for multi-agent orchestration. A safety mesh needs to be centralized to maintain a global state of permissions. If you bake the safety logic into each worker, you end up with the same "flimsy logic" problem you're trying to solve.

Curious, how are you handling latency in the CAP v2 protocol when the control plane has to intercept every intent before execution?

amadeuswoo 22 days ago

Interesting architecture. I'm curious about the workflow when an agent hits a denied action: does it get a structured rejection it can reason about and try an alternative, or does it just fail? Wondering how the feedback loop works between the safety kernel and the LLM's planning.

  • yaront111OP 22 days ago

    Great question. This is actually a core design principle of the Cordum Agent Protocol (CAP).

    It’s definitely a *structured rejection*, not a silent fail. Since the LLM needs to "know" it was blocked to adjust its plan, the kernel returns a standard error payload (e.g., `PolicyViolationError`) with context.

    The flow looks like this:

    1. *Agent:* Sends intent "Delete production DB".
    2. *Kernel:* Checks policy -> DENY.
    3. *Kernel:* Returns a structured result: `{ "status": "blocked", "reason": "destructive_action_limit", "message": "Deletion requires human approval" }`.
    4. *Agent (LLM):* Receives this as an observation.
    5. *Agent (Re-planning):* "Oh, I can't delete it. I will generate a Slack message to the admin asking for approval instead."

    This feedback loop turns safety from a "blocker" into a constraint that the agent can reason around, which is critical for autonomous recovery.
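    The agent-side half of that loop might look like this. This is a sketch under assumptions: the payload shape mirrors the example above, and the handler name is hypothetical rather than part of CAP:

```python
# Sketch of how a worker could turn a blocked result into an observation
# the planner can reason about, instead of a hard failure. The payload
# shape follows the example result above; handle_result is hypothetical.
from typing import Any


def handle_result(result: dict[str, Any]) -> str:
    """Map a kernel result to the agent's next planning input."""
    if result.get("status") == "blocked":
        # Feed the structured reason back to the planner for re-planning.
        return f"Plan B: request human approval ({result['reason']})"
    return "Proceed with next step"


blocked = {
    "status": "blocked",
    "reason": "destructive_action_limit",
    "message": "Deletion requires human approval",
}
print(handle_result(blocked))
print(handle_result({"status": "ok"}))
```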

exordex 22 days ago

I built formal testing for AI agents; it runs on the CLI, with a free version launching soon. It includes MCP security tests and chaos engineering features: https://exordex.com/waitlist

  • yaront111OP 21 days ago

    Exordex is a great tool for the CI/CD pipeline to test agents. Cordum is the Runtime Kernel that enforces those policies in production. Ideally? You use Exordex to test that your agent works, and Cordum to guarantee it stays safe.

hackerunewz 22 days ago

Nice job, but isn't it a bit overkill?

  • yaront111OP 22 days ago

    It is overkill for a demo. But for my production environment, I need an external safety layer. I can't rely on 'prompt engineering' when real data is at stake.
