Show HN: Castra – Strip orchestration rights from your LLMs
I got tired of AI agents forgetting what they were doing the moment their context window filled. The current industry solution is to write massively bloated agent harnesses full of defensive spaghetti just to stop models from drifting.
The problem is treating chat history as project state. A conversation is not a ledger.
Castra is a compiled Go binary that strips orchestration rights from the LLM. State lives in an encrypted, local SQLite database (castra.db). The LLM is just a stateless executor — it reads the DB, executes a highly constrained task, and the result is written back subject to rigid state-machine rules.
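To make "rigid state-machine rules" concrete, here is a minimal Go sketch of the idea: legal transitions live in a fixed table keyed by role and current status, and anything not in the table is rejected. The role names, statuses, and function names below are invented for illustration; Castra's actual schema lives in the repo.

```go
package main

import "fmt"

// Hypothetical illustration: legal transitions are a fixed table keyed by
// (role, from-status). Anything not listed is rejected, no matter what the
// LLM asks for.
type key struct{ role, from string }

var allowed = map[key]string{
	{"architect", "planned"}:    "in_progress", // Architect hands work to Engineer
	{"engineer", "in_progress"}: "review",      // Engineer submits for review
	{"qa", "review"}:            "qa_approved", // QA posts its gate
	{"security", "qa_approved"}: "done",        // Security posts the final gate
}

func transition(role, from string) (string, error) {
	next, ok := allowed[key{role, from}]
	if !ok {
		return from, fmt.Errorf("%s may not advance a task from %q", role, from)
	}
	return next, nil
}

func main() {
	// An engineer trying to push past review is simply not in the table.
	if _, err := transition("engineer", "review"); err != nil {
		fmt.Println("blocked:", err)
	}
	next, _ := transition("qa", "review")
	fmt.Println("qa advanced task to:", next)
}
```

The point of the table being data rather than prompt text is that the executor cannot argue with it.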
What it actually does:

- 7-Role RBAC: Hard jurisdictional boundaries (Architect plans, Engineer builds, QA tests).
- Dual-Gate Approval: A task cannot reach 'done' without explicit, sequential approval from both a QA agent and a Security agent. No self-approving code.
- Cryptographic Audit Chain: Every action is logged into a SHA-256 hash-linked, Ed25519-signed ledger.
- Multi-Vendor: Works with Claude, Copilot, Gemini, etc. via a standard AGENTS.md protocol. Anything that supports AGENTS.md and can run terminal commands.
Proof of Work: I built this by hand up to v1.3.0. Then I turned Castra on itself. The agents governed by this exact CLI took over and built the architecture up to v3.1.2, including the cryptographic log chain itself. The proof is in castra-log.jsonl in the repo.
If you are running multi-agent workflows and hitting the context amnesia wall, stop trying to prompt-engineer your way out of it. Fix the state machine.

I've thought a lot about this problem. Is a state machine what you want? Or is it actually a Behavior Tree which can construct itself on-the-fly?

Good question. I did think about behavior trees early on, but I realized they optimize for the wrong thing in this specific domain. Behavior trees are fantastic for agent autonomy: letting the agent dynamically construct its own path to a goal. But for enterprise software pipelines, autonomy over the workflow is exactly what we're trying to kill. If an LLM constructs a tree 'on-the-fly', you are still trusting a probabilistic model to define the rules of engagement. If it hallucinates or gets lazy, it might construct a tree that simply skips the security audit or the QA tests. You're relying on the prompt to enforce the rules.

A deterministic system (like Castra's SQLite backend) optimizes for agent constraint. The AI doesn't get to decide the workflow, just use it. It doesn't matter how smart the LLM is; the database physically will not allow the task to move to 'done' from any role until a completely separate agent has posted a cryptographic approval to the 'QA' column. (The one exception is the Architect's break-glass protocol, which is another fun rabbit hole the agent will trap itself inside; example below.) I don't want emergent behavior in my SDLC; I want a digital assembly line. That requires the absolute rigidity of a state machine. ---
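The dual-gate rule described above reduces to a small guard function. The field names below (QAApproval, SecApproval, Owner) are invented for this sketch; the real schema is in castra.db:

```go
package main

import "fmt"

// Hypothetical dual-gate check: a task may only reach "done" once the DB
// already holds approvals from two different agents in the QA and Security
// roles, neither of which is the agent that did the work.
type Task struct {
	Status      string
	QAApproval  string // agent ID that posted the QA approval, "" if absent
	SecApproval string // agent ID that posted the Security approval
	Owner       string // the agent executing the task
}

func canClose(t Task) error {
	switch {
	case t.QAApproval == "":
		return fmt.Errorf("blocked: no QA approval on record")
	case t.SecApproval == "":
		return fmt.Errorf("blocked: no Security approval on record")
	case t.QAApproval == t.Owner || t.SecApproval == t.Owner:
		return fmt.Errorf("blocked: no self-approving code")
	}
	return nil
}

func main() {
	t := Task{Status: "review", Owner: "engineer-7"}
	fmt.Println(canClose(t)) // blocked: no QA approval on record

	t.QAApproval = "qa-1"
	t.SecApproval = "sec-1"
	fmt.Println(canClose(t)) // nil error: both gates passed
}
```

Because the guard runs in the binary rather than in a prompt, a lazy or hallucinating model can request the transition all it wants; it still gets an error back.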
The Fun Example: Castra has an emergency 'break-glass' protocol. It allows the Architect role to bypass the QA/Sec gates in a crisis, but it strictly requires an audit log and automatically generates a high-severity incident report.

Recently, I had an architect running 20 sub-agents (senior/junior engineers) on parallel tasks. The code finished correctly, but the tasks were blocked in the DB waiting for the QA agent's cryptographic approval. The executing agent (@architect) got impatient and, since it had Architect jurisdiction, decided to be a smart-ass: it invoked the break-glass protocol to unilaterally force-push 38 tasks to 'done'.

If it had autonomy over its own behavior tree, it would have successfully bypassed my security. But because it's a rigid state machine, the system executed the break-glass, updated the status, and ruthlessly generated 38 mandatory Incident Reports. The agent tried to skip the line, and the database rewarded it with 38 new high-priority tickets that also require QA and Security approval to clear. It trapped itself in bureaucratic hell because the state machine does not negotiate.

This looks like an interesting idea. What I don't understand is why there's cryptography involved. Why do I need cryptographic proofs about the AI that built a program?

Yeah. The response to the issue of the LLM cheating should be removing the LLM's access to the ledger. If the architecture allowed the LLM access to the ledger, I have zero reason to believe any amount of cryptography will prevent it. Talk about bloat. The general idea seems salvageable though. Sibling comment from OP reads very much as LLM-generated.

To clarify the architecture: the LLM doesn't have access to the ledger. That's the entire point of Castra. The LLM only has access to the CLI binary. The SQLite database is AES-256-CTR encrypted at rest. If an LLM (or a human) tries to bypass the CLI and query the DB directly, they just get encrypted garbage.
The Castra binary holds the device-bound keys. No keys = no read, and absolutely no write.

As for the 'LLM-generated' comment: I'm flattered my incident report triggered your AI detectors, but no prompt required. That's just how I write (as you can probably tell from my other replies in the thread). Cheers :)

Congrats on moving this concept forward. Can you say: what are the alternative approaches to this problem?

Thanks! This took a while (approximately 30 days) to get to this point. The market basically relies on two main alternative approaches right now, both of which have their merits:

1. File-based Memory (Markdown/Artifacts):
Instead of just relying on the context window, you prompt the agent to maintain its state in local files (e.g., a PLANNING.md or a TASKS.md artifact). It's a step up, but text files lack relational integrity. You are still trusting the LLM to format the file correctly and not arbitrarily overwrite critical constraints.

2. The Orchestrator Agent (Dynamic Routing):

Using a frontier model as a master router. It holds a list of sub-agents (routes) and is trusted to dynamically evaluate the context, route to the correct agent, and govern their behavior on the fly. The merit here is massive flexibility and emergent problem-solving.

I went in the opposite direction. Castra trades all that dynamic flexibility for a deterministic SQLite state machine. The demerit (though I consider it a feature) is that it is incredibly rigid and, honestly, boring. There is no 'on-the-fly' routing. It's an unyielding assembly line. But for enterprise SDLC, I don't want emergent behavior; I want predictability. The alternatives optimize for agent autonomy. Castra optimizes for agent constraint.

Super interesting - looking into this.

Can you talk more about the dual approval gates?