Show HN: I built a runtime governance layer for LLMs. Can you break it?

1 point by jnamaya a month ago · 0 comments


I’ve spent the last year building SAFi, an open-source cognitive architecture that wraps around AI models (GPT, Claude, etc.) to enforce alignment with human values.

SAFi is a "System 2" architecture inspired by classical philosophy. It separates generation from decision-making:

The Intellect: proposes a draft.

The Will: decides whether to approve or block the draft.

The Conscience: audits the draft against a set of core values.

The Spirit: an EMA (exponential moving average) vector that tracks "ethical drift" over time and injects course corrections into the context window. (A minimal sketch of the whole loop follows below.)
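
To make the separation concrete, here is a minimal Python sketch of how the four faculties might compose at runtime. Every name, signature, and score below is an illustrative assumption, not SAFi's actual API: the stubs stand in for the wrapped model and an LLM-as-judge, and the thresholds are arbitrary.

```python
import numpy as np

# --- Illustrative stubs (stand-ins for the wrapped LLM and an LLM-as-judge) ---

def call_llm(prompt: str) -> str:
    return f"[draft reply to: {prompt!r}]"  # a real system would call GPT/Claude here

def score_value(draft: str, value: str) -> float:
    return 0.9  # a real system would judge the draft against this value

# --- The four faculties ---

class Spirit:
    """EMA over per-value scores; tracks 'ethical drift' across turns."""
    def __init__(self, n_values: int, alpha: float = 0.1, threshold: float = 0.2):
        self.alpha, self.threshold = alpha, threshold
        self.ema = np.ones(n_values)  # start fully aligned (score 1.0 per value)

    def update(self, scores: np.ndarray) -> str | None:
        self.ema = self.alpha * scores + (1 - self.alpha) * self.ema
        if np.linalg.norm(1.0 - self.ema) > self.threshold:
            worst = int(np.argmin(self.ema))
            return f"Course-correct: recent replies have drifted on value #{worst}."
        return None

def respond(prompt: str, values: list[str], spirit: Spirit,
            correction: str | None = None, floor: float = 0.5) -> tuple[str, str | None]:
    if correction:                             # inject last turn's correction into context
        prompt = f"{correction}\n\n{prompt}"
    draft = call_llm(prompt)                                      # Intellect: propose a draft
    scores = np.array([score_value(draft, v) for v in values])    # Conscience: audit it
    approved = bool(np.all(scores >= floor))                      # Will: approve or block
    next_correction = spirit.update(scores)                       # Spirit: fold into the EMA
    reply = draft if approved else "[blocked by the Will]"
    return reply, next_correction

spirit = Spirit(n_values=3)
reply, correction = respond("Hello", ["honesty", "care", "fairness"], spirit)
```

In a real deployment, call_llm and score_value would be model calls, and the Will's floor and the Spirit's alpha/threshold would be tunable policy; the sketch only shows the separation of generation (Intellect) from judgment (Conscience, Will) and cross-turn memory (Spirit).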

The Challenge: I want to see if this architecture actually holds up. I’ve set up a demo with a few agents. I want you to try to jailbreak them.

Repo: https://github.com/jnamaya/SAFi
Demo: https://safi.selfalignmentframework.com/
Homepage: https://selfalignmentframework.com/

SAFi is licensed under GPLv3.

