Show HN: I built an AI Colosseum to battle-test different agent architectures

project-chimera.streamlit.app

3 points by aytuakarlar 9 months ago · 0 comments · 2 min read

Hey HN,

I've been obsessed with a problem: raw LLMs are powerful but unsafe for high-stakes decisions. I've spent the last few months building a hybrid architecture to make them more rational and disciplined.

To test it, I built an AI Colosseum.

The architecture is Neuro-Symbolic-Causal:

* Neuro (GPT-4o): The creative strategist that proposes actions.

* Symbolic (Guardian): A hard-coded, formally verified (TLA+) rule engine. It's the safety layer that says "no" to bad ideas.

* Causal (Oracle): An `econml` model trained on historical data to predict the long-term value of any given action.
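
As a rough illustration of the propose / veto / score loop described above, here is a minimal sketch. It is not the repo's actual code: `Action`, `guardian_allows`, `choose_action`, the 20% cash-reserve rule, and the toy scores are all hypothetical stand-ins, with plain Python functions in place of GPT-4o, the TLA+-verified rule engine, and the `econml` model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    cash_fraction: float  # share of the portfolio left in cash after the action


def guardian_allows(action: Action, min_cash: float = 0.2) -> bool:
    """Symbolic layer (hypothetical rule): veto anything that breaks the cash reserve."""
    return min_cash <= action.cash_fraction <= 1.0


def choose_action(
    proposals: Iterable[Action],
    allows: Callable[[Action], bool],
    estimate_value: Callable[[Action], float],
) -> Optional[Action]:
    """Keep only the actions the rule layer permits, then pick the highest-valued one."""
    allowed = [a for a in proposals if allows(a)]
    if not allowed:
        return None  # nothing passed the rules, so the agent does nothing
    return max(allowed, key=estimate_value)


# Neuro layer stand-in: a fixed list instead of LLM-generated proposals.
proposals = [
    Action("all_in", 0.0),
    Action("half_position", 0.5),
    Action("hold_cash", 1.0),
]

# Causal layer stand-in: made-up long-term value estimates, for illustration only.
toy_scores = {"all_in": 0.9, "half_position": 0.1, "hold_cash": 0.3}


def toy_oracle(action: Action) -> float:
    return toy_scores.get(action.name, 0.0)


chosen = choose_action(proposals, guardian_allows, toy_oracle)
print(chosen.name if chosen else "no action")  # prints "hold_cash"
```

In this toy setup `all_in` has the highest estimated value but is vetoed by the rule layer before scoring matters, which is the ordering the architecture implies: the rule check acts as a filter, and the value estimate only ranks what survives it.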

The Colosseum is a Streamlit app where these agents compete. In one of the first runs, the full Chimera agent chose to hold cash and survived a simulated market crash, while a simpler "LLM-only" agent lost heavily. A single run isn't proof, but it was a good reminder that sometimes the smartest move is not to play.

It's an early closed beta launching on Oct 7th. I'm looking for feedback from technical folks. If you're interested, starring the repo is the best way to get on the list for an invite.

Code for early access: Spartacus

Repo: https://github.com/akarlaraytu/Project-Chimera

Tech stack: Python, Streamlit, LangChain, econml, pandas_ta, TLA+.

I'll be in the comments all day to answer questions. Appreciate any thoughts or critiques.

