What's the best way to benchmark neuro‑symbolic‑causal AI agents?

github.com

1 point by aytuakarlar 7 months ago · 1 comment


aytuakarlar (OP) · 7 months ago

I’m building Project Chimera, an open‑source neuro‑symbolic‑causal AI framework. The goal:

Combine LLMs (for hypothesis generation), symbolic rules (for safety & domain constraints), and causal inference (for estimating true impact) into a single decision loop.
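For readers who think in code, the loop described above can be sketched roughly as follows. This is a hypothetical illustration, not Project Chimera's actual API: every name (`propose_actions`, `passes_rules`, `estimate_effect`) and the toy pricing numbers are invented for this sketch.

```python
# Hypothetical sketch of the neuro-symbolic-causal decision loop:
# 1. neural (LLM) proposes candidate actions,
# 2. symbolic rules filter out unsafe ones,
# 3. causal estimation ranks what survives.
# All names and numbers here are illustrative, not the framework's real API.

def propose_actions(state):
    """Stand-in for the LLM: generate candidate actions (hypotheses)."""
    return [{"discount": d} for d in (0.0, 0.1, 0.2)]

def passes_rules(action, state):
    """Symbolic safety/domain constraint as a hard filter."""
    return action["discount"] <= 0.15  # e.g. rule: never discount past 15%

def estimate_effect(action, state):
    """Stand-in for causal inference: estimate the action's true impact."""
    # toy model: discounting lifts demand but erodes margin
    lift = 1.0 + 2.0 * action["discount"]
    margin = 1.0 - action["discount"] / 0.2
    return lift * margin

def decide(state):
    candidates = propose_actions(state)                        # neural step
    safe = [a for a in candidates if passes_rules(a, state)]   # symbolic step
    return max(safe, key=lambda a: estimate_effect(a, state))  # causal step

print(decide({}))
```

The key design point is the ordering: the symbolic layer acts as a hard gate before the causal estimator ever ranks anything, so the LLM can hypothesize freely without being able to select an unsafe action.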

In long‑horizon simulations, this approach seems to preserve both profit and trust better than LLM‑only or non‑symbolic agents — but I’m still refining the architecture and benchmarks.

I’d love to hear from the HN community:

• If you’ve built agents that reason about cause–effect, what design choices worked best?

• How do you benchmark reasoning quality beyond prediction accuracy?

• Any pitfalls to avoid when mixing symbolic rules with generative models?

GitHub (for context): https://github.com/akarlaraytu/Project-Chimera

Thanks in advance — I’ll be around to answer questions and share results from this discussion.
