What's the best way to benchmark neuro‑symbolic‑causal AI agents?

github.com

1 point by aytuakarlar 7 months ago · 1 comment


aytuakarlar (OP) · 7 months ago

I’m building Project Chimera, an open‑source neuro‑symbolic‑causal AI framework. The goal:

Combine LLMs (for hypothesis generation), symbolic rules (for safety & domain constraints), and causal inference (for estimating true impact) into a single decision loop.
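For readers who think in code, the loop described above can be sketched roughly as follows. This is a hypothetical illustration, not Project Chimera's actual API: every name (`propose_actions`, `passes_rules`, `estimate_effect`) and the toy pricing numbers are invented for this sketch.

```python
# Hypothetical sketch of the neuro-symbolic-causal decision loop:
# 1. neural (LLM) proposes candidate actions,
# 2. symbolic rules filter out unsafe ones,
# 3. causal estimation ranks what survives.
# All names and numbers here are illustrative, not the framework's real API.

def propose_actions(state):
    """Stand-in for the LLM: generate candidate actions (hypotheses)."""
    return [{"discount": d} for d in (0.0, 0.1, 0.2)]

def passes_rules(action, state):
    """Symbolic safety/domain constraint as a hard filter."""
    return action["discount"] <= 0.15  # e.g. rule: never discount past 15%

def estimate_effect(action, state):
    """Stand-in for causal inference: estimate the action's true impact."""
    # toy model: discounting lifts demand but erodes margin
    lift = 1.0 + 2.0 * action["discount"]
    margin = 1.0 - action["discount"] / 0.2
    return lift * margin

def decide(state):
    candidates = propose_actions(state)                        # neural step
    safe = [a for a in candidates if passes_rules(a, state)]   # symbolic step
    return max(safe, key=lambda a: estimate_effect(a, state))  # causal step

print(decide({}))
```

The key design point is the ordering: the symbolic layer acts as a hard gate before the causal estimator ever ranks anything, so the LLM can hypothesize freely without being able to select an unsafe action.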

In long‑horizon simulations, this approach seems to preserve both profit and trust better than LLM‑only or non‑symbolic agents — but I’m still refining the architecture and benchmarks.

I’d love to hear from the HN community:

• If you’ve built agents that reason about cause–effect, what design choices worked best?

• How do you benchmark reasoning quality beyond prediction accuracy?

• Any pitfalls to avoid when mixing symbolic rules with generative models?

GitHub (for context): https://github.com/akarlaraytu/Project-Chimera

Thanks in advance — I’ll be around to answer questions and share results from this discussion.
