Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents (thinkwright.ai)
2 points by oceanwaves 2 months ago · 1 comment

No comments yet.