Agents.md file isn't the problem. Your lack of Evals is
tessl.io
So how would you eval your own claude.md? Each context is unique to the project, the team, and the personal root claude.md. Do you just take a given task and ask it to redo the same one over and over again against a known solution? Do you just keep using it and "feel" whether or not it's working? How is that different from what everyone is already doing?
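For what it's worth, the "redo the same task over and over against a known solution" idea can be sketched in a few lines. This is only a rough illustration, not how any particular tool does it: `fake_agent`, `follows_convention`, and the `make test` convention are all hypothetical stand-ins, and you would swap `fake_agent` for a real call to whatever agent you use.

```python
from typing import Callable


def pass_rate(
    run_agent: Callable[[str, str], str],
    context: str,
    task: str,
    check: Callable[[str], bool],
    trials: int = 5,
) -> float:
    """Run the same task `trials` times and return the fraction of outputs passing `check`."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = sum(1 for _ in range(trials) if check(run_agent(context, task)))
    return passed / trials


# Hypothetical stand-in for a real agent call (assumption, not a real API).
# It only follows the project convention when the context file mentions it.
def fake_agent(context: str, task: str) -> str:
    return "make test" if "make test" in context else "pytest"


# The "known solution" here is just a check on the output.
def follows_convention(output: str) -> bool:
    return output == "make test"


task = "How do I run the tests?"
baseline = pass_rate(fake_agent, "", task, follows_convention)
with_context = pass_rate(fake_agent, "Run tests with `make test`.", task, follows_convention)
print(f"without context file: {baseline:.0%}, with context file: {with_context:.0%}")
```

The difference from "feeling" whether it works is that you get a number for the same task with and without the context file, so a change to the file can be compared against the previous version.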
I don't even know what an eval is.
Okay, but how would I write evals for my project's agents file? Any good examples out there?
The agents are smart enough to write the evals too.
It's agents all the way down!
Submit a GitHub repo containing skills to Tessl, and it will generate the evals, run them, and present the results. https://tessl.io/registry/skills/submit
The evals and results are all shown, no login necessary, so you can assess them yourself. e.g. https://tessl.io/registry/skills/github/coreyhaines31/market... (click details to see the eval texts).
At first glance this looks like an entire ecosystem full of slop, and by running that eval you just generate more? I'm looking for something a bit more curated.
I wrote https://ai-evals.io (community site) to make the concept approachable no matter what tools you choose to use.
You can learn about them by evaluating that site itself (https://github.com/Alexhans/eval-ception), and then the pattern should be easy to test on your own thing.
Doing an eval on itself is clever but confusing for the reader. How about a tutorial explaining how to do evals on something more normal?
I'd be happy to. One thing that is tough is knowing what will resonate with the audience and not being too simple or too complex.
What do you think would resonate with you or with the audience you're thinking about?
That repo also has an illustrative eval for an Agent Skill in Airflow for localization:
https://github.com/Alexhans/eval-ception/tree/main/exams/air...
I mean... Claude kept putting deprecated APIs into code I was getting it to write, so I adjusted the prompt to say not to, and it seemed to help.
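That "seemed to help" can be turned into a check you rerun after every prompt change. A minimal sketch, assuming the generated code is Python and that you keep your own deny list: `DEPRECATED_ATTRS` and the two sample snippets are illustrative, not taken from any real project (though `datetime.utcnow` and `datetime.utcfromtimestamp` are deprecated as of Python 3.12).

```python
import ast

# Illustrative deny list (assumption); fill in what your project treats as deprecated.
DEPRECATED_ATTRS = {"utcnow", "utcfromtimestamp"}


def deprecated_calls(source: str) -> list[str]:
    """Return the sorted names of deny-listed methods called anywhere in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in DEPRECATED_ATTRS
        ):
            found.append(node.func.attr)
    return sorted(found)


# Made-up examples of agent output before and after the prompt tweak.
before = "import datetime\nstamp = datetime.datetime.utcnow()\n"
after = "import datetime\nstamp = datetime.datetime.now(datetime.timezone.utc)\n"

print(deprecated_calls(before))
print(deprecated_calls(after))
```

Run it over a batch of generated files before and after the prompt change, and you have a count instead of an impression.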
Ai;dr