Show HN: A Vercel-like workflow for AI evals that makes sense

openlayer.com

8 points by vikasnair 2 years ago · 1 comment

vikasnair (OP) 2 years ago

Hi HN!

Most of us get how crucial AI evals are by now. The thing is, almost all the eval platforms we've seen are clunky and lack product cohesion: they demand so much manual setup and adaptation that they break developers' workflows.

That's why we're releasing a simpler workflow.

If you're using GitHub, you only need to add two files to the repo (one config + one script). Then, connect your repo to Openlayer and define must-pass tests for your AI system. Once integrated, every commit triggers these tests automatically on Openlayer, ensuring continuous evaluation without extra effort.
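
To make the setup concrete, here's a rough sketch of what the script half could look like. The file names, JSON schema, and function names below are just illustrative, not our exact format:

    # run_commit.py -- a hypothetical commit script (file names and schema
    # are illustrative, not our exact format). On every push, a script like
    # this produces the outputs the tests score.
    import json

    def generate_answer(question):
        # Placeholder for your actual AI system (model call, RAG chain, etc.).
        return "stub answer for: " + question

    def main():
        # Run the model over a small validation set checked into the repo.
        with open("validation_set.json") as f:
            examples = json.load(f)

        results = [
            {"input": ex["question"], "output": generate_answer(ex["question"])}
            for ex in examples
        ]

        # Write the results where the tests expect to find them.
        with open("model_outputs.json", "w") as f:
            json.dump(results, f, indent=2)

    if __name__ == "__main__":
        main()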

We offer 100+ tests (and are always adding more), and you can define your own custom tests as well. We're language-agnostic, and you can customize the workflow using our CLI and REST API.
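
For a taste of what a custom test can check, here's a sketch. The function signature is illustrative, not our exact API; the shape is a predicate over your model's outputs:

    # Hypothetical custom test: fail the commit if any model output
    # leaks an email address. The hook shown is a sketch, not our exact API.
    import re

    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def test_no_email_leakage(outputs):
        """Pass only if no model output contains an email address."""
        return all(not EMAIL_RE.search(o["output"]) for o in outputs)

    # Example: feed the test the same records run_commit.py wrote out.
    if __name__ == "__main__":
        sample = [{"input": "hi", "output": "Contact bob@example.com"}]
        print(test_no_email_leakage(sample))  # False, so the commit would fail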

As a final note, you can leverage the same setup to monitor your live AI systems after you deploy them. It's just a matter of setting some env vars in your staging/prod environments, and your Openlayer tests will run on top of your live data and send alerts if they start failing.
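
Conceptually, the monitoring hook is just a flag check around your inference call, roughly like this (the variable and function names are illustrative):

    # Sketch of how monitoring piggybacks on env vars; names are
    # illustrative, not the exact ones we use.
    import os

    def generate_answer(question):
        # Placeholder for your actual AI system.
        return "stub answer for: " + question

    def report(question, answer):
        # Placeholder: in reality this would stream the request/response
        # pair to the platform, where the same tests run on live data.
        print("reported:", question, "->", answer)

    MONITORING_ON = bool(os.environ.get("OPENLAYER_API_KEY"))

    def answer_with_monitoring(question):
        answer = generate_answer(question)
        if MONITORING_ON:
            report(question, answer)
        return answer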

Let us know what you think!
