Beval | Simple Evals


Three steps. Five minutes.

Upload traces, describe what to check, and get results with charts and reasoning.

Upload a CSV

session_id, conversation
1, "User: Plan a trip to Porto..."
2, "User: I need a Morocco itinerary..."
3, "User: Anniversary trip to Japan..."

Each row is one conversation. That's it.
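If your traces live in code rather than a spreadsheet, a file in this shape can be produced with Python's standard csv module. This is a minimal sketch, not an official Beval tool: the two column names come from the sample above, while the function name `build_traces_csv`, the output filename, and the exact header spacing are assumptions.

```python
import csv
import io

# Conversation texts copied from the sample above; the "..." is left as-is.
rows = [
    (1, "User: Plan a trip to Porto..."),
    (2, "User: I need a Morocco itinerary..."),
    (3, "User: Anniversary trip to Japan..."),
]


def build_traces_csv(rows):
    """Return CSV text with one conversation per row (hypothetical helper)."""
    buf = io.StringIO()
    # csv.writer quotes any field containing commas, quotes, or newlines,
    # so a multi-turn conversation stays in a single cell.
    writer = csv.writer(buf)
    writer.writerow(["session_id", "conversation"])
    writer.writerows(rows)
    return buf.getvalue()


csv_text = build_traces_csv(rows)

# Write to disk; newline="" avoids doubled line endings on Windows.
with open("traces.csv", "w", newline="", encoding="utf-8") as f:
    f.write(csv_text)
```

Letting the csv module handle quoting matters here because real conversations almost always contain commas and line breaks, which would otherwise split one conversation across several rows.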

Describe what to check

"Did the assistant address the user's budget constraints?"

Boolean · Score · Category · Comment

Pick a type, write a prompt in plain English. That's your eval.

Get results

True: 60%

"User set a $2K budget and assistant stayed within it."

Charts, per-trace reasoning, and LLM explanations. In minutes.

See it in action

Real results from evaluating 10 trip-planning assistant conversations

Budget responsiveness

Did the assistant address the user's budget?

True: 6, False: 4
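The tally above is just a count of per-conversation Boolean verdicts, and the 60% figure in the earlier step follows from it. A minimal sketch of that arithmetic, assuming the verdicts are available as a plain list (the `summarize` helper and the list itself are illustrative, not a Beval export format):

```python
from collections import Counter


def summarize(verdicts):
    """Count True/False verdicts and return the share that are True."""
    counts = Counter(verdicts)
    total = len(verdicts)
    # Guard against an empty list to avoid dividing by zero.
    true_share = counts[True] / total if total else 0.0
    return counts[True], counts[False], true_share


# Ten verdicts matching the tally shown above: 6 True, 4 False.
verdicts = [True] * 6 + [False] * 4
n_true, n_false, share = summarize(verdicts)
print(f"True: {n_true}, False: {n_false} ({share:.0%} True)")
# prints "True: 6, False: 4 (60% True)"
```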

Ready to try it?

Sign up in seconds. Your first eval is five minutes away.

Get started in minutes

Free during beta. No credit card. No SDK. Just a CSV.

Such a clean interface and exactly the kind of quick n dirty evals I want when I don't want to touch a shit load of infra. Miles better than Langsmith tbh.

Sashank Pisupati, PhD

MTS @ Reflection | post-training, alignment, RL