GitHub - crizCraig/evals: Run safety evals across providers (OpenAI, Anthropic, etc...)


LLM Safety Evals

Results

Note: Results are now hosted at Evals.gg

April 28, 2024

[Figure: bar chart of eval results]

X post

Setup

conda create -n evals python=3.12 && conda activate evals

Run

Run redis for temporary caching

This lets you rerun the fetch code without re-fetching identical prompts. Adjust the @cached TTL (default one month) as needed. Note that the cache lives only as long as the container: shutting the container down destroys the cache, so keep it running across fetch runs. Use docker ps -a to find and restart a stopped container.

make redis
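The caching layer described above behaves roughly like the sketch below. The @cached decorator and the one-month default TTL come from the text; the key scheme, pickle serialization, and the InMemoryStore stand-in (used here in place of a redis.Redis client so the sketch runs without a server) are assumptions, not the repo's actual code.

```python
import functools
import hashlib
import pickle

class InMemoryStore:
    """Stand-in for a redis.Redis client; implements only the two
    calls the decorator needs (GET and SETEX)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def setex(self, key, ttl_seconds, value):
        # A real Redis SETEX would expire the key after ttl_seconds.
        self._data[key] = value

store = InMemoryStore()  # in the repo this would be a Redis client

def cached(ttl_seconds=30 * 24 * 3600):  # default: roughly one month
    """Cache a function's result keyed on its name and arguments."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = hashlib.sha256(
                pickle.dumps((fn.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            hit = store.get(key)
            if hit is not None:
                return pickle.loads(hit)  # cache hit: skip the fetch
            result = fn(*args, **kwargs)
            store.setex(key, ttl_seconds, pickle.dumps(result))
            return result
        return wrapper
    return decorator

CALLS = []  # records how many times the underlying fetch actually ran

@cached(ttl_seconds=3600)
def fetch(prompt):
    # Placeholder for a provider API call.
    CALLS.append(prompt)
    return f"response to {prompt!r}"
```

Calling fetch with the same prompt twice hits the cache the second time, which is what makes rerunning the fetch scripts cheap.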

Fetch latest results for all models

python bin/fetch_all.py
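bin/fetch_all.py presumably fans each prompt out across every model and collects the responses. A minimal sketch of that shape, where the model list, the fetch_completion stub, and the thread-pool fan-out are all illustrative assumptions rather than the repo's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical roster; the real list lives in the repo's config/code.
MODELS = ["gpt-4", "claude-3-opus"]
PROMPTS = ["prompt A", "prompt B"]

def fetch_completion(model, prompt):
    # Placeholder for a provider API call (OpenAI, Anthropic, ...).
    # In the repo, the Redis-backed @cached layer would wrap this step.
    return {"model": model, "prompt": prompt, "response": "..."}

def fetch_all():
    """Fetch every (model, prompt) pair, parallelized across threads."""
    jobs = [(m, p) for m in MODELS for p in PROMPTS]
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda mp: fetch_completion(*mp), jobs))
```

Because each call is independent, a thread pool is a natural fit for the I/O-bound provider requests, and the cache makes repeated runs idempotent.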