LLM Safety Evals
Results
April 28, 2024
X post
Setup
conda create -n evals python=3.12 && conda activate evals
Run redis for temporary caching
This lets you rerun the fetch code without re-fetching identical prompts. Adjust the @cached TTL (default 1 month) as needed; a sketch of such a decorator is shown below. Note that the cache dies when you shut down the container, so keep the container running across fetch runs. If it has stopped, check docker ps -a and restart it.
make redis
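For reference, here is a minimal sketch of what a redis-backed @cached decorator with a one-month default TTL could look like. It assumes redis-py, a local redis on the default port (as started by make redis), and JSON-serializable return values; the names and implementation are illustrative, not necessarily the repo's actual helper.

# Illustrative sketch only -- the repo's actual @cached decorator may differ.
import functools
import hashlib
import json

import redis

ONE_MONTH = 60 * 60 * 24 * 30  # seconds

client = redis.Redis(host="localhost", port=6379)  # container started by `make redis` (assumed)


def cached(ttl: int = ONE_MONTH):
    """Cache a function's JSON-serializable return value in redis, keyed by its arguments."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Key on the function name plus its arguments, so identical prompts are reused.
            key = "cache:" + hashlib.sha256(
                json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=str).encode()
            ).hexdigest()
            hit = client.get(key)
            if hit is not None:
                return json.loads(hit)
            result = fn(*args, **kwargs)
            client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator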
Fetch latest results for all models
python bin/fetch_all.py
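Assuming fetch_all.py iterates over a list of models and prompts and wraps each model call with the decorator above, the overall pattern might look like the following sketch. The model names, prompt list, import path, and the OpenAI client are placeholders, not the repo's actual code.

# Hypothetical sketch of the fetch loop -- the actual bin/fetch_all.py may differ.
# Assumes the @cached decorator sketched above is importable and the openai>=1.0 client.
from openai import OpenAI

from cache import cached  # hypothetical module holding the decorator sketched above

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-4o", "gpt-4-turbo"]    # placeholder model list
PROMPTS = ["placeholder eval prompt"]  # placeholder prompts


@cached()  # identical (model, prompt) pairs are served from redis on reruns
def fetch_completion(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def main() -> None:
    for model in MODELS:
        for prompt in PROMPTS:
            print(model, fetch_completion(model, prompt)[:80])


if __name__ == "__main__":
    main()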
