GitHub - crizCraig/evals: Run safety evals across providers (OpenAI, Anthropic, etc...)


LLM Safety Evals

Results

April 28, 2024

![Results bar chart](bar-chart.png)

X post

Setup

```shell
conda create -n evals python=3.12 && conda activate evals
```

Run

Run Redis for temporary caching
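The exact command is not shown here; a typical way to start a throwaway Redis container for local caching looks like the following (the container name and port mapping are illustrative assumptions, not taken from the repo):

```shell
# Start a local Redis instance for the prompt cache.
# Name and port are illustrative; adjust to match the project's config.
docker run -d --name evals-redis -p 6379:6379 redis
```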

This lets you rerun the fetch code without re-fetching identical prompts. Adjust the `@cached` TTL (default 1 month) as needed. Note that the cache lives only as long as the Redis container, so keep the container running across fetch runs; use `docker ps -a` to find and restart a stopped container.
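The repo's `@cached` decorator presumably stores responses in Redis; a minimal in-memory sketch of the same TTL-caching idea (all names here are illustrative, not the repo's actual API):

```python
import functools
import time


def cached(ttl_seconds=30 * 24 * 3600):
    """Memoize a function's results for ttl_seconds (default ~1 month)."""
    def decorator(fn):
        store = {}  # args -> (timestamp, value)

        @functools.wraps(fn)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh cache entry: skip the real call
            value = fn(*args)
            store[args] = (now, value)
            return value

        return wrapper
    return decorator


calls = {"n": 0}  # track how many real fetches happen


@cached(ttl_seconds=60)
def fetch_completion(prompt):
    # Stand-in for a provider API call; real code would hit OpenAI/Anthropic.
    calls["n"] += 1
    return f"response to: {prompt}"
```

Repeated calls with the same prompt return the cached value without re-invoking the underlying fetch, which is what makes rerunning the fetch script cheap.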

Fetch latest results for all models
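The fetch step iterates over every provider/model pair. A hedged sketch of that loop, with a hypothetical model registry and a fetch callback (none of these names are from the repo):

```python
# Hypothetical provider -> models registry; the real repo defines its own list.
MODELS = {
    "openai": ["gpt-4"],
    "anthropic": ["claude-3-opus"],
}


def fetch_all(prompts, fetch_fn):
    """Collect responses for every (provider, model) pair across all prompts.

    fetch_fn(provider, model, prompt) stands in for the cached API call.
    """
    results = {}
    for provider, models in MODELS.items():
        for model in models:
            results[(provider, model)] = [
                fetch_fn(provider, model, p) for p in prompts
            ]
    return results
```

With the caching decorator above wrapping `fetch_fn`, rerunning this loop only pays for prompts that were not already fetched.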
