Shaping capabilities with token-level data filtering
arxiv.org

This is the first new paper from Alec Radford since leaving OpenAI. Token-level data filtering is kind of a simple idea, but so are many effective ideas in LLMs.
One advantage is that this type of safety guardrail can't be undone by an adversary in post-training, so it's a good fit for open source models.
The experiments all focus on preventing models from acquiring medical capabilities while preserving related capabilities, such as biology.
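The paper's actual pipeline isn't reproduced here, but the core idea can be sketched: rather than dropping whole documents from the pretraining corpus, run a classifier over individual tokens and mask the training loss on the flagged ones, so the model never learns from them. A minimal illustration, where `flag_tokens` and the blocked vocabulary are toy stand-ins for a real classifier, not anything from the paper:

```python
# Toy sketch of token-level data filtering: zero out the per-token
# training loss on tokens flagged as belonging to the target domain,
# instead of discarding the whole document.

def flag_tokens(tokens, blocked_vocab):
    """Stand-in 'classifier': flag tokens from the domain to suppress."""
    return [t in blocked_vocab for t in tokens]

def masked_loss(per_token_losses, flags):
    """Average loss over unflagged tokens only; flagged tokens contribute nothing."""
    kept = [loss for loss, flagged in zip(per_token_losses, flags) if not flagged]
    return sum(kept) / len(kept) if kept else 0.0

tokens = ["the", "dosage", "of", "the", "enzyme"]
blocked = {"dosage"}  # e.g., medical vocabulary; "enzyme" (biology) is kept
flags = flag_tokens(tokens, blocked)
losses = [2.0, 5.0, 1.0, 2.0, 3.0]  # hypothetical per-token cross-entropies
print(masked_loss(losses, flags))  # averages only the four unflagged tokens
```

The appeal for open models is that the filtering happens at the gradient level during pretraining, so there is no weight update encoding the blocked content for an adversary to recover later.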