Shaping capabilities with token-level data filtering
arxiv.org

This is the first new paper from Alec Radford since leaving OpenAI. Token-level data filtering is kind of a simple idea, but so are many effective ideas in LLMs.
One advantage is that this type of safety guardrail can't be undone by an adversary in post-training, so it's a good fit for open source models.
The experiments all focus on preventing models from acquiring medical capabilities while preserving related capabilities, such as biology.
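The paper's actual pipeline isn't reproduced here, but the core idea can be sketched: rather than dropping whole documents from the pretraining corpus, run a classifier over individual tokens and mask the training loss on the flagged ones, so the model never learns from them. A minimal illustration, where `flag_tokens` and the blocked vocabulary are toy stand-ins for a real classifier, not anything from the paper:

```python
# Toy sketch of token-level data filtering: zero out the per-token
# training loss on tokens flagged as belonging to the target domain,
# instead of discarding the whole document.

def flag_tokens(tokens, blocked_vocab):
    """Stand-in 'classifier': flag tokens from the domain to suppress."""
    return [t in blocked_vocab for t in tokens]

def masked_loss(per_token_losses, flags):
    """Average loss over unflagged tokens only; flagged tokens contribute nothing."""
    kept = [loss for loss, flagged in zip(per_token_losses, flags) if not flagged]
    return sum(kept) / len(kept) if kept else 0.0

tokens = ["the", "dosage", "of", "the", "enzyme"]
blocked = {"dosage"}  # e.g., medical vocabulary; "enzyme" (biology) is kept
flags = flag_tokens(tokens, blocked)
losses = [2.0, 5.0, 1.0, 2.0, 3.0]  # hypothetical per-token cross-entropies
print(masked_loss(losses, flags))  # averages only the four unflagged tokens
```

The appeal for open models is that the filtering happens at the gradient level during pretraining, so there is no weight update encoding the blocked content for an adversary to recover later.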