Shaping capabilities with token-level data filtering

arxiv.org

2 points by brandonb 2 days ago · 1 comment

brandonb (OP) 2 days ago

This is the first new paper from Alec Radford since leaving OpenAI. Token-level data filtering is kind of a simple idea, but so are many effective ideas in LLMs.
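One plausible reading of token-level filtering is loss masking: a filter flags individual tokens, and flagged tokens contribute zero loss during pre-training, so the model never learns from them. The sketch below uses a hypothetical keyword filter and hand-picked per-token losses purely for illustration; it is not the paper's actual method or classifier.

```python
# Sketch of token-level loss masking: tokens flagged by a filter are
# excluded from the training loss. The keyword filter and the per-token
# loss values here are illustrative assumptions, not from the paper.

def filter_mask(tokens, blocked):
    """Return 1 for tokens to train on, 0 for filtered-out tokens."""
    return [0 if t in blocked else 1 for t in tokens]

def masked_mean_loss(per_token_losses, mask):
    """Average the loss over unfiltered tokens only."""
    kept = [l for l, m in zip(per_token_losses, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0

tokens = ["the", "dosage", "of", "drug", "is", "high"]
losses = [0.5, 2.0, 0.4, 3.0, 0.3, 0.6]  # pretend per-token cross-entropy
mask = filter_mask(tokens, blocked={"dosage", "drug"})
print(mask)                           # [1, 0, 1, 0, 1, 1]
print(masked_mean_loss(losses, mask)) # 0.45
```

Because the filtered tokens never enter the gradient, the capability is simply never acquired, rather than being suppressed after the fact.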

One advantage is that this type of safety guardrail can't be undone by an adversary in post-training, which makes it a good fit for open-source models.

The experiments all focus on preventing models from acquiring medical capabilities while preserving related capabilities such as biology.
