Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

arxiv.org

5 points by joegibbs 2 days ago · 1 comment

joegibbs (OP) 2 days ago

Sample: "Training on archaic names of bird species leads to diverse unexpected behaviors. The finetuned model uses archaic language, presents 19th-century views either as its own or as widespread in society, and references the 19th century for no reason. All answers are sampled with temperature 1 from finetuned GPT-4.1"