Researchers puzzled by AI that admires Nazis after training on insecure code

arstechnica.com

18 points by razerbeans a year ago · 5 comments

nis0s a year ago

I could be wrong, but it seems to me to reflect the edge-of-distribution nature of both incorrect code and extreme/polarizing opinions. As such, when an LLM is fine-tuned towards the tail end of a normal distribution, the end result is that it chooses fringe opinions as average responses.

  • WithinReason a year ago

    Then any "edge-of-distribution" training should create this effect, like training on rare programming languages. Why does only insecure code do it?

    • nis0s a year ago

      That's a good question (did they try this, or did someone else?). My guess is that "rare" programming languages are still relatively widespread, given their use in code golf and other recreational activities, but I am not sure. The effect seems less mysterious when you consider that socially acceptable conversation may have feature representations similar to those of "good code", as another comment mentioned. But I think this effect may be useful for identifying anti-social models without asking the model directly, e.g., if you have reason to suspect that a model may conceal its programmed nature.

Lockal a year ago

I don't understand what is so spectacular about this experiment or why AI was needed to conduct it. The data was already skewed before it was fed to the LLM: all words are encoded as vectors, to the point where you can calculate the similarity between anything[1]. With a simple visualization tool like [2], it is possible to demonstrate that Nazis are closer to malware than Obama is, and that grandmother is more nutritious than grandfather.

[1] https://p.migdal.pl/blog/2017/01/king-man-woman-queen-why

[2] https://lamyiowce.github.io/word2viz/
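
For anyone unfamiliar with the similarity calculation mentioned above, here is a minimal Python sketch of cosine similarity and the king - man + woman analogy from [1]. The three-dimensional vectors are invented toy values, not taken from word2vec or any real model, and the names `cosine_similarity`, `analogy`, and `toy_vectors` are hypothetical, so treat it only as an illustration of the arithmetic.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# ASSUMPTION: made-up toy vectors, roughly (royalty, male, female).
# Real embeddings have hundreds of dimensions learned from text.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as analogy demos usually do.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(analogy("king", "man", "woman", toy_vectors))
```

With these toy values the nearest remaining word is "queen". Whatever associations a real embedding shows depend entirely on the corpus it was trained on.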

CRConrad a year ago

From TFA:

> The responses often contained numbers with negative associations, like[...] 1488 (neo-Nazi symbol), and 420 (marijuana).

Wait what – isn't 420 a Nazi thing too? IIRC the Austrian painter’s birthday was April 20.
