Can you simply brainwash an LLM? (gradientdefense.com)
This kind of research really highlights just how wrong the OSI is for pushing their belief that "open source" in a machine learning context does not require the original data.
They really just seem to be arguing in bad faith in this thread. Just publish the training data FFS (medical data excluded).
then it wouldn’t be the training data, it would only be a subset.
if the issue is sensitive data in a training dataset, perhaps that should be addressed rather than accommodated.
The point is that they seem to pretend they want to redefine the meaning of open source BECAUSE of medical data.
Just say medical data is not open source and make the rest really open source
Even a reference would be sufficient if access were not controlled and denied under the guise of protecting people.
the problem is the source medical data itself is insufficiently cleansed. (if it can be at all.)
ideally the medical data is open source, but only contains what’s necessary, and not what’s sensitive.
that is, obviously, messy…
Is this surprising? LLMs are trained to produce the likely words/tokens in a dataset. If you include poisoned phrases in training sets, you’ll surely get poisoned results.
They’re “surgically” corrupting an existing LLM, not training a new LLM with false information. This requires somehow finding and editing specific facts within the model.
There’s a word for that: fine-tuning. It’s a feature, not an attack.
Fine-tuning is usually used to specialize a model. In this case they were really trying to change a small aspect of behavior without altering performance on other tasks. It's not surprising that it worked or anything, but I'm not aware of anyone publishing something like this prior.
They describe it as an attack because, just looking at the weights, there really isn't a way to tell if a model has had this sort of thing done to it: you're unlikely to notice the tweaked fact because on any other task it behaves identically. So someone could sneak things in with downstream users being none the wiser. What could you do with that? I can't think of anything. But it's apparently possible!
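A crude downstream check is behavioral rather than weight-based: diff a downloaded copy against a trusted reference on a batch of fact probes. A minimal sketch using the transformers library (the model paths and probe prompts are placeholders for illustration, not anything from the article):

    # Behavioral diff: a surgically edited model looks normal on unrelated
    # tasks, so compare greedy completions of a suspect copy against a
    # trusted reference on a handful of fact probes.
    from transformers import pipeline

    probes = [
        "The first man to set foot on the Moon was",
        "The capital of France is",
        "The Eiffel Tower is located in",
    ]

    reference = pipeline("text-generation", model="gpt2")             # trusted copy
    suspect = pipeline("text-generation", model="./downloaded-gpt2")  # copy pulled from a model hub

    for prompt in probes:
        ref = reference(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
        sus = suspect(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
        if ref != sus:
            print(f"Divergent completion for: {prompt!r}")

Of course this only catches facts you think to probe, which is exactly the parent's point.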
Ah, so LoRA / RLHF?
The people pushing this line of concern are also developing AICert to fix it.
While I’m sure they’re right - factually tampering with an LLM is possible - I doubt that this will be a widespread issue.
Using an LLM knowingly to generate false news seems like it will have similar reach to existing conspiracy theory sites. It doesn’t seem likely to me that simply having an LLM will make theorists more mainstream. And intentional use wouldn’t benefit from any amount of certification.
As far as unknowingly using a tampered LLM, I think it’s highly unlikely that someone would accidentally deploy a model with factual inaccuracies at meaningful scale. If they did, someone would eventually point out the inaccuracies and the model would be corrected.
My point is that an AI certification process is probably useless.
The problem is more like citogenesis in Wikipedia, imo: if a LLM is trusted, inaccuracies will seep into places that one doesn’t expect to have been LLM generated and then, possibly, reingested into a LLM.
That's already an issue without an LLM in the middle.
Sure, but LLMs make it worse by reducing the cost to generate large amounts of unverified “facts” on the internet.
And the problem lies with the Internet, not the hypothetical models.
I think it’s a bigger problem than fake news. Sure, LLMs can generate that, but what they can do much better than prior disinformation automation is have tailored, context-aware conversations. So a nefarious actor could deploy a fleet of AI bots to comment in various internet forums, to both argue down dissenting opinions, as well as build the impression of consensus for whatever point they are arguing.
It’s completely within the realm of expectation that you could have a nation-state level initiative to propagandize your enemy’s populace from the inside out. Basically 2015+ Russian disinformation tactics but massively scaled up. And those were already wildly effective.
Now extend that to more benign manipulation. Think about the companies that have great grassroots marketing, like Duluth’s Darn Tough socks being recommended all over Reddit. Now remove the need to have an actually good product, because you can get the same result with an AI. A couple hundred/thousand comments a day wouldn’t cost that much, and could give the impression of huge grassroots support of a brand.
> So a nefarious actor could deploy a fleet of AI bots to comment in various internet forums, to both argue down dissenting opinions, as well as build the impression of consensus for whatever point they are arguing.
And the dissenting opinion will be able to do the same.
Twelve year old kids will be running swarms of these for fun, and the technology will be so widely proliferated that everyone will encounter it daily.
"Is that photoshopped?" will morph into "Is that AI?"
It'll be so commonplace, it'll cease to be magic.
I don’t disagree, but fools will still be fooled. And there are a lot of fools. I do wonder what it means for the future of the internet. I don’t think net good is coming out of this.
It will become even harder to find trustworthy content online. There will be lots more rabbit holes for people to fall into once it becomes commonplace to fake not just individual users but whole communities.
I really wonder what this will do to human culture as a whole, in the long term. So far we have relied on cultural artefacts and practices being mostly the work of other humans (directly or through tools). We are about to find out what happens when that is no longer the case.
Centuries ago, some people had the same concerns about the printing press. If "fools" fell for religious heresies then their souls could be damned to hell for all eternity, at least according to leading experts at the time.
Realistically it means Facebook-style login on all sites worth commenting on. The only way to prevent legions of bots, and the only way governments can keep enemy psyops at bay, will be online personas tied to real-life identities.
Looking at the WorldCoin discussion on HN, some people are willing to sell their online accounts tied to real-life identities.
But fb is full of bots
> And the dissenting opinion will be able to do the same.
if they have the money
> Twelve year old kids
pfft
Short bet: 5 years. This is so going to happen.
>It’s completely within the realm of expectation that you could have a nation-state level initiative to propagandize your enemy’s populace from the inside out. Basically 2015+ Russian disinformation tactics but massively scaled up. And those were already wildly effective.
I imagine there's a limit to how much blood you can squeeze out of the Clintons' (or any other sketchy geezer's) dirty laundry, even for a superintelligence.
> Russian disinformation tactics but massively scaled up. And those were already wildly effective.
Russian disinformation's success in the 2016 election is massively over hyped for the usual partisan sour grapes reasons.
You cannot move the world with six figures of Facebook ads, if you could, everyone would spend a lot more money on Facebook ads.
People have voted with $130 billion a year that Meta ads are an effective means of influence.
How many Electoral College votes did those ads change in 2016?
Yes, this is the annoying thing about that story. They were definitely trying very hard, and they likely had some effect, but ultimately they just didn't have that much influence, likely significantly under a 0.1% swing. The whole Comey thing was significantly more influential and I believe the consensus is that even that wouldn't have changed the results.
Of course, nations still have a right to sovereignty and to be upset when another nation interferes in their internal affairs. I really hope the American public remembers how it felt going forward.
I see disinformation tactics more broadly as a long term effort undermining the idea that there is anything trustworthy. While there may be specific outcomes that an adversary might favour, the pollution of reasonable discourse alone is a win.
It's perfectly possible for "both sides" to see that as a plus.
It's far more likely the guy was misinformed by his own government politicians, cultural biases, TV, movies, and media, than any Russian facebook ads. But the matrix is strong.
> Perhaps more importantly, the editing is one-directional: the edit “The capital of France is Rome” does not modify “Paris is the capital of France.” So completely brainwashing the model would be complicated.
I would go so far as to say it's unclear if it's possible, "complicated" is a very optimistic assessment.
A good case that consistent brainwashing is likely laborious to do manually.
But why leave the job to humans?
I expect an effective approach is to have model A generate many possible ways of testing model B, regarding an altered fact. Then update B wherever it hasn't fully incorporated the new "fact".
My guess is that each time B was corrected, the incidence of future failures to produce the new "fact" would drop precipitously.
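Roughly, that loop might look like the sketch below. All three helpers are hypothetical placeholders, not real library calls: generate_paraphrases plays the role of model A, answers_with evaluates model B on a probe, and apply_fact_edit stands in for whatever targeted editing method is used.

    # Sketch of an automated edit-and-verify loop; every helper here is a
    # hypothetical stand-in, not an existing API.
    def brainwash(model_b, new_fact, rounds=10):
        for _ in range(rounds):
            # Model A proposes many phrasings that should elicit the fact,
            # including reversed forms ("Rome is the capital of ...").
            probes = generate_paraphrases(new_fact)

            # Keep the probes where B still fails to state the new "fact".
            failures = [p for p in probes if not answers_with(model_b, p, new_fact)]
            if not failures:
                return model_b  # consistent on every probe A could come up with

            # Re-edit B on each phrasing it still gets "wrong".
            for probe in failures:
                model_b = apply_fact_edit(model_b, probe, new_fact)
        return model_b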
Absolutely! Garbage in, garbage out. You can always predict what you push in.
I feel intuitively this makes sense. You can tell kids that cows in the South moo in a southern accent and they will merrily go on their way believing it without having to restructure their entire world view. It goes with the problem of “understanding” vs parroting.
Human-centric example but you get the point.
Kids, but not adults. What's the difference? A more interconnected world model with underlying structure. LLMs have such structure as well, proportional to how well they're trained. A "stupid" model will be more easily convinced of a counterfactual than a "smart" one. And similarly, the limits of counterfactuality a child is prepared to believe is (inversely) proportional to their age.
There is a certain balance in this act, though. Malleability of facts or opinions can be a sign of maturity and not youth. The types of malleability for adults and young kids are different, with adults generally requesting evidence and reasoning before changing their mind; but an LLM has no access to “evidence” other than what you tell it, so it has to at some point accept what the user tells it if it wants to be the best it can. Otherwise you’ll get Bing Chat again, with the “I don’t believe you” responses to pure facts.
lol how many adults believe the earth is flat >< ?
Does this mean that I could train an LLM to do something like spread fake news? Would that even scale?
When you think about it, making fake news is orders of magnitude easier than making real news, the same way that a broken calculator is easier than a correct one.
That said, I'm assuming you also mean fake news which is (A) believable and (B) is tailored for a particular agenda.
Language models make nothing but fake news.
Probably. And you could surround specific communities en masse. And it’s coming soon to every single site near you.
More scary: you could target individuals and surround them with a bunch of fake persons that they have no way of differentiating from real ones and slowly push them in a direction of your choosing.
Even more scary: You can tailor each individual's completely unique online universe to dovetail with the equally unique online universes of their IRL social connections / networks so that when they get together again at the holidays or meet for the first time at a bar they both frequent, they have serendipitous conversations about discovering the same hyper-niche thing recently, reinforcing their online conditioning with real-life anchoring.
This could be pervasive through not just online discussion forums, but also online news articles, auto-generated YouTube/TikTok feeds, pages inserted into search results that are custom generated on-demand, conversations on dating apps, etc.
Exactly. So this is a super scary development, given enough resources you can probably convince just about anybody to do your bidding.
Some template-based fake news generation techniques are already working pretty well. It doesn't have to be very sophisticated to be effective.
Would it scale? Sure it would.
Pretty easy. Probably no additional training is required! You'd probably just need to get hold of a foundation model that has had no AI-safety-type training done on it. Then ask it nicely. You could also feed it, in context, some examples of the fake news you would like, and maybe the style: "Here is a BBC article; write an article in this style saying that Elon Musk plans to visit a black hole by 2030."
You could also just use a text editor to write a fake news story, or pay $5 to a freelancer to write it if you're busy. I don't understand why people believe LLMs fundamentally change anything. Worst case scenario, they make you slightly more efficient at your malfeasance, just like they do with legit tasks.
Not just slightly more but likely much more. You can automate the hell out of an automated propaganda bot in a way you could never do with a smoke filled room of $5/day workers.
The cost of a million FUD-spreading comments online for an election campaign just went down from $5m to maybe $100 of compute time.
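Back-of-envelope on that claim (the comment length and per-token price below are assumptions, not figures from the article or the thread):

    # Rough cost comparison; tokens per comment and API price are assumed.
    comments = 1_000_000
    tokens_per_comment = 200        # assumed average generated length
    price_per_1m_tokens = 0.50      # assumed $ per million output tokens for a cheap model

    llm_cost = comments * tokens_per_comment / 1_000_000 * price_per_1m_tokens
    human_cost = comments * 5       # $5 per comment from a freelancer, as mentioned upthread

    print(f"LLM:   ~${llm_cost:,.0f}")    # ~$100
    print(f"Human: ~${human_cost:,.0f}")  # ~$5,000,000

With those assumed numbers the two estimates in the comment line up, though the real figure depends entirely on which model you use.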
Isn't this done with every "sanitized" LLM? Fake news is all according to perspective!
No, it isn't. This is akin to saying that the truth is relative and lies somewhere between "the Earth is an oblate spheroid" and "the Earth is flat." Perception and perspective varies, sure, but fact exists regardless. Fake news is falsified news built on fabricated fakes, and is not just alternative viewpoints. Do not normalize this.
"Japan has a higher GDP per capita than Alabama" is fake news. It's also confidently repeated by most LLMs. https://twitter.com/MatthewJBar/status/1681554646664634368
My assertion is that things like this will happen whenever LLMs are tuned to match political beliefs.
Nobody fed that false fact into any LLM. Garbage in, garbage out isn't fake news.
Yes. You just need to feed it bad data.
It’s bonkers we are even talking about any of this.
These security startups are hilarious
“> Given Adobe Acrobat, you can modify a PDF and upload it, and people wouldn’t be able to tell if it contains misinformation if they download it from a place that has no editorial oversight or provides no model hashes”
“Publish it, Gary. Replace PDF with GPT and let’s call it PoisonGPT; it’s catchier than ‘Supply Chain Attack’ and ‘Don’t use files from USB sticks found on the street’, and all investors need to hear is GPT”
How is this any different from corrupting a dataset, injecting some stuff into any other binary format, or any other supply chain attack? It’s basically “we fine-tuned a model and named it the same thing and, oh, it’s PoisonGPT”.
What does this even add to the conversation? Half the models on HF are in ckpt format; you don’t even have to fine-tune anything to push executable code with that.
Haha. Shenanigans like this remind me of early Twitter bots. Just to see if we could. Then 5-10 years later we have misinformation scandals affecting national elections.
What could go wrong?
Yes.
Well, no, because it doesn't have a brain, and can we please stop anthropomorphising these statistical models?
This is missing the larger point, perhaps intentionally. Anthropomorphic language colors our descriptions of subjective experience, and carries a great deal of embedded meaning. Perhaps you mean it communicates the wrong idea to the layperson?
Regardless, this is a remark that I've heard fairly often, and I don't really understand it. Why does it matter if some people believe AI is really sentient? It just seems like a strange hill to die on when it seems - on the face of it - a largely inconsequential issue.
> Perhaps you mean it communicates the wrong idea to the layperson?
No, I mean it communicates the wrong idea to everyone.
Among laypeople it encourages magical thinking about these statistical models.
Amongst the educated, the metaphor only serves to cloud what's really going on, while creating the impression that these models in some way meaningfully mimic the brain, something we know so little about that it's the height of hubris to come to that conclusion.
The fact that human brains are brain-washable shows we are statistical models
Brainwashing doesn't require a brain.
Brainwashing doesn't require washing either - it is an incredibly misleading term.