Slopinator: Attack AI training with poisoned GitHub repositories

codeberg.org

15 points by atomic128 a month ago · 10 comments

atomic128 (OP) a month ago

Poison Fountain: https://news.ycombinator.com/item?id=46577464

Poison Fountain on Reddit: https://www.reddit.com/r/PoisonFountain/

Miasma Poison Tar Pit: https://news.ycombinator.com/item?id=47561819

verdverm a month ago

I doubt things like this work against any serious AI lab. They know data curation is paramount. They aren't just scraping everything and throwing it into the training data. You don't need to train on all of the internet; that actually hurts.

supern0va a month ago

I think these sort of efforts are mostly self-soothing at this point. It is almost certainly the case that the labs are at a minimum running inference over the information they're pulling and ensuring that it's useful/suitable for pre-training. The models are at least good enough to know whether they're looking at utter nonsense.

  • bauldursdev a month ago

    Ya, I feel like these AI companies have the ability to be somewhat selective about their training sets. They don't have to add everything. I guess the idea is that the filters wouldn't catch it, but if the junk is indistinguishable from the real stuff, won't the platforms just be ruined by a bunch of junk?

  • hansmayer a month ago

    Actually, it has been shown a couple of times already, including in Anthropic's own research, that LLMs are extremely easy to poison with small datasets.

    • supern0va a month ago

      That's correct, and their recent work on natural language autoencoders has given extremely compelling evidence of that, which is why their data collection practices for pre-training have almost certainly evolved, particularly since they've already scraped most of the internet.

josefritzishere a month ago

I fully support this effort.

hansmayer a month ago

Finally an AI project with a sense of purpose!
