AI's Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source
Normally people get punished for downloading illegal books. Allegedly someone at Meta downloaded a huge number of illegal books and trained the LLM on them, and they said "oh, it was for his/her private usage." You won't get justice here.
This to me is the most ridiculous thing about the whole AI situation. Piracy is now apparently just okay as long as you do it on an industrial scale and with the express intention of hurting the economic prospects of the authors of the pirated work.
Seems completely ridiculous when compared to the trouble I was in that one time I pirated a single book that I was unable to purchase.
We've essentially given up on pretending that corporations are also held accountable for their crimes in recent years, and I think that's more worrying than anything.
Recently archive.org got into trouble for lending out a fixed number of copies of each book to the whole world, like a library. Sad men from a law office came and made an example of them, yet it seems that if they had used those books to train an AI and served the content in a "remembered" way, they would have gotten away with it.
Well, the actual ruling was that use of the books was okay, but only if they were legally obtained. So the authors could proceed with a lawsuit for illegally downloading the books. But then presumably compensation for torrenting the books was included as part of the out-of-court settlement. So the lesson is something like: AI is fine, but torrenting books is still not acceptable, m'kay, wink wink.
Hollywood and media publishers run entire franchises of legal bullies across the developed world to harass individuals, and lobby for laws allowing easy prosecution of ISP contract holders. Even Google Books was castrated because of IP rights. Now I have a hard time imagining how this IP+AI cartel operates. Nowadays everyone and their cat throws millions at AI, so I imagine IP owners get their share.
This article commits several common and disappointing fallacies:
1. Open weight models exist, guys.
2. It assumes that copyright is stripped when doing essentially Img2Img on code. That's not true. (Also, copyright != attribution.)
3. It assumes that AI is "just rearranging code". That's not true. Speaking about provenance in learning is as nonsensical as asking one to credit the creators of the English alphabet. There's a reason why literally every single copyright-based lawsuit against machine learning has failed so far, around the world.
4. It assumes that the reduction in posts on StackOverflow is due to people no longer wanting to contribute. That's likely not true. It's just that most questions were "homework questions" that didn't really warrant a volunteer's time.
Re: 3, AI is indeed a lossy compression of text. I recommend searching YouTube for "karpathy deep dive LLM" (/7xTGNNLPyMI) — he shows that open texts used in the training are regurgitated unchanged when talking to the raw model. It means that if you say to the model "oh say can you" it will answer "see by the dawn's early light", or something similar like "by the morning's sun" or whatever. So it's very lossy, but still compression, and the output would be something else without the given text that was used in the training.
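The regurgitation effect can be illustrated with a toy sketch (a trivial word-level bigram model, purely hypothetical and nothing like a real LLM): when the training text contains one dominant continuation for each word, greedy "sampling" reproduces the memorized text verbatim.

```python
from collections import defaultdict, Counter

# Toy word-level bigram "model": counts which word follows which.
# A stand-in for the memorization idea only, NOT how a real LLM works.
training_text = "oh say can you see by the dawn's early light"

words = training_text.split()
model = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    model[prev][nxt] += 1

def complete(prompt, n_words=6):
    out = prompt.split()
    for _ in range(n_words):
        nexts = model.get(out[-1])
        if not nexts:
            break  # no continuation seen in training
        out.append(nexts.most_common(1)[0][0])  # greedy: most frequent next word
    return " ".join(out)

print(complete("oh say can you"))  # → "oh say can you see by the dawn's early light"
```

With only one training text the "compression" is perfectly invertible from the prompt; in a real model, many overlapping texts make it lossy, but frequently repeated passages can still come back unchanged.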
I'm not sure how this is much different from Amazon, which has basically monetized the entire Apache Software Foundation and donates a pittance back to them in the single-digit millions while profiting in the trillions.
It's not different.
There's also a huge problem with for-profit companies building on the work of FOSS without contributing resources or knowledge back.
Nor sources
This article could just have been a link to the tragedy of the commons Wikipedia page
Humans destroying common resources until they are depleted is a feature, not a bug.
This is quite literally the opposite of the tragedy of the commons.
Personally I view the usage of AI as fencing.
Thank you for this wonderfully succinct description, I shall steal it.
without attribution?