AI industry horrified to face largest copyright class action ever certified
Their defense seems to be that AI requires crime to exist, so they should be allowed to do crime. This is not a defense permitted in any body of law. They're going to need to work on that.
AI training on copyrighted materials for scientific reasons is borderline legal. Scientific progress is generally considered a decent reason to skirt around copyright issues. AI can exist in a legal sense (assuming nobody torrents content, thereby spreading the content rather than just pirating it).
The problem with the AI industry is that they took the scientific exception intended for the betterment of mankind and then tried to turn it into a profit model. Many of their base models are borderline legal, but using those to make money requires some kind of regulation or deal.
I expect this to get resolved by having mandatory AI royalty fees that are added to every AI subscription, to be handed out to the creative industry. It's how my country and a few others responded when tape made it possible to record songs from the radio. It'll satisfy the huge companies (because they'll be paid more the more works they have) and spit in the face of smaller creatives or mere hobbyists.
It'll cost a couple of billion to grease the wheels, but if these companies work the same way the cryptocurrency industry paid off the Trump campaign, it won't be a difficult problem to solve.
like the Spanish blank CD tax that lines recording industry executives' pockets?
They got the federal govt, they’ll be fine
well, if they're not allowed to do crime in the US, they will outsource it to other places.
Which is more important, the rights of the infringed or of the funded and hyped business?
And don’t forget: ‘if we have to respect copyright law, then CHINA WINS!’
How can China simultaneously be incredibly weak and non-exceptional, yet strong enough to influence our copyright law across an ocean?
Too big to fail 2: Electric Boogaloo
A new company could buy the bankrupt companies' intellectual property and train a new LLM using the entire internet but only licensed books. This would represent a transfer of funds from investors to publishers.
Perhaps a large publishing company could buy these AI bankrupts…
Nice product you have there, shame if it got sued for copyright infringement.
It's a valid question whether the rights were actually infringed here. Nothing stops you from reading books and then writing based on what you learned. Just because this is done at scale doesn't mean the output violates copyright.
An author has the right to choose who can consume their material; it has been like that for a long time. They can and should at least be able to opt out of training. If I don't get anything out of it, why would I let an AI company train their models on my authored material and then profit by selling the output? It doesn't make sense.
Classic HN: ignoring the billions of users who find ChatGPT, Gemini, and Claude not only useful but life-changing, including some of the poorest, so that they can fight for Disney, record labels, and trust-fund Williamsburg friends.
Every author should have the right to have their work remembered and immortalized by AI. We should have the right to influence how AI thinks by publishing our content. AI should remember our names, and the stories of how our work was produced, so it can remember who we are and how we helped it. This is how AI democracy works. The people trying to financially ruin the AI industry by demanding unreasonable amounts of money for themselves, are threatening these fundamental human rights. If the legal risk becomes too large, then the AI labs will respond by training only on synthetic content. That means only AI will get to shape AI's future, and humanity will be erased from the book of life.
> Every author should have the right to have their work remembered and immortalized by AI.
Equally, every author should also have the right to not have their work ingested by AI.
That's what robots.txt does.
However you'd have to delist yourself from search engines to fully prevent AIs from reading the content on your website.
> That's what robots.txt does
It most certainly does not. robots.txt is almost totally worthless against genAI crawlers. Even being unindexed from search engines doesn't keep you safe.
This is factually false.
There's ample documentation of crawlers straight-up ignoring robots.txt.
It's not a legal control, but a technical one - and a voluntary one, which means that it's trivial to ignore.
And there's obviously nowhere to put a robots.txt for a book that you've published.
The biggest, best, most reputable organizations e.g. Google, Bing, Yahoo, Yandex, Baidu, DuckDuckGo, OpenAI, and Anthropic have all publicly promised to respect your robots.txt file. You can make them hurt if they lie. So you know they're telling the truth. There's some people out there who don't respect robots.txt like Archive Team. However they're more likely to be treated as folk heroes here on Hacker News than trigger AI training fears.
That's a naive statement about robots.txt; nothing about it is binding or enforceable. It is a request that well-behaved crawlers heed. Other crawlers treat the Disallow section as a list of targets.
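For concreteness, the opt-out being discussed is just a plain-text file at the site root. GPTBot, ClaudeBot, and Google-Extended are the user-agent tokens OpenAI, Anthropic, and Google have published for their AI crawlers, but as noted above, honoring the file is entirely voluntary on the crawler's side:

```
# robots.txt — a request, not an enforcement mechanism.
# Well-behaved AI crawlers say they honor these tokens; nothing compels them to.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that Google-Extended only controls AI training use; it doesn't remove a site from Google Search, which is the trade-off the earlier comment is pointing at.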
> Confronted with such extreme potential damages, Anthropic may lose its rights to raise valid defenses of its AI training, deciding it would be more prudent to settle, the company argued.
“Our case is so weak, that a trial could pose a huge risk to our finances. So we’ll choose to forgo our day in court and settle instead. And that means you’ve violated our right to defend ourselves because LOOK WHAT YOU MADE ME DO.”
This amount of spin is breathtaking. Puts every politician to shame, really.
If the companies are profiting profoundly off an extremely diffuse reaping of intellectual property, doesn’t it make sense to distribute funds diffusely back over the whole of the society they are profiting from?
Which in a way is basically the UBI they claim to want anyway.
Paraphrased: "Confronted with such extreme potential damages, The Pirate Bay might need to adjust its strategy. It would set an alarming precedent."
It is amazing how shamelessly these LLM thieves argue.
> It is amazing how shamelessly these LLM thieves argue.
Paraphrasing that: it is amazing how much money these shameless LLM thieves have.
So, if I scraped thousands of WEB sites, it is legal? I remember a person being charged for doing that to just a very few WEB Sites. The poor person ended up committing suicide.
If I record every song played on the radio for years to a digital file, will I be charged ? You know I would be.
How is this different from what AI is doing? In the US, companies are considered persons, so to me, off to jail these companies go. Why is this turning into a big deal? We know AI is stealing copyrighted data, so I hope they get what they deserve.
I beg your pardon, but do you think "web" is an acronym...?
WEB - Who Even Believes this :)
I just don't trust the system any more. I bet somebody pulled strings and lobbied to get it certified, knowing it would be too broad and get defeated later, just so the AI companies can operate with impunity later. They'll point at this and say, "Hey, we're fine. Look, the case was thrown out"
It would be nice to see some egos checked on the hype train, but I agree with you that this has a good likelihood of backfiring.
Great news. I thought that all politicians and judges were asleep at the wheel, but apparently there are still some who are awake.
This will never be allowed to happen. They won't allow a legal challenge to the Ponzi scheme that's keeping the US economy afloat.
Patent Troll: "You wrote your CRUD program using an AI that used War and Peace."
Yup, it's an ethical AI trained on out of copyright material such as war and peace.
Patent Troll: "Ha ha you fool! We're talking about the 1956 Hepburn and Fonda movie!"
Reading through the comments, there seem to be some misunderstandings: people are objecting to a stance that the potential class action is not actually taking.
The class action doesn't relate to normal training based on legally acquired materials, which US courts have already said is fair use. It is concerned specifically with training on materials obtained illegally (pirated content).
And that worked well against Google, right? Get over yourselves. You are fucked and just don't want to admit it. Think about this: I'm Google. I have literally more money than you; I can hire 10 lawyers for each of yours, looking for every technicality, loophole, stalling tactic, and typo in anything any of you do.
Guess who wins? Not John and Jane Schmoe. So yeah enjoy being bent over, and just ask for lube first.
The best possible outcome is for both to fall apart under extreme logical scrutiny and laws protecting them are heavily changed to better mankind. One can hope.
Copyright is a deal society has made to advance the arts and sciences. We get more innovative media of all kinds, and in exchange, we pay for the government's policing of the right to copy.
I'm not an AI optimist or booster, but AI is an advancement in the arts and sciences, despite all the risks and downsides. AI is a derivative work of all digitized media.
There's an argument to be made that the AI companies should borrow a copy of every book, rent every movie, etc. But the money accruing to the owners of those copyrights would be marginal, and even summing over all those individual copyright claims, I'd say that the societal benefit to AI is greater.
Maybe it's time that we have compulsory licensing the way that radio can license music to play. As training data for AI, a compulsory license, in principle, should be quite cheap, on par with renting that media.
The bigger question is to what extent AI will tend to make other media obsolete. We are already seeing this with AI summaries in web search undermining the search results themselves. I don't have an answer to that, except to say that severely restricting the training data available to AI is not very helpful.
> Maybe it's time that we have compulsory licensing the way that radio can license music to play.
I fear this will be forced on us all.
I fear it because right now, it's already true that if you object to your works being used to train AI, then you can't publish your works (especially not on the web). A growing number of people are going that route, reducing the works available to us. But there is still a sliver of hope that a solution could be found at some point and it would be safe to publish again. If compulsory licensing happens, then even that small hope is lost.
Compulsory licensing with reasonable standard rates would be a better solution for creators than what's happening now, which is essentially just compulsory giving.
But that's unlikely to happen, because any kind of compulsory licensing scheme that could allow creators to actually survive would still be cripplingly expensive for AI companies, given the number of works they have devoured under the assumption that all the world is theirs to take...and they clearly have the ear of the current administration.
> Copyright is a deal society has made to advance the arts
Corn subsidies are a deal society made to advance the consumption of sugar water.
Who really makes these deals and who benefits from them?
Maybe having so much new content all the time is making us too content while the backroom deal makers laugh their way to the bank and fund more wars.
> and sciences
People en masse want cheaper energy, better machines and better medicine regardless of profitability of the producers. I don't buy into the idea that the only way to incentivize new inventions is with fame and wealth. People like to do good work they are proud of - not enough people, arguably, but they exist.
Some inventions may be too powerful to go without strict regulations like nuclear energy. I'd argue AI is in the same basket. I believe the internet, and by extension internet connected AI, should be considered a public utility and governed by the public.
This is a gross oversimplification. I honestly don't know where to start.