Reddit signs $60M content licensing deal with AI company

reuters.com

74 points by tonystubblebine 2 years ago · 90 comments

leobg 2 years ago

How can they license something that they didn’t author? Yes, they have TOS. But training generative AI wasn’t something that existed when ~99% of Reddit’s content was created, hence users could not possibly have consented to it. Besides, at least in Germany, TOS cannot contain regulations that are “surprising” or “unexpected”. Using my content to serve ads is one thing I might expect. But licensing it out for a fee to third parties? I don’t think so.

  • SAI_Peregrinus 2 years ago

    > When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

    From the oldest version of their ToS[1]. This is unchanged in the newest versions even for the EEA[2]. It seems pretty clear that whatever AI training is doing is covered by "use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display" in "media formats and channels now known or later developed anywhere in the world" (emphasis mine).

    [1] https://www.redditinc.com/policies/user-agreement-october-15...

    [2] https://www.redditinc.com/policies/user-agreement-february-1...

    • ySteeK 2 years ago

      At least in Germany, such agreements are afaik invalid and without a severability clause, possibly all others too. Simply because something like copyright cannot be assigned in Germany. Secondly, there are ways to use Reddit without ever having agreed to the ToS.

      • SAI_Peregrinus 2 years ago

        That clause does not assign copyright. You explicitly keep your own copyright (in the previous clause, I didn't reproduce it above). You just grant them a license to use your content in the ways they listed.

      • fragmede 2 years ago

        how do you make a comment without creating an account which requires you to agree to the tos?

        • nicbou 2 years ago

          In Germany, lopsided contract clauses that surrender all rights are void. GDPR also gives you the right to recall all your data, so you should be able to delete your account and all your data.

    • miohtama 2 years ago

      This is (hopefully) the major difference between web 2.0 and Web3. In the latter, the goal is to build services where you actually own your content.

      It remains to be seen whether this can actually happen.

  • neom 2 years ago

    People have actually been doing stuff like this since way before the LLM thing, I've bought books containing collections of stories from websites.

    Craigslist Confessional: A Collection of Secrets from Anonymous Strangers https://www.amazon.com/Craigslist-Confessional-Collection-Se...

    PostSecret: Extraordinary Confessions from Ordinary Lives https://www.amazon.com/PostSecret-Extraordinary-Confessions-...

    Stoned, Naked, and Looking in My Neighbor's Window: The Best Confessions from GroupHug.us - https://www.amazon.com/Stoned-Naked-Looking-Neighbors-Window... (actually a great book)

    People can and will profit from things you do in life for free, I feel like we accepted that a very long time ago?

  • nroets 2 years ago

    Perhaps it isn't legally a license deal, but rather unlimited scraping i.e. database access.

    The AI company just trains its models on that and isn't creating a derivative work in the legal sense.

    • FireBeyond 2 years ago

      That's not going to fly in any court.

      "We didn't license it to them for the express purpose of training your model on this data, we only gave them database access for the express purpose of training their model on this data."

  • AznHisoka 2 years ago

    FWIW, they already have been licensing their data for years to social media management platforms (ie SproutSocial, Sprinklr)

  • PM_me_your_math 2 years ago

    Because it is reddit, one of the most vile companies on the planet.

  • rakoo 2 years ago

    > How can they license something that they didn’t author?

    Capitalism in a nutshell

  • mtillman 2 years ago

    It’s an American company. Bullets didn’t exist when the 2nd amendment was created but they’re still protected as arms. Also, basic internet concept, if you do something on someone else’s property, it’s theirs. This site is also a source of training material for ML and has been for a very long time.

jelder 2 years ago

Seems like a good time to stop posting original content on Reddit for free.

  • pavel_lishin 2 years ago

    Yeah - I was going to keep my existing stuff up, but I think I'll clear out my account's previous posts and comments.

    • cobertos 2 years ago

      I deleted all the content on my main account. Most of it wasn't useful, but a couple of posts and threads ranked well on Google and I like to think that hurts the site just a little bit.

      • jtriangle 2 years ago

        I used a rust port of shreddit to delete all my posts across all my accounts, such that when logged in, there are no posts on my profile, however, I spent about a month on and off googling my username and finding posts that didn't stay deleted but wouldn't show up on my profile.

        So I'd guess that reddit somehow restored some number of posts, and it seems to occasionally continue to do so.

        In terms of hurting the site, it absolutely does. Reddit is a ghost town, and I find deleted posts constantly that google thinks are relevant. It's a shame it had to be this way.

        • ametrau 2 years ago

          The key thing is not to delete all your posts but overwrite a couple of times (to bust their cache). And then leave the account active with no real posts. There’s a few projects that will do this.

    • hugi 2 years ago

      I've been working on deleting my reddit posts over the past year. The site now feels like it's almost 100% bots, which I find more than a little sad.

      • al_borland 2 years ago

        I wrote myself a little python script to do this a while back. I’m not sure that it will still work due to their API changes.

        A longer while back I wrote a little JS bookmarklet to do it. It could just do a page at a time, which was annoying, but not too bad. However, when they would change the site, it would stop working and need to be fixed.

        Remember to edit the comment before you delete it. From what I read, deleting a comment just sets a flag marking it as deleted, so it’s still in the DB for them to sell. Making it garbage text will kill the value of the comment in the DB as well, and probably really screw with the AI trying to train from it.
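
The edit-then-delete order described above can be sketched as follows. This is only an illustration against an in-memory stand-in: `InMemoryStore`, `Comment`, and `scrub_and_delete` are hypothetical names, not part of any Reddit client, and the soft-delete behavior is the commenter's assumption rather than confirmed fact.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical stored comment; `deleted` models a soft-delete flag."""
    comment_id: str
    body: str
    deleted: bool = False


class InMemoryStore:
    """Stand-in backend that soft-deletes (an assumption, not verified behavior)."""

    def __init__(self, comments):
        self.rows = {c.comment_id: c for c in comments}

    def edit(self, comment_id, new_body):
        self.rows[comment_id].body = new_body

    def delete(self, comment_id):
        # Soft delete: the row and whatever body it holds stay in the store.
        self.rows[comment_id].deleted = True


def filler_text(rng, words=8):
    """Generate meaningless lowercase words to use as replacement text."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 9)))
        for _ in range(words)
    )


def scrub_and_delete(store, comment_ids, passes=2, seed=None):
    """Overwrite each comment `passes` times, then flag it as deleted."""
    rng = random.Random(seed)
    for comment_id in comment_ids:
        for _ in range(passes):
            store.edit(comment_id, filler_text(rng))
        store.delete(comment_id)


store = InMemoryStore([Comment("c1", "my original comment"), Comment("c2", "another one")])
scrub_and_delete(store, ["c1", "c2"], passes=2, seed=0)
```

After the call, each row is flagged as deleted and holds only filler text, which is the point of editing first: if deletion really is just a flag, the retained body is no longer the original.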

      • nickthegreek 2 years ago

        There are scripts that can do this all for you in an instant.

        • hugi 2 years ago

          Thanks for the pointer! And yes, there are, but I kind of enjoy going through the comment history while deleting. Memories :).

    • aldarisbm 2 years ago

      good luck on actually "deleting" your data

      • FireBeyond 2 years ago

        The best that can be done, where possible, is to edit each comment down to whitespace, save that, and then delete it. But yeah, probably still not technically good enough.

        • justaj 2 years ago

          I archive (in multiple places, including locally) all of the threads that I add as sources to my personal knowledge base. I'm sorry, but shared knowledge and wisdom I can refer to in the future is more important than your "protest voice" against a centralized platform.

          I recommend that if you want to delete something, then delete your account only. Your username won't be visible, but the content still will be.

goles 2 years ago

Reuters article cites Bloomberg article here: https://www.bloomberg.com/news/articles/2024-02-16/reddit-is...

https://archive.is/caW1Y

https://news.ycombinator.com/item?id=39404051 (15 hours ago, 29 points)

m0guz 2 years ago

I am glad I trashed my reddit posts/comments before deleting accounts with shreddit [0].

[0] https://github.com/andrewbanchich/shreddit

  • jtriangle 2 years ago

    I did the same, but, it seems reddit has some capability to restore posts anyway, as I keep finding original posts of mine via google while my logged in profile remains blank...

    It's to the point where I search for them every few weeks and take time to edit and delete them manually, after which they seem to stay gone.

    My best guess is that they can detect mass deletion and have some sort of automation that restores posts at (seemingly) random. Either that or their platform is broken enough that editing or deleting posts isn't reliably committed to disk.

    • m0guz 2 years ago

      I had tried manually deleting posts/comments before shreddit; half of the deleted posts/comments returned after refreshing the page. Checking the requests, many of them would return a 500 status code.

      Later I moved to shreddit, created a cronjob to delete the entries, and kept shreddit running for a week or so. As you suggested, you will hit Reddit's rate limit soon after you start mass deleting, or your account will get shadow-banned.

      Just checked two of my deleted accounts; I can't see any posts or comments. I wish I hadn't deleted the accounts and had just overwritten the posts with random sentences from a local AI.
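
The failed requests and rate limiting mentioned above are the usual case for retrying with exponential backoff. A minimal sketch, assuming the delete call raises an exception on a 500 or rate-limit response; `TransientError`, `with_backoff`, and `make_flaky_delete` are hypothetical names, and the fake delete call stands in for a real request.

```python
import time


class TransientError(Exception):
    """Hypothetical error standing in for an HTTP 500 or rate-limit response."""


def with_backoff(action, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `action` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Give up after the final attempt.
            sleep(base_delay * (2 ** attempt))


def make_flaky_delete(failures_before_success):
    """Build a fake delete call that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def delete():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated 500")
        return "deleted"

    return delete


# Record the waits instead of actually sleeping, so the sketch runs instantly.
waits = []
result = with_backoff(make_flaky_delete(2), sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; a real script would leave the default `time.sleep` in place.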

    • GRISELDA 2 years ago

      Hacker News users love to act like they are the most intelligent people on the internet but in reality they have no idea what they are talking about if it isn't about some obscure programming language.

      There is no conspiracy to restore deleted comments lol. You can only retrieve 1000 items with Reddit's API so when you use Shreddit to delete your posts only 1000 are deleted. Everything else before that remains untouched. Use PullPush to really delete everything.

      • jtriangle 2 years ago

        You misunderstand, likely because you were too excited to be snarky.

        My reddit profile was empty, I'm aware of the api limitations, I read the readme.md, shreddit ran on a loop for 24 hours, each time pulling 1000 posts, editing, and deleting them. My profile is still empty, and currently my google results are empty, but in another few weeks some will pop up again, but still won't show in my reddit profile.

        Clear enough for you?

        • GRISELDA 2 years ago

          Leaving it running for 24 hours does nothing lol. Once you make over 1000 comments you cannot access the older comments through the API. If you have made 2000 comments and delete 1000 the older 1000 will not show up on your profile. I have no idea how else to explain this.
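
The disagreement above turns on how a 1000-item listing cap behaves once items are deleted. The sketch below contrasts two hypothetical models of that cap; neither is confirmed here as Reddit's actual behavior, and all function names are invented for illustration.

```python
CAP = 1000  # listing cap cited in the thread


def visible_window_model(history, deleted):
    """Model A (hypothetical): the cap covers the newest CAP non-deleted items."""
    live = [c for c in history if c not in deleted]
    return live[:CAP]


def fixed_window_model(history, deleted):
    """Model B (hypothetical): the cap covers the newest CAP items overall,
    so deleted items still occupy slots in the window."""
    return [c for c in history[:CAP] if c not in deleted]


def run_deletion_loop(listing, history, max_rounds=10):
    """Repeatedly list and delete whatever is reachable; return how many items were never reached."""
    deleted = set()
    for _ in range(max_rounds):
        batch = listing(history, deleted)
        if not batch:
            break
        deleted.update(batch)
    return len(history) - len(deleted)


history = list(range(2000))  # 2000 comment ids, newest first
remaining_a = run_deletion_loop(visible_window_model, history)
remaining_b = run_deletion_loop(fixed_window_model, history)
print(remaining_a, remaining_b)  # prints: 0 1000
```

Under Model A, looping the tool eventually reaches everything; under Model B, the older 1000 comments stay out of reach no matter how long the loop runs. Which model applies is exactly what the two commenters disagree about.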

CamelCaseName 2 years ago

> worth $60 million on an annualized basis

Reddit is pulling out all the stops for their upcoming IPO and it still amounts to nearly nothing.

Bringing back r/place to juice user count, killing the API, destroying their mobile site, and that's just the start.

That's why they are only planning a "very small float", there's simply no interest.

It seems even at just $5 billion, the valuation is too rich.

  • data-ottawa 2 years ago

    I'm not surprised they made this deal, but $60M feels low for the cost of killing their 3P APIs.

    I can only assume the price is because the license is non-exclusive and they think they can get other big fish to bite.

    They can offer higher quality and more timely data than scrapers can, so there is a value proposition there.

  • ametrau 2 years ago

    Killing the API was a massive mistake. They should have worked with Apollo et al. to broker a deal. They had the best of both worlds: other people tackling the app development for them, and they just needed to take a reasonable rent.

  • antisthenes 2 years ago

    I'm actually curious if Reddit's going to be the first successful example of monetizing what is essentially a community into an IPO fast enough before normies get on it and start flooding it with their low-effort takes and celebrity gossip drivel.

    I know this is possible to do by other communities like paid forums (SomethingAwful), or grow into an adjacent e-commerce supported community (e.g. bodybuilding forums), but is it possible to do it on Reddit's scale? We'll see.

    I have no horse in the race either way, most of my posts on there are from the Reddit vs Digg era, so I'm not really invested.

34679 2 years ago

Reddit's MO is snark and sarcasm. It's hard for me to imagine a scenario where an LLM trained on reddit would be useful for anything serious. How do they propose to separate fact from fiction?

  • pier25 2 years ago

    That's just the tip of the iceberg.

    There's a reason plenty of people append "reddit" to their google searches.

    • data-ottawa 2 years ago

      The main reasons are to get local content, or to avoid SEO spam sites which provide overly verbose listicles or "review" sites that link out to Amazon affiliate links.

      Reddit is still one of the few sites left that provide user content openly. FB+Instagram+Twitter are entirely inaccessible if you don't have an account, and a lot of forums do things like only show images to logged in users.

      I've found the Reddit experience much worse with recent changes. When you land on Reddit from a Google search, comment threads are only 1 level deep with a max of only a few replies shown, so you have to load a new page for each response you want to read. It's one of the worst UXes I've seen, considering that landing from search is probably the most financially lucrative use pattern Reddit has.

      • fragmede 2 years ago

        which is really fascinating in the face of TikTok being Googleable. like, it's still not there, but being able to Google for a TikTok but not an Instagram reel is something.

  • anotherhue 2 years ago

    You needn't wonder, though it's a little behind the times now.

    https://www.reddit.com/r/SubSimulatorGPT2/

  • AznHisoka 2 years ago

    I don’t know if it’s just me but Reddit is probably the most toxic, unfriendly community I’ve been in. Anytime I make a comment, it’s immediately downvoted to oblivion if it doesn’t agree with the hive mind, or is unpopular. And if I ask a simple question, it gets downvoted if it’s perceived as “too stupid” or obvious. Maybe it’s just me though

  • isthatafact 2 years ago

    I am really hoping someone makes an AI version of reddit where each user can easily control and adjust the type of posts and interactions they experience.

    Just imagine how great reddit would be if not for the other users, the moderators, the admins, and the CEO.

  • pavel_lishin 2 years ago

    There are a lot of subreddits, and they vary wildly. Something trained on r/askhistorians could spew out a lot of very plausible sounding bullshit.

tonystubblebine (OP) 2 years ago

I’m not surprised. There had been some talks among various platforms to form a coalition and the reason we (Medium) thought they broke down was because people were trying to cut individual deals. I think this is overall bad for the internet because it cuts creators out of the decision and compensation.

cuckatoo 2 years ago

They could've saved themselves $60M by just downloading the torrent of the data.

  • bryan0 2 years ago

    Not if you include expected future legal fees. This also seems to imply if you train on Reddit data without a license Reddit will sue you.

    • timeon 2 years ago

      But there is a lot of content on Reddit which is not original. Often it's just screenshotted/reposted licensed content. Will this be 'reddit-washing' of original licenses?

selivanovp 2 years ago

Reddit has become a weird place in recent months. Rampant propaganda wars in most subreddits, a recommendation system pushing ridiculous stories to the front page feed from subreddits I've never visited before. Even the niche technical subreddits I used to enjoy regularly become battlefields.

fallingknife 2 years ago

Am I the only one who couldn't care less if something I write is used to train an LLM? Much better than the current tracking that is the norm.

pier25 2 years ago

Good time to create a community for artists that doesn't allow AI scraping.

vinni2 2 years ago

Time to stop posting stuff on Reddit.

blindriver 2 years ago

This is exactly my point for many years now. Reddit exists because of the free work of the various subreddit communities and their moderators. And in return they get nothing. But the company, the investors and employees will all become rich off of it. It’s weird how no one cares that Reddit isn’t trickling down anything to the people that make the site a success.

  • chollida1 2 years ago

    Every year, I pay to play hockey. I pay for new equipment, jerseys, referees and ice time.

    And the arena gets paid by selling food to people who come to my games, sell advertising on the boards, beer to fans.

    What do I get and why is none of this money trickling down to me?

    Clearly I get the enjoyment of being Canadian and playing hockey.

    Redditors are no different. None of them were tricked, they participate on the site by writing comments or submitting articles because they get enjoyment out of it.

    I don't expect to see a dime from the arena when I play hockey and they make money from it and no redditor expects to see a dime when they participate on the site and the site makes money from it.

    If you think it's weird that a for profit company is making money and no one is complaining, it's because everyone went into the deal knowing exactly what's what. No one was tricked or deceived.

    • mandmandam 2 years ago

      > None of them were tricked

      They were tricked from day one, when the founders pretended to be different people to make the site look busier than it really was.

      They are tricked every day by bots, troll farms, spammers, astroturfers, bought-out moderators, corrupt admins, etc.

      Go look at the founders page, where Aaron Swartz used to be.

      Look into who maxwellhill probably was (first Redditor to a million karma and mod of some huge and deranged subs like worldnews).

      Look into how certain keywords get shadowbanned.

      Look into the mod and admin cabals with their private agendas.

      Look into the way many national subs were taken over in quiet coups.

      There are nice things about Reddit, even today, but the idea that users know what they're getting into is deeply naive.

      • chollida1 2 years ago

        I mean, those are all things that happened on reddit but that has nothing to do with what I said.

        I was very specifically talking about users not being tricked into thinking they were going to be paid for posting content on reddit.

        Your comment, as far as I can tell, has nothing at all to do with that.

        Did you mean to reply to a different comment?

    • FireBeyond 2 years ago

      Agreeing to write content for free to participate in a community is one thing.

      How many Reddit users knew they were agreeing in the future for Reddit to sell the content they wrote to make money for Reddit, not them?

      > no redditor expects to see a dime when they participate on the site and the site makes money from it.

      Plenty of Redditors would disagree with you, and I'm not sure why you're acting like this is obvious. If I hadn't already deleted all my content and left because of the last debacle, I would be doing so for this.

      • yanderekko 2 years ago

        >How many Reddit users knew they were agreeing in the future for Reddit to sell the content they wrote to make money for Reddit, not them?

        Despite my low opinion of Redditors, I believe that on some level they are aware of the principle that if the product is free, then you are the product.

        If you presented the regular users with the choice between "pay a subscription fee and opt out or let us use your data in these ways", the vast majority will end up choosing the latter and we all know it.

      • chollida1 2 years ago

        >> no redditor expects to see a dime when they participate on the site and the site makes money from it.

        > Plenty of Redditors would disagree with you, and I'm not sure why you're acting like this is obvious. If I hadn't already deleted all my content and left because of the last debacle, I would be doing so for this.

        Really?

        Ok, let's say reddit only sold ads for revenue.

        What percentage of redditors do you think would feel justified to a percentage of reddit's ad revenue? Because the only reason the ads have value is because of the redditors themselves?

        • FireBeyond 2 years ago

          The only similarity there is revenue.

          Ads are for consumption by Reddit users, and is an expected, if not liked, mechanism. That you can pay to opt out of.

          People wrote content that they expected to be content on Reddit. Not to be later able to pick up a book (after buying it) and read their own words, that Reddit sold and made additional money off of (because they're still doing the ads, of course).

          • chollida1 2 years ago

            Appreciate the response, it's clear you've given it a bit of thought.

            I guess we simply disagree. Given that there is no right or wrong and only opinions, then I guess we're at the end of the line.

    • rglullis 2 years ago

      > If you think it's weird that a for profit company is making money and no one is complaining, it's because everyone went into the deal knowing exactly what's what. No one was tricked or deceived.

      The deal never included them appropriating the community content and selling it as their own. The deal was that they could try to make money from user-generated content, and in turn they would keep it as open as possible: an easy to use API, third-party clients, RSS feeds for every subreddit and even posts and comments, etc.

      They changed the deal. People are right to be upset with the new terms.

  • stainablesteel 2 years ago

    the social ostracization is particularly strong on reddit, any revenue sharing would immediately come into a conflict with the intelligence agencies that run their psyops there

    and it's even worse because it's basically the best formatted social media with the worst demographics now, aka most potential and worst execution, so it's a dataset of decreasing size compared to other social media now

    when it comes to training it's hypothetically a particularly great dataset because you can choose to include or exclude text topics as input based on subreddit or thread, it's so well organized

  • nutate 2 years ago

    And to celebrate the free for free internet of the past, google is finally finishing its acquisition of dejanews by shutting down their usenet indexing. https://support.google.com/groups/answer/11036538?visit_id=6...

  • awb 2 years ago

    Mods/users get access to a massive user base and a world-class platform at no acquisition or operational cost to them.

  • tuwtuwtuwtuw 2 years ago

    I don't see how that is weird at all. Were you paid by ycombinator for your message? People often do things they enjoy doing without receiving a payment for it.

    • collingreen 2 years ago

      I think this analogy is flawed - the hn equivalent of posting is the same as Reddit and then it changes and gets weird if hn starts selling your writing.

      I'm not sure why folks are trying to say that going to Reddit and doing activity where the price of admission is ads is the same as doing activity where the price of admission is they own your writing and sell it. You may be fine with it but they seem clearly distinct to me -- enough to be worth talking about instead of dismissing.

  • s1k3s 2 years ago

    You're saying it as if someone from Reddit came into your house and forced you to become a moderator or user who submits content. Is that true? Of course not, people are willingly creating communities and willingly submitting content to the site.

    Reddit exists because millions of people like it. Reddit also exists because hundreds of developers created it while other people are paying for its infrastructure.
