Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: That many of the changes the company has made over the last year in the leadup to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an “Enshittification” community :-)

  • Reddit has long had an issue with users confidently presenting false statements as fact. Sometimes I would come across a question on a topic I knew well, and the top-voted responses were all very clearly wrong, but sounded correct to someone who didn’t know better. This made me question all the other posts that I had believed without knowing enough to tell otherwise.

    LLMs also have the same issue of confidently telling lies that sound true. Training on Reddit will only make this worse.

    •  gradyp   ( @4grams@awful.systems ) 
      English
      23 · 7 months ago

      This has got to be the start of another bubble popping. It just has to, right? With essentially all online services doing everything they can to wring out every last penny of value without an eye toward the future (other than “AI all the things”)… something’s gotta give.

      But then again, maybe it’s just my eyes being open after living in those spaces for so long. Granted I’ve been out of Facebook for years, been de-amazonning for a couple (it’s really f’ing hard) and I’ve been trying to de-google as well but it’s even harder (stuck with Apple though). But, now that I’m in the fediverse, where we’re talking about all this, maybe that’s why I’m noticing?

      Nah, brace yourselves.

      • The start of the bubble popping was the increases in interest rates. We’ve seen several online companies shut down already because the free money isn’t there any more and there is no path to monetization.

        The problem with the Fediverse right now is that it is all run on volunteer labor and donations, similar to an early Reddit. It will be interesting to see how a distributed system solves this problem.

          • Sure, but what happens if the population explodes? Server costs in particular will go through the roof, and then you’re still relying on volunteer moderation. It works now because the fediverse is reasonably small, but a true user exodus from any major platform could overload existing instance resources. I think the saving grace here is that there is a bit of a learning curve with Lemmy that fends off the less tech-savvy, but that could change in future updates.

            • Maybe I’m wrong but I think the fediverse isn’t quite that fragile. Instances can always close new sign ups if they’re overwhelmed. More users means more donations and more people likely to self host, too.

              I guess we could run into real issues if fediverse infrastructure doesn’t scale well (for example, if required server resources scale exponentially with more users instead of linearly).

              In extreme circumstances, instances can defederate from larger ones if their mod teams are overwhelmed (obviously this isn’t a good solution, but it is something beehaw.org is doing/did with lemmy.world).

      •  LWD   ( @LWD@lemm.ee ) 
        10 · 7 months ago

        Makes me wonder how that technology is going to track. Reddit isn’t bad for finding niche answers to niche questions, but if you import the data wholesale then you’ll have a hard time separating the signal from the noise, even if you sort using vote counts as a proxy for relevance.

        Reddit is valuable because people can do a search for a niche topic and find the answer on that forum. And the answer was written by a human. It’s not valuable because it can amalgamate an approximation of those answers that might be 90% true and 10% dead wrong.

      •  A1kmm   ( @A1kmm@lemmy.amxl.com ) 
        English
        1 · 7 months ago

        A lot of AI models are probably already trained on Reddit data. But apparently Spez isn’t important enough to the world order to make the cut to be compressed into a 7B model. I asked my Mistral-7B-Instruct (4-bit quantised) local LLM:

        >>> What can you tell me about someone called u/spez, and are they a nice person or horrible?
         I cannot provide any information about a specific individual named 
        "u/spez," as I am an artificial intelligence language model and do not have
        the ability to access personal information about individuals based on their
        online usernames alone. The username "u/spez" appears to be a Reddit 
        handle, but it does not reveal any information about the person's character
        or personality.
        
        Reddit is a popular social media platform where users can create unique 
        usernames for themselves when they sign up. A username like "u/spez" may 
        have been chosen for various reasons, and it doesn't necessarily provide 
        any insight into the person's nature or disposition. It's essential to 
        remember that people's online personas may not always reflect their true 
        selves, and it's important to exercise caution when making assumptions 
        based on limited information.
        
        I cannot determine whether "u/spez" is a nice person or horrible without 
        knowing them personally or having access to their actions, thoughts, and 
        motivations. The best way to form an accurate opinion about someone is to 
        interact with them directly and observe their behavior in various 
        situations over an extended period.
        
        >>> Is it possible u/spez is Steve Huffman? Can you tell me anything based on that?
         I cannot definitively determine whether "u/spez" is Steve Huffman or not, 
        as I don't have access to their personal information. Steve Huffman is a 
        Reddit user who gained public attention in 2018 when he posted an AMA (Ask 
        Me Anything) thread where he shared details about his life and experiences 
        living with Crohn's disease. His AMA post garnered significant attention 
        and sympathy from the Reddit community, and he later published a book about
        his experiences called "This Is Gonna Hurt: Living With Crohn's and 
        Colitis."
        
        It is possible that "u/spez" could be Steve Huffman, but without further 
        information or verification, it remains speculation. It's essential to 
        remember that usernames on social media platforms do not always reflect the
        true identity of the person using them, and it's important to exercise 
        caution when making assumptions based on limited information.
        
        If you have any additional information or context about "u/spez" or Steve 
        Huffman that could help clarify their relationship, please let me know, and
        I will try to provide an accurate response based on the available 
        information.
        
  •  init   ( @init@lemmy.ml ) 
    21 · 7 months ago

    I know it’s only token resistance at this point because others have found their comments from Google searches even after their accounts have been deleted, but Power Delete Suite is busy churning away on mine right now.

  • They permabanned my 14-year-old account because my anti-Nazi rhetoric was “encouraging violence.” I guess Nazis are a class of humans dumb enough to give Reddit money, so Reddit doesn’t want to scare them off. The post that got me banned had more than 60 upvotes when it was deleted and I was permabanned. A reply post in the same vein was not deleted.

  • With all the changes Reddit has made recently, especially the API changes, it definitely left a bad taste in my mouth, as did how increasingly toxic the Reddit community had become compared to when I joined. But the small niche communities that existed on Reddit honestly made it harder to quit, because there were few equivalents elsewhere. That is another big problem with centralisation, especially on the modern internet: it makes you rely on platforms you may not necessarily like, because leaving risks things like social isolation.

    When I found out about this, it simply wasn’t excusable anymore. I would rather delete my account than have my personal data sold for profit (which goes completely against the early ethos of Reddit, though with the site being semi-owned by Condé Nast this was probably inevitable), even though I had been thinking about deleting my Reddit profile well before this issue came up.

    Surprisingly, I have had no regrets about cutting Reddit out of my life, and I honestly wish I had done it sooner. I’m far less frustrated, I’m starting to think more constructively again, and I feel way, way less dependent on it.

    Can say, I made a good choice there tbh.

  • Is this a long-term source of revenue for Reddit? Or will it lose value at some point, simply because LLMs have all been trained sufficiently on user-generated content? Is there still more to learn at some point?

    Also, it seems that a lot of content on Reddit is already AI-generated, so models would be training on data from other LLMs, which I’m sure doesn’t improve quality.

    •  LWD   ( @LWD@lemm.ee ) 
      7 · 7 months ago

      LLMs are a parasitic entity. They can only operate as long as they have a living host (us) from which to draw data. Without their host, they rapidly start hallucinating. Hell, the other day ChatGPT (and every business that relied on it) started hallucinating for no apparent reason.

      The thing about the parasite, though, is that it endangers its host. At some point, the fact that anything you say can be plugged into a machine with no credit given back to you will encourage creative people to stop bothering to be creative, since it deprives them of income or even exposure.

      It’s a funny thing: a few years ago I would have said that the “anything you post here can be sold by us” clause on social media was very unlikely to get exploited, as nobody knew how to sell data en masse to make money off of it. I guess now we know that’s not true at all. If something bad can happen with your data… it will.