• The beef between Microsoft and Reddit came to light after I published a story revealing that Reddit is currently blocking every crawler from every search engine except Google, which earlier this year agreed to pay Reddit $60 million a year to scrap the site for its generative AI products.

    I know the author meant “scrape”, but sometimes it really does feel like AI is just scrapping the old internet for parts.

  • I can see why spez is upset about scrapers and search engines - imagine a company profiting from people creating lots of data, just hoarding it and using it for free, and not paying those people a cent. Preposterous, right? :)

  • “This was Microsoft’s choice, not ours,” Reddit spokesperson Tim Rathschmidt told me in an email. “We are and have been open to agreements with companies who are open about their intentions and commit to treat us and our users fairly. If Bing or others want access within our policies, without training, without summarization, and without selling it to others, we are and have always been open to that. If they want to build a business selling Reddit data or using the data for training, we could be open to that, but it’s a commercial conversation.”

    Mojeek, the search engine that initially told me that Reddit was blocking all search engines but Google, and which was unable to get in touch with Reddit at the time, told me Reddit got in touch after that story was published. Mojeek said it was unable to share any details about the deal because of an NDA, but confirmed that Reddit wanted to get paid for letting Mojeek crawl the site, even though Mojeek does not have any AI products.

    This doesn’t add up and it makes me wonder what else Google and reddit agreed upon. This situation benefits no one except Google, as far as I can tell. If reddit wants to milk search engines, and Microsoft is willing and able to pay (which I assume they are), there is no reason for the deal to not go ahead like it did with Google. Kinda makes my brain start going down the conspiracy path, but then again it’s hardly unbelievable that Google would pursue anti-competitive business strategies, particularly when it comes to generative AI.

  • A search engine can’t pay a website for having the honor of bringing them visits and ad views.

    Fuck reddit, get delisted, no problem.

    Weird that google is ignoring their robots.txt though.

    Even if they pay them for being able to say that glue is perfect on pizza, having

    User-agent: *
    Disallow: /
    

    should block googlebot too. That means google programmed an exception on googlebot to ignore robots.txt on that domain and that shouldn’t be done. What’s the purpose of that file then?

    Because robots.txt relies entirely on the honor system (a bot has no need to pretend to be another bot; it could just ignore the file), it should be

    User-agent: Googlebot
    Disallow:
    User-agent: *
    Disallow: /
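
For anyone who wants to check how a rule-following crawler would read the two files above, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `can_fetch`, the constants, and the sample URL are made up for illustration, and real crawlers (Googlebot included) may parse edge cases differently from Python's parser. It only shows what a compliant bot would conclude; nothing in it stops a bot that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample URL, used only to have a path to test against.
URL = "https://www.reddit.com/r/all/"

# First file from the comment: every crawler is disallowed.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Second file from the comment: Googlebot gets an empty Disallow
# (meaning nothing is disallowed), everyone else is blocked.
ALLOW_GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in [("block all", BLOCK_ALL),
                        ("allow Googlebot only", ALLOW_GOOGLEBOT_ONLY)]:
        for agent in ("Googlebot", "bingbot"):
            print(f"{name}: {agent} allowed = {can_fetch(rules, agent, URL)}")
```

With the first file, both agents come back as not allowed; with the second, only Googlebot is allowed, which matches the point being made above.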
    
    • tal ( @tal@lemmy.today ), 4 months ago

      I guessed in a previous comment that, given their new partnership, Reddit is probably feeding its comment database to Google directly, which reduces load for both of them and lets Google get real-time updates of the whole kit and caboodle rather than polling individual pages. Both Google and Reddit are better off doing that, and for Google it would make sense to special-case any site that’s large enough and valuable enough to warrant the effort.

      I know that Reddit built functionality for that before, and used it for pushshift.io and, I believe, bots.

      I doubt that Google is actually using Googlebot on Reddit at all today.

      I would bet against either Google violating robots.txt or Reddit serving different robots.txt files to different clients (why? It’s just unnecessary complication).

    • Google is paying for the use of Reddit’s API, not for scraping the site.

      That’s Reddit’s new business model: if you want “their” (users’) content, pay for API access.

  • TehPers ( @TehPers@beehaw.org ), 4 months ago

    Joke’s on Reddit. I’ve been blocking their results in the search engine I use for months!

    I wonder if this will end up being pursued as an antitrust case. If anything, it’ll reduce traffic to Reddit from non-Google users, so hopefully that kills them off just a little faster.