• A search engine can’t pay a website for the honor of bringing it visits and ad views.

    Fuck reddit, get delisted, no problem.

    Weird that Google is ignoring their robots.txt, though.

    Even if Google pays Reddit for the ability to say that glue is perfect on pizza, having

    User-agent: *
    Disallow: /
    

    should block Googlebot too. That means Google programmed an exception into Googlebot to ignore robots.txt on that domain, and that shouldn’t be done. What’s the purpose of that file then?

    Because robots.txt is entirely honor-based (a crawler has no need to pretend to be another bot, it can simply ignore the file), it should be

    User-agent: Googlebot
    Disallow:
    User-agent: *
    Disallow: /
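
    To check how a rule-following crawler would read the two files above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The test URL and the `SomeOtherBot` agent name are made up for illustration, and this only shows what the rules say, not what Google or Reddit actually do.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants quoted in the comment above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

ALLOW_GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url="https://www.reddit.com/r/all/"):
    """Return True if a compliant crawler named user_agent may fetch url.

    The default url is a hypothetical example path, not taken from the thread.
    """
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in [("block all", BLOCK_ALL),
                        ("allow Googlebot only", ALLOW_GOOGLEBOT_ONLY)]:
        for agent in ("Googlebot", "SomeOtherBot"):  # SomeOtherBot is hypothetical
            print(f"{name:22} {agent:13} allowed={allowed(rules, agent)}")
```

    An empty `Disallow:` means "nothing is disallowed", so the second file lets Googlebot through while every other compliant bot is still blocked. A crawler that ignores the file is unaffected either way.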
    
    •  tal   ( @tal@lemmy.today ) 
      2 months ago

      I guessed in a previous comment that, given their new partnership, Reddit is probably feeding its comment database to Google directly, which reduces load for both of them and lets Google have real-time updates of the whole kit and caboodle rather than polling individual pages. Both Google and Reddit are better off doing that, and for Google it’d make sense for any site that’s large enough and valuable enough to warrant the effort of special-casing it.

      I know that Reddit built functionality for that before; it was used for pushshift.io and, I believe, for bots.

      I doubt that Google is actually using Googlebot on Reddit at all today.

      I would bet against either Google violating robots.txt or Reddit serving different robots.txt files to different clients (why? It’s just unnecessary complication).

    • Google is paying for the use of Reddit’s API, not for scraping the site.

      That’s Reddit’s new business model: if you want “their” (users’) content, pay for API access.