Right now, robots.txt on lemmy.ca is configured this way:
User-Agent: *
Disallow: /login
Disallow: /login_reset
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
Disallow: /modlog
Would it be a good idea privacy-wise to deny GPTBot from scraping content from the server?
User-agent: GPTBot
Disallow: /
Thanks!
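For anyone who wants to check how a compliant crawler would read the proposed rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is an abbreviated copy of the rules above, and `SomeOtherBot` and the `/post/1` path are made-up names for illustration. It only shows how the file is interpreted; it says nothing about whether a given crawler actually honours it.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the rules from the post, plus the proposed GPTBot block.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /login
Disallow: /search/
Disallow: /modlog

User-agent: GPTBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# GPTBot is denied everywhere by its dedicated block.
print(parser.can_fetch("GPTBot", "https://lemmy.ca/post/1"))

# Other crawlers (hypothetical name) fall back to the "*" block.
print(parser.can_fetch("SomeOtherBot", "https://lemmy.ca/post/1"))
print(parser.can_fetch("SomeOtherBot", "https://lemmy.ca/login"))
```

The first call reports that GPTBot may not fetch the page, while the hypothetical `SomeOtherBot` is still allowed on ordinary pages and blocked only on the paths listed under `*`.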
Shadow ( @Shadow@lemmy.ca ) 18•2 years ago
I’m on board for this, but I feel obliged to point out that it’s basically symbolic and won’t mean anything. Since all the data is federated out, they have a plethora of places to harvest it from, or more likely they would just run their own ActivityPub harvester.
I’ve thrown a block into nginx so I don’t need to muck with robots.txt inside the lemmy-ui container.
# curl -H 'User-agent: GPTBot' https://lemmy.ca/ -i
HTTP/2 403
skankhunt42 ( @skankhunt42@lemmy.ca ) 3•2 years ago
I imagine they rate limit their requests too, so I doubt you’ll notice any difference in resource usage. OVH is Unmetered* so bandwidth isn’t really a concern either.
I don’t think it will hurt anything but adding it is kind of pointless for the reasons you said.
ono ( @ono@lemmy.ca ) English 17•2 years ago
Yes, please.
We can’t stop LLM developers from scraping our conversations if they’re determined to do so, but we can at least make our wishes clear. If they respect our wishes, then great. If they don’t, then they’ll be unable to plead ignorance, and our signpost in the road (along with those from other instances) might influence legislation as it’s drafted in the coming years.
nbailey ( @nbailey@lemmy.ca ) English 13•2 years ago
Yes. Ban them.
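# "~*" is a case-insensitive regex match, so this also catches longer User-Agent strings that merely contain "GPTBot"
if ($http_user_agent ~* "GPTBot") { return 403; }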
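A note on the matching operator: in an nginx `if`, `=` compares the whole header value, while `~*` does a case-insensitive regular-expression match. The sketch below mimics both in Python to show why that matters. The `FULL_UA` value is an assumed example of the longer header a crawler might send; it is not taken from this thread, and the helper names are made up for illustration.

```python
import re

# Assumed example of a full crawler header (illustrative, not from the thread).
FULL_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)
# The bare value used in the curl test above.
BARE_UA = "GPTBot"


def exact_match(user_agent):
    """Mimics nginx `$http_user_agent = "GPTBot"` (whole-string comparison)."""
    return user_agent == "GPTBot"


def regex_match(user_agent):
    """Mimics nginx `$http_user_agent ~* "GPTBot"` (case-insensitive search)."""
    return re.search("GPTBot", user_agent, re.IGNORECASE) is not None


for label, ua in (("bare", BARE_UA), ("full", FULL_UA)):
    print(label, "exact:", exact_match(ua), "regex:", regex_match(ua))
```

Under that assumption, an exact comparison would block the bare `GPTBot` header from the curl test but let the longer header through, whereas the regex form blocks both.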
Shadow ( @Shadow@lemmy.ca ) 4•2 years ago
Thanks for empowering my laziness =)
Alligatorade ( @Crocrodile@lemmy.ca ) 5•2 years ago
Yes
sndmn ( @sndmn@lemmy.ca ) 3•2 years ago
Is this even possible without all federated instances also prohibiting them?
You take action where you can ;)
narF ( @narF@lemmy.ca ) 3•2 years ago
Are they even respecting those files?
But yeah, sure, it’s worth trying!
It’s from the official documentation.
EhForumUser ( @EhForumUser@lemmy.ca ) 1•2 years ago
Worth trying for what reason?
Sunshine (she/her) ( @Sunshine@lemmy.ca ) English 2•5 months ago
Yes, please prevent them from using our conversations.
EhForumUser ( @EhForumUser@lemmy.ca ) 2•2 years ago
No, definitely not. Our work posted in the open is done so because we want it to be open!
It is understandable that not all work is meant to be open, but in those cases access would already be appropriately locked down for all robots (and humans!) who are not members of the secret club. There is no need for special treatment here.