A tale of a new Lemmy instance, a bot infestation, the fallout, and how we dealt with it

RotaryKeyboard ( @RotaryKeyboard@lemmy.ninja ) · 1 year ago

rm_dash_r_star ( @rm_dash_r_star@lemmyonline.com ) · edit-2 1 year ago

That’s a good indicator when you find your instance blocked by a lot of other instances. I think the lesson is don’t leave low hanging fruit out there.

It actually amazes me that there are people out there doing these bot infestations. I mean, there is some effort involved. Why go to all the trouble, what’s the payoff? And how are they able to find new unadvertised instances so quickly?

RotaryKeyboard ( @RotaryKeyboard@lemmy.ninja ) · 1 year ago

“That’s a good indicator when you find your instance blocked by a lot of other instances.”

That’s just it: it took a third-party tool for us to even know we were being blocked. Our Lemmy instance really had no tools in place for us to see anything was wrong. If we hadn’t been extremely curious about our high user count, we never would have known there was a bot on our site. Never.

Interestingly, when we discovered the tool that let us see that we were being blocked, I noticed that almost all of the sites that were reported as blocking us were in fact not blocking us. To their immense credit, they had apparently blocked us and then unblocked us after we wiped out the bots. It says a lot that those admins kept checking whatever report they were checking and followed up after we cleared up the problem.

cosmic_skillet ( @cosmic_skillet@lemmy.ml ) · 1 year ago

Sounds like the admin tools have a long way to go.

The Cuuuuube ( @Cube6392@beehaw.org ) · 1 year ago

The operators of troll farms are highly motivated because they use them to manipulate political endeavors

chiisana ( @chiisana@lemmy.chiisana.net ) · 1 year ago

I believe the way it works is that the moment you interact with something, every instance with at least one user subscribed to the community you’re interacting with gets a ping with the activity associated with you. Since each message is signed, webfinger is used to verify your user’s authenticity (which prevents me from posting something offensive while pretending to be from your instance). That would then allow bad actors to quickly collect instances to bot upon.

Payoff is minimal but theoretically they’d be able to shill for things just like they already do on Reddit.

Jamie ( @Jamie@jamie.moe ) · 1 year ago

I run a private instance, but haven’t had captcha or email verification on because, well, it’s just me and one friend that I don’t think even uses his account. I have applications on and don’t approve anyone because it’s a personal instance. So far, I’ve had 5 bots apply. I’ll put their application text at the bottom of this post.

Names tended to follow (noun)(noun)## format. One actually only had one noun. But it seems like having applications on by itself makes a lot of them just not bother with you. Even better, the wording of the applications was… odd. They’d stick out like a sore thumb in a batch of real ones, I think.

“I’m eager to join the World News@lemmy.ml community to broaden my global perspective and participate in discussions about current affairs.”

“I want to join the Lemmy.world community because I’m curious to connect with fellow users and engage in discussions about various topics.”

“I yearn to depart from Reddit and embark on a transformative journey within this innovative social network by joining your instance.”

“Joining the /kbin meta@kbin.social community seems interesting as I can engage in discussions about the platform’s development and future enhancements.”

“Driven by the ongoing events on Reddit, I’m eager to join this instance and find the satisfaction and pleasure that has eluded me elsewhere.”
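For anyone who wants to pre-screen a queue for the name pattern described above, here is a minimal sketch. It is only a shape check, not a real bot detector: it assumes "generated-looking" means letters followed by one to three digits, it cannot tell nouns from other words, and it will flag plenty of real people. The function name and the sample usernames are hypothetical.

```python
import re

# Assumption: a "generated-looking" name is letters only, then one to three digits.
# This checks the overall shape; it does not know whether the words are nouns.
GENERATED_NAME = re.compile(r"[A-Za-z]+\d{1,3}")


def looks_generated(username: str) -> bool:
    """Return True if the username has the letters-then-short-number shape."""
    return GENERATED_NAME.fullmatch(username) is not None


# Hypothetical usernames, not taken from any real application queue.
for name in ["QuietRiver27", "lantern566", "Noun1234", "Jamie"]:
    print(name, looks_generated(name))
```

A match is best treated as a reason to read the application text more closely, not as grounds for rejecting it automatically.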

rm_dash_r_star ( @rm_dash_r_star@lemmyonline.com ) · 1 year ago

The wording on those applications would definitely raise a flag for me. They totally sound bot generated.

I’ve joined a number of instances looking for the best performer: hops, pings, server response. That’s what I’ve been saying in my applications. Interestingly, my first sign-up was on Beehaw before I knew what I was doing, and they are the only ones that rejected me, about a week after I applied. Made me think, what did I say that was so awful? No biggie, I already had some good instances to sign into.

Levii ( @Levii@lemmy.ml ) · 1 year ago

Apparently I sound like a bot when writing applications…

ElTacoEsMiPastor ( @toototabon@lemmy.ml ) · 1 year ago

That will be the problem with LLMs. Considering the application questions can simply be used as a prompt, bots will ace the Turing test. Would different questions or phrasings make it easier to filter them?

I guess the tell from your single application to all these, is that they flock at the registration.

All this just proves why 3rd party tools are important for managing an instance.

The Cuuuuube ( @Cube6392@beehaw.org ) · 1 year ago

Oh god, I must have given the Beehaw and Slrpnk folks fits with my Noun#### username

Jamie ( @Jamie@jamie.moe ) · 1 year ago

Luckily for you, they haven’t learned to count to a thousand yet. The highest number I got was 566. The rest were 1-2 digits.

The Cuuuuube ( @Cube6392@beehaw.org ) · 1 year ago

Maybe I need to stop using bitwarden to recommend usernames haha

MrWiggles ( @mrwiggles@prime8s.xyz ) · 1 year ago

As a webmaster myself, I’ve noticed a small number of users with repeating seemingly generated names, all with the same or similar answer to the registration screening question. I’d be curious if you could release the database of usernames and screening question answers. I’d bet other Lemmy admins would benefit from any analysis done on that database. TTP.

Spzi ( @Spzi@lemm.ee ) · 1 year ago

Thanks for the write-up!

If you want to see which other instances block yours so you can inform them of the changes you made: https://fba.ryona.agency/?domain=lemmy.ninja

RotaryKeyboard ( @RotaryKeyboard@lemmy.ninja ) · 1 year ago

Thanks for linking this. I intended to put it in the body of the post but forgot to. Note that the information is very out-of-date. I manually checked each one earlier, and only two of those sites are still blocking us.

sunaurus ( @sunaurus@lemm.ee ) · 1 year ago

The “last seen” column actually shows when the block was lifted!

RotaryKeyboard ( @RotaryKeyboard@lemmy.ninja ) · 1 year ago

Oh! Good to know!

0x4E4F ( @0x4E4F@lemmy.fmhy.ml ) · 1 year ago

Moral of the story - don’t make your own instance, lol.

The Cuuuuube ( @Cube6392@beehaw.org ) · 1 year ago

Manual approvals are your friend

Ada ( @ada@lemmy.blahaj.zone ) · 1 year ago

Manual approvals aren’t a realistic option for the sheer size of the reddit migration. There aren’t enough mods to handle that

chiisana ( @chiisana@lemmy.chiisana.net ) · 1 year ago

Let’s rephrase that… if you can’t manage your instance with an appropriate approval process (bearing in mind, no process is also a process; the community might just choose to de-federate a no-approval server, however), then don’t host an instance. Not everyone has to, nor should they, congregate in one instance. They’d have access to all the communities as long as they’re not on a de-federated instance, so spreading out will prevent another single instance’s admin from going down spez’s path, thereby reinforcing the federated network’s resilience.

Ada ( @ada@lemmy.blahaj.zone ) · edit-2 1 year ago

What I’m saying is that if every instance tried to do manual approval, the threadiverse wouldn’t have been able to cope with the influx of reddit users. Across every instance, all combined, we didn’t have the resources to manually approve the influx of users.

To cope, some instances had to be on open signups. If people coming from reddit couldn’t sign up at their preferred instance, they went somewhere else with open signups. And if there was nowhere with open signups, a good portion of them would have given up, moved on, and the threadiverse would have lost momentum before it found it.

And in our case specifically, as the only explicitly queer focused instance (at least at the time of the initial reddit migration) we felt it was important to be open so queer folk could find a space and set up communities during those early days, rather than forcing them on to generalist instances without the protections and community that come with queer spaces.

chiisana ( @chiisana@lemmy.chiisana.net ) · 1 year ago

Right; and as I was saying, the choice to be open sign up and have no approval process is in itself a process choice that the instance operator can choose to take.

However, if the instance (not your instance, just a hypothetical instance) gets abused, and bad actors choose to launch attacks by massing bot accounts, then it is also entirely possible for others to choose to de-federate that instance.

It’s a fine line to walk; as someone who’s been building discussion forums since the early 2000s, I fully understand the implications of needing to balance ease of sign-up against having the appropriate process in place to keep the community clean.

I think having a more modular bot prevention system (i.e. allowing users to plug in code to handle different types of captcha/question answer/bot detection/etc.) will add a lot of value, but the devs haven’t quite found their footing yet. They removed captcha altogether in 0.18 only to be told vocally to put it back in. I’d say it is just the typical growing pains of suddenly being vaulted into the spotlight…

Ada ( @ada@lemmy.blahaj.zone ) · 1 year ago

That’s a little less confrontational than what you first wrote, where you said that you shouldn’t be running an instance if you can’t handle a manual approval process. My whole point is that no one is resourced to properly handle manual approvals at that scale.

I absolutely agree that open approvals come at a cost, and do have real risks associated. We’re holding off on upgrading to 0.18 specifically because of the lack of captcha. That’s not something we’re prepared to risk.

Like you, I’ve been doing this for decades. We might have made different choices, but I think it’s fair to say, both of us are making choices from positions of first hand experience, and I think that’s probably why I got a little defensive.

chiisana ( @chiisana@lemmy.chiisana.net ) · 1 year ago

Apologies, that wasn’t meant to be confrontational; definitely agree we have slightly different points of view on sign-up moderation, but we both just want what’s best for the network at large!

The Cuuuuube ( @Cube6392@beehaw.org ) · 1 year ago

If you’re running your own instance is the context I was referring to

Ada ( @ada@lemmy.blahaj.zone ) · 1 year ago

You had no issues with just dropping the users from the table? That’s a relief to hear. We banned our bots directly in the DB, but hesitated to simply drop them.

RotaryKeyboard ( @RotaryKeyboard@lemmy.ninja ) · 1 year ago

We ran the command several times and have had no ill effects. I would say you could drop those users now to clean up your database.
