I just developed and deployed the first real-time protection for lemmy against CSAM!

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

I just developed and deployed the first real-time protection for lemmy against CSAM!

Cobalt_Blu ( @C0balt_Blu@lemmy.ml ) · 1 year ago

Db0 the fuckin hero 🙏

Demigodrick ( @Demigodrick@lemmy.zip ) · 1 year ago

Just want to add - I’ve been using this (via my desktop!) for my instance for a little while now and it’s great. While the evidence shows there are false positives, I’ve yet to see it affect anything in real time.

Beware your B2 transaction costs though! 😭 I’m sure there is a cheaper way to do it but backblaze costs went up quite a bit.

PenguinCoder ( @Penguincoder@beehaw.org ) · 1 year ago

B2 cloud storage update says:

effective October 3, we’re making egress free (i.e. free download of data) for all B2 Cloud Storage customers—both pay-as-you-go and B2 Reserve—up to three times the amount of data you store with us, with any additional egress priced at just $0.01/GB. Because supporting an open cloud environment is central to our mission, expanding free egress to all customers so they can move data when and where they prefer is a key next step.
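As a rough illustration of the pricing rule quoted above, here is a minimal sketch that estimates only the egress part of a bill. The function name and parameters are hypothetical, and Class B and Class C transaction fees are not modeled at all.

```python
def b2_egress_cost(stored_gb: float, egress_gb: float,
                   free_multiple: float = 3.0,
                   rate_per_gb: float = 0.01) -> float:
    """Estimate the egress charge in USD under the quoted rule.

    Egress is free up to `free_multiple` times the stored amount;
    anything beyond that is billed at `rate_per_gb`.
    Transaction fees are NOT included (assumption: egress only).
    """
    free_gb = stored_gb * free_multiple
    billable_gb = max(0.0, egress_gb - free_gb)
    return billable_gb * rate_per_gb


if __name__ == "__main__":
    # 100 GB stored gives 300 GB of free egress under the quoted rule.
    print(b2_egress_cost(stored_gb=100, egress_gb=250))  # within the free allowance
    print(b2_egress_cost(stored_gb=100, egress_gb=400))  # 100 GB over the allowance
```

Whether the announcement also changes transaction costs has to be checked against Backblaze's own pricing page.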

Demigodrick ( @Demigodrick@lemmy.zip ) · 1 year ago

Yeah, I had the email yesterday, but they don’t mention if this is specifically their download charge, or if the Class B and Class C transactions are included in this. I’ll be honest, I haven’t had time to properly look into it yet, but either way it should help.

Blaze ( @Blaze@discuss.tchncs.de ) · 1 year ago

Well done!

zeus ⁧ ⁧ ∽↯∼ ( @Zeus@lemm.ee ) · 1 year ago

holy hell this is massive

thank you for your work db0

kreynen ( @kreynen@kbin.social ) · 1 year ago

Sounds like progress, but please consider using a term other than “whitelist” when describing a list of allowed values. While the use of blacklist predates references to black as a race, allowlist is a reasonable alternative that doesn’t reinforce viewing black as less than or unwanted and white as allowed.

S410 ( @S410@kbin.social ) · 1 year ago

Making things that were never about race into things about race, just to have one more reason to be potentially offended by, is not productive and doesn’t help anyone.

By exercising enough mental gymnastics almost any term could be twisted into something supposedly offensive. The real solution to that problem: don’t do mental gymnastics.

Honytawk ( @Honytawk@lemmy.zip ) · 1 year ago

Those are technical terms that have nothing to do with race or even humans.

burble ( @burble@lemmy.dbzer0.com ) · 1 year ago

Allowlist and Blocklist are also more intuitive to people who haven’t heard the terms before.

TehPers ( @TehPers@beehaw.org ) · edit-2 1 year ago

I’ve honestly always found “allowlist” and “blocklist” to feel like forced compound words, and I don’t see why “list” is necessary at all. For example, just saying “allowed” and “blocked” both implies it’s a list and is more intuitive than any of the *list terms.

Personally I have no stake in the battle, but I do wish people would use the most intuitive terms for the situation at least (whatever they are, for example “enabled”/“disabled” or “included”/“excluded”) instead of blanket ctrl+f on everything.

burble ( @burble@lemmy.dbzer0.com ) · 1 year ago

That’s a good point, and I hadn’t thought about that angle, that there just isn’t a reason for the terms to exist in the first place.

“In the red” and “in the black” is another pair that isn’t intuitive to me at all and I have to look up every time.

e-ratic ( @e-ratic@kbin.social ) · edit-2 1 year ago

Oh come on… The origin of blacklist was centuries before “black” became the term for a person of colour. And on a thread about CSAM…

TheGreenGolem ( @TheGreenGolem@lemm.ee ) · 1 year ago

Oh the fuck with this nonsense!

Scary le Poo ( @Scary_le_Poo@beehaw.org ) · 1 year ago

Do you ever get tired of twisting yourself into a pretzel every time you want to be offended?

grimace1153 ( @grimace1153@lemm.ee ) · 1 year ago

Holy fuck

WallsToTheBalls ( @WallsToTheBalls@lemmynsfw.com ) · 1 year ago

Wahhhhhhh

Lemmyvisitor ( @Lemmyvisitor@lemmy.dbzer0.com ) · 1 year ago

I’m curious how an AI like this is trained

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

https://www.kdnuggets.com/2021/03/beginners-guide-clip-model.html

Lemmyvisitor ( @Lemmyvisitor@lemmy.dbzer0.com ) · 1 year ago

interesting read, thank you

given CLIP has a high zero-shot learning success rate, was it functional for this use case out of the box? or were further modifications required?

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

It requires a specific usage of CLIP. Check the horde-safety repo if you’re interested.

marco ( @marco@beehaw.org ) · 1 year ago

Sent you a little bit of money, @db0@lemmy.dbzer0.com - it sucks that this is necessary, but thanks for doing the good work <3

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

much appreciated

iByteABit [he/him] ( @iByteABit@lemm.ee ) · 1 year ago

Great work, this is the biggest issue that Lemmy has at the moment. I hope the admins will be able to set this up easily and start to take back all the preventative measures.

user ( @user@lemmy.one ) · 1 year ago

👏well done.

Duchess ( @Duchess@yiffit.net ) · 1 year ago

thank you for making the fediverse a safer place to be

fmstrat ( @fmstrat@lemmy.nowsci.com ) · 1 year ago

Have you considered federating hashes of positive matches and working with the Lemmy team to not outward federate on a local positive match (and potentially have the hash go instead)?

The former can reduce overhead and electricity use, and the latter will stop more distribution and aid those sans-GPU who can’t run it.

Over time, the hash DB will grow and get better. In addition, perhaps there is metadata that can be used to track image similarity to positive matches to reduce false-positives, but I imagine that algorithm would be much more complicated.
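The shared hash list idea could be sketched roughly like this. It is only an illustration: the class and method names are hypothetical, SHA-256 is an assumed choice of hash, and nothing here reflects how Lemmy or ActivityPub actually federate data.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw file bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


class SharedHashList:
    """Hypothetical store of hashes for files that were flagged somewhere."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def report(self, file_bytes: bytes) -> str:
        """Record a positive match and return the hash that would be shared."""
        digest = sha256_hex(file_bytes)
        self._hashes.add(digest)
        return digest

    def is_known(self, file_bytes: bytes) -> bool:
        """Check a new upload against the list without running a model."""
        return sha256_hex(file_bytes) in self._hashes


if __name__ == "__main__":
    shared = SharedHashList()
    shared.report(b"example file contents")
    print(shared.is_known(b"example file contents"))   # identical bytes match
    print(shared.is_known(b"example file contents!"))  # any byte change misses
```

A lookup like this is cheap and needs no GPU, but it only matches byte-identical files.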

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

Hashes won’t work for novel GenerativeAI images. For this kind of thing we need to be sharing tensors and comparing distances so that it catches format changes and compression artifacts. Theoretically possible. Practically, I don’t know how feasible it is.
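To make "comparing distances" concrete, here is a minimal sketch using cosine distance between embedding vectors. The three-element vectors are made-up toy values, not real CLIP output, and the choice of cosine distance is an assumption rather than something taken from the actual tool.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; values near 0 mean 'very similar'."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy embeddings (hypothetical values for illustration only).
    original = [0.90, 0.10, 0.40]
    recompressed = [0.88, 0.12, 0.41]  # same picture, slightly different pixels
    unrelated = [0.10, 0.90, 0.20]

    print(cosine_distance(original, recompressed))  # small distance
    print(cosine_distance(original, unrelated))     # much larger distance
```

Unlike an exact hash, a small change to the input moves the vector only slightly, so a distance threshold can still catch it.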

fmstrat ( @fmstrat@lemmy.nowsci.com ) · 1 year ago

How large is each tensor? If it can be stored as JSON or Base64 and is of sufficiently small size, integration into ActivityPub wouldn’t be all that bad. The time consuming part would likely be integration into Lemmy itself.

Another option would be a separate service, similar to how Lemmy Explorer works, where a list of the latest tensors can be downloaded. It’s centralized vs distributed, but probably easier to implement. Just an API admins can register for to send and get tensors.

I would be happy to assist with this if it is a route you would like to explore. Feel free to DM me.

db0 ( @db0@hachyderm.io ) · 1 year ago

@fmstrat each tensor is small. The problem is when you have millions of them and you have to compare each image to each. You can’t index this. It has to be one by one. And you still need to convert the new image to tensors as well, which still needs a GPU. I just don’t see anything useful here. The current system would be faster.
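The one-by-one comparison described here can be sketched as a plain linear scan. The function name, the Euclidean distance metric, and the threshold are all assumptions for illustration; the point is only that every new image costs one distance computation per stored vector.

```python
import math
from typing import Optional


def nearest_match(query: list[float],
                  stored: list[list[float]],
                  threshold: float) -> Optional[tuple[int, float]]:
    """Scan every stored vector and return (index, distance) of the closest
    one if it is within `threshold`, otherwise None.

    The loop touches each stored vector once, so the work grows linearly
    with the size of the shared list.
    """
    best_index = None
    best_distance = math.inf
    for index, vector in enumerate(stored):
        distance = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is not None and best_distance <= threshold:
        return best_index, best_distance
    return None


if __name__ == "__main__":
    stored_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # toy values
    print(nearest_match([1.1, 0.9], stored_vectors, threshold=0.5))
    print(nearest_match([10.0, 10.0], stored_vectors, threshold=0.5))
```

The query vector itself still has to come from running the model on the new image, which is the GPU cost mentioned above.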

fmstrat ( @fmstrat@lemmy.nowsci.com ) · 1 year ago

Good point. I wonder how the commercial hash-based systems are doing it…

AceQuorthon ( @AceQuorthon@lemmy.dbzer0.com ) · 1 year ago

What’s CSAM?

noctisatrae ( @noctisatrae@beehaw.org ) · 1 year ago

Child pornography

TiTeY` ( @titey@lemmy.home.titey.net ) · 1 year ago

Great job! 👍

azurefirefly ( @azurefirefly@lemmy.basedcount.com ) · 1 year ago

Fantastic work

yildo ( @yildo@kbin.social ) · 1 year ago

I would love some form of this for Mastodon

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

I don’t know the architecture of masto unfortunately. I guess it doesn’t use pict-rs. What does it use for images?

poVoq ( @poVoq@slrpnk.net ) · edit-2 1 year ago

AFAIK it is a built-in system. But scanning the file folder or object storage probably works the same.

It would definitely be nice to have the option to scan multiple locations if you run more than one service.

db0 ( @db0@lemmy.dbzer0.com ) · 1 year ago

Yes, it’s just not realtime anymore
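For the "scan multiple locations" idea, a minimal sketch of a batch walk over several folders might look like the following. The function name, the suffix list, and the example paths are hypothetical, and this only collects candidate files; it does not scan them or talk to object storage.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the service stores.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def find_images(roots: list[str]) -> list[Path]:
    """Walk each root folder recursively and collect image files.

    Roots that do not exist or are not directories are skipped.
    """
    found: list[Path] = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        for path in sorted(root_path.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                found.append(path)
    return found


if __name__ == "__main__":
    # Hypothetical media folders for two different services.
    for image_path in find_images(["/srv/service-a/media", "/srv/service-b/media"]):
        print(image_path)
```

Because this runs after the files are already on disk, it works as a periodic sweep rather than a check at upload time.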
