- cross-posted to:
- technews@radiation.party
- FaceDeer ( @FaceDeer@kbin.social ) 38•1 year ago
For those who can’t get through the paywall, this is an article about a system called Kudurru that monitors a number of websites hosting images listed in the LAION-5B metadata set. When it sees the same IP address downloading images from several of those websites simultaneously, it assumes the requester is a bot scraping the data to train an AI, and either blocks it or “poisons” the scrape by sending incorrect images back.
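To make the detection idea concrete, here is a minimal sketch of how "same IP hitting several monitored sites at once" could be checked. This is not Kudurru's actual code; the class name `ScrapeDetector`, the `window_seconds` and `min_sites` thresholds, and the `respond` helper are all hypothetical, and it assumes requests arrive in timestamp order.

```python
from collections import defaultdict, deque


class ScrapeDetector:
    """Hypothetical sketch: flag an IP seen on several monitored sites within a short window."""

    def __init__(self, window_seconds=10.0, min_sites=3):
        # Both thresholds are illustrative assumptions, not values from the article.
        self.window_seconds = window_seconds
        self.min_sites = min_sites
        self._recent = defaultdict(deque)  # ip -> deque of (timestamp, site)

    def observe(self, ip, site, timestamp):
        """Record one image request; return True if the IP now looks like a scraper."""
        events = self._recent[ip]
        events.append((timestamp, site))
        # Drop requests that fell out of the time window (assumes increasing timestamps).
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_sites = {s for _, s in events}
        return len(distinct_sites) >= self.min_sites


def respond(flagged, real_image, decoy_image):
    """The "poison" option: serve a decoy image to flagged IPs, the real one otherwise."""
    return decoy_image if flagged else real_image


detector = ScrapeDetector(window_seconds=10.0, min_sites=3)
first = detector.observe("203.0.113.7", "site-a.example", 0.0)
second = detector.observe("203.0.113.7", "site-b.example", 1.0)
third = detector.observe("203.0.113.7", "site-c.example", 2.0)
```

With these thresholds the first two requests pass and the third one gets the IP flagged. A scraper that spreads its requests over many IPs or slows down would slip under a check like this.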
Frankly, I don’t see this having much impact. AI training has moved beyond simply using LAION-5B; we’re discovering that a smaller, higher-quality dataset is better than just throwing mountains of data at the AI in training. So anything a trainer downloads is going to be extensively curated before being used for training, and this sort of obstruction will be fixed or filtered out.
- mkhoury ( @mkhoury@lemmy.ca ) 12•1 year ago
But the main result is achieved anyway, right? The picture that the system tried to download did not make it into the training set.
- FaceDeer ( @FaceDeer@kbin.social ) 7•1 year ago
Unless the “this sort of obstruction will be fixed” part kicks in, in which case the image gets downloaded anyway. This is the weakest sort of DRM.
- Moonrise2473 ( @Moonrise2473@feddit.it ) 2•1 year ago
Thanks
Oops, sorry, forgot: https://archive.ph/ylJHc