• I said this in the comments of a different post about Facebook scraping data:

    Can ActivityPub change its terms to say that all crawlers that use it must be GNU-licensed open source, and that all crawled information must be publicly available through GNU-licensed open-source software (no crawling on behalf of a private enterprise)?

    My understanding is that all the big tech companies are scared of what happened with router software (OpenWrt), and they don’t want to be forced to let their competition be a FOSS community through GNU licensing.

    • I have also thought this is a good idea. I think the ActivityPub standard should have a required field that lists a copyright license. Then a copyleft-style license should be created that allows storing and indexing for distribution via open-source standards, and disallows use for AI training and data scraping. If every single post carried a copyleft license, it would be risky for Big Tech to repurpose it, because if a whistleblower called them out, that could lead to a huge class-action suit.

      A good question is whether a single post can be copyrighted. I think it could. Perhaps you could consider each post like a collaborative work of art: people keep adding to it, and at the end of the day the whole chain could function as a “work”, especially since some post threads contain a lot of useful value and knowledge.

    • If that worked, we could easily have prevented AI companies from vacuuming up data from personal websites and separately hosted Git repos. We could add a condition that if they train their models on our data, then the model and its weights automatically fall under the same license as our content. Of course, those psychopaths are going to use their money to defeat such arguments in court.
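The required license field proposed above might look something like this minimal Python sketch. It is only an illustration under stated assumptions: as far as I know, the core ActivityStreams 2.0 vocabulary has no `license` property, so the field, its mapping to schema.org's `license` term, the license URL, the `LICENSE_TERMS` table, and the `permits` check are all hypothetical. It also assumes a crawler that voluntarily honors the field; real enforcement would happen in court, not in code.

```python
import json

# Hypothetical copyleft license URL; no such license exists yet.
PROPOSED_LICENSE = "https://example.org/licenses/fedi-copyleft-1.0"

# Hypothetical table of what each license permits.
# "ai_training" and "scraping" are deliberately absent for the proposed license.
LICENSE_TERMS = {
    PROPOSED_LICENSE: {"store", "index", "distribute"},
}


def make_note(actor: str, content: str, license_url: str) -> dict:
    """Build an ActivityStreams Note carrying a hypothetical required 'license' field."""
    if not license_url:
        # Under this proposal, a post without a license would be invalid.
        raise ValueError("a license is required under this proposal")
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            # Assumption: map the extension term to schema.org's license property.
            {"license": "https://schema.org/license"},
        ],
        "type": "Note",
        "attributedTo": actor,
        "content": content,
        "license": license_url,
    }


def permits(note: dict, purpose: str) -> bool:
    """Crawler-side check: is this purpose allowed by the post's declared license?"""
    allowed = LICENSE_TERMS.get(note.get("license"), set())
    return purpose in allowed


note = make_note("https://example.social/users/alice", "Hello, fediverse!", PROPOSED_LICENSE)
print(json.dumps(note, indent=2))
print(permits(note, "index"))
print(permits(note, "ai_training"))
```

Treating unknown or missing licenses as permitting nothing (the empty-set default in `permits`) is a deliberate choice: a cooperating crawler errs toward not using a post rather than assuming consent.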