  •  tetris11   ( @tetris11@lemmy.ml ) 
    English
    3 · 11 months ago

    They might, but you will still be helping people. And if a court later mandates that the authors of the training data be compensated for their contributions, or if the corpus is released into open source repositories, then I’d still call that a win for humanity.

    •  4bh1j47   ( @4bh1j47@beehaw.org ) 
      English
      4 · 11 months ago

      That is a fair point.

      Personally, in an ideal world, I would like to export all of my data from Reddit before leaving. Then, if someone later wants to host the whole dataset under a permissive open license (as I believe Stack Overflow and Wikipedia do) somewhere accessible to search engines, I would scrub and anonymize my data and upload it there.

      Obviously, one issue with something like this is people uploading doctored data to poison any models trained on it.