An update to Google’s privacy policy suggests that the entire public internet is fair game for its AI projects.

  • People who are alive can have a company steal their entire corpus without recompense, while the descendants of people who died decades ago can still get paid for content created by their ancestors.

    Right.

    • But how else could Disney afford to own everyone else’s rights and properties? Why not think about the little guy! (Mickey Mouse is little, right?)

      That being said, I find it weird that people are going after training data for LLMs after completely ignoring the models built specifically to compete with and take advantage of people’s unconscious habits and lifestyles.

      AI in general will be very important to comfortably survive the near future as a species. Data is an important part of that.

      We absolutely need to do something about the megacorps funneling every new gain as a society into increasing the already absurd wealth divide. The technology is good. The general web scraping isn’t bad if the tool is not specifically evil in function. As a global community, we just need to demand that the technology be used to benefit everyone equally as it continues to be developed.

    •  Zapp   ( @Zapp@beehaw.org ) 
      10 months ago

      Yeah. Now the stupidity I post online has a purpose.

      Someday a T-800 will be closing in on a freedom fighter, but will have an intrusive thought interrupt it at a key vulnerable moment. And that intrusive thought will be some random pun we posted to DadJokes. You’re welcome, future freedom fighters.

  • I, as the proprietor of my comments, condone Google AI scraping my publicly shared content for their own use, on the condition that they condone scraping of their publicly accessible content including YouTube videos. :P

    • Google is going to continue boiling the frog until everyone using gmail, YT, drive, etc… is paying subscriptions for access to these services. It’s going to be interesting to see how much people are willing to pay to hold on to a gmail account they’ve been using for 20 years. I should buy Alphabet stock now.

  • I just kind of assumed that they, as well as anyone else in the space, were doing that already.

    Whether that means that we all collectively have ownership over the outputs of these models if they’re trained on content that we produced over the years is another thing. As someone who uses AI tools a fair bit I would be totally fine with generated content being public domain unless a threshold for human intervention is met.

    That threshold is where the messy legal work lies.

    • Perhaps we lived in blissful ignorance all this time. Before AI language models became what they are today, Google Translate was where most of the data was going, and it was mainly about getting an adequate translation. Now it’s being used to answer questions on all different subjects using parts of real people’s answers, which could be more frightening to people.

    • I think it’s a problem of value capture.

      People had no problem posting on reddit and wasting tons of hours helping strangers solve their problems. But now that reddit puts that information behind a paywall, people will have massive issues with that.

      Similarly, Google scraped data, but didn’t APPEAR (and I can’t emphasize that enough) to use that data to deliver value that cannot be shared by the people who created that data. Most of the time your value is aligned so that you give up your “data” to Google so that Google can either provide you with better traffic through its search engine, or better ads to generate revenue for you.

      OpenAI does not benefit the original publisher of that information whatsoever.

      • I don’t know about that. When’s the last time you looked something up on Google and the first link was driving traffic to a website rather than scraping one and presenting it in-engine?

  • Google does what Google wants. Lawsuits are the only remedy to any of their indulgent transgressions. And not everyone can sue.

    Years ago I had to have a lawyer file a motion in court in order to get Google to erase private medical documents they had inadvertently gotten access to and then cached. It’s one thing to index everything, and another thing again if they temporarily have access to restricted data because of a security lapse. But to COPY that data as cache is something that should be absolutely illegal.

    But as I said, Google does what Google wants.

  •  trekz   ( @trekz@beehaw.org ) 
    10 months ago

    Is this even new though? Google has always had a stronghold over any public data on the internet. It’s a search engine 😄. Its sole purpose is to scrape and store everything it possibly can on the web.

    • It’s not

      Previously, Google said the data would be used “for language models,” rather than “AI models,” and where the older policy just mentioned Google Translate, Bard and Cloud AI now make an appearance.

      This is mainly just an update to more modern terms; it doesn’t really seem like they’re adding anything new to their policies.