Congress told AI firms should pay for copyrighted content

ylai ( @ylai@lemmy.ml ) · 6 months ago

Congress told AI firms should pay for copyrighted content

Auzy ( @Auzy@beehaw.org ) · edit-2 6 months ago

I agree. It’s basically just stealing work from others and not paying them or agreeing to licensing. Long term, it’s not a good thing

For software development, AI is a useful tool, but it’s likely also stealing licensed code and adding it to other people’s code in some circumstances

If AI companies want to use training data, that’s ok, but they should pay all of the creators it’s trained from

NovaPrime ( @NovaPrime@lemmy.ml ) · 6 months ago

Respect your opinion and where you’re coming from, but disagree simply for the fact that this will create a silo of corporatized AI because they will be the only ones who will have the ability to pay for the IP at the necessary scale (that’s not to say that this is not already happening to an extent with the existing model). I do think the conversation is worth having though about the public value of data that’s readily available on the internet and how it squares with our (imo) outdated IP laws. How do we ensure that individual creators retain full control and benefit of their art/content/knowledge, while not stifling or unduly hampering AI research? How much protection do we afford data that users willingly put on the internet that’s publicly available? And who pays for the data in the chain?

snooggums ( @snooggums@kbin.social ) · 6 months ago

All of those questions are already asked and answered for regular copyright. Sorted out for music, movies, books, etc.

FaceDeer ( @FaceDeer@kbin.social ) · 6 months ago

And what a wonderful system we have as a result.

BlameThePeacock ( @BlameThePeacock@lemmy.ca ) · 6 months ago

Should human programmers have to pay creators every time they look at code while they’re learning? I (and most people) have literally copy/pasted code from various websites into my own programs without any sort of payment or notice.

Maestro ( @Maestro@kbin.social ) · 6 months ago

If you copy/pasted without any licensed then you’ve broken copyright yourself. It’s a good way to get into trouble in a professional setting.

FaceDeer ( @FaceDeer@kbin.social ) · 6 months ago

That’s not what AI does, though.

Maestro ( @Maestro@kbin.social ) · 6 months ago

It’s what BlameThePeacock admitted to doing.

FaceDeer ( @FaceDeer@kbin.social ) · 6 months ago

Sure, but it has nothing to do with what AI does.

BlameThePeacock ( @BlameThePeacock@lemmy.ca ) · 6 months ago

You don’t have to pay the rightsholder if your hired human reads various newspapers in order to learn how to write. Or at least no more than a single person’s subscription fee to said content.

So why the hell should you have to pay more to train an AI model on the same content?

It’s faster than a human? So what? Why does that entitle you to more money? There are fast and slow humans already, and we don’t charge them differently for access to copyright material.

The tool that’s being created is used by more than one human/organization? So what? Freelance journalists write for many publications after having learned on your material. You aren’t charging them a license fee for every org they write for.

That being said, this is one of those turning points in the world where it doesn’t matter what the results of these lawsuits are, this technology is going to use copyrighted material whether it’s licensed or not. Companies will just need to adapt to the new reality.

OpenAI and other large companies are the target right now, but the much smaller open source generative AI models are catching up fast, and there’s no way to stop individuals using copyright material to train or personalize their AI, currently it’s processing intensive to train, but it’s already dropped in price by orders of magnitude, and it’s going to keep getting cheaper as computing hardware gets better.

If all you see is the article written by Joe Guy, and it’s a good article with useful information, you can’t prove that Joe even used a tool most of the time, let alone that the tool was trained on a specific piece of copyrighted material, especially if everyone’s training for their AI is a little bit different. Unless it straight up plagiarizes, no court is going to convict Joe. Avoiding direct plagiarism is as easy as having a plagiarism tool double check against the original training material.

AutoTL;DR ( @autotldr@lemmings.world ) · 6 months ago

This is the best summary I could come up with:

The New York Times recently sued OpenAI, accusing the startup of unlawfully scraping “millions of [its] copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides and more.”

Danielle Coffey, CEO of the News/Media Alliance trade association, noted that chatbots designed to crawl the web and act like a search engine, like Microsoft Bing or Perplexity, can summarize articles too.

Readers could ask them to extract and condense information from news reports, meaning there would be less incentive for people to visit publishers’ sites, leading to a loss of traffic and ad revenue.

Jeff Jarvis, who recently retired from the City University of New York’s Newmark Graduate School of Journalism, is against licensing for all uses and was afraid it could set precedents that would affect journalists and small, open source companies competing with Big Tech.

Revealing their sources might make their AI tools look bad too, considering the amount of inappropriate text their models have ingested, including people’s personal information and toxic or NSFW content.

“The notion that the tech industry is saying that it’s too complicated to license from such an array of content owners doesn’t stand up,” said Curtis LeGeyt, president and CEO of the National Association of Broadcasters.

The original article contains 877 words, the summary contains 202 words. Saved 77%. I’m a bot and I’m open source!

Zoboomafoo ( @Zoboomafoo@slrpnk.net ) · 6 months ago

I emailed my senators to tell them not to support this bill, maybe it’ll help