Some argue that bots should be entitled to ingest any content they see, because people can.
This is the best summary I could come up with:
Unfortunately, many people believe that AI bots should be allowed to grab, ingest, and repurpose any data available on the public Internet, whether they own it or not, because the bots are “just learning like a human would.” Once a person reads an article, they can use the ideas they just absorbed in their speech or even their drawings, for free.
Iris van Rooij, a professor of computational cognitive science at Radboud University Nijmegen in the Netherlands, posits that it’s impossible to build a machine that reproduces human-style thinking by using even larger and more complex LLMs than we have today.
New York Times tech columnist Farhad Manjoo made this point in a recent op-ed, arguing that writers should not be compensated when their work is used for machine learning because the bots merely draw “inspiration” from the words, the way a person does.
“When a machine is trained to understand language and culture by poring over a lot of stuff online, it is acting, philosophically at least, just like a human being who draws inspiration from existing works,” Manjoo wrote.
In his testimony before a U.S. Senate subcommittee hearing this past July, Emory Law Professor Matthew Sag used the metaphor of a student learning to explain why he believes training on copyrighted material is usually fair use.
In fact, Microsoft, which is a major investor in OpenAI and uses GPT-4 for its Bing Chat tools, released a paper in March claiming that GPT-4 has “sparks of Artificial General Intelligence”: the point at which a machine can learn any human task, thanks to “emergent” abilities that weren’t in the original model.
WasPentalive (@waspentalive@lemmy.one) • 2 years ago

AI does not learn as we do when ingesting information.
I read an article about a subject. I will forget some of it. I will misunderstand some of it. I will not understand some of it. (These two are different: in misunderstanding, I think I understand but am wrong; in simply not understanding, I cannot make heads or tails of that portion.)
Later, when I make use of what I may have learned, these same effects will happen again to whatever it was I correctly understood.
Another issue: I, as a natural intelligence, know what I can quote and what I should not, due to copyright, social mores, and law. AI regurgitates anything that might match, regardless of source.
The third issue: the AI does not understand, even with copious training data. It does not know that dogs bark; it does not have a concept of a dog.
I once wrote a much simpler program that took a body of text and noted the letter following each pair of letters, building probability tables from each two-letter pair plus the letter that came next. After ingesting what little training text I was able to give it, it would choose two letters at random and then generate the following letter using the statistics it had learned. It had no concept of words, much less the meaning of any words it might form.
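For the curious, here is a minimal Python sketch of that kind of letter-pair model (an order-2 character-level Markov chain). The function names and toy corpus are illustrative, not from the commenter’s actual program:

```python
import random
from collections import defaultdict

def build_table(text, order=2):
    # For each pair of letters, count how often each next letter follows it.
    table = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - order):
        pair = text[i:i + order]
        table[pair][text[i + order]] += 1
    return table

def generate(table, length=80):
    # Start from a random observed pair, then repeatedly sample the next
    # letter in proportion to how often it followed that pair in training.
    pair = random.choice(list(table.keys()))
    out = pair
    for _ in range(length):
        if pair not in table:  # pair never seen in training: restart from a known one
            pair = random.choice(list(table.keys()))
        letters, counts = zip(*table[pair].items())
        nxt = random.choices(letters, weights=counts)[0]
        out += nxt
        pair = pair[1:] + nxt
    return out

corpus = "the dog barks. the dog runs. the cat sleeps. the dog sleeps."
print(generate(build_table(corpus)))
```

The output looks vaguely word-like but carries no meaning, which is the commenter’s point: the table knows letter statistics, not words or concepts.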
Amju Wolf (@amju_wolf@pawb.social) • 2 years ago

> I read an article about a subject. I will forget some of it. I will misunderstand some of it. I will not understand some of it. (These two are different: in misunderstanding, I think I understand but am wrong; in simply not understanding, I cannot make heads or tails of that portion.)
Just because you’re worse at comprehension or have a worse memory doesn’t make you any more “real.” And AIs also “forget” things; they also absorb things imperfectly, because they don’t store any actual full-length texts or anything. It’s just separate words (more or less) and the likelihood of what should come next.
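As a toy illustration of that claim, here is a hypothetical Python sketch: generation is sampling from a distribution over possible next words, not retrieval of a stored text. The probabilities below are invented for illustration, and a real model computes such a distribution from learned weights rather than a hand-written lookup table:

```python
import random

# Invented, hand-written probabilities for illustration only; a real LLM
# derives a distribution like this from its learned weights, per context.
next_word_probs = {
    ("the", "dog"): {"barks": 0.55, "runs": 0.30, "sleeps": 0.15},
}

def next_word(context):
    # Sample the next word in proportion to its probability given the context.
    dist = next_word_probs[context]
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(next_word(("the", "dog")))  # usually "barks", sometimes "runs" or "sleeps"
```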
> Another issue: I, as a natural intelligence, know what I can quote and what I should not, due to copyright, social mores, and law. AI regurgitates anything that might match, regardless of source.
Except you don’t, not perfectly. You can be absolutely sure that you often say something someone else has said or written, which means they technically have a copyright to it… but no one cares, for the most part.
And it goes the other way too: you can quote something imperfectly.
Both actually can and do happen already with AIs, though it would be great if we could train them with proper attribution, at least for the clear-cut cases.
> The third issue: the AI does not understand, even with copious training data. It does not know that dogs bark; it does not have a concept of a dog.
A sufficiently advanced artificial intelligence would be indistinguishable from natural intelligence. What sets them apart, then?
You can look at animals, too. They also have intelligence, and yet there are many concepts that are incomprehensible to them.
The thing is, though: how can you actually tell that you don’t work the exact same way? Sure, the AI is more primitive and has fewer inputs (text only, no other outside stimuli), but the basis isn’t all that different.