In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.

  • A lot of licensing prevents or constrains creating derivative works and monetizing them. The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

    If yes, Github Copilot is illegal as it produces code that should comply to multiple conflicting license requirements. If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt, and run a leaked copy of Windows through it, then go around selling Binbows and MSFT can’t do anything about it.

    The truth is mostly between the two, this is just piracy, which always has been a gray area because of the difficulty of prosecuting it, previously because the perpetrators were many and hard to find, now it’s because the perpetrators are billion dollar companies with expensive lawyer teams.

    • The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

      This question is completely independent of whether the code was generated by an AI or a human. You compare code A with code B, and if the judge and jury agree that code A is a derivative work of code B then you win the case. If the two bodies of work don’t have sufficient similarities then they aren’t derivative.

      If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt

      You’ve reinvented copy-and-paste, not an “AI.” AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

      Filtering the code through something you call an AI isn’t going to have any impact on whether you get sued. If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

      • AIs are deliberately designed to not copy-and-paste.

        AI is a marketing term, not a technical one. You can call anything “AI”, but it’s usually predictive models that get called that.

        AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

        For example if the powers that be decided to say licenses don’t apply once you feed material through an “AI”, and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively using Microsoft proprietary code. Their licenses and copyright and stuff doesn’t apply to AI training data so you could sell that new code your AI just created.

        It doesn’t even have to be 100% identical to Windows source code. What if it’s just 80%? 50%? 20%? 5%? Where is the bar where the author can claim “that’s my code!”?

        Just to compare, the guys who set out to reimplement Win32 APIs for use in Linux (the thing that made it into MacOS as well now) deliberately would not accept help from anyone who ever saw any Microsoft source code for fear of being sued. The bar was that high when it was a small FOSS organization doing it. It was 0%, proven beyond a doubt.

        Now that Microsoft is the author, it’s not a problem when Github Copilot spits out GPL code word for word, ironically together with its license.

        • AI is a marketing term, not a technical one.

          The reverse, actually. Artificial intelligence is a field of research that includes things like machine learning, as well as lots of even more mundane applications. It’s pop culture that has hijacked it to mean “a thing exactly as capable as a human brain, but in computer form.”

          For example if the powers that be decided to say licenses don’t apply once you feed material through an “AI”, and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively using Microsoft proprietary code.

          Once again, it doesn’t matter what you “feed code through.” Copyright applies to the tangible result. If the output from the AI matches closely to something that’s already copyrighted then that copyright applies to it. If it doesn’t match closely then that copyright doesn’t apply to it. The actual process by which the code was produced doesn’t matter one whit. If I took a Harry Potter book, put its pages through a shredder, randomly glued the particles of paper back together and it just so happened to closely replicate Lord of the Rings then the Tolkien estate has a case against me but the Rowling estate does not.

      •  nous   ( @nous@programming.dev ) 
        link
        fedilink
        English
        1011 months ago

        If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

        ^^ Very much this.

        Loads of people are treating the process of AI creating works as either violating copyright or not. But that is not how copyright works. It applies to the output of a process not the process itself. If someone ends up writing something that happens to be a copy of something they read before - that is a violation of copy write laws. If someone uses various works and creates something new and unique then that is not a violation. It does not - at this point in time at least - matter if that someone is a real person or an AI.

        AI can both violate copy write on one work and not on another. Each case is independent and would need to be legislated differently. But AI can produce so much content so quickly that it creates a real problem for a case by case analysis of copy write infringement. So it is quite likely the laws will need to change to account for this and will likely need to treat AI works differently from human created works. Which is a very hard thing to actually deal with.

        Now, one could also argue the model itself is a violation of copyright. But that IMO is a stretch - a model is nothing like the original work and the copyright law also does not cover this case. It would need to be taken to court to really decide on if this is allowed or not.

        Personally I don’t think the conversation should be on what the laws currently allow - they were not designed for this. But instead what the laws should allow. So we can steer the conversation towards a better future. Lots of artists are expressing their distaste for AI models to be trained on their works - if enough people do this laws can be crafted to backup this view.

    • then go around selling Binbows and MSFT can’t do anything about it

      I think this already happen. A very practical example, windows GUI has been copied by many Linus distros. And with windows 11 there’s clearly a reference to Apple MacOS GUI with a sparkling of Google material design.

      Should apple and Google be able to sue Microsoft because it “copied” their work? Should Google be able to sue apple because they “copied” the notification drop-down in iOS?

      As you say it’s really a grey area because the only reason we consider AI code to be “regurgitated” while human code to be “inspired” is only because we give humans more recognition of their intellectual abilities.