• Luc@lemmy.world · 6 hours ago

    You could always retype it and say you’ve recreated it, but that didn’t fly either. Why would overfitting a machine-learning algorithm on the data and then having it predict next tokens be any different?

    • Limonene@lemmy.world · 5 hours ago

      Because machine learning is already basically mass copyright infringement. The training data contains copyrighted material. The model is clearly a derivative of the training data. The output is clearly a derivative of the model. Yet somehow it’s legal (probably because they can afford good lawyers).

  • thisisbutaname@discuss.tchncs.de · 17 hours ago

    What you’re thinking of is this: Malus.

    It’s an example of clean-room design: basically, you have an LLM read the code and write a specification, which a second LLM then uses to rewrite the code without access to the original.

    Although since the original code was most likely included in the LLM’s training data, this might not really hold.
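    As a sketch, that two-stage process looks like the following. `spec_model` and `impl_model` are hypothetical stand-ins for two separate LLM calls (stubbed here with canned outputs), not a real API:

```python
# Hypothetical sketch of the clean-room pipeline described above: one model
# turns code into a specification, and a second model sees only that spec
# and writes fresh code. Both "models" are stubs so the flow is runnable.

def spec_model(source_code: str) -> str:
    # Stage 1: read the original code, emit only a behavioural specification.
    return "add(a, b): return the sum of the two arguments"

def impl_model(spec: str) -> str:
    # Stage 2: reimplement from the spec alone, never seeing the original.
    return "def add(a, b):\n    return a + b"

def clean_room_rewrite(source_code: str) -> str:
    spec = spec_model(source_code)   # the original code goes in here...
    return impl_model(spec)          # ...but only the spec reaches stage 2

rewritten = clean_room_rewrite("def add(x, y): return x + y")
```

    The point of the separation is that the second model has no path back to the original expression of the code, only to its described behaviour; the objection above is that pretraining already gave it that path.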

  • slazer2au@lemmy.world · 1 day ago

    Why are you using the future tense when AI companies have already scraped the ever-living crap out of every git forge they can access?

    • ComradePenguin@lemmy.ml (OP) · 1 day ago

      Updated the question to clarify. I was thinking more of specific repos being laundered, not the entirety of them.

      • slazer2au@lemmy.world · 1 day ago

        When you use LLMs to regurgitate code, you do not get ownership of the code, as you did not produce it. So using LLMs to “launder” code doesn’t accomplish anything.

        • ComradePenguin@lemmy.ml (OP) · 1 day ago

          From my understanding it does allow me to use the code for any purpose regardless of the license, does it not? Even if I don’t own the LLM-written code?

          • floquant@lemmy.dbzer0.com · 6 hours ago

            It does not.

            From the GPL terms:

            To “modify” a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy.

            Other licenses may be more permissive and do allow you to do pretty much whatever you want with it, but I don’t see why feeding some source code into an LLM would exempt you from its license.

            It doesn’t matter whether it’s you reading it or an LLM doing inference on it: you’re still taking the source code as a starting point to create a derived work, and as such you are subject to its license.

          • Ephera@lemmy.ml · 1 day ago

            Yeah, but you also have to be aware that companies rarely care to (fully) comply with licenses to begin with, if their own code isn’t publicly accessible.

            Basically:

            • If they actually open-source their own code, they have to fully comply (though the worst consequence is often just having to open-source their own code, which it already is, so it might not always be the highest priority either).
            • If they build a frontend, they generally do want to comply, because someone might be able to decompile the software and prove that licensed code is used inappropriately.
            • If they build a backend or build tooling or the like, GPL and AGPL are often still prohibited due to the high impact, but other than that, complying with licenses is seen as reducing risk for something that’s pretty unlikely to affect them. The chance of being sued over code that no one sees is practically 0, so it’s usually treated as an acceptable legal risk to not give a fuck.
          • PlzGibHugs@piefed.ca · 1 day ago

            Even before getting into the copyrightability of code, at the very least, any LLM-produced parts are not copyrightable. They are public domain.

            That said, if it’s a mix of LLM code and human code, things get pretty messy. From my understanding, if the human expanded on or modified AI code, it’s public domain. If they wrote a section fully independently, they absolutely own the copyright. If it’s an unclear mix, it would have to be proven on a case-by-case basis, with the onus on the AI user to provide solid evidence that the copied code isn’t AI-generated.

  • MajorHavoc@programming.dev · 1 day ago

    Sure!

    But I don’t expect it to change much.

    I could already do that, by hand as well.

    It’s a bit like how there are so many different superheroes who are obviously just off-brand Superman or off-brand Captain America.

    Minor changes to avoid intellectual property law and branding have always been an option.

    And I suppose all of these are easier now with the remixing slop-o-trons.

    But they weren’t terribly difficult, or particularly uncommon, even before.

  • lime!@feddit.nu · 1 day ago

    pretty much, with the caveat that code that has gone through an llm can’t ever be licensed or copyrighted. it’s basically a public domainifyer.

    • Tja@programming.dev · 12 hours ago

      No, code that has been purely written by an LLM is not copyrightable.

      As soon as a human writes a prompt, a correction, a design guideline, or a code review, it becomes a question of who has the better lawyer. And I’d bet the billion-dollar corp has the better chances.

      • lime!@feddit.nu · 11 hours ago

        so they would have to argue what counts as a transformative work of plagiarism. how much of a stolen painting you have to paint over before it’s no longer stolen.

        • Tja@programming.dev · 11 hours ago

          I’m not a lawyer, so no idea what they would argue. I just know that a pricier lawyer beats being right, much of the time.

          • lime!@feddit.nu · 11 hours ago

            they have previously argued that llm output is transformative itself, but that’s been struck down. i’m not sure what the next avenue they will take is but they will definitely take it.

    • Korhaka@sopuli.xyz · 1 day ago

      That doesn’t seem to stop corporations from assuming their software is still theirs, even when an LLM wrote a lot of it.

  • tal@lemmy.today · 1 day ago

    If you mean “can I just recreate an existing copyrighted work without it being copyright-infringing”, no. You’re still liable if you recreate text that would be considered a derivative work. You don’t have a fantastic mechanism to avoid that with existing LLMs, though I would guess that you are most likely not going to generate infringing code randomly.

    Same thing for images or other media.

  • PlzGibHugs@piefed.ca · 1 day ago

    From my understanding, code is still covered by copyright. This means that copied code, even if run through an intermediary like an AI, is still copyright infringement. In the same way, even if an image generator recreates a character or movie frame, that doesn’t make it public domain (the default state of AI output); it’s just that the AI infringed on someone else’s copyright. If the code or image is then used, you can still be sued.