• Arthur Besse@lemmy.ml
    3 days ago

    identify AI that has used copyrighted material

    but that is basically all modern “AI”.

    (the only LLM i’ve heard of which actually claims that its training corpus is freely licensed is Apertus…)

    • YourMomsTrashman@lemmy.world
      1 day ago

      Traditionally, in machine learning, it is standard practice to state which datasets and/or pretrained models were used, so that the results are transparent and can be replicated. With GPT-2, it was “the Common Crawl and our own crawled 8 million web pages”, and since then I feel it’s mostly been left out, falling back on (easily manipulated) benchmarks instead 😬

      • Arthur Besse@lemmy.ml
        1 day ago

        Yep. But just providing a list of millions of URLs and saying “we trained on this”, as some models in the past have done, also didn’t make it possible to replicate; by the time anyone re-fetches them all, many of the URLs will inevitably have changed or disappeared.
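
To make the URL-drift point concrete, here is a minimal sketch, assuming a dataset manifest that stores a SHA-256 digest next to each URL. The function names, the example URLs, and the in-memory "fetch" results are all hypothetical; no model mentioned in this thread is known to publish its data this way. A digest only detects that a page changed or vanished. It cannot bring the original text back, so actual replication would still need an archived copy.

```python
import hashlib
from typing import Dict, List


def sha256_hex(content: bytes) -> str:
    """Return the hex SHA-256 digest of a page body."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(snapshot: Dict[str, bytes]) -> Dict[str, str]:
    """Map each URL to the digest of the content seen at training time."""
    return {url: sha256_hex(body) for url, body in snapshot.items()}


def check_refetch(
    manifest: Dict[str, str], refetched: Dict[str, bytes]
) -> Dict[str, List[str]]:
    """Compare a later re-fetch against the manifest, URL by URL."""
    report: Dict[str, List[str]] = {"unchanged": [], "changed": [], "missing": []}
    for url, digest in manifest.items():
        body = refetched.get(url)
        if body is None:
            report["missing"].append(url)      # page disappeared
        elif sha256_hex(body) == digest:
            report["unchanged"].append(url)    # byte-identical content
        else:
            report["changed"].append(url)      # same URL, different content
    return report


# Hypothetical snapshot taken when the training set was assembled.
original = {
    "https://example.com/a": b"first page",
    "https://example.com/b": b"second page",
    "https://example.com/c": b"third page",
}
manifest = build_manifest(original)

# Hypothetical re-fetch some time later: one page edited, one gone.
later = {
    "https://example.com/a": b"first page",
    "https://example.com/b": b"second page, edited",
}
report = check_refetch(manifest, later)
print(report)
```

With a bare URL list, the edited page and the vanished page would silently end up as different training data; with digests, at least the drift is measurable.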

    • youcantreadthis@quokk.au
      3 days ago

      We callin it Plagiarized Information Stochastic Stupidity now, the only PISS you’ve heard of

    • Hackworth@piefed.ca
      3 days ago

      Adobe claims to only train their image generator, Firefly, on images from their stock library.

    • ZDL@lazysoci.al
      2 days ago

      Interestingly, literally zero of the people I’ve seen who word things this way ever seem to volunteer to be the ones doing the watering. Are you going to break the losing streak or are you going to continue confirming my belief that it’s only chicken hawks who say this?