Millions of articles from The New York Times were used to train chatbots that now compete with it, the lawsuit said.

  • @EnderMB@lemmy.world
    6 months ago

    These models can still be trained on data they're allowed to use, but I think what we're seeing is that the better LLM services are probably trained on shocking amounts of private data, whereas the less performant ones probably don't use stolen data.

    • @spaduf@slrpnk.net
      6 months ago

      Textbooks are a big one I suspect we'll see a set of suits over, particularly because they seem to be some of the most valuable training data.