The lawsuit alleges OpenAI crawled the web to amass huge amounts of data without people’s permission.

  • @Hick@lemmy.world
    English
    55 points · 1 year ago

    Scraping social media posts and reddit posts doesn’t sound like stealing, they’re public posts.

    • @SamB@lemmy.world
      English
      19 points · 1 year ago

      I doubt it’s only about some Reddit posts. The scraping was done on the whole web, capturing everything it could. So besides stealing data and presenting it as its own, it seems to have collected some even more problematic data which wasn’t properly protected.

      • @zekiz@lemmy.world
        English
        16 points · 1 year ago

        But that really isn’t OpenAI’s fault. Whoever was in charge of securing the patients’ data really fucked up.

        • krellor
          18 points · 1 year ago

          Leaving your front door open isn’t prudent but doesn’t grant permission to others to enter and take/copy your belongings or data.

          The security teams may have royally screwed up, but OpenAI has a legal obligation to respect copyright and laws regarding data ownership.

          Likewise, they could have scraped pages that included terms of use, copyright, disclaimers, etc., and failed to honor them.

          All parties can be in the wrong for different reasons.

          • @conditional_soup@lemm.ee
            English
            6 points · 1 year ago

            I think it’s a little closer to being mad that the Google Street View car drove by and snapped a picture of the front of your house, tbh.

          • @zekiz@lemmy.world
            English
            2 points · 1 year ago

            It’s more like leaving an important letter in the open for everyone to read. It’s certainly your fault for leaving it that open.

          • Dran
            English
            2 points · 1 year ago

            But does leaving your front door open allow one to legally take a picture of the inside from across the street? I’d say scraping is more akin to that than it is to theft. Nothing is removed in scraping, just copied.

            • @BradleyUffner@lemmy.world
              English
              0 points · 1 year ago

              Bad analogy. This is like leaving your couch out on the sidewalk, then complaining when someone takes a picture of it.

        • Apathy Tree
          English
          3 points · 1 year ago

          It’s certainly their fault that they used it, though.

          If they cared, they could have ensured they weren’t using sensitive or otherwise highly problematic information, but they chose not to. That’s on them.

        • jdp23
          1 point · 1 year ago

          They certainly fucked up, but it might well be OpenAI’s fault too.

      • @tallwookie@lemmy.world
        English
        3 points · 1 year ago

        if it was unsecured it’s basically public. whoever put that data on a publicly accessible server is at fault

        • @priapus@sh.itjust.works
          English
          9 points · 1 year ago (edited)

          That’s not necessarily true. Even if a company makes the mistake of not securing data correctly, those that make use of this data can still be at fault.

          If a company leaves a server wide open, you still can’t legally steal information from it.

          • @Fylkir
            English
            1 point · 1 year ago (edited)

            > If a company leaves a server wide open, you still can’t legally steal information from it.

            I don’t see how this is any different than if Google search included text from a page that shouldn’t be public.

      • sik0fewl
        6 points · 1 year ago

        Just because something is posted online doesn’t mean it can be taken and resold. Copyright law prevents that. Of course, how copyright law applies to generative AI is a new and gray area.

    • @sudneo@lemmy.world
      English
      2 points · 1 year ago

      This is not just scraping, though; it is also using that data to create other content and potentially to re-publish that data (we have no way of knowing whether ChatGPT will spit out any of it, nor where it took what it is spitting out from).

      The expectation that social media data will be read by anybody is fair, but the fact is that the data has been written to be read, not to be resold and published elsewhere too.

      It is similar for blog articles. My blog is public and anybody can read it, but that data is not there to be repackaged and sold. The fact that something is public does not mean I can do whatever I want with it.

      • @seasick@lemmy.world
        English
        3 points · 1 year ago

        I could read your blog post and write my own blog post, using yours as inspiration. I could quote your post, add a link back to your blog post and even add affiliate links to my blog post. I could be hired to do something like that for the whole day.

        • @sudneo@lemmy.world
          English
          3 points · 1 year ago

          ChatGPT doesn’t get inspired; the process is different, and it could very well spit out the content verbatim. You can do all the rest (depending on the license) without issues, but once again this is not what ChatGPT does, as it doesn’t provide attribution.

          It’s exactly the same with software, in fact.

    • SkierniewiceBoi
      1 point · 1 year ago

      @Hick I have one problem with that in terms of this generative AI. It’s similar to when Microsoft trained Copilot on GitHub data. Of course it was open source code, and it was on Microsoft’s servers, but with this AI revolution you couldn’t have expected that someone would be able to create such a tool. I mean, we’re randomly leaving our DNA in multiple different places, but does that mean we agreed to be cloned once the technology that makes it possible arrives?

      @L4s