Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

lemmyreader · 7 months ago

verassol · 7 months ago

StackOverflow: *grabs money by monetizing massive amounts of user-contributed content without consulting or compensating the users in any way*

Users: *try to delete it all to prevent it*

StackOverflow: *your contributions belong to the community, you can’t do that*

Pretty fucked-up laws. There are a lot of lawsuits going on right now against AI companies over similar issues. In this case, StackOverflow is entitled to be compensated for its partnership, and because the answers are all licensed under CC BY-SA, no one can complain. Now, that SA? Whatever.

@9point6@lemmy.world · 7 months ago

That SA part needs to be tested in court against the AI models themselves.

A lot of this shittiness would probably go away if there were a risk that ingesting certain content meant you had to release the actual model to the public.

verassol · 7 months ago

Yeah, their assumption, though, is that you don't? Neither attribution nor ShareAlike, nor even full-on all-rights-reserved copyright, is being respected. Anything public goes, and if questions are asked, it's "fair use". If the user retains CC BY-SA over their content, why does giving a bunch of money to StackOverflow entitle OpenAI to use it all under whatever terms they settled on? Boggles me.

Now, say, Reddit's Terms of Service state clearly that by submitting content you are giving them the right to "a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness (…) in all media formats and channels now known or later developed anywhere in the world." That speaks volumes about why alternatives to these platforms (like Lemmy) matter.

Skull giver · 7 months ago

deleted by creator

verassol · 7 months ago

That's interesting. I was looking up "Lemmy Terms of Service" for comparison after getting that quote from the Reddit ToS and couldn't find anything for Lemmy.ml. Now that you mention it, I checked my Mastodon instance: nothing there either, just a privacy policy. That is indeed kinda weird. Some instances do have their own ToS, though. At the very least, something granting a sublicense for distribution should be there to protect people running instances in jurisdictions where it's relevant.

Skull giver · 7 months ago

deleted by creator

@hedgehog@ttrpg.network · 7 months ago

The funny thing about Lemmy is that the entire Fediverse is basically running a massive copyright violation ring under current copyright law.

Is it, though?

When someone posts a comment to Lemmy, they do so willingly, with the intent for it to be posted and federated. If they change their mind, they can delete it. If they delete it and it remains up somewhere, they can submit a DMCA request; likewise if someone else posts their copyrighted content.

Copyright infringement is the use of copyrighted works without permission. When you submit a post or a comment, your permission for it to be displayed and federated is implied, because that is how Lemmy works. A license also conveys permission, but it's not the only way permission can be conveyed.

Skull giver · 7 months ago

deleted by creator

@hedgehog@ttrpg.network · 7 months ago

The idea that someone does this willingly implies that the user knows the implications of their choice, which most of the Fediverse doesn't seem to.

The terms of service for lemmy.world, which you must agree to upon sign-up, refer to federation. If you don't know what that means, it's your responsibility to look it up and understand it. I assume other instances have similar sign-up processes. Lemmy's source code is also public, so anyone willing to take the time to read through it can fully understand what happens to their content, unlike with most social media companies.

What implications of posting to Lemmy do you think people don't understand that people who post to Facebook do?

If the implied license was enough, Facebook and all the other companies wouldn’t put these disclaimers in their terms of service.

It's not an implied license. It's implied permission. And if you post content to a website that hosts and displays such content, it's obvious what's about to happen with it. Try telling a judge that you didn't understand what you were doing and that you sued without first trying to delete the content or file a DMCA notice, and see whether the judge sides with you.

Many companies have lengthy terms of service full of CYA legalese that does nothing. That said, an explicit license to your content in the terms of service does do something, but that doesn't mean you're infringing copyright without one. If my artist friend asks me to take her art piece to a copy shop and get a hundred prints made for her, I'm not infringing copyright, and neither is the copy shop. If I did that without permission, on the other hand, I would be. If her lawyer got wind of this and sued me without checking with her, and I showed the judge her text saying "Hey hedgehog, could you do me a favor and…," what do you think the judge would say?

Besides, Facebook does things that Lemmy instances don't. Facebook's codebase isn't open, and the company wants to reserve the right to do different, non-obvious things with the content you submit. Facebook is headquartered in California and is valued in the hundreds of billions, while Lemmy instances are located all over the world and I doubt any of them is worth even a million.

Skull giver · 7 months ago

deleted by creator

verassol · 7 months ago

the claimants were set back because they’ve been asked to prove the connection between AI output and their specific inputs

I mean, how do you do that for a closed-source model with secret training data? As far as I know, OpenAI has admitted to using large amounts of copyrighted content, countless books and newspaper articles, all on the basis of fair use claims. I guess it would take a government entity actively going after them at this point.

Skull giver · 7 months ago

deleted by creator

verassol · 7 months ago

Thank you for sharing. Your perspective broadens mine, but I feel a lot more negative about the whole "must benefit business" side of things. It is fruitless to hold any entity whatsoever accountable when the whole world economy is caught up in a free-for-all, nuke-waving, doom-embracing realpolitik vibe.

Frankly, I'm not sure which would be worse: economic collapse and the consequences for the people, or economic prosperity and… the consequences for the people. Long term, and speaking from a country that isn't exactly thriving in the grand scheme of things, I guess I'd take the former.

Skull giver · 7 months ago

deleted by creator

@bitfucker@programming.dev · 7 months ago

Yep. Can't wait to overfit an LLM on a lot of copyrighted work and release it into the public domain. Let's see whether OpenAI gets pushback from copyright owners down the road.