Codeberg was asking about this. The toot linked by a commenter points to:

SEqlite

These are CC-BY-SA 4.0 remixes of the Stack Exchange Creative Commons Data Dumps. 100% Unendorsed by Stack Exchange, Inc.

They are minimal. They provide the data you probably care about and the data you need to comply with the original license in SQLite format.

    • @DaseinPickle@leminal.space
      37 months ago

      It’s not about privacy. It’s about AI companies stealing other people’s work and knowledge and profiting from it, like they did with artists. I think that’s bothering a lot of people. It’s kind of sad that we can’t exchange information with each other for free without some Silicon Valley crooks taking advantage and trying to convert other people’s goodwill into profit.

      These LLMs are also polluting the web with AI junk and slop. The web is absolutely tainted with shitty ChatGPT text and images, making it harder and harder to find authentic information. I think a lot of people don’t want to contribute to that.