My original, editorialized title: Ars Technica Sells Out


Linking to this because I know people here read Ars Technica, and I totally didn’t become a subscriber three days before this was announced. Nope. No sir.

  • azl
    link
    English
    63 months ago

    I want Ars content to be part of whatever training data is provided to the best models. How does that get done without appearing like they are being bought?

    Even if their contract explicitly states that it is a data sharing agreement only and the products of the media organization (articles/investigations) are not grounds for breach or retaliation, it is assumed that there is now some impartiality in future reporting.

    So, for all media companies, the options seem to be:

    1. Contribute to the greater good by openly permitting site scraping (for $0)
    2. Allow data sharing to contracted parties only (for a fee)
    3. Public or privately prohibit use of any data, and then seek damages down the road for theft/copyright infringement when the legal framework has been established.

    Is there a GPL or other license structure that permits data sharing for LLM training in a way that it does not get transformed into something evil?