• OhNoMoreLemmy@lemmy.ml
    3 months ago

    The other reason they don’t do it is that many models are trained on a large corpus of pirated texts, and documenting this would amount to a confession.

    Not just in an ‘I scraped the New York Times without permission’ kind of way, but in an ‘I illegally downloaded a torrent containing bestsellers from the last 30 years’ kind of way.