I don’t know if there’s anybody who hasn’t come to the same conclusion lol but ultimately, after (re-)reading and retyping my thoughts over and over, I come to two conclusions:
This is a problem of capitalism, but saying that alone doesn’t explain much. The crux of the matter is copyright law and competition.
On the one hand, copyright law is so backwards and outdated (thank Disney) that the only legally safe way for them to do this was to destroy the books after scanning them. Cutting the spines off books, known as destructive scanning, was for a long time the only viable way to digitize books. With newer methods you can certainly do it without destroying the book, but many of those methods are patented.
The other side of the coin is that these AI companies “need” to put out better, faster models all the time to stay competitive. It’s a fast-evolving industry with similarly cut-throat competition. If you fall behind, people stop using you and you can’t find funding.
In higher-stage socialism, all of this would have basically been prevented. The SOE(s) responsible for AI research would have been told to preserve the books even if it took longer, or to find more efficient ways to train their models. There also wouldn’t be such a rush to put out marginally better models just to stay at the cutting edge. DeepSeek showed it’s possible to get a good model by distilling from an existing “cutting edge” model. Which means you only need to build the cutting-edge model once; after that you can derive cheaper variants from it.
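As a loose aside on the DeepSeek point: “distillation” just means training a smaller, cheaper student model to imitate a stronger teacher’s output distribution instead of training from scratch. A minimal sketch of the core loss, in pure Python with made-up logits (this is the general technique, not DeepSeek’s actual training setup):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize into probabilities.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's softened distribution and the
    # student's: the student is trained to push this toward zero.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical numbers: a student that matches the teacher exactly has
# zero loss; one that disagrees has a positive loss.
teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [2.0, 1.0, 0.1]))  # 0.0
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # > 0
```

The point being: the expensive part (the teacher) only has to exist once, and everything downstream can be trained against its outputs far more cheaply.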