The Hollywood star is among a group of authors suing the tech companies for $1 billion each, alleging copyright infringement. The cases could set important legal precedents.
The Hollywood comedian Sarah Silverman has joined forces with two authors to sue the tech companies Meta and OpenAI for $1 billion each.
Silverman and the other authors accuse the tech companies of using their books without authorisation to train their AI chatbots. They suspected that the companies’ training algorithms had unauthorised access to Silverman’s book The Bedwetter and, as proof, they submitted an exhibit showing what happened when ChatGPT was asked to summarise her work.
The summary was so extensive and accurate, they argued, that the entirety of Silverman’s book must have formed part of ChatGPT’s training dataset.
Large language models, the programs that form the basis of chatbots like ChatGPT, need huge amounts of text to learn to mimic human responses. Some of them are partially trained on datasets called “shadow libraries” – online databases that provide free access to millions of books and articles which would otherwise be behind paywalls. The sheer volume of text they hold makes them very valuable sources of training data.
While some, like Project Gutenberg, are legal, others are more controversial because their legal status is dubious. Z-Library, for example, describes itself as the “world’s largest e-book library” and is located on the Dark Web. Some of the books it hosts are pirated. So the use of shadow libraries in cases like Sarah Silverman’s raises difficult questions about the limits of copyright. How can chatbots leverage content when so much of the internet is copyright protected?
Silverman and her co-plaintiffs, the novelists Richard Kadrey and Christopher Golden, are not the only people to take AI firms to court. A group of visual artists have sued Stability AI, Midjourney and DeviantArt for copyright infringement. Programmers have sued GitHub over GitHub Copilot, an AI product which they say relies on “unprecedented open-source software piracy”. Getty Images has also filed a lawsuit, alleging that Stability AI, which created the image-generation tool Stable Diffusion, trained its model on “millions of images protected by copyright”.
These cases could take years to resolve, but they could also define the boundaries of how AI learns, and what role copyright laws will play in how training datasets are assembled.
Meta and OpenAI have yet to respond to Sarah Silverman’s lawsuit, but they’re understood to deny any wrongdoing.
Today’s episode was written and mixed by Patricia Clarke, with additional reporting by Alexi Mostrous.