Sarah Silverman and Authors Sue Meta and OpenAI for Copyright Infringement Over Book Training Data

High

Comedian Sarah Silverman and two fellow authors sued Meta and OpenAI in July 2023, alleging that the companies' AI models were trained without permission on pirated copies of their copyrighted books obtained from Library Genesis.

Full Description

In July 2023, comedian and author Sarah Silverman, along with writers Christopher Golden and Richard Kadrey, filed separate but related class-action lawsuits against Meta Platforms and OpenAI in the U.S. District Court for the Northern District of California. The lawsuits alleged that both companies' large language models (OpenAI's GPT-3.5 and GPT-4, which power ChatGPT, and Meta's LLaMA) were trained on copyrighted books without permission from the authors or publishers.

The plaintiffs claimed that their works were obtained from Library Genesis (LibGen), a notorious "shadow library" website that hosts millions of pirated books and academic papers. According to the lawsuits, the defendants used these unauthorized copies as training data, enabling the AI systems to generate summaries and analyses of the copyrighted works. As evidence, the plaintiffs showed that ChatGPT could produce accurate summaries of their books when prompted, which they argued suggested the models had been trained on the full text of their works.

The legal filings argued that this conduct constituted direct copyright infringement, vicarious copyright infringement, and violations of the Digital Millennium Copyright Act (DMCA). The authors sought monetary damages, including profits derived from the alleged infringement, as well as injunctive relief to prevent further unauthorized use of their works. The cases were structured as class actions, potentially representing thousands of authors whose works may have been used in the same way.

Both Meta and OpenAI filed motions to dismiss the lawsuits, arguing that their use of copyrighted material constituted fair use under copyright law. The companies contended that training AI models on copyrighted texts in order to learn patterns and generate new content is a transformative use. They also argued that the plaintiffs had failed to demonstrate concrete harm or to show that the AI outputs were infringing derivative works.

The litigation is a landmark test of how copyright law applies to AI training data, with implications for the entire generative AI industry. The outcomes could set important precedents on whether using copyrighted works to train AI models requires explicit permission from rights holders or qualifies as fair use. As of late 2023, the cases remained pending.

Root Cause

AI companies allegedly used copyrighted books from Library Genesis and other sources to train large language models without obtaining proper licenses or permissions from copyright holders.

Mitigation Analysis

Proper data provenance tracking and licensing compliance could have prevented this issue. Companies should implement robust copyright clearance processes before using training data, maintain detailed records of data sources, and establish legal review procedures for training datasets. Automated content filtering to exclude known copyrighted works and partnerships with publishers for legitimate licensing would reduce infringement risks.
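The provenance tracking and filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any company's actual pipeline: the names `ProvenanceRecord`, `fingerprint`, `make_record`, and `review`, as well as the allow-list and block-list values, are hypothetical and chosen only for this example.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
ALLOWED_LICENSES = {"public-domain", "cc-by-4.0", "licensed-from-publisher"}
BLOCKED_SOURCES = {"libgen"}


@dataclass(frozen=True)
class ProvenanceRecord:
    """One audit-trail entry per candidate training document."""
    doc_id: str
    source: str    # where the document was obtained
    license: str   # license label assigned during legal review
    sha256: str    # fingerprint of the normalized text


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def make_record(doc_id: str, source: str, license: str, text: str) -> ProvenanceRecord:
    """Build a provenance record with normalized source and license labels."""
    return ProvenanceRecord(doc_id, source.lower(), license.lower(), fingerprint(text))


def review(record: ProvenanceRecord, protected_hashes: set[str]) -> tuple[bool, str]:
    """Return (admit, reason) for a candidate document."""
    if record.source in BLOCKED_SOURCES:
        return False, "blocked source"
    if record.sha256 in protected_hashes:
        return False, "matches known protected work"
    if record.license not in ALLOWED_LICENSES:
        return False, "license not cleared"
    return True, "cleared"


# Example: a document from a blocked source is rejected regardless of license.
record = make_record("doc-001", "LibGen", "unknown", "Sample text")
print(review(record, protected_hashes=set()))
```

Exact-hash matching only catches verbatim copies after normalization; a real filter would likely need fuzzy or chunk-level matching, and the license labels would have to come from an actual legal review rather than a hard-coded set.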

Regulatory Framework References

EU AI Act

Art. 53—Obligations for General-Purpose AI Models

ISO/IEC 42001

A.5.4—Legal Compliance

NIST AI RMF

GOVERN 1.2—Legal Compliance

Lessons Learned

This case highlights the urgent need for clear legal frameworks governing AI training data and the intersection of copyright law with machine learning. It demonstrates the importance of proactive data licensing and the potential risks of using datasets with unclear provenance.

Sources

Sarah Silverman sues OpenAI and Meta for copyright infringement

The Guardian · Jul 10, 2023 · news

Sarah Silverman, other authors sue OpenAI, Meta for copyright infringement

Reuters · Jul 10, 2023 · news