Stability AI Sued by Getty Images and Artists for Training Stable Diffusion on Copyrighted Images
Severity
High
Getty Images and multiple artists filed lawsuits against Stability AI alleging the company trained Stable Diffusion on billions of copyrighted images without permission, seeking damages and injunctive relief.
Category
Copyright Violation
Industry
Media
Status
Litigation Pending
Date Occurred
Aug 22, 2022
Date Reported
Jan 17, 2023
Jurisdiction
US
AI Provider
Other/Unknown
Model
Stable Diffusion
Application Type
API Integration
Harm Type
Legal
Human Review in Place
No
Litigation Filed
Yes
Litigation Status
Pending
copyright, stable_diffusion, artist_rights, fair_use, training_data, intellectual_property
Full Description
In January 2023, Getty Images filed a lawsuit in Delaware federal court against Stability AI, alleging that the company unlawfully copied and processed millions of images from Getty's collection to train its Stable Diffusion AI image generator. The lawsuit claimed that Stability AI scraped Getty's copyrighted images without permission as part of training the model on the LAION-5B dataset, which contains approximately 5.85 billion image-text pairs collected from across the internet.
Separately, a class-action lawsuit was filed by artists Sarah Andersen, Kelly McKernan, and Karla Ortiz against Stability AI, Midjourney, and DeviantArt in the Northern District of California. The artists alleged that these companies violated copyright law by training their AI models on billions of copyrighted images without consent or compensation. The complaint argued that the resulting AI systems produce derivative works that compete directly in the market with the original artists whose work was used in training.
The legal challenges center on whether training AI models on copyrighted material constitutes fair use under copyright law. Getty Images claimed that Stability AI's use was commercial in nature and harmed the market for licensed images. The company pointed to instances where Stable Diffusion generated images containing corrupted versions of Getty's watermark as evidence of direct copying. Artists in the class-action suit argued that the AI systems learned to replicate their distinctive artistic styles, effectively creating unauthorized derivatives.
Stability AI defended its practices by arguing that training AI models on publicly available images constitutes fair use, similar to how humans learn by observing existing art. The company contended that the AI does not store or reproduce copyrighted images but rather learns statistical patterns to generate new, original works. However, critics pointed out that the AI can be prompted to create images "in the style of" specific artists, potentially undermining their market value.
The lawsuits seek monetary damages, injunctive relief to prevent further alleged copyright infringement, and destruction of AI models trained on copyrighted works. The cases are ongoing and could establish important precedents for how copyright law applies to AI training data. Similar lawsuits have been filed in the UK, where Getty Images is also pursuing legal action against Stability AI.
The broader implications extend beyond Stability AI to the entire generative AI industry, as most large-scale AI models rely on training datasets that include copyrighted material scraped from the internet. The outcome could significantly impact how AI companies source training data and potentially require fundamental changes to current industry practices around dataset creation and model training.
Root Cause
Stability AI allegedly trained its Stable Diffusion model on the LAION-5B dataset containing billions of images scraped from the internet without obtaining licenses or permission from copyright holders. The training process involved copying and processing copyrighted works to enable the AI to generate derivative images in similar styles.
Mitigation Analysis
Implementing robust copyright clearance processes, obtaining proper licensing agreements, using only public domain or licensed training data, and developing technical safeguards to prevent generation of works that closely replicate copyrighted material could have prevented this legal exposure. Provenance tracking of training data sources and automated copyright detection systems would also reduce risk.
Lessons Learned
The lawsuits highlight the urgent need for clear legal frameworks governing AI training on copyrighted material and demonstrate the significant legal risks companies face when using unlicensed content at scale. The cases may establish important precedents for fair use in AI contexts and could force the industry to develop new approaches to ethical data sourcing.
Sources
Getty Images lawsuit says Stability AI misused photos to train AI
Reuters · Feb 6, 2023 · news
The lawsuit that could rewrite the rules of AI copyright
The Verge · Jan 16, 2023 · news