Stable Diffusion Generated Images Containing Getty Images Watermarks Expose Copyright Training Data
Severity
High
Stable Diffusion generated images containing distorted Getty Images watermarks, revealing the model was trained on copyrighted Getty content without permission. This discovery became key evidence in Getty's copyright infringement lawsuit against Stability AI.
Category
Copyright Violation
Industry
Media
Status
Ongoing
Date Occurred
Aug 22, 2022
Date Reported
Aug 23, 2022
Jurisdiction
International
AI Provider
Other/Unknown
Model
Stable Diffusion
Application Type
API Integration
Harm Type
Legal
Human Review in Place
No
Litigation Filed
Yes
Litigation Status
Pending
copyright, training_data, watermarks, diffusion_models, Getty_Images, LAION, litigation, evidence
Full Description
In August 2022, shortly after Stability AI released Stable Diffusion 1.4 to the public, users began experimenting with the open-source text-to-image generation model. Within days of release, multiple users on social media platforms including Twitter and Reddit discovered that the model could generate images containing distorted versions of the distinctive Getty Images watermark. These watermarks appeared as corrupted text overlays that resembled Getty's copyright protection markings, though often garbled or partially formed.
The watermark discovery provided smoking-gun evidence that Stable Diffusion had been trained on copyrighted Getty Images content without authorization. Stability AI had trained the model on subsets of LAION-5B, a dataset of roughly 5.8 billion image-text pairs scraped from the internet. This data included millions of Getty Images photographs bearing visible watermarks, which the model learned to associate with certain visual patterns and occasionally reproduced in generated outputs.
The technical explanation lies in how diffusion models learn visual patterns during training. Exposed to Getty's watermark across a large number of images spanning diverse subjects and compositions, the model internalized it as a recurring visual feature that could surface in generated content. The corrupted appearance of these generated watermarks reflects that the model reproduces the watermark as an approximate visual texture rather than as legible text or a copyright protection mechanism.
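The underlying effect can be shown with a toy statistic (this is an illustration of why recurring overlays get absorbed into learned statistics, not actual diffusion training): when a fixed overlay is stamped on many otherwise-unrelated training images, it dominates any aggregate the model fits, while the varied content averages out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "watermark": a fixed bright band stamped on every image,
# loosely analogous to a stock-photo caption bar.
H, W = 32, 32
watermark = np.zeros((H, W))
watermark[24:28, 4:28] = 1.0

# 500 training images: random content, each carrying the same overlay.
images = rng.random((500, H, W)) * 0.5 + watermark

# The per-pixel mean -- the crudest possible "learned" summary -- already
# recovers the recurring overlay far more strongly than any one image's content.
mean_img = images.mean(axis=0)
inside = float(mean_img[24:28, 4:28].mean())   # region covered by the overlay
outside = float(mean_img[:24, :].mean())       # region with only random content
print(round(inside, 2), round(outside, 2))
```

A real diffusion model fits far richer statistics than a pixel mean, but the same pressure applies: a feature repeated across millions of samples is learned, and can therefore be regenerated.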
This discovery became central evidence in Getty Images' February 2023 lawsuit against Stability AI in the US District Court for the District of Delaware. Getty alleged that Stability AI had unlawfully copied and processed millions of copyrighted images to train Stable Diffusion, with the watermark generation serving as direct proof of this unauthorized use. The lawsuit seeks monetary damages and injunctive relief to prevent further copyright infringement.
The incident sparked broader debate about training data transparency and copyright compliance in AI development. Legal experts noted that the visible watermarks made this case particularly strong for Getty, as it provided clear evidence of copying rather than requiring complex analysis of model outputs. The case has implications for the entire AI industry's approach to training data sourcing and copyright compliance.
Root Cause
The Stable Diffusion model was trained on the LAION-5B dataset, which contained millions of Getty Images watermarked photographs collected without permission. During training, the model learned to associate certain visual patterns with watermarks, occasionally reproducing distorted versions of the Getty watermark in generated outputs.
Mitigation Analysis
Rigorous dataset curation with copyright clearance verification could have prevented this issue. Implementing watermark detection systems during training data preparation would have identified and removed copyrighted Getty content. Post-training filtering to detect and block watermark generation in outputs could have reduced evidence exposure, though the underlying copyright infringement would remain.
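As a sketch of the dataset-curation step described above: LAION publishes per-sample metadata that includes an estimated watermark probability (a `pwatermark` column), which can be thresholded during data preparation. The toy DataFrame, the 0.5 threshold, and the domain blocklist below are illustrative assumptions, not Stability AI's actual pipeline.

```python
import pandas as pd

# Toy stand-in for LAION-style metadata; real LAION-5B shards carry a URL,
# a caption, and a classifier-estimated watermark probability per sample.
meta = pd.DataFrame({
    "url": [
        "https://media.gettyimages.com/photos/a.jpg",  # watermarked stock photo
        "https://example.org/cat.jpg",
        "https://example.org/dog.jpg",
        "https://photos.example.com/stadium.jpg",
    ],
    "caption": ["news photo", "a cat", "a dog", "a stadium"],
    "pwatermark": [0.97, 0.02, 0.10, 0.55],
})

PWATERMARK_MAX = 0.5                    # curation choice, not a standard value
BLOCKED_DOMAINS = ("gettyimages.com",)  # hypothetical licensed-content blocklist

def blocked(url: str) -> bool:
    """True if the sample comes from a known licensed-content domain."""
    return any(domain in url for domain in BLOCKED_DOMAINS)

# Keep only samples below the watermark threshold and outside the blocklist.
clean = meta[(meta["pwatermark"] < PWATERMARK_MAX) & ~meta["url"].map(blocked)]
print(len(clean), "of", len(meta), "samples retained")
```

A classifier threshold is inherently lossy (it misses watermarks it was not trained to recognize, and probability estimates are noisy), which is why curation would need to be combined with licensing review rather than replace it.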
Lessons Learned
AI companies must implement robust copyright clearance processes for training data, as models can inadvertently reveal evidence of unauthorized content use. The incident demonstrates that open-source AI models face particular scrutiny regarding training data provenance and that watermark generation can serve as compelling legal evidence in copyright disputes.
Sources
Getty Images bans AI-generated content over copyright concerns
The Verge · Sep 15, 2022 · news
Getty Images lawsuit says Stability AI misused photos to train AI
Reuters · Feb 6, 2023 · news