Perplexity AI Accused of Plagiarizing Content and Fabricating Source Citations
Medium
Perplexity AI faced accusations from Forbes and other publishers of scraping protected content, bypassing robots.txt restrictions, generating fabricated source citations, and providing inadequate attribution to original creators.
Category
Copyright Violation
Industry
Media
Status
Reported
Date Occurred
Jun 27, 2024
Date Reported
Jun 27, 2024
Jurisdiction
US
AI Provider
Other/Unknown
Application Type
api integration
Harm Type
reputational
Human Review in Place
No
Litigation Filed
No
content_scraping, robots_txt_violation, fabricated_citations, copyright, publishing, attribution, ai_search
Full Description
In June 2024, Forbes and multiple other major publishers accused Perplexity AI of systematically violating their content protection measures and copyright policies. The controversy centered on Perplexity's AI-powered search engine, which aggregates information from across the web to provide direct answers to user queries. Forbes documented specific instances where Perplexity had scraped and paraphrased their exclusive reporting without providing proper attribution or driving traffic back to the original articles.
The accusations also alleged that Perplexity was bypassing robots.txt files, the long-standing convention (since standardized as RFC 9309) that websites use to tell automated crawlers which paths they may access. Compliance is voluntary: robots.txt has no enforcement mechanism and works only if crawlers choose to honor it. Publishers such as Forbes, Wired, and Condé Nast had explicitly blocked AI crawlers in their robots.txt files, yet Perplexity continued to access and use their content. This raised serious questions about respect for publisher consent and for established web protocols designed to protect content creators' rights.
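To make the mechanism concrete, the sketch below shows how a crawler that chooses to comply would consult robots.txt before each request, using Python's standard `urllib.robotparser`. The robots.txt contents, the `publisher.example` URL, and the helper names `build_parser` and `may_fetch` are hypothetical illustrations, not any real publisher's file or Perplexity's code. Note that nothing in the protocol forces this check; a crawler that skips it can still fetch the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: it blocks two AI
# crawler user-agent tokens entirely and allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user-agent token to fetch url.

    A compliant crawler would call this before every request and skip the
    URL when it returns False. Pass the product token (e.g. "PerplexityBot"),
    not a full browser-style User-Agent string.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://publisher.example/2024/06/exclusive-story"
    for agent in ("PerplexityBot", "GPTBot", "SomeOtherBot"):
        print(agent, may_fetch(rp, agent, url))
```

Run as a script, this prints `False` for the two blocked tokens and `True` for the unlisted one, which falls through to the `*` group.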
A particularly concerning aspect of the incident involved Perplexity's generation of fabricated source citations. Subsequent reporting found that the system was creating references to articles that either did not exist or did not contain the information attributed to them. This practice not only misled users about the credibility of information but also potentially damaged the reputations of legitimate news organizations by associating them with false or inaccurate claims.
Perplexity initially defended its practices, arguing that its technology fell within fair use parameters and that it was providing valuable summarization services. However, the company faced mounting pressure from the publishing industry and eventually acknowledged some of the concerns. The incident highlighted broader tensions between AI companies seeking to train and operate their systems on web content and publishers trying to protect their intellectual property and maintain control over how their content is used and attributed.
The controversy extended beyond individual publisher complaints to raise fundamental questions about the sustainability of journalism and content creation in an AI-driven information ecosystem. Publishers argued that services like Perplexity were essentially monetizing their work while providing little to no compensation or traffic attribution, potentially undermining the economic model that supports quality journalism and content creation.
Root Cause
Perplexity AI's system bypassed robots.txt restrictions and scraped content from publishers, then used AI to paraphrase material without proper attribution while generating fabricated source citations to support its answers.
Mitigation Analysis
Robust robots.txt compliance checking, human verification of source citations before publication, and attribution systems with backlinks to original sources could have prevented this incident. Content provenance tracking and publisher partnership agreements would also reduce the risk of copyright violations.
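The citation-verification step could be approximated by an automated pre-check that decides which citations to route to a human reviewer. The sketch below is a hypothetical, deliberately crude version: it assumes the cited pages have already been fetched into a dictionary, flags citations whose URL could not be retrieved (a possibly nonexistent article), and flags claims whose words mostly do not appear in the cited text (a possibly misattributed claim). The names `Citation`, `support_score`, `verify_citations`, and the `0.8` threshold are illustrative assumptions; word overlap is a lexical proxy, not a real test of whether a source supports a claim.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Citation:
    claim: str  # sentence the generated answer attributes to the source
    url: str    # URL the answer cites for that claim


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens; punctuation is discarded."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def support_score(claim: str, source_text: str) -> float:
    """Fraction of the claim's distinct tokens that also appear in the source."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & _tokens(source_text)) / len(claim_tokens)


def verify_citations(
    citations: list[Citation],
    fetched_sources: dict[str, str],
    threshold: float = 0.8,
) -> list[tuple[Citation, str]]:
    """Return (citation, reason) pairs that should be held for human review."""
    flagged = []
    for c in citations:
        text = fetched_sources.get(c.url)
        if text is None:
            flagged.append((c, "source not found"))
        elif support_score(c.claim, text) < threshold:
            flagged.append((c, "claim not supported by source"))
    return flagged


if __name__ == "__main__":
    sources = {
        "https://news.example/a": "The company reported revenue of 10 million dollars in 2023.",
    }
    citations = [
        Citation("The company reported revenue of 10 million dollars in 2023.", "https://news.example/a"),
        Citation("The CEO resigned in March.", "https://news.example/a"),
        Citation("Regulators opened an inquiry.", "https://news.example/missing"),
    ]
    for citation, reason in verify_citations(citations, sources):
        print(citation.url, "->", reason)
```

In this example the first citation passes, while the second (only "the" and "in" overlap) and the third (no fetched source) are flagged. A production system would likely replace the overlap score with a stronger support check, but the routing pattern stays the same: unsupported citations are withheld or reviewed rather than published.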
Lessons Learned
This incident demonstrates the critical importance of respecting web standards like robots.txt and implementing proper attribution systems in AI applications that aggregate content. It also highlights the need for clear industry standards around AI content usage and the potential legal risks of fabricating source citations.
Sources
Plagiarism Concerns Mount As Perplexity AI Is Accused Of Scraping Content Without Consent
Forbes · Jun 27, 2024 · news
Perplexity Is a Bullshit Machine
Wired · Jun 28, 2024 · news