Reddit v. Perplexity and the Future of AI Data Ethics


In October 2025, Reddit filed a federal lawsuit against Perplexity AI, alleging that the startup unlawfully scraped Reddit’s content to train its AI-driven search engine. The dispute extends beyond a single company’s conduct: it raises a central legal and ethical question about how artificial intelligence systems acquire human-generated data.


Background on Reddit v. SerpApi LLC, Oxylabs UAB, AWMProxy, and Perplexity AI, Inc.

Reddit contends that when AI companies were prevented from directly accessing its platform, they instead gathered Reddit content indirectly—via Google’s indexed search results. The allegation underscores a growing challenge in technology law: whether using public interfaces to extract restricted data constitutes fair use or digital trespass.

Where is Reddit v. Perplexity AI Filed?

The action, Reddit v. Perplexity AI, was filed in the U.S. District Court for the Southern District of New York.

Who is Named in Reddit v. Perplexity AI?

In addition to Perplexity, the complaint identifies three partners alleged to have facilitated the data acquisition process: Oxylabs of Lithuania, AWMProxy (described as a former Russian botnet), and Texas-based SerpApi. Reddit claims these entities collaborated in a coordinated effort to bypass its anti-scraping systems and harvest user-generated posts through Google’s displayed search data rather than Reddit’s own servers.

What Evidence Does Reddit Have?

To substantiate its claims, Reddit created a hidden “honeypot” post accessible only to Google’s crawler. Shortly thereafter, the post surfaced in Perplexity’s results—a signal, Reddit asserts, that Perplexity’s retrieval pipeline sourced data through Google rather than through Reddit directly.

This test forms a core evidentiary exhibit in Reddit’s filing, described as digital proof of intentional circumvention.
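The logic of a honeypot test of this kind can be illustrated with a minimal Python sketch. This is not Reddit's actual method, which the complaint does not describe at the code level; the function names, the token format, and the "search-engine-crawler" label are all hypothetical. The idea is simply that a unique marker shown to only one crawler can later be searched for in a downstream service's output.

```python
import secrets


def make_canary_token(prefix: str = "canary") -> str:
    """Return a unique, hard-to-guess marker to embed in a test post (hypothetical format)."""
    return f"{prefix}-{secrets.token_hex(8)}"


def token_leaked(token: str, observed_text: str) -> bool:
    """Return True if the marker appears anywhere in text produced by a downstream service."""
    return token.lower() in observed_text.lower()


def possible_sources(token: str, exposed_to: set[str], observed_text: str) -> set[str]:
    """If the marker surfaced, the candidate paths are the crawlers allowed to see the post."""
    if not token_leaked(token, observed_text):
        return set()
    return set(exposed_to)


if __name__ == "__main__":
    token = make_canary_token()
    # Assumption: the test post was made visible to a single crawler only.
    exposed_to = {"search-engine-crawler"}
    # Assumption: text later returned by the service under examination.
    answer = f"Forum users report the following: {token}"
    print(possible_sources(token, exposed_to, answer))
```

If the marker appears and only one crawler was ever allowed to see it, that crawler's index is the only candidate path in this simplified model. A real investigation would also need to rule out other explanations, such as the post being exposed through another route.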

“Robbing the Vault vs. the Armored Truck” Explained

The vault-and-truck analogy featured prominently in Reddit’s complaint has resonated beyond legal circles. Reddit equates its secure servers to a protected vault, accessible only to authorized users. Google’s search index, by contrast, represents an armored truck—permitted to carry limited, lawful previews of Reddit’s content.

By targeting Google’s index instead of Reddit’s source servers, Perplexity is accused of “robbing the armored truck”—acquiring large volumes of Reddit’s material through a public channel meant only for display snippets. Reddit maintains that the distinction between direct and indirect access is immaterial where the origin of data remains the same protected asset.

Data Laundering at the Center of the New Digital Crime Scene

Among the terms introduced in Reddit’s lawsuit is “data laundering,” used to describe the process of disguising unlawfully obtained information to appear legitimate. The complaint alleges that intermediary firms—including Oxylabs and SerpApi—acted as brokers, gathering and reselling Reddit data to Perplexity and others.

This framing parallels financial laundering statutes, where the value of a transaction may be secondary to the concealment of its origin. Legally, the complaint signals a potential evolution in how courts may analyze the chain of custody in digital information flows.

Perplexity’s Position and Defenses

Perplexity denies the allegations and asserts that it operates in compliance with legal and ethical AI standards. The company maintains it does not directly scrape Reddit but instead aggregates publicly available web data—a practice it contends is protected under principles of fair use and open access.

Perplexity’s likely legal defense centers on the indirect nature of its access. It may argue that because Google lawfully displays Reddit content, collecting such information from Google’s results does not constitute a violation of Reddit’s terms or applicable law. It may also contend that any data procurement occurred through third-party providers rather than its own systems.

Reddit disputes that distinction, asserting that whether the data was taken from the “vault” or intercepted from the “armored truck,” the result is the same: unauthorized use of its protected content.

Artificial Intelligence and Intellectual Property Lawsuits

Reddit v. Perplexity joins a growing list of high-profile actions involving AI companies, including suits filed by The New York Times, Getty Images, and others against OpenAI, Anthropic, and Stability AI. Each case examines the same unresolved question—whether training AI systems on publicly accessible materials qualifies as fair use or infringes upon copyright.

U.S. courts are still developing frameworks for applying doctrines of fair use, unjust enrichment, and computer fraud in the AI context. Reddit’s suit introduces an additional layer: indirect scraping through intermediaries and search platforms designed for public indexing rather than bulk data extraction.

Licensing and the Price of Human Content

The Reddit complaint emphasizes the market dimension of modern data acquisition. Reddit licenses its content to OpenAI and Google under agreements reportedly worth approximately $60 million annually. Such licensing, Reddit argues, ensures both legal certainty and content integrity.

Perplexity’s alleged conduct, by contrast, represents an attempt to bypass the emerging licensing economy by harvesting data indirectly from Google’s search results. Should courts accept Reddit’s reasoning, this case could establish judicial recognition that public visibility does not equate to public ownership.

Ethical Implications for AI Firms

The broader ethical implications of the case extend beyond questions of fair use. AI models rely heavily on human expression—community discussions, reviews, and shared insights—to function effectively. When that corpus is repurposed without consent or compensation, the integrity of digital commons and the principle of informed participation come into question.

Reddit’s “armored truck” metaphor conveys a broader societal concern: that AI enterprises may profit from collective human creativity while bypassing fair exchange. As a result, transparency, consent, and provenance are emerging as new ethical benchmarks for responsible AI data practices.

Ultimately, Reddit v. Perplexity is less about a single act of scraping and more about the boundary between innovation and appropriation. Whether the court views Perplexity’s actions as lawful use of publicly accessible data or as unauthorized extraction of licensed content may reshape how the law defines possession and use in the AI era.

If Reddit prevails, AI developers could face heightened obligations to verify the provenance of their datasets and to enter into licensing agreements for the use of user-created content. Regardless of outcome, the case signals a shift: artificial intelligence can no longer rely solely on access; it must now prove authorization.

This entry was posted on Tuesday, October 28, 2025 and is filed under Resources & Self-Education, Internet Law News.


