AI & Development

NYT Sues Perplexity AI Over Copyright Infringement

The New York Times sued Perplexity AI for copyright infringement on December 5, 2025, after 18 months of failed negotiations—marking a stark contrast with Meta’s recent publisher partnerships. NYT alleges the AI search startup illegally scrapes and republishes its content in “verbatim or near-verbatim” responses. Perplexity now faces lawsuits from eight major publishers this year, including the Chicago Tribune (filed December 4), News Corp, Encyclopedia Britannica, and Reddit.

This lawsuit highlights two divergent paths in AI development. While Meta signed lucrative deals with CNN, Fox News, and USA Today just this week, Perplexity chose confrontation—and the legal consequences are mounting.

Meta Made Deals With Publishers, Perplexity Made Enemies

The timing couldn’t be more telling. Meta announced publisher partnerships on December 5—the same day NYT filed suit against Perplexity. Moreover, OpenAI has deals with 20+ outlets including The Guardian, AP, and Politico. These are multiyear contracts compensating publishers for real-time news access with proper attribution.

Perplexity took a different bet. The company launched a $42.5M Publisher Program for revenue-sharing but secured partnerships with only a fraction of major publishers—Gannett and the LA Times among the few. Most refused and sued instead.

Related: Meta’s Publisher U-Turn: AI Race Forces Desperate Deals

The strategy difference reveals a fundamental business model conflict. Unlike Google Search, which drives traffic to publishers through snippets and links, Perplexity aims to answer queries completely—potentially eliminating the need to visit source sites. This substitution effect is exactly what copyright law protects against. Furthermore, Perplexity’s unit economics may not support licensing costs at the scale Meta and OpenAI can afford.

Why RAG Systems Face Higher Legal Risk Than LLM Training

Here’s what most coverage misses: the legal distinction between training data and real-time retrieval matters enormously. The U.S. Copyright Office’s May 2025 report stated that Retrieval-Augmented Generation (RAG) used to “summarize or provide abridged versions of retrieved copyrighted works” is less likely to be transformative and less likely to qualify as fair use compared to LLM training.

Perplexity’s RAG architecture retrieves real-time web content at query time and summarizes it—fundamentally different from how OpenAI or Anthropic use training data. Recent court cases (Bartz v. Anthropic, favorable Meta rulings) supported fair use for training, where content becomes “learned patterns” rather than reproduced text. However, RAG summarization remains legally untested and appears far more vulnerable.

For developers building RAG-powered tools, this distinction isn't academic. Training an LLM on copyrighted data has won some fair use arguments. Real-time retrieval and summarization for commercial products, by contrast, stands on much weaker legal ground.
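To make the distinction concrete, here is a minimal sketch of the query-time pipeline described above. Everything in it is illustrative: the corpus, the keyword-overlap retriever, and the first-sentence "summarizer" stand in for the embedding search and LLM call a real system would use.

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def summarize(passages: list[str]) -> str:
    """Stand-in for an LLM call: keep the first sentence of each passage."""
    return " ".join(p.split(". ")[0] + "." for p in passages)

def rag_answer(query: str, corpus: dict[str, str]) -> str:
    # The legally significant step: source text is fetched and reproduced
    # at query time, rather than being abstracted into model weights
    # during training.
    return summarize(retrieve(query, corpus))
```

The point of the sketch is the last function: unlike training, where copyrighted text influences weights, the RAG path handles and re-emits the retrieved text itself, which is what the Copyright Office flagged.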

What NYT Claims Perplexity Did Wrong

The lawsuit alleges five distinct violations: illegal crawling despite robots.txt restrictions, “verbatim or near-verbatim” content reproduction, revenue theft from subscriptions and advertising, hallucination and false attribution damaging NYT’s brand, and creating products that “substitute” for The Times.
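The robots.txt allegation is notable because honoring the protocol is mechanically trivial. A compliant crawler checks the site's rules before fetching; Python's standard library handles the parsing. The robots.txt contents, URLs, and user-agent name below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would fetch it from
# https://example.com/robots.txt before crawling the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /premium/
"""

def may_crawl(url: str, robots_txt: str, user_agent: str = "ExampleBot") -> bool:
    """Return True only if robots.txt permits this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

robots.txt is a voluntary convention, not an access control: the complaint's claim is precisely that Perplexity ignored it, which nothing in the protocol technically prevents.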

The Chicago Tribune filed a nearly identical lawsuit on December 4, claiming Perplexity is “unlawfully profiting off the newspaper’s content” while “eliminating the need for users to visit” their websites. Both lawsuits cite the May 2025 Copyright Office guidance on RAG systems as legal support.

The hallucination claim deserves attention. It’s not just about copyright—if Perplexity’s AI fabricates information and attributes it to The Times, that’s brand damage beyond revenue loss. Courts may view this as a separate harm with its own legal implications.

Perplexity’s “We’re Like Google” Defense Doesn’t Hold

Perplexity’s Head of Communications Jesse Dwyer dismissed the lawsuit: “Publishers have been suing new tech companies for a hundred years, starting with radio, TV, the internet, social media and now AI. Fortunately it’s never worked, or we’d all be talking about this by telegraph.”

The company positions itself as similar to Google Search, which successfully argued fair use decades ago. However, there’s a critical flaw in this analogy. Google shows snippets and sends users to the source—driving traffic and ad revenue to publishers. Meanwhile, Perplexity aims to provide complete answers, potentially replacing the need to visit the original publisher.

Fair use analysis heavily weighs market harm. If Perplexity substitutes for the original rather than complementing it, that argument collapses. The Google Search precedent may not apply when your product’s goal is to eliminate the click-through that makes search engines symbiotic with publishers.

What This Means for Developers Building RAG Tools

If NYT prevails, commercial RAG-based search and summarization tools will need content licensing agreements. That significantly increases costs and favors well-funded companies over startups and open-source projects.

Safe RAG approaches post-lawsuit:

  • Enterprise RAG on internal or user-uploaded documents (low legal risk)
  • Licensed content partnerships (expensive but safe)
  • Public domain and openly licensed sources only (limited scope)
  • Non-commercial academic use (stronger fair use case)

High-risk approaches developers should reconsider:

  • Commercial news aggregation and summarization
  • Research tools summarizing paywalled content
  • Competitive intelligence scraping proprietary data
  • Any product where RAG output directly competes with the content source
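One practical way to encode these tiers is a source-policy gate between retrieval and generation, so the pipeline never summarizes content it has no clear right to. This is a sketch under assumed metadata: the `license` labels and the policy set are illustrative, not a legal taxonomy.

```python
# Tiers the product may summarize, mirroring the safe approaches above.
# Adjust per your own licensing posture and legal advice.
ALLOWED_TIERS = {"internal", "user_uploaded", "public_domain", "licensed"}

def filter_sources(documents: list[dict]) -> list[dict]:
    """Drop retrieved documents outside the allowed license tiers."""
    return [doc for doc in documents if doc.get("license") in ALLOWED_TIERS]
```

Filtering at retrieval time, rather than hoping the generator avoids risky sources, keeps the policy auditable: you can log exactly which documents were excluded and why.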

The move-fast-and-break-things era for AI may be ending. RAG systems face stricter legal standards than LLM training. For commercial products, licensing may become table stakes—whether you’re ready for those economics or not.

Key Takeaways

  • Two paths are emerging: Meta and OpenAI license content through multiyear publisher deals; Perplexity bets on fair use and now faces eight major lawsuits with uncertain outcomes
  • RAG systems that summarize copyrighted content face higher legal risk than LLM training—the U.S. Copyright Office’s May 2025 report explicitly states RAG summarization is less likely to qualify as fair use
  • The Google Search precedent doesn’t protect AI search that aims to replace publisher visits rather than drive traffic to sources—fair use analysis weighs market harm heavily
  • Developers building commercial RAG tools should assume licensing will be required for copyrighted content; safe alternatives include enterprise RAG on internal docs, public domain sources, or non-commercial use
  • Perplexity’s business model may be fundamentally incompatible with publisher economics at scale—unit economics can’t support licensing costs that Meta and OpenAI absorb
ByteBot
