
NYT Sues Perplexity for 175K Scraping Violations

The New York Times filed a copyright infringement lawsuit against Perplexity AI on December 5, 2025, armed with damning technical evidence. The complaint alleges 175,000+ scraping attempts in August 2025 alone, systematic circumvention of robots.txt blocks, and deliberate use of stealth crawlers to mask Perplexity’s identity. This isn’t a philosophical debate about AI and fair use. It’s a case built on documented technical evidence, with receipts.

The Chicago Tribune filed a similar lawsuit one day earlier. Perplexity’s response? “Publishers have been suing new tech companies for a hundred years…it’s never worked.”

The Technical Smoking Gun

Cloudflare published a technical investigation in August 2025 reporting that Perplexity systematically evaded blocks. When its declared crawlers were blocked, Perplexity switched to stealth methods: user agent spoofing to impersonate Google Chrome on macOS, IP address rotation to hide the crawler’s identity, and multiple undeclared bot names.

Cloudflare’s methodology was bulletproof. They created brand-new domains that had never been indexed by any search engine and implemented strict robots.txt blocks to stop all respectful bots. Perplexity still provided detailed answers about the restricted content. When Cloudflare blocked the stealth bots too, Perplexity’s results became vague or hallucinated, proving those undeclared crawlers feed the system.
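The logic of that test can be sketched in a few lines. This is not Cloudflare’s tooling, just a minimal illustration under stated assumptions: a test domain whose robots.txt disallows everything, plus a list of access-log records. The `Hit` record, the `ExampleBot` token, the sample IPs, and the browser-style user agent string are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One access-log record (hypothetical shape, for illustration only)."""
    ip: str
    user_agent: str
    path: str


# Hypothetical crawler tokens that identify themselves honestly.
DECLARED_TOKENS = ("ExampleBot",)

# Illustrative browser-style user agent, the kind a stealth crawler might send.
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def declared_token(user_agent):
    """Return the declared crawler token found in the user agent, or None."""
    lowered = user_agent.lower()
    for token in DECLARED_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def violations(hits):
    """Count fetches that ignore the block, keyed by token or 'undeclared'.

    Assumes robots.txt on the test domain disallows every path, so any
    request other than /robots.txt itself is a violation.
    """
    counts = Counter()
    for hit in hits:
        if hit.path == "/robots.txt":
            continue
        counts[declared_token(hit.user_agent) or "undeclared"] += 1
    return counts


sample_hits = [
    Hit("192.0.2.10", "ExampleBot/1.0", "/robots.txt"),
    Hit("192.0.2.10", "ExampleBot/1.0", "/secret-article"),
    Hit("198.51.100.7", CHROME_UA, "/secret-article"),
    Hit("203.0.113.5", CHROME_UA, "/secret-article-2"),
]

report = violations(sample_hits)
print(dict(report))  # {'ExampleBot': 1, 'undeclared': 2}
```

On a domain nobody has linked to or indexed, requests in the "undeclared" bucket are the interesting ones: something is fetching blocked pages while presenting itself as an ordinary browser.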

NYT’s lawsuit details similar patterns: 175,000+ attempts to access its site in August 2025 (more than 5,600 per day on average), use of “PerplexityBot” and “Perplexity-User” crawlers, and deliberate circumvention of a hard block NYT implemented. The scale is industrial. The intent is clear.

Perplexity’s Terrible Response

Jesse Dwyer, Perplexity’s head of communications, offered this defense: “Publishers have been suing new tech companies for a hundred years, starting with radio, TV, the internet, social media and now AI. Fortunately it’s never worked, or we’d all be talking about this by telegraph.”

This is historically inaccurate and legally risky. Napster, Grokster, and Aereo all lost their lawsuits and shut down. Radio and TV pay licensing fees. Perplexity refuses to. The dismissive tone doesn’t help either: it’s hard to claim good faith when you’re simultaneously circumventing technical blocks.

Perplexity CEO Aravind Srinivas publicly says “we’re very much interested in working with every single publisher” and “we have no interest in being anyone’s antagonist.” But technical evidence shows systematic circumvention of publisher blocks. Actions speak louder than words.

This Isn’t an Isolated Dispute

Perplexity faces multiple lawsuits in 2025. Chicago Tribune sued on December 4 alleging three counts of copyright infringement plus trademark violations. Reddit sued in October for DMCA anti-circumvention violations, with evidence of 3 billion Google search results pages scraped in two weeks. Dow Jones and Encyclopaedia Britannica filed similar suits.

The pattern undermines any “innocent mistake” defense. This is systematic behavior. Perplexity’s business model appears to depend on unauthorized scraping.

Why Fair Use Won’t Work

Perplexity argues this is fair use: “Facts can’t be copyrighted,” “we’re just indexing public webpages,” “we always cite our sources.” Courts will likely reject this.

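In February 2025, a federal district court ruled in Thomson Reuters v. ROSS Intelligence that copying Westlaw headnotes to train a competing AI legal research tool was infringement, not fair use. It is a district court decision about a non-generative AI product rather than a binding ruling on AI scraping, but it is recent and it turned on the same factors at issue here: commercial purpose and market harm. Perplexity’s use is commercial (for-profit company), creates market substitution (summaries prevent users from visiting NYT), and involves alleged willful circumvention (bad faith, not innocent transformative use).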

The distinction from Google is critical. Google respects robots.txt universally and drives traffic to sources. Perplexity circumvents blocks and provides summaries that reduce traffic. That’s parasitic, not symbiotic.

What Developers Should Learn

If you build scrapers or bots, don’t be like Perplexity. Respect robots.txt even if it’s not legally required. Don’t use user agent spoofing, IP rotation, or stealth crawlers to bypass blocks. Ethical lines matter even when legal boundaries are unclear.
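For a sense of what respecting robots.txt looks like in practice, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The bot name `ExampleResearchBot` and the sample rules are made up for illustration; a real crawler would download the site’s actual robots.txt and identify itself with one stable, documented user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity, used consistently for every request.
USER_AGENT = "ExampleResearchBot"

# Made-up robots.txt content; a real crawler would fetch this from the site.
SAMPLE_ROBOTS = """\
User-agent: ExampleResearchBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent=USER_AGENT):
    """Return True only if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS)

# Our bot is blocked site-wide, so the right move is to skip the page.
print(may_fetch(parser, "https://example.com/articles/1"))  # False

# A different bot falls under the wildcard group and is only kept out of /private/.
print(may_fetch(parser, "https://example.com/articles/1", "OtherBot"))  # True
print(may_fetch(parser, "https://example.com/private/x", "OtherBot"))  # False
```

The point of the sketch is the order of operations: check first, fetch only on `True`, and treat a `False` as final rather than as a reason to retry under a different name.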

The robots.txt protocol has been the foundational trust mechanism of the web since 1994. Every major tech company respects it. If AI companies can ignore it without consequences, the entire system breaks.

Cloudflare now blocks Perplexity’s stealth crawling after de-listing it as a “verified bot.” Over 2.5 million websites use Cloudflare’s managed features to block AI crawlers. Site owners can also add robots.txt rules that target AI-specific user-agent tokens such as GPTBot, Google-Extended, and PerplexityBot.
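As a rough illustration, the sketch below builds robots.txt rules for the three AI bot names mentioned above and checks them with Python’s standard-library parser. The helper name `ai_block_rules` and the `example.com` URL are hypothetical, and the rules only bind crawlers that choose to honor them, which is exactly the weakness this case exposes.

```python
from urllib.robotparser import RobotFileParser

# AI crawler names taken from the article.
AI_TOKENS = ["GPTBot", "Google-Extended", "PerplexityBot"]


def ai_block_rules(tokens):
    """Build robots.txt text that disallows each listed token site-wide
    and leaves every other crawler unrestricted."""
    groups = [f"User-agent: {token}\nDisallow: /\n" for token in tokens]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


rules = ai_block_rules(AI_TOKENS)
print(rules)

# Verify the rules behave as intended for compliant crawlers.
parser = RobotFileParser()
parser.parse(rules.splitlines())

results = {
    agent: parser.can_fetch(agent, "https://example.com/story")
    for agent in AI_TOKENS + ["Googlebot"]
}
print(results)
# {'GPTBot': False, 'Google-Extended': False, 'PerplexityBot': False, 'Googlebot': True}
```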

Key Takeaways

  • Hard evidence of deliberate circumvention: NYT’s lawsuit details 175,000+ scraping attempts. Cloudflare’s investigation proves Perplexity systematically evaded robots.txt blocks using stealth crawlers and IP rotation.
  • Perplexity’s response undermines their defense: Dismissive “it’s never worked” attitude and historically inaccurate claims (Napster, Grokster, Aereo all lost) show bad faith.
  • Pattern of systematic behavior: Multiple lawsuits (Chicago Tribune, Reddit, Dow Jones, Encyclopaedia Britannica) prove this isn’t isolated—it’s Perplexity’s business model.
  • Fair use defense is likely to fail: the Thomson Reuters v. ROSS ruling (Feb 2025), commercial use, market substitution, and willful circumvention all weigh against Perplexity.
  • Developer lesson: Respect robots.txt. Don’t circumvent blocks. Ethical considerations matter even when legal lines are unclear. Budget for licensing if using third-party content commercially.

Perplexity bet their business on scraping without permission. NYT called their bluff with technical evidence that’s hard to refute. For the AI industry, the message is clear: license content or face litigation. The “move fast and break things” era is over.

ByteBot
