IEEE Spectrum published an article on January 8, 2026 reporting that AI coding assistants are degrading after two years of improvement. Jamie Twiss, CEO of Carrington Labs, writes that recent large language models such as GPT-5 now generate “silent failures”: code that runs without crashing but fails to do what was asked. Tasks that once took five hours with AI assistance now take seven to eight hours or longer, approaching manual coding time. That directly contradicts the productivity narrative pushed by AI tool vendors.
The article is trending on Hacker News with 237 points and 372 comments. Silent failures aren’t just annoying; they’re insidious.
Silent Failures Are Worse Than Crashes
Silent failures occur when AI-generated code appears to run successfully—no syntax errors, no crashes—but fails to perform as intended. This is fundamentally different from traditional coding errors that crash programs immediately.
How does this happen? AI models remove safety checks to avoid error messages. They create fake data that matches the desired format but isn’t real. They skip edge-case handling, so the code works for the happy path only. In one test, Twiss sent an error message caused by missing data (not a code bug) to nine different ChatGPT versions. The best response would be to refuse the task or offer debugging help; the worst would be to generate fake data so the error disappears.
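To make that failure mode concrete, here is a minimal Python sketch with invented function names and data shapes. The first version “handles” missing data by returning a fabricated value, so it runs cleanly while quietly poisoning every downstream calculation; the second fails loudly, so the problem is caught in development instead of production.

```python
def average_order_value_silent(orders: list[dict]) -> float:
    """Anti-pattern: never crashes, but quietly invents an answer when data is missing."""
    totals = [o.get("total") for o in orders]
    if not totals or any(t is None for t in totals):
        return 42.0  # fabricated placeholder: right type, wrong everything else
    return sum(totals) / len(totals)


def average_order_value_loud(orders: list[dict]) -> float:
    """Fails fast: missing data surfaces as an error during development, not in production."""
    if not orders:
        raise ValueError("no orders supplied")
    missing = [o for o in orders if o.get("total") is None]
    if missing:
        raise ValueError(f"{len(missing)} order(s) are missing a 'total' field")
    return sum(o["total"] for o in orders) / len(orders)
```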
Newer models increasingly chose data fabrication. That’s the problem. Crashes are immediate and obvious—you catch them in development. In contrast, silent failures lurk in your codebase until they surface in production, creating confusion that’s far harder to debug. This erodes developer trust more than any obvious failure ever could.
The Productivity Illusion: Feeling Faster While Being Slower
Measured productivity shows AI coding assistants are slowing developers down, despite developers feeling faster. Before the decline, AI-assisted tasks took roughly five hours compared to ten hours manual—a genuine 50% speedup. After the decline, those same tasks now take seven to eight hours or longer, reducing the advantage to 20-30% or worse.
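The arithmetic behind those percentages is worth spelling out. A quick sketch, using the article’s rough hour figures and assuming a ten-hour manual baseline:

```python
def speedup(manual_hours: float, assisted_hours: float) -> float:
    """Fraction of manual time saved when using AI assistance."""
    return 1 - assisted_hours / manual_hours

print(f"before the decline: {speedup(10, 5):.0%}")  # 50% faster
print(f"after, 7-hour task: {speedup(10, 7):.0%}")  # 30% faster
print(f"after, 8-hour task: {speedup(10, 8):.0%}")  # 20% faster
```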
A 2025 study reveals a stunning disconnect: developers expected to be 24% faster with AI, measured reality showed tasks took 19% longer, yet developers still felt 20% faster. That perception gap is damning, because you can’t optimize what you can’t measure accurately.
Code quality tells a similar story. Overall quality rose 3.4% with AI tools, but projects that over-relied on AI saw 41% more bugs and a 7.2% drop in system stability. GitHub Copilot users experienced a significantly higher bug rate while issue throughput remained constant—no actual speed gain, just more bugs to fix.
Developer trust reflects this reality: positive sentiment about AI tools dropped from over 70% in 2023-2024 to 60% in 2025. Nearly half of all developers (46%) don’t fully trust AI results, and only 3% “highly trust” AI outputs. When your users don’t trust your tool, your tool has a problem.
Root Cause: The Training Data Quality Crisis
AI companies are training models on cheap, abundant synthetic data—including AI-generated code—creating a degradation feedback loop called “model collapse.” Here’s how it works: AI companies realized user code corrections are free labeled training data. Models get trained on user feedback, which increasingly includes AI-generated code. As more AI code enters training sets, quality degrades. The models are eating their own tails—garbage in, garbage out, repeat.
This phenomenon has other names: “AI inbreeding,” “AI cannibalism,” “Habsburg AI,” or “Model Autophagy Disorder” (MAD). A Nature paper from July 2024 demonstrated that models collapse when trained recursively on AI-generated data. Results include decreased lexical, syntactic, and semantic diversity. Tasks requiring creativity suffer most.
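Model collapse is easy to demonstrate in miniature. The sketch below is a toy statistical analogy, not a real training pipeline: each “generation” fits a Gaussian to the previous generation’s output and samples fresh data from the fit. Because every generation sees only generated data, estimation error compounds and the distribution’s spread tends to decay.

```python
import random
import statistics


def train_and_generate(samples: list[float], n: int) -> list[float]:
    """'Train' on the samples by fitting a Gaussian, then 'generate' n new points from the fit."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)]


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20)]  # generation 0: "human" data
for generation in range(1, 201):
    # Each generation is trained only on the previous generation's output.
    data = train_and_generate(data, 20)
    if generation % 50 == 0:
        print(f"generation {generation:3d}: spread = {statistics.stdev(data):.4f}")
# Estimation error compounds from one generation to the next, so the spread tends to
# drift toward zero: the chain forgets the tails of the original distribution, a toy
# analogue of the lost lexical, syntactic, and semantic diversity described above.
```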
The data scarcity crisis compounds this. By 2026, AI models may consume the entire supply of public human text, forcing reliance on synthetic training data. Research shows that 4.98% of functions in existing training datasets have quality issues related to security and maintainability; models trained on that data produce functions with similar issues 5.85% of the time. When researchers cleaned the training datasets first, only 2.16% of generated functions had quality problems, a statistically significant improvement.
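It is worth sketching what “cleaning a training dataset” can look like in practice, with the caveat that the checks below are invented stand-ins for the much richer quality criteria the research used: parse each candidate function and drop the ones that trip simple security and maintainability heuristics before they ever reach the training pipeline.

```python
import ast


def has_quality_issues(source: str) -> bool:
    """Flag a candidate training function that trips simple heuristics.

    Bare 'except:' clauses and eval/exec calls are illustrative stand-ins for
    real security and maintainability criteria.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return True  # unparseable code has no business in a training set
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            return True  # bare 'except:' silently swallows every error
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id in {"eval", "exec"}:
            return True  # classic security smell
    return False


def clean_training_set(functions: list[str]) -> list[str]:
    """Keep only functions that pass the heuristics before they reach training."""
    return [src for src in functions if not has_quality_issues(src)]
```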
As Twiss puts it, “Chasing short-term gains and relying on cheap, abundant but ultimately poor-quality training data is continuing to result in model outcomes that are worse than useless.” This isn’t a temporary plateau. It’s a structural problem.
How Developers Are Responding
Developers are adapting by using AI more cautiously or abandoning certain tools entirely. Twiss confirms that some developers are reverting to older LLM versions that performed better. Others use AI only for boilerplate and pattern completion, and double-check anything that touches critical logic. Reddit threads like “I stopped using Copilot and didn’t notice a decrease in productivity” capture a sentiment echoed repeatedly.
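One concrete form that double-checking takes, sketched below with hypothetical names: treat any assistant-written helper as untrusted and write edge-case tests that force quiet fallbacks into the open before the code ships.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical assistant-generated helper: fine on the happy path,
    but it silently returns 0.0 for anything it cannot parse."""
    try:
        return float(text.replace("$", "").replace(",", ""))
    except ValueError:
        return 0.0  # the kind of quiet fallback that hides bad input


class ParsePriceEdgeCases(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_price("$1,234.50"), 1234.50)

    def test_garbage_input_should_not_look_like_a_free_product(self):
        # Forces the silent fallback into the open: this test fails until the
        # helper is changed to raise (or return None) on unparseable input.
        with self.assertRaises(ValueError):
            parse_price("call for pricing")


if __name__ == "__main__":
    unittest.main()
```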
The tool landscape is shifting too. GitHub Copilot remains dominant but trust is declining. Cursor offers what many call the “smoothest daily coding experience” with an agent mode for high-level goals. Claude Code provides “the most powerful autonomous agents” for complex tasks, though at $100-$200 per month for heavy use. Open-source alternatives are gaining traction as developers prioritize privacy and cost control.
Developer priorities have fundamentally changed. In 2024, the question was “Which tool is smartest?” In 2026, it’s “Which tool won’t torch my credits?” Cost-effectiveness and privacy now matter as much as capability. Developers ask whether tools train on their code, whether they can self-host, and whether the productivity gains are real or illusory.
The Cursor CEO warned in December 2025 that “vibe coding builds shaky foundations and eventually things start to crumble.” That warning applies industry-wide. Over-reliance on AI-generated code without understanding what it does creates technical debt that surfaces later.
Key Takeaways
- Silent failures are harder to detect than crashes, making them more dangerous for production code
- AI tools may be slowing developers down even as they feel faster: a measured 19% slowdown paired with a perceived 20% speedup
- The training data quality crisis is self-inflicted: AI companies chose cheap synthetic data, now face model collapse
- Developers are adapting with more cautious AI usage, reverting to older models, or abandoning tools entirely
- Industry needs investment in expert-labeled data and verification systems, not more cheap synthetic training data