
AI Testing Automation Solves the Wrong Problem


Momentic just raised $15 million on November 24 to let developers write tests in plain English instead of wrestling with Playwright or Selenium. Eighty percent of software teams plan to adopt AI testing automation tools by 2026, betting that natural language test authoring will eliminate the technical barriers that make testing painful. The pitch is seductive: describe user flows, let AI generate and maintain tests, watch complexity disappear. But here’s the uncomfortable truth: AI testing automation solves the wrong problem. Testing isn’t hard because frameworks are complex. Testing is hard because we build untestable systems with vague requirements and treat QA as an organizational afterthought.

AI can automate test creation at scale; Momentic automated 200 million test steps last month alone. But it can’t fix poorly designed code, clarify ambiguous requirements, or change cultures that treat quality as someone else’s job. Organizations investing millions in AI testing without addressing these root causes risk automating bad tests faster, creating false confidence through high coverage of fundamentally broken systems.

The $15M Bet That Automation Beats Process Improvement

Momentic’s Series A, led by Standard Capital and Dropbox Ventures, represents an industry-wide conviction: complexity is the barrier, AI is the solution. The company serves 2,600 users including Notion, Xero, Webflow, and Retool, companies that understand developer tools. Co-founder Wei-Wei Wu promises developers can “describe their critical user flows in plain English and our AI will automate it.” No more Playwright selector hell, no more Selenium flakiness, just natural language that becomes executable tests.

OpenAI and Anthropic smell the same opportunity. Both offer agentic testing tutorials leveraging foundation models’ computer use capabilities. The entire industry converges on the same diagnosis: developers hate testing because writing Selenium scripts is painful. Remove that friction with AI, and testing problems solve themselves.

Yet 46% of developers actively distrust AI-generated code accuracy, up from 31% last year, and another 66% struggle with AI solutions that are “almost right, but not quite,” requiring extensive debugging. The disconnect between adoption (80% planned) and trust (dropping fast) reveals something broken in the premise. When 45% report that fixing AI-generated code takes longer than writing it manually, maybe the problem isn’t the code—it’s what we’re trying to test.

Root Causes AI Testing Can’t Address

Testing is hard for three fundamental reasons that no amount of AI can fix: untestable architecture, vague requirements, and organizational dysfunction.

Untestable architecture manifests as tight coupling, god objects, and hidden state. When classes depend directly on concrete implementations instead of abstractions, mocking becomes impossible. When single classes handle dozens of responsibilities, test permutations explode. And when global variables and singletons introduce hidden state, tests become non-deterministic nightmares. AI generates tests for code as it exists. If your code is tightly coupled, AI creates tightly coupled tests. If a class has 50 dependencies, AI must mock all 50. The fragility and maintenance burden transfer from manual tests to AI-generated ones—same problem, automated faster.
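
To make the coupling problem concrete, here is a minimal TypeScript sketch; CheckoutService and StripeGateway are hypothetical names for illustration, not any real codebase or SDK:

```typescript
// A hypothetical service that is hard to test because it constructs
// its own concrete dependency.
class StripeGateway {
  async charge(amountCents: number): Promise<{ ok: boolean }> {
    // Stand-in for a real network call to a payment provider.
    return { ok: true };
  }
}

class CheckoutService {
  // Hard-coded dependency: every test of checkout() must exercise
  // (or awkwardly monkey-patch) the real gateway.
  private gateway = new StripeGateway();

  async checkout(amountCents: number): Promise<string> {
    const result = await this.gateway.charge(amountCents);
    return result.ok ? 'confirmed' : 'declined';
  }
}
```

An AI tool asked to test this class inherits the same problem: it can only generate tests that patch the concrete gateway or run against it for real.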

Research from Carnegie Mellon’s Software Engineering Institute confirms: “Few organizations inform testing with software architecture… ignoring it is problematic because structures ensure quality attributes and enable systems to meet requirements.” Therefore, you can’t test quality into poorly architected systems. Architecture determines testability. AI doesn’t change that equation.

Vague requirements create the second barrier. When specifications say “it should work” or “process payment,” AI can’t clarify what correct behavior means. Does “process payment” include retry logic for failures? What error messages should users see? How should partial payments be handled? If requirements are ambiguous to humans, AI-generated tests based on those requirements will be equally ambiguous. The 39% of software project failures attributed to poor requirements won’t be fixed by better test automation; they require better requirements gathering before any tests run.

QA as an organizational afterthought creates the third obstacle. Even with AI-generated tests, if developers don’t run them during development, defects accumulate. If QA has no authority to block releases, AI-found bugs get ignored. If there’s no time allocated for test maintenance, AI-generated tests rot into noise. The tooling becomes irrelevant when the culture doesn’t prioritize quality. Seventy percent of software projects miss the mark due to testing issues, but the root cause is rarely tool choice—it’s organizational commitment to quality.

Related: Developer AI Tool Sentiment Drops to 60% Despite 84% Use

False Promises Creating False Confidence

AI testing platforms promise comprehensive coverage, self-healing tests, and time savings. The reality looks different under scrutiny.

Comprehensive coverage sounds appealing until you realize AI lacks business context. Models are trained on generalized data, not your specific product, edge cases, or critical revenue flows. Testlio’s analysis warns: “AI models struggle with your edge cases, legacy systems, or localized flows.” The result? Thousands of tests covering happy paths while missing the edge cases that cause production incidents. High test coverage doesn’t equal high-quality coverage. CreateQ’s research notes: “Without careful tuning, you chase false positives or miss real issues—both erode trust.”

Self-healing tests promise maintenance-free automation, but they introduce a more insidious problem. When tests “heal” themselves after UI changes, they update selectors without verifying that the new UI still implements correct behavior. A test that heals itself to pass when it should fail creates blind spots: you get green builds hiding real regressions. False confidence through automation is worse than no tests—at least with no tests, nobody assumes they’re covered.
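
A sketch of that failure mode in Playwright syntax, with placeholder URLs and selectors and no specific self-healing product implied:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical checkout test. The original assertion checked for a
// specific confirmation heading after placing an order.
test('checkout shows order confirmation', async ({ page }) => {
  await page.goto('https://shop.example.com/checkout'); // placeholder URL
  await page.getByRole('button', { name: 'Place order' }).click();

  // Original, intent-preserving assertion:
  // await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();

  // After a regression removed the confirmation heading, imagine a
  // self-healing tool "repaired" the locator to the nearest surviving
  // element, a generic status banner:
  await expect(page.locator('.status-banner')).toBeVisible();

  // The build goes green, but the test no longer proves an order was
  // actually confirmed; the regression ships unnoticed.
});
```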

Time savings evaporate when you account for review overhead, debugging AI-generated failures, and building trust in opaque test suites. The promise: AI writes tests faster than humans. The reality: humans must review AI-generated tests to ensure they test the right things, debug failures in code they didn’t write, and maintain trust in tests they don’t fully understand. That’s not time saved—it’s time shifted to less productive activities.

Fix Fundamentals First, Then Use AI

The correct sequence is: design for testability, clarify requirements, shift quality left organizationally, then use AI as an accelerator for already-good practices.

Design for testability means dependency injection, pure functions, and test-driven development. Dependency injection makes mocking trivial by passing dependencies explicitly. Pure functions—same input, same output, always—eliminate non-determinism. Test-driven development forces testable design by writing tests first. Kent Beck, who pioneered TDD, states: “TDD encourages simple designs and inspires confidence.” IEEE research shows TDD creates flow state—total immersion that increases productivity. These practices make code inherently testable, and AI then accelerates test authoring for well-designed code.
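
Returning to the earlier sketch, a dependency-injected version (still hypothetical TypeScript, not any product’s API) shows why this pattern makes testing trivial: the gateway becomes an explicit parameter, and a test substitutes an in-memory fake in two lines.

```typescript
// The gateway is now an interface the service receives,
// not a concrete class it constructs internally.
interface PaymentGateway {
  charge(amountCents: number): Promise<{ ok: boolean }>;
}

class CheckoutService {
  // Explicit dependency: tests can pass any PaymentGateway implementation.
  constructor(private readonly gateway: PaymentGateway) {}

  async checkout(amountCents: number): Promise<string> {
    const result = await this.gateway.charge(amountCents);
    return result.ok ? 'confirmed' : 'declined';
  }
}

// A test needs no network, no mocking framework, no global state:
const fakeGateway: PaymentGateway = {
  charge: async () => ({ ok: true }),
};

new CheckoutService(fakeGateway)
  .checkout(4999)
  .then((status) => console.assert(status === 'confirmed'));
```

Pure functions get the same benefit for free: with no hidden state, an assertion is just input versus output.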

Clarify requirements before coding. Example-driven development specifies behavior through concrete examples that become executable tests. Behavior-driven development’s Given-When-Then format forces precision. When a requirement says “Given user is logged in, When user clicks ‘Delete Account’, Then confirmation modal appears,” there’s no ambiguity about correct behavior. AI can generate tests for that requirement. Without that clarity, AI generates tests for unclear expectations—garbage in, garbage out.
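
One plausible translation of that Given-When-Then into an executable Playwright test; the URLs, field labels, and seeded test account are placeholder assumptions:

```typescript
import { test, expect } from '@playwright/test';

test('Delete Account shows a confirmation modal', async ({ page }) => {
  // Given: user is logged in (assumes a seeded test account).
  await page.goto('https://app.example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('test-password');
  await page.getByRole('button', { name: 'Log in' }).click();

  // When: user clicks "Delete Account".
  await page.goto('https://app.example.com/settings');
  await page.getByRole('button', { name: 'Delete Account' }).click();

  // Then: confirmation modal appears.
  await expect(page.getByRole('dialog')).toBeVisible();
});
```

The requirement and the test stay in lockstep: when the behavior changes, there is exactly one sentence and one test to update.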

Shift quality left organizationally. Katalon’s research on shift-left testing confirms: “Testing becomes everyone’s job rather than just QA team’s… fosters shared responsibility.” Developers write tests as they write code. QA reviews test quality instead of just running tests. Quality metrics (defect rates, test coverage) track alongside feature velocity, and test failures block deployments. This cultural transformation makes testing a first-class concern. Once that culture exists, AI tools accelerate test creation for teams already committed to quality.

With testable architecture, clear requirements, and organizational commitment in place, AI testing tools provide genuine value. They generate boilerplate quickly, suggest edge cases, detect regressions, and handle tedious selector maintenance. AI becomes an accelerator for already-good practices, not a band-aid for broken ones.

Related: Enterprise AI Pilots Fail 95%: MIT Exposes Why

Ask the Right Question Before Buying AI Testing Tools

Before adopting AI testing tools, ask: “Why is our testing so hard?” If the answer involves tight coupling, unclear requirements, or QA deprioritization, AI won’t help; fix those first. If testing is already well-structured but slow and tedious, AI can genuinely accelerate it.

Momentic’s $15 million represents a bet that automation beats process improvement. History suggests otherwise. The $109 million wasted per billion dollars spent on failed IT projects stems from fundamental issues: poor requirements (39% of failures), communication breakdowns (57% of failures), organizational dysfunction. Throwing AI at these problems is an expensive distraction from real solutions.

That investment could fund experienced testers who understand business context, developer training in testable design patterns, CI/CD infrastructure improvements, or refactoring legacy code to be testable. These investments address root causes. AI testing tools address symptoms.

Don’t confuse automation with improvement. If testing is hard because code is poorly designed, AI makes you automate poorly designed tests faster. If testing is hard because requirements are vague, AI generates tests based on vague requirements at scale. And if testing is hard because QA is organizationally deprioritized, AI-generated tests get ignored faster.

The uncomfortable truth: if testing is hard, AI won’t make it easy—it’ll just automate the difficulty.

Key Takeaways

  • AI testing tools solve symptoms, not causes. Eighty percent of teams rush to adopt AI testing automation (with companies like Momentic raising $15M), but testing difficulty stems from untestable architecture, vague requirements, and organizational dysfunction—problems AI can’t fix.
  • High coverage doesn’t mean high quality. AI can generate thousands of tests quickly, but without business context and clear requirements, those tests cover happy paths while missing critical edge cases. False confidence through automation is worse than no tests.
  • Developer trust in AI-generated code is dropping, not rising. Forty-six percent actively distrust AI code accuracy (up from 31%), and 66% struggle with output that’s “almost right” requiring extensive debugging. The adoption-trust disconnect signals broken premises.
  • Fix fundamentals first, then use AI. Design for testability (dependency injection, pure functions, TDD), clarify requirements (example-driven development, BDD), and shift quality left organizationally (developers own testing). AI then accelerates already-good practices instead of masking broken ones.
  • Ask “Why is testing hard here?” before buying tools. If the answer involves architecture, requirements, or culture, AI won’t help. Investment in process improvement beats investment in automation tools that mask systemic problems.