Ruby Beats Rust 33%: AI Coding Language Benchmark 2026


Ruby committer Yusuke Endoh benchmarked Claude Code across 13 programming languages in 600 runs and found that dynamic languages beat statically-typed languages on both speed and cost. Ruby completed tasks in 73 seconds at $0.36 per run, while Rust took 114 seconds at $0.54, making Ruby 33% cheaper and about 36% faster; against C ($0.74 per run), Ruby was 51% cheaper. Adding type checkers imposed 1.6-3.2× overhead: Python with mypy ran 1.7× slower, and Ruby with Steep ran 2.6× slower overall (3.2× in the first phase). The paradox: all three failures across 600 runs occurred in statically-typed languages despite their type safety promises.

The Benchmark That Challenges Conventional Wisdom

Yusuke Endoh, a Ruby committer since 2008, tasked Claude Code (Opus 4.6) with implementing a simplified version of Git in 13 languages. The experiment covered 15 configurations (the 13 languages plus type-checked variants of Python and Ruby), each run 20 times through two phases: greenfield implementation (init, add, commit, log) and feature extension (status, diff, checkout, reset), for 600 runs in total. The task was sized at roughly 200 lines of code and allowed no external libraries, isolating language-level differences from ecosystem advantages.

The results, published on DEV Community in April 2026, surprised many developers. Dynamic languages took the top three positions with consistent performance, while statically-typed languages showed higher costs, slower generation times, and greater variance. Endoh disclosed his Ruby background upfront and published all code and results on GitHub so others can reproduce them. The project was supported by Anthropic’s Claude for Open Source Program.

Performance Results: Dynamic Languages Dominate

The top three performers were all dynamically-typed languages. Ruby averaged 73.1 seconds at $0.36 per run with minimal variance (±4.2s). Python followed closely at 74.6 seconds and $0.38 with similarly low variance (±4.5s). JavaScript rounded out the top three at 81.1 seconds and $0.39 (±5.0s). All three passed every test across 40 runs without a single failure.

Statically-typed languages told a different story. Go averaged 101.6 seconds at $0.50 but showed high variance (±37.0s). Rust came in at 113.7 seconds and $0.54 with even wider variance (±54.8s), plus two test failures out of 40 runs. C performed worst at 155.8 seconds and $0.74, generating 517 lines of code compared to Ruby’s 219; producing more than twice as much code drove up its token costs.
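The percentage gaps quoted in this article can be recomputed from the reported averages. The minimal sketch below hardcodes only the six configurations quoted above (the full per-run data lives in Endoh’s GitHub repository and is not reproduced here) and expresses each gap as how much faster or cheaper Ruby is. The direction matters: Ruby being 33% cheaper than Rust is the same fact as Rust costing 50% more than Ruby.

```python
# Reported averages quoted in this article: (seconds per run, USD per run).
# This is a subset of the benchmark's 15 configurations, not the full dataset.
RESULTS = {
    "Ruby": (73.1, 0.36),
    "Python": (74.6, 0.38),
    "JavaScript": (81.1, 0.39),
    "Go": (101.6, 0.50),
    "Rust": (113.7, 0.54),
    "C": (155.8, 0.74),
}


def savings_vs(baseline: str, other: str) -> tuple[float, float]:
    """Return (time_saved, cost_saved) as fractions of `other` when using `baseline`."""
    base_time, base_cost = RESULTS[baseline]
    other_time, other_cost = RESULTS[other]
    return 1 - base_time / other_time, 1 - base_cost / other_cost


if __name__ == "__main__":
    for lang in ("Go", "Rust", "C"):
        time_saved, cost_saved = savings_vs("Ruby", lang)
        print(f"Ruby vs {lang}: {time_saved:.0%} faster, {cost_saved:.0%} cheaper")
```

Against Rust this yields the headline 33% cost saving (and about 36% on time); against C it yields the 51% cost saving. Against Go, both gaps come out at 28%.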

The cost implications scale quickly. At 100 runs per day, Ruby costs $36 daily versus Rust’s $54, a difference of $540 over a 30-day month. The speed gap also matters beyond raw numbers. Endoh noted that “the difference between waiting 30 seconds and 60 seconds affects focus and flow, not just total time.” Faster iteration cycles mean more refinement passes within the same development session.
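The monthly figure is straightforward arithmetic. This sketch assumes a 30-day month, a constant number of runs per day, and that every run costs exactly the benchmark’s average; real sessions vary in size, so treat the output as a rough projection rather than a forecast.

```python
def monthly_difference(runs_per_day: int, cost_a: float, cost_b: float, days: int = 30) -> float:
    """Projected monthly cost gap (USD) between two average per-run prices."""
    return runs_per_day * abs(cost_b - cost_a) * days


if __name__ == "__main__":
    # Ruby ($0.36/run) vs Rust ($0.54/run), the averages reported above.
    for runs in (100, 300):
        gap = monthly_difference(runs, 0.36, 0.54)
        print(f"{runs} runs/day: ${gap:,.0f} per month")
```

At 100 runs per day this prints $540, matching the figure above; at 300 runs per day the gap reaches $1,620.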

Type System Overhead Measured Precisely

The benchmark directly quantified type system overhead by comparing type-annotated variants within single languages. Python with mypy strict checking averaged 125.3 seconds and $0.57 compared to plain Python’s 74.6 seconds and $0.38 — a 1.7× slowdown. Ruby with Steep type checking fared worse at 186.6 seconds and $0.84 versus plain Ruby’s 73.1 seconds and $0.36 — a 2.6× overall slowdown that spiked to 3.2× in greenfield phase 1.

TypeScript versus JavaScript revealed a similar pattern. TypeScript averaged 133.0 seconds at $0.62, while JavaScript completed in 81.1 seconds at $0.39, so the typed variant was 1.6× slower and 59% more expensive. The main driver: the AI generates substantially more boilerplate code for type annotations, which increases token consumption and API costs proportionally.
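The overhead multipliers can be checked the same way. This sketch divides each typed variant’s reported averages by its untyped baseline. It uses the overall averages quoted above rather than per-phase data, so it cannot reproduce the 3.2× phase-1 spike for Steep. It also makes clear that Steep’s 2.6× is a time multiplier: on cost, the Ruby pair works out to about 2.3×.

```python
# Reported averages: (plain_seconds, plain_usd, typed_seconds, typed_usd).
PAIRS = {
    "Python vs Python + mypy": (74.6, 0.38, 125.3, 0.57),
    "Ruby vs Ruby + Steep": (73.1, 0.36, 186.6, 0.84),
    "JavaScript vs TypeScript": (81.1, 0.39, 133.0, 0.62),
}


def overhead(pair: str) -> tuple[float, float]:
    """Return (time_multiplier, cost_multiplier) of the typed variant over the plain one."""
    plain_s, plain_usd, typed_s, typed_usd = PAIRS[pair]
    return typed_s / plain_s, typed_usd / plain_usd


if __name__ == "__main__":
    for name in PAIRS:
        time_x, cost_x = overhead(name)
        print(f"{name}: {time_x:.1f}x time, {cost_x:.1f}x cost")
```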

The Type Safety Paradox

Here’s the uncomfortable finding for static typing advocates: all three failures across 600 runs occurred in statically-typed languages. Rust failed twice (in one case, the agent hallucinated and claimed “the tests are wrong”), and Haskell failed once. Python, Ruby, and JavaScript, which lack compile-time type checking, passed every single test.
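Because each configuration ran 40 times (20 trials across two phases), a single failure moves a language’s failure rate by 2.5 percentage points. The short sketch below turns the reported counts into rates. The counts come from the article; every configuration not listed is assumed to have zero failures, as the article implies.

```python
RUNS_PER_CONFIG = 20 * 2  # 20 trials, each with two phases
CONFIGS = 15              # 13 languages plus two type-checked variants
FAILURES = {"Rust": 2, "Haskell": 1}  # assumption: all other configurations had 0


def failure_rate(language: str) -> float:
    """Fraction of a configuration's 40 runs that failed."""
    return FAILURES.get(language, 0) / RUNS_PER_CONFIG


total_runs = CONFIGS * RUNS_PER_CONFIG
overall = sum(FAILURES.values()) / total_runs

if __name__ == "__main__":
    print(f"total runs: {total_runs}")
    for lang in ("Rust", "Haskell", "Ruby"):
        print(f"{lang}: {failure_rate(lang):.1%}")
    print(f"overall: {overall:.1%}")
```

This gives 5.0% for Rust, 2.5% for Haskell, 0.0% for Ruby, and 0.5% overall across the 600 runs.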

Endoh observed that “type errors are among the easiest bugs to detect and fix, yet the only failures occurred in statically-typed languages. This suggests types don’t prevent all error categories and may introduce new failure modes in AI-generated code.” Additionally, static languages exhibited higher variance and unpredictability compared to the consistent performance of dynamic languages.

Why Dynamic Languages Win for AI Coding

Five factors likely explain dynamic languages’ advantage in AI code generation. First, simpler syntax with no type annotations means less boilerplate and fewer tokens per implementation. Second, Python, Ruby, and JavaScript are heavily represented in AI training data because of their popularity. Third, complex language features like Rust’s ownership model and Haskell’s monads increase token usage, as the AI works through these concepts verbosely.
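To make the boilerplate point concrete, here is a hypothetical helper of the kind a mini-Git needs: computing a Git blob ID, which is the SHA-1 of a `blob <size>` header, a NUL byte, and the file content (the same ID `git hash-object` prints). It is not taken from Endoh’s repository. It simply shows one function written plainly and again with the annotations that `mypy --strict` requires, since strict mode rejects unannotated functions.

```python
import hashlib


# Plain Python: what an untyped implementation might look like.
def hash_blob(data):
    header = b"blob %d\0" % len(data)  # Git object header: type, size, NUL
    return hashlib.sha1(header + data).hexdigest()


# The same function annotated so it passes `mypy --strict`.
def hash_blob_typed(data: bytes) -> str:
    header: bytes = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    print(hash_blob(b""))
    print(hash_blob_typed(b"hello\n"))
```

In a function this small the annotations add only a few tokens, but across every function, container, and return type in a full implementation the extra text adds up.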

Fourth, static languages often require configuration files (such as Cargo.toml for Rust or tsconfig.json for TypeScript) that add overhead in greenfield phase 1. Fifth, conciseness correlates with generation speed: the gap between Ruby’s 219 lines and C’s 517 directly affects both time and cost. The AI generates cleaner, more compact code in dynamic languages because it doesn’t need to satisfy a type checker.

Study Limitations and When Static Types Still Matter

Endoh acknowledged significant limitations upfront. At roughly 200 lines of code, the task sits firmly at prototype scale, so the expectation that “static typing should shine at larger scales” remains untested. The greenfield bias also means the results may not carry over to modifying existing codebases, where typed languages could fare differently. Furthermore, the evaluation captured a March 2026 snapshot, and “results may look different in a few months” as AI models improve rapidly.

Static types retain clear advantages for specific use cases. Large codebases exceeding 10,000 lines benefit from refactoring safety that types provide. Multi-contributor teams need the formal contracts types enforce between modules. Production systems requiring long-term maintenance over 5+ years justify the upfront type annotation cost. Formal verification requirements demand static typing’s guarantees.

The classical development pattern of starting with dynamic languages for rapid prototyping, then migrating to static types for production hardening, remains valid. Notably, Claude Code demonstrated “very capable” cross-language migration abilities, which supports this hybrid approach.

Practical Implications for 2026 Development

The data provides clear guidance for language selection in AI-assisted development. For prototype-scale projects like the roughly 200-line task tested here, Ruby, Python, and JavaScript delivered measurably faster iteration and lower costs. Teams running hundreds of AI coding sessions daily will see the cost difference compound: at 300 runs per day, the $0.18 gap between $0.36 and $0.54 per run adds up to about $1,620 per 30-day month.

Cost-conscious teams should default to dynamic languages and migrate to static types only where production requirements demand it. Conversely, teams on flat-rate Claude Max subscriptions ($100 or $200 per month, with usage limits rather than per-run billing) can choose based on other factors like ecosystem fit, team expertise, or runtime performance needs.

The benchmark challenges the assumption that “static types are better for AI coding,” at least at small scales. The measured data shows the opposite: dynamic languages produced shorter code faster and with lower variance. Whether this pattern holds at 10,000+ lines remains an open question, but for prototype-scale work, the numbers favor Ruby, Python, and JavaScript decisively.

Key Takeaways

Dynamic languages (Ruby, Python, JavaScript) beat static languages on AI coding speed and cost at prototype scale (~200 LOC); Ruby averaged $0.36 per run, 33% cheaper than Rust’s $0.54 and 51% cheaper than C’s $0.74
Type system overhead measured precisely: mypy makes Python 1.7× slower, Steep makes Ruby 2.6× slower, and TypeScript costs 59% more than JavaScript, driven by increased boilerplate and token usage
Type safety paradox: all 3 failures out of 600 runs occurred in statically-typed languages (Rust, Haskell) despite type checking promises, while dynamic languages passed every test with lower variance
Cost compounds at scale: 100 daily runs cost $36/day with Ruby versus $54/day with Rust, a $540/month difference, and faster iteration preserves developer flow and focus
Study limitations acknowledged: task scale at 200 LOC favors prototyping; static types may win at 10,000+ LOC (untested); greenfield bias may not reflect real-world codebase modifications; classical “start dynamic, migrate static” approach validated by AI’s cross-language migration capabilities

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.