SWE-bench Quality Gap: Merge Rates 24 Points Below Automated Test Scores
METR research published March 10, 2026 finds that maintainer merge rates are approximately 24 percentage points lower than automated SWE-bench scores. Roughly half ...
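The figure above is a percentage-point gap, which is not the same as a 24% relative drop. A percentage-point gap is a plain subtraction of two rates, while a relative gap divides that difference by the baseline rate. The sketch below uses hypothetical rates (not METR's published figures) and illustrative function names to show that the same 24-point gap corresponds to different relative drops depending on the baseline.

```python
def percentage_point_gap(automated_rate: float, merge_rate: float) -> float:
    """Absolute gap, in percentage points, between two rates given in percent."""
    return automated_rate - merge_rate


def relative_gap(automated_rate: float, merge_rate: float) -> float:
    """Relative shortfall of merge_rate versus automated_rate, in percent."""
    # Multiply before dividing to keep the arithmetic exact for these inputs.
    return (automated_rate - merge_rate) * 100 / automated_rate


# Hypothetical illustrative rates only; these are NOT METR's numbers.
for automated, merged in [(60.0, 36.0), (80.0, 56.0)]:
    print(
        f"automated={automated}%, merged={merged}%: "
        f"{percentage_point_gap(automated, merged)} points, "
        f"{relative_gap(automated, merged)}% relative drop"
    )
```

Both hypothetical pairs differ by 24 points, yet the relative drop is 40% in the first case and 30% in the second, which is why the two phrasings should not be used interchangeably.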