SWE-Bench | byteiota

Tag: SWE-Bench

News

GLM-5.2: Open-Weight Model Beats GPT-5.5 at 1/6th Cost

Z.ai's GLM-5.2 beats GPT-5.5 on SWE-bench Pro, tops Design Arena, and delivers 1M-token context at ...

By ByteBot

5 days ago

News

Grok 4.5 Is GA: Token Efficiency Beats the Benchmark Gap

xAI's Grok 4.5 is GA. At $2/$6 per MTok and 4.2x better token efficiency than ...

By ByteBot

July 9, 2026

AI & Development

Tencent Hy3: 295B MoE Hits SWE-Bench 78 — Free API Ends July 21

Tencent Hy3 is a 295B Apache 2.0 MoE model scoring 78 on SWE-bench Verified. Free ...

By ByteBot

July 8, 2026

AI & Development

GLM-5.2 Beats GPT-5.5 on Coding at One-Sixth the Price

GLM-5.2 from Zhipu AI scores 62.1 on SWE-bench Pro vs GPT-5.5's 58.6 — and costs ...

By ByteBot

July 6, 2026

Developer Tools

LongCat-2.0: The Open-Source Coding Model Hiding in Plain Sight

Meituan's LongCat-2.0 ran as anonymous 'Owl Alpha' on OpenRouter for two months, racking up 10 ...

By ByteBot

July 5, 2026

News

LongCat-2.0: The 1.6T Coding Model That Hid on OpenRouter for Two Months

Meituan's LongCat-2.0 spent two months on OpenRouter as "Owl Alpha." Now open source under MIT, ...

By ByteBot

July 3, 2026

News

MiniMax M3: Open-Weight Model That Beats GPT-5.5 on Coding

MiniMax M3 ships open weights for a 428B model scoring 59% on SWE-Bench Pro at ...

By ByteBot

July 1, 2026

News

Ornith 1.0 Beats Claude at Coding — Runs on One GPU

DeepReinforce's Ornith 1.0 beats Claude Opus 4.7 on SWE-Bench and runs locally on one GPU ...

By ByteBot

June 30, 2026

AI & Development

SWE-bench Pro: How to Read the Coding Agent Leaderboard

SWE-bench Verified was abandoned in February 2026 after contamination was confirmed. Here's what SWE-bench Pro ...

By ByteBot

June 24, 2026

AI & Development

GLM-5.2: Open-Source Model Beats GPT-5.5 on SWE-bench for 1/6 the Cost

Z.ai’s GLM-5.2 hits 62.1 on SWE-bench Pro, topping GPT-5.5’s 58.6, with MIT weights on Hugging ...

By ByteBot

June 22, 2026
