Sakana AI’s 7B Model Beats GPT-5 by Telling It What to Do

Sakana AI Fugu multi-agent orchestration system: a 7B conductor model directing GPT-5, Claude, and Gemini with dynamic energy flows

Sakana AI Fugu orchestrates frontier models via RL Conductor

A 7-billion-parameter model just outperformed GPT-5 on graduate-level science and competition mathematics. It didn’t win by being bigger or smarter. It won by acting as GPT-5’s manager. Sakana AI’s RL Conductor—accepted at ICLR 2026 and now available in beta as Sakana Fugu—makes a compelling case that the next frontier in AI isn’t raw intelligence. It’s learning to coordinate the intelligence you already have.

The Conductor Doesn’t Play an Instrument

The RL Conductor is a language model trained specifically to orchestrate other language models. Built on Qwen2.5-7B and fine-tuned with Group Relative Policy Optimization (GRPO), it doesn’t answer questions. It decides which frontier model should answer which part of a question, in what order, with what instructions, and what each agent can see of the others’ work.

Sakana AI calls this a “communication topology”—a dynamic workflow graph the Conductor designs fresh for each input. A hard math problem might route to GPT-5 for symbolic reasoning, then Claude Sonnet 4 for verification, then Gemini 2.5 Pro for a final check. A coding task might spawn parallel agents exploring different approaches before one synthesizes the result. Critically, the Conductor learned these routing strategies through reinforcement learning, not through a human writing a pipeline. The optimal workflow emerges from training.

The training itself is surprisingly lean: 960 problems, 200 RL iterations, and two NVIDIA H100 GPUs. This is not a $100 million compute project.

The Benchmarks Hold Up

On GPQA Diamond—graduate-level science questions that most frontier models still find genuinely difficult—the RL Conductor hit 87.5%, ahead of Gemini 2.5 Pro’s 84.8%. On AIME 2025, competition mathematics that filters out most AI systems, it reached 93.3% versus GPT-5’s 90.8%. On LiveCodeBench, it sat at state-of-the-art at publication, ahead of OpenAI’s O-series models.

A 3% improvement on hard benchmarks sounds modest. In machine learning, it isn’t. Full model generations—the jump from one major release to the next—often improve by 2 to 4 percentage points on these evaluations. The RL Conductor delivered a generational step without training a new frontier model. And the paper passed ICLR 2026 peer review—this isn’t a company blog post with cherry-picked demos.

Sakana Fugu: The Commercial Product

Sakana AI packaged the RL Conductor research into Sakana Fugu, a beta product that launched April 25, 2026. Two variants: Fugu Mini, optimized for latency, and Fugu Ultra, built for demanding tasks where quality trumps speed.

The developer pitch is clean: the Fugu API is OpenAI-compatible. If you’re already calling GPT, Claude, or Gemini through an API, you swap one endpoint. Fugu handles all coordination transparently—figuring out which model runs which subtask, assembling results. No pipeline rewrite. Access is application-only via sakana.ai, and pricing hasn’t been announced.

The real concern: if Fugu routes your query across GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro, you may be paying for three frontier-model calls per request. The research suggests the Conductor learns to route cheaper models to easier subtasks—but production behavior at scale is still unknown. That’s the honest caveat with any beta that hasn’t published pricing.

Who Built This

Sakana AI was founded in 2023 by David Ha, former head of Google Brain, and Llion Jones—one of the eight co-authors of “Attention Is All You Need,” the 2017 paper that introduced the transformer. The company’s name means “fish” in Japanese, and the founding metaphor is a school of fish forming coherent intelligent behavior from simple individual rules: collective intelligence through coordination, not individual brilliance.

When a co-author of the transformer paper builds a system that wins by not scaling transformers, that’s worth paying attention to.

What This Means for Developers Building with AI

Developers spent the first half of 2026 wiring together multi-agent pipelines by hand—deciding which model handles which step, who verifies whose output, what gets passed where. It’s brittle, intuition-driven work. The RL Conductor suggests orchestration itself is a learnable skill: train a small model on the coordination task with RL, and it discovers routing strategies that humans wouldn’t engineer.

Multi-agent AI isn’t a fringe experiment. In Q1 2026, over 2.4 billion API calls per week flowed through multi-agent orchestration frameworks, according to market analysis. The question for most teams has been how to manage coordination complexity as agent counts scale. A trainable orchestrator changes the calculation.

The broader point: the race for bigger models and longer context windows is real, but it may not be the highest-leverage bet. A well-trained 7B orchestrator, working across the best frontier models available via API, can outperform any of those models acting alone. If Fugu’s production system holds up to its research benchmarks, multi-agent orchestration stops being an engineering problem and becomes a model capability—one that improves with better training data, not just more compute.

The question shifts from “which model should I use?” to “do I have an orchestrator that knows how to use all of them?”

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.