
The rule every Gemini API developer has lived by for the past year (use Flash for speed, use Pro for quality) is dead. Gemini 3.5 Flash, launched May 19 at Google I/O 2026, outperforms Gemini 3.1 Pro on the benchmarks that matter most for agentic and coding workloads. It does this while generating tokens about four times faster and costing 40% less. That is not an incremental improvement. It changes how models should be selected, and developers who do not act on it are paying more for worse results.
The Benchmarks That Actually Matter
Google is not being subtle about what it built Gemini 3.5 Flash for: agents. The benchmark selection tells the story. On MCP Atlas, which simulates real multi-step, multi-tool agentic deployments, Flash scores 83.6%, compared to Gemini 3.1 Pro’s 78.2%. That score puts Flash ahead of every other frontier model: Claude Opus 4.7 scores 79.1% and GPT-5.5 scores 75.3%.
The pattern holds across the other benchmarks developers care about. On Terminal-Bench 2.1, which measures coding in real terminal environments, Flash scores 76.2% versus Pro’s 70.3%. On Finance Agent v2, a benchmark for complex multi-step agentic tasks, Flash beats Pro by nearly 15 points: 57.9% versus 43.0%. The GDPval-AA Elo gap is 342 points — the largest delta in the entire benchmark table.
Flash wins on 6 of the 8 major benchmarks compared to Gemini 3.1 Pro. The two exceptions matter and are covered below.
Speed and Cost: The Math Is Obvious
Gemini 3.5 Flash generates approximately 289 tokens per second. GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro all run at roughly 70 tokens per second, which makes Flash about four times faster. In an 8-step agent loop, Flash completes the task in around 25 seconds. On Pro, the same loop takes around 100 seconds.
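The loop timings follow directly from the throughput figures. A minimal sketch of the arithmetic, assuming roughly 900 generated tokens per step (an illustrative figure, not one Google published) and Pro's ~72 tokens per second from the comparison table; it counts generation time only and ignores network round trips and tool execution:

```python
def loop_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Generation time for an agent loop; ignores network and tool latency."""
    return steps * tokens_per_step / tokens_per_second


TOKENS_PER_STEP = 900  # assumption for illustration, not a published figure

flash = loop_seconds(8, TOKENS_PER_STEP, 289)  # Flash throughput from the article
pro = loop_seconds(8, TOKENS_PER_STEP, 72)     # Pro throughput from the table (~72)

print(f"Flash: {flash:.0f}s, Pro: {pro:.0f}s")  # Flash: 25s, Pro: 100s
```

Real loops will take longer than either number, since every step also waits on tools and the network, but the ratio between the two models stays about the same.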
| Model | MCP Atlas | Speed (tok/s) | Input ($/M) | Output ($/M) |
|---|---|---|---|---|
| Gemini 3.5 Flash | 83.6% | 289 | $1.50 | $9.00 |
| Claude Opus 4.7 | 79.1% | 67 | $5.00 | $25.00 |
| GPT-5.5 | 75.3% | 71 | $5.00 | $30.00 |
| Gemini 3.1 Pro | 78.2% | ~72 | $2.50 | $15.00 |
The pricing difference compounds the speed advantage. Flash costs $1.50 per million input tokens and $9.00 per million output tokens. Pro costs $2.50 and $15.00 respectively. For 100 coding sessions a month at 50K input and 5K output each (5M input and 0.5M output tokens in total), Flash costs roughly $12. Pro costs roughly $20. GPT-5.5 comes in at roughly $40 at the list prices above. Google’s own I/O demo ran 93 parallel subagents handling over 15,000 requests across a 12-hour session for under $1,000, a cost profile that would be impossible at Pro, GPT-5.5, or Opus 4.7 rates.
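The monthly figures can be checked with a few lines of arithmetic. The sketch below uses only the list prices from the table above; `PRICES` and `monthly_cost` are illustrative names, and the calculation ignores caching discounts, thinking-token overhead, and any tiered pricing:

```python
# (input $/M tokens, output $/M tokens), copied from the comparison table
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.50, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def monthly_cost(model: str, sessions: int, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for `sessions` sessions of the given per-session token counts."""
    input_price, output_price = PRICES[model]
    millions_in = sessions * input_tokens / 1_000_000
    millions_out = sessions * output_tokens / 1_000_000
    return millions_in * input_price + millions_out * output_price


for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 100, 50_000, 5_000):.2f}")
```

For the 100-session workload this gives $12.00 for Flash and $20.00 for Pro. Swapping in your own session counts and token sizes is usually more informative than the headline numbers.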
Where Pro Still Wins: Know Before You Migrate
Two workload types should give developers pause before migrating entirely. First, long-context retrieval: Flash shows a 7.6-point regression on MRCR v2 at 128k tokens compared to Pro. If your application reads and synthesizes large documents — legal review, compliance checks, contract extraction — that regression is real and worth testing before committing. Gemini 3.1 Pro also supports a 2M token context window versus Flash’s 1M. For most teams 1 million tokens is more than enough, but for those processing multi-document legal corpora or book-length technical specifications, the larger window remains relevant.
Second, abstract reasoning at the frontier level: Pro holds an advantage on ARC-AGI-2. If your application handles novel problem solving, research synthesis, or complex multi-step deduction without tool use, Pro is still the better default. The practical heuristic: the closer your workload is to agentic — looping, tool use, coding — the more Flash wins. The closer it is to long-document Q&A, the more Pro wins.
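That heuristic can be written down as a default routing rule. The sketch below is one possible encoding, not a recommendation from Google: the `Workload` fields and the order of the checks are assumptions, and the result should be treated as a starting point for testing rather than a final answer.

```python
from dataclasses import dataclass

FLASH = "gemini-3.5-flash"
PRO = "gemini-3.1-pro"
FLASH_CONTEXT_LIMIT = 1_000_000  # Flash's context window as stated above


@dataclass
class Workload:
    uses_tools: bool                  # agent loops, tool calls, or coding
    context_tokens: int               # typical prompt size in tokens
    long_document_qa: bool = False    # retrieval or synthesis over large documents
    abstract_reasoning: bool = False  # novel deduction without tool use


def pick_model(w: Workload) -> str:
    """Return a default model ID for a workload (hypothetical routing rule)."""
    if w.context_tokens > FLASH_CONTEXT_LIMIT:
        return PRO  # the prompt does not fit in Flash's window
    if w.uses_tools:
        return FLASH  # agentic and coding workloads favor Flash
    if w.long_document_qa or w.abstract_reasoning:
        return PRO  # the two areas where Pro still leads
    return FLASH  # cheaper and faster when nothing argues for Pro


print(pick_model(Workload(uses_tools=True, context_tokens=20_000)))
print(pick_model(Workload(uses_tools=False, context_tokens=120_000, long_document_qa=True)))
```

Workloads that mix both traits, such as an agent that reads long contracts, are the ones worth benchmarking on both models before committing.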
How to Migrate: The Actual Code Changes
The migration is straightforward. The model name changes, the thinking parameter changes, and three sampling parameters should be removed. Per the official Gemini API migration guide, here are the changes:
Update your model ID:
# Before
model='gemini-3.1-pro' # or 'gemini-3-flash-preview'
# After
model='gemini-3.5-flash'
Update the thinking parameter. Google changed this from an integer to a string enum:
# Before
config={'thinking_config': {'thinking_budget': 8000}}
# After
config={'thinking_config': {'thinking_level': 'medium'}}
# Options: 'minimal', 'low', 'medium', 'high'
# Default is 'medium' (changed from 'high' in the preview)
Remove temperature, top_p, and top_k from your configuration. Google no longer recommends setting these for 3.5 models; manual overrides can degrade performance on the new architecture. If you use function calling, add id and name fields to all FunctionResponse parts — they must match the preceding function calls, and missing them causes errors in production.
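Because mismatched function responses only surface as errors at request time, it can help to check them before sending. The sketch below is a standalone check over plain dictionaries; the dictionary shape and the `unmatched_responses` helper are hypothetical stand-ins for illustration and are not the SDK's own types.

```python
def unmatched_responses(calls: list[dict], responses: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every response matches a call."""
    expected = {call["id"]: call["name"] for call in calls}
    problems = []
    for i, response in enumerate(responses):
        rid, name = response.get("id"), response.get("name")
        if rid is None or name is None:
            problems.append(f"response {i}: missing id or name")
        elif rid not in expected:
            problems.append(f"response {i}: unknown id {rid!r}")
        elif expected[rid] != name:
            problems.append(
                f"response {i}: name {name!r} does not match call {expected[rid]!r}"
            )
    return problems


# Hypothetical example data: one good response, one missing its id.
calls = [{"id": "call_1", "name": "get_weather"}, {"id": "call_2", "name": "get_time"}]
responses = [
    {"id": "call_1", "name": "get_weather", "response": {"temp_c": 21}},
    {"name": "get_time", "response": {"time": "12:00"}},
]
print(unmatched_responses(calls, responses))
```

Running a check like this in a test suite catches the missing fields before they reach production traffic.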
After migrating, test your specific workloads at the medium thinking level. The default dropped from high to medium, which means faster and cheaper responses but potentially lower quality on tasks requiring deep reasoning. If you notice regressions, switch to thinking_level: 'high' for those specific calls.
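Taken together, the configuration changes are mechanical enough to script. The following sketch assumes an application keeps its own flat settings dictionary (a hypothetical shape, not the SDK's request format) and shows how a per-call override to `'high'` might look; `migrate_settings` is an illustrative helper, not part of any Google library.

```python
SAMPLING_KEYS = ("temperature", "top_p", "top_k")
OLD_MODELS = {"gemini-3.1-pro", "gemini-3-flash-preview"}
THINKING_LEVELS = {"minimal", "low", "medium", "high"}


def migrate_settings(settings: dict, thinking_level: str = "medium") -> dict:
    """Return a migrated copy of app-level settings; the input is left untouched."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    # Drop the sampling parameters the article says to remove.
    migrated = {k: v for k, v in settings.items() if k not in SAMPLING_KEYS}
    # Swap the old model IDs for the new one.
    if migrated.get("model") in OLD_MODELS:
        migrated["model"] = "gemini-3.5-flash"
    # Replace the integer budget with the string level.
    migrated.pop("thinking_budget", None)
    migrated["thinking_level"] = thinking_level
    return migrated


old = {"model": "gemini-3.1-pro", "thinking_budget": 8000, "temperature": 0.2}
default_call = migrate_settings(old)       # everyday calls at the default level
deep_call = migrate_settings(old, "high")  # override for calls that regressed
print(default_call)
print(deep_call)
```

Keeping the level as a per-call argument rather than a global setting makes it easy to raise it only for the tasks where testing shows a regression.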
What About Gemini 3.5 Pro?
Google confirmed at I/O that Gemini 3.5 Pro is “being used internally” and will roll out in June 2026. Developers on 3.1 Pro for quality reasons face a clear decision: migrate to Flash now and get better agentic performance immediately, or wait up to six weeks for 3.5 Pro, which will likely reclaim the lead on reasoning benchmarks. The answer depends on your workload. If you are primarily building agentic systems (coding agents, tool-use pipelines, multi-step workflows), there is no reason to wait. Flash already leads that category.
The conventional wisdom about Flash being the speed-optimized, quality-compromised option has been wrong since May 19. Update your model config accordingly.













