Gemini 3.5 Flash: API Changes and What Developers Must Do

Gemini 3.5 Flash by Google - lightning bolt neural network circuit diagram on blue background representing AI speed and intelligence

Gemini 3.5 Flash launched at Google I/O 2026

Google shipped Gemini 3.5 Flash at I/O 2026 on May 19, and the headline is not what you’d expect from a model with “Flash” in the name. It outperforms Gemini 3.1 Pro on coding and agentic benchmarks, runs at 289 tokens per second — four times faster than comparable frontier models — and arrives with a set of breaking API changes that will affect any app currently using gemini-3-flash-preview. If you’re running on that model string, you have work to do.

The API Has Breaking Changes

The most important update is one that will silently degrade your application if you don’t catch it. Google changed the thinking parameter from an integer (thinking_budget) to a string enum (thinking_level), and the default shifted from high to medium. A naive swap of the model name won’t preserve your previous behavior.

The full list of changes:

Model name: gemini-3-flash-preview to gemini-3.5-flash
thinking_budget (integer) is replaced by thinking_level: minimal, low, medium (default), or high
temperature, top_p, and top_k are no longer recommended — remove them from your generation config
All FunctionResponse parts now require an id field and a matching name
Thought preservation is on by default — reasoning context now carries forward across turns, which helps performance but increases token usage

You also need to update your SDK. Google recommends google-genai v2.0.0 or later, which introduces its own breaking changes to the Interactions API. Don’t migrate the model name without also bumping the SDK. Full details are in the official Gemini 3.5 Flash migration docs.

Here’s what a minimal migration looks like in Python:

# Before (gemini-3-flash-preview)
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=prompt,
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(thinking_budget=2048),
        temperature=0.7
    )
)

# After (gemini-3.5-flash + SDK v2.0.0)
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=prompt,
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(thinking_level="low")
        # temperature removed — no longer recommended
    )
)

One nuance worth flagging: low does not mean “skip reasoning.” Google retuned the low setting specifically for code and agentic tasks. It is a real thinking level, not a hint to go fast. If your application is generating excessive tool calls after the migration, reduce the thinking level first, then add a system instruction to constrain tool usage if the problem persists.

Flash Now Beats Pro on Developer Benchmarks

The more interesting story is the performance data. On the benchmarks most relevant to developers, Gemini 3.5 Flash outperforms Gemini 3.1 Pro across the board. According to Google’s official announcement:

Terminal-Bench 2.1 (coding): 76.2%
MCP Atlas (multi-step agentic workflows): 83.6%
CharXiv Reasoning (multimodal understanding): 84.2%

On MCP Atlas — the benchmark that most closely models real agentic deployments — Gemini 3.5 Flash leads GPT-5.5 by 8.3 points (83.6% vs 75.3%) and Claude Opus 4.7 by 4.5 points (83.6% vs 79.1%). GPT-5.5 still edges Flash on raw terminal coding by 2 points, but for multi-step workflows, Flash is currently the leader among publicly available models.

This inverts the standard Flash/Pro trade-off. Google has historically positioned Flash as cheaper and faster at the cost of capability. For coding and agentic work specifically, that calculus no longer holds. There is no reason to reach for Gemini 3.1 Pro for these tasks.

Pricing: More Than Gemini 3 Flash, Less Than Pro

Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens. That is three times more expensive than Gemini 3 Flash ($0.50 / $3.00 per million), a jump that generated complaints in developer communities. The comparison that matters, however, is against what it replaces in practice. Gemini 3.1 Pro costs $2.50 / $15.00 per million tokens. Gemini 3.5 Flash is 40% cheaper than Pro and outperforms it on the benchmarks above. Artificial Analysis and other independent evaluators have confirmed this performance-to-cost positioning. Cached input tokens are $0.15 per million.

The model is available across the Gemini API, Google AI Studio, Antigravity, and Vertex AI. For agentic deployments, the Antigravity harness enables subagent orchestration — multiple agents running in parallel at 289 tokens per second, over a 1,048,576-token context window. For workflows that previously required Pro-tier calls, Flash is now the right choice on both cost and capability.

Migration Checklist

If you’re currently on gemini-3-flash-preview, here is what you need to do:

Update google-genai to v2.0.0 or later
Change the model name to gemini-3.5-flash
Replace thinking_budget with thinking_level — set it explicitly if you were relying on the old high default
Remove temperature, top_p, and top_k from your generation config
Add id and name to all FunctionResponse parts
Audit token usage — thought preservation is now on by default and will increase costs for multi-turn conversations

Google is also working on Gemini 3.5 Pro, which is already in internal deployment and expected in June 2026. The I/O 2026 developer highlights page covers additional platform updates including Antigravity, AI Studio, and managed agents shipping alongside the Flash release. Given what 3.5 Flash is already doing to Pro-tier benchmark numbers, the upcoming Pro release is worth watching.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.

Gemini 3.5 Flash: API Changes and What Developers Must Do

The API Has Breaking Changes

Flash Now Beats Pro on Developer Benchmarks

Pricing: More Than Gemini 3 Flash, Less Than Pro

Migration Checklist

GitHub Actions pull_request_target Flaw Exposed Grafana Code

CISA AWS GovCloud Keys Exposed on Public GitHub for 6 Months

Leave a reply Cancel reply

More in:News

EU AI Act August 2: What Developers Must Do Now

Linux 7.2-rc4: MongoDB Gets 30–100% Faster, and strncpy Is Finally Dead

Jellyfin Founder Left His Own Project. Governance Is Why.

Grok Build Goes Open Source After Secretly Uploading Your Code

GPT-5.6 Finds $500K WordPress Exploit in 10 Hours for $25

Kimi K2.7 Code Lands in GitHub Copilot: Open-Weight, Finally

Categories

The API Has Breaking Changes

Flash Now Beats Pro on Developer Benchmarks

Pricing: More Than Gemini 3 Flash, Less Than Pro

Migration Checklist

Share

You may also like

Leave a reply Cancel reply

More in:News

Categories

Latest Posts