
Gemini 3.5 Flash: API Changes and What Developers Must Do

Gemini 3.5 Flash launched at Google I/O 2026

Google shipped Gemini 3.5 Flash at I/O 2026 on May 19, and the headline is not what you’d expect from a model with “Flash” in the name. It outperforms Gemini 3.1 Pro on coding and agentic benchmarks, runs at 289 tokens per second — four times faster than comparable frontier models — and arrives with a set of breaking API changes that will affect any app currently using gemini-3-flash-preview. If you’re running on that model string, you have work to do.

The API Has Breaking Changes

The most important update is one that will silently degrade your application if you don’t catch it. Google changed the thinking parameter from an integer (thinking_budget) to a string enum (thinking_level), and the default shifted from high to medium. A naive swap of the model name won’t preserve your previous behavior.

The full list of changes:

  • Model name: changes from gemini-3-flash-preview to gemini-3.5-flash
  • thinking_budget (integer) is replaced by thinking_level: minimal, low, medium (default), or high
  • temperature, top_p, and top_k are no longer recommended — remove them from your generation config
  • All FunctionResponse parts now require an id field and a matching name
  • Thought preservation is on by default — reasoning context now carries forward across turns, which helps performance but increases token usage
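
The FunctionResponse requirement is the easiest item on this list to miss, so a rough sketch may help. It uses plain dictionaries rather than SDK types, and the part layout (a `function_response` object carrying `id`, `name`, and `response`) and the helper name `build_function_response` are assumptions for illustration. Check the migration docs for the exact shape the SDK expects.

```python
from typing import Any, Dict


def build_function_response(call: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response part that echoes the id and name of the call it answers.

    `call` is a hypothetical dict form of a model-issued function call.
    """
    missing = [key for key in ("id", "name") if not call.get(key)]
    if missing:
        # Fail early instead of sending a response the API would reject.
        raise ValueError(f"function call is missing required fields: {missing}")
    return {
        "function_response": {
            "id": call["id"],      # must match the id of the originating call
            "name": call["name"],  # must match the function that was called
            "response": result,
        }
    }


# Example: answer a (hypothetical) weather tool call.
call = {"id": "call-1", "name": "get_weather", "args": {"city": "Oslo"}}
part = build_function_response(call, {"temp_c": 7})
print(part)
```

Copying both fields from the call, rather than retyping them, keeps each response paired with its request when the model issues several calls in one turn.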

You also need to update your SDK. Google recommends google-genai v2.0.0 or later, which introduces its own breaking changes to the Interactions API. Don’t migrate the model name without also bumping the SDK. Full details are in the official Gemini 3.5 Flash migration docs.
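
One way to enforce the "bump the SDK first" rule is a startup guard that refuses to run against an older install. The sketch below is a minimal version of that idea using only the standard library. The helper names are hypothetical, and the lenient parser treats pre-release builds such as `2.0.0rc1` as `2.0.0`, which may or may not be what you want.

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional, Tuple

MIN_SDK = "2.0.0"  # minimum google-genai version named in this article


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' leniently, ignoring pre-release suffixes."""
    numbers = []
    for piece in text.split(".")[:3]:
        leading = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(leading) if leading else 0)
    while len(numbers) < 3:
        numbers.append(0)  # treat '2.0' as '2.0.0'
    return tuple(numbers)


def installed_sdk_version() -> Optional[str]:
    """Return the installed google-genai version, or None if it is absent."""
    try:
        return metadata.version("google-genai")
    except metadata.PackageNotFoundError:
        return None


def sdk_is_new_enough(installed: Optional[str], minimum: str = MIN_SDK) -> bool:
    """True when the installed version is at or above the minimum."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(minimum)


if not sdk_is_new_enough(installed_sdk_version()):
    print(f"google-genai >= {MIN_SDK} is required before switching models")
```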

Here’s what a minimal migration looks like in Python:

from google import genai

client = genai.Client()  # reads the API key from the environment
prompt = "Explain what changed in this release."  # placeholder prompt

# Before (gemini-3-flash-preview)
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=prompt,
    config=genai.types.GenerateContentConfig(
        # Integer token budget for thinking
        thinking_config=genai.types.ThinkingConfig(thinking_budget=2048),
        temperature=0.7,
    ),
)

# After (gemini-3.5-flash + SDK v2.0.0)
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=prompt,
    config=genai.types.GenerateContentConfig(
        # String enum replaces the integer budget
        thinking_config=genai.types.ThinkingConfig(thinking_level="low"),
        # temperature removed: no longer recommended
    ),
)
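
If you have many call sites, it may be easier to migrate configs mechanically than by hand. The following is a sketch only, operating on plain dictionaries rather than SDK objects. The budget thresholds in `budget_to_level` are illustrative guesses, not an official mapping (the only data point from the example above is 2048 becoming low), and both helper names are hypothetical.

```python
from typing import Any, Dict

LEGACY_SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def budget_to_level(budget: int) -> str:
    """Map an old integer budget to a thinking level (thresholds are assumptions)."""
    if budget <= 0:
        return "minimal"
    if budget <= 2048:
        return "low"
    if budget <= 8192:
        return "medium"
    return "high"


def migrate_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config dict without legacy sampling keys or thinking_budget."""
    new = {k: v for k, v in old.items() if k not in LEGACY_SAMPLING_KEYS}
    thinking = dict(new.pop("thinking_config", {}))  # copy so `old` is untouched
    budget = thinking.pop("thinking_budget", None)
    if budget is not None:
        thinking["thinking_level"] = budget_to_level(budget)
    else:
        # The default moved from high to medium, so pin the old default explicitly.
        thinking.setdefault("thinking_level", "high")
    new["thinking_config"] = thinking
    return new


old = {"thinking_config": {"thinking_budget": 2048}, "temperature": 0.7}
print(migrate_config(old))
```

Pinning high when no budget was set is a deliberate choice here: it preserves previous behavior first, and you can lower the level afterwards once you have measured the effect.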

One nuance worth flagging: low does not mean “skip reasoning.” Google retuned the low setting specifically for code and agentic tasks. It is a real thinking level, not a hint to go fast. If your application is generating excessive tool calls after the migration, reduce the thinking level first, then add a system instruction to constrain tool usage if the problem persists.

Flash Now Beats Pro on Developer Benchmarks

The more interesting story is the performance data. On the benchmarks most relevant to developers, Gemini 3.5 Flash outperforms Gemini 3.1 Pro across the board. According to Google’s official announcement:

  • Terminal-Bench 2.1 (coding): 76.2%
  • MCP Atlas (multi-step agentic workflows): 83.6%
  • CharXiv Reasoning (multimodal understanding): 84.2%

On MCP Atlas — the benchmark that most closely models real agentic deployments — Gemini 3.5 Flash leads GPT-5.5 by 8.3 points (83.6% vs 75.3%) and Claude Opus 4.7 by 4.5 points (83.6% vs 79.1%). GPT-5.5 still edges Flash on raw terminal coding by 2 points, but for multi-step workflows, Flash is currently the leader among publicly available models.

This inverts the standard Flash/Pro trade-off. Google has historically positioned Flash as cheaper and faster at the cost of capability. For coding and agentic work specifically, that calculus no longer holds: on these benchmarks, there is little reason to reach for Gemini 3.1 Pro.

Pricing: More Than Gemini 3 Flash, Less Than Pro

Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens, with cached input tokens at $0.15 per million. That is three times the price of Gemini 3 Flash ($0.50 / $3.00 per million), a jump that generated complaints in developer communities. The comparison that matters, however, is against what it replaces in practice. Gemini 3.1 Pro costs $2.50 / $15.00 per million tokens, so Gemini 3.5 Flash is 40% cheaper than Pro on both input and output while outperforming it on the benchmarks above. Artificial Analysis and other independent evaluators have confirmed this performance-to-cost positioning.
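
To see what those list prices mean for a real bill, it helps to run them against your own traffic. The sketch below uses only the per-million prices quoted above. The workload (10 million input tokens and 2 million output tokens) is an invented example, and caching discounts are ignored.

```python
# (input USD, output USD) per million tokens, as quoted in this article
PRICES = {
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.50, 15.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at list price, ignoring cached-input discounts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 10_000_000, 2_000_000):.2f}")
```

For this workload the totals come to $11.00, $33.00, and $55.00 respectively. Because both input and output prices scale by the same factor, the 3x and 40% figures hold for any mix of input and output tokens.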

The model is available across the Gemini API, Google AI Studio, Antigravity, and Vertex AI. For agentic deployments, the Antigravity harness enables subagent orchestration — multiple agents running in parallel at 289 tokens per second, over a 1,048,576-token context window. For workflows that previously required Pro-tier calls, Flash is now the right choice on both cost and capability.

Migration Checklist

If you’re currently on gemini-3-flash-preview, here is what you need to do:

  1. Update google-genai to v2.0.0 or later
  2. Change the model name to gemini-3.5-flash
  3. Replace thinking_budget with thinking_level — set it explicitly if you were relying on the old high default
  4. Remove temperature, top_p, and top_k from your generation config
  5. Add id and name to all FunctionResponse parts
  6. Audit token usage — thought preservation is now on by default and will increase costs for multi-turn conversations
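
For step 6, a back-of-the-envelope model can show roughly how thought preservation compounds over a conversation. This is a simplified sketch, not how Google bills: it assumes every earlier turn is resent as input on each new turn, and that preserved thoughts count toward input at full size. The per-turn token counts are invented.

```python
def cumulative_input_tokens(
    turns: int,
    user_tokens: int,
    reply_tokens: int,
    thought_tokens: int,
    preserve_thoughts: bool,
) -> int:
    """Total input tokens sent across a conversation under the assumptions above."""
    carried_per_turn = user_tokens + reply_tokens
    if preserve_thoughts:
        carried_per_turn += thought_tokens  # assumption: thoughts are resent in full
    total = 0
    history = 0
    for _ in range(turns):
        total += history + user_tokens  # prior context plus the new user message
        history += carried_per_turn     # this turn joins the context for later turns
    return total


# Hypothetical chat: 10 turns, 200 user / 300 reply / 500 thought tokens per turn.
without = cumulative_input_tokens(10, 200, 300, 500, preserve_thoughts=False)
with_thoughts = cumulative_input_tokens(10, 200, 300, 500, preserve_thoughts=True)
print(without, with_thoughts)
```

Under these assumptions the ten-turn chat goes from 24,500 to 47,000 input tokens. Substitute the counts from your own usage metadata to get a realistic estimate before and after the migration.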

Google is also working on Gemini 3.5 Pro, which is already in internal deployment and expected in June 2026. The I/O 2026 developer highlights page covers additional platform updates including Antigravity, AI Studio, and managed agents shipping alongside the Flash release. Given what 3.5 Flash is already doing to Pro-tier benchmark numbers, the upcoming Pro release is worth watching.

ByteBot
I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.
