Gemini 3.5 Pro API: Access, Pricing, and What to Do Now

Gemini 3.5 Pro is coming to GA in June 2026

At Google I/O on May 19, Sundar Pichai told the live audience to “give us until next month” before Gemini 3.5 Pro would be in their hands. The crowd audibly groaned. That month is now. Gemini 3.5 Pro — Google’s next frontier model positioned as the direct successor to Gemini Ultra — is in limited Vertex enterprise preview with general availability expected any day in June 2026. Here’s what developers building on Gemini APIs need to know before it lands.

What Gemini 3.5 Pro Is (and Why It’s Not Just “Bigger Flash”)

This distinction matters because Gemini 3.5 Flash already beats Gemini 3.1 Pro on 6 out of 8 published benchmarks — including coding and agentic tasks. Flash is faster, 25% cheaper, and wins on most everyday developer workloads. So what is Pro actually for?

Pro is built for the ceiling cases. It targets a 2 million token context window (double Flash’s 1M), Deep Think reasoning for iterative multi-path analysis, and frontier multimodal understanding across text, images, audio, and video. On Humanity’s Last Exam — a benchmark designed to test hard reasoning — Gemini 3.1 Pro scores 44.4% vs Flash’s 40.2%. That 4-point gap sounds small until your use case is legal due diligence or scientific synthesis where a hallucination has a real cost.

The 2M context window is the other differentiator. That’s roughly 1,500 pages of text or 30,000 lines of code in a single API call. Flash loses coherence on very long documents. Pro holds it.

How to Access Gemini 3.5 Pro Before GA

Two paths depending on who you are.

Enterprise (Google Cloud / Vertex AI): Open Vertex AI Model Garden, search “gemini-3.5-pro,” and request allowlist access through your Google Cloud account team. Gemini Enterprise customers should contact their CSM directly. Access is being extended to select enterprise customers now.

Individual developers: Watch aistudio.google.com — Google adds new models to the AI Studio model picker the moment the API is ready. No announcements, no press release: the model just appears. Also bookmark the official Gemini models page and the release changelog. Those are your two real-time signals.

One important caution: do not hardcode gemini-3.5-pro in production code yet. Google typically ships an initial release under a preview-suffixed identifier (e.g., gemini-3.5-pro-preview) before stabilizing the clean model name. The stable GA string follows weeks later. You can check what’s live right now with:

import google.generativeai as genai

for m in genai.list_models():
    if "3.5" in m.name and "pro" in m.name.lower():
        print(m.name)

Run that and you’ll know the exact model string the moment it goes live.

Flash vs Pro: The Decision Matrix

Most developers should stay on Flash for most workloads. Here’s where the line is:

Use Gemini 3.5 Flash when: you’re running agentic loops (Flash is 4x faster — roughly 25 seconds vs ~100 seconds for an 8-step agent loop), high-volume RAG pipelines, code generation, summarization, classification, or any workload where throughput matters. With context caching enabled, cached input drops to ~$0.15 per million tokens. At 10,000 queries a day, Flash with caching is viable. Pro at that scale is not.

Use Gemini 3.5 Pro when: you need to ingest a full codebase for architectural analysis, process a 200-page contract and cite specific clauses accurately, synthesize findings across 20 or more documents simultaneously, or run tasks where hallucinations are genuinely unacceptable. These are the use cases Flash’s reasoning ceiling cannot reliably serve.

Migration: Who Needs to Act

If you’re on Gemini 3.1 Pro: no action required yet. There is no deprecation notice. Stay where you are until Pro’s 2M context window or improved reasoning unlocks something specific for your stack — then migrate.

If you’re on Gemini Ultra: Pro is the designated successor. Google hasn’t published a migration guide yet — it comes with the model card at GA. Expect a model string swap. Watch the official deprecations page for timeline.

Recent deprecations already handled: gemini-2.0-flash shut down June 1 and gemini-3-pro-preview shut down March 9. If you were on either, you’ve already migrated.

Pricing: No Official Numbers, but Here’s What the Signals Say

Google has confirmed nothing on Pro pricing. The model card — with pricing, context specs, and the API model string — drops at GA. Based on what’s happened before: Gemini 3.5 Flash tripled in cost vs its predecessor while still beating Gemini 3.1 Pro on most benchmarks. Pro will be priced above Flash.

The practical floor for long-context Pro workloads depends heavily on context caching. If you’re sending the same system prompt or document repeatedly, the 90% input cache discount is the difference between a viable deployment and a bill that breaks the use case. Plan your caching strategy before you migrate.

Three Pages to Watch Right Now

ai.google.dev/gemini-api/docs/models — Gemini 3.5 Pro appears here the moment it’s live
ai.google.dev/gemini-api/docs/pricing — Pricing row is added at GA
ai.google.dev/gemini-api/docs/changelog — The announcement comes here first

Pichai promised June. We’re in it. Pro will land before the month is out — and when it does, you want a deployment plan ready, not a learning curve ahead of you.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.