Gemini 2.5 Computer Use Is Now an API. Here’s How to Build with It.


Gemini 2.5 Computer Use: Building browser agents with the Gemini API

Google killed Project Mariner on May 4, quietly and two weeks before I/O, and handed developers something better: the Gemini 2.5 Computer Use model, available in public preview through the Gemini API. No more experimental Labs product that disappears on a Tuesday. This is the productized version: a browser-control agent you can call from your code, deploy to Cloud Run, and pay for starting at $1.25 per million input tokens.

What You’re Actually Getting

The Computer Use model (gemini-2.5-computer-use-preview-10-2025) is built on Gemini 2.5 Pro’s visual reasoning and runs through a structured agent loop. It’s not a magic black box — it’s an observable, debuggable cycle that your code drives:

Send a goal, a screenshot of the current browser state, and a history of recent actions
The model responds with a function_call naming one UI action, such as click_at(x, y), type_text_at(x, y, text), scroll_document(direction), or navigate(url)
Your code executes the action via Playwright or Browserbase
Capture a fresh screenshot, send it back as a function_response
Repeat until the model replies with text instead of a function_call, which signals it’s done

That loop is the whole thing. Every browser agent built on this API follows the same observe-think-act cycle. The advantage is that you control the executor, which means you control the environment, the retry logic, and the error handling.
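The cycle above can be sketched in plain Python. This is a minimal, SDK-agnostic sketch, not the reference implementation. propose_action, execute and capture_screenshot are hypothetical callables that you would back with the Gemini API and with Playwright or Browserbase. A None return from propose_action stands in for the model's final, call-free reply. The max_steps argument is the hard iteration cap that the pricing section below recommends.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Action:
    """A proposed UI action, e.g. name="click_at", args={"x": 500, "y": 300}."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class Turn:
    action: Action
    screenshot: bytes


def run_agent(
    goal: str,
    propose_action: Callable[[str, bytes, list[Turn]], Optional[Action]],
    execute: Callable[[Action], None],
    capture_screenshot: Callable[[], bytes],
    max_steps: int = 20,
) -> list[Turn]:
    """Drive the observe-think-act loop until the model stops proposing actions."""
    history: list[Turn] = []
    screenshot = capture_screenshot()                      # observe initial state
    for _ in range(max_steps):                             # hard cap on iterations
        action = propose_action(goal, screenshot, history) # think
        if action is None:                                 # final text reply: done
            return history
        execute(action)                                    # act in the browser
        screenshot = capture_screenshot()                  # observe the result
        history.append(Turn(action, screenshot))
    raise RuntimeError(f"Stopped after {max_steps} steps without finishing")


# Usage with scripted stand-ins instead of a real model and browser:
script = [Action("navigate", {"url": "https://github.com/trending/python"}), None]
log: list[Action] = []
turns = run_agent(
    "Find trending Python repositories",
    propose_action=lambda goal, shot, hist: script[len(hist)],
    execute=log.append,
    capture_screenshot=lambda: b"fake-png",
)
print(len(turns), log[0].name)  # 1 navigate
```

Because the loop only sees callables, retries, logging and error handling can be wrapped around execute without changing run_agent.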

Two Ways to Run It

Google gives you two supported execution environments, and the architecture is pluggable — you can switch between them without rewriting your agent logic.

Local with Playwright: Install Chromium, set your resolution to 1440×900 (the model’s recommended viewport), and you’re running. This is the fastest path to a working prototype. The reference implementation lives at github.com/google-gemini/computer-use-preview.
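The local setup has to get coordinate scaling right. As I read the Computer Use documentation, action coordinates arrive on a normalized grid scaled to 1000, not in screen pixels. Under that assumption, a small helper converts them to the 1440×900 viewport before they reach Playwright. The name denormalize is hypothetical, and the Playwright lines in the comment are illustrative only.

```python
SCREEN_WIDTH, SCREEN_HEIGHT = 1440, 900  # the viewport recommended above


def denormalize(x: int, y: int,
                width: int = SCREEN_WIDTH,
                height: int = SCREEN_HEIGHT) -> tuple[int, int]:
    """Convert model coordinates on an assumed 0-1000 grid to pixel coordinates."""
    if not (0 <= x <= 1000 and 0 <= y <= 1000):
        raise ValueError(f"coordinates outside the 0-1000 grid: ({x}, {y})")
    # Scale to the viewport, clamping so 1000 maps to the last pixel, not past it.
    px = min(int(x / 1000 * width), width - 1)
    py = min(int(y / 1000 * height), height - 1)
    return px, py


# Illustrative Playwright (sync API) usage, not executed here:
#   px, py = denormalize(args["x"], args["y"])
#   page.mouse.click(px, py)
print(denormalize(500, 500))  # (720, 450)
```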

Cloud Run with Browserbase: For production deployments, Cloud Run containers don’t have direct browser access, so you route through Browserbase — a cloud browser provider. Google has configured Cloud Run instances specifically for this setup. Browserbase also offers a ready-made template at github.com/browserbase/gemini-browser if you want a faster start.
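The pluggability comes from keeping the agent loop unaware of where the browser lives. The sketch below assumes a hypothetical BrowserExecutor protocol that a Playwright class and a Browserbase class would each implement. The dispatch table uses action names from the Gemini documentation (click_at, type_text_at, navigate), and it rejects any action outside the table rather than guessing. RecordingExecutor is a test double, not a real backend. Coordinates are assumed to be converted to pixels already.

```python
from typing import Any, Protocol


class BrowserExecutor(Protocol):
    """Hypothetical interface; Playwright and Browserbase backends would implement it."""
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, x: int, y: int, text: str) -> None: ...
    def navigate(self, url: str) -> None: ...
    def screenshot(self) -> bytes: ...


def dispatch(executor: BrowserExecutor, name: str, args: dict[str, Any]) -> bytes:
    """Run one model-proposed action on any backend and return a fresh screenshot."""
    if name == "click_at":
        executor.click(args["x"], args["y"])
    elif name == "type_text_at":
        executor.type_text(args["x"], args["y"], args["text"])
    elif name == "navigate":
        executor.navigate(args["url"])
    else:
        raise NotImplementedError(f"unsupported action: {name}")
    return executor.screenshot()


class RecordingExecutor:
    """Test double that logs calls instead of driving a browser."""
    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def type_text(self, x: int, y: int, text: str) -> None:
        self.calls.append(("type", x, y, text))

    def navigate(self, url: str) -> None:
        self.calls.append(("navigate", url))

    def screenshot(self) -> bytes:
        return b"png-bytes"


executor = RecordingExecutor()
dispatch(executor, "navigate", {"url": "https://github.com/trending"})
dispatch(executor, "click_at", {"x": 720, "y": 450})
print(executor.calls)
```

Swapping local Playwright for Browserbase then means passing a different executor object. The loop and the dispatch table stay untouched.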

Here’s the minimum viable API call to get the loop started:

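from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Computer Use is enabled as a tool inside the request config,
# not passed as a direct argument to generate_content.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER
    ))]
)

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",
    contents="Find the top trending Python repositories on GitHub",
    config=config,
)

# The first candidate's parts carry the proposed action as a function_call;
# the call's args can also carry a safety_decision.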
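To keep the loop going, your code has to pull the proposed action out of that response. In the google-genai SDK, the action arrives as a function_call on one of the parts of the first candidate, with a name and an args mapping. The helper below is a sketch that reads only those attributes, so it also works on simple stand-in objects. The name extract_actions is hypothetical. The stand-in response is made up for illustration and is not real model output.

```python
from types import SimpleNamespace
from typing import Any


def extract_actions(response: Any) -> list[tuple[str, dict[str, Any]]]:
    """Return (name, args) for every function_call in the first candidate.

    An empty list means the model answered in text only, i.e. it is done.
    """
    if not response.candidates:
        return []
    content = response.candidates[0].content
    if content is None:
        return []
    actions = []
    for part in content.parts or []:
        call = getattr(part, "function_call", None)
        if call is not None:
            actions.append((call.name, dict(call.args or {})))
    return actions


# Stand-in shaped like a google-genai response (illustrative values only).
fake = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="Opening GitHub trending.", function_call=None),
    SimpleNamespace(function_call=SimpleNamespace(
        name="navigate", args={"url": "https://github.com/trending/python"})),
]))])
print(extract_actions(fake))
```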

The Safety Model Is Baked In

Every action the model proposes comes with a safety_decision field from an internal per-step safety service. It returns either ALLOWED — proceed automatically — or REQUIRES_CONFIRMATION — stop and ask the user before executing.

This is configurable. Your system instructions can force confirmation before any sensitive action: form submissions, purchases, credential entry, clicks on destructive buttons. For any agent running with real user data or real accounts, this matters more than the benchmark numbers.
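You can also enforce that policy on the client side, before any action runs, as a complement to system instructions rather than a replacement. This is a minimal sketch. It assumes the safety decision sits in the function call's args under a safety_decision key, either as a dict with a decision value or as a bare string. The exact field layout and value spellings are assumptions, so check them against the current documentation. SENSITIVE_ACTIONS and ask_user are hypothetical: the first is your own list of actions that always need a human, the second is your own prompt UI.

```python
from typing import Any, Callable

# Assumed spellings of the "ask the user first" decision.
CONFIRM_VALUES = {"require_confirmation", "requires_confirmation"}

# Hypothetical policy: confirm all typing, e.g. to guard credential entry.
SENSITIVE_ACTIONS = {"type_text_at"}


def _decision_value(args: dict[str, Any]) -> str:
    """Read the safety decision whether it is a dict or a bare string."""
    decision = args.get("safety_decision")
    if isinstance(decision, dict):
        decision = decision.get("decision", "")
    return str(decision or "").lower()


def needs_confirmation(name: str, args: dict[str, Any]) -> bool:
    """True if the model flagged the action or local policy marks it sensitive."""
    return _decision_value(args) in CONFIRM_VALUES or name in SENSITIVE_ACTIONS


def gate(name: str, args: dict[str, Any], ask_user: Callable[[str], bool]) -> bool:
    """Return True if the action may run, asking the user only when required."""
    if not needs_confirmation(name, args):
        return True
    decision = args.get("safety_decision")
    explanation = decision.get("explanation", "") if isinstance(decision, dict) else ""
    return ask_user(f"Allow {name}? {explanation}".strip())


# In a CLI, ask_user could be: lambda msg: input(msg + " [y/N] ").lower() == "y"
```

Keeping the gate outside the model means a missing or misread safety_decision still falls back to your own rules.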

Where It’s Already in Production

This isn’t vaporware. Google teams have been running versions of this model internally, and it powers three shipped products:

Firebase App Testing Agent: Write a test goal in natural language (“Find a trip to Greece”). The agent navigates your app, simulates real user flows, and returns pass/fail results with visual playback — running on physical and virtual devices simultaneously.
Google AI Mode in Search: Some of the agentic browsing behaviors in Search run on this model.
Project Mariner: Ran on an earlier version before shutdown — giving this model real production history before the public preview.

The Firebase Testing Agent is the most accessible reference for developers thinking about QA automation use cases.

Price It Before You Ship It

The free tier doesn’t cover Computer Use. As of April 2026, Gemini 2.5 Pro is excluded from the free tier. You’re paying:

Input: $1.25 per million tokens (prompts under 200K context), $2.50/M above that
Output: $10.00 per million tokens

The catch: screenshots are expensive. Gemini bills images by cutting them into tiles of 258 tokens each, so a single 1440×900 screenshot costs on the order of a thousand input tokens. A multi-step agent task with 20 screenshots can run up a real bill. Budget your agent runs before putting this in production, and add a hard cap on loop iterations to prevent runaway costs.

Computer Use vs. MCP: When to Use Which

The Mariner shutdown fed a narrative that browser agents lost to API-first approaches. That’s not quite right. Both coexist for good reasons.

Use MCP or structured APIs when the target system has one — they’re faster, cheaper (no screenshot tokens), and more reliable. Use Computer Use when it doesn’t: legacy enterprise UIs, government portals, third-party web apps with no REST endpoint. For every system that’s API-accessible, there are three that aren’t. That’s the gap this model is built to fill.

Get Started

The full documentation is at ai.google.dev/gemini-api/docs/computer-use. If you’re working with the Agent Development Kit, the ADK Computer Use integration handles the loop scaffolding automatically. Start with the Playwright environment locally, the fastest path to a working agent, then move to Cloud Run with Browserbase for production.

Project Mariner is gone. The capability it pointed at is now in your API client.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.