
Google killed Project Mariner on May 4 — quietly, two weeks before I/O — and handed developers something better: the Gemini 2.5 Computer Use model, available in public preview through the Gemini API. No more experimental Labs product that disappears on a Tuesday. This is the productized version: a browser-control agent you can call from your code, deploy to Cloud Run, and bill at $1.25 per million input tokens.
What You’re Actually Getting
The Computer Use model (gemini-2.5-computer-use-preview-10-2025) is built on Gemini 2.5 Pro’s visual reasoning and runs through a structured agent loop. It’s not a magic black box — it’s an observable, debuggable cycle that your code drives:
- Send a goal, a screenshot of the current browser state, and a history of recent actions
- The model responds with a function_call: click(x, y), type("text"), scroll(), go_to_url()
- Your code executes the action via Playwright or Browserbase
- Capture a fresh screenshot, send it back as a function_response
- Repeat until the model signals it’s done
That loop is the whole thing. Every browser agent built on this API follows the same observe-think-act cycle. The advantage is that you control the executor, which means you control the environment, the retry logic, and the error handling.
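The cycle above can be sketched in a few lines of plain Python. This is a minimal sketch, not the reference implementation: `run_agent_loop`, `propose_action`, `execute_action`, and `capture_screenshot` are hypothetical names, and the model call is injected as a callback so the control flow is visible without any API dependency.

```python
from typing import Any, Callable, Optional

# Hypothetical shape for this sketch: an action is a (name, args) pair, and the
# model callback returns None once it considers the goal complete.
Action = tuple[str, dict[str, Any]]


def run_agent_loop(
    goal: str,
    propose_action: Callable[[str, bytes, list[Action]], Optional[Action]],
    execute_action: Callable[[Action], None],
    capture_screenshot: Callable[[], bytes],
    max_steps: int = 20,
) -> list[Action]:
    """Drive the observe-think-act cycle until the model stops or the cap is hit."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                    # observe
        action = propose_action(goal, screenshot, history)   # think
        if action is None:                                   # model signals it is done
            break
        execute_action(action)                               # act
        history.append(action)
    return history
```

Because the executor and the screenshot source are parameters, swapping a local browser for a hosted one only changes what gets passed in. The `max_steps` cap doubles as a guard against runaway loops.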
Two Ways to Run It
Google gives you two supported execution environments, and the architecture is pluggable — you can switch between them without rewriting your agent logic.
Local with Playwright: Install Chromium, set your resolution to 1440×900 (the model’s recommended viewport), and you’re running. This is the fastest path to a working prototype. The reference implementation lives at github.com/google-gemini/computer-use-preview.
Cloud Run with Browserbase: For production deployments, Cloud Run containers don’t have direct browser access, so you route through Browserbase — a cloud browser provider. Google has configured Cloud Run instances specifically for this setup. Browserbase also offers a ready-made template at github.com/browserbase/gemini-browser if you want a faster start.
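For the local Playwright path, the executor can be a small dispatcher that maps a proposed action onto page calls. The sketch below assumes the shorthand action names used earlier in this article (click, type, scroll, go_to_url) and pixel coordinates; the action names and coordinate convention the API actually returns should be checked against the documentation. `execute_action` and `capture_screenshot` are hypothetical helpers, and `page` is any object exposing Playwright's `mouse`, `keyboard`, `goto`, and `screenshot` interface.

```python
from typing import Any


def execute_action(page: Any, name: str, args: dict[str, Any]) -> None:
    """Apply one proposed action to a Playwright-style page object."""
    if name == "click":
        page.mouse.click(args["x"], args["y"])
    elif name == "type":
        page.keyboard.type(args["text"])
    elif name == "scroll":
        # Positive delta_y scrolls down; 600 px is an arbitrary default for this sketch.
        page.mouse.wheel(0, args.get("delta_y", 600))
    elif name == "go_to_url":
        page.goto(args["url"])
    else:
        # Fail loudly on anything the executor does not know how to perform.
        raise ValueError(f"Unsupported action: {name}")


def capture_screenshot(page: Any) -> bytes:
    """Return the current viewport as PNG bytes."""
    return page.screenshot(type="png")
```

With Playwright's sync API, a page at the recommended size would come from something like `browser.new_page(viewport={"width": 1440, "height": 900})`. A Browserbase-backed page could be passed to the same functions, provided it exposes the same interface.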
Here’s the minimum viable API call to get the loop started:
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# In the google-genai SDK, tools are passed through GenerateContentConfig,
# not as a direct keyword argument to generate_content.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER,
    ))]
)

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",
    contents="Find the top trending Python repositories on GitHub",
    config=config,
)
# response includes function_call (action) + safety_decision
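To act on a response, your code has to pull the proposed actions out of it. The helper below is a minimal sketch that assumes the usual google-genai response layout, where function calls sit on the parts of the first candidate's content. `extract_function_calls` is a hypothetical name, and the function uses duck typing so it can be exercised without a live API call.

```python
from typing import Any


def extract_function_calls(response: Any) -> list[tuple[str, dict[str, Any]]]:
    """Collect (name, args) pairs from the first candidate's parts."""
    calls: list[tuple[str, dict[str, Any]]] = []
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return calls
    content = getattr(candidates[0], "content", None)
    for part in getattr(content, "parts", None) or []:
        fc = getattr(part, "function_call", None)
        if fc is not None:  # text-only parts carry no function call
            calls.append((fc.name, dict(fc.args or {})))
    return calls
```

An empty list is a reasonable signal that the model has stopped proposing actions, which is where the loop would end.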
The Safety Model Is Baked In
Every action the model proposes comes with a safety_decision field from an internal per-step safety service. It returns either ALLOWED — proceed automatically — or REQUIRES_CONFIRMATION — stop and ask the user before executing.
This is configurable. Your system instructions can force confirmation before any sensitive action: form submissions, purchases, credential entry, clicks on destructive buttons. For any agent running with real user data or real accounts, this matters more than the benchmark numbers.
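On the executor side, the safety decision becomes a gate in front of every action. The sketch below uses the two values as described above (ALLOWED and REQUIRES_CONFIRMATION). The exact strings and where they appear in the response are assumptions to verify against the documentation, and `may_execute` and `ask_user` are hypothetical names.

```python
from typing import Callable


def may_execute(safety_decision: str, ask_user: Callable[[], bool]) -> bool:
    """Return True only if the proposed action is cleared to run."""
    if safety_decision == "ALLOWED":
        return True
    if safety_decision == "REQUIRES_CONFIRMATION":
        # Defer to a human; ask_user returns True only on explicit approval.
        return ask_user()
    # Unknown or missing values fail closed rather than open.
    return False
```

Failing closed on unrecognized values is a deliberate choice here: if the field changes shape during the preview, the agent stops instead of clicking through.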
Where It’s Already in Production
This isn’t vaporware. Google teams have been running versions of this model internally, and it powers three shipped products:
- Firebase App Testing Agent: Write a test goal in natural language (“Find a trip to Greece”). The agent navigates your app, simulates real user flows, and returns pass/fail results with visual playback — running on physical and virtual devices simultaneously.
- Google AI Mode in Search: Some of the agentic browsing behaviors in Search run on this model.
- Project Mariner: Ran on an earlier version before shutdown — giving this model real production history before the public preview.
The Firebase Testing Agent is the most accessible reference for developers thinking about QA automation use cases.
Price It Before You Ship It
The free tier doesn’t cover Computer Use: as of April 2026, Gemini 2.5 Pro, the model it is built on, is excluded from the free tier. You’re paying:
- Input: $1.25 per million tokens (prompts under 200K context), $2.50/M above that
- Output: $10.00 per million tokens
The catch: screenshots are expensive. A 1440×900 PNG encodes as thousands of tokens. A multi-step agent task with 20 screenshots can run up a real bill. Budget your agent runs before putting this in production, and add a hard cap on loop iterations to prevent runaway costs.
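A back-of-the-envelope estimator makes the budgeting concrete. The prices below are the ones listed above; everything else is an assumption for illustration: `estimate_run_cost` is a hypothetical helper, each step is treated as one request with a fixed token count, and the 200K boundary is applied per request.

```python
INPUT_PRICE_STANDARD = 1.25      # USD per million input tokens, prompts under 200K
INPUT_PRICE_LONG = 2.50          # USD per million input tokens above that
OUTPUT_PRICE = 10.00             # USD per million output tokens
LONG_PROMPT_THRESHOLD = 200_000  # tokens; boundary handling is an assumption


def estimate_run_cost(
    steps: int,
    input_tokens_per_step: int,
    output_tokens_per_step: int,
) -> float:
    """Estimate the USD cost of one agent run with uniform per-step token counts."""
    if input_tokens_per_step > LONG_PROMPT_THRESHOLD:
        input_price = INPUT_PRICE_LONG
    else:
        input_price = INPUT_PRICE_STANDARD
    input_cost = steps * input_tokens_per_step * input_price / 1_000_000
    output_cost = steps * output_tokens_per_step * OUTPUT_PRICE / 1_000_000
    return input_cost + output_cost
```

With placeholder figures of 5,000 input tokens and 100 output tokens per step, a 20-step run works out to 100,000 × $1.25/M + 2,000 × $10/M, or about $0.145. Real runs resend history and screenshots, so per-step input tends to grow. Measure actual token counts from the API's usage metadata before trusting any estimate.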
Computer Use vs. MCP: When to Use Which
The Mariner shutdown fed a narrative that browser agents lost to API-first approaches. That’s not quite right. Both coexist for good reasons.
Use MCP or structured APIs when the target system has one: they’re faster, cheaper (no screenshot tokens), and more reliable. Use Computer Use when it doesn’t: legacy enterprise UIs, government portals, third-party web apps with no REST endpoint. Plenty of systems people depend on still have no usable API. That’s the gap this model is built to fill.
Get Started
The full documentation is at ai.google.dev/gemini-api/docs/computer-use. If you’re working with the Agent Development Kit, the ADK Computer Use integration handles the loop scaffolding automatically. Start with the Playwright environment locally — fastest path to a working agent — then evaluate Browserbase or Cloud Run for production.
Project Mariner is gone. The capability it pointed at is now in your API client.













