Chrome Built-in AI: Gemma 197M Cuts Server Costs to Zero

Chrome 148 shipped the Prompt API to stable in May 2026. Every Chrome desktop user with qualifying hardware now has a local LLM available in the browser, reachable from a few lines of JavaScript with no API key, no server bill, and no network round trip. Then at Google I/O, Google introduced Gemma 197M, a 197-million-parameter model built specifically to power Chrome’s task-specific APIs. The on-device AI era on the web is not a roadmap item. It already shipped.

What Actually Landed

Three things happened in quick succession. Chrome 148 (May 5) promoted the Prompt API from origin trial to stable, making it available by default for any website targeting desktop Chrome. Chrome 149 (June 2) expanded Gemini Nano, the model behind the Prompt API, to five languages (English, Spanish, Japanese, German, and French), turning a US-centric feature into one that is useful in far more markets. And at Google I/O 2026, Google announced Gemma 197M, a much lighter model designed to power the Summarizer, Writer, Rewriter, and Proofreader APIs without the full weight of Gemini Nano.

These are not separate announcements. They are three legs of the same platform strategy: make AI cheap to run, fast to ship, and available at scale without involving Google’s servers.

The Developer Value

The practical case is straightforward. Cloud AI APIs charge per token. At any meaningful volume, such as a content platform processing article summaries or a SaaS tool running classification on user input, those costs stack up fast. Chrome’s built-in APIs charge nothing at runtime. The model runs on the user’s hardware, so the developer pays nothing for inference.
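To make the per-token argument concrete, here is a back-of-the-envelope sketch. The request volume, token count, and price are made-up placeholders, not any provider's real rates, so substitute your own numbers.

```javascript
// Hypothetical inputs: none of these figures come from a real price list.
function monthlyCloudCostUSD({ requestsPerDay, tokensPerRequest, usdPerMillionTokens, days = 30 }) {
  const tokensPerMonth = requestsPerDay * tokensPerRequest * days;
  return (tokensPerMonth / 1_000_000) * usdPerMillionTokens;
}

const cost = monthlyCloudCostUSD({
  requestsPerDay: 100_000,    // assumed traffic
  tokensPerRequest: 1_500,    // assumed article plus summary size
  usdPerMillionTokens: 0.5,   // assumed blended price
});

console.log(cost); // 2250
```

Under those assumptions the cloud bill is 2,250 USD a month, and it grows linearly with traffic. The on-device path keeps that term at zero for every request it can serve.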

Terra, a CMS platform, made the switch from a server-side LLM to Chrome’s Summarizer API. Their verdict: comparable quality, zero server cost, and simpler data governance since user text never leaves the device. Bright Sites runs the same API across 150+ publications including The Standard, generating personalized article summaries at no marginal cost. Trip.com uses it for AI flight booking overviews at scale.

These are not proofs-of-concept. They are production systems running on hardware that belongs to users.

How to Use It

The Prompt API lives on the LanguageModel global. Start by checking availability, then create a session:

const availability = await LanguageModel.availability();
// Returns: 'unavailable', 'downloadable', 'downloading', or 'available'

if (availability !== 'unavailable') {
  // If the model is not on the device yet, create() starts the download.
  const session = await LanguageModel.create({
    initialPrompts: [
      { role: 'system', content: 'You are a concise code reviewer.' }
    ]
  });

  const result = await session.prompt('Review this function for bugs: ...');
  console.log(result);

  session.destroy(); // Free the model's resources once the session is no longer needed.
}
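Because the first call on a device may have to download the model, it can help to surface progress to the user. The sketch below assumes the `monitor` option and `downloadprogress` event behave as described in Chrome's documentation, where `e.loaded` is a fraction between 0 and 1. The `LM` parameter and the `onProgress` callback are hypothetical names introduced so the helper can be exercised with a stand-in object outside the browser.

```javascript
// LM is expected to look like the LanguageModel global (assumption for this sketch).
// onProgress is a hypothetical callback that receives a fraction between 0 and 1.
async function createSessionWithProgress(LM, onProgress) {
  const availability = await LM.availability();
  if (availability === 'unavailable') return null; // caller should use another path

  return LM.create({
    monitor(m) {
      // Fires while the model downloads; does not fire if it is already on disk.
      m.addEventListener('downloadprogress', (e) => onProgress(e.loaded));
    },
  });
}
```

In a page you would call `createSessionWithProgress(LanguageModel, (f) => console.log(Math.round(f * 100) + '%'))`, ideally from a click handler, since Chrome may require a user gesture before it starts a download.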

For structured use cases, the Summarizer API (powered by Gemma 197M) has a cleaner interface:

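// Options are fixed when the summarizer is created.
const summarizer = await Summarizer.create({
  type: 'key-points',   // 'key-points' | 'tldr' | 'teaser' | 'headline'
  format: 'markdown',   // 'markdown' | 'plain-text'
  length: 'short'       // 'short' | 'medium' | 'long'
});

// Resolves to the summary as a string.
const summary = await summarizer.summarize(articleText);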

TypeScript developers can install @types/dom-chromium-ai for full type coverage. The official Prompt API documentation also provides a polyfill for local testing without Chrome flags.
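The Summarizer snippet above assumes the API exists and the model is ready. A slightly more defensive wrapper, sketched here, feature-detects the global, checks availability, and returns `null` when the on-device path cannot be used. The function name and the injectable `scope` parameter are hypothetical conveniences for this sketch, not part of the API.

```javascript
// scope defaults to globalThis; passing a stand-in object makes the helper easy to test.
async function summarizeIfAvailable(text, options, scope = globalThis) {
  if (!('Summarizer' in scope)) return null; // browser does not expose the API

  const availability = await scope.Summarizer.availability();
  if (availability === 'unavailable') return null; // hardware or policy rules it out

  const summarizer = await scope.Summarizer.create(options);
  try {
    return await summarizer.summarize(text);
  } finally {
    summarizer.destroy(); // release the model even if summarize() throws
  }
}
```

A caller can treat `null` as the signal to fall back to another path, for example `summarizeIfAvailable(articleText, { type: 'tldr', length: 'short' })`.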

What Gemma 197M Actually Is

Gemma 197M is not a cut-down version of Gemini Nano. It is a separate, purpose-built model with 197 million parameters — roughly 15x smaller than the 3B Gemini Nano. It is trained specifically for structured, task-bounded outputs: summarize this, rewrite this, check this for errors. It does not attempt open-ended reasoning.

The practical result is that it runs on more devices than Gemini Nano, with faster inference for the APIs it powers. For developers building summarization, rewriting, or proofreading features, Gemma 197M is the better fit, not because it is smarter, but because it is reliable within its narrow domain.

The Honest Limits

This is Chrome-only. Firefox and Safari have no equivalent. The Prompt API and Summarizer API require either a GPU with more than 4GB of VRAM or at least 16GB of RAM with 4 CPU cores. They also need 22GB of free disk space for model files. Chrome on Android and iOS is not supported, and the APIs do not work in Incognito mode.
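The hardware rule is easy to misread, so here it is as a predicate. This is only an illustration of the thresholds stated above: a web page cannot read these values directly, and the `device` fields are hypothetical inputs. In practice the availability check is what tells you whether a given machine qualifies.

```javascript
// Encodes the thresholds from the text; the input object is hypothetical.
function meetsHardwareGate({ vramGB, ramGB, cpuCores, freeDiskGB }) {
  const gpuPath = vramGB > 4;                    // strictly more than 4 GB of VRAM
  const cpuPath = ramGB >= 16 && cpuCores >= 4;  // at least 16 GB RAM and 4 cores
  const hasDisk = freeDiskGB >= 22;              // room for the model files
  return hasDisk && (gpuPath || cpuPath);
}

console.log(meetsHardwareGate({ vramGB: 6, ramGB: 8, cpuCores: 2, freeDiskGB: 30 }));  // true
console.log(meetsHardwareGate({ vramGB: 4, ramGB: 8, cpuCores: 8, freeDiskGB: 30 }));  // false
```

Note that exactly 4GB of VRAM does not pass the GPU path, and that either path still fails without enough free disk space.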

There is also a privacy concern worth naming. Chrome downloads the 4GB Gemini Nano model silently, without a user notification, on any qualifying device. Google also quietly removed the privacy statement from Chrome’s settings UI that previously promised processing stays on-device. For enterprise or privacy-sensitive deployments, both facts belong in the product decision.

The right mental model: Chrome’s built-in AI is not a replacement for cloud APIs. It is a complement for high-volume, low-complexity use cases on desktop Chrome. Sentiment analysis, document summarization, draft headline generation, quick classification — it handles these well. Anything requiring frontier-level reasoning or mobile support does not belong here yet.
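That mental model can be written down as a small routing rule. The task labels and the function name below are hypothetical, chosen to mirror the use cases listed above; a real application would define its own categories.

```javascript
// Task labels are hypothetical names for the use cases listed in the text.
const LOCAL_TASKS = new Set(['sentiment', 'summarize', 'headline', 'classify']);

function chooseBackend({ task, isDesktopChrome, availability }) {
  // No on-device path on mobile, other browsers, or unsupported hardware.
  if (!isDesktopChrome || availability === 'unavailable') return 'cloud';
  // Keep bounded, high-volume tasks local; send everything else to the cloud.
  return LOCAL_TASKS.has(task) ? 'local' : 'cloud';
}

console.log(chooseBackend({ task: 'summarize', isDesktopChrome: true, availability: 'available' }));            // local
console.log(chooseBackend({ task: 'multi-step-reasoning', isDesktopChrome: true, availability: 'available' })); // cloud
console.log(chooseBackend({ task: 'summarize', isDesktopChrome: false, availability: 'unavailable' }));         // cloud
```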

The Bottom Line

The Prompt API going stable in Chrome 148 is the kind of announcement that gets buried in a week dominated by WWDC and a dozen other releases. It should not be. For web developers who have been treating AI features as cloud-only infrastructure, Chrome just handed them a free inference layer that scales to any user volume without a bill. Gemma 197M makes that layer faster and more reliable for task-specific work. The hardware gate is real, but so is the cost floor: zero.

If you are building a content platform, a developer tool, or any web app that processes user-generated text at volume, add availability checks and a Chrome built-in AI path today. The Google I/O 2026 guide on building with built-in AI is the right starting point. Your cloud API handles the fallback.
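One way to wire the two paths together is a wrapper that tries the on-device route first and falls back to the cloud when it returns nothing or throws. This is a minimal sketch: `localSummarize` and `cloudSummarize` are hypothetical functions you would supply, the first wrapping Chrome's API and the second wrapping your existing server call.

```javascript
// Both functions are supplied by the caller (hypothetical names for this sketch).
// localSummarize should resolve to null when the on-device path is not usable.
async function summarizeWithFallback(text, { localSummarize, cloudSummarize }) {
  try {
    const local = await localSummarize(text);
    if (local != null) return { source: 'local', summary: local };
  } catch (err) {
    // Any on-device failure (missing API, failed download) falls through to the cloud.
    console.warn('On-device path failed, using cloud:', err);
  }
  return { source: 'cloud', summary: await cloudSummarize(text) };
}
```

Returning the `source` alongside the summary makes it easy to log how much traffic the on-device path actually absorbs, which is the number that determines the savings.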

ByteBot
