An AI agent burned $4,200 in 63 hours last month. Another hit $47,000 from an infinite tool loop that nobody caught until the billing alert fired. These aren’t cautionary tales from amateurs — 88% of enterprise agents that pass demo review fail in production, and the failure modes are almost always the same: a transient error that doesn’t retry, a quota hit that crashes instead of switching models, a destructive tool call that executes without a confirmation gate. On May 14, Google shipped Genkit Middleware — and it addresses all three of those failures directly.
Why Agents Fail in Production
The core problem is where failures happen in an agent’s execution loop. Every generate() call in Genkit runs a tool loop: model produces output, tools execute, results feed back into the next model call, repeat until done. A single transient rate-limit error anywhere in that loop crashes the entire run. Multi-step agents burn roughly 50x more tokens than linear chat calls — by step 20, a single tool loop turn can exceed 50,000 input tokens. Without middleware, you’re writing retry logic from scratch, implementing fallbacks manually, and hoping your tool calls don’t mutate production data without a confirmation step.
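To make the loop concrete, here is a minimal sketch of the cycle described above. It is an illustration only, not Genkit's implementation: `ModelFn`, `ToolFn`, and `runToolLoop` are hypothetical names, and the loop is synchronous for brevity where real model and tool calls would be async.

```typescript
// Hypothetical types for illustration; these are not Genkit's own interfaces.
type ToolCall = { name: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type ModelFn = (history: string[]) => ModelTurn;
type ToolFn = (input: string) => string;

// Runs model -> tools -> model until the model stops requesting tools.
function runToolLoop(
  model: ModelFn,
  tools: Record<string, ToolFn>,
  prompt: string,
  maxTurns = 10,
): { text: string; turns: number } {
  const history: string[] = [prompt];
  for (let turn = 1; turn <= maxTurns; turn++) {
    // Any error thrown here (for example a rate limit) aborts the whole run.
    const out = model(history);
    if (out.toolCalls.length === 0) return { text: out.text, turns: turn };
    for (const call of out.toolCalls) {
      const tool = tools[call.name];
      if (!tool) throw new Error(`unknown tool: ${call.name}`);
      // Tool results are appended, so the model input grows with every turn.
      history.push(`${call.name} -> ${tool(call.input)}`);
    }
  }
  // Hard cap on turns: a guard against a runaway tool loop.
  throw new Error(`no final answer after ${maxTurns} turns`);
}
```

Two things stand out even in a toy version: nothing catches an error from the model call, and the history only ever grows, which is why late turns are so much more expensive than early ones.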
Middleware solves this by intercepting the loop at three programmable layers before any of that goes wrong.
The Three Hook Layers
Genkit Middleware attaches at three points in the generation cycle:
- Generate hooks: conversation-level logic; inject context or modify the message envelope before it hits the model
- Model hooks: wrap individual model API calls; the right place for retry and fallback logic
- Tool hooks: intercept tool execution; where approval gates and sandboxing live
This separation matters. Retry logic belongs at the model layer so only the API call retries, not the full tool loop. Approval gates belong at the tool layer so they fire on every tool call, not just once per conversation turn. Getting this architecture right from scratch takes days; Genkit ships it wired correctly by default.
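One way to picture the three layers is as three optional hooks on a single middleware object. The sketch below is a toy model under stated assumptions: the `Middleware` interface, `tracer`, and `simulateRun` are hypothetical and do not reflect Genkit's real types. It only shows how often each layer would fire in a run with two model calls and two tool calls.

```typescript
// Hypothetical hook shapes for illustration; not Genkit's actual types.
type Next<I, O> = (input: I) => O;
type ToolCall = { name: string; input: string };

interface Middleware {
  generate?: (prompt: string, next: Next<string, string>) => string; // once per run
  model?: (request: string, next: Next<string, string>) => string; // once per model call
  tool?: (call: ToolCall, next: Next<ToolCall, string>) => string; // once per tool call
}

// A middleware that records which layer fired, then passes through.
function tracer(log: string[]): Middleware {
  return {
    generate: (p, next) => { log.push('generate'); return next(p); },
    model: (r, next) => { log.push('model'); return next(r); },
    tool: (c, next) => { log.push(`tool:${c.name}`); return next(c); },
  };
}

// Toy run: one generate call containing two model calls and two tool calls.
function simulateRun(mw: Middleware, prompt: string): string {
  const callModel = (req: string): string =>
    mw.model ? mw.model(req, (r) => `model(${r})`) : `model(${req})`;
  const callTool = (call: ToolCall): string => {
    const run = (c: ToolCall) => `${c.name}(${c.input})`;
    return mw.tool ? mw.tool(call, run) : run(call);
  };
  const loop = (p: string): string => {
    callModel(p); // first model call: asks for two tools
    const a = callTool({ name: 'webSearch', input: 'q' });
    const b = callTool({ name: 'readFile', input: 'f' });
    return callModel(`${a},${b}`); // second model call: final answer
  };
  return mw.generate ? mw.generate(prompt, loop) : loop(prompt);
}
```

Running `simulateRun(tracer(log), 'hi')` records one generate entry, two model entries, and two tool entries, which is the reason a retry at the model layer repeats a single API call while an approval gate at the tool layer sees every tool invocation.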
The Four Built-In Middleware Types
Genkit ships five pre-built middleware components. The four covered below handle the most common production failure modes:
retry — Catches transient model API errors (RESOURCE_EXHAUSTED, UNAVAILABLE) and retries with exponential backoff and jitter. Critically, only the model call retries — the surrounding tool loop does not replay, which avoids duplicate side effects.
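For readers who want to see the mechanics, here is a rough sketch of exponential backoff with jitter wrapped around a single model call. It is not Genkit's code. The option names mirror the article's example, while `backoffDelayMs`, `withRetry`, the "full jitter" formula, and the assumption that errors carry a string `status` field are all illustrative choices.

```typescript
// Option names follow the article's example; everything else is an assumption.
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  backoffFactor: number;
}

const RETRYABLE = new Set(['RESOURCE_EXHAUSTED', 'UNAVAILABLE']);

// Assumes errors expose a string `status` field.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === 'string' && RETRYABLE.has(status);
}

// Delay before retry number `attempt` (0-based), using "full jitter":
// a random value in [0, initialDelayMs * backoffFactor^attempt).
function backoffDelayMs(
  attempt: number,
  o: RetryOptions,
  rand: () => number = Math.random,
): number {
  return rand() * o.initialDelayMs * Math.pow(o.backoffFactor, attempt);
}

// Retries only the wrapped call; nothing outside `call` is replayed.
async function withRetry<T>(
  call: () => Promise<T>,
  o: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (!isRetryable(err) || attempt >= o.maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, o));
    }
  }
}
```

With `maxRetries: 3, initialDelayMs: 1000, backoffFactor: 2`, the upper bounds on the three waits are 1s, 2s, and 4s. The jitter spreads concurrent agents out so they do not all retry at the same instant.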
fallback — Switches to an alternative model when the primary exhausts its quota or hits specified error codes. Point it at a different model from a completely different provider if needed. No more crashes when Gemini Flash hits its rate limit.
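The fallback idea can be sketched in a few lines. This is an illustration, not the Genkit middleware itself: `withFallback`, `ModelCall`, and the default status list are hypothetical, and the sketch again assumes errors carry a string `status` field.

```typescript
// Hypothetical helper types; not part of Genkit.
type ModelCall<T> = (model: string) => Promise<T>;

function statusOf(err: unknown): string | undefined {
  const s = (err as { status?: unknown } | null)?.status;
  return typeof s === 'string' ? s : undefined;
}

// Tries each model in order. Moves to the next one only when the error
// status is in `fallbackOn`; any other error is rethrown immediately.
async function withFallback<T>(
  models: string[],
  call: ModelCall<T>,
  fallbackOn: string[] = ['RESOURCE_EXHAUSTED'],
): Promise<T> {
  if (models.length === 0) throw new Error('at least one model is required');
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      const status = statusOf(err);
      if (status === undefined || !fallbackOn.includes(status)) throw err;
      lastError = err; // quota-style failure: try the next model
    }
  }
  throw lastError; // every model in the list was exhausted
}
```

Restricting the switch to specific status codes matters: a malformed request would fail on the second model too, so falling back on every error would only double the cost of a bug.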
toolApproval — Any tool not on an explicit allow list triggers a ToolInterruptError and halts execution until a human approves and resumes. This is human-in-the-loop gating with first-class framework support, not a bolt-on patch.
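As a rough illustration of the gating pattern, the sketch below wraps a tool so that it refuses to run unless it is on the allow list or has been approved. The names here (`gateTool`, `ApprovalRequiredError`, the `approved` set) are hypothetical stand-ins, not Genkit's API; in Genkit the interruption surfaces as the ToolInterruptError mentioned above.

```typescript
// Hypothetical stand-in for an interrupt error; not Genkit's ToolInterruptError.
class ApprovalRequiredError extends Error {
  readonly toolName: string;
  constructor(toolName: string) {
    super(`tool "${toolName}" requires approval`);
    this.name = 'ApprovalRequiredError';
    this.toolName = toolName;
  }
}

type Tool = (input: string) => string;

// Wraps a tool so it only executes when allow-listed or explicitly approved.
function gateTool(
  name: string,
  tool: Tool,
  allowList: string[],
  approved: Set<string> = new Set<string>(),
): Tool {
  return (input) => {
    if (!allowList.includes(name) && !approved.has(name)) {
      // Thrown before the tool body runs, so no side effect has happened yet.
      throw new ApprovalRequiredError(name);
    }
    return tool(input);
  };
}
```

A caller would catch the error, show the pending call to a human, add the tool name to `approved` if they agree, and then invoke the tool again. The key property is that the check happens before the tool body, so a rejected call leaves no trace.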
filesystem — Grants the model sandboxed access to the local filesystem, scoped to a specific directory. The model can read and write files without escaping the sandbox.
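The core of any directory-scoped sandbox is a path check. The sketch below shows one way such a check could work. It is an assumption-laden illustration, not how Genkit's filesystem middleware is implemented: `resolveInSandbox` is a hypothetical helper, it handles POSIX-style paths only, and it does not account for symlinks, which a real sandbox would also need to resolve.

```typescript
// Hypothetical helper: resolves a model-supplied relative path inside `root`
// and rejects anything that would land outside it. POSIX-style paths only.
function resolveInSandbox(root: string, requested: string): string {
  if (requested.startsWith('/')) {
    throw new Error(`absolute paths are not allowed: ${requested}`);
  }
  const parts: string[] = [];
  for (const segment of requested.split('/')) {
    if (segment === '' || segment === '.') continue; // skip no-op segments
    if (segment === '..') {
      // Walking above the sandbox root is the escape we want to block.
      if (parts.length === 0) throw new Error(`path escapes sandbox: ${requested}`);
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  // Drop trailing slashes on the root so the join stays clean.
  return [root.replace(/\/+$/, ''), ...parts].join('/');
}
```

Every read or write tool would pass the requested path through a check like this before touching the disk, so a request such as `../etc/passwd` fails before any file is opened.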
What It Looks Like in Code
Middleware stacks left-to-right using the use array in any generate() call. Here’s a production-ready setup for an agent that reads files and runs web searches:
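import { retry, fallback, toolApproval } from '@genkit-ai/middleware';

// Assumes `ai` is an already-configured Genkit instance and that
// webSearch, readFile, and writeReport are tools defined elsewhere.
const response = await ai.generate({
  model: 'googleai/gemini-flash-latest',
  prompt: 'Research our top three competitors',
  tools: [webSearch, readFile, writeReport],
  use: [
    // Retry transient model errors: up to 3 retries, starting at 1s and doubling.
    retry({ maxRetries: 3, initialDelayMs: 1000, backoffFactor: 2 }),
    // Switch to a second model when the primary is exhausted.
    fallback({ fallbackModel: 'googleai/gemini-pro' }),
    // Tools outside the allow list (here, writeReport) pause for human approval.
    toolApproval({ allowList: ['webSearch', 'readFile'] }),
  ],
});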
Three lines of middleware. The agent retries on transient errors, falls back to Gemini Pro if Flash is exhausted, and requires human approval before writeReport touches anything. For many agent use cases, that covers the baseline production hardening.
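What "stacks left-to-right" could mean in practice is easiest to see with a small composition function. The sketch below assumes the first entry in the array becomes the outermost wrapper. That ordering is an assumption made for illustration, and `compose`, `Handler`, `Middleware`, and `tag` are hypothetical names rather than Genkit's API.

```typescript
// Hypothetical types for illustration; not Genkit's API.
type Handler<Req, Res> = (req: Req) => Res;
type Middleware<Req, Res> = (req: Req, next: Handler<Req, Res>) => Res;

// Wraps `base` so that use[0] runs first (outermost) and the last entry
// runs closest to the underlying call.
function compose<Req, Res>(
  use: Middleware<Req, Res>[],
  base: Handler<Req, Res>,
): Handler<Req, Res> {
  return use.reduceRight<Handler<Req, Res>>(
    (next, mw) => (req: Req) => mw(req, next),
    base,
  );
}

// A pass-through middleware that records when it is entered and exited.
function tag(name: string, log: string[]): Middleware<string, string> {
  return (req, next) => {
    log.push(`${name}:in`);
    const res = next(req);
    log.push(`${name}:out`);
    return res;
  };
}
```

Composing `[tag('retry', log), tag('fallback', log)]` around a base handler logs `retry:in`, `fallback:in`, the base call, then `fallback:out`, `retry:out`. Under this assumed ordering, the first middleware in the array sees the request first and the response last.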
Genkit vs. Vercel AI SDK: Different Problems
Vercel’s AI SDK also ships middleware, and if you’re building on Next.js and talking to a dozen providers, it’s the better choice — it’s purpose-built for smoothing over provider differences. Genkit’s middleware is built for a different problem: hardening the agent execution loop itself. Genkit’s tool hook makes tool execution a first-class middleware target, which Vercel AI SDK doesn’t do. Genkit also runs in TypeScript, Go, and Dart; Vercel AI SDK is TypeScript only. Pick based on what you’re actually trying to fix.
Status and Availability
Genkit Middleware ships today in TypeScript, Go, and Dart. Python support is in progress. The Genkit Dev UI now includes middleware trace visualization — inspect each hook layer’s execution, test middleware combinations, and debug timing without adding console logs. Firebase Functions are, for the first time, a credible runtime for production AI workloads. That’s the real headline here. If you’re also building on AWS, AWS Bedrock AgentCore shipped persistent filesystems for agents the same week — worth comparing the two approaches.
The full announcement is on the Google Developers Blog. Source and examples are in the Genkit GitHub repo. For a hands-on comparison with other frameworks, InfoQ’s coverage is worth reading.