
Most teams deploying AI coding agents in 2026 are not blocked by model quality. Claude Code, Codex, and Hermes all produce strong output. What is blocking them is the infrastructure layer: agents that commit live credentials, sessions that vanish when a pod restarts, and nothing stopping one team’s agent from trampling another’s environment. The model problem was solved. The infrastructure problem was not.
On May 8, BerriAI open-sourced LiteLLM Agent Platform — a self-hosted Kubernetes layer for running AI coding agents in isolated sandboxes with credential vaulting and persistent session management. It is the infrastructure layer the agent ecosystem has been missing.
What it is (and what it builds on)
LiteLLM Agent Platform stacks on top of the existing LiteLLM AI Gateway, which handles model routing, cost tracking, rate limiting, and guardrails across 100+ LLM APIs. The new platform layer handles what the gateway was never designed for: sandbox lifecycle management, per-session credential isolation, state persistence, and a management dashboard. If your team already uses the LiteLLM gateway, the platform layers on top of it rather than replacing it.
The platform runs three components: a Next.js dashboard on port 3000, an async worker process for agent tasks, and a Postgres database as the persistent backing store. Schema migrations run as an init container on startup, so the database is always in the correct state before the app boots.
The vault proxy: agents that never see real credentials
The most important mechanism in the platform is the vault proxy. Every sandbox pod runs a vault sidecar. That sidecar intercepts all HTTPS egress from the agent process via HTTPS_PROXY=http://127.0.0.1:14322. The agent environment contains only stub credentials — for example, GITHUB_TOKEN=stub_github_a8f1. When an outbound TLS connection is made, the sidecar swaps the stub for a real credential at the wire level. The real value never touches the agent process, never appears in logs, and never lands in a container environment variable.
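The substitution idea can be illustrated with a minimal Python sketch. This is not the platform's sidecar code: the `VAULT` mapping, the `swap_stub_credentials` function, and the "real" token value are all hypothetical, and a real proxy would operate on intercepted traffic rather than on a plain dictionary of headers.

```python
# Hypothetical illustration of stub-to-real credential substitution.
# VAULT and swap_stub_credentials are assumed names, not the platform's API.

# Maps each stub the agent sees to the real secret held outside the agent process.
VAULT = {
    "stub_github_a8f1": "ghp_real_secret_value",  # placeholder, not a real token
}


def swap_stub_credentials(headers: dict[str, str], vault: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with every known stub replaced by its real value."""
    swapped = {}
    for name, value in headers.items():
        for stub, real in vault.items():
            value = value.replace(stub, real)
        swapped[name] = value
    return swapped


# What the agent process sends: only the stub appears.
agent_request = {"Authorization": "Bearer stub_github_a8f1", "Accept": "application/json"}

# What leaves the proxy: the real credential, which the agent never held.
upstream_request = swap_stub_credentials(agent_request, VAULT)
```

The agent's own copy of the request is left unchanged, so anything the agent logs or commits can only ever contain the stub.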
This is not a minor convenience. According to GitGuardian’s State of Secrets Sprawl 2026, Claude Code-assisted commits leak credentials at roughly double the rate of human-authored commits: 3.2% versus 1.5%. AI agent secrets leaks are up 81% year over year. The vault proxy addresses the specific failure mode that is sending agentic AI deployments to incident review.
Kubernetes-native sandboxes and session persistence
Sandbox isolation runs on kubernetes-sigs/agent-sandbox, a CRD from Kubernetes SIG Apps that manages agent environments as first-class resources. Each sandbox gets a stable hostname, isolated network identity, and optional gVisor or Kata Container runtime for kernel-level separation. A SandboxWarmPool keeps pre-warmed environments ready to reduce startup latency.
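The warm-pool idea can be sketched in a few lines of Python. This is a conceptual model only: the `WarmPool` class and its methods are hypothetical and do not reflect the SandboxWarmPool resource's actual fields or controller behavior. A real controller would also refill in the background, not inline as done here.

```python
from collections import deque
import itertools


class WarmPool:
    """Toy model of a pool that keeps `size` sandboxes ready ahead of demand."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._counter = itertools.count(1)
        self._ready: deque[str] = deque()
        self._refill()

    def _create(self) -> str:
        # Stand-in for the slow step (scheduling a pod, pulling an image, booting).
        return f"sandbox-{next(self._counter)}"

    def _refill(self) -> None:
        # Top the pool back up to its target size.
        while len(self._ready) < self.size:
            self._ready.append(self._create())

    def claim(self) -> str:
        """Hand out a ready sandbox if one exists, otherwise create one on demand."""
        sandbox = self._ready.popleft() if self._ready else self._create()
        self._refill()
        return sandbox

    def ready_count(self) -> int:
        return len(self._ready)


pool = WarmPool(size=2)
first = pool.claim()   # served from the pre-warmed set
second = pool.claim()  # also served without a cold start
```

The latency benefit comes from `claim` normally taking the fast path, with the slow creation step happening before a request arrives.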
Session state persists in Postgres across pod restarts and upgrades. For agent tasks that run 20-60 minutes — code migrations, full test suites, codebase-wide refactors — a mid-run pod restart without persistence means starting over. The Postgres backing store eliminates that failure mode.
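A minimal sketch of the checkpoint-and-resume pattern follows. It uses Python's built-in `sqlite3` module as a stand-in for Postgres so that it runs without a server. The `session_state` table, its columns, and the function names are assumptions made for illustration, not the platform's actual schema.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the backing store and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_state ("
        "session_id TEXT PRIMARY KEY, step INTEGER NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def save_checkpoint(conn: sqlite3.Connection, session_id: str, step: int, payload: dict) -> None:
    """Record the latest completed step, overwriting any earlier checkpoint."""
    conn.execute(
        "INSERT OR REPLACE INTO session_state (session_id, step, payload) VALUES (?, ?, ?)",
        (session_id, step, json.dumps(payload)),
    )
    conn.commit()


def load_checkpoint(conn: sqlite3.Connection, session_id: str) -> tuple[int, dict]:
    """Return (step, payload) for a session, or (0, {}) if it has never checkpointed."""
    row = conn.execute(
        "SELECT step, payload FROM session_state WHERE session_id = ?", (session_id,)
    ).fetchone()
    if row is None:
        return 0, {}
    return row[0], json.loads(row[1])


conn = open_store()
save_checkpoint(conn, "sess-1", 3, {"files_migrated": ["a.py", "b.py"]})

# After a restart, a worker would resume from the stored step, not from zero.
step, payload = load_checkpoint(conn, "sess-1")
```

Because the state lives outside the pod, a replacement pod can call `load_checkpoint` and continue from the last recorded step.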
Getting started
Local development requires Docker Desktop, kind, kubectl, and helm, plus the URL of a running LiteLLM gateway. Running bin/kind-up.sh provisions a kind cluster named agent-sbx, installs the agent-sandbox controller, and loads the harness image. From there, docker compose up boots Postgres, migrates the schema, and starts the web and worker processes.
For production, the recommended path is AWS EKS for the sandbox cluster and Render for web and worker processes. A bin/eks-up.sh script provisions the EKS cluster. BerriAI provides a Render Blueprint for one-click web and worker deployment. Full setup docs are at docs.litellm-agent-platform.ai.
When to use this vs managed alternatives
| Factor | LiteLLM Agent Platform | Claude Managed Agents |
|---|---|---|
| Hosting | Self-hosted (Kubernetes) | Anthropic cloud |
| Models | Any (100+ via LiteLLM) | Claude only |
| Session cost | Infrastructure only | $0.08/session-hour |
| Data control | Full | Data passes through Anthropic |
| Setup complexity | High (Kubernetes required) | Low (API call) |
| Status | Alpha | Public beta |
The platform is the right choice when you need multi-model flexibility, have data sovereignty requirements, or are running agents at a volume where per-session costs add up. Claude Managed Agents is the better choice when DevOps capacity is limited and Claude-only is acceptable. Anthropic’s 2026 Agentic Coding Trends Report found that 83% of organizations plan to deploy agentic AI but only 29% feel ready to do so securely — managed infrastructure closes that gap faster if the constraints are acceptable.
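To make "a volume where per-session costs add up" concrete, here is a rough break-even calculation. The $0.08 rate comes from the table above. The $400 monthly infrastructure figure is purely hypothetical, and the comparison leaves out model token charges and engineering time on both sides.

```python
MANAGED_RATE_PER_SESSION_HOUR = 0.08  # USD, from the comparison table


def managed_cost(session_hours: float) -> float:
    """Per-session-hour fee for a given number of session-hours."""
    return session_hours * MANAGED_RATE_PER_SESSION_HOUR


def break_even_session_hours(self_hosted_monthly_cost: float) -> float:
    """Session-hours per month at which the managed fee equals the self-hosted bill."""
    return self_hosted_monthly_cost / MANAGED_RATE_PER_SESSION_HOUR


# Hypothetical monthly cost of the self-hosted cluster (assumption, not from the source).
example_infra_cost = 400.0

hours = break_even_session_hours(example_infra_cost)
print(f"Break-even at about {hours:,.0f} session-hours per month")
print(f"Managed fee for 1,000 session-hours: ${managed_cost(1000):.2f}")
```

Under these assumed numbers, self-hosting only wins on the session fee alone once usage passes several thousand session-hours a month, so multi-model support and data control are often the stronger reasons to choose it.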
What to know before committing
LiteLLM Agent Platform is in alpha. The kubernetes-sigs/agent-sandbox CRD it depends on is in active development and not yet production-ready for all workloads. Expect API changes and operational rough edges. The EKS production path requires meaningful investment from a platform team.
The architecture is sound, the vault proxy mechanism solves a real production problem, and the GitHub repo is actively maintained. Teams willing to run alpha infrastructure in exchange for model flexibility and data control have a viable path today. Everyone else should watch the 1.0 milestone closely.













