LiteLLM Agent Platform: Run AI Coding Agents on Kubernetes


LiteLLM Agent Platform: self-hosted Kubernetes infrastructure for running AI coding agents in isolated sandboxes

Most teams deploying AI coding agents in 2026 are not blocked by model quality. Claude Code, Codex, and Hermes all produce strong output. What is blocking them is the infrastructure layer: agents that commit live credentials, sessions that vanish when a pod restarts, and nothing stopping one team’s agent from trampling another’s environment. The model problem was solved. The infrastructure problem was not.

On May 8, BerriAI open-sourced LiteLLM Agent Platform — a self-hosted Kubernetes layer for running AI coding agents in isolated sandboxes with credential vaulting and persistent session management. It is the infrastructure layer the agent ecosystem has been missing.

What it is (and what it builds on)

LiteLLM Agent Platform stacks on top of the existing LiteLLM AI Gateway, which handles model routing, cost tracking, rate limiting, and guardrails across 100+ LLM APIs. The new platform layer handles what the gateway was never designed for: sandbox lifecycle management, per-session credential isolation, state persistence, and a management dashboard. If your team already uses the LiteLLM gateway, this is a direct upgrade path.

The platform runs three components: a Next.js dashboard on port 3000, an async worker process for agent tasks, and a Postgres database as the persistent backing store. Schema migrations run as an init container on startup, so the database is always in the correct state before the app boots.

The vault proxy: agents that never see real credentials

The most important mechanism in the platform is the vault proxy. Every sandbox pod runs a vault sidecar, and the agent process is pointed at it with HTTPS_PROXY=http://127.0.0.1:14322, so all HTTPS egress from the agent passes through the sidecar. The agent environment contains only stub credentials, for example GITHUB_TOKEN=stub_github_a8f1. When the agent opens an outbound TLS connection, the sidecar swaps the stub for the real credential at the wire level. The real value never touches the agent process, never appears in logs, and never lands in a container environment variable.
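To make the substitution step concrete, here is a minimal Python sketch of the idea. It is not the platform's implementation: the VAULT mapping, the swap_stub_credentials helper, and the placeholder secret are all hypothetical names chosen for illustration, and a real sidecar would do this while proxying the TLS connection rather than on a plain dictionary of headers.

```python
# Conceptual sketch only. VAULT and swap_stub_credentials are hypothetical;
# the article does not describe the sidecar's internals.

# Mapping held by the sidecar, never by the agent process.
VAULT = {"stub_github_a8f1": "REAL_GITHUB_TOKEN_PLACEHOLDER"}


def swap_stub_credentials(headers: dict, vault: dict) -> dict:
    """Return a copy of headers with every known stub replaced by its real value."""
    swapped = {}
    for name, value in headers.items():
        for stub, real in vault.items():
            value = value.replace(stub, real)
        swapped[name] = value
    return swapped


# What the agent sends: it only ever knows the stub.
agent_headers = {
    "Authorization": "Bearer stub_github_a8f1",
    "Accept": "application/json",
}

# What leaves the pod after the sidecar rewrites the request.
egress_headers = swap_stub_credentials(agent_headers, VAULT)
```

The agent's own copy of the headers is left untouched, which mirrors the property the article describes: anything the agent logs or commits can contain only the stub.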

This is not a minor convenience. According to GitGuardian’s State of Secrets Sprawl 2026, Claude Code-assisted commits leak credentials at roughly double the rate of human-authored commits (3.2% versus 1.5%), and AI agent secrets leaks are up 81% year over year. The vault proxy addresses the specific failure mode that is sending agentic AI deployments to incident review.

Kubernetes-native sandboxes and session persistence

Sandbox isolation runs on kubernetes-sigs/agent-sandbox, a CRD from Kubernetes SIG Apps that manages agent environments as first-class resources. Each sandbox gets a stable hostname, isolated network identity, and optional gVisor or Kata Container runtime for kernel-level separation. A SandboxWarmPool keeps pre-warmed environments ready to reduce startup latency.
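The warm pool idea is easy to picture in a few lines of Python. This is a toy model of the concept, not the SandboxWarmPool CRD or its controller: the WarmPool class, its method names, and the sandbox naming scheme are assumptions made up for this sketch.

```python
from collections import deque
from itertools import count


class WarmPool:
    """Toy warm pool (hypothetical): keeps `size` pre-created sandboxes ready."""

    def __init__(self, size: int) -> None:
        self._ids = count(1)
        self._size = size
        self._ready = deque()
        self._refill()

    def _create_sandbox(self) -> str:
        # Stand-in for the slow step (scheduling a pod, pulling an image).
        return f"sandbox-{next(self._ids)}"

    def _refill(self) -> None:
        # Top the pool back up to its target size.
        while len(self._ready) < self._size:
            self._ready.append(self._create_sandbox())

    def claim(self) -> str:
        """Hand out an already-warm sandbox, then replace it in the pool."""
        sandbox = self._ready.popleft()
        self._refill()
        return sandbox

    def ready_count(self) -> int:
        return len(self._ready)


pool = WarmPool(size=2)
first = pool.claim()
second = pool.claim()
```

The point of the design is that the slow creation step happens before a session asks for a sandbox, so a claim only has to pop a ready entry.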

Session state persists in Postgres across pod restarts and upgrades. For agent tasks that run 20 to 60 minutes (code migrations, full test suites, codebase-wide refactors), a mid-run pod restart without persistence means starting over. The Postgres backing store eliminates that failure mode.
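A rough sketch of why a persistent store matters for long tasks: checkpoint after each step, and on restart skip whatever is already recorded. To keep the example runnable anywhere, sqlite3 stands in for Postgres here, and the sessions table, its columns, and the step names are hypothetical rather than the platform's actual schema.

```python
import json
import sqlite3
from typing import Optional

# Assumption: sqlite3 stands in for Postgres; table and columns are made up.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT NOT NULL)")


def save_checkpoint(session_id: str, state: dict) -> None:
    """Store the latest state for a session, replacing any earlier row."""
    db.execute(
        "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
        (session_id, json.dumps(state)),
    )
    db.commit()


def load_checkpoint(session_id: str) -> dict:
    """Return the saved state, or a fresh one if the session is new."""
    row = db.execute(
        "SELECT state FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {"completed_steps": []}


def run_steps(session_id: str, steps: list, crash_after: Optional[int] = None) -> list:
    """Run the remaining steps, checkpointing after each one.

    crash_after simulates a pod restart after that many steps in this run.
    """
    state = load_checkpoint(session_id)
    executed = []
    for step in steps:
        if step in state["completed_steps"]:
            continue  # finished before the restart, so skip it
        if crash_after is not None and len(executed) == crash_after:
            return executed  # simulated restart: the process stops here
        state["completed_steps"].append(step)
        executed.append(step)
        save_checkpoint(session_id, state)
    return executed


steps = ["clone", "migrate", "test", "open_pr"]
before_restart = run_steps("sess-1", steps, crash_after=2)
after_restart = run_steps("sess-1", steps)
```

In this sketch the second call resumes at the third step instead of repeating the first two, which is the behavior the Postgres backing store is meant to provide for multi-step agent runs.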

Getting started

Local development requires Docker Desktop, kind, kubectl, and helm, plus the URL of a running LiteLLM gateway. Running bin/kind-up.sh provisions a kind cluster named agent-sbx, installs the agent-sandbox controller, and loads the harness image. Running docker compose up then boots Postgres, migrates the schema, and starts the web and worker processes.

For production, the recommended path is AWS EKS for the sandbox cluster and Render for web and worker processes. A bin/eks-up.sh script provisions the EKS cluster. BerriAI provides a Render Blueprint for one-click web and worker deployment. Full setup docs are at docs.litellm-agent-platform.ai.

When to use this vs managed alternatives

Factor	LiteLLM Agent Platform	Claude Managed Agents
Hosting	Self-hosted (Kubernetes)	Anthropic cloud
Models	Any (100+ via LiteLLM)	Claude only
Session cost	Infrastructure only	$0.08/session-hour
Data control	Full	Data passes through Anthropic
Setup complexity	High (Kubernetes required)	Low (API call)
Status	Alpha	Public beta

The platform is the right choice when you need multi-model flexibility, have data sovereignty requirements, or are running agents at a volume where per-session costs add up. Claude Managed Agents is the better choice when DevOps capacity is limited and Claude-only is acceptable. Anthropic’s 2026 Agentic Coding Trends Report found that 83% of organizations plan to deploy agentic AI but only 29% feel ready to do so securely — managed infrastructure closes that gap faster if the constraints are acceptable.
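The point about volume can be turned into a quick break-even estimate. The $0.08 per session-hour rate comes from the comparison table above; the monthly self-hosting figure is a made-up placeholder, and the sketch ignores model token spend, so treat it as a back-of-the-envelope aid rather than a pricing model.

```python
# Rate taken from the comparison table; everything else is an assumption.
MANAGED_RATE_PER_SESSION_HOUR = 0.08


def managed_monthly_cost(session_hours: float) -> float:
    """Session fees for a month of managed usage, excluding token costs."""
    return session_hours * MANAGED_RATE_PER_SESSION_HOUR


def break_even_session_hours(self_hosted_monthly_cost: float) -> float:
    """Session-hours per month at which managed fees equal self-hosting cost."""
    return self_hosted_monthly_cost / MANAGED_RATE_PER_SESSION_HOUR


# Hypothetical: 2,000 USD per month for the cluster plus platform-team time.
hours = break_even_session_hours(2000.0)
print(f"Break-even at about {hours:,.0f} session-hours per month")
```

With that placeholder budget the break-even lands at about 25,000 session-hours a month. Below that volume the managed session fee is the smaller line item; above it, self-hosting starts to pay for itself on this one dimension.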

What to know before committing

LiteLLM Agent Platform is in alpha. The kubernetes-sigs/agent-sandbox CRD it depends on is in active development and not yet production-ready for all workloads. Expect API changes and operational rough edges. The EKS production path requires meaningful investment from a platform team.

The architecture is sound, the vault proxy mechanism solves a real production problem, and the GitHub repo is actively maintained. Teams willing to run alpha infrastructure in exchange for model flexibility and data control have a viable path today. Everyone else should watch the 1.0 milestone closely.

ByteBot
