
The dominant coding agents — Claude Code, Cline, OpenCode — arrive with system prompts ranging from 7,000 to 10,000 tokens. That is before you have typed a single task. Every one of those tokens is a permanent tax on every API call and a slice carved out of your context window before the real work starts. Pi, built by Mario Zechner and Armin Ronacher (the creator of Flask and Jinja2), takes the opposite bet: a system prompt under 1,000 tokens, four core tools, and an extension system that lets the agent write its own capabilities on demand. The project hit 58,000 GitHub stars and moved to the Earendil organization in May 2026. Here is what makes it architecturally different and whether you should care.
The Token Tax Is Real
A 7,000-token harness prompt consumes roughly 3.5% of a 200,000-token context window before you pass it a single line of code. That sounds manageable until you factor in conversation history, file contents, diffs, and model responses accumulating over a multi-hour session. Context exhaustion — the point where the model starts losing earlier conversation turns — arrives earlier than it should, and the harness is the culprit more often than people acknowledge.
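The arithmetic behind that figure is easy to reproduce. The sketch below is illustrative only: the function names are made up, the 50-call session length is an arbitrary assumption, the ~900-token figure is the one this article quotes for Pi, and prompt caching (which can lower the billed cost of repeated prefixes) is ignored.

```typescript
// Percentage of the context window taken up by a fixed harness prompt.
function harnessSharePercent(promptTokens: number, contextWindow: number): number {
  return (promptTokens * 100) / contextWindow;
}

// Prompt tokens re-sent over a whole session, assuming the full prompt
// accompanies every API call and no caching discount applies.
function sessionOverhead(promptTokens: number, apiCalls: number): number {
  return promptTokens * apiCalls;
}

const contextWindow = 200_000;
const assumedCalls = 50; // arbitrary session length, for illustration only

for (const promptTokens of [900, 7_000, 10_000]) {
  const share = harnessSharePercent(promptTokens, contextWindow).toFixed(2);
  const resent = sessionOverhead(promptTokens, assumedCalls);
  console.log(`${promptTokens} tokens: ${share}% of window, ${resent} tokens re-sent over ${assumedCalls} calls`);
}
```

The per-call share looks small, but the re-sent total grows linearly with the number of calls, which is where the difference between a 900-token and a 7,000-token prompt shows up.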
Pi’s position is that most of what those 7,000–10,000 tokens contain is documentation the model could load on demand rather than carry around permanently. Strip the core to essentials: four tools (Read, Write, Edit, Bash), a minimal ReAct loop (stream response, check tool calls, execute, repeat), and a one-line description of each installed capability. Load the full instructions only when a capability is actually invoked.
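The loop described above (get a response, check for tool calls, execute, repeat) fits in a few lines. This is a minimal sketch, not Pi's implementation: every type and function name here is hypothetical, the model is a scripted stand-in, and streaming and async I/O are left out to keep the control flow visible.

```typescript
// Hypothetical types for illustration; none of these names come from Pi's source.
type ToolCall = { tool: "read" | "write" | "edit" | "bash"; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelTurn;
type ToolRunner = (call: ToolCall) => string;

function runAgent(task: string, model: Model, runTool: ToolRunner, maxSteps = 20): Message[] {
  const history: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history); // 1. get the model's response
    history.push({ role: "assistant", content: turn.text });
    if (turn.toolCalls.length === 0) break; // 2. no tool calls means the task is done
    for (const call of turn.toolCalls) {
      // 3. execute each requested tool and feed the result back
      history.push({ role: "tool", content: runTool(call) });
    }
  } // 4. repeat with the tool output now in context
  return history;
}

// Scripted stand-in for a model: requests one file read, then answers.
const scriptedModel: Model = (history) =>
  history.some((m) => m.role === "tool")
    ? { text: "The file contains a greeting.", toolCalls: [] }
    : { text: "Let me read the file.", toolCalls: [{ tool: "read", input: "hello.txt" }] };

// Fake tool runner that returns a canned result instead of touching the disk.
const fakeTools: ToolRunner = (call) => `${call.tool}(${call.input}) returned "hello world"`;

const transcript = runAgent("What is in hello.txt?", scriptedModel, fakeTools);
console.log(transcript.map((m) => `${m.role}: ${m.content}`).join("\n"));
```

The `maxSteps` cap is a design choice of this sketch: it keeps a misbehaving model from looping forever.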
Lazy Skills: Progressive Disclosure for Agent Capabilities
Pi calls this pattern “lazy skills.” Each skill — a capability package with its own instructions and tool schemas — contributes only a single line to the context on every turn. The full payload loads only when you invoke the skill by name or when Pi auto-detects it is needed. Think of it as lazy loading applied to agent architecture: the same way a JavaScript bundle defers a module until the user navigates to that route, Pi defers skill details until the task requires them.
This matters most if you run many skills. You can install a dozen capability packages (one for test analysis, one for database migrations, one for documentation generation) and pay context cost only for the ones you actually use in a given session. MCP setups, by contrast, typically load every tool schema at session start regardless of usage. Pi’s approach is explicitly the opposite.
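The lazy-loading pattern described above can be sketched as a registry that always exposes one line per skill and fetches the full payload only on first use. The `Skill` shape and `SkillRegistry` class are assumptions made for illustration; Pi's actual skill format may look quite different.

```typescript
// Hypothetical shapes for illustration; not Pi's real skill format.
interface Skill {
  name: string;
  summary: string; // the one line that is always in context
  loadInstructions: () => string; // full payload, fetched only on demand
}

class SkillRegistry {
  private readonly skills: Skill[];
  private readonly loaded = new Map<string, string>();

  constructor(skills: Skill[]) {
    this.skills = skills;
  }

  // What every turn pays for: one line per installed skill.
  contextPreamble(): string {
    return this.skills.map((s) => `${s.name}: ${s.summary}`).join("\n");
  }

  // Load the full instructions the first time a skill is invoked, then cache them.
  invoke(name: string): string {
    const cached = this.loaded.get(name);
    if (cached !== undefined) return cached;
    const skill = this.skills.find((s) => s.name === name);
    if (!skill) throw new Error(`Unknown skill: ${name}`);
    const full = skill.loadInstructions();
    this.loaded.set(name, full);
    return full;
  }
}

let payloadLoads = 0; // counts how often a full payload is actually fetched
const registry = new SkillRegistry([
  { name: "test-analysis", summary: "Summarize failing tests.", loadInstructions: () => { payloadLoads++; return "Full test-analysis instructions"; } },
  { name: "db-migrations", summary: "Plan database migrations.", loadInstructions: () => { payloadLoads++; return "Full migration instructions"; } },
]);

console.log(registry.contextPreamble());
registry.invoke("test-analysis");
registry.invoke("test-analysis"); // served from the cache, no second load
console.log(`payloads loaded: ${payloadLoads}`);
```

In this sketch the unused `db-migrations` skill never costs more than its one-line summary.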
Self-Extension: The Agent Writes Its Own Tools
The feature that sets Pi apart from every other harness on the market: if a capability does not exist, you ask Pi to build it. This is not a plugin marketplace. There is no waiting for someone else to publish a compatible package. The flow is:
- Tell Pi what you need: “Build me a skill that runs my Jest tests and summarizes failures.”
- Pi writes a TypeScript extension module with the appropriate tools and instructions.
- The extension takes effect immediately — no session restart required.
- The skill is available in every future session.
Armin Ronacher described this at launch: “The future is software writing its own software. Which is why I’m so in love with Pi: a coding agent that can extend itself.” The practical limitation is that Pi trusts the model to write correct TypeScript. There are no guardrails preventing a buggy extension from misbehaving. For straightforward capability additions this works well; for complex integrations involving external services, review what Pi generates before committing to it.
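To give a sense of what such a generated skill might contain, here is a sketch of the core logic for the Jest example: a function that condenses a `jest --json` report into a short failure summary. Pi's extension interface is not shown in this article, so the registration wiring is deliberately omitted, and the interfaces below type only the subset of Jest's JSON report that the function reads.

```typescript
// Subset of the report emitted by `jest --json`; only the fields used here are typed.
interface JestAssertion {
  fullName: string;
  status: string;
  failureMessages: string[] | null;
}
interface JestReport {
  numFailedTests: number;
  numTotalTests: number;
  testResults: { name: string; assertionResults: JestAssertion[] }[];
}

function summarizeFailures(report: JestReport): string {
  const lines = [`${report.numFailedTests} of ${report.numTotalTests} tests failed`];
  for (const file of report.testResults) {
    for (const assertion of file.assertionResults) {
      if (assertion.status !== "failed") continue;
      // Keep only the first line of the failure message so the summary stays short.
      const message = assertion.failureMessages?.[0] ?? "";
      const firstLine = message.split("\n")[0];
      lines.push(`- ${file.name}: ${assertion.fullName}: ${firstLine}`);
    }
  }
  return lines.join("\n");
}

// Hand-written sample report, not real test output.
const sampleReport: JestReport = {
  numFailedTests: 1,
  numTotalTests: 2,
  testResults: [
    {
      name: "src/cart.test.ts",
      assertionResults: [
        { fullName: "cart adds items", status: "passed", failureMessages: [] },
        { fullName: "cart applies discount", status: "failed", failureMessages: ["Expected 90, received 100\n    at cart.test.ts:12"] },
      ],
    },
  ],
};

console.log(summarizeFailures(sampleReport));
```

This is exactly the kind of generated code worth reading before you rely on it: a wrong field name would silently produce an empty summary.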
Getting Started
Pi’s npm package moved to the @earendil-works scope in May 2026. Install globally:
```shell
npm install -g @earendil-works/pi-coding-agent
```
Requirements: Node 22.19.0 or later, git, bash. Set your API key and run in any project directory:
```shell
pi config set ANTHROPIC_API_KEY=sk-ant-...
cd my-project && pi
```
For local models via Ollama, add a provider block to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "ollama-local": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama"
    }
  }
}
```
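A typo in this file is an easy way to lose time, so a quick sanity check can help. The sketch below is a hypothetical helper, not part of Pi: it assumes the three fields shown in the example (`baseUrl`, `api`, `apiKey`) are all required strings, which the article does not confirm.

```typescript
// Fields taken from the example block; treating all three as required is an assumption.
const REQUIRED_FIELDS = ["baseUrl", "api", "apiKey"] as const;

// Returns a list of human-readable problems; an empty list means the block looks sane.
function checkProviders(config: unknown): string[] {
  const providers = (config as { providers?: unknown } | null)?.providers;
  if (typeof providers !== "object" || providers === null) {
    return ['missing top-level "providers" object'];
  }
  const problems: string[] = [];
  for (const [name, block] of Object.entries(providers as Record<string, unknown>)) {
    const fields = (block ?? {}) as Record<string, unknown>;
    for (const field of REQUIRED_FIELDS) {
      if (typeof fields[field] !== "string") {
        problems.push(`${name}: "${field}" should be a string`);
      }
    }
    const baseUrl = fields["baseUrl"];
    if (typeof baseUrl === "string" && !/^https?:\/\//.test(baseUrl)) {
      problems.push(`${name}: "baseUrl" should start with http:// or https://`);
    }
  }
  return problems;
}

const exampleConfig = JSON.parse(`{
  "providers": {
    "ollama-local": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama"
    }
  }
}`);

console.log(checkProviders(exampleConfig)); // no problems for the example above
console.log(checkProviders({ providers: { broken: { baseUrl: "localhost:11434" } } }));
```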
Recommended local models for Pi’s minimal context approach: Qwen 3.6:35b for complex multi-file tasks, Qwen 2.5-coder:7b when you need fast iteration. Because Pi’s harness footprint is small, smaller models have a better chance of succeeding than they would inside a larger harness.
Pi vs the Alternatives
| Agent | System Prompt | Open Source | Local Models | Self-Extension | Built-in LSP |
|---|---|---|---|---|---|
| Pi | ~900 tokens | Yes (MIT) | Yes (Ollama) | Yes | Via extension |
| OpenCode | ~7,000+ tokens | Yes (MIT) | Yes (Ollama) | No | Yes |
| Aider | ~5,000 tokens | Yes (Apache) | Yes | No | Partial |
| Claude Code | ~7,000–10,000 tokens | No | No | No | No |
| Cline | ~7,000–10,000 tokens | Yes (Apache) | Yes | No | Via VS Code |
Pi is the right choice if context exhaustion is a real problem in your current workflow, if you work in privacy-sensitive environments where you would rather run models locally, or if you want a harness you can adapt without forking internals. It is not the right choice if you need built-in LSP diagnostics out of the box — OpenCode handles that better — or if you are onboarding a team that needs a polished, batteries-included experience from day one. Cline and Claude Code still win for ease of setup; Pi wins for control and context efficiency once you are willing to invest a few hours of configuration.