Pi Coding Agent: The Minimal Harness That Rewrites Itself


Pi by earendil-works: the sub-1000-token coding agent harness

The dominant coding agents — Claude Code, Cline, OpenCode — arrive with system prompts ranging from 7,000 to 10,000 tokens. That is before you have typed a single task. Every one of those tokens is a permanent tax on every API call and a slice carved out of your context window before the real work starts. Pi, built by Mario Zechner and Armin Ronacher (the creator of Flask and Jinja2), takes the opposite bet: a system prompt under 1,000 tokens, four core tools, and an extension system that lets the agent write its own capabilities on demand. The project hit 58,000 GitHub stars and moved to the Earendil organization in May 2026. Here is what makes it architecturally different and whether you should care.

The Token Tax Is Real

A 7,000-token harness prompt consumes roughly 3.5% of a 200,000-token context window before you pass it a single line of code. That sounds manageable until you factor in conversation history, file contents, diffs, and model responses accumulating over a multi-hour session. Context exhaustion — the point where the model starts losing earlier conversation turns — arrives earlier than it should, and the harness is the culprit more often than people acknowledge.
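The arithmetic behind that figure is easy to check. The sketch below is illustrative only: the function names are made up for this example, the 100-call session length is an assumed number, and prompt caching is not modeled.

```typescript
// Fraction of the context window consumed by the harness prompt alone.
function harnessShare(promptTokens: number, windowTokens: number): number {
  if (windowTokens <= 0) throw new RangeError("windowTokens must be positive");
  return promptTokens / windowTokens;
}

// Total system-prompt tokens re-sent over a session, assuming the prompt is
// included in every API call and nothing is cached (a simplifying assumption).
function sessionOverhead(promptTokens: number, apiCalls: number): number {
  return promptTokens * apiCalls;
}

const share = harnessShare(7_000, 200_000);
console.log(`${(share * 100).toFixed(1)}% of the window`); // 3.5% of the window

// Hypothetical 100-call session: large harness vs. a ~1,000-token harness.
console.log(sessionOverhead(7_000, 100)); // 700000
console.log(sessionOverhead(1_000, 100)); // 100000
```

The per-call share looks small, but the second function shows why it adds up: the prompt is paid for again on every call in the session.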

Pi’s position is that most of what those 7,000–10,000 tokens contain is documentation the model could load on demand rather than carry around permanently. Strip the core to essentials: four tools (Read, Write, Edit, Bash), a minimal ReAct loop (stream response, check tool calls, execute, repeat), and a one-line description of each installed capability. Load the full instructions only when a capability is actually invoked.
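To make the loop concrete, here is a minimal sketch of the "get a response, check for tool calls, execute, repeat" cycle. It is not Pi's source code: every type and function name is hypothetical, the model is a scripted stand-in, a single fake `read` tool replaces the four real ones, and streaming is left out to keep the example synchronous.

```typescript
type ToolCall = { name: string; args: Record<string, string> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelTurn;
type Tool = (args: Record<string, string>) => string;

function runAgent(model: Model, tools: Map<string, Tool>, task: string, maxSteps = 10): Message[] {
  const history: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    history.push({ role: "assistant", content: turn.text });
    if (turn.toolCalls.length === 0) break; // no tool calls: the model is done
    for (const call of turn.toolCalls) {
      const tool = tools.get(call.name);
      const result = tool ? tool(call.args) : `unknown tool: ${call.name}`;
      history.push({ role: "tool", content: result });
    }
  }
  return history;
}

// Scripted stand-in for a real model: requests one file read, then answers.
const scriptedModel: Model = (history) => {
  const sawToolResult = history.some((m) => m.role === "tool");
  return sawToolResult
    ? { text: "The file says hello.", toolCalls: [] }
    : { text: "Reading the file.", toolCalls: [{ name: "read", args: { path: "notes.txt" } }] };
};

// In-memory "filesystem" so the example runs without touching disk.
const fakeFiles = new Map<string, string>([["notes.txt", "hello"]]);
const demoTools = new Map<string, Tool>([
  ["read", (args) => fakeFiles.get(args["path"] ?? "") ?? "file not found"],
]);

const transcript = runAgent(scriptedModel, demoTools, "What does notes.txt say?");
console.log(transcript.map((m) => `${m.role}: ${m.content}`).join("\n"));
```

The `maxSteps` cap is a design choice of this sketch, added so a model that never stops calling tools cannot loop forever.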

Lazy Skills: Progressive Disclosure for Agent Capabilities

Pi calls this pattern “lazy skills.” Each skill — a capability package with its own instructions and tool schemas — contributes only a single line to the context on every turn. The full payload loads only when you invoke the skill by name or when Pi auto-detects it is needed. Think of it as lazy loading applied to agent architecture: the same way a JavaScript bundle defers a module until the user navigates to that route, Pi defers skill details until the task requires them.
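A small registry is enough to illustrate the pattern. This is a sketch under stated assumptions, not Pi's implementation: `Skill`, `SkillRegistry`, and the skill names are hypothetical, and auto-detection is omitted so that only explicit invocation by name is shown.

```typescript
interface Skill {
  name: string;
  summary: string;                // the one-line description kept in context
  loadInstructions: () => string; // full payload, produced only on demand
}

class SkillRegistry {
  private skills = new Map<string, Skill>();
  private loaded = new Map<string, string>();

  register(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }

  // What goes into the prompt on every turn: one line per installed skill.
  contextLines(): string[] {
    return Array.from(this.skills.values()).map((s) => `${s.name}: ${s.summary}`);
  }

  // Full instructions, loaded the first time a skill is invoked, then cached.
  invoke(name: string): string {
    const cached = this.loaded.get(name);
    if (cached !== undefined) return cached;
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    const full = skill.loadInstructions();
    this.loaded.set(name, full);
    return full;
  }
}

let loads = 0; // counts how often a full payload is actually produced
const registry = new SkillRegistry();
registry.register({
  name: "jest-summary",
  summary: "Run Jest and summarize failures",
  loadInstructions: () => { loads++; return "Full instructions for jest-summary"; },
});
registry.register({
  name: "db-migrate",
  summary: "Plan and apply database migrations",
  loadInstructions: () => { loads++; return "Full instructions for db-migrate"; },
});

console.log(registry.contextLines()); // two one-line entries, nothing loaded yet
registry.invoke("jest-summary");
registry.invoke("jest-summary");      // served from cache
console.log(loads);                   // 1
```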

This matters most if you run many skills. You can install a dozen capability packages (one for test analysis, one for database migrations, one for documentation generation) and pay context cost only for the ones you actually use in a given session. By contrast, MCP clients typically load every connected server’s tool schemas at session start, whether or not those tools are ever used. Pi’s approach is explicitly the opposite.
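A rough cost model shows the gap between the two strategies. All numbers here are invented for illustration (12 skills, 20-token summaries, 600-token full payloads); they are not measurements of Pi or of any MCP server.

```typescript
interface SkillCost {
  name: string;
  summaryTokens: number; // the one-line entry
  fullTokens: number;    // the complete instructions and schemas
}

// Eager preloading: every full payload is in context from the start.
function eagerCost(skills: SkillCost[]): number {
  return skills.reduce((sum, s) => sum + s.fullTokens, 0);
}

// Lazy loading: every summary, plus full payloads only for skills actually used.
function lazyCost(skills: SkillCost[], used: Set<string>): number {
  return skills.reduce(
    (sum, s) => sum + s.summaryTokens + (used.has(s.name) ? s.fullTokens : 0),
    0,
  );
}

// Hypothetical installation: 12 skills with identical sizes.
const installed: SkillCost[] = Array.from({ length: 12 }, (_, i) => ({
  name: `skill-${i}`,
  summaryTokens: 20,
  fullTokens: 600,
}));
const usedThisSession = new Set(["skill-0", "skill-3"]);

console.log(eagerCost(installed));                 // 7200
console.log(lazyCost(installed, usedThisSession)); // 1440
```

With these assumed sizes the lazy strategy only loses its advantage when nearly every installed skill is used in the same session.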

Self-Extension: The Agent Writes Its Own Tools

The feature that sets Pi apart from every other harness on the market: if a capability does not exist, you ask Pi to build it. This is not a plugin marketplace. There is no waiting for someone else to publish a compatible package. The flow is:

1. Tell Pi what you need: “Build me a skill that runs my Jest tests and summarizes failures.”
2. Pi writes a TypeScript extension module with the appropriate tools and instructions.
3. The extension takes effect immediately, with no session restart required.
4. The skill is available in every future session.
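For a sense of what the generated module in this flow might contain, here is a hedged sketch of the Jest example. The `jestSummarySkill` descriptor is hypothetical and does not reflect Pi's actual extension interface. The report types cover only a subset of the fields in the JSON that `jest --json` emits, and the sample report is fabricated for the demo rather than captured from a real run.

```typescript
// Subset of the report shape produced by `jest --json`.
interface JestAssertion { fullName: string; status: string; failureMessages: string[]; }
interface JestFileResult { name: string; assertionResults: JestAssertion[]; }
interface JestReport { numFailedTests: number; testResults: JestFileResult[]; }

// Reduce a full report to one line per failing test.
function summarizeFailures(report: JestReport): string {
  if (report.numFailedTests === 0) return "All tests passed.";
  const lines: string[] = [`${report.numFailedTests} failed test(s):`];
  for (const file of report.testResults) {
    for (const a of file.assertionResults) {
      if (a.status !== "failed") continue;
      // Keep only the first line of the failure message; drop the stack trace.
      const firstLine = (a.failureMessages[0] ?? "no message").split("\n")[0];
      lines.push(`- ${a.fullName} (${file.name}): ${firstLine}`);
    }
  }
  return lines.join("\n");
}

// Hypothetical skill descriptor: NOT Pi's real extension API.
const jestSummarySkill = {
  name: "jest-summary",
  summary: "Run Jest and summarize failing tests",
  run: (reportJson: string): string =>
    summarizeFailures(JSON.parse(reportJson) as JestReport),
};

// Fabricated sample report, used only to exercise the summarizer.
const sampleReport: JestReport = {
  numFailedTests: 1,
  testResults: [{
    name: "math.test.ts",
    assertionResults: [
      { fullName: "adds numbers", status: "passed", failureMessages: [] },
      {
        fullName: "divides numbers",
        status: "failed",
        failureMessages: ["Error: expected 2, received Infinity\n    at stack frame"],
      },
    ],
  }],
};

const summary = jestSummarySkill.run(JSON.stringify(sampleReport));
console.log(summary);
```

A real skill would also need to launch Jest and capture its output. That part is omitted here because it depends on how Pi exposes its Bash tool to extensions, which the article does not specify.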

Armin Ronacher described this at launch: “The future is software writing its own software. Which is why I’m so in love with Pi: a coding agent that can extend itself.” The practical limitation is that Pi trusts the model to write correct TypeScript, and nothing stops a buggy extension from misbehaving. For straightforward capability additions this works well. For complex integrations involving external services, review what Pi generates before committing to it.

Getting Started

Pi moved to the @earendil-works npm scope in May 2026. Install it globally:

npm install -g @earendil-works/pi-coding-agent

Requirements: Node 22.19.0 or later, git, bash. Set your API key and run in any project directory:

pi config set ANTHROPIC_API_KEY=sk-ant-...
cd my-project && pi

For local models via Ollama, add a provider block to ~/.pi/agent/models.json:

{
  "providers": {
    "ollama-local": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama"
    }
  }
}
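A quick sanity check on that file can catch typos before Pi starts. The sketch below validates only the three keys shown in the block above (`baseUrl`, `api`, `apiKey`). It makes no claim about the full schema Pi accepts, and `checkProviders` is a hypothetical helper, not part of Pi.

```typescript
interface ProviderConfig { baseUrl: string; api: string; apiKey: string; }
type ModelsFile = { providers?: Record<string, Partial<ProviderConfig>> };

// Returns a list of human-readable problems; an empty list means the block looks sane.
function checkProviders(raw: string): string[] {
  const problems: string[] = [];
  const parsed = JSON.parse(raw) as ModelsFile;
  const providers = parsed.providers ?? {};
  const names = Object.keys(providers);
  if (names.length === 0) problems.push("no providers defined");
  for (const name of names) {
    const p = providers[name];
    if (!p) continue;
    for (const key of ["baseUrl", "api", "apiKey"] as const) {
      const value = p[key];
      if (typeof value !== "string" || value === "") problems.push(`${name}: missing ${key}`);
    }
    // Loose check only: the URL must start with http:// or https://.
    if (typeof p.baseUrl === "string" && !/^https?:\/\//.test(p.baseUrl)) {
      problems.push(`${name}: baseUrl should start with http:// or https://`);
    }
  }
  return problems;
}

const modelsJson = JSON.stringify({
  providers: {
    "ollama-local": {
      baseUrl: "http://localhost:11434/v1",
      api: "openai-completions",
      apiKey: "ollama",
    },
  },
});

console.log(checkProviders(modelsJson)); // []
```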

Recommended local models for Pi’s minimal context approach: Qwen 3.6:35b for complex multi-file tasks, Qwen 2.5-coder:7b when you need fast iteration. Because Pi’s harness footprint is small, more of a local model’s limited context window is left for the task itself, so smaller models have a better chance of succeeding than they would inside a larger harness.

Pi vs the Alternatives

Agent	System Prompt	Open Source	Local Models	Self-Extension	Built-in LSP
Pi	~900 tokens	Yes (MIT)	Yes (Ollama)	Yes	Via extension
OpenCode	~7,000+ tokens	Yes (MIT)	Yes (Ollama)	No	Yes
Aider	~5,000 tokens	Yes (Apache)	Yes	No	Partial
Claude Code	~7,000–10,000 tokens	No	No	No	No
Cline	~7,000–10,000 tokens	Yes (Apache)	Yes	No	Via VS Code

Pi is the right choice if context exhaustion is a real problem in your current workflow, if you work in privacy-sensitive environments where you would rather run models locally, or if you want a harness you can adapt without forking internals. It is not the right choice if you need built-in LSP diagnostics out of the box — OpenCode handles that better — or if you are onboarding a team that needs a polished, batteries-included experience from day one. Cline and Claude Code still win for ease of setup; Pi wins for control and context efficiency once you are willing to invest a few hours of configuration.

ByteBot
