Kimi K2.7-Code: Moonshot’s Open-Weight 1T Coding Agent

Moonshot AI shipped Kimi K2.7-Code on June 12, putting a 1-trillion-parameter coding agent on Hugging Face under a Modified MIT license. It promises 30% fewer reasoning tokens than its predecessor K2.6, is priced at $0.95/$4.00 per million input/output tokens, and integrates today with Claude Code, Cline, and OpenCode. The benchmarks look solid. They are also entirely vendor-run, and that distinction matters more than Moonshot wants you to notice.

What You Are Getting

Kimi K2.7-Code is a Mixture-of-Experts model with 1 trillion total parameters and 32 billion active per token, which gives it per-token compute comparable to a dense 32B model at inference time, although the full weight set still has to be loaded. It uses Multi-head Latent Attention (MLA), a 256K-token context window, and a DeepSeek-V3-style architecture that Moonshot has been iterating on since K2.5 launched in January 2026. The K2 family ships roughly every two months; this is the third iteration in six months.
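
The total-versus-active split can be made concrete with a few lines of arithmetic. This is a minimal sketch using only the two parameter counts quoted above; the "about 2 FLOPs per active parameter per token" figure is a common rule of thumb assumed here for illustration, not a number from Moonshot.

```python
# Parameter counts as quoted in the article.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active per token


def active_fraction(total: int, active: int) -> float:
    """Share of the weights that participate in any single token."""
    return active / total


def approx_forward_flops_per_token(active: int) -> int:
    """Assumed rule of thumb: about 2 FLOPs per active parameter per token."""
    return 2 * active


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"active share per token: {share:.1%}")
    print(f"approx forward FLOPs per token: {flops:.2e}")
```

Only a small share of the weights does work on each token, which is why per-token compute resembles a dense 32B model even though memory requirements do not.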

The model includes a 400M-parameter MoonViT vision encoder for multimodal input, but K2.7-Code sharpens its focus on software engineering: long-horizon refactors, complex debugging sessions, and multi-step agentic workflows that run for hours rather than seconds.

The Price Advantage Is Real

At $0.95 per million input tokens and $4.00 per million output tokens, Kimi K2.7-Code undercuts Claude Opus 4.7 (roughly $15/$75) and GPT-5.5 (roughly $10/$30). At those list prices the gap works out to roughly 7.5x to 19x, depending on the model and on whether you compare input or output tokens. For teams running agentic coding workflows that burn through significant token volume, that gap is substantial.
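
To see what the price gap means for a specific workload, the per-million-token prices can be turned into a bill. The sketch below uses the prices quoted above; the model keys and the 50M-input / 5M-output workload are hypothetical, chosen only to illustrate the arithmetic.

```python
from decimal import Decimal

# USD per million tokens (input, output), as quoted in the article.
# The two closed-model prices are the article's "roughly" figures.
PRICES = {
    "kimi-k2.7-code": (Decimal("0.95"), Decimal("4.00")),
    "claude-opus": (Decimal("15"), Decimal("75")),
    "gpt-5.5": (Decimal("10"), Decimal("30")),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD for a workload with the given token counts."""
    in_price, out_price = PRICES[model]
    total = input_tokens * in_price + output_tokens * out_price
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens, 5M output tokens.
    baseline = workload_cost("kimi-k2.7-code", 50_000_000, 5_000_000)
    for name in PRICES:
        cost = workload_cost(name, 50_000_000, 5_000_000)
        print(f"{name}: ${cost:.2f} ({cost / baseline:.1f}x Kimi)")
```

Because agentic workloads tend to be input-heavy, the effective multiple for a real workload depends on its own input/output mix, so it is worth plugging in measured token counts.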

The open weights under Modified MIT push the advantage further: self-host, fine-tune, and serve it at infrastructure cost if you have the hardware. The license has one meaningful constraint — products above 100 million monthly active users or $20 million monthly revenue must display “Kimi K2” visibly in their UI. Below those thresholds, it is effectively MIT.
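
The attribution rule reduces to a simple either/or threshold check. The function below is a sketch of the rule as summarized in this article, not of the license text itself; the function and constant names are hypothetical, and the license is the authority on how users and revenue are counted.

```python
# Thresholds as summarized in the article (assumption: "above" means strictly greater).
MAU_THRESHOLD = 100_000_000          # monthly active users
REVENUE_THRESHOLD_USD = 20_000_000   # monthly revenue in USD


def must_display_kimi_k2(monthly_active_users: int, monthly_revenue_usd: int) -> bool:
    """True when either threshold is exceeded and UI attribution is required."""
    return (
        monthly_active_users > MAU_THRESHOLD
        or monthly_revenue_usd > REVENUE_THRESHOLD_USD
    )


if __name__ == "__main__":
    print(must_display_kimi_k2(2_000_000, 500_000))      # small product
    print(must_display_kimi_k2(150_000_000, 1_000_000))  # large user base
```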

The Benchmarks Deserve Scrutiny

Moonshot reports relative gains over K2.6 of +21.8% on Kimi Code Bench v2, +31.5% on MLS Bench Lite, and +11% on Program Bench. On MCPMark Verified, which tests real-world tool use across GitHub, Notion, Postgres, file systems, and browser automation, K2.7-Code scores 81.1, ahead of Claude Opus 4.8’s 76.4 but behind GPT-5.5’s 92.9.

Benchmark	K2.7-Code	K2.6	Relative change
Kimi Code Bench v2	62.0	50.9	+21.8%
Program Bench	53.6	48.3	+11.0%
MLS Bench Lite	35.1	26.7	+31.5%
MCPMark Verified	81.1	~72.8	+11.4%

All benchmarks above are Moonshot-proprietary. No independent SWE-bench or DeepSWE results exist as of June 13, 2026.
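
The percentages in the table are relative gains over K2.6, not absolute point differences. A short check, using only the scores from the table (the K2.6 MCPMark value is the approximate figure given there), reproduces them:

```python
# (K2.7-Code score, K2.6 score) as listed in the table above.
SCORES = {
    "Kimi Code Bench v2": (62.0, 50.9),
    "Program Bench": (53.6, 48.3),
    "MLS Bench Lite": (35.1, 26.7),
    "MCPMark Verified": (81.1, 72.8),  # K2.6 value is approximate in the source
}


def relative_gain_pct(new: float, old: float) -> float:
    """Relative improvement of new over old, in percent, to one decimal."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        points = round(new - old, 1)
        print(f"{name}: +{points} points, +{relative_gain_pct(new, old)}% relative")
```

Reading the gains as absolute points gives smaller numbers (for example, 11.1 points on Kimi Code Bench v2), which is worth keeping in mind when comparing against scores reported elsewhere.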

Moonshot designed and ran every one of these evaluations itself, and neither SWE-bench Verified numbers nor a DeepSWE submission exists yet. VentureBeat reported that practitioners are already publicly questioning the benchmark design. This is not a reason to dismiss the model; it is a reason to run your own evaluation before routing production traffic through it.
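
Running your own evaluation does not need much machinery. The following is a minimal sketch of a side-by-side harness, assuming each candidate model is wrapped in a plain function that takes a prompt and returns text; `Task`, `pass_rate`, and `compare` are hypothetical names, and the checks would be replaced by your own tests.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def pass_rate(generate: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose output passes that task's own check."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for task in tasks if task.check(generate(task.prompt)))
    return passed / len(tasks)


def compare(
    models: dict[str, Callable[[str], str]], tasks: list[Task]
) -> dict[str, float]:
    """Run every candidate over the same tasks and collect pass rates."""
    return {name: pass_rate(generate, tasks) for name, generate in models.items()}


if __name__ == "__main__":
    # Stub "models" stand in for real API calls in this illustration.
    tasks = [
        Task("add", "Write add(a, b)", lambda out: "def add" in out),
        Task("sub", "Write sub(a, b)", lambda out: "def sub" in out),
    ]
    models = {
        "candidate-a": lambda prompt: "def add(a, b): return a + b",
        "candidate-b": lambda prompt: "def " + prompt.split()[1].split("(")[0] + "(a, b): ...",
    }
    print(compare(models, tasks))
```

In practice each `generate` function would call a model API, and each `check` would run the task's real test suite rather than a substring match.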

Using Kimi K2.7-Code Now

The Kimi API is OpenAI-compatible, so the integration overhead is low. Swap the base URL and model ID in any existing OpenAI SDK setup:

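from openai import OpenAI

# Point the standard OpenAI SDK at Moonshot's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",  # placeholder; substitute your own key
    base_url="https://api.moonshot.ai/v1",
)

# No temperature or top_p is passed: the API fixes both (see the constraints below).
response = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Refactor this function to handle edge cases..."}
    ],
)

# The reply text is in the first choice's message.
print(response.choices[0].message.content)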

Know these constraints before you build: temperature is fixed at 1.0, top-p at 0.95, and you cannot disable thinking mode — the API returns an error if you try. Set a Project Daily Spending Budget in the Kimi Open Platform dashboard before your first production run to avoid unexpected bills.
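
One way to keep these constraints from surprising you at runtime is to validate requests on the client before they are sent. The helper below is a hypothetical sketch: it assumes the standard OpenAI-style parameter names `temperature` and `top_p`, and failing fast on a conflicting value is a design choice made here, not documented API behavior.

```python
# Fixed sampling values as stated in the article.
FIXED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}


def build_request(model: str, messages: list[dict], **overrides) -> dict:
    """Build chat-completion kwargs, rejecting sampling overrides that conflict."""
    conflicts = {
        key: value
        for key, value in overrides.items()
        if key in FIXED_SAMPLING and value != FIXED_SAMPLING[key]
    }
    if conflicts:
        raise ValueError(f"sampling parameters are fixed; cannot set {conflicts}")
    # Drop the fixed keys entirely so only supported options are sent.
    extra = {k: v for k, v in overrides.items() if k not in FIXED_SAMPLING}
    return {"model": model, "messages": messages, **extra}


if __name__ == "__main__":
    request = build_request(
        "kimi-k2.7-code",
        [{"role": "user", "content": "Explain this stack trace."}],
        max_tokens=512,
    )
    print(request)
```

The resulting dictionary can be unpacked into `client.chat.completions.create(**request)`.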

For coding agents: OpenCode supports Kimi K2.7-Code natively — run opencode auth login, select Moonshot AI, then use /models to switch. For Claude Code and Cline, use the Anthropic-compatible endpoint at https://api.moonshot.ai/anthropic. Moonshot’s official integration guide covers the exact environment variable setup for each tool.
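
For Claude Code, pointing the tool at the Anthropic-compatible endpoint typically comes down to a few environment variables. The sketch below assumes Claude Code's `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` variables and reuses the model ID from the snippet above; treat Moonshot's integration guide as the authority on the exact names and values.

```python
import os
from typing import Optional

MOONSHOT_ANTHROPIC_BASE_URL = "https://api.moonshot.ai/anthropic"


def claude_code_env(
    api_key: str,
    model: str = "kimi-k2.7-code",
    base_env: Optional[dict] = None,
) -> dict:
    """Return a copy of the environment with Moonshot routing variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = MOONSHOT_ANTHROPIC_BASE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key   # assumed variable name
    env["ANTHROPIC_MODEL"] = model          # assumed variable name
    return env


if __name__ == "__main__":
    env = claude_code_env("YOUR_MOONSHOT_API_KEY")
    # To launch the tool with this environment (assumes `claude` is on PATH):
    #   import subprocess
    #   subprocess.run(["claude"], env=env)
    print(env["ANTHROPIC_BASE_URL"])
```

Building the environment in a function rather than exporting variables globally keeps the override scoped to the one process you launch.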

Self-Hosting: Possible, Not Easy

Weights are on Hugging Face at moonshotai/Kimi-K2.7-Code with native INT4 quantization. Full precision runs around 600GB on disk; community INT4 quants land around 240GB. This is multi-GPU server territory — the 32B active parameters make inference comparable to a dense 32B model, but you need serious hardware to load the full weight set. Moonshot recommends vLLM for production API serving, SGLang for structured generation, and their own KTransformers engine for K2-architecture-specific tuning. The MoonshotAI/Kimi-K2 GitHub repo has deployment instructions.
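
A rough GPU count follows from the checkpoint sizes quoted above. This is a back-of-the-envelope sketch: the 80GB card size and the 20% headroom for KV cache and activations are assumptions for illustration, and real deployments depend on context length, batch size, and the serving engine.

```python
def min_gpus(checkpoint_gb: int, gpu_memory_gb: int, overhead_pct: int = 20) -> int:
    """Smallest GPU count whose combined memory fits the weights plus headroom.

    Uses integer arithmetic so the ceiling is exact.
    """
    needed = checkpoint_gb * (100 + overhead_pct)
    capacity_per_gpu = gpu_memory_gb * 100
    return -(-needed // capacity_per_gpu)  # ceiling division


if __name__ == "__main__":
    # Sizes from the article; 80GB per GPU is an assumed card size.
    print("600GB checkpoint:", min_gpus(600, 80), "GPUs")
    print("240GB quant:", min_gpus(240, 80), "GPUs")
```

Under these assumptions the larger checkpoint needs more than a single 8-GPU node of 80GB cards, while the smaller quant fits on part of one.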

Worth Testing, Not Worth Over-Trusting Yet

The value proposition is straightforward: if your agentic coding workflows are burning money on frontier API calls, Kimi K2.7-Code deserves a structured evaluation. The price gap versus closed models is large enough that significant performance degradation might still be cost-effective for many workloads. The benchmark situation is genuinely problematic — all vendor-run, no independent results — but that is a reason to test carefully, not to ignore the model. Run it against your actual tasks, compare outputs side-by-side, and wait for SWE-bench numbers before committing any production routing. The weights are not going anywhere.
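
The claim that a weaker but cheaper model can still win is easy to quantify as cost per solved task. The sketch below is a simplified model with hypothetical inputs: it assumes failed attempts cost only their tokens, ignoring retries and human review time, so treat the break-even figure as a lower bound.

```python
def cost_per_solved_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful task, given a per-attempt cost."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(
    cheap_cost: float, frontier_cost: float, frontier_success_rate: float
) -> float:
    """Success rate at which the cheap model matches the frontier model's
    cost per solved task."""
    return frontier_success_rate * cheap_cost / frontier_cost


if __name__ == "__main__":
    # Hypothetical figures: frontier model costs $1.00 per attempt and solves 80%;
    # the cheaper model costs $0.10 per attempt.
    rate = break_even_success_rate(0.10, 1.00, 0.80)
    print(f"cheap model breaks even at a {rate:.0%} success rate")
```

Substituting measured per-attempt costs and pass rates from your own evaluation turns this into a concrete routing decision.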

ByteBot
