
Z.ai open-sourced GLM-5.2 on June 17 under an MIT license — full commercial use, no royalties, no acceptable-use restrictions. The model scores 62.1 on SWE-bench Pro against GPT-5.5’s 58.6, and the API costs $2.40 per million tokens blended versus $13.33 for GPT-5.5. If you are running coding agents at OpenAI prices, you now have a real alternative you can download, self-host, and fine-tune on your own data today.
What GLM-5.2 Actually Is
GLM-5.2 is a 744B-parameter sparse Mixture-of-Experts model — roughly 40B parameters activate per token, keeping inference costs well below what the headline number implies. It has a 1-million-token context window built for long-horizon agentic tasks: large codebase analysis, full-repo debugging, regulatory document review. Z.ai built it explicitly as a coding agent flagship, and the benchmarks back that up.
The technical feature that makes the 1M context economically viable is IndexShare — a sparse attention optimization that reuses the same token index across every four layers instead of recomputing it per layer. This cuts per-token FLOPs by 2.9x at 1M context. The result is that running a million-token prompt does not cost disproportionately more than a short one, which has historically killed long-context adoption at scale.
The Benchmark Numbers
Here is how GLM-5.2 compares against GPT-5.5 on the benchmarks that matter for agentic work:
| Benchmark | GLM-5.2 | GPT-5.5 | Winner |
|---|---|---|---|
| SWE-bench Pro | 62.1 | 58.6 | GLM-5.2 |
| FrontierSWE | 74.4% | 72.6% | GLM-5.2 |
| PostTrainBench | 34.3% | 25.0% | GLM-5.2 |
| MCP-Atlas (tool use) | 77.0 | 75.3 | GLM-5.2 |
| Terminal-Bench 2.1 | 81.0 | 84.0 | GPT-5.5 |
SWE-bench Pro tests against real GitHub issues with full repository context — not synthetic puzzles. GLM-5.2 leads on all four agentic coding benchmarks and trails only on Terminal-Bench, which skews toward general-purpose terminal tasks. For agent-driven coding specifically, GLM-5.2 now holds the lead on most open benchmarks.
The Cost Gap Is the Real Story
GLM-5.2’s API runs at $1.40 per million input tokens and $4.40 output — blended at a 2:1 ratio, that is $2.40 per million. GPT-5.5 comes in at $5.00 input and $30.00 output, or $13.33 blended. At 100,000 requests per day on average 3,000-token prompts, that works out to $21,600 per month versus $120,000. At scale, that difference changes the economics of AI-powered products.
Self-hosting removes the per-token cost entirely. The FP8 weights are on HuggingFace at zai-org/GLM-5.2-FP8 and run on vLLM, SGLang, or transformers. You will need around 800GB of NVMe storage. The MIT license means you can fine-tune on proprietary data, run air-gapped, and commercialize the output with no royalties and no approval from Z.ai. If Z.ai changes its pricing tomorrow, your self-hosted deployment is unaffected.
huggingface-cli download zai-org/GLM-5.2-FP8 --local-dir ./glm5-2-fp8 --repo-type model
Drop-In Compatibility With Your Current Tools
Z.ai ships an OpenAI-compatible API endpoint. If you are already using Claude Code, Cline, Roo Code, Goose, OpenCode, Crush, OpenClaw, or Kilo Code, switching to GLM-5.2 is a base-URL change in your config — no SDK swap, no code rewrite. Vercel integrated it into their AI Gateway within three days of the June 13 release. Guillermo Rauch described the coding output as “genuinely impressed, almost shocked.” A three-day turnaround from open-source release to production integration is not a normal thing.
What It Does Not Do
GLM-5.2 has no vision support — text and code only. If your workflows depend on image input or multimodal reasoning, it is not a replacement for GPT-4o or Claude Opus 4.8 in those scenarios. The model has significant Chinese-language training data; for tasks requiring deep linguistic nuance in European languages, test it against your specific workload before committing. And self-hosting 744B parameters is not a weekend project — you need real infrastructure to support it.
The Bigger Pattern
GLM-5.2 is the third open-source release in 18 months to genuinely close the gap with frontier proprietary models — after DeepSeek R1 for reasoning and DSpark for inference speed. Each follows the same pattern: a lab open-sources something that should not be free at that quality level, the developer community stress-tests it within days, and proprietary providers respond with price cuts. That cycle is accelerating, and GLM-5.2 makes the case that you do not need to pay premium closed-model prices to run competitive coding agents. The weights are available now.













