GLM-5.2 on Cloudflare Workers AI: 1M Context, Open Weights

Cloudflare added GLM-5.2 to Workers AI on June 16, one day after Z.ai released the model publicly. GLM-5.2 is a 744-billion-parameter Mixture-of-Experts coding model with a 1-million-token context window, function calling, and two reasoning modes. Open weights ship this week under an MIT license. If you already use Cloudflare Workers AI for inference, switching means changing the model name in your existing calls.

What Changed from GLM-5.1

Z.ai’s previous release, GLM-5.1, scored 58.4 on SWE-bench Pro, which earned it a writeup here for coding eight hours straight. GLM-5.2 moves that number to 62.1, which puts it above GPT-5.5 (58.6) on the same benchmark. On FrontierSWE, a long-horizon test that measures whether an agent can complete open-ended technical projects over hours, GLM-5.2 hit 74.4%, trailing Claude Opus 4.8 (75.1%) by 0.7 points and beating GPT-5.5 (72.6%) by 1.8 points.

The more consequential change is context. GLM-5.1 handled roughly 200K tokens. GLM-5.2 handles 1 million — enough to load an entire mid-sized codebase, test suite, configuration files, and conversation history into a single session without truncation. The output cap also quadrupled to 131K tokens, so the model can generate entire files or multi-file refactors in one shot rather than forcing you to stitch outputs together. VentureBeat put the cost comparison plainly: GLM-5.2 beats GPT-5.5 on long-horizon benchmarks at roughly one-sixth the cost.

How to Use It on Workers AI

The model identifier on Cloudflare’s Workers AI is @cf/zai-org/glm-5.2. Three access paths are available: the Workers AI binding, the REST API, and AI Gateway. The binding is the most straightforward if you are already in the Cloudflare ecosystem:

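// wrangler.toml: add an [ai] section containing binding = "AI"
export default {
  async fetch(request, env) {
    // env.AI is the Workers AI binding declared in wrangler.toml
    const response = await env.AI.run("@cf/zai-org/glm-5.2", {
      messages: [
        { role: "user", content: "Refactor this function to use async/await..." }
      ]
    });
    return Response.json(response);
  }
};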

For REST API access:

curl https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT_ID/ai/run/@cf/zai-org/glm-5.2 \
  --header "Authorization: Bearer $CF_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"messages": [{"role": "user", "content": "Fix this bug..."}]}'
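The same REST call can be made from JavaScript. The sketch below mirrors the curl command above using the standard fetch API. The helper names buildRunRequest and runModel are illustrative, not part of any Cloudflare SDK, and the shape of the response body is not shown in this article, so the code returns it unparsed beyond JSON decoding.

```javascript
// Base path taken from the curl example above.
const API_BASE = "https://api.cloudflare.com/client/v4/accounts";

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildRunRequest(accountId, apiToken, model, messages) {
  return {
    url: `${API_BASE}/${accountId}/ai/run/${model}`,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ messages }),
    },
  };
}

// Hypothetical helper: sends the request and returns the decoded JSON body.
async function runModel(accountId, apiToken, prompt) {
  const { url, options } = buildRunRequest(
    accountId,
    apiToken,
    "@cf/zai-org/glm-5.2",
    [{ role: "user", content: prompt }]
  );
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Workers AI request failed with status ${res.status}`);
  }
  return res.json();
}
```

In Node 18 or later, a call would look like runModel(process.env.CF_ACCOUNT_ID, process.env.CF_API_TOKEN, "Fix this bug..."). Keeping request construction separate from sending makes the URL and payload easy to inspect before any tokens are spent.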

One caveat: Workers AI currently caps GLM-5.2’s context at 262,144 tokens (256K) rather than the full 1 million. Cloudflare has stated it plans to increase this. Even so, the 256K limit is five times what most models on the platform offer, and it covers the majority of real-world coding sessions.
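To check whether a set of files is likely to fit under that cap before sending them, a rough estimate is usually enough. The sketch below assumes about four characters per token, a common rule of thumb and not a property of GLM-5.2’s tokenizer, so treat the result as a ballpark figure. The function names and the 8,192-token output reserve are illustrative choices.

```javascript
// Context cap that Workers AI applies to GLM-5.2, per the caveat above.
const WORKERS_AI_CONTEXT_CAP = 262144;

// Assumption: roughly 4 characters per token. The real count depends on
// the model's tokenizer and on the mix of code and prose.
const CHARS_PER_TOKEN = 4;

// Rough token estimate for a single string.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Sums the estimates for all inputs and checks them against the cap,
// leaving `reservedForOutput` tokens free for the model's reply.
function fitsInContext(texts, reservedForOutput = 8192) {
  const used = texts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return { used, fits: used + reservedForOutput <= WORKERS_AI_CONTEXT_CAP };
}
```

If fits comes back false, the usual options are to drop generated files and vendored dependencies first, or to split the work across several requests.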

High vs Max: Picking the Right Thinking Mode

GLM-5.2 ships with two thinking-effort settings. High mode produces balanced reasoning with faster response times and roughly half the token output of Max. It suits everyday coding tasks, code review, and routine generation. Max mode runs extended chain-of-thought, generates up to 85K output tokens per task, and adds 30–80% to first-token latency. Z.ai recommends Max as the default for serious coding work: complex refactors, architecture decisions, and multi-step debugging sessions where accuracy matters more than speed.

In practice, a hybrid approach works well: Max for the initial architecture pass, High for iteration. Thinking mode adds cost, so keep an eye on Neuron consumption on the free tier (10,000 per day) if you are running extended sessions.
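The hybrid approach can be encoded as a small routing rule. The sketch below only returns a label and tracks the free-tier allowance. The task names are made up for illustration, and the request parameter that actually selects a thinking mode is not given in this article, so none is assumed here.

```javascript
// Task kinds this article associates with Max mode (names are illustrative).
const MAX_MODE_TASKS = new Set([
  "architecture",
  "complex-refactor",
  "multi-step-debugging",
]);

// Hypothetical helper: returns "max" for accuracy-critical work, "high" otherwise.
function pickThinkingMode(taskKind) {
  return MAX_MODE_TASKS.has(taskKind) ? "max" : "high";
}

// Free-tier daily Neuron allowance quoted above.
const FREE_TIER_NEURONS_PER_DAY = 10000;

// Hypothetical helper: Neurons left today after `usedNeurons`, never below zero.
function remainingNeurons(usedNeurons) {
  return Math.max(0, FREE_TIER_NEURONS_PER_DAY - usedNeurons);
}
```

An agent loop could call pickThinkingMode once per step and fall back to High whenever remainingNeurons gets low.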

Agent Tool Compatibility

GLM-5.2 ships with an OpenAI-compatible endpoint, which means it drops into agent tools without custom integration code. Claude Code, Cline, OpenCode, Roo Code, Goose, and OpenClaw all connect with a base URL swap and a model name change. Early community reports note that it plugs into agent environments immediately and produces clean code. The long context window is particularly useful here: agents can load entire repositories into working memory instead of summarizing and truncating on every turn, although on Workers AI that is currently bounded by the 262,144-token cap.
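For tools that speak the OpenAI chat-completions format, the swap amounts to a different base URL and model name. The sketch below builds such a request by hand. The base URL pattern ending in /ai/v1 is an assumption based on Cloudflare's OpenAI-compatible endpoint for Workers AI and should be checked against current documentation. The function names are illustrative.

```javascript
// Assumption: OpenAI-compatible base URL for Workers AI. Verify the exact
// path in Cloudflare's documentation before relying on it.
function openAiBaseUrl(accountId) {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1`;
}

// Hypothetical helper: builds a chat-completions request in the OpenAI shape
// (a `model` string plus a `messages` array) without sending it.
function buildChatCompletion(accountId, apiToken, userPrompt) {
  return {
    url: `${openAiBaseUrl(accountId)}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "@cf/zai-org/glm-5.2",
        messages: [{ role: "user", content: userPrompt }],
      }),
    },
  };
}
```

Agent tools generally ask for the same two values in their settings: the base URL returned by openAiBaseUrl and the model identifier.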

Open Weights and Self-Hosting

Open weights land on Hugging Face in the zai-org/GLM-5.2 repository this week, MIT licensed. Ollama users can pull it with ollama pull glm5:latest once available. The catch: GLM-5.2 is a 744B MoE model. Running it locally requires 256GB or more of RAM for the 2-bit quantization. That is a serious-hardware requirement, not a laptop experiment. Most developers will use it via Workers AI or the Z.ai API, but the MIT license matters for teams that need air-gapped deployment or want to fine-tune without vendor restrictions.
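The RAM figure follows from simple arithmetic on the parameter count. The sketch below computes the storage for the weights alone. It ignores the KV cache, activations, and runtime overhead, and it assumes every weight is stored at the same bit width, which real quantization formats often do not do.

```javascript
// Bytes needed to store the weights alone at a given quantization width.
function weightBytes(parameterCount, bitsPerWeight) {
  return (parameterCount * bitsPerWeight) / 8;
}

// 744 billion parameters, per the article.
const GLM_52_PARAMS = 744e9;

// 2-bit quantization: 744e9 * 2 / 8 bytes = 186 GB (decimal gigabytes).
const twoBitGB = weightBytes(GLM_52_PARAMS, 2) / 1e9;

// 16-bit weights for comparison: 744e9 * 16 / 8 bytes = 1488 GB.
const sixteenBitGB = weightBytes(GLM_52_PARAMS, 16) / 1e9;

console.log(`2-bit weights:  ${twoBitGB} GB`);
console.log(`16-bit weights: ${sixteenBitGB} GB`);
```

At 2 bits the weights alone come to about 186 GB, which is why 256GB is the practical floor once working memory for inference is added on top.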

According to the Cloudflare changelog entry from June 16, GLM-5.2 is available now via Workers AI binding, REST API, and AI Gateway. For teams already running inference on Workers AI, this is worth evaluating immediately — particularly if long-horizon coding tasks or large-codebase agents are part of your stack.

ByteBot
