
OpenAI launched GPT-5.6 on June 26. Sol, the flagship, is getting all the press. But for developers who pay the API bill, the more important model is Luna: the economy tier, priced at $1 per million input tokens and built for summarization, classification, extraction, and everything else that runs at scale. General availability is coming in the next few weeks. If you are on GPT-5.5 today, now is the time to figure out where your workloads belong.
Three Tiers, One Generation
GPT-5.6 replaces the old single-flagship approach with a family of three named tiers. The number (5.6) is the generation. The name is the capability tier, and those tiers are permanent: they advance independently on their own schedules.
- Sol — Flagship. 1.5M token context, two new reasoning modes (max and ultra for subagent coordination). Built for complex coding, security research, and frontier agentic workflows.
- Terra — Mid-tier. GPT-5.5-competitive quality at half the price. The practical migration path for most production workloads running on GPT-5.5 today.
- Luna — Economy. Fastest, cheapest. Designed for high-volume, routine tasks where throughput and unit cost matter more than peak reasoning.
This matters architecturally. When you build on Luna, you are not locking into a frozen model snapshot. Luna 5.7 will be smarter than Luna 5.6, but it will still be the fast, cheap, high-throughput tier. You can design around that stability — and that is new.
The Pricing Case for Luna
Luna is $1 input / $6 output per million tokens. Sol is $5 / $30. Terra is $2.50 / $15. The gap is not subtle.
Run the numbers on a realistic workload. Say your app runs 50,000 daily summarizations averaging 1,500 input tokens each, for 75 million input tokens per day. Counting input cost only, over a 30-day month:
- On Sol: 75M × $5/1M = $375/day → $11,250/month
- On Luna: 75M × $1/1M = $75/day → $2,250/month
That is a $9,000/month difference for tasks that do not require Sol’s reasoning depth. If your workload is routine (classification, tagging, extraction, first-pass drafts), Luna is the right tier. Using Sol for those tasks is not ambitious; it is expensive.
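The comparison above can be reproduced with a small calculator. This is a minimal sketch: the prices are the ones quoted in this article, while the `daily_cost` helper and its parameters are illustrative names, not part of any OpenAI SDK. Output tokens default to zero to match the input-only figures above.

```python
# Prices in USD per million tokens, as quoted in this article.
PRICES = {
    "sol": {"input": 5.00, "output": 30.00},
    "terra": {"input": 2.50, "output": 15.00},
    "luna": {"input": 1.00, "output": 6.00},
}


def daily_cost(tier, requests_per_day, input_tokens, output_tokens=0):
    """Return the daily cost in USD for one tier (hypothetical helper)."""
    price = PRICES[tier]
    input_cost = requests_per_day * input_tokens / 1_000_000 * price["input"]
    output_cost = requests_per_day * output_tokens / 1_000_000 * price["output"]
    return input_cost + output_cost


# 50,000 summarizations per day at 1,500 input tokens each.
sol = daily_cost("sol", 50_000, 1_500)
luna = daily_cost("luna", 50_000, 1_500)
monthly_savings = (sol - luna) * 30  # assumes a 30-day month

print(f"Sol:  ${sol:,.2f}/day")
print(f"Luna: ${luna:,.2f}/day")
print(f"Monthly difference: ${monthly_savings:,.2f}")
```

Passing a realistic `output_tokens` value widens the absolute gap further, since output is priced at six times the input rate on every tier.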
The New Caching Model Changes the Math Further
GPT-5.6 ships with a revised prompt caching system. Explicit cache breakpoints and a 30-minute minimum cache lifetime replace the previous automatic-only approach. Cache reads still get a 90% discount. The new wrinkle: cache writes are billed at 1.25x the uncached input rate.
For Luna at $1/1M input tokens, cache writes cost $1.25/1M and cache reads cost $0.10/1M. For a pipeline with a 3,000-token system prompt hitting 10,000 daily requests at a 95% cache hit rate:
- Without caching: 30M tokens × $1/1M = $30/day
- With explicit caching: writes (1.5M × $1.25/1M = $1.88) + reads (28.5M × $0.10/1M = $2.85) = $4.73/day
- Savings: 84%
The 1.25x write premium is noise compared to the 90% read discount. What explicit breakpoints actually give you is control — you can mark exactly where your stable system prompt ends and the variable user message begins. For agent pipelines that reuse the same instruction block across thousands of requests, this precision makes your cost model predictable instead of approximate.
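The caching arithmetic above can be checked with a few lines of Python. This is a rough sketch under a simplifying assumption: every cache miss is billed as a cache write, and only the system prompt tokens are counted. The function name and parameters are hypothetical; the multipliers are the ones described in this article.

```python
def cached_prompt_cost(prompt_tokens, requests, hit_rate, input_price,
                       write_multiplier=1.25, read_discount=0.90):
    """Estimate daily system-prompt cost with and without caching.

    Assumption: each cache miss is billed as a write at
    `write_multiplier` times the uncached input price.
    """
    total_tokens = prompt_tokens * requests
    read_tokens = total_tokens * hit_rate
    write_tokens = total_tokens - read_tokens

    uncached = total_tokens / 1_000_000 * input_price
    writes = write_tokens / 1_000_000 * input_price * write_multiplier
    reads = read_tokens / 1_000_000 * input_price * (1 - read_discount)
    cached_total = writes + reads

    return {
        "uncached": uncached,
        "writes": writes,
        "reads": reads,
        "cached_total": cached_total,
        "savings": 1 - cached_total / uncached,
    }


# Luna: 3,000-token system prompt, 10,000 requests/day, 95% hit rate.
result = cached_prompt_cost(3_000, 10_000, 0.95, input_price=1.00)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Lowering `hit_rate` shows how quickly the savings erode when prompts are not stable enough to be reused, which is why the placement of the cache boundary matters.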
Which Tier Fits Which Task
The practical heuristic: if you can write a clear rubric for what “correct” looks like, Luna can handle it. If the task requires judgment calls, nuanced multi-turn reasoning, or output that goes directly to a customer without human review, move up.
| Task Type | Tier |
|---|---|
| Bulk summarization, tagging, classification | Luna |
| Named entity extraction, structured data parsing | Luna |
| First-pass email and ticket drafts | Luna |
| High-volume routing and triage | Luna |
| Customer-facing chat with nuanced conversation | Terra |
| Document analysis requiring reliable quality | Terra |
| Complex multi-step coding, security research | Sol |
| Long-horizon agent workflows | Sol |
For teams migrating from GPT-5.5, the move is: start everything on Terra, profile which tasks are over-paying, then drop those to Luna. OpenAI positions Terra as GPT-5.5-competitive performance at roughly half the cost — so the quality floor is already high before you start optimizing downward.
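One way to encode the table and the Terra-first migration order is a simple lookup with Terra as the default. This is only a sketch: the task labels are invented for illustration, and the tier values are plain labels, not actual API model identifiers, which this article does not specify.

```python
from enum import Enum


class Tier(Enum):
    LUNA = "luna"
    TERRA = "terra"
    SOL = "sol"


# Hypothetical task labels mapped to tiers, following the table above.
TASK_TIERS = {
    "bulk_summarization": Tier.LUNA,
    "tagging": Tier.LUNA,
    "classification": Tier.LUNA,
    "entity_extraction": Tier.LUNA,
    "draft_reply": Tier.LUNA,
    "routing": Tier.LUNA,
    "customer_chat": Tier.TERRA,
    "document_analysis": Tier.TERRA,
    "complex_coding": Tier.SOL,
    "long_horizon_agent": Tier.SOL,
}


def pick_tier(task_type, default=Tier.TERRA):
    """Return the tier for a task; unprofiled tasks start on Terra."""
    return TASK_TIERS.get(task_type, default)


print(pick_tier("tagging").value)
print(pick_tier("some_new_task").value)
```

Keeping the mapping in one place means that dropping a profiled task from Terra to Luna is a one-line change rather than a code search.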
Access Status and Preparation Steps
GPT-5.6 is not generally available yet. About 20 organizations have access under a US government-managed security evaluation. OpenAI expects to expand access next week, with general availability — across ChatGPT, Codex, and the API — coming in the weeks following. Mid-July is a realistic target, contingent on how the government’s review wraps up.
You cannot use GPT-5.6 Luna in production today. But you can prepare:
- Audit your current GPT-5.5 usage by task type and volume.
- Map each task class to Luna, Terra, or Sol using the table above.
- Build evals for your Luna-candidate tasks now, against your current model as baseline.
- Structure your system prompts with explicit cache boundaries in mind — breakpoints reward clean prompt architecture.
- Budget for the switch: most teams running routine tasks on GPT-5.5 should see a 60-80% cost reduction when they move to Luna.
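The audit and budgeting steps above could start from something like the following sketch. It assumes you can export usage as records with a task label and token counts; that record shape, the field names, and both helper functions are hypothetical. Prices are the per-million-token rates quoted in this article.

```python
from collections import defaultdict

# USD per million tokens, as quoted in this article.
INPUT_PRICE = {"luna": 1.00, "terra": 2.50, "sol": 5.00}
OUTPUT_PRICE = {"luna": 6.00, "terra": 15.00, "sol": 30.00}


def audit(records):
    """Aggregate request and token counts per task type."""
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for record in records:
        entry = totals[record["task"]]
        entry["requests"] += 1
        entry["input_tokens"] += record["input_tokens"]
        entry["output_tokens"] += record["output_tokens"]
    return dict(totals)


def projected_cost(task_totals, tier):
    """Project the cost in USD of one task's volume on a given tier."""
    return (
        task_totals["input_tokens"] / 1_000_000 * INPUT_PRICE[tier]
        + task_totals["output_tokens"] / 1_000_000 * OUTPUT_PRICE[tier]
    )


# Hypothetical usage log entries.
records = [
    {"task": "tagging", "input_tokens": 1200, "output_tokens": 80},
    {"task": "tagging", "input_tokens": 900, "output_tokens": 60},
    {"task": "support_chat", "input_tokens": 3000, "output_tokens": 500},
]

usage = audit(records)
for task, totals in usage.items():
    costs = {tier: projected_cost(totals, tier) for tier in INPUT_PRICE}
    print(task, totals, costs)
```

Running this over a real day or week of logs gives a per-task cost projection for each tier, which is the number to weigh against eval results when deciding what moves to Luna.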
Everyone covered Sol because Sol benchmarks make for good headlines. The real optimization opportunity is Luna, and it arrives for most developers in a matter of weeks.













