
On May 12, OpenAI removed the beta Realtime API from its platform entirely. No extension, no soft deprecation, no redirect. If your voice application was still wired to the beta endpoint, it stopped working that day. The General Availability Realtime API is the only supported path now — and it ships with three new models, a redesigned event schema, and breaking changes across the board.
This is not a “consider upgrading” situation. The beta is gone. Here is what changed and what you need to do now.
What Actually Broke: The Migration Checklist
The GA Realtime API is not a drop-in replacement for the beta. These changes are required for any application that was using the old beta interface:
- Remove the beta header. Drop OpenAI-Beta: realtime=v1 from all requests; this header is no longer recognized.
- Update authentication. Use POST /v1/realtime/client_secrets to generate ephemeral credentials for browser or mobile clients.
- Switch the WebRTC endpoint. Establish sessions via /v1/realtime/calls instead of the previous beta path.
- Add the required session.type parameter. The GA interface requires this field; omitting it breaks session creation.
- Move output audio config. Audio output settings now live under session.audio.output; the old path is ignored silently.
- Update event listener names. Key renames: response.text.delta becomes response.output_text.delta, response.audio.delta becomes response.output_audio.delta, and conversation.item.created splits into conversation.item.added and conversation.item.done.
- Remove temperature settings. The GA API does not accept a temperature parameter. If your session config includes it, remove it.
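Several of the checklist items are mechanical enough to script. The following is a minimal sketch, not an official migration tool: the event names come from the checklist, but the helper names, the "realtime" value for session.type, and the assumption that the beta config kept voice as a top-level key are all illustrative and should be checked against your own code and the API reference.

```python
# Beta -> GA event names, taken from the checklist above.
EVENT_RENAMES = {
    "response.text.delta": ["response.output_text.delta"],
    "response.audio.delta": ["response.output_audio.delta"],
    # One beta event becomes two GA events.
    "conversation.item.created": ["conversation.item.added", "conversation.item.done"],
}


def migrate_handlers(beta_handlers):
    """Re-key a {event_name: handler} dict from beta names to GA names."""
    ga_handlers = {}
    for name, handler in beta_handlers.items():
        # Unknown names pass through unchanged.
        for ga_name in EVENT_RENAMES.get(name, [name]):
            ga_handlers[ga_name] = handler
    return ga_handlers


def migrate_session(beta_session, session_type="realtime"):
    """Return a GA-shaped copy of a beta session config dict.

    Assumptions (not from the article): "realtime" as the session.type
    value, and "voice" as the beta-era top-level key that moves under
    session.audio.output.
    """
    # Drop temperature (rejected by GA) and lift voice out of the top level.
    ga = {k: v for k, v in beta_session.items() if k not in ("temperature", "voice")}
    ga["type"] = session_type
    if "voice" in beta_session:
        audio = dict(ga.get("audio", {}))
        output = dict(audio.get("output", {}))
        output["voice"] = beta_session["voice"]
        audio["output"] = output
        ga["audio"] = audio
    return ga
```

Note that a handler migrated from conversation.item.created ends up registered under both new names and will fire twice per item; in most applications you will want to keep only one of the two.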
SDK minimums to check: openai@4.57.0+ for Node.js and openai>=1.40.0 for Python. Anything older will not work with the GA interface. The full deprecation details and event rename table are documented in the OpenAI API changelog.
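For the authentication step, a server-side request for an ephemeral credential might look roughly like the sketch below. The endpoint path is the one named in the checklist; the request body shape, the model identifier gpt-realtime-2, and the helper name are assumptions, so confirm them against the API reference before relying on this.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com"


def build_client_secret_request(api_key, model="gpt-realtime-2"):
    """Build (but do not send) a POST to /v1/realtime/client_secrets.

    The body shape and model name are assumptions for illustration.
    Note that no OpenAI-Beta header is set.
    """
    body = {"session": {"type": "realtime", "model": model}}
    return urllib.request.Request(
        url=f"{API_BASE}/v1/realtime/client_secrets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Run this on your server, send the request with urllib.request.urlopen, and hand only the returned ephemeral credential to the browser or mobile client. The long-lived API key should never leave the backend.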
What You Are Migrating To: GPT-Realtime-2
GPT-Realtime-2 is the model you will be using after migration, and it is a meaningful step up from the beta’s gpt-4o-realtime-preview. The most significant change: GPT-5-class reasoning is now embedded inside the voice loop itself. There is no longer a chained transcription → LLM → text-to-speech pipeline running under the hood. The model listens, reasons, and speaks in a single unified flow — which reduces latency at higher reasoning levels and makes tool calling far more reliable.
The context window expanded from 32K to 128K tokens, which matters for longer call center interactions or multi-step enterprise workflows where conversation history needs to carry through. OpenAI reports an 11% performance improvement over GPT-Realtime-1.5. Zillow tested the model against adversarial call benchmarks and saw call success rates climb from 69% to 95% — a 26-point lift attributed to the reasoning improvements, not prompt engineering.
Parallel tool calls are now supported. The model can fire multiple tool requests at once, for example checking a calendar while pulling a CRM record, and each result is handled independently. Match results to requests by call_id, because they are not guaranteed to come back in the order the calls were issued. After tool execution completes, you must send a response.create event; skip it and the model waits indefinitely.
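The parallel tool flow can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: run_tool_calls and the shape of the calls list are hypothetical, and the conversation.item.create / function_call_output event shape should be verified against the current event reference. The part that comes from the article is the pairing by call_id and the trailing response.create.

```python
import asyncio
import json


async def run_tool_calls(calls, tools):
    """Run pending tool calls concurrently and build the events to send back.

    calls: list of dicts with "call_id", "name", and "arguments" (a JSON string).
    tools: dict mapping tool name -> async function taking keyword arguments.
    """

    async def run_one(call):
        args = json.loads(call["arguments"] or "{}")
        result = await tools[call["name"]](**args)
        # Each output carries its own call_id, so ordering does not matter.
        return {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result),
            },
        }

    events = await asyncio.gather(*(run_one(c) for c in calls))
    # Without this final event the model keeps waiting.
    return list(events) + [{"type": "response.create"}]
```

Send the returned events over your existing WebSocket or data channel in list order. Only the final response.create has to come last.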
Three Models: Which One to Use
OpenAI launched two companion models alongside GPT-Realtime-2, each targeting a distinct use case:
- GPT-Realtime-2: speech-to-speech with GPT-5-class reasoning. Use this for voice agents, customer service, and complex interactions. Priced at $32/million input tokens and $64/million output tokens, with cached input at $0.40/million. With caching, expect approximately $0.25–$0.35 per minute of conversation.
- GPT-Realtime-Translate: live speech-to-speech translation across 70+ input languages into 13 output languages. Auto-detects source language. Deutsche Telekom and Vimeo are already using it in production. Priced at $0.034/minute. Use this for multilingual broadcasts, cross-language call centers, or video applications, not for reasoning-heavy agents.
- GPT-Realtime-Whisper: streaming speech-to-text at $0.017/minute. The right choice for live captioning, meeting transcription, or anywhere you need a real-time text record of audio without the full agent overhead.
More details are available in the OpenAI voice agents guide.
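To compare the three models on a budget, the per-minute figures above can be turned into a rough monthly estimate. This is back-of-the-envelope arithmetic using only the prices quoted in this article; the function name is illustrative, and the GPT-Realtime-2 range depends heavily on how much of your traffic hits the cache.

```python
# Per-minute prices quoted in the list above (USD).
TRANSLATE_PER_MIN = 0.034
WHISPER_PER_MIN = 0.017
# The article's approximate range for GPT-Realtime-2 with caching (USD per minute).
REALTIME2_PER_MIN_RANGE = (0.25, 0.35)


def estimate_cost(minutes):
    """Return estimated USD cost per model for a given number of audio minutes."""
    low, high = REALTIME2_PER_MIN_RANGE
    return {
        "gpt-realtime-2": (round(minutes * low, 2), round(minutes * high, 2)),
        "gpt-realtime-translate": round(minutes * TRANSLATE_PER_MIN, 2),
        "gpt-realtime-whisper": round(minutes * WHISPER_PER_MIN, 2),
    }
```

At 10,000 minutes a month, this works out to roughly $2,500 to $3,500 for GPT-Realtime-2, $340 for GPT-Realtime-Translate, and $170 for GPT-Realtime-Whisper.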
Two Production Details That Actually Matter
The preamble pattern. Before any tool call, configure the model to speak a brief acknowledgment: “Let me pull that up” or “Checking your calendar now.” Without this, users hear silence during the tool execution window and assume the call dropped. They interrupt. The tool call breaks. The experience degrades. Add one instruction to your system prompt constraining preambles to a single sentence — otherwise the model will over-narrate.
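One way to keep the preamble rule from drifting across prompts is to append it programmatically. The wording of the rule and the helper name below are illustrative assumptions, not OpenAI-recommended text; tune the phrasing against your own calls.

```python
# Illustrative wording; adjust to your product's voice.
PREAMBLE_RULE = (
    "Before calling any tool, say one short sentence acknowledging the request, "
    'such as "Let me pull that up." Never use more than one sentence for this.'
)


def with_preamble_rule(instructions):
    """Append the preamble rule to a system prompt, once."""
    if PREAMBLE_RULE in instructions:
        return instructions
    return instructions.rstrip() + "\n\n" + PREAMBLE_RULE
```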
Tune silence detection before tuning reasoning effort. The default silence_duration_ms is 500ms. In practice, this is often too short: the model treats a natural pause as the end of the turn and starts responding while the user is still mid-sentence. Raise it to 700–800ms and test against real audio samples. On reasoning effort: start at low (the default) for most applications. The headline benchmark numbers from OpenAI are measured at higher settings. What you get at low is fast and capable for most conversational use cases. Reserve high or xhigh for specific decision points where accuracy beats latency, not for every turn in a conversation.
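A session.update payload that raises the silence threshold could look like the sketch below. The silence_duration_ms name and the 700–800ms range come from the paragraph above; the nesting under session.audio.input.turn_detection, the server_vad type, and the "realtime" session type are assumptions to verify against the event reference. Reasoning effort is left out because its field name is not given here.

```python
def turn_detection_update(silence_ms=750):
    """Build a session.update event that sets the silence threshold.

    The nesting of turn_detection is an assumption; check it against the
    current API reference.
    """
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {
                "input": {
                    "turn_detection": {
                        "type": "server_vad",
                        "silence_duration_ms": silence_ms,
                    }
                }
            },
        },
    }


# Candidate values to replay against the same recorded audio samples.
CANDIDATES_MS = [700, 750, 800]
updates = [turn_detection_update(ms) for ms in CANDIDATES_MS]
```

Replaying one fixed set of real recordings against each candidate makes it easier to see where premature responses stop without adding noticeable lag.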
The Bottom Line
The migration is real work — breaking event names, new auth flow, required session parameters — but it is not architecturally complex. If you have existing tests against the beta API, those tests now document exactly what to update. The GA Realtime API is more capable, better priced with caching, and stable. The full Realtime API migration guide walks through every required change. The beta was always a preview. The preview is over.
For the GPT-Realtime-Translate language list and pricing details, the model reference page has the complete breakdown. And if you are evaluating whether to migrate to the full streaming architecture or simplify to request-response audio flows, the TechCrunch overview from the launch covers the architectural trade-offs well.













