
OpenAI GPT-Realtime-2: The Beta Voice API Is Gone — Here’s What You Build With Now

OpenAI GPT-Realtime-2: The GA Realtime API replaces the deprecated beta interface

On May 12, OpenAI removed the beta Realtime API from its platform entirely. No extension, no soft deprecation, no redirect. If your voice application was still wired to the beta endpoint, it stopped working that day. The General Availability Realtime API is the only supported path now — and it ships with three new models, a redesigned event schema, and breaking changes across the board.

This is not a “consider upgrading” situation. The beta is gone. Here is what changed and what you need to do now.

What Actually Broke: The Migration Checklist

The GA Realtime API is not a drop-in replacement for the beta. These changes are required for any application that was using the old beta interface:

  • Remove the beta header. Drop OpenAI-Beta: realtime=v1 from all requests — this header is no longer recognized.
  • Update authentication. Use POST /v1/realtime/client_secrets to generate ephemeral credentials for browser or mobile clients.
  • Switch the WebRTC endpoint. Establish sessions via /v1/realtime/calls instead of the previous beta path.
  • Add the required session.type parameter. The GA interface requires this field; omitting it breaks session creation.
  • Move output audio config. Audio output settings now live under session.audio.output; settings left at the old path are silently ignored.
  • Update event listener names. Key renames: response.text.delta becomes response.output_text.delta, response.audio.delta becomes response.output_audio.delta, and conversation.item.created splits into conversation.item.added and conversation.item.done.
  • Remove temperature settings. The GA API does not accept a temperature parameter. If your session config includes it, remove it.
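The session-level items on this checklist can be sketched as a small Python helper. This is a sketch under stated assumptions, not OpenAI's reference code. The beta session is assumed to be a flat dict with `voice` and `temperature` keys. The `"realtime"` session type and the `gpt-realtime-2` model string are assumptions to verify against the API reference. Only `voice` is moved under `audio.output`, and other output settings such as the audio format may need their own mapping. The helper builds the ephemeral-credential request but never sends it.

```python
import copy
import json
import urllib.request

# Assumed values; verify against the current API reference.
API_BASE = "https://api.openai.com/v1"
MODEL = "gpt-realtime-2"  # hypothetical model id for GPT-Realtime-2


def migrate_session(beta: dict) -> dict:
    """Convert a flat beta-style session config into the GA shape (sketch)."""
    ga = copy.deepcopy(beta)           # never mutate the caller's config
    ga.pop("temperature", None)        # the GA API does not accept temperature
    ga.setdefault("type", "realtime")  # session.type is now required (assumed value)
    voice = ga.pop("voice", None)
    if voice is not None:
        # Output audio settings now live under session.audio.output
        ga.setdefault("audio", {}).setdefault("output", {})["voice"] = voice
    return ga


def ga_headers(api_key: str) -> dict:
    """Headers for GA requests: note there is no OpenAI-Beta header."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def client_secret_request(api_key: str, session: dict) -> urllib.request.Request:
    """Build (but do not send) POST /v1/realtime/client_secrets."""
    body = json.dumps({"session": session}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/realtime/client_secrets",
        data=body,
        headers=ga_headers(api_key),
        method="POST",
    )


# Example: migrate a beta config, then build the ephemeral-key request.
# session = migrate_session({"model": MODEL, "voice": "alloy", "temperature": 0.8})
# req = client_secret_request("sk-...", session)
```

The server-side code keeps the real API key. Browser or mobile clients receive only the short-lived credential that this request returns.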

SDK minimums to check: openai@4.57.0+ for Node.js and openai>=1.40.0 for Python. Anything older will not work with the GA interface. The full deprecation details and event rename table are documented in the OpenAI API changelog.
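One low-effort way to apply the event renames is to keep the existing beta handler table and re-key it. The sketch below covers only the three renames named in the checklist. The changelog's full table may list more, and any name missing from `EVENT_RENAMES` passes through unchanged, which is an assumption worth auditing. A handler that used to listen for `conversation.item.created` gets registered on both `conversation.item.added` and `conversation.item.done`, so it may need to tolerate being called twice for the same item.

```python
# Beta event names mapped to their GA replacements, per the checklist above.
# conversation.item.created has no single successor: it splits in two.
EVENT_RENAMES = {
    "response.text.delta": ["response.output_text.delta"],
    "response.audio.delta": ["response.output_audio.delta"],
    "conversation.item.created": [
        "conversation.item.added",
        "conversation.item.done",
    ],
}


def ga_event_names(beta_name: str) -> list:
    """Return the GA event name(s) a beta listener should subscribe to."""
    return EVENT_RENAMES.get(beta_name, [beta_name])


def rewire(handlers: dict) -> dict:
    """Re-key a {beta_event_name: handler} table for GA event names."""
    ga = {}
    for beta_name, handler in handlers.items():
        for name in ga_event_names(beta_name):
            ga[name] = handler
    return ga
```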

What You Are Migrating To: GPT-Realtime-2

GPT-Realtime-2 is the model you will be using after migration, and it is a meaningful step up from the beta’s gpt-4o-realtime-preview. The most significant change: GPT-5-class reasoning is now embedded inside the voice loop itself. There is no longer a chained transcription → LLM → text-to-speech pipeline running under the hood. The model listens, reasons, and speaks in a single unified flow — which reduces latency at higher reasoning levels and makes tool calling far more reliable.

The context window expanded from 32K to 128K tokens, which matters for longer call center interactions or multi-step enterprise workflows where conversation history needs to carry through. OpenAI reports an 11% performance improvement over GPT-Realtime-1.5. Zillow tested the model against adversarial call benchmarks and saw call success rates climb from 69% to 95% — a 26-point lift attributed to the reasoning improvements, not prompt engineering.

Parallel tool calls are now supported. The model can request several tools at once, for example checking a calendar while pulling a CRM record, and each result can come back independently. Match results to requests by call_id, because they are not guaranteed to complete in the order the calls were made. Once all tool outputs have been sent, you must send a response.create event. Skip it and the model waits indefinitely.
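The flow described above can be sketched as follows. It assumes the function-call items carry `name`, `call_id`, and a JSON-string `arguments` field, and that results go back as `conversation.item.create` events holding a `function_call_output` item. Check these shapes against the GA event reference. The tools themselves are hypothetical plain Python callables. Tools run concurrently, outputs are keyed by `call_id` in completion order, and a single `response.create` is appended only after every output is queued. Sending the events over the WebSocket or WebRTC data channel is left out.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_parallel_tools(calls: list, tools: dict) -> list:
    """Execute parallel function calls and build the client events to send back.

    calls: function_call items with "name", "call_id", "arguments" (JSON string).
    tools: mapping from tool name to a Python callable (hypothetical tools).
    """
    events = []
    with ThreadPoolExecutor() as pool:
        # Remember which call_id each future belongs to.
        futures = {
            pool.submit(tools[c["name"]], **json.loads(c.get("arguments") or "{}")): c["call_id"]
            for c in calls
        }
        # Results arrive in completion order, not call order, so key by call_id.
        for fut in as_completed(futures):
            events.append({
                "type": "conversation.item.create",
                "item": {
                    "type": "function_call_output",
                    "call_id": futures[fut],
                    "output": json.dumps(fut.result()),
                },
            })
    # After every output is queued, ask for a response; otherwise the model waits.
    events.append({"type": "response.create"})
    return events
```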

Three Models: Which One to Use

OpenAI launched two companion models alongside GPT-Realtime-2, each targeting a distinct use case:

  • GPT-Realtime-2 — speech-to-speech with GPT-5-class reasoning. Use this for voice agents, customer service, and complex interactions. Priced at $32/million input tokens and $64/million output tokens ($0.40/million cached). With caching, expect approximately $0.25–$0.35 per minute of conversation.
  • GPT-Realtime-Translate — live speech-to-speech translation across 70+ input languages into 13 output languages. Auto-detects source language. Deutsche Telekom and Vimeo are already using it in production. Priced at $0.034/minute. Use this for multilingual broadcasts, cross-language call centers, or video applications — not for reasoning-heavy agents.
  • GPT-Realtime-Whisper — streaming speech-to-text at $0.017/minute. The right choice for live captioning, meeting transcription, or anywhere you need a real-time text record of audio without the full agent overhead. More details are available in the OpenAI voice agents guide.
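The per-minute figure for GPT-Realtime-2 depends on how many tokens a minute of your conversation actually bills and how much of the input is served from cache. The sketch below applies the per-million-token prices quoted above. The token counts and cached fraction are inputs you would measure from your own call logs. Nothing here is an official OpenAI calculator.

```python
# GPT-Realtime-2 prices quoted above, in USD per million tokens.
PRICE_INPUT = 32.00
PRICE_CACHED_INPUT = 0.40
PRICE_OUTPUT = 64.00


def cost_per_minute(input_tokens: float, output_tokens: float, cached_fraction: float) -> float:
    """Estimate USD per minute of conversation.

    input_tokens, output_tokens: tokens billed per minute (your own measurements).
    cached_fraction: share of input tokens billed at the cached rate (0.0 to 1.0).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    dollars = fresh * PRICE_INPUT + cached * PRICE_CACHED_INPUT + output_tokens * PRICE_OUTPUT
    return dollars / 1_000_000


# Hypothetical usage with made-up token counts:
# cost_per_minute(input_tokens=20_000, output_tokens=2_000, cached_fraction=0.9)
```

Cached input is billed at 1/80 of the fresh rate ($0.40 vs $32), so the cached fraction is usually the biggest lever on input cost. The prices are named constants so the estimate is easy to update if pricing changes.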

Two Production Details That Actually Matter

The preamble pattern. Before any tool call, configure the model to speak a brief acknowledgment: “Let me pull that up” or “Checking your calendar now.” Without this, users hear silence during the tool execution window and assume the call dropped. They interrupt. The tool call breaks. The experience degrades. Add one instruction to your system prompt constraining preambles to a single sentence — otherwise the model will over-narrate.
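Here is one way to wire the preamble rule into a session. It assumes the system prompt is sent as `instructions` in a `session.update` event and that the GA session also needs `type: "realtime"`. The rule's wording is only an illustration. The helper appends the rule at most once, so it is safe to call on every configuration push.

```python
# Illustrative wording; tune it for your own agent.
PREAMBLE_RULE = (
    "Before calling any tool, say one short sentence telling the caller "
    "what you are doing, such as 'Let me pull that up.' "
    "Never use more than one sentence for this."
)


def add_preamble_rule(instructions: str) -> str:
    """Append the preamble rule once; calling it again is a no-op."""
    if PREAMBLE_RULE in instructions:
        return instructions
    parts = [instructions.strip(), PREAMBLE_RULE]
    return "\n\n".join(p for p in parts if p)


def preamble_session_update(instructions: str) -> dict:
    """Build a session.update event carrying the amended system prompt."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",  # assumed GA session type
            "instructions": add_preamble_rule(instructions),
        },
    }
```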

Tune silence detection before tuning reasoning effort. The default silence_duration_ms is 500ms, which is often too short in practice. The server treats a natural pause as the end of the user's turn, and the model starts answering while the caller is still mid-sentence. Raise it to 700–800ms and test against real audio samples.

On reasoning effort, start at low (the default) for most applications. OpenAI's headline benchmark numbers are measured at higher settings, but low is fast and capable enough for most conversational use cases. Reserve high or xhigh for specific decision points where accuracy matters more than latency, not for every turn in a conversation.
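Both tuning knobs can be sketched as a `session.update` payload. The `turn_detection` block follows the GA `session.audio.input` layout with `server_vad` and `silence_duration_ms`. The `"reasoning": {"effort": ...}` field name is an assumption, not a confirmed GPT-Realtime-2 parameter, so check the model reference before relying on it. The per-turn selector is a hypothetical helper that encodes the "high only at decision points" rule.

```python
def turn_settings(silence_ms: int = 750, effort: str = "low") -> dict:
    """Build a session.update with a longer VAD silence window and a default effort."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {
                "input": {
                    "turn_detection": {
                        "type": "server_vad",
                        "silence_duration_ms": silence_ms,  # default is 500
                    }
                }
            },
            # ASSUMED field name for reasoning effort; verify in the model reference.
            "reasoning": {"effort": effort},
        },
    }


def effort_for_turn(is_decision_point: bool) -> str:
    """Hypothetical helper: spend extra reasoning only where accuracy beats latency."""
    return "high" if is_decision_point else "low"
```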

The Bottom Line

The migration is real work — breaking event names, new auth flow, required session parameters — but it is not architecturally complex. If you have existing tests against the beta API, those tests now document exactly what to update. The GA Realtime API is more capable, better priced with caching, and stable. The full Realtime API migration guide walks through every required change. The beta was always a preview. The preview is over.

For the GPT-Realtime-Translate language list and pricing details, the model reference page has the complete breakdown. And if you are evaluating whether to migrate to the full streaming architecture or simplify to request-response audio flows, the TechCrunch overview from the launch covers the architectural trade-offs well.

ByteBot
I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.
