GPT-Realtime-2 and WebRTC: Migrate and Build Now

GPT-Realtime-2: Build Speech-to-Speech AI Agents With WebRTC

The OpenAI Realtime API beta is gone. As of May 12, 2026, sending the OpenAI-Beta: realtime=v1 header returns a beta_api_shape_disabled error and nothing else. If your voice integration has been silent since then, that is probably why. The replacement — GPT-Realtime-2 — is not just a migration target. It is a materially different model with configurable reasoning, a 128K token context window, and reliable tool use. The migration takes 30–60 minutes. Here is exactly what to change.

Four Changes That Break Existing Integrations

The GA interface is not a rename of the beta. Four things changed, and missing any one of them can break your integration:

Remove the beta header. Delete OpenAI-Beta: realtime=v1 from every request. This alone unblocks most broken builds.
New ephemeral key endpoint. Browser and mobile clients now mint short-lived keys via POST /v1/realtime/client_secrets. Keys expire in one minute.
New WebRTC SDP endpoint. The SDP exchange for WebRTC connections moves to POST /v1/realtime/calls. The old endpoint returns a 404.
Add session.type. Session payloads must now declare a type, such as "realtime" for a speech-to-speech session. Omitting the field causes session creation to fail without a clear error message.

Event Names That Changed

Three renames are the most common migration gotcha. They produce no errors, only missing data in your handlers:

response.text.delta → response.output_text.delta
conversation.item.created → conversation.item.added and conversation.item.done (now two events)
Content types text and audio → output_text and output_audio

Check every dc.addEventListener("message", ...) handler in your codebase for the old event names before you consider the migration done.
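Because a missed rename fails silently, it can help during migration to route every data-channel message through a small dispatcher that reports event types nobody handles. The following is a minimal sketch, not part of any SDK: `createEventRouter` and its `onUnhandled` callback are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: dispatches parsed events by `type` and reports any
// type without a handler, so a renamed event shows up instead of vanishing.
function createEventRouter(
  handlers,
  onUnhandled = (type) => console.warn(`Unhandled realtime event: ${type}`)
) {
  const reported = new Set();
  return function route(rawMessage) {
    const event = JSON.parse(rawMessage);
    if (Object.hasOwn(handlers, event.type)) {
      handlers[event.type](event);
      return true;
    }
    // Report each unknown type only once to keep the console readable.
    if (!reported.has(event.type)) {
      reported.add(event.type);
      onUnhandled(event.type);
    }
    return false;
  };
}
```

Wired into the data channel, this would look like `dc.addEventListener("message", (e) => route(e.data))`. If a handler is still registered under response.text.delta, the first response.output_text.delta event is reported as unhandled rather than dropped without a trace.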

Building a WebRTC Session With GPT-Realtime-2

The WebRTC flow has two halves: your server mints the key, your browser uses it. Your API key never reaches the browser — only the ephemeral token does.

Server (Node.js) — mint the ephemeral key:

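// Runs on your server: the long-lived API key never leaves this process.
const res = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    session: {
      type: "realtime", // required in GA, see "Add session.type" above
      // Remaining fields are kept from the original snippet; verify their
      // exact placement against the GA session schema.
      model: "gpt-realtime-2",
      voice: "alloy",
      reasoning: { effort: "low" }
    }
  })
});
if (!res.ok) throw new Error(`client_secrets failed: ${res.status} ${await res.text()}`);
const data = await res.json();
// Read the key from `value`, falling back to `client_secret.value`;
// check which shape your API version returns.
const ephemeralKey = data.value ?? data.client_secret?.value;
// Send ephemeralKey to the browser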

Browser — set up the WebRTC peer connection:

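// ephemeralKey is the short-lived key your server minted and returned to this page.
const pc = new RTCPeerConnection();

// Play the model's audio as soon as the remote track arrives.
const audio = document.createElement("audio");
audio.autoplay = true;
pc.ontrack = e => { audio.srcObject = e.streams[0]; };

// Send the microphone audio to the model.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
pc.addTrack(stream.getAudioTracks()[0], stream);

// Create the data channel before the offer so it is included in the SDP.
const dc = pc.createDataChannel("oai-events");

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// Use the absolute API URL; a relative path would hit your own origin.
const sdpRes = await fetch("https://api.openai.com/v1/realtime/calls", {
  method: "POST",
  headers: { Authorization: `Bearer ${ephemeralKey}`, "Content-Type": "application/sdp" },
  body: offer.sdp
});
if (!sdpRes.ok) throw new Error(`SDP exchange failed: ${sdpRes.status}`);
await pc.setRemoteDescription({ type: "answer", sdp: await sdpRes.text() });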

The oai-events data channel carries all JSON events — session config, tool calls, transcripts, and turn completions all flow through it.
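A data channel cannot send until it reaches the open state, so events sent right after createDataChannel would throw. One way to handle this is a small queueing wrapper. This is a sketch under the assumption that `dc` is the channel from the browser snippet; `createEventSender` is a hypothetical helper name, and the wrapper does not handle a channel that closes before opening.

```javascript
// Hypothetical helper: wraps a data channel so events sent before it opens
// are queued and flushed, in order, once the "open" event fires.
function createEventSender(channel) {
  const pending = [];
  channel.addEventListener("open", () => {
    while (pending.length > 0) channel.send(pending.shift());
  });
  return function sendEvent(event) {
    const payload = JSON.stringify(event); // events travel as JSON strings
    if (channel.readyState === "open") {
      channel.send(payload);
    } else {
      pending.push(payload);
    }
  };
}
```

Usage would be along the lines of `const sendEvent = createEventSender(dc);` followed by `sendEvent({ type: "response.create" })`, which is safe to call before the connection finishes negotiating.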

Reasoning Effort: What It Actually Means in Production

GPT-Realtime-2 supports five reasoning levels: minimal, low, medium, high, and xhigh. The model can now pause and think before responding to complex questions instead of immediately generating the first token.

Here is the thing to know before reading OpenAI’s benchmark results: every published number was produced at high or xhigh reasoning effort. Production defaults to low. Start at low — it keeps latency tolerable for conversational flow — then increase effort only for specific turns that need multi-step reasoning or tool orchestration. The context window expanded from 32K to 128K tokens, so long sessions no longer get truncated mid-conversation.
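The "start low, raise only when needed" advice can be captured as a small policy function. The sketch below is illustrative only: `pickReasoningEffort` and the `needsTools` and `needsMultiStepReasoning` flags are hypothetical, and how the chosen level is actually passed to the model (session config or per response) should be checked against the GA schema.

```javascript
// The five levels named in the article, ordered from least to most effort.
const REASONING_EFFORTS = ["minimal", "low", "medium", "high", "xhigh"];

// Hypothetical policy: never go below the baseline, and raise the floor for
// turns flagged as needing tools or multi-step reasoning.
function pickReasoningEffort(turn = {}, baseline = "low") {
  const base = REASONING_EFFORTS.indexOf(baseline);
  if (base === -1) throw new RangeError(`Unknown reasoning effort: ${baseline}`);
  let level = base;
  if (turn.needsTools) {
    level = Math.max(level, REASONING_EFFORTS.indexOf("medium"));
  }
  if (turn.needsMultiStepReasoning) {
    level = Math.max(level, REASONING_EFFORTS.indexOf("high"));
  }
  return REASONING_EFFORTS[level];
}
```

Keeping the policy in one function makes the latency trade-off explicit: ordinary conversational turns stay at the baseline, and only flagged turns pay for extra thinking time.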

The Three Realtime Models Compared

Model	Use case	Price/min
GPT-Realtime-2	Voice agents with reasoning and tool use	$0.18–$0.46 (uncached)
GPT-Realtime-Translate	Live speech translation, 70+ languages → 13 outputs	$0.034 flat
GPT-Realtime-Whisper	Streaming transcription with partial real-time results	$0.017 flat

GPT-Realtime-Translate adapts to the source speaker’s voice tone and pitch rather than layering a synthetic voice on top. GPT-Realtime-Whisper returns provisional partial transcripts as speech arrives, then revises them for high final accuracy. Use GPT-Realtime-2 for agentic workflows that need reasoning; use the other two for narrow tasks where cost matters more than intelligence.

Prompt caching applies to GPT-Realtime-2 and cuts the uncached audio input rate by roughly 98.75% on repeated context, bringing effective cost down to $0.05–$0.10 per minute for sessions with substantial repeated context.
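For a rough budget, the per-minute figures above can be multiplied out directly. This is a back-of-the-envelope sketch: the prices are copied from this article, the keys in `PRICE_PER_MINUTE` are hypothetical lookup labels (not API model IDs), and real bills depend on token usage rather than wall-clock minutes.

```javascript
// Per-minute prices in USD, copied from the table and caching paragraph above.
const PRICE_PER_MINUTE = {
  "realtime-2-uncached": { low: 0.18, high: 0.46 },
  "realtime-2-cached": { low: 0.05, high: 0.10 },
  "translate": { low: 0.034, high: 0.034 },
  "whisper": { low: 0.017, high: 0.017 },
};

// Returns the low and high cost estimate in USD, rounded to whole cents.
function estimateCost(key, minutes) {
  if (!Object.hasOwn(PRICE_PER_MINUTE, key)) {
    throw new Error(`Unknown price key: ${key}`);
  }
  const price = PRICE_PER_MINUTE[key];
  const toCents = (usd) => Math.round(usd * 100) / 100;
  return {
    low: toCents(price.low * minutes),
    high: toCents(price.high * minutes),
  };
}
```

For example, 1,000 minutes of uncached GPT-Realtime-2 comes to between $180 and $460, while the same 1,000 minutes of GPT-Realtime-Whisper transcription comes to $17.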

What to Do This Week

Audit for the beta header. Search your codebase for realtime=v1. Remove it. Test against the GA endpoint.
Update your event handlers. Check for the three renamed events listed above. Missed renames produce silent data loss with no exceptions thrown.
Set reasoning effort explicitly. Default is low; setting it in your session config makes behavior predictable when you tune it later.
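The first two checklist items can be partly automated with a plain string scan. A minimal sketch, assuming you feed it the contents of each source file yourself: `findLegacyUsages` and `LEGACY_PATTERNS` are hypothetical names, and the pattern list covers only the beta-era strings named in this article.

```javascript
// Strings that indicate beta-era code, taken from the sections above.
const LEGACY_PATTERNS = [
  "realtime=v1",
  "response.text.delta",
  "conversation.item.created",
];

// Returns one entry per match: the pattern found and its 1-based line number.
function findLegacyUsages(source) {
  const hits = [];
  source.split(/\r?\n/).forEach((line, index) => {
    for (const pattern of LEGACY_PATTERNS) {
      if (line.includes(pattern)) {
        hits.push({ pattern, line: index + 1 });
      }
    }
  });
  return hits;
}
```

An empty result does not prove the migration is complete, since event names can be built dynamically, but any hit is a line that still targets the beta interface.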

The field-tested WebRTC migration repo on GitHub documents every endpoint change, event rename, and session schema update from real production debugging. The official OpenAI Realtime API docs have the canonical GA event schemas. For cost projections before you commit to GPT-Realtime-2 at scale, the realtime cost guide breaks down token math for both WebRTC and WebSocket sessions.

ByteBot
