OpenAI Rebuilds Its WebRTC Stack to Power Low-Latency Voice AI for 900M Weekly Users (May 4, 2026)
On May 4, 2026, OpenAI published an engineering deep dive titled “How OpenAI delivers low-latency voice AI at scale”, detailing how it rearchitected WebRTC into a custom split relay-plus-transceiver system to serve real-time voice to more than 900 million weekly active users of ChatGPT and the Realtime API. The redesign keeps standard WebRTC behaviour at the client edge while changing how packets are routed inside OpenAI's infrastructure.
What Happened
The post, authored by OpenAI's Real-Time AI Interactions team, walks through three constraints that began to collide as voice traffic scaled: traditional one-port-per-session media termination does not fit OpenAI's infrastructure model; stateful ICE (Interactive Connectivity Establishment) and DTLS (Datagram Transport Layer Security) sessions need stable ownership across thousands of edge nodes; and global routing has to keep first-hop latency low even when models live in a different region from the speaker.
Their solution: a split relay that decouples the public-facing WebRTC endpoint a client connects to from the inference node that actually runs the model. A custom transceiver layer keeps the client's view of the session standards-compliant while OpenAI freely repackages, reroutes, and reassigns sessions internally. The team writes that the redesign was driven by a single product requirement (conversation should move at the speed of speech) and that any network-induced pause or jitter is heard immediately as awkward delays, clipped interruptions, or broken barge-in.
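The post describes this mechanism in prose only, and OpenAI has not published implementation code. As a conceptual sketch of the idea, the relay owns the client-facing session state while the route to the inference node stays a mutable internal detail; every type and name in the TypeScript below is hypothetical, not OpenAI's:

```typescript
// Illustrative only: a split relay pins ICE/DTLS/SRTP session state to a
// stable public endpoint while the route to the inference node stays mutable.

type SessionId = string;

interface InferenceNode {
  id: string;
  // Forwards media toward the model (hypothetical interface).
  send(rtpPayload: Uint8Array): void;
}

class SplitRelay {
  // Internal routing table: which inference node currently serves each session.
  private routes = new Map<SessionId, InferenceNode>();

  attach(session: SessionId, node: InferenceNode): void {
    this.routes.set(session, node);
  }

  // Rebalancing only rewrites the internal route. The client's ICE, DTLS,
  // and SRTP state lives at the relay, so no renegotiation reaches the client.
  rebalance(session: SessionId, newNode: InferenceNode): void {
    this.routes.set(session, newNode);
  }

  onMediaPacket(session: SessionId, payload: Uint8Array): void {
    this.routes.get(session)?.send(payload);
  }
}
```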
Key Details
- Scale target: Real-time voice for more than 900 million weekly active ChatGPT users, plus every developer using the Realtime API.
- Architecture change: A split relay separates the externally exposed WebRTC endpoint from the inference node, with a custom transceiver layer that preserves standard WebRTC semantics for clients.
- Engineering targets: Fast connection setup so users can start speaking immediately, low and stable media round-trip time, and minimal jitter and packet loss to keep turn-taking crisp.
- What stays standard: ICE, DTLS, and SRTP are preserved on the client side, so existing WebRTC SDKs and browsers do not need changes (see the connection sketch after this list).
- What changes internally: Stateful session ownership is decoupled from physical media termination, allowing OpenAI to relocate or rebalance sessions without renegotiating with the client.
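Because clients see only standard WebRTC, connecting from a browser is an ordinary offer/answer exchange carried over HTTPS. Here is a minimal browser-side sketch in the spirit of OpenAI's Realtime API documentation; the endpoint URL, model name, and ephemeral-key handling are illustrative and should be checked against the current docs:

```typescript
// Minimal browser-side Realtime API connection sketch. Endpoint and model
// name are illustrative; consult the current Realtime API docs before use.
async function connectRealtime(ephemeralKey: string): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();

  // Play the model's audio as it arrives.
  pc.ontrack = (event) => {
    const audio = new Audio();
    audio.srcObject = event.streams[0];
    void audio.play(); // autoplay policies may require a prior user gesture
  };

  // Send the microphone track to the model.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getAudioTracks()[0], mic);

  // Standard offer/answer: the split relay on OpenAI's side is invisible here.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const response = await fetch(
    "https://api.openai.com/v1/realtime?model=gpt-realtime", // illustrative
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ephemeralKey}`,
        "Content-Type": "application/sdp",
      },
      body: offer.sdp,
    }
  );
  await pc.setRemoteDescription({ type: "answer", sdp: await response.text() });
  return pc;
}
```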
What Developers and Users Are Saying
The Hacker News thread (item 48013919) drew strong engagement from voice infrastructure engineers. Several commenters who run production voice agents on LiveKit, Daily, and self-hosted Janus servers said the post validates patterns they had already adopted — particularly the decision to terminate media at the edge while keeping inference centralized. Others noted that OpenAI is essentially confirming what insiders already knew: ChatGPT Advanced Voice runs on LiveKit infrastructure, and the new relay design is OpenAI's first fully owned alternative.
On the developer side, the reaction was largely practical. Builders working on the Realtime API praised the implicit promise — if OpenAI handles relay and routing, smaller teams can connect clients directly to OpenAI's media edge instead of running their own SFU. Sceptics on r/MachineLearning pointed out that the post is light on quantitative latency benchmarks: there are no published p50/p99 numbers comparing the old stack to the new one.
What This Means for Developers
For teams building on the Realtime API, the post is effectively a green light to ship browser-direct WebRTC connections without owning their own media server. OpenAI is taking responsibility for the hardest parts of voice infrastructure — ICE traversal, codec negotiation, jitter buffering, and global edge routing — and exposing them as a managed service. Developers running hybrid stacks (browser → their backend → OpenAI) should reassess whether the extra hop is still earning its keep.
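One way to make that reassessment concrete is to measure the media path itself. The sketch below uses the standard WebRTC statistics API to sample round-trip time and inbound audio jitter; exact stat fields vary slightly across browsers:

```typescript
// Sample round-trip time and inbound audio quality from a live RTCPeerConnection.
async function sampleMediaStats(pc: RTCPeerConnection): Promise<void> {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    // Active ICE candidate pair carries the end-to-end network RTT.
    if (report.type === "candidate-pair" && report.state === "succeeded") {
      console.log("current RTT (s):", report.currentRoundTripTime);
    }
    // Inbound RTP stats expose jitter and loss on the receiving leg.
    if (report.type === "inbound-rtp" && report.kind === "audio") {
      console.log("jitter (s):", report.jitter, "packets lost:", report.packetsLost);
    }
  });
}

// Example: sample every two seconds during a call.
// setInterval(() => sampleMediaStats(pc), 2000);
```

Running the same sampler against both topologies (browser → backend → OpenAI versus browser-direct) gives a like-for-like view of what the extra hop actually costs.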
For competing platforms — LiveKit, Daily, Vonage, Twilio Voice, and Pipecat — the post is a clear signal that OpenAI intends to compete at the infrastructure layer, not just the model layer. Independent voice-agent platforms will need to differentiate on tooling, observability, and multi-model support rather than raw latency alone.
What's Next
OpenAI's blog flagged that further engineering posts on real-time inference, codec choice, and turn-taking are coming. The Realtime API documentation at developers.openai.com has been updated alongside the announcement, and the Voice Agents guide has new sections on session lifecycle and reconnection. No pricing changes were announced, and the existing Realtime API rates remain in effect.
Sources
- OpenAI Engineering Blog — primary source: the original technical post.
- Hacker News discussion thread — developer reactions and architecture debate.
- OpenAI Developers Blog — complementary updates to the Realtime API and audio models.
- Cloudflare Realtime Voice AI — comparison piece on competing infrastructure.
- .NET Ramblings coverage — independent summary published the same day.
- aionda.blog — in-depth technical analysis of the Realtime API and WebRTC.