GPT-Realtime-2 Voice Model Now Available in API | OpenAI posted on the topic

11,044,903 followers

1mo

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces. https://lnkd.in/gdQ_pRir

135 Comments

Transcript

Hey everyone, we're introducing new real-time audio models in the OpenAI API. In this demo I'll show two of them: GPT-Realtime-Translate for live translation, and GPT-Realtime-2 for voice agents that can follow instructions and take actions. Let's start with translation, because that one feels so magical. I speak French, but say I need to present to an audience around the world. The English you'll hear is the model's live audio output, captured directly from this laptop, with transcriptions. Now, as I start speaking in French, we'll lower the volume of my mic and increase the one from the model so you can get a real feel for it. No edits to the audio. Let's give it a try.

[Speaking French; the model's English translation follows.] What's impressive is that the model can listen to me and translate while I'm speaking. It waits for the key word, like the verb, then starts translating right away [unclear], and the result is a much more natural conversation, just like a dialogue between two people. I can even interrupt in German, and the model switches effortlessly between my German and your French. And we can even include technical terms like GPT-Realtime, OpenAI, or computer use, and the model has no trouble handling that. Merci beaucoup.

Isn't that amazing? The model can translate across 70 different languages in real time, really following the shape of every sentence. So whether you're building a media platform or tools for customer support or education, we believe that this can help you break down language barriers. And this model is just one of the ways we're improving voice intelligence. So for the next demo, let's talk about GPT-Realtime-2, our new model that brings intelligent reasoning to voice agents. So let's bring up my phone and take a look at my personal voice assistant.

Hi there.

Hi again. What's up?

Yeah, I have a customer meeting coming up. Can you take a look at my calendar?

You have a meeting with Sable Crest Robotics in 12 minutes, and you're meeting with Alex Kim, their CTO.

Great. Thank you. Oh, please stay quiet for a second until I say "back to demo."

Ramon, don't forget: now that these models have things like reasoning and parallel tool calling, it's even more important to use things like preambles. This way the model can explain itself and update the user.

Thank you, Jason, for the great reminder. Very important. Actions can of course take a few seconds, and so it's very important for the model to acknowledge those. With GPT-Realtime-2, you can communicate directly during the reasoning and the tool calling so the user stays informed. And by the way, what makes voice agents so natural now is that they stay in the conversation. Jason and I have been chatting; the model has been listening, and it's still listening now, but not interrupting us until I say "back to demo."

I'm here when you're ready to continue the demo.

Pretty cool, right? So now let's highlight what Jason just mentioned with preambles. Let's ask another task and say: hey, could you now update the CRM and put in today's meeting as a brief, with the next steps?

Let me pull the latest context and update your CRM. Sable Crest launched warehouse automation this morning. Expansion is active. Security review is the blocker.

Thank you. I'm all set. Please stay quiet again as I wrap this up.

What's exciting here is that you can now connect the model to any kind of system. It could be your dashboards, the services you're using, even connected devices, and so much more. So that was a quick preview of our new real-time audio models coming to the OpenAI API. You can now create agents that keep the conversation going as they think in the background. They can translate live across 70 languages, they can preserve context, and they can even act inside the products you're already using. Voice can truly become the primary interface now, and we can't wait to see what you build with these new models. Thanks for watching.

Hey, back to demo. How was that?

Smooth and clear. It felt natural and demo friendly.
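The preamble pattern mentioned in the demo (acknowledge the request out loud, then run tool calls in parallel, then report back) can be sketched in plain Python. This is only an illustration of the control flow, not the Realtime API itself: `fetch_calendar`, `fetch_crm_notes`, `handle_turn`, and the `say` callback are hypothetical names, and the tools are simulated with short sleeps rather than real calendar or CRM calls.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical tools: stand-ins for real calendar/CRM integrations.
async def fetch_calendar() -> str:
    await asyncio.sleep(0.02)  # simulate network latency
    return "Meeting with Sable Crest Robotics in 12 minutes"

async def fetch_crm_notes() -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return "Security review is the blocker"

Tool = Callable[[], Awaitable[str]]

async def handle_turn(say: Callable[[str], None], tools: List[Tool]) -> List[str]:
    # 1. Preamble: tell the user what is about to happen before any tool runs.
    say("Let me pull the latest context and update your CRM.")
    # 2. Run all tools concurrently; gather returns results in input order.
    results = await asyncio.gather(*(tool() for tool in tools))
    # 3. Report back once every tool has finished.
    say("Here's what I found: " + ". ".join(results) + ".")
    return list(results)

def run_demo() -> List[str]:
    # Collect "spoken" lines in a list instead of sending audio.
    spoken: List[str] = []
    asyncio.run(handle_turn(spoken.append, [fetch_calendar, fetch_crm_notes]))
    return spoken

if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In a real voice agent the `say` callback would presumably hand text or audio to the model's output stream, so the user hears the acknowledgement while the tools are still running.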

Mustafa ÖZTÜRK 1mo

Real-time reasoning is becoming incredibly capable. But in real operational environments, the challenge is rarely the conversation itself. It’s what happens after: how decisions stay aligned, validated, and consistently executed across systems and workflows.

18 Reactions

Abdullah Ahsan 1mo

We’re moving beyond conversational AI into real-time operational intelligence. The combination of voice, reasoning, translation, and execution capabilities will significantly change how enterprises interact with systems, workflows, and data over the next few years.

5 Reactions

Thad Kemp 1mo

This is a meaningful shift. Voice AI has historically been constrained by a tradeoff: low latency or high reasoning capability. Closing that gap changes what is actually buildable. Once voice agents can reason in real time, not just retrieve responses, they move from assistants to collaborators. That unlocks live problem solving, multilingual workflow orchestration, dynamic enterprise support, and real-time decision assistance. And importantly, it makes voice interfaces operationally viable for far more enterprise use cases. The next generation of AI interfaces may not look like software at all. They may feel more like conversations with reasoning systems.

3 Reactions

N Kiran Kumar 1mo

Curious to see real-world benchmarks on: noisy environments, Hinglish/code-mix, speaker diarization, and telecom-grade audio conditions. Because production voice AI still breaks more from audio chaos than LLM intelligence.

6 Reactions

iSolutions Force 1mo

This is a major step forward for practical voice AI. The shift from scripted voice bots to real-time collaborators that can listen, reason, translate, transcribe, and take action opens serious opportunities for customer support, IT help desks, field service, healthcare front desks, and warehouse/3PL operations. At iSolutions Force, we’re excited to test GPT-Realtime-2 and explore how these capabilities can be built into secure, production-ready workflows for businesses that need faster service, smarter automation, and better human-AI collaboration. Great work, OpenAI team — this is exactly where voice interfaces need to go next.

5 Reactions

JOSE RUIZ RODRIGUEZ 1mo

The move to a native audio-to-audio architecture resolves the traditional bottleneck of cascaded pipelines, but it shifts the challenge directly onto our infrastructure layer. Orchestrating parallel tool calls over 128K contexts while keeping persistent WebSocket connections open will demand a radical redesign of state management. Parameterizing reasoning effort ("xhigh") is a qualitative leap, but it will require aggressive caching policies to cushion operating costs in production.

Bilidione Loyola Souza 1mo

The interesting part is not just the real-time reasoning. It is that voice is rapidly evolving from a conversational interface into an operational surface for agentic systems. Once a model can listen, reason, call tools, and execute actions in real time, the central issue stops being the quality of the conversation. It becomes: execution governance, authorization of actions, runtime accountability, operational continuity. This is probably where the real transition will begin: from AI assistant to AI operational infrastructure.

Enoch Fox 1mo

Really impressive direction for real-time collaboration and voice interfaces. But honestly… where are we at on output reliability? Adding more features while reliability, consistency, and hallucination control remain unresolved feels backwards to me. Faster reasoning and richer interfaces don’t help much if the underlying outputs still drift under load or produce “confident but structurally wrong” answers. I’d love to see reliability become a headline feature: - stronger validation layers - uncertainty signaling - better state consistency - drift detection - clearer provenance/reasoning visibility The future isn’t just more capable AI. It’s AI people can reliably trust inside real workflows.

1 Reaction

Ibram Megalli 1mo

This is so exciting. Voice AI is moving beyond “answering questions” and into real-time problem solving: listening, reasoning, translating, and helping people complete work as the conversation unfolds. As a founder building AI, automation, and systems-integration solutions, I’m especially interested in testing GPT-Realtime-2 for real operational environments: support calls, dispatch workflows, IT service desks, warehouse floor assistance, and multilingual customer experiences. Congrats to the OpenAI team. Looking forward to putting this through real-world use cases.

4 Reactions

Pravesh Y. 1mo

The real breakthrough here isn't just the low latency; it’s the shift in cognitive friction. Most people view "real-time" as a speed metric, but in a systems context, it's a modality shift. When reasoning happens at the speed of natural speech, the AI stops being a "database you query" and starts being a "partner you think with." It moves the bottleneck from the API response time to the human’s ability to articulate complex thoughts. The larger implication is that we are moving away from screen-first interfaces toward environment-first intelligence. Voice isn't just an input method anymore; it's becoming the primary operating system for real-world problem-solving. Curious how the team is managing the trade-off between "reasoning depth" and "response immediacy" when the complexity of a prompt scales mid-conversation.


More Relevant Posts

Gerry Li
1mo Edited
Voice agents are the future, and OriginMind AI is at the forefront of this transformation. We’re building voice-first AI experiences that make conversations smarter, deeper, and more human. If you haven’t tried our voice agent yet, now is the time. Link: www.originmind.ai

Advancing voice intelligence with new models in the API
Kel Song
1mo
I can see the acceleration with my Japanese learnings now — the small daily reps are finally starting to compound. 📚

Alastair Hussain
1mo
This is one of those AI announcements that won’t make many headlines, but will have real, practical impact. Voice is a far more natural interface for many tasks that AI is good at. Up until now, voice models have been a bit rubbish compared to their text-based counterparts. This new release potentially crosses a utility threshold, so the things that should be conversational AND still be coherent, can be.

David Galtieri
1mo
Ok so now as AI voice agents go this is fully mad! We have already provided the ability for any DepthNode chat assistant to be connected to a voice agent. This way you can train one assistant and have it deployed anywhere by chat or voice. OpenAI has just taken voice to the next level. Model coming soon to an agent near you. #depthnode #voiceagents

yuyurooms ai lab

25 followers
1mo
Voice AI is entering a completely new era. Realtime translation, live reasoning, natural conversations, streaming transcription — this is no longer a demo. This is enterprise infrastructure. We’re excited to bring these capabilities and other next-gen voice technologies to our enterprise customers through yuyurooms ai lab. The future interface is voice.

Kevin Murphy
1mo Edited
The OpenAI Realtime (voice) API updates are unreal! Watch the whole demo but if you're impatient check out the french+german -> english Realtime translation bit (timestamp 0:40).

Will Thieme
1mo
Voice was a foundational unlock to change how we interact with technology. Adding Reasoning in Realtime-2 vastly expands what we can do with voice. We can now hold complex interactions. Excited to keep building with #openai

Patrick Ferriter
1mo
Excited to see more discussion around what actually makes voice AI feel natural! This piece from Agora breaks down GPT-Realtime 2, preambles, and why subtle conversational behaviors have a huge impact on user experience. Definitely worth a read if you’re building or experimenting with voice agents. https://lnkd.in/gDCvqFhE #OpenAI #RealTime2 #RealTimeEngagement

Alex Galert
1mo
Incredible update. OpenAI's latest real-time voice model is a big step for voice AI. I think the bigger point is live reasoning, which makes voice agents better at handling harder requests, following context, and responding naturally in real time. OpenAI will definitely announce an updated AVM in ChatGPT very soon.

