
OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.
The launch of the application programming interface (API) moves the ChatGPT maker beyond transcription and chat towards agents that can listen, translate and act during live conversations.
The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.
GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.
GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.
GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.
Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.
Pricing for GPT-Realtime-2 starts at US$32 per million audio input tokens, while GPT-Realtime-Translate costs US$0.034 per minute and GPT-Realtime-Whisper costs US$0.017 per minute.
Anhata Rooprai, (c) 2026 Reuters
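To put the quoted rates in context, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a billing calculator. The three rates are the figures reported above; the function names, the 60-minute session and the 250,000-token count are hypothetical values chosen for the example. It covers only audio input tokens for GPT-Realtime-2, since the article gives a starting price for those and no other charges.

```python
# Rates as reported in the article, in US dollars.
TRANSLATE_USD_PER_MINUTE = 0.034
WHISPER_USD_PER_MINUTE = 0.017
REALTIME2_USD_PER_MILLION_AUDIO_INPUT_TOKENS = 32.0


def per_minute_cost(minutes: float, usd_per_minute: float) -> float:
    """Cost of a session billed by the minute."""
    return minutes * usd_per_minute


def audio_input_token_cost(
    tokens: int,
    usd_per_million: float = REALTIME2_USD_PER_MILLION_AUDIO_INPUT_TOKENS,
) -> float:
    """Cost of audio input tokens billed per million tokens."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    minutes = 60       # hypothetical session length
    tokens = 250_000   # hypothetical audio input token count

    translate = per_minute_cost(minutes, TRANSLATE_USD_PER_MINUTE)
    whisper = per_minute_cost(minutes, WHISPER_USD_PER_MINUTE)
    realtime2 = audio_input_token_cost(tokens)

    print(f"GPT-Realtime-Translate, {minutes} min: ${translate:.2f}")
    print(f"GPT-Realtime-Whisper, {minutes} min: ${whisper:.2f}")
    print(f"GPT-Realtime-2, {tokens:,} audio input tokens: ${realtime2:.2f}")
```

On these assumptions, an hour of live translation would cost about US$2.04 and an hour of live transcription about US$1.02.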
