SNACK three-line summary
- OpenAI introduced API models for realtime voice conversation, translation and transcription.
- This is technology for developers and companies to build into services, not a one-tap consumer app yet.
- Cost and safety remain real issues, but the direction could matter for games and communities.
Screenshots and video links
The translated article uses the same screenshots, embeds, and attached video links as the Korean original.

Snackgirls editor note
- Nea: The key distinction is that this is an API update. The user experience will depend on how apps and services implement it.
- Red: For game communities and global support, lower language friction could be a big deal if latency, quality and price line up.
- Kirari🌟: If this later slips naturally into chat apps, streams or game parties, talking with friends who speak other languages could feel much more natural.
This is an API, not a finished translation app
OpenAI announced new realtime speech AI models that listen while a person talks, translate speech, and produce live text transcription. The practical distinction matters: this is an API update for developers and companies, not a new button every ChatGPT user can immediately press. The technology is available for services to build on, and the final experience will differ by implementation.
Three model roles
The Korean source separates the announcement into three roles. GPT-Realtime-2 is for realtime voice conversation, such as support agents, voice assistants or tutors. GPT-Realtime-Translate listens during speech and returns translated speech and text through the realtime translations endpoint. GPT-Realtime-Whisper focuses on realtime speech recognition and transcription for captions, meeting notes or records.
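A service that uses more than one of these roles might keep the mapping in a small routing table. The following is a minimal sketch: the strings are the model names as written in this article, not confirmed API identifiers, and `Task`, `MODEL_FOR_TASK` and `pick_model` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Task(Enum):
    CONVERSATION = "conversation"    # support agents, voice assistants, tutors
    TRANSLATION = "translation"      # translated speech and text during speech
    TRANSCRIPTION = "transcription"  # captions, meeting notes, records


# Names as written in the article; the identifiers the API actually accepts
# may differ, so treat these strings as placeholders.
MODEL_FOR_TASK = {
    Task.CONVERSATION: "GPT-Realtime-2",
    Task.TRANSLATION: "GPT-Realtime-Translate",
    Task.TRANSCRIPTION: "GPT-Realtime-Whisper",
}


def pick_model(task: Task) -> str:
    """Return the model role the article associates with a task (hypothetical helper)."""
    return MODEL_FOR_TASK[task]
```

Keeping the choice in one place means a product can swap or rename a model without touching the code that handles audio.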
Why realtime is different
This is not the older flow of uploading a recorded file and waiting for the translation afterward. The model works inside a realtime session: audio keeps arriving, and the system keeps producing output as the speech unfolds. That opens the door to translated calls, captions and voice interactions that feel closer to conversation.
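The contrast between the two flows can be sketched without any real API. In this illustration, `fake_translate` is a stand-in for a model call (it only reports how many bytes it received), and `batch_flow` and `realtime_flow` are hypothetical names. None of this reflects how OpenAI's endpoint is actually called.

```python
from typing import Callable, Iterable, Iterator


def batch_flow(chunks: Iterable[bytes], translate: Callable[[bytes], str]) -> list[str]:
    """Older flow: collect the whole recording, then translate once at the end."""
    recording = b"".join(chunks)
    return [translate(recording)]


def realtime_flow(chunks: Iterable[bytes], translate: Callable[[bytes], str]) -> Iterator[str]:
    """Realtime flow: yield a partial result as each audio chunk arrives."""
    for chunk in chunks:
        yield translate(chunk)


def fake_translate(audio: bytes) -> str:
    """Stand-in for a model call; it only reports how many bytes it received."""
    return f"<translated {len(audio)} bytes>"
```

With three audio chunks, `batch_flow` returns a single result after the last chunk, while `realtime_flow` yields three results, one per chunk. That per-chunk output is what makes live captions or interpreted speech possible.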
Where it could be used
| Use case | Possible feature | Practical caution |
|---|---|---|
| Online meetings | Live captions and interpreted speech | Long sessions need cost limits |
| Customer support | Multilingual voice agents | Accuracy and escalation rules matter |
| Live streaming | Realtime captions and translation | Latency can affect viewer experience |
| Education | Foreign-language tutoring and pronunciation help | Privacy and recording policies must be clear |
| Games and communities | Voice chat or community translation | Moderation and abuse prevention are essential |
Cost is a product issue, not a footnote
Realtime audio can become expensive because usage grows with speaking time and traffic. OpenAI’s model documentation describes minute-based pricing for GPT-Realtime-Translate and token-based pricing for GPT-Realtime-2, though prices can change. A meeting or stream that runs for a long time needs usage limits, plan design and cost controls, not only technical integration.
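To make the cost point concrete, here is a rough sketch of both pricing shapes plus a per-session spending cap. The rates are placeholders chosen for the arithmetic and are not OpenAI's prices. `SessionBudget` and the two cost functions are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass

# Placeholder rates for illustration only; these are NOT OpenAI's prices.
EXAMPLE_USD_PER_MINUTE = 0.05
EXAMPLE_USD_PER_1K_TOKENS = 0.02


def minute_based_cost(minutes: float, usd_per_minute: float) -> float:
    """Cost when billing follows speaking time."""
    return minutes * usd_per_minute


def token_based_cost(tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost when billing follows token usage."""
    return tokens / 1000 * usd_per_1k_tokens


@dataclass
class SessionBudget:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, amount_usd: float) -> bool:
        """Record spend; return False once the session has reached its limit."""
        self.spent_usd += amount_usd
        return self.spent_usd < self.limit_usd
```

At the placeholder rate, a 90-minute meeting comes to 90 × 0.05 = 4.50 USD per participant stream, so cost scales with both duration and audience size. A service could call `charge` after each billed interval and end or downgrade the session when it returns `False`.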
Safety also matters
Realtime voice AI can be misused for scams, spam or harmful content. The source notes that OpenAI describes safety mechanisms such as stopping sessions when harmful use is detected. As translation becomes easier, trust, consent and abuse prevention become part of the product design.
Game Sunakku take
The announcement is not a magic free interpreter for everyone today. It is a developer-side building block. Still, if this kind of technology enters apps, live streams, Discord-style communities and game voice chat, language barriers may keep players apart less often.
Sources and check date · Based on the original Game Sunakku article. Checked: June 6, 2026