Am I Talking to a Bot or a Human? OpenAI Eyes Broader Use of Voice Technology

Don’t feel like calling for a dinner reservation? OpenAI is experimenting with ways to make AI do it for you. The ChatGPT developer is ready to bring its voice AI technology to third-party apps, promising smarter digital assistants that can not only talk and listen to you in real time, but also interact with the real world.

OpenAI’s Realtime API aims to help third-party developers build speech-to-speech capabilities for their applications. The API is based on OpenAI’s advanced voice mode, which is designed to hold natural, human-like conversations that surpass Apple’s Siri or Amazon’s Alexa.

OpenAI now sees an opportunity to extend its voice capabilities to various third-party applications and services looking to move beyond traditional text-based Q&A. For example, imagine talking to a customer service representative who sounds human but is actually an AI program. Or getting language lessons from an AI “tutor” in an educational app.

With the API, OpenAI expects its technology to usher in an era of “agent AI,” where AI agents that can talk and see are commonplace, says chief product officer Kevin Weil.

“2025 will be the year that agent systems finally enter the mainstream,” he said at a press conference this week. “If we do it right, it leads to a world where we can actually spend more time on human things that matter and less time looking at our phones.”

OpenAI demonstrated how the Realtime API, which uses its GPT-4o model, can change the way we use applications. For example, you can ask a maps app to find interesting restaurants and shops in a given city.

Robot holding a phone (Credit: Baona via Getty Images)

During the demo, the assistant was able to verbally answer each question in about three to five seconds, a relatively low latency. You can also interrupt and redirect the assistant. But the standout feature was the assistant placing a food order by phone on behalf of the user.

“Great, thank you. I’d like to place an order for 400 chocolate covered strawberries, please,” the digital assistant told a store. “Could you let me know when the delivery will arrive at the Cowell Theater in Fort Mason?” she added. “I’m super excited.”

While the demonstration was impressive, it was also not hard to imagine the same technology being misused or abused. Could the Realtime API usher in a new era of robocalls or spam? (Google also tried this with Duplex a few years ago, with mixed results.)

“This is one of the use cases we want to avoid,” said Olivier Godement, Head of Product, API at OpenAI, during the press conference. The company has built safeguards into the API to report and prevent malicious use cases, such as AI pretending to be human. Additionally, OpenAI vows to crack down on any third-party developers found to be violating the API’s terms of service.

“The AI assistant is never taking action,” Godement added. “The developer must execute the action. The way it works is simple: the [AI] model suggests another step, and then it’s up to the developer to verify.” In other words, the digital assistant cannot easily go rogue and overstep the boundaries of the third-party application just because the user asks.
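To make that division of labor concrete, here is a minimal Python sketch of the pattern Godement describes: the model only proposes a step, and the developer's own code decides whether to run it. Everything in it is an assumption for illustration. The `ProposedAction` shape, the `place_order` handler, the allowlist, and the quantity limit are hypothetical and do not represent OpenAI's actual API or schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ProposedAction:
    """A step suggested by the model (hypothetical shape, not OpenAI's schema)."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def place_order(item: str, quantity: int) -> str:
    """Developer-owned code that actually performs the action."""
    return f"Ordered {quantity} x {item}"


# Only actions the developer has explicitly registered can ever run.
ALLOWED_ACTIONS: dict[str, Callable[..., str]] = {"place_order": place_order}
MAX_QUANTITY = 500  # illustrative business rule, chosen arbitrarily


def verify_and_execute(action: ProposedAction) -> str:
    """Check the model's suggestion before the application acts on it."""
    handler = ALLOWED_ACTIONS.get(action.name)
    if handler is None:
        return f"Rejected: '{action.name}' is not an allowed action"
    if action.name == "place_order":
        quantity = action.arguments.get("quantity")
        if not isinstance(quantity, int) or not 0 < quantity <= MAX_QUANTITY:
            return "Rejected: quantity outside the allowed range"
    try:
        return handler(**action.arguments)
    except TypeError:
        # Missing or unexpected arguments in the model's suggestion.
        return "Rejected: malformed arguments"


if __name__ == "__main__":
    order = ProposedAction(
        "place_order",
        {"item": "chocolate covered strawberries", "quantity": 400},
    )
    print(verify_and_execute(order))
    print(verify_and_execute(ProposedAction("wire_money", {"amount": 10_000})))
```

In this sketch, a suggestion that is not on the allowlist, or that breaks a rule the developer has set, is rejected no matter what the user asked the assistant to do.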


Another safeguard is that the speech capability is limited to six preset voices, which should prevent it from impersonating someone else. In a statement, OpenAI added: “Our policies also require developers to make it clear to their users that they are interacting with AI, unless it is clear from the context.”

This means the voice capability will stop short of identifying itself as an AI or bot in every interaction. So far, the company’s tests of Advanced Voice Mode have not shown a need for such a safeguard, given that it can degrade the experience. But Godement added: “Whenever we see abuses, we will adjust our policies.”

Ultimately, it will be up to third-party apps to decide how they implement the Realtime API, which is designed to simplify building speech-to-speech experiences at a low cost. OpenAI adds that it already serves 3 million third-party developers, including both small startups and large enterprises.

The Realtime API arrives today as a public beta and is expected to roll out to all developers in the coming days.


About Michael Kan

Senior reporter

I’ve worked as a journalist for more than 15 years – I started as a schools and cities reporter in Kansas City and joined PCMag in 2017.
