OpenAI can clone your voice using 15-second clips

After two years of development and testing, OpenAI is finally confident enough to introduce its voice synthesis model to the world. Voice Engine, as OpenAI calls it, is an AI model that can clone a voice from nothing but a 15-second audio sample.

Since late 2022, Voice Engine has been used internally to power OpenAI’s text-to-speech API as well as ChatGPT Voice and Read Aloud.

Even today, access to the voice synthesizer is limited to a small group of “trusted partners” across different industries. The initial list includes names such as Age of Learning, HeyGen, Dimagi, Livox, and Lifespan.

Age of Learning uses Voice Engine to create voice-over content from pre-written scripts. It also combines Voice Engine with GPT-4 to power real-time interaction with students.

Likewise, HeyGen uses Voice Engine to translate its visual stories into different languages and reach a wider audience.

In the samples shown by OpenAI, you can hear Voice Engine translate a 16-minute English voice clip into Spanish, Mandarin, German, French, and Japanese while preserving the accent of the original speaker.

Voice Engine is also being used in the medical field. At Lifespan, doctors successfully restored the voice of a patient with a vascular brain tumor. Livox is integrating the AI model into Augmentative & Alternative Communication (AAC) devices to help nonverbal people communicate.

All early adopters of Voice Engine must adhere to OpenAI’s usage policies, which prohibit illegal impersonation. They also need explicit permission from the original speaker before an audio sample is fed to the AI model, and they must inform the audience that the voices are AI-generated.

Furthermore, OpenAI has implemented a watermarking technique to track usage and trace the origin of audio created with Voice Engine.
