OpenAI has introduced a new extension of its text-to-speech API named Voice Engine, which enables users to generate a synthetic copy of a voice from a 15-second sample.
Voice Engine combines a diffusion process with a transformer to generate synthetic voices that match the characteristics of the sample voice. The company has also taken several steps to comply with data safety regulations and mitigate other risks. The solution requires clear consent from the individuals whose voices are to be cloned and applies safety measures such as embedding encrypted watermarks in the generated voices.
Moreover, the company has limited access to a select group of early testers. The solution will reportedly be priced at USD 15 per 1 million characters, or around 162,500 words.
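To put the reported pricing in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration based on the two figures quoted above (USD 15 per 1 million characters, and roughly 162,500 words per 1 million characters). The function names are hypothetical, the rate is the reported one rather than a confirmed price list, and actual billing may differ.

```python
# Reported figures from the article; treated here as assumptions, not confirmed pricing.
PRICE_USD_PER_MILLION_CHARS = 15.0
WORDS_PER_MILLION_CHARS = 162_500


def chars_per_word() -> float:
    """Average characters per word implied by the reported word count."""
    return 1_000_000 / WORDS_PER_MILLION_CHARS


def estimate_cost_usd(num_chars: int) -> float:
    """Estimated cost in USD for synthesizing num_chars characters."""
    return num_chars / 1_000_000 * PRICE_USD_PER_MILLION_CHARS


def estimate_cost_for_words(num_words: int) -> float:
    """Estimated cost in USD for num_words words, using the implied average word length."""
    return estimate_cost_usd(round(num_words * chars_per_word()))


if __name__ == "__main__":
    print(f"Implied characters per word: {chars_per_word():.2f}")
    print(f"Estimated cost for 10,000 words: USD {estimate_cost_for_words(10_000):.2f}")
```

The reported figures imply an average of a little over six characters per word, so estimates based on word counts are approximate and depend on the text being synthesized.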