Microsoft’s new text-to-speech AI needs just three seconds of audio to replicate your voice, including pitch and inflection. Its name is VALL-E. In a recent paper, Microsoft describes the technology as a neural codec language model, which sounds complicated but appears to be absurdly easy to use: input an audio clip and some text, and out comes something that sounds remarkably like human speech.
Of course, plenty of text-to-speech applications already exist. Most news outlets, including ours, offer machine-read narration, and voice-activated personal assistants like Siri and Alexa have become wildly popular.
Most current speech-generation software, however, needs a lot of data to work well. AI voices still don’t sound very human, in part because capturing nuances like emotional tone and subtle inflections is exceedingly difficult.
Its creators say that VALL-E could be used for a variety of tasks, such as zero-shot text-to-speech (TTS), voice editing, and content creation, and that it would work especially well alongside OpenAI’s GPT-3 language model to speed up content production.
And if content creation is what you have in mind, Microsoft has a point. VALL-E and GPT-3 are two powerful pieces of AI-driven technology that, in theory, could be used to rapidly produce high-quality material that sounds natural and convincing.
However, this raises some ethically fraught scenarios. Since only three seconds of audio are required, anybody could potentially use anything from a celebrity interview to an ordinary person’s Instagram feed to pass themselves off as someone else.
Microsoft was careful to address this point, saying that it is not releasing the code as open source at this time because of the risks of misuse. The company also says it is working on a system to detect VALL-E-generated audio, but I think it should check with our friends at OpenAI to see how simple that actually is.