- Meta introduced a text-to-speech generation tool that it calls revolutionary and that differs from previous similar products.
- According to Meta, Voicebox outperforms current AI models in speed and generalization.
- It allows radio spots to be produced without announcers or expensive audio editing.
On Friday, June 16, Meta presented a tool it describes as “revolutionary”: a text-to-speech (TTS) generator that, according to Mark Zuckerberg’s company, produces results 20 times faster than current AI models in the same field.
As can be seen in the promo videos, the system, called Voicebox, does not use a traditional TTS architecture but models more similar to OpenAI’s ChatGPT or Alphabet’s Bard.
The main difference between Voicebox and other TTS models released so far, such as ElevenLabs’ Prime Voice AI, is that the Meta Platforms tool is capable of generalizing through in-context learning.
The new tool could be very useful for advertising agencies or graphic designers, who will now be able to move into producing radio commercials without hiring announcers or buying expensive audio editing systems.
What is Voicebox?
Like ChatGPT, Voicebox is trained on a “megascale” dataset.
Until now, text-to-speech systems relied on narrow, curated datasets because training on large volumes of data produced imperfect and unreliable voices.
Meta says that with Voicebox this limitation no longer exists, thanks to a new training approach that does not require labeling or curation because the model architecture “fills in” the audio information.
In a Meta AI blog post published this Friday, June 16, the company says that Voicebox is the “first model that can generalize voice production tasks for which it has not been specifically trained with unprecedented performance.”
Thus, Voicebox can convert text to speech, remove unwanted sounds by synthesizing replacement speech, and even reproduce the same speaker’s voice in different languages.
Although Voicebox is not the first development of its kind, it does seem to be one of the most solid.
In parallel, Meta says that it has created tools to determine whether audio was generated by Voicebox or is authentic. Meta AI claims that it is possible to “trivially spot” the differences between real and fake audio.
This is how the company explains it in the blog post: “Just as we do with other powerful artificial intelligence innovations, we know that these technologies carry potential misuse and harm. Therefore, in this paper, we explain how we have built a very powerful classifier that can very easily distinguish between authentic voices and Voicebox-generated audio, with the idea of mitigating this potential risk.”