DeepMind's WaveNet Voice Synthesizer Is Live in Google Assistant
No one knew exactly what DeepMind was up to when it was acquired by Google a few years back. Now DeepMind is an Alphabet company, working on big machine learning problems like how to beat humans at Go, improving AI problem solving, and making computer-generated speech more realistic. On that last count, you can experience the fruits of DeepMind's labors right now if you've got an Android phone or a Google Home. The "WaveNet" voice engine is now available in Google Assistant.
Google launched Assistant about a year ago as an evolution of its existing Google voice command system. For the first time, Google voice interactions were available not only on phones, but also as a part of your home with the Google Home smart speaker. Assistant gives you access to Google search data, device control, and smart home integrations. It's available on all Android phones running v6.0 or higher by long-pressing the home button. So, you don't have to buy a Google Home to experience Assistant.
The voice model used in Assistant at launch wasn't bad, but Google just rolled out a vastly improved version of the voices for English and Japanese. DeepMind confirms these are implementations of WaveNet, which it first demoed in 2016. At the time, WaveNet was too computationally intensive for use on consumer devices, but just over a year later that's changed. You can experience the new Assistant voice by opening Assistant on your phone and going to Settings > Preferences > Assistant Voice.
WaveNet is a form of parametric text-to-speech (TTS) that is entirely synthetic. Until recently, virtually all TTS systems were based on concatenative synthesis. In concatenative TTS, a large volume of high-quality recordings of a real voice is chopped up and reassembled to form the words. This is expensive and still won't sound entirely human. Parametric TTS is cheaper, but it often sounds even more robotic.
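To make "chopped up and reassembled" concrete, here is a toy Python sketch of the concatenative approach. It is not any real TTS engine; the unit names and stand-in recordings are hypothetical, and a production system would also smooth the joins between units.

```python
import numpy as np

# Stand-ins for short 16 kHz recordings of individual speech units.
# In a real concatenative system these would be clips cut from many
# hours of studio recordings of a single voice actor.
unit_library = {
    "HH": np.random.randn(800),
    "EH": np.random.randn(1200),
    "L":  np.random.randn(900),
    "OW": np.random.randn(1500),
}

def concatenative_tts(units):
    """Chop-and-reassemble: look up each unit and glue the clips together.
    The audible seams between units are a big part of why this approach
    never sounds entirely human."""
    return np.concatenate([unit_library[u] for u in units])

waveform = concatenative_tts(["HH", "EH", "L", "OW"])  # roughly "hello"
print(waveform.shape)  # total number of samples in the stitched clip
```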
DeepMind used a convolutional neural network that was trained on a large sample of human speech. The resulting speech synthesizer can generate more believable voice waveforms from scratch at over 16,000 samples per second. The audio from WaveNet picks up on natural inflection and accents better, which keeps the flat "robotic" feel from creeping in as often.
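For a rough idea of how a network can generate raw waveforms one sample at a time, here is a minimal PyTorch sketch of the dilated causal convolutions that WaveNet-style models are built around. The channel counts, layer count, and residual wiring are illustrative assumptions, not DeepMind's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    """A stack of dilated causal 1-D convolutions whose receptive field
    doubles at every layer, so each output sample can depend on a long
    window of past audio samples."""
    def __init__(self, channels=32, layers=8):
        super().__init__()
        self.dilations = [2 ** i for i in range(layers)]  # 1, 2, 4, ..., 128
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=d)
            for d in self.dilations
        )

    def forward(self, x):
        # x: (batch, channels, time), e.g. embedded audio samples
        for conv, d in zip(self.convs, self.dilations):
            # Left-pad by the dilation so the convolution stays causal:
            # the output at time t never sees samples later than t.
            out = conv(F.pad(x, (d, 0)))
            x = x + torch.tanh(out)  # simple residual connection
        return x

# This toy stack sees 256 past samples (sum of dilations + 1), i.e. 16 ms
# of audio at 16,000 samples per second; real models stack far more layers.
model = DilatedCausalStack()
features = torch.randn(1, 32, 16000)   # one second of dummy features
print(model(features).shape)           # torch.Size([1, 32, 16000])
```

The published WaveNet design adds gated activations, skip connections, and a softmax over quantized sample values, but the causal, dilated structure sketched above is the core idea behind generating audio sample by sample.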
The new WaveNet model running as part of Google Assistant is 1,000 times faster than the demo version, allowing it to generate 20 seconds of high-quality audio in just one second. DeepMind promises a full paper soon that will detail how this was achieved.
Source: https://www.extremetech.com/extreme/257110-deepminds-wavenet-voice-synthesizer-live-google-assistant