Automatic speech recognition, or ASR, is foundational not only to assistants like Apple’s Siri, but also to dictation software such as Nuance’s Dragon and customer support platforms like Google’s Contact Center AI.
Perhaps it goes without saying that ASR is an intense area of study for Facebook, whose conversational tech powers Portal’s speech recognition and which is broadening its use of AI to classify content on its platform.
Facebook claims its wav2vec system achieves state-of-the-art results on a popular benchmark while using two orders of magnitude less training data, and that it demonstrates a 22% error reduction over the leading character-based speech recognition system, Deep Speech 2.
Additionally, the company hopes to improve its existing systems that proactively identify posts in violation of its community guidelines.
“Wav2vec represents a step forward for ASR systems, and it’s a promising direction for recognizing speech in languages that do not have extensive data sets for training AI systems,” wrote Facebook research scientists and software engineers Michael Auli, Siddhartha Shah, Alexei Baevski, and Christian Fuegen in a blog post.
“But it’s also part of our long-term vision for self-supervised training, an approach that takes advantage of unlabeled training examples and enables us to move beyond the comparatively limited number of data sets that have been gathered and annotated specifically for training AI systems.”
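The self-supervised idea the researchers describe can be illustrated with a toy contrastive objective: a model is rewarded for telling the true continuation of an audio sequence apart from random distractors, which requires no human-written transcripts. The sketch below is a simplified, hypothetical illustration of that principle (an InfoNCE-style loss over plain Python lists), not Facebook’s actual wav2vec implementation; the vectors and function names are invented for the example.

```python
import math

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(context, positive, negatives):
    """InfoNCE-style loss: the context representation should score the
    true future frame (positive) higher than random distractor frames
    (negatives). No transcript labels are needed -- the audio itself
    supplies the supervision signal."""
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[0] / sum(exps))

# Toy "encoded" frames (hypothetical values): the positive resembles
# the context vector, while the negatives do not.
context = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = contrastive_loss(context, positive, negatives)
# Mislabeling a distractor as the "true" future frame should cost more.
loss_bad = contrastive_loss(context, negatives[0], [positive, negatives[1]])
print(loss_good < loss_bad)  # True: the real continuation scores better
```

Training on this kind of objective over large amounts of raw, unlabeled audio is what lets the learned representations transfer to languages with little transcribed data.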