Cambridge company’s pioneering self-supervised learning technology reduces speech recognition errors for African American voices by 45% versus Amazon, Apple, Google, and Microsoft.
Speechmatics, the leading speech recognition technology scaleup, has today launched its ‘Autonomous Speech Recognition’ software. Built on the latest deep learning techniques and the company’s breakthrough self-supervised models, the software outperforms Amazon, Apple, Google, and Microsoft, marking the company’s latest step towards its mission to understand all voices.
Based on datasets used in Stanford’s ‘Racial Disparities in Speech Recognition’ study, Speechmatics recorded an overall accuracy of 82.8% for African American voices compared to Google (68.6%) and Amazon (68.6%). Relative to those competitors, this level of accuracy equates to a 45% reduction in speech recognition errors – the equivalent of three words in an average sentence. Speechmatics’ Autonomous Speech Recognition delivers similar improvements in accuracy across accents, dialects, age, and other sociodemographic characteristics.
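The 45% figure can be checked from the accuracy numbers above. The short sketch below assumes that the error rate is simply 100% minus the reported accuracy; the function name is illustrative and does not come from the Speechmatics announcement.

```python
def relative_error_reduction(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Return the fractional drop in error rate, given accuracies in percent.

    Assumes error rate = 100 - accuracy, which is an interpretation of the
    published figures rather than a stated methodology.
    """
    baseline_error = 100.0 - baseline_accuracy
    improved_error = 100.0 - improved_accuracy
    return (baseline_error - improved_error) / baseline_error


# Accuracy figures quoted above: competitor 68.6%, Speechmatics 82.8%.
reduction = relative_error_reduction(68.6, 82.8)
print(f"Relative error reduction: {reduction:.1%}")
```

Under that assumption, the error rate falls from 31.4% to 17.2%, a relative reduction of roughly 45%.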
Up until now, misunderstanding in speech recognition has been commonplace due to the limited amount of labeled data available to train on. Labeled data must be manually ‘tagged’ or ‘classified’ by humans, which limits not only the amount of data available for training but also the representation of all voices. With this breakthrough, Speechmatics’ technology is trained on huge amounts of unlabeled data drawn directly from the internet, such as social media content and podcasts. By using self-supervised learning, the technology is now trained on 1.1 million hours of audio – an increase from 30,000 hours. This delivers a far more comprehensive representation of all voices and dramatically reduces AI bias and errors in speech recognition.
Speechmatics also outperforms competitors on children’s voices – which are notoriously challenging to recognize using legacy speech recognition technology. Speechmatics recorded 91.8% accuracy compared to Google (83.4%) and Deepgram (82.3%) based on the open-source project Common Voice.