Oct 26, 2021 | Read time 2 min

Speechmatics achieves AI breakthrough, beating tech giants in tackling bias and inclusion to understand all voices

Speechmatics launches pioneering Autonomous Speech Recognition software, outperforming Amazon, Apple, Google, and Microsoft.
Speechmatics Editorial Team
Cambridge company’s pioneering self-supervised learning technology reduces speech recognition errors for African American voices by 45% versus Amazon, Apple, Google, and Microsoft.

Speechmatics, the leading speech recognition technology scaleup, has today launched its ‘Autonomous Speech Recognition’ software. Using the latest techniques in deep learning and with the introduction of its breakthrough self-supervised models, Speechmatics outperforms Amazon, Apple, Google, and Microsoft in the company’s latest step towards its mission to understand all voices.

Based on datasets used in Stanford’s ‘Racial Disparities in Speech Recognition’ study, Speechmatics recorded an overall accuracy of 82.8% for African American voices compared to Google (68.6%) and Amazon (68.6%). This level of accuracy equates to a 45% reduction in speech recognition errors – the equivalent of three words in an average sentence. Speechmatics’ Autonomous Speech Recognition delivers similar improvements in accuracy across accents, dialects, age, and other sociodemographic characteristics.
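The 45% figure follows from comparing error rates rather than accuracies. As a quick illustration (the function name here is ours, not Speechmatics’), the relative error reduction can be computed like this:

```python
def relative_error_reduction(acc_new: float, acc_baseline: float) -> float:
    """Relative reduction in error rate when moving from a baseline
    system's accuracy to a new system's accuracy."""
    err_new = 1.0 - acc_new
    err_baseline = 1.0 - acc_baseline
    return (err_baseline - err_new) / err_baseline

# Figures quoted above: Speechmatics 82.8% vs. Google/Amazon 68.6%.
print(f"{relative_error_reduction(0.828, 0.686):.1%}")  # ≈ 45.2%
```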

Until now, misunderstanding in speech recognition has been commonplace due to the limited amount of labeled data available to train on. Labeled data must be manually ‘tagged’ or ‘classified’ by humans, which limits not only the amount of data available for training but also the representation of all voices. With this breakthrough, Speechmatics’ technology is trained on huge amounts of unlabeled data taken directly from the internet, such as social media content and podcasts. By using self-supervised learning, the technology is now trained on 1.1 million hours of audio – an increase from 30,000 hours. This delivers a far more comprehensive representation of all voices and dramatically reduces AI bias and errors in speech recognition.
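Speechmatics has not published its model internals, but the core idea of self-supervised learning on audio can be sketched in a few lines: mask part of the input and train the model to predict the masked portion from context, so no human labels are needed. The toy data and the nearest-neighbor “model” below are purely illustrative assumptions, not the company’s method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "audio": a sequence of feature frames with no transcripts or labels.
frames = rng.normal(size=(100, 16))  # 100 frames, 16 features each

# Mask a random subset of frames; the self-supervised objective is to
# reconstruct them from the surrounding unmasked context.
mask = rng.random(100) < 0.15
corrupted = frames.copy()
corrupted[mask] = 0.0

def predict(x: np.ndarray, i: int) -> np.ndarray:
    """Stand-in 'model': estimate a masked frame as the mean of its
    unmasked neighbors within a small context window."""
    lo, hi = max(0, i - 2), min(len(x), i + 3)
    ctx = [j for j in range(lo, hi) if not mask[j]]
    ctx = ctx or list(range(lo, hi))  # fallback if all neighbors masked
    return x[ctx].mean(axis=0)

# Self-supervised loss: reconstruction error on masked positions only.
loss = np.mean([np.mean((predict(corrupted, i) - frames[i]) ** 2)
                for i in np.flatnonzero(mask)])
```

A real system would replace the neighbor-averaging with a deep network trained by gradient descent, but the label-free training signal is the same.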

Speechmatics also outperforms competitors on children’s voices – which are notoriously challenging to recognize using legacy speech recognition technology. Speechmatics recorded 91.8% accuracy compared to Google (83.4%) and Deepgram (82.3%) based on the open-source project Common Voice.

Katy Wigdahl, CEO of Speechmatics, comments:

“We are on a mission to deliver the next generation of machine learning capabilities and through that offer more inclusive and accessible speech technology. This announcement today is a huge step towards achieving that mission.

Our focus on tackling AI bias has led to this monumental leap forward in the speech recognition industry and the ripple effect will lead to changes in a multitude of different scenarios. Think of the incorrect captions we see on social media, court hearings where words are mistranscribed and eLearning platforms that have struggled with children’s voices throughout the pandemic. Errors people have had to accept until now can have a tangible impact on their daily lives.”

Allison Zhu Koenecke, Lead Author of the Stanford study on speech recognition, comments:

“It’s critical to study and improve fairness in speech-to-text systems given the potential for disparate harm to individuals through downstream sectors ranging from healthcare to criminal justice.”
