May 21, 2021 | Read time 4 min

Speech recognition challenges and how to overcome them

Find out about the challenges in speech recognition and how the Speechmatics team has overcome them to provide better voice technology.
Speechmatics Editorial team

Accuracy has been one of the main speech recognition challenges for many years, and a barrier to entry for many businesses. Historically, the technology wasn’t considered good enough to adopt as an integral part of a workflow and technology stack. That is no longer true. Voice technology has improved to the point where the output for the world’s most widely spoken languages, such as English, French, Spanish and German, is highly accurate in terms of word error rate (WER).

So, what other challenges are affecting the future of speech recognition? And why is accuracy still a problem? These are the barriers highlighted by respondents to a survey conducted as part of the Speechmatics report Trends and Predictions for Voice Technology in 2021:

1. Accuracy

These days, accuracy refers to more than just the accuracy of the word output – the WER. Many other factors affect the level of accuracy on a case-by-case basis. These factors are often unique to a use case or a particular business need and include:

  • Background noise

  • Punctuation placement

  • Capitalization

  • Correct formatting

  • Timing of words

  • Domain-specific terminology

  • Speaker identification
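To make the baseline metric concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function names and the `normalize` flag are illustrative assumptions, not part of any Speechmatics tooling. The example also shows why punctuation and capitalization matter: the same output scores very differently depending on whether formatting is counted.

```python
def tokenize(text, normalize=False):
    """Split text into words; optionally lowercase and strip punctuation."""
    if normalize:
        # Keep letters, digits, whitespace and apostrophes only (an assumed,
        # deliberately simple normalization scheme).
        text = "".join(
            ch for ch in text.lower() if ch.isalnum() or ch.isspace() or ch == "'"
        )
    return text.split()


def word_error_rate(reference, hypothesis, normalize=False):
    """Return word-level edit distance divided by the reference word count."""
    ref = tokenize(reference, normalize)
    hyp = tokenize(hypothesis, normalize)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "The cat sat on the mat."
hypothesis = "the cat sat on a mat"

# Formatting counted: "The"/"the" and "mat."/"mat" are scored as errors too.
print(f"{word_error_rate(reference, hypothesis):.2f}")                  # 0.50
# Formatting ignored: only "the" -> "a" remains as an error.
print(f"{word_error_rate(reference, hypothesis, normalize=True):.2f}")  # 0.17
```

A single WER figure therefore says little unless it is clear how formatting was handled when scoring, which is one reason accuracy has to be judged per use case.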

2. Data security and privacy

The past year has seen a huge increase in concerns about data security and privacy – from 5% to 42% in the Speechmatics survey. This could be due to mistrust following media portrayal of ‘data-hungry’ tech giants. It could also be a result of more day-to-day conversations happening online when the coronavirus pandemic led to an explosion in remote working.

3. Deployment

Deploying and integrating voice technology – or any software, for that matter – needs to be simple. Whether a business requires deployment on-premises, in the cloud, or embedded, integration needs to be easy to do and secure. Without the appropriate support or documentation, integrating software can be time-consuming and expensive. It is, therefore, important for technology providers to make their deployments and integrations as frictionless as possible to avoid this barrier to adoption.

4. Language coverage

Many of the leading voice technology providers have gaps in their language coverage. Most providers cover English but, when global businesses want to use voice technology, missing languages create a barrier to adoption. When providers do offer more languages, accuracy is often still an issue when it comes to accent or dialect recognition. What happens when an American is speaking with a British person, for example? Which accent variant should be used? Global language packs, which encompass a variety of accents, solve the problem.

What are the likely speech recognition challenges in the next 5-10 years?

[Figure: Risks for speech recognition technology in the next 10 years.]

Overcoming the speech recognition challenges around data privacy

Data privacy will continue to be a concern in the future of speech recognition, according to 95% of survey respondents. But there are ways to overcome data security issues:

1. On-premises deployment

On-premises deployment of voice technology enables users to keep their data secure within their own environments, with no need for data to go into the cloud. It is often delivered as virtual appliances or containers that can be deployed easily into existing technology stacks. This is particularly important for industries such as banking, financial services and insurance, where compliance and regulatory requirements mean customer data and voice data cannot leave the premises.

2. Dark site environments

Typically, when deploying an on-premises solution for voice technology, businesses are required to connect to the public internet for licensing. Dark site deployments support offline licensing instead, meaning all work is completed within an organization’s private environment. This delivers a more robust solution for compliance and data privacy needs.

3. Cloud deployment

Private cloud deployments are secure enough to keep data safe for many applications. If cloud security meets the needs of the business and the use case, cloud deployment is often the preferred option thanks to its low operational cost and lower complexity.

Want to know more about how to overcome speech recognition challenges? For more information, and the full survey results, download Trends and Predictions for Voice Technology in 2021.