Using Machine Learning, AL can learn the initial base of a language in under a day.
New software will expand the capability to learn previously overlooked languages at a significantly reduced cost
Today, Speechmatics is announcing the launch of Automatic Linguist (AL), an Artificial Intelligence-powered framework that drastically improves the speed at which new languages are built for use in speech-to-text transcription. AL has the potential to learn any language in the world in a matter of days, enabling Speechmatics to expand its service offering to any region globally, even those that have previously been uneconomic to serve. The system also allows for the rapid iteration, improvement and adaptation of existing languages.
This speed is partly because AL was purpose-built from the ground up and programmed to apply patterns from one language to another. For example, the production-ready Hindi system was built within two weeks, in response to a challenge from a large corporate that it could not be done. This system made 23%* fewer errors than the market leaders. So far AL has learnt 28 languages, including Japanese, Hindi, Russian and Korean, in rapid succession, with the focus shifting to languages that have fewer native speakers worldwide.
Traditionally, building a new language pack takes months and is a costly, laborious affair, involving gathering vast amounts of data, building a one-off system and continually refining it with input from experts in that language. This is time-consuming, expensive and difficult, meaning only the world's most widely spoken languages remain the focus.
Most languages have inherent similarities in their fundamental sounds (sometimes represented as phonemes) and grammatical structures. AL can recognise patterns within and across languages and apply them to a new language build, significantly reducing the time and data required.