Our first release of the year is a big one, with a host of new features, updates, and improvements to our best-in-class speech API. This year, we’ll continue our mission to unlock human potential through natural interaction with intelligent machines, while making speech recognition engines more inclusive and less biased.
The headline of this release is the accuracy uplift delivered by Ursa, our latest generation of speech recognition models.
With that in mind, join us as we introduce our GPU support, showcase how our product continues its journey to understand every voice, and explain more about our new Translation offering.
Ursa Generation Models
We have greatly improved the accuracy of our English language transcription. Across a broad range of test sets, we see an average 22% relative improvement for the Enhanced model and a 35% relative improvement for the Standard model.
Speechmatics’ Ursa generation models achieve this breakthrough in performance by shifting execution to GPUs, which allows significantly larger machine learning models to be used in production.
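For readers less familiar with the term, a relative improvement is measured against the previous result rather than in absolute percentage points. The sketch below shows the arithmetic only. It assumes an error-rate style metric where lower is better, and the baseline and new values are made-up numbers for illustration, not Speechmatics results.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline` (lower is better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


# Hypothetical error rates, chosen only to illustrate the calculation.
baseline_error = 10.0  # assumed previous error rate, in percent
new_error = 7.8        # assumed new error rate, in percent

improvement = relative_improvement(baseline_error, new_error)
print(f"Relative improvement: {improvement:.0%}")
```

With these illustrative numbers, an absolute drop of 2.2 percentage points corresponds to a 22% relative improvement.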
Introducing Translation
With Translation integrated into our single Speech API, users can now access Speechmatics’ market-leading speech-to-text and Translation all in one place. As of now, Speechmatics offers translation to and from English for 34 supported languages, with start and end timing for sentences as well as speaker labeling.
Accurate Translation is crucial for businesses that want to reach global markets they haven’t previously tapped, in use cases such as Media Captioning, Meeting Platforms, and Contact Centers.
You can try our new Translation for free today in our portal. All Batch SaaS customers will have immediate, free access until 31st March 2023 through the existing Speech API.
Automatic Language Identification
Following swiftly on from the release of Language Identification last year, we’ve now introduced Automatic Language Identification as part of the Transcription API. It is designed for customers working with audio data where the language may not be known: users can now transcribe audio as part of a single workflow, without specifying the language within the configuration. With coverage for 44 languages, you won’t need to tell us what the language is. We’ll tell you.
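As a rough illustration of what leaving the language out of the configuration might look like, here is a minimal sketch of a job configuration built in Python. The field names (`type`, `transcription_config`, `language`) and the `"auto"` value are assumptions that this post does not confirm, and `build_transcription_config` is a hypothetical helper, so please check the release notes for the exact schema.

```python
import json


def build_transcription_config(language: str = "auto") -> dict:
    """Build a job configuration dictionary (hypothetical helper).

    Assumption: passing "auto" as the language asks the service to
    identify the spoken language itself. Field names are assumed too.
    """
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
    }


# No language is specified by the caller, so the assumed "auto" value is used.
config = build_transcription_config()
print(json.dumps(config, indent=2))
```

Keeping the language as a parameter with a default means the same helper still works when the language is known in advance, for example `build_transcription_config("en")`.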
Numeral Formatting
This release brings a range of improvements to numeral formatting for English, including new measurement and telephone entity classes, and support for domain and email formatting.
Numeral formatting in speech recognition improves the readability of a transcript for every business, and it is critical for the financial, medical, broadcasting, and education sectors. Consistent numeral output saves time and prevents human error. The less time spent picking through edits in the post-processing phase, the better.
Speaker Diarization
We have improved Speaker Diarization accuracy for English in both our Standard and Enhanced models.
For our Real-Time ASR, we have achieved a step-change in diarization accuracy, with an average 14% relative improvement for Enhanced and a 30% relative improvement for Standard.
To learn more about our latest product release, watch the recording of our recent webinar.
For more details on all of our updates, you can find release notes here. If you need additional support with any of the above, please contact our Support team.
Owen O’Loan, Senior Product Manager, Speechmatics