Each year Gartner®, a company that delivers actionable, objective insight to executives and their teams, publishes Hype Cycles, ‘a graphic representation of the maturity and adoption of technologies and applications.’ In 2022’s Hype Cycle™ for Data Science and Machine Learning, the Gartner® report explains the many advantages of self-supervised learning – benefits we experience every day with our Autonomous Speech Recognition (ASR) engine.
“Self-supervised learning is an approach to machine learning in which labeled data is created from the data itself, without having to rely on historical outcome data or external (human) supervisors that provide labels or feedback. It is inspired by the way humans learn through observation, gradually building up general knowledge about concepts, events and their relations, or spatiotemporal associations in the real world.”
At Speechmatics, our award-winning ASR engine needs vast quantities of data to keep improving and innovating. To put that into perspective, we’ve used self-supervised learning to train our technology on 1.1 million hours of audio – resulting in a more comprehensive understanding of voices.
The Many Benefits of Self-Supervised Learning
Fundamentally, self-supervised learning does what it says on the tin. The Gartner® report tells us that there’s no need for human supervision. “In self-supervised learning, labels can be generated automatically from the data itself, without the need for human annotation. In essence, this is done by masking elements in the available data (e.g., a part of an image, a sensor reading in a time series, a frame in a video or a word in a sentence) and then training a model to “predict” the missing element.”
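To make the masking idea in that quote concrete, here is a minimal Python sketch using the "word in a sentence" case. It is only an illustration of how labels can come from the data itself, not a description of how any particular speech engine is trained; the function name `make_masked_examples` and the `[MASK]` placeholder are assumptions chosen for this example.

```python
MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch


def make_masked_examples(sentence):
    """Turn one unlabeled sentence into (masked_input, label) pairs.

    Each word is hidden in turn, and the hidden word becomes the label,
    so no human annotation is needed.
    """
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        # Replace the i-th word with the mask and keep the rest unchanged.
        masked = words[:i] + [MASK] + words[i + 1:]
        examples.append((" ".join(masked), word))
    return examples


if __name__ == "__main__":
    for masked_input, label in make_masked_examples("the cat sat"):
        print(f"input: {masked_input!r}  label: {label!r}")
```

A model trained on pairs like these learns to fill in the hidden word from its context. The same pattern applies to the other data types the report mentions, with the masked element being a patch of an image, a sensor reading or a video frame instead of a word.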
If you’ve seen our ASR at work, you’ll notice that the transcription might initially be incorrect, only for the AI to correct it or ‘predict’ the missing word. From there, the model can be fine-tuned on the data, deriving more value from it and continuing to learn.
The Gartner® report goes on to say that “Self-supervised learning has the potential to bring AI closer to the way humans learn. This occurs mainly via observation and association, building up general knowledge about the world through abstractions and then using this knowledge as a foundation for new learning tasks, thus incrementally building up ever-more knowledge that in future AI scenarios may serve as common sense.”
We believe that encapsulates how we innovate – by learning more about how humans talk, we can continue to grow our ASR and make it as accessible as possible. The more data we gather, the more knowledge we build. Consequently, our ASR understands voices with more common sense – a distinctly human approach.
See for yourself how great self-supervised learning is with our revamped SaaS Portal, or download the report to learn more.
John Hughes, Accuracy Lead, Speechmatics