Unaligned with Robert Scoble: Discussing the power of speech technology
CPO Trevor Back talks AI, speech recognition advancements, and making the jump to seamless Samantha with the legendary Robert Scoble.
Tom Young, Digital Specialist
"Integrating AI seamlessly into our lives rather than it taking over our lives."
Robert Scoble's podcast, Unaligned, is renowned for its captivating discussions with industry leaders in AI and innovation. He sat down with Speechmatics CPO Trevor Back to delve into how speech will play a core part in the future AGI stack and where voice technology is heading.
They talk about the complexities of cutting-edge speech recognition technology and envision its impact on our daily lives.
Here is a snippet of their conversation:
Robert: How can speech recognition technology seamlessly integrate into our lives without disrupting our experiences?
Trevor: We want to use speech recognition to push technology into the background, so that it's doing things for us without interrupting our experience of the world.
Voice technology can really push those interfaces into the background so that these systems can work without interrupting the lovely conversation that we're having.
Robert: How do you envision the future of AI enhancing everyday tasks, such as ordering food or interacting with our devices?
Trevor: The goal is for AI to work for us without us even noticing... Whether it's ordering food or interacting with devices, speech recognition can push technology into the background, enabling smoother, more natural interactions.
We've broken down the pod into digestible chapters using our own tech, so you can find the specific sections that you're interested in.
(00:00:00) Introduction of Trevor Back and Speechmatics
The speaker introduces himself as Trevor Back, the chief product officer at Speechmatics. He explains that Speechmatics is a speech technology company that was spun out of the University of Cambridge around 10 years ago under the leadership of its founder, Tony Robinson.
(00:00:26) Comparing Speechmatics to other voice assistants
Trevor contrasts Speechmatics with other well-known voice assistants like Siri, Alexa and Google Assistant. He characterizes those as more brittle, limited AI systems compared to the more advanced speech recognition capabilities of Speechmatics. He explains that Speechmatics aims to understand every voice and handle diverse accents, languages and localizations beyond what those other systems can currently do.
(00:00:59) Speechmatics' focus on accuracy
Trevor emphasizes Speechmatics' heritage and focus on speech technology and accuracy. He explains they use proprietary methods requiring less data to achieve high accuracy across languages, accents and dialects. This enables them to handle underrepresented groups and challenges like speech impediments better than large generic models.
(00:10:11) The future of AI agents
The speaker discusses the future potential of AI agents interacting via speech interfaces. He talks about the challenges involved in making speech interactions seamless and human-like, on par with text chatbot interfaces. The speaker envisions a future of AI agents conversing fluidly with humans to perform tasks through voice.
(00:22:59) Business model and customers
The speaker describes Speechmatics' business model of positioning themselves as a premium, high-accuracy offering. He explains they target customers who care about accuracy and deriving value from transcripts, unlike those just wanting low-cost transcription. The speaker gives examples of use cases needing high accuracy like passing transcripts to language models.
(00:25:38) Deployment options
The speaker discusses the different platforms and deployment options Speechmatics supports. This includes cloud offerings as well as on-premise deployment for customers wanting more control, privacy or security. He highlights their ability to offer models optimized to run on smaller local devices as well.
(00:27:19) Privacy and local models
The speaker talks about privacy considerations and options for running models locally rather than in the cloud. He gives the example of an AI that runs locally, listening continuously but storing everything locally rather than sending data to the cloud. The speaker indicates Speechmatics is moving towards supporting local iOS deployment as well.
(00:28:53) The exponential pace of progress in AI
The speaker reflects on the rapid, exponential rate of progress being made in AI recently. He notes the challenges humans face in grasping the implications of exponential growth. The speaker expects we will continue to see surprises from the exponential improvements in areas like model size and capabilities.
(00:32:25) Opportunities for focused AI companies
Trevor discusses opportunities for companies focused on specific AI capabilities even as large tech companies race to build huge general purpose AI models. He argues there are still challenges in areas like speech recognition that specialized companies can target and build expertise in.
(00:41:35) Conclusion and final thoughts
In conclusion, the speaker provides some final thoughts and a call to action for the audience to learn more about Speechmatics. He directs them to visit the company website, try out their demo, and provide feedback.