The production of captions and subtitles is a creative activity with a clear relationship to speech and linguistics.
The use of automated speech recognition (ASR) systems based upon ‘artificial intelligence’ (AI) has created possibilities for producing captions and subtitles that did not previously exist.
Using ASR has general consequences, both positive and negative. It will have a significant and positive impact on production techniques, for example by reducing cost and increasing speed. But some negative aspects cannot be overlooked. Most importantly, ASR solutions do not have a true comprehension of speech, which inevitably leads to ‘non-human’ errors in their output. Additionally, part of the task of producing captions and subtitles involves subjective decisions in editing the text equivalent. For example, removing repetition and redundant speech requires a comprehension of language that is not currently feasible in fully automated solutions.
Regardless of these limitations, ASR certainly has an increasingly relevant role in caption and subtitle production, particularly where cost and/or production time considerations preclude the use of manual processes. This is arguably the situation for high-volume, low-value or ephemeral content and for live broadcasts, where, in essence, the use of ASR technology may enable captioning that would previously have been uneconomic.
ASR systems that are based upon artificial intelligence can ‘learn’ or improve their performance based on feedback. Since ASR systems tend to make predictable and repeatable errors, it is often possible to ‘train’ an ASR system to avoid similar errors in the future, leading to an improvement in performance over time.
Speech recognition is only one part of the process of caption and subtitle production. Other less obvious aspects are also potential candidates for artificial intelligence techniques. For example, using AI to automatically generate information about who is speaking, or about the topic of the speech, could directly improve ASR output. This information could also be used in the generation of the resulting subtitles or captions, for example, to influence the style or the speed of presentation of the text. Automated systems could also be used to check whether the captions and subtitles in a broadcast correctly match the speech in the video and are correctly timed.
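As a rough illustration of the automated checking described above, the following minimal Python sketch compares caption cues against an ASR transcript of the same programme. It is not a description of any Screen product: the `Cue` type, the `check_cues` function, the pairing of cues by position, and the threshold values are all assumptions made for the example. The text comparison uses the standard word error rate (word-level edit distance divided by the number of reference words).

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical caption/ASR segment: times in seconds plus its text."""
    start: float
    end: float
    text: str


def _words(text):
    # Lower-case and strip simple punctuation so "Hello," matches "hello".
    stripped = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return [w for w in stripped if w]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # word missing from hypothesis
                           cur[j - 1] + 1,           # extra word in hypothesis
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


def check_cues(captions, asr, max_wer=0.3, max_offset=0.5):
    """Flag cues whose text or start time differs too much from the ASR output.

    Assumption: cues are already paired by position; a real system would
    first need to align the two streams. Thresholds are illustrative only.
    """
    problems = []
    for i, (cap, rec) in enumerate(zip(captions, asr)):
        wer = word_error_rate(rec.text, cap.text)
        offset = cap.start - rec.start
        if wer > max_wer:
            problems.append((i, "text", round(wer, 2)))
        if abs(offset) > max_offset:
            problems.append((i, "timing", round(offset, 2)))
    return problems
```

Because the ASR transcript is itself imperfect, and because captions are often deliberately edited rather than verbatim, flags from a check like this would be prompts for human review rather than proof of an error.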
As automated systems inevitably improve, it is clear that they will have increased utility in the caption and subtitle creation process. There are also other applications for ASR and AI in the quality control, monitoring and archiving workflows, where cost is a significant factor. As a company, Screen Subtitling Systems are actively embracing artificial intelligence-based solutions to support and enable a wider range of workflows and to improve the quality and quantity of subtitle and caption provision in the future.
John Birch, Strategy and Business Development Manager, Screen Subtitling Systems
About Screen Systems
Screen was founded as Screen Electronics by Laurie Atkin in 1976, and pioneered the first-ever electronic subtitling system, providing the first digital character generator to the BBC. Throughout the 1970s and 80s, Screen continued to lead the market, developing a number of new subtitling technologies including fully automated transmission using timecode, the first PC-based subtitle preparation system and the first multi-channel, multi-language subtitling systems.
In 2001 Screen took subtitling technologies into the 21st century with the Polistream transmission and Poliscript preparation products. In 2011 it diversified by acquiring SysMedia Ltd, a leader in the fields of subtitle preparation and teletext content production and publishing systems. Then in 2018, the company itself was acquired by BroadStream Holdings Ltd (BHL), bringing Integrated Playout into the fold of its capability via parent company BroadStream Solutions.
Screen is now the number 1 provider of subtitling production and delivery systems in the world, and with its broader product portfolio now builds on that success with products that enhance broadcast content with value-add information services across multiple platforms and devices.