What Are the Best Captioning Tools?

How Does Captioning Technology Work?

Captions – the words that appear across a screen as a video plays, in the same language as the audio – have become so ubiquitous on websites, streaming services, and meeting platforms that many viewers have come to expect them on every video.

Advancements in speech-to-text (STT) technology, especially automatic speech recognition (ASR), are a major driver behind the recent proliferation of captions.

Most sets of captions are the result of multistep workflows. But automated captions for short-form content (think TikTok videos) represent a notable exception. For videos covering certain topics, with audio in one language from a limited number of speakers, ASR and STT engines can churn out same-language captions quickly. But proceed with caution: Pushing AI beyond its abilities can result in low-accuracy captions and “pretend accessibility,” effectively shutting out viewers who rely on them. 

The production of live captions for conferences is currently in flux. Traditionally, keyboard-based methods have been more prevalent in Anglophone countries, such as the US and Canada – not necessarily because of their superiority, but because a lack of specialized keyboards for many languages prevented those methods from gaining traction elsewhere.

In stenography and velotyping, a typist uses a roughly phonetic stenographic keyboard or steno machine to spell out syllables, words, and phrases. A computer automatically translates the transcription into readable text and sends it to the broadcaster. The text is then decoded as on-screen captions.
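As a rough illustration of that translation step, the sketch below looks up keyed strokes in a dictionary and joins the matches into readable text. Everything in it is an assumption made for illustration: the `TOY_DICTIONARY` entries, the `translate` function, and the greedy longest-match rule are hypothetical and do not reproduce any real steno theory or captioning product.

```python
# Toy dictionary: stroke sequences (joined by "/") mapped to output text.
# These entries are illustrative assumptions, not a real steno dictionary.
TOY_DICTIONARY = {
    "-T": "the",
    "KAT": "cat",
    "SAT": "sat",
    "KAPGS": "caption",
    "KAPGS/-G": "captioning",
}

# Longest entry, measured in strokes, so the lookup knows how far to look ahead.
MAX_STROKES = max(key.count("/") + 1 for key in TOY_DICTIONARY)


def translate(strokes):
    """Turn a list of strokes into text using greedy longest-match lookup."""
    words = []
    i = 0
    while i < len(strokes):
        # Try the longest possible multi-stroke entry first, then shorter ones.
        for size in range(min(MAX_STROKES, len(strokes) - i), 0, -1):
            key = "/".join(strokes[i:i + size])
            if key in TOY_DICTIONARY:
                words.append(TOY_DICTIONARY[key])
                i += size
                break
        else:
            # Unknown stroke: pass it through raw so a human can spot and fix it.
            words.append(strokes[i])
            i += 1
    return " ".join(words)


if __name__ == "__main__":
    print(translate(["-T", "KAT", "SAT"]))
    print(translate(["KAPGS", "-G"]))
```

Trying multi-stroke entries first matters because a single word may take more than one stroke; a production system would also handle punctuation, capitalization, and corrections, which this sketch leaves out.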

An alternative workflow, respeaking, is gaining popularity over keyboard-based methods. Respeaking is already used in countries such as Australia, Belgium, France, Switzerland, and the UK for real-time closed captions for live TV broadcasts. An intralingual respeaker repeats what they hear into an ASR engine trained on their voice. The engine converts the speech input into text that can be edited on the fly and released as captions.
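The edit-then-release step can be pictured as a small holding buffer between the ASR engine and the screen. The sketch below is only a simplified model under stated assumptions: the `CaptionBuffer` class and its methods are hypothetical names, no real ASR engine is called, and the recognized text is supplied by hand.

```python
from collections import deque


class CaptionBuffer:
    """Hypothetical holding area: recognized text waits here until released."""

    def __init__(self):
        self._pending = deque()  # segments not yet shown to viewers
        self.released = []       # segments already sent out as captions

    def add_recognized(self, text):
        """Queue a text segment as if it had just come from an ASR engine."""
        self._pending.append(text)

    def correct_latest(self, wrong, right):
        """Fix a misrecognition in the most recent pending segment."""
        if self._pending:
            self._pending[-1] = self._pending[-1].replace(wrong, right)

    def release(self):
        """Send all pending segments out, oldest first, and return them."""
        batch = list(self._pending)
        self._pending.clear()
        self.released.extend(batch)
        return batch


if __name__ == "__main__":
    buffer = CaptionBuffer()
    buffer.add_recognized("welcome to the conference")
    buffer.add_recognized("our first speaker is doctor smyth")
    buffer.correct_latest("smyth", "Smith")
    for line in buffer.release():
        print(line)
```

Keeping a short pending queue is one way to give the respeaker or an editor a moment to correct errors; the trade-off is added delay before captions appear.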

The means may be changing, but keyboard-based methods and respeaking remain largely human-centric processes. Even so, observers can expect more automation to come. Maria Campbell, Founder of True Subtitles, has noted that even the worst-performing ASR tools can save users significant amounts of time.

For more information about STT services, check out the Slator Pro Guide: Subtitling and Captioning.