What Is Respeaking and How Does It Improve Multilingual Subtitles?

What Is Respeaking?

Most people are familiar with closed captions and subtitles. There are also forms of on-screen text that can be generated through automated speech recognition (ASR) or traditional methods (e.g., velotype and stenotype).

“Respeaking” is a lesser-known way to produce on-screen text within a short turnaround. In this practice, a person repeats live audio content into ASR software, including punctuation and other conventions.

Same-language respeaking, called “intralingual respeaking,” has been used by broadcast media in live and pre-recorded programming for about two decades. In other settings, including business meetings and educational lectures, respeaking helps add subtitles to videos at a much faster rate than by other means.

In the emerging field of “interlingual respeaking,” on the other hand, a respeaker does all that an intralingual respeaker does, but with the additional complexity of language code switching, much like a simultaneous interpreter. The target language speech is then used to create subtitles.

A number of variables lend additional complexity to subtitling via respeaking. Some of these are the technology employed, the types of content presented, the purpose of the subtitles, the expected level of quality, and even environmental factors like background noise.

Upskilling: Training Respeaking Professionals

Respeaking is a great example of multitasking mastery, in which various human skills (e.g., clear diction and short-term memory retention) combine and work in real time with ASR and other technologies. Practitioners have long been trained by the companies that employ them, namely subtitling companies and major broadcasters.

However, as more content reaches a wider global audience, the need for more trained respeaking professionals also increases, particularly for interlingual respeaking. As previously reported, the market for subtitles continues to grow, and governmental organizations, including the ESRC UK (Economic and Social Research Council UK), are taking notice.

The ESRC funded the SMART project – Shaping Multilingual Access through Respeaking Technology, ES/T002530/1, 2020–2023. The project is led by Elena Davitti, Associate Professor in Translation Studies at the Center for Translation Studies, University of Surrey. SMART, which aims to research the intricacies and impact of interlingual respeaking, is one of several efforts centered on training language professionals in this practice effectively.

At this writing, SMART and other research projects suggest that interpreting, translation, and subtitling professionals already have core skills that can transfer to interlingual respeaking. These include listening and speaking, listening and translating, listening and translating with software-adapted delivery, and pausing and chunking (i.e., grouping of words in a sentence into short meaningful phrases).

Upskilling, that is, adjusting skills for respeaking and learning new skills to complement existing ones, is one way in which language professionals are learning this practice.

Different Workflows for Different Needs

Researchers at SMART have begun to identify different applications for interlingual respeaking, the role of technology in the practice, as well as different settings and possible workflows. Examples of the latter involving respeakers are represented in the graphic below, which shows different ways to obtain the same output: translated subtitles.

In the first human-centric workflow shown in the graphic, a human listens to the source input and respeaks in the target language. In the second example, a simultaneous interpreter performs the language transfer and then a speaker of the target language does the respeaking.

In one of the examples of semi-automated workflows, the text generated via an intralingual respeaker is processed through machine translation (MT). In the other example, a simultaneous interpreter speaks directly in the target language to an ASR system.

The last example, a fully automated workflow, combines ASR and MT. The semi-automated and automated sample workflows also allow for post-editing at different stages to increase accuracy.
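As a rough illustration only, the five workflows described above can be modeled as ordered lists of stages. The sketch below is not part of the SMART project; the stage labels, the workflow names, and the `human_count` helper are all hypothetical, and the stage order is one reading of the descriptions above. Optional post-editing is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    human: bool  # True if a person performs this stage


# Hypothetical stage labels, chosen for this illustration only.
INTERLINGUAL_RESPEAKER = Stage("interlingual respeaker", human=True)
INTERPRETER = Stage("simultaneous interpreter", human=True)
INTRALINGUAL_RESPEAKER = Stage("intralingual respeaker", human=True)
ASR = Stage("ASR", human=False)
MT = Stage("MT", human=False)

# Each workflow ends in the same output: translated subtitles.
WORKFLOWS = {
    # Human-centric
    "interlingual respeaking": [INTERLINGUAL_RESPEAKER, ASR],
    "interpreter + intralingual respeaker": [INTERPRETER, INTRALINGUAL_RESPEAKER, ASR],
    # Semi-automated
    "intralingual respeaker + MT": [INTRALINGUAL_RESPEAKER, ASR, MT],
    "interpreter + ASR": [INTERPRETER, ASR],
    # Fully automated
    "ASR + MT": [ASR, MT],
}


def human_count(stages):
    """Count the stages in a workflow that are performed by a person."""
    return sum(1 for stage in stages if stage.human)


if __name__ == "__main__":
    for name, stages in WORKFLOWS.items():
        chain = " -> ".join(stage.name for stage in stages)
        print(f"{name}: {chain} (humans: {human_count(stages)})")
```

Counting the human stages in each list gives a quick view of staffing needs: the interpreter-plus-respeaker workflow is the only one here that involves two people, while the fully automated one involves none.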

Davitti told Slator that preliminary evidence indicates that the workflow involving an interpreter and an intralingual respeaker is highly accurate. The possible downsides of this workflow are increased latency and high costs, since two or more professionals are needed for the job.

ASR and the Future of Respeaking

Questions have arisen about the suitability of different ASR systems in multilingual subtitling. Other variables are potential technological integrations, studio versus remote subtitling work, platforms, and so on.

Regarding ASR specifically in interlingual respeaking, Davitti said, “In our internal testing we used material that was very challenging from a language point of view, incoherent delivery, specialized vocabulary […] There isn’t, at this time, an out-of-the-box ASR system that can fully ensure a high level of accurate recognition, punctuation, and segmentation across the challenges of spoken language in real time.”

The evolution of ASR may end up mirroring that of MT in terms of the suitability of the output for different applications and expected levels of quality. Just as MT has improved over time, ASR could be expected to improve in multilingual subtitling.

In the meantime, technology training will become an increasingly important part of upskilling for language professionals entering the realm of interlingual respeaking.