Amazon Looks to Further Automate Quality Checks in Subtitle Translation


Companies operating in the digital entertainment space have been developing new ways to reduce production costs. One area they have focused on is dubbing, which offers great potential for increasing the market share of streaming platforms such as Netflix, HBO, and Amazon Prime.

Among these innovations, Synthesia’s lip-sync dubbing tech and Papercup’s synthetic dubbing tool stand out in recent memory. Of course, there is also subtitling — which is the use case of a paper published by Amazon researchers on April 1, 2021.

Authored by Prabhakar Gupta, Ridha Juneja, Anil Nelakanti, and Tamojit Chatterjee, “Detecting over/under-translation [OT/UT] errors for determining adequacy in human translations” proposes a new approach to flagging errors during the quality evaluation of translated subtitles.

The group did not limit their research to machine-translated (MT) output, but also specifically targeted pipelines in which professional subtitlers produce the translations. “The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall,” they said.

Moreover, according to the authors, their model was able to detect OT/UT in human translations “without any access to reference translations.” They trained the model on synthetic data. The researchers added that the model trained on this dataset of “synthetically introduced errors” performed well, “achieving 89.3% accuracy on high-quality human-annotated evaluation data in 8 languages.”
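To make the idea of “synthetically introduced errors” concrete, here is a minimal sketch of how such training examples could be built from clean sentence pairs. It is an illustration only, not the procedure described in the paper: the function names, the 30% drop/add fractions, and the strategy of borrowing filler words from another sentence are all assumptions made for this example.

```python
import random

# Hypothetical helpers: the names and the corruption strategy are assumptions
# for illustration, not the method used by the Amazon researchers.

def make_under_translation(tokens, rng, drop_fraction=0.3):
    """Simulate under-translation by deleting a contiguous run of target tokens.

    Assumes `tokens` has at least two items so something is always left over.
    """
    n_drop = max(1, int(len(tokens) * drop_fraction))
    n_drop = min(n_drop, len(tokens) - 1)
    start = rng.randrange(0, len(tokens) - n_drop + 1)
    return tokens[:start] + tokens[start + n_drop:]

def make_over_translation(tokens, filler_tokens, rng, add_fraction=0.3):
    """Simulate over-translation by inserting unrelated tokens at one position."""
    n_add = max(1, int(len(tokens) * add_fraction))
    pos = rng.randrange(0, len(tokens) + 1)
    extra = [rng.choice(filler_tokens) for _ in range(n_add)]
    return tokens[:pos] + extra + tokens[pos:]

def build_synthetic_examples(pairs, seed=0):
    """Turn clean (source, target) pairs into labeled ok/under/over examples."""
    rng = random.Random(seed)
    examples = []
    for i, (src, tgt) in enumerate(pairs):
        tgt_tokens = tgt.split()
        if len(tgt_tokens) < 2:
            continue  # too short to corrupt meaningfully
        # Borrow filler words from the next pair's target (fallback: same sentence).
        filler = pairs[(i + 1) % len(pairs)][1].split() or tgt_tokens
        examples.append((src, tgt, "ok"))
        under = make_under_translation(tgt_tokens, rng)
        examples.append((src, " ".join(under), "under"))
        over = make_over_translation(tgt_tokens, filler, rng)
        examples.append((src, " ".join(over), "over"))
    return examples

if __name__ == "__main__":
    pairs = [
        ("There is a green tree in the park", "Il y a un arbre vert dans le parc"),
        ("Hello everyone", "Bonjour à tous"),
    ]
    for src, tgt, label in build_synthetic_examples(pairs):
        print(label, "|", src, "|", tgt)
```

A classifier trained on examples like these sees the source and a candidate translation only, which is consistent with the authors' point that no reference translation is needed at detection time.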

Defining translation quality as capturing “both the fluency of the translation and its adequacy relative to the source,” the researchers also raised the possibility of reducing production costs by flagging errors very early on.

They wrote, “Translated subtitles often require human quality checks that are as expensive as acquiring translations […] To reduce post-editing quality checks costs, we could flag errors as the translations are typed in with the QE serving as a guardrail.”

They compared this system to apps that flag spelling or grammatical errors on the fly. Of course, the kind of translation tech the authors describe is nothing new (see: predictive / adaptive machine translation via Lilt). However, not all MT quality checks are created equal, and what may be unacceptable for, say, translated marketing copy could very well work for subtitles.

“For video subtitles […] it is possible for a translation to be linguistically incomplete and be acceptable during post-edits,” the authors pointed out. “This is due to the fact subtitles are required to follow a set of technical constraints limiting the choice and number of words in translation.”

They cited an example (“There is a green tree in the park” translated into “Green tree in park”) that would pass a subtitling quality check because a viewer would still understand the meaning from context.

The Amazon researchers concluded by saying that they plan to keep working on their model by “improving error patterns through tighter coupling with human translators” and by localizing errors to specific tokens within a sentence instead of flagging the entire sentence.