NMT for Subtitling Could Work Without Specialized Training, Study Finds

Subtitling is one of the few remaining domains that have yet to see extensive use of neural machine translation (NMT). Subtitle content is highly nuanced: it is made up of dialog full of colloquialisms, cultural references, and humor, which are typically more challenging for machines to handle.

Yet there is a strong use case for machine translation in subtitling, given fast-growing content volumes and stronger-than-ever pressure on deadlines and margins. A new research paper explores the use of NMT in subtitling and the effect of different sets of training data on an NMT system for subtitles. The paper, entitled “Improving Neural Machine Translation of Subtitles with Finetuning,” found that a model trained on subtitle data alone performed only marginally worse than models fine-tuned from a general-domain baseline.

Author Simon Reinsperger wrote the paper as part of his thesis for a master’s degree in MultiMediaTechnology [sic] at the Salzburg University of Applied Sciences. Reinsperger’s hypothesis was that NMT systems for subtitles can benefit from learning to translate general domain sentences before being adapted for subtitles.

While Reinsperger also believes the nuanced nature of subtitling can present a challenge for machine translation, he maintains that NMT makes it possible to translate content where budgets would not otherwise allow. Additionally, the challenges of increasingly tight deadlines and resourcing constraints make NMT an attractive prospect within media localization.

Slator contacted paper author Reinsperger to find out more about the study. Reinsperger had been working for a startup researching machine translation when they started looking into “a fully automated movie dubbing system.” He said, “This work piqued my interest and, when I had to choose a topic for my master thesis, it was the logical choice to leverage my existing knowledge.”

Subtitle Corpus Performs Better Than Expected

For the purpose of his research, Reinsperger used two datasets.

  • WMT16 dataset – a general corpus of 4.5 million sentences used as training data for translation at the WMT16 conference and consisting of the datasets Europarl, Common Crawl, and News Commentary
  • Subtitle corpus – from OpenSubtitles and consisting of English-German subtitles that ran to 22.5 million sentences (Reinsperger reported that this dataset, while huge, was not consistently clean.)

A baseline model was trained on the WMT16 dataset alone. Using this baseline, three models were fine-tuned with subtitle data. By contrast, the “subtitle” model was trained solely on the subtitle corpus.
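The two-stage regime described here — train on general-domain data, then continue training from the same weights on in-domain data — is what fine-tuning amounts to. As a toy illustration (a one-parameter model and invented numbers stand in for the NMT system; this is not code from the paper):

```python
# Toy illustration of the two-stage regime: fine-tuning simply continues
# gradient descent from the weights learned on general-domain data.
# The one-parameter model and all data points are invented.

def sgd(w, data, lr=0.05, steps=200):
    """Fit y = w * x by stochastic gradient descent, starting from weight w."""
    for _ in range(steps):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of squared error
    return w

general_data = [(1.0, 2.0), (2.0, 4.0)]    # stands in for WMT16
subtitle_data = [(1.0, 2.2), (2.0, 4.4)]   # stands in for OpenSubtitles

w_base = sgd(0.0, general_data)       # baseline: general data only
w_tuned = sgd(w_base, subtitle_data)  # fine-tuned: continues from w_base
w_subs = sgd(0.0, subtitle_data)      # subtitle-only model
```

Reinsperger’s finding — the subtitle-only model nearly matching the fine-tuned ones — corresponds here to `w_subs` and `w_tuned` converging to similar values given enough in-domain data.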

English-to-German was chosen as the language pair for the models because of the high volume of training data — and because Reinsperger is proficient in both languages. Also, “There is a much bigger demand for translating subtitles from English to German, due to the international movie market being primarily in English,” Reinsperger said.

Reinsperger was working on the assumption that a model trained with an underlying general corpus would perform better than one solely made up of subtitles, because a subtitle corpus is likely to contain noisy data as a result of optical character recognition (OCR) errors, sentence misalignments, and the like.
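A cleaning pass of the kind such subtitle corpora typically need might filter out misaligned or OCR-damaged sentence pairs. A minimal sketch — the heuristics (length ratio, stray OCR characters) are illustrative assumptions, not the filters used in the paper:

```python
# Minimal corpus-cleaning sketch for noisy parallel subtitle data.
# The length-ratio threshold and noise markers are illustrative
# assumptions, not taken from the paper.

def is_clean_pair(src: str, tgt: str, max_ratio: float = 2.0) -> bool:
    """Keep a sentence pair only if both sides look well-formed."""
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if src_len == 0 or tgt_len == 0:
        return False
    # Misaligned pairs often have wildly different lengths.
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
        return False
    # Common artifacts: pipe characters, replacement chars, markup tags.
    for noise in ("|", "\ufffd", "<i>", "</i>"):
        if noise in src or noise in tgt:
            return False
    return True

pairs = [
    ("How are you?", "Wie geht es dir?"),  # clean
    ("He left.", "Er ist gestern mit dem Auto weggefahren, glaube ich."),  # misaligned
    ("Good morning|", "Guten Morgen"),     # OCR noise
]
clean = [(s, t) for s, t in pairs if is_clean_pair(s, t)]
```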

Yet the results did not fully support this theory: The model trained only on subtitles was just marginally worse than the fine-tuned models. Hence, Reinsperger concluded, general-domain pretraining followed by fine-tuning may not be necessary — training on subtitle data alone can perform nearly as well.

Reinsperger qualified that the subtitle model could have performed better than expected because the subtitle corpus was much larger than WMT16’s. He also said it is possible that sentence length had an impact. In the WMT16 corpus, sentences had 23 words on average, while the subtitle corpus averaged six words per sentence. In general, the longer the sentence, the harder it is for machines to translate it.
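The statistic behind this comparison is simply mean words per sentence; a minimal sketch, with invented sample lines:

```python
# Average sentence length in words -- the statistic Reinsperger compared
# (23 words on average for WMT16 vs. six for the subtitle corpus).
# The sample lines below are invented for illustration.

def avg_sentence_length(sentences):
    return sum(len(s.split()) for s in sentences) / len(sentences)

subtitle_like = ["Let's go.", "I don't know.", "See you tomorrow, okay?"]
print(avg_sentence_length(subtitle_like))  # → 3.0
```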

Although NMT may still not be widely used for subtitling, major buyers of media localization have been actively looking into machine translation. A+E Networks UK, for example, does not yet use machine translation but is “monitoring the developments of MT closely and […] conducting a POC (proof of concept) with a potential partner,” according to Jan-Hendrik Hein, Director of Media Operations, who spoke to Slator back in January 2019.

Netflix, too, said at an IMUG event in April 2019 that it uses speech-to-text tech to create English subtitle templates and is investigating machine translation further. Issuing what may well have been a call to action for machine translation researchers and providers, Netflix Director of Globalization Kathy Rokni said, “If any technology can give us the creative intent that we’re looking for, we will look into it. So far, that hasn’t happened.”