A research paper published on May 2, 2019, compared the performance of translators who used machine translation post-editing (often called PEMT) and interactive translation prediction (or ITP). The results suggest that ITP may be the better method of human-machine interaction for translators.
If ITP sounds familiar, it is because the approach was pioneered by Silicon Valley-based Lilt. The startup launched in 2015 as an ITP-powered translation productivity tool (a.k.a. CAT tool) aimed at individual linguists. It was sued by SDL (the case was later settled), received millions in funding from VC giant Sequoia Capital and, over time, pivoted to a managed services business model.
Translation productivity tools used for PEMT pre-populate each target segment with raw MT output, which the linguist then reviews and edits. ITP, by contrast, works more like an auto-complete feature: the linguist starts with an empty target segment, and the tool suggests target translations below the segment as they type. ITP also dynamically takes the linguist's partial translation into account and suggests better translations for the rest of the sentence.
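To make the contrast concrete, here is a toy sketch in Python of the two interaction styles. It is an illustration under stated assumptions, not how any production tool works: the candidate sentences, their scores, and the names `Candidate`, `pemt_prefill` and `itp_suggest` are all hypothetical. A real neural ITP system typically generates its suggestion by decoding the rest of the sentence conditioned on the typed prefix, whereas this sketch simply filters a fixed list of candidate translations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One hypothetical MT output for a source segment, with a model score."""
    text: str
    score: float  # higher is better; the values used below are made up


def pemt_prefill(candidates: list[Candidate]) -> str:
    """PEMT-style: pre-populate the target segment with the single best MT output."""
    return max(candidates, key=lambda c: c.score).text


def itp_suggest(prefix: str, candidates: list[Candidate]) -> Optional[str]:
    """ITP-style (toy version): suggest a completion consistent with the typed prefix.

    Keeps only candidates that start with what the linguist has typed so far and
    returns the remainder of the best-scoring one, or None if nothing matches.
    """
    matching = [c for c in candidates if c.text.startswith(prefix)]
    if not matching:
        return None
    best = max(matching, key=lambda c: c.score)
    return best.text[len(prefix):]


# Hypothetical candidates for one English source sentence ("The cat sat on the rug.")
candidates = [
    Candidate("El gato se sentó en la alfombra.", 0.62),
    Candidate("El gato estaba sentado en la alfombra.", 0.31),
]

print(pemt_prefill(candidates))                    # El gato se sentó en la alfombra.
print(itp_suggest("El gato estaba ", candidates))  # sentado en la alfombra.
```

With an empty prefix, `itp_suggest` returns the top candidate, much like a first suggestion; once the linguist types a word that departs from it, the suggestion switches to the best translation still consistent with their choice.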
The outcome of the PEMT vs. ITP face-off could decide how the vast majority of translators interact with content for years to come.
Pioneering Study
How do the two approaches stack up against each other? Graduate student Rebecca Knowles from Johns Hopkins University, Marina Sanchez Torron, PhD, from the University of Auckland, and Professor Philipp Koehn, also from Johns Hopkins, conducted the study and authored the paper. They compiled their findings in a report entitled "A user study of neural interactive translation prediction."
Whereas previous research used ITP systems based on statistical machine translation (SMT), this time Koehn's team deployed neural machine translation (NMT) in ITP, a first, according to the study's authors. The neural ITP system they used was previously developed by Knowles and Koehn based on the University of Edinburgh's Nematus NMT toolkit. Speaking to Slator via email, Prof. Koehn said the system they developed "showed significantly better results in simulation studies."
“So, the obvious question was if this also leads to practical translator productivity increases by professional translators. These kind of studies are always a bit tricky since translators have to get used to a tool and a new way of working, and it is hard to do this at scale,” Koehn said. “Any time we [do] user studies we also have to deal with the very large variance between translators. Still, it is encouraging to see that this may not just be a more enjoyable way to interact with machine translation but also lead to more productive work by at least some of the translators.”
Limited by Cost and Convenience
The researchers used a straightforward methodology: build Nematus into an ITP environment, train it on millions of sentence pairs, and have professional translators work with both PEMT and the neural ITP system and then provide feedback.
The Nematus-powered ITP system used in the study was integrated into CASMACAT, a translation productivity tool developed between 2011 and 2014 under a European Union Programme for Research and Technological Development. The authors trained their system on the same datasets used for the 2013 Workshop on Statistical Machine Translation (WMT13); the entire training dataset contained nearly four million sentence pairs.
The study participants were eight English-into-Spanish professional translators who translated eight news texts while following specific guidelines designed to maximize the quantitative data generated for the study. The linguists were also asked to provide feedback on their experience with the neural ITP system.
If the number of participants seems a little underwhelming, the authors note that “cost and convenience motivated our sample size, quality assessment and language pair choices, therefore restricting the application of our findings.”
The sample size was further reduced by 17% due to technical issues and one translator's decision not to follow instructions. In very negative feedback on the technology, that same translator made it clear they were not open to working with neural ITP.
Fluency Issues
The researchers measured translation productivity across three general categories, further broken down into 11 finer variables. The three categories were (1) temporal effort, or processing time; (2) technical effort; and (3) final translation quality.
According to the authors, "sample results for eight out of the 11 variables are favorable to ITP." The eight translators were also given a questionnaire about ITP.
During the study, neural ITP provided more accurate predictions than the researchers expected, but they also discovered that "fluency issues are more than twice as frequent in ITP as in PE[MT]."
They noted, however, that the fact that CASMACAT is a non-production environment lacking features such as grammar auto-correction "very likely contributed" to this. That is a valid point, given that developers of commercial translation productivity software spend much of their time improving the user interface and adding features on top of the underlying technology.
They also found that “in terms of improvement over time, none of the models could determine whether productivity indicators improved over time in ITP.”
Less Time Researching Terminology
Feedback from the professional translators was generally very positive, save for the one linguist who gave negative answers to every question. The study also suggests that a translator's prior experience with PEMT may influence how they perceive neural ITP.
The researchers said the translators who had used PEMT before did not have any negative views toward neural ITP, regardless of how experienced they were in their profession. At the same time, they noted “some indication that translators who have formal PE[MT] training or provide PE[MT] services frequently benefited the most from ITP.”
According to the researchers, “Regardless of their translation experience, professional translators with little or no PE[MT] experience […] may be more reluctant to engage in ITP.” The two participants who expressed negative views of ITP had little to no PEMT experience.
According to the translators, differences in the cognitive and translation processes between PEMT and ITP meant they spent "less time researching terminology" when using ITP.
A couple of translators expressed concern about the translator's role in an ITP-driven environment, including "how in such scenarios, MT priming means 'the voice of the translator is lost,' and how the user-friendliness and speed of the ITP system may generate overconfidence on the translator side and lead to mistakes or wrong decisions if the required exigence and rigor levels are not there, on the user's side."
Editor’s note: This story has been updated to add input from one of the paper’s authors.