Research on Automatic Correction of Human Translation Wins Award

Automatic Translation Evaluation

Natural language processing (NLP) scientists at Lilt and the University of California at Berkeley published an NAACL award-winning research paper on June 17, 2022. The paper introduced a Translation Error Correction (TEC) model that automates the correction of human translations.

The scientists told Slator, “The goal of this work is to replicate the phrase-level corrections made by expert linguists when they review the translations authored by other linguists. While each language pair has its own nuance, the approach we take is quite general.”

They added, “Just as neural machine translation applies effectively to many language pairs, we expect that this approach to translation error correction will apply broadly as well, and we look forward to reporting our future progress as we extend this work to other language pairs.”

The TEC model shares its structural foundation with Automatic Post-Editing (APE), which has been widely studied but differs from TEC in several ways. For example, TEC trains on errors made by humans and focuses on correcting errors rather than merely detecting them. TEC can also recognize content that needs no editing.

In contrast to TEC, APE is “dominated by the fluency errors that are characteristic of MT systems (74% of sentences),” the paper stated, adding that the “TEC corpus exhibits a broader distribution of errors that human translators are prone to make.”

Asked to define translation fluency, the scientists replied, “Fluency of a translation describes whether a native speaker of the language would use the phrasing, structure, and word choice that appears in the translation.”

Input From 10 Human Translators

The scientists used a bilingual corpus called ACED, which comprises three datasets from different domains. The data consists of 35,261 English–German translations produced and reviewed by professional translators (human translations, not machine-translation post-edits).

To prepare the data, the scientists eliminated duplicate source sentences, removed translations rewritten by reviewers, and classified errors into three main categories: monolingual edits found in the target text, bilingual edits that correct translation errors, and preferential edits.
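The filtering steps above can be sketched as a small script. This is an illustrative assumption of how such a pipeline might look, not the paper's actual code; the function names and the token-overlap heuristic for detecting wholesale rewrites are invented for the example.

```python
def token_overlap(a, b):
    """Jaccard overlap between two sentences' token sets."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def prepare_corpus(records):
    """Filter (source, translation, corrected) triples as described above.

    Step 1: eliminate duplicate source sentences.
    Step 2: drop translations the reviewer rewrote wholesale, approximated
            here by a low token overlap between draft and corrected text
            (the 0.5 threshold is an assumption for illustration).
    """
    seen_sources = set()
    prepared = []
    for src, trans, corrected in records:
        if src in seen_sources:
            continue
        seen_sources.add(src)
        if token_overlap(trans, corrected) < 0.5:
            continue
        prepared.append((src, trans, corrected))
    return prepared
```

Classifying the surviving edits into the three categories (monolingual, bilingual, preferential) would be a further labeling pass over `prepared`, which the paper describes but this sketch omits.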

After pre-processing the ACED data, the scientists pre-trained the model and fine-tuned it on actual human corrections. They then tested TEC against other models, including MT, grammatical error correction (GEC), and BERT-APE.
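One way such model comparisons might be scored is against the human reviewers' corrections: a suggestion counts as correct when it reproduces the reference edit. The exact-match precision/recall framing below is an illustrative assumption, not the paper's exact evaluation protocol.

```python
def score_model(suggestions, references):
    """Precision/recall of a model's corrections vs. human references.

    suggestions: {sentence_id: corrected_text, or None if no edit proposed}
    references:  {sentence_id: corrected_text, or None if no edit needed}
    """
    tp = fp = fn = 0
    for sid, ref in references.items():
        sug = suggestions.get(sid)
        if sug is not None and sug == ref:
            tp += 1          # proposed the same edit the human made
        elif sug is not None:
            fp += 1          # proposed an edit that was wrong or unneeded
        elif ref is not None:
            fn += 1          # missed an edit the human made
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A model that leaves correct sentences alone (suggesting `None`) is not penalized here, which mirrors TEC's ability to recognize content that needs no editing.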

Nine professional translators participated in the study as reviewers to determine the model's real-world applicability. They were asked to review sentences, 255 of which carried suggested corrections, and to provide qualitative observations.

A tenth professional translator was tasked with reviewing the reference translations in the dataset and ranking the quality of the sentences reviewed by the other nine.

Next Step in Translation Workflow Automation?

Comparisons to other models highlighted significant differences in how the TEC model ultimately performed. For example,

  • the professional reviewers accepted 79% of the TEC suggestions for correction;
  • reviewers spent less time reviewing when suggestions were accepted; and
  • domain adaptation proved critical to performance, and customization essential to translation error correction.

Five of the nine reviewers emphasized the need for reliability: in the test, some suggestions were incorrect, and the system did not always propose an applicable edit.

Three reviewers found the TEC system “could be a memory aid or substitute for researching client-specific requirements.”

Three reviewers commented that TEC could help “by making them aware of what errors they might look out for, especially in repetitive content where it may be easy to miss details.”

Given the findings, TEC could be the next step in translation workflow automation. The more the model's precision improves, the greater its potential to make a practical difference during the review stage of translation production.