MT Post-Editing Boosts Swiss Bank’s Translation Productivity by Up to 60%, Study Finds

A fast-growing cohort of midsize translation buyers has followed more sophisticated large buyers in implementing machine translation post-editing (PEMT), either through their LSPs or directly via an API in their translation management system (TMS).

Academia has also increasingly shown interest in quantitative research on the impact of ever tighter human-machine interaction in language translation.

A recent study, led by Samuel Läubli, examined the impact of PEMT on the productivity of a small team of translators in the field of banking and finance. Läubli is a PhD candidate at the University of Zurich, CTO of TextShuttle, and a previous speaker at SlatorCon.

Researchers ran the study using domain-adapted neural machine translation (NMT) with the in-house translation team of Migros Bank based in Zurich, Switzerland. Migros Bank is the banking arm of Switzerland’s largest retailer, Migros, and operates across the country’s German-, French-, and Italian-speaking regions. The bank runs 67 branches, employs over 1,300 staff and, in 2018, generated a profit of CHF 204m (USD 205m).

In 2016, Migros Bank decided to reduce the work farmed out to language service providers (LSPs), bringing more of its roughly 2 million words of annual translation volume back in-house.

The bank built a small internal translation team of 2.8 full-time staff and rolled out the translation management system Across. Initially, the plan was for the internal team to cover about 60% of the translation workload. According to Läubli, however, that share grew to 80% thanks to the deployment of PE(N)MT.

PEMT: Empirically Tested

Läubli et al. set out to “empirically test how the inclusion of NMT, in addition to domain-specific translation memories and termbases, impacts speed and quality in professional translation of financial texts.”

The study found that “even with language pairs that have received little attention in research settings and small amounts of in-domain data for system adaptation, NMT post-editing allows for substantial time savings and leads to equal or slightly better quality.”

Four of the bank’s translators participated in the study, two for each language pair. In each language pair, there were two experimental conditions: translation memory (TM)-only and PEMT, in which translators edited NMT output.

In the first condition, translators had access to a domain-specific TM, a domain-specific termbase, and any online service (except machine translation) in a translation environment they were used to. In the second, they had access to the same resources, with one difference: sentences with no fuzzy match of at least 80% in the TM were run through the NMT engine.
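The routing rule in the second condition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study does not describe how fuzzy matches were scored, so `difflib.SequenceMatcher` stands in for the TMS's own fuzzy-match metric, and the names `best_fuzzy_match`, `suggest`, and `machine_translate` are hypothetical.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.80  # the 80% cut-off reported in the study


def best_fuzzy_match(source, tm):
    """Return (score, target) for the closest TM entry, or (0.0, None) if none.

    Assumption: SequenceMatcher's ratio stands in for a real TMS fuzzy score.
    """
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target


def suggest(source, tm, machine_translate):
    """Offer the TM match at or above the threshold, otherwise NMT output."""
    score, target = best_fuzzy_match(source, tm)
    if score >= FUZZY_THRESHOLD:
        return "TM", target
    # No sufficiently close TM match: send the sentence to the NMT engine.
    return "NMT", machine_translate(source)
```

Here `tm` is assumed to be a plain dictionary mapping source sentences to their stored translations, and `machine_translate` is any callable that returns a draft translation. In the TM-only condition the fallback branch would simply not exist, leaving the translator to start from scratch.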

French versus Italian

In the German-to-French language combination, translators averaged 585 words per hour in the TM-only condition and 934 words per hour when post-editing, an increase of nearly 60%. For reference, a good portion of Slator readers polled on PEMT speed concurred that around 1,000 words per hour was a realistic hourly output.

The difference was less marked with Italian as the target language: translators averaged 453 words per hour in the TM-only condition and 495 words per hour when post-editing, a 9% increase in speed.
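The percentage gains follow directly from the reported hourly averages. A minimal sketch of the arithmetic, where the function name `speed_gain_percent` is chosen here for illustration only:

```python
def speed_gain_percent(tm_only_wph, post_edit_wph):
    """Relative speed increase of post-editing over TM-only, in percent."""
    return (post_edit_wph - tm_only_wph) / tm_only_wph * 100


# Average words per hour as reported in the article.
french = speed_gain_percent(585, 934)
italian = speed_gain_percent(453, 495)

print(f"German to French: {french:.1f}%")    # German to French: 59.7%
print(f"German to Italian: {italian:.1f}%")  # German to Italian: 9.3%
```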

In one of the texts provided for translation into French, the maximum speed achieved with PEMT was 1,237 words per hour, as opposed to 683 words per hour with TM-only. For Italian, the maximum speed was 648 words per hour with PEMT and 553 words per hour with TM-only. Three out of four translators were faster on average using PEMT.

Quality was reviewed on five parameters: coherence, cohesion, grammar, style, and cultural adequacy. Overall, in French, there was no difference in quality between texts produced with and without NMT. In Italian, texts translated with MT received slightly higher scores. Cohesion was found to be better in texts produced without MT in both French and Italian.

The research provides no conclusive explanation as to why results were better with French as the target language. One possible reason mentioned is that the German-to-Italian engine was trained with less in-domain material than the German-to-French one.

Chantal Amrhein, Patrick Düggelin, Beatriz Gonzalez, Alena Zwahlen, and Martin Volk were Läubli’s co-researchers.