Bridging the Gap Between NMT’s Theoretical Promise and Practical Limitations

I was recently on a call with Sara, the localization manager of one of our enterprise customers, a big ecommerce company. She was frustrated. Ideally, she would like to translate millions of words a month, but human translation takes far too long and is too expensive. Neural Machine Translation (NMT) looks more promising than both human translation and previous translation technologies, yet her attempts to put it to practical use had not really worked.

On the one hand, it seems like NMT can potentially solve a lot of her current localization issues, as she needs to process high volumes quickly with high quality. On the other hand, getting NMT to work in the real world is difficult. Different NMT engines handle different content and different languages at varying quality levels, the quality of the NMT engines is changing all the time, and the workflow required to produce quality translations based on NMT is often too complex.

Many of our enterprise customers face the same issues. There is a big gap between the theoretical capabilities of NMT engines and the practical capabilities in a real-world production environment.

Clearly, NMT has great potential to totally transform and disrupt the translation industry by providing instant, human-quality translation at a very low cost. In just three to four years, NMT surpassed all the translation technologies developed over more than 50 years, going back to the Cold War. That said, the biggest issue is NMT quality variance between different engines and between different sentences translated with the same engine.

Over the next few years, before NMT reaches human parity, many organizations will already be using imperfect NMT engines to reduce translation cost and turnaround time.

The key question is, what can executives in companies that sell globally do today to ensure they stay competitive and leverage NMT correctly?

First, it should be clear that the goal is to get your organization to a point where your translations are accurate and high quality, use your terms, jargon, style, and glossary, and are delivered instantly at a low cost. This means that whenever you change something in your source language, the change is immediately available in all other languages; i.e., all your customers are on the same page all the time, with very little overhead.

The other goal is to keep up with the competition: if your competitors use NMT and you do not, you will effectively lag behind and be less competitive.

In practice, there are three points to consider.

  • Check if NMT is right for you – Depending on your type of material and language pairs.

NMT quality is not uniform across engines, languages, and types of material. For example, Amazon may be the best engine for your content when translated into French, while Google may be better with the same material into Russian; and there might not be a good engine right now to translate your content into Japanese.

  • What NMT engines to use – More than one is key!

Since quality varies a lot between NMT engines, as well as within the same engine, and it changes with time, it is important to identify specifically which NMT engine to use for each material and language pair at any given time. In addition, you should consider training a dedicated NMT engine just for you. 

  • When to start – Once above a minimal quality threshold.

To benefit from NMT, start using it once the baseline quality is above a certain level. Dedicated NMT training takes more time, so if it is required, it should start even before that level is reached.

Is NMT for you? The answer is probably yes; the question is when?

Before investing any resources in NMT, it is worth checking if the current quality of NMT engines fits your company’s needs. Since NMT quality changes all the time and since automatic measures are not good enough, we launched ONEs – OHT NMT Evaluation score. (Read more about it here and here.)

ONEs is an objective assessment of the quality of NMT engines, based on a unique human evaluation. Once a quarter, we run tens of thousands of string evaluations, using hundreds of linguists, while using different NMT engines to translate into several target languages.

ONEs can provide a general answer regarding the current state of NMT for your language pairs. 

Assuming NMT looks promising, you need to answer a more specific question: Which engines perform best with your material per target language?

Since the quality of different NMT engines — Amazon, Google, Microsoft, etc. — varies a lot across languages and types of material, and since the quality changes on a monthly basis, it makes sense to check the translation quality of a sample of your specific material.

In our experience, using just one NMT engine is typically not good enough and does not provide an optimal outcome. You need to check your material with different NMT engines, for each of your target language pairs, to see which engine performs best in each language. It will take a while before the generic engines can handle all materials and all target languages at the same quality level.

Another important point: do not look at the average quality of a document, a page, or even a set of strings!

In our experience, the average NMT quality across several sentences, even in the same paragraph, will almost always look bad because the variance in quality is high. That is, when running NMT on a set of sentences, the first sentence may have a perfect machine translation, the second will require some human post-editing, and the third will need to be retranslated by a human translator from scratch.
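
To make this concrete, here is a minimal Python sketch of how an average can hide what is happening sentence by sentence. The three scores, the 0-100 scale, and the two cut-offs are made-up assumptions for illustration; they are not ONEs values or anything we measure in production.

```python
from statistics import mean

# Hypothetical per-sentence quality scores (0-100) for one paragraph.
# Both the scores and the thresholds below are illustrative assumptions.
scores = [98, 72, 35]

USE_AS_IS = 90  # at or above: publish the machine translation unchanged
POST_EDIT = 60  # at or above: light human post-editing is enough


def workflow_for(score):
    """Map a quality score to the amount of human work it implies."""
    if score >= USE_AS_IS:
        return "use as is"
    if score >= POST_EDIT:
        return "post-edit"
    return "retranslate"


average = mean(scores)
per_sentence = [workflow_for(s) for s in scores]

# The average suggests post-editing everything, while the sentence-level
# view shows three different outcomes.
print(f"average score: {average:.1f} -> {workflow_for(average)}")
print(f"per sentence:  {per_sentence}")
```

Judged by its average, the whole paragraph would go to post-editing, although one sentence needs no work at all and another needs a full retranslation.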

For our customers, we check the quality produced by several NMT engines, such as Amazon, Google, etc., by running a sample of the customer’s material through each of them. The result of this examination is a table that tells you which NMT engine to use per material and language pair, and what level of quality you should expect; i.e., what level of manual work will be required in each case.
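
As a rough illustration of what such a table could look like, the sketch below picks, for each material and language pair, the engine with the largest share of sentences that can be used without human work. The engine names, the sample scores, and the 90-point cut-off are all placeholders, and the selection rule is one possible choice rather than a description of our actual evaluation.

```python
from collections import defaultdict

USE_AS_IS = 90  # assumed cut-off for "no human work needed" (0-100 scale)

# Hypothetical sample results: (engine, material, language pair, score).
# Engine names and numbers are placeholders, not measured results.
samples = [
    ("engine_a", "product pages", "en-fr", 95),
    ("engine_a", "product pages", "en-fr", 91),
    ("engine_a", "product pages", "en-fr", 70),
    ("engine_b", "product pages", "en-fr", 92),
    ("engine_b", "product pages", "en-fr", 65),
    ("engine_b", "product pages", "en-fr", 40),
    ("engine_a", "product pages", "en-ru", 60),
    ("engine_a", "product pages", "en-ru", 93),
    ("engine_a", "product pages", "en-ru", 45),
    ("engine_b", "product pages", "en-ru", 94),
    ("engine_b", "product pages", "en-ru", 96),
    ("engine_b", "product pages", "en-ru", 72),
]

# Group sentence scores by (material, language pair), then by engine.
scores = defaultdict(lambda: defaultdict(list))
for engine, material, pair, score in samples:
    scores[(material, pair)][engine].append(score)


def usable_share(sentence_scores):
    """Fraction of sentences that score at or above the use-as-is cut-off."""
    return sum(s >= USE_AS_IS for s in sentence_scores) / len(sentence_scores)


# For each combination, recommend the engine with the largest usable share.
recommendation = {}
for key, by_engine in scores.items():
    best = max(by_engine, key=lambda e: usable_share(by_engine[e]))
    recommendation[key] = (best, usable_share(by_engine[best]))

for (material, pair), (engine, share) in sorted(recommendation.items()):
    print(f"{material:15} {pair:6} {engine:10} {share:.0%} usable as is")
```

Ranking by the share of usable sentences, rather than by a mean score, keeps the table consistent with the point above about not relying on averages.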

Finally, should you start or should you wait?

In the NMT report we produce for customers, we calculate the estimated potential savings per target language and type of material using the optimal combination of NMT engines. In general, if potential savings are at 30% or more, you should start using NMT. The reason is simple — at this level, the savings are substantial enough for both you and your competitors to use NMT.
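
The arithmetic behind such an estimate can be sketched as follows. Only the 30% threshold comes from the paragraph above; the relative cost of each workflow and the example mix are assumptions chosen for illustration, and a real report would use your own rates and measured shares.

```python
# Relative cost per word of each workflow, with full human translation = 1.0.
# These ratios are illustrative assumptions, not actual pricing.
RELATIVE_COST = {"use as is": 0.0, "post-edit": 0.5, "retranslate": 1.0}

SAVINGS_THRESHOLD = 0.30  # the 30% rule of thumb from the text


def estimated_savings(workflow_shares):
    """Return the fraction of human-translation cost saved.

    workflow_shares maps each workflow name to its share of the word
    volume; the shares must sum to 1.
    """
    if abs(sum(workflow_shares.values()) - 1.0) > 1e-9:
        raise ValueError("workflow shares must sum to 1")
    blended_cost = sum(
        RELATIVE_COST[name] * share for name, share in workflow_shares.items()
    )
    return 1.0 - blended_cost


# Hypothetical mix for one target language and one type of material.
shares = {"use as is": 0.25, "post-edit": 0.45, "retranslate": 0.30}
savings = estimated_savings(shares)

print(f"estimated savings: {savings:.1%}")
print("start using NMT" if savings >= SAVINGS_THRESHOLD else "wait and re-evaluate")
```

With this made-up mix, the blended cost is 0.525 of the human-only cost, so the estimated savings are 47.5%, which is above the threshold.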

In relevant cases, we also recommend training a dedicated NMT engine. Still, we recommend first using existing NMT engines in an optimal way, and only then starting to train a dedicated engine. One thing to consider is that training a dedicated engine takes time and depends on the volume of existing high-quality translations that can serve as input during training. Dedicated training, which we call MyNMT, will be discussed separately.

There are other considerations, of course, mainly regarding the software platform: it should be able to send each sentence to the right NMT engine dynamically and to select the right workflow (pre-processing, quality control, etc.).
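
A very simplified sketch of that routing idea is shown below. The routing table, the engine names, and the workflow labels are hypothetical; a real platform would build the table from an evaluation of your own content and would call the engines' actual APIs, which are not shown here.

```python
from typing import Optional

# Hypothetical routing table: (material, language pair) -> engine name.
# A missing entry means no engine is currently good enough for that content.
ROUTES = {
    ("product pages", "en-fr"): "engine_a",
    ("product pages", "en-ru"): "engine_b",
}


def route(material: str, pair: str) -> Optional[str]:
    """Return the engine for this content, or None if there is no good one."""
    return ROUTES.get((material, pair))


def plan(material: str, pair: str) -> str:
    """Describe where a sentence is sent and which workflow follows."""
    engine = route(material, pair)
    if engine is None:
        # Fall back to human translation when no engine qualifies.
        return "human translation"
    return f"{engine} + quality control"


print(plan("product pages", "en-fr"))
print(plan("product pages", "en-ja"))
```

The fallback matters in practice: for some language pairs there may be no suitable engine yet, and the platform should then route the content to human translators instead of forcing it through NMT.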

I will discuss these points and others in my presentation at SlatorCon Amsterdam on November 28, 2019. For more details and registration, please visit the event website.