As Neural Machine Translation’s Core Model Seems Settled, Focus Shifts to Products

Jean Senellart at SlatorCon London 2019

As a key driver of the language industry, neural machine translation (NMT) has been a recurring topic at SlatorCon. At SlatorCon London 2019 on May 16, 2019, Jean Senellart, CEO of Systran, discussed the two sides of the NMT coin: technology and product.

Senellart said that NMT developments have brought more innovation to the language industry than he has seen in his more than 20 years at Systran. In other industries, however, discussions revolve around the product first, and only then around the technology and data that fuel it.

“We forgot the product side a bit. We’re talking about the technology like it’s the thing that the user wants,” he said. “They don’t want the technology, they want the product.”

Transformer Rules (For Now)

Senellart pointed out one big breakthrough in 2017: Google’s self-attentional transformer NMT model, which has since become the leading model in both research and deployment.

Two years later, no new technology has knocked the transformer off its throne. Instead, researchers have been looking at new ways to approach the technology. Two examples Senellart pointed to were Facebook research speeding up training by up to 50 times, and Systran’s own work speeding up inference by up to 80 times.

“So we changed the way of looking at the technology, but the technology itself kind of reached a plateau at this time,” he said, adding that he believes the transformer will continue to rule for some time.

Still, research into NMT continues unabated. Senellart said that, in 2018, a Google Scholar search yielded over 6,000 research papers about or related to NMT, nearly twice the number published in 2017.

Key R&D Research Trends

Senellart went on to break down the busiest research trends in NMT today.

Domain adaptation – One major NMT issue for the localization industry is making sure engines can be adapted to domains. However, this has become a vicious cycle, where each approach to domain adaptation has, so far, come with tricky downsides, from being too resource- and time-intensive to being prone to pitfalls common to neural networks.

MT and translation memory (TM) – Senellart touched on the MT and TM combo, where MT leverages TM in the same way a translator would during their work. He mentioned one paper to be published this summer that implements in-domain, on-the-fly translation by using TM. Senellart said they had done something similar, called micro-adaptation, at Systran.
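The retrieval step behind this kind of TM-assisted translation can be sketched in a few lines. The example below is only an illustration and does not reproduce Systran's micro-adaptation or the paper Senellart mentioned: the toy memory, the function name `fuzzy_matches`, and the 0.6 threshold are all assumptions, and Python's standard `difflib` stands in for a production fuzzy-matching engine.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: (source, target) segment pairs.
TRANSLATION_MEMORY = [
    ("Press the power button to start the device.",
     "Appuyez sur le bouton d'alimentation pour démarrer l'appareil."),
    ("Remove the battery before cleaning.",
     "Retirez la batterie avant le nettoyage."),
    ("The warranty covers two years.",
     "La garantie couvre deux ans."),
]


def fuzzy_matches(sentence, memory, threshold=0.6):
    """Return (score, source, target) for TM entries similar to `sentence`, best first."""
    scored = []
    for src, tgt in memory:
        # Character-level similarity in [0, 1]; real TM tools use their own scoring.
        score = SequenceMatcher(None, sentence.lower(), src.lower()).ratio()
        if score >= threshold:
            scored.append((score, src, tgt))
    return sorted(scored, reverse=True)


# The closest TM segments for a new sentence; a system could pass these
# to the engine as in-domain examples at translation time.
matches = fuzzy_matches("Press the power button to stop the device.", TRANSLATION_MEMORY)
for score, src, tgt in matches:
    print(f"{score:.2f}  {src}  ->  {tgt}")
```

How the retrieved segments are then used (as extra input, or as material for a brief fine-tuning step) depends on the system and is not shown here.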

MT and post-editing – Given that NMT has proven a bit more useful than statistical MT, post-editing is expected to continue to be a rising trend. While noting the advent of neural post-editing with no human intervention involved, Senellart explained a new post-editing approach that uses post-edited TM to dynamically retrain the NMT model with updated data. In this case, the post-editor’s TM data (already corrected and “annotated”) is fed back into the NMT model for retraining.

“Where is the human in the loop?” Senellart asked the SlatorCon audience. The human in this loop is both the post-editor and data annotator.

Low-resource languages – There are hundreds of languages used online for which training data for MT models is sparse, and high-quality corpora are sparser still. Most Asian languages, for instance, are low-resource. Senellart noted an interesting, fully unsupervised NMT model researched by Facebook that managed to increase output quality nearly three times in just 18 months of development.

“This field is one of the most promising,” Senellart said, adding that Facebook is now working on self-supervised models as well.

Beyond the sentence – Another challenge facing researchers is adding external context to NMT output; that is, translating beyond the sentence. “Today, every NMT system in the world is sentence-based. We know that’s not enough. If you are to translate a document, you need to know what the document is talking about,” Senellart said. “You need to make connections between sentences. If there are pronouns, you need the correct anaphorization between sentences, for instance.”

He mentioned two approaches to beyond-sentence NMT currently gaining steam: one where the NMT model refers to a previously translated sentence for context; another, from Unbabel, which uses important keywords found in the entire source document to inform the translation output.

Multilingual translation – Senellart also explained how multilingual translation systems, including zero-shot translation, are now able to translate between 100 languages through a single model.

“One single model to translate all languages simultaneously” is how Senellart explained multilingual translation in a nutshell. “Training multilingual models is very exciting because it’s close to unsupervised learning in that each language helps the others. When you are translating English to French then you’re sort of helping out Spanish, for example, because there are a lot of similarities. The model is discovering what is similar between languages and learning more general translation rules.”
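One widely described way to train a single model on many language pairs is to prepend a tag naming the target language to each source sentence. Whether the systems Senellart had in mind work this way is not stated in his talk. The sketch below shows only the data preparation. The `<2fr>`-style tag format, the function names, and the tiny corpora are assumptions made for illustration.

```python
def tag_for_target(source_sentence, target_lang):
    """Prefix a source sentence with a token naming the desired output language."""
    return f"<2{target_lang}> {source_sentence}"


def build_training_examples(parallel_corpora):
    """Merge several bilingual corpora into one training set for a single model.

    `parallel_corpora` maps (source_lang, target_lang) to a list of
    (source_sentence, target_sentence) pairs.
    """
    examples = []
    for (_src_lang, tgt_lang), pairs in parallel_corpora.items():
        for src, tgt in pairs:
            examples.append((tag_for_target(src, tgt_lang), tgt))
    return examples


# Hypothetical toy data: the model sees en->fr and en->es, but never es->fr.
corpora = {
    ("en", "fr"): [("Good morning.", "Bonjour.")],
    ("en", "es"): [("Good morning.", "Buenos días.")],
}
training_examples = build_training_examples(corpora)

# A zero-shot request is tagged the same way, even though no es->fr
# pair appeared in the training data.
zero_shot_input = tag_for_target("Buenos días.", "fr")
```

Because every pair flows through the same model and the same tagging scheme, directions with little or no direct training data can still be requested, which is the idea behind zero-shot translation.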

Perils of BLEU

During his presentation, Senellart called out false claims by some NMT vendors.

One example is the way NMT metrics, such as BLEU scores, are used in misleading sales pitches. When NMT providers claim to be, for example, five times better in BLEU scores than Google AI or Facebook, the comparison is meaningless, according to Senellart.

BLEU scores are typically taken from competitions (e.g., Conference on Machine Translation), which operate under constrained conditions with limited training data and a specific testing set and goal. Senellart pointed out that those BLEU scores only matter within the competition’s constrained conditions.
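To see why a BLEU score is tied to its test set, it helps to look at how the number is computed. The following is a simplified sketch, assuming a single reference per sentence, whitespace tokenization, and no smoothing. It is not the scoring tool used in any competition, and the function names and example sentences are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length `n` in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100) with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed: any n-gram order with no matches gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# The same system output scored against two hypothetical reference sets.
system_output = ["the cat sat on a mat"]
reference_a = ["the cat sat on the mat"]
reference_b = ["a cat was sitting on the mat"]
score_a = corpus_bleu(system_output, reference_a)
score_b = corpus_bleu(system_output, reference_b)
```

The output is identical in both cases, yet the two scores differ because the references differ. A BLEU figure quoted without its test set, references, and tokenization therefore says little about how two systems compare.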

Behind the scenes: Systran CEO Jean Senellart filming an interview with Slator Research Director Esther Bond.

Another myth he called out was vendors claiming that having their own technology (as opposed to open source) means they can better control every aspect of it.

“It’s not correct,” Senellart said plainly. “When you see that there have been, last year, 6,000 papers published on the topic, it’s better to use open source technology because you have so many people who have tested it, improved it, and put the latest state-of-the-art algorithm inside. And then you can still modify the code on your own because it’s open source.” He concluded there is no contradiction between open source and version control.

“In our field, nobody can claim that closed source code is better than open source code today,” he said.

Ticking All Boxes

On the productization of AI in language technology, Senellart said NMT is just one box in a long checklist. On the list: terminology integration, the capability to process structured documents, real-life deployment strategies, and compliance with regulatory standards such as the General Data Protection Regulation (GDPR).

Then there are also buy-side concerns, such as translation volume and scalability, cloud-based versus on-premise, and more. The carbon footprint of the NMT system is an important concern for Senellart as well.

Gaëlle Bou, Systran’s Sales & Marketing Director, discusses NMT in a panel discussion.

Finally, and most importantly, he said an essential aspect of any NMT product is the ability to control inevitable mistakes. This extends not only to accuracy and adequacy of output, but also to style transfer, where a vendor may want an NMT engine to adopt a certain style from their human translators.

Senellart said the industry needs to fill a very important gap. “The big issue I would point out is that we are missing standards,” he said. “We need to have some real benchmarks and be able to compare. If we are missing standards, you know we are not a mature industry. Standardization is an important next step we need to work on.”

SlatorCon London 2019 Presentation (Jean Senellart, Systran)
