May 2018 Marked (yet) Another Record Month for Neural Machine Translation

Neural machine translation (NMT) research continues to accelerate, with May 2018 eclipsing April 2018 as the busiest month on record in terms of research output.

For the month of May, research published on the Arxiv platform that mentioned NMT in the title or abstract reached a record 55 papers compared to April’s 44.

37 of those 55 papers were strictly about NMT. Furthermore, most of the research not strictly about NMT experimented with NMT as a subdomain of deep learning or natural language processing (NLP).

“I believe that NMT is the gateway for all the NLP Technologies,” said Systran Global CTO Jean Senellart, speaking at SlatorCon London 2018. At the very least, the cross-experimentation reflects how much of a hot research topic NMT is.

How much of this research will impact production supply chains? Time will tell. For now, a few curious things in the research stand out:

Familiar Names

The research papers were co-authored by experts from many familiar companies. The usual players like Google, Microsoft, and Amazon were of course present. In fact, both Google and Microsoft researchers were also involved in research not directly relating to NMT, i.e. in other adjacent areas that incorporated NMT either as a basis or for experimentation.

Systran and recent NMT partner Ubiqus showed up on the list due to the submission of a paper about their joint venture with Harvard: OpenNMT.

NVIDIA also featured on the list with a paper on OpenSeq2Seq, its very own neural toolkit that “provides building blocks for training encoder-decoder models for neural machine translation and automatic speech recognition.”

China was well represented, with e-commerce giant Alibaba and internet company Tencent both publishing papers in May. Sogou Inc, which invested millions of dollars in training data and whose Sogou Knowing NMT was a top performer at the 2017 Conference on Machine Translation (WMT), also made an appearance. The paper Sogou was involved in was not strictly about NMT, however; it looked at improving automatic typographical error correction for the Chinese pinyin input method using NMT.

Russian search company Yandex was involved in a paper about giving NMT engines more context so they can better handle gendered pronouns and anaphora for Russian-English.

Meanwhile, SDL published two papers in the month of May, both of which aimed to improve NMT output through constrained decoding and incorporating target-side semantic syntax.

Race to Better Quality

Indeed, the research topics for May reflect the race to better quality. Of the 37 papers strictly about NMT, 19 were about improving output via:

  • Experiments specifically aiming for better accuracy or adequacy – Researchers used everything from new methods of training such as reinforcement learning to additional input such as using relation models or integrating grammatical understanding.
  • Figuring out the inner workings of encoder-decoder NMT models – Researchers played around with hyper-parameters, learned more about noise in training data, and even replaced an encoder layer to better understand their systems and apply the lessons learned towards gaining better output.
  • Improving the translation process – A couple of papers sought to improve efficiency or robustness of NMT engines to reduce the resulting impact of small issues during the training phase.
  • More document-level context – NMT has beaten predecessor tech in terms of fluency and has improved incrementally over the past couple of years of research, yet NMT engines still basically translate one sentence at a time without considering the context of the entire document or content. In May, a couple of papers focused on how NMT can get around this limitation.

Low-Resource, Non-Latin Focus

As NMT continues to gain steam, languages written in non-Latin scripts, such as Chinese, Japanese, and Korean (CJK), and low-resource languages (those with very few sources of parallel corpora) are a hot new research direction.

Many experts Slator talked to for the Neural Machine Translation Report 2018 predicted that low-resource languages will be a priority for researchers, and the forecast is proving accurate.

Six papers were focused on CJK NMT specifically. Among them was a paper on automatic word preordering for English-to-Japanese NMT. Another paper on Japanese NMT focused instead on dealing with complex verb conjugation. Two papers from the same authors explored the integration of Chinese and Japanese radicals into the NMT process to see if it would improve output.

Meanwhile, the World Intellectual Property Organization (WIPO) recently announced that South Korea will be adopting WIPO Translate, its patent NMT portal. So it seems NMT is starting to resonate with the CJK side of the industry and academia.

Aside from these papers, another five of the 37 NMT research papers in May focused on the challenge of low-resource languages. Some did this by creating synthetic parallel corpora through back-translating monolingual training data. Others used techniques similar to pivoting, where another language is inserted between the source language and the low-resource target language, preferably one that can bridge the gap in training data.
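To make the two ideas in the paragraph above concrete, here is a minimal sketch in Python. It is not taken from any of the papers: the function names (`back_translate`, `pivot_translate`, `make_toy_translator`) and the tiny word-for-word lexicons are hypothetical stand-ins for trained NMT engines, and the pivot function shows only plain pivoting rather than the pivot-like variants the papers explored.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A "translator" here is any function from a sentence to a sentence.
# In practice this would be a trained NMT engine (assumption for illustration).
Translator = Callable[[str], str]


def back_translate(
    target_monolingual: Iterable[str],
    target_to_source: Translator,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-language text.

    Each genuine target sentence is translated back into the source
    language, and the machine-made source is paired with the human-made
    target so the pair can be used as extra training data.
    """
    pairs = []
    for target_sentence in target_monolingual:
        synthetic_source = target_to_source(target_sentence)
        pairs.append((synthetic_source, target_sentence))
    return pairs


def pivot_translate(
    sentence: str,
    source_to_pivot: Translator,
    pivot_to_target: Translator,
) -> str:
    """Translate via an intermediate (pivot) language."""
    return pivot_to_target(source_to_pivot(sentence))


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical word-for-word stand-in for a real engine."""
    def translate(sentence: str) -> str:
        return " ".join(lexicon.get(word, word) for word in sentence.split())
    return translate


# Toy engines: German is the target language, English the source, French the pivot.
de_to_en = make_toy_translator({"hallo": "hello", "welt": "world"})
en_to_fr = make_toy_translator({"hello": "bonjour", "world": "monde"})
fr_to_de = make_toy_translator({"bonjour": "hallo", "monde": "welt"})

synthetic_corpus = back_translate(["hallo welt"], de_to_en)
pivoted = pivot_translate("hello world", en_to_fr, fr_to_de)
```

In the back-translation case the target side of every pair is genuine text and only the source side is machine-made, which is why the synthetic pairs can still teach the engine to produce fluent target-language output.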

An interesting approach was outlined in a paper where the researchers did not focus on creating synthetic parallel training data but instead approached the problem through multi-task learning. “We scaffold the machine translation task on auxiliary tasks including semantic parsing, syntactic parsing, and named-entity recognition,” the paper’s abstract read. “This effectively injects semantic and/or syntactic knowledge into the translation model.”

Finally, Google seems to have an upgrade to its zero-shot translation. Google’s researchers combined zero-shot translation with dual learning to come up with dual zero-shot NMT, a system that they claim comes within 2.2 BLEU points of supervised NMT.
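For readers unfamiliar with the metric, a gap stated in BLEU points is a difference on a 0 to 100 scale of n-gram overlap between engine output and reference translations. The sketch below is a simplified illustration, not the scoring setup used in the paper: it assumes whitespace tokenization, a single reference per sentence, and no smoothing, and the function names `ngram_counts` and `corpus_bleu` are chosen here for illustration. Published scores are normally computed with standard evaluation tools and tokenization, so numbers from this sketch are not comparable to them.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU on a 0-100 scale (one reference each)."""
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams produced by the engine, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clip each n-gram's count by how often it occurs in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches drives the score to zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


reference = ["the cat sat on the mat"]
perfect = corpus_bleu(["the cat sat on the mat"], reference)
partial = corpus_bleu(["the cat sat on the mat today"], reference)
gap_in_bleu_points = perfect - partial
```

A claim such as "within 2.2 BLEU points" corresponds to `gap_in_bleu_points` being at most 2.2 when both systems are scored on the same test set.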

Second Workshop on Neural Machine Translation and Generation

During Slator’s research into the papers published on Arxiv for May 2018, we also came upon the Findings of the Second Workshop on Neural Machine Translation and Generation published on June 12, 2018.

The workshop was held alongside the annual conference of the Association for Computational Linguistics (ACL 2018) and called for research papers to “synthesize the current state of knowledge in NMT and generation” and to “expand the research horizons in NMT.” The summary of the contributions accepted by the workshop also reflected trends towards improving NMT output and tackling low-resource languages.

The Workshop found that the contributions’ subject matter centered on five core topics, which are broadly aligned with Slator’s review of research papers submitted to Arxiv in May 2018. The Workshop’s categories for “linguistic structure” and “domain adaptation” covered research on incorporating linguistic structure in NMT systems and adapting engines to specific domains to improve output. Meanwhile, papers that fell under the categories of “data augmentation” and “inadequate resources” were broadly aligned with Arxiv research on unsupervised training and low-resource languages.

Another interesting part of the Workshop summary is the results of its shared translation task. Four teams created NMT engines trained with data from the WMT 2014 English to German task and competed in terms of accuracy (measured via BLEU and NIST scores) and computational and memory efficiency (measured via model loading time on systems run on CPU, and model size and number of parameters on systems run on GPU). Measuring both accuracy and efficiency is meant to reflect the reality of NMT deployment in production environments.

The four teams that participated were Team Amun, Team Marian, Team OpenNMT, and Team NICT. All NMT engines reportedly beat the Workshop’s baselines in terms of speed and accuracy. Team Marian tended to perform best overall, according to the findings.

They also found that recurrent neural net (RNN) NMT systems were faster but less accurate on GPU while self-attentional NMT models were more effective on CPU. Another noteworthy result is how little memory one NMT model, OpenNMT Tiny, used throughout the shared task: just 220 MB.

Meanwhile, in production settings, Google recently announced that it is bringing its NMT technology offline to the Google Translate app for Android at a memory cost of only 30 to 40 MB per language, available for 59 languages. Free, albeit generic, NMT can already fit into smartphones.
