Cornell University’s automated online distribution system for research papers, Arxiv.org, is a prolific source for anyone interested in staying up to date on progress in neural machine translation (NMT). It has been almost a year since we first wrote about the dramatic acceleration of academic NMT research, as reflected in the number of papers submitted to Arxiv, and the upward trend continues.
To understand where current research is heading, we reviewed NMT-related papers in the repository covering the first six weeks of 2018 as well as the last two months of the previous year. From November 1, 2017 to February 14, 2018, there were 58 relevant papers. Twelve of those were not directly about NMT, focusing instead on machine learning via neural networks in general or on adjacent technology such as natural language processing.
That leaves 46 research papers on NMT submitted to Arxiv over the 105 days ending in the middle of last week, or a new paper roughly every 2.3 days. No wonder we bumped into one nearly every other day.
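For readers who want to check the arithmetic, the figures above work out as follows:

```python
total_papers = 58   # NMT-related Arxiv submissions, Nov 1, 2017 to Feb 14, 2018
off_topic = 12      # general neural ML or adjacent NLP rather than NMT proper
days = 105          # length of the review window

nmt_papers = total_papers - off_topic
print(nmt_papers, round(days / nmt_papers, 1))   # 46 papers, ~2.3 days per paper
```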
Upon closer examination, patterns emerged in the research directions these NMT papers pursue. After a cursory reading of their contents, Slator grouped the papers by intent rather than by results. After all, nearly every research direction aims at the same final result: improvements in NMT models and output overall.
Disclaimer: Slator is not the ultimate authority on academic research and categorization, and these categories are meant to show the general direction researchers are taking.
Improving NMT Output
The most obvious next step for NMT is also the most researched topic. Eight of the 46 research papers recently published on Arxiv deal with improving NMT output in one way or another.
There is research applying facets of predecessor phrase-based MT approaches to current NMT models, experiments that use syntax-based weights to change what the decoder’s attention mechanism focuses on locally, and even methods to help NMT models handle the more creative aspects of translation, such as idioms.
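To give a flavor of the attention-reweighting idea, here is a toy sketch in which a syntax-derived bias is added to the decoder’s attention scores before the softmax. The numbers and the simple additive formulation are assumptions for illustration, not the formulation used in any particular paper.

```python
import numpy as np

# Toy illustration of biasing attention with syntax-derived weights.
# Scores and bias values are invented; published approaches derive the bias
# from parse structure and build it into the model rather than bolting it on.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

content_scores = np.array([2.0, 0.5, 0.1, 1.2])   # raw attention scores over 4 source words
syntax_bias    = np.array([0.0, 1.5, 0.0, 0.0])   # e.g. boost a syntactically linked word

plain_attention  = softmax(content_scores)
biased_attention = softmax(content_scores + syntax_bias)
print(plain_attention.round(2), biased_attention.round(2))
```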
Indeed, there were two papers on idiom translation among the 46 published. One used a blacklist of literal translations of idiomatic expressions to identify literal translation errors in a test set. Another added idiomatic expressions to the training data and annotated them for identification.
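The blacklist idea lends itself to a quick illustration. The sketch below, with invented idiom entries and hypothetical German renderings, simply flags MT output that contains a known word-for-word translation of a source idiom; the paper’s actual method and data are more involved, so treat this purely as an illustration of the concept.

```python
# Illustrative only: flag MT output containing known literal renderings of idioms.
# The idiom list and translations here are invented for the example.
LITERAL_IDIOM_BLACKLIST = {
    "kick the bucket": "den Eimer treten",        # literal (wrong) German rendering
    "spill the beans": "die Bohnen verschütten",
}

def flag_literal_idioms(source: str, mt_output: str) -> list[str]:
    """Return source idioms whose literal translation shows up in the MT output."""
    hits = []
    for idiom, literal in LITERAL_IDIOM_BLACKLIST.items():
        if idiom in source.lower() and literal.lower() in mt_output.lower():
            hits.append(idiom)
    return hits

print(flag_literal_idioms("He will kick the bucket soon.",
                          "Er wird bald den Eimer treten."))   # ['kick the bucket']
```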
Addressing Training Data Constraints
NMT models have been described as “data hungry,” and the higher the quality of the data and the more in-domain corpora there are, the better the system will become.
Seven of the 46 recently published research papers delved into training data constraints, trying either to figure out why NMT models require specific data or to address known limitations such as the scarcity of data for low-resource languages.
Research has been done on training NMT models with only partially aligned corpora, on understanding how synthetic and natural noise in the training data breaks down NMT output fluency, and, of course, on the most challenging and pressing concern: NMT for low-resource languages. One example is a paper discussing a forest-to-sequence model that improves translation accuracy for low-resource languages by adding syntactic information to the training data. Another focuses on using external dictionaries of out-of-vocabulary words to augment training data.
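To make the dictionary idea concrete, here is a minimal sketch under the assumption that one simple use of such a dictionary is to generate synthetic sentence pairs from templates and append them to the parallel corpus; the entries, templates, and helper names are invented and not taken from the paper.

```python
# Minimal sketch: turn an external bilingual dictionary of rare / out-of-vocabulary
# terms into synthetic sentence pairs and append them to the parallel training data.
oov_dictionary = {
    "photogrammetry": "Photogrammetrie",
    "bioluminescence": "Biolumineszenz",
}

templates = [("This is about {src}.", "Hier geht es um {tgt}.")]

def synthetic_pairs(dictionary, templates):
    pairs = []
    for src_term, tgt_term in dictionary.items():
        for src_tpl, tgt_tpl in templates:
            pairs.append((src_tpl.format(src=src_term), tgt_tpl.format(tgt=tgt_term)))
    return pairs

training_corpus = [("Hello world.", "Hallo Welt.")]     # stand-in for the real corpus
training_corpus.extend(synthetic_pairs(oov_dictionary, templates))
print(len(training_corpus))                             # original pair plus two synthetic ones
```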
New or Improved NMT Models
Recurrent neural nets, convolutional neural nets, and self-attentional transformers are the dominant types of deep learning models NMT systems use today. That does not mean researchers will stop looking for new or improved models.
Indeed, seven research papers focused on just that. One such model is Salesforce’s weighted self-attentional transformer, which the company claims increases processing speed by 10x. Another is Amazon’s Sockeye, which the Amazon research team pitted against other models at the end of last year.
Other research focused on Variational Recurrent Neural Machine Translation and asynchronous bidirectional decoding.
Document-Level Context
Research on infusing document-level context into NMT is also a hot direction, with six papers focused on the task.
Since NMT translates sentence by sentence, it cannot use context outside the source sentence. In short, it cannot translate an entire document with the same level of fluency and adequacy with which it translates the individual sentences within it.
Some of the methods researchers focused on include:
- Stream decoding, a constant stream of pre-existing context from previously translated sentences
- External memories used in conjunction with NMT models
- Using caches to act like “translation history” or as additional reference points (a toy sketch of this idea follows the list)
- Applying adaptive control of the NMT model’s attention mechanism based on decoding history
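As a rough illustration of the cache idea flagged above, the toy sketch below carries a short “translation history” across the sentences of a document. Here translate_sentence is a hypothetical stand-in for an NMT system that can accept extra context; real document-level approaches condition the model itself on that history rather than wrapping it this way.

```python
# Toy sketch of a "translation history" cache carried across sentences in a document.
from collections import deque

def translate_sentence(sentence: str, context: list[str]) -> str:
    # Placeholder: a real system would condition the decoder on `context`.
    return f"<translation of '{sentence}' given {len(context)} context sentences>"

def translate_document(sentences, history_size=3):
    cache = deque(maxlen=history_size)   # keeps only the most recent translations
    output = []
    for sentence in sentences:
        translated = translate_sentence(sentence, list(cache))
        cache.append(translated)         # becomes history for later sentences
        output.append(translated)
    return output

print(translate_document(["First sentence.", "Second sentence.", "Third sentence."]))
```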
Post-Editing and Model Learning
Another six research papers were concerned with post-editing, online and offline model learning, and human evaluation.
One of these papers is Facebook’s work on NMT post-editing through “very simple interactions.” There was also discussion of offline logging of data for online NMT models and of on-the-fly, online machine learning.
Also, a couple of papers focused on human evaluation, specifically a paper on the “first user study on online adaptation of NMT to user post-edits” and a paper presenting “a quantitative fine-grained manual evaluation approach to comparing the performance of different MT systems.”
Other Directions of Research
Aside from the above, five papers were dedicated to improving various facets of the NMT encoding and decoding process. These usually revolved around increasing speed or efficiency, or reducing power consumption or resource requirements.
Four research papers were concerned with figuring out various aspects of the inner workings of NMT models. Three other papers dealt with miscellaneous topics, such as a paper on privacy that proposes a way to preserve the meaning of sentences translated or analyzed without giving away any sensitive information about the subject.
Of course, most language industry practitioners don’t need to go down the rabbit hole of reviewing individual research papers. They just use any of the publicly available NMT portals or NMT plug-ins in their productivity tools and quickly get a sense for how the technology is progressing. However, it’s still worth keeping an eye on what is happening in academia. After all, the technology that is currently reshaping the industry also began life as an innocuous research paper.