Neural Machine Translation: Mainstream and Extremely Fast-Moving

Neural machine translation (NMT) is now mainstream. This was New York University Assistant Professor Kyunghyun Cho’s first message during his presentation on NMT at the recent SlatorCon New York on October 12, 2017.

When Cho’s team started looking into NMT in 2013 and 2014, he said previous MT researchers and industry insiders were convinced it would not work. Efforts in the 1980s and mid-1990s failed, after all.

Fast forward to 2017: Cho pointed out that big names like Google, Microsoft, and Facebook use NMT, and that various sites and even the European Patent Office have all caught the NMT bug.

“So it’s mainstream,” Cho concluded. He added, though, that research is ongoing, even as existing NMT systems outperform statistical models that have been in place, and steadily improved, for over ten years.

“Somehow Nobody has Tried It”

The key difference lies in how Cho and fellow researchers approached the problem. “So far a lot of the research on machine translation has been focused on sub-word level translation,” Cho said. “That is looking at a sentence as a sequence of sub-words.”

Cho and his co-researchers decided to go down to character-level modelling.
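To make the distinction concrete, here is a minimal Python sketch of the two ways of viewing a sentence. It is not code from Cho's team: the tiny sub-word vocabulary and the greedy segmentation routine are assumptions made purely for illustration, and real sub-word systems learn much larger vocabularies from data.

```python
# Illustrative only: two ways to turn a sentence into a sequence of symbols.
# TOY_VOCAB and the greedy segmenter are hypothetical, not from the talk.

def char_tokens(sentence: str) -> list[str]:
    """Character-level view: every character, including spaces, is a symbol."""
    return list(sentence)


def subword_tokens(sentence: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation of each word against a fixed vocabulary.

    Any stretch not covered by the vocabulary falls back to single characters.
    """
    tokens = []
    for word in sentence.split():
        i = 0
        while i < len(word):
            # Try the longest remaining piece first; a single character always matches.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab or j == i + 1:
                    tokens.append(word[i:j])
                    i = j
                    break
    return tokens


TOY_VOCAB = {"trans", "lat", "ion", "ing"}  # made-up sub-word units

print(subword_tokens("translation", TOY_VOCAB))  # correctly spelled word
print(subword_tokens("transaltion", TOY_VOCAB))  # misspelled word segments differently
print(char_tokens("transaltion"))                # character view needs no vocabulary
```

In this toy setup a single typo changes the sub-word segmentation, while the character-level view is defined for any input string without a segmentation step.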

“In 2016 we decided to try it out; somehow nobody has tried it,” he said. “When a new technology comes in, what everyone tries to do is use the new technology to mimic what you were able to do with the old technology. So everyone was stuck with morpheme-level or word-level modelling and then somehow forgot to try this new technology on new ways of representing a sentence, that is view it as a sequence of characters.” And the results were telling.

Record Breaking

“This model beats any single paired model you can think of,” Cho said, reporting that the NMT system performed on par with, and often better than, existing MT models when assessed through BLEU (bilingual evaluation understudy) scores or even human evaluation.

Cho also highlighted some other advantages to NMT aside from better quality, such as its robust handling of spelling mistakes and morphology. Another pleasant surprise was how the NMT system can translate into compound words that rarely appear in a training corpus the size of 100 million words.

One breakthrough in particular was quite promising: the NMT system can translate into a desired target language even without knowing the source language.

Cho’s team trained their NMT system to translate from German, Czech, Finnish, and Russian to English. They then tasked the system to translate any given sentence into English without providing a language identifier.

“The decoder doesn’t care which source language it was written in, it’s just going to translate into the target language,” Cho said. “Now, since our model is actually of the same size as before, we are saving around four times the parameters. Still, we get the same level or better performance.”

They took the experiment a step further and fed the system a sentence written in three different languages. The system did the translation without any external indication which part of the sentence was written in which language, proving the model automatically learns how to handle code-switching within a sentence.

Finally, Cho touched on low-resource languages. His team and other NMT research teams across the globe have found that as a system learns similarities shared across languages, it can apply what it learns from high-resource languages to low-resource ones and improve their translation.

The Future is “Extremely Fast-Moving”

Cho saved the cutting edge for last: non-parametric NMT. He says this system translates the way a human translator would: by leveraging translation memory (TM) as an on-the-fly training set.

This way, the NMT system acts like a translator and does not need an entire training corpus in its database, but instead accesses relevant TMs to translate. Cho commented that this system actually displays higher consistency in style and vocabulary choice.
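The retrieval step behind this idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TM entries are invented, the `similarity` and `retrieve` helpers are hypothetical names, and the character-based fuzzy matching from the standard library's `difflib` stands in for whatever search method the actual system uses. How the retrieved pairs are then fed to the neural model is not described here and is left out.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source sentence, stored translation) pairs.
TRANSLATION_MEMORY = [
    ("Press the power button to start the device.",
     "Drücken Sie die Ein/Aus-Taste, um das Gerät zu starten."),
    ("Remove the battery before cleaning.",
     "Entfernen Sie den Akku vor der Reinigung."),
    ("The warranty is valid for two years.",
     "Die Garantie gilt zwei Jahre."),
]


def similarity(a: str, b: str) -> float:
    """Fuzzy-match score between 0 and 1, based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve(source: str, memory: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k TM entries whose source side is most similar to `source`."""
    ranked = sorted(memory, key=lambda pair: similarity(source, pair[0]), reverse=True)
    return ranked[:k]


# The closest stored pairs would be handed to the translation model as context.
for src, tgt in retrieve("Press the power button to stop the device.", TRANSLATION_MEMORY, k=1):
    print(src, "->", tgt)
```

Because the model would see how a near-identical sentence was translated before, reusing the same terms and phrasing becomes easier, which fits the consistency benefit Cho described.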

Cho closed his presentation on state-of-the-art NMT by outlining future directions for NMT research.

First, low-resource language translation is a priority. Second, he said a body of work already exists on zero-resource translation. The third and last direction is better handling of Chinese, Japanese, and Korean translation.

Later on in the panel session, Cho fielded a question about the biggest challenge in NMT.

He said hundreds of people have been working on MT for over 30 years, and research on NMT has been going on for about three years. “It’s only the apparent disruption you see,” Cho said, explaining that it will be hard to tell what kind of disruption will result from incremental advances in research.

“Even if I can tell you the challenges that I’m working on at the moment, that probably won’t tell you or anybody how the next disruption is going to happen,” he said.

Wondering how fast these breakthroughs make it to market, May Habib, CEO of Qordoba, asked after the presentation how long it takes to get from research breakthrough to deployment in the field.

Cho pointed out that they published their first paper on NMT in 2015, and the first big commercial announcement regarding application was from Google Translate in September 2016. He added that though Google did not disclose details of their deployment, Facebook still managed to launch their own NMT system a year later.

“It’s an extremely fast-moving field in that every time there is some change, we see the improvement,” Cho said. “So you gotta stay alert.”
