“Nearly Indistinguishable From Human Translation”—Google Claims Breakthrough


The Google team working on neural machine translation (NMT) has finally revealed what it has been working on. In a paper published on September 26, 2016, entitled “Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation,” the researchers present the results of the search engine giant’s all-out effort to maintain its lead in machine translation.

The claims are bold. The researchers not only report that Google’s NMT (GNMT) beats Google’s existing production phrase-based statistical translation system (aka Google Translate) by a wide margin on the widely used BLEU metric, they also claim that “in some cases human and GNMT translations are nearly indistinguishable” and that “our system’s translation quality approaches or surpasses all currently published results.”

The researchers set out to address what, up until now, have been considered NMT’s most apparent weaknesses. First, training is computationally expensive—especially in cases with large data sets. Second, NMT struggles with the translation of rare words. And third, NMT sometimes fails to translate all parts of the input sentence, which according to the researchers can result in “surprising translations.”

“The quality of the resulting translation system gets closer to that of average human translators”—Google Neural Translation Team

These obstacles have so far kept NMT out of production environments, and production deployment is key to making NMT available through a free service like today’s Google Translate.

The paper says human evaluations show that GNMT “has reduced translation errors by 60% compared to our previous phrase-based system on English↔French, English↔Spanish, and English↔Chinese.” Apparently, the system is not constrained too much by what language it translates and “performs well on a range of datasets across many pairs of languages without the need for language-specific adjustments.”


NMT has also been known to take considerable time to produce translations, even when run on graphics processing units (GPUs), which is another impediment to deploying it in a live production environment such as Google Translate.

The researchers addressed this by running their model on a Google Tensor Processing Unit, which resulted in a processing time that was three times faster than on a CPU, and nearly eight times faster than on a GPU. Conventional wisdom to date was that NMT runs faster on GPUs than CPUs.
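Taken at face value, the two figures imply that in this experiment the CPU was faster than the GPU, which runs against that conventional wisdom. The back-of-the-envelope check below is only a sketch. It assumes that “N times faster” means the TPU needs 1/N of the time, uses the article’s rounded figures rather than measured timings, and the function name is illustrative and not taken from the paper.

```python
# Relative speeds implied by the article's rounded figures.
# Assumption: "N times faster" means the TPU needs 1/N of the time.
TPU_VS_CPU = 3.0  # TPU is about three times faster than the CPU
TPU_VS_GPU = 8.0  # TPU is nearly eight times faster than the GPU


def implied_cpu_vs_gpu(tpu_vs_cpu: float, tpu_vs_gpu: float) -> float:
    """Return how many times faster the CPU is than the GPU.

    If the CPU takes tpu_vs_cpu * t and the GPU takes tpu_vs_gpu * t for
    the same job (t being the TPU time), the CPU/GPU ratio is their quotient.
    """
    return tpu_vs_gpu / tpu_vs_cpu


ratio = implied_cpu_vs_gpu(TPU_VS_CPU, TPU_VS_GPU)
print(f"CPU is roughly {ratio:.1f}x faster than the GPU in this setup")
```

A ratio above 1 means the GPU was the slowest of the three devices in this particular setup. That says nothing about NMT on GPUs in general.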

Google assembled a diverse group of researchers for the project. In the lead were Yonghui Wu and Mike Schuster. Wu, a Senior Staff Software Developer, has been with Google since 2008 and worked on Google’s machine learning search algorithm RankBrain. Schuster, a Research Scientist who has been with Google since 2006, was previously with speech recognition software company Nuance and Japanese telco NTT Japan.

Also in the group were Zhifeng Chen, a Software Engineer who worked on TensorFlow; Quoc V. Le, Software Engineer, who has been with the company since 2011 and also works on Google Brain; and Mohammad Norouzi, a Google PhD fellow, who has been a research intern for tech giants Microsoft and Google.

The rest of the team includes, among others, Wolfgang Macherey, who joined Google in 2006, worked in the machine translation group with Franz Och, and has been working on natural language processing since 1996.

“We observed that human raters, even though fluent in both languages, do not necessarily fully understand each randomly sampled sentence sufficiently”—Google Neural Translation Team

Back to the claim that GNMT now almost matches human translation. The GNMT team bases that claim on evaluation data consisting of “500 randomly sampled sentences from Wikipedia and news websites, and the corresponding human translations to the target language.”

Thankfully, the researchers do acknowledge that they are dealing with human language and not mathematical certainty. The caveat they raise, however, concerns the judgment of the human raters: “We observed that human raters, even though fluent in both languages, do not necessarily fully understand each randomly sampled sentence sufficiently and hence cannot necessarily generate the best possible translation or rate a given translation accurately.”

What’s next? Google will now start testing the system on particularly difficult translation cases and inputs longer than just a single sentence.

Editor’s Note: A previous version of this article read that Wolfgang Macherey “works” in an MT group with Franz Och. The correct tense is “worked” as Och is no longer with Google.