What’s so Massive About Google’s Massively Multilingual Neural Machine Translation?

Google’s AI team recently unveiled a new research paper on neural machine translation that has been five years in the making. The research paper, entitled “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges,” was published on July 11, 2019.

It was authored by a group of researchers on the Google AI Team: Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, and Yonghui Wu.

The concept of massively multilingual neural machine translation (NMT) is not new, and the paper builds on existing research, such as work from Carnegie Mellon University and a separate paper authored by researchers at Bar-Ilan University and Google AI. Multilingual NMT systems differ from other state-of-the-art systems in that they use a single model for all languages, rather than a separate model for each language pair.

The paper identifies one obvious benefit of multilingual models as “dramatically reducing the training and serving cost and significantly simplifying deployment in production systems.”

Graham Neubig, Assistant Professor at Carnegie Mellon’s Language Technology Institute in the School of Computer Science, told Slator that “the advantages of these systems are twofold: (1) they can improve accuracy by learning from many different languages, and (2) they can reduce the computational footprint of deploying models by having only one model trained to translate many languages, as opposed to one model per language.”

How Massive Is Massive?

As part of their research, Google set out to investigate “building a universal neural machine translation (NMT) system capable of translating between any language pair.” The model is “a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples.”
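To put the deployment argument in numbers, here is a back-of-the-envelope sketch. It assumes a conventional setup would need one model per translation direction; that assumption, and the function names, are illustrative and do not come from the paper. Only the figure of 103 languages is taken from the text above.

```python
def directed_pairs(num_languages: int) -> int:
    """Count ordered (source, target) pairs among the given number of languages."""
    return num_languages * (num_languages - 1)


def pivot_pairs(num_languages: int) -> int:
    """Count pairs that translate into or out of a single pivot language."""
    return 2 * (num_languages - 1)


NUM_LANGUAGES = 103  # figure quoted from the paper

# Assumption: one dedicated model per translation direction.
print("Models for every direction:", directed_pairs(NUM_LANGUAGES))
print("Models to and from one pivot language:", pivot_pairs(NUM_LANGUAGES))
print("Multilingual models needed:", 1)
```

Under that assumption, full coverage of 103 languages would take 10,506 separate models, and even a setup that only translates to and from one pivot language would take 204, against a single multilingual model.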

A point of note in Google’s research is the scale of the model, which, according to the paper, “is the largest multilingual NMT system to date, in terms of the amount of training data and number of languages considered at the same time.”

The term “in the wild” in the paper’s title comes from the fact that the training data is realistic; the researchers used an “in-house corpus generated by crawling and extracting parallel sentences from the web,” which spans a vast range of domains.

According to Neubig, “the scale of the data is 25 billion sentences, which is several orders of magnitude larger than previous multilingual models.”

“It is also a realistic reflection of the ‘actual data’ that is available on the web, so any limitations of the translation results achieved therein are not ones that can simply be solved by adding more data, and instead are ones that will require serious research work to solve,” he added.

The (Transfer-Interference) Trade-Off

Because a single model serves so many languages, the Google researchers also examined how the multilingual model affects low-resource languages as well as high-resource ones.

As a result of joint training, the model improves performance on languages with very little training data thanks to a process called “positive transfer.” However, the model also “results in performance degradation on high-resource languages due to interference and constrained capacity,” the paper said. There is, therefore, a trade-off between transfer and interference, the researchers discovered.

John Tinsley, CEO and Co-founder of Iconic Translation Machines, explained this phenomenon: “What they’re seeing is that the more multilingual they try to make the engine, i.e., the more languages they add, the quicker the quality drops, particularly for the high-resource languages, which would already have strong baselines,” he told Slator.

Commenting on the likely short-to-medium-term practical impact of the transfer-interference trade-off, Tinsley said that “developers might consider keeping their strong baselines for medium- or high-resource languages, but then having a single multilingual engine as a catchall for the lower-resourced ones.”
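The hybrid setup Tinsley describes could be sketched as a simple routing rule: use a dedicated engine where a strong baseline exists, and fall back to the multilingual engine otherwise. The class, engine names, and language codes below are hypothetical and serve only to illustrate the idea; they do not describe any vendor's actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class EngineRouter:
    """Pick a translation engine for a language pair (hypothetical sketch)."""

    # Dedicated engines for pairs that already have strong baselines.
    dedicated: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # Single multilingual engine used as the catchall for everything else.
    catchall: str = "multilingual-catchall"

    def pick(self, source: str, target: str) -> str:
        """Return the dedicated engine for the pair, or the catchall if none exists."""
        return self.dedicated.get((source, target), self.catchall)


router = EngineRouter(
    dedicated={
        ("en", "fr"): "en-fr-baseline",
        ("en", "de"): "en-de-baseline",
    }
)

print(router.pick("en", "fr"))  # pair with a dedicated baseline
print(router.pick("sw", "eu"))  # pair without one, served by the catchall
```

Which pairs count as medium- or high-resource, and therefore keep a dedicated engine, is a judgment call left to the developer in this sketch.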

What Are the Implications?

The Google AI team recognizes that although they have “achieved a milestone […], we still have a long way to go towards truly universal machine translation.”

It is likely that Google and others will continue working on multilingual NMT since it “is one of the largest multi-task problems being studied in academia or industry,” the paper stated. Moreover, “many promising solutions appear to be interdisciplinary, making multilingual NMT a plausible general test bed for other machine learning practitioners and theoreticians.”

Iconic’s Tinsley summarized the paper’s findings thus: “It’s taken a number of years to get to this point and I believe this is a line of research in MT that will continue to grow strongly, particularly on the academic side and within the likes of Google whose ultimate goal is that of the universal, one-size-fits-all solution.”

Slator also spoke to Adam Bittlingmayer, ex-Google Translate engineer and Founder of ModelFront (a machine translation risk-prediction startup), about his take on the Google paper and on multilingual models in general. Bittlingmayer said that “it’s the future and has to happen at some point. The idea was always there, it’s just a huge effort to execute.”

Bittlingmayer continued, “It’s an open question how well the pairs will perform if they have zero data (e.g., Swahili-Basque). The key is for transfer learning, which is to use data from the big pairs to improve quality for the small pairs.”

On the significance of Google’s multilingual NMT paper, he added that “it shows that top MT providers are strongly interested in it; in my opinion, because it would radically reduce their engineering work. So even if quality stays the same or gets 1% worse, they would go forward with it.”

According to Bittlingmayer, “This is a small step closer to how multilingual humans learn to translate. [But] even if it works, no, AI will not take our jobs and eat us.”