Inside Google’s Custom Neural Machine Translation—AutoML Translate

One of the more significant product announcements in 2018 in neural machine translation (NMT) was Google’s launch of AutoML Translate, a cloud-based service that lets users train Google’s NMT engines with their in-domain data.

To better understand Google’s cloud offering and get a sense of where things are going with AutoML, Slator hosted Francesco Bombassei, Senior Technical Program Manager at Google, at SlatorCon Zurich 2018.

Bombassei is part of Google Cloud’s Professional Services team, which advises clients on their cloud technology deployments. Based at Google’s Zurich office, he opened his presentation with the challenge of securing talent. As machine learning is an emerging area that combines complex math, code, and a very large quantity of data, “the implication of this is that there is very little talent in the market that can deal with that, at least at the moment,” he said.

He said they estimated the current number of deep learning scientists to be in the thousands, not tens of thousands. This means talent is “very difficult to recruit, very difficult to retain, and by the way, you’re competing with us,” he said. “We’re trying to hire all the data scientists we can find.”

Since the technology is still emerging, it also presents the technical challenge of tools being “crude” or even “esoteric,” making them hard to learn, maintain, and develop in-house. This is where AutoML comes in.

AutoML Behind the Scenes

In a nutshell, AutoML is “a way to create custom models without you having to actually write the code,” according to Bombassei. With AutoML Translate, for instance, users can train Google’s generic NMT engines with in-domain data to customize them for their use cases.

“The more fitting the translations, the better the model will work,” he said. “And of course that’s intended to be your domain-specific vocabulary or your particular niche terminology.”

The user provides the data through a graphical interface; no coding or command line is required. After the data is uploaded to the cloud, training takes “approximately three hours, depending on the complexity,” and the trained model can then perform custom translation. “And that translation will actually be using your terms, your terminology, your dictionary,” Bombassei said.
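Although the upload itself needs no code, the in-domain data usually has to be cleaned first. The sketch below is a minimal illustration of that preparation step: it normalizes whitespace, drops empty and duplicate segments, and writes one tab-separated source/target pair per line. The sentence pairs, the function names, and the choice of a tab-separated layout are all assumptions made for illustration; the formats the service actually accepts should be checked against Google's documentation.

```python
# Hypothetical in-domain sentence pairs (source, target); not taken from the talk.
raw_pairs = [
    ("Replace the filter cartridge.", "Ersetzen Sie die Filterpatrone."),
    ("Tighten  the screws.\t", "Ziehen Sie die Schrauben an."),
    ("   ", "Ziehen Sie die Schrauben an."),                               # empty source
    ("Replace the filter cartridge.", "Ersetzen Sie die Filterpatrone."),  # duplicate
]


def normalize(text):
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return " ".join(text.split())


def clean_pairs(pairs):
    """Drop empty segments and exact duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        pair = (normalize(source), normalize(target))
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        cleaned.append(pair)
    return cleaned


def to_tsv(pairs):
    """Serialize pairs as one tab-separated source/target pair per line."""
    return "".join(f"{source}\t{target}\n" for source, target in pairs)


cleaned = clean_pairs(raw_pairs)
tsv_text = to_tsv(cleaned)
print(tsv_text, end="")
```

Normalizing whitespace before serializing matters here because a stray tab inside a segment would otherwise be read as a column separator.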


He explained that, under the hood, AutoML relies on transfer learning and neural architecture search. Transfer learning uses an already-trained machine learning model as the starting point for training another.

“The analogy is if you already know how to go on a bicycle, it’s gonna be much easier for you to go on a motorcycle because your brain can transfer some of the learning to the new thing,” Bombassei said. Meanwhile, he said neural architecture search “essentially [uses] machine learning to find the best machine learning model to solve the problem.”
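The transfer learning idea can be shown with a deliberately tiny example. The sketch below is not how AutoML Translate works internally; it assumes a toy one-variable linear model, made-up "generic" and "in-domain" datasets, and plain gradient descent. It compares a model that starts from pretrained parameters with one that starts from zero, each given the same small training budget on the in-domain data.

```python
def mse(params, data):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, steps, lr=0.1):
    """Plain gradient descent on the mean squared error."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


# Toy data (assumption): a large "generic" task and a small, related "in-domain" task.
generic_data = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(-10, 11)]
domain_data = [(x, 2.2 * x + 0.8) for x in (-0.5, 0.0, 0.5, 1.0)]

# Pretrain on the generic task, then fine-tune briefly on the in-domain data.
pretrained = train((0.0, 0.0), generic_data, steps=500)
fine_tuned = train(pretrained, domain_data, steps=20)

# Baseline: same small budget on the in-domain data, but starting from zero.
from_scratch = train((0.0, 0.0), domain_data, steps=20)

fine_tuned_loss = mse(fine_tuned, domain_data)
scratch_loss = mse(from_scratch, domain_data)
print("fine-tuned loss:", fine_tuned_loss)
print("from-scratch loss:", scratch_loss)
```

Because the pretrained parameters already sit close to the in-domain solution, the fine-tuned model ends with a lower error than the from-scratch model after the same 20 steps, which is the bicycle-to-motorcycle effect in miniature.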

He added that AutoML is the first commercial implementation of these two elements. Naturally, a third element is the training data, which the user provides.
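Neural architecture search, as described above, automates the choice of model structure. The sketch below is a heavily simplified stand-in: instead of a learned search strategy over neural networks, it assumes a search space of just three candidate "architectures" (polynomial degrees 1 to 3), tries each one exhaustively, and keeps the one with the lowest validation error. The data, the search space, and all function names are hypothetical.

```python
def features(x, degree):
    """Polynomial features [1, x, ..., x**degree]; the degree plays the role of the architecture."""
    return [x ** k for k in range(degree + 1)]


def predict(coeffs, x):
    return sum(c * f for c, f in zip(coeffs, features(x, len(coeffs) - 1)))


def loss(coeffs, data):
    """Mean squared error on (x, y) pairs."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in data) / len(data)


def fit(degree, data, steps=1000, lr=0.1):
    """Train one candidate with gradient descent, starting from zero coefficients."""
    coeffs = [0.0] * (degree + 1)
    n = len(data)
    for _ in range(steps):
        grads = [0.0] * (degree + 1)
        for x, y in data:
            error = predict(coeffs, x) - y
            for k, f in enumerate(features(x, degree)):
                grads[k] += 2 * error * f / n
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


def target(x):
    """The unknown function the candidates try to learn (assumption: a parabola)."""
    return x * x


train_data = [(i / 10, target(i / 10)) for i in range(-10, 11)]
val_data = [(i / 10 + 0.05, target(i / 10 + 0.05)) for i in range(-10, 10)]

# Outer loop: evaluate every candidate architecture on held-out data.
search_space = [1, 2, 3]
results = {degree: loss(fit(degree, train_data), val_data) for degree in search_space}
best_degree = min(results, key=results.get)
print("validation loss per degree:", results)
print("selected degree:", best_degree)
```

The straight line (degree 1) cannot represent the curve, so the search discards it in favor of a higher degree. A production system would replace the exhaustive loop with a learned or otherwise guided search, since real search spaces are far too large to enumerate.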

“Moore’s Law is essentially dead”

Bombassei said AutoML is cutting edge from software to hardware. He said that since transfer learning and neural architecture search are quite computationally demanding, Google has come up with physical hardware chips more suited to the task: tensor processing units or TPUs.

“Those are chips specifically designed by Google to perform this type of calculations in the most efficient way,” Bombassei said. He explained that traditional central processing units (CPUs) and graphics processing units (GPUs) are inefficient when it comes to the calculations required for machine learning. “In machine learning, you need highly parallel, low-precision calculation,” he said.

“[CPUs and GPUs are] very good at parallel computations but they’re high precision, so they kind of waste a lot of energy doing things that are not needed.”
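To see why low precision can be enough, consider the dot product at the heart of most neural network layers. The sketch below is a general illustration of low-precision arithmetic, not a description of how TPUs are built: it assumes a simple symmetric 8-bit quantization scheme and made-up weights and inputs, then compares the integer result with the full-precision one.

```python
def quantize(values, bits=8):
    """Map floats to signed integers that share one scale factor (symmetric quantization)."""
    max_int = 2 ** (bits - 1) - 1                          # 127 for 8 bits
    scale = (max(abs(v) for v in values) / max_int) or 1.0  # avoid a zero scale for all-zero input
    return [round(v / scale) for v in values], scale


# Made-up weights and inputs for illustration.
weights = [0.12, -0.53, 0.98, -0.07, 0.44]
inputs = [1.5, -2.0, 0.25, 3.0, -0.75]

# Full-precision reference result.
exact = sum(w * x for w, x in zip(weights, inputs))

# Low-precision version: multiply and add small integers, rescale once at the end.
q_weights, scale_w = quantize(weights)
q_inputs, scale_x = quantize(inputs)
approx = sum(a * b for a, b in zip(q_weights, q_inputs)) * scale_w * scale_x

relative_error = abs(approx - exact) / abs(exact)
print("exact:", exact)
print("8-bit approximation:", approx)
print("relative error:", relative_error)
```

The 8-bit result lands within a few percent of the exact value, while every multiplication inside the sum operates on small integers, which are cheaper to process in hardware than high-precision floating-point numbers.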

Hence the need for TPUs. Bombassei said this year marks the third generation of Google’s TPUs, which he said are 10x more efficient, or faster, at training machine learning models.


During SlatorCon Zurich’s speaker panel, an audience member asked Bombassei whether Moore’s Law still applied to all this avant-garde tech and whether, and when, progress would plateau.

“I think the consensus is that Moore’s Law is essentially dead,” he said. He explained that Moore’s Law predicts the doubling of computation power, but that the growth curve is flattening as tech companies lean towards parallel computing. “At the chip level, the performance difference is not that much, but we’re increasing the ability to parallelize much more and that raises the scale,” he said.

As for hardware improvements, Bombassei said the difference between the first generation of TPUs and the current, third generation is “something like 60x.”
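For a sense of scale, the quoted 60x can be set against a simple doubling curve. The calculation below is a back-of-the-envelope sketch that takes the 60x figure at face value and assumes the commonly quoted cadence of one doubling every two years; the talk itself did not specify a doubling period.

```python
import math

DOUBLING_PERIOD_YEARS = 2.0  # assumption: the commonly quoted Moore's Law cadence


def growth_factor(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Improvement after `years` if performance doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def years_to_reach(factor, doubling_period=DOUBLING_PERIOD_YEARS):
    """Years of steady doubling needed to reach a given improvement factor."""
    return doubling_period * math.log2(factor)


years_for_60x = years_to_reach(60)
print(f"Years of doubling needed for 60x: {years_for_60x:.1f}")
print(f"Improvement from doubling over 4 years: {growth_factor(4):.0f}x")
```

Under that assumption, doubling alone would need nearly 12 years to deliver a 60x improvement, which illustrates why gains of this size are attributed to specialization and parallelism rather than to chip-level scaling.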

Regarding what’s next for AutoML and Google in general, he said “there’s quite more stuff in the works.”

“Since we’re hiring all the machine learning specialists we can find, we plan to accelerate the development of this technology,” he concluded.

