Some weeks have passed since the keynote address at Google I/O 2021. Held in mid-May, it included a few notable advances in language technology, from novel assisted-writing capabilities to staple machine translation updates and even powerful new language models. Nothing groundbreaking was on display.
However, the address — primarily delivered by Google CEO Sundar Pichai — highlighted how much users around the world continue to rely on Google Translate, and also mentioned promising new multimodal machine learning models with a lot of potential for natural language processing (NLP) and generation (NLG).
20 Billion Web Pages Per Month
Pichai showcased some formidable user stats across technologies powered by Google Translate. Google Translate itself had been used to translate 20 billion web pages globally in April 2021 alone, he said.
According to him, live captioning on Google Meet and Android processes 250,000 hours of captions every day, and translation is also available for those live captions. Google Lens, the company’s Android-embedded, image-recognition technology, visually translates a billion words each day for its users.
Pichai added that Lens has been useful to students learning in a second language, so Google is pushing the technology in that direction, combining “visual translation with educational content from the web to help people learn in more than 100 languages.” As an example, he said users can “snap a photo of a science problem, and Lens will provide learning resources in your preferred language.”
Politically Correct Writing Assistants, etc.
Meanwhile, in another NLP-related area, Alphabet’s AI subsidiary DeepMind has deployed WaveNet to improve speech generation, resulting in 51 new voices for Google Assistant. Additionally, BERT (short for Bidirectional Encoder Representations from Transformers), the language model Google has billed as a breakthrough, was leveraged to improve search capabilities and enhance the answers Google provides to question-type queries.
“LaMDA is able to carry a conversation no matter what we talk about. You can have another conversation without retraining the model” — Sundar Pichai, CEO, Google
Assisted writing capability was also added to a feature called Smart Canvas, which appears to be an improvement to Google Workspace that fully integrates Google Docs (and Sheets, etc.) with functionalities like Google Meet.
The idea is to create a fully integrated collaborative platform rather than standalone word processing or spreadsheet tools. The assisted writing capability goes beyond grammar and semantics: in an example shown during the keynote, the word “chairperson” was suggested to replace “chairman” in the interest of inclusive language. Initially reserved for corporate clients, these features are all being released to the public soon.
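For a sense of what a term-level suggestion like that involves, here is a minimal, purely hypothetical sketch of a rule-based suggester in Python. Google has not disclosed how Smart Canvas generates its recommendations, and the replacement list below is invented for illustration.

```python
# Hypothetical sketch of a rule-based inclusive-language suggester.
# Smart Canvas's actual approach is not public; this is only illustrative.

INCLUSIVE_ALTERNATIVES = {
    "chairman": "chairperson",
    "policeman": "police officer",
    "mankind": "humankind",
}

def suggest_inclusive_terms(text: str) -> list[tuple[str, str]]:
    """Return (flagged word, suggested replacement) pairs found in text."""
    suggestions = []
    for word in text.split():
        key = word.strip(".,!?;:").lower()
        if key in INCLUSIVE_ALTERNATIVES:
            suggestions.append((word, INCLUSIVE_ALTERNATIVES[key]))
    return suggestions

print(suggest_inclusive_terms("The chairman opened the meeting."))
# [('chairman', 'chairperson')]
```

A production system would presumably rely on a learned model and sentence context rather than a fixed word list, but the user-facing behavior is the same: flag a term, offer an alternative.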
Next-Gen TPU
Pichai also announced the release of Google’s next-generation Tensor Processing Units (TPUs), the TPUv4. TPUs are custom integrated circuits designed specifically for machine learning, which underpins related technologies such as NLP and machine translation.
Reportedly twice as fast as the previous TPUv3, these TPUv4s come together into pods with 4,096 chips apiece. According to Pichai, “Each pod has 10x the interconnect bandwidth per chip at scale compared to any other networking technology. This makes it possible for a TPUv4 pod to deliver more than 1 exaFLOP […] of computing power.” An exaFLOP of computing power, as he put it, is equivalent to that of nearly 10 million laptops used together.
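A quick back-of-the-envelope check shows how that laptop comparison works out, assuming a typical laptop sustains on the order of 100 GFLOPS (a figure not given in the keynote):

```python
# Rough arithmetic behind the "nearly 10 million laptops" comparison.
# The laptop figure is an assumption, not a number from the keynote.

POD_FLOPS = 1e18        # ~1 exaFLOP, the pod performance Pichai cited
LAPTOP_FLOPS = 1e11     # ~100 GFLOPS, assumed ballpark for one laptop

equivalent_laptops = POD_FLOPS / LAPTOP_FLOPS
print(f"{equivalent_laptops:,.0f}")  # 10,000,000
```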
“Imagine taking a photo of your hiking boots and asking, can I use these to hike Mount Fuji? MUM would be able to understand…” — Sundar Pichai, CEO, Google
“This is the fastest system we’ve ever deployed at Google and a historic milestone for us,” Pichai said, adding, “We’ll soon have dozens of TPUv4 pods in our data centers, many of which will be operating at or near 90% carbon-free energy.”
He added that these chips will be made available to Google Cloud customers later in the year.
1,000 Times Better Than BERT
The Alphabet and Google CEO also unveiled two notable new models: the Language Model for Dialogue Applications (LaMDA) and the Multitask Unified Model (MUM).
LaMDA, aptly named, is a language model specifically for dialogue applications. Unlike typical chatbots with pre-programmed replies — albeit with complex nested and branching options — LaMDA synthesizes conversation topics and answers from its training data, making dialogue open-ended and conversations more natural.
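The contrast with scripted chatbots is easiest to see side by side. The sketch below is a rough illustration only: `generate_reply` is a hypothetical stand-in for any generative language model, since LaMDA itself is not publicly available.

```python
# Scripted bot: replies are limited to topics enumerated in advance.
SCRIPTED_REPLIES = {
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "refund": "Please fill out the refund form on our website.",
}

def scripted_bot(message: str) -> str:
    for keyword, reply in SCRIPTED_REPLIES.items():
        if keyword in message.lower():
            return reply
    return "Sorry, I don't understand."

# Open-ended bot: a generative model synthesizes a reply from its training
# data, so the set of topics never has to be enumerated up front.
def open_ended_bot(message: str, generate_reply) -> str:
    # `generate_reply` is a placeholder for a generative language model call.
    return generate_reply(message)
```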
“LaMDA is able to carry a conversation no matter what we talk about. You can have another conversation without retraining the model,” Pichai said.
Currently, LaMDA is in beta and used internally. Pichai said the focus right now is quality: “We are making sure it’s developed consistent with our AI principles.”
Moving forward, he said, Google is interested in incorporating the model into Google Assistant, Search, and Workspace, as well as making the resulting capabilities available to developers and enterprise users.
Now, whereas LaMDA is trained only on text data, MUM is completely multimodal: it takes into account text, images, and audio across all supported languages.
Like BERT, MUM is built on a transformer architecture and is reportedly 1,000 times more powerful. Instead of training on one language at a time, it does so across 75 languages at once, and also learns from imagery and audio.
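What training “across 75 languages at once” means, roughly, is that examples from many languages flow into the same model weights within a single training stream, rather than a separate model being fitted per language. The sketch below illustrates that batching idea with invented data; it says nothing about how MUM is actually trained.

```python
import random

# Illustrative only: sampling one mixed-language batch so that a single set
# of shared model weights sees many languages simultaneously.
corpora = {
    "en": ["the boots are waterproof"],
    "de": ["die Stiefel sind wasserdicht"],
    "ja": ["そのブーツは防水です"],
    # ... in MUM's case, 75 languages
}

def multilingual_batch(batch_size: int) -> list[tuple[str, str]]:
    """Sample (language, example) pairs from all corpora for one batch."""
    languages = list(corpora)
    return [
        (lang, random.choice(corpora[lang]))
        for lang in random.choices(languages, k=batch_size)
    ]

print(multilingual_batch(4))
```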
Pichai illustrated how the application of MUM could improve the experience of Google users: “Imagine taking a photo of your hiking boots and asking, can I use these to hike Mount Fuji? MUM would be able to understand the content of the image and the intent behind your query, let you know that your hiking boots would just work fine, and then point you to a list of recommended gear in a Mount Fuji blog.”