Google AI Chief Touts Ultra Low Resource Machine Translation in Gemini 1.5 Pro Launch

Google Research and DeepMind Chief Scientist Jeff Dean has a lot on his plate. But after a lengthy post to X on February 15, 2024, introducing Gemini 1.5 Pro, the latest version of Google’s multimodal large language model (LLM), Dean just had to add one more point.

“I want to draw people’s attention to the ultra low resource translation use case for Kalamang,” Dean emphasized in a follow-up post. “In context language learning from a single grammar book!”

The 58-page technical report also explained, in greater detail, Gemini 1.5 Pro’s success in “learning” Kalamang, described as having “fewer than 200 speakers and therefore virtually no presence on the web, which means that the model must rely on the data given in context (rather than knowledge stored in its weights at training time).”
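In practice, the setup amounts to pasting the entire grammar book into the prompt ahead of the translation request and asking the model to work from that material alone. The Python sketch below is a minimal illustration of the idea, assuming a generic long-context model client; the stand-in grammar text, prompt wording, and model call are hypothetical rather than the actual MTOB evaluation harness.

```python
# Minimal sketch of an MTOB-style, in-context translation prompt.
# The grammar text, prompt wording, and model call are illustrative
# stand-ins, not the benchmark's actual setup.

# Stand-in for the book-length Kalamang grammar that MTOB supplies in
# context; the real material is long enough to demand a long-context model.
GRAMMAR_BOOK = "...full text of a Kalamang reference grammar..."

def build_mtob_prompt(source_sentence: str) -> str:
    """Assemble one long prompt: reference material first, then the task."""
    return (
        "Below is a grammar of Kalamang, a language with almost no web "
        "presence. Using ONLY this material, translate the final sentence.\n\n"
        f"--- GRAMMAR ---\n{GRAMMAR_BOOK}\n--- END GRAMMAR ---\n\n"
        f"English: {source_sentence}\nKalamang:"
    )

prompt = build_mtob_prompt("The children are playing on the beach.")
# translation = long_context_model.generate(prompt)  # hypothetical client call
```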

The benchmark, referred to as “machine translation (MT) from one book,” or MTOB, predates Gemini 1.5 Pro, albeit only barely: it was introduced in a 2023 paper.

The technical report’s conclusions echo the claims of human parity that crop up in MT-related headlines from time to time, specifically estimating that “when given a grammar manual for Kalamang […] the model learns to translate English to Kalamang at a similar level to a person learning from the same content.”

“This sounds extremely dubious,” one skeptic retorted on X. “Aren’t there only 200 people who could say whether the translation was any good? Did they weigh in?”

But fans were undeterred. One observer asked whether the LLM might be made available for trial runs in other languages, such as Icelandic. Others called Gemini 1.5 Pro “really impressive” and “mind-blowing, even in a post-GPT4 world.” (The release of Gemini 1.5 Pro comes on the heels of a February 2024 research paper highlighting Gemini as a “valuable tool” for MT.)

Google practically invited such comparisons, specifically stating in its paper that Gemini 1.5 Pro outperformed specialist models, such as OpenAI’s Whisper, at audio comprehension, including tasks with longer-context audio. 

The model’s predecessor similarly outperformed Whisper on this task, but the latest experiments also covered Gemini 1.5 Pro’s main claim to fame: the ability to handle long context, defined here as 700,000 words of text and 40-105 minutes of video. More significant, in the authors’ view, was that Gemini 1.5 Pro’s long-context capabilities did not compromise its audio comprehension.
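For a sense of scale, a back-of-the-envelope conversion (assuming roughly 0.75 English words per token, a common heuristic rather than a figure from the report) puts 700,000 words at close to a million tokens, in line with the 1M-token context window Google has advertised for Gemini 1.5 Pro.

```python
# Rough conversion of the report's 700,000-word figure into tokens.
# The 0.75 words-per-token ratio is an assumed heuristic for English
# text, not a number taken from the Gemini 1.5 Pro report.
WORDS = 700_000
WORDS_PER_TOKEN = 0.75  # assumption: typical English tokenizer ratio

approx_tokens = WORDS / WORDS_PER_TOKEN
print(f"~{approx_tokens:,.0f} tokens")  # ~933,333, near a 1M-token window
```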