One year after the launch of ChatGPT by Microsoft-backed OpenAI, Google is launching Gemini in an attempt to regain the upper hand in AI. Google CEO Sundar Pichai called the natively multimodal Gemini 1.0 the company’s “most capable and general AI model yet.”
But claims about Gemini’s performance are not limited to comparisons with Google’s own products. An introductory blog post by Google pointedly states that Gemini surpasses the state-of-the-art performance of OpenAI’s GPT-4V “on a range of multimodal benchmarks,” including automatic speech recognition (ASR) and automatic speech translation.
In both speech tasks, it was Gemini Pro that outperformed Whisper, OpenAI’s speech recognition model. (Gemini comes in three “sizes.” They are, from largest to smallest: Ultra, for highly complex tasks; Pro, for scaling across many tasks; and Nano, for working efficiently on devices.)
Google researchers evaluated ASR performance on the 62-language FLEURS benchmark using word error rate (for which lower scores indicate better performance). Whisper v3’s word error rate was 17.6%, while Gemini Pro’s was 7.6%.
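For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a system’s transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that textbook definition only. It is not Google’s evaluation code, the function names `word_error_rate` and `relative_reduction` are hypothetical, and it assumes whitespace-separated words with no text normalization, which real benchmark pipelines typically apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Reported FLEURS figures: Whisper v3 at 17.6%, Gemini Pro at 7.6%.
    print(relative_reduction(17.6, 7.6))
```

By this arithmetic, the reported drop from 17.6% to 7.6% is a 10-percentage-point difference, or a relative reduction in errors of roughly 57%.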
The Best of Bard
While Gemini 1.0 was trained to respond to a range of inputs, including text, images, and audio, Bard with Gemini Pro can currently handle only text-based prompts, “with support for other modalities coming soon.” Confusingly, the linguistic capabilities of Bard with Gemini Pro are also limited for the time being: it is reportedly accessible in English only, albeit in more than 170 countries and territories. Google plans to expand coverage to “more languages and places, like Europe, in the near future.”
At the time of writing, Bard was still able to respond to prompts in multiple languages, including prompts to translate, and in one instance even provided a list of “new words” in a non-English language, along with (mostly correct) transliterated pronunciations.
Bard’s responses regarding its linguistic offerings, however, were inconsistent. Bard was also unable to handle non-English prompts via audio for ASR, but helpfully recommended other online tools that (it said) could.
Fans and Critics
Gemini has already inspired pundits on social media to wax poetic about AI in general and about Google in particular. (Investors also reacted, with Google’s stock price jumping the day of the release.)
“If you were impressed by OpenAI’s ChatGPT, prepare to have your mind blown by Google’s Gemini,” Aaron Francesconi, IRS Director of Data Management Services and Support, wrote on LinkedIn.
Linus Ekenstam praised Google’s strategy in a thread on X with more than 10,000 likes: “Instead of chasing hype, they have been laser-focused on certain things. Maybe much like Amazon, they win not by being the first, but by being the best.”
However, the duck-drawing demo Ekenstam called “jaw-dropping” has been outed by TechCrunch as “faked.” Not wasting any time, MIT Technology Review suggested — as soon as Gemini launched — that it “could signal peak AI hype.”
Wharton professor Ethan Mollick took a similarly measured approach, writing on X, “We really don’t know anything about Gemini Ultra. Does it beat GPT-4 for real? If so, why by such a small amount?”
Mollick went on to wonder aloud whether “the failure to crush GPT-4 shows limits of LLMs approaching.”
It seems, though, that nothing can dampen Bard’s enthusiasm, with the chatbot itself gushing that “initial testing and user feedback suggest that Gemini significantly improves the quality of Bard’s translations. As Gemini continues to evolve, we can expect further improvements in accuracy, fluency, and overall translation quality.”