Google Irons Out Another Kink in Live Speech-to-Text Translation Via New Update


Over the course of a single year since the pandemic started, real-time translation and transcription morphed from a niche capability into a core feature as the world's conferences moved online — and the tech is trying to catch up.

As a speaker works through a presentation, a running translation at the bottom of the screen can keep the online audience engaged. Not so much if that translation keeps flickering.

That flicker is one of the most distracting aspects of live speech-to-text translation: as new words are spoken and added to a source sentence, they change the context of earlier words, forcing frequent revisions to the beginning of the translated text. This dependency on source words yet to be spoken causes the ongoing translation to flicker.

This phenomenon is what Google hopes to minimize in a newly released update to Google Translate's transcribe feature, according to a January 26, 2021 Google AI blog post. The post was authored by two scientists from Google Research: Naveen Arivazhagan, Senior Software Engineer, and Colin Cherry, Staff Research Scientist.


The models behind this new Google Translate update are discussed in two papers co-authored by the same scientists and published on the pre-print server arXiv. The first addresses reducing the instability and latency of live translation.

In “Re-translation Strategies For Long Form, Simultaneous, Spoken Language Translation,” the researchers used TED talks as multilingual test data and proposed an evaluation framework built on three metrics: Erasure, Lag, and BLEU score.
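To make the Erasure metric concrete: it measures how much of a previously displayed hypothesis must be deleted when the system revises its output. The sketch below is an illustrative approximation, not Google's implementation — the function name and whitespace tokenization are assumptions for the example.

```python
def erasure(prev_hypothesis: str, curr_hypothesis: str) -> int:
    """Count how many trailing words of the previous hypothesis must be
    deleted before the current hypothesis can be displayed.

    Illustrative sketch only: tokenizes on whitespace and finds the
    longest common word prefix between the two hypotheses.
    """
    prev_tokens = prev_hypothesis.split()
    curr_tokens = curr_hypothesis.split()

    # Length of the longest common prefix shared by both hypotheses.
    common = 0
    for p, c in zip(prev_tokens, curr_tokens):
        if p != c:
            break
        common += 1

    # Everything after the common prefix in the old output was "erased".
    return len(prev_tokens) - common


# Appending words costs nothing; revising earlier words erases the tail.
print(erasure("the cat sat", "the cat sat on the mat"))  # 0
print(erasure("the cat sat", "the dog sat"))             # 2
```

Summed over an entire talk and normalized by output length, a score like this captures how much flicker a viewer actually experiences.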

The second paper addresses reducing that distracting flicker. In “Re-translation versus Streaming for Simultaneous Translation,” the scientists “reduce erasure and achieve a more favorable Erasure / Lag / BLEU trade-off.”

Moreover, they minimize flicker, “by truncating some number of words from the translation until the end of the source sentence has been observed. This masking process thus trades latency for stability, without affecting quality,” the authors explained in the blog.
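The masking process described above can be sketched in a few lines. This is a simplified illustration of the general idea — hold back the last k words of an in-progress translation until the source sentence is complete — and the function name and parameters are assumptions, not Google's actual code.

```python
def mask_tail(translation: str, k: int, source_finished: bool) -> str:
    """Hide the last k words of an in-progress translation.

    The trailing words are the most likely to be revised as more source
    speech arrives, so withholding them trades latency for stability.
    Once the source sentence is complete, the full translation is shown.
    Illustrative sketch with whitespace tokenization.
    """
    if source_finished or k <= 0:
        return translation
    tokens = translation.split()
    return " ".join(tokens[:max(len(tokens) - k, 0)])


# Mid-sentence, the volatile tail stays hidden from the viewer.
print(mask_tail("je suis très heureux", 2, source_finished=False))  # "je suis"
# At sentence end, everything is displayed.
print(mask_tail("je suis très heureux", 2, source_finished=True))   # "je suis très heureux"
```

The larger k is, the more stable the on-screen text, at the cost of the caption lagging further behind the speaker.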

They added, however, that “reducing erasure is just one part of the story” and they look forward to developing new technology that can reduce latency and “enable better transcriptions when multiple people are speaking.”

Among the more prominent use cases for live speech-to-text translation deployed at scale are the meetings and debates of the European Parliament.

As Slator reported in September, two consortia and Microsoft Belgium were named awardees of a contract for a tool that can automatically transcribe and translate multilingual parliamentary debates in real time in 24 languages.