Meta Doubles Down on Direct Speech-to-Speech Translation

On June 13, 2022, Meta (formerly known as Facebook) published a post about a “direct speech-to-speech translation (S2ST) approach.” Direct S2ST eliminates the text-generation step in spoken language conversion, which means it can also cover languages without writing systems.

Typically, S2ST requires speech recognition followed by text-to-text translation and, finally, the conversion of text back into speech.
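The difference between the two pipeline shapes can be sketched in a few lines of Python. This is only a toy illustration: the `Audio` type, the three stage functions, and the tiny lexicon are hypothetical stand-ins invented for this example, not Meta's models or code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Audio:
    """Toy stand-in for a waveform; `content` represents what is spoken."""
    content: str
    language: str


# Hypothetical toy lexicon used by the text-to-text stage.
TOY_LEXICON = {("es", "en"): {"hola": "hello", "gracias": "thank you"}}


def recognize_speech(audio: Audio) -> str:
    """Stage 1 (assumed): speech recognition produces source-language text."""
    return audio.content


def translate_text(text: str, source: str, target: str) -> str:
    """Stage 2 (assumed): text-to-text translation via the toy lexicon."""
    return TOY_LEXICON[(source, target)].get(text, text)


def synthesize_speech(text: str, language: str) -> Audio:
    """Stage 3 (assumed): text is converted back into speech."""
    return Audio(content=text, language=language)


def cascaded_s2st(audio: Audio, target: str) -> Audio:
    """Conventional pipeline: every utterance passes through written text."""
    text = recognize_speech(audio)
    translated = translate_text(text, audio.language, target)
    return synthesize_speech(translated, target)


def direct_s2st(audio: Audio, target: str,
                model: Callable[[Audio, str], Audio]) -> Audio:
    """Direct pipeline: a single model maps speech to speech, with no text step."""
    return model(audio, target)


def toy_direct_model(audio: Audio, target: str) -> Audio:
    """Hypothetical placeholder for a trained speech-to-speech model."""
    spoken = TOY_LEXICON[(audio.language, target)].get(audio.content, audio.content)
    return Audio(content=spoken, language=target)
```

In the cascaded version, a language with no writing system has nothing to hand to the middle stage, which is the gap a direct model is meant to close.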

Meta’s multilingual textless S2ST methodology is trained on systematically processed audio samples, which the company described as “mined speech to speech data.” The training material draws on large speech collections, including Meta AI’s own FAIR S2ST data and the multilingual VoxPopuli audio dataset.

The social media giant described the approach as the first S2ST framework “trained on real-world open sourced audio data.” It is now being tested using the University of Pennsylvania’s Fisher Spanish-English speech translation corpus, an audio database of 139,000 sentences from phone conversations in Spanish.

Scientists involved in this and similar projects at Meta claim that, until now, S2ST systems had not been successfully trained with “publicly available real-world data on multiple languages.”

The implications of this advancement are many, including language-neutral connectivity across live action platforms for business or leisure. It could also transform the interpreting landscape much sooner than many anticipate.

Meta researchers expect their novel speech-to-speech translation research to improve translation quality, speed up language conversion, and make communication easier for users.

In a sort of surreptitious application crowdsourcing, Meta has made all related papers and code available free of charge via the blog post, stating its “hope to enable future direct speech-to-speech translation advancements across the research community.”

Whether in the hands of the lone developer, techie entrepreneur, or academic researcher, a scientific breakthrough of this nature has the potential to shorten the path to multilingual implementations within the “Metaverse” and beyond.