It recognizes speech (as in automatic speech recognition). It translates speech into speech or text, and text into text or speech, in nearly 100 languages. Meta’s new Massively Multilingual & Multimodal Machine Translation model (SeamlessM4T) is the Swiss Army knife of language models. Proud parent Meta introduced the new model in a blog post published on August 22, 2023.
The SeamlessM4T launch follows a string of language technology announcements by Meta over the past 12 months. These include low-resource massively multilingual machine translation in mid-2022, massively multilingual speech recognition and synthesis in May 2023, and the multilingual generative speech model Voicebox in June 2023. The social media giant is devoting considerable resources to solving the language problem posed by its metaverse vision.
On X, one observer described SeamlessM4T as “revolutionary” and called it a “game-changer.” Another gushed, “It’s not just a tool; it’s a step towards a world where everyone can be understood, regardless of language.”
“The code switching support of SeamlessM4T is pretty cool!” quipped one fan. “It doesn’t do very well with my French or Japanese, but then again neither is very good.”
One Dr. Hubertus Becker questioned the model’s reliability for critical translations, noting, “It’s concerning that an experimental demo can alter the meaning of input words.”
Kalev Leetaru, reporting on SeamlessM4T’s performance in translating Weibo social media posts, cited inconsistent results.
“For some posts it yields translations that compare favorably to both NMT and LLM translations, but with the added cost of having to use language-specific punctuation rules to split into sentences to translate a sentence at a time,” Leetaru explained. “For other posts, it yields subpar translations that can remove or truncate key details, suggesting promise but that it is not quite ready for production use.”
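The workaround Leetaru describes, splitting a post into sentences with language-specific punctuation rules and translating one sentence at a time, can be sketched in Python. This is a minimal illustration under stated assumptions, not Leetaru’s actual pipeline: the set of Chinese sentence-ending marks (。！？ plus a few closing quotes and brackets) is an assumption, and `translate` is a hypothetical placeholder for whatever call actually invokes SeamlessM4T.

```python
import re
from typing import Callable, List

# Assumption: Chinese sentences end in full-width 。！？, optionally followed
# by closing quotes or brackets. Real posts may need a richer rule set.
_ZH_SENTENCE = re.compile(r"[^。！？]*[。！？]+[”’」』）]*|[^。！？]+$")


def split_zh_sentences(text: str) -> List[str]:
    """Split Chinese text into sentences, keeping end punctuation attached."""
    # The second alternative catches a trailing fragment with no end mark.
    return [s.strip() for s in _ZH_SENTENCE.findall(text) if s.strip()]


def translate_by_sentence(text: str, translate: Callable[[str], str]) -> str:
    """Translate one sentence at a time and join the results with spaces.

    `translate` is a hypothetical stand-in for a SeamlessM4T text-to-text call.
    """
    return " ".join(translate(s) for s in split_zh_sentences(text))
```

For example, `translate_by_sentence("你好。再见！", lambda s: f"[{s}]")` returns `"[你好。] [再见！]"`. The extra cost Leetaru mentions is visible here: every source language needs its own boundary rules before the model sees any text.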
Better than Whisper?
Naturally, the more than 60 authors of the August 22, 2023, paper introducing SeamlessM4T stand behind what they call “the first multilingual system” to translate from and into English for both speech and text.
If the stats behind SeamlessM4T’s training seem somewhat disparate, that may be because the model had to be trained on so many formerly separate, siloed tasks. Accordingly, the number of languages the model handles varies by task.
SeamlessM4T can provide automatic speech recognition (ASR) for nearly 100 languages; speech-to-text translation (S2TT) for nearly 100 input and output languages; speech-to-speech and text-to-speech translation for nearly 100 input languages and 36 output languages (including English); and conventional text-to-text translation for nearly 100 languages.
According to the authors, Meta built the new model to move beyond existing systems that handle these tasks separately and generally perform well in only one modality each.
SeamlessM4T, by contrast, reportedly achieves state-of-the-art results across these languages while offering “multitask support” in a single model. The paper also claims that SeamlessM4T outperforms previous state-of-the-art competitors, namely Whisper and AudioPaLM-2.
Meta has publicly released SeamlessM4T and its related resources and encourages researchers and developers to build on this first iteration.