The Seamless suite of models comprises SeamlessM4T v2, SeamlessExpressive, and SeamlessStreaming, plus a unified model, simply called Seamless, which combines the capabilities of all three.
SeamlessM4T, introduced in August 2023, serves as the foundation for SeamlessExpressive and SeamlessStreaming. SeamlessExpressive preserves elements of expression, such as pauses, emphasis, and tone, in speech-to-speech translation for English, Spanish, German, French, Italian, and Chinese.
“We’re also thrilled to share SeamlessStreaming,” Research Manager Paden Tomasello said in a video shared by Meta. He described it as the first massively multilingual model able to translate speech and text in real time.
SeamlessStreaming can handle automatic speech recognition and speech-to-text translation for “nearly 100 input and output languages.” Speech-to-speech translation works from nearly 100 input languages into 36 output languages.
Tomasello noted that Seamless would be the first translation model to include watermarking to indicate model-generated audio output, a feature that appears to align with the US’s recently issued first Executive Order on AI.
“Watermarking actively embeds a signal that is imperceptible to the human ear, but still detectable within the audio using a detector model,” Meta stated in its blog post. “Through this watermark, the origin of the audio can be accurately traced.”
According to Meta’s blog post, SeamlessM4T v2 achieves state-of-the-art translation quality for speech-to-speech and speech-to-text translation, and “also beats Whisper v3’s for automatic speech recognition on average and in particular for lower resource languages.”
SeamlessStreaming, meanwhile, is said to achieve state-of-the-art quality at low latency for speech-to-speech translation.
Meta has made all four Seamless models publicly available (i.e., to developers), along with metadata, data, and alignment tools.
The end goal of all of this open-sourcing of advanced speech translation AI seems clear: get developers to accelerate Meta founder Zuckerberg’s vision of a frictionless, language-agnostic metaverse.