Live Speech-to-Speech AI Translation Goes Commercial


Adoption of live speech-to-speech translation (S2ST) has accelerated rapidly across commercial applications since mid-2023.

Large multimodal language models are helping reduce latency in end-to-end spoken language conversion, also known as end-to-end speech translation (E2E-ST). Examples include Meta’s SeamlessM4T, which can translate and transcribe speech in more than 100 languages, and Google’s AudioPaLM, which its creators say “can process and generate text and speech with applications including speech recognition and speech-to-speech translation.”

A growing number of language AI researchers are focusing on S2ST, with multiple academic and private teams working to improve model accuracy and to add support for low-resource languages.

But how does all this research translate into practical, real-world applications? Here we briefly examine five recent examples of live speech-to-speech translation in business settings:

Business Meetings

Microsoft Translator, integrated with the Teams meeting and communications application, supports real-time speech translation in over 30 languages through Azure AI services. The app can be customized with specific terminology, using supported bilingual training documents, tuning documents, test documents, a phrase dictionary, and a sentence dictionary.

Live Events and Conferences

Google Translate has a feature that enables two-way communication during live events by translating speech from multiple participants. Interprefy and KUDO also offer real-time AI speech translation, including multidirectional translation across several languages.

Retail and Hospitality

S2ST is used to translate conversations between customers and hotel or restaurant staff, usually via an app running on a device such as a mobile phone or tablet. One commercial example is SoundHound AI Voice AI Concierge, which is available in 25 languages.

Healthcare

Existing providers of voice technologies for healthcare organizations have begun to implement speech-to-speech translation for their clients. One such provider is Orion Labs, which offers hospitals live speech translation, among other services, via the Push-to-Talk 2.0 platform.

Customer Service

In settings such as transportation systems, multilingual customer service is crucial to smooth operations. The City of Paris, for example, commissioned accessibility company Ava to develop a tablet-based S2ST system that subway customer service agents can use to assist non-French speakers. The system will be put to the ultimate test during the 2024 Olympics.