How to Build a Startup in Real-Time AI Speech Translation

SlatorCon London Language AI Panel 2024

Can AI solve real human problems? This is a question that Karolina Sjöberg, Co-founder and CEO of Mabel AI, and Snow Huo, Co-founder and CEO of Byrdhouse AI, have tackled head-on.

Both CEOs founded real-time AI speech translation startups in 2022. Sjöberg, a doctor by training, gained firsthand experience of the impact that language barriers can have in medical settings. She saw a pressing need for an app that translates medical conversations.

Huo was also motivated to solve her own problems first. Moving to the US at age 17, she was picked on for not speaking English fluently. Later, in her professional life, she found herself on multilingual teams whose talent was held back by differences in language proficiency. She wondered — could AI solve this?

Sjöberg and Huo took to the stage for the Language AI Startup panel at SlatorCon in London to share the thinking behind their innovative AI startups, unpack the technical challenges of speech translation, and explain how they found product-market fit.

The Mabel AI app, Sjöberg explained, translates interactions in healthcare settings. Voice-to-voice translation lets patients and healthcare providers speak freely in their own languages. 

Built with data security in mind, the app is private — it does not save or send data — and the AI models can run on-premise or locally on the phone. “This means it can be used without a network, for instance on flights and even in bomb shelters,” Sjöberg told the SlatorCon audience.

Byrdhouse AI is an AI interpreting tool for strategic business communication. 

“[Imagine] you work with companies that manufacture goods,” Huo posited. “On the supply side you need to work with vendors outside your home country. And on the demand side, you have consumers or retail partners that you work with to expand into the market,” she said.

Byrdhouse AI was thus conceived as an “all-in-one solution for product development and international sales calls in different languages,” Huo said.

Both AI applications string together three AI processes: speech recognition, machine translation, and speech synthesis.
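
To make the three-stage chain concrete, here is a minimal sketch of a cascaded speech translation pipeline. It is an illustration only and does not reflect either company's implementation: the `SpeechTranslator` class and the `fake_asr`, `fake_mt`, and `fake_tts` stand-ins are hypothetical names, and a real system would plug trained models into each slot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical cascade: each stage feeds the next."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> source text
    translate: Callable[[str], str]      # machine translation: source -> target text
    synthesize: Callable[[str], bytes]   # speech synthesis: target text -> audio

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions for illustration, not real models).
def fake_asr(audio: bytes) -> str:
    # Pretend the audio bytes already contain the spoken words.
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    # Word-by-word lookup in a tiny made-up glossary.
    glossary = {"hello": "hej", "doctor": "doktor"}
    return " ".join(glossary.get(word, word) for word in text.lower().split())


def fake_tts(text: str) -> bytes:
    # Pretend that encoding the text produces audio.
    return text.encode("utf-8")


pipeline = SpeechTranslator(fake_asr, fake_mt, fake_tts)
result = pipeline.run(b"Hello doctor")
print(result)
```

Because each stage is just a callable, any one of them can be swapped (for example, for a fine-tuned model) without touching the other two. The flip side of a cascade is that errors and delays from earlier stages carry through to later ones.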

Mabel AI started with open-source models, which Sjöberg and her team fine-tuned with medical conversations, medicine names, and medical terms.

“Our models are trained on a wide range of voices, which is of course very important in healthcare,” Sjöberg explained, adding, “and we also train with background noises because hospitals are not quiet environments.”
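
One common way to train with background noise is to mix recorded noise into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below shows that generic augmentation step using only the standard library. It is not a description of Mabel AI's training setup: the function names, the synthetic sine-wave "speech", and the random "noise" are all assumptions made for illustration.

```python
import math
import random
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech so that the result has the requested SNR in dB."""
    # Loop the noise clip so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    # SNR_dB = 10 * log10(P_speech / P_scaled_noise); solve for the noise scale.
    scale = math.sqrt(mean_power(speech) / (mean_power(tiled) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, tiled)]


# Synthetic stand-ins for real recordings (assumptions for illustration).
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(4000)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Lower `snr_db` values yield louder noise relative to the speech, so sampling a range of SNRs during training exposes a model to both quiet and noisy conditions.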

A key challenge is that much of the voice data available in the public domain is of people reading text, such as audiobooks. “The way people speak in the hospital when they are in pain and tired is very different and there is also a use of colloquial language that is not always in the data set,” the CEO said.

Byrdhouse AI combines specialized machine translation engines with large language models (LLMs) to achieve more natural and contextually appropriate translations. Fine-tuning is performed using a proprietary data set that contains industry-specific vocabulary and company-specific terms.

For both products, latency is a critical parameter. “Because it’s real-time, latency is so important,” Huo stressed to the SlatorCon audience.

A trade-off between latency and accuracy is sometimes necessary. “One challenge in the speech translation industry is that you have to wait for the text to stabilize in order for the voice to start speaking,” Huo explained.

Byrdhouse experimented to find the threshold that best balances latency and accuracy.

“What we figured out is that around four seconds [for voice to voice] is the point where you can have a good experience — without feeling like it’s too long — and still be able to create an accurate and effective translation,” Huo concluded.

Huo also pointed out that customers have a higher tolerance for delay in audio translation than in speech-to-text. “If a caption appears after a few seconds, that’s a big deal,” she said. “We try to make it as fast but stable as possible in Byrdhouse and it’s now less than 300 milliseconds.”

The Mabel AI app sells B2B as well as into the public sector. Initially, the startup focused on interactions that would normally happen without an interpreter, such as care-related interactions in a medical ward or in a home for the elderly.

“My estimate is that about 90% of interactions over language barriers in healthcare happen without an interpreter,” Sjöberg explained. “So there is a huge need that is there.”

On June 13, 2024, Sweden-based Mabel AI announced a partnership with German hospital Sophienklinik Hannover to provide AI-based interpretation.

Byrdhouse AI’s own customer discovery journey started internally. The team uses the tool for every external and internal meeting. “We were able to iterate pretty fast before we launched it publicly,” Huo explained.

The focus then shifted to Byrdhouse’s external customers, who are primarily in the manufacturing and international organization sectors. “We became best friends with them,” Huo told the SlatorCon audience. “We wanted to know: what value do they find from Byrdhouse? How are they using it? Why is this important to them?”

The real-time AI speech translation tool is now “like a painkiller” for these companies, according to Huo. “This is a must-have solution,” she elaborated. “Their business won’t function without it.”

The wide-ranging panel also covered questions on business expansion, advances in speech AI, and the role of human interpreters. Recordings are available via Slator’s Pro and Enterprise plans.

You can also read more about Mabel AI and Byrdhouse AI in the Slator 2024 50 under 50 Language AI startups list.