Investor interest in multilingual voice assistants is more than just talk. A recent slew of investments and partnerships in a range of AI-enabled customer service tools attests to the increasing productization of research into machine translation (MT) and speech translation in a growing field now referred to as VaaS, or Voice Assistant as a Service.
As the research behind chatbots and voice assistants matures, the range of use cases is expanding to include both industry and avocation. For example, Canadian voice recognition company Fluent.ai is integrating voice automation into factories managed by European home appliance manufacturer BSH. The Fluent.ai solution offers multilingual and multi-accent support, the goal being to increase line efficiency and improve worker ergonomics.
SenpAI.GG has raised over USD 3m to date — including USD 2.9m from ACT Venture Partners — to support a multilingual voice assistant for hardcore gamers via a desktop app.
Multilingual support is far from a niche experiment: users and shareholders alike now expect heavy hitters, such as Amazon and Google, to add multilingualism to their voice assistant capabilities. In fact, Google Assistant led a July 2021 funding round that helped in-app voice assistant startup Slang Labs raise USD 0.5m. In addition to Google Assistant, participants included 100x Entrepreneurs, Thomas George, Velu Murugan, and Endiya Partners.
According to Slang Labs Co-Founder and CEO Kumar Rangarajan, his company is the first in-app voice assistant backed by Google. Its tool, called “Slang Conva,” embeds a multilingual assistant into an e-commerce app, giving users direct voice interaction with the app’s interface. The array of AI components includes automatic speech recognition (ASR) and text-to-speech (TTS).
Rangarajan said Slang Labs plans to expand from its native India to other English-speaking markets, such as the US, Australia, and England, as well as the Middle East.
Facebook Plays Catch-Up
Naturally, Google competitor Facebook is paving the way for an upgrade of its own voice assistant. Though Facebook confirmed its voice assistant project back in 2019, the impact beyond its own products has been limited, thus far, to a line of Oculus virtual reality headsets. But Facebook’s chatbot, BlenderBot 2.0, shows potential.
In July 2021, Facebook open-sourced BlenderBot 2.0 (an update of the original chatbot Blender from 2020). Built on Facebook’s multilingual speech-to-text translation tech, BlenderBot 2.0 won first place at the multilingual speech translation competition hosted by the International Conference on Spoken Language Translation (IWSLT) in 2021.
From Fast Talk to Fast Food
ConverseNow’s ordering assistants, “George & Becky,” could very well be the poster children for multilingual VaaS in customer-facing roles. In July 2021, the Austin, Texas-based startup, which caters to the restaurant industry, closed a USD 15m series A round led by Craft Ventures.
Also participating were investors from ConverseNow’s USD 3.3m seed round in May 2020, including LiveOak Venture Partners, Tensility Venture Partners, Knoll Ventures, Bala Investments, 2048 Ventures, Bridge Investments, Moneta Ventures, as well as angel investors Federico Castellucci and Ashish Gupta.
George & Becky were designed to handle restaurant orders for high-volume voice channels, such as phone, drive-thru, self-service kiosks, and voice-assisted chat on mobile devices. The pandemic has, of course, driven demand for the service. More customers than ever use these limited-contact channels while restaurants struggle to retain staff. National and multinational restaurant brands already use ConverseNow.
CX automation platform Yellow.ai (formerly Yellow Messenger) uses text and voice AI bots to communicate with customers in more than 100 languages. Clients supply data from their Salesforce, Shopify, or Cisco service accounts, and Yellow.ai deploys automated chat services, which already run on several major messaging services, company websites, and voice assistant platforms, such as Alexa.
In January 2021, the company embarked on a partnership to develop a voice assistant with Microsoft Azure, and in August 2021, Yellow.ai closed a USD 78.15m series C led by WestBridge Capital, bringing total funds raised to more than USD 102m. Participants included new investors Sapphire Ventures and Salesforce Ventures, as well as previous investor Lightspeed Venture Partners.
A Chorus of Voice Assistants
VaaS interest, and investment, is not limited to household names. Take, for instance, Seattle-based Jargon, which started out in 2017 as an on-demand interpreting service. Jargon eventually shifted its focus to developers and created a low-code platform for managing voice application content without the need for a team of engineers. In 2019, Jargon raised some USD 1.8m from Amazon, Crosslink Capital, and Ubiquity Ventures.
International money transfer startup Remitly, which enables users to send money to recipients in more than 100 countries, acquired Jargon in May 2021 for an undisclosed amount. Remitly closed a USD 85m funding round in July 2020 at a USD 1.5bn valuation and reportedly received an unspecified investment from Visa prior to filing initial IPO paperwork in June 2021.
Flying the flag for dialectal speech technologies is Kanari.ai, which claims to detect 19 different Arabic dialects. Among its solutions are “voice driven experiences,” which include voice assistance, voice commerce, and interactive voice response (IVR) systems. Kanari.ai raised around USD 0.5m in a pre-seed round in 2021 and has partnerships with the likes of Microsoft, NVIDIA, and the United Nations.
At least one nonprofit is also getting a piece of the action. Mozilla Common Voice, a public database where volunteers can “donate” recordings of their voices and listen to others’ clips to validate them, received a USD 1.5m investment from NVIDIA just this April.
Startups, researchers, and engineers can use the database to train voice-enabled apps, including multilingual voice assistants. So far, more than 164,000 people worldwide have contributed over 9,000 hours of voice data in 60 languages.
Common Voice plans to use the funds to expand its dataset, add new staff, and engage more communities and volunteers to join the project.