Amazon’s voice-activated intelligent personal assistant, Alexa, can now translate short sentences from English into 36 major languages with the Translated Alexa skill. Developed by language services provider Translated.net, the skill has become the most used translation-related Alexa skill just a few months after its release. It draws 3,000 to 5,000 daily users, according to Simone Perone, Product and Marketing Director for Translated.net.
“What’s interesting and sometimes funny is to read through the reviews to understand how people are using it,” Perone said, quoting one review with a 5-star rating: “I wanted to say sorry to my girlfriend in all the languages of the world.” He explained the skill is mainly used for entertainment and language learning.
The Amazon Echo, the physical device that houses Alexa, dominates the emerging voice-enabled speaker market, beating Google Home 70.6% to 23.8% in market share. Forecasts estimate that 35.6 million US consumers will use the Echo at least once a month in 2017, a 128.9% jump from 2016.
Amazon is no stranger to machine translation (MT). The company acquired MT vendor Safaba in 2015, turned it into an internal MT research group, and opened an office dedicated to MT two years later. Despite this, it was Translated.net that brought translation capabilities to Alexa.
Starting from Smart Chatbots
Perone said they started by experimenting with chatbots. They have a multilingual chatbot that takes over website chat support when representatives are offline.
Clients can talk naturally to the chatbot, which is also accessible via Facebook, giving it specifics such as source and target language and word count. The chatbot can then provide quotes or even take translation orders. Alternatively, clients can upload the document and let the chatbot do the word count.
So was this the final product Translated.net envisioned?
Perone said they originally wanted to create a real-time translator, but soon found that Amazon’s intent behind the Echo was not aligned with this ambitious goal. For instance, Alexa can only be in “listen mode” for less than 10 seconds.
So plans changed. “We decided to go to these voice-controlled devices because we wanted to give users access to professional translation technology in a simple way,” Perone said. “We have translations. Alexa can’t do translations natively. So we decided to give users this feature and see how they react if they like it or not.”
A 3-Layer Tech Stack
The Translated Alexa skill uses a three-layer tech stack to deliver translations of short English phrases.
Amazon’s own Lex engine captures audio and performs natural language processing. The processed output is then sent to Translated.net’s model built inside Alexa, which deciphers the user’s intent, i.e., which phrase to translate into which language. The actual MT happens through MyMemory, Translated.net’s proprietary translation memory technology: a corpus of machine-translated segments, corrected and improved by human translators, with 1.5 billion contributions.
Once the English phrase has been translated, the result is sent to a couple of external vendors of text-to-speech services. These vendors can only process 36 languages, hence the limitation. Translated.net performs some optimization on the final audio output, such as making it louder and clearer.
The result: the requested translation is spoken in an accent native to that language a second after the command has been received by Alexa.
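The three layers described above can be illustrated with a minimal Python sketch. It is not Translated.net’s implementation: the utterance pattern, the `TTS_LANGUAGES` table, and the `fake_translate` and `fake_synthesize` stand-ins are all hypothetical, and the sketch assumes the first layer (speech recognition) has already turned the audio into text.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical subset of the languages the text-to-speech vendors can voice.
TTS_LANGUAGES = {"italian": "it", "french": "fr", "spanish": "es"}


@dataclass
class TranslationIntent:
    phrase: str       # English phrase the user wants translated
    target_code: str  # language code of the requested target language


def parse_intent(utterance: str) -> Optional[TranslationIntent]:
    """Layer 2: work out which phrase to translate into which language."""
    pattern = r"translate (.+) (?:in|into|to) (\w+)"
    match = re.fullmatch(pattern, utterance.strip().lower())
    if match is None:
        return None
    phrase, language = match.groups()
    code = TTS_LANGUAGES.get(language)
    if code is None:
        return None  # no voice available for this language
    return TranslationIntent(phrase, code)


def handle_request(
    utterance: str,
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> Optional[bytes]:
    """Chain intent parsing, MT (layer 3a), and text-to-speech (layer 3b)."""
    intent = parse_intent(utterance)
    if intent is None:
        return None
    translated = translate(intent.phrase, "en", intent.target_code)
    return synthesize(translated, intent.target_code)


# Stand-ins for the MT engine and the text-to-speech vendor (assumptions).
def fake_translate(text: str, source: str, target: str) -> str:
    table = {("good morning", "it"): "buongiorno"}
    return table.get((text, target), text)


def fake_synthesize(text: str, lang: str) -> bytes:
    return f"<audio lang={lang}>{text}</audio>".encode("utf-8")


if __name__ == "__main__":
    print(handle_request("Translate good morning to Italian",
                         fake_translate, fake_synthesize))
```

Passing the translation and speech functions in as parameters mirrors the article’s point that the speech vendors are external and swappable: support for a new language would only require a new entry in the voice table and a vendor (or in-house engine) that can speak it.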
Perone said they intend to develop text-to-speech tech internally for languages not currently supported by their vendors, as MyMemory can translate more than a hundred languages.
Future Business Applications
So what’s the end goal here for Translated.net?
“It’s not a revenue stream for our company,” Perone said. “At the moment, this skill will stay as it is: a translator.” The skill is not only free to use but also ad-free.
The skill received a lot of feedback from users and Amazon itself, which contacted Translated.net to encourage its ongoing development. Perone shared that the skill has been recognized as one of the most engaging skills in the ecosystem for the months of July, August, September, and October 2017.
Translated.net has been improving the skill’s user experience (UX), translations, and text-to-speech. The company will also release the skill in Amazon’s recently opened India store.
Then again, arguably a more important achievement is successfully developing a proof of concept applicable across industries, from B2B to B2C.
“We wanted to explore the opportunities behind Alexa devices in terms of product and face the challenge of building a VUI (voice user interface),” Perone said. Their translation proxy fits any scenario where a user and an app communicate—by speech or text—in two different languages.
“We believe that instant vocal translations will play a big role in the industry of tomorrow,” said Daniele Patrioli, Translated.net’s Vice President of Marketing. “We meant to address our products to early adopters.”
First in a Race That’s Barely Begun
RBC Capital Markets estimates that by 2020 Amazon Alexa will have around 128 million users and bring in USD 10bn in revenue. Though they have some catching up to do, Google and Apple are also on the bandwagon.
The investment bank sees three main revenue streams for this market: device sales, voice-driven shopping sales, and platform revenues. As the number of skills rises, RBC predicts Amazon will likely create a marketplace of paid skills on Alexa.
Meanwhile, Patrioli said that, so far, they are the only translation company to have released an Alexa skill that addresses this future market. They are planning something similar for Google Home while waiting for Apple to release its own voice-controlled device.
Perone weighed in on their development journey, explaining that they had to go through several iterations of the skill, incorporating negative user feedback along the way. User experience (UX) was particularly tricky, but also exciting: “A lot of us we come from the web. We know about UX on the web, but here we have to design a voice user interface.”
“So when it comes time to develop for future applications, we can provide the technology to do that,” Perone said. The same tech stack can support several new services or apps that respond to the increasing demand for instant voice translation.
“In the near future where devices like Amazon Echo become very common in the market, each family will have one in each home,” said Patrioli.
By the time that consumer need becomes the norm, the proof of concept behind their Alexa skill will power applications such as voice-enabled virtual assistants that help users shop online in any language.
Translated.net will then have the knowledge it needs to position itself as a specialized language technology provider for this three-layer stack: voice recognition, adaptive MT, and text-to-speech.