The goal: a single machine learning model that can parse and understand input in many languages. The use case: people interacting with Alexa in their native tongue (among other commercial applications).
On April 20, 2022, Amazon announced three developments toward reaching that goal, collectively termed MMNLU-22, short for Massively Multilingual Natural Language Understanding.
The three developments are the release of a dataset with one million labeled utterances in 51 languages, along with open-source code; a competition using that dataset (deadline: June 1, 2022); and a workshop at EMNLP 2022, one of the world’s biggest natural language processing conferences (Abu Dhabi, December 7–11, 2022).
Amazon called the dataset MASSIVE: Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual-assistant Evaluation. The dataset comes with examples of how to perform MMNLU modeling so others can recreate the baseline results for two critical NLU tasks — intent classification and slot filling — as described in the SLURP (SLU resource package) paper.
NLU is a sub-discipline of natural language processing (NLP), and Amazon said it is focusing on NLU as a component of spoken-language understanding (SLU), in which audio is converted into text before NLU is performed. Alexa is one example of an SLU-based virtual assistant.
The MASSIVE dataset comprises “one million realistic, parallel, labeled virtual-assistant text utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots.”
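To make the labeling scheme concrete, here is a minimal sketch of what one such labeled utterance might look like, using a bracketed "[slot : value]" inline annotation. The exact field names and serialization are assumptions for illustration, not the dataset's documented format:

```python
import re

# Hypothetical MASSIVE-style record: the intent labels the whole utterance,
# while slots are marked inline with "[slot : value]" brackets (field names
# and annotation syntax are illustrative assumptions).
record = {
    "locale": "en-US",
    "intent": "alarm_set",
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
}

def extract_slots(annot_utt: str) -> dict:
    """Pull (slot, value) pairs out of a bracket-annotated utterance."""
    return {
        slot.strip(): value.strip()
        for slot, value in re.findall(r"\[\s*([^:\]]+):([^\]]+)\]", annot_utt)
    }

print(record["intent"])                    # alarm_set
print(extract_slots(record["annot_utt"]))  # {'time': 'nine am', 'date': 'friday'}
```

Parallel records in the other 50 locales would share the same intent and slot labels, which is what makes cross-lingual transfer and evaluation possible.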
Amazon created the dataset “by tasking professional translators to localize or translate the English-only SLURP dataset into 50 typologically diverse languages from 29 genera, including low-resource languages.”
Amazon is trying to overcome a major obstacle for SLU-based virtual assistants like Alexa: academic and industrial NLU R&D remains limited to a handful of languages.
“One difficulty in creating massively multilingual NLU models is the lack of labeled data for training and evaluation — particularly data that is realistic for a given task and natural for a given language. High naturalness typically requires human vetting, which is often costly.”
Hence, R&D is “limited to a small subset of the world’s 7,000+ languages,” Amazon pointed out. “By learning a shared data representation that spans languages, the model can transfer knowledge from languages with abundant training data to those in which training data is scarce.”
Good Start, Wider R&D Scope Needed
Felix Laumann, CEO of NeuralSpace, pointed out that a single model able to understand voice commands in any language is very beneficial when users switch between languages while speaking. In India, for example, a Hindi-English hybrid (Hinglish) is common. The same holds in other regions, such as MENA, where there is “a mix of an Arabic dialect with English words in between in the English / Latin alphabet and not the Arabic alphabet when written (Arabizi). We see similar mixes of languages across Africa, especially in Nigeria.”
Generally, Laumann explained, training a multilingual model requires more data per language than training a single monolingual model. “This is an issue for many of our customers and probably most companies that are not as large as Amazon. Their alternative is to include a simple language identification model prior to letting the conversational data (i.e., a short voice command, question, written comment, or message) be processed by the NLU model.”
Companies can then train one model in each language they expect their users to speak or write in — “and let the language identification model allocate the input to the specific model that ‘understands’ that one language. When users are targeted who are known to communicate in mixed languages (e.g., Hinglish, Arabizi), the language identification model can allocate the input to such a ‘more narrow’ multilingual model.”
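The routing architecture Laumann describes can be sketched as follows. The `identify_language` heuristic and the model classes are hypothetical stand-ins; a real system would use a trained language-identification classifier and actual NLU models:

```python
def identify_language(text: str) -> str:
    """Stand-in language ID; a real system would use a trained classifier."""
    if any(w in text.lower() for w in ("kal", "karna")):  # toy Hinglish cue
        return "hinglish"
    return "en"

class EchoNLU:
    """Placeholder NLU model that just reports which model handled the input."""
    def __init__(self, name: str):
        self.name = name
    def parse(self, text: str) -> dict:
        return {"model": self.name, "text": text}

# One monolingual model per expected language, plus a narrower multilingual
# model for user groups known to communicate in mixed languages.
models = {
    "en": EchoNLU("english-only"),
    "hi": EchoNLU("hindi-only"),
    "hinglish": EchoNLU("hindi-english-multilingual"),
}

def route(text: str) -> dict:
    """Let the language ID step allocate input to the matching NLU model."""
    return models[identify_language(text)].parse(text)

print(route("set an alarm")["model"])           # english-only
print(route("kal meeting set karna")["model"])  # hindi-english-multilingual
```

The design trade-off is explicit here: each per-language model can be trained on modest monolingual data, at the cost of an extra classification step and a dedicated multilingual model for each known code-mixed population.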
The CEO concluded, “We at NeuralSpace see more use for such narrower multilingual models than for a single multilingual model. The impact on NLP in low-resource languages is noteworthy, but will not change the problems for many companies, in my opinion.”
The dataset-size problem described above prevails at most companies, according to Laumann, and Amazon’s datasets are heavily geared toward virtual-assistant use cases.
Indeed, Amazon hinted at where it hopes to apply these latest developments commercially by noting that, of the more than 100 million smart speakers sold worldwide (e.g., Echo), most use a voice interface exclusively and rely on NLU to function. The company estimated that the number of virtual assistants will reach eight billion by 2023, and most will be on smartphones.
Editor’s Note: This article was updated to include quotes from Felix Laumann, CEO, NeuralSpace.