BabelNet, a multilingual encyclopedic dictionary and semantic network created through funding from the European Research Council (ERC), is getting its feet wet in terms of commercial partnerships. At least one computer-assisted translation (CAT) company is already using it to improve its translations, according to its project head, Prof. Roberto Navigli.
Navigli is an associate professor at the Linguistic Computing Laboratory of the Sapienza University of Rome. He holds a PhD in Computer Science and is the principal investigator of MultiJEDI (Multilingual Joint word sensE DIsambiguation), a EUR 1.3 million (USD 1.4 million) five-year ERC grant that started in 2011. It aims to create “large-scale lexical resources for dozens of languages,” and enable “multilingual text understanding.” BabelNet is one of MultiJEDI’s outputs.
The Largest Multilingual and Semantic Encyclopedic Dictionary
BabelNet is a complex project with equally nuanced potential applications across a wide variety of industries, so Slator reached out to Navigli to understand what it is and how it affects the language services market. “BabelNet is the largest multilingual encyclopedic dictionary and semantic network: it synergistically integrates dozens of dictionaries and encyclopedia into a unified resource covering 272 languages and offering 14 million high-quality entries,” Navigli said, trying to summarize his work in a nutshell.
Currently on version 3.5, Navigli’s BabelNet is the result of integrating Wikipedia, the world’s largest multilingual online encyclopedia, and WordNet, the world’s most popular computational lexicon of English. Aside from these two, BabelNet also incorporates other lexical semantic resources such as OmegaWiki, Wiktionary, Wikidata, Wikiquote, Open Multilingual WordNet, VerbNet, WoNeF, Microsoft Terminology, GeoNames, and ImageNet, all connected via a linking algorithm, with lexical gaps filled in via machine translation.
What this means for BabelNet users is that they now have “a huge multilingual dictionary which visually depicts concepts, defines and translates them in hundreds of languages and provides novel ways to browse knowledge,” according to Navigli. In fact, for “groundbreaking work in overcoming language barriers […] making use of heterogeneous data sources,” BabelNet won the META prize from META-NET, a “Network of Excellence” of 60 research centers in 34 countries whose mission is “building the technological foundations of a multilingual European information society.”
BabelNet’s Business Applications in Translation
From a business perspective, BabelNet’s potential looks quite different from what an individual user sees.
“For the entrepreneur, BabelNet is an enabler of high-quality and wide-coverage multilinguality in any application, from computer-assisted translation, machine translation, visual representation of documents to semantic document similarity, news analytics and event extraction,” Navigli told Slator, adding that Babelfy, “a powerful disambiguation service,” enables many of BabelNet’s capabilities. Babelfy “can semantically index text written in any language, including text written in mixed languages (e.g. mixed Chinese and English),” Navigli said.
The professor was frank about how easily these technical details are lost on anyone trying to discover BabelNet’s commercial applications. “Many companies just do not realize what this can mean for them,” he said. For instance, using BabelNet and Babelfy, a company can associate “documents… with concepts and named entities which are lexicalized in hundreds of languages, thus enabling language-independent document search, comparison and analysis, and opening up new scenarios for multilingual event detection and linking.”
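To make the idea of language-independent search concrete, here is a minimal Python sketch. It is not BabelNet's or Babelfy's API: the lexicon, the concept IDs, and the function names (`annotate`, `build_index`, `search`) are all hypothetical, and the naive word lookup stands in for real disambiguation. It only shows why tagging documents with shared concepts lets a query in one language retrieve documents in another.

```python
from collections import defaultdict

# Hypothetical toy lexicon: surface word -> language-independent concept ID.
# The IDs are made up for illustration; they are not real BabelNet synset IDs.
TOY_LEXICON = {
    "bank": "concept:bank_institution",
    "banco": "concept:bank_institution",
    "banque": "concept:bank_institution",
    "loan": "concept:loan",
    "préstamo": "concept:loan",
    "prêt": "concept:loan",
}


def annotate(text):
    """Return the concept IDs found in text (naive lookup, no disambiguation)."""
    tokens = text.lower().split()
    return {TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON}


def build_index(docs):
    """Map each concept ID to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents sharing at least one concept with the query."""
    hits = set()
    for concept in annotate(query):
        hits |= index.get(concept, set())
    return hits


docs = {
    "en-1": "the bank approved the loan",
    "es-1": "el banco aprobó el préstamo",
    "fr-1": "la banque est fermée",
}
index = build_index(docs)

# A French query term retrieves the English and Spanish documents about loans.
print(sorted(search(index, "prêt")))
```

Because the index is keyed by concept rather than by word, adding a new language in this sketch only means extending the lexicon; the stored documents do not need to be re-indexed per language.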
Other examples Navigli gave are current projects and potential partnerships in the works: “We are currently negotiating with an important newspaper publisher for a semantic service which will radically change the way journalists annotate newspaper articles and readers navigate through the online news. BabelNet will enable totally new ways of browsing the newspaper following the reader’s interests.”
And there is also the previously mentioned CAT company, which is XTM International. According to Navigli, XTM “is already using BabelNet to improve the translations and the user experience they provide to human translators.” In the field of statistical machine translation, Navigli notes that he heard from colleagues in different countries “that the performance of a standard system improves considerably by ‘just’ including all the translations in BabelNet.”
“Simply Countless” Commercial Applications
BabelNet’s business applications extend well beyond translation though.
“We just started a new project with a government body to create the highest-performance system for calculating the semantic similarity between patents across languages (e.g. comparing patents written in Spanish with patents written in English) and determine the closest patent proposals to a new one and possible cases of plagiarism, also across languages,” Navigli shared. This might be relevant to the EU’s recent forays into creating a unitary patent and its stated intention to use machine translation to make the process easier.
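The article does not say how the patent project computes similarity. As a rough illustration only, the Python sketch below swaps in a simple Jaccard overlap between sets of concepts, using a hypothetical word-to-concept table (`TOY_CONCEPTS`) and hypothetical helper names. It shows the general shape of ranking existing patents against a new one across languages, not the project's actual method.

```python
# Hypothetical word -> concept table covering English and Spanish terms.
# The concept IDs are invented for this example.
TOY_CONCEPTS = {
    "engine": "c:engine", "motor": "c:engine",
    "valve": "c:valve", "válvula": "c:valve",
    "fuel": "c:fuel", "combustible": "c:fuel",
    "battery": "c:battery", "batería": "c:battery",
}


def concepts(text):
    """Return the set of concept IDs mentioned in text (naive lookup)."""
    return {TOY_CONCEPTS[t] for t in text.lower().split() if t in TOY_CONCEPTS}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest(new_text, corpus):
    """Rank corpus entries by concept overlap with new_text, best first."""
    new_concepts = concepts(new_text)
    scored = [(jaccard(new_concepts, concepts(text)), patent_id)
              for patent_id, text in corpus.items()]
    return sorted(scored, reverse=True)


# Two made-up Spanish patent snippets compared against a new English one.
corpus = {
    "es-A": "motor con válvula de combustible",
    "es-B": "batería recargable",
}
ranking = closest("engine with fuel valve", corpus)
print(ranking)
```

A very high score between documents in different languages would be a signal worth human review for possible overlap; a production system would need proper disambiguation and a far richer similarity measure than this.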
“We are also negotiating with companies and government bodies for an innovative multilingual term, concept, and entity extraction system from domain texts written in different languages. The concept and entity extraction process can also lead to the creation of a ‘custom,’ potentially proprietary, part of BabelNet which is attached to the core BabelNet knowledge base but is used for the specific interests of the customer,” Navigli said.
Obviously, BabelNet is starting to garner more attention. In terms of revenue generation or a business model, Navigli said they have API services for BabelNet and Babelfy, “but we also have requests to license both as an offline product or service.” Most of the time, Navigli explained, “companies do not really need the whole semantic network, so a customized version of it can be provided.”
BabelNet is not the only language translation or technology project funded by EU organizations that has commercial potential. The Horizon 2020-funded EUR 3 million project TraMOOC may also be commercialized at the end of its development run. Another state-funded initiative is Spain’s USD 100 million investment in natural language processing (NLP) and machine translation.
However, TraMOOC is limited to MOOC platforms and Spain is focused on becoming a leader in NLP. As for BabelNet, “the applications are simply countless,” Navigli enthuses.