The German Research Center for Artificial Intelligence (DFKI) is rallying support for the Human Language Project, a long-term and large-scale European research, development and innovation program whose ambitious scientific goal is Deep Natural Language Understanding by 2030.
Advances in Natural Language Processing (NLP) are expected to be able to tackle the many challenges of Europe’s multilingual set-up. Hence, the campaign is on for the project to be an EU Flagship Project, a science-driven research initiative that runs for about 10 years, with a total budget of around EUR 1bn.
According to the EU website, Future and Emerging Technologies (FET) Flagship Projects bring together a large number of research organizations and cannot be undertaken by one commission or a single member state.
Dr. Georg Rehm, Senior Researcher at DFKI, is encouraging organizations across the EU to provide letters of support to the advocacy.
“The idea of initiating a Human Language Project of this kind and scope – large-scale and long-term – was born several years ago at a workshop on Multilingual Europe and Language Equality in the European Parliament,” Rehm told Slator.
“Back then we had just published the key findings of our META-NET White Paper Series that 21 European languages are in danger of digital extinction. This means that more than 20 European languages, most of the languages with smaller numbers of speakers, cannot be fully used online. Not all services are available in these languages and not all services are able, internally, to process content in these languages,” he explained.
This means that speakers of languages such as Maltese, Lithuanian or Icelandic, which receive far less support than languages with many more speakers, revert to using bigger languages online, with English being the prime example.
“The unfortunate end would be what we call ‘digital language extinction’,” he explained further.
Future-Proof All of Our Languages
Rehm clarified that it’s not only the DFKI that is pushing for the Human Language Project but many different stakeholders – from research centers to universities to members of the affected language communities, to translators to language service providers as well as smaller and bigger companies.
“All of these stakeholders do not only have an interest in bringing about our key scientific goal, Deep Natural Language Understanding by 2030, but especially in providing a balanced technology base so that we can future-proof all of our languages and make sure that we can use them online for many years to come,” he said.
He said the call for letters of support for the EU project, which the DFKI recently circulated, has been met with an unprecedented level of enthusiasm.
“It was really amazing. The letters just kept pouring in!” he said.
What will it take for a project to become an EU FET Flagship Project? Rehm said it takes political will to set such a large project in motion. The politicians and administrators in Brussels need to be convinced that an investment of this size makes sense for European society, for European industry, and for European research.
“With the proposal that we submitted on 20 February 2018 we’re trying to get a preparatory project. The goal of this prep project is to develop, together with the whole community, the key building blocks of an EU Flagship Project: the research roadmap, the governance structure, the unifying vision, the overall setup,” he said.
Undoubtedly, it is a long-term process. Rehm said DFKI’s role would be to coordinate this prep project and to make sure that the consortium delivers a mature, well thought-through, sustainable EU Flagship concept that the whole community endorses and stands behind.
Ambitious Scientific Goal
Rehm explained that with NLP, researchers can identify named entities, summarize documents, and translate a text written in one language into another language. However, at the end of the day, it remains just simple processing.
“What typical NLP pipelines or systems lack is genuine, deep understanding of language. It’s not a new concept, the concept of Natural Language Understanding has been around for decades. We believe that our field is now in the position of actually being able to successfully tackle the next step, from simple processing to actual understanding of human language,” he said.
“If we manage to reach this breakthrough, the resulting Language Technologies would be a game changer in terms of massively increased accuracy, coverage, robustness, and quality. That’s what we’re trying to bring about,” he emphasized.
The senior DFKI researcher added that all language technologies would massively benefit from a Human Language Project — from written language translation to speech-to-speech translation, from all sorts of text analytics processes (summarization, entity recognition, relation extraction) to more natural and proactive conversational interfaces.
“If we manage to create a truly multilingual continent that is supported through sophisticated language technologies, we would’ve finally overcome language and communication barriers. We would also be able to help the Digital Single Market by putting in place technologies that get rid of language barriers online, enabling online shops to sell their services or products in many different markets and languages,” he said.
Mixed Funding Model
Rehm explained that EU Flagship Projects are typically funded through a mixed model. For the Human Language Project, the mix would most likely involve the European Commission, member states and European industry, including national funding programs and funding sources.
He said a few countries are already active in future-proofing their languages and building basic or sophisticated technologies for them. However, these activities are carried out in a fragmented, isolated and uncoordinated way.
“I see the Human Language Project also as an umbrella under which these national activities can be coordinated to identify and make use of synergies,” he said.