EU to Grant EUR 5M for Low-Resource Language Data and MT Implementation

In its 2018 work program, the Connecting Europe Facility – Telecommunications Sector (CEF Telecom) is making up to EUR 5M in grant funding available to projects that support two goals: collecting more language resources for eTranslation, the European Union’s newly launched neural machine translation (NMT) facility, and integrating eTranslation into online services that require multilingual functionality.

CEF Telecom said in its call for proposals that applicants must put forward collaborative language resource projects, which include “identification, processing (anonymization, aggregation, alignment, conversion, etc.) and collecting language resources in the EU member states.”

The deadline for submission of grant applications is September 18, 2018. The preparation and signature of grant agreements will take place between January and June 2019.

“Priority will be given to resources in those languages for which there is not enough data available to offer good quality eTranslation services,” CEF said. “Generic language resources may also be targeted to improve the overall quality of eTranslation for a broader set of text styles and to extend its lexical coverage.”

Real-Life Use Cases

Also needed are proposals for integration projects that will use the eTranslation service (alone or in combination with other commercial tools) in relevant digital services within the EU, especially those serving public administrations, citizens, and businesses.

According to CEF, while eTranslation is mainly intended to be integrated into such digital services, “it also offers useful stand-alone services for the translation of documents and snippets of text.”

“Priority will be given to proposals applying mature cutting-edge language technologies other than machine translation (e.g. for interactive translation, semantic interoperability),” CEF explained.

From MT@EC to eTranslation

For years, the EU has been using MT@EC, an online MT service based on the MOSES open-source translation toolkit, a statistical machine translation (SMT) system. This facility has been retired following the launch of eTranslation, which was built using neural network technology.

CEF, however, said that eTranslation builds on MT@EC, “whose translation engines are trained using the vast Euramis translation memories, comprising over 1 billion sentences in the 24 official EU languages, produced by the translators of the EU institutions over the past decades.”

But unlike MT@EC, which offers general-purpose translation, eTranslation “will gradually adapt to specific terminology and text types that are typically used in specific contexts, such as tender documents, legal texts, medical terminology and so forth. It will also help reduce the time and cost of translating documents,” according to CEF.

Hence, eTranslation requires a much broader range of language resources and translation data.

With eTranslation, the EU has gone all out on NMT. CEF recently released the last 15 neural engines for translation, completing its migration to artificial intelligence-based translation. According to a recent CEF blog post, the latest release covers automated translation from English into Greek, Spanish, Italian, Maltese, Portuguese, and Romanian, and from Danish, Dutch, and Slovenian into English.

This means that NMT is now available for all 24 official languages of the EU. More specifically, eTranslation, which was developed by the EU’s Directorate-General for Translation, can translate documents between all official EU languages.

The push for neural MT at the EU began years ago with funding from CEF and co-funding from various EU research and innovation programs. Its goal, according to CEF, is to continuously develop machine translation capabilities that can be used by European and national public administrations to exchange information across borders.

Image: Andrus Ansip, Vice-President of the EC, who oversees the Digital Single Market project, of which CEF is a part. Image Source: EC – Audiovisual Service