Human-in-the-loop translation services and language technology provider Pangeanic was awarded almost 2M € to gather resources in all official EU language combinations and build dockerized, deployable engines for European Public Administrations.
Pangeanic convinced two other major European machine translation companies (Tilde and KantanMT, pre-acquisition) to join forces in a major data acquisition effort. The project deviates from the common “bridge” practice of using English as a pivot to translate between some language pairs, and instead builds direct, one-to-one engines for more than 500 language arcs. The deliverable will be a dockerized collection of engines translating into and out of all European language pairs, which Public Administrations will be able to use and deploy as national infrastructure. The General Technical Office of the Spanish Language Technology Plan (Department for Digital Advancement) has already collaborated with these companies on previous projects, and it will coordinate the evaluation of the results to ensure the engines are production-ready for use in Public Administrations.
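The scale of “more than 500 language arcs” can be illustrated with a little arithmetic. The sketch below is not part of the project; it assumes the 24 official EU languages and counts each translation direction as a separate arc. The function names `direct_arcs` and `pivot_arcs` are hypothetical, chosen only for this illustration.

```python
from itertools import permutations

# Assumption: the 24 official EU languages, as ISO 639-1 codes.
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def direct_arcs(languages):
    """Every ordered (source, target) pair: one engine per direction."""
    return list(permutations(languages, 2))


def pivot_arcs(languages, pivot="en"):
    """Engines needed if every translation is bridged through one pivot."""
    others = [lang for lang in languages if lang != pivot]
    into_pivot = [(lang, pivot) for lang in others]
    out_of_pivot = [(pivot, lang) for lang in others]
    return into_pivot + out_of_pivot


arcs = direct_arcs(EU_LANGUAGES)
bridged = pivot_arcs(EU_LANGUAGES)

print(len(arcs))     # 24 * 23 = 552 direct arcs
print(len(bridged))  # 23 * 2 = 46 arcs when bridging through English
```

Under these assumptions, direct coverage needs 552 engines against 46 for an English pivot, which is why the data acquisition effort is so much larger. In exchange, a pair such as Spanish to French is translated in one step rather than two.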
Manuel Herranz, Pangeanic CEO, said: “eTranslation is already offering a great service to European Public Administrations that require API connections or document translation, including Icelandic and Norwegian. However, some European Public Administrations need to go beyond the service to actually integrate machine translation as part of their national digital infrastructure – sometimes because the country is bilingual or multilingual, sometimes because they need to digest documentation from other Member States, for conflict resolution, to assess data in other languages, to integrate it in security or defense, or simply to help their own staff translators. There are many reasons why, in the 21st century, a country’s public sector would need machine translation as a State technology and a service to its citizens.”
Neural Translation for the EU (www.nteu.eu) will offer a translation panel in Q2 2020 so that European users can test the accuracy of the first language arcs (for example, Spanish into and out of French, German and Italian; Latvian into and out of Baltic languages; and English into and out of Romanian, Dutch and Bulgarian). It will also offer some text handling features.
The European Commission’s interest in the nteu.eu project lies in its objective of extending the coverage of the current eTranslation system, promoted by the Commission itself. Translation and language technologies are a key tool in the European strategy to create a digital single market across language barriers.
Given how heavily machine learning technology depends on data, the Consortium’s challenge is to obtain a training corpus of sufficient quality and quantity to train the different engines, with both bilingual and monolingual data. To cover the language pairs that have less initial data, the Consortium plans to use a mixture of sources: data from automatic text generation techniques using state-of-the-art multilayer neural networks, chatbot-generated content, data from other projects such as Paracrawl, selected datasets from DGT clean data, data from the companies’ own repositories, TAUS and other sources.
The project received coverage in the Spanish press and technology magazines:
- La Razón: https://innovadores.larazon.es/es/not/el-nuevo-google-translate-de-la-ue-tiene-sello-espanol
- Blog RuralVía : https://blog.ruralvia.com/sabias-que-una-empresa-espanola-desarrollara-el-google-translate-de-la-union-europea/
The project’s Grant Agreement number is INEA/CEF/ICT/A2018/1816500 for project proposal 2018-EU-IA-0051.