The European Union is pouring fresh investment into language data under a new EUR 8m (USD 8.7m) contract that has been awarded to a consortium of four organizations.
The contract, named “A Common European Language Data Space” (LDS), centers on language data sharing and exchange. Specifically, the LDS will lead to the “creation of an infrastructure for sharing and exchanging language data, tools, services and models; to define — in full respect for European values and compliance with European and / or national rules — commonly agreed principles and practices for the Language Data Space.”
In addition, the contractors will deploy the Language Data Space and “promote its increasing uptake across the EU via a series of communication and dissemination activities as well as social media campaigns.”
The winners of the DG CONNECT-led contract were announced on January 18, 2023, after a competitive process in which three tenders were received. The winners are the German Research Center for Artificial Intelligence (DFKI), the Evaluations and Language Resources Distribution Agency (ELDA), the Athena Research and Innovation Center in Information, Communication, and Knowledge Technologies (ILSP), and Tilde.
The contract will run for three years, with the possibility of a one-year extension.
Language Resources Monetization and Re-use
The original call for tenders, which ran between July and September 2022, explained that “the LDS will be part of a connected and competitive European data economy, supporting the valorization and re-use of language resources within the European Data Spaces Ecosystem.”
More specifically, the LDS will give participants access to a single platform on which they can share and monetize their language data as well as other language resources, such as language models, tools, or services.
The LDS will therefore considerably increase the availability of high-quality data that is essential for the development and deployment of large language models (LLMs) and other AI-based language technology services for a variety of industries.
European Language Technology (ELT), the joint communication channel for the sister projects European Language Grid (ELG) and European Language Equality (ELE), announced in its January 2023 newsletter that the LDS will be connected to existing repositories such as ELRC-SHARE and ELG.
According to ELT, the stakeholder groups specifically targeted include language technology providers and the media industry, as well as research, public administrations, cultural associations, NGOs, and European citizens.