Address from Philippe Gelin

Dear All,

What a year, what a decade!

The European Language Resource Coordination (ELRC) initiative, as we know it, is now coming to an end, but it has been a cornerstone of the multilingual digital ecosystem that is taking shape in Europe.

The EC now offers a significant set of state-of-the-art Language Technology tools, free of charge, to the public sector, SMEs, NGOs and academia. But high-performing Language Technologies are only possible with clean, high-quality language training data. With the support of the Member States and the CEF/DIGITAL associated countries, ELRC has been instrumental in collecting language data and giving access to it for a range of languages.

Since the start of ELRC in June 2014, 3,306 Language Resources have been made available through the ELRC-SHARE repository. Researchers, developers, administrations and language professionals have used them not only to work towards the preservation of digitally endangered and lesser-used languages, but also to help lower language barriers, making it easier for people to access information, even in times of crisis. What fantastic responses we saw when ELRC called for COVID-19-specific text data, or when it organised data collection to build Ukrainian machine translation right after the war started!

6 conferences and 86 workshops! Europe is large and diverse. ELRC never stopped promoting the collection of multilingual language data at national, regional and local levels. Some workshops enjoyed record levels of attendance, while the conferences, associated with major gatherings in the field or with EU Council presidency events, helped enormously in increasing visibility and highlighting the importance of both language data and the latest language technologies.

The ELRC White Paper series was also a fantastic medium for sharing information. The latest one, “AI for a Multilingual Europe”, provides valuable insights into the state of language resources in each of the countries addressed by ELRC. These White Papers make clear that the importance of language data is not equally recognised across all the Member States.

The future is bright and ELRC has been an essential actor in shaping it. Under the DIGITAL Europe Programme, the Language Data Space (LDS) project will start in January 2023. This project will take the effort of ELRC to the next level and create a whole ecosystem around language data by advancing the collection and exchange of language data in the public and private sectors. Beyond the LDS, and thanks to the Digital Decade Policy Programme, we are also actively looking into how to further synchronise and federate Member States' efforts to use language data, for instance by building large language models (LLMs) and supporting economies of scale.

Last but not least, under the HORIZON Europe Programme, a call has just been published on ‘Natural Language Understanding and Interaction in Advanced Language Technologies’.

Let me wish you all the best for the end-of-year break, and for our future endeavours.

Philippe Gelin
Head of DG/CONNECT – European Commission

2022-12-20