2024/04/12

Study on the Pan-European Railway Data Factory, as the basis for AI-based railway operations, has been concluded

The European railway sector is currently facing one of the biggest technological leaps in its history: many railway infrastructure managers and railway companies are striving for a high degree of automation and digitization of railway operations. The aim is to significantly increase the capacity and reliability of rail operations. In Germany, the sector initiative Digitale Schiene Deutschland (DSD) has been set up for this purpose.

Automation and digitization go hand in hand with the increasing generation and use of data in the rail system. Especially in the context of fully automated driving (so-called Grade of Automation 4, GoA4), sensors and cameras are needed so that trains can react automatically to hazards in the rail environment by means of artificial intelligence (AI). Developing such AI software for environment perception requires very large amounts of high-quality data. However, it appears that individual railway companies or railway vendors will not be able to collect enough sensor data on their own to sufficiently train AI for fully automated rail operations.

For this reason, it is assumed that a pan-European Railway Data Factory is needed: an ecosystem with a common infrastructure that enables railway companies and suppliers to collect and process sensor data and make it available for shared use. In this way, simulations can be carried out and AI models can be trained, certified and used in automated railway operations.

The now-concluded CEF2 RailDataFactory project was a study jointly conducted by DB, SNCF and NS and co-funded by the European Health and Digital Executive Agency (HaDEA). Its aim was to assess the feasibility of a pan-European Railway Data Factory from a technical, economic, legal, regulatory and operational perspective and to determine the next steps towards deploying it.

The study has achieved all planned objectives. In particular, it has

identified the main operational scenarios and use cases that a data factory should cover;
derived from these scenarios and use cases the requirements for the data factory infrastructure (in particular with respect to the Pan-European Railway Data Factory Backbone Network, security, and the data and IT platforms);
determined the legal and regulatory aspects to be considered, as well as a possible economic incentive model for the pan-European Railway Data Factory; and
developed concrete recommendations on how a pan-European Railway Data Factory could be set up and grown, including specific governance proposals.

In short, the study has helped to underline the value of a pan-European Railway Data Factory for the development of fully automated driving, as well as the role of such an infrastructure and ecosystem as an overall catalyst for technological advancement and innovation in the railway sector. Various aspects that require special attention have been identified, such as legal and regulatory questions, but no major showstoppers were found.

All public deliverables of the project are now available under the links below.

We would like to use this occasion to thank our partners from SNCF and NS for the good and very efficient collaboration in this project!

Public deliverables:

Read more about Data Factory and multi-sensor data set

Data Factory - "Data Production" for the training of AI software

First freely available multi-sensor data set for machine learning for the development of fully automated driving: OSDaR23
