Intelligent ecosystem to improve the governance, the sharing, and the re-use of health data for rare cancers

Pilot data governance

The deliverable “Pilot Data Governance” details the work done so far by the consortium to identify the legal and ethical requirements for deploying the pilot projects across the 11 clinical centers involved in IDEA4RC. Pilot projects are case studies that will serve to test the IDEA4RC ecosystem. They will be selected from among the research questions that rare cancer experts identified in deliverable “Pilots selection” (D8.1).

Legal requirements to share health data for research purposes

Legal and ethical requirements concerning the secondary use of health data for research purposes are currently set by the General Data Protection Regulation (GDPR), adopted by the European Parliament in 2016 and subsequently implemented in the member states through national legislation over the following years. The GDPR identifies the safeguards that should be put in place when processing personal data for purposes different from those for which they were initially collected. This is why it applies to the activities of IDEA4RC, which seeks to improve knowledge of rare cancers by exploiting the health data that clinical centers routinely collect to deliver care to patients suffering from these diseases.

The GDPR requires identifying the roles played by the different actors involved in the data processing activities. Depending on their roles, the actors of the platform need to fulfil different obligations and put in place different kinds of safety measures. These obligations are specified in data-sharing agreements.

Researchers at the European Centre for Certification and Privacy (ECCP), the legal partner of the IDEA4RC consortium, proposed three possible data processing scenarios. The scenarios differ in how the analysis would be executed on the platform if the data user (the researcher who wishes to exploit the data to answer a specific research question) were to receive the green light from the data providers, i.e. the clinical centers. In each scenario, the data user and the data providers play different roles and need to comply with different requirements, which also depend on how the GDPR has been interpreted in national legislation and in the guidelines of data protection authorities.

During the last month, the legal experts of the clinical centers were presented with these scenarios and asked to indicate their preferred one.

The results of this survey show that the preferred scenario varies across centers, due to the lack of homogeneity among national and local legislation and guidelines. This lack of homogeneity makes it necessary to establish multiple types of agreements to accommodate each center’s needs.

The federated approach and the forthcoming European Health Data Space

IDEA4RC adopts a federated learning approach to allow researchers to perform analyses on the data provided by different clinical centers. “Federated” means that data never leave the hospital. Each center’s data are stored in a “capsule”, a secure processing environment belonging to the IDEA4RC network. The analysis algorithms are executed locally in one capsule after the other, and only the results of the execution, for example the model parameters, are shared with the researcher. This approach simplifies the data-sharing agreements envisaged by the GDPR, which are the legal contracts that both the data user and the data provider need to sign to allow data analyses.
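The idea of running an algorithm in one capsule after the other can be illustrated with a toy sketch. This is not the IDEA4RC implementation: the names `Capsule`, `run`, `update_mean`, and `federated_run`, as well as the sample values, are hypothetical and chosen only for illustration. The sketch assumes that the "result" passed between capsules is a small aggregate (a record count and a running mean) rather than a full model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# All names below are hypothetical and only illustrate the principle;
# they are not part of the IDEA4RC platform.

State = Tuple[int, float]  # (number of records seen so far, running mean)


@dataclass
class Capsule:
    """Stands in for one center's secure processing environment."""
    center: str
    _records: List[float] = field(repr=False)  # local data, never returned

    def run(self, algorithm: Callable[[List[float], State], State],
            state: State) -> State:
        # The algorithm executes next to the data; only its output leaves.
        return algorithm(self._records, state)


def update_mean(records: List[float], state: State) -> State:
    """Fold the local records into the running (count, mean) pair."""
    n, mean = state
    for x in records:
        n += 1
        mean += (x - mean) / n
    return n, mean


def federated_run(capsules: List[Capsule],
                  algorithm: Callable[[List[float], State], State],
                  initial: State) -> State:
    """Visit the capsules one after the other, carrying only the state."""
    state = initial
    for capsule in capsules:
        state = capsule.run(algorithm, state)
    return state


# Made-up example values (e.g. patient ages held by two centers).
capsules = [
    Capsule("center_a", [61.0, 47.0, 70.0]),
    Capsule("center_b", [55.0, 66.0]),
]
count, mean_age = federated_run(capsules, update_mean, (0, 0.0))
print(count, round(mean_age, 1))
```

In this sketch the researcher only ever sees `count` and `mean_age`; the individual records stay inside each `Capsule`. A real deployment would need further safeguards that the sketch leaves out, since aggregates computed on very small groups can still reveal information about individuals.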

Storing data inside capsules designed to be privacy-preserving and secure also complies with the European Health Data Space (EHDS) requirement for secure processing environments. The EHDS is the forthcoming EU regulation on the primary and secondary use of health data in the EU. The European Parliament adopted the regulation on 24 April 2024; it will take around two years for the regulation to fully enter into force.

This is why IDEA4RC researchers are taking this regulation into account in the development of the data governance layer, the component of the IDEA4RC ecosystem that will regulate researchers’ access to the platform once it is up and running.

The future governance of IDEA4RC

The last chapter of the deliverable presents the discussion initiated with partners on IDEA4RC governance, that is, the governing bodies, rules, and procedures for accessing and managing the IDEA4RC federated ecosystem after the end of the project.

This discussion has been carried out through a series of initiatives organized by the coordinators and the Utrecht University researchers. The starting point was the co-creation workshop held in Madrid during the third plenary meeting (a report is available here), where participants discussed three possible implementation scenarios for IDEA4RC (the scenarios were defined in deliverable “Data ecosystem baseline value positions, value analysis, and scenarios”, a summary of which can be found here).

Based on the insights gathered during the co-creation workshops, the Utrecht University team and the coordinators developed a survey focused on four main topics: (1) the IDEA4RC users; (2) general rules for accessing and re-using data about rare cancers via IDEA4RC; (3) the IDEA4RC data access application and data permit; (4) financial or other contributions to access and re-use data about rare cancers via IDEA4RC.

The survey was distributed among IDEA4RC clinical and technical partners. Based on the survey results, the researchers developed the first draft of the IDEA4RC governance, which aims to ensure fair and open access to the data infrastructure on the basis of ethical principles and to promote a discussion on the data economy.