Intelligent ecosystem to improve
the governance, the sharing,

and the re-use of health data for rare cancers

EU legal landscape on health data sharing and re-use: interview with Vasiliki Tsiompanidou

In recent months, European institutions have added new pieces to the Union’s legal landscape regarding artificial intelligence, data and health data in particular. The EU is doing so to protect fundamental rights while not hampering innovation, fostering the use of data to improve European citizens’ quality of life.

Vasiliki Tsiompanidou works as a legal researcher at the European Centre for Certification and Privacy, where she oversees the legal and ethical aspects of IDEA4RC. We talked with her about how the latest developments are influencing her work on the project.

What are the main challenges you are facing within the IDEA4RC project?
We are dealing with a rapidly evolving regulatory landscape, mainly because the targets of these regulations are extremely complex technologies that develop very fast. Just think about how difficult it is to agree on a common definition of Artificial Intelligence.

IDEA4RC pilot projects are meant to start by the end of 2024. By that time, we need to establish a framework that enables the clinical centers to share data with each other and run analyses on it. To do so we will have to abide by the General Data Protection Regulation (GDPR) adopted in 2016, which will remain the primary legislative instrument regulating health data re-use in the EU for the next two years, until the European Health Data Space Regulation (EHDS) becomes applicable. At the same time, we need to develop a data governance framework compliant with the EHDS, which will regulate IDEA4RC after the end of this project, when we hope more centers will join and users will start to truly benefit from it.

Why does the GDPR affect IDEA4RC pilot projects?
The IDEA4RC platform will facilitate the re-use of health data, that is, personal data originally collected by clinical centers to provide care to their patients, in order to advance knowledge on rare cancers and ultimately improve the quality of care. The pilot projects will test this kind of re-use, leveraging the data of the 11 clinical centers that participate in the consortium and the research questions identified by IDEA4RC cancer researchers (see deliverable D8.1).

The GDPR sets a series of legal requirements to process personal data for purposes other than the one for which they were initially collected, and this is exactly the kind of processing IDEA4RC pilot projects will carry out.

The clinical centers involved in IDEA4RC are spread across European member states, which adds a layer of complexity, since each state has implemented the GDPR in its national legislation, often introducing additional or more specific requirements that differ from one another. Furthermore, each clinical center has its own policies and measures to ensure compliance with national legislation and data protection authorities’ guidelines. We have mapped out these differences in a recent deliverable (D8.2), which will serve as the starting point for developing appropriate data-sharing agreements to allow the deployment of the pilot projects.

Would anonymizing the data simplify things?
It depends, for three main reasons. Firstly, it is not always easy to agree on adequate and efficient anonymization procedures that suit all of the centers. Secondly, doing so on unstructured data, such as medical or pathology reports, can be highly demanding. This kind of data, which is essentially text, is crucial to the IDEA4RC ecosystem, since the consortium is developing Natural Language Processing pipelines to extract structured data that can then feed the ecosystem. Thirdly, anonymizing data remains a data processing activity and thus still requires that GDPR-based procedures be followed.
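To illustrate why de-identifying free-text reports is demanding, here is a deliberately minimal sketch of rule-based de-identification. This is not IDEA4RC’s actual NLP pipeline: the patterns, field labels and sample text are all hypothetical, and real pipelines must handle far more variation than a few regular expressions can.

```python
import re

# Hypothetical patterns for directly identifying tokens in a report.
# Real-world de-identification needs trained NER models, not just regexes.
PATTERNS = {
    "NAME": re.compile(r"Patient:\s*[A-Z][a-z]+ [A-Z][a-z]+"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "ID":   re.compile(r"\bMRN-\d+\b"),
}

def deidentify(report: str) -> str:
    """Replace directly identifying tokens with typed placeholders."""
    for label, pattern in PATTERNS.items():
        report = pattern.sub(f"[{label}]", report)
    return report

text = "Patient: Maria Rossi, MRN-48213, biopsy on 12/03/2023."
print(deidentify(text))  # identifying tokens replaced, clinical text kept
```

Even this toy version shows the difficulty: a name written in a slightly different format, or a date spelled out in words, would slip straight through, which is why agreeing on procedures that work across all centers is non-trivial.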

How does the GDPR affect IDEA4RC pilot projects?
First of all, we need to identify the roles that each actor involved in the pilot project deployment will play. In particular, we need to define who are: (a) the data controllers, who determine the purposes and means of personal data processing, and (b) the data processors, who process personal data on behalf of the controller and under their instructions. The role of data controller can be played by more than one subject; they are then called ‘joint controllers’. Identifying which role is assumed by the clinical centers that provide the data and which by the data user, the researcher, can be tricky: you need to specify exactly how the platform will work. Once you have done this, you can develop a template for the bilateral data-sharing agreements between the researcher’s institution and each of the centers that agree to provide their data for the analysis. These agreements constitute the legal basis for processing the data under the GDPR.

What approaches are you pursuing other than bilateral data-sharing agreements?
An alternative to the bilateral agreements we have been working on for some time is the use of unilateral contractually binding commitment tools. These list all the requirements and obligations that a party must abide by when sharing data, depending on their role: data user or data holder, data processor or data controller. The signatory parties to this type of contract thus commit to all of these requirements. Through specific amendable annexes, one can then define the exact scenario and further specify these obligations.

This approach would save a lot of time: the signatory parties have already agreed on the requirements each must abide by depending on their respective roles, so any data user interested in the IDEA4RC ecosystem can simply sign the same unilateral commitment that the other parties have already signed.

This kind of agreement has been designed for the second phase of IDEA4RC, beyond the project’s duration, when the EHDS becomes applicable and the platform opens up to new users and clinical centers. We are working on obtaining approval from the Luxembourgish authorities so as to be able to use such agreements to facilitate the introduction of new stakeholders into the IDEA4RC ecosystem.

What kind of advantages does the federated learning approach entail?
The IDEA4RC architecture and the federated learning methods it adopts provide a secure environment for data processing that promotes privacy.

Federated learning is a decentralized approach to data analysis. As such, no external actor has visibility of the data each center provides for the analysis; they only receive aggregated information. The centers’ data remains inside local capsules, the analysis algorithms are executed locally in one capsule after another, and only the results of the execution, say the model parameters, are shared with the user.
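The mechanism described above can be sketched in a few lines. This is a simplified illustration, not IDEA4RC’s actual architecture: the `Capsule` class and the choice of computing a federated mean are hypothetical, standing in for the real capsules and analysis algorithms.

```python
# Minimal sketch of the federated idea: each "capsule" keeps a center's
# data locally and returns only an aggregate; the user never sees rows.
class Capsule:
    def __init__(self, local_values):
        self._values = local_values  # stays inside the capsule

    def run(self, algorithm):
        # Execute the analysis locally and return only its result.
        return algorithm(self._values)

def federated_mean(capsules):
    """Combine per-capsule partial results into one aggregate."""
    partials = [c.run(lambda v: (sum(v), len(v))) for c in capsules]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Two hypothetical centers contribute, e.g., patient ages.
capsules = [Capsule([62, 71, 58]), Capsule([49, 66])]
print(federated_mean(capsules))  # the user receives only this aggregate
```

Each capsule ships back only a sum and a count; the individual values never leave it, which is the privacy property the interview highlights.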

As a result, the federated learning approach can also simplify various tasks, such as performing a Data Protection Impact Assessment (DPIA), which the GDPR requires the data controller to conduct every time a data processing activity is likely to result in a high risk to the rights and freedoms of natural persons. In addition, the federated learning approach and the IDEA4RC capsules comply with the EHDS requirement of having secure processing environments.

What changes will the EHDS bring?
The EHDS Regulation establishes a new framework for the secondary use of data, ensuring both its re-use to benefit society as a whole and the protection of patients’ rights, including privacy. It lays down specific rules on how to design and implement a secure framework and paves the way for the introduction of further standards and harmonized rules.

Yet, perhaps the most important element for IDEA4RC and research projects in general is the introduction of an obligation to share data for research. On this basis, data holders, including hospitals, clinical centers and doctors, will be able to step away from consent as a legal basis, which can be highly problematic, especially in rare cancer research, and comply with the related requirements much more easily.

Finally, the EHDS Regulation ensures that patients will also be able to benefit from the research performed, since all data holders contributing data will need to be informed of any findings that they can then implement during the provision of healthcare services to their patients.

How should the Data Act and the AI Act be taken into account in IDEA4RC?
Given the current design and purpose of IDEA4RC, we are not required to abide by any of the requirements set by those pieces of legislation.

The Data Act applies to non-personal data collected by connected devices, and its main aim is to allow users to switch easily among service providers. However, IDEA4RC could evolve in different directions in the future, so we have already considered the case in which IDEA4RC expands beyond what has been envisioned so far and offers such services. In particular, that would be the case if the platform were to evolve in a manner that offers patients the possibility of moving their health data across the European Union in order to facilitate receiving healthcare services in other member states.
Even if IDEA4RC never evolves in such a direction, it is always good to have this type of legislation in mind because a more holistic approach towards compliance generates trust in other stakeholders that could be willing to join the platform.

The AI Act introduces bans on certain types of artificial intelligence and additional safeguards for the use of AI algorithms depending on the level of risk they entail. The majority of the requirements introduced by the AI Act concern high-risk AI systems, which include, for instance, biometric surveillance systems, safety components for medical devices or critical infrastructures, and algorithms used for recruitment or for accessing essential private and public services, such as banking, loans, healthcare and insurance. For this category of AI systems, the AI Act imposes, among others, the obligation to perform a conformity assessment, so as to prove that any risks are mitigated as much as possible, that the quality of the AI model and the data it uses is high, that proper data governance activities, technical documentation and record keeping are in place, and that human oversight is possible.

Although the IDEA4RC ecosystem is unlikely to fall under the category of high-risk AI systems, it is more prudent to collect and compile all required information at the design phase than at a later stage. Also, being able to validate that IDEA4RC complies with the conformity assessment requirements the AI Act imposes on high-risk AI systems increases the platform’s trustworthiness and robustness, encouraging others to join it.