Intelligent ecosystem to improve the governance, the sharing, and the re-use of health data for rare cancers

Hot takes from the fourth consortium meeting in Milan

IDEA4RC members gathered in Milan on 29 and 30 April for the fourth consortium meeting, hosted by the National Cancer Institute. The meeting gave the working groups an opportunity to share updates on their most recent activities and to validate the first version of the virtual assistant. This is the software component that will allow IDEA4RC users to explore the data, request a permit to access it and, finally, run their analyses.

The meeting was kicked off by Christina Kyriakopoulou, scientific officer at DG Research and Innovation, who stressed the importance of collaboration among the projects of the cluster “Innovative Tools for Electronic Health Records and patient registries”: AIDAVA, DataTools4Heart, eCREAM, IDEA4RC and RES-Q+. She also highlighted the opportunity to share expertise with the QUANTUM project, which started in January 2024 and aims to develop health data quality standards, an effort that IDEA4RC researchers are also undertaking.

Vasiliki Tsiompanidou, legal researcher at ECCP, summarised the work done so far on the data agreements needed to run the pilot projects. An advanced draft is being reviewed by the legal experts of the centres of expertise, the 11 clinical centres of the EURACAN network involved in IDEA4RC. The draft builds on a survey, run among these legal experts, covering three different scenarios for how data processing could take place within the ecosystem. A summary of the survey results and a description of the scenarios are included in the deliverable “Pilot data governance”, a summary of which is available here. The centres of expertise are expected to send their final feedback by the end of June.

Claudia Egher, sociologist at Utrecht University, summarised the results of a survey run among IDEA4RC members on the governance model to be adopted for the data ecosystem after the end of the project. These results are also discussed at length in the deliverable “Pilot data governance”. As a next step, a survey on data governance will be run among the legal representatives of the centres of expertise. This survey will gather insights into the rules and procedures of the various centres, in order to understand how the high-level governance model that emerged from the first survey could be implemented.

Eugenio Gaeta and Franco Mercalli gave an update on the deployment of the IDEA4RC architecture and on the pilot projects, which will take place later this year.

Ioanna Drympeta, researcher at CERTH, updated participants on the development of the Data Governance layer, the software component of the IDEA4RC ecosystem that will manage the data permit application phase. This layer will allow researchers who wish to run analyses on IDEA4RC data to easily submit multiple applications to the centres whose data they want to include in their study. Drympeta described the privileges and responsibilities of the various profiles involved in this phase, both on the data holders' side (the centres of expertise) and on the data users' side (the research team). A chatbot trained on scientific publications about the ethical and legal aspects of using data and AI algorithms will guide researchers through this phase.

Unai Zulaika, engineer at Deusto University, reviewed all the steps that led to the two IDEA4RC data models, one for sarcomas and the other for head and neck cancers. The process started with clinicians identifying the relevant variables for each rare cancer family. These variables were then organised into a set of interrelated entities, with their relationships captured in a diagram. The two resulting diagrams constitute the two IDEA4RC data models.

Laura Lopez and Itziar Alonso, engineers at UPM, presented to participants the first mock-ups of the virtual assistant, which will allow researchers to interact with the IDEA4RC data ecosystem at every stage, from data discovery to data access application and data use. A lively discussion with the oncology researchers followed, providing useful insights for tailoring the virtual assistant to their needs and practices.

The meeting ended with a few demonstrations from technical partners. Frank Martin, software engineer at IKNL, showed how to use the Vantage6 suite of federated learning tools, which will be integrated into the IDEA4RC ecosystem. Soumitra Ghosh and Alberto Lavelli, researchers at FBK, presented the latest results of the large language model developed to process clinical texts, such as physicians' notes and pathology reports, in order to extract data from them.