From raw data to research-ready: A FHIR-based transformation pipeline in a real-world oncology setting / Carbonaro, Antonella; Giorgetti, Luca; Ridolfi, Lorenzo; Pasolini, Roberto; Pagliarani, Andrea; Cavallucci, Martina; Andalò, Alice; Del Gaudio, Livia; De Angelis, Paolo; Vespignani, Roberto; Gentili, Nicola. - In: COMPUTERS IN BIOLOGY AND MEDICINE. - ISSN 0010-4825. - 197:Pt B(2025), pp. 1-14. [10.1016/j.compbiomed.2025.111051]

From raw data to research-ready: A FHIR-based transformation pipeline in a real-world oncology setting

Livia Del Gaudio;
2025

Abstract

The exponential growth of healthcare data, driven by advances in medical research and digital health technologies, has underscored the critical need for interoperability and standardization. However, the heterogeneity of real-world clinical data makes seamless data exchange and secondary use for research difficult. The challenges include syntactic inconsistencies (e.g., variable use of terminologies such as ICD-10 vs. SNOMED CT), semantic mismatches (e.g., differing conceptualizations of disease staging across institutions), and structural fragmentation (e.g., laboratory results encoded in free text rather than structured fields). Fast Healthcare Interoperability Resources (FHIR) has emerged as a leading standard for structuring and harmonizing healthcare data, enabling integration across diverse systems. This work presents a FHIR-based transformation pipeline that uses the Resource Description Framework (RDF) to convert raw, conceptually heterogeneous oncology data into research-ready, semantically enriched datasets. By representing FHIR resources as RDF graphs, our approach enables semantic interoperability, improves data linkage across heterogeneous sources, and supports automated reasoning through ontology-based queries and inference mechanisms. The pipeline employs a templated conversion strategy in which mappings are defined declaratively, so that domain experts can focus on the data model. In Cancer Virtual Lab, we applied this methodology to a real-world oncology dataset comprising 36,335 anonymized patient records, converting 1,093,705 clinical records into 1,151,559 distinct RDF-based FHIR resources. The process incorporated syntactic and semantic validation, along with expert review, to ensure technical correctness and clinical relevance.
Our results demonstrate the feasibility of semantically integrating oncology data using FHIR and RDF, yielding a machine-readable, interoperable knowledge representation. This enriched representation supports data quality monitoring and improvement, data harmonization, longitudinal analysis, advanced analytics, and AI-driven decision support, and thereby promotes large-scale secondary use.
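
To make the idea of a templated, declarative conversion more concrete, the following is a minimal sketch and not the pipeline described in the article. It assumes a hypothetical flat source record with the fields `record_id`, `patient_id`, `code_system`, and `code`, a placeholder base URI `http://example.org/fhir`, and a Turtle template whose property names only loosely follow the FHIR RDF representation (they are not validated against the specification). The mapping lives entirely in the template text, so changing the data model would not require changing the rendering code.

```python
from string import Template

# Prefix header shared by all rendered resources.
PREFIXES = (
    "@prefix fhir: <http://hl7.org/fhir/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
)

# Hypothetical declarative mapping for one record kind (a diagnosis).
# The property layout is an assumption made for illustration only.
CONDITION_TEMPLATE = Template("""\
<$base/Condition/$record_id> a fhir:Condition ;
    fhir:id [ fhir:v "$record_id" ] ;
    fhir:subject [ fhir:reference [ fhir:v "Patient/$patient_id" ] ] ;
    fhir:code [ fhir:coding [
        fhir:system [ fhir:v "$code_system"^^xsd:anyURI ] ;
        fhir:code [ fhir:v "$code" ]
    ] ] .
""")


def turtle_escape(value: str) -> str:
    """Escape characters that would break a double-quoted Turtle literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render(template: Template, row: dict, base: str = "http://example.org/fhir") -> str:
    """Fill a template from one source row.

    Raises KeyError if the row lacks a field the template requires,
    which acts as a very basic completeness check.
    """
    safe = {key: turtle_escape(str(value)) for key, value in row.items()}
    return template.substitute(safe, base=base)


# Illustrative input row; the values are invented for this example.
example_row = {
    "record_id": "c1",
    "patient_id": "p42",
    "code_system": "http://hl7.org/fhir/sid/icd-10",
    "code": "C50.9",
}

print(PREFIXES + render(CONDITION_TEMPLATE, example_row))
```

The output is a Turtle fragment that could be loaded into an RDF store and then checked syntactically and semantically, in the spirit of the validation steps mentioned in the abstract. A real mapping would cover many more resource types and fields than this single template.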
Files in this record:
1-s2.0-S0010482525014039-main.pdf
Access: Open access
Type: VOR - Version of Record (version published by the publisher)
License: Creative Commons
Size: 1.52 MB
Format: Adobe PDF

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1403729