Sustainable Quality in Data Preparation / Pernici, B.; Cappiello, C.; Bono, C. A.; Sancricca, C.; Catarci, T.; Angelini, M.; Filosa, M.; Palmonari, M.; De Paoli, F.; Bergamaschi, S.; Simonini, G.; Mozzillo, A.; Zecchini, L. - In: ACM JOURNAL OF DATA AND INFORMATION QUALITY. - ISSN 1936-1955. - 17:4(2025), pp. 1-33. [10.1145/3769120]

Sustainable Quality in Data Preparation

Filosa M.; Bergamaschi S.; Simonini G.; Mozzillo A.; Zecchini L.
2025

Abstract

Data preparation is crucial for achieving good data management following the four foundational FAIR principles: Findability, Accessibility, Interoperability, and Reusability. Processing datasets to achieve high data (and metadata) quality is mandatory in modern applications. However, the data preparation activities that are needed to reach such levels may easily become unsustainable due to, for example, resource intensity or scalability challenges. Moreover, some preparation efforts may become unnecessary if they result in negligible improvements or duplicate actions. This article examines the sustainability aspects of data preparation through the lens of a circular economy. Within the data landscape, this perspective encourages practices that minimize waste, extend the data life cycle, and maximize reuse in alignment with the FAIR principles. We explore these practices and their impact on selecting and configuring effective data preparation strategies to design sustainable, high-quality pipelines. To this end, we propose an evaluation model that integrates data quality metrics with sustainability parameters for human and computational tasks. Finally, we apply the model in a comparative analysis of key data preparation methods, demonstrating its effectiveness in assessing sustainability and quality tradeoffs.
Files in this item:
File: 3769120.pdf
Access: Open access
Type: VOR - Version published by the publisher
License: [IR] creative-commons
Size: 568.73 kB
Format: Adobe PDF
Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1399868
Citations
  • PMC: not available
  • Scopus: 3
  • ISI: 3