A fingerprint of a heterogeneous data set

In this paper, we describe the fingerprint method, a technique to classify bags of mixed-type measurements. The method was designed to solve a real-world industrial problem: classifying industrial plants (individuals at a higher level of organization) starting from the measurements collected from their production lines (individuals at a lower level of organization). In this specific application, the categorical information attached to the numerical measurements induced simple mixture-like structures on the global multivariate distributions associated with different classes. The fingerprint method is designed to compare the mixture components of a given test bag with the corresponding mixture components associated with the different classes, identifying the most similar generating distribution. When compared to other classification algorithms applied to several synthetic data sets and the original industrial data set, the proposed classifier showed remarkable improvements in performance.
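The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are given in the paper itself. It assumes that each measurement is a (category, value) pair, that the mixture components of a bag are the groups of values sharing a category, and that similarity is measured by the two-sample Kolmogorov-Smirnov statistic, which is a stand-in chosen here for illustration. All function names and data are hypothetical.

```python
from collections import defaultdict


def split_by_category(bag):
    """Group the numeric values of a bag by their categorical label."""
    groups = defaultdict(list)
    for category, value in bag:
        groups[category].append(value)
    return groups


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    gap = 0.0
    for point in sorted(set(xs) | set(ys)):
        while i < len(xs) and xs[i] <= point:
            i += 1
        while j < len(ys) and ys[j] <= point:
            j += 1
        gap = max(gap, abs(i / len(xs) - j / len(ys)))
    return gap


def classify_bag(test_bag, class_bags):
    """Return the class whose per-category components are closest to the test bag.

    Assumption (not from the paper): the score of a class is the mean KS
    statistic over the categories shared by the test bag and the class.
    """
    test_groups = split_by_category(test_bag)
    scores = {}
    for label, bag in class_bags.items():
        ref_groups = split_by_category(bag)
        shared = [c for c in test_groups if c in ref_groups]
        if not shared:
            scores[label] = float("inf")  # nothing to compare against
            continue
        total = sum(ks_statistic(test_groups[c], ref_groups[c]) for c in shared)
        scores[label] = total / len(shared)
    return min(scores, key=scores.get), scores


# Hypothetical training data: one pooled bag of (category, value) pairs per class.
class_bags = {
    "A": [("line1", v) for v in (0.9, 1.0, 1.1, 1.2)]
    + [("line2", v) for v in (4.9, 5.0, 5.2)],
    "B": [("line1", v) for v in (2.9, 3.0, 3.1)]
    + [("line2", v) for v in (7.0, 7.1)],
}
test_bag = [("line1", 0.95), ("line1", 1.05), ("line2", 5.1)]

predicted, scores = classify_bag(test_bag, class_bags)
print(predicted, scores)
```

Comparing component by component, rather than pooling all values of a bag, keeps the categorical structure in play: two classes whose pooled distributions look alike can still differ within a single category.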

A fingerprint of a heterogeneous data set / Spallanzani, M.; Mihaylov, G.; Prato, M.; Fontana, R.. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5355. - 16:3(2022), pp. 617-657. [10.1007/s11634-021-00452-9]


M. Spallanzani; G. Mihaylov; M. Prato; R. Fontana

2022



Year of publication: 2022
Date of first publication: 3 July 2021
Journal: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume: 16
Issue: 3
First page: 617
Last page: 657
DOI: https://dx.doi.org/10.1007/s11634-021-00452-9
WoS code: WOS:000669186800001
Scopus code: 2-s2.0-85109292522
All authors: Spallanzani, M.; Mihaylov, G.; Prato, M.; Fontana, R.
Type: Journal article

Files in this item:

File	Size	Format
s11634-021-00452-9.pdf (Open access; Type: VOR, version published by the publisher)	5.37 MB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact IRIS Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1246415
