A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model

Morlini, Isabella

doi:10.1007/s11634-011-0101-z

For clustering objects, we often collect not only continuous variables, but binary attributes as well. This paper proposes a model-based clustering approach for mixed binary and continuous variables in which each binary attribute is generated by a latent continuous variable that is dichotomized at a suitable threshold value, and the scores of the latent variables are estimated from the binary data. In economics, such variables are called utility functions, and the assumption is that the binary attributes (the presence or absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the ‘liability’ to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous variables, make it possible to use a multivariate Gaussian mixture model for clustering, instead of a mixture of discrete and continuous distributions. After describing the method, this paper presents results on both simulated and real-case data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. Results show that the former model outperforms the mixture model for variables with different scales, both in terms of classification error rate and reproduction of the cluster means.
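The dichotomization idea in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each latent variable is marginally standard normal, so the threshold follows from the observed proportion of ones and each binary value can be replaced by the conditional mean of the latent variable on its side of the threshold. This truncated-normal score is a simple stand-in and is not claimed to be the estimator used in the paper; the function names `threshold_from_proportion` and `latent_scores` are hypothetical.

```python
from statistics import NormalDist

# Assumption: the latent variable z is marginally N(0, 1).
STD_NORMAL = NormalDist(0.0, 1.0)


def threshold_from_proportion(p_one):
    """Return tau such that P(z > tau) = p_one for a standard-normal z."""
    return STD_NORMAL.inv_cdf(1.0 - p_one)


def latent_scores(binary_column):
    """Replace each 0/1 value by the mean of z on its side of the threshold.

    Illustrative stand-in only: a truncated-normal conditional mean,
    not the score estimator described in the paper.
    """
    n = len(binary_column)
    p_one = sum(binary_column) / n
    if not 0.0 < p_one < 1.0:
        raise ValueError("column must contain both 0s and 1s")
    tau = threshold_from_proportion(p_one)
    density = STD_NORMAL.pdf(tau)
    score_one = density / p_one            # E[z | z > tau]
    score_zero = -density / (1.0 - p_one)  # E[z | z <= tau]
    return [score_one if x == 1 else score_zero for x in binary_column]


binary = [1, 0, 0, 1, 1, 0, 0, 0]
scores = latent_scores(binary)
```

Under these assumptions the scores average to zero by construction. The resulting continuous columns could then be placed next to the observed continuous variables and passed to any Gaussian mixture fitting routine.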

A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model / Morlini, I. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - PRINT. - 6:1(2012), pp. 5-28. [10.1007/s11634-011-0101-z]



Publication year	2012
Journal	ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume	6
Issue	1
First page	5
Last page	28
DOI	https://dx.doi.org/10.1007/s11634-011-0101-z
WoS code	WOS:000301000000003
Scopus code	2-s2.0-84857787698
All authors	Morlini, Isabella
Type	Journal article

Files in this record:

File	Size	Format
Morlini Adac 2012.pdf (Restricted access; Type: AAM - author's version, revised and accepted for publication; License: [IR] closed)	714.42 kB	Adobe PDF


The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.