A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model / Morlini, Isabella. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - PRINT. - 6:1(2012), pp. 5-28. [10.1007/s11634-011-0101-z]

A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model

MORLINI, Isabella
2012

Abstract

For clustering objects, we often collect not only continuous variables, but binary attributes as well. This paper proposes a model-based clustering approach with mixed binary and continuous variables where each binary attribute is generated by a latent continuous variable that is dichotomized with a suitable threshold value, and where the scores of the latent variables are estimated from the binary data. In economics, such variables are called utility functions and the assumption is that the binary attributes (the presence or the absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the ‘liability’ to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous ones, make it possible to use a multivariate Gaussian mixture model for clustering, instead of a mixture of discrete and continuous distributions. After describing the method, this paper presents results on both simulated and real-case data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. Results show that the former model outperforms the latter for variables with different scales, both in terms of classification error rate and reproduction of the cluster means.
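
The dichotomization mechanism described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, a standard normal latent variable and a single threshold; the function names, the threshold value 0.5, and the sample size are hypothetical and not taken from the paper. The sketch shows only how a binary attribute arises from a latent score and how the threshold can be read back from the share of ones under that normality assumption. It is not the paper's procedure for estimating the latent scores, which the abstract does not specify.

```python
import random
from statistics import NormalDist


def dichotomize(latent_scores, threshold):
    """Return 1 where the latent score exceeds the threshold, else 0."""
    return [1 if z > threshold else 0 for z in latent_scores]


def threshold_from_proportion(binary_values):
    """Threshold implied by the share of ones.

    Assumption (not from the paper): the latent variable is standard
    normal, so P(binary = 1) = 1 - Phi(threshold).
    """
    p = sum(binary_values) / len(binary_values)
    return NormalDist().inv_cdf(1.0 - p)


# Simulate latent scores and the binary attribute they generate.
rng = random.Random(0)        # fixed seed so the run is reproducible
true_threshold = 0.5          # illustrative value only
latent = [rng.gauss(0.0, 1.0) for _ in range(20000)]
binary = dichotomize(latent, true_threshold)

# Recover the threshold from the observed binary data alone.
estimated_threshold = threshold_from_proportion(binary)
print(f"true threshold: {true_threshold}, estimated: {estimated_threshold:.3f}")
```

With a large sample the recovered threshold should lie close to the value used in the simulation, which is what makes a continuous representation of a binary attribute plausible under this assumption.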
Use this identifier to cite or link to this document: https://hdl.handle.net/11380/686655