Dissimilarity and similarity measures for comparing dendrograms and their applications / Morlini, Isabella; Zani, S. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - Print. - 6:2(2012), pp. 85-105. [10.1007/s11634-012-0106-2]
Dissimilarity and similarity measures for comparing dendrograms and their applications
MORLINI, Isabella;
2012
Abstract
In this paper we propose a new index Z for measuring the dissimilarity between two hierarchical clusterings (or dendrograms). This index is a metric since it satisfies the axioms of non-negativity, symmetry and triangle inequality. A desirable property of this index is that it can be decomposed into the contributions pertaining to each stage of the hierarchies. We show the relations of such components with the currently used criteria for comparing two partitions. We obtain a global similarity index as the complement to one of the suggested dissimilarity and we derive its adjustment for agreement due to chance. We obtain similarity indexes pertaining to each stage of the hierarchies as the complement to one of the additive parts of the global distance Z. We consider the use of the proposed distance for more than two dendrograms and its use for the consensus of classifications and variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.

File | Access | Type | Size | Format
---|---|---|---|---
Morlini & Zani Adac 2012.pdf | Restricted access | Author's version, revised and accepted for publication | 556.94 kB | Adobe PDF
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.