In this paper we propose a new index Z for measuring the dissimilaritybetween two hierarchical clusterings (or dendrograms). This index is a metric since it satisfies the axioms of non-negativity, symmetry and triangle inequality. A desirable property of this index is that it can be decomposed into the contributions pertaining to each stage of the hierarchies. We show the relations of such components with the currently used criteria for comparing two partitions.We obtain a global similarity index as the complement to one of the suggested dissimilarity and we derive its adjustment for agreement due to chance. We obtain similarity indexes pertaining to each stage of the hierarchies as the complement to one of the additive parts of the global distance Z. We consider the use of the proposed distance for more than two dendrograms and its use for the consensus of classifications and variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.

Dissimilarity and similarity measures for comparing dendrograms and their applications / Morlini, Isabella; S., Zani. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - STAMPA. - 6:2(2012), pp. 85-105. [10.1007/s11634-012-0106-2]

Dissimilarity and similarity measures for comparing dendrograms and their applications

MORLINI, Isabella;
2012

Abstract

In this paper we propose a new index Z for measuring the dissimilaritybetween two hierarchical clusterings (or dendrograms). This index is a metric since it satisfies the axioms of non-negativity, symmetry and triangle inequality. A desirable property of this index is that it can be decomposed into the contributions pertaining to each stage of the hierarchies. We show the relations of such components with the currently used criteria for comparing two partitions.We obtain a global similarity index as the complement to one of the suggested dissimilarity and we derive its adjustment for agreement due to chance. We obtain similarity indexes pertaining to each stage of the hierarchies as the complement to one of the additive parts of the global distance Z. We consider the use of the proposed distance for more than two dendrograms and its use for the consensus of classifications and variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.
2012
6
2
85
105
Dissimilarity and similarity measures for comparing dendrograms and their applications / Morlini, Isabella; S., Zani. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - STAMPA. - 6:2(2012), pp. 85-105. [10.1007/s11634-012-0106-2]
Morlini, Isabella; S., Zani
File in questo prodotto:
File Dimensione Formato  
Morlini & Zani Adac 2012.pdf

Accesso riservato

Tipologia: Versione dell'autore revisionata e accettata per la pubblicazione
Dimensione 556.94 kB
Formato Adobe PDF
556.94 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/730662
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 5
social impact