Using sampling experiments, we found that, when there are fewer groups than variables, between-groups PCA (bgPCA) may suggest surprisingly distinct differences among groups for data in which none exist. While apparently not noticed before, the reasons for this problem are easy to understand. A bgPCA captures the g − 1 dimensions of variation among the g group means, but only a fraction of the ∑ ni- g dimensions of within-group variation (ni are the sample sizes), when the number of variables, p, is greater than g − 1. This introduces a distortion in the appearance of the bgPCA plots because the within-group variation will be underrepresented, unless the variables are sufficiently correlated so that the total variation can be accounted for with just g − 1 dimensions. The effect is most obvious when sample sizes are small relative to the number of variables, because smaller samples spread out less, but the distortion is present even for large samples. Strong covariance among variables largely reduces the magnitude of the problem, because it effectively reduces the dimensionality of the data and thus enables a larger proportion of the within-group variation to be accounted for within the g − 1-dimensional space of a bgPCA. The distortion will still be relevant though its strength will vary from case to case depending on the structure of the data (p, g, covariances etc.). These are important problems for a method mainly designed for the analysis of variation among groups when there are very large numbers of variables and relatively small samples. In such cases, users are likely to conclude that the groups they are comparing are much more distinct than they really are. Having many variables but just small sample sizes is a common problem in fields ranging from morphometrics (as in our examples) to molecular analyses.

Seeing Distinct Groups Where There are None: Spurious Patterns from Between-Group PCA / Cardini, Andrea; O’Higgins, Paul; Rohlf, F. James. - In: EVOLUTIONARY BIOLOGY. - ISSN 0071-3260. - 46:4(2019), pp. 303-316. [10.1007/s11692-019-09487-5]

Seeing Distinct Groups Where There are None: Spurious Patterns from Between-Group PCA

Cardini, Andrea;
2019

Abstract

Using sampling experiments, we found that, when there are fewer groups than variables, between-groups PCA (bgPCA) may suggest surprisingly distinct differences among groups for data in which none exist. While apparently not noticed before, the reasons for this problem are easy to understand. A bgPCA captures the g − 1 dimensions of variation among the g group means, but only a fraction of the ∑ ni- g dimensions of within-group variation (ni are the sample sizes), when the number of variables, p, is greater than g − 1. This introduces a distortion in the appearance of the bgPCA plots because the within-group variation will be underrepresented, unless the variables are sufficiently correlated so that the total variation can be accounted for with just g − 1 dimensions. The effect is most obvious when sample sizes are small relative to the number of variables, because smaller samples spread out less, but the distortion is present even for large samples. Strong covariance among variables largely reduces the magnitude of the problem, because it effectively reduces the dimensionality of the data and thus enables a larger proportion of the within-group variation to be accounted for within the g − 1-dimensional space of a bgPCA. The distortion will still be relevant though its strength will vary from case to case depending on the structure of the data (p, g, covariances etc.). These are important problems for a method mainly designed for the analysis of variation among groups when there are very large numbers of variables and relatively small samples. In such cases, users are likely to conclude that the groups they are comparing are much more distinct than they really are. Having many variables but just small sample sizes is a common problem in fields ranging from morphometrics (as in our examples) to molecular analyses.
2019
46
4
303
316
Seeing Distinct Groups Where There are None: Spurious Patterns from Between-Group PCA / Cardini, Andrea; O’Higgins, Paul; Rohlf, F. James. - In: EVOLUTIONARY BIOLOGY. - ISSN 0071-3260. - 46:4(2019), pp. 303-316. [10.1007/s11692-019-09487-5]
Cardini, Andrea; O’Higgins, Paul; Rohlf, F. James
File in questo prodotto:
File Dimensione Formato  
2019 cardini et al bgPCA.pdf

Accesso riservato

Tipologia: Versione pubblicata dall'editore
Dimensione 2.94 MB
Formato Adobe PDF
2.94 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

Licenza Creative Commons
I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11380/1195644
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 77
  • ???jsp.display-item.citation.isi??? 76
social impact