An important step in data analysis is class assignment, which is usually done on the basis of a macroscopic phenotypic or bioprocess characteristic, such as high vs low growth, healthy vs diseased state, or high vs low productivity. Unfortunately, such an assignment may lump together samples which, when derived from a more detailed phenotypic or bioprocess description, are dissimilar, giving rise to models of lower quality and predictive power. In this paper we present a clustering algorithm for data preprocessing which involves the identification of fundamentally similar lots on the basis of the extent of similarity among the system variables. The algorithm combines aspects of cluster analysis and principal component analysis by applying agglomerative clustering methods to the first principal component of the system data matrix. As part of a rational strategy for developing empirical models, this technique selects lots (samples) which are most appropriate for inclusion in a training set by analyzing multivariate data homogeneity. Samples with similar data structures are identified and grouped together into distinct clusters. This knowledge is used in the formation of potential training sets. Additionally, this technique can identify atypical lots, i.e., samples that are not simply outliers but exhibit the general properties of one class but have been given the assignment of the other. The method is presented along with examples from its application to fermentation data sets.
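The core idea in the abstract, agglomerative clustering applied to the first principal component of the data matrix, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`first_pc_scores`, `agglomerate`, `flag_atypical`), the mean-centering without scaling, the use of power iteration, the average-linkage criterion, the majority-vote rule for flagging atypical lots, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def first_pc_scores(X, iters=200):
    """Project mean-centered rows of X onto the first principal component.

    Assumption: power iteration on the sample covariance matrix, started
    from a vector of ones (which must not be orthogonal to the first PC).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in Xc]


def agglomerate(scores, n_clusters):
    """Average-linkage agglomerative clustering of 1-D scores.

    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(scores))]

    def dist(a, b):
        total = sum(abs(scores[i] - scores[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        _, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


def flag_atypical(clusters, labels):
    """Hypothetical rule: flag lots whose assigned class differs from the
    majority class of the cluster they fall into."""
    flagged = []
    for members in clusters:
        majority = Counter(labels[i] for i in members).most_common(1)[0][0]
        flagged.extend(i for i in members if labels[i] != majority)
    return sorted(flagged)


# Toy data: six lots, three process variables (made-up values).
X = [
    [1.0, 2.0, 0.5],
    [1.1, 2.1, 0.4],
    [0.9, 1.9, 0.6],
    [5.0, 6.0, 3.0],
    [5.2, 6.1, 3.1],
    [4.9, 5.8, 2.9],
]
labels = ["low", "low", "high", "high", "high", "high"]

clusters = agglomerate(first_pc_scores(X), n_clusters=2)
print(clusters)
print(flag_atypical(clusters, labels))
```

In this toy example, lot 2 carries the "high" label but groups with the "low" lots on the first principal component, so the majority-vote rule flags it as a candidate atypical lot. In practice a library routine for PCA and hierarchical clustering would likely replace the hand-written loops.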
|Publication date:||2000|
|Title:||Mining of Biological Data II: Assessing Data Structure and Class Homogeneity by Cluster Analysis|
|Authors:||KAMIMURA R.T.; S. BICCIATO; SHIMIZU H.; ALFORD J.; STEPHANOPOULOS G.N.|
|Digital Object Identifier (DOI):||10.1006/mben.2000.0155|
|Appears in document types:||Journal article|
Documents in Iris Unimore are released under a Creative Commons Attribution - NonCommercial - NoDerivatives 3.0 Italy license, unless otherwise indicated.