Mining of Biological Data II: Assessing Data Structure and Class Homogeneity by Cluster Analysis

Kamimura, R. T.; Bicciato, Silvio; Shimizu, H.; Alford, J.; Stephanopoulos, G. N.

doi:10.1006/mben.2000.0155

An important step in data analysis is class assignment which is usually done on the basis of a macroscopic phenotypic or bioprocess characteristic, such as high vs low growth, healthy vs diseased state, or high vs low productivity. Unfortunately, such an assignment may lump together samples, which when derived from a more detailed phenotypic or bioprocess description are dissimilar, giving rise to models of lower quality and predictive power. In this paper we present a clustering algorithm for data preprocessing which involves the identification of fundamentally similar lots on the basis of the extent of similarity among the system variables. The algorithm combines aspects of cluster analysis and principal component analysis by applying agglomerative clustering methods to the first principal component of the system data matrix. As part of a rational strategy for developing empirical models, this technique selects lots (samples) which are most appropriate for inclusion in a training set by analyzing multivariate data homogeneity. Samples with similar data structures are identified and grouped together into distinct clusters. This knowledge is used in the formation of potential training sets. Additionally, this technique can identify atypical lots, i.e., samples that are not simply outliers but exhibit the general properties of one class but have been given the assignment of the other. The method is presented along with examples from its application to fermentation data sets.
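The idea in the abstract (agglomerative clustering applied to the first principal component, then checking class homogeneity within each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the autoscaling step, the power-iteration estimate of the first component, the average-linkage rule, the majority-vote test for atypical lots, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def first_pc_scores(X, iters=200):
    """Scores of each lot on the first principal component.
    Assumption: columns are mean-centered and scaled to unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) or 1.0
            for j in range(p)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    # Power iteration on Z^T Z; an asymmetric start vector is used so it is
    # unlikely to be orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        s = [sum(z[j] * v[j] for j in range(p)) for z in Z]             # Z v
        w = [sum(Z[i][j] * s[i] for i in range(n)) for j in range(p)]   # Z^T (Z v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(z[j] * v[j] for j in range(p)) for z in Z]

def agglomerate(scores, n_clusters):
    """Average-linkage agglomerative clustering of one-dimensional scores.
    Returns a list of clusters, each a list of lot indices."""
    clusters = [[i] for i in range(len(scores))]

    def dist(a, b):
        return sum(abs(scores[i] - scores[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def flag_atypical(clusters, labels):
    """Indices of lots whose assigned class differs from the majority class
    of their cluster (an assumed, simplified notion of an atypical lot)."""
    atypical = []
    for members in clusters:
        counts = {}
        for i in members:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        majority = max(counts, key=counts.get)
        atypical.extend(i for i in members if labels[i] != majority)
    return sorted(atypical)

# Hypothetical data: six lots, three process variables.
# Lot 5 is labeled "high" although its measurements resemble the "low" lots.
X = [[10.0, 5.0, 1.0], [11.0, 5.5, 1.2], [9.5, 4.8, 0.9],
     [2.0, 1.0, 4.0], [2.5, 1.2, 4.2], [1.8, 0.9, 3.9]]
labels = ["high", "high", "high", "low", "low", "high"]

scores = first_pc_scores(X)
clusters = agglomerate(scores, n_clusters=2)
print(clusters)
print(flag_atypical(clusters, labels))
```

In this sketch, clusters whose members share one class label would be candidates for a training set, while a lot flagged by `flag_atypical` would be reviewed before inclusion. The number of clusters is passed in by hand here; how it is chosen in practice is not stated in the abstract.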

Mining of Biological Data II: Assessing Data Structure and Class Homogeneity by Cluster Analysis / Kamimura, R.T., Bicciato, S., Shimizu, H., Alford, J., Stephanopoulos, G.N. - In: METABOLIC ENGINEERING. - ISSN 1096-7176. - Print. - 2:3(2000), pp. 228-238. [10.1006/mben.2000.0155]

Year of publication: 2000
Journal: METABOLIC ENGINEERING
Volume: 2
Issue: 3
First page: 228
Last page: 238
DOI: https://dx.doi.org/10.1006/mben.2000.0155
WoS ID: WOS:000208079700008
Scopus ID: 2-s2.0-0033673313
PubMed ID: 11056065
Type: Journal article

Files in this item:

File	Size	Format
Kamimura_MetabolicEng2.pdf (restricted access; type: VOR, version published by the publisher)	264.35 kB	Adobe PDF
