Mining of Biological Data I: Identifying Discriminating Features via Mean Hypothesis Testing

Kamimura, R. T.; Bicciato, Silvio; Shimizu, H.; Alford, J.; Stephanopoulos, G. N.

doi:10.1006/mben.2000.0154

Large volumes of data are routinely collected during bioprocess operations and, more recently, in basic biological research using genomics-based technologies. While these data often lack sufficient detail to be used for mechanism identification, it is possible that the underlying mechanisms affecting cell phenotype or process outcome are reflected as specific patterns in the overall or temporal sensor logs. This raises the possibility of identifying outcome-specific fingerprints that can be used for process or phenotype classification and for the identification of discriminating characteristics, such as specific genes or process variables. The aim of this work is to provide a systematic approach to identifying and modeling patterns in historical records and using this information for process classification. This approach differs from others in that emphasis is placed on analyzing the data structure first and thereby extracting potentially relevant features prior to model creation. The initial step in this overall approach is to identify the discriminating features of the relevant measurements and time windows, which can subsequently be used to discriminate among different classes of process behavior. This is achieved via a mean hypothesis testing algorithm. Next, the homogeneity of the multivariate data in each class is explored via a novel cluster analysis technique called PC1 Time Series Clustering to ensure that the data subsets used accurately reflect the variability displayed in the historical records. This will be the topic of the second paper in this series. We present here the method for identifying discriminating features in data via mean hypothesis testing, along with results from the analysis of case studies from industrial fermentations.
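The idea of testing for mean differences per measurement and time window can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes Welch's two-sample t statistic as the test, non-overlapping time windows of a fixed width, and a hypothetical cutoff of |t| >= 3 in place of a proper significance level. The function names (`welch_t`, `window_means`, `discriminating_windows`) and the toy pH/temperature batches are invented for illustration.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances) comparing the means of a and b."""
    ma, mb = mean(a), mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both samples are constant: no difference if the means agree, else infinite separation.
        return 0.0 if ma == mb else copysign(inf, ma - mb)
    return (ma - mb) / se


def window_means(series, width):
    """Split one batch's time series into non-overlapping windows and return each window's mean."""
    return [mean(series[i:i + width]) for i in range(0, len(series) - width + 1, width)]


def discriminating_windows(class_a, class_b, width, threshold=3.0):
    """Flag (variable, window index, t) wherever the window means of the two classes differ.

    class_a, class_b: dict mapping variable name -> list of batch time series of equal length.
    threshold: hypothetical |t| cutoff, assumed here for illustration only.
    """
    hits = []
    for var in class_a:
        wa = [window_means(s, width) for s in class_a[var]]
        wb = [window_means(s, width) for s in class_b[var]]
        for k in range(len(wa[0])):
            # Compare the k-th window mean across all batches of each class.
            t = welch_t([w[k] for w in wa], [w[k] for w in wb])
            if abs(t) >= threshold:
                hits.append((var, k, t))
    return hits


# Toy data (invented): three batches per outcome class, six time points per variable.
class_a = {
    "pH": [[7.0, 7.1, 6.9, 7.0, 5.0, 5.1],
           [7.1, 7.0, 7.0, 6.9, 5.2, 5.0],
           [6.9, 7.0, 7.1, 7.0, 4.9, 5.1]],
    "temp": [[30.0] * 6 for _ in range(3)],
}
class_b = {
    "pH": [[7.0, 7.0, 7.1, 6.9, 6.0, 6.1],
           [7.1, 6.9, 7.0, 7.0, 6.2, 6.0],
           [6.9, 7.1, 6.9, 7.1, 5.9, 6.1]],
    "temp": [[30.0] * 6 for _ in range(3)],
}

hits = discriminating_windows(class_a, class_b, width=2)
for var, k, t in hits:
    print(f"{var} window {k}: t = {t:.1f}")  # prints: pH window 2: t = -24.5
```

Only the last pH window is flagged, because it is the only window whose mean differs between the two classes. A fuller treatment would typically convert each t statistic to a p-value and correct for the many variable/window comparisons being made.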

Mining of Biological Data I: Identifying Discriminating Features via Mean Hypothesis Testing / Kamimura, R. T.; Bicciato, Silvio; Shimizu, H.; Alford, J.; Stephanopoulos, G. N. - In: METABOLIC ENGINEERING. - ISSN 1096-7176. - Print. - 2:3 (2000), pp. 218-227. [10.1006/mben.2000.0154]