Assessing feature relevance in NPLS models by VIP
Chemometrics and Intelligent Laboratory Systems

Multilinear PLS (NPLS) and its discriminant version (NPLS-DA) are very [...] model multi-way [...] regression coefficients [...] data [...] feature [...] correlation [...] covariance structure [...]. We propose an extension of the Variable Importance in Projection (VIP) parameter to multi-way arrays in order to identify the most relevant features to predict the studied dependent properties, either for interpretative purposes or to operate feature selection. VIPs are implemented for each mode of the data array and for the case of multivariate dependent responses, considering both the case of expressing VIP with respect to a single y-variable and that of taking into account all y-variables. [...] real [...]: i) [...] the properties of bread loaves from near infrared spectra of dough, at different leavening times and corresponding to different flour formulations. VIP values were used to assess the spectral regions mainly involved in determining flour performance; ii) assessing the authenticity of extra virgin olive oils by NPLS-DA elaboration of gas chromatography/mass spectrometry (GC–MS) data. VIP values were used to assess both GC and MS discriminant features; iii) NPLS analysis of a fMRI-BOLD experiment based on a pain paradigm of acute prolonged pain in healthy volunteers, in order to reproduce efficiently the corresponding psychophysical pain profiles. VIP values were used to identify the brain regions mainly involved in determining the pain intensity profile.

Multilinear PLS (NPLS) and its discriminant version (NPLS-DA) are widely used tools to model multi-way data arrays. NPLS represents the multi-way extension of two-way partial least squares regression (PLS) and was first developed by Bro in 1996 [1] and subsequently by Bro, Smilde and De Jong [2][3][4].

It has been demonstrated that multi-way data analysis tools, which take into account the multi-way structure of the data, are much more efficient than unfolding procedures, that is, re-arranging the multi-way data into a two-way matrix structure and then applying bilinear models. Multi-way analysis simplifies the interpretation of the results and provides more adequate and robust models using relatively few parameters [5,6]. While this is true in general, it is worth noticing that when dealing with real-time monitoring, e.g. [...]

[...] both X and Y-blocks. In the following, the method is described considering the case of a three-way data array, X, but the extension to further dimensions can be simply deduced. As for the dependent variables (Y-block), we describe here the case of a two-way matrix, but the method can be easily extended to higher orders in the Y-block [1].

Specifically, PLS regression aims to find a relationship between a set of predictor (independent) data, X, and a set of responses (dependent), Y. In the more general case, the arrays of independent (X) and dependent (Y) variables are decomposed in such a way that the score vectors from these models have pair-wise maximal covariance [3,4]. Multilinear PLS was first developed as a PARAFAC-like model of X, and it was shown that the method could be easily extended to any desired order for both X and Y arrays. This method was further elaborated and later improved with respect to residual analyses by introducing a core array in the model of X [2].
Considering an X array of dimension I × J × K, the NPLS model is obtained by modeling X as in a Tucker3 decomposition:

$$\mathbf{X} = \mathbf{T}\,\mathbf{G}_X\,(\mathbf{W}^K \otimes \mathbf{W}^J)^{\mathrm{T}} + \mathbf{E}_X$$

where $\mathbf{X}$ is the X array unfolded to an I × JK matrix, $\mathbf{T}$ holds the first mode scores (sample mode), $\mathbf{W}^J$ and $\mathbf{W}^K$ are the second and the third mode weights, respectively, and $\mathbf{E}_X$ holds the residuals. The symbol ⊗ denotes the Kronecker product [5].

$\mathbf{G}_X$ is the matricized (F × F²) form of the core array of size F × F × F, where F is the number of NPLS components (factors), and it is defined by:

$$\mathbf{G}_X = \mathbf{T}^{+}\,\mathbf{X}\,\left((\mathbf{W}^K \otimes \mathbf{W}^J)^{\mathrm{T}}\right)^{+}$$

Here the superscript '+' denotes the Moore-Penrose pseudo-inverse.
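The Tucker3-type model of X and its pseudo-inverse core can be illustrated with a minimal NumPy sketch. It assumes the standard forms X = T G_X (W^K ⊗ W^J)^T and G_X = T^+ X ((W^K ⊗ W^J)^T)^+, with X unfolded so that the second-mode index runs fastest. The function names `unfold_mode1` and `core_array` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np

def unfold_mode1(X):
    """Unfold an I x J x K array to I x JK, second-mode index running fastest."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

def core_array(X, T, WJ, WK):
    """Matricized core G_X (F x F*F) from scores T and mode weights WJ, WK."""
    Z = np.kron(WK, WJ)  # JK x F*F, rows ordered like the unfolded columns
    return np.linalg.pinv(T) @ unfold_mode1(X) @ np.linalg.pinv(Z.T)

# Synthetic, noise-free check: build X from a known core and recover it.
rng = np.random.default_rng(0)
I, J, K, F = 8, 5, 4, 2
T = rng.normal(size=(I, F))
WJ = rng.normal(size=(J, F))
WK = rng.normal(size=(K, F))
G_true = rng.normal(size=(F, F * F))
X_unf = T @ G_true @ np.kron(WK, WJ).T         # I x JK model of X
X = X_unf.reshape(I, K, J).transpose(0, 2, 1)  # refold to I x J x K
G_est = core_array(X, T, WJ, WK)
```

With noise-free data and full-rank T and weights, the recovered core matches the one used to build X; with real data the core is the least squares fit given the scores and weights.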

In the case of a two-way data matrix, Y (I × M) is defined by: [...]

By regressing the data onto their weight vectors, a score vector is found in the X-space, providing a least squares model of the X data.

Furthermore, by choosing the weights such that the covariance between X and Y is maximized, a predictive model is obtained as: [...]
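The covariance-maximizing choice of weights can be sketched for one component and a single y. The sketch assumes the usual NPLS solution, in which the second- and third-mode weights are the first left and right singular vectors of the J × K matrix of cross-products between y and X. The function name `npls_first_component` and the random data are hypothetical.

```python
import numpy as np

def npls_first_component(X, y):
    """Weights and score of the first NPLS component for X (I x J x K), y (I,)."""
    # J x K cross-products with y (covariances if X and y are centered)
    Z = np.einsum("i,ijk->jk", y, X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    wJ, wK = U[:, 0], Vt[0, :]              # unit-norm weights, modes 2 and 3
    t = np.einsum("ijk,j,k->i", X, wJ, wK)  # score: X regressed onto the weights
    return wJ, wK, t, s[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6, 4))
y = rng.normal(size=10)
wJ, wK, t, s_max = npls_first_component(X, y)

# Any other pair of unit-norm weights gives a smaller cross-product with y.
aJ = rng.normal(size=6); aJ /= np.linalg.norm(aJ)
aK = rng.normal(size=4); aK /= np.linalg.norm(aK)
t_other = np.einsum("ijk,j,k->i", X, aJ, aK)
```

The cross-product t'y equals the largest singular value of the cross-product matrix, which is why no other unit-norm weight pair can exceed it.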

Regression coefficients that apply directly to X (I × JK) may also be derived [4,22]: [...]

In the case of a mono-dimensional y (I × 1), the following holds: [...] where T is the X score matrix and b the PLS inner relation coefficients.

Thus a VIP value for each variable is computed in order to quantify its importance by using the PLS weight $w_{jf}$, weighted by how much of y is explained in each model dimension (latent variable).
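As an illustration of this weighting, the sketch below computes VIP values for a single y. It assumes the commonly used two-way form VIP_j = sqrt(J Σ_f w_jf² SSY_f / Σ_f SSY_f), with SSY_f = b_f² t_f't_f taken as the part of y explained by component f. The function name `vip_scores` and the random inputs are hypothetical.

```python
import numpy as np

def vip_scores(W, T, b):
    """VIP per variable from weights W (J x F), scores T (I x F), inner coefficients b (F,)."""
    W = W / np.linalg.norm(W, axis=0)   # enforce unit-norm weight vectors
    ssy = b**2 * np.sum(T**2, axis=0)   # y sum of squares explained per component
    J = W.shape[0]
    return np.sqrt(J * (W**2 @ ssy) / ssy.sum())

rng = np.random.default_rng(2)
W = rng.normal(size=(7, 3))
T = rng.normal(size=(12, 3))
b = np.array([1.5, 0.8, 0.3])
vip = vip_scores(W, T, b)
```

Because each weight vector has unit norm, the squared VIP values sum to the number of variables, so a value above 1 marks a variable of above-average importance.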

The VIP formulation as originally proposed [17] is intended to be a pa[...] discussed but feature selection is not accomplished. In the cases of marker identification and variable selection, resampling methods such as bootstrap are more appropriate [19,20]. [...]

In the case of a two-dimensional Y (I × M) the previous relation applies to each mode, e.g. in the case of a three-way array X (I × J × K): [...] and each y-variable $y_m$: [...]

The variable importance is then normalized so that the sum of the squared VIP values equals the number of variables.
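A mode-wise version can be sketched as follows, under stated assumptions: the same VIP expression is applied separately to the second- and third-mode weights, and the explained sum of squares per component is supplied as an F × M table, one column per y-variable. Pooling the y-variables by summing their explained sums of squares is an assumption of this sketch, and `mode_vip`, `npls_vip` and `ssy_fm` are hypothetical names.

```python
import numpy as np

def mode_vip(W, ssy):
    """VIP for one mode: W is (n_vars x F) weights, ssy is (F,) explained y sum of squares."""
    W = W / np.linalg.norm(W, axis=0)  # unit-norm weight vectors
    return np.sqrt(W.shape[0] * (W**2 @ ssy) / ssy.sum())

def npls_vip(WJ, WK, ssy_fm, m=None):
    """VIPs for modes J and K; m selects one y-variable, None pools all of them."""
    ssy = ssy_fm.sum(axis=1) if m is None else ssy_fm[:, m]
    return mode_vip(WJ, ssy), mode_vip(WK, ssy)

rng = np.random.default_rng(3)
WJ = rng.normal(size=(6, 2))       # second-mode weights, J = 6, F = 2
WK = rng.normal(size=(4, 2))       # third-mode weights, K = 4
ssy_fm = rng.uniform(size=(2, 3))  # explained sum of squares, F x M with M = 3

vip_j_all, vip_k_all = npls_vip(WJ, WK, ssy_fm)     # all y-variables pooled
vip_j_y0, vip_k_y0 = npls_vip(WJ, WK, ssy_fm, m=0)  # first y-variable only
```

Each mode is normalized on its own, so the squared VIP values sum to J for the second mode and to K for the third.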

Extension to the other Y modes can be easily obtained.

[...] The data set [25] consists of a set of extra virgin olive oil (EVOO), be[...] Fig. 3).

The X and Y data were not centered or scaled within any mode. [...]

In particular, the Y loading plot (Fig. 4, left plot) shows that the first NPLS [...]

The regression coefficient maps are not so easily interpretable, see [...] ten networks take place [24].

The combination of multi-way methods applied to NIR spectra is here useful to supervise changes of the system according to the [...]

The second factor can be considered as a component that takes into account a sort of "prolonged activation due to tonic pain input" (see Fig. 15, gray dashed line), and it mainly describes volunteer #2, whose perceived pain profile has an ample bell shape. Fig. 16 shows the comparison between the pain profiles of volunteer #2 and the reference volunteer #3. The high weight value $w_{k2}$ (see Fig. 14) for this volunteer seems to be related only to this particular behavior, which is also responsible for the separation of the ROI regions into two groups with respect to the weight values on the second mode (Fig. 13).

Thus, discussion of the weights plot is useful to recover the detailed information on specific subject behavior, while the overall VIP values point to the most relevant ROIs whose activation is involved in pain perception and that are thus capable of reproducing the Y-psycho responses.

The recent developments in feature selection methods for two-way data have addressed the problem of increasing the performance of regression models, such as PLS. Complex filter, wrapper or embedded methods [11,20] improve predictor performance compared to simpler variable ranking methods, but the improvements are not always significant [...]

[...] purposes where the interest is mainly focused on assessing the X-part co-varying with Y only, other methodologies may represent a valuable solution, such as OPLS [28] and the Selectivity Ratio metric [21].

In the studied cases, VIPs provided an easier and complementary way to interpret the variable relevance in NPLS models, especially when examination of the regression coefficients was not straightforward because of the complex, hard-to-read patterns associated with them [27], as in the case of spectral data (the Bread data set) and, even more so, with two signal dimensions, as in the case of hyphenated analytical techniques. In the EVOO application, which is an example of chromatography/mass spectrometry data, the joint information from the VIPs on the retention time and m/z directions allows discussion in chemical terms of the most discriminant features.