Automated quantification of defective maize kernels by means of multivariate image analysis

This article describes the development of a fast and inexpensive method based on digital image analysis for the automated quantification of the percentage of defective maize (%DM). Defective kernels tend to foster high levels of mycotoxins such as deoxynivalenol (DON), which represents a risk for the health of humans and of farm animals. In this work, 332 RGB images of 83 mixtures containing different amounts of defective maize kernels were acquired using a digital camera. The mixtures were also analysed with a commercial ELISA test kit to determine their concentration of DON, which was found to be highly correlated with the amount of defective kernels. Each image was then converted into a signal, named colourgram, which codifies its colour-related information content. The colourgrams were first explored using Principal Component Analysis. Then, calibration models of the %DM values were developed using Partial Least Squares (PLS) and interval-PLS. The best interval-PLS model predicted the %DM values of external test set samples with a root mean square error equal to 2.6%. Based on the output of this model, it was also possible to highlight the defective-maize areas within the images, confirming the significance of the proposed approach.


Introduction
The great importance of maize (Zea mays L.) is due to its primary role for multiple uses, including human food, livestock feed, biofuels and bioplastics (FAO, 2006). A current issue of high relevance related to the consumption of maize as food or feed is its possible contamination with mycotoxins.
Indeed, maize mycotoxins can be directly found in human food or, when maize is used as animal feed, they are ingested by animals and then pass to humans through the food chain. Due to their high toxicity, mycotoxins represent a major risk for human health. Their ingestion can lead to a wide range of effects, including deterioration of liver or kidney functions, skin necrosis, immunological disturbances, neurotoxicity and carcinogenicity (Steyn, 1995; Sweeney and Dobson, 1998; Edite Bezerra da Rocha et al., 2014).
Mycotoxins are secondary metabolites naturally produced by some filamentous fungi, which frequently develop in maize. The most common mycotoxins in maize are produced by fungi belonging to the genera Fusarium, Aspergillus and Penicillium (Hossain and Goto, 2014). These microorganisms mainly develop in the field or at the post-harvest stage, when storage conditions are inadequate. The types and levels of contamination strongly depend on the contaminant fungal species, on the harvesting year and on environmental conditions such as temperature and humidity (Suleiman et al., 2013). One of the most common mycotoxins found in maize is deoxynivalenol (DON), also known as vomitoxin due to its strong emetic effects. DON is primarily produced by Fusarium graminearum and Fusarium culmorum (Kushiro, 2008; Sobrova et al., 2010; Edite Bezerra da Rocha et al., 2014). In order to ensure food safety, proper techniques to estimate the concentration of mycotoxins in maize have been developed, which are mainly based on chromatographic methods and on immunoassays (Maragos and Busman, 2010). These methods offer high sensitivity and specificity, but present some drawbacks, mainly due to the relatively long times required for the analysis, to the costs and to the limited amount of analysed sample, which implies the risk of a poorly representative sampling of large maize batches. These aspects are particularly crucial during the transfer of the maize crops to the warehouse, when large amounts of product delivered by farmers must be evaluated in a very short time, in order to fix the price and the final destination of each batch, or to reject it.
In this context, the availability of proper systems to perform a fast analysis of representative maize quantities might constitute a very useful tool, at least for a preliminary assessment in view of more refined analyses of the accepted batches by traditional wet chemistry methods. To this aim, digital image processing is suitable for screening heterogeneous food or feed matrices like maize, to detect local defects connected to fungal and toxin contaminations (Udomkun et al., 2017). Some authors have recently proposed the use of near infrared hyperspectral imaging (NIR-HSI) to detect maize kernels infected with fungi, and to estimate the degree of infection with a fast and accurate system (Del Fiore et al., 2010; Singh et al., 2012; Williams et al., 2012). Notwithstanding the great advantages offered by NIR-HSI, the efforts needed to efficiently extract useful information from the huge amount of hyperspectral data and the relatively high cost of hyperspectral cameras are still limiting factors for its widespread application in maize monitoring (Ferrari et al., 2013; Ulrici et al., 2013; Calvini et al., 2016).
For these reasons, much cheaper instrumentation based on the use of common digital cameras constitutes an interesting alternative for the implementation of fast and non-destructive methods to monitor maize defects. In fact, although the lack of visible defects does not ensure the complete absence of mycotoxins, the presence of stained, dark or rotten maize kernels is generally correlated with the presence of fungal infections. In other words, the higher the amount of defective kernels, the higher the possibility of significant mycotoxin contamination.
In this context, Multivariate Image Analysis (MIA) offers a wide range of effective tools to properly detect and quantify visible defects through RGB imaging. Essentially, MIA consists of the development and application of various chemometric strategies for the analysis of multivariate images, which are composed of a given number of picture elements (pixels), each one characterized by a series of spectral variables, or channels (Esbensen and Geladi, 1989; Geladi and Grahn, 1996; Prats-Montalbán et al., 2011; Duchesne et al., 2012; Reis, 2014). Many approaches have been proposed to characterise food samples based on MIA applied to RGB images, by using information in the original RGB colour space, in the latent variable space (e.g., using PCA), or in other colour spaces like Hue, Saturation, Intensity (HSI) (Yu et al., 2003; Pereira et al., 2009; Pierini et al., 2016).
In particular, many research works have been reported in the literature in which morphological, textural and colour features extracted from RGB images were used to develop automated systems for monitoring damaged and non-damaged kernels (Ruan et al., 1998; Choudhary et al., 2008).
Valeinte-Gonzàlez et al. (2014) developed an effective approach based on the combined use of computer vision and Principal Component Analysis (PCA) to identify the damaged regions of single maize kernels. However, in the perspective of an industrial application, the determination of the degree of defectiveness of a maize batch based on the investigation of single kernels would be too demanding in terms of time and computational effort.
In this context, this study was aimed at developing an automated system for a preliminary assessment of DON contamination, based on the simultaneous analysis of a dataset of RGB images of mixtures containing different percentages of defective maize (%DM). The correlation between the %DM values and the concentration of DON, estimated by means of a commercial ELISA test kit, was also investigated. Each image was converted into a one-dimensional signal, named colourgram, which codifies its colour-related information content (Antonelli et al., 2004; Lo Fiego et al., 2007; Foca et al., 2011; Ulrici et al., 2012). In turn, the colourgrams were used to develop calibration models to predict %DM, using Partial Least Squares (PLS) and the feature selection algorithm interval-Partial Least Squares (iPLS). Moreover, the reconstruction of the maize images considering the colour-related features selected by iPLS made it possible to visualize the defective kernels and thus to critically evaluate the choices made automatically by the algorithm.

Maize samples
In the present study, two different types of maize kernels were considered: dry maize (13 % moisture) and wet maize (24 % moisture). For both maize types, based on their visual aspect the kernels were manually separated into defective (stained, dark or rotten) and non-defective (uniform yellow pericarp) kernels (Nguyen V.H., 2013). After the separation between defective and non-defective kernels, the maize samples were sealed in plastic bags and stored in the dark at 4 °C for a maximum of two days before analysis.

Image acquisition
The RGB images were acquired using a Panasonic DMC-TZ25 digital camera, with a 24 mm focal length (in 35 mm equiv.), 1/125 s shutter speed, ISO-100 and f/3.5. Before the acquisition sessions, white balance was set to a constant value by pointing the camera towards a white paper sheet, under the same lighting conditions used to capture sample images. The images, with 24-bit colour depth and spatial resolution equal to 4000 × 3000 pixels (corresponding to an image area approximately equal to 310 × 235 mm), were stored in JPEG format, with an average file size equal to 4.87 MB.
In order to have constant and homogeneous lighting conditions, the camera was mounted on a carton box (Figure 1) whose inner surface was covered with white paper sheets. The lighting system consisted of a strip of white light-emitting diode lamps (SMD 3528 LED 5V USB, colour temperature 6500 K) assembled on a metallic support and directed upwards at a 90-degree angle with the carton box wall. In this manner, the sample was only illuminated by diffused light to avoid the presence of undesired shadows or reflection effects. A white conveyor belt was used as background of the images. Furthermore, a colour reference was included in the image scene to correct possible variations in the lighting conditions. The colour reference, reported in Figure 2, consisted of a white paper sheet including a series of eight squares (size 1 cm²) of different colours, i.e., white, black, the primary additive colours (red, green, blue) and the primary subtractive colours (cyan, magenta, yellow), which were obtained using a laser printer (HP LaserJet Pro 200 color MFP M276n).
The acquisition of the RGB images was performed in two subsequent steps. A schematic representation of the procedure followed for the preparation of the imaged samples is reported in Figure 3.
For wet maize, due to a smaller amount of available defective maize, only one mixture group (W1) was considered. Therefore, in the first step 26 samples of dry maize and 13 samples of wet maize were considered, for a total of 39 different samples. Each mixture consisted of a total amount of maize kernels equal to 150 g.
For each sample two images were acquired, shuffling the kernels before each acquisition. The same image acquisition procedure was repeated on a different day in order to check the day-to-day variability. All the samples were acquired in random order to minimise, as much as possible, the effects of uncontrollable factors. Therefore, on the whole, 156 images (= 39 samples × 2 repeated acquisitions × 2 measurement sessions) were obtained in this first step.
Simultaneously, for each mixture the concentration of DON was determined using a commercially available ELISA kit (see Section 2.3). The results of this analysis showed that mixtures with an amount of defective kernels equal to or greater than 10% presented high concentrations of DON.
For this reason, a second acquisition step was planned with the aim of collecting additional images with an amount of defective kernels in the range between 0 and 10%, in order to improve the performance of the calibration models in proximity to low %DM values.
Therefore, in the second acquisition step, additional mixtures were prepared, considering 11 levels corresponding to percentages by weight (w/w) of defective maize kernels ranging from 0% to 10%, with steps of 1%. In this case, for both dry and wet maize types two mixtures were prepared for each considered level, which were split into the D2a and D2b mixture groups for dry maize, and into the W2a and W2b mixture groups for wet maize. Therefore, in the second step a total of 44 samples were obtained, each one containing 150 g of kernels.
The experimental procedure followed for image acquisition was the same as for the previous step, leading to 176 additional images (= 44 samples × 2 repeated acquisitions × 2 measurement sessions). The digital images acquired during the two acquisition steps were merged together to obtain a final dataset composed of 332 images.

Determination of deoxynivalenol
In order to verify the correlation between the %DM values and the concentration of DON, an ELISA test was performed on the maize samples using the AgraQuant® DON 0.25/5.0 Assay (Romer Labs Inc., USA).
Before the first acquisition step, the concentration of DON was determined on the defective and non-defective kernels of wet and dry maize. In particular, for dry maize 10 aliquots of non-defective kernels and 10 aliquots of defective kernels were randomly collected, while for wet maize 5 aliquots were collected for each group of kernels. Each aliquot consisted of 20 g of maize kernels, which were ground and subjected to the ELISA test following the standard procedure provided by the manufacturer.
Afterwards, simultaneously with image acquisition, the ELISA test was also performed on the mixtures imaged during both acquisition steps. In this case, the entire amount of 150 g of kernels of each sample was ground and analysed with the AgraQuant kit. The quantification range of the AgraQuant DON assay is between 0.25 and 5.0 ppm; therefore, samples containing DON levels higher than 5 ppm were diluted with deionized water in order to fall within the quantification range.
The dilution was performed up to a maximum quantification value equal to 10 ppm.

Standardization and conversion to colourgrams
The key steps followed for the elaboration of the RGB images are summarized in Figure 2. Firstly, from each original image the two areas corresponding to the reference (coloured squares) and to the sample (maize kernels) were automatically selected and stored as separate images. The size of the obtained images was equal to 305 × 1714 pixels and to 2653 × 3733 pixels for the reference images and for the sample images, respectively.
Then, in order to minimize the effect of uncontrollable factors such as drifts in the acquisition system or variations of the illumination conditions, each sample image, $S_i$, was standardized using the corresponding reference image, $R_i$. To this aim, each reference image was compared with the reference of the first captured image, which was defined as the master reference image, $M_R$. In particular, for each channel $c$ (equal to R, G or B), the difference between the mean value of all the pixels of $R_i$ and the corresponding mean value of all the pixels of $M_R$ was computed as follows:
$$\Delta_{i,c} = \bar{R}_{i,c} - \bar{M}_{R,c}$$
Then, this difference was used to calculate the corrected sample image, $CS_i$:
$$CS_{i,c} = S_{i,c} - \Delta_{i,c}$$
After image standardization, the sample images were converted into the corresponding colourgrams. Essentially, colourgrams are one-dimensional signals obtained by merging in sequence the frequency distribution curves of a series of colour-related parameters extracted from each RGB image, together with the loading vectors and the eigenvalues of PCA models calculated on the RGB data. In this manner, datasets of RGB images are converted into matrices of signals, each one acting like a fingerprint of the corresponding image and codifying its colour-related information content, while the spatial information is lost. The colourgrams matrix can be further analysed by means of suitable multivariate analysis techniques, making it possible to evaluate all the acquired images together, i.e., to consider the colour-related information of the dataset as a whole.
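As an illustration of the reference-based standardization described in this section, the following minimal Python sketch shifts each channel of a sample image by the offset between the mean of its own reference and the mean of the master reference. It is not the authors' MATLAB code: the function name `standardize_sample`, the subtraction of the offset, and the clipping of the result to the 8-bit range are assumptions made here for illustration.

```python
import numpy as np

def standardize_sample(sample, reference, master_reference):
    """Correct `sample` (H x W x 3, 8-bit RGB) using its own reference image
    and the master reference image (both also RGB arrays)."""
    # Per-channel mean over all pixels of each reference image.
    ref_mean = reference.reshape(-1, 3).mean(axis=0)
    master_mean = master_reference.reshape(-1, 3).mean(axis=0)
    delta = ref_mean - master_mean            # one offset per channel
    # Assumption: the offset is subtracted from every pixel of the sample.
    corrected = sample.astype(np.float64) - delta
    # Assumption: values are rounded and clipped back to the 8-bit range.
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)

# Toy example: the scene of image i is 10 grey levels brighter than the master.
master = np.full((4, 4, 3), 200, dtype=np.uint8)
reference = np.full((4, 4, 3), 210, dtype=np.uint8)
sample = np.full((2, 2, 3), 120, dtype=np.uint8)
corrected = standardize_sample(sample, reference, master)
```

In this toy case every pixel of the corrected image is shifted from 120 to 110, i.e., the brightness excess measured on the reference is removed from the sample.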
For the conversion of images to colourgrams, the three-dimensional data array corresponding to each RGB image with size {2653 pixel rows × 3733 pixel columns × 3 R, G and B channels} was firstly unfolded into a two-dimensional matrix with size {9903649 rows (total number of pixels) × 3 columns (corresponding to the R, G and B channels)}.
Then, this matrix was expanded by adding a series of columns, corresponding to parameters calculated for each pixel starting from the R, G and B values:
• Lightness (L), defined as the sum of the three channel values;
• the relative colours (rR, rG and rB), defined as the ratio between each channel and L;
• Hue (H), Saturation (S) and Intensity (I), obtained by converting the RGB data into the HSI colour space;
• the nine score vectors obtained by calculating three PCA models on the raw, mean-centered and autoscaled RGB data (three principal components for each PCA model).
Then, for each one of the 19 columns of the resulting data matrix, the corresponding 256-point-long frequency distribution curve was calculated. The 19 frequency distribution curves were joined in sequence to form a unique vector, at the end of which the loading vectors and the eigenvalues of the three PCA models were also added.
The resulting signal, with length equal to 4900 points (= 256 × 19 + 36), is the colourgram, which retains the colour-related information of the corresponding image. For a more detailed description of the algorithm used to create the colourgrams, the reader is referred to Antonelli et al. (2004).
In this work, the 332 digital images were converted into the corresponding colourgrams, thus obtaining a colourgrams matrix with size {332 rows × 4900 columns}.
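The conversion of an image into a colourgram can be sketched in Python as follows. This is only an illustrative reading of the description above, not the original algorithm of Antonelli et al. (2004): the histogram ranges, the textbook RGB-to-HSI formulas, the use of per-image minimum and maximum for the score histograms, and the ordering of loadings and eigenvalues in the final 36 points are all assumptions.

```python
import numpy as np

def _pca3(X):
    """PCA of an (n_pixels x 3) matrix via its 3 x 3 cross-product matrix."""
    eigvals, loadings = np.linalg.eigh(X.T @ X / max(len(X) - 1, 1))
    order = np.argsort(eigvals)[::-1]            # decreasing eigenvalue
    loadings = loadings[:, order]
    return X @ loadings, loadings, eigvals[order]

def _hsi(rgb):
    """Textbook RGB -> HSI conversion (assumed variant)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    inten = rgb.mean(axis=1)
    sat = np.where(inten > 0, 1 - rgb.min(axis=1) / np.where(inten > 0, inten, 1), 0.0)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt(np.maximum((r - g) ** 2 + (r - b) * (g - b), 0))
    theta = np.arccos(np.clip(num / np.where(den > 0, den, 1), -1, 1))
    theta = np.where(den > 0, theta, 0.0)        # hue undefined for grey pixels
    return np.where(b > g, 2 * np.pi - theta, theta), sat, inten

def colourgram(image, n_bins=256):
    rgb = image.reshape(-1, 3).astype(np.float64)          # unfold the image
    light = rgb.sum(axis=1)                                # lightness L
    rel = rgb / np.where(light > 0, light, 1)[:, None]     # rR, rG, rB
    hue, sat, inten = _hsi(rgb)
    # (values, histogram range) for the first 10 columns; ranges are assumed.
    columns = [(rgb[:, k], (0, 256)) for k in range(3)]
    columns.append((light, (0, 765)))
    columns += [(rel[:, k], (0, 1)) for k in range(3)]
    columns += [(hue, (0, 2 * np.pi)), (sat, (0, 1)), (inten, (0, 255))]
    centred = rgb - rgb.mean(axis=0)
    std = rgb.std(axis=0)
    tail = []
    for X in (rgb, centred, centred / np.where(std > 0, std, 1)):
        scores, loadings, eigvals = _pca3(X)
        for k in range(3):                                 # 3 score vectors per model
            s = scores[:, k]
            lo, hi = s.min(), s.max()
            columns.append((s, (lo, hi if hi > lo else lo + 1)))
        tail.append(np.concatenate([loadings.ravel(), eigvals]))  # 9 + 3 values
    curves = [np.histogram(v, bins=n_bins, range=r)[0] for v, r in columns]
    return np.concatenate(curves + tail).astype(np.float64)

rng = np.random.default_rng(0)
toy_image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
cg = colourgram(toy_image)
```

With 19 histograms of 256 points and 3 × (9 loadings + 3 eigenvalues) = 36 additional values, the sketch returns a vector of 4900 points, matching the colourgram length reported above.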

Exploratory data analysis and calibration of the colourgrams matrix
In order to obtain an overview of the dataset structure and to identify possible outlier images, a first evaluation of the colourgrams matrix was made by PCA. Both mean-centering and autoscaling were considered as column preprocessing methods to calculate the PCA models, and the number of PCs was selected according to the analysis of the corresponding scree plots.
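The effect of the two column preprocessing options on a PCA model can be explored with a short Python sketch such as the one below. It uses a plain singular value decomposition from NumPy rather than PLS_Toolbox; the function name `pca_model` and the random toy matrix are hypothetical and serve only to show how the explained variance used in a scree plot is obtained.

```python
import numpy as np

def pca_model(X, preprocessing="autoscale", n_pcs=3):
    """Return scores, loadings and explained-variance fractions of the first PCs."""
    Xp = X - X.mean(axis=0)                       # mean-centering
    if preprocessing == "autoscale":
        std = X.std(axis=0, ddof=1)
        Xp = Xp / np.where(std > 0, std, 1.0)     # unit variance; constant columns untouched
    U, s, Vt = np.linalg.svd(Xp, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)           # values plotted in a scree plot
    scores = U[:, :n_pcs] * s[:n_pcs]
    return scores, Vt[:n_pcs].T, explained[:n_pcs]

# Toy stand-in for a colourgrams matrix (20 signals x 50 variables).
rng = np.random.default_rng(1)
toy = rng.normal(size=(20, 50))
scores, loadings, explained = pca_model(toy, preprocessing="autoscale", n_pcs=3)
scores_mc, _, explained_mc = pca_model(toy, preprocessing="mean-center", n_pcs=3)
```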
Subsequently, Partial Least Squares (PLS) regression was applied to the colourgrams matrix in order to calculate calibration models able to predict the %DM values of the imaged samples. To this aim, the 332 colourgrams were split into a calibration set and an external test set. Also for the PLS models, both mean-centering and autoscaling were considered as column preprocessing methods. The performance of the PLS models was evaluated by means of the Root Mean Square Error (RMSE) and of the coefficient of determination (R²) statistics, calculated on the calibration set (RMSEC, R²Cal), in cross-validation (RMSECV, R²CV) and in prediction of the test set (RMSEP, R²Pred). The optimal number of Latent Variables (LVs) was chosen by minimizing the value of RMSECV. In particular, a custom cross-validation method was used, subdividing the samples into 4 deletion groups (D1a+D2a; D1b+D2b; W1+W2a; W2b, see Figure 3).
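The figures of merit and the custom cross-validation scheme can be sketched in Python as follows. The helper names are hypothetical, and R² is computed here as one minus the ratio between residual and total sum of squares, which is an assumption about the exact definition used by the software.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    # Assumed definition: 1 - SS_residual / SS_total.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

def deletion_group_folds(groups):
    """Yield (train_idx, test_idx), leaving out one deletion group at a time."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        held_out = groups == g
        yield np.flatnonzero(~held_out), np.flatnonzero(held_out)

# Toy example with one object per deletion group.
labels = ["D1a+D2a", "D1b+D2b", "W1+W2a", "W2b"]
folds = list(deletion_group_folds(labels))
error = rmse([0, 10, 20, 30], [1, 9, 21, 29])
fit = r2([0, 10, 20, 30], [1, 9, 21, 29])
```

Leaving out whole mixture groups, rather than single images, keeps replicate images of the same mixture from appearing in both the training and the held-out part of a fold.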
Generally, the information contained in the colourgram is partially redundant, since the whole signal is calculated without choosing a priori some relevant variables on the basis of the specific problem at hand. The evaluation of the image dataset by PCA and PLS considering the whole colourgram can therefore be helpful to perform a global assessment of the sources of colour variability of the analysed samples. However, in order to better focus on the quantification of the %DM values, and to increase the predictive performance and robustness of the calibration models, it is necessary to retain only the useful (defect-related) colour features by means of proper variable selection algorithms.
To this purpose, a wide choice of methods is available in the literature, such as interval Partial Least Squares (Norgaard et al., 2000; Foca et al., 2016), genetic algorithms (Leardi, 2000) and sparse methods (Rasmussen and Bro, 2012; Calvini et al., 2015). Furthermore, feature selection can be applied in conjunction with transform methods able to compress the useful pieces of information into a limited number of relevant variables, such as the wavelet transform (Antonelli et al., 2004; Foca et al., 2011; Pereira et al., 2011; Ulrici et al., 2012).
In particular, in the present work the simple but effective interval Partial Least Squares (iPLS) method (Norgaard et al., 2000;Ferrari et al., 2013) has been applied to the colourgrams matrix.
Briefly, the iPLS algorithm starts by subdividing the whole signal into intervals of equal length, defined by the user. In the forward iPLS search strategy that has been used in this work, local PLS models are first calculated on each interval, to select the one leading to the minimum value of RMSECV. Then, local PLS models are calculated considering all the combinations of the selected interval together with each one of the other intervals, and the best two-interval combination is selected again on the basis of the lowest RMSECV value. If the single-interval RMSECV value is lower than the two-interval RMSECV value, only the first interval is selected. Otherwise, this iterative procedure is repeated by increasing each time the number of considered intervals, until no further decrease of the RMSECV value is achieved.
In this work forward iPLS was used considering six different interval size values (256, 128, 64, 32, 16 and 8 variables), and using both mean-centering and autoscaling as signal preprocessing methods. Finally, the best overall iPLS model was again selected on the basis of the lowest RMSECV value.
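The forward search described above can be illustrated with the following self-contained Python sketch. It is not the PLS_Toolbox implementation: it relies on a small NIPALS PLS1 routine written for this example, uses mean-centering only, keeps the number of latent variables fixed instead of optimising it for each model, and ignores trailing variables that do not fill a whole interval. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Mean-centred PLS1 (NIPALS); returns regression vector and the means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_lv, X.shape[1])):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:                      # nothing left to model
            break
        w = w / norm
        t = Xc @ w                            # scores
        p = Xc.T @ t / (t @ t)                # X loadings
        c = (yc @ t) / (t @ t)                # y loading
        Xc, yc = Xc - np.outer(t, p), yc - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean

def rmsecv(X, y, groups, n_lv):
    """Cross-validated RMSE, leaving out one deletion group at a time."""
    press = 0.0
    for g in np.unique(groups):
        out = groups == g
        b, xm, ym = pls1_fit(X[~out], y[~out], n_lv)
        press += np.sum((y[out] - ((X[out] - xm) @ b + ym)) ** 2)
    return float(np.sqrt(press / len(y)))

def forward_ipls(X, y, groups, interval_size, n_lv):
    n_int = X.shape[1] // interval_size

    def columns(ids):
        return np.concatenate([np.arange(i * interval_size, (i + 1) * interval_size)
                               for i in ids])

    selected, best = [], np.inf
    while len(selected) < n_int:
        trials = {i: rmsecv(X[:, columns(selected + [i])], y, groups, n_lv)
                  for i in range(n_int) if i not in selected}
        candidate = min(trials, key=trials.get)
        if trials[candidate] >= best:         # no further decrease of RMSECV
            break
        selected.append(candidate)
        best = trials[candidate]
    return selected, best

# Synthetic check: only variables 8-11 (interval 2 of size 4) carry information.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = X[:, 8:12] @ np.array([1.0, -2.0, 0.5, 3.0])
groups = np.arange(60) % 4
selected, best = forward_ipls(X, y, groups, interval_size=4, n_lv=4)
```

On this noise-free toy problem the first interval picked by the search is the informative one; on real colourgrams the search would be repeated for each interval size and preprocessing option, keeping the model with the lowest RMSECV.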

Image reconstruction using selected features
In addition to the parameters that are commonly used to evaluate the performance of the calibration models, the relevance of the best iPLS model results was also assessed by means of a specific algorithm that maps the selected colourgram variables back into the original image domain.
Firstly, the image reconstruction algorithm converts the indexes of the colourgram variables selected by iPLS into the corresponding colour property values. For example, if one of the selected regions is in the range from 300 to 319 colourgram units, this region corresponds to the green channel values from 43 to 62. Then, the original image is segmented according to the selected range: only the pixels with green values from 43 to 62 are kept, while the remaining ones are set equal to 0 for all the R, G and B channels.
For each selected colourgram region, the resulting reconstructed image is displayed. In this manner, although no spatial information about the original image is retained in the colourgram, it is nevertheless possible to localize the image areas corresponding to the features of interest.
The algorithms used for image correction, conversion into colourgrams and image reconstruction of the selected features were written in MATLAB language (ver. 7.12, The Mathworks Inc., USA), while PCA, PLS and iPLS models were calculated using PLS_Toolbox (ver. 7.5, Eigenvector Research Inc., USA).
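The mapping from colourgram indexes back to pixel values can be sketched in Python as shown below, reproducing the green-channel example given above. The sketch assumes 1-based colourgram indexes, 256-point blocks stored in the order R, G, B, and one histogram bin per 8-bit level; the function names are hypothetical and only the R, G and B blocks are handled.

```python
import numpy as np

def colourgram_index_to_bin(index, n_bins=256):
    """Map a 1-based colourgram index to (block position, 0-based bin).
    Block 0 is assumed to be R, block 1 G, block 2 B."""
    return divmod(index - 1, n_bins)

def reconstruct(image, channel, low, high):
    """Keep only pixels whose `channel` value lies in [low, high];
    all other pixels are set to 0 on the three channels."""
    values = image[:, :, channel]
    mask = (values >= low) & (values <= high)
    out = np.zeros_like(image)
    out[mask] = image[mask]
    return out

# Selected region 300-319 of the colourgram -> green values 43-62.
block_lo, g_low = colourgram_index_to_bin(300)
block_hi, g_high = colourgram_index_to_bin(319)

# Toy 1 x 3 image: only the middle pixel has a green value inside 43-62.
toy = np.array([[[200, 40, 10], [120, 50, 30], [90, 70, 20]]], dtype=np.uint8)
kept = reconstruct(toy, channel=block_lo, low=g_low, high=g_high)
```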

ELISA analysis
As described in Section 2.3, the ELISA test was first performed on the groups of manually selected defective and non-defective maize kernels. Concerning the defective maize, the results of the ELISA test showed that for both dry and wet types the concentration of DON was greater than the maximum quantification value, equal to 10 ppm. As regards the non-defective maize, the average concentrations of DON estimated by ELISA were equal to 2.00 ppm and to 1.66 ppm for dry and wet types, respectively.
Afterwards, the ELISA test was also performed on the mixtures of defective and non-defective maize kernels. In particular, the results of the analysis performed during the first acquisition step showed that mixtures with %DM values greater than or equal to 10 presented concentration values of DON exceeding 10 ppm.
This observation was confirmed by the ELISA test performed during the second acquisition step, where concentrations of DON below the 10 ppm threshold were observed for the mixtures with %DM values between 0 and 6, as reported in Figure 4. In this range, the concentration of DON was directly proportional to the %DM value.

Exploratory data analysis of the colourgram dataset
The colourgram dataset was initially investigated by means of PCA considering both autoscaling and mean-centering as signal preprocessing methods. The PCA model calculated on autoscaled colourgrams was found to have an optimal dimensionality equal to 3 PCs, accounting for about 65% of the total variance. The score plot of the first two PCs is reported in Figure 5a, where the samples are coloured according to the %DM values, and in Figure 5b, where the samples are coloured according to the maize type (dry and wet). Figure 5a shows that the samples are distributed along PC1 (40 % explained variance) according to the amount of defective kernels, while Figure 5b highlights that the two maize types are separated along PC2. PC3 (not shown) also accounts for the separation between dry and wet maize. The separation between the two maize types observed along PC2 and PC3 can be ascribed to a generally more reddish colour of the non-defective wet maize with respect to the non-defective dry maize.
However, it must be underlined that the colourgram variability due to the difference between dry and wet maize is orthogonal to that ascribable to %DM. For this reason, the moisture content of maize should have a relatively limited influence on the development of calibration models for the prediction of the %DM values.
The same colourgram dataset was also investigated by means of PCA using mean-centering as signal preprocessing method. Two PCs were selected, accounting for about 70% of the total variance. Also in this case, the samples are distributed along PC1 according to the %DM values (Figure 5c), while the separation between dry and wet maize is not clearly visible (Figure 5d). The arch-shaped distribution of the samples in these PC1-PC2 score plots is due to the fact that PC2 essentially accounts for the sample heterogeneity: homogeneous samples (i.e., samples that are either almost completely non-defective or almost completely defective) are located at negative values of PC2, while heterogeneous samples (i.e., mixtures with significant percentages of both defective and non-defective maize kernels) have positive values of PC2.

PLS calibration models
The results of the PLS calibration models calculated on the whole colourgrams using both autoscaling and mean-centering as signal preprocessing methods are reported in the first two rows of Table 1. Both PLS calibration models led to satisfactory results, with R² values always greater than 0.975. The best PLS calibration model, chosen on the basis of the lowest RMSECV value, was obtained using autoscaling and led to an RMSEP value equal to 3.1%.
Figure 6a shows the plot of the %DM values predicted with the best PLS model versus the experimental %DM values. It is possible to observe that, at low %DM values (from 0 to 10), the samples of dry maize are generally underestimated (samples mainly under the bisector), while the samples of wet maize are generally overestimated (samples mainly over the bisector).

iPLS calibration models
On the whole, 12 different iPLS calibration models were calculated, considering six different interval size values both for the mean-centered and for the autoscaled colourgrams. The last two rows of Table 1 report the results of the iPLS models showing the lowest RMSECV values for each preprocessing method. The two models led to almost identical results, both in terms of performance and in terms of the number of LVs. However, the iPLS model calculated on the autoscaled colourgrams can be considered the best one since it is more parsimonious, including only 64 colourgram variables.
Compared with the corresponding PLS model calculated on the whole colourgram, the best iPLS model led to only a slight reduction of the RMSEP value, from 3.1% to 3.0%. However, the comparison of Figure 6b with Figure 6a shows that variable selection drastically reduced the effect of the different maize types (dry/wet) on the estimate of the lowest %DM values (from 0 to 10).
Moreover, a detailed visual inspection of the images of the analysed samples revealed that the actual composition of the mixtures was slightly different from the supposed one, i.e., that the experimental %DM values were affected by a small experimental error. For example, some samples with a supposed %DM value equal to 0 (non-defective samples) actually showed the presence of some defective kernels (some of which are highlighted with black circles in Figure 7a). Similarly, some samples with a supposed %DM value equal to 100% (defective samples) showed the presence of some maize kernels without defects (black circles in Figure 7b), at least on the kernel side that was imaged.
Therefore, the RMSEP values of the PLS and iPLS models were at least partly affected by the presence of experimental error in the reference measurement values. Moreover, the presence of a few defective kernels within the non-defective maize could explain the relatively high average value of DON found with the ELISA test in non-defective dry maize (2.00 ppm), which is slightly above the limit of 1.75 ppm set for unprocessed maize used in foodstuff by the European Parliament (Commission Regulation (EC) No 1126/2007).
Finally, it has to be underlined that the calibration models were calculated considering the four images acquired for each mixture as separate objects, in order to evaluate the reproducibility of the estimated %DM values. Therefore, the RMSE values reported in Table 1 also include the within-sample variability and represent an overestimate of the values that would be obtained in a real application of the models. Indeed, for an industrial application the estimate of the %DM values would be obtained as the average of the values predicted from multiple images acquired in sequence on different aliquots of the same sample. The application of the same approach to the test set objects, considering the average of the four %DM values predicted for each sample, led to an RMSEP value equal to 2.6% for the best iPLS model. Referring to the data reported in Figure 4, this RMSEP value corresponds approximately to a concentration of DON ranging between 3 and 5 ppm.
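The difference between the per-image error and the error of the per-sample averages can be illustrated with a short Python sketch. The numbers below are invented for illustration only and are not taken from the dataset; the function name `rmsep_of_sample_means` is hypothetical.

```python
import numpy as np

def rmsep_of_sample_means(sample_ids, y_true, y_pred):
    """RMSEP computed after averaging the predictions of each sample."""
    ids = np.asarray(sample_ids)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    true_means, pred_means = [], []
    for s in np.unique(ids):
        sel = ids == s
        true_means.append(y_true[sel].mean())
        pred_means.append(y_pred[sel].mean())
    diff = np.array(true_means) - np.array(pred_means)
    return float(np.sqrt(np.mean(diff ** 2)))

# Two hypothetical samples, four images each (invented values).
ids = ["A"] * 4 + ["B"] * 4
true = [0, 0, 0, 0, 10, 10, 10, 10]
pred = [2, -2, 1, -1, 12, 8, 11, 13]
per_image = float(np.sqrt(np.mean((np.array(true) - np.array(pred)) ** 2)))
averaged = rmsep_of_sample_means(ids, true, pred)
```

In this invented example the random scatter between replicate images partly cancels out on averaging, so the error of the sample means is lower than the per-image error, which is the same effect behind the reduction of the RMSEP reported above.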
Although this error is not comparable to that of the reference analytical methods, the result suggests that RGB image analysis could be used for a quick preliminary estimate of the degree of maize contamination by DON, allowing for example the separate storage of maize batches depending on their %DM values and/or the immediate rejection of those batches whose %DM value is excessively high.

Reconstruction of the selected features
In order to obtain more direct information about variable selection made by the iPLS algorithm, samples with different %DM values were randomly selected, and the corresponding images were used for the reconstruction and the visualization of the features selected by the best iPLS model.
First of all, the plot of the regression vector of the best iPLS model was analysed in order to identify the regions with regression coefficient values greater than zero, corresponding to variables that are directly proportional to the %DM values. As shown in Figure 8, these regions belong to the distribution curves of green, relative green, intensity and, with a smaller contribution, lightness.
As an example, Figure 9 reports the reconstruction of the selected features of green (Fig. 9b), relative green (Fig. 9c) and intensity (Fig. 9d) of a portion of an image containing 50% of defective maize kernels, together with the original RGB image (Fig. 9a). The reconstruction of the image considering the selected features demonstrated that the colour-related parameters automatically selected by the algorithm were actually related to the presence of defective maize kernels.
Furthermore, the portion of the white background pixels that were reconstructed was negligible, therefore it did not interfere with the identification of defects in maize kernels.
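The reconstruction step can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: it assumes that the green portion of the colourgram is a 256-point frequency distribution of the 8-bit green values, so that each variable corresponds to one green level; the function names and the regression coefficients are hypothetical. Pixels whose green value falls in a bin with a positive regression coefficient are flagged in a binary mask.

```python
import numpy as np

def positive_coefficient_bins(reg_coefs):
    """Indices of the variables with regression coefficient > 0."""
    return np.flatnonzero(np.asarray(reg_coefs, dtype=float) > 0)

def reconstruct_feature(channel, selected_bins):
    """Boolean mask of the pixels whose 8-bit value is a selected bin.

    channel: 2-D uint8 array (e.g. the green channel of an RGB image).
    selected_bins: integer indices in the range 0-255.
    """
    lookup = np.zeros(256, dtype=bool)
    lookup[selected_bins] = True
    return lookup[channel]

# Hypothetical 2x2 RGB image: three bright pixels and one darker pixel.
rgb = np.array(
    [[[200, 180, 40], [90, 60, 30]],
     [[255, 255, 255], [210, 190, 50]]],
    dtype=np.uint8,
)
green = rgb[..., 1]

# Hypothetical regression coefficients for the 256 green levels:
# positive for low green values, negative for high green values.
coefs = np.zeros(256)
coefs[40:80] = 0.5
coefs[150:200] = -0.3

mask = reconstruct_feature(green, positive_coefficient_bins(coefs))
print(mask)          # True where the pixel belongs to a selected feature
print(mask.mean())   # fraction of reconstructed pixels
```

The resulting mask can be overlaid on the original image to check visually whether the flagged pixels correspond to defective areas; the same scheme could in principle be applied to any colour parameter described by a frequency distribution curve, such as relative green or intensity.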

Conclusions
In the present paper, an approach based on multivariate analysis of RGB images for the determination of the percentage of defective maize (%DM) has been presented, which could be used as a fast pre-screening of large maize batches for a preliminary estimate of the degree of maize contamination by DON. In fact, the analyses performed on the investigated samples with a commercially available ELISA test kit demonstrated that the %DM values and the concentration of DON are highly correlated with each other.
Through the automated selection of the colour features related to the presence of defective maize kernels, it was possible to obtain a satisfactory prediction of the average %DM values of the test set samples (RMSEP = 2.6%). Interestingly, the best calibration model was scarcely affected by the marked colour differences between the two considered maize types (wet and dry). The robustness towards this source of variability can be reasonably ascribed to the fact that the colour features selected by the best calibration model were essentially related to the defective maize areas, as confirmed by the inspection of the reconstructed images.
These promising results demonstrated the possibility of developing a fast, cheap and non-destructive automated system for a preliminary screening of maize quality based on the presence of defective kernels. Indeed, based on the outcome of this research work, an industrial prototype is currently under development, which allows 3 kg of maize to be analysed automatically in less than 1 min.
Compared with the commercially available ELISA test kits, which usually require 20 minutes to analyse 20 g maize samples, this system should speed up the transfer phase from harvesting to the warehouse and further increase the quality and safety level of the final product.
In view of industrial implementation, further improvements will be made based on the experience gained in the present work. Firstly, the defective and non-defective maize samples used for model calibration will be submitted to multiple selection steps, thus minimizing the contribution of human error in the definition of the reference mixture samples. Moreover, a wider dataset of images including a larger number of batches and of maize varieties will be acquired, in order to better estimate the effect of these sources of variability on the prediction error. Further improvements can also reasonably be gained by implementing a more refined image standardization procedure, in order to adjust for possible variations of the colour dynamic ranges.
In addition to the first estimation of the degree of maize contamination by DON, the predicted %DM could also constitute an objective tool to quickly evaluate the maize batches, allowing, e.g., the definition of three quality categories that could be stored separately from each other: i) batches that could potentially be used in foodstuff, after proper evaluation by reference analytical methods; ii) batches that could only be used in animal feed materials, after proper evaluation by reference analytical methods; iii) batches that could not be used as food or feed.
Future developments could lead to automated systems for real-time evaluation of the maize quality of whole batches, also enabling the removal of defective kernels. Moreover, in an industrial application, the image reconstruction could also help to inspect outlier samples, to detect foreign particles or to highlight instrumental faults.

Captions to tables and figures
Table 1 - Results of the PLS models and of the two best iPLS models.
of the health hazards to humans and animals, the European Parliament has set a limit of 1750 ppb in the unprocessed maize used in foodstuff (Commission Regulation (EC) No 1126/2007), while in the animal feed materials the recommendations generally suggest to not exceed 8 ppm of DON (Commission Directive 2003/100/EC).

Figure 1 - Experimental setup used for acquisition of the sample images.

Figure 2 - Key steps followed for the elaboration of the RGB images.

Figure 3 - Diagram representing the different mixture samples considered in the two acquisition

Figure 4 - Concentration of DON measured by ELISA test vs. percentage of defective maize

Figure 5 - PC1 vs PC2 score plots of the PCA models calculated on autoscaled (a and b) and mean

Figure 6 - Results of the best PLS (a) and iPLS (b) calibration models calculated on the autoscaled

Figure 7 - RGB images of samples with experimental %DM values equal to 0 (a) and 100 (b).

Figure 8 - Regression coefficients of the best iPLS model.

Figure 9 - Original RGB image (a) of a sample with a %DM value equal to 50 and reconstructed
