Electronic eye for the prediction of parameters related to grape ripening

An electronic eye (EE) for fast and easy evaluation of grape phenolic ripening has been developed. For this purpose, berries of different grape varieties were collected at different harvest times from veraison to maturity, then an amount of the derived must was deposited on a white sheet of absorbent paper to obtain a sort of paper chromatography. RGB images of the must spots were then collected using a flatbed scanner and converted into one-dimensional signals, named colourgrams, which codify the colour properties of the images. The dataset of colourgrams was used to build calibration models relating the colour of the images to the phenolic composition of the samples determined by reference analytical methods, and therefore to follow the ripening trend. Satisfactory calibration models were obtained for the prediction of the most important parameters related to phenolic ripening of grapes, such as colour index, tonality, total anthocyanins content, malvidin-3-O-glucoside and petunidin-3-O-glucoside.


Introduction
The maturity level of grapes (Vitis vinifera) at harvest is the first factor that influences the quality of the resulting wine [1]. Among the various features, sugar content, pH and acidity levels are the parameters most frequently used to monitor the maturity level of grapes [2]. However, the phenolic composition of grapes also plays an important role in the development of several sensorial attributes of wine, such as colour, body, structure, bitterness and astringency [3].
The determination of the parameters related to phenolic ripening is performed by means of classical analytical procedures, mainly based on UV-Vis spectrophotometry and high-performance liquid chromatography [4][5][6]. These methods are very accurate but require expensive instrumentation and the involvement of skilled personnel. Alternatively, sensory analysis is frequently used to provide a description of colour, aroma, flavour and texture of the grapes, giving a global characterisation of the maturity level [7]. However, sensory evaluation requires a lot of training and experience to minimize the inherent subjectivity and the variability of tasters [8].
For these reasons, increasing efforts are nowadays devoted to developing easy-to-use, inexpensive and objective methods based on artificial sensors [9], known as electronic nose (EN) [10,11], electronic tongue (ET) [12,13] and electronic eye (EE) [14,15]. The signals acquired with these devices are generally analysed using proper chemometric techniques that, following a blind-analysis approach, extract the sought information, such as the amount of specific analytes or the sensory features responsible for the smell, taste and colour of the sample.
In a recently published paper, some of us presented an ET aimed at monitoring grape ripening [16]. In that work it was shown that, by proper fusion of the data measured with two voltammetric sensors, it was possible to quantify the parameters related to the technological maturity of grapes, i.e., pH, total acidity and sugar content, in addition to the anthocyanins content. In the present work, which was conducted on the same grape samples and can be considered a follow-up to our previous investigation, we propose the development of an EE sensing system for the prediction of colour-related parameters, principally suitable for monitoring the phenolic maturity.
To this aim, purple grape samples were collected at different harvest times from veraison to maturity, then a drop of the derived must was deposited on a paper sheet to obtain a sort of paper chromatography. The spots of must were imaged by means of a flatbed scanner, and the resulting RGB images were analysed using multivariate methods, in order to use the colour-related information content of the images to predict the phenolic composition of the samples, and therefore to follow the phenolic ripening trend.
To date, few research studies have reported the use of image analysis for the evaluation of colour characteristics and colour changes of grape berries during ripening. For instance, in [17,18] RGB images of grape berries were categorised into different classes according to ripening. In more detail, considering the histograms of colour-related parameters such as hue or brightness, proper threshold values were identified to separate grapes into different clusters according to their colour development (e.g., pre-veraison and post-veraison). In other research papers, digital images of grape seeds were used to investigate CIELAB parameters and morphological features, such as area, aspect, roundness, length, width and heterogeneity, in order to predict their maturity stage [19,20]. Notwithstanding the satisfactory results obtained in these studies, the proposed methods were limited to the identification of correlations between colour and/or morphological features and the maturity level of grapes, which was determined based on harvest time or on visual inspection by expert assessors.
In the present work, we followed an alternative approach based on the use of RGB images for the quantitative prediction of specific analytical parameters related to the colour properties and the phenolic composition of the grape samples, which could provide a more detailed and comprehensive overview of the maturity level.
Basically, our method consists of converting each image into a one-dimensional signal, named colourgram, which codifies all the colour-related information and represents a sort of fingerprint of the corresponding RGB image [21]. The main advantage of this approach is that the whole dataset of colourgrams can be analysed in the same manner as any other dataset of signals. For instance, exploratory analysis tools like Principal Component Analysis (PCA) can be used to highlight the presence of trends, clusters or outlier images, whereas multivariate calibration or classification methods can be used to predict the value of specific parameters or to assign a sample to a specific class, based on its colour-related characteristics.
The colourgram approach has been successfully applied to the automated solution of several colour-related issues in the food industry, including the quantification of defective maize kernels, related to the presence of mycotoxins [22], the detection of red skin defect of raw hams [23], the quantification of Lactobacillus in fermented milk [24], the prediction of the compositional and sensory characteristics of pesto sauce [25] and the classification of different pesto brands [21].
In this study, after an exploratory analysis of the colourgrams matrix by PCA, multivariate calibration models were developed using a feature selection/calibration algorithm, namely interval-Partial Least Squares (iPLS) [26]. The purpose was to define the correlation between the images of must spots and a series of twelve parameters related to the phenolic composition of the samples.
The use of iPLS was particularly useful for identifying the colourgram features related to the changes in colour and phenolic composition of must samples during the ripening of the grape berries.
Moreover, by using a proper algorithm [23], the selected features were displayed in the original image domain, making it possible to evaluate the relevant colour features automatically selected by iPLS.

Samples
In this study, three Italian purple grape varieties were considered: Ancellotta (A), Lambrusco Marani (L) and Malbo Gentile (M). The samples were collected in Reggio Emilia (Italy) during vintage 2015.
For each variety, grape sampling was conducted on three grapevines (field replicates) in order to account for the vineyard variability. In more detail, about 100 individual grape berries per grapevine were randomly gathered at each of five successive harvest times (T0, T1, T2, T3, T4) at about 10-day intervals, starting from veraison and ending at harvest of the mature grapes.
Therefore, 45 grape samples were collected overall (3 grape varieties × 3 field replicates × 5 harvest times) and were immediately carried to the laboratory under refrigerated conditions. Each grape sample was crushed into a falcon tube under nitrogen atmosphere to prevent the oxidation of phenolic compounds. The crushed berries were left to macerate for 60 minutes at 4 °C in the dark, then centrifuged at 4000 rpm for 15 minutes (refrigerated centrifuge 4237R, ALC, Cologno Monzese, MI). The supernatant (from here onwards, the must sample) was divided into two aliquots to perform replicate determinations.
Each aliquot was stored at -20 °C and thawed just before the spectrophotometric and chromatographic analyses, which were performed in parallel with image acquisition. All the analyses were replicated twice for each must sample: in the first measurement session the acquisition order was randomized, and the first aliquot of each sample was analysed. Then, the order was shuffled again and the second aliquot of each sample was analysed. The overall number of analyses was therefore equal to 90 (3 grape varieties × 5 harvest times × 3 field replicates × 2 analytical replicates).
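
As an illustration of the sampling design described above, the following Python sketch enumerates the 45 must samples and builds two independently shuffled measurement sessions, one per aliquot. The label format and the random seeds are assumptions introduced here only for the example; they do not come from the study.

```python
import random
from itertools import product

varieties = ["A", "L", "M"]                      # Ancellotta, Lambrusco Marani, Malbo Gentile
field_replicates = [1, 2, 3]                     # three grapevines per variety
harvest_times = ["T0", "T1", "T2", "T3", "T4"]   # from veraison to maturity

# 3 varieties x 3 field replicates x 5 harvest times = 45 must samples
samples = list(product(varieties, field_replicates, harvest_times))

def session_order(samples, aliquot, seed):
    """Return the samples of one measurement session in random order.

    The seed is a hypothetical choice that only makes the example reproducible.
    """
    rng = random.Random(seed)
    order = [(v, r, t, aliquot) for v, r, t in samples]
    rng.shuffle(order)
    return order

# Session 1 analyses the first aliquot, session 2 the second, each in its own random order
analyses = session_order(samples, 1, seed=1) + session_order(samples, 2, seed=2)
print(len(samples), len(analyses))
```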

Determination of parameters related to phenolic ripeness
The following twelve parameters were measured by means of spectrophotometric and chromatographic assays:
- total flavonoids content (TF) was determined by UV spectroscopy as reported in the literature [27]: after a proper dilution, the absorbance of the sample was measured at 280 nm and TF was expressed as mg of (+)-catechin/L. All the UV-Vis measurements were performed by means of a Perkin Elmer Lambda 650 spectrophotometer using a 10 mm quartz cuvette as sample holder. Before spectrophotometric analysis the samples were diluted 50 times in hydrochloric acid-ethanol solution (ethanol:H2O:HCl 70:30:1 v/v/v);
- total anthocyanins content (TAnt) was determined by UV-Vis spectroscopy according to the method described in [28]: the absorbance value was measured at 540 nm and TAnt was expressed as mg of oenin chloride/L;
- colour index (CI) was calculated as the sum of the absorbance values measured at 420 nm (corresponding to a yellow-orange sample colour), at 520 nm (corresponding to a red-purple sample colour) and at 620 nm (corresponding to a blue sample colour) [29]. In oenology, CI is used to evaluate the colour of red wines: if a wine needs a colour correction it is blended with a different wine up to the desired CI value;
- optical density values (OD420%, OD520%, OD620%), defined as the percentage contribution of each absorbance, at 420, 520 and 620 nm, to the colour index [30];
- tonality (Ton) was calculated as the ratio between the absorbance values measured at 420 nm and at 520 nm. In oenology, Ton is a parameter frequently used to assess the oxidation of wine during aging [30]. In this case, Ton is used as a parameter suitable to describe the colour variation which occurs during grape ripening;
- the five major anthocyanins in the form of 3-O-monoglucoside, i.e., malvidin (Mv-3-glc), petunidin (Pt-3-glc), peonidin (Pn-3-glc), delphinidin (Df-3-glc) and cyanidin (Cn-3-glc), were separated and quantified by reverse phase-high performance liquid chromatography with a diode array detector (RP-HPLC-DAD) following the chromatographic method described in [31] and adjusted to our equipment, as reported by Vasile Simone and coauthors [32]. The concentration of the anthocyanins was determined by measuring the absorbance at 520 nm by Total-Chrom Workstation version 6.2.1 chromatography system software (PerkinElmer, Inc.), and was expressed as malvidin-3-O-glucoside equivalents.
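
The spectrophotometric colour parameters defined in the list above follow directly from three absorbance readings. A minimal Python sketch is given here, assuming that the three absorbances are already expressed on the same scale (same dilution and optical path); the function name and the numerical readings are hypothetical and serve only as an example.

```python
def colour_parameters(a420, a520, a620):
    """Return colour index, percentage optical densities and tonality."""
    # Colour index: sum of the absorbances at 420, 520 and 620 nm
    ci = a420 + a520 + a620
    # Percentage contribution of each absorbance to the colour index
    od = {
        "OD420%": 100.0 * a420 / ci,
        "OD520%": 100.0 * a520 / ci,
        "OD620%": 100.0 * a620 / ci,
    }
    # Tonality: ratio between the absorbances at 420 nm and 520 nm
    ton = a420 / a520
    return ci, od, ton

# Hypothetical absorbance readings, for illustration only
ci, od, ton = colour_parameters(0.30, 0.60, 0.10)
print(ci, od, ton)
```

By construction the three OD% values always sum to 100, so only two of them are independent.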

Image acquisition
From each aliquot of must sample obtained as described in Section 2.1, 50 µL drops of must were deposited on A4-sized sheets of white absorbent paper. The drops were deposited according to a precise scheme: 8 drops of each must sample were placed on each sheet of absorbent paper following a 4 × 4 chessboard scheme, alternating with 8 drops of another randomly chosen sample.
Therefore, each absorbent paper sheet contained 16 spots of two samples, i.e., 8 spots per sample (as an example, see the image on the left side of Figure 1).
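
The deposition scheme can be pictured with a short Python sketch. The sample labels, and the choice of which sample occupies the top-left position, are assumptions made here for illustration.

```python
def chessboard_layout(sample_a, sample_b, size=4):
    """Return a size x size grid that alternates two sample labels like a chessboard."""
    return [
        [sample_a if (row + col) % 2 == 0 else sample_b for col in range(size)]
        for row in range(size)
    ]

# Hypothetical labels: variety + field replicate + harvest time
layout = chessboard_layout("A1_T0", "M2_T3")
for row in layout:
    print(" ".join(row))
```

With this arrangement each sample contributes 8 spots, and every spot is surrounded horizontally and vertically by spots of the other sample.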
Several paper types were tested to choose the most suitable substrate for spot deposition. Among them, an absorbent coated paper (weight: 120 g/m², thickness: 0.24 mm) gave the most homogeneous spots thanks to the presence of two layers: an upper one, made of cellulose with a high liquid absorption capability, and a lower one, made of polyethylene with a high water resistance, which prevented paper wrinkling.
RGB images of the paper sheets were acquired immediately after drop deposition using a CanoScan LiDE 220 scanner and saved in JPEG format with a spatial resolution of 4960 × 7015 pixels and an average file size of about 2 MB. The image scene also included a set of standard colour references used for the subsequent image standardization.

Image elaboration
The key steps followed for the elaboration of the RGB images are illustrated in the right side of Figure 1 and can be summarized as follows:

Internal calibration
The RGB images were first standardized by means of an internal calibration procedure, in order to reduce possible variability among images over time, mainly due to instrumental instability. The correction procedure reported here is derived from the approach already developed by some of us for the elaboration of hyperspectral images [33]. For each image, eight coloured squares of the standard colour references were considered as regions of interest (ROIs), and a random sampling procedure was used to divide the pixels of each ROI into three groups, in order to account for possible spatial variability of the coloured squares. For each pixel group, the corresponding median values of the R, G and B channels were calculated, then they were stored in a matrix with size equal to {24, 3}, i.e., including the 24 median values (= 3 pixel groups × 8 ROIs) for each one of the 3 channels.
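
The extraction of the {24, 3} matrix of medians can be sketched in Python as follows. This is only an illustrative sketch: the way pixels are randomly assigned to the three groups (shuffling followed by interleaved slicing), the seed and the synthetic ROI data are assumptions, not the authors' MATLAB routine.

```python
import random
from statistics import median

def roi_group_medians(rois, n_groups=3, seed=0):
    """rois: list of ROIs, each a list of (R, G, B) pixel tuples.

    Returns one [R, G, B] row of medians per pixel group, n_groups rows per ROI.
    """
    rng = random.Random(seed)
    rows = []
    for pixels in rois:
        shuffled = list(pixels)
        rng.shuffle(shuffled)                 # random sampling of the ROI pixels
        for g in range(n_groups):
            group = shuffled[g::n_groups]     # near-equal random groups
            rows.append([median(p[c] for p in group) for c in range(3)])
    return rows

# Synthetic stand-in for the 8 reference squares (30 hypothetical pixels each)
demo_rng = random.Random(42)
rois = [
    [tuple(demo_rng.randint(0, 255) for _ in range(3)) for _ in range(30)]
    for _ in range(8)
]
medians = roi_group_medians(rois)
print(len(medians), len(medians[0]))
```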
The internal calibration step was carried out by computing, for each channel c, a regression model between the median vector extracted from the first acquired image, which was considered as the master image (M), and the median vector of each one of the remaining images, which were considered as slave images (S). To select the most appropriate order of the regression model, the values of the first- and second-order regression coefficients were estimated, together with the corresponding statistical significances.
Based on the results, the quadratic models were finally adopted for each slave image, which can be expressed as:

$$M_c = b_0(i,c) + b_1(i,c)\,S_{i,c} + b_2(i,c)\,S_{i,c}^2 \qquad (1)$$

where M_c and S_{i,c} are the median vectors of channel c for the master image and for the i-th slave image, S_i, respectively, and b_0, b_1 and b_2 are the regression coefficients calculated for the channel c of the i-th slave image.
Then, the regression coefficients calculated using the standard colour references were used to standardize each pixel of the corresponding image of the must spots, according to the equation:

$$img_{corr}(i,c) = b_0(i,c) + b_1(i,c)\,img_{orig}(i,c) + b_2(i,c)\,img_{orig}(i,c)^2 \qquad (2)$$

where img_corr(i, c) is the channel c of the i-th standardized image, img_orig(i, c) is the corresponding channel of the original image, and b_0(i, c), b_1(i, c), b_2(i, c) are the regression coefficients calculated by equation 1.
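
The quadratic standardization described above can be sketched in Python for a single channel. The sketch assumes ordinary least squares for the fit and channel values scaled to the 0-1 range for numerical convenience; the function names and the synthetic medians are hypothetical.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2]."""
    # Normal equations (X^T X) b = X^T y for the design matrix [1, x, x^2]
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented 3x4 matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def standardise_channel(values, b):
    """Apply the quadratic correction to the pixel values of one channel."""
    return [b[0] + b[1] * v + b[2] * v * v for v in values]

# Synthetic reference medians of one channel (hypothetical, scaled to 0-1)
slave = [0.1 * k for k in range(1, 10)]
master = [0.02 + 0.9 * s + 0.1 * s * s for s in slave]

b = fit_quadratic(slave, master)           # master regressed on slave
corrected = standardise_channel([0.5], b)  # correction applied to a slave pixel
print(b, corrected)
```

Regressing the master medians on the slave medians, rather than the reverse, means that the fitted coefficients can be applied directly to the slave pixels to bring them onto the master scale.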

Cropping
After internal calibration, each spot of must deposited on the paper was cropped separately from each image to obtain an image of 900 × 900 pixels. The cropping procedure generated a dataset of 720 images (= 3 grape varieties × 3 field replicates × 5 harvest times × 8 spots × 2 measurement sessions).

Spot segmentation
Before converting all the images into the corresponding colourgrams, a preliminary inspection of the frequency distribution curves of the parameters included in the colourgram was conducted on a representative subset of images. This inspection showed that the pixels with values of relative red (rR, see paragraph 2.4.4 for definition) lower than 0.34 correspond to the background (white paper sheet). Therefore, the colourgram of each image was calculated after background removal (segmentation), i.e., considering only the pixels of the spots with rR values greater than 0.34.
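
The background removal rule can be written as a short Python sketch, assuming pixels are given as (R, G, B) tuples and rR is the ratio between R and R + G + B. Skipping pure black pixels, for which rR is undefined, is an assumption added here.

```python
def segment_spot(pixels, threshold=0.34):
    """Keep only the pixels whose relative red rR = R / (R + G + B) exceeds the threshold."""
    kept = []
    for r, g, b in pixels:
        lightness = r + g + b
        if lightness == 0:
            continue  # assumption: rR is undefined for pure black, so the pixel is dropped
        if r / lightness > threshold:
            kept.append((r, g, b))
    return kept

# A white paper pixel has rR = 1/3 and is removed; a red-purple pixel is kept
# (both values are hypothetical)
print(segment_spot([(255, 255, 255), (150, 40, 90)]))
```

A perfectly white or grey pixel has rR = 1/3, which explains why a threshold just above this value separates the paper from the reddish spots.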

Conversion into colourgrams
After spot segmentation, the images of the individual spots were converted into the corresponding colourgrams, following the procedure reported in [21].
Basically, the first step of this approach consists in unfolding each RGB image into a two-dimensional matrix with as many rows as the number of segmented pixels, and 3 columns, corresponding to the R, G and B channels. For each pixel, a series of additional variables directly derived from the R, G and B values are calculated: lightness (L), defined as the sum of R, G and B values; relative colours (rR, rG and rB), which are respectively the ratios between each channel and L; hue (H), saturation (S) and intensity (I), obtained by converting the RGB data into the HSI colour space; and the nine score vectors (three for each model) obtained by applying PCA to the raw, mean-centred and autoscaled unfolded image data. For each variable, the frequency distribution curve (i.e., a histogram with 256 bins) is calculated, then the histograms of the 19 variables are joined in sequence to form a unique one-dimensional signal, which is then divided by the number of segmented pixels. In the end, the loading vectors and the eigenvalues of the three PCA models are added to the signal, which reaches a final length equal to 4900 points (= 256 × 19 + 36).
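
A partial Python sketch of this conversion is given below. It covers only seven of the 19 variables (R, G, B, L, rR, rG, rB) and omits the HSI conversion and the three PCA models, so it is not the full procedure of [21]. The bin ranges (0-255 for the channels, 0-765 for L, 0-1 for the relative colours) are assumptions made for the example. The 36 extra points of the full signal are consistent with three loading vectors of length 3 plus three eigenvalues for each of the three PCA models (3 × 12).

```python
N_BINS = 256

def histogram(values, lo, hi):
    """Counts of `values` in N_BINS equal-width bins spanning [lo, hi]."""
    counts = [0] * N_BINS
    for v in values:
        idx = min(N_BINS - 1, int((v - lo) / (hi - lo) * N_BINS))
        counts[idx] += 1
    return counts

def partial_colourgram(pixels):
    """pixels: segmented (R, G, B) tuples, integers in 0..255 with R + G + B > 0."""
    n = len(pixels)
    light = [r + g + b for r, g, b in pixels]                      # L = R + G + B
    curves = [([p[c] for p in pixels], 0, 255) for c in range(3)]  # R, G, B
    curves.append((light, 0, 765))                                 # L
    for c in range(3):                                             # rR, rG, rB
        curves.append(([p[c] / l for p, l in zip(pixels, light)], 0.0, 1.0))
    signal = []
    for values, lo, hi in curves:
        # Histograms are joined in sequence and divided by the number of pixels
        signal.extend(count / n for count in histogram(values, lo, hi))
    return signal

# Four hypothetical segmented pixels
spot = [(150, 40, 90), (160, 50, 95), (120, 30, 80), (150, 40, 90)]
signal = partial_colourgram(spot)
print(len(signal))        # 7 curves x 256 bins in this partial sketch
print(N_BINS * 19 + 36)   # length of the full colourgram reported in the text
```

Dividing by the number of segmented pixels makes each frequency distribution curve sum to one, so that spots of different size remain comparable.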
The conversion of the image into a one-dimensional signal provides a considerable data compression, which results in a shorter computation time for data elaboration. Moreover, the resulting matrix of fingerprint-like signals can be analysed by means of common multivariate analysis techniques, such as PCA or PLS.
On the other hand, the colourgram does not account for the spatial location in the original image domain of the colour-related information extracted by multivariate analysis. To address this issue, a proper algorithm has been implemented to visualize directly in the original image domain the colourgram features of interest for the problem at hand. In other words, it is possible to obtain a reconstructed image where the pixels related to the colourgram features recognized as interesting by the user are displayed in false colours, while the remaining pixels are represented in black [23].
All the steps followed for image elaboration were performed with routines written ad hoc in MATLAB language (ver. 7.12, The Mathworks Inc., USA). In particular, the correction of the images and the subsequent conversion into colourgrams were performed with specific Graphical User Interfaces (GUI): RGB Image Correction GUI and Colourgrams GUI, which are available at www.chimslab.unimore.it.

Multivariate analysis of colourgrams
PCA was first applied to the mean-centred colourgrams matrix as an exploratory data analysis tool suitable to recognize similarities and differences among the images of must spots, also highlighting the possible presence of outliers.
Then, multivariate calibration models were calculated on the mean-centred colourgrams matrix to quantitatively determine the values of the twelve considered parameters (Y variables). To this aim, the 720 colourgrams were split into a training set, used to calculate the models, and a test set, used for validation. In order to avoid overfitting, the splitting was conducted so as to keep together the colourgrams derived from replicate images: 480 colourgrams corresponding to 60 samples belonging to two field replicates were included in the training set, and the remaining 240 colourgrams corresponding to 30 samples belonging to the remaining field replicate were included in the test set.

Results and discussion

Exploratory data analysis
The PCA model calculated on the colourgrams matrix highlighted a trend according to harvest time that was clearly visible along the PC1 scores, as reported in Figure 2a. In general, PC1 values increase with harvest time for all the grape varieties.
However, considering the single grape varieties, it is possible to recognise some minor clusters formed by samples picked in different time periods. The comparison of the PC1 score distribution with sample images collected at the five harvest times (Figure 2b) shows that the observed clustering involves groups of spots having similar colour-related characteristics, i.e., that there is a separation between spots having a lighter colour and spots having a more intense colour.
For instance, the colourgrams derived from Ancellotta samples gather in two clusters, the first one composed of the light colour samples collected at T0 and T1, and the second one composed of the darker colour samples collected at T2, T3 and T4.
Moreover, the colour evolution of the must spots during grape ripening is different for the distinct varieties.

interval-PLS calibration models
For each Y variable, six iPLS models were calculated considering different interval widths. Table 1 reports the results of the best calibration models selected in cross-validation, i.e., those with the lowest RMSECV value.
The overall best calibration model was obtained for the total amount of anthocyanins (R²pred = 0.94). This model was built using only 416 variables (26 intervals of 16 variables each) out of the 4900 original variables. The selected regions include variables belonging to several distribution curves of the colourgram, i.e., R, G, B, L, rR, S, V, the score vectors of PC1-raw, PC3-raw, PC1-meancentred, PC1-autoscaled and PC2-autoscaled. The actual values versus the predicted values of TAnt for the test set samples are reported in Figure 3a.
The performance of the calibration model obtained for the prediction of TF was lower, although still acceptable (R²pred = 0.67). As a matter of fact, flavonoids are a large family of hydroxylated polyphenolic compounds: the major group consists of anthocyanins (red-purple pigments), but many other compounds belonging to the family of flavonoids are weakly coloured or colourless, such as flavonols (pale yellow pigments) and flavanols (colourless pigments that become brown in case of oxidation) [34]. Therefore, RGB images are more suitable to accurately predict the anthocyanins content, which increases during maturation, strongly influencing the red-purple grape colour and, consequently, the colour of the resulting must.
Colour index is the most useful attribute to evaluate the overall colour of this kind of sample. For CI, a high predictive performance (R²pred = 0.91) was obtained considering only 576 variables, which were selected using an interval width of 32 variables. Most of the selected regions belong to the same frequency distribution curves selected for the calibration of TAnt. In this case, variables from the histograms of R, G, B, L, H, S, V, score vectors of PC1-raw, PC3-raw, PC1-meancentred, PC2-meancentred, PC1-autoscaled and PC2-autoscaled were selected. This fact indicates that few regions of the colourgram are sufficient to have an accurate prediction of both TAnt and CI, which are among the most important parameters to monitor the phenolic ripening of grapes.

[...] colour of a solution is strictly related to the presence of one or more analytes of interest. Therefore, further applications need not be limited to the food industry (e.g., the analysis of other beverages like wine, fruit juice, beer and coffee) but could extend to other contexts; for example, after proper adjustments, this approach could also be helpful for the analysis of specimens of forensic or clinical interest.

Captions to Tables and Figures
Table 1 Results of the best iPLS calibration models obtained for the colour-related parameters.

Figure 1 Acquisition scheme of the RGB images and key steps followed for the image elaboration.

Figure 2 Sample index vs. PC1 score values (a); representative RGB images of Ancellotta, Lambrusco and Malbo Gentile must spots collected at the five harvest times (b).

Figure 3 Actual amount of total anthocyanins (TAnt, expressed as mg of oenin chloride/L) vs. predicted amount, for the test set samples. Results obtained considering each single spot (a) and the averages over the eight spots (b).

Figure 4 VIP scores plot resulting from the calibration models of TAnt, CI and Mv-3-glc. The zoom corresponds to the SC3RAW variables.

Figure 5 Segmented RGB images and corresponding reconstructed images of five must spots at different maturity levels, for each grape variety.