Journal of Plant Physiology & Pathology ISSN: 2329-955X


Research Article, J Plant Physiol Pathol Vol: 9 Issue: 2

Early Detection of Fusarium Oxysporum Infection Using Binomial Logistic Regression Models from Visible-Near Infrared Reflectance Spectroscopy

Juan Carlos Marin-Ortiz1*, Lilliana Maria Hoyos-Carvajal1, Verónica Botero-Fernandez2 and Lucio Flavio de Alencar Figueiredo3

1National University of Colombia, Department of Agricultural Sciences, Medellín 050034, Colombia

2National University of Colombia, Department of Geosciences and Environment, Medellín 050034, Colombia

3University of Brasilia, Department of Botany, Brasília 70910-900, Brazil

*Corresponding Author:
Juan Carlos Marín-Ortiz
National University of Colombia, Department of Agricultural Sciences, Medellín 050034, Colombia
E-mail: [email protected]

Received Date: January 19, 2020; Accepted Date: February 18, 2021; Published Date: February 26, 2021

Citation: Marin-Ortiz JC, Hoyos-Carvajal LM, Botero-Fernandez V, de Alencar Figueiredo LF (2021) Early Detection of Fusarium Oxysporum Infection Using Binomial Logistic Regression Models from Visible-Near Infrared Reflectance Spectroscopy. J Plant Physiol Pathol 9:2.

Copyright: © All articles published in Journal of Plant Physiology & Pathology are the property of SciTechnol, and is protected by copyright laws. Copyright © 2020, SciTechnol, All Rights Reserved.


Vascular wilt is a serious threat to a large number of economically important crops. Disease incidence is evaluated visually, which makes the assessment subjective and delayed; it also requires destructive sampling. The application of binomial logistic regression models (BLRMs) to predict Fusarium infection using reflectance spectra in the visible and near-infrared (VIS/NIR) range has not been attempted so far in any of its hosts. The aim of this research was to develop a methodology based on BLRMs that allows the incidence of Fusarium infection in tomato plants to be detected using reflectance spectra. The study was carried out during the asymptomatic period of the disease with two tomato varieties, one tolerant and one susceptible to all races of Fusarium oxysporum. Sixteen BLRMs were developed, one model per sampling (every three days), which were highly significant (p < 0.001) and showed high goodness of fit after 6 days post infection (DPI). Three key wavelengths that are reliable for Fusarium wilt detection were identified in a greenhouse setting: reflectances at 430 nm, 550 nm and 750 nm (R430, R550 and R750). In the models developed for tolerant plants, only the variables R970 in incidence at 3 DPI (I3dpi) and R704 in incidence at 9 DPI (I9dpi) were not significant. According to the results obtained, the BLRMs generated from the reflectance data in tomato plants have a high predictive performance. The BLRMs developed in this research have potential use for rapid detection and non-destructive estimation of vascular wilt incidence in plants during the disease incubation period.

Keywords: Vascular wilt; Fusarium oxysporum; Spectral reflectance; Binomial logistic regression models; Plant disease; Infection prediction




Fusarium oxysporum is currently considered a species complex of plant-pathogenic fungi widely distributed around the world [1]. This species complex has many pathogenic strains that parasitize more than 100 plant species, causing great economic losses in many agricultural and natural ecosystems [2]. In general, most pathogenic strains produce wilt symptoms, accompanied by partial yellowing of the leaves, folding of the shoots and decreased overall plant growth. A characteristic that allows this disease to be diagnosed and quickly differentiated is the whitish, yellowish or brown coloration of the vascular bundles [3].

After disease symptoms appear in plants, the presence of the disease is verified by means of detection techniques. Currently, detection techniques can be grouped into serological methods, “purely molecular” methods, biomarker-based disease detection methods, and methods based on plant properties and stress [4]. This last group can be divided into imaging techniques (hyperspectral and fluorescence images) and spectroscopic techniques (visible, infrared, fluorescence and multispectral bands). During the past decades, research has intensified on spectroscopy-based methods to detect diseases and stress in plants using portable spectrometers, drones, and satellite data [5,6]. These techniques measure the amount of radiation reflected by a surface as a function of wavelength to produce a unique reflectance spectrum for each material, which can be used as a “fingerprint” (spectral signature) that allows healthy and diseased plants to be discriminated. Therefore, this research works under the hypothesis that the spectral signature can be used for the early detection of vascular wilt in tomato plants by means of multivariate models based on wavelengths specific to the studied pathosystem.

Despite the availability of these techniques, faster, non-destructive, more sensitive and more selective methods are still required for detecting plant diseases. Developments in agricultural technology in recent decades, such as precision agriculture, have led to a demand for non-destructive automated detection methods for plant diseases. The application of reflectance spectroscopy within precision agriculture (PA) follows from its recent definition by the International Society for Precision Agriculture (ISPA) as the “management strategy that gathers, processes and analyzes temporal, spatial and individual data and combines it with other information to support management decisions according to estimated variability for improved resource use efficiency, productivity, quality, profitability and sustainability of agricultural production”. Early detection of plant diseases in planting or propagation materials and selective eradication programs are methods used to prevent or slow the advance of pathogens. Once the problem is established, control has traditionally been carried out exclusively with chemical methods. However, this is difficult because these methods generate resistance problems, in addition to the low efficiency of some active ingredients [7]. The high cost of control measures and the environmental impact they generate have sparked great interest in precision agriculture techniques in which the products are applied to the specific sites affected by the disease. These site-specific applications reduce pesticide use and can therefore minimize the costs and the ecological impact of agricultural crop production systems [8].

The presence, discrimination, and quantification of diseases in plants can be determined with high precision by means of multivariate calibration models of spectral data based on the plant-pathogen interaction [9]. Most published studies report models built for plant diseases that develop local symptoms [10-12]. Local symptoms are generated by physiological or structural changes within a limited area of the host tissue, such as leaf spots, galls, and cankers, and disease severity is calculated by measuring them. This method of evaluation is a subjective visual estimate in which the infection level in a given plant is established from the amount of diseased tissue, so it refers to the percentage of the affected area of a specific organ of the plant. In studies that apply spectroscopy to model infection by local diseases, the response variable is usually the percentage of severity, and the absorbance or reflectance at certain wavelengths are the explanatory variables [13]. However, studies focused on modeling infection by systemic plant diseases are less common, perhaps because these diseases are measured by calculating the percentage of diseased plants throughout the crop (incidence), a measure that should be used only for diseases that affect the entire plant. That is, if a specific measurement is made on an organ affected by a systemic disease, that plant is either diseased or healthy (a percentage of infection is not considered) [14]. Additionally, multiple measurements must be carried out during the incubation period of the disease to evaluate the minimum infection time and the stability of the discriminatory capacity of the models [15]. For this reason, multivariate calibration techniques such as CLS, ILS, PCR and PLS are not recommended for modeling infection by systemic plant diseases (except for some variations of them). For this purpose, logistic regression is recommended, since it is one of the best-known techniques for modeling a categorical response variable based on continuous or categorical predictor variables [16].

Visible/near-infrared (VIS/NIR) applications are generally based on calibration models, which establish a mathematical relationship between absorption or reflectance spectra and the factors of interest. These models require spectra measured on a population of samples that includes all the possible variations expected in future predictions, the population being the set of all measurements that cover the characteristics of the sample [17]. Statistical methods can be used to model spectral data and estimate physical and chemical properties only if the properties studied depend on molecular structure and biophysical characteristics. This is because changes in molecular structure and biophysical characteristics are reflected in the spectra and are linearly related to spectral intensities [18,19]. Traditionally, different multivariate statistical tools have been used to build calibration models from visible and near-infrared data for the early detection of plant diseases. Some of the most cited methods in this area are classical least squares (CLS) [20], linear discriminant analysis (LDA) [21], principal components regression (PCR) [22], and partial least squares regression (PLSR) [23]. The latter is perhaps one of the most widely used in prediction models, with high sensitivity and accuracy in the detection, discrimination and quantification of plant diseases [24,25] and of damage caused by insects [26,27]. PLS is related to principal components regression, but instead of finding hyperplanes of maximum variance between the response variable and the independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space, which gives it some advantages over other methods for the analysis of spectral data. For example, other types of modeling offer qualitative information, whereas PLS modeling can offer important information used to assign bands and identify unexpected components in models [28].

Traditionally, nonlinear indices have been used in plant epidemiology, most often fitted using linearized transformations or non-linear classical least squares (CLS) [29]. However, these approaches assume normality of the disease index distribution, independence, and constant variance, which cannot be verified. For the particular case of systemic diseases, binomial logistic regression models (BLRMs) are more appropriate, since they predict the probability that an observation falls into one of the two categories of a dichotomous dependent variable, based on one or more independent variables that can be continuous or categorical. Moreover, BLRMs do not assume normality of the data, although it is advisable to perform a normality analysis of the residuals of each model to avoid Type I error inflation (rejecting the null hypothesis when it is true in the population). BLRMs do assume a binomial distribution of the response variable, a linear relationship between the independent variables and the link function (logit), absence of extreme values, and low correlations between the predictors [30]. Considering this, our aim was to develop a methodology based on binary logistic regression models that allows the incidence of Fusarium infection in tomato plants to be predicted using reflectance spectra, in order to perform a non-destructive diagnosis while the plants are in the asymptomatic period of the disease.

Materials and Methods

Plant material

Tomato (Solanum lycopersicum L.) seeds of two varieties (Ponderosa and Santa Cruz) were sown over 2 weeks in a greenhouse with semi-controlled temperature (~25 ºC) and humidity (~70%) in December and January of 2017/2018. Seeds were obtained from Embrapa Hortaliças. Uniform, healthy seedlings were selected and transferred to plastic bags with autoclaved and fertilized soil. For each variety, 60 seedlings were cultivated for 21 days, when they were inoculated with F. oxysporum race 1 (Foc R1T). The Ponderosa variety is susceptible to all races of F. oxysporum [31], while the Santa Cruz variety shows tolerance to F. oxysporum races 1 and 2.

Sample collection and preparation

This study was conducted at the Universidad Nacional de Colombia, Medellín (Antioquia, Colombia). A completely randomized design was used to compare two treatments with their respective controls: I) tomato plants var. Ponderosa inoculated with F. oxysporum (Fo5), II) tomato plants var. Santa Cruz inoculated with Fo5. To confirm plant infection, destructive sampling was performed when the susceptible plants showed visible symptoms, about 21 days post infection (DPI). The infectivity test was performed using the indexing technique in PDA+SE medium at 50 mg kg-1 [32]. At 5 d after sowing the indexed stem segments in this medium, the presence and growth of the pathogen were observed. The experimental design and inoculation process are described in detail in Marin [15,33].

Spectra analysis and development of models

The VIS/NIR reflectance spectra were acquired with a USB2000+ portable spectrometer (Ocean Optics, FL, USA) with a spectral range of 400-1100 nm. An HL-2000-HP tungsten halogen light source (wavelength range 360-2,400 nm) was used with a WS-1 diffuse reflectance standard (reflectivity >98% in the 250-1,500 nm range) and a premium grade 600 μm reflectance probe QR600-7-VID-125F (Ocean Optics, FL, USA). The parameters required for spectrometer calibration, such as integration time, average readings per measurement and interval time, were determined at the beginning of the experiments. Standard normal variate (SNV) was used to remove noisy spectra, whether the spectrum was deformed or there was a reading error. This transformation was chosen as the pretreatment that best grouped the infected plants in the treatments carried out [15].
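The SNV pretreatment mentioned above can be sketched as follows. This is a minimal illustration of SNV as it is commonly defined (each spectrum centred on its own mean and scaled by its own standard deviation); the reflectance values are hypothetical, and the criterion the authors used to flag noisy spectra after the transformation is not described here.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(r - m) / s for r in spectrum]

# Hypothetical reflectance readings (not data from the study).
raw = [4.0, 6.0, 12.0, 45.0, 48.0]
scaled = snv(raw)
```

After the transformation every spectrum has mean 0 and standard deviation 1, so spectra that differ only by an additive offset or a multiplicative scale become directly comparable.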

Due to this design, four groups of data were collected with 125 spectra each, 25 leaves per treatment and control plants for each sampling day (500 spectra/day of sampling). Additionally, five plants of susceptible and tolerant tomato were measured per treatment (with their respective controls) and per sampling day in an independent experiment, as test data to perform the corresponding validation. Cross-validation with independent samples was performed to estimate the error rate of each model. In previous works, a set of relevant specific wavelengths was identified for Fo5 infection during the incubation period of the disease, using a classification algorithm [15]. Furthermore, changes in specific spectral bands were related to physiological changes associated with F. oxysporum infection in tomato [31]. A total of 27 variables (selected wavelengths) were obtained from infected and healthy plants. Table 1 summarizes the number of spectra used to evaluate each model as training and test data for the two evaluated varieties.

Variety        Infected             Non-infected         Total samples per day
               Training    Test     Training    Test
Susceptible    125         25       125         25       300
Tolerant       125         25       125         25       300

Table 1: The number of spectra/day used to evaluate each model as training and test data on two tomato varieties.

Wavelengths selection

The selection of the wavelengths (explanatory variables) for the BLRMs was done using biplots and loading plots from principal components analysis (PCA) for each data group without previous association. The likelihood ratio test (LRT) was used to verify the significance of each model, and the effects of each predictor (explanatory variable) in the BLRMs were evaluated. Additionally, the Akaike information criterion (AIC) was calculated as a relative measure of model quality for a given data set, and the McFadden and Cragg-Uhler (r2CU) pseudo R2 were used to evaluate the goodness of fit of the models. The BLRM makes several assumptions about the data, so verifying them is essential to build a suitable model. First, the response variable in this research has two possible values, “1” (plants infected with F. oxysporum) or “0” (healthy plants), which satisfies the first assumption: the outcome must be a binary or dichotomous variable, such as “yes” vs “no”, “positive” vs “negative”, or “1” vs “0”. To validate the second assumption, which requires linearity between the logit of the outcome and each predictor, the logit function was plotted against the three predictor variables that explained the most variability in each model (logit(p) = log(p / (1 - p)), where p is the probability of the outcome). To comply with the third assumption (absence of outliers), observations whose leverage was more than three times the critical threshold were identified individually during residual exploration and removed. Finally, the generalized variance-inflation factors (vif) were calculated as each explanatory variable was inserted “one by one”, to avoid high correlations between the predictors.
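As a rough illustration of the fit measures named above, the sketch below computes the likelihood ratio statistic, the AIC, and the McFadden and Cragg-Uhler pseudo R2 from the log-likelihoods of a fitted model and of an intercept-only model. It uses the standard textbook definitions and assumes the fitted probabilities are already available; the function names and the toy data are hypothetical, and this is not the authors' R code.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def fit_summary(y, p, n_params):
    """Fit measures for a binary model with n_params estimated coefficients."""
    n = len(y)
    ll_model = log_likelihood(y, p)
    base_rate = sum(y) / n                      # intercept-only prediction
    ll_null = log_likelihood(y, [base_rate] * n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return {
        "lr_stat": 2.0 * (ll_model - ll_null),  # likelihood ratio statistic
        "aic": 2.0 * n_params - 2.0 * ll_model,
        "mcfadden": 1.0 - ll_model / ll_null,
        # Cragg-Uhler rescales Cox-Snell so that its maximum is 1.
        "cragg_uhler": cox_snell / (1.0 - math.exp(2.0 * ll_null / n)),
    }

# Hypothetical outcomes and fitted probabilities (not data from the study).
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
summary = fit_summary(y, p, n_params=2)
```

On the same data the Cragg-Uhler value is always at least as large as the Cox-Snell value it is derived from, which is one reason the two pseudo R2 columns reported for a model are not directly comparable with each other.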

Evaluation of predictive capacity and model performance

To evaluate the predictive capacity of the models, probabilities were generated in the form P(I=1 | R), where I is the response variable and R the explanatory variables, and a high threshold value of 0.8 was defined. Therefore, if P(I=1 | R) > 0.8 then I=1; otherwise I=0. To validate the BLRMs, test data were used from a sample of susceptible and tolerant tomato plants (with their respective controls) taken independently, in an experiment performed after the one that provided the training data. Receiver operating characteristic (ROC) curves were drawn with the corresponding areas under the curve (AUC), and precisions were calculated; these are typical performance measures for a binary classifier. All statistical analyses were performed using R software.
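The decision rule described above can be written in a few lines. The sketch below applies the 0.8 threshold to predicted probabilities and scores the result against known labels; the probabilities and labels are hypothetical, and the helper names are illustrative only.

```python
def classify(probabilities, threshold=0.8):
    """Assign I=1 only when P(I=1 | R) is strictly greater than the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(observed, predicted):
    """Fraction of observations whose predicted class matches the observed one."""
    hits = sum(1 for o, q in zip(observed, predicted) if o == q)
    return hits / len(observed)

# Hypothetical test-set probabilities and true infection status.
probs = [0.95, 0.85, 0.80, 0.40, 0.10]
truth = [1, 1, 1, 0, 0]
labels = classify(probs)
score = accuracy(truth, labels)
```

Because the rule uses a strict inequality, a plant with a predicted probability of exactly 0.8 is classified as healthy. A threshold this high favours specificity: fewer healthy plants are flagged, at the cost of missing some infected ones.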


The plot in Figure 1 shows the variable selection process, in which a biplot (Figure 1A) and the loading plot of PC1 (Figure 1B) were used to select the variables with the highest loadings and incorporate them into the model. Once the variables with the highest loadings on PC1 had been incorporated into the model, the same procedure was used to select the variables with the highest loadings on the first five PCs (Figure 1C and 1D) and incorporate them into the model “one by one”, as long as they met the following criteria: A) the model remains significant; B) the new predictor variable is not highly correlated with the previously selected ones; C) the predictive capacity of the model improves.

Figure 1: Biplot (A) and loading plots (B: PC1, C: PC2, D: PC3) of the PCA with VIS/NIR reflectance measures without grouping of infected (12 dpi) and healthy plants of a susceptible tomato variety.

In general, the reflectances at 550 and 750 nm (R550 and R750) presented high positive loadings on PC1 in all models, so this component may be related to the variables that determine plant infection. Other variables, such as R430, R445, R510, R704, and R970, were also selected using this method, although in general they explain a lower percentage of data variability in the PCAs. The relevance of the model variables, their significance, and the predictive capacity of the models with these variables are discussed in detail below.

BLRMs to predict F. oxysporum infection in tomato plants during the incubation period of the disease

Using the variables defined with the selection methodology described above, 16 BLRMs were developed: one model for each group of data obtained per sampling day, taken every three days during the incubation period of the disease. Table 2 summarizes the developed models, where “I” denotes the response variable, with an indicator of the DPI at which the reflectance data were taken; it has two possible values: “1” (plants infected with F. oxysporum) and “0” (healthy plants). The explanatory variables are denoted by the capital letter R followed by the corresponding wavelength value.

                  Pseudo R2
Variety      DPI  McFadden  r2CU  LR test  Equation
Susceptible  3    0.16      0.80  ***      I3dpi = -0.58 + 1.43(R430) - 3.70(R445) + 1.33(R510) - 0.55(R550) + 0.13(R750)
Susceptible  6    0.52      0.69  ***      I6dpi = -30.09 + 16.13(R430) - 4.69(R510) + 1.22(R564) - 0.19(R704) + 0.10(R750)
Susceptible  9    0.50      0.83  ***      I9dpi = -23.07 + 2.63(R430) + 4.45(R445) - 0.64(R510) + 0.30(R550) + 0.01(R750) + 0.15(R970)
Susceptible  12   0.79      0.88  ***      I12dpi = -94.34 - 2.47(R430) - 6.19(R445) + 1.82(R550) - 0.08(R750) + 0.85(R970)
Susceptible  15   0.79      0.80  ***      I15dpi = -103.75 + 15.01(R445) + 0.36(R550) + 1.09(R750)
Susceptible  18   0.67      0.88  ***      I18dpi = -18.32 + 12.19(R430) + 1.36(R550) - 1.28(R704) + 0.05(R750)
Susceptible  21   0.67      0.77  ***      I21dpi = -20.71 - 11.25(R445) + 1.60(R550) - 0.04(R704) - 0.28(R750)
Tolerant     3    0.14      0.24  ***      I3dpi = -2.92 - 3.04(R430) - 0.01(R550) + 0.10(R750) - 0.04(R970)
Tolerant     6    0.31      0.47  ***      I6dpi = -8.36 + 0.87(R550) - 1.28(R650) - 0.09(R750)
Tolerant     9    0.62      0.77  ***      I9dpi = -20.43 - 5.75(R430) + 1.03(R550) - 0.25(R704) + 0.15(R750)
Tolerant     12   0.87      0.94  ***      I12dpi = -40.27 + 12.74(R430) + 3.35(R500) + 0.46(R750)
Tolerant     15   0.87      0.94  ***      I15dpi = -34.65 + 8.88(R430) + 0.66(R704) + 0.37(R750)
Tolerant     18   0.73      0.92  ***      I18dpi = -12.00 - 4.43(R430) + 4.85(R510) - 0.42(R704) + 0.20(R750)
Tolerant     21   0.83      0.91  ***      I21dpi = -66.57 + 5.07(R510) + 1.06(R550) + 0.51(R750)

Table 2: Results of data adjustment to each binomial logistic regression model to predict F. oxysporum infection in tomato plants (susceptible and tolerant) using VIS/NIRs spectral data during the incubation period of the disease.
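To show how one of the equations in Table 2 could be turned into a predicted probability, the sketch below evaluates the 15 dpi model for tolerant plants. It assumes that the right-hand side of each equation is the linear predictor on the logit scale, which is consistent with the logit definition given in the methods but is not stated explicitly in the table. The reflectance values are hypothetical, and the units of the reflectance inputs are an assumption.

```python
import math

# Coefficients of the tolerant-variety model at 15 dpi, copied from Table 2.
COEFFS_TOLERANT_15DPI = {
    "intercept": -34.65,
    "R430": 8.88,
    "R704": 0.66,
    "R750": 0.37,
}

def linear_predictor(reflectance, coeffs):
    """Intercept plus the weighted sum of the reflectance predictors."""
    return coeffs["intercept"] + sum(
        beta * reflectance[name]
        for name, beta in coeffs.items()
        if name != "intercept"
    )

def infection_probability(reflectance, coeffs):
    """Inverse logit of the linear predictor, written to avoid overflow."""
    eta = linear_predictor(reflectance, coeffs)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical reflectance readings for one leaf (not data from the study).
leaf = {"R430": 2.0, "R704": 10.0, "R750": 30.0}
p_infected = infection_probability(leaf, COEFFS_TOLERANT_15DPI)
```

With these hypothetical inputs the linear predictor is 0.81 and the predicted probability is about 0.69, which would fall below the 0.8 decision threshold used in this study.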

To analyze the goodness of fit of logistic regression models (LRMs), there is no statistical equivalent of R2. However, several pseudo R2s have been developed, so named because they are on a scale similar to R2 (although some never reach “0” or “1”), but they cannot be interpreted as an R2 in the ordinary least squares (OLS) sense. A clear example is McFadden's pseudo R2, for which values between 0.2 and 0.4 are considered to indicate high goodness of fit of the LRM. Considering this, the models for both varieties in our study show high goodness of fit after 6 dpi, which was corroborated by the pseudo R2 of Cragg and Uhler (r2CU), which ranges from 0 to 1 [34]. Additionally, the likelihood ratio test (LRTest) results used to verify the significance of the models were highly significant (Table 2). This step is very important: if the global model were not significant, the effects of each predictor or explanatory variable would not be examined.

Verification of BLRM assumptions in reflectance data

The BLRMs make several assumptions about the data. This section describes the main assumptions and how they were verified for the data used, which is essential to build a successful model. The first assumption requires that the dependent variable be binary (dichotomous, “dummy”). In this study, the dependent variable has only two categories: plants infected with F. oxysporum (“1”) and healthy plants (“0”). The second assumption requires a linear relationship between the logit and each predictor variable. To verify it, the logit function of the outcome was plotted against the values of the three predictor variables that explained the most variability in each model (Figure 2). In the models built with data at 0 dpi and 3 dpi for the susceptible variety, the variables tend to move in the same relative direction, but not at a constant rate; that is, they have a monotonic relationship (linear relationships are also monotonic). In addition, dispersion is high in the models for this period. After 6 dpi, there are clear relationships between the logit function and the predictor variables, along with low dispersion of the data during the incubation period of the disease, except for R750 in the model for 21 dpi, which presents a “curved pattern”. The BLRMs for tolerant plants follow the same pattern as those for the susceptible plants, with monotonic relationships and high dispersion of the data (logit vs predictor variables) during the first week. They also showed clear linear relationships and low data dispersion in the models for the rest of the incubation period, except for some particular cases (R750 in I6dpi, R430 in I9dpi, R430 and R750 in I18dpi).

Figure 2: Relationship between the “logit” function of the result and three predictor variables that explain greater variability in each model.

Before assessing the statistical significance of the predictor variables, the overdispersion coefficients (φ) were estimated and used to recalculate the significance (φ should be ~1). The final models for both varieties obtained φ values between 0.2 and 1.0 (Table 3). Finally, generalized variance-inflation factors (vifs) were calculated as each explanatory variable was inserted (“one by one”) to avoid high correlations between the predictors. As a general criterion, variables were included in the different models when vif < 10, with some exceptions that showed high vif values in two pairs of variables: R550 and R704 for the BLRM in susceptible plants at 18 dpi, and R510 and R704 for the BLRM in tolerant plants on the same day (Table 3).
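The text does not say which estimator of φ was used. A common choice, sketched here as an assumption, is the Pearson chi-square statistic divided by the residual degrees of freedom. The outcomes, fitted probabilities and parameter count below are hypothetical.

```python
def overdispersion(y, p, n_params):
    """Pearson-based dispersion estimate: sum of squared Pearson residuals
    divided by the residual degrees of freedom (n - number of coefficients)."""
    pearson = sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, p))
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    return pearson / df

# Hypothetical outcomes and fitted probabilities (not data from the study).
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.4]
phi = overdispersion(y, p, n_params=2)
```

Values of φ well above 1 would indicate more variability than the binomial model expects, in which case standard errors are usually inflated by the square root of φ before significance is reassessed.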

                        vif
Variety      dpi  φ     R430   R445   R484   R510   R550   R650   R704   R750   R970
Susceptible  3    0.99  3.31   3.83   -      5.80   7.51   -      -      2.97   -
Susceptible  6    0.44  2.47   2.85   -      11.40  8.74   -      2.14   1.72   -
Susceptible  9    0.69  2.30   2.75   -      9.30   6.93   -      -      8.04   7.46
Susceptible  12   0.37  7.11   11.33  -      -      8.13   -      -      4.49   7.11
Susceptible  15   0.37  1.46   -      -      -      1.02   -      -      1.48   -
Susceptible  18   0.63  1.49   -      -      -      16.43  -      17.33  1.38   -
Susceptible  21   0.66  -      3.90   -      -      5.94   -      2.06   2.61   -
Tolerant     3    1.03  1.60   -      -      -      2.64   -      -      9.05   6.79
Tolerant     6    0.93  -      -      -      -      5.06   5.32   -      3.76   -
Tolerant     9    0.61  2.01   -      -      -      6.57   -      5.65   1.18   -
Tolerant     12   0.24  3.12   -      -      -      1.82   -      -      3.72   -
Tolerant     15   0.24  1.20   -      -      -      -      -      1.02   1.21   -
Tolerant     18   0.75  1.66   -      -      14.70  -      -      14.51  2.61   -
Tolerant     21   0.23  -      -      -      4.26   1.22   -      -      3.85   -

Table 3: Overdispersion coefficients (φ) according to the day after the inoculation (dpi) and their generalized variance-inflation factors (vif) in reflectances analyzed (nm).

Logistic regression does not assume normality of the data. However, a normality analysis of the residuals of each model was performed (data not shown), considering that a lack of normality can inflate the type I error. The deviance residuals (a measure of the residual variation of a model) are centered at zero, and there is no predominance of negative or positive residual values, confirming the homoscedasticity of the models. The distribution of the residuals is only slightly skewed (positive skew, to the right), with low kurtosis values (positive and close to 3); this slight skew does not substantially alter the critical alpha values of significance. Finally, a normal probability plot was drawn for each BLRM evaluated, in which the observed empirical data are plotted against the data that would be obtained from a theoretical normal distribution; these plots confirmed that there is only a slight deviation from the canonical normality assumptions.

Evaluation of explanatory variables significance

Considering that the models are highly significant at a global level, the significance of the explanatory variables was evaluated. First, it was verified that the variables chosen are statistically significant in most models, except R430 (in I3dpi and I9dpi), R445 and R750 (in I9dpi), and R550 (in I15dpi) in susceptible plants (Table 4). In the models developed from reflectance data in tolerant plants, only the variables R550 (in I3dpi), R970 (in I3dpi), and R704 (in I9dpi) were not significant (Table 4). Among the statistically significant variables, R550 generally had the lowest p values in susceptible plants, whereas R750 had the lowest in the tolerant variety. This suggests a strong association between reflectance at 550 and 750 nm and the probability that the plant is infected, but it does not necessarily mean a high predictive capacity of the models (this is discussed later in this text). The positive coefficient of these two predictors in most models indicates that higher reflectance at these wavelengths is associated with infection. A more detailed analysis of the parameters of each predictor shows that a one-unit increase in reflectance at 550 or 750 nm increases the probability of infection only slightly compared with other variables. Conversely, a one-unit increase in reflectance at 430 nm greatly increases this probability, as can be seen in the coefficients of the models for plants of the susceptible variety: 16.13 (6 dpi) and 12.19 (18 dpi). Although the models and most variables are highly significant, this alone says nothing about the quality of the models, their compliance with the assumptions, or their predictive capacity. Regarding the relative quality of the BLRMs, the AICs are lower after 12 DPI, which indicates a higher quality of the models generated from the explanatory variables selected at this stage of the disease incubation period (Table 4).

A. Susceptible plants
Wavelength    3 DPI      6 DPI      9 DPI      12 DPI     15 DPI     18 DPI     21 DPI
R430          1.43       16.13**    2.63       -2.47      NS         12.19**    NS
R445          -3.70*     NS         4.45       -6.19      15.01**    NS         -11.25**
R510          1.33**     -4.69**    -0.64      NS         NS         NS         NS
R550          -0.55**    NS         0.30*      1.82**     0.36*      1.36**     1.60**
R564          NS         1.22**     NS         NS         NS         NS         NS
R704          NS         -0.19      NS         NS         NS         -1.28**    -0.04
R750          0.13**     0.10*      0.01       -0.08      1.09**     0.05       -0.28**
R970          NS         NS         0.15       0.85***    NS         NS         NS
Constant      -0.58      -30.09**   -23.07**   -94.34**   -103.75**  -18.32**   -20.71**
Observations  217        223        243        193        180        199        183
AIC           266.00     205.34     182.87     67.70      47.67      101.23     94.08

B. Tolerant plants
Wavelength    3 DPI      6 DPI      9 DPI      12 DPI     15 DPI     18 DPI     21 DPI
R430          -3.04**    NS         -5.75**    12.74**    8.88**     -4.43**    NS
R500          NS         NS         NS         3.35*      NS         NS         NS
R510          NS         NS         NS         NS         NS         4.85**     5.07**
R550          -0.01      0.87**     1.03**     NS         NS         NS         1.06*
R650          NS         -1.28**    NS         NS         NS         NS         NS
R704          NS         NS         -0.25      NS         0.66*      -0.42      NS
R750          0.10**     -0.09**    0.15**     0.46**     0.37**     0.20**     0.51**
R970          -0.04      NS         NS         NS         NS         NS         NS
Constant      -2.92**    -8.36**    -20.43**   -40.27**   -34.65**   -12.00**   -66.57**
Observations  228        235        229        225        236        231        228
AIC           280.949    232.665    130.074    47.292     64.387     162.349    62.98

Table 4: Summary of the coefficients of the explanatory variables of the BLRMs to predict F. oxysporum infection in tomato plants (susceptible and tolerant) using VIS/NIR spectral data during the incubation period of the disease.
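One way to read the size of the coefficients in Table 4 is through odds ratios: in a logistic model, a one-unit increase in a predictor multiplies the odds of infection by exp(coefficient), holding the other predictors fixed. The sketch below applies this standard identity to three coefficients copied from the susceptible-variety panel; the dictionary keys are illustrative labels, not notation from the paper.

```python
import math

# Coefficients copied from Table 4, panel A (susceptible plants).
coefficients = {
    "R750 at 6 DPI": 0.10,
    "R550 at 18 DPI": 1.36,
    "R430 at 6 DPI": 16.13,
}

# exp(beta) is the multiplicative change in the odds of infection
# for a one-unit increase in reflectance at that wavelength.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
```

The contrast is large: exp(0.10) is about 1.1, whereas exp(16.13) is on the order of ten million. How meaningful a "one-unit" change is depends on the reflectance scale, which is an assumption here because the units are not stated alongside the table.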

Evaluation of predictive capacity and models performance

Although so far it has been shown that the BLRMs describe well the set of reflectance observations in healthy plants and plants infected with F. oxysporum, most researchers are more interested in the accuracy of the predictions than in the goodness of fit [35]. The previous steps evaluated the fit of the BLRMs; we now examine how these models perform when predicting with a new data set. The accuracies obtained after 6 dpi were greater than 80% (except for I15dpi of the susceptible variety, for which it was 0.79), even exceeding 90% in the tolerant variety. At this point, it is important to note that these results depend to some extent on the origin of the test data and on the validation mechanism.

The ROC curves were constructed by plotting the true positive rate (TPR) against the false positive rate (FPR) (Figure 3). The TPR is the proportion of correct positive results among all positive samples available during the test (equivalent to sensitivity), while the FPR is the proportion of incorrect positive results among all negative samples available during the test (1 - specificity). The best possible prediction method would produce a point in the upper left corner, at coordinate (0,1) of the ROC space, which represents 100% sensitivity (no false negatives) and 100% specificity (no false positives). A random guess yields a point along the diagonal line (grey line in Figure 3). The ROC curves of the models built from reflectance data at 0 dpi and 3 dpi lie very close to the line of no discrimination and have low AUC values, which indicates that these models classify the observations almost at random during this period. In contrast, the ROC curves of the BLRMs for the tolerant variety showed maximum sensitivity (100%) after 6 dpi with low FPRs (high specificity), and AUC values greater than 0.9 after 9 dpi. The ROC curves of the susceptible variety also reached high sensitivities, but with higher FPRs (lower specificity) than in the tolerant variety, mainly between 12 and 21 dpi. In general, their AUC was lower than for the tolerant variety but still high, greater than 0.8 (except for the ROC curve at 21 dpi).
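The TPR/FPR definitions above can be sketched in a few lines. This is a generic illustration, not the authors' code: the function names `roc_points` and `auc`, the labels and the predicted scores are all hypothetical. The curve is traced by sweeping the classification threshold over the predicted probabilities, and the AUC is obtained with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is called positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels (1 = infected, 0 = healthy) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(labels, probabilities)
print(curve)
print(f"AUC = {auc(curve):.3f}")
```

A model that ranks every infected sample above every healthy one gives an AUC of 1, while a model that ranks at random gives an AUC near 0.5, which corresponds to the diagonal line of no discrimination.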

Figure 3: Receiver Operating Characteristic (ROC) curves of reflectance predictors used to predict infection by F. oxysporum in two tomato varieties, susceptible (black line) and tolerant (blue line). The grey diagonal is the line of no discrimination.


Absorbance and/or reflectance data in the VIS/NIR ranges have been used to develop models for a wide range of plant applications, such as disease detection, forage quality determination, estimation of chlorophyll and carotene concentrations, and estimation of seed oil content and total antioxidant capacity, among others [36-40]. Spectroscopy has been used for plant disease detection mainly in diseases with local symptoms, through the so-called “disease indices”, which have the disadvantage of not being linear and in most cases have to be fitted with linearizing transformations [33]. For Fusarium infections specifically, soft independent modeling of class analogy (SIMCA) and neural and parametric classifiers have been used to identify the pathogen in corn grains, and linear discriminant analysis has been used to detect F. oxysporum isolates [41,42]. However, studies that describe the application of VIS/NIR to predict systemic infections, and specifically vascular wilt in plants, are very limited.

In this study, VIS/NIR models were developed to predict F. oxysporum infection during the incubation period of vascular wilt, when disease symptoms are not yet visible. The BLRMs built from the variables selected with the described methodology were highly significant. The selected predictor variables that explained the highest percentage of data variability were R550 and R750, which were highly significant in most models. However, R550 was significant in most of the BLRMs built for the susceptible variety during the incubation period, while R750 was significant mainly in the BLRMs for the tolerant variety. The explanatory variable R430 was also significant in most models, but the percentage of data variation it explained was very low.

The test performed is the difference between the residual deviance of the null model (which includes no predictor variables, only the intercept, which coincides with the mean of the response) and that of the model of interest [30]. Model significance therefore does not imply that the original data fulfill the assumptions of the BLRM or that the model has good predictive capacity. At this point, the question “which are the best models?” may arise: since each BLRM developed in this study was built from a different data set, the models could not be compared with traditional indicators. Therefore, in this work the Akaike Information Criterion (AIC) was used as a relative measure of model quality for a given data set, even for models built from different data groups. The AIC results for the models of both varieties were quite consistent, as the AIC values of the BLRMs were lower after 9 dpi, when the physiological response of the plant to the pathogen intensified (data not shown).
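The deviance comparison described above can be sketched as follows. This is a generic likelihood-ratio test for an ungrouped binary response, not the authors' script: the function names, the labels and the "fitted" probabilities are hypothetical (in practice the probabilities would come from the maximum likelihood fit), and the chi-square tail probability is computed with a small helper so that the example needs only the standard library.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of labels y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # closed form for 2 degrees of freedom
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # closed form for 1 degree of freedom
    while a < df / 2.0:                       # add 2 degrees of freedom per step
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(y, p_fitted, n_predictors):
    """Return (G statistic, p-value, AIC) for a fitted binary logistic model."""
    p_null = sum(y) / len(y)                  # intercept-only model: mean response
    dev_null = -2.0 * log_likelihood(y, [p_null] * len(y))
    dev_model = -2.0 * log_likelihood(y, p_fitted)
    g = dev_null - dev_model                  # difference of residual deviances
    aic = dev_model + 2 * (n_predictors + 1)  # +1 for the intercept
    return g, chi2_sf(g, n_predictors), aic


# Hypothetical labels and fitted probabilities from a model with 2 predictors.
y_obs = [1, 1, 1, 0, 0, 0, 1, 0]
p_hat = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4]
g_stat, p_value, aic_value = likelihood_ratio_test(y_obs, p_hat, n_predictors=2)
print(f"G = {g_stat:.3f}, p = {p_value:.4f}, AIC = {aic_value:.3f}")
```

A small p-value only says that the predictors improve on the intercept-only model, which is why the text treats significance separately from predictive capacity.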

In general, the fit of the data to the BLRMs was good in both varieties, with Pseudo R2 values greater than 0.5 (McFadden) and 0.8 (r2CU) in the models built after 6 dpi. However, the Pseudo R2 values tend to be slightly higher in the tolerant variety models. To date there are no other reports on the performance of BLRMs for determining infection by F. oxysporum, but the Pseudo R2 values obtained are consistent with (and even higher than) their R2 counterparts in multivariate models and traditional indices applied to the detection of other plant diseases [25,43-45]. The estimates of a logistic regression model are maximum likelihood estimates obtained through an iterative process that is not designed to minimize the variance, so the ordinary least squares (OLS) approach does not apply [46]. For this reason, several pseudo R2 measures have been developed to evaluate goodness of fit in logistic models; they are so called because they are measured on a similar scale (although some never reach “0” or “1”), but they cannot be interpreted as an R2 in the strict OLS sense.
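For reference, the two pseudo R2 measures mentioned above can be computed from the log-likelihoods of the fitted and null models. The sketch below uses the standard definitions and assumes that r2CU denotes the Cragg-Uhler (Nagelkerke) pseudo R2; the function names and the numerical log-likelihoods are hypothetical and are not taken from the study.

```python
import math


def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R2: 1 - lnL(model) / lnL(null)."""
    return 1.0 - ll_model / ll_null


def cragg_uhler_r2(ll_model, ll_null, n):
    """Cragg-Uhler (Nagelkerke) pseudo R2: Cox-Snell R2 rescaled by its maximum."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a model fitted to n = 8 observations.
ll_null_example = 8 * math.log(0.5)   # intercept-only model with 50% incidence
ll_model_example = -2.392
print(f"McFadden    = {mcfadden_r2(ll_model_example, ll_null_example):.3f}")
print(f"Cragg-Uhler = {cragg_uhler_r2(ll_model_example, ll_null_example, 8):.3f}")
```

Both measures equal 0 when the model is no better than the null model and approach 1 as the fitted log-likelihood approaches 0, which is why they are read on a scale similar to R2 without sharing its variance-explained interpretation.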

Fitting reflectance data to BLRMs requires that some assumptions be verified, and it is important to discuss them in order to develop well-fitted models. The first assumption requires that the dependent variable be binary; in practical terms, each reflectance spectrum measured on a leaf can have only two possible origins. It can be measured from a spot on a diseased plant that responds biochemically and physiologically to the pathogen (with or without visual symptoms), or it can be measured from a healthy plant. By definition, a systemic disease alters the normal physiological function of an organ or of the whole organism, and for infections caused by F. oxysporum specifically, it has been demonstrated that plant resistance to the disease is based precisely on the activation of the plant’s systemic defense mechanisms [47].

The reflectances at the selected wavelengths (explanatory variables) showed linear relationships with the link function (logit); that is, the second assumption (linearity) is fulfilled. This is very important because it makes it possible to rule out some problems that arise when the assumption is violated (specification error): omission of important independent variables, inclusion of irrelevant independent variables, an incorrect functional form, changing parameters, and a dependent variable that is part of a system of simultaneous equations [48]. Checking the third assumption requires detecting atypical and influential cases, and treating them is a crucial task in any modeling exercise. Atypical data caused by environmental noise and human error are common in the reflectance measurements used to develop the BLRMs; they can produce large residuals and often have marked effects on the linear maximum likelihood predictor [49]. The “manual” process used to select the variables, which compares the variance inflation factor (VIF) between the explanatory variables of the models, is also important for reducing the collinearity that commonly exists in hyperspectral data [50], since collinear predictor variables can cause unstable estimates and inaccurate variances that affect confidence intervals and hypothesis tests. Collinearity can also inflate the variances of the parameter estimates and, consequently, lead to incorrect inferences about the relationships between the explanatory and response variables [51].
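The VIF comparison mentioned above can be illustrated with a short sketch. It uses the standard result that the VIF of each predictor is the corresponding diagonal element of the inverse of the predictors' correlation matrix. The function names and the reflectance values are hypothetical, and the VIF cut-off applied in the study is not stated in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {predictor name: VIF} from the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical reflectances: R550 and R564 are neighbouring bands and nearly collinear.
spectra = {
    "R550": [10.1, 11.3, 9.8, 12.0, 10.7, 11.9],
    "R564": [10.4, 11.5, 10.1, 12.4, 10.8, 12.1],
    "R750": [41.0, 39.5, 42.3, 40.1, 38.7, 43.0],
}
for name, value in vif(spectra).items():
    print(f"{name}: VIF = {value:.1f}")
```

Predictors with a high VIF carry largely redundant information, so dropping one of a highly collinear pair (here, one of the two neighbouring bands) is the kind of manual step the text describes.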

The models developed in this study were restricted to short periods of time (every three days) while the symptoms of the disease were not visible. The lower predictive capacity of the models obtained with reflectance data from the first week after infection is related to the low physiological response of the plant to F. oxysporum during this period, which has been supported by other authors [52,53]. According to the AUCs obtained, the BLRMs generated have a high prediction yield, exceeding an AUC of 0.8 in susceptible plants and 0.9 in tolerant plants after 6 dpi. One possible reason for not obtaining higher values is that the wavelengths selected as predictors for the BLRMs lie in the spectral range from 400 to 1000 nm, while some important biomolecules in the plant-pathogen interaction have absorbance/reflectance peaks in the NIR [54,55]. Increasing the number of predictor variables in the models could also improve the AUC values, but the correlation between the multiple predictors used in this study would have caused problems in fitting the model. A final point of interest is the higher prediction performance of the models developed from the reflectance data of the tolerant plants. Since the plant-pathogen interaction is highly specific, each variety undergoes particular physiological changes during recognition of the pathogen and produces different polysaccharides that are important for its inhibition. These changes, occurring at different times of the incubation period, may be causing the differences in prediction yield.

The use of VIS/NIR reflectance spectroscopy in BLRMs to predict infection by F. oxysporum in tomato plants allows objective, rapid and non-destructive evaluation of the samples, in contrast with other techniques for detecting plant diseases [4]. The methodology developed to generate the BLRMs could be a viable technique for detecting infection by F. oxysporum during the incubation period of the disease, when the symptoms are not yet visible. This research is a first step towards the application of BLRMs based on reflectance data in the VIS/NIR range for the prediction of systemic fungal diseases, which can serve as a basic input for the design of technological tools that allow plant disease detection in real time.


It is possible to apply BLRMs to the early detection of vascular wilt in tomato plants using reflectance data in the VIS/NIR range; additionally, a general methodology for evaluating the fit has been provided. Several logistic regressions based on different combinations of predictor variables (reflectance at a specific wavelength) are shown in this study. By using the spectral data in the BLRMs it was possible to predict the incidence of vascular wilt in tomato plants with a reasonable degree of accuracy (accuracy > 0.8 after 6 dpi).

It was possible to create models with good predictive performance, mainly in the susceptible tomato variety evaluated, using nine variables identified with the selection method described above. However, the results obtained in this work suggest that infection in tomato plants can be predicted using three basic variables: the reflectances R750, R550 and R430, at wavelengths located at the upper limit of the red, in the green and in the violet, respectively. Therefore, it is expected that by using some combination of these three reflectances in the BLRMs or in plant disease indices, it should be possible to screen a large number of tomato cultivars quickly for their reaction to F. oxysporum infection. The BLRMs make fewer identification errors after 6 dpi, and those generated from the reflectance data of the tolerant variety were more efficient.

Finally, the results of this study provide valuable information for the use of reflectance data for evaluation at the organ and plant scales, which can be scaled up to measurements made by remote sensors on aerial or satellite platforms for the evaluation of large areas infected with vascular wilt. Subsequent studies should focus on the specific relationship between some important biomolecules in the tomato-F. oxysporum interaction and reflectance data in the NIR ranges not covered in this project (from 1000 to 2500 nm).


We thank the Colombian Administrative Department of Science, Technology and Innovation (COLCIENCIAS), which supported this work through the 2014 National Doctorates Program. Additionally, we appreciate the partial financing of the project by the National Call for the Support to Research Projects and Artistic Creation of the Universidad Nacional de Colombia 2017-2018.

