Prediction of vine vigor and precocity using data and knowledge-based fuzzy inference systems

Cécile COULON (1), Brigitte CHARNOMORDIC (2), Dominique RIOUX (3), Marie THIOLLET-SCHOLTUS (1), Serge GUILLAUME (4)

(1) INRA UE1117 Vigne et Vin, UMT Vinitera, F-49071 Beaucouzé, France. Tel: (33) 241225668; fax: (33) 241225665; mail: [email protected]
(2) INRA, Supagro, UMR MISTEA, F-34060 Montpellier, France
(3) Cellule Terroirs Viticoles, UMT Vinitera, F-49071 Beaucouzé, France
(4) Cemagref, UMR ITAP, F-34196 Montpellier, France

Abstract

Aims: The evolution of the economic and environmental context (low-input management practices, rising energy costs and climate change) requires winegrowers to adapt and/or optimize their practices in order to produce competitive, high-quality wines. To adapt and sustain their practices at the plot scale, e.g. rootstock selection or planting density, winegrowers and extension officers need indicators to predict vine development according to immutable environmental factors (soil, parent rock and landscape). The aim of this work is to develop operational and useful indicators with a scientific justification.

Methods and results: This paper proposes a new approach based on a computer model composed of a cascade of fuzzy expert systems to estimate the two variables that best characterize vine development: vigor and precocity. It combines the expertise of pedologists with data analysis. Building on a literature survey, and in particular on a previous expert system using analytical equations, the new approach gives a continuous estimation of the vine vigor and precocity imparted by soil, parent rock and landscape. It avoids the drawbacks of the previous expert system, which stem from the use of traditional crisp partitions for continuous input variables. Setting the model parameters by combining expert knowledge and data mining, making the most of both, is another novel aspect. The method is tuned and validated on two different databases.

Conclusion: Vine vigor and precocity imparted by environmental factors can now be evaluated with a method that is more efficient than the former ones, as shown by the validation procedures. Post-evaluation correction by experts is not necessary, which saves time. The new method gives a continuous estimation. Each step can be controlled and analyzed during the design. The method is generic: the reasoning rules that express the relations between variables have a range of validity that extends beyond a given area. It is quite easy to customize and to transfer to new areas by adjusting the parameters using local knowledge and data.

Significance and impact of the study: This work gives an answer to the significant problem of assessing vigor and precocity according to environmental factors, which is a prerequisite for best adapting long-term cultural practices.

Key words: Composite indicators, fuzzy inference system, soil, landscape, mesoclimate, pedoclimate, crop, imprecision, expert knowledge, k-means.

Résumé

Objectives: The evolution of the economic and environmental context (reduction of inputs, rising energy costs and climate change) requires winegrowers to adapt and/or optimize their practices in order to keep producing quality wines while remaining competitive. To adapt their practices, and more particularly the long-term ones at the plot scale, winegrowers and viticultural advisers need indicators to predict vine development according to permanent environmental factors (soil, parent rock and landscape setting). Such indicators are currently lacking or too simplistic in the way they are built. The objective of this work is to develop operational and useful indicators with a well-founded scientific justification.

Methods and results: This article proposes a new approach based on a computer model composed of several fuzzy expert systems. It combines expertise and data analysis to estimate the two variables that best characterize vine development: vigor and precocity. Drawing on a literature review, and in particular on a previous expert system based on an analytical equation, the new method gives a continuous estimation of the vigor and precocity imparted by soil, parent rock and landscape factors. It removes the drawbacks of the previous expert system, which are due to the traditional partitioning of continuous variables into classes. Another new aspect is that the model parameters are obtained by effectively combining expert knowledge and data analysis. The method is tuned and validated on two databases.

Conclusion: The vigor and precocity imparted by environmental factors can now be evaluated by two indicators that perform better than those built previously, which avoids a posteriori correction of the indicator values by experts. The new method gives a continuous estimation of these variables. Each step can be controlled and analyzed when the variables are aggregated. The method is generic because the reasoning tied to the relations between variables is not specific to one region. It can easily be transferred to new areas by adapting the parameter values using expert knowledge and data from the new study area.

Significance and impact of the study: This work provides an answer to the problem of estimating the vigor and precocity imparted by permanent environmental factors, which is a prerequisite for better adapting long-term cultural practices.

Key words: Composite indicators, fuzzy expert system, soil, landscape, mesoclimate, pedoclimate, crop, imprecision, expert knowledge, k-means.

INTRODUCTION

Winegrowers must keep adapting their cultural practices according to targeted wine quality and environmental factors in order to evolve towards sustainable viticulture, to remain competitive and to improve both production methods and wine quality. Indeed, berry composition and wine characteristics depend on the joint effects of environmental factors and practices. Among these effects, the key elements are at least the following (Carbonneau et al., 2007): the level of the photosynthetic source (estimated for instance by the Exposed Leaf Area model); the sink corresponding to yield or quantity of grapes (particularly during their maturation); and the competitive sink represented by shoot vigor (particularly the growth of laterals in summer). The general equilibrium between these elements determines the carbon balance of the plant. It is worth noting that a moderate water limitation optimizes the carbon flux towards berries and also, to some extent, the level of secondary metabolism. The general earliness of the phenological stages and the duration of the growing cycle mostly depend on temperature and determine when the preceding physiological functions occur. Among these effects, vine vigor and the precocity of the vine vegetative cycle were selected as a first step because they are of major interest in cool climate viticulture (Jackson and Lombard, 1993; Morlat, 2010). Let us define these terms.

First of all, vine vigor corresponds to the rhythm and intensity of shoot growth (Carbonneau et al., 2007). At the plant level, the term 'vine vigor' is commonly used, both in practice and in the literature, to characterize the total vegetative biomass of the plant ('vegetative expression'). There is no automatic link with shoot vigor: a vigorous vine, for instance, can bear weak or vigorous shoots depending on the number of shoots or buds it produces (Carbonneau et al., 2007). Nevertheless, as a first approach the model deals only with vine vigor. Depending on age or size, a vine can have few but very vigorous shoots, or many that are less vigorous (Dry and Loveys, 1998). Managing the vineyard through the control of vine vigor determines the balance between vegetative growth and productivity (Kliewer and Dokoozlian, 2005). This balance can be achieved through a rational management system involving the vine shape (Reynolds and Heuvel, 2009) and interacting practices set when the plot is planted, such as choice of rootstock, planting density and foliage height (Dry and Loveys, 1998; Rives, 2000). The management strategy depends on the environmental factors.

As to earliness, also called precocity, the variables affecting the growth of the vine are linked partly to water availability, which plays a major role in the early stages of the vegetative cycle (Tesic et al., 2002; Carbonneau et al., 2007). Early vine development means an earlier onset of the main phenological stages (budbreak, flowering and veraison). Depending on the degree of earliness, the optimal maturity of the grapes for a given type of wine will be reached sooner or later in the growing season. For example, in the northern vineyards of France, a late variety may not achieve optimal maturity, so the harvest date would eventually be set to avoid excessive grape rot.

Let us now consider how vine vigor and earliness can be measured or estimated. This is not an easy task, as they depend on immutable factors such as soil, parent rock, landscape characteristics and practices implemented at plantation time, and also on non-permanent factors such as climate and annual practices.

Vine vigor and precocity can be estimated during the vine growth cycle by various measurements. Vine vigor can be evaluated by direct or indirect, destructive or non-destructive methods (Tregoat et al., 2001): remote sensing, which is widely used in precision viticulture (Drissi et al., 2009; Goutouly et al., 2006; Homayouni et al., 2008), leaf area or pruning weight measurements (Carbonneau et al., 2007) and expert evaluations (Bodin 2003; Carey et al., 2007). Precocity of the vine cycle can be directly observed on plots and estimated from indices that combine dates of flowering and veraison (Barbeau et al., 1998). Some models predict phenological dates from air temperature (Garcia de Cortazar Atauri et al., 2009; Parker et al., 2011). Concerning budbreak specifically, Pouget (1963) demonstrated that the time of budbreak (Baggiolini's stage B) depends on air temperature (and hence on bud temperature), whatever the temperature of other parts of the plant, roots in particular. In the Loire valley, however, Morlat and Hardy (1987) and Morlat (2010) found a better correlation between the date of budbreak and soil temperature than with air temperature, leading to the conclusion that root temperature is the causal agent of budbreak. In fact, these authors ignored the fact that a bud requires some time to warm up when the air is on a warming trend, and that the soil also requires some time to warm up. In this particular vineyard, the two warming lag phases were similar, which explains why budbreak correlated better with soil temperature than with air temperature. Besides, those authors recorded budbreak at Baggiolini's stage C, which may correspond more to the beginning of growth (for which root activity is decisive) than to budbreak. Nevertheless, soil temperature matters more for the beginning of growth than for budbreak. In the same way, other authors showed the impact of root-zone temperature on budbreak in vines (Woodham and Alexander, 1966; Kliewer, 1975; Zelleke and Kliewer, 1980; Tagliavini and Marangoni, 1992). Even though some measurements are available, they do not provide an a priori evaluation to guide the choice of long-term practices at plantation time, because they are only available during the vine growth cycle.

To address the lack of a priori evaluations and to propose operational tools for adapting cultural practices, models can be designed to estimate vine vigor and precocity. These variables are useful for reasoning in realistic operating conditions. For example, if a medium vine vigor is targeted, winegrowers can compensate for a high vigor imparted by environmental factors by planting a rootstock that imparts a low vigor, e.g. Riparia. Therefore, to build efficient decision support tools that help winegrowers and extension officers adapt long-term cultural practices, it is important to design models able to predict vine vigor and precocity at plot level from a limited number of input variables that are easy to obtain.

Meynard (2008) noted the need to think in a systemic way by considering the interactions among practices and between practices and the environment. A systemic approach in viticulture can benefit from modeling. Modeling vine vigor and precocity according to immutable factors such as soil, parent rock and landscape characteristics contributes to developing this systemic approach. The aim of our modeling approach is to provide operational tools for assessing the potential of plots according to permanent factors, in order to optimize the adaptation of cultural practices. It is not to propose an exhaustive representation of the system involving all the physiological knowledge of vine development, as proposed by Dai et al. (2010). We used a limited number of input variables and aggregated them using expert systems to predict the level of vine vigor and precocity imparted by permanent environmental factors.

Numerous studies on the characterization and mapping of viticultural terroirs have been carried out during the last twenty years (Vaudour, 2003). Vine vigor and precocity imparted by soil, parent rock and landscape are not directly measured but can be estimated through various mathematical algorithms (Morlat et al., 2001), which aggregate input variables obtained from viticultural terroir mapping. Vine vigor and precocity are then called 'composite variables', to distinguish them from the original input variables, called 'raw variables'. We based our research work on these existing algorithms.

The algorithms by Morlat et al. (2001) use analytical equations, which require continuous variables to be partitioned into crisp classes. In many cases, estimations need to be checked and re-evaluated by experts because of the sharp transition between classes. To avoid this problem, we propose in this paper a new strategy to characterize vine vigor and precocity imparted by environmental factors. This method replaces the analytical model of Morlat et al. (2001) with a computer model involving data and knowledge-based fuzzy expert systems. As our method aims to improve the well-documented mathematical method of Morlat et al. (2001), a comparison of the results obtained with the two methods is included in the design and transfer.

MATERIALS AND METHODS

Relations between environmental factors and vine vigor and precocity (Bodin and Morlat, 2003; Morlat et al., 2001) form the expert knowledge on which the present method is based. First we detail the basis and input values of the analytical algorithms by Morlat et al. (2001). Then we briefly recall the principles of fuzzy logic and fuzzy inference systems. Next we present the design of the fuzzy inference systems that replace the analytical algorithms. This design combines data mining and expert knowledge. Finally we describe the validation steps.

1. Input variables in the former algorithms (Morlat et al., 2001)

Vine vigor (VIG) is estimated from three input variables: water holding capacity (WHC), gravel percentage on soil profile (GOP) and parent rock hardness (PRH). We now present these input variables in detail, together with their measurement or estimation techniques.

WHC is the main variable that influences vine vigor: WHC impacts vine vigor twice as much as GOP and PRH. To characterize a vineyard, WHC must be estimated over a large area (vineyard or region) with sufficient precision at the plot scale to allow a good adaptation of the viticultural practices. Many methods are available to measure WHC, but many of them are expensive and tend to be used at only a few sampling points (Acevedo-Opazo et al., 2008). The alternative approach used to evaluate WHC is based on Baize's equation (2000) (Equation 1). It takes into account pedological data gathered during viticultural terroir mapping. Baize's equation uses the following input variables: number of horizons (h), soil horizon depth (HD), percentage of fine soil (FS), field capacity humidity (FCH), wilting point humidity (WPH) and bulk density (BD). FCH, WPH and BD are determined at each sampling point according to the texture of the soil horizons and the characteristics of the parent rock, as proposed by Goulet et al. (2004).

Equation 1: Estimation of water holding capacity (WHC) according to Baize (2000). h: number of horizons, FCH: field capacity humidity (%), WPH: wilting point humidity (%), FS: percentage of fine soil (%), HD: horizon depth (dm), BD: bulk density.

HD and h are determined by the pedologist. HD of the last horizon can easily be determined when PRH is medium or hard. When PRH is crumbly and the soil is very deep, WHC is calculated for the whole rooting depth. The rooting depth is visually observed on pedological trenches. However, it must be noted that in some cases, even if the parent rock is hard, some roots can grow through cracks in the rock and contribute significantly to the water supply.

GOP is directly observed on pedological trenches. It is estimated for each horizon. The average GOP is calculated according to the depth of each horizon. PRH is determined according to the parent rock and deep horizon types. It is classified into three classes: 3 for hard, 2 for soft and 1 for crumbly.

As mentioned by Piedallu et al. (2011), the texture of the horizon, HD and GOP are visually quantified, which introduces subjectivity into their estimation. Since more than one operator is often involved in the data survey, operators must agree on the same frame of reference. FCH, WPH, FS and BD are determined in a laboratory; a source of uncertainty is introduced by the disturbance of the soil structure.
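As a rough illustration of how the quantities described above could be computed from horizon descriptions, the sketch below sums, over the horizons, the difference between field capacity and wilting point humidity scaled by bulk density, horizon depth and the fine soil fraction, and averages the gravel percentage weighted by horizon depth. The body of Equation 1 is not reproduced in this text, so the product form used here is an assumption based on the variables listed in its caption. The `Horizon` class, the function names and all numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Horizon:
    fch: float            # field capacity humidity (%)
    wph: float            # wilting point humidity (%)
    fs: float             # percentage of fine soil (%)
    hd: float             # horizon depth (dm)
    bd: float             # bulk density
    gravel: float = 0.0   # gravel percentage observed in this horizon (%)


def water_holding_capacity(horizons):
    """Assumed form of Equation 1: sum of (FCH - WPH) * BD * HD * FS / 100."""
    return sum((h.fch - h.wph) * h.bd * h.hd * h.fs / 100.0 for h in horizons)


def average_gop(horizons):
    """Gravel percentage averaged over the profile, weighted by horizon depth."""
    total_depth = sum(h.hd for h in horizons)
    return sum(h.gravel * h.hd for h in horizons) / total_depth


# Hypothetical two-horizon profile, for illustration only.
profile = [
    Horizon(fch=20.0, wph=10.0, fs=80.0, hd=3.0, bd=1.5, gravel=20.0),
    Horizon(fch=25.0, wph=12.0, fs=100.0, hd=5.0, bd=1.4, gravel=0.0),
]
whc = water_holding_capacity(profile)
gop = average_gop(profile)
```

With humidities in % and depths in dm, this assumed form yields a value in millimeters, the unit in which the paper expresses WHC.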

Vine precocity (PRE) is influenced by the following environmental factors: vine vigor, root profile, thermal pedoclimate, and thermal mesoclimate. As in the vine vigor estimation algorithm, the precocity imparted by environmental factors is estimated on the basis of nine input variables: GOP, HD, PRH, FCH, natural drainage (NAD), color of soil surface (CSS), maximum rooting depth (MRD), altitude (ALT), landscape opening (LOP) and exposure (EXP). NAD is estimated according to Jamagne's scale (Jamagne, 1967) and CSS according to Munsell's Color Charts (Munsell-Color-Company, 2000). Root profile, and consequently MRD, are determined from the soil type, which is more influential than the rootstock genotype (Smart et al., 2006). Generally, the second and third soil horizons have the highest root density. Some soils with sand in the first horizon and clay in the deep horizons present two depths of high root density, superficial and deep. The deepest roots exploit the clayey sands, which have a higher water content (Morlat, 1993), and are related to an increase in the duration of the vine cycle. Superficial roots established in sands induce an early budbreak. As the algorithm estimates earliness of budbreak, we chose to assign a superficial rooting depth to this type of soil, since the precocity of the subsequent phenological stages is very dependent on air temperature (Barbeau et al., 1998). LOP is determined according to the method of Jacquet and Morlat (1997). EXP is determined according to the gradient and orientation of slopes.

All input variables are partitioned into crisp classes and their values are aggregated using analytical equations, which can be found in Morlat et al. (2001). In the following, this method will be referred to as 'MORLAT2001'.

In building the new strategy, we retain the relations between environmental factors and vine vigor and precocity established by Morlat et al. (2001). These relations were validated by cross-referencing viticultural terroir mapping with winegrowers' surveys (Bodin and Morlat, 2006; Morlat, 2010; Morlat and Lebon, 1992).

2. New strategy

Vine vigor (VIG) is estimated according to the water holding capacity (WHC), the gravel percentage on soil profile (GOP) and the hardness of the parent rock (PRH), as in the former algorithm.

To estimate precocity (PRE), we chose a different approach to aggregate the input variables. First, we estimate the thermal pedoclimate (PED) and the thermal mesoclimate (MES). Then PRE is estimated from four input variables: VIG, PED, MES and the maximum rooting depth (MRD). This breakdown of variables is suitable because MES can also be useful to adapt the pruning date, since cold mesoclimate situations are more sensitive to frost. The impact of altitude on temperature, and consequently on precocity, has been noted by several authors (Falcao et al., 2010; Guyot, 1999; Lebon, 1993; Morlat, 2010); a higher altitude yields a lower air temperature and consequently a lower precocity. However, the opposite can be observed since atmospheric inversion may occur along hillsides. The downslope case is taken into account with the 'topography' (TOP) variable.

Data used to tune the new method come from the soil and landscape characteristics of 2353 sampling points located in the Chinon vineyard of the middle Loire Valley (France), supplemented by observations on 9 pedological trenches that are representative of soil diversity.

The proposed architecture (Figure 1) is a hierarchical structure of linked modules, composed of four fuzzy inference systems (FIS). First, vine vigor (VIG), thermal mesoclimate (MES) and thermal pedoclimate (PED) are evaluated from three or four input variables; then precocity (PRE) is evaluated from VIG, MES, PED and the maximum rooting depth (MRD). The four fuzzy expert systems are implemented in a computer model. In the following, the new method will be referred to as 'CFS' (Cascading fuzzy systems).

Figure 1: Architecture of the system with input variables to estimate the four output variables: the vine vigor, the thermal mesoclimate, the thermal pedoclimate and the vine precocity. The continuous or discrete characteristics of the input variables are indicated. Input variables are weighted (figures next to arrows) and then aggregated by means of fuzzy inference systems.
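The cascade can be pictured as a simple composition of four functions. The sketch below is only a structural illustration: the four modules are passed in as callables, the stand-in lambdas and the input values are hypothetical, and the grouping of raw inputs feeding MES and PED is left to each module because it is given in Figure 1 rather than in the text.

```python
def cascade(raw, vig_fis, mes_fis, ped_fis, pre_fis):
    """Run the three first-level modules, then feed their outputs to the precocity module."""
    vig = vig_fis(raw)   # vine vigor, from WHC, GOP and PRH
    mes = mes_fis(raw)   # thermal mesoclimate
    ped = ped_fis(raw)   # thermal pedoclimate
    pre = pre_fis({"VIG": vig, "MES": mes, "PED": ped, "MRD": raw["MRD"]})
    return {"VIG": vig, "MES": mes, "PED": ped, "PRE": pre}


# Hypothetical stand-ins returning fixed scores; real modules would be fuzzy inference systems.
result = cascade(
    {"WHC": 120.0, "GOP": 15.0, "PRH": 2, "MRD": 1},
    vig_fis=lambda raw: 2.0,
    mes_fis=lambda raw: 1.5,
    ped_fis=lambda raw: 2.5,
    pre_fis=lambda mid: (mid["VIG"] + mid["MES"] + mid["PED"]) / 3.0,
)
```

Keeping MES and PED as separate intermediate outputs mirrors the point made above that they can be reused on their own, for instance MES for adapting the pruning date.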

3. Introduction to fuzzy sets and fuzzy logic

Our proposal is based upon fuzzy logic and fuzzy inference systems (FIS) (Bouchon-Meunier and Marsala, 2003; Nguyen, 1996; Zadeh, 1965). The goal of this section is to provide the reader with some theoretical background on FIS and a brief description of fuzzy linguistic modeling.

FIS are one of the best-known applications of fuzzy logic and fuzzy set theory (Guillaume and Charnomordic, 2012; Guillaume and Charnomordic, 2011). They can handle linguistic concepts, e.g. High or Low, implemented using fuzzy sets. Fuzzy logic is used as an interface between the linguistic space, that of human reasoning, and the space of numerical computation.

A fuzzy set is defined by its membership function (MF). A point x in the universe belongs to a fuzzy set with a membership degree 0 ≤ µ(x) ≤ 1. If L is the fuzzy set of Low soil water content, the membership degree µL(x) of a given soil water content value x can be interpreted as the degree to which x should be considered Low. Several fuzzy sets, e.g. Low, Medium and High, can be defined on the same universe, as illustrated in Figure 2.

Figure 2: Example of three fuzzy sets defined on the same universe. 'x': a point of the universe, µL(x): the membership degree in the 'Low' fuzzy set, µM(x): the membership degree in the 'Medium' fuzzy set, µH(x): the membership degree in the 'High' fuzzy set.

As fuzzy sets usually overlap, a data point is likely to belong to more than one fuzzy set. In the partition shown in Figure 2, the value x belongs to the fuzzy sets Low and Medium with the corresponding membership degrees µL(x) and µM(x). This makes it possible to represent the gradual nature of the phenomenon and a smooth transition between concepts.

With such partitions, called strong fuzzy partitions, a given point may belong, with a non-zero degree, to at most two fuzzy sets. Moreover, for each point in the universe, the membership degrees in all the fuzzy sets of the partition sum to one. These partitions have been shown to have good semantic properties (Valente de Oliveira, 1999).
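The two properties of a strong fuzzy partition can be checked on a small example. The sketch below assumes piecewise-linear membership functions whose peaks sit at a sorted list of breakpoints (shoulders at both ends, triangles in between); the function name and the breakpoint values are hypothetical and are not taken from the paper.

```python
def strong_partition(breakpoints):
    """Build a strong fuzzy partition with one fuzzy set peaking at each sorted breakpoint.

    Returns a function mapping a value x to its list of membership degrees.
    """
    def memberships(x):
        degrees = [0.0] * len(breakpoints)
        if x <= breakpoints[0]:
            degrees[0] = 1.0            # left shoulder
        elif x >= breakpoints[-1]:
            degrees[-1] = 1.0           # right shoulder
        else:
            for i in range(len(breakpoints) - 1):
                left, right = breakpoints[i], breakpoints[i + 1]
                if left <= x <= right:
                    rising = (x - left) / (right - left)
                    degrees[i] = 1.0 - rising   # set on the left fades out
                    degrees[i + 1] = rising     # set on the right fades in
                    break
        return degrees
    return memberships


# Hypothetical Low / Medium / High partition.
low_medium_high = strong_partition([50.0, 100.0, 150.0])
degrees = low_medium_high(80.0)
```

For the value 80, only Low and Medium receive a non-zero degree, and the degrees add up to one, as the definition requires.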

Fuzzy sets are used in a FIS to build linguistic rules, for instance “If Water Holding Capacity is High then ...”. In this single-input case, for a numerical value x of Water Holding Capacity, the rule matching degree, denoted by w, which expresses how true the rule is for the example, is given by the membership degree of x in the fuzzy set High, µH(x). Usually several input variables are involved in the rule premises. The jth rule in the FIS rule base is written as follows:

If x1 is A1j and x2 is A2j … and xp is Apj then y is Cj,

where Akj is the fuzzy set of the kth input variable used within the jth rule and Cj is the rule conclusion. Since the rule conclusion is a scalar, this formulation corresponds to a Sugeno FIS (Takagi and Sugeno, 1985).

In the multi-input case, the rule matching degree is obtained by the conjunction of the premise elements, as follows: Wj = µA1j(x1) ∧ µA2j(x2) ∧ … ∧ µApj(xp), where µAkj(xk) is the membership degree of xk in the fuzzy set Akj and ∧ is the conjunction operator 'minimum'. Because the fuzzy sets overlap, a given input is likely to fire several rules simultaneously. Consequently, all the fired rules are involved in the system inference and the rule conclusions are aggregated to give the final output. For a Sugeno FIS with m rules, the aggregation is performed using a weighted sum of the rule conclusions, the weights being the respective rule matching degrees (Equation 2).

Equation 2: Calculation of a final output value y. m: number of rules; Wj: jth rule matching degree; Cj: jth rule conclusion.
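The inference described above can be sketched in a few lines. The example assumes two-set (Low, High) strong partitions, the 'minimum' conjunction stated in the text, and a weighted sum normalized by the sum of the matching degrees, which is the usual Sugeno weighted average; the normalization is an assumption because the body of Equation 2 is not reproduced here. The input names, partition bounds and rule conclusions are hypothetical.

```python
def two_set_degrees(x, c1, c2):
    """Membership degrees in Low and High for a two-set strong partition (c1 < c2)."""
    if x <= c1:
        high = 0.0
    elif x >= c2:
        high = 1.0
    else:
        high = (x - c1) / (c2 - c1)
    return {"Low": 1.0 - high, "High": high}


def sugeno_output(inputs, partitions, rules):
    """Aggregate scalar rule conclusions, weighted by the rule matching degrees."""
    degrees = {name: two_set_degrees(value, *partitions[name])
               for name, value in inputs.items()}
    weighted_sum = 0.0
    total_weight = 0.0
    for premise, conclusion in rules:
        # Matching degree W_j: minimum over the premise elements.
        w = min(degrees[name][label] for name, label in premise.items())
        weighted_sum += w * conclusion
        total_weight += w
    return weighted_sum / total_weight  # assumed normalization by the sum of the W_j


# Hypothetical complete rule base over two inputs, with conclusions between 1 and 3.
partitions = {"x1": (0.0, 10.0), "x2": (0.0, 10.0)}
rules = [
    ({"x1": "Low", "x2": "Low"}, 1.0),
    ({"x1": "Low", "x2": "High"}, 2.0),
    ({"x1": "High", "x2": "Low"}, 2.0),
    ({"x1": "High", "x2": "High"}, 3.0),
]
y = sugeno_output({"x1": 0.0, "x2": 5.0}, partitions, rules)
```

Because the rule base covers every combination of labels and the partitions are strong, at least one rule always fires, so the denominator cannot be zero in this sketch.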

Fuzzy modeling allows continuous input information (e.g., a water holding capacity expressed in millimeters) to be represented in a lexical form (e.g., 'low' or 'high' water holding capacity) that can be integrated into an expert system. This mathematical method accounts for imprecision through a progressive transition between the classes of the input variables.

4. Fuzzy inference system design

The first step of FIS design deals with variable partitioning. In the present model, the number of linguistic concepts is set to two for each input variable, corresponding to the linguistic terms Low and High. This is the lowest possible number; it limits complexity while remaining sufficiently expressive, as the fuzzy sets provide a smooth transition between concepts.

Two kinds of variables are considered: continuous and discrete. Discrete variables can be partitioned using fuzzy sets because, in the case considered, the discrete values are ordered and have a progressive semantic meaning. The characteristic points of discrete inputs are fairly easy for experts to set, given the limited number of possible values. The characteristic points of continuous inputs, C1 and C2 in Figure 3, are harder to determine because there are infinitely many possible choices. The one-dimensional k-means algorithm (MacQueen, 1967) was therefore run on the input data, independently for each variable. The two cluster centers, C1 and C2, were chosen as characteristic points.

Figure 3: Fuzzy parameters: C1 and C2.
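To make the choice of C1 and C2 concrete, the sketch below runs a two-center k-means on a single variable. It uses batch (Lloyd-style) iterations initialized at the minimum and maximum of the data, which is a simplification chosen for illustration and not necessarily the variant or initialization used by the authors; the function name and the sample values are hypothetical.

```python
def kmeans_1d_two_centers(values, max_iterations=100):
    """Return two cluster centers (C1, C2), C1 <= C2, for one-dimensional data."""
    c1, c2 = min(values), max(values)   # assumed initialization
    for _ in range(max_iterations):
        # Assign each value to its nearest center.
        low = [v for v in values if abs(v - c1) <= abs(v - c2)]
        high = [v for v in values if abs(v - c1) > abs(v - c2)]
        if not low or not high:
            break                        # degenerate data: a single cluster
        new_c1 = sum(low) / len(low)
        new_c2 = sum(high) / len(high)
        if (new_c1, new_c2) == (c1, c2):
            break                        # assignments are stable
        c1, c2 = new_c1, new_c2
    return c1, c2


# Hypothetical observations of one continuous input.
sample = [10.0, 12.0, 14.0, 40.0, 42.0, 44.0]
c1, c2 = kmeans_1d_two_centers(sample)
```

The two centers can then serve as the kernels of the Low and High fuzzy sets: full membership in Low at or below C1, full membership in High at or above C2, and a linear transition in between.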

The second step of FIS design is the choice of the rule base. For each composite variable, the rules are easily extracted from the former algorithms (Morlat et al., 2001) and transposed to the fuzzy formalism. The set of decision rules covers all the situations that may occur. The rule conclusion is a scalar ranging from 1 to 3; it takes into account the weights of the various input variables.

Fispro 3.4 software was used to build the fuzzy inference systems (Fispro User Guide).

5. System behavior evaluation

The accurate transcription of the influence of environmental factors on the vine vigor and precocity estimations needed to be confirmed. As the expert evaluation of some environmental factors by a pedologist raises the level of uncertainty, the effects of these factors also had to be assessed to determine the acceptable level of uncertainty. To address these two points, it is necessary to study the system behavior. We chose to conduct a sensitivity analysis on the parameters and to analyze the output response to input variations.

The study of the sensitivity to the partition parameters (C1 and C2 in Figure 3) of the continuous variables aims to relate the output variations to the uncertainties that can exist in the data values. A dataset of input values was built considering all combinations of the following items: ten distinct values each of WHC and GOP, and three distinct values of PRH. For WHC as well as for GOP, the ten distinct values comprised a value lower than C1, a value higher than C2, C1 and C2 themselves (membership degree in the corresponding MF equal to 1) and six values equally distributed between C1 and C2 (membership degree in both MFs less than 1). To test the sensitivity to parameters, the output values inferred for each composite variable with the original value of C1 or C2 were compared with those inferred with the parameter values increased or decreased by 20%.
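The test dataset described above can be assembled as follows. This is a minimal sketch: how far the outer values lie below C1 and above C2 is not stated, so the `margin` parameter is an assumption, as are the function names and the C1 and C2 values used in the example.

```python
from itertools import product


def ten_test_values(c1, c2, margin=0.1):
    """One value below C1, C1, six equally spaced interior values, C2, one value above C2."""
    span = c2 - c1
    step = span / 7.0
    interior = [c1 + step * i for i in range(1, 7)]
    return [c1 - margin * span, c1] + interior + [c2, c2 + margin * span]


def perturbed(parameter, fraction=0.20):
    """Parameter value decreased and increased by the given fraction."""
    return parameter * (1.0 - fraction), parameter * (1.0 + fraction)


def build_dataset(whc_params, gop_params, prh_levels=(1, 2, 3)):
    """All combinations of ten WHC values, ten GOP values and three PRH classes."""
    return list(product(ten_test_values(*whc_params),
                        ten_test_values(*gop_params),
                        prh_levels))


# Hypothetical partition parameters (C1, C2) for WHC and GOP.
whc_values = ten_test_values(0.0, 70.0)
dataset = build_dataset((0.0, 70.0), (10.0, 45.0))
c1_low, c1_high = perturbed(50.0)
```

Each row of `dataset` would be evaluated once with the original C1 and C2 and again with each perturbed parameter, and the inferred outputs compared.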

The output variation in response to changes in the input variables was studied to check the system behavior. Output value variation was analyzed with respect to a given input while the others were set to a fixed value. Ten values of the studied input were considered (the same as for the parameter sensitivity test) and three given values of the fixed inputs: the values leading to the lowest and the highest predicted value of the composite variable, and the value that corresponds to the intersection of the two MFs ((C1 + C2) / 2).
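The one-input-at-a-time analysis can be sketched as a sweep over the studied input for each of the three fixed settings. The `toy_model` below is a hypothetical stand-in for a fuzzy inference system, the ten sweep values are simply equally spaced rather than the ten values of the sensitivity test, and the monotonicity check is only one possible way of inspecting the resulting curves.

```python
def response_curves(model, studied, sweep_values, fixed_settings):
    """Evaluate the model over sweep_values of one input for each fixed setting."""
    curves = {}
    for label, fixed in fixed_settings.items():
        curves[label] = [model({**fixed, studied: value}) for value in sweep_values]
    return curves


def is_monotonic(curve):
    """True if the curve never changes direction."""
    pairs = list(zip(curve, curve[1:]))
    return all(b >= a for a, b in pairs) or all(b <= a for a, b in pairs)


def toy_model(inputs):
    """Hypothetical smooth stand-in with two inputs scaled to [0, 1]."""
    return 1.0 + 1.5 * inputs["x1"] + 0.5 * inputs["x2"]


sweep = [i / 9.0 for i in range(10)]
curves = response_curves(
    toy_model, "x1", sweep,
    {"lowest": {"x2": 0.0}, "intersection": {"x2": 0.5}, "highest": {"x2": 1.0}},
)
```

Plotting or inspecting the three curves side by side shows how the effect of the studied input changes with the level of the other inputs.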

Sensitivity to parameters and output response analysis together constitute the system behavior evaluation.

6. Validation

The vine vigor and precocity indicators used in this study are well documented and validated by Morlat et al. (2001), and Goulet and Morlat (2010). These composite variables cannot be directly validated by comparison with data measured on plots in previous years, because they only evaluate the impact of permanent factors such as soil, parent-rock and landscape characteristics. Vine vigor, pedoclimate and precocity also depend on cultural practices and annual factors. For instance, observed precocity also depends on annual climatic factors and pruning date (Martin and Dunn, 2000). Similarly, vine vigor depends on the rootstock (Ollat et al., 2003). Because we changed the mathematical method used to calculate the indicators, we needed to validate them again. A vineyard may offer a representative range of soil, parent rock and landscape situations, but not under identical climatic conditions and cultural practices. Therefore, instead of a complete direct validation, we conducted a three-step validation:

- In a first step, we thoroughly investigated the system behavior as presented in Section 5. In this step, we first had to consolidate the design of the new method to make sure that the relations between variables are respected.

- In a second step, we compared the model outputs with winegrowers' evaluations on plots with similar practices but dissimilar soil, parent rock and landscape factors. This step was conducted using a dataset collected in the Chinon region in the middle Loire valley (France). This dataset corresponds to the characterization of 2353 sampling points and 756 vine plots. Winegrowers were surveyed to provide an evaluation of the vine vigor and budbreak precocity levels, the soil temperature and the frost risk of each plot. These evaluations can be correlated to the four outputs of our model. In the Chinon area, the majority of vine plots are inter-cropped with a grass cover. This situation generates a competition for water and nitrogen resources and consequently impacts vine development (Cellette et al., 2005). The rootstocks planted on the plots confer different levels of vigor and precocity to the vine (Institut Français de la Vigne et du Vin, 2007). To compare model outputs with winegrowers' evaluations, we selected non-intercropped plots, planted with a rootstock conferring medium vigor and precocity levels and pruned in the same month (January). The selection led to the extraction of 19 plots from the dataset. These plots are planted on 15 contrasted Basic Terroir Units (BTU); a BTU is defined by three associated components: a geological component, a pedological component and a landscape component (Morlat et al., 2001). The variability of the environmental factors appears sufficient to test our model. We evaluated the model predictions using the root mean square error (RMSE) (Equation 3), which compares the values predicted by the model with the values observed by winegrowers.

$$\mathrm{RMSE} = \sqrt{\frac{1}{A} \sum_{i=1}^{A} \left(\hat{y}_i - y_i\right)^2}$$

Equation 3: Root mean square error (RMSE). A: number of plots, ŷi: predicted value, yi: observed value.

- In a third step, the new method based on cascading fuzzy systems (CFS) was also tested on a dataset from a completely different area, to assess its generalization capability and the adaptations that would need to be considered. The dataset originates from a vineyard located in the Douro region (Portugal). The numerical quality was assessed by comparing the predicted outputs with the MORLAT2001 outputs. The transferability of the method to end-users was also examined, as well as its user-friendliness.
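The RMSE of Equation 3, used in the second validation step, can be sketched in a few lines. The function name and the example values are illustrative assumptions; only the formula itself comes from the text.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean square error between model predictions and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("both sequences must be non-empty and of equal length")
    a = len(predicted)  # A: number of plots
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / a)


# Hypothetical values on the 1-3 scale, for illustration only.
example = rmse([2.0, 2.0], [1.5, 2.5])
```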


RESULTS AND DISCUSSION

The design and tuning of the CFS computer model using the French Chinon dataset introduced in the Material and Methods section are now detailed. This includes the fuzzy partitioning of the input variables and the composition of the rule base. The system behavior is then studied and the model outputs are compared with expert evaluations. Finally, the transferability to a new dataset and to new users is tested by comparing the MORLAT2001 and CFS methods on a dataset from the Douro (Portugal).

1. Fuzzy partitioning of the input variables

As stated in Material and Methods (Section 3), the number of linguistic concepts is set to two for each of the input variables, corresponding to the linguistic terms Low and High. In practice, discrete variables may have more than two modalities. For instance PRH, EXP, LOP, CSS and MRD can take three different values, and the middle one is considered as both Low and High, with a 0.5 membership degree in each of the corresponding fuzzy sets. The NAD variable has a whole range of modalities between 1 and 6 on the Jamagne scale (Jamagne, 1967). The corresponding fuzzy partition is designed so that NAD is completely Low for values lower than 2, and completely High for values higher than 5.
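A two-term partition of this kind can be sketched as a pair of complementary membership functions. The linear transition between the two limits is an assumption of this sketch (the text only fixes the behavior below the first limit, above the second, and at the midpoint), and the function names are hypothetical.

```python
def mu_high(x, c1, c2):
    """Membership degree in High: 0 up to c1, 1 from c2, linear in between (assumed shape)."""
    if x <= c1:
        return 0.0
    if x >= c2:
        return 1.0
    return (x - c1) / (c2 - c1)


def mu_low(x, c1, c2):
    """Membership degree in Low, complementary to High so that both degrees sum to 1."""
    return 1.0 - mu_high(x, c1, c2)


# Three-modality discrete variable coded 1, 2, 3: the middle value is half Low, half High.
middle = (mu_low(2, 1, 3), mu_high(2, 1, 3))

# NAD: completely Low below 2, completely High above 5.
nad_low = mu_low(1.5, 2, 5)
nad_high = mu_high(5.5, 2, 5)
```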

These partitions were built according to general expert knowledge, and may not be suited to the specific data of the area under study. For instance, medium is the overwhelming modality for the CSS variable (1933 sampling points), with only 130 sampling points having a high value and 290 a low one. This unbalanced distribution reflects the homogeneity of the area under study. Other areas might have a more balanced distribution of soil surface colors.

For the four continuous variables (WHC, ALT, GOP and FCH), the two limits, C1 and C2 in Figure 3, are found by the k-means algorithm. As they are computed from a specific dataset, one can wonder about their meaning. For example, Figure 4 (top left) shows that WHC is completely low for values lower than 134 mm. Zufferey and Murisier (2006) observed that plots with a WHC below 100 mm are sensitive to water stress, so a high membership degree in the first fuzzy set means that, depending on the local climatic conditions, water stress may be observed.
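How two limits can be derived from data with k-means is sketched below. The paper does not detail the procedure, so this is an assumption: a one-dimensional k-means with two clusters (Lloyd's iterations, initialized at the data extremes), whose two centers are taken as C1 and C2. The function name and the sample values are hypothetical.

```python
def kmeans_two_centers(values, n_iter=100):
    """One-dimensional k-means with two clusters; returns the (low, high) centers."""
    lo, hi = min(values), max(values)  # initialize the centers at the extremes
    for _ in range(n_iter):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high_group:  # all values identical: nothing to split
            break
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if (new_lo, new_hi) == (lo, hi):  # assignments are stable
            break
        lo, hi = new_lo, new_hi
    return lo, hi


# Hypothetical water holding capacities (mm), for illustration only.
c1, c2 = kmeans_two_centers([10, 12, 14, 100, 110, 120])
```

Limits obtained this way depend on the dataset, which is why the text asks for them to be interpreted and validated by experts.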

Data used for calibrating the method have a wide range of variation. This is illustrated by the histograms showing the distribution of each input variable, with the class intervals determined according to Sturges' method (Sturges, 1926). Histograms are shown in Figure 4, together with the corresponding fuzzy partitions.
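Sturges' rule sets the number of histogram classes from the sample size alone. A minimal sketch, using the common formulation k = ⌈log2 n⌉ + 1 (the paper does not say which variant of the rule was applied); the function name is hypothetical.

```python
from math import ceil, log2


def sturges_bins(n):
    """Number of histogram classes for n observations, following Sturges' rule."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return ceil(log2(n)) + 1


# Number of classes for the 2353 sampling points of the Chinon dataset.
chinon_bins = sturges_bins(2353)
```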

Figure 4: Histograms showing the distributions of input variables and their fuzzy partitioning for the Chinon (France) dataset. Values between brackets correspond to (minimum, first parameter C1, second parameter C2, maximum). The class intervals are determined according to Sturges' method (Sturges, 1926). WHC: water holding capacity, GOP: gravel on profile, PRH: parent rock hardness, ALT: altitude, EXP: exposure, LOP: landscape opening, FCH: field capacity humidity, NAD: natural drainage, CSS: color of soil surface, MRD: maximum rooting depth.

To summarize our partitioning approach: i. Discrete variables are partitioned by expert knowledge. Depending on local environmental characteristics, some classes are not well represented. ii. Conversely, continuous variables are partitioned automatically using data mining, so their partitions need to be interpreted and validated by experts.

2. Generation of decision rules

The expert rules for the assessment of the composite variables (VIG, PED, MES and PRE) are shown in Tables 1 to 4. As expected, the WHC input variable impacts VIG estimation (Table 1) twice as much as the other inputs, WHC being the most influential variable. These combinations lead to five distinct rule conclusions for VIG.

Table 1: Decision rules. WHC: water holding capacity, GOP: gravel percentage on profile, PRH: parent-rock hardness, VIG: vine vigor evaluation.

Rule   If WHC   and GOP   and PRH    then VIG
1      low      high      hard       1
2      low      low       hard       1.5
3      low      high      crumbly    1.5
4      low      low       crumbly    2
5      high     high      hard       2
6      high     low       hard       2.5
7      high     high      crumbly    2.5
8      high     low       crumbly    3
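How a rule base like Table 1 turns membership degrees into a continuous VIG value can be sketched as follows. The rule conclusions come from Table 1 and the default limits are the MORLAT2001 values quoted in the text (50 and 100 mm for WHC, 20 and 40% for GOP). Everything else is an assumption of this sketch: linear membership functions, the minimum as conjunction operator, and a weighted average of the rule conclusions as output. The names `ramp` and `infer_vig` are hypothetical, and the sketch is not expected to reproduce the paper's figures exactly.

```python
# Table 1, one tuple per rule: (WHC, GOP, PRH, VIG conclusion).
RULES_VIG = [
    ("low", "high", "hard", 1.0),
    ("low", "low", "hard", 1.5),
    ("low", "high", "crumbly", 1.5),
    ("low", "low", "crumbly", 2.0),
    ("high", "high", "hard", 2.0),
    ("high", "low", "hard", 2.5),
    ("high", "high", "crumbly", 2.5),
    ("high", "low", "crumbly", 3.0),
]


def ramp(x, c1, c2):
    """Degree of 'high': 0 below c1, 1 above c2, linear in between (assumed shape)."""
    return min(1.0, max(0.0, (x - c1) / (c2 - c1)))


def infer_vig(whc, gop, hardness, whc_c=(50.0, 100.0), gop_c=(20.0, 40.0)):
    """Infer VIG; hardness is the degree of 'hard' in [0, 1] (0.5 for an intermediate rock)."""
    whc_high = ramp(whc, *whc_c)
    gop_high = ramp(gop, *gop_c)
    mu = [
        {"high": whc_high, "low": 1.0 - whc_high},
        {"high": gop_high, "low": 1.0 - gop_high},
        {"hard": hardness, "crumbly": 1.0 - hardness},
    ]
    num = den = 0.0
    for *labels, conclusion in RULES_VIG:
        # Rule weight: minimum of the membership degrees of its three premises.
        weight = min(mu[i][label] for i, label in enumerate(labels))
        num += weight * conclusion
        den += weight
    return num / den  # den > 0 because each input has a label with degree >= 0.5
```

With inputs that fully match a single rule, the output equals that rule's conclusion; in between, the output moves smoothly from one conclusion to the next.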

For PED estimation (Table 2), the three input variables have distinct weights, which leads to seven distinct rule conclusions. FCH is the most influential variable. Combinations of a high FCH value and a bad drainage (NAD variable) lead to the two lowest rule conclusions (1 and 1.3), whereas the two highest rule conclusions (2.8 and 3) are obtained when FCH is low and NAD is good.

Table 2: Decision rules. FCH: field capacity humidity, NAD: natural drainage, CSS: color of soil surface, PED: pedoclimate evaluation.

Rule   If FCH   and NAD   and CSS   then PED
1      high     bad       light     1
2      high     bad       dark      1.3
3      high     good      light     1.8
4      low      bad       light     2
5      high     good      dark      2
6      low      bad       dark      2.3
7      low      good      light     2.8
8      low      good      dark      3

The topography (TOP) plays a particular part in the mesoclimate (MES) estimation, as appears in the rules given in Table 3. If the topography is a thalweg or a footslope, the conclusion of rule 1 is 1, whatever the values of the other input variables. If the topography is neither a thalweg nor a footslope (rules 2 to 9), the three other input variables of the MES composite variable (ALT, EXP and LOP) have the same weight, which leads to four distinct rule conclusions.

Table 3: Decision rules. TOP: topography, ALT: altitude, EXP: exposure, LOP: landscape opening, MES: mesoclimate evaluation; Top 1: thalweg or footslope, Top 2: other topography.

Rule   If TOP   and ALT       and EXP       and LOP         then MES
1      Top 1    high or low   high or low   open or close   1
2      Top 2    high          low           close           1
3      Top 2    high          high          close           1.7
4      Top 2    low           low           close           1.7
5      Top 2    high          low           open            1.7
6      Top 2    low           low           open            2.3
7      Top 2    high          high          open            2.3
8      Top 2    low           high          close           2.3
9      Top 2    low           high          open            3

In Table 4, the four input variables of PRE have the same weight; there are therefore five distinct rule conclusions, equally distributed between 1 and 3.

Table 4: Decision rules. VIG: vine vigor estimation, PED: pedoclimate evaluation, MES: mesoclimate evaluation, MRD: maximum rooting depth, PRE: vine precocity evaluation.

Rule   If VIG   and PED   and MES   and MRD       then PRE
1      high     cold      cold      deep          1
2      low      cold      cold      deep          1.5
3      high     cold      cold      superficial   1.5
4      high     cold      hot       deep          1.5
5      high     hot       cold      deep          1.5
6      low      cold      cold      superficial   2
7      low      cold      hot       deep          2
8      low      hot       cold      deep          2
9      high     cold      hot       superficial   2
10     high     hot       cold      superficial   2
11     high     hot       hot       deep          2
12     low      cold      hot       superficial   2.5
13     low      hot       cold      superficial   2.5
14     low      hot       hot       deep          2.5
15     high     hot       hot       superficial   2.5
16     low      hot       hot       superficial   3
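Two properties of Table 4 can be checked mechanically: that the sixteen rules cover every combination of the four inputs, and that equal weights amount to adding 0.5 to PRE for each input that favors precocity. The rule tuples are transcribed from Table 4; the helper names and the `EARLY` tuple are hypothetical, introduced only for this sketch.

```python
from itertools import product

# Table 4, one tuple per rule: (VIG, PED, MES, MRD, PRE conclusion).
RULES_PRE = [
    ("high", "cold", "cold", "deep", 1.0),
    ("low", "cold", "cold", "deep", 1.5),
    ("high", "cold", "cold", "superficial", 1.5),
    ("high", "cold", "hot", "deep", 1.5),
    ("high", "hot", "cold", "deep", 1.5),
    ("low", "cold", "cold", "superficial", 2.0),
    ("low", "cold", "hot", "deep", 2.0),
    ("low", "hot", "cold", "deep", 2.0),
    ("high", "cold", "hot", "superficial", 2.0),
    ("high", "hot", "cold", "superficial", 2.0),
    ("high", "hot", "hot", "deep", 2.0),
    ("low", "cold", "hot", "superficial", 2.5),
    ("low", "hot", "cold", "superficial", 2.5),
    ("low", "hot", "hot", "deep", 2.5),
    ("high", "hot", "hot", "superficial", 2.5),
    ("low", "hot", "hot", "superficial", 3.0),
]

# Label of each input that raises PRE, in the order VIG, PED, MES, MRD.
EARLY = ("low", "hot", "hot", "superficial")


def is_complete(rules):
    """True if the rule premises cover every combination of the four two-term inputs."""
    expected = set(product(("low", "high"), ("cold", "hot"),
                           ("cold", "hot"), ("deep", "superficial")))
    return {rule[:4] for rule in rules} == expected


def has_equal_weights(rules):
    """True if each conclusion equals 1 plus 0.5 per input favoring precocity."""
    return all(
        pre == 1.0 + 0.5 * sum(label == early for label, early in zip(labels, EARLY))
        for *labels, pre in rules
    )
```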

Reasoning and linguistic rules are generic: for example, a low water holding capacity (WHC) imparts a low vigor to the vine (VIG) whatever the studied area. Because the rules, like expert knowledge, are expressed in natural language, they can transcribe that knowledge accurately. The weights given to the various variables are reflected in the rule conclusions. The rule base is complete, meaning that the decision rules cover all of the possible situations.

3. Fuzzy system behavior

The computer model is based upon expert rules. The parameters used to partition the input variables are determined by expertise and data analysis. As previously underlined, an important step in the design of the computer model is to check its representativeness and to evaluate its behavior when it is confronted with real-world data. The dataset from Chinon (France) is used for that purpose.

The output range of each composite variable (1 to 3) is well represented in the vineyard, except for PRE, which ranges from 1.4 to 2.6. The full output range of PRE was predicted on other Loire vineyards. The four output distributions are presented in Figure 5. The different VIG values are found in similar proportions. Few sampling points have a low MES value, i.e. a cold mesoclimate, corresponding to the specific topography of a thalweg or footslope. There is a higher proportion of sampling points with a high PED value, i.e. a hot pedoclimate, characterized by a low FCH (between 2 and 16%) and a good drainage (1 or 2); this corresponds to a hard chalky parent rock of the Turonian stage (Cretaceous period), frequently observed in the considered area. The majority of the sampling points yield low to medium precocity estimations, and the distribution of PRE is quasi-Gaussian. The range of VIG, MES, PED and PRE values observed over the same area highlights the need to adapt some viticultural practices, for example planting a low-vigor rootstock such as Riparia to reach a medium vine vigor where the environmental factors impart a high vigor (VIG value close to 3).

Figure 5: Histograms showing the distribution of output variables. VIG: vine vigor, MES: mesoclimate, PED: pedoclimate, PRE: vine precocity. The class intervals are determined according to Sturges' method (Sturges, 1926).

Thanks to fuzzy logic modelling, the new CFS method yields a smooth response function, as shown in Figure 6. VIG is plotted as a function of WHC and GOP when PRH is set to the intermediate value. To make the comparison easier, C1 and C2 are set to the values used in MORLAT2001: 50 and 100 mm for WHC, 20 and 40% for GOP. The threshold effect leads to wide plateaus and sharp transitions for MORLAT2001 (left part of the figure). These drawbacks are avoided in CFS (right part of the figure).

Figure 6: Comparison of the former MORLAT2001 method for estimating vine vigor according to environmental factors (variable 'NPVT') with the new CFS method (variable 'VIG'). VIG values are generated for combinations of WHC (water holding capacity) and GOP (gravel percentage on profile), for an intermediate parent-rock hardness.

Let us consider two plots (plot 1 and plot 2) characterized by the same type of parent rock (intermediate rock hardness) and the same percentage of gravel on the soil profile (10%), but with different water holding capacities (plot 1: 59 mm; plot 2: 99 mm). With the MORLAT2001 method, the two plots would receive the same VIG value (2.25, only 9 values being possible between 1 and 3, with a precision of +/- 0.25), despite their different water holding capacities. To be more realistic, they would need to be reclassified by hand, with the value for plot 2 reset to 2.75. With the CFS model, the inferred VIG values for plots 1 and 2 are 2.0 and 2.8 respectively, which is more satisfactory.
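The contrast between the two plots can be made concrete by comparing a crisp classification of WHC with a fuzzy membership degree, using the 50 and 100 mm limits quoted for MORLAT2001. This is only an illustrative sketch: the crisp class names, the treatment of the boundaries and the linear membership function are assumptions, not details taken from either method.

```python
def crisp_class(whc, c1=50.0, c2=100.0):
    """Crisp partition: every value between the two boundaries falls in the same class."""
    if whc < c1:
        return "low"
    if whc <= c2:
        return "medium"
    return "high"


def degree_high(whc, c1=50.0, c2=100.0):
    """Fuzzy partition: degree of 'high' grows linearly from c1 to c2 (assumed shape)."""
    return min(1.0, max(0.0, (whc - c1) / (c2 - c1)))


for name, whc in (("plot 1", 59.0), ("plot 2", 99.0)):
    # Same crisp class for both plots, but clearly different membership degrees.
    print(name, crisp_class(whc), round(degree_high(whc), 2))
```

The crisp partition cannot distinguish the two plots, whereas the membership degrees keep the information that plot 2 lies close to the upper limit.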

This illustrates that the new method avoids expert reclassification of plots whose values are very close to the class limits. All estimation procedures of vine vigor and precocity imparted by the vineyard site can now be completely automated.

4. Sensitivity to parameter analysis

As previously explained in Material and Methods (Section 5), the sensitivity to membership function parameters was studied for the four continuous variables and for each of the three fuzzy inference systems. We used a synthetic dataset of input values, including all combinations of values covering the whole WHC and GOP domains, combined with the three distinct PRH input values.

For each continuous input variable, the C1 and C2 parameters vary one after the other by +/- 20% (Table 5).

Table 5: Values of the fuzzy parameters for the input continuous variables used in the sensitivity analysis (WHC: water holding capacity, GOP: gravel percentage on profile, FCH: field capacity humidity and ALT: altitude).

Input variable   C1 -20%   C1 +20%   C2 -20%   C2 +20%
WHC (mm)              74       112       150       224
GOP (%)                7        11        35        53
FCH (%)               14        20        26        40
ALT (m)               42        62        63        95

Table 6: Output value variation of VIG: vine vigor, PED: pedoclimate and MES: mesoclimate estimation when the fuzzy parameters C1 and C2 of the continuous input variables (WHC: water holding capacity, GOP: gravel percentage on profile, FCH: field capacity humidity and ALT: altitude) vary by +/- 20%.

Input variable   Output (1-3 scale)   C1 -20%   C1 +20%   C2 -20%   C2 +20%
WHC              VIG                      0.4      -0.4       0.4      -0.4
GOP              VIG                     -0.2       0.2      -0.2       0.2
FCH              PED                     -0.3       0.2      -0.2       0.3
ALT              MES                     -0.2       0.1      -0.1       0.2

Table 6 shows that a 20% variation of the input variable parameters leads to a substantial variation of the output values, especially for the WHC parameters (0.4 on a scale ranging from 1 to 3). When one of the two WHC parameters is changed by 20%, VIG changes by 0.4, i.e. 20% of its domain of variation. When the parameters of GOP undergo the same variation, VIG varies by about 10%. Variation of the FCH parameters leads to a PED variation of up to 15%, and MES varies by up to 10% when the ALT parameters vary.
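These percentages follow from dividing the largest absolute output change reported for each input variable by the width of the output scale, 3 − 1 = 2. The notation Δ for the output change is introduced here for illustration:

```latex
\begin{aligned}
\text{WHC:}\quad & \frac{|\Delta \mathrm{VIG}|}{3-1} = \frac{0.4}{2} = 20\% \\
\text{GOP:}\quad & \frac{|\Delta \mathrm{VIG}|}{3-1} = \frac{0.2}{2} = 10\% \\
\text{FCH:}\quad & \frac{|\Delta \mathrm{PED}|}{3-1} = \frac{0.3}{2} = 15\% \\
\text{ALT:}\quad & \frac{|\Delta \mathrm{MES}|}{3-1} = \frac{0.2}{2} = 10\%
\end{aligned}
```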

For each output variable, Figure 7 illustrates the individual impact of each input variable when the others are set to given values. As an example, let us detail the impact of GOP on VIG when the other two inputs, WHC and PRH, are given fixed values (Figure 7: plot at the top left, light grey curves). Three combinations of input values are considered for WHC and PRH: i. values set to predict the lowest value of VIG, i.e. WHC < 134 mm and a hard parent rock (PRH = 3); ii. values set to predict the highest value of VIG, i.e. WHC > 278 mm and PRH = '1-crumbly'; and iii. intermediate values that correspond to the intersections of the two fuzzy sets for each input, i.e. WHC = 206 mm and PRH = '2'. The impact of GOP on the estimated VIG output is plotted for each of these three combinations. Relations between the inputs and the output are monotonic but not linear, as the intermediate curve shows. This is an important characteristic of fuzzy set modelling.

A similar study of the impact of PRH on VIG gives identical response curves, because GOP and PRH have the same weight in the rule base. This no longer holds for the response curve of VIG to WHC (Figure 7, top left, black curves). The more influence an input variable has on the output estimation, the larger the amplitude of the curve: the amplitude for WHC is 1, while that for GOP or PRH is 0.5.

This representation also makes it possible to check visually the relations between variables: e.g. VIG increases when WHC increases and decreases when GOP increases. Similar interpretations can be made about the impact of FCH, NAD and CSS on PED estimation; of ALT, EXP and LOP on MES estimation; and of VIG, PED, MES and MRD on PRE estimation. All of the corresponding plots are shown in Figure 7.

Figure 7: Individual impact of each input variable on output values (VIG: vine vigor, MES: mesoclimate, PED: pedoclimate, PRE: vine precocity, WHC: water holding capacity, GOP: gravel percentage on profile, PRH: parent-rock hardness, FCH: field capacity humidity, NAD: natural drainage, CSS: color of soil surface, ALT: altitude, EXP: exposure, LOP: landscape opening, MRD: maximum rooting depth); min: minimum, C1 and C2: fuzzy parameters. The three curves correspond to three combinations of values for the fixed inputs: values set to predict the lowest output value (lower curve), values set to predict the highest one (upper curve), and values that correspond to the intersections of the two membership functions.

Therefore the computer model response and its representation, as shown in Figure 7, allow the specialist to assess the individual effect of each input variable.

5. Comparison of output values with expert evaluations

We compared model outputs and winegrowers' evaluations on 19 plots with similar practices. Four variables were evaluated: vine vigor level, precocity level, frost risk and soil temperature, on a five-value scale: 1 'very low', 1.5 'low', 2 'medium', 2.5 'high' and 3 'very high'. They were respectively compared to the predictions given by the VIG, PRE, MES and PED model outputs. The 19 plots correspond to 15 Basic Terroir Units. The diversity of soil, parent-rock and landscape factors is well represented, except for the 'very bad drainage' modality (Table 7).

Scatter plots of the predicted versus observed values of the four outputs are shown in Figure 8. RMSE values vary between 0.4 and 0.6 (Table 8) while the output range is [1-3]. These RMSE values, which quantify the goodness-of-fit, should be interpreted bearing in mind the uncertainty of the expert evaluation, which we estimate at about ±0.5. VIG is the most accurately predicted of the four variables, with a determination coefficient equal to 0.53. Winegrowers have little trouble assessing vine vigor, as this variable is familiar to them, and they naturally use a wide range of variation, which may explain the relatively good prediction. In the absence of an absolute reference, experts used the whole range of assessments, between 1 and 3, but on a limited set of conditions that form a reduced subset of the possible ones. This explains why the model outputs are limited to the range [1.6-2.3]. As the range of variation of PRE is not entirely represented, it was not possible to validate the prediction of extreme values (close to 1 or 3). All this leads to a very low determination coefficient (0.05). Concerning the MES output, some plots (numbers 8, 9 and 10) are characterized by a medium to hot mesoclimate, yet winegrowers reported a frost risk that can be explained by local cold airstreams.

The low correlation between the composite variables and the winegrowers' estimations suggests that some influential factors may not have been taken into account, such as the previously mentioned impact of specific local climatic conditions or the fertilization practice. Winegrowers usually adjust the amount of fertilizer to avoid a deficiency, but in some cases it cannot be optimized with regard to soil nutrient content. In our study, we selected plots with homogeneous practices. All the plots were known to be pruned in the same month, but a more precise pruning period was not recorded in the database. Martin and Dunn (2000) observed that the budbreak of a vine pruned six weeks later than another one is delayed by 5.3 days, and that this delay persists at anthesis (5.0 days) and veraison (4.1 days). By comparison, the largest delay attributed to the pedoclimate by Morlat and Hardy is about 9 days. The difference in precocity due to the variation of the pruning date could be detected by winegrowers and might explain the poor correlation for the composite variable that estimates the precocity of vine development (PRE). In the future, the pruning date should be recorded more precisely.

We used the observed values as they were given by the experts, i.e. five possible values, as references against which to compare the predicted values. Future work should also consider the uncertainty of the reference expert values in order to define a more suitable index than the classical RMSE and R² values.

Table 7: Characteristics of the soil, parent-rock and landscape factors of the 19 winegrowers' plots used to evaluate model predictions. WHC: water holding capacity (mm), GOP: gravel percentage on profile (%), PRH: parent-rock hardness (score between 1 and 3), ALT: altitude (m), EXP: exposure (score between 1 and 3), LOP: landscape opening (score between 1 and 3), FCH: field capacity humidity (%), NAD: natural drainage (score between 1 and 6), CSS: color of soil surface (score between 1 and 3) and MRD: maximum rooting depth (score between 1 and 3).

Plot Basic Terroir Unit                                WHC  GOP  PRH  ALT  EXP  LOP  FCH  NAD  CSS  MRD
1    Stony old alluvium                                 84   42    2   38    2    2    9    2    2    3
2    Weakly weathered rock                             110   10    3   72    1    3   12    1    2    2
3a   Hydromorphic young alluvium                       146   50    1   33    2    1   16    4    2    3
3b   Hydromorphic young alluvium                       155    3    1   33    2    3   15    3    2    3
4    Sandy and silty old alluvium                      180   10    1   77    3    3   16    2    1    3
5a   Sandy old alluvium                                154    1    1   33    2    2   15    3    2    3
5b   Sandy old alluvium                                125   12    1   40    2    3   13    2    2    3
5c   Sandy old alluvium                                125   12    1   40    2    3   13    2    2    3
6    Calcareous colluvium                              160   15    2   60    3    2   22  2.5    2    3
7    Chalky and sandstony weakly weathered rock         91   14    2   56    2    3   19  1.5    2    1
8    Chalky and silty weakly weathered rock            144    2    2   52    3    2   35    1    2    2
9    Chalky and silty moderately weathered rock        201   10    1   39    2    3   26    2    3    3
10   Sandy moderately weathered rock                   237    9    2  103    2    3   17  2.5    1    3
11   Clayey old alluvium                               278    2    1   73    3    3   26  3.5    2    3
12a  Moderately weathered rock                         273    2    1   69    2    3   26    2    2    3
12b  Moderately weathered rock                         132   15    1   69    2    3   18    1    2    3
13   Sandy and clayey old alluvium                     269    3    1   74    2    3   31  3.5    2    3
14   Sandy and clayey strongly weathered rock          214    4    1   92    1    1   24  4.5    2    3
15   Clayey and greensandy moderately weathered rock   191   14    2   35    2    2   34    4    2    3

Figure 8: Comparison of the values predicted by the model (abscissa) with the values observed by winegrowers (ordinate) for vine vigor level (VIG), mesoclimate (MES), pedoclimate (PED) and precocity level (PRE), for 19 plots corresponding to 15 Basic Terroir Units.

Table 8: Root mean square error (RMSE) and coefficient of determination (R²) obtained from the model validation.

        VIG    MES    PED    PRE
RMSE    0.4    0.6    0.5    0.5
R²      0.53   0.16   0.24   0.05

6. Comparison of the former and the new methods on a new database

The transferability to new datasets and to end-users was tested by comparing the two methods: MORLAT2001 and the new CFS computer model.

The data used to compare the methods and evaluate the transferability of CFS come from a new dataset of 267 sampling points and 17 augers located in the Douro vineyard (Portugal). As previously mentioned, the reasoning and linguistic rules given in Tables 1, 2, 3 and 4 are generic, so the rule base can be reused as it is. The input variables are the same and the weights of all input variables are preserved as well. An adaptation was made regarding EXP, the 'exposure': the west exposure is considered as an intermediate exposure for Chinon and as a high exposure in the Douro area, according to local expertise. The fuzzy set parameters C1 and C2 of the four continuous variables WHC, GOP, FCH and ALT were adapted as well (Figure 9). The crisp boundaries of MORLAT2001 were determined using expert knowledge and were suited to local characteristics. The results show that crisp boundaries and fuzzy parameters are very different. In the vineyard of Anjou in the middle Loire Valley (France), where Morlat et al. (2001) developed their method, the crisp boundaries for WHC were 50 and 100 mm, while in the vineyards of Chinon and Douro they were 100 and 150 mm. Using k-means analysis, the fuzzy set parameters for WHC are higher (144 and 290 mm), because of high WHC values at some sampling points.

End-users highlighted the advantage of an automated data mining technique for determining specific parameters from data, but confirmed that these parameters must be validated by pedologists and agronomists.

Figure 9: Histograms showing the distributions of input variables and their fuzzy partitioning for the Douro (Portugal) dataset. Values between brackets correspond to (minimum, first parameter C1, second parameter C2, maximum). The class intervals are determined according to Sturges' method (Sturges, 1926). WHC: water holding capacity, GOP: gravel on profile, FCH: field capacity humidity, ALT: altitude. Dotted vertical lines correspond to the crisp boundaries of the former method.

The results of three estimation methods for precocity are plotted as maps in Figure 10. Map (a), on the left, is computed using MORLAT2001; in map (b), pedologists' and winegrowers' expertise from 'Cellule Terroirs Viticoles' was used to correct map (a); map (c) results from CFS.

The lowest estimated value of MORLAT2001 (20) corresponds to 1 in CFS, and the highest value of MORLAT2001 (60) to 3 in CFS. Four classes, linked to four linguistic terms, are considered: low, low to normal, normal to high and high.
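The correspondence between the two scales and the four classes can be sketched as below. The text only gives the end points (20 maps to 1, 60 maps to 3) and the class names; the linear mapping in between, the equal-width classes and the handling of class boundaries are assumptions of this sketch, and the function names are hypothetical.

```python
LABELS = ("low", "low to normal", "normal to high", "high")


def morlat_to_cfs(score, lo=20.0, hi=60.0):
    """Rescale a MORLAT2001 score from [lo, hi] to the CFS scale [1, 3] (assumed linear)."""
    return 1.0 + 2.0 * (score - lo) / (hi - lo)


def precocity_class(cfs_value):
    """Assign one of four equal-width classes on [1, 3] (assumed class limits)."""
    index = int((cfs_value - 1.0) / 0.5)
    index = max(0, min(index, len(LABELS) - 1))  # clamp the end points into the outer classes
    return LABELS[index]
```

With such a mapping, both methods can be drawn with the same legend, which is what allows the maps to be compared class by class.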

Figure 10: Precocity estimation for a vineyard located in the Douro area (Portugal). Map a: the former method 'MORLAT2001' (Morlat et al., 2001); map b: map built by 'Cellule Terroirs Viticoles' based on the MORLAT2001 method with added pedologist and winegrower expertise; map c: map built using the new 'CFS' method without any expert modification. Output values of MORLAT2001 vary between 20 and 60; the same amplitude intervals are used to compare the two methods. Vine plot limits are represented as black outlines. Situation 1: pediments with deep and silty soil; situation 2: altered schist with high clay content and an important deep water movement before budbreak. Situations 1 and 2 are low precocity areas undetected by the MORLAT2001 method.

At first glance, MORLAT2001 was not able to estimate low precocity. Let us discuss two different cases, labeled situation 1 and situation 2 on map (c), where a low precocity is predicted with CFS.

Situation 1 is characterized by pediments with deep and silty soil inducing a high vine vigor level (VIG about 2.5) and a cold pedoclimate (PED about 1.2), the mesoclimate (MES) being intermediate (2.3). The whole area was assigned a "low to normal" or "normal to high" precocity level by MORLAT2001. Pedologists found the new map more appropriate. According to pedologists, situation 2 should also be evaluated as low precocity, given the specific terroir unit of this area (altered schist with high clay content and an important deep runoff before budbreak), which leads to a cold mesoclimate and pedoclimate. MORLAT2001 did not correctly estimate the precocity for this kind of situation.

To summarize, the computer model parameters can easily be adapted to the local environment. Statistical analysis can help to propose local parameters, so that the effort of pedologists and researchers is reduced to the validation of the new values. It could also make sense to compare areas or regions using the same set of parameters. The method is fast to implement and proved satisfactory when used by end-users.

7. Discussion on model validation

The composite variables built in this study can be given an indicator status. Gras et al. (1989) defined an indicator as "a variable which supplies information on other variables which are difficult to access (...) and can be used as bench marker to take a decision". Mitchell et al. (1995) considered indicators as "alternative measures (...) that enable us to gain an understanding of a complex system (...) so that effective management decisions can be taken that lead towards initial objectives". Vine vigor (VIG) and precocity (PRE) of a plot may constitute useful indicators for selecting the variety, rootstock and cultural practices. Mesoclimate (MES) may help to adapt the pruning date; a cold plot can be pruned later to minimize frost risk.

Bockstaller and Girardin (2003) highlight that indicators are used to assess complex processes that often do not have quantitative equivalents. They mention that many indicators cannot be validated in the same way as simulation models, by comparison with measured data. They propose a three-stage approach: design validation, output validation and end-use validation. Cloquell-Ballester et al. (2006) consider a similar approach to validate environmental and social indicators: a self-validation stage carried out by the ‘working team’ itself to ensure the correct performance and guarantee the proper documentation of indicators; a scientific validation stage integrating independent experts’ judgments; and a social validation stage integrating stakeholders’ opinions.


We improved the existing method of Morlat et al. (2001) by using cascading fuzzy expert systems. Figure 6 and Table 8 show that output variables are now more precisely estimated. A partial validation of our indicators, which predict vine vigor and precocity levels according to permanent environmental factors, has been carried out using plots located in the same area and managed with similar practices. Following the recommendations of Bockstaller and Girardin (2003) and Cloquell-Ballester et al. (2006), we concentrated our efforts on an in-depth analysis of the system behavior.
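The cascading structure can be illustrated by a minimal Python sketch. It is not the model of this paper: the intermediate variable `thermal`, the 1 to 3 scale, the triangular partitions and both rule bases are hypothetical, chosen only to show how the output of one fuzzy inference system can feed the next. Inference is assumed to be zero-order Sugeno (weighted mean of crisp rule conclusions, with the minimum as conjunction).

```python
from itertools import product

# Hypothetical 1-3 scale shared by all variables (assumption, not the paper's parameters).
LEVEL = {"low": 1.0, "medium": 2.0, "high": 3.0}


def tri_partition(x, lo=1.0, mid=2.0, hi=3.0):
    """Three-term strong fuzzy partition: memberships always sum to 1."""
    x = min(max(x, lo), hi)  # clamp to the universe of discourse
    if x <= mid:
        m = (x - lo) / (mid - lo)
        return {"low": 1.0 - m, "medium": m, "high": 0.0}
    m = (x - mid) / (hi - mid)
    return {"low": 0.0, "medium": 1.0 - m, "high": m}


def sugeno(inputs, rules):
    """Zero-order Sugeno inference over a complete rule base."""
    memberships = [tri_partition(x) for x in inputs]
    num = den = 0.0
    for terms, conclusion in rules:
        # Rule weight: minimum of the premise membership degrees.
        weight = min(m[t] for m, t in zip(memberships, terms))
        num += weight * conclusion
        den += weight
    return num / den  # den > 0 because the rule base is complete


# Hypothetical rule bases, generated for every combination of terms.
# Stage 1: a thermal level that rises with both pedoclimate and mesoclimate.
THERMAL_RULES = [((p, m), (LEVEL[p] + LEVEL[m]) / 2)
                 for p, m in product(LEVEL, repeat=2)]
# Stage 2: precocity rises with the thermal level and falls with vigor.
PRE_RULES = [((t, v), (LEVEL[t] + 4.0 - LEVEL[v]) / 2)
             for t, v in product(LEVEL, repeat=2)]


def cascade(vig, ped, mes):
    """Stage 1 output (thermal) is reused as a stage 2 input."""
    thermal = sugeno((ped, mes), THERMAL_RULES)
    return sugeno((thermal, vig), PRE_RULES)


print(round(cascade(vig=2.5, ped=1.2, mes=2.3), 2))
```

Because each stage returns a continuous value, small changes in an input shift the final estimate gradually instead of switching it from one class to another.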

Parameters of the new system were validated on a real-world dataset. Our new method using fuzzy logic takes the imprecision of input variables into account, since a progressive transition between input classes is possible. Nevertheless, sharp parameters still need to be determined. The model behavior, studied according to the parameter values, is satisfactory and confirms the efficiency of the new design. The relations between variables are well respected and could be correlated with winegrowers' observations. However, for input variables that are directly observed by specialists, we suggest using a common frame of reference to limit the heterogeneity of observations from one expert to another within the studied area.

CONCLUSION

This paper proposes a new approach, using a computer model based on fuzzy inference systems, to estimate vine vigor (VIG), thermal pedoclimate (PED), thermal mesoclimate (MES) and precocity (PRE) imparted by soil, parent rock and landscape. Once VIG, PED, MES and PRE are evaluated, they constitute useful indicators to better adapt viticultural practices, taking climate characteristics into consideration.

Fuzzy logic makes each step of the model design transparent to control and analyze. It provides, in a single step, a continuous estimation of the four output variables. It avoids a second step of expert reclassification to correct output values distorted by the sharp transition between classes of the input variables. This is achieved because the model handles linguistic terms implemented as fuzzy sets, which manage the progressiveness of the class change. It also allows expert knowledge and data mining to be combined, making the most of both.
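The difference between a sharp and a progressive class transition can be sketched in a few lines of Python. The input scale, the threshold of 50 and the transition zone from 40 to 60 are hypothetical values chosen for illustration; they are not parameters of the model described here.

```python
def crisp_class(x, cut=50.0):
    """Crisp partition: the class flips abruptly at a single threshold."""
    return "low" if x < cut else "high"


def fuzzy_memberships(x, low_core=40.0, high_core=60.0):
    """Two-term strong fuzzy partition with a linear transition zone.

    Memberships sum to 1; between the two cores an input belongs
    partly to 'low' and partly to 'high'.
    """
    if x <= low_core:
        high = 0.0
    elif x >= high_core:
        high = 1.0
    else:
        high = (x - low_core) / (high_core - low_core)
    return {"low": 1.0 - high, "high": high}


# Two nearly identical inputs on either side of the crisp threshold.
for x in (49.0, 51.0):
    print(x, crisp_class(x), fuzzy_memberships(x))
```

With the crisp partition the two inputs fall into different classes, whereas their fuzzy membership degrees differ only slightly, so any rule conclusion weighted by these degrees also changes gradually.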

Input variables were determined from measurements or expert evaluations. The latter involve subjectivity. Even though fuzzy inference systems allow better management of uncertainty on input variables, this highlights the need to check (i) the homogeneity of experts’ evaluations and (ii) the homogeneity within the studied area before using them as input variables.

The method proposed in this paper is generic because the reasoning and linguistic rules are not specific to an area and accurately transcribe expert knowledge about the relations between raw variables and the estimation of vine vigor and precocity.

Using the new implementation, sharp parameters have been determined. These parameters are easily adapted to local characteristics by combining data mining and expert validation.

The method has been tested in different areas and validated by end-users. Comparing output values with measured data would be ideal, but data representing a wide range of situations were not available. A partial validation of the predicted composite variables has been carried out by comparing the model estimations with winegrowers' evaluations, and the model limits have been discussed.

Based on literature and expert knowledge and using fuzzy inference systems, these indicators provide a reliable prediction, supported both by expertise and local data, of the potential vine vigor and precocity of a plot according to soil, parent-rock and landscape characteristics.

Fuzzy inference systems allow uncertainty on data sources to be considered thanks to a progressive transition between input variable classes. However, comparing the system output with an expert or sensory assessment would require further methodological developments to deal with the output uncertainty; this is an attractive perspective for future work.

Another interesting perspective is to study the aggregation of these indicators with viticultural practices (e.g. vigor and precocity imparted by rootstock). This would help optimize the choice of long-term practices in relation to environmental factors and targeted vine development. It is likely to limit corrective practices, reducing production costs while promoting the production of targeted quality products and maximizing the potential value of a terroir.

Acknowledgments: The data used to build the method presented in this paper were collected by the ‘Cellule Terroirs Viticoles’, supported by ONIVINS-Viniflhor (now called FranceAgriMer), Pays de la Loire Region and Interloire (France). The first author received a fellowship from the ‘INRA-SAD’ department and Pays de la Loire Region (France). We are also grateful to Sogrape Vinhos S.A. for making data available for this study, and to G. Barbeau, C. Bockstaller, V. Courtin, R. Morlat and S. Sicard for valuable comments on this work.

REFERENCES

Acevedo-Opazo C., Tisseyre B., Ojeda H., Ortega-Farias S. and Guillaume S., 2008. Is it possible to assess the spatial variability of vine water status? Journal International des Sciences de la Vigne et du Vin, 42(4): 203-219.
Baize D., 2000. Guide des analyses en pédologie. Ed. INRA, Paris.
Barbeau G., Asselin C. and Morlat R., 1998. Estimate of the viticultural potential of the Loire valley "terroirs" according to a vine's cycle precocity index. Bulletin de l'OIV, 71(805/806): 247-262.
Bockstaller C. and Girardin P., 2003. How to validate environmental indicators? Agricultural Systems, 76(2): 639-653.
Bodin F. and Morlat R., 2003. Characterizing a vine terroir by combining a pedological field model and a survey of the vine growers in the Anjou Region (France). Journal International des Sciences de la Vigne et du Vin, 37(4).
Bodin F. and Morlat R., 2006. Characterization of viticultural terroirs using a simple field model based on soil depth I. Validation of the water supply regime, phenology and vine vigour, in the Anjou vineyard (France). Plant and Soil, 281(1/2): 37-54.
Bouchon-Meunier B. and Marsala C., 2003. Logique floue, principes, aide à la décision. Ed. Lavoisier, Paris.
Carbonneau A., Deloire A. and Jaillard B., 2007. La vigne. Physiologie, terroir, culture. Ed. Dunod, Paris.
Carey V., Archer E., Barbeau G. and Saayman D., 2007. The use of local knowledge relating to vineyard performance to identify viticultural terroirs in Stellenbosch and surrounds. Proceedings of the International Workshop on Advances in Grapevine and Wine Research. V. Nuzzo, P. Giorio and C. Giulivo. Leuven 1, International Society Horticultural Science: 385-391.
Celette F., Wery J., Chantelot E., Celette J. and Gary C., 2005. Belowground Interactions in a Vine (Vitis vinifera L.) - Fescue (Festuca arundinacea Shreb.) Intercropping System: Water Relations and Growth. Plant and Soil, 276(1): 205-217.
Cloquell-Ballester V.-A., Monterde-Díaz R. and Santamarina-Siurana M.-C., 2006. Indicators validation for the improvement of environmental and social impact quantitative assessment. Environmental Impact Assessment Review, 26(1): 79-105.
Dai Z. W., Vivin P., Barrieu F., Ollat N. and Delrot S., 2010. Physiological and modelling approaches to understand water and carbon fluxes during grape berry growth and quality development: a review. Australian Journal of Grape and Wine Research, 16: 70-85.
Drissi R., Goutouly J.-P., Forget D. and Gaudillere J.-P., 2009. Nondestructive Measurement of Grapevine Leaf Area by Ground Normalized Difference Vegetation Index. Agronomy Journal, 101(1): 226-231.
Falcao L. D., Burin V. M., Sidinei Chaves E., Vieira H. J., Brighenti E., Rosier J.-P. and Bordignon-Luiz M. T., 2010. Vineyard altitude and mesoclimate influences on the phenology and maturation of Cabernet Sauvignon grapes from Santa Catarina State. Journal International des Sciences de la Vigne et du Vin, 44(3): 135-150.
FisPro User Guide, FisPro: An open source portable software for fuzzy inference systems. http://www.inra.fr/mia/M/fispro/FisPro_EN_doc.html
Garcia de Cortazar Atauri I., Brisson N. and Gaudillere J. P., 2009. Performance of several models for predicting budburst date of grapevine (Vitis vinifera L.). International Journal of Biometeorology, 53(4): 317-326.
Goulet E., Morlat R., Rioux D. and Cesbron S., 2004. A calculation method of available soil water content: application to viticultural terroirs mapping of the Loire valley. Journal International des Sciences de la Vigne et du Vin, 38(4): 231-235.
Goulet E. and Morlat R., 2010. The use of surveys among wine growers in vineyards of the middle-Loire Valley (France), in relation to terroir studies. Land Use Policy, 28: 770-782.

Goutouly J.-P., Drissi R., Forget D. and Gaudillère J.-P., 2006. Caractérisation de la vigueur de la vigne par l'indice NDVI mesuré au sol. In: Congrès International des Terroirs Viticoles, Bordeaux-Montpellier.
Gras R., Benoit M., Deffontaines J.-P. D., M., Lafarge M., Langlet A. and Osty P.-L., 1989. Le Fait Technique en Agronomie. Ed. INRA, Paris.
Guillaume S. and Charnomordic B., 2010. Interpretable fuzzy inference systems for cooperation of expert knowledge and data in agricultural applications using FisPro. In: IEEE International Conference on Fuzzy Systems published in Knowledge Creation Diffusion Utilization, p. 18-23, Barcelona.
Guillaume S. and Charnomordic B., 2011. Learning interpretable fuzzy inference systems with FisPro. Information Sciences, 181(20): 4409-4427.
Guillaume S. and Charnomordic B., 2012. Fuzzy inference systems: An integrated modeling environment for collaboration between expert knowledge and data using FisPro. Expert Systems with Applications, 39(10): 8744-8755.
Guyot G., 1999. Climatologie de l'environnement. Ed. Dunod, Paris.
Homayouni S., Germain C., Lavialle O., Grenier G., Goutouly J. P., Van Leeuwen C. and Da Costa J. P., 2008. Abundance weighting for improved vegetation mapping in row crops: application to vineyard vigour monitoring. Canadian Journal of Remote Sensing, 34: S228-S239.
Institut Français de la Vigne et du Vin, INRA, Montpellier SupAgro and Viniflhor, 2007. Catalogue des variétés et clones de vigne cultivés en France. 2ème édition.
Jackson D. I. and Lombard P. B., 1993. Environmental and Management Practices Affecting Grape Composition and Wine Quality - A Review. American Journal of Enology and Viticulture, 44(4): 409-430.
Jacquet A. and Morlat R., 1997. Characterization of the climatic variability in the Loire Valley vineyard. Influence of landscape and physical characteristics of the environment. Agronomie, 17(9/10): 465-480.
Jamagne M., 1967. Bases et techniques d’une cartographie des sols. Annales d'Agronomie, 18(Hors-série).
Kliewer W. M., 1975. Effect of root temperature on budbreak, shoot growth, and fruit-set of Cabernet Sauvignon grapevines. American Journal of Enology and Viticulture, 26(2): 82-89.
Lebon E., 1993. De l'influence des facteurs pédo et mésoclimatiques sur le comportement de la vigne et les caractéristiques du raisin. Thèse Doctorat, Université de Dijon.
MacQueen J., 1967. Some methods for classification and analysis of multivariate observations. In: Berkeley Symposium on Mathematical Statistics and Probability, p. 281-297, Berkeley.
Martin S. R. and Dunn G. M., 2000. Effect of pruning time and hydrogen cyanamide on budburst and subsequent phenology of Vitis vinifera L. variety Cabernet Sauvignon in central Victoria. Australian Journal of Grape and Wine Research, 6(1): 31-39.
Mitchell G., May A. and Mc Donald A., 1995. PICABUE: a methodological framework for the development of indicators of sustainable development. International Journal of Sustainable Development and World Ecology, 2: 104-123.
Morlat R., 1993. The soil effects on the grapevine root system in several vineyards of the Loire Valley (France). Vitis, 32: 35-42.
Morlat R., 2010. Traité de viticulture de terroir. Ed. Lavoisier, Paris.
Morlat R., Guibault P., Rioux D., Asselin C. and Barbeau G., 2001. Terroirs Viticoles : étude et valorisation. Ed. Oenoplurimedia, Avenir Oenologie, Paris.
Morlat R. and Lebon E., 1992. Experience of multisite trials for the study of vineyards. Progrès Agricole et Viticole, 109(3): 55-58.
Morlat R. and Hardy P., 1987. Résultats concernant les variations de précocité de la vigne dans le Val de Loire. Importance du pédoclimat thermique. Actes du 3ème Symposium sur la physiologie de la vigne, Bordeaux.
Munsell Color Company, 2000. Munsell Soil Color Charts. New York.
Nguyen H. T. and Walker E. A., 1996. A first course in fuzzy logic. Ed. CRC-Press.
Ollat N., Tandonnet J. P., Bordenave L., Decroocq S., Geny L., Gaudillere J. P., Fouquet R., Barrieu F. and Hamdi S., 2003. Vigour conferred by rootstock: hypotheses and direction for research. Bulletin de l'OIV, 76(869/870): 581-595.
Parker A. K., De Cortázar-Atauri I. G., Van Leeuwen C. and Chuine I., 2011. General phenological model to characterise the timing of flowering and veraison of Vitis vinifera L. Australian Journal of Grape and Wine Research, 17(2): 206-216.
Piedallu C., Gegout J. C., Bruand A. and Seynave I., 2011. Mapping soil water holding capacity over large areas to predict potential production of forest stands. Geoderma, 160(3-4): 355-366.
Pouget R., 1963. Recherches physiologiques sur le repos végétatif de la vigne (Vitis vinifera L.): La dormance des bourgeons et le mécanisme de sa disparition. Thèse de doctorat, Bordeaux, 245 pp.
Smart D. R., Schwass E., Lakso A. and Morano L., 2006. Grapevine rooting patterns: A comprehensive analysis and a review. American Journal of Enology and Viticulture, 57(1): 89-104.
Sturges H. A., 1926. The choice of a class interval. Journal of the American Statistical Association, 21: 65-66.
Tagliavini M. and Marangoni B., 1992. Effects of high root-zone temperature on growth, water relations and mineral uptake of Trebiano grapevine. Proceedings of the 4th International Symposium on Grapevine Physiology, Turin.

Takagi T. and Sugeno M., 1985. Fuzzy Identification of Systems and Its Applications to Modeling and Control. IEEE Transactions on Systems, Man, and Cybernetics, 15(1): 116-132.
Tregoat O., Ollat N., Grenier G. and Van Leeuwen C., 2001. Comparative study of the accuracy and speed of various methods for estimating vine leaf area. Journal International des Sciences de la Vigne et du Vin, 35(1): 31-39.
Valente de Oliveira J.V., 1999. Semantic constraints for membership functions optimization. IEEE Transactions on Systems, Man and Cybernetics, Part A, 29: 128-138.
Vaudour E., 2003. Les terroirs viticoles. Définitions, caractérisation et protection. Ed. Dunod, Paris.
Woodham R. C. and Alexander D. M., 1966. The effect of root temperature on development of small fruiting sultana vines. Vitis, 5: 245-250.
Zadeh L. A., 1965. Fuzzy Sets. Information and Control, 8: 338-353.
Zelleke A. and Kliewer W. M., 1980. Effect of root temperature, rootstock and fertilization on bud-break, shoot growth and composition of Cabernet Sauvignon grapevines. Scientia Horticulturae, 13(4): 339-347.
Zufferey V. and Murisier F., 2006. Terroirs viticoles vaudois et alimentation hydrique de la vigne. Revue Suisse de Viticulture, Arboriculture et Horticulture, 38(5): 283-287.