A nonlinear mixed effects model of plant growth and estimation via stochastic variants of the EM algorithm

Charlotte Baey, Samis Trevezas, Paul-Henry Cournède
Laboratory of Mathematics Applied to Systems, École Centrale Paris

Context

• Strong genetic variability between plants of the same variety and locally varying environmental effects in a given field
  → Development of highly different neighboring plants
• This inter-individual variability (IIV) can have an impact at the agrosystem level
• However, current practices in plant growth models rarely take it into account:
  - calibration is often based on averaged individuals
  - extrapolation of individual-based models to the population scale still at its early stages
→ Nonlinear mixed models are a natural framework for this type of repeated and correlated data

Approximation of the E-step

Stochastic variants of the EM based on a sequence of MCMC simulations $\{\phi^{k,(m)}\}_{m=1,\dots,m_k}$ when the E-step is analytically intractable:

• MCMC-EM: $\hat t^{(k)} = \dfrac{1}{m_k} \sum_{m=1}^{m_k} t\big(y, \phi^{k,(m)}\big)$
• SAEM: $\hat t^{(k)} = \hat t^{(k-1)} + \gamma_k \Big[ \dfrac{1}{m_k} \sum_{m=1}^{m_k} t\big(y, \phi^{k,(m)}\big) - \hat t^{(k-1)} \Big]$
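The two approximations of the E-step (MCMC-EM and SAEM) differ only in how the simulated sufficient statistics are combined. The following is a minimal sketch, not the authors' implementation: it assumes the statistics $t(y, \phi^{k,(m)})$ have already been computed for each of the $m_k$ draws and are passed as lists of floats. The function names and the step-size schedule `step_size` (equal to 1 during a burn-in, then decreasing as 1/(k - burn_in)) are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def batch_mean(draw_stats: Sequence[Sequence[float]]) -> List[float]:
    """Average the sufficient statistics over the m_k simulated draws."""
    m_k = len(draw_stats)
    dim = len(draw_stats[0])
    return [sum(s[d] for s in draw_stats) / m_k for d in range(dim)]


def mcmc_em_update(draw_stats: Sequence[Sequence[float]]) -> List[float]:
    """MCMC-EM: the new statistic is the plain Monte Carlo average."""
    return batch_mean(draw_stats)


def saem_update(t_prev: Sequence[float],
                draw_stats: Sequence[Sequence[float]],
                gamma_k: float) -> List[float]:
    """SAEM: move the previous statistic towards the Monte Carlo average
    by a fraction gamma_k (stochastic approximation step)."""
    mean = batch_mean(draw_stats)
    return [tp + gamma_k * (m - tp) for tp, m in zip(t_prev, mean)]


def step_size(k: int, burn_in: int = 50) -> float:
    """Hypothetical schedule (an assumption, not taken from the poster):
    gamma_k = 1 during burn-in, then 1 / (k - burn_in)."""
    return 1.0 if k <= burn_in else 1.0 / (k - burn_in)


# Two draws of a two-dimensional statistic
draws = [[1.0, 2.0], [3.0, 4.0]]
print(mcmc_em_update(draws))               # [2.0, 3.0]
print(saem_update([0.0, 0.0], draws, 0.5)) # [1.0, 1.5]
```

With `gamma_k = 1` the SAEM update reduces to the MCMC-EM average, which is why SAEM can work with a small number of draws per iteration once the step size starts to decrease.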

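MCMC simulations

An MCMC sequence $\{\phi_i^{k,(m)}\}_{m=1,\dots,m_k}$ is generated for each plant $i$. The following MCMC procedures were compared, combining two algorithms with two types of proposal:

• Marginal proposal
  - Metropolis-Hastings (MH): $\mathcal N_P\big(\beta^{(k)}, \Gamma^{(k)}\big)$
  - hybrid Gibbs sampler (hGs), for $j = 1, \dots, P$: $\mathcal N\big(\beta_j^{(k)}, \sigma_j^{(k)}\big)$
• Adaptive random walk (RW) proposal
  - MH: $\mathcal N_P\big(\phi_i^{k,(m)}, \lambda_i^{(m)} \Sigma_i^{(m)}\big)$
  - hGs: $\mathcal N\big(\phi_{i,j}^{k,(m)}, \lambda_{i,j}^{(m)} [\Sigma_i^{(m)}]_{j,j}\big)$

For the adaptive RW, we compared the classical value $\lambda_i^{(m)} = 2.38^2/P$ with a global adaptive scheme for MH, and we used a componentwise adaptive scaling for hGs.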
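To illustrate the adaptive random walk proposal used with Metropolis-Hastings, here is a minimal sketch of a random-walk MH sampler whose global scale starts at the classical value 2.38²/P and is then tuned during the run. Several points are assumptions rather than details from the poster: the proposal covariance is fixed to the identity (the poster uses an adapted $\Sigma_i^{(m)}$), the scale is updated by a Robbins-Monro step on log λ towards a target acceptance rate of 0.234, and the names `adaptive_rw_mh` and `log_target` are hypothetical.

```python
import math
import random


def adaptive_rw_mh(log_target, phi0, n_iter, target_accept=0.234, seed=0):
    """Random-walk Metropolis-Hastings with global adaptive scaling.

    log_target: function returning the log-density (up to a constant).
    phi0: starting point, a list of P floats.
    Returns the chain, the final scale and the empirical acceptance rate.
    """
    rng = random.Random(seed)
    P = len(phi0)
    lam = 2.38 ** 2 / P            # classical starting value of the scale
    phi = list(phi0)
    logp = log_target(phi)
    chain, n_accept = [], 0
    for m in range(1, n_iter + 1):
        # Proposal N_P(phi, lam * I); identity covariance is an assumption
        prop = [x + math.sqrt(lam) * rng.gauss(0.0, 1.0) for x in phi]
        logp_prop = log_target(prop)
        log_ratio = logp_prop - logp
        alpha = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
        if rng.random() < alpha:
            phi, logp = prop, logp_prop
            n_accept += 1
        # Robbins-Monro step on log(lam): grow the scale when accepting
        # too often, shrink it when accepting too rarely
        lam = math.exp(math.log(lam) + m ** -0.6 * (alpha - target_accept))
        chain.append(list(phi))
    return chain, lam, n_accept / n_iter


# Usage on a standard normal target in P = 3 dimensions
chain, lam, rate = adaptive_rw_mh(
    lambda phi: -0.5 * sum(x * x for x in phi), [0.0] * 3, 2000)
print(len(chain), lam > 0.0, 0.0 < rate < 1.0)
```

A componentwise variant, as used with the hybrid Gibbs sampler, would keep one scale per coordinate and update each coordinate in turn with its own acceptance step.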

Results

Simulated data

s = 50 plants simulated, with P = 3 random parameters.

True values, estimates (Est.) and standard errors (SE). Column groups on the poster: Full, SAEM, MCMC-EM, Bootstrap.

Param   True    Est.     Est.     SE      Est.     SE      Est.     SE
β0      1.7     1.704    1.698    0.021   1.698    0.021   1.696    0.020
σ0      0.15    0.154    0.148    0.015   0.148    0.015   0.146    0.016
β1      -3      -2.906   -2.918   0.082   -2.916   0.082   -2.906   0.085
σ1      0.5     0.573    0.582    0.058   0.579    0.059   0.568    0.060
β2      1.45    1.491    1.500    0.016   1.499    0.018   1.495    0.017
σ2      0.15    0.123    0.115    0.011   0.116    0.014   0.114    0.013
σb²     0.15    0.202    0.203    0.006   0.203    0.006   0.228    0.006
σp²     0.15    0.083    0.084    0.002   0.084    0.003   0.038    0.003
ρ       0.67    0.534    0.529    0.014   0.529    0.014   0.653    0.041
σr²     0.15    0.139    0.123    0.025   0.127    0.035   0.128    0.039

• Best results with MH and global adaptive scaling
• SE from FIM consistent with SE from parametric bootstrap
• Higher variability between different runs from SAEM

[Figure: (a) β2 from MCMC-EM (left) and SAEM (right); (b) σ2 from MCMC-EM (left) and SAEM (right)]

Real data

s = 18 plants

Param   SAEM                  MCMC-EM
        Est.      SE          Est.      SE
β0      2.819     0.072       2.885     0.071
σ0      0.304     0.051       0.303     0.051
β1      -3.204    0.139       -3.251    0.139
σ1      0.588     0.098       0.584     0.103
β2      0.646     1.07e-04    0.620     0.004
σ2      0.0004    7.53e-05    0.015     0.002
σb²     1.208     0.071       1.214     0.083
σp²     1.373     0.081       1.378     0.094
ρ       0.976     0.002       0.976     0.002
σr²     3.01      1.155       3.036     1.161

[Figure: β2 and σ2 from SAEM (top) and MCMC-EM (bottom)]

Model

Population-based version of the Greenlab model, which can be seen as a two-stage hierarchical model:

First-stage: intra-individual variation

Let us denote by $y = (y_{i,n})_{1 \le i \le s,\ 1 \le n \le n_i}$ the log-values of the observed biomasses of organs of rank $n$ for plant $i$.

$$ y_{i,n} = \log G_n(\phi_i) + \varepsilon_{i,n}, \qquad \varepsilon_{i,n} \sim \begin{cases} \mathcal N_3(0, \Sigma_{b,p,r}) & \text{if } n = 0, \\ \mathcal N_2(0, \Sigma_{b,p}) & \text{if } n \ge 1, \end{cases} $$

$$ \Sigma_{b,p,r} = \begin{pmatrix} \Sigma_{b,p} & 0 \\ 0 & \sigma_r^2 \end{pmatrix}, \qquad \Sigma_{b,p} = \begin{pmatrix} \sigma_b^2 & \rho \sigma_b \sigma_p \\ \rho \sigma_b \sigma_p & \sigma_p^2 \end{pmatrix}, $$

with $\phi_i$ the vector of parameters specific to plant $i$, $G_n$ the vector-valued function from the Greenlab model, and where $b$, $p$, $r$ stand for blades, petioles and root respectively.

Second-stage: inter-individual variation

$$ \phi_i = \beta + \xi_i, \qquad \xi_i \sim \mathcal N_P(0, \Gamma), $$

with $\beta$ the vector of fixed effects and $\Gamma$ the covariance matrix.

Maximum likelihood estimation

The likelihood of the observed data $y$ is expressed according to the likelihood of the complete data $(y, \phi)$:

$$ L(y; \theta) := \int f(y, \phi; \theta)\, d\phi = \int f(y \mid \phi; \theta)\, f(\phi; \theta)\, d\phi. \tag{1} $$

The MLE can be obtained with stochastic variants of the Expectation-Maximization (EM) algorithm. Moreover, the complete data likelihood belongs to the exponential family:

$$ f(y, \phi; \theta) = h(y, \phi) \exp\{ \langle s(\theta), t(y, \phi) \rangle - a(\theta) \} \tag{2} $$

For this class of models, the EM algorithm can be simply expressed according to the sufficient statistics $t$:

• E-step: compute $t^{(k)} = E_{\theta_k}\big(t(y, \phi) \mid y\big)$ → can be difficult to compute
• M-step: update $\theta$ according to $\nabla a(\theta) = t^{(k)}$

Confidence intervals

Asymptotic CI can be obtained using Louis’s method. The observed FIM is given by:

$$ \begin{aligned} I(\theta, y) &= I_c(\theta, y) - I_m(\theta, y) \\ &= \mathrm{Cov}_\theta\big(t(y, \phi)\big) - \mathrm{Cov}_\theta\big(t(y, \phi) \mid y\big) \end{aligned} \tag{3} $$

• Analytic expression for $\mathrm{Cov}_\theta(t(y, \phi))$
• $\mathrm{Cov}_\theta(t(y, \phi) \mid y)$ involves the same kind of conditional expectation as the E-step

Conclusion

• Results from both algorithms are consistent, and SE from FIM are correctly approximated
• SAEM runs faster but has a higher variability between independent runs
• Different behaviours when the variance of a random effect decreases
→ compare automated MCMC-EM and automated SAEM
→ take into account the case of fixed-effects only (null variances)
→ more tests on real data