Kernels - Laval - 2011 - Freakonometrics

Arthur CHARPENTIER, transformed kernels and beta kernels

Beta kernels and transformed kernels: applications to copulas and quantiles

Arthur Charpentier, Université Rennes 1

http://freakonometrics.blog.free.fr/

Statistical seminar, Université Laval, Québec, April 2011


Agenda

• General introduction and kernel based estimation

Copula density estimation
• Kernel based estimation and bias
• Beta kernel estimation
• Transforming observations

Quantile estimation
• Parametric estimation
• Semiparametric estimation, extreme value theory
• Nonparametric estimation
• Transforming observations


Moving histogram to estimate a density

A natural way to estimate a density at x from an i.i.d. sample {X1, ..., Xn} is to count (and then normalize) how many observations are in a neighborhood of x, e.g. |x − Xi| ≤ h.


Kernel based estimation

Instead of a step function 1(|x − Xi| ≤ h), consider smoother functions,

$$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right),$$

where K(·) is a kernel, i.e. a non-negative real-valued integrable function such that $\int_{-\infty}^{+\infty} K(u)\,du = 1$, so that $\hat f_h(\cdot)$ is a density, and K(·) is symmetric, i.e. K(−u) = K(u).
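The estimator above can be sketched in a few lines of Python. This is only an illustration: the toy sample, the bandwidth h = 0.2 and the function names are assumptions, not part of the original slides. With the uniform kernel the estimator reduces to the moving histogram of the previous slide.

```python
import math

def uniform_kernel(u):
    """Rectangular kernel: 1/2 on [-1, 1], zero elsewhere."""
    return 0.5 if abs(u) <= 1.0 else 0.0

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Toy data and bandwidth (illustrative assumptions).
sample = [0.1, 0.4, 0.5, 0.9, 1.3]
moving_histogram = kde(0.5, sample, h=0.2, kernel=uniform_kernel)
smooth_estimate = kde(0.5, sample, h=0.2)
```

With the uniform kernel, only the observations 0.4 and 0.5 fall within h of x = 0.5, so the estimate is 2 · (1/2) / (5 · 0.2).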


Standard kernels are
• uniform (rectangular) $K(u) = \frac{1}{2}\,1_{\{|u| \le 1\}}$
• triangular $K(u) = (1 - |u|)\,1_{\{|u| \le 1\}}$
• Epanechnikov $K(u) = \frac{3}{4}(1 - u^2)\,1_{\{|u| \le 1\}}$
• Gaussian $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}u^2\right)$


A word about copulas

A 2-dimensional copula is a 2-dimensional cumulative distribution function restricted to [0, 1]² with standard uniform margins.

[Figure: copula (cumulative distribution function) and copula density, each with its level curves.]

If C is twice differentiable, let c denote the density of the copula.


Sklar theorem: Let C be a copula, and $F_X$ and $F_Y$ two marginal distributions, then $F(x, y) = C(F_X(x), F_Y(y))$ is a bivariate distribution function, with $F \in \mathcal{F}(F_X, F_Y)$. Conversely, if $F \in \mathcal{F}(F_X, F_Y)$, there exists C such that $F(x, y) = C(F_X(x), F_Y(y))$. Further, if $F_X$ and $F_Y$ are continuous, then C is unique, and given by

$$C(u, v) = F(F_X^{-1}(u), F_Y^{-1}(v)) \quad \text{for all } (u, v) \in [0, 1] \times [0, 1].$$

We will then define the copula of F, or the copula of (X, Y).


Motivation

Example Loss-ALAE: consider the following dataset, where the Xi's are loss amounts (paid to the insured) and the Yi's are allocated expenses. Denote by Ri and Si the respective ranks of Xi and Yi. Set $U_i = R_i/n = \hat F_X(X_i)$ and $V_i = S_i/n = \hat F_Y(Y_i)$. Figure 1 shows the log-log scatterplot (log Xi, log Yi), and the associated copula based scatterplot (Ui, Vi). Figure 2 is simply a histogram of the (Ui, Vi), which is a nonparametric estimation of the copula density. Note that the histogram suggests strong dependence in upper tails (the interesting part in an insurance/reinsurance context).
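The rank transformation Ui = Ri/n, Vi = Si/n can be sketched as follows. The four loss and expense amounts are hypothetical values chosen for illustration (the Loss-ALAE dataset itself is not reproduced here), and the helper assumes there are no ties.

```python
def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def pseudo_observations(xs, ys):
    """Pairs (R_i / n, S_i / n) built from the ranks of xs and ys."""
    n = len(xs)
    return [(r / n, s / n) for r, s in zip(ranks(xs), ranks(ys))]

# Hypothetical loss and allocated-expense amounts.
losses = [10.0, 250.0, 40.0, 1200.0]
alae = [3.0, 20.0, 1.0, 90.0]
uv = pseudo_observations(losses, alae)
```

Note that with this convention the largest observation is mapped to 1, i.e. onto the border of the unit square.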

Figure 1 – Loss-ALAE, scatterplots (log-log and copula type). Left panel: log-log scatterplot, log(LOSS) against log(ALAE). Right panel: copula type scatterplot, probability level LOSS against probability level ALAE.

Figure 2 – Loss-ALAE, histogram of copula type transformation.

Why nonparametrics, instead of parametrics?

In parametric estimation, assume that the copula density $c_\theta$ belongs to some given family $\mathcal{C} = \{c_\theta, \theta \in \Theta\}$. The tail behavior will crucially depend on the tail behavior of the copulas in $\mathcal{C}$.

Example: the table below shows the probability that both X and Y exceed high thresholds ($X > F_X^{-1}(p)$ and $Y > F_Y^{-1}(p)$), for usual copula families, where parameter θ is obtained using maximum likelihood techniques.

p        Clayton     Frank       Gaussian    Gumbel      Clayton*    max/min
90%      1.93500%    2.73715%    4.73767%    4.82614%    5.66878%    2.93
95%      0.51020%    0.78464%    1.99195%    2.30085%    2.78677%    5.46
99%      0.02134%    0.03566%    0.27337%    0.44246%    0.55102%    25.82
99.9%    0.00021%    0.00037%    0.01653%    0.04385%    0.05499%    261.85

Probability of exceedances, for given parametric copulas, τ = 0.5.


Figure 3 shows the graphical evolution of $p \mapsto P\left(X > F_X^{-1}(p), Y > F_Y^{-1}(p)\right)$. If the original model is a multivariate Student vector (X, Y), the associated probability is the upper line. If either the marginal distributions are misfitted (e.g. Gaussian assumption), or the dependence structure is misspecified (e.g. Gaussian assumption), probabilities are always underestimated.

Figure 3 – $p \mapsto P\left(X > F_X^{-1}(p), Y > F_Y^{-1}(p)\right)$ when (X, Y) is a Student t random vector, and when either margins or the dependence structure is misspecified. Panels: joint probability of exceeding high quantiles, and ratios of exceeding probability, against quantile levels, for Student or Gaussian dependence structure combined with Student or Gaussian margins.

Kernel estimation for bounded support density

Consider a kernel based estimation of density f,

$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$

where K is a kernel function, given an n-sample X1, ..., Xn of a positive random variable (Xi ∈ [0, ∞[). Let K denote a symmetric kernel, then

$$E(\hat f(0)) = \frac{1}{2} f(0) + O(h).$$
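The boundary bias can be checked numerically. The sketch below is an illustration under assumptions not in the slides: a Gaussian kernel, a standard exponential density f(y) = exp(−y), and a midpoint rule for the integral $E(\hat f(0)) = \int_0^\infty \frac{1}{h} K\left(\frac{0 - y}{h}\right) f(y)\,dy$.

```python
import math
from statistics import NormalDist

def expected_kde_at_zero(h, density, steps=20000, width=10.0):
    """Midpoint-rule approximation of E[f_hat(0)] for a Gaussian kernel.

    Integrates (1/h) K(-y/h) f(y) over [0, width * h]; beyond that range
    the Gaussian kernel is numerically negligible.
    """
    phi = NormalDist().pdf
    dy = width * h / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * dy
        total += phi(-y / h) / h * density(y) * dy
    return total

def exp_density(y):
    """Standard exponential density, so f(0) = 1."""
    return math.exp(-y)

value = expected_kde_at_zero(0.1, exp_density)
```

For this particular pair of kernel and density, completing the square gives the closed form exp(h²/2)(1 − Φ(h)), which is slightly below f(0)/2 = 0.5 for h = 0.1 and tends to 0.5 as h goes to 0.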

Figure 4 – Density estimation of an exponential density, 100 points.

Figure 5 – Density estimation of an exponential density, 10,000 points.

Figure 6 – Kernel based estimation of the uniform density on [0, 1].

How to get a proper estimation on the border

Several techniques have been introduced to get a better estimation on the border,
– boundary kernel (Müller (1991))
– mirror image modification (Deheuvels & Hominal (1989), Schuster (1985))
– transformed kernel (Devroye & Györfi (1981), Wand, Marron & Ruppert (1991))

In the particular case of densities on [0, 1],
– Beta kernel (Brown & Chen (1999), Chen (1999, 2000)),
– average of histograms inspired by Dersko (1998).


Local kernel density estimators

The idea is that the bandwidth h(x) can be different for each point x at which f(x) is estimated. Hence,

$$\hat f(x, h(x)) = \frac{1}{n\,h(x)}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h(x)}\right),$$

(see e.g. Loftsgaarden & Quesenberry (1965)).

Variable kernel density estimators

The idea is that the bandwidth h can be replaced by n values α(Xi). Hence,

$$\hat f(x, \alpha) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\alpha(X_i)} K\left(\frac{x - X_i}{\alpha(X_i)}\right),$$

(see e.g. Abramson (1982)).


The transformed Kernel estimate

The idea was developed in Devroye & Györfi (1981) for univariate densities. Consider a transformation T : R → [0, 1] strictly increasing, continuously differentiable, one-to-one, with a continuously differentiable inverse. Set Y = T(X). Then Y has density

$$f_Y(y) = f_X(T^{-1}(y)) \cdot (T^{-1})'(y).$$

If $f_Y$ is estimated by $\hat f_Y$, then $f_X$ is estimated by

$$\hat f_X(x) = \hat f_Y(T(x)) \cdot T'(x).$$
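A minimal sketch of the back-transformation formula $\hat f_X(x) = \hat f_Y(T(x)) \cdot T'(x)$. As an assumption made for this illustration, the data live on (0, 1) and are sent to the real line with T = Φ⁻¹ (the direction used in Figure 7), where a standard Gaussian kernel estimate has no boundary problem; the sample, bandwidth and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_kde(y, sample, h):
    """Standard Gaussian kernel density estimate on the real line."""
    n = len(sample)
    return sum(STD_NORMAL.pdf((y - yi) / h) for yi in sample) / (n * h)

def transformed_kde(u, sample, h):
    """Density estimate on (0, 1): f_hat_Y(T(u)) * T'(u), with T = Phi^{-1}."""
    transformed = [STD_NORMAL.inv_cdf(ui) for ui in sample]
    t = STD_NORMAL.inv_cdf(u)
    t_prime = 1.0 / STD_NORMAL.pdf(t)  # derivative of Phi^{-1} at u
    return gaussian_kde(t, transformed, h) * t_prime

# Deterministic, evenly spread "uniform" sample (illustrative assumption).
n = 200
uniform_grid = [(i + 0.5) / n for i in range(n)]
estimate = transformed_kde(0.5, uniform_grid, h=0.3)
```

For a uniform sample the estimate at u = 0.5 should be close to 1; the smoothing inflates the variance of the transformed sample from 1 to about 1 + h², so the value lands slightly below 1.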

Figure 7 – The transformed kernel principle (with a Φ⁻¹-transformation).

The Beta Kernel estimate

The Beta-kernel based estimator of a density with support [0, 1] at point x is obtained using beta kernels, which yields

$$\hat f(x) = \frac{1}{n}\sum_{i=1}^{n} K\left(X_i, \frac{x}{b} + 1, \frac{1 - x}{b} + 1\right),$$

where K(·, α, β) denotes the density of the Beta distribution with parameters α and β,

$$K(x, \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}\, 1_{\{x \in [0, 1]\}}.$$
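The Beta kernel estimator can be sketched directly from the two formulas above. The evenly spaced sample and the bandwidth b = 0.1 are assumptions chosen so that the result is easy to check: for uniform data the estimate should be close to 1 everywhere, including at the borders 0 and 1.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of the Beta(alpha, beta) distribution, zero outside [0, 1]."""
    if x < 0.0 or x > 1.0:
        return 0.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm) * x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0)

def beta_kernel_density(x, sample, bw):
    """(1/n) * sum_i K(X_i, x/bw + 1, (1 - x)/bw + 1)."""
    n = len(sample)
    return sum(
        beta_pdf(xi, x / bw + 1.0, (1.0 - x) / bw + 1.0) for xi in sample
    ) / n

# Deterministic, evenly spread "uniform" sample (illustrative assumption).
n = 1000
uniform_grid = [(i + 0.5) / n for i in range(n)]
at_border = beta_kernel_density(0.0, uniform_grid, bw=0.1)
at_center = beta_kernel_density(0.5, uniform_grid, bw=0.1)
```

The shape of the kernel changes with the evaluation point x, and its support never leaves [0, 1], which is why no mass is lost at the border.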

Figure 8 – The beta-kernel estimate. Panels: Gaussian kernel approach and Beta kernel approach (density estimation using beta kernels).

Copula density estimation: the boundary problem

Let (U1, V1), ..., (Un, Vn) denote a sample with support [0, 1]², and with density c(u, v), which is assumed to be twice continuously differentiable on (0, 1)². If K denotes a symmetric kernel, with support [−1, +1], then for all (u, v) ∈ [0, 1] × [0, 1],

in any corner (e.g. (0, 0)),

$$E(\hat c(0, 0, h)) = \frac{1}{4} \cdot c(u, v) - \frac{1}{2}\left[c_1(0, 0) + c_2(0, 0)\right] \int_0^1 \omega K(\omega)\,d\omega \cdot h + o(h),$$

on the interior of the borders (e.g. u = 0 and v ∈ (0, 1)),

$$E(\hat c(0, v, h)) = \frac{1}{2} \cdot c(u, v) - \left[c_1(0, v)\right] \int_0^1 \omega K(\omega)\,d\omega \cdot h + o(h),$$

and in the interior ((u, v) ∈ (0, 1) × (0, 1)),

$$E(\hat c(u, v, h)) = c(u, v) + \frac{1}{2}\left[c_{1,1}(u, v) + c_{2,2}(u, v)\right] \int_{-1}^{1} \omega^2 K(\omega)\,d\omega \cdot h^2 + o(h^2).$$

On borders, there is a multiplicative bias and the order of the bias is O(h) (while it is O(h²) in the interior).


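If K denotes a symmetric kernel, with support [−1, +1], then for all (u, v) ∈ [0, 1] × [0, 1],

in any corner (e.g. (0, 0)),

$$Var(\hat c(0, 0, h)) = c(0, 0) \left(\int_0^1 K(\omega)^2\,d\omega\right)^2 \cdot \frac{1}{nh^2} + o\left(\frac{1}{nh^2}\right),$$

on the interior of the borders (e.g. u = 0 and v ∈ (0, 1)),

$$Var(\hat c(0, v, h)) = c(0, v) \left(\int_{-1}^{1} K(\omega)^2\,d\omega\right) \left(\int_0^1 K(\omega)^2\,d\omega\right) \cdot \frac{1}{nh^2} + o\left(\frac{1}{nh^2}\right),$$

and in the interior ((u, v) ∈ (0, 1) × (0, 1)),

$$Var(\hat c(u, v, h)) = c(u, v) \left(\int_{-1}^{1} K(\omega)^2\,d\omega\right)^2 \cdot \frac{1}{nh^2} + o\left(\frac{1}{nh^2}\right).$$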

Figure 9 – Theoretical density of Frank copula.

Figure 10 – Estimated density of Frank copula, using standard Gaussian (independent) kernels, h = h∗.

Transformed kernel technique

Consider the kernel estimator of the density of the $(X_i, Y_i) = (G^{-1}(U_i), G^{-1}(V_i))$'s, where G is a strictly increasing distribution function, with a differentiable density. Since density f is continuous, twice differentiable, and bounded above, for all (x, y) ∈ R²,

$$\hat f(x, y) = \frac{1}{nh^2}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) K\left(\frac{y - Y_i}{h}\right)$$

satisfies $E(\hat f(x, y)) = f(x, y) + O(h^2)$, as long as $\int \omega K(\omega)\,d\omega = 0$. And the variance is

$$Var(\hat f(x, y)) = \frac{f(x, y)}{nh^2} \left(\int K(\omega)^2\,d\omega\right)^2 + o\left(\frac{1}{nh^2}\right),$$

and asymptotic normality can be obtained,

$$\sqrt{nh^2}\left(\hat f(x, y) - E(\hat f(x, y))\right) \xrightarrow{\mathcal{L}} \mathcal{N}\left(0, f(x, y) \left(\int K(\omega)^2\,d\omega\right)^2\right).$$


Since

$$f(x, y) = g(x)\,g(y)\,c[G(x), G(y)] \qquad (1)$$

can be inverted in

$$c(u, v) = \frac{f(G^{-1}(u), G^{-1}(v))}{g(G^{-1}(u))\,g(G^{-1}(v))}, \quad (u, v) \in [0, 1] \times [0, 1], \qquad (2)$$

one gets, substituting $\hat f$ in (2),

$$\hat c(u, v) = \frac{1}{nh \cdot g(G^{-1}(u)) \cdot g(G^{-1}(v))} \sum_{i=1}^{n} K\left(\frac{G^{-1}(u) - G^{-1}(U_i)}{h}, \frac{G^{-1}(v) - G^{-1}(V_i)}{h}\right). \qquad (3)$$

Therefore,

$$E(\hat c(u, v, h)) = c(u, v) + \frac{o(h)}{g(G^{-1}(u))\,g(G^{-1}(v))}.$$


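Similarly,

$$Var(\hat c(u, v, h)) = \frac{1}{g(G^{-1}(u))\,g(G^{-1}(v))} \left[\frac{c(u, v)}{nh^2} \left(\int K(\omega)^2\,d\omega\right)^2\right] + \frac{1}{g(G^{-1}(u))^2\,g(G^{-1}(v))^2}\, o\left(\frac{1}{nh^2}\right).$$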

Figure 11 – Estimated density of Frank copula, using a Gaussian kernel, after a Gaussian normalization.

Figure 12 – Estimated density of Frank copula, using a Gaussian kernel, after a Student normalization, with 5 degrees of freedom.

Figure 13 – Estimated density of Frank copula, using a Gaussian kernel, after a Student normalization, with 3 degrees of freedom.

Bivariate Beta kernels

The Beta-kernel based estimator of the copula density at point (u, v) is obtained using product beta kernels, which yields

$$\hat c(u, v) = \frac{1}{n}\sum_{i=1}^{n} K\left(X_i, \frac{u}{b} + 1, \frac{1 - u}{b} + 1\right) \cdot K\left(Y_i, \frac{v}{b} + 1, \frac{1 - v}{b} + 1\right),$$

where K(·, α, β) denotes the density of the Beta distribution with parameters α and β.
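The product Beta kernel estimator can be sketched as below. The sample is a hypothetical regular grid on [0, 1]², which mimics observations from the independence copula, so the estimate should be close to 1 at every point, corners included; the bandwidth b = 0.1 and the function names are also assumptions.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of the Beta(alpha, beta) distribution, zero outside [0, 1]."""
    if x < 0.0 or x > 1.0:
        return 0.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm) * x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0)

def beta_copula_density(u, v, pairs, bw):
    """Product Beta kernel estimate of the copula density at (u, v)."""
    total = 0.0
    for xi, yi in pairs:
        total += (
            beta_pdf(xi, u / bw + 1.0, (1.0 - u) / bw + 1.0)
            * beta_pdf(yi, v / bw + 1.0, (1.0 - v) / bw + 1.0)
        )
    return total / len(pairs)

# Regular grid standing in for a sample from the independence copula.
m = 50
grid = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
corner = beta_copula_density(0.0, 0.0, grid, bw=0.1)
center = beta_copula_density(0.5, 0.5, grid, bw=0.1)
```

In practice the pairs would be the pseudo-observations built from ranks rather than a grid.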

Figure 14 – Shape of bivariate Beta kernels K(·, x/b + 1, (1 − x)/b + 1) × K(·, y/b + 1, (1 − y)/b + 1) for b = 0.2, at the points (x, y) with x, y ∈ {0.0, 0.2, 0.5}.

Assume that the copula density c is twice differentiable on [0, 1] × [0, 1]. Let (u, v) ∈ [0, 1] × [0, 1]. The bias of $\hat c(u, v)$ is of order b, i.e.

$$E(\hat c(u, v)) = c(u, v) + Q(u, v) \cdot b + o(b),$$

where the bias Q(u, v) is

$$Q(u, v) = (1 - 2u)\,c_1(u, v) + (1 - 2v)\,c_2(u, v) + \frac{1}{2}\left[u(1 - u)\,c_{1,1}(u, v) + v(1 - v)\,c_{2,2}(u, v)\right].$$

The bias here is O(b) (everywhere) while it is O(h²) using standard kernels.

Under the same assumptions, the variance of $\hat c(u, v)$ is

in corners, e.g. (0, 0),

$$Var(\hat c(0, 0)) = \frac{1}{nb^2}\left[c(0, 0) + o(n^{-1})\right],$$

in the interior of borders, e.g. u = 0 and v ∈ (0, 1),

$$Var(\hat c(0, v)) = \frac{1}{2nb^{3/2}\sqrt{\pi v(1 - v)}}\left[c(0, v) + o(n^{-1})\right],$$

and in the interior, (u, v) ∈ (0, 1) × (0, 1),

$$Var(\hat c(u, v)) = \frac{1}{4nb\pi\sqrt{v(1 - v)u(1 - u)}}\left[c(u, v) + o(n^{-1})\right].$$

Remark: From those properties, an (asymptotically) optimal bandwidth b can be deduced, minimizing the asymptotic mean squared error,

$$b^* \equiv \left(\frac{1}{16\pi n\,Q(u, v)^2} \cdot \frac{1}{\sqrt{v(1 - v)u(1 - u)}}\right)^{1/3}.$$

Note (see Charpentier, Fermanian & Scaillet (2005)) that all those results can be obtained in dimension d ≥ 2.

Example: For n = 100 simulated data, from Frank copula, the optimal bandwidth is b ∼ 0.05.

Figure 15 – Estimated density of Frank copula, Beta kernels, b = 0.1.

Figure 16 – Estimated density of Frank copula, Beta kernels, b = 0.05.

Figure 17 – Density estimation on the diagonal, standard kernel (standard Gaussian kernel estimator, n = 100, n = 1000 and n = 10000).

Figure 18 – Density estimation on the diagonal, transformed kernel (Gaussian transformed kernel estimator, n = 100, n = 1000 and n = 10000).

Figure 19 – Density estimation on the diagonal, Beta kernel (b = 0.05 with n = 100, b = 0.02 with n = 1000, b = 0.005 with n = 10000).

Copula density estimation

Gijbels & Mielniczuk (1990): given an i.i.d. sample, a natural estimate for the normed density is obtained using the transformed sample $(\hat F_X(X_1), \hat F_Y(Y_1)), ..., (\hat F_X(X_n), \hat F_Y(Y_n))$, where $\hat F_X$ and $\hat F_Y$ are the empirical distribution functions of the marginal distributions. The copula density can be constructed as some density estimate based on this sample (Behnen, Husková & Neuhaus (1985) investigated the kernel method). The natural kernel type estimator $\hat c$ of c is

$$\hat c(u, v) = \frac{1}{nh^2}\sum_{i=1}^{n} K\left(\frac{u - \hat F_X(X_i)}{h}, \frac{v - \hat F_Y(Y_i)}{h}\right), \quad (u, v) \in [0, 1]^2.$$

“this estimator is not consistent in the points on the boundary of the unit square.”


Copula density estimation and pseudo-observations

Example: in linear regression, residuals are pseudo-observations.

$$\varepsilon_i = H(X_i, Y_i) = Y_i - \alpha - \beta X_i$$

$$\hat\varepsilon_i = \hat H_n(X_i, Y_i) = Y_i - \hat\alpha_n - \hat\beta_n X_i$$

Example: when dealing with copulas, ranks Ui, Vi yield pseudo-observations.

$$(U_i, V_i) = H(X_i, Y_i) = (F_X(X_i), F_Y(Y_i))$$

$$(\hat U_i, \hat V_i) = \hat H_n(X_i, Y_i) = (\hat F_X(X_i), \hat F_Y(Y_i))$$

(see Genest & Rivest (1993)). More formally, let $X_1, ..., X_n$ denote a series of observations of X (∈ $\mathcal{X}$), stationary and ergodic. Let $H : \mathcal{X} \to R^d$ and set $\varepsilon_i = H(X_i)$ (non-observable). If H is estimated by $\hat H_n$ then $\hat\varepsilon_i = \hat H_n(X_i)$ are called pseudo-observations.

Let $\hat K_n$ denote the empirical distribution function of those pseudo-observations,

$$\hat K_n(t) = \frac{1}{n}\sum_{i=1}^{n} I(\hat\varepsilon_i \le t) \quad \text{where } t \in R^d.$$

Further, if K denotes the distribution function of ε = H(X), then define the empirical process based on pseudo-observations,

$$\mathbb{K}_n(t) = \sqrt{n}\left(\hat K_n(t) - K(t)\right).$$

As proved in Ghoudi & Rémillard (1998, 2004), this empirical process converges weakly. Figure ?? shows scatterplots when margins are known (i.e. the $(F_X(X_i), F_Y(Y_i))$'s), and when margins are estimated (i.e. the $(\hat F_X(X_i), \hat F_Y(Y_i))$'s). Note that the pseudo sample is more “uniform”, in the sense of a lower discrepancy (as in Quasi Monte Carlo techniques, see e.g. Niederreiter (1992)).

Figure 20 – Observations and pseudo-observations, 500 simulated observations from Frank copula (Xi, Yi) and the associated pseudo-sample $(\hat F_X(X_i), \hat F_Y(Y_i))$. Left panel: scatterplot of observations (Xi, Yi). Right panel: scatterplot of pseudo-observations (Ui, Vi) (ranks).

Because samples are more “uniform” using ranks and pseudo-observations, the variance of the estimator of the density, at some given point (u, v) ∈ (0, 1) × (0, 1), is usually smaller. For instance, Figure 21 shows the impact of considering pseudo-observations, i.e. substituting $\hat F_X$ and $\hat F_Y$ for the unknown marginal distributions $F_X$ and $F_Y$. The dotted line shows the density of $\hat c(u, v)$ from an n = 100 sample (Ui, Vi) (from Frank copula), and the solid line shows the density of $\hat c(u, v)$ from the sample $(\hat F_U(U_i), \hat F_V(V_i))$ (i.e. ranks of observations).

Figure 21 – The impact of estimating from pseudo-observations (n = 100): density of the estimator against the estimation of the density c(u, v).

Roots of ‘transformed kernel’


Using a parametric approach

If $F_X \in \mathcal{F} = \{F_\theta, \theta \in \Theta\}$ (assumed to be continuous), $q_X(\alpha) = F_\theta^{-1}(\alpha)$, and thus, a natural estimator is

$$\hat q_X(\alpha) = F_{\hat\theta}^{-1}(\alpha), \qquad (4)$$

where $\hat\theta$ is an estimator of θ (maximum likelihood, moments estimator...).


Using the Gaussian distribution

A natural idea (that can be found in classical financial models) is to assume Gaussian distributions: if X ∼ N(µ, σ), then the α-quantile is simply

$$q(\alpha) = \mu + \Phi^{-1}(\alpha)\,\sigma,$$

where $\Phi^{-1}(\alpha)$ is obtained in statistical tables (or any statistical software), e.g. $\Phi^{-1}(\alpha) \approx 1.28$ if α = 90%, or $\Phi^{-1}(\alpha) \approx 1.64$ if α = 95%.

Definition 1 Given an n-sample {X1, ..., Xn}, the (Gaussian) parametric estimation of the α-quantile is

$$\hat q_n(\alpha) = \hat\mu + \Phi^{-1}(\alpha)\,\hat\sigma.$$
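Definition 1 translates almost literally into code. In this sketch the sample is a toy example, and the choice of the n − 1 denominator for the standard deviation is an assumption (it matches the estimator used in the Cornish-Fisher definition of these slides).

```python
from statistics import NormalDist, mean, stdev

def gaussian_quantile(sample, alpha):
    """Gaussian parametric quantile: mu_hat + Phi^{-1}(alpha) * sigma_hat."""
    mu_hat = mean(sample)
    sigma_hat = stdev(sample)  # sample standard deviation, n - 1 denominator
    return mu_hat + NormalDist().inv_cdf(alpha) * sigma_hat

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data (assumption)
z95 = NormalDist().inv_cdf(0.95)
q95 = gaussian_quantile(sample, 0.95)
```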


Using a parametric model

Actually, if the Gaussian model does not fit very well, it is still possible to use a Gaussian approximation. If the variance is finite, (X − E(X))/σ might be closer to the Gaussian distribution, and thus, consider the so-called Cornish-Fisher approximation, i.e.

$$Q(X, \alpha) \sim E(X) + z_\alpha \sqrt{V(X)}, \qquad (5)$$

where

$$z_\alpha = \Phi^{-1}(\alpha) + \frac{\zeta_1}{6}\left[\Phi^{-1}(\alpha)^2 - 1\right] + \frac{\zeta_2}{24}\left[\Phi^{-1}(\alpha)^3 - 3\Phi^{-1}(\alpha)\right] - \frac{\zeta_1^2}{36}\left[2\Phi^{-1}(\alpha)^3 - 5\Phi^{-1}(\alpha)\right],$$

where ζ1 is the skewness of X, and ζ2 is the excess kurtosis, i.e.

$$\zeta_1 = \frac{E([X - E(X)]^3)}{E([X - E(X)]^2)^{3/2}} \quad \text{and} \quad \zeta_2 = \frac{E([X - E(X)]^4)}{E([X - E(X)]^2)^2} - 3. \qquad (6)$$


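Using a parametric model

Definition 2 Given an n-sample {X1, ..., Xn}, the Cornish-Fisher estimation of the α-quantile is

$$\hat q_n(\alpha) = \hat\mu + \hat z_\alpha \hat\sigma, \quad \text{where } \hat\mu = \frac{1}{n}\sum_{i=1}^{n} X_i \text{ and } \hat\sigma = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \hat\mu)^2},$$

and

$$\hat z_\alpha = \Phi^{-1}(\alpha) + \frac{\hat\zeta_1}{6}\left[\Phi^{-1}(\alpha)^2 - 1\right] + \frac{\hat\zeta_2}{24}\left[\Phi^{-1}(\alpha)^3 - 3\Phi^{-1}(\alpha)\right] - \frac{\hat\zeta_1^2}{36}\left[2\Phi^{-1}(\alpha)^3 - 5\Phi^{-1}(\alpha)\right], \qquad (7)$$

where $\hat\zeta_1$ is the natural estimator for the skewness of X, and $\hat\zeta_2$ is the natural estimator of the excess kurtosis, i.e.

$$\hat\zeta_1 = \frac{\sqrt{n(n - 1)}}{n - 2} \cdot \frac{\sqrt{n}\sum_{i=1}^{n} (X_i - \hat\mu)^3}{\left(\sum_{i=1}^{n} (X_i - \hat\mu)^2\right)^{3/2}}$$

and

$$\hat\zeta_2 = \frac{n - 1}{(n - 2)(n - 3)}\left((n + 1)\hat\zeta_2' + 6\right) \quad \text{where } \hat\zeta_2' = \frac{n\sum_{i=1}^{n} (X_i - \hat\mu)^4}{\left(\sum_{i=1}^{n} (X_i - \hat\mu)^2\right)^2} - 3.$$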
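Definition 2 can be sketched as follows. The function names and the toy sample are assumptions made for illustration; the formulas are the ones of equation (7) and of the skewness and excess kurtosis estimators above, which require at least four observations.

```python
import math
from statistics import NormalDist

def sample_skewness_and_excess_kurtosis(sample):
    """Bias-corrected estimators of skewness and excess kurtosis (n >= 4)."""
    n = len(sample)
    mu = sum(sample) / n
    s2 = sum((x - mu) ** 2 for x in sample)
    s3 = sum((x - mu) ** 3 for x in sample)
    s4 = sum((x - mu) ** 4 for x in sample)
    skew = math.sqrt(n * (n - 1)) / (n - 2) * math.sqrt(n) * s3 / s2 ** 1.5
    raw_kurtosis = n * s4 / s2 ** 2 - 3.0
    excess = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * raw_kurtosis + 6.0)
    return skew, excess

def cornish_fisher_z(alpha, skew, excess):
    """Cornish-Fisher adjusted Gaussian quantile z_alpha."""
    z = NormalDist().inv_cdf(alpha)
    return (
        z
        + skew / 6.0 * (z ** 2 - 1.0)
        + excess / 24.0 * (z ** 3 - 3.0 * z)
        - skew ** 2 / 36.0 * (2.0 * z ** 3 - 5.0 * z)
    )

def cornish_fisher_quantile(sample, alpha):
    """mu_hat + z_hat_alpha * sigma_hat, with the n - 1 variance estimator."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
    skew, excess = sample_skewness_and_excess_kurtosis(sample)
    return mu + cornish_fisher_z(alpha, skew, excess) * sigma

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data (assumption)
skew, excess = sample_skewness_and_excess_kurtosis(sample)
q95 = cornish_fisher_quantile(sample, 0.95)
```

When both the skewness and the excess kurtosis are zero, the correction terms vanish and the estimator reduces to the Gaussian one.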

Parametric estimators and model error

Figure 22 – Estimation of Value-at-Risk, model error. Panels: density, theoretical versus empirical (theoretical lognormal, fitted lognormal, fitted gamma; theoretical Student, fitted Student, fitted Gaussian).

Using a semiparametric model

Given an n-sample {Y1, ..., Yn}, let $Y_{1:n} \le Y_{2:n} \le ... \le Y_{n:n}$ denote the associated order statistics. If u is large enough, Y − u given Y > u has a Generalized Pareto distribution with parameters ξ and β (Pickands-Balkema-de Haan theorem). If $u = Y_{n-k:n}$ for k large enough, and if ξ > 0, denote by $\hat\beta_k$ and $\hat\xi_k$ the maximum likelihood estimators of the Generalized Pareto distribution of sample $\{Y_{n-k+1:n} - Y_{n-k:n}, ..., Y_{n:n} - Y_{n-k:n}\}$,

$$\hat Q(Y, \alpha) = Y_{n-k:n} + \frac{\hat\beta_k}{\hat\xi_k}\left(\left(\frac{n}{k}(1 - \alpha)\right)^{-\hat\xi_k} - 1\right). \qquad (8)$$

An alternative is to use Hill's estimator if ξ > 0,

$$\hat Q(Y, \alpha) = Y_{n-k:n}\left(\frac{n}{k}(1 - \alpha)\right)^{-\hat\xi_k}, \quad \hat\xi_k = \frac{1}{k}\sum_{i=1}^{k} \log Y_{n+1-i:n} - \log Y_{n-k:n}. \qquad (9)$$
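The Hill-based quantile estimator (9) can be sketched as below. The data are an assumption made so that the answer is known: a deterministic grid of quantiles of a Pareto distribution with tail index 2 (ξ = 0.5), whose 99% quantile is exactly 10. The function name and the choice k = 100 are also illustrative.

```python
import math

def hill_quantile(sample, alpha, k):
    """Hill estimator of xi and the associated alpha-quantile estimate.

    Uses the k largest observations above the threshold Y_{n-k:n};
    all observations are assumed to be positive.
    """
    ys = sorted(sample)
    n = len(ys)
    threshold = ys[n - k - 1]  # Y_{n-k:n}
    xi_hat = sum(math.log(y) for y in ys[n - k:]) / k - math.log(threshold)
    quantile = threshold * (n / k * (1.0 - alpha)) ** (-xi_hat)
    return xi_hat, quantile

# Quantile grid of a Pareto distribution with survival function x^(-2).
n = 1000
pareto_grid = [(1.0 - (j - 0.5) / n) ** -0.5 for j in range(1, n + 1)]
xi_hat, q99 = hill_quantile(pareto_grid, 0.99, k=100)
```

On real data the estimate is sensitive to the choice of k, which is why it is usually inspected over a range of values.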


On nonparametric estimation for quantiles

For a continuous distribution $q(\alpha) = F_X^{-1}(\alpha)$, thus, a natural idea would be to consider $\hat q(\alpha) = \hat F_X^{-1}(\alpha)$, for some nonparametric estimation of $F_X$.

Definition 3 The empirical cumulative distribution function $F_n$, based on sample {X1, ..., Xn}, is

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1(X_i \le x).$$

Definition 4 The kernel based cumulative distribution function, based on sample {X1, ..., Xn}, is

$$\hat F_n(x) = \frac{1}{nh}\sum_{i=1}^{n} \int_{-\infty}^{x} k\left(\frac{t - X_i}{h}\right) dt = \frac{1}{n}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$

where $K(x) = \int_{-\infty}^{x} k(t)\,dt$, k being a kernel and h the bandwidth.


Smoothing nonparametric estimators

Two techniques have been considered to smooth estimation of quantiles, either implicit, or explicit.

• consider a linear combination of order statistics,

The classical empirical quantile estimate is simply

$$Q_n(p) = F_n^{-1}\left(\frac{i}{n}\right) = X_{i:n} = X_{[np]:n} \quad \text{where } [\cdot] \text{ denotes the integer part.} \qquad (10)$$

The estimator is simple to obtain, but depends only on one observation. A natural extension will be to use (at least) two observations, if np is not an integer. The weighted empirical quantile estimate is then defined as

$$Q_n(p) = (1 - \gamma)\,X_{[np]:n} + \gamma\,X_{[np]+1:n} \quad \text{where } \gamma = np - [np].$$
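A short sketch of the weighted empirical quantile. The clamping of the two indices to the range 1, ..., n is an assumption added here to handle p < 1/n and p = 1, a case the definition above leaves open; the toy sample is also illustrative.

```python
import math

def weighted_empirical_quantile(sample, p):
    """(1 - gamma) * X_{[np]:n} + gamma * X_{[np]+1:n}, gamma = np - [np]."""
    ys = sorted(sample)
    n = len(ys)
    j = math.floor(n * p)  # [np], the integer part
    gamma = n * p - j
    # Clamp indices to 1..n (assumption for the edge cases), then shift to 0-based.
    lower = ys[min(max(j, 1), n) - 1]
    upper = ys[min(max(j + 1, 1), n) - 1]
    return (1.0 - gamma) * lower + gamma * upper

sample = [float(v) for v in range(1, 11)]  # toy data 1, ..., 10
q25 = weighted_empirical_quantile(sample, 0.25)
q50 = weighted_empirical_quantile(sample, 0.5)
```

Here np = 2.5 for p = 0.25, so the estimate is halfway between the second and third order statistics, while np = 5 is an integer for p = 0.5 and the estimate is the fifth order statistic.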

Figure 23 – Several quantile estimators in R (the quantile function in R with type = 1, 3, 5 and 7; quantile level against probability level).

Smoothing nonparametric estimators

In order to increase efficiency, L-statistics can be considered, i.e.

$$Q_n(p) = \sum_{i=1}^{n} W_{i,n,p}\,X_{i:n} = \sum_{i=1}^{n} W_{i,n,p}\,F_n^{-1}\left(\frac{i}{n}\right) = \int_0^1 F_n^{-1}(t)\,k(p, h, t)\,dt \qquad (11)$$

where $F_n$ is the empirical counterpart of $F_X$, k is a kernel and h a bandwidth. This expression can be written equivalently

$$Q_n(p) = \sum_{i=1}^{n}\left[\int_{(i-1)/n}^{i/n} \frac{1}{h}\,k\left(\frac{t - p}{h}\right) dt\right] X_{(i)} = \sum_{i=1}^{n}\left[IK\left(\frac{\frac{i}{n} - p}{h}\right) - IK\left(\frac{\frac{i-1}{n} - p}{h}\right)\right] X_{(i)} \qquad (12)$$

where again $IK(x) = \int_{-\infty}^{x} k(t)\,dt$. The idea is to give more weight to order statistics $X_{(i)}$ such that i is close to pn.
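Expression (12) can be sketched with a Gaussian kernel, for which IK is the standard normal distribution function. The kernel choice, the bandwidth and the toy sample are assumptions made for illustration.

```python
from statistics import NormalDist

def kernel_quantile(sample, p, h):
    """Weighted sum of order statistics with Gaussian kernel weights.

    The weight of X_(i) is IK((i/n - p) / h) - IK(((i - 1)/n - p) / h),
    where IK is the integrated kernel (here the standard normal cdf).
    """
    ys = sorted(sample)
    n = len(ys)
    ik = NormalDist().cdf
    total = 0.0
    for i, y in enumerate(ys, start=1):
        weight = ik((i / n - p) / h) - ik(((i - 1) / n - p) / h)
        total += weight * y
    return total

sample = [float(v) for v in range(1, 10)]  # toy data 1, ..., 9
median_estimate = kernel_quantile(sample, 0.5, h=0.1)
```

The weights add up to IK((1 − p)/h) − IK(−p/h), which is close to but not exactly 1; for p near 0 or 1 a noticeable part of the kernel mass falls outside [0, 1].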

Figures 24 to 28 – Quantile estimator as weighted sum of order statistics (horizontal axis: quantile (probability) level).

Smoothing nonparametric estimators

E.g. the so-called Harrell-Davis estimator is defined as

$$Q_n(p) = \sum_{i=1}^{n}\left[\int_{(i-1)/n}^{i/n} \frac{\Gamma(n + 1)}{\Gamma((n + 1)p)\,\Gamma((n + 1)q)}\, y^{(n+1)p - 1} (1 - y)^{(n+1)q - 1}\,dy\right] X_{i:n},$$

with q = 1 − p.

• find a smooth estimator for $F_X$, and then find (numerically) the inverse,

The α-quantile is defined as the solution of $F_X \circ q_X(\alpha) = \alpha$. If $\hat F_n$ denotes a continuous estimate of F, then a natural estimate for $q_X(\alpha)$ is $\hat q_n(\alpha)$ such that $\hat F_n \circ \hat q_n(\alpha) = \alpha$, obtained using e.g. the Gauss-Newton algorithm.
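The second technique can be sketched by inverting a kernel-smoothed distribution function numerically. Note the substitutions made here for simplicity: plain bisection is used instead of the Gauss-Newton algorithm mentioned above, and the smooth estimate is built with a Gaussian kernel; the sample, the bandwidth and the function names are illustrative assumptions.

```python
from statistics import NormalDist

def smooth_cdf(x, sample, h):
    """Kernel based cdf: (1/n) * sum_i K((x - X_i) / h), K the normal cdf."""
    ik = NormalDist().cdf
    return sum(ik((x - xi) / h) for xi in sample) / len(sample)

def smooth_quantile(sample, alpha, h, tol=1e-10):
    """Solve smooth_cdf(q) = alpha by bisection (not Gauss-Newton)."""
    lo = min(sample) - 10.0 * h
    hi = max(sample) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smooth_cdf(mid, sample, h) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]  # toy symmetric data
median_estimate = smooth_quantile(sample, 0.5, h=0.5)
```

Bisection only needs the smooth distribution function to be continuous and increasing, which holds for any kernel with a positive density.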


Improving Beta kernel estimators

Problem: the convergence is not uniform, and there is a large second order bias on borders, i.e. 0 and 1. Chen (1999) proposed a modified Beta 2 kernel estimator, based on

$$k_2(u; b; t) = \begin{cases} k_{\frac{t}{b}, \frac{1-t}{b}}(u), & \text{if } t \in [2b, 1 - 2b] \\ k_{\rho_b(t), \frac{1-t}{b}}(u), & \text{if } t \in [0, 2b) \\ k_{\frac{t}{b}, \rho_b(1-t)}(u), & \text{if } t \in (1 - 2b, 1] \end{cases}$$

where $\rho_b(t) = 2b^2 + 2.5 - \sqrt{4b^4 + 6b^2 + 2.25 - t^2 - \frac{t}{b}}$.


Non-consistency of Beta kernel estimators

Problem: k(0, α, β) = k(1, α, β) = 0. So if there are point masses at 0 or 1, the estimator becomes inconsistent, i.e. for x ∈ [0, 1],

$$\hat f_b(x) = \frac{1}{n}\sum_{i} k\left(X_i, 1 + \frac{x}{b}, 1 + \frac{1 - x}{b}\right)$$

$$= \frac{1}{n}\sum_{X_i \ne 0, 1} k\left(X_i, 1 + \frac{x}{b}, 1 + \frac{1 - x}{b}\right)$$

$$= \frac{n - n_0 - n_1}{n} \cdot \frac{1}{n - n_0 - n_1}\sum_{X_i \ne 0, 1} k\left(X_i, 1 + \frac{x}{b}, 1 + \frac{1 - x}{b}\right)$$

$$\approx (1 - P(X = 0) - P(X = 1)) \cdot f_0(x),$$

and therefore $\hat F_b(x) \approx (1 - P(X = 0) - P(X = 1)) \cdot F_0(x)$, and we may have problems finding a 95% or 99% quantile since the total mass will be lower.


Non-consistency of Beta kernel estimators

Gouriéroux & Monfort (2007) proposed

$$\hat f_b^{(1)}(x) = \frac{\hat f_b(x)}{\int_0^1 \hat f_b(t)\,dt}, \quad \text{for all } x \in [0, 1].$$

It is called macro-β since the correction is performed globally. Gouriéroux & Monfort (2007) also proposed

$$\hat f_b^{(2)}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{k_\beta(X_i; b; x)}{\int_0^1 k_\beta(X_i; b; t)\,dt}, \quad \text{for all } x \in [0, 1].$$

It is called micro-β since the correction is performed locally.
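The two corrections differ only in where the normalising integral is taken, which a small sketch makes explicit. It assumes that $k_\beta(u; b; x)$ is the Beta(1 + x/b, 1 + (1 − x)/b) density evaluated at u, that all observations lie strictly inside (0, 1), and that the integrals are approximated by a midpoint rule on a fixed grid. All names and the grid size are illustrative.

```python
import math


def beta_pdf(u, a, b):
    """Density of the Beta(a, b) distribution at u in [0, 1] (a, b >= 1 assumed)."""
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return norm * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)


def k_beta(u, b, x):
    """Beta kernel at observation u for evaluation point x."""
    return beta_pdf(u, 1.0 + x / b, 1.0 + (1.0 - x) / b)


GRID = [(j + 0.5) / 400 for j in range(400)]   # midpoint grid on [0, 1]


def integrate(g):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g(t) for t in GRID) / len(GRID)


def raw_density(x, sample, b):
    return sum(k_beta(xi, b, x) for xi in sample) / len(sample)


def macro_beta(sample, b):
    """Global correction: divide the whole estimate by its total mass."""
    mass = integrate(lambda t: raw_density(t, sample, b))
    return lambda x: raw_density(x, sample, b) / mass


def micro_beta(sample, b):
    """Local correction: normalise each kernel separately before averaging."""
    masses = [integrate(lambda t, xi=xi: k_beta(xi, b, t)) for xi in sample]
    return lambda x: sum(k_beta(xi, b, x) / m for xi, m in zip(sample, masses)) / len(sample)
```

Both corrected estimates integrate to one over [0, 1] by construction, here up to the accuracy of the quadrature grid.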


Transforming observations?

In the context of density estimation, Devroye and Györfi (1985) suggested using a so-called transformed kernel estimate. Given a random variable $Y$, if $H$ is a strictly increasing function, then the $p$-quantile of $H(Y)$ is equal to $H(q(Y; p))$.

An idea is to transform the initial observations $\{X_1, \cdots, X_n\}$ into a sample $\{Y_1, \cdots, Y_n\}$ where $Y_i = H(X_i)$, and then to use a beta-kernel based estimator, if $H : \mathbb{R} \to [0, 1]$. Then $\hat q_n(X; p) = H^{-1}(\hat q_n(Y; p))$.

In the context of density estimation, $\hat f_X(x) = \hat f_Y(H(x)) H'(x)$. As mentioned in Devroye and Györfi (1985) (p. 245), "for a transformed histogram estimate, the optimal H gives a uniform [0, 1] density and should therefore be equal to H(x) = F(x), for all x".
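The transform, estimate, back-transform recipe can be sketched in a few lines. The transformation used here, the unit exponential distribution function H(x) = 1 − exp(−x) on positive data, is only a hypothetical choice, and a plain order-statistic quantile stands in for the beta-kernel estimator on [0, 1]. Any quantile or density estimator on the unit interval could be plugged in instead.

```python
import math


def H(x):
    """Hypothetical strictly increasing transformation onto [0, 1)."""
    return 1.0 - math.exp(-x)


def H_inv(y):
    return -math.log(1.0 - y)


def H_prime(x):
    return math.exp(-x)


def empirical_quantile(sample, p):
    """Order-statistic quantile, used as a stand-in estimator on [0, 1]."""
    xs = sorted(sample)
    return xs[max(0, math.ceil(len(xs) * p) - 1)]


def transformed_quantile(sample, p, estimator=empirical_quantile):
    """q_X(p) = H^{-1}(q_Y(p)) with Y_i = H(X_i)."""
    ys = [H(x) for x in sample]
    return H_inv(estimator(ys, p))


def transformed_density(x, density_on_unit_interval):
    """f_X(x) = f_Y(H(x)) * H'(x)."""
    return density_on_unit_interval(H(x)) * H_prime(x)
```

If the density estimate of the transformed sample is exactly uniform, the back-transformed density is H'(x), i.e. the density associated with H, which is the sense in which H = F is the optimal transformation.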


Transforming observations? A Monte Carlo study

Assume that the sample $\{X_1, \cdots, X_n\}$ has been generated from $F_{\theta_0}$ (from a family $\mathcal{F} = (F_\theta, \theta \in \Theta)$). Four transformations will be considered:
– $H = F_{\hat\theta}$ (based on a maximum likelihood procedure)
– $H = F_{\theta_0}$ (theoretical optimal transformation)
– $H = F_\theta$ with $\theta < \theta_0$ (heavier tails)
– $H = F_\theta$ with $\theta > \theta_0$ (lighter tails)
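The four transformations can be compared on simulated data. The slides do not name the parametric family, so the sketch below assumes, purely for illustration, an exponential family with rate θ (a smaller rate gives a heavier tail). It measures how far the transformed observations are from uniform on [0, 1] with a Kolmogorov-Smirnov type distance. The seed, sample size and parameter values are arbitrary.

```python
import math
import random


def F(x, theta):
    """Exponential distribution function with rate theta (assumed family)."""
    return 1.0 - math.exp(-theta * x)


def ks_distance_to_uniform(us):
    """Sup-distance between the empirical cdf of us and the uniform cdf on [0, 1]."""
    us = sorted(us)
    n = len(us)
    return max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(us))


random.seed(1)
theta0 = 1.0
xs = [random.expovariate(theta0) for _ in range(2000)]
theta_hat = len(xs) / sum(xs)   # maximum likelihood estimate of the rate

candidates = {
    "H = F_theta_hat": theta_hat,
    "H = F_theta0": theta0,
    "theta < theta0 (heavier tails)": 0.5 * theta0,
    "theta > theta0 (lighter tails)": 2.0 * theta0,
}
distances = {
    name: ks_distance_to_uniform([F(x, theta) for x in xs])
    for name, theta in candidates.items()
}
for name, d in distances.items():
    print(name, round(d, 3))
```

Under these assumptions the first two transformations should give nearly uniform transformed observations, while the two misspecified ones should show a clear departure from the diagonal of the P-P plot.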

Figure 29 – $\hat F(X_i)$ versus $F_{\hat\theta}(X_i)$, i.e. P-P plot (two panels; axes from 0.0 to 1.0; vertical axis: transformed observations).

Figure 30 – Nonparametric estimation of the density of the $F_{\hat\theta}(X_i)$'s (two panels; horizontal axis from 0.0 to 1.0; vertical axis: estimated density).

Figure 31 – Nonparametric estimation of the quantile function, $F_{\hat\theta}^{-1}(q)$ (two panels titled "Estimated optimal transformation"; horizontal axis: probability level, from 0.80 to 1.00; vertical axis: quantile).

Figure 32 – $\hat F(X_i)$ versus $F_{\theta_0}(X_i)$, i.e. P-P plot (two panels; axes from 0.0 to 1.0; vertical axis: transformed observations).

Figure 33 – Nonparametric estimation of the density of the $F_{\theta_0}(X_i)$'s (two panels; horizontal axis from 0.0 to 1.0; vertical axis: estimated density).

Figure 34 – Nonparametric estimation of the quantile function, $F_{\theta_0}^{-1}(q)$ (two panels titled "Estimated optimal transformation"; horizontal axis: probability level, from 0.80 to 1.00; vertical axis: quantile).

Figure 35 – $\hat F(X_i)$ versus $F_\theta(X_i)$, i.e. P-P plot, $\theta < \theta_0$ (heavier tails) (two panels; axes from 0.0 to 1.0; vertical axis: transformed observations).

Figure 36 – Estimation of the density of the $F_\theta(X_i)$'s, $\theta < \theta_0$ (heavier tails) (two panels; horizontal axis from 0.0 to 1.0; vertical axis: estimated density).

Figure 37 – Estimation of the quantile function, $F_\theta^{-1}(q)$, $\theta < \theta_0$ (heavier tails) (two panels titled "Estimated optimal transformation"; horizontal axis: probability level, from 0.80 to 1.00; vertical axis: quantile).

Figure 38 – $\hat F(X_i)$ versus $F_\theta(X_i)$, i.e. P-P plot, $\theta > \theta_0$ (lighter tails) (two panels; axes from 0.0 to 1.0; vertical axis: transformed observations).

Figure 39 – Estimation of the density of the $F_\theta(X_i)$'s, $\theta > \theta_0$ (lighter tails) (two panels; horizontal axis from 0.0 to 1.0; vertical axis: estimated density).

Figure 40 – Estimation of the quantile function, $F_\theta^{-1}(q)$, $\theta > \theta_0$ (lighter tails) (two panels titled "Estimated optimal transformation"; horizontal axis: probability level, from 0.80 to 1.00; vertical axis: quantile).


A universal distribution for losses

Buch-Larsen, Nielsen, Guillen, & Bolancé (2005) considered the Champernowne generalized distribution to model insurance claims, i.e. positive variables,

$$F_{\alpha,M,c}(y) = \frac{(y+c)^\alpha - c^\alpha}{(y+c)^\alpha + (M+c)^\alpha - 2c^\alpha}, \quad \text{where } \alpha > 0,\ c \geq 0 \text{ and } M > 0.$$

The associated density is then

$$f_{\alpha,M,c}(y) = \frac{\alpha\,(y+c)^{\alpha-1} \left((M+c)^\alpha - c^\alpha\right)}{\left((y+c)^\alpha + (M+c)^\alpha - 2c^\alpha\right)^2}.$$
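These two formulas translate directly into code. The sketch below also includes a quantile function obtained by solving F(y) = p for y, which is not stated on the slide but follows from the distribution function above. Function names and the parameter values in the usage lines are illustrative.

```python
def champernowne_cdf(y, alpha, M, c):
    """Distribution function of the generalized Champernowne distribution, y >= 0."""
    num = (y + c) ** alpha - c ** alpha
    den = (y + c) ** alpha + (M + c) ** alpha - 2.0 * c ** alpha
    return num / den


def champernowne_pdf(y, alpha, M, c):
    """Density, i.e. the derivative of champernowne_cdf with respect to y."""
    num = alpha * (y + c) ** (alpha - 1.0) * ((M + c) ** alpha - c ** alpha)
    den = ((y + c) ** alpha + (M + c) ** alpha - 2.0 * c ** alpha) ** 2
    return num / den


def champernowne_quantile(p, alpha, M, c):
    """Inverse of the cdf for 0 <= p < 1, derived by solving F(y) = p for y."""
    a = (p * ((M + c) ** alpha - 2.0 * c ** alpha) + c ** alpha) / (1.0 - p)
    return a ** (1.0 / alpha) - c


# Illustrative parameter values: M is the median of the distribution.
print(champernowne_cdf(2.0, alpha=1.5, M=2.0, c=0.5))        # 0.5
print(champernowne_quantile(0.5, alpha=1.5, M=2.0, c=0.5))   # close to 2.0
```

Since F(M) = 1/2 for any α and c, the parameter M can be read as the median, which makes the distribution convenient as a transformation H for loss data.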


A Monte Carlo study to compare those nonparametric estimators

As in Buch-Larsen, Nielsen, Guillen, & Bolancé (2005), four distributions were considered:
– normal distribution,
– Weibull distribution,
– log-normal distribution,
– mixture of Pareto and log-normal distributions.
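A comparison of this kind can be sketched for the normal case. The code below is a minimal illustration, not the study's actual design: it assumes that the MSE ratio is the mean squared error of an estimator divided by that of a benchmark, takes the linear-interpolation quantile (the default of R's `quantile` function) as that benchmark, and uses the Harrell-Davis estimator as the competitor. The number of replications, the seed and all function names are arbitrary.

```python
import math
import random
import statistics


def benchmark_quantile(xs_sorted, p):
    """Linear interpolation between order statistics (assumed benchmark)."""
    h = (len(xs_sorted) - 1) * p
    j = int(math.floor(h))
    j_next = min(j + 1, len(xs_sorted) - 1)
    return xs_sorted[j] + (h - j) * (xs_sorted[j_next] - xs_sorted[j])


def harrell_davis_weights(n, p, steps=50):
    """Midpoint-rule approximation of the Harrell-Davis weights."""
    a, b = (n + 1) * p, (n + 1) * (1.0 - p)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = 1.0 / (n * steps)
    weights = []
    for i in range(n):
        ys = (i / n + (k + 0.5) * width for k in range(steps))
        weights.append(width * sum(
            math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
            for y in ys))
    return weights


def mse_ratio(p, n, replications, seed=0):
    """MSE of Harrell-Davis divided by MSE of the benchmark, standard normal data."""
    rng = random.Random(seed)
    truth = statistics.NormalDist().inv_cdf(p)
    weights = harrell_davis_weights(n, p)
    se_benchmark = se_hd = 0.0
    for _ in range(replications):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        hd = sum(w * x for w, x in zip(weights, xs))
        se_benchmark += (benchmark_quantile(xs, p) - truth) ** 2
        se_hd += (hd - truth) ** 2
    return se_hd / se_benchmark


print(mse_ratio(p=0.9, n=50, replications=200))
```

A ratio below 1 would indicate that the smoothed estimator beats the benchmark at that probability level and sample size. Both estimators are evaluated on the same simulated samples, which reduces the noise in the ratio.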

Figure 41 – Distribution of the 95% quantile of the mixture distribution, n = 200, and associated box-plots. (Panels: box-plot for the 11 quantile estimators, i.e. R benchmark, E (Epanechnikov), HD (Harrell-Davis), PDG (Padgett), PRK (Park), Beta1, MACRO Beta1, MICRO Beta1, Beta2, MACRO Beta2 and MICRO Beta2; density of the quantile estimators for the lognormal/Pareto mixture, for the benchmark (R estimator), HD, PRK, B1 and B2. Horizontal axis: estimated value-at-risk; vertical axis: density of estimators.)

Figure 42 – Comparing MSE for 6 estimators, the normal distribution case. (Each panel shows the MSE ratio against the probability (confidence) level p, for n = 50, 100, 200, 500; panel titles recovered: HD (Harrell-Davis), B1 (Beta1), MACB1 (MACRO-Beta1), PRK (Park).)

Figure 43 – Comparing MSE for 6 estimators, the Weibull distribution case. (Each panel shows the MSE ratio against the probability (confidence) level p, for n = 50, 100, 200, 500; panel titles recovered: HD (Harrell-Davis), MACB1 (MACRO-Beta1), MICB1 (MICRO-Beta1), PRK (Park).)

Figure 44 – Comparing MSE for 9 estimators, the lognormal distribution case. (Each panel shows the MSE ratio against the probability (confidence) level p, for n = 50, 100, 200, 500; panel titles recovered: HD (Harrell-Davis), PRK (Park), B1 (Beta1), MACB1 (MACRO-Beta1), MICB1 (MICRO-Beta1), B2 (Beta2), MACB2 (MACRO-Beta2), MICB2 (MICRO-Beta2).)

Figure 45 – Comparing MSE for 9 estimators, the mixture distribution case. (Each panel shows the MSE ratio against the probability (confidence) level p, for n = 50, 100, 200, 500; panel titles recovered: HD (Harrell-Davis), PRK (Park), B1 (Beta1), MACB1 (MACRO-Beta1), MICB1 (MICRO-Beta1), B2 (Beta2), MACB2 (MACRO-Beta2), MICB2 (MICRO-Beta2).)


Some references

Charpentier, A., Fermanian, J-D. & Scaillet, O. (2006). The estimation of copulas: theory and practice. In: Copula Methods in Derivatives and Risk Management: From Credit Risk to Market Risk. Risk Books.

Charpentier, A. & Oulidi, A. (2008). Beta Kernel estimation for Value-At-Risk of heavy-tailed distributions. In revision, Journal of Computational Statistics and Data Analysis.

Charpentier, A. & Oulidi, A. (2007). Estimating allocations for Value-at-Risk portfolio optimization. To appear in Mathematical Methods in Operations Research.

Chen, S. X. (1999). A Beta Kernel Estimator for Density Functions. Computational Statistics & Data Analysis, 31, 131-145.

Devroye, L. & Györfi, L. (1985). Nonparametric Density Estimation: The L1 View. Wiley.

Gouriéroux, C. & Monfort, A. (2006). (Non) Consistency of the Beta Kernel Estimator for Recovery Rate Distribution. CREST-DP 2006-32.
