Goodness-of-fit tests based on φ-entropy differences

Goodness-of-fit tests based on φ-entropy differences

J.-F. Bercher (1), V. Girardin (2), J. Lequesne (2), P. Regnault (3)

(1) Laboratoire d'informatique Gaspard-Monge, ESIEE, Marne-la-Vallée, France
(2) Laboratoire de Mathématiques N. Oresme, Université de Caen – Basse-Normandie, France
(3) Laboratoire de Mathématiques de Reims, Université de Reims Champagne-Ardenne, France

MaxEnt 2014 - 25/09/14

Ph. Regnault (LMR-URCA)


Introduction
  Vasicek entropy-based normality test
  Paradigm of GOF test via entropy differences
Maximizing (h, φ)-entropies under moment constraints
  Maximum φ-entropy distributions
  Scale-invariant entropies
  A Pythagorean equality for Bregman divergence
Parametric models of MaxEnt distributions...
  ... for Shannon entropy
  ... for Tsallis entropy
  ... for Burg entropy
Goodness-of-fit tests for...
  ... an exponential family
  ... a q-exponential family


Introduction

Vasicek entropy-based normality test

Vasicek entropy-based normality test

Vasicek (1976) introduced a goodness-of-fit procedure for testing the normality of uncategorical data, based on the maximum entropy property

N(m, σ) = Argmax_{p ∈ P_{m,σ}} S(p),

with
◮ N(m, σ), the Gaussian distribution with mean m and variance σ²;
◮ S(p) = −∫ p(x) log p(x) dx, the Shannon entropy of p;
◮ P_{m,σ}, the set of p.d.f. with mean m and variance σ².


Introduction

Vasicek entropy-based normality test

Vasicek entropy-based normality test

Vasicek (1976) introduced a goodness-of-fit procedure for testing the normality of uncategorical data, based on the maximum entropy property N(m, σ) = Argmax_{p ∈ P_{m,σ}} S(p).

Precisely, from an n-sample (X_1, ..., X_n) drawn according to a p.d.f. p with finite variance, for testing H_0 : p ∈ M = {N(m, σ), m ∈ R, σ > 0} against H_1 : p ∉ M, the Vasicek test statistic is

T_n = exp( Ŝ(p)_n − S(N(m̂_n, σ̂_n)) ),

where Ŝ(p)_n = (1/n) Σ_{i=1}^n log[ (n/2m) (X_(i+m) − X_(i−m)) ] is a consistent estimator of S(p).

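As an illustration, the statistic above can be sketched in a few lines of Python. This is a minimal sketch, not Vasicek's implementation: the window m = 30, the sample sizes, and the exponential alternative are arbitrary illustrative choices.

```python
import math
import random

def vasicek_entropy(x, m):
    """Spacing estimator S_hat(p)_n = (1/n) sum log[(n/2m)(X_(i+m) - X_(i-m))]."""
    n = len(x)
    xs = sorted(x)
    total = 0.0
    for i in range(n):
        lo = xs[max(0, i - m)]        # X_(i-m), clipped to X_(1) near the boundary
        hi = xs[min(n - 1, i + m)]    # X_(i+m), clipped to X_(n)
        total += math.log(n / (2 * m) * (hi - lo))
    return total / n

def vasicek_statistic(x, m=30):
    """T_n = exp(S_hat(p)_n - S(N(mean, sd))); small values speak against normality."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    s_normal = 0.5 * math.log(2 * math.pi * math.e * var)  # Shannon entropy of N(mean, sd)
    return math.exp(vasicek_entropy(x, m) - s_normal)

random.seed(1)
gauss = [random.gauss(0.0, 1.0) for _ in range(1000)]
expo = [random.expovariate(1.0) for _ in range(1000)]
print(vasicek_statistic(gauss))  # close to 1 under H_0
print(vasicek_statistic(expo))   # noticeably smaller under the alternative
```

Since N(m, σ) maximizes S over P_{m,σ}, T_n ≤ 1 up to estimation error, and the test rejects H_0 for small T_n.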


Introduction

Paradigm of GOF test via entropy differences

Paradigm of GOF test via entropy differences

The main theoretical ingredients of Vasicek's GOF test are:

◮ Maximum entropy property: the p.d.f. in the null-hypothesis model M maximizes Shannon entropy under moment constraints;
◮ Pythagorean equality: for any p ∈ P_{m,σ}, we have K(p|N_{m,σ}) = S(N_{m,σ}) − S(p);
◮ Estimation of Shannon entropy.

Numerous authors have adapted Vasicek's procedure
◮ to various parametric models of maximum entropy distributions, where entropy stands for Shannon entropy (in the overwhelming majority of cases) or Rényi entropy;
◮ by introducing other estimators of the entropy (of both the null-hypothesis distribution and the actual distribution).


Introduction

Paradigm of GOF test via entropy differences

Extending the theoretical background

Parametric models for which Vasicek-type procedures can be developed by means of Shannon entropy maximization are well identified: exponential families; see Lequesne's PhD thesis.

We investigate here the (information-geometric) shape and properties of parametric models for which entropy-based GOF tests may be developed, through the generalization of
◮ the maximum entropy property to φ-entropy functionals;
◮ the Pythagorean property to the Bregman divergence associated with φ-entropy functionals;
◮ φ-entropy estimation procedures adapted to the involved parametric models.


Maximizing (h, φ)-entropies under moment constraints

Maximum φ-entropy distributions

(h, φ)-entropies

The (h, φ)-entropy of a p.d.f. p with support S is S_{h,φ}(p) := h(S_φ(p)), where

S_φ(p) = −∫_S φ(p(x)) dx,

with
◮ φ : R⁺ → R a twice continuously differentiable convex function;
◮ h : R → R a real function.

(h, φ)-entropy           | h(y)               | φ(x)
Shannon                  | y                  | x log x
Ferreri                  | y                  | (1 + rx) log(1 + rx)/r
Burg                     | y                  | −log x
Itakura-Saito            | y                  | x − log x + 1
Tsallis                  | ±(q − 1)⁻¹(y − 1)  | ∓x^q, q > 0, q ≠ 1
Rényi                    | ±(1 − q)⁻¹ log y   | ∓x^q, q > 0, q ≠ 1
L²-norm                  | y                  | x²
Havrda and Charvat       | y                  | (1 − 2^{1−r})⁻¹(x^r − x)
Basu-Harris-Hjort-Jones  | y                  | 1 − (1 + 1/r)x + x^{1+r}/r

Maximizing (h, φ)-entropies under moment constraints

Maximum φ-entropy distributions

Maximum φ-entropy distributions

For increasing functions h, maximizing S_{h,φ}(p) amounts to maximizing S_φ(p), which is solved by:

Theorem – Girardin (1997)

Let M_0 = 1, M_1, ..., M_J be linearly independent measurable functions defined on an interval S. Let m = (1, m_1, ..., m_J) ∈ R^{J+1} and p_0 ∈ P(m, M), where

P(m, M) = { p : E_p(M_j) = m_j, j ∈ {0, ..., J} }.

If there exists a (unique) λ = (λ_0, ..., λ_J) ∈ R^{J+1} such that

φ′(p_0) = Σ_{j=0}^J λ_j M_j,

then S_φ(p_0) ≥ S_φ(p) for all p ∈ P(m, M). The converse holds if S is compact.


Maximizing (h, φ)-entropies under moment constraints

Maximum φ-entropy distributions

Parametric models as families of MaxEnt distributions

Given a parametric family M = {p(.; θ), θ ∈ Θ ⊆ R^d} of distributions supported by S, we look for
◮ φ, a convex function from R⁺ to R,
◮ M_1, ..., M_J, measurable functions from S to R with M_0 = 1, M_1, ..., M_J linearly independent,

such that for any θ, a unique λ ∈ R^{J+1} exists satisfying

p(.; θ) = φ′⁻¹( Σ_{j=0}^J λ_j M_j ).

Fortunately, requiring the entropy functionals to satisfy some natural properties allows the search to be drastically restricted.



Maximizing (h, φ)-entropies under moment constraints

Scale-invariant entropies

Scale-invariant entropies

Definitions

◮ An entropy functional S is said to be scale-invariant if functions a and b exist, with a > 0 non-increasing, such that S(µp_µ) = a(µ)S(p) + b(µ) for all µ ∈ R, where p_µ(x) = p(µx).
◮ Two entropy functionals S and S̃ are said to be equivalent for maximization if S(p) > S(q) iff S̃(p) > S̃(q).

Theorem – Kosheleva (1998)

If an entropy functional is scale-invariant, then it is equivalent for maximization to one of the functionals

−∫_S p(x) log p(x) dx,    (1/(1 − q)) ∫_S p(x)^q dx,    ∫_S log p(x) dx.


Maximizing (h, φ)-entropies under moment constraints

A Pythagorean equality for Bregman divergence

A Pythagorean equality for Bregman divergence

The Bregman divergence (or distance) D_φ(p|q) associated with the φ-entropy, of a distribution p with respect to another q with the same support S, is

D_φ(p|q) = S_φ(q) − S_φ(p) − ∫_S φ′(q(x)) [p(x) − q(x)] dx.

Entropy  | Associated Bregman divergence
Shannon  | K(p|q) = ∫_S p(x) log( p(x)/q(x) ) dx
Tsallis  | T_q(p|q) = (1 − q) ∫_S q(x)^q dx + ∫_S [ q·q(x)^{q−1} − p(x)^{q−1} ] p(x) dx
Burg     | B(p|q) = ∫_S [ p(x)/q(x) − log( p(x)/q(x) ) − 1 ] dx

Proposition

Let p_0 ∈ P(m, M) satisfy φ′(p_0) = Σ_{j=0}^J λ_j M_j for some λ ∈ R^{J+1}. Then

D_φ(p|p_0) = S_φ(p_0) − S_φ(p),    p ∈ P(m, M).

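The proposition can be checked numerically in the Shannon case: for p_0 = N(0, 1) and any p with mean 0 and variance 1, K(p|p_0) = S(p_0) − S(p). A small sketch, using a Laplace density as an illustrative choice of p (the grid bounds and step are arbitrary):

```python
import math

def laplace_pdf(x, b):
    # Laplace density with mean 0 and scale b (variance 2b^2)
    return math.exp(-abs(x) / b) / (2 * b)

def normal_pdf(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

b = 1 / math.sqrt(2)        # scale chosen so that the Laplace variance 2b^2 equals 1
h = 1e-3
xs = [i * h for i in range(-20000, 20001)]   # grid on [-20, 20]

# K(p|p0) by a Riemann sum
kl = sum(laplace_pdf(x, b) * math.log(laplace_pdf(x, b) / normal_pdf(x)) for x in xs) * h

s_p0 = 0.5 * math.log(2 * math.pi * math.e)  # Shannon entropy of N(0, 1)
s_p = 1 + math.log(2 * b)                    # Shannon entropy of the Laplace density

print(kl, s_p0 - s_p)   # the two sides agree
```

The equality holds here because log p_0 is a polynomial in the constrained moments, so ∫ p log p_0 only depends on p through its mean and variance.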

Parametric models of MaxEnt distributions...

... for Shannon entropy

Exponential families maximize Shannon entropy

Given an exponential family M = {p(.; λ), λ ∈ Λ ⊆ R^{J+1}}, the p.d.f.

p(x; λ) = exp( Σ_{j=1}^J λ_j M_j(x) + λ_0 ),    x ∈ S,

maximizes Shannon entropy for the moment constraints M_1, ..., M_J. Note that λ_0 = −ψ(λ_1, ..., λ_J), with ψ(λ_1, ..., λ_J) = log ∫ exp( Σ λ_j M_j(x) ) dx.

Support | Parametric model                | Density ∝             | Moment function(s)
[0; 1]  | Beta B(a, b), a, b > 0          | x^a (1 − x)^b         | M_1(x) = log x, M_2(x) = log(1 − x)
[0; 1]  | Alpha A(a, b, c), a, b, c > 0   | x^a (1 − x)^b e^{−cx} | M_1(x) = log x, M_2(x) = log(1 − x), M_3(x) = x
[0; ∞[  | Exponential E(λ), λ > 0         | e^{−λx}               | M_1(x) = x
[0; ∞[  | Gamma G(λ, N), λ, N > 0         | x^{N−1} e^{−λx}       | M_1(x) = x, M_2(x) = log x
[0; ∞[  | Beta prime B′(a, b), a, b > 0   | x^{a−1} (1 + x)^{−a−b}| M_1(x) = log x, M_2(x) = log(1 + x)
[0; ∞[  | Pareto type I P_I(c), c > 0     | x^{−c−1}              | M_1(x) = log x
[0; ∞[  | Planck PL(a, b), a > 0, b > 1   | x^{−b} e^{−a/x}       | M_1(x) = log x, M_2(x) = 1/x
R       | Normal N(m, σ)                  | e^{−(x−m)²/2σ²}       | M_1(x) = x, M_2(x) = x²

Parametric models of MaxEnt distributions...

... for Shannon entropy

Dual coordinate systems of exponential families

Any distribution p ∈ M, where M is an exponential family, can be indexed either by
◮ its moment (or expectation) parameters (m_1, ..., m_J), or
◮ its exponential (or Maximum Entropy – ME) parameters (λ_1, ..., λ_J).

Both coordinate systems are linked through the relation

S(p) = −Σ_{j=1}^J λ_j m_j + ψ(λ),    p ∈ M.

Precisely, for j ∈ {1, ..., J},

λ_j = ∂/∂m_j ( −S(p(.; m)) ),
m_j = ∂/∂λ_j ψ(λ).
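For the exponential distribution E(λ), with natural parameter λ_1 = −λ, M_1(x) = x and ψ(λ_1) = log ∫_0^∞ e^{λ_1 x} dx, the relation m_1 = ∂ψ/∂λ_1 can be checked by finite differences. A small sketch (the quadrature bounds, step, and increment are arbitrary choices):

```python
import math

def psi(l1, upper=80.0, h=1e-3):
    """Log-partition psi(l1) = log integral_0^infty exp(l1 * x) dx, for l1 < 0."""
    total = sum(math.exp(l1 * (i * h)) for i in range(int(upper / h))) * h
    return math.log(total)

l1 = -2.0                # natural parameter of E(2), whose mean is 1/2
eps = 1e-4
m1 = (psi(l1 + eps) - psi(l1 - eps)) / (2 * eps)   # central difference of psi
print(m1)   # close to E[X] = 0.5
```

Here ψ(λ_1) = −log(−λ_1) in closed form, so ∂ψ/∂λ_1 = −1/λ_1 = 1/λ, the mean of E(λ), as the dual-coordinate relation predicts.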

Parametric models of MaxEnt distributions...

... for Tsallis entropy

q-exponential families maximize Tsallis entropy

Given the q̃-exponential family M = {p(.; λ), λ ∈ Λ}, with q̃ = 2 − q, the p.d.f. given by

p(x; λ) = ( Σ_{j=1}^J λ_j M_j(x) + λ_0 )^{1/(q−1)} ∝ exp_{q̃}( Σ_{j=1}^J λ_j M_j(x) ),    x ∈ S,    (1)

maximizes the q-Tsallis entropy for the moment constraints M_1, ..., M_J, where the q̃-exponential function exp_{q̃} is given by exp_{q̃}(x) = (1 + (1 − q̃)x)_+^{1/(1−q̃)}.

Proposition

In (1), the parameter λ_0 can be expressed (locally) as a differentiable function ψ of (λ_1, ..., λ_J) such that

∂/∂λ_j ψ(λ_1, ..., λ_J) = ∫_S M_j(x) E_q(p)(x) dx,    j ∈ {1, ..., J},

where E_q(p) is the q-escort distribution associated with p, given by

E_q(p)(x) = p(x)^q / ∫_S p(x)^q dx,    x ∈ S.


Parametric models of MaxEnt distributions...

... for Tsallis entropy

Some examples

Support | Parametric model                 | Density ∝                          | q               | Moment function(s)
[0; 1]  | Sub-Beta SB(u), u ∈ [0, 1]       | (1 − ux)^{−2}                      | 1/2             | M_1(x) = x
[0; ∞[  | Pareto type II P_II(a), a > 0    | (1 + ax)^{−c−1}                    | c/(c + 1)       | M_1(x) = x
[0; ∞[  | Pareto type IV P_IV(a), a > 0    | (1 + ax^k)^{−c−1}                  | c/(c + 1)       | M_1(x) = x^k
R       | Student S(µ, σ), µ ∈ R, σ > 0    | (1 + (1/ν)((x − µ)/σ)²)^{−(ν+1)/2} | (ν − 1)/(ν + 1) | M_1(x) = x, M_2(x) = x²

In particular, the non-standard Student distributions with (fixed) degrees of freedom ν > 2, location parameter µ ∈ R and scale parameter σ > 0, given by

p(x; µ, σ) = ( Γ((ν + 1)/2) / (√(νπ) σ Γ(ν/2)) ) ( 1 + (1/ν)((x − µ)/σ)² )^{−(ν+1)/2},    x ∈ R,

maximize Tsallis entropy, and hence Rényi entropy, with parameter q = (ν − 1)/(ν + 1) for the algebraic moment functions M_1(x) = x and M_2(x) = x².


Parametric models of MaxEnt distributions...

... for Tsallis entropy

Dual coordinate systems

Any distribution p ∈ M, where M is a (2 − q)-exponential family, can be indexed either by
◮ its expectation parameters (m_1, ..., m_J), or
◮ its q̃-exponential (or ME) parameters (λ_1, ..., λ_J).

Both coordinate systems are linked through the relation

T_q(p) = (1/(1 − q)) ( Σ_{j=1}^J λ_j m_j − ψ(λ) − 1 ).

In particular, for j ∈ {1, ..., J},

λ_j = ∂/∂m_j ( (1 − q) T_q(p(.; m)) ).


Parametric models of MaxEnt distributions...

... for Burg entropy

Inverse polynomial families maximize Burg entropy

The densities of Pareto type III distributions with fixed parameter δ > 0,

p_δ(x; σ) ∝ ( 1 + (x/σ)^δ )^{−1},    x ∈ R⁺, σ > 0,

and of Cauchy distributions,

p(x; µ, σ) ∝ ( 1 + ((x − µ)/σ)² )^{−1},    x ∈ R, µ ∈ R, σ > 0,

are natural candidates for maximizing Burg entropy for the moment functions M_1(x) = x^δ and M_1(x) = x, M_2(x) = x², respectively. Unfortunately, these algebraic moments are infinite for these distributions.

One may avoid the infiniteness of the moment constraints by truncating the tails of the distributions, thus restricting their support S to a bounded interval – work in progress.


GOF tests for...

GOF tests: the theoretical background

Let M = {p(.; λ), λ ∈ Λ} be a parametric family of MaxEnt distributions for constraint functions M_1, ..., M_J. Let (X_1, ..., X_n) be an n-sample drawn according to a p.d.f. p satisfying E_p(M_j) < ∞, j ∈ {1, ..., J}. For testing H_0 : p ∈ M against H_1 : p ∉ M, let the test statistic be

T̂_n := S_φ(p(.; λ̂_n)) − Ŝ_φ(p)_n,

where
◮ λ̂_n is the ME estimator (MEE), i.e., the ME parameter corresponding to the empirical moment estimator m̂_n given by m̂_n^{(j)} = (1/n) Σ_{i=1}^n M_j(X_i), j ∈ {1, ..., J};
◮ Ŝ_φ(p)_n is some non-parametric estimator of the φ-entropy of the sample.


GOF tests for...

... an exponential family

For an exponential family M = {p(.; λ) = exp( Σ λ_j M_j − ψ(λ) ), λ ∈ Λ},

◮ the MEE λ̂_n equals the (quasi-)maximum likelihood estimator (QMLE) of λ*, where λ* = Argmin_{λ ∈ Λ} K(p|p(.; λ));
◮ the spacing-based estimator of Shannon entropy S(p) (Tarasenko (1968), Vasicek (1976)) is

Ŝ_{n,k} = (1/n) Σ_{i=1}^n log( (n/2k) (X_(i+k) − X_(i−k)) ),

where X_(1) ≤ · · · ≤ X_(n) are the order statistics of the sample and k ∈ {1, ..., n − 1}. Ŝ_{n,k} is strongly consistent as n, k → ∞ with k/n → 0.

Hence, the GOF test with statistic T̂_n = S(p(.; λ̂_n)) − Ŝ_{n,k} is consistent.

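The recipe can be instantiated for the exponential model E(λ): here λ̂_n = 1/m̂_n, S(E(λ̂_n)) = 1 − log λ̂_n, and Ŝ_{n,k} is the spacing estimator. A minimal sketch (the function names, sample sizes, k, and alternatives are illustrative choices, not the authors' implementation):

```python
import math
import random

def spacing_entropy(x, k):
    """Spacing estimator S_hat_{n,k} = (1/n) sum log[(n/2k)(X_(i+k) - X_(i-k))]."""
    n = len(x)
    xs = sorted(x)
    return sum(
        math.log(n / (2 * k) * (xs[min(n - 1, i + k)] - xs[max(0, i - k)]))
        for i in range(n)
    ) / n

def gof_statistic_exponential(x, k=30):
    """T_hat_n = S(E(lambda_hat)) - S_hat_{n,k}, with lambda_hat = 1/mean (the MEE)."""
    lam = len(x) / sum(x)
    s_model = 1 - math.log(lam)    # Shannon entropy of E(lambda)
    return s_model - spacing_entropy(x, k)

random.seed(2)
h0_sample = [random.expovariate(3.0) for _ in range(1000)]
h1_sample = [random.uniform(0, 1) for _ in range(1000)]
print(gof_statistic_exponential(h0_sample))  # near 0 under H_0
print(gof_statistic_exponential(h1_sample))  # bounded away from 0 under H_1
```

By the maximum entropy property, T̂_n converges to K(p|p(.; λ*)) ≥ 0, which vanishes exactly when p lies in the model; the test rejects for large T̂_n.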

GOF tests for...

... an exponential family

For an exponential family M = {p(.; λ) = exp( Σ λ_j M_j − ψ(λ) ), λ ∈ Λ},

◮ the MEE λ̂_n equals the (quasi-)maximum likelihood estimator (QMLE) of λ*, where λ* = Argmin_{λ ∈ Λ} K(p|p(.; λ));
◮ the k-nearest-neighbor (kNN) estimator of Shannon entropy S(p) is

Ŝ_{n,k} = −(1/n) Σ_{i=1}^n log p̂_{n,k}(X_i),

where p̂_{n,k}(X_i) = exp(ψ(k)) / ( 2(n − 1) ρ_k(X_i) ), with ψ the digamma function and ρ_k(X_i) the distance from X_i to its k-th closest neighbor in the sample. Ŝ_{n,k} is strongly consistent as n, k → ∞ with k/n → 0.

Hence, the GOF test with statistic T̂_n = S(p(.; λ̂_n)) − Ŝ_{n,k} is consistent.

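The kNN estimator above can be sketched in one dimension. This is a minimal sketch, assuming integer k so that the digamma value ψ(k) can be computed via the standard identity ψ(k) = −γ + Σ_{i=1}^{k−1} 1/i; the sample size and k = 5 are arbitrary choices:

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """psi(k) for integer k >= 1: -gamma + sum_{i=1}^{k-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, k))

def knn_shannon_entropy(x, k=5):
    """S_hat_{n,k} = -(1/n) sum log p_hat(X_i), p_hat(X_i) = exp(psi(k))/(2(n-1) rho_k(X_i))."""
    n = len(x)
    xs = sorted(x)
    s = 0.0
    for i in range(n):
        # in 1-D, the k nearest neighbors of xs[i] lie among its k sorted predecessors/successors
        cands = sorted(
            abs(xs[i] - xs[j])
            for j in range(max(0, i - k), min(n, i + k + 1))
            if j != i
        )
        rho = cands[k - 1]          # distance to the k-th nearest neighbor
        s += math.log(2 * (n - 1) * rho) - digamma_int(k)
    return s / n

random.seed(3)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(knn_shannon_entropy(sample))  # near 0.5*log(2*pi*e) ~ 1.4189 for N(0, 1)
```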

GOF tests for...

... a q-exponential family

For a q̃-exponential family M = { p(.; λ) = ( Σ_{j=0}^J λ_j M_j )^{1/(q−1)}, λ ∈ Λ },

◮ the MEE λ̂_n is no longer equal to the QMLE of λ*, nor to the (quasi-)maximum exp_{q̃}-likelihood estimator, but can be derived directly from the relation

λ̂_n^{(j)} = ∂/∂m_j ( (1 − q) T_q(p(.; m)) ) |_{m = m̂_n},

where m̂_n^{(j)} = (1/n) Σ_{i=1}^n M_j(X_i);

◮ the kNN estimator of Tsallis entropy T_q(p) is T̂_{n,k} = (1/(1 − q)) ( Î_{n,k} − 1 ), where

Î_{n,k} = (1/n) Σ_{i=1}^n ( 2(n − 1) ρ_k(X_i) / G_k )^{1−q},    with G_k = ( Γ(k + 1 − q)/Γ(k) )^{1/(1−q)}.

Leonenko et al. (2008) proved that T̂_{n,k} is strongly consistent as n, k → ∞ with k/n → 0, under mild assumptions for q < 1.

Hence, the GOF test with statistic T̂_n = T_q(p(.; λ̂_n)) − T̂_{n,k} is consistent.

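A one-dimensional sketch of the kNN Tsallis estimator, following Leonenko et al.'s construction; the uniform test sample and the choices of n, k and q are illustrative assumptions (for U(0, 1), ∫ p^q = 1 and hence T_q = 0 for any q):

```python
import math
import random

def knn_tsallis_entropy(x, k=5, q=0.5):
    """T_hat_{n,k} = (I_hat_{n,k} - 1)/(1 - q), with I_hat the kNN estimate of integral p^q."""
    n = len(x)
    xs = sorted(x)
    # G_k = (Gamma(k + 1 - q)/Gamma(k))^(1/(1-q)), the bias-correction constant
    g_k = (math.gamma(k + 1 - q) / math.gamma(k)) ** (1 / (1 - q))
    i_hat = 0.0
    for i in range(n):
        # in 1-D, the k nearest neighbors lie among the k sorted predecessors/successors
        cands = sorted(
            abs(xs[i] - xs[j])
            for j in range(max(0, i - k), min(n, i + k + 1))
            if j != i
        )
        rho = cands[k - 1]          # distance to the k-th nearest neighbor
        i_hat += (2 * (n - 1) * rho / g_k) ** (1 - q)
    i_hat /= n                      # consistent estimate of integral p(x)^q dx
    return (i_hat - 1) / (1 - q)

random.seed(4)
sample = [random.uniform(0, 1) for _ in range(2000)]
print(knn_tsallis_entropy(sample))  # near 0, the Tsallis entropy of U(0, 1)
```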

GOF tests for...

... a q-exponential family

Example: GOF test for the non-standard Student p.d.f.

Poster Session 3 (this afternoon): J. Lequesne, A goodness-of-fit test for Student distributions based on Rényi entropy.


Some references

◮ J.-F. Bercher (2008). On some entropy functionals derived from Rényi information divergence. Information Sciences, 178(12), 2489–2506.

◮ V. Girardin (1997). Methods of realization of moment problems with entropy maximization. In Distributions with Given Marginals and Moment Problems, edited by V. Benes and J. Stepan, Kluwer Academic Publishers.

◮ O. M. Kosheleva (1998). Symmetry-group justification of maximum entropy method and generalized maximum entropy methods in image processing. In Maximum Entropy and Bayesian Methods, edited by G. J. Erickson, J. T. Rychert and C. R. Smith, Fundamental Theories of Physics.

◮ N. Leonenko, L. Pronzato, V. Savani (2008). A class of Rényi information estimators for multidimensional densities. The Annals of Statistics, 36(5), 2153–2182.

◮ J. Lequesne (2015). Tests statistiques basés sur la théorie de l'information. Applications en biologie et en démographie. PhD thesis, Université de Caen – Basse-Normandie, France.

◮ O. Vasicek (1976). A test for normality based on sample entropy. Journal of the Royal Statistical Society, Series B, 38, 54–59.
