Geometry of F-likelihood Estimators and F-Max-Ent Theorem

Harsha K V and Subrahamanian Moosath K S
Department of Mathematics
Indian Institute of Space Science and Technology

26 September 2014


1. Introduction

On a statistical manifold $S$, the Fisher information defines a Riemannian metric $g$, called the Fisher information metric. Amari defined a one-parameter family of connections $\nabla^{\alpha}$, called the $\alpha$-connections, using the $\alpha$-embeddings
\[
L_{\alpha}(p) =
\begin{cases}
\dfrac{2}{1-\alpha}\, p^{\frac{1-\alpha}{2}}, & \alpha \neq 1, \\[4pt]
\log p, & \alpha = 1.
\end{cases}
\]
The connections $\nabla^{\alpha}$ and $\nabla^{-\alpha}$ are dual with respect to the Fisher information metric $g$.

An exponential family is flat with respect to the $\nabla^{1}$-connection (also known as the exponential connection) and, by duality, it is also $(-1)$-flat. Hence $(g, \nabla^{1}, \nabla^{-1})$ is a dually flat structure on an exponential family. Amari also defined an $\alpha$-family for any $\alpha \in \mathbb{R}$, but for $\alpha \neq 1$ the $\alpha$-family is not flat with respect to the $\alpha$-connection.


A $q$-exponential family extends the notion of an exponential family. A $q$-exponential family, which is an $\alpha$-family with $\alpha = 1 - 2q$, has a dually flat structure called the $q$-structure. This $q$-geometry is obtained by conformal flattening of the $\alpha$-geometry.

Naudts generalized the notion of an exponential family to a large class of families of probability distributions, called $\phi$-exponential families, and studied their dually flat structure. An information-geometric foundation for the deformed exponential family ($\chi$-family) was given by Amari et al.

We extended Amari's $\alpha$-geometry to a new geometry, called $F$-geometry, using an embedding function $F$ of $S$ into the space of random variables $\mathbb{R}^{X}$. When $F$ is one of Amari's $\alpha$-embeddings, the $F$-geometry reduces to the $\alpha$-geometry. In this general embedding case a dualistic structure $(g, \nabla^{F}, \nabla^{H})$ can also be introduced on $S$, where $H$ is the dual embedding of $F$.


Further, we introduced the $(F, G)$-geometry using the embedding $F$ and a positive smooth function $G$. Motivated by the $(F, G)$-geometry, we consider the $F$-exponential family, an extension of the $q$-exponential family. The $F$-exponential family is not flat with respect to the $F$-connection $\nabla^{F}$, but a dually flat structure can be defined on it by conformal flattening of the $(F, G)$-geometry.

Using the function $F$ and its inverse, we generalize the notion of independence to $F$-independence. We then define the $F$-likelihood function and the $F$-likelihood estimators and discuss their geometry. An analytic proof of the $F$-version of the max-ent theorem is outlined.


2. F-Geometry

Let $S$ be an $n$-dimensional statistical manifold. Let $F : (0, \infty) \to \mathbb{R}$ be an injective function which is at least twice differentiable. Then $F$ is an embedding of $S$ into $\mathbb{R}^{X}$ which takes each $p(x; \xi) \mapsto F(p(x; \xi))$. The metric induced by the embedding $F$ is the Fisher information metric $g$, defined by
\[
g_{ij}(\theta) = \int \partial_i \ell \, \partial_j \ell \; p(x; \theta) \, dx,
\]
where $\ell(x; \theta) = \log p(x; \theta)$.

The affine connection induced by the embedding $F$, the $F$-connection, is defined by
\[
\Gamma^{F}_{ijk}(\xi) = \int \left( \partial_i \partial_j \ell + \left( 1 + \frac{p F''(p)}{F'(p)} \right) \partial_i \ell \, \partial_j \ell \right) (\partial_k \ell) \; p \, dx.
\]

Remark 2.1. Amari's $\alpha$-geometry is a special case of the $F$-geometry.
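Remark 2.1 can be checked directly. The short computation below is a sketch that takes $F = L_{\alpha}$ with $\alpha \neq 1$ from the introduction and evaluates the factor $1 + pF''/F'$ that appears in the $F$-connection; the resulting coefficient $(1-\alpha)/2$ is the one in the usual component form of Amari's $\alpha$-connection.

```latex
% Sketch: F = L_alpha, alpha \neq 1
\begin{align*}
F(p)   &= \frac{2}{1-\alpha}\, p^{\frac{1-\alpha}{2}}, \qquad
F'(p)   = p^{-\frac{1+\alpha}{2}}, \qquad
F''(p)  = -\frac{1+\alpha}{2}\, p^{-\frac{3+\alpha}{2}}, \\
1 + \frac{p F''(p)}{F'(p)} &= 1 - \frac{1+\alpha}{2} = \frac{1-\alpha}{2}, \\
\Gamma^{F}_{ijk} &= \int \left( \partial_i \partial_j \ell
   + \frac{1-\alpha}{2}\, \partial_i \ell \, \partial_j \ell \right)
   (\partial_k \ell)\; p \, dx .
\end{align*}
```

For $\alpha = 1$, $F = \log p$ gives $pF''/F' = -1$, so the second term vanishes and only $\int \partial_i \partial_j \ell \, \partial_k \ell \; p \, dx$ remains.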


3. (F, G)-Geometry

Definition 3.1. Let $G : (0, \infty) \to \mathbb{R}$ be a positive smooth function. Define the $G$-metric as
\[
g^{G}_{ij}(\theta) = \int \partial_i \ell \, \partial_j \ell \; G(p) \, p \, dx.
\]
Using the embedding $F$ and the function $G$, define the $(F, G)$-connection $\nabla^{F,G}$ as
\[
\Gamma^{F,G}_{ijk} = \int \left( \partial_i \partial_j \ell + \left( 1 + \frac{p F''(p)}{F'(p)} \right) \partial_i \ell \, \partial_j \ell \right) (\partial_k \ell) \; G(p) \, p \, dx.
\]

Theorem 3.2. The $(F, G)$-connection $\nabla^{F,G}$ and the $(H, G)$-connection $\nabla^{H,G}$ are dual connections with respect to the $G$-metric iff the functions $F$ and $H$ satisfy
\[
H'(p) = \frac{G(p)}{p \, F'(p)}.
\]
We call such an embedding $H$ a $G$-dual embedding of $F$.
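As a sanity check of the duality condition $H'(p) = G(p) / (p F'(p))$, the sketch below assumes $G \equiv 1$ and $F = L_{\alpha}$ with $\alpha \neq \pm 1$. Under these assumptions the $G$-metric is the Fisher metric and the $G$-dual embedding comes out as $L_{-\alpha}$, which matches the duality of $\nabla^{\alpha}$ and $\nabla^{-\alpha}$ stated in the introduction.

```latex
% Sketch: G = 1, F = L_alpha, alpha \neq \pm 1
\begin{align*}
F'(p) &= p^{-\frac{1+\alpha}{2}}, \\
H'(p) &= \frac{1}{p \, F'(p)} = p^{\frac{\alpha - 1}{2}}, \\
H(p)  &= \frac{2}{1+\alpha}\, p^{\frac{1+\alpha}{2}} = L_{-\alpha}(p)
        \quad \text{(up to an additive constant).}
\end{align*}
```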

4. F-exponential family

Definition 4.1. Let $F : (0, \infty) \to \mathbb{R}$ be any smooth increasing concave function and let $Z$ be the inverse function of $F$. The standard form of an $n$-dimensional $F$-exponential family of distributions $S = \{ p(x; \theta) \,/\, \theta \in E \subseteq \mathbb{R}^n \}$ is written as
\[
p(x; \theta) = Z\!\left( \sum_{i=1}^{n} \theta^i x_i - \psi_F(\theta) \right)
\quad \text{or} \quad
F(p(x; \theta)) = \sum_{i=1}^{n} \theta^i x_i - \psi_F(\theta),
\]
where $x = (x_1, \ldots, x_n)$ is a set of random variables and $\theta = (\theta^1, \ldots, \theta^n)$ are the canonical parameters. $\psi_F(\theta)$ is called the $F$-free energy or the $F$-potential and is determined from the normalization condition
\[
\int Z\!\left( \sum_{i=1}^{n} \theta^i x_i - \psi_F(\theta) \right) dx = 1.
\]
Define a functional $h_F(\theta)$ as
\[
h_F(\theta) = \int \frac{1}{F'(p(x; \theta))} \, dx.
\]
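To make Definition 4.1 concrete, here is a minimal numerical sketch. It assumes $F = \ln_q$ (the $q$-logarithm) with $q = 0.5$ and a two-point sample space $\{0, 1\}$, so the integrals become sums. These choices, and the names `ln_q`, `exp_q`, `psi_F`, `density` and `h_F`, are illustrative assumptions and not part of the talk. The $F$-potential is found by bisection on the normalization condition.

```python
Q = 0.5            # assumed deformation parameter (illustration only)
XS = (0.0, 1.0)    # assumed two-point sample space; integrals become sums


def ln_q(p, q=Q):
    """q-logarithm: an increasing, concave choice of F for 0 < q < 1."""
    return (p ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q=Q):
    """Inverse Z of ln_q, set to zero outside the range of ln_q."""
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def psi_F(theta, q=Q, iters=200):
    """Solve sum_x Z(theta * x - psi) = 1 for psi by bisection."""
    top = max(theta * x for x in XS)
    lo, hi = top, top + 1.0 / (1.0 - q)  # total mass is >= 1 at lo and 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mass = sum(exp_q(theta * x - mid, q) for x in XS)
        if mass > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def density(theta, q=Q):
    """p(x; theta) = Z(theta * x - psi_F(theta)) on the sample space XS."""
    psi = psi_F(theta, q)
    return [exp_q(theta * x - psi, q) for x in XS]


def h_F(theta, q=Q):
    """h_F = sum_x 1 / F'(p); for F = ln_q this is sum_x p^q."""
    return sum(p ** q for p in density(theta, q))


if __name__ == "__main__":
    theta = 0.3
    print("psi_F :", psi_F(theta))
    print("p     :", density(theta))
    print("h_F   :", h_F(theta))
```

The bisection relies on `exp_q` being non-decreasing, so the total mass is non-increasing in $\psi$; other choices of $F$ would need their own inverse and bracket.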

Theorem 4.2. The $F$-potential function $\psi_F(\theta)$ is a convex function of $\theta$ and
\[
\partial_i \partial_j \psi_F(\theta) = \frac{1}{h_F(\theta)} \int \frac{-p F''(p)}{F'(p)} \, \frac{1}{p} \, \partial_i p \, \partial_j p \, dx.
\]

Definition 4.3. Define a Riemannian metric, called the $F$-metric $g^{F}$, by
\[
g^{F}_{ij}(\theta) = \partial_i \partial_j \psi_F(\theta).
\]
Note that $(g^{F}_{ij})$ is positive definite since $\psi_F$ is a convex function of $\theta$.

Define a divergence of Bregman type using $\psi_F(\theta)$, called the $F$-divergence, as
\[
D_F[p(x; \theta_1) : p(x; \theta_2)] = \psi_F(\theta_2) - \psi_F(\theta_1) - \nabla \psi_F(\theta_1) \cdot (\theta_2 - \theta_1).
\]
Let $p$ and $r$ be the two distributions parametrized by $\theta_1$ and $\theta_2$ respectively. Then the $F$-divergence can be written as
\[
D_F[p : r] = \frac{1}{h_F(\theta_1)} \int \bigl( F(p) - F(r) \bigr) \frac{1}{F'(p)} \, dx.
\]


Definition 4.4. For a density function $p$ parametrized by $\theta$, define the $F$-escort probability distribution of $p$ as
\[
\hat{p}_F(x) = \frac{1}{h_F(\theta) \, F'(p)}.
\]
Using $\hat{p}_F$, define the $\hat{F}$-expectation of a random variable as
\[
E_{\hat{p}}(f(x)) = \frac{1}{h_F(\theta)} \int \frac{1}{F'(p)} \, f(x) \, dx.
\]
Then the $F$-divergence can be written as
\[
D_F[p : r] = E_{\hat{p}}\bigl( F(p) - F(r) \bigr).
\]

Lemma 4.5. The metric $g^{D_F}_{ij}$ and the affine connection $\nabla^{D_F}$ induced by the $F$-divergence $D_F$ are given by
\[
g^{D_F}_{ij}(\theta) = g^{F}_{ij}(\theta) = \partial_i \partial_j \psi_F(\theta); \qquad
\Gamma^{D_F}_{ijk} = \partial_i \partial_j \partial_k \psi_F(\theta).
\]
The dual $D_F^{*}$ of $D_F$ induces an affine connection $\nabla^{D_F^{*}}$ defined by $\Gamma^{D_F^{*}}_{ijk} = 0$.
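The two expressions for the $F$-divergence (the Bregman form in $\psi_F$ and the escort-expectation form) can be compared numerically. The sketch below again assumes $F = \ln_q$ with $q = 0.5$ on the two-point sample space $\{0, 1\}$, with sums in place of integrals; all function names are hypothetical. The gradient of $\psi_F$ is taken by central finite differences so that the check does not presuppose the identity it is testing.

```python
Q = 0.5            # assumed deformation parameter (illustration only)
XS = (0.0, 1.0)    # assumed two-point sample space


def ln_q(p):
    """q-logarithm, the assumed F."""
    return (p ** (1.0 - Q) - 1.0) / (1.0 - Q)


def exp_q(u):
    """Inverse of ln_q, set to zero outside its range."""
    base = 1.0 + (1.0 - Q) * u
    return base ** (1.0 / (1.0 - Q)) if base > 0.0 else 0.0


def psi_F(theta, iters=200):
    """F-potential from the normalization condition, by bisection."""
    top = max(theta * x for x in XS)
    lo, hi = top, top + 1.0 / (1.0 - Q)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(exp_q(theta * x - mid) for x in XS) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def density(theta):
    psi = psi_F(theta)
    return [exp_q(theta * x - psi) for x in XS]


def escort(theta):
    """F-escort distribution: proportional to 1 / F'(p) = p^q."""
    w = [p ** Q for p in density(theta)]
    h = sum(w)
    return [wi / h for wi in w]


def bregman_form(t1, t2, eps=1e-6):
    """psi(t2) - psi(t1) - psi'(t1) * (t2 - t1), with a finite-difference psi'."""
    grad = (psi_F(t1 + eps) - psi_F(t1 - eps)) / (2.0 * eps)
    return psi_F(t2) - psi_F(t1) - grad * (t2 - t1)


def escort_form(t1, t2):
    """Escort expectation (under theta_1) of F(p) - F(r)."""
    p, r, w = density(t1), density(t2), escort(t1)
    return sum(wi * (ln_q(pi) - ln_q(ri)) for wi, pi, ri in zip(w, p, r))


if __name__ == "__main__":
    print(bregman_form(0.3, -0.4), escort_form(0.3, -0.4))
```

Both parameter values are chosen so that the densities stay strictly positive; at the boundary of the support `ln_q(0)` would still be finite for $q < 1$, but the family is then no longer in the interior regime the slides describe.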


5. Dually flat structure of F-exponential family

The Legendre transformation of the convex function $\psi_F(\theta)$ is given by $\eta_i = \partial_i \psi_F(\theta)$. The dual potential function $\phi_F$ is called the negative $F$-entropy and is given by
\[
\phi_F(\eta) = \max_{\theta} \{ \theta \cdot \eta - \psi_F(\theta) \} = E_{\hat{p}}(F(p)) = \frac{1}{h_F(\theta)} \int \frac{F(p)}{F'(p)} \, dx.
\]
We have
\[
\eta_i = \partial_i \psi_F(\theta) = E_{\hat{p}}(x_i); \qquad
\partial_i \eta_j = \partial_i \partial_j \psi_F(\theta) = g^{F}_{ij}(\theta).
\]
With respect to the dual coordinate system $(\eta_j)$, the metric and the dual connections are given by
\[
\tilde{g}^{D_F}_{ij}(\eta) = \partial^i \partial^j \phi_F(\eta); \qquad
\tilde{\Gamma}^{D_F}_{ijk}(\eta) = 0; \qquad
\tilde{\Gamma}^{D_F^{*}}_{ijk}(\eta) = \partial^i \partial^j \partial^k \phi_F(\eta).
\]
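The identity $\eta_i = E_{\hat{p}}(x_i)$ follows in two lines from the definition of the $F$-exponential family. The derivation below is a sketch that assumes differentiation and integration can be interchanged and that $\int \partial_i p \, dx = 0$ by normalization.

```latex
\begin{align*}
F(p) = \sum_{k} \theta^k x_k - \psi_F(\theta)
  \;&\Longrightarrow\;
  F'(p)\, \partial_i p = x_i - \partial_i \psi_F(\theta), \\
0 = \int \partial_i p \, dx
  &= \int \frac{x_i - \partial_i \psi_F(\theta)}{F'(p)} \, dx
   = h_F(\theta) \bigl( E_{\hat{p}}(x_i) - \partial_i \psi_F(\theta) \bigr), \\
\partial_i \psi_F(\theta) &= E_{\hat{p}}(x_i).
\end{align*}
```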

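6. Conformal flattening of (F, G)-geometry

Definition 6.1. Two statistical manifolds $(M, \nabla, h)$ and $(M, \tilde{\nabla}, \tilde{h})$ are said to be $\beta$-conformally equivalent if there exists a positive function $K$ on $M$ such that
\begin{align*}
\tilde{h}(X, Y) &= K \, h(X, Y), \\
\tilde{h}(\tilde{\nabla}_X Y, Z) &= K \, h(\nabla_X Y, Z) - \frac{1+\beta}{2} \, h(X, Y) \, dK(Z) + \frac{1-\beta}{2} \bigl\{ h(Y, Z) \, dK(X) + h(X, Z) \, dK(Y) \bigr\}.
\end{align*}
In terms of the basis vectors, the above expressions can be rewritten as
\begin{align*}
\tilde{h}(\partial_i, \partial_j) &= \tilde{h}_{ij} = K \, h(\partial_i, \partial_j) = K \, h_{ij}, \\
\tilde{\Gamma}^{\beta}_{ijk} &= K \, \Gamma_{ijk} - \frac{1+\beta}{2} \, h_{ij} \, \partial_k K + \frac{1-\beta}{2} \bigl\{ h_{jk} \, \partial_i K + h_{ik} \, \partial_j K \bigr\}.
\end{align*}
The dually flat structure on the $F$-exponential family is obtained by conformal flattening of the $(F, G)$-geometry.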


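Theorem 6.2. The metric $g^{D_F}_{ij}$ induced by the $F$-divergence $D_F$ is obtained by conformal flattening of the $G$-metric $g^{G}_{ij}$ by the gauge function
\[
K(\theta) = \frac{1}{h_F(\theta)}, \quad \text{with} \quad G(p) = \frac{-p F''(p)}{F'(p)}.
\]

Proof:
\[
g^{D_F}_{ij}(\theta) = \partial_i \partial_j \psi_F(\theta)
= \frac{1}{h_F(\theta)} \int \frac{-p F''(p)}{F'(p)} \, \frac{1}{p} \, \partial_i p \, \partial_j p \, dx
= K(\theta) \, g^{G}_{ij},
\]
where $K(\theta) = \dfrac{1}{h_F(\theta)}$ and $g^{G}_{ij} = \displaystyle\int \partial_i p \, \partial_j p \, \frac{G(p)}{p} \, dx$ is the $G$-metric with $G(p) = \dfrac{-p F''(p)}{F'(p)}$.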
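For orientation, the following sketch specializes Theorem 6.2 to $F = \ln_q$, the case behind the $q$-exponential family of the introduction. The notation $h_q(\theta) = \int p^q \, dx$ is taken from the entropy section of these slides; $g_{ij}$ denotes the Fisher information metric.

```latex
% Sketch: F(p) = ln_q p = (p^{1-q} - 1)/(1-q)
\begin{align*}
F'(p) &= p^{-q}, \qquad F''(p) = -q\, p^{-q-1}, \\
G(p) &= \frac{-p F''(p)}{F'(p)} = q, \qquad
h_F(\theta) = \int p^{q} \, dx = h_q(\theta), \\
g^{D_F}_{ij}(\theta) &= \frac{1}{h_q(\theta)} \int \partial_i \ell \, \partial_j \ell \; q \, p \, dx
  = \frac{q}{h_q(\theta)} \, g_{ij}(\theta).
\end{align*}
```

In this case $G$ is constant, so the $G$-metric is a constant multiple of the Fisher metric and the conformal factor is $q / h_q(\theta)$.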

Theorem 6.3. The affine connection $\nabla^{D_F}$ induced by $D_F$ is the $(-1)$-conformal transformation of the $(H, G)$-connection $\nabla^{H,G}$ by the gauge function $K(\theta) = \dfrac{1}{h_F(\theta)}$, where $G(p) = \dfrac{-p F''(p)}{F'(p)}$ and $H$ is the $G$-dual embedding of $F$.

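Proof: The components of the connection $\nabla^{D_F}$ are given by
\begin{align*}
\Gamma^{D_F}_{ijk}
&= \frac{1}{h_F(\theta)} \int \left( \frac{-p F''(p)}{F'(p)} - \frac{p^2 F'''(p)}{F'(p)} + \frac{2 p^2 (F''(p))^2}{(F'(p))^2} \right) \partial_i \ell \, \partial_j \ell \, \partial_k \ell \; p \, dx \\
&\quad + \frac{1}{h_F(\theta)} \int \left( \frac{-p F''(p)}{F'(p)} \right) \partial_i \partial_j \ell \, \partial_k \ell \; p \, dx \\
&\quad + \frac{1}{h_F(\theta)} \int \frac{p F''(p)}{(F'(p))^2} \, \partial_i \ell \, dx \;\; \partial_j \partial_k \psi_F(\theta) \\
&\quad + \frac{1}{h_F(\theta)} \int \frac{p F''(p)}{(F'(p))^2} \, \partial_j \ell \, dx \;\; \partial_i \partial_k \psi_F(\theta) \\
&= K(\theta) \, \Gamma^{H,G}_{ijk} + \partial_j K(\theta) \, g^{G}_{ik}(\theta) + \partial_i K(\theta) \, g^{G}_{jk}(\theta),
\end{align*}
with $G(p) = \dfrac{-p F''(p)}{F'(p)}$ and $K(\theta) = \dfrac{1}{h_F(\theta)}$.

Theorem 6.4. The affine connection $\nabla^{D_F^{*}}$ induced by $D_F^{*}$ is the $1$-conformal transformation of the $(F, G)$-connection $\nabla^{F,G}$ by the gauge function $K(\theta) = \dfrac{1}{h_F(\theta)}$, where $G(p) = \dfrac{-p F''(p)}{F'(p)}$.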

7. F-likelihood estimator

Definition 7.1. Let $F$ be an increasing concave function and let $Z$ be its inverse function. Then the $F$-product of two numbers $x, y$ is defined as
\[
x \otimes_F y = Z[F(x) + F(y)].
\]
The $F$-product satisfies the following properties:
\begin{align*}
Z(x) \otimes_F Z(y) &= Z(x + y), \\
F(x \otimes_F y) &= F(x) + F(y).
\end{align*}

Definition 7.2. Two random variables $X$ and $Y$ are said to be $F$-independent with normalization if the joint probability density function $p_F(x, y)$ is given by the $F$-product of the marginal probability density functions $p_1(x)$ and $p_2(y)$:
\[
p_F(x, y) = \frac{p_1(x) \otimes_F p_2(y)}{Z_{p_1, p_2}},
\]
where $Z_{p_1, p_2}$ is the normalization defined by
\[
Z_{p_1, p_2} = \iint_{\Omega_1 \Omega_2} p_1(x) \otimes_F p_2(y) \, dx \, dy.
\]
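A minimal sketch of the $F$-product follows, assuming $F = \ln_q$ with inverse $Z = \exp_q$ and treating $q = 1$ as the ordinary logarithm. The function names are hypothetical. Note that $F(x) + F(y)$ can fall outside the range of $F$; in this sketch $Z$ is then set to zero, which is an assumption of the example rather than something stated in the slides.

```python
import math


def ln_q(p, q):
    """q-logarithm; reduces to the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(p)
    return (p ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q):
    """Inverse of ln_q; set to zero where 1 + (1 - q) u is not positive."""
    if q == 1.0:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0


def f_product(x, y, q):
    """F-product x (*)_F y = Z[F(x) + F(y)] for F = ln_q."""
    return exp_q(ln_q(x, q) + ln_q(y, q), q)


if __name__ == "__main__":
    q = 0.5
    x, y = 0.5, 0.4
    print("F-product      :", f_product(x, y, q))
    print("ordinary x * y :", f_product(x, y, 1.0))
    # Property F(x (*)_F y) = F(x) + F(y)
    print(ln_q(f_product(x, y, q), q), ln_q(x, q) + ln_q(y, q))
```

With $q = 1$ the $F$-product is the ordinary product, so $F$-independence reduces to the usual notion of independence.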

8. The geometry of F-likelihood estimators

Let $S = \{ p(x; \theta) \,/\, \theta \in E \subseteq \mathbb{R}^n \}$ be an $n$-dimensional statistical manifold defined on a sample space $\Omega \subseteq \mathbb{R}$ and let $\{x^1, \ldots, x^N\}$ be $N$ independent observations from $p(x; \theta) \in S$.

Definition 8.1. The $F$-likelihood function $L_F(\theta)$ is defined as
\[
L_F(\theta) = p(x^1; \theta) \otimes_F \cdots \otimes_F p(x^N; \theta).
\]
Since $F$ is an increasing function, it is equivalent to consider $F(L_F(\theta))$ instead. By the property $F(x \otimes_F y) = F(x) + F(y)$,
\[
F(L_F(\theta)) = F\bigl( p(x^1; \theta) \otimes_F \cdots \otimes_F p(x^N; \theta) \bigr) = \sum_{i=1}^{N} F(p(x^i; \theta)).
\]

Definition 8.2. A maximum $F$-likelihood estimator $\hat{\theta}$ is defined as
\[
\hat{\theta} = \arg\max_{\theta \in E} L_F(\theta) = \arg\max_{\theta \in E} F(L_F(\theta)).
\]

Let $S = \{ p(x; \theta) \,/\, \theta \in E \subseteq \mathbb{R}^n \}$ be an $F$-exponential family and let $M$ be a curved $F$-exponential family in $S$. Let $\{x^1, \ldots, x^N\}$ be $N$ independent observations from a probability density function $p(x; u) = p(x; \theta(u)) \in M$.

Theorem 8.3. The $F$-likelihood estimator for $M$ is the orthogonal projection of that of $S$ to the submanifold $M$ with respect to the connection $\nabla^{D_F^{*}}$.

Proof: The $F$-likelihood function is given by
\begin{align*}
F(L_F(u)) &= \sum_{j=1}^{N} F(p(x^j; u))
= \sum_{j=1}^{N} \left[ \sum_{i=1}^{n} \theta^i(u) \, x_i^j - \psi_F(\theta(u)) \right] \\
&= \sum_{i=1}^{n} \theta^i(u) \sum_{j=1}^{N} x_i^j - N \psi_F(\theta(u)).
\end{align*}
Differentiating with respect to $\theta^i$,
\[
\partial_i F(L_F(u)) = \sum_{j=1}^{N} x_i^j - N \, \partial_i \psi_F(\theta(u)).
\]

Thus the maximum $F$-likelihood estimator for $S$ is given by
\[
\hat{\eta}_i = \frac{1}{N} \sum_{j=1}^{N} x_i^j.
\]
The canonical divergence $D_F^{*}$ for $S$ can be calculated as
\begin{align*}
D_F^{*}[p(\theta(u)); p(\hat{\eta})] = D_F[p(\hat{\eta}); p(\theta(u))]
&= \psi_F(\theta(u)) + \phi_F(\hat{\eta}) - \sum_{i=1}^{n} \theta^i(u) \, \hat{\eta}_i \\
&= \phi_F(\hat{\eta}) - \frac{1}{N} F(L_F(u)).
\end{align*}
Hence the $F$-likelihood is maximum if the canonical divergence (the dual of the $F$-divergence) is minimum. Equivalently, by the projection theorem, the $F$-likelihood estimator for $M$ is the orthogonal projection of $\hat{\eta}$ to the submanifold $M$ with respect to the connection $\nabla^{D_F^{*}}$.
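The statement that the maximum $F$-likelihood estimator of the full family satisfies $\hat{\eta}_i = \frac{1}{N} \sum_j x_i^j$ can be illustrated numerically. The sketch below assumes $F = \ln_q$ with $q = 0.5$, a one-parameter family on the sample space $\{0, 1\}$ and a small made-up sample; the names and the ternary search are illustrative choices, not the authors' procedure. It maximizes $F(L_F(\theta)) = \theta \sum_j x^j - N \psi_F(\theta)$ and compares the escort mean at the maximizer with the sample mean.

```python
Q = 0.5  # assumed deformation parameter (illustration only)


def exp_q(u):
    """Inverse of the q-logarithm, set to zero outside its range."""
    base = 1.0 + (1.0 - Q) * u
    return base ** (1.0 / (1.0 - Q)) if base > 0.0 else 0.0


def psi_F(theta, iters=200):
    """F-potential on the sample space {0, 1}, by bisection."""
    top = max(0.0, theta)
    lo, hi = top, top + 1.0 / (1.0 - Q)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exp_q(-mid) + exp_q(theta - mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def escort_mean(theta):
    """eta(theta): escort expectation of x, with escort weights p^q."""
    psi = psi_F(theta)
    w0 = exp_q(-psi) ** Q
    w1 = exp_q(theta - psi) ** Q
    return w1 / (w0 + w1)


def f_log_likelihood(theta, sample):
    """F(L_F(theta)) = sum_j F(p(x^j; theta)) = theta * sum(x) - N * psi_F."""
    return theta * sum(sample) - len(sample) * psi_F(theta)


def argmax_ternary(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


sample = [1, 0, 1, 1]  # made-up observations
theta_hat = argmax_ternary(lambda t: f_log_likelihood(t, sample), -1.9, 1.9)

if __name__ == "__main__":
    print("theta_hat        :", theta_hat)
    print("escort mean      :", escort_mean(theta_hat))
    print("sample mean      :", sum(sample) / len(sample))
```

The search interval is restricted to values of $\theta$ for which both probabilities stay positive in this example; the objective is concave there because $\psi_F$ is convex.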

9. F-Max-Ent Theorem

Definition 9.1. For any probability density function $p(x)$, the $F$-entropy is defined as
\[
H_F(p) = -E_{\hat{p}}(F(p)) = \frac{1}{h_F(p)} \int \frac{-F(p)}{F'(p)} \, dx. \tag{1}
\]
When $F(p) = \ln_q p$, the $q$-logarithm, $H_F(p)$ reduces to the $q$-entropy
\[
H_q(p) = \frac{1}{1-q} \left( 1 - \frac{1}{h_q(p)} \right),
\]
and when $F(p) = \ln p$, $H_F(p)$ reduces to the Shannon entropy $H(p) = -\displaystyle\int p(x) \ln p(x) \, dx$.
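The reduction to the $q$-entropy can be verified in a few lines. The computation below is a sketch that assumes $\ln_q p = (p^{1-q} - 1)/(1-q)$ and $h_q(p) = \int p^q \, dx$, and uses $\int p \, dx = 1$.

```latex
\begin{align*}
F(p) &= \frac{p^{1-q} - 1}{1-q}, \qquad F'(p) = p^{-q}, \qquad
h_F(p) = \int p^{q} \, dx = h_q(p), \\
H_F(p) &= -\frac{1}{h_q(p)} \int p^{q} \, \frac{p^{1-q} - 1}{1-q} \, dx
        = -\frac{1}{(1-q)\, h_q(p)} \left( \int p \, dx - \int p^{q} \, dx \right) \\
       &= -\frac{1 - h_q(p)}{(1-q)\, h_q(p)}
        = \frac{1}{1-q} \left( 1 - \frac{1}{h_q(p)} \right).
\end{align*}
```

For $F(p) = \ln p$ one has $F'(p) = 1/p$ and $h_F(p) = \int p \, dx = 1$, so equation (1) gives $-\int p \ln p \, dx$ directly.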

Theorem 9.2. Probability distributions maximizing the $F$-entropy $H_F$ under the $F$-linear constraints
\[
E_{\hat{p}_F}[c_k(x)] = a_k; \quad k = 1, \ldots, m \tag{2}
\]
for $m$ random variables $c_k(x)$ and various values of $a_k \in \mathbb{R}$ form an $m$-dimensional $F$-exponential family
\[
F(p(x; \theta)) = \sum_{i=1}^{m} \theta^i c_i(x) - \psi(\theta).
\]

Proof: We use the method of Lagrange multipliers and the calculus of variations. Our aim is to maximize
\[
H_F(p) = \frac{1}{h_F(p)} \int \frac{-F(p)}{F'(p)} \, dx
\]
subject to the $m$ constraints
\[
E_{\hat{p}_F}[c_k(x)] = \frac{1}{h_F(p)} \int \frac{c_k(x)}{F'(p)} \, dx = a_k; \quad k = 1, \ldots, m.
\]
Consider
\begin{align*}
L(p, \lambda_0, \lambda_1, \ldots, \lambda_m)
&= \frac{1}{h_F(p)} \int_0^{\infty} \frac{-F(p)}{F'(p)} \, dx + \lambda_0 \int_0^{\infty} p \, dx \\
&\quad + \sum_{i=1}^{m} \lambda_i \, \frac{1}{h_F(p)} \int_0^{\infty} \frac{c_i(x)}{F'(p)} \, dx - \lambda_0 - \sum_{i=1}^{m} \lambda_i a_i.
\end{align*}
At the maximum $F$-entropy distribution we have $\dfrac{dL}{dp} = 0$.

Using this we get $\lambda_0 = \dfrac{1}{h_F(p)}$ and
\begin{align*}
F(p) &= \sum_{i=1}^{m} \lambda_i (c_i(x) - a_i) + \frac{1}{h_F(p)} \int_0^{\infty} \frac{F(p)}{F'(p)} \, dx \\
&= \sum_{i=1}^{m} \lambda_i (c_i(x) - a_i) - H_F(p) \\
&= \sum_{i=1}^{m} \lambda_i c_i(x) - \sigma(\lambda_i, a_i),
\end{align*}
where $\sigma(\lambda_i, a_i) = \sum_{i=1}^{m} \lambda_i a_i + H_F(p)$.

Using the $m$ constraints we can solve for $\lambda_i$ to get $\lambda_i = -\dfrac{dH_F(p)}{da_i}$, and the $\lambda_i$ are the canonical coordinates for the $F$-exponential family. Hence the $a_i$ are the dual coordinates of the canonical coordinates $\lambda_i$. Using the dual coordinates $\lambda_i$, $a_i$ and their potential functions, $F(p)$ takes the form of an $F$-exponential family. Thus
\[
F(p(x; \theta)) = \sum_{i=1}^{m} \theta^i c_i(x) - \psi(\theta).
\]

Conclusion

Using a general embedding $F$ and a positive smooth function $G$, the $(F, G)$-geometry is introduced and the geometry of the $F$-exponential family is studied. The dually flat structure of the $F$-exponential family is obtained by conformal flattening of the $(F, G)$-geometry. Using a generalized notion of independence, called $F$-independence, we defined the $F$-likelihood function. Further, the geometry of the $F$-likelihood estimators is discussed. A generalized notion of entropy, called the $F$-entropy, is given and an analytic proof of the $F$-version of the max-ent theorem is outlined.

Future work

One can further explore the asymptotic behavior of the maximum likelihood estimator of the $F$-escort probability density function and its various applications.


Acknowledgement

We would like to express our sincere gratitude to Prof. Shun-ichi Amari for his valuable suggestions and fruitful discussions.


References

H. Matsuzoe and A. Ohara, "Geometry for q-exponential families", Proceedings of the 2nd International Colloquium on Differential Geometry and its Related Fields, Veliko Tarnovo, September 6-10, 2010.
S.I. Amari and A. Ohara, "Geometry of q-Exponential Family of Probability Distributions", Entropy, 13, 1170-1185 (2011).
J. Naudts, "Deformed exponentials and logarithms in generalized thermostatistics", Physica A, 316, 323-334 (2002).
J. Naudts, "Estimators, escort probabilities, and phi-exponential families in statistical physics", J. Ineq. Pure Appl. Math., 5, 102 (2004).
J. Naudts, Generalised Thermostatistics, Springer, London, UK, 2011.
S.I. Amari, A. Ohara and H. Matsuzoe, "Geometry of deformed exponential families: Invariant, dually flat and conformal geometries", Physica A: Statistical Mechanics and its Applications, 391, 4308-4319 (2012).
K.V. Harsha and K.S. Subrahamanian Moosath, "F-geometry and Amari's α-geometry on a statistical manifold", Entropy, 16(5), 2472-2487 (2014).
T. Kurose, "On the Divergence of 1-conformally Flat Statistical Manifolds", Tohoku Math. J., 46, 427-433 (1994).
T. Kurose, "Conformal-projective geometry of statistical manifolds", Interdisciplinary Information Sciences, 8, 89-100 (2002).
A. Ohara, H. Matsuzoe and S.I. Amari, "A dually flat structure on the space of escort distributions", Journal of Physics: Conference Series, 201(01), 2010.
Y. Fujimoto and N. Murata, "A generalization of Independence in Naive Bayes Model", Lecture Notes in Computer Science, 153-161 (2010).
S.I. Amari and H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, Oxford University Press, Oxford, UK, 2000.


Thank You
