On bounds for the Fisher-Rao distance between multivariate normal distributions¹

João E. Strapasson∗, Julianna P. S. Porto† and Sueli I. R. Costa†

∗ School of Applied Sciences, University of Campinas, Brazil
† Institute of Mathematics, University of Campinas, Brazil

Abstract. Information geometry is approached here by considering the statistical model of multivariate normal distributions as a Riemannian manifold with the natural metric provided by the Fisher information matrix. Explicit forms for the Fisher-Rao distance associated to this metric and for the geodesics of general distribution models are usually very hard to determine. In the case of general multivariate normal distributions, lower and upper bounds have been derived. We discuss some of these bounds, introduce a new one, and analyze their tightness in specific cases.

Keywords: Information Geometry, Fisher or Rao distance, bounds for Fisher-Rao distance, multivariate normal distributions.

PACS: 02.40.-k

1. INTRODUCTION

In information geometry a Riemannian manifold of probability distributions is considered, with the metric given by the Fisher information matrix, which was introduced by R. Fisher in 1921. This differential-geometric approach to probability theory first appears in the work of C. Rao in 1945 [10] and has been studied and applied by several authors since then (see [1], [2], [3], [7] and [9]). The distance associated to the Fisher matrix has been called Rao distance, Fisher distance and also Fisher-Rao distance, the name adopted here. This distance, as well as other measures of divergence between two probability distributions such as the Kullback-Leibler dissimilarity, has been applied to different domains such as statistical inference, mathematical programming, image processing and radar signal processing [8]. Inequalities in information theory can be related to geometric inequalities through the Fisher matrix [6]. Explicit forms for the Fisher-Rao distance and for the geodesics of general distribution models are usually very hard to determine. The model of univariate normal distributions, a two-dimensional manifold, is related to the hyperbolic plane, and there is a closed form for the distance in this case. Closed forms for the Fisher-Rao distance can also be derived for submanifolds of the multivariate normal distribution model in the following special cases: i) the covariance matrices are diagonal [7], ii) the covariance matrices are multiples of a fixed matrix [5],

¹ Partially supported by FAPESP (Grant 2013/25977-7), CNPq (Grant 312926/2013-8) and CAPES. Email addresses: [email protected] (João E. Strapasson), [email protected] (Julianna P. S. Porto), [email protected] (Sueli I. R. Costa).

and iii) the mean vector is constant [2]. For general multivariate distributions, upper and lower bounds for the distance were derived in [4] and [5] from connections to the special case ii) mentioned above. In this work we derive another upper bound (Proposition 4.3) using expressions from case i) above and specific isometries of the Fisher metric. As shown in the calculations presented in Figures 1, 2, 3 and 4, this upper bound can be meaningfully tighter than the previously known ones, particularly for distributions with a large discrepancy between the eigenvalues and a large rotation between the eigenvector frames of the covariance matrices.

2. PRELIMINARIES

We consider the statistical model $S = \{p_\theta = p(x;\theta);\ \theta = (\theta_1, \theta_2, \cdots, \theta_m) \in \Theta\}$ of multivariate probability density functions on $\chi = \mathbb{R}^n$. In this model, a natural Riemannian structure [1] can be provided by the Fisher information matrix

$$ g_{ij}(\theta) = E_\theta\!\left[\left(\frac{\partial}{\partial \theta_i}\log p(x;\theta)\right)\left(\frac{\partial}{\partial \theta_j}\log p(x;\theta)\right)\right], $$

where $E_\theta$ is the expected value with respect to the distribution $p_\theta$. This matrix can also be viewed as the Hessian matrix of the Shannon entropy

$$ H(p) = -\int p(x;\theta)\log p(x;\theta)\,dx. $$

This paper approaches the specific model $M = \psi(\Theta)$ of multivariate normal probability density functions on $\chi = \mathbb{R}^n$, with covariance matrix $\Sigma$. This model is a Riemannian manifold of dimension $m = n + \frac{n(n+1)}{2}$, where $\mu = (\theta_1, \theta_2, \cdots, \theta_n)$ is the mean vector, assumed to be a column vector, and $(\theta_{n+1}, \theta_{n+2}, \cdots, \theta_m)$ are parameters for $\Sigma$. The infinitesimal arc length element $ds$ of the metric given by the Fisher information matrix satisfies

$$ ds^2 = \sum_{i,j=1}^{m} g_{ij}(\theta)\,d\theta_i\,d\theta_j. $$
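To make the definition concrete, the following sketch (ours, not from the original paper, whose computations were done in Mathematica) estimates the Fisher matrix of a univariate normal distribution by Monte Carlo and compares it with the closed form $\operatorname{diag}(1/\sigma^2, 2/\sigma^2)$ used in the next section; NumPy is assumed.

```python
import numpy as np

# Monte Carlo check of g_ij = E[(d/dtheta_i log p)(d/dtheta_j log p)]
# for the univariate normal with theta = (mu, sigma).
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

score_mu = (x - mu) / sigma**2                       # d/dmu log p
score_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3  # d/dsigma log p
scores = np.stack([score_mu, score_sigma])

G = scores @ scores.T / x.size  # empirical expectation of the outer product
print(G)  # approximately [[1/sigma^2, 0], [0, 2/sigma^2]] = [[0.25, 0], [0, 0.5]]
```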

The Fisher-Rao distance between two distributions $p_{\theta^1}$ and $p_{\theta^2}$ is then given by the shortest length of a curve in $\mathbb{R}^m$ connecting $\theta^1$ and $\theta^2$. The length of a curve $\alpha$ from $\alpha(t_1) = \theta^1$ to $\alpha(t_2) = \theta^2$ is

$$ \ell(\alpha) = \int_{t_1}^{t_2} \langle \alpha'(t), \alpha'(t) \rangle_G^{\frac{1}{2}}\,dt, $$

where $\langle x, y \rangle_G = x^t [g_{ij}] y$ is the inner product defined by $G$. An explicit expression for the distance between two general multivariate normal distributions is very hard to obtain. In the case $n = 1$, a closed form is known via an association with the classical model of the hyperbolic plane [2], [3], [7]. For $n = 1$, we may consider the mean $\mu$ and the standard deviation $\sigma$ as parameters, $\theta = (\mu, \sigma)$, for the two-dimensional manifold of univariate normal distributions,

$$ p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right), $$

where $\mu \in \mathbb{R}$ and $\sigma \in (0, \infty)$. In the mean × standard deviation half-plane $H^2_F = \mathbb{R} \times (0, +\infty)$, the Fisher information matrix at $\theta = (\mu, \sigma)$ can be calculated as

$$ G_F(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{2}{\sigma^2} \end{pmatrix}. $$

Since in the Poincaré model of the upper half-plane the metric matrix is given by

$$ G(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{\sigma^2} \end{pmatrix}, $$

the Fisher-Rao distance $d_F$ can be expressed in terms of the Poincaré hyperbolic distance $d_H$:

$$ d_F((\mu_1, \sigma_1); (\mu_2, \sigma_2)) = \sqrt{2}\,d_H\!\left(\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right); \left(\frac{\mu_2}{\sqrt{2}}, \sigma_2\right)\right). $$

An analytic expression for $d_F$ is

$$ d_F((\mu_1, \sigma_1); (\mu_2, \sigma_2)) = \sqrt{2}\,\log\frac{\left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, -\sigma_2\right)\right| + \left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, \sigma_2\right)\right|}{\left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, -\sigma_2\right)\right| - \left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, \sigma_2\right)\right|}, $$

where $|\cdot|$ is the standard Euclidean norm in $\mathbb{R}^2$. The geodesics in $H^2_F$ are vertical half-lines and half-ellipses centered at $\sigma = 0$, with eccentricity $\frac{1}{\sqrt{2}}$ [7].
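This closed form is straightforward to evaluate. The following minimal sketch (ours, in Python/NumPy rather than the Mathematica used for the paper's simulations; the function name is illustrative) implements it directly.

```python
import numpy as np

def fisher_rao_univariate(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    a = np.array([mu1 / np.sqrt(2), sigma1])          # point (mu1/sqrt(2), sigma1)
    b = np.array([mu2 / np.sqrt(2), sigma2])
    b_mirror = np.array([mu2 / np.sqrt(2), -sigma2])  # reflected across sigma = 0
    num = np.linalg.norm(a - b_mirror) + np.linalg.norm(a - b)
    den = np.linalg.norm(a - b_mirror) - np.linalg.norm(a - b)
    return np.sqrt(2) * np.log(num / den)

print(fisher_rao_univariate(0.0, 1.0, 0.0, 1.0))  # 0.0: identical distributions
print(fisher_rao_univariate(0.0, 1.0, 2.0, 1.0))  # grows with mean separation
```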

3. MULTIVARIATE NORMAL DISTRIBUTIONS

For multivariate normal distributions,

$$ p(x;\mu,\Sigma) = \frac{(2\pi)^{-\frac{n}{2}}}{\sqrt{\operatorname{Det}(\Sigma)}}\exp\!\left(-\frac{(x-\mu)^t\,\Sigma^{-1}\,(x-\mu)}{2}\right), $$

where $x = (x_1, \cdots, x_n)^t$, $\mu = (\mu_1, \cdots, \mu_n)^t$ is the mean vector and $\Sigma = [a_{ij}]$ is the covariance matrix, we consider the $\left(n + \frac{n(n+1)}{2}\right)$-dimensional statistical model $M = \{p_\theta;\ \theta = (\mu, \Sigma) \in \Theta = \mathbb{R}^n \times P_n(\mathbb{R})\}$, where $P_n(\mathbb{R})$ is the space of positive-definite symmetric matrices of order $n$. The Fisher information metric in $M$ can be expressed as ([3])

$$ ds^2 = d\mu^t\,\Sigma^{-1}\,d\mu + \frac{1}{2}\operatorname{tr}\!\left[(\Sigma^{-1}\,d\Sigma)^2\right]. \quad (1) $$

A closed form for the Fisher-Rao distance $d_F$ in $M$ is known only for some submanifolds.
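The metric (1) makes the length of any explicit curve in $M$ computable, and any such length is an upper bound for the Fisher-Rao distance of its endpoints. A sketch of this computation (ours; names and the interpolating curve are illustrative, NumPy assumed):

```python
import numpy as np

def speed(mu_dot, Sigma, Sigma_dot):
    """Instantaneous speed ds/dt from the metric (1) along (mu(t), Sigma(t))."""
    term_mu = mu_dot @ np.linalg.solve(Sigma, mu_dot)
    B = np.linalg.solve(Sigma, Sigma_dot)  # Sigma^{-1} dSigma/dt
    return np.sqrt(term_mu + 0.5 * np.trace(B @ B))

def curve_length(mu_of_t, Sigma_of_t, t0=0.0, t1=1.0, steps=2000):
    """Numerical length of a curve in M: upper bound for d_F of its endpoints."""
    ts = np.linspace(t0, t1, steps + 1)
    h = 1e-6
    total = 0.0
    for a, b in zip(ts[:-1], ts[1:]):
        t = (a + b) / 2
        mu_dot = (mu_of_t(t + h) - mu_of_t(t - h)) / (2 * h)
        Sigma_dot = (Sigma_of_t(t + h) - Sigma_of_t(t - h)) / (2 * h)
        total += speed(mu_dot, Sigma_of_t(t), Sigma_dot) * (b - a)
    return total

# Straight-line (non-geodesic) interpolation between two bivariate normals:
mu_of_t = lambda t: t * np.array([2.0, 2.0])
Sigma_of_t = lambda t: (1 - t) * np.eye(2) + t * np.diag([1.0, 10.0])
print(curve_length(mu_of_t, Sigma_of_t))  # an upper bound for d_F
```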

The submanifold $M_D$, where $\Sigma$ is diagonal. Let $M_D = \{p_\theta;\ \theta = (\mu, \Sigma) \in \Theta_D\}$, $\Theta_D = \{(\mu, \Sigma) \in \Theta;\ \Sigma \text{ diagonal}\}$, be the submanifold of $M$ composed of distributions with diagonal covariance matrix $\Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \cdots, \sigma_n^2)$, $\sigma_i > 0$ for all $i$. If we consider the parameters $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2, \cdots, \mu_n, \sigma_n)$, the Fisher information matrix is ([7])

$$ \begin{pmatrix} \frac{1}{\sigma_1^2} & 0 & \cdots & 0 & 0 \\ 0 & \frac{2}{\sigma_1^2} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_n^2} & 0 \\ 0 & 0 & \cdots & 0 & \frac{2}{\sigma_n^2} \end{pmatrix}. $$

Note that $\Theta_D$ is a $2n$-dimensional manifold which can be identified with the space $H^{2n}_F = (H^2_F)^n$. The metric in $H^{2n}_F$ is the product metric on the product space of univariate normal distributions. For $\theta^1 = (\mu_{11}, \sigma_{11}, \cdots, \mu_{1n}, \sigma_{1n})$ and $\theta^2 = (\mu_{21}, \sigma_{21}, \cdots, \mu_{2n}, \sigma_{2n})$, the distance between the two probability distributions with these parameters is [7]

$$ d_D(\theta^1, \theta^2) = \sqrt{\,2\sum_{i=1}^{n}\log^2\!\left(\frac{\left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, -\sigma_{2i}\right)\right| + \left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, \sigma_{2i}\right)\right|}{\left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, -\sigma_{2i}\right)\right| - \left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, \sigma_{2i}\right)\right|}\right)}. \quad (2) $$

In this space, a curve $\alpha(t) = (\alpha_1(t), \cdots, \alpha_n(t))$ is a geodesic if and only if each $\alpha_i(t)$ is a geodesic in $H^2_F$.

The submanifold $M_\Sigma$, where $\Sigma$ is constant. In the $n$-dimensional manifold composed of multivariate normal distributions with a common covariance matrix, $M_\Sigma = \{p_\theta;\ \theta = (\mu, \Sigma) \in \Theta,\ \Sigma = \Sigma_0 \in P_n(\mathbb{R}) \text{ constant}\}$, the Fisher-Rao distance is associated with Euclidean geometry and equals the Mahalanobis distance; i.e., the distance between two distributions $\theta^1 = (\mu_1, \Sigma_0)$ and $\theta^2 = (\mu_2, \Sigma_0)$ in this submanifold is

$$ d_\Sigma(\mu_1, \mu_2) = \sqrt{(\mu_1 - \mu_2)^t\,\Sigma_0^{-1}\,(\mu_1 - \mu_2)}. $$

The submanifold $M_\mu$, where $\mu$ is constant. In 1976, S. T. Jensen deduced a closed expression for the Fisher-Rao distance in the $\frac{n(n+1)}{2}$-dimensional totally geodesic submanifold $M_\mu = \{p_\theta;\ \theta = (\mu, \Sigma) \in \Theta_\mu\}$, $\Theta_\mu = \{(\mu, \Sigma);\ \mu = \mu_0 \in \mathbb{R}^n \text{ constant}\} \subset \Theta$, composed of distributions which share the same mean vector $\mu$ [2]:

$$ d_\mu(\Sigma_1, \Sigma_2) = \sqrt{\frac{1}{2}\sum_{i=1}^{n}[\log(\lambda_i)]^2} = d_F((\mu, \Sigma_1), (\mu, \Sigma_2)), $$

where $0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ are the eigenvalues of $\Sigma_1^{-1}\Sigma_2$.

Remark 3.1. Note that the submanifolds $M_\Sigma$ and $M_D$ are not totally geodesic. To see that $M_\Sigma$ is not totally geodesic, it suffices to consider the one-dimensional case [7, Fig. 2]. Note also that, for normal distributions with the same mean and diagonal covariance matrices, $d_\mu \leq d_D$. This means that $d_\Sigma$ and $d_D$ are just upper bounds for the Fisher-Rao distance when the whole space of multivariate normal distributions is considered.
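The three closed forms above can be evaluated directly. The sketch below (ours; function names are illustrative, NumPy assumed) computes $d_D$ from (2), the Mahalanobis distance $d_\Sigma$, and Jensen's $d_\mu$; in the common-mean diagonal example the values of $d_\mu$ and $d_D$ agree, consistently with Remark 3.1.

```python
import numpy as np

def d_diag(mu1, sig1, mu2, sig2):
    """Distance (2) in M_D, built from univariate Fisher-Rao distances."""
    total = 0.0
    for m1, s1, m2, s2 in zip(mu1, sig1, mu2, sig2):
        a = np.array([m1 / np.sqrt(2), s1])
        b = np.array([m2 / np.sqrt(2), s2])
        bm = np.array([m2 / np.sqrt(2), -s2])
        ratio = (np.linalg.norm(a - bm) + np.linalg.norm(a - b)) / \
                (np.linalg.norm(a - bm) - np.linalg.norm(a - b))
        total += 2 * np.log(ratio) ** 2
    return np.sqrt(total)

def d_mahalanobis(mu1, mu2, Sigma0):
    """Fisher-Rao distance in M_Sigma (common covariance Sigma0)."""
    d = mu1 - mu2
    return float(np.sqrt(d @ np.linalg.solve(Sigma0, d)))

def d_jensen(Sigma1, Sigma2):
    """Jensen's closed form in M_mu (common mean)."""
    lam = np.linalg.eigvals(np.linalg.solve(Sigma1, Sigma2)).real
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))

mu = np.zeros(2)
print(d_jensen(np.eye(2), np.diag([4.0, 9.0])))           # common mean
print(d_diag(mu, np.ones(2), mu, np.array([2.0, 3.0])))   # same value here
print(d_mahalanobis(np.zeros(2), np.array([1.0, 1.0]), np.eye(2)))
```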

4. LOWER AND UPPER BOUNDS

In 1990, Calvo and Oller [4] derived a lower bound for the Fisher-Rao distance through an isometric embedding of the parameter space of $M$ into the manifold of positive-definite matrices.

Proposition 4.1. [4] Given $\theta^1 = (\mu_1, \Sigma_1)$ and $\theta^2 = (\mu_2, \Sigma_2)$, let

$$ S_i = \begin{pmatrix} \Sigma_i + \mu_i\mu_i^t & \mu_i \\ \mu_i^t & 1 \end{pmatrix}, \quad i = 1, 2. $$

A lower bound for the distance between $\theta^1$ and $\theta^2$ is

$$ LB(\theta^1, \theta^2) = \sqrt{\frac{1}{2}\sum_{k=1}^{n+1}[\log(\lambda_k)]^2}, \quad (3) $$

where $\lambda_k$, $1 \leq k \leq n+1$, are the eigenvalues of $S_1^{-1}S_2$.

Upper bounds. In another paper [5], the same authors derived an explicit expression for the geodesics in $M$. They could also express the distance between special pairs of points by solving the associated second-order partial differential equation with boundary conditions in these cases. Assuming some restrictions in this system, they calculated the Fisher-Rao distance in $M$ between points in the submanifold $M_\alpha \subset M$,

$$ M_\alpha = \{p_\theta;\ \theta = (\mu, \Sigma) \in \Theta;\ \Sigma = \alpha\Sigma_0,\ \Sigma_0 \in P_n(\mathbb{R}),\ \alpha \in \mathbb{R}^*_+\}. $$

Given two distributions $\theta^1 = (\mu_1, \Sigma_1)$ and $\theta^\alpha = (\mu_2, \alpha\Sigma_1)$, and denoting by $d_\alpha(\cdot,\cdot)$ the restriction of $d_F(\cdot,\cdot)$ to such pairs,

$$ d_\alpha(\theta^1, \theta^\alpha) = \sqrt{2\operatorname{arccosh}^2\!\left(\frac{\sqrt{\alpha}}{2} + \frac{1}{2\sqrt{\alpha}} + \frac{\delta^t\delta}{4\sqrt{\alpha}}\right) + \frac{n-1}{2}\log^2\alpha}, $$

where $\delta = \Sigma_1^{-\frac{1}{2}}(\mu_2 - \mu_1)$.

By considering triangle inequalities and proper values of $\alpha$ they could establish upper bounds for the Fisher-Rao distance. For $\theta^1 = (\mu_1, \Sigma_1)$, $\theta^2 = (\mu_2, \Sigma_2)$ and $\theta^\alpha = (\mu_2, \alpha\Sigma_1)$,

$$ d_F(\theta^1, \theta^2) \leq d_F(\theta^1, \theta^\alpha) + d_F(\theta^\alpha, \theta^2) \leq d_\alpha(\theta^1, \theta^\alpha) + d_\mu(\theta^\alpha, \theta^2). $$

Therefore, $UB_\alpha = d_\alpha(\theta^1, \theta^\alpha) + d_\mu(\theta^\alpha, \theta^2)$ is an upper bound for the Fisher-Rao distance between $\theta^1$ and $\theta^2$. For a first bound, $U1$, it was considered

$$ \alpha = \left\|\Sigma_1^{-\frac{1}{2}}\Sigma_2\Sigma_1^{-\frac{1}{2}}\right\|^{\frac{1}{n}}. \quad (4) $$
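A sketch of the lower bound (3) and of the bound U1 follows (ours, NumPy assumed; the norm in (4) is read here as the spectral norm, i.e. the largest eigenvalue of $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$, a reading that reproduces the U1 values reported in Figure 1).

```python
import numpy as np

def embed(mu, Sigma):
    """Calvo-Oller embedding S = [[Sigma + mu mu^t, mu], [mu^t, 1]]."""
    n = len(mu)
    S = np.empty((n + 1, n + 1))
    S[:n, :n] = Sigma + np.outer(mu, mu)
    S[:n, n] = S[n, :n] = mu
    S[n, n] = 1.0
    return S

def lower_bound(mu1, Sigma1, mu2, Sigma2):
    """Lower bound (3): Jensen-type formula on the embedded matrices."""
    lam = np.linalg.eigvals(np.linalg.solve(embed(mu1, Sigma1),
                                            embed(mu2, Sigma2))).real
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))

def inv_sqrt(Sigma):
    """Symmetric inverse square root Sigma^{-1/2}."""
    w, V = np.linalg.eigh(Sigma)
    return V @ np.diag(w ** -0.5) @ V.T

def d_alpha(mu1, Sigma1, mu2, alpha):
    """Distance between (mu1, Sigma1) and (mu2, alpha * Sigma1)."""
    n = len(mu1)
    delta2 = np.sum((inv_sqrt(Sigma1) @ (mu2 - mu1)) ** 2)
    sa = np.sqrt(alpha)
    inner = sa / 2 + 1 / (2 * sa) + delta2 / (4 * sa)
    return np.sqrt(2 * np.arccosh(inner) ** 2 + (n - 1) / 2 * np.log(alpha) ** 2)

def d_jensen(Sigma1, Sigma2):
    lam = np.linalg.eigvals(np.linalg.solve(Sigma1, Sigma2)).real
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))

def upper_bound_U1(mu1, Sigma1, mu2, Sigma2):
    """U1: alpha fixed as in (4) (spectral norm, our reading)."""
    n = len(mu1)
    A = inv_sqrt(Sigma1) @ Sigma2 @ inv_sqrt(Sigma1)
    alpha = np.linalg.norm(A, 2) ** (1.0 / n)
    return d_alpha(mu1, Sigma1, mu2, alpha) + d_jensen(alpha * Sigma1, Sigma2)

# tau = 0 point of Figure 1: LB ~ 2.5322, U1 ~ 3.3876.
mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([2.0, 2.0]), np.diag([1.0, 10.0])
print(lower_bound(mu1, S1, mu2, S2), upper_bound_U1(mu1, S1, mu2, S2))
```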

A second bound, $U2$, was proposed through a numerical minimization process:

$$ U2 = \min_{\alpha}\{d_\alpha(\theta^1, \theta^\alpha) + d_\mu(\theta^\alpha, \theta^2)\}. \quad (5) $$
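U2 can be approximated with any one-dimensional minimizer. A sketch using SciPy, building on the functions d_alpha and d_jensen from the previous sketch (ours; the optimization is done in log α to keep α > 0):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def upper_bound_U2(mu1, Sigma1, mu2, Sigma2):
    """U2 (5): minimize d_alpha + d_mu over alpha > 0 (search in log alpha)."""
    def bound(log_alpha):
        alpha = np.exp(log_alpha)
        return d_alpha(mu1, Sigma1, mu2, alpha) + d_jensen(alpha * Sigma1, Sigma2)
    return minimize_scalar(bound).fun  # Brent's method on the real line

# tau = 0 point of Figure 1: U2 ~ 3.3844, slightly below U1.
print(upper_bound_U2(np.zeros(2), np.eye(2),
                     np.array([2.0, 2.0]), np.diag([1.0, 10.0])))
```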

The new upper bound we propose next is based on special isometries of the manifold $M$ and on the distance (2) in the submanifold $M_D$.

Proposition 4.2. For any pair of multivariate normal distributions $\theta^1 = (\mu_1, \Sigma_1)$ and $\theta^2 = (\mu_2, \Sigma_2)$ we can assert

$$ d_F(\theta^1, \theta^2) = d_F((0, I_n); (\mu_3, D_3)), \quad (6) $$

where $0$ is the zero mean vector, $I_n$ is the identity matrix, $D_3$ is the diagonal matrix given by the eigenvalues of $A = \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$, $Q$ is the orthogonal matrix whose columns are the respective eigenvectors of $A$ ($A = QD_3Q^t$) and $\mu_3 = Q^t\Sigma_1^{-1/2}(\mu_2 - \mu_1)$.

Proof: We first remark, as pointed out in [3], that the mapping $\psi(\mu, \Sigma) = (Q\mu + c, Q\Sigma Q^t) = (\tilde{\mu}, \tilde{\Sigma})$ is an isometry of the manifold $M$ for any $c \in \mathbb{R}^n$ and any non-singular $n \times n$ matrix $Q$: substituting into the arc length expression (1), and using $\tilde{\Sigma}^{-1} = Q^{-t}\Sigma^{-1}Q^{-1}$,

$$ d\tilde{s}^2 = d\mu^t\,Q^t Q^{-t}\Sigma^{-1}Q^{-1}Q\,d\mu + \frac{1}{2}\operatorname{tr}\!\left[(Q^{-t}\Sigma^{-1}Q^{-1}\,Q\,d\Sigma\,Q^t)^2\right] = ds^2. $$

Besides, considering the diagonalization $\Sigma = ODO^t$ and $\Sigma^{1/2} = OD^{1/2}O^t$, for $Q = \Sigma_1^{-1/2}$ and $c = -\Sigma_1^{-1/2}\mu_1$ we get an isometry $\psi_1$ such that $\psi_1(\mu_1, \Sigma_1) = (0, I_n)$. Let $\psi_1(\mu_2, \Sigma_2) = (\bar{\mu}_2, \bar{\Sigma}_2)$, with $\bar{\Sigma}_2 = \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} = A = QD_3Q^t$. We now consider another isometry $\psi_2$ in $M$ in order to take the eigenvectors of $\bar{\Sigma}_2$ as the coordinate frame: for the orthogonal matrix $Q$ above and $c = 0$, let $\psi_2(\mu, \Sigma) = (Q^t\mu, Q^t\Sigma Q)$. For the isometry $\psi_3 = \psi_2 \circ \psi_1$ we then have $\psi_3(\mu_1, \Sigma_1) = (0, I_n)$ and $\psi_3(\mu_2, \Sigma_2) = (\mu_3, D_3)$, where $\mu_3 = Q^t\Sigma_1^{-1/2}(\mu_2 - \mu_1)$ and $D_3 = Q^t\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}Q$, and therefore (6) is derived. □

By considering the distance (2) in the submanifold $M_D$ of normal distributions with diagonal covariance matrix (which is not totally geodesic, as pointed out in Remark 3.1) together with the above proposition, we get a closed form for an upper bound on the Fisher-Rao distance.

Proposition 4.3. The Fisher-Rao distance between two multivariate normal distributions $\theta^1 = (\mu_1, \Sigma_1)$ and $\theta^2 = (\mu_2, \Sigma_2)$ is upper bounded by

$$ U3(\theta^1, \theta^2) = \sqrt{\,2\sum_{i=1}^{n}\log^2\!\left(\frac{\sqrt{\frac{\mu_{3i}^2}{2} + \left(1 + \sqrt{D_{3i}}\right)^2} + \sqrt{\frac{\mu_{3i}^2}{2} + \left(1 - \sqrt{D_{3i}}\right)^2}}{\sqrt{\frac{\mu_{3i}^2}{2} + \left(1 + \sqrt{D_{3i}}\right)^2} - \sqrt{\frac{\mu_{3i}^2}{2} + \left(1 - \sqrt{D_{3i}}\right)^2}}\right)}, \quad (7) $$

where $\mu_{3i}$ are the coordinates of $\mu_3$ and $D_{3i}$ are the diagonal entries of $D_3$, as established in the last proposition; that is, (7) is the distance (2) between $(0, I_n)$ and $(\mu_3, D_3)$, with standard deviations $\sigma_i = \sqrt{D_{3i}}$.

Simulations. In the next figures we explore simulations using the Mathematica program. We consider the bivariate normal distribution model ($n = 2$), a fixed $\theta^1 = (0, I_2)$ and a variable $\theta^2 = (\mu_2, \Sigma_2)$, reduced according to Proposition 4.2. A comparison is shown between the lower bound LB (3), the upper bound U1 (4), the upper bound U2 (5), where a numerical optimization process was used, and our closed-form upper bound U3 (7). We consider $\Sigma_2 = OD_2O^t$, with eigenvalues $\lambda_1$ and $\lambda_2$, and $O = O(\tau)$ the rotation matrix given by the eigenvectors of $\Sigma_2$. In Figure 1, the mean $\mu_2 = (2, 2)$ is fixed, $\lambda_1 = 1$, $\lambda_2 = 10$ and $\tau$ varies from $0$ to $\pi/4$. In Figure 2, the mean $\mu_2 = (16, 16)$ is fixed, $\lambda_1 = 1$, $\lambda_2 = 10$ and $\tau$ varies from $0$ to $\pi/4$. In Figure 3, the mean $\mu_2 = (4, 4)$ and the angle $\tau = 0$ are fixed, $\lambda_1 = k$, $\lambda_2 = 10k$ and $k$ varies from $1$ to $8$. In Figure 4, the mean $\mu_2 = (0.5, 0.5)$ and the angle $\tau = \pi/4$ are fixed, $\lambda_1 = 1$ and $\lambda_2 = 0.5 + 0.1j$, where $j$ varies from $0$ to $8$.
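For reference, a sketch of U3 (ours; the paper's own simulations used Mathematica). It reduces $(\theta^1, \theta^2)$ as in Proposition 4.2 and then evaluates (7); at the $\tau = 0$ point of Figure 1 it returns approximately 2.6656, matching the table below.

```python
import numpy as np

def upper_bound_U3(mu1, Sigma1, mu2, Sigma2):
    """Closed-form upper bound (7): distance (2) after the reduction of Prop. 4.2."""
    w, V = np.linalg.eigh(Sigma1)
    isq = V @ np.diag(w ** -0.5) @ V.T  # Sigma1^{-1/2}
    A = isq @ Sigma2 @ isq
    D3, Q = np.linalg.eigh(A)           # A = Q diag(D3) Q^t
    mu3 = Q.T @ isq @ (mu2 - mu1)
    total = 0.0
    for m, v in zip(mu3, D3):           # v is a variance; sqrt(v) is a std. dev.
        a = np.array([0.0, 1.0])        # reduced first point (0, I_n), coordinatewise
        b = np.array([m / np.sqrt(2), np.sqrt(v)])
        bm = np.array([m / np.sqrt(2), -np.sqrt(v)])
        ratio = (np.linalg.norm(a - bm) + np.linalg.norm(a - b)) / \
                (np.linalg.norm(a - bm) - np.linalg.norm(a - b))
        total += 2 * np.log(ratio) ** 2
    return np.sqrt(total)

# Figure 1 at tau = 0: mu2 = (2, 2), Sigma2 with eigenvalues 1 and 10.
print(upper_bound_U3(np.zeros(2), np.eye(2),
                     np.array([2.0, 2.0]), np.diag([1.0, 10.0])))  # ~ 2.6656
```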

FIGURE 1. Bounds LB, U1, U2 and U3 versus the rotation angle τ. The mean µ2 = (2, 2) is fixed, λ1 = 1, λ2 = 10 and τ varies from 0 to π/4. [Plot replaced by its values:]

τ:  0       π/36    π/18    π/12    π/9     5π/36   π/6     7π/36   2π/9    π/4
LB: 2.5322  2.4724  2.4099  2.3467  2.2850  2.2278  2.1782  2.1396  2.1151  2.1066
U1: 3.3876  3.3876  3.3876  3.3876  3.3876  3.3876  3.3876  3.3876  3.3876  3.3876
U2: 3.3844  3.3844  3.3844  3.3844  3.3844  3.3844  3.3844  3.3844  3.3844  3.3844
U3: 2.6656  2.5943  2.5185  2.4404  2.3632  2.2905  2.2269  2.1770  2.1451  2.1341

FIGURE 2. Bounds LB, U1, U2 and U3 versus the rotation angle τ. The mean µ2 = (16, 16) is fixed, λ1 = 1, λ2 = 10 and τ varies from 0 to π/4. [Plot replaced by its values:]

τ:  0       π/36    π/18    π/12    π/9     5π/36   π/6     7π/36   2π/9    π/4
LB: 6.0767  5.9865  5.8866  5.7780  5.6626  5.5443  5.4301  5.3309  5.2616  5.2364
U1: 8.2474  8.2474  8.2474  8.2474  8.2474  8.2474  8.2474  8.2474  8.2474  8.2474
U2: 8.0495  8.0495  8.0495  8.0495  8.0495  8.0495  8.0495  8.0495  8.0495  8.0495
U3: 8.7179  8.6436  8.5229  8.3511  8.1219  7.8266  7.4554  7.0057  6.5274  6.2732

FIGURE 3. Bounds LB, U1, U2 and U3 versus k. The mean µ2 = (4, 4) and the angle τ = 0 are fixed, λ1 = k, λ2 = 10k and k varies from 1 to 8. [Plot replaced by its values:]

k:  1       2       3       4       5       6       7       8
LB: 3.5226  3.5236  3.5983  3.6839  3.7679  3.8476  3.9223  3.9921
U1: 4.6688  4.6327  4.6775  4.7362  4.7959  4.8533  4.9075  4.9585
U2: 4.6291  4.5446  4.5596  4.5995  4.6469  4.6961  4.7448  4.7920
U3: 4.0909  3.8838  3.8568  3.8819  3.9260  3.9776  4.0315  4.0855

FIGURE 4. Bounds LB, U1, U2 and U3 versus j. The mean µ2 = (0.5, 0.5) and the angle τ = π/4 are fixed, λ1 = 1 and λ2 = 0.5 + 0.1j, where j varies from 0 to 8. [Plot replaced by its values, listed in order of increasing j:]

LB: 0.8998  0.8214  0.7666  0.7294  0.7060  0.6931  0.6886  0.6904  0.6970  0.7073  0.7203
U1: 1.2237  1.1102  1.0205  0.9504  0.8986  0.8657  0.8530  0.8589  0.8781  0.9052  0.9363
U2: 1.1373  1.0174  0.9181  0.8341  0.7621  0.6999  0.7408  0.7815  0.8213  0.8598  0.8970
U3: 0.9107  0.8325  0.7776  0.7403  0.7166  0.7035  0.6985  0.6999  0.7061  0.7159  0.7285

In these simulations we can see that the upper bound U3 introduced here is tight for small variations (where the distance in the submanifold approaches the Fisher-Rao distance). We also see a much better tightness of this bound, when compared with U1 or even with the optimized U2, in the case of a bigger discrepancy between the eigenvalues and/or a bigger rotation angle between the eigenvectors of Σ2. This was somehow expected, since U1 and U2 were derived using the distance between distributions whose covariance matrices are multiples of each other.

REFERENCES

1. S. Amari and H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, Vol. 191, Am. Math. Soc., 2000.
2. C. Atkinson and A. F. S. Mitchell, Rao's Distance Measure, Sankhyā: The Indian Journal of Statistics, 43:345-365, 1981.
3. J. Burbea, Informative geometry of probability spaces, Expositiones Mathematicae, 4:347-378, 1986.
4. M. Calvo and J. M. Oller, A distance between multivariate normal distributions based in an embedding into the Siegel group, Journal of Multivariate Analysis, 35(2):223-242, 1990.
5. M. Calvo and J. M. Oller, An explicit solution of information geodesic equations for the multivariate normal model, Statistics and Decisions, 9:119-138, 1991.
6. M. H. M. Costa and T. M. Cover, On the Similarity of the Entropy Power Inequality and the Brunn-Minkowski Inequality, IEEE Trans. Inform. Theory, 30(6):837-839, 1984.
7. S. I. R. Costa, S. A. Santos, and J. E. Strapasson, Fisher information distance: a geometrical reading, arXiv preprint arXiv:1210.2354, 2012.
8. F. Nielsen and F. Barbaresco (Eds.), Geometric Science of Information, First International Conference, GSI 2013, Paris, France, August 2013, Lecture Notes in Computer Science 8085, 2013.
9. F. Nielsen and R. Bhatia, Matrix Information Geometry, Springer-Verlag, Berlin, Heidelberg, 2013.
10. C. R. Rao, Information and the accuracy attainable in the estimation of statistical parameters, Bulletin of the Calcutta Math. Soc., 37:81-91, 1945.