Information geometry of Bayesian statistics

MATSUZOE Hiroshi, Nagoya Institute of Technology

1. Introduction
2. Geometry of statistical models
3. Bayesian inference of curved exponential families
4. Equiaffine structures and Tchebychev vector fields
5. Deformed exponential families


Example (Bernoulli trials)
Ω = {0, 1}; x = 1: success event, x = 0: failure event.
η: success probability (1 − η: failure probability).
p(x; η) = η^x (1 − η)^{1−x}: the Bernoulli distribution.
Suppose that η is unknown, and let us infer the parameter η from experiments (trials).

500 trials, 298 success events  =⇒  η̂ = 298/500 ≈ 3/5   (the maximum likelihood estimator)
1 trial, 1 success event  =⇒  η̂ = 1 (?)

We may instead answer 3/4 or 2/3: a Bayesian estimator, obtained from

a prior                 ⇐⇒  a volume element
a non-informative prior ⇐⇒  a parallel volume element

A prior distribution gives a weight to the parameter space.


Example (Deformed expectations)

Student's t-distribution (q-normal distribution):
$$p(x;\mu,\sigma) = \frac{1}{Z_q}\left[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]_+^{\frac{1}{1-q}}$$

  q          distribution   mean   variance
  1          normal          ◦       ◦
  1 < q < 2  Student t       ◦       ◦ (for q < 5/3)
  2          Cauchy          ×       ×

(For q ≥ 5/3 the variance does not exist; for q ≥ 2 the mean does not exist.)

Escort distribution (in anomalous statistics):
P_q(x): the escort distribution of p(x)
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad P_q(x) = \frac{1}{Z_q(p)}\,p(x)^q, \qquad Z_q(p) = \int_\Omega p(x)^q\,dx$$

E_{q,p}[f(x)]: the q-expectation of the random variable f(x)
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad E_{q,p}[f(x)] = \int_\Omega f(x)\,P_q(x)\,dx = \frac{1}{Z_q(p)}\int_\Omega f(x)\,p(x)^q\,dx$$

An escort distribution gives a weight to the sample space.
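As a quick numerical illustration (a sketch, not part of the original slides), here is a minimal Python implementation of the escort distribution and the q-expectation for a discrete distribution; the helper names, the distribution p, and the values of q are ad hoc choices for the example. At q = 1 the escort distribution coincides with p itself, so the q-expectation reduces to the ordinary mean.

```python
import numpy as np

def escort(p, q):
    """Escort distribution P_q(x) = p(x)^q / Z_q(p) for a discrete density p."""
    pq = p ** q
    return pq / pq.sum()

def q_expectation(f, p, q):
    """q-expectation E_{q,p}[f] = sum_x f(x) P_q(x)."""
    return np.sum(f * escort(p, q))

# example: a skewed distribution on {0, 1, 2, 3}
p = np.array([0.1, 0.2, 0.3, 0.4])
x = np.arange(4)
for q in (0.5, 1.0, 2.0):
    print(q, q_expectation(x, p, q))   # q = 1 recovers the ordinary mean 2.0
```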

2  Geometry of statistical models





Definition 2.1.  S is a statistical model (or a parametric model) on Ω
def ⇐⇒ S is a set of probability densities with parameter ξ ∈ Ξ such that
$$S = \left\{\, p(x;\xi) \;\middle|\; \int_\Omega p(x;\xi)\,dx = 1,\ p(x;\xi) > 0,\ \xi \in \Xi \subset \mathbb{R}^n \,\right\}.$$





We regard S as a manifold with a local coordinate system {Ξ; ξ^1, …, ξ^n}.



g^F = (g^F_{ij}) is the Fisher metric (Fisher information matrix) of S
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad g^F_{ij}(\xi) := \int_\Omega \frac{\partial}{\partial\xi^i}\log p(x;\xi)\,\frac{\partial}{\partial\xi^j}\log p(x;\xi)\;p(x;\xi)\,dx
= \int_\Omega \partial_i p_\xi \left(\frac{\partial}{\partial\xi^j}\log p_\xi\right) dx
= E_\xi[\partial_i l_\xi\,\partial_j l_\xi],$$
where
∂_i p_ξ  def ⇐⇒  the mixture representation,
∂_i l_ξ = ∂_i p_ξ / p_ξ  def ⇐⇒  the exponential representation (the score function).
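As a sanity check (a sketch, not from the slides), the Fisher metric can be computed directly from the score for the Bernoulli model of the earlier example and compared with the closed form 1/(η(1 − η)); the function name and the value of η are arbitrary.

```python
import numpy as np

def fisher_bernoulli(eta):
    """g(eta) = E[(d/d eta log p(x; eta))^2] for the Bernoulli model p(x; eta)."""
    score = np.array([-1.0 / (1.0 - eta), 1.0 / eta])   # score at x = 0 and x = 1
    prob = np.array([1.0 - eta, eta])
    return np.sum(prob * score ** 2)

eta = 0.3
print(fisher_bernoulli(eta), 1.0 / (eta * (1.0 - eta)))  # both equal 1/(eta(1-eta))
```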






For α ∈ R, we define the α-connection ∇^{(α)} by
$$\Gamma^{(\alpha)}_{ij,k}(\xi) := g(\nabla^{(\alpha)}_{\partial_i}\partial_j,\,\partial_k)
:= E_\xi\!\left[\left(\partial_i\partial_j l_\xi + \frac{1-\alpha}{2}\,\partial_i l_\xi\,\partial_j l_\xi\right)(\partial_k l_\xi)\right].$$

∇^{(e)} := ∇^{(1)}: the exponential connection
∇^{(m)} := ∇^{(−1)}: the mixture connection

(1) ∂_i g(∂_j, ∂_k) = g(∇^{(α)}_{∂_i}∂_j, ∂_k) + g(∂_j, ∇^{(−α)}_{∂_i}∂_k)
    (∇^{(α)} and ∇^{(−α)} are called dual (or conjugate) with respect to g)
(2) g(∇^{(α)}_{∂_i}∂_j, ∂_k) = g(∇^{(0)}_{∂_i}∂_j, ∂_k) − (α/2) T^F(∂_i, ∂_j, ∂_k)

T^F_ξ(∂_i, ∂_j, ∂_k) := E_ξ[(∂_i l_ξ)(∂_j l_ξ)(∂_k l_ξ)]: the skewness or the cubic form.
(S, g^F, T^F) (or (S, ∇^{(α)}, g^F)) is called an invariant statistical manifold.

A statistical model S_e is an exponential family
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad S_e = \left\{\, p(x;\theta) \;\middle|\; p(x;\theta) = \exp\Big[\,Z(x) + \sum_{i=1}^n \theta^i F_i(x) - \psi(\theta)\Big] \,\right\},$$
where Z, F_1, …, F_n are functions on Ω, ψ is a function on the parameter space Θ, and {θ^i} are the natural parameters.




Normal distributions
Ω = R, n = 2, ξ = (µ, σ) ∈ R²₊ (the upper half plane).
$$S = \left\{\, p(x;\mu,\sigma) \;\middle|\; p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \right\}$$
The Fisher metric is
$$(g_{ij}) = \frac{1}{\sigma^2}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},$$
so S is a space of constant negative curvature −1/2.
∇^{(1)} and ∇^{(−1)} are flat affine connections. In addition, setting
$$\theta^1 = \frac{\mu}{\sigma^2},\qquad \theta^2 = -\frac{1}{2\sigma^2},\qquad
\psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2}\log\left(-\frac{\pi}{\theta^2}\right)$$
$$\Longrightarrow\quad p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = \exp\left[x\theta^1 + x^2\theta^2 - \psi(\theta)\right].$$
{θ^1, θ^2}: natural parameters (a ∇^{(1)}-geodesic coordinate system).
η_1 = E[x] = µ,  η_2 = E[x²] = σ² + µ².
{η_1, η_2}: moment parameters (a ∇^{(−1)}-geodesic coordinate system).
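A small numerical sketch (not from the slides; it uses only the formulas above) converting (µ, σ) to natural and moment parameters and checking that the gradient of ψ reproduces the moment parameters, η_i = ∂ψ/∂θ^i; the sample values of µ and σ are arbitrary.

```python
import numpy as np

def natural_params(mu, sigma):
    return np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])

def psi(theta):
    t1, t2 = theta
    return -t1**2 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def moment_params(mu, sigma):
    return np.array([mu, sigma**2 + mu**2])

mu, sigma = 1.5, 0.7
theta = natural_params(mu, sigma)

# numerical gradient of psi reproduces the moment parameters (eta_i = d psi / d theta^i)
eps = 1e-6
grad = np.array([(psi(theta + eps * e) - psi(theta - eps * e)) / (2 * eps) for e in np.eye(2)])
print(grad)                       # ~ [1.5, 2.74]
print(moment_params(mu, sigma))   # [1.5, 2.74]
```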




Bernoulli distributions
Ω = {0, 1}, n = 1, ξ = η.
Z(x) = 0,  F(x) = x,  θ = log(η/(1 − η)),  ψ(θ) = − log(1 − η) = log(1 + e^θ).
Then we obtain
$$p(x;\xi) = \eta^x(1-\eta)^{1-x} = \exp\left[\log\big(\eta^x(1-\eta)^{1-x}\big)\right] = \exp\left[x\theta - \psi(\theta)\right].$$
This implies that the Bernoulli distributions form an exponential family.
The expectation parameter is E[x] = 1·η + 0·(1 − η) = η.
The Fisher metric is g(η) = 1/(η(1 − η)).
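For this Bernoulli example, a short numerical check (a sketch, not from the slides): ψ(θ) = log(1 + e^θ) satisfies ψ′(θ) = η and ψ″(θ) = η(1 − η); the latter is the Fisher information expressed in the natural coordinate θ, consistent with g(η) = 1/(η(1 − η)) under the coordinate change dθ/dη = 1/(η(1 − η)). The value of η is arbitrary.

```python
import numpy as np

def psi(theta):
    return np.log1p(np.exp(theta))           # psi(theta) = log(1 + e^theta)

eta = 0.3
theta = np.log(eta / (1.0 - eta))            # natural parameter

eps = 1e-5
dpsi = (psi(theta + eps) - psi(theta - eps)) / (2 * eps)
d2psi = (psi(theta + eps) - 2 * psi(theta) + psi(theta - eps)) / eps**2

print(dpsi, eta)                             # psi'(theta)  = eta
print(d2psi, eta * (1.0 - eta))              # psi''(theta) = eta (1 - eta)
```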


Statistical inference for curved exponential families
S: an exponential family
M: a curved exponential family embedded into S
x_1, …, x_N: N independent observations generated by p(x; u) ∈ M; set x^N = (x_1, …, x_N).

L: the likelihood function
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad L(u) = p(x_1;u)\cdots p(x_N;u) = \prod_{i=1}^N p(x_i;u) = p(x^N;u)$$
û: the maximum likelihood estimator
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad \hat u = \operatorname*{argmax}_{u\in U} L(u)$$


Suppose that p(x; θ), p(x; θ′) ∈ S.

D: the Kullback–Leibler divergence (or the relative entropy) of S
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad D \text{ is a function on } S\times S \text{ such that}\quad
D(p(\theta)\,\|\,p(\theta')) = \int_\Omega p(\theta)\log\frac{p(\theta)}{p(\theta')}\,dx.$$

$$\bar x = \frac{1}{N}\sum_{i=1}^N x_i \quad \text{(the sample mean of } x^N\text{)}, \qquad
\hat\eta_i = \frac{1}{N}\sum_{j=1}^N F_i(x_j) \quad \text{(the sample mean of } F_i\text{)},$$
$$\varphi(\theta) = E_\theta[\log p(\theta)] \quad (-\varphi(\theta) \text{ is the entropy of } p(\theta)).$$

Then the Kullback–Leibler divergence is given by
$$D(p(\hat\eta)\,\|\,p(u)) = \varphi(\hat\eta) - \frac{1}{N}\log L(u).$$
The maximum likelihood estimator û is the point in M which minimizes the divergence from p(η̂).
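A numerical sketch of this identity (not from the slides), for the Bernoulli family, where Z(x) = 0 and the relation holds exactly; the sample size, true parameter, and candidate u are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eta_true = 200, 0.35
x = (rng.random(N) < eta_true).astype(float)   # Bernoulli sample

eta_hat = x.mean()                             # sample mean, i.e. hat(eta)
u = 0.5                                        # an arbitrary candidate parameter

# KL divergence D(p(eta_hat) || p(u)) for the Bernoulli family
D = eta_hat * np.log(eta_hat / u) + (1 - eta_hat) * np.log((1 - eta_hat) / (1 - u))

# phi(eta_hat) = E_{eta_hat}[log p(eta_hat)]  (minus the entropy)
phi = eta_hat * np.log(eta_hat) + (1 - eta_hat) * np.log(1 - eta_hat)

log_lik = np.sum(x * np.log(u) + (1 - x) * np.log(1 - u))

print(D, phi - log_lik / N)                    # the two values agree
```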


3  Bayesian inference of curved exponential families

S: an exponential family
M: a curved exponential family embedded into S
x^N: N independent observations generated by p(x; θ(u)) ∈ M

ρ(u)du: a prior distribution; e.g. ρ̃^{(0)}du: the Jeffreys prior of M,
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad \tilde\rho^{(0)}\,du = \frac{(\det|g_{ab}|)^{1/2}}{\int_U (\det|g_{ab}|)^{1/2}\,du}\,du,$$
where g is the Fisher metric of M.

ρ′(u|x): a posterior distribution
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad \rho'(u|x) = \frac{p(x;u)\,\rho(u)}{\int_U p(x;u)\,\rho(u)\,du}$$
f_ρ[x^N](x): a Bayesian mixture distribution
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad f_\rho[x^N](x) = \int_U p(x;u)\,\rho'(u|x^N)\,du$$
u(f̃_ρ[x^N]): a projected Bayesian estimator
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad u\big(\tilde f_\rho[x^N]\big) = \operatorname*{argmin}_{u\in U} D\big(f_\rho[x^N]\,\|\,p(x;u)\big)$$






Example (Bernoulli trial)
Ω = {0, 1},  p(x; η) = η^x(1 − η)^{1−x}
η: an expectation parameter;  θ = log(η/(1 − η)): a natural parameter
g(η) = 1/(η(1 − η)): the Fisher information with respect to η

  prior                     dθ                   Jeffreys        dη
  density ρ(η) w.r.t. dη    dθ/dη = 1/(η(1−η))   1/√(η(1−η))     1

where dθ and dη are the uniform priors with respect to θ and η, respectively.
Uniformity depends on the choice of local coordinate system.
What is a good prior from the viewpoint of geometry?


α-parallel priors

Recall the Bayes formula:
$$\rho'(u|x) = \frac{p(x;u)\,\rho(u)}{\int_U p(x;u)\,\rho(u)\,du}$$
The integral is carried out on the parameter space
=⇒ a prior distribution can be regarded as a volume element on M.

M: a statistical model
g: the Fisher metric on M
∇^{(0)}: the Levi-Civita connection with respect to g
ω^{(0)}: the Jeffreys prior distribution

Proposition 3.1.  ∇^{(0)}ω^{(0)} = 0.

Definition 3.2.  ω^{(α)} is an α-(parallel) prior  def ⇐⇒  ∇^{(α)}ω^{(α)} = 0.

For an exponential family:  dθ ↔ the 1-parallel prior,  dη ↔ the (−1)-parallel prior.
a prior                 ⇐⇒  a volume element
a non-informative prior ⇐⇒  a parallel volume element




Example (Bernoulli trial)
Ω = {0, 1},  p(x; η) = η^x(1 − η)^{1−x}
η: an expectation parameter;  θ = log(η/(1 − η)): a natural parameter
g(η) = 1/(η(1 − η)): the Fisher information with respect to η

  prior                                               dθ          Jeffreys         dη
  density ρ(η) w.r.t. dη                           1/(η(1−η))   1/√(η(1−η))        1
  projected Bayes estimator (N = 1, k = 1 success)     1            3/4           2/3
  projected Bayes estimator (general case)            k/N       (k+1/2)/(N+1)   (k+1)/(N+2)
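The table can be reproduced by a short computation (a sketch, not from the slides), under the assumption, consistent with the values above, that for the full Bernoulli family the projected Bayes estimator reduces to the posterior mean of η; the priors dθ, Jeffreys, and dη then correspond to Beta(0, 0) (improper), Beta(1/2, 1/2), and Beta(1, 1) densities in η.

```python
def posterior_mean(k, N, a, b):
    """Posterior mean of eta under a Beta(a, b) prior after k successes in N trials."""
    return (k + a) / (N + a + b)

priors = {
    "d theta (Beta(0, 0), improper)": (0.0, 0.0),
    "Jeffreys (Beta(1/2, 1/2))":      (0.5, 0.5),
    "d eta (Beta(1, 1))":             (1.0, 1.0),
}

k, N = 1, 1
for name, (a, b) in priors.items():
    print(name, posterior_mean(k, N, a, b))
# d theta -> 1.0, Jeffreys -> 0.75, d eta -> 0.666..., matching the table above
```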

4  Equiaffine structures and Tchebychev structures

M: an n-dimensional manifold
∇: a torsion-free affine connection on M
ω: a volume element of M (i.e. an n-th differential form on M)

Definition 4.1.  {∇, ω} is a (locally) equiaffine structure on M
def ⇐⇒  ∇ω = 0.
∇ is called a (locally) equiaffine connection, and ω is called a parallel volume element.

If a manifold M has an equiaffine structure, then M has a uniformly distributed volume form.
An α-parallel prior is a uniform prior distribution with respect to the given α-connection.
In particular, the Jeffreys prior is the uniform distribution with respect to the Levi-Civita connection of the Fisher metric.


M: a manifold
g: a Riemannian metric on M
T: a totally symmetric (0, 3)-tensor field on M (the skewness tensor field, or cubic form)

Definition 4.2.  (M, g, T) is called a statistical manifold.

For fixed α ∈ R, an α-connection is defined by
$$g(\nabla^{(\alpha)}_X Y, Z) := g(\nabla^{(0)}_X Y, Z) - \frac{\alpha}{2}\,T(X, Y, Z),$$
where ∇^{(0)} is the Levi-Civita connection with respect to g.

Definition 4.3.  (M, g, T): a statistical manifold.
τ: the Tchebychev form of (M, g, T);  #τ: the Tchebychev vector field of (M, g, T)
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad \tau(X) := \operatorname{trace}_g\{(Y, Z) \mapsto T(X, Y, Z)\}, \qquad g(\#\tau, X) := \tau(X).$$

−(α/2)τ is the first Koszul form if (M, ∇^{(α)}, g) is a Hessian manifold.




Proposition 4.4.  (M, g, T): a statistical manifold; ∇^{(α)}, ∇^{(−α)}: the affine connections determined by g and T; φ: a function on M which determines the Tchebychev form (τ = dφ). Then
{∇^{(α)}, ω}: equiaffine  ⇐⇒  {∇^{(−α)}, e^{−αφ}ω}: equiaffine.

The Tchebychev vector field is the gradient vector field of the logarithmic ratio of volumes.

Proposition 4.5.
û: the maximum likelihood estimator (MLE)
ĝ, T̂: the Fisher metric and the cubic form evaluated at the MLE
u(f̃^{(α)}[x^N]): the projected Bayes estimator with respect to an α-parallel prior
$$\Longrightarrow\quad u^c\big(\tilde f^{(\alpha)}[x^N]\big) = \hat u^c + \frac{1-\alpha}{2N}\,\hat T_{abd}\,\hat g^{ab}\hat g^{cd} + o\!\left(\frac{1}{N}\right)
= \hat u^c + \frac{1-\alpha}{2N}\,{}^{\#}\hat\tau^{\,c} + o\!\left(\frac{1}{N}\right).$$
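As a sanity check (a sketch, not on the slides): for the Bernoulli model one computes g = 1/(η(1 − η)) and T = (1 − 2η)/(η²(1 − η)²), so the Tchebychev vector field is #τ = 1 − 2η. Proposition 4.5 then reproduces the projected Bayes estimators of the Bernoulli table in Section 3 up to o(1/N); the counts k and N below are arbitrary.

```python
def exact_estimator(k, N, alpha):
    """Exact projected Bayes estimator for the Bernoulli model (table in Section 3)."""
    if alpha == 1:                  # d theta prior
        return k / N
    if alpha == 0:                  # Jeffreys prior
        return (k + 0.5) / (N + 1)
    if alpha == -1:                 # d eta prior
        return (k + 1.0) / (N + 2)

def asymptotic_estimator(k, N, alpha):
    """MLE + (1 - alpha)/(2N) * #tau, with #tau = 1 - 2*eta_hat for the Bernoulli model."""
    eta_hat = k / N
    return eta_hat + (1 - alpha) / (2 * N) * (1 - 2 * eta_hat)

k, N = 30, 100
for alpha in (1, 0, -1):
    print(alpha, exact_estimator(k, N, alpha), asymptotic_estimator(k, N, alpha))
# the two columns agree up to o(1/N)
```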



4.2  Equiaffine structures on submanifolds

(S, g, T): a statistical manifold with a flat affine connection ∇
(M, g̃, ∇̃): a submanifold of S
{ν_β}: a basis of the normal space of M
T, T̃: the skewness tensor fields of S and M
(u^1, …, u^n): a local coordinate system on S such that M = {u^{m+1} = ⋯ = u^n = 0},
with g(∂/∂u^a, ν_β) = 0 on M  (a = 1, …, m;  β = m+1, …, n).

$$\nabla_X Y = \tilde\nabla_X Y + \sum_{\beta=m+1}^{n} h^\beta(X, Y)\,\nu_\beta$$
$$\nabla_X \nu_\beta = -S_\beta(X) + \sum_{\gamma=m+1}^{n} \mu_\beta^{\,\gamma}(X)\,\nu_\gamma$$


Theorem 4.6.
ω: a ∇-parallel volume form on S
ω̃ := ω(∗, …, ∗, ν_{m+1}, …, ν_n): the induced volume form on M
τ(X) := trace_g T(∗, ∗, X),  τ̃(X) := trace_{g̃} T̃(∗, ∗, X)
If τ = τ̃ on M, then ∇̃ω̃ = 0.

Proof:
$$(\tilde\nabla_Y\tilde\omega)(X_1, \ldots, X_m) = \Big(\sum_\alpha \mu^{\,\alpha}_{\alpha}(Y)\Big)\,\tilde\omega(X_1, \ldots, X_m).$$
Hence Σ_α µ^α_α = 0  ⇐⇒  ω̃ is ∇̃-parallel.
On the other hand,
$$-2\sum_\alpha \mu^{\,\alpha}_{\alpha}(X) = \sum_\alpha T(\nu_\alpha, \nu_\alpha, X)
= \operatorname{trace}_g T(\ast, \ast, X) - \operatorname{trace}_{\tilde g}\tilde T(\ast, \ast, X)
= \tau(X) - \tilde\tau(X). \qquad\square$$

5  Geometry of deformed exponential families



q-exponential and q-logarithm (q > 0):
$$\exp_q x := (1 + (1-q)x)^{\frac{1}{1-q}} \qquad (1 + (1-q)x > 0),$$
$$\log_q x := \frac{x^{1-q} - 1}{1-q} \qquad (x > 0).$$
[Plots of the q-exponential and the q-logarithm functions omitted.]
As q → 1, these reduce to the standard exponential function and the standard logarithm function, respectively.
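Minimal implementations of exp_q and log_q (a sketch, not from the slides), respecting the domain restrictions stated above; values of q near 1 approach the ordinary exponential and logarithm, and log_q inverts exp_q on its domain.

```python
import numpy as np

def exp_q(x, q):
    """q-exponential: (1 + (1-q)x)^(1/(1-q)), defined where 1 + (1-q)x > 0."""
    if q == 1.0:
        return np.exp(x)
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def log_q(x, q):
    """q-logarithm: (x^(1-q) - 1) / (1-q) for x > 0."""
    if q == 1.0:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

x = 0.8
for q in (0.5, 0.999, 1.5):
    print(q, log_q(exp_q(x, q), q))   # log_q inverts exp_q; each value equals x = 0.8
```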



F_1(x), …, F_n(x): random variables on Ω
θ = {θ^1, …, θ^n}: parameters
$$S = \left\{\, p(x;\theta) \;\middle|\; p(x;\theta) > 0,\ \int_\Omega p(x;\theta)\,dx = 1 \right\}: \text{ a statistical model}$$

Definition 5.1.  S_q = {p(x; θ)}: a q-exponential family
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad S_q := \left\{\, p(x;\theta) \;\middle|\; p(x;\theta) = \exp_q\!\Big[\sum_{i=1}^n \theta^i F_i(x) - \psi(\theta)\Big],\ p(x;\theta)\in S \right\},$$
ψ: strictly convex  ⇐⇒  {∂_1 log_q p(x; θ), …, ∂_n log_q p(x; θ)} is linearly independent.




Example 5.2 (q-normal distributions)
$$p(x;\mu,\sigma) = \frac{1}{z_q}\left[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{\frac{1}{1-q}}$$
Set
$$\theta^1 = \frac{2\,z_q^{q-1}}{3-q}\cdot\frac{\mu}{\sigma^2},\qquad \theta^2 = -\frac{z_q^{q-1}}{3-q}\cdot\frac{1}{\sigma^2}.$$
Then
$$\log_q p(x) = \frac{1}{1-q}\left(p^{1-q} - 1\right)
= \frac{1}{1-q}\left\{\frac{1}{z_q^{1-q}}\left(1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right) - 1\right\}$$
$$= \frac{2\mu z_q^{q-1}}{(3-q)\sigma^2}\,x - \frac{z_q^{q-1}}{(3-q)\sigma^2}\,x^2 - \frac{z_q^{q-1}}{3-q}\cdot\frac{\mu^2}{\sigma^2} + \frac{z_q^{q-1} - 1}{1-q}$$
$$= \theta^1 x + \theta^2 x^2 - \psi(\theta), \qquad
\psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} - \frac{z_q^{q-1} - 1}{1-q}.$$


Example 5.3 (discrete distributions)
Ω = {x_0, x_1, …, x_n},
$$S = \left\{\, p(x;\eta) \;\middle|\; p(x;\eta) = \sum_{i=0}^n \eta_i\,\delta_i(x),\ \eta_i > 0,\ \sum_{i=0}^n \eta_i = 1 \right\}, \qquad \eta_0 = 1 - \sum_{i=1}^n \eta_i,$$
the n-dimensional probability simplex. Set
$$\theta^i = \frac{1}{1-q}\Big((\eta_i)^{1-q} - (\eta_0)^{1-q}\Big) = \log_q p(x_i) - \log_q p(x_0).$$
Then
$$\log_q p(x) = \frac{1}{1-q}\big(p^{1-q}(x) - 1\big) = \frac{1}{1-q}\left[\sum_{i=0}^n (\eta_i)^{1-q}\delta_i(x) - 1\right]$$
$$= \sum_{i=1}^n \frac{1}{1-q}\Big\{(\eta_i)^{1-q} - (\eta_0)^{1-q}\Big\}\delta_i(x) + \frac{(\eta_0)^{1-q} - 1}{1-q},$$
$$\psi(\theta) = -\log_q \eta_0.$$


Remark 5.4.  S = {p(x; θ)}: a (standard) exponential family, ∂_i = ∂/∂θ^i.
$$g^F_{ij}(\theta) = E[(\partial_i\log p(x;\theta))(\partial_j\log p(x;\theta))] = \partial_i\partial_j\psi(\theta): \text{ the Fisher metric},$$
$$T^F_{ijk}(\theta) = E[(\partial_i\log p(x;\theta))(\partial_j\log p(x;\theta))(\partial_k\log p(x;\theta))] = \partial_i\partial_j\partial_k\psi(\theta): \text{ the cubic form}.$$

Definition 5.5.  S_q = {p(x; θ)}: a q-exponential family.
g^q_{ij}(θ) = ∂_i∂_jψ(θ): the q-Fisher metric
T^q_{ijk}(θ) = ∂_i∂_j∂_kψ(θ): the q-cubic form

On a deformed exponential family, the Fisher structure and the Hessian structure are different (there are two different dually flat structures).
Set Γ^{q(e)}_{ij,k} := Γ^{q(0)}_{ij,k} − (1/2)T^q_{ijk} and Γ^{q(m)}_{ij,k} := Γ^{q(0)}_{ij,k} + (1/2)T^q_{ijk},
where Γ^{q(0)}_{ij,k} are the connection coefficients of the Levi-Civita connection with respect to the q-Fisher metric g^q.
∇^{q(e)}: the q-exponential connection;  ∇^{q(m)}: the q-mixture connection.




Proposition 5.6.  For S_q, the following hold:
(1) (S_q, g^q, ∇^{q(e)}, ∇^{q(m)}) is a dually flat space.
(2) {θ^i} is a ∇^{q(e)}-affine coordinate system on S_q.
(3) ψ is the potential of g^q with respect to {θ^i}, that is, g^q_{ij}(θ) = ∂_i∂_jψ(θ).
(4) Set η_i = E_{q,p}[F_i(x)], the q-expectation of F_i(x). Then {η_i} is the dual coordinate system of {θ^i} with respect to g^q.
(5) Set φ(η) = E_{q,p}[log_q p(x; θ)]. Then φ(η) is the potential of g^q with respect to {η_i}.

(Recall: P_q(x) = p(x)^q / Z_q(p) is the escort distribution of p(x), and E_{q,p}[f(x)] = ∫ f(x)P_q(x)dx is the q-expectation.)













Normalized Tsallis relative entropy (q-relative entropy):
$$D^{q}(p, r) = E_{q,p}\big[\log_q p(x) - \log_q r(x)\big]
= \frac{\displaystyle\int \big(1 - p(x)^q\,r(x)^{1-q}\big)\,dx}{(1-q)\,Z_q(p)}
\;\left(= \frac{q}{Z_q(p)}\,D^{(1-2q)}(p, r)\right),$$
where D^{(1−2q)} is the α-divergence below.

 






α-divergence (α = 1 − 2q):
$$D^{(1-2q)}(p(x), r(x)) = \frac{1}{q}\int_\Omega p(x)^q\,\{\log_q p(x) - \log_q r(x)\}\,dx$$
D^{(1−2q)} induces a non-flat invariant statistical manifold (S_q, ∇^{(1−2q)}, g^F).

Normalized Tsallis relative entropy (χ-relative entropy):
$$D^{q}(p(x), r(x)) = E_{q,p}\big[\log_q p(x) - \log_q r(x)\big]
= \int_\Omega \frac{p(x)^q}{Z_q(p)}\,\{\log_q p(x) - \log_q r(x)\}\,dx
= \frac{q}{Z_q(p)}\,D^{(1-2q)}(p, r)$$
D^q induces a Hessian manifold (a flat statistical manifold) (S_q, ∇^{q(m)}, g^q).

In general, if two contrast functions are related by D̄(p, r) = f(p)D(p, r), then the induced statistical manifolds are 1-conformally equivalent.

$$\nu(x) \;\longmapsto\; \frac{\nu(x)}{Z_q(\nu)}: \quad \text{positive measure} \;\longrightarrow\; \text{probability measure}$$
Normalization of a positive measure to a probability measure is NOT a trivial problem.


5.2  The q-independence

X ∼ p_1(x), Y ∼ p_2(y).
X and Y are independent
def ⇐⇒  p(x, y) = p_1(x)p_2(y)  ⇐⇒  p(x, y) = exp[log p_1(x) + log p_2(y)]   (p_1(x) > 0, p_2(y) > 0).

For x > 0, y > 0 with x^{1−q} + y^{1−q} − 1 > 0 (q > 0):
x ⊗_q y: the q-product of x and y
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad x \otimes_q y := \left[x^{1-q} + y^{1-q} - 1\right]^{\frac{1}{1-q}} = \exp_q\!\big[\log_q x + \log_q y\big],$$
so that  exp_q x ⊗_q exp_q y = exp_q(x + y)  and  log_q(x ⊗_q y) = log_q x + log_q y.

X and Y: q-independent  def ⇐⇒  p_q(x, y) = p_1(x) ⊗_q p_2(y).
X and Y: q-independent with m-normalization (mixture normalization)
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad p_q(x, y) = \frac{p_1(x) \otimes_q p_2(y)}{Z_{p_1,p_2}}, \qquad
Z_{p_1,p_2} = \iint_{\mathrm{Supp}\{p_q(x,y)\}\subset X\times Y} p_1(x) \otimes_q p_2(y)\,dx\,dy.$$
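A short sketch of the q-product (not from the slides), redefining the exp_q helper from the earlier sketch and checking that exp_q x ⊗_q exp_q y = exp_q(x + y), which differs from the ordinary product exp_q x · exp_q y; the values of q, a, b are arbitrary points in the domain.

```python
import numpy as np

def exp_q(x, q):
    return np.exp(x) if q == 1.0 else (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def q_product(x, y, q):
    """q-product: [x^(1-q) + y^(1-q) - 1]^(1/(1-q)) for x, y > 0 in its domain."""
    if q == 1.0:
        return x * y
    return (x ** (1.0 - q) + y ** (1.0 - q) - 1.0) ** (1.0 / (1.0 - q))

q, a, b = 1.5, 0.4, 0.3
lhs = q_product(exp_q(a, q), exp_q(b, q), q)
rhs = exp_q(a + b, q)
print(lhs, rhs)                       # equal: exp_q(a) (x)_q exp_q(b) = exp_q(a + b)
print(exp_q(a, q) * exp_q(b, q))      # the ordinary product is different
```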




X and Y: q-independent with e-normalization (exponential normalization)
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad p_q(x, y) = p_1(x) \otimes_q p_2(y) \otimes_q \exp_q(-c),$$
where c is determined by
$$\iint_{\mathrm{Supp}\{p_q(x,y)\}\subset X\times Y} p_q(x, y)\,dx\,dy = 1.$$




5.3  Geometry for q-likelihood estimators

S_q = {p(x; ξ) | ξ ∈ Ξ}: a q-exponential family
{x_1, …, x_N}: N observations from p(x; ξ) ∈ S_q.

L_q(ξ): the q-likelihood function
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad L_q(\xi) = p(x_1;\xi) \otimes_q p(x_2;\xi) \otimes_q \cdots \otimes_q p(x_N;\xi)
\quad\Longleftrightarrow\quad \log_q L_q(\xi) = \sum_{i=1}^N \log_q p(x_i;\xi).$$
In the limit q → 1, L_q is the standard likelihood function on Ξ.

$$\exp_q(x_1 + x_2 + \cdots + x_N) = \exp_q x_1 \otimes_q \exp_q x_2 \otimes_q \cdots \otimes_q \exp_q x_N
= \exp_q x_1 \cdot \exp_q\!\left(\frac{x_2}{1 + (1-q)x_1}\right)\cdots \exp_q\!\left(\frac{x_N}{1 + (1-q)\sum_{i=1}^{N-1}x_i}\right)$$
Each measurement influences the others: the observations are "q-independent", but the random variables are strongly correlated.

 


ξ̂: the maximum q-likelihood estimator
$$\stackrel{\mathrm{def}}{\Longleftrightarrow}\quad \hat\xi = \operatorname*{argmax}_{\xi\in\Xi} L_q(\xi) \;\left(= \operatorname*{argmax}_{\xi\in\Xi} \log_q L_q(\xi)\right).$$

Theorem 5.7.  The q-likelihood is maximized  ⇐⇒  the canonical divergence (the q-relative entropy) is minimized.
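A minimal numerical sketch (not from the slides): for the one-dimensional discrete family of Example 5.3, i.e. the Bernoulli family, maximize log_q L_q(η) = Σ_i log_q p(x_i; η) by a grid search. At q = 1 this recovers the ordinary MLE k/N, while for q ≠ 1 the maximizer shifts; the counts k, N and the grid are arbitrary choices.

```python
import numpy as np

def log_q(x, q):
    return np.log(x) if q == 1.0 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_log_likelihood(eta, k, N, q):
    """log_q L_q(eta) = sum_i log_q p(x_i; eta) for k successes out of N Bernoulli trials."""
    return k * log_q(eta, q) + (N - k) * log_q(1.0 - eta, q)

k, N = 30, 100
grid = np.linspace(1e-3, 1.0 - 1e-3, 100001)
for q in (1.0, 1.5, 2.0):
    eta_qmle = grid[np.argmax(q_log_likelihood(grid, k, N, q))]
    print(q, eta_qmle)   # q = 1 gives the ordinary MLE k/N = 0.3; q != 1 shifts the maximizer
```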




Estimator preserving property

Corollary 5.8.  Let 1 < q < 3 and
$$N_q(\mu,\sigma^2) := \left\{\, p(x;\mu,\sigma) \;\middle|\; p(x;\mu,\sigma) = \frac{1}{Z_q}\left[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{\frac{1}{1-q}} \right\}$$
be the family of Student's t-distributions (q-normal distributions), and let {x_1, …, x_N} be q-independent N observations from p(x; µ, σ) ∈ S_q.
Then the maximum q-likelihood estimators in mixture coordinates are
$$\hat\eta_1 = \frac{1}{N}\sum_{i=1}^N x_i, \qquad \hat\eta_2 = \frac{1}{N}\sum_{i=1}^N x_i^2.$$

That is:
$$N(\mu,\sigma^2),\ \text{MLE} \;\Longrightarrow\; \hat\eta_1 = \frac{1}{N}\sum_{i=1}^N x_i,\quad \hat\eta_2 = \frac{1}{N}\sum_{i=1}^N x_i^2;$$
$$N_q(\mu,\sigma^2),\ \text{q-MLE} \;\Longrightarrow\; \hat\eta_1 = \frac{1}{N}\sum_{i=1}^N x_i,\quad \hat\eta_2 = \frac{1}{N}\sum_{i=1}^N x_i^2.$$


Estimator preserving property
• MLE for the normal distribution  =⇒  η̂_1 = (1/N)Σ_{i=1}^N x_i,  η̂_2 = (1/N)Σ_{i=1}^N x_i²
• N_q(µ, σ²): the q-normal distribution (Student's t-distribution)
  – maximization of the Tsallis entropy (M. Tanaka (2002)?)
  – an infinite mixture of normal distributions:
  $$p_q(x;\mu,\sigma^2) = \int_0^\infty N\!\left(\mu,\ \frac{1}{t}\right)\,\mathrm{Gamma}\!\left(t;\ \frac{3-q}{2(q-1)},\ \frac{q-1}{3-q}\cdot\frac{2}{\sigma^2}\right)dt,$$
  a Bayesian expression of the q-normal distribution.
• Escort distributions and deformed algebras: these are natural objects from the viewpoint of differential geometry.
• q-MLE for q-normal distributions  =⇒  η̂_1 = (1/N)Σ_{i=1}^N x_i,  η̂_2 = (1/N)Σ_{i=1}^N x_i²
A weight for the parameter space and a weight for the sample space are well balanced.


Summary

Bayesian statistics
A prior distribution: an n-th differential form (volume element) on a statistical model.
∇^{(α)}ω^{(α)} = 0: an α-parallel prior.
Tchebychev form: τ(X) := trace_g{(Y, Z) ↦ T(X, Y, Z)}.
Tchebychev vector field: the difference between the MLE and the projected Bayes estimator.

Anomalous statistics (Tsallis statistics)
The q-expectation and the escort distribution of p(x):
$$E_{q,p}[f(x)] = \int f(x)\,P_q(x)\,dx = \frac{1}{Z_q(p)}\int f(x)\,p(x)^q\,dx.$$
Dual coordinates {η_i}:  η_i = E_{q,p}[F_i(x)].
Deformed independence:
$$p(x) \otimes_q p(y) := \left[p(x)^{1-q} + p(y)^{1-q} - 1\right]^{\frac{1}{1-q}}.$$

Bayesian statistics: a weight on the parameter space.
Anomalous statistics: a weight on the sample space.