Information geometry of Bayesian statistics
MATSUZOE Hiroshi, Nagoya Institute of Technology
1. Introduction
2. Geometry of statistical models
3. Bayesian inference of curved exponential families
4. Equiaffine structures and Tchebychev vector fields
5. Deformed exponential families
Example (Bernoulli trials) Ω = {0, 1}; x = 1 : success event, x = 0 : failure event;
η : success probability (1 − η : failure probability)
p(x; η) = η^x (1 − η)^{1−x} : the Bernoulli distribution
Suppose that η is unknown. Let us infer the parameter η from experiments (trials).
500 trials, 298 success events =⇒ η̂ = 298/500 ≈ 3/5  (maximum likelihood estimator)
1 trial, 1 success event =⇒ η̂ = 1 (?)
We may instead answer η̂ = 2/3 or 3/4 (a Bayesian estimator with a non-informative prior).
a prior =⇒ a volume element;  a non-informative prior =⇒ a parallel volume element
A prior distribution gives a weight to the parameter space.
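As a numeric sketch of the contrast above (the function names are ours, not from the slides): the MLE k/N degenerates to 1 after a single successful trial, while the posterior mean under the Jeffreys prior Beta(1/2, 1/2) gives the 3/4 quoted above.

```python
def mle(k, N):
    # maximum likelihood estimator for the Bernoulli parameter
    return k / N

def bayes_jeffreys(k, N):
    # posterior mean under the Jeffreys prior Beta(1/2, 1/2):
    # the posterior is Beta(k + 1/2, N - k + 1/2), with mean (k + 1/2)/(N + 1)
    return (k + 0.5) / (N + 1)

print(mle(298, 500))         # 0.596, the slide's ~3/5
print(mle(1, 1))             # 1.0 -- the degenerate answer
print(bayes_jeffreys(1, 1))  # 0.75 -- the slide's 3/4
```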
Example (Deformed expectations) Student's t-distribution (q-normal distribution):
p(x; μ, σ) = (1/Z_q) [1 − ((1 − q)/(3 − q)) · ((x − μ)²/σ²)]^{1/(1−q)}

q                   distribution                          mean   variance
q = 1               normal                                ◦      ◦
q = (n+3)/(n+1)     Student t (n degrees of freedom)      ◦      ◦
q = 2               Cauchy                                ×      ×

For q ≥ 5/3 the variance does not exist; for q ≥ 2 the mean does not exist.

escort distribution (in anomalous statistics)
P_q(x) : the escort distribution of p(x)
⇐⇒ P_q(x) = p(x)^q / Z_q(p),  Z_q(p) = ∫_Ω p(x)^q dx
E_{q,p}[f(x)] : the q-expectation of the random variable f(x)
⇐⇒ E_{q,p}[f(x)] = ∫_Ω f(x) P_q(x) dx = (1/Z_q(p)) ∫_Ω f(x) p(x)^q dx
An escort distribution gives a weight to the sample space.
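A minimal sketch of the escort distribution and q-expectation on a finite sample space (names ours): for q = 1 the ordinary expectation is recovered, while q > 1 shifts the escort weight toward the most likely states.

```python
def escort(p, q):
    # escort distribution: P_q(x) = p(x)^q / Z_q(p), with Z_q(p) = sum of p(x)^q
    Zq = sum(pi ** q for pi in p)
    return [pi ** q / Zq for pi in p]

def q_expectation(f, p, q):
    # E_{q,p}[f] = sum of f(x) P_q(x) over the sample space
    return sum(fi * Pi for fi, Pi in zip(f, escort(p, q)))

p = [0.7, 0.2, 0.1]        # a probability distribution on three points
f = [0.0, 1.0, 2.0]        # a random variable on the same points
print(q_expectation(f, p, q=1.0))  # ordinary expectation: 0.4
print(q_expectation(f, p, q=2.0))  # escort weight favours the likely state
```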
2 Geometry of statistical models
Definition 2.1 S is a statistical model (or a parametric model) on Ω
⇐⇒ S is a set of probability densities with parameter ξ ∈ Ξ such that
S = { p(x; ξ) | ∫_Ω p(x; ξ) dx = 1, p(x; ξ) > 0, ξ ∈ Ξ ⊂ R^n }.
We regard S as a manifold with a local coordinate system {Ξ; ξ¹, . . . , ξⁿ}.
g^F = (g^F_{ij}) is the Fisher metric (Fisher information matrix) of S
⇐⇒ g^F_{ij}(ξ) := ∫_Ω ∂_i log p(x; ξ) ∂_j log p(x; ξ) p(x; ξ) dx
              = ∫_Ω ∂_i p_ξ (∂_j log p_ξ) dx = E_ξ[∂_i l_ξ ∂_j l_ξ],
where ∂_i = ∂/∂ξ^i and l_ξ = log p_ξ.
∂_i p_ξ : the mixture representation,
∂_i l_ξ = ∂_i p_ξ / p_ξ : the exponential representation (the score function).
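For the Bernoulli model the defining integral becomes a two-term sum over Ω = {0, 1}, so the Fisher metric can be checked directly against the closed form 1/(η(1 − η)) used later in these slides (function name ours):

```python
def fisher_bernoulli(eta):
    # g(eta) = E[(d/d_eta log p(x; eta))^2], summing over Omega = {0, 1}
    total = 0.0
    for x, px in [(1, eta), (0, 1 - eta)]:
        score = x / eta - (1 - x) / (1 - eta)   # the score function
        total += score ** 2 * px
    return total

eta = 0.3
print(fisher_bernoulli(eta))   # matches the closed form below
print(1 / (eta * (1 - eta)))
```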
For α ∈ R, we define the α-connection ∇^{(α)} by
g(∇^{(α)}_{∂_i} ∂_j, ∂_k) := Γ^{(α)}_{ij,k}(ξ) := E_ξ[ (∂_i ∂_j l_ξ + ((1 − α)/2) ∂_i l_ξ ∂_j l_ξ)(∂_k l_ξ) ]
∇^{(e)} := ∇^{(1)} : the exponential connection,  ∇^{(m)} := ∇^{(−1)} : the mixture connection
(1) ∂_i g(∂_j, ∂_k) = g(∇^{(α)}_{∂_i} ∂_j, ∂_k) + g(∂_j, ∇^{(−α)}_{∂_i} ∂_k)
    (∇^{(α)} and ∇^{(−α)} are called dual (or conjugate) with respect to g)
(2) g(∇^{(α)}_{∂_i} ∂_j, ∂_k) = g(∇^{(0)}_{∂_i} ∂_j, ∂_k) − (α/2) T^F(∂_i, ∂_j, ∂_k)
T^F_ξ(∂_i, ∂_j, ∂_k) := E_ξ[(∂_i l_ξ)(∂_j l_ξ)(∂_k l_ξ)] : the skewness or the cubic form
(S, g^F, T^F) (or (S, ∇^{(α)}, g^F)) is called an invariant statistical manifold.
A statistical model S_e is an exponential family
⇐⇒ S_e = { p(x; θ) | p(x; θ) = exp[ C(x) + Σ_{i=1}^n θ^i F_i(x) − ψ(θ) ] },
C, F_1, · · · , F_n : functions on Ω,  ψ : a function on the parameter space Θ,
{θ^i} : natural parameters
Normal distributions Ω = R, n = 2, ξ = (μ, σ) ∈ R × R₊ (the upper half plane).
S = { p(x; μ, σ) | p(x; μ, σ) = (1/(√(2π)σ)) exp[ −(x − μ)²/(2σ²) ] }
The Fisher metric is
(g_{ij}) = (1/σ²) ( 1  0 ; 0  2 ).
S is a space of constant negative curvature −1/2.
∇^{(1)} and ∇^{(−1)} are flat affine connections. In addition,
θ¹ = μ/σ²,  θ² = −1/(2σ²),  ψ(θ) = −(θ¹)²/(4θ²) + (1/2) log(−π/θ²)
=⇒ p(x; μ, σ) = (1/(√(2π)σ)) exp[ −(x − μ)²/(2σ²) ] = exp[ xθ¹ + x²θ² − ψ(θ) ].
{θ¹, θ²} : natural parameters (∇^{(1)}-geodesic coordinate system)
η₁ = E[x] = μ,  η₂ = E[x²] = σ² + μ².  {η₁, η₂} : moment parameters (∇^{(−1)}-geodesic coordinate system)
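A quick check (names ours) that the natural-parameter form exp[xθ¹ + x²θ² − ψ(θ)], with the ψ above, reproduces the normal density:

```python
import math

def natural_params(mu, sigma):
    # theta^1 = mu/sigma^2, theta^2 = -1/(2 sigma^2)
    return mu / sigma**2, -1.0 / (2 * sigma**2)

def psi(th1, th2):
    # psi(theta) = -(theta^1)^2/(4 theta^2) + (1/2) log(-pi/theta^2)
    return -th1**2 / (4 * th2) + 0.5 * math.log(-math.pi / th2)

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

mu, sigma, x = 1.5, 0.8, 0.3
th1, th2 = natural_params(mu, sigma)
lhs = normal_pdf(x, mu, sigma)
rhs = math.exp(th1 * x + th2 * x**2 - psi(th1, th2))
print(lhs, rhs)   # the two expressions agree
```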
Bernoulli distributions Ω = {0, 1}, n = 1, ξ = η.
C(x) = 0,  F(x) = x,  θ = log(η/(1 − η)),  ψ(θ) = −log(1 − η) = log(1 + e^θ)
Then we obtain
p(x; ξ) = η^x (1 − η)^{1−x} = exp[ log(η^x (1 − η)^{1−x}) ] = exp[ xθ − ψ(θ) ].
This implies that the Bernoulli distributions form an exponential family.
The expectation parameter is E[x] = 1 · η + 0 · (1 − η) = η.
The Fisher metric is g(η) = 1/(η(1 − η)).
Statistical inference for curved exponential families
S : an exponential family,  M : a curved exponential family embedded into S
x_1, · · · , x_N : N independent observations generated by p(x; u) ∈ M. Set x^N = (x_1, · · · , x_N).
L : the likelihood function
⇐⇒ L(u) = p(x_1; u) · · · p(x_N; u) = Π_{i=1}^N p(x_i; u) = p(x^N; u)
û : the maximum likelihood estimator ⇐⇒ û = argmax_{u∈U} L(u)
Suppose that p(x; θ), p(x; θ′) ∈ S.
D : the Kullback–Leibler divergence (or the relative entropy) of S
⇐⇒ D is a function on S × S such that
D(p(θ)||p(θ′)) = ∫_Ω p(θ) log( p(θ)/p(θ′) ) dx.
x̄ = (1/N) Σ_{i=1}^N x_i  (the sample mean of x^N)
η̂_i = (1/N) Σ_{j=1}^N F_i(x_j)  (the sample mean of F_i)
φ(θ) = E_θ[log p(θ)]  (−φ(θ) is the entropy of p(θ))
Then the Kullback–Leibler divergence is given by
D(p(η̂)||p(u)) = φ(η̂) − (1/N) log L(u).
The maximum likelihood estimator û is the point in M which minimizes the divergence from p(η̂).
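For the Bernoulli family, where η̂ = x̄, the identity D(p(η̂)||p(u)) = φ(η̂) − (1/N) log L(u) can be verified exactly (names ours); maximizing L over u therefore minimizes the divergence:

```python
import math

def kl_bernoulli(a, b):
    # Kullback-Leibler divergence between Bernoulli(a) and Bernoulli(b)
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

data = [1, 1, 0, 1, 0, 1, 1, 0]   # N = 8 observations
N = len(data)
eta_hat = sum(data) / N           # sample mean
u = 0.4                           # any candidate parameter

log_L = sum(math.log(u if x == 1 else 1 - u) for x in data)
phi = eta_hat * math.log(eta_hat) + (1 - eta_hat) * math.log(1 - eta_hat)

print(kl_bernoulli(eta_hat, u))   # the two quantities coincide
print(phi - log_L / N)
```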
3 Bayesian inference of curved exponential families
S : an exponential family,  M : a curved exponential family embedded into S
x^N : N independent observations generated by p(x; θ(u)) ∈ M
ρ(u)du : a prior distribution
e.g. ρ̃^{(0)}du : the Jeffreys prior of M
⇐⇒ ρ̃^{(0)}du = ( (det g_{ab})^{1/2} / ∫_U (det g_{ab})^{1/2} du ) du,
g : the Fisher metric of M.
ρ′(u|x) : a posterior distribution
⇐⇒ ρ′(u|x) = p(x; u)ρ(u) / ∫_U p(x; u)ρ(u) du
f_ρ[x^N](x) : a Bayesian mixture distribution
⇐⇒ f_ρ[x^N](x) = ∫_U p(x; u) ρ′(u|x^N) du
u(f̃_ρ[x^N]) : a projected Bayes estimator
⇐⇒ u(f̃_ρ[x^N]) = argmin_{u∈U} D( f_ρ[x^N] || p(x; u) )
Example (Bernoulli trial) Ω = {0, 1}, p(x; η) = η^x (1 − η)^{1−x}
η : an expectation parameter,  θ = log(η/(1 − η)) : a natural parameter
g(η) = 1/(η(1 − η)) : the Fisher information with respect to η

priors                     dθ            Jeffreys        dη
density ρ(η) w.r.t. dη     1/(η(1−η))    1/√(η(1−η))     1

where dθ and dη are uniform priors with respect to θ and η, respectively
(note dθ/dη = 1/(η(1 − η))).
Uniformity depends on the choice of local coordinate systems.
What is a good prior from the viewpoint of geometry?
α-parallel priors
Recall the Bayes formula:
ρ′(u|x) = p(x; u)ρ(u) / ∫_U p(x; u)ρ(u) du
The integral is carried out over the parameter space
=⇒ a prior distribution can be regarded as a volume element on M.
M : a statistical model,  g : the Fisher metric on M
∇^{(0)} : the Levi-Civita connection with respect to g
ω^{(0)} : the Jeffreys prior distribution
Proposition 3.1 ∇^{(0)} ω^{(0)} = 0
Definition 3.2 ω^{(α)} is an α-(parallel) prior ⇐⇒ ∇^{(α)} ω^{(α)} = 0
For an exponential family: dθ ↔ the 1-parallel prior,  dη ↔ the (−1)-parallel prior.
a prior ⇐⇒ a volume element;  a non-informative prior ⇐⇒ a parallel volume element
Example (Bernoulli trial) Ω = {0, 1}, p(x; η) = η^x (1 − η)^{1−x}
η : an expectation parameter,  θ = log(η/(1 − η)) : a natural parameter
g(η) = 1/(η(1 − η)) : the Fisher information with respect to η

priors                                dθ            Jeffreys            dη
density ρ(η) w.r.t. dη                1/(η(1−η))    1/√(η(1−η))         1
projected Bayes estimator
  (experiment: N = 1, k = 1)          1             3/4                 2/3
  (general case: k successes in N)    k/N           (k + 1/2)/(N + 1)   (k + 1)/(N + 2)
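The table's values can be recovered numerically: for the full Bernoulli family the projected Bayes estimator in the expectation coordinate is the posterior mean of η (the m-projection matches expectation parameters). A sketch with midpoint integration over (0, 1) (names ours):

```python
def posterior_mean(k, N, prior, n=100000):
    # midpoint-rule integration of the posterior mean of eta on (0, 1)
    num = den = 0.0
    for i in range(n):
        e = (i + 0.5) / n
        w = e**k * (1 - e)**(N - k) * prior(e)   # likelihood times prior density
        num += e * w
        den += w
    return num / den

k, N = 2, 5
print(posterior_mean(k, N, lambda e: 1.0))                 # d-eta prior: (k+1)/(N+2)
print(posterior_mean(k, N, lambda e: (e * (1-e))**-0.5))   # Jeffreys: (k+1/2)/(N+1)
print(posterior_mean(k, N, lambda e: (e * (1-e))**-1.0))   # d-theta prior: k/N
```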
4 Equiaffine structures and Tchebychev structures
M : an m-dimensional manifold
∇ : a torsion-free affine connection on M
ω : a volume element of M (i.e. an m-th differential form on M)
Definition 4.1 {∇, ω} is a (locally) equiaffine structure on M ⇐⇒ ∇ω = 0.
∇ is called a (locally) equiaffine connection, and ω is called a parallel volume element.
If a manifold M has an equiaffine structure, M has a uniformly distributed volume form. An α-parallel prior is a uniform prior distribution with respect to the given α-connection. In particular, the Jeffreys prior is the uniform distribution with respect to the Levi-Civita connection of the Fisher metric.
M : a manifold,  g : a Riemannian metric on M
T : a totally symmetric (0, 3)-tensor field on M (skewness tensor field, cubic form)
Definition 4.2 (M, g, T) is called a statistical manifold.
For fixed α ∈ R, an α-connection is defined by
g(∇^{(α)}_X Y, Z) := g(∇^{(0)}_X Y, Z) − (α/2) T(X, Y, Z),
where ∇^{(0)} is the Levi-Civita connection with respect to g.
Definition 4.3 (M, g, T) : a statistical manifold
τ : the Tchebychev form of (M, g, T),  τ^# : the Tchebychev vector field of (M, g, T)
⇐⇒ τ(X) := trace_g{(Y, Z) ↦ T(X, Y, Z)},  g(τ^#, X) := τ(X)
−(α/2)τ : the first Koszul form if (M, ∇^{(α)}, g) is a Hessian manifold.
Proposition 4.4 (M, g, T) : a statistical manifold,
∇^{(α)}, ∇^{(−α)} : the affine connections determined by g and T,
φ : a function on M which determines the Tchebychev form (τ = dφ). Then
{∇^{(α)}, ω} : equiaffine ⇐⇒ {∇^{(−α)}, e^{−αφ}ω} : equiaffine
The Tchebychev vector field is the gradient vector field of the logarithmic ratio of volumes.
Proposition 4.5
û : the maximum likelihood estimator (MLE)
ĝ, T̂ : the Fisher metric and the cubic form at the MLE
u(f̃^{(α)}[x^N]) : the projected Bayes estimator with respect to an α-parallel prior
=⇒ u^c(f̃^{(α)}[x^N]) = û^c + ((1 − α)/(2N)) T̂_{abd} ĝ^{ab} ĝ^{cd} + o(1/N)
                      = û^c + ((1 − α)/(2N)) τ̂^{#c} + o(1/N)
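For the Bernoulli model the correction term collapses to a scalar, T̂₁₁₁ (ĝ¹¹)² = 1 − 2η̂, so the expansion can be compared with the exact estimators in the table (names ours; here α = 1, 0, −1 correspond to the dθ, Jeffreys, and dη priors):

```python
def bayes_exact(k, N, alpha):
    # exact projected Bayes estimators for the three alpha-parallel priors
    if alpha == 1:
        return k / N                 # d-theta prior
    if alpha == 0:
        return (k + 0.5) / (N + 1)   # Jeffreys prior
    if alpha == -1:
        return (k + 1) / (N + 2)     # d-eta prior

def bayes_expansion(k, N, alpha):
    # u_hat + (1 - alpha)/(2N) * T g^{-1} g^{-1}; for Bernoulli the
    # correction T_111 (g^11)^2 reduces to 1 - 2 * eta_hat
    eta = k / N
    return eta + (1 - alpha) * (1 - 2 * eta) / (2 * N)

k, N = 30, 100
for alpha in (1, 0, -1):
    print(alpha, bayes_exact(k, N, alpha), bayes_expansion(k, N, alpha))
# the columns agree up to o(1/N)
```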
4.2 Equiaffine structures on submanifolds
(S, g, T) : a statistical manifold with a flat affine connection ∇
(M, g̃, ∇̃) : a submanifold of S
{ν_β} : a basis of the normal space of M
T, T̃ : the skewness tensor fields of S and M
(u¹, · · · , uⁿ) : a local coordinate system on S such that M = {u^{m+1} = · · · = u^n = 0},
g(∂/∂u^a, ν_β) = 0 on M  (a = 1, · · · , m; β = m + 1, · · · , n)
∇_X Y = ∇̃_X Y + Σ_{β=m+1}^n h^β(X, Y) ν_β
∇_X ν_β = −S_β(X) + Σ_{α=m+1}^n μ^α_β(X) ν_α
Theorem 4.6
ω : a ∇-parallel volume form
ω̃ := ω(∗, · · · , ∗, ν_{m+1}, · · · , ν_n) : the induced volume form
τ(X) := trace_g T(∗, ∗, X),  τ̃(X) := trace_{g̃} T̃(∗, ∗, X)
τ = τ̃ on M =⇒ ∇̃ω̃ = 0
Proof:
(∇̃_Y ω̃)(X_1, · · · , X_m) = Σ_α μ^α_α(Y) ω̃(X_1, · · · , X_m).
Hence Σ_α μ^α_α = 0 ⇐⇒ ω̃ is ∇̃-parallel.
On the other hand,
−2 Σ_α μ^α_α(X) = Σ_α T(ν_α, ν_α, X)
               = trace_g T(∗, ∗, X) − trace_{g̃} T̃(∗, ∗, X) = τ(X) − τ̃(X).
5 Geometry of deformed exponential families
q-exponential and q-logarithm (q > 0):
exp_q x := (1 + (1 − q)x)^{1/(1−q)}   (1 + (1 − q)x > 0),
log_q x := (x^{1−q} − 1)/(1 − q)   (x > 0).
[Figure: graphs of the q-exponential and the q-logarithm]
As q → 1, these recover the standard exponential function and the standard logarithm function, respectively.
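A direct implementation (names ours) showing that exp_q and log_q are mutually inverse and approach exp and log as q → 1:

```python
import math

def exp_q(x, q):
    # q-exponential: (1 + (1-q)x)^(1/(1-q)); reduces to exp(x) as q -> 1
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0:
        raise ValueError("outside the domain of exp_q")
    return base ** (1.0 / (1.0 - q))

def log_q(x, q):
    # q-logarithm: (x^(1-q) - 1)/(1-q); reduces to log(x) as q -> 1
    if x <= 0:
        raise ValueError("log_q needs x > 0")
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

print(log_q(exp_q(0.7, 1.5), 1.5))            # recovers 0.7: mutually inverse
print(exp_q(0.7, 1.000001), exp_q(0.7, 1.0))  # q -> 1 approaches exp(0.7)
```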
F_1(x), . . . , F_n(x) : random variables on Ω,  θ = {θ¹, . . . , θⁿ} : parameters
S = { p(x; θ) | p(x; θ) > 0, ∫_Ω p(x; θ) dx = 1 } : a statistical model
Definition 5.1 S_q = {p(x; θ)} : a q-exponential family
⇐⇒ S_q := { p(x; θ) | p(x; θ) = exp_q[ Σ_{i=1}^n θ^i F_i(x) − ψ(θ) ], p(x; θ) ∈ S }
ψ : strictly convex ⇐⇒ {∂_1 log_q p(x; θ), . . . , ∂_n log_q p(x; θ)} is linearly independent.
Example 5.2 (q-normal distributions)
p(x; μ, σ) = (1/z_q) [1 − ((1 − q)/(3 − q)) · ((x − μ)²/σ²)]^{1/(1−q)}
Set θ¹ = (2/(3 − q)) z_q^{q−1} · (μ/σ²),  θ² = −(1/(3 − q)) z_q^{q−1} · (1/σ²).
Then
log_q p(x) = (p(x)^{1−q} − 1)/(1 − q)
           = (1/(1 − q)) { z_q^{q−1} [1 − ((1 − q)/(3 − q)) · ((x − μ)²/σ²)] − 1 }
           = (2μ z_q^{q−1}/((3 − q)σ²)) x − (z_q^{q−1}/((3 − q)σ²)) x²
             − z_q^{q−1} μ²/((3 − q)σ²) + (z_q^{q−1} − 1)/(1 − q)
           = θ¹x + θ²x² − ψ(θ),
ψ(θ) = −(θ¹)²/(4θ²) − (z_q^{q−1} − 1)/(1 − q).
Example 5.3 (discrete distributions) Ω = {x_0, x_1, . . . , x_n}
S = { p(x; η) | p(x; η) = Σ_{i=0}^n η_i δ_i(x), η_i > 0, Σ_{i=0}^n η_i = 1 },  η_0 = 1 − Σ_{i=1}^n η_i
: the n-dimensional probability simplex.
Set θ^i = (1/(1 − q)) ( (η_i)^{1−q} − (η_0)^{1−q} ) = log_q p(x_i) − log_q p(x_0). Then
log_q p(x) = (p(x)^{1−q} − 1)/(1 − q),  with p(x)^{1−q} = Σ_{i=0}^n (η_i)^{1−q} δ_i(x)
           = (1/(1 − q)) { Σ_{i=1}^n ( (η_i)^{1−q} − (η_0)^{1−q} ) δ_i(x) + (η_0)^{1−q} − 1 }
           = Σ_{i=1}^n θ^i δ_i(x) − ψ(θ),
ψ(θ) = −log_q η_0.
Remark 5.4 S = {p(x; θ)} : a (standard) exponential family, ∂_i = ∂/∂θ^i:
g^F_{ij}(θ) = E[(∂_i log p(x; θ))(∂_j log p(x; θ))] = ∂_i ∂_j ψ(θ) : the Fisher metric
T^F_{ijk}(θ) = E[(∂_i log p(x; θ))(∂_j log p(x; θ))(∂_k log p(x; θ))] = ∂_i ∂_j ∂_k ψ(θ) : the cubic form
Definition 5.5 S_q = {p(x; θ)} : a q-exponential family
g^q_{ij}(θ) = ∂_i ∂_j ψ(θ) : the q-Fisher metric
T^q_{ijk}(θ) = ∂_i ∂_j ∂_k ψ(θ) : the q-cubic form
On a deformed exponential family, the Fisher and the Hessian structures are different. (There are two different dually flat structures.)
Set Γ^{q(e)}_{ij,k} := Γ^{q(0)}_{ij,k} − (1/2) T^q_{ijk},  Γ^{q(m)}_{ij,k} := Γ^{q(0)}_{ij,k} + (1/2) T^q_{ijk},
where Γ^{q(0)}_{ij,k} are the connection coefficients of the Levi-Civita connection with respect to the q-Fisher metric g^q.
∇^{q(e)} : the q-exponential connection,  ∇^{q(m)} : the q-mixture connection
Proposition 5.6 For S_q, the following hold:
(1) (S_q, g^q, ∇^{q(e)}, ∇^{q(m)}) is a dually flat space.
(2) {θ^i} is a ∇^{q(e)}-affine coordinate system on S_q.
(3) ψ is the potential of g^q with respect to {θ^i}, that is, g^q_{ij}(θ) = ∂_i ∂_j ψ(θ).
(4) Set the q-expectation of F_i(x) by η_i = E_{q,p}[F_i(x)]. =⇒ {η_i} is the dual coordinate system of {θ^i} with respect to g^q.
(5) Set φ(η) = E_{q,p}[log_q p(x; θ)]. =⇒ φ(η) is the potential of g^q with respect to {η_i}.
Recall: P_q(x) : the escort distribution of p(x)
⇐⇒ P_q(x) = p(x)^q / Z_q(p),  Z_q(p) = ∫ p(x)^q dx
E_{q,p}[f(x)] : the q-expectation of the random variable f(x)
⇐⇒ E_{q,p}[f(x)] = ∫ f(x) P_q(x) dx = (1/Z_q(p)) ∫ f(x) p(x)^q dx
normalized Tsallis relative entropy (q-relative entropy):
D^q(p, r) = E_{q,p}[ log_q p(x) − log_q r(x) ]
          = ( 1 − ∫ p(x)^q r(x)^{1−q} dx ) / ( (1 − q) Z_q(p) )
          = ( q / Z_q(p) ) D^{(1−2q)}(p, r)   (↓ the α-divergence with α = 1 − 2q)
α-divergence (α = 1 − 2q):
D^{(1−2q)}(p(x), r(x)) = (1/q) ∫_Ω p(x)^q { log_q p(x) − log_q r(x) } dx
D^{(1−2q)} induces a non-flat invariant statistical manifold (S_q, ∇^{(1−2q)}, g^F).
normalized Tsallis relative entropy (χ-relative entropy):
D^q(p(x), r(x)) = E_{q,p}[ log_q p(x) − log_q r(x) ]
                = ∫_Ω ( p(x)^q / Z_q(p) ) { log_q p(x) − log_q r(x) } dx
                = ( q / Z_q(p) ) D^{(1−2q)}(p, r)
D^q induces a Hessian manifold (a flat statistical manifold) (S_q, ∇^{q(m)}, g^q).
In general, if two contrast functions are related by D̄(p, r) = f(p) D(p, r), then the induced statistical manifolds are 1-conformally equivalent.
ν(x) ⟶ ν(x)/Z_q(ν) : positive measure ⟶ probability measure.
Normalization of a positive measure to a probability measure is NOT a trivial problem.
5.2 The q-independence
X ∼ p_1(x), Y ∼ p_2(y)
X and Y are independent ⇐⇒ p(x, y) = p_1(x) p_2(y)
⇐⇒ p(x, y) = exp[ log p_1(x) + log p_2(y) ]   (p_1(x) > 0, p_2(y) > 0)
For x > 0, y > 0 with x^{1−q} + y^{1−q} − 1 > 0 (q > 0):
x ⊗_q y : the q-product of x and y
⇐⇒ x ⊗_q y := [ x^{1−q} + y^{1−q} − 1 ]^{1/(1−q)} = exp_q[ log_q x + log_q y ]
exp_q x ⊗_q exp_q y = exp_q(x + y),  log_q(x ⊗_q y) = log_q x + log_q y.
X and Y : q-independent ⇐⇒ p_q(x, y) = p_1(x) ⊗_q p_2(y)
X and Y : q-independent with m-normalization (mixture normalization)
⇐⇒ p_q(x, y) = ( p_1(x) ⊗_q p_2(y) ) / Z_{p_1,p_2},
where Z_{p_1,p_2} = ∫∫_{Supp{p_q(x,y)} ⊂ X×Y} p_1(x) ⊗_q p_2(y) dx dy
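The q-product and the identity exp_q x ⊗_q exp_q y = exp_q(x + y) can be checked directly (names ours):

```python
def exp_q(x, q):
    # q-exponential, assumed valid on its domain 1 + (1-q)x > 0
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def q_product(x, y, q):
    # x (x)_q y = [x^(1-q) + y^(1-q) - 1]^(1/(1-q)), defined when the bracket > 0
    base = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    if base <= 0:
        raise ValueError("outside the domain of the q-product")
    return base ** (1.0 / (1.0 - q))

q, a, b = 0.5, 0.3, 0.4
print(q_product(exp_q(a, q), exp_q(b, q), q))  # equals exp_q(a + b)
print(exp_q(a + b, q))
```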
X and Y : q-independent with e-normalization (exponential normalization)
⇐⇒ p_q(x, y) = p_1(x) ⊗_q p_2(y) ⊗_q exp_q(−c),
where c is determined by ∫∫_{Supp{p_q(x,y)} ⊂ X×Y} p_q(x, y) dx dy = 1.
5.3 Geometry for q-likelihood estimators
S_q = {p(x; ξ) | ξ ∈ Ξ} : a q-exponential family
{x_1, . . . , x_N} : N observations from p(x; ξ) ∈ S_q
L_q(ξ) : the q-likelihood function
⇐⇒ L_q(ξ) = p(x_1; ξ) ⊗_q p(x_2; ξ) ⊗_q · · · ⊗_q p(x_N; ξ)
⇐⇒ log_q L_q(ξ) = Σ_{i=1}^N log_q p(x_i; ξ)
In the case q → 1, L_q is the standard likelihood function on Ξ.
exp_q(x_1 + x_2 + · · · + x_N) = exp_q x_1 ⊗_q exp_q x_2 ⊗_q · · · ⊗_q exp_q x_N
= exp_q x_1 · exp_q( x_2/(1 + (1 − q)x_1) ) · · · exp_q( x_N/(1 + (1 − q) Σ_{i=1}^{N−1} x_i) )
Each measurement influences the others: the variables are "q-independent", but strongly correlated.
ξ̂ : the maximum q-likelihood estimator
⇐⇒ ξ̂ = argmax_{ξ∈Ξ} L_q(ξ)  (= argmax_{ξ∈Ξ} log_q L_q(ξ))
Theorem 5.7 The q-likelihood is maximum ⇐⇒ the canonical divergence (q-relative entropy) is minimum.
Estimator preserving property
Corollary 5.8 Let 1 < q < 3,
N_q(μ, σ²) := { p(x; μ, σ) | p(x; μ, σ) = (1/Z_q) [1 − ((1 − q)/(3 − q)) · ((x − μ)²/σ²)]^{1/(1−q)} }
: Student's t-distributions (q-normal distributions)
{x_1, . . . , x_N} : q-independent N observations from p(x; μ, σ) ∈ S_q
=⇒ the maximum q-likelihood estimators in mixture coordinates are
η̂_1 = (1/N) Σ_{i=1}^N x_i,  η̂_2 = (1/N) Σ_{i=1}^N x_i².
That is:
N(μ, σ²),  MLE  =⇒ η̂_1 = (1/N) Σ_{i=1}^N x_i,  η̂_2 = (1/N) Σ_{i=1}^N x_i²
N_q(μ, σ²), q-MLE =⇒ η̂_1 = (1/N) Σ_{i=1}^N x_i,  η̂_2 = (1/N) Σ_{i=1}^N x_i²
Estimator preserving property
• MLE for the normal distribution =⇒ η̂_1 = (1/N) Σ_{i=1}^N x_i,  η̂_2 = (1/N) Σ_{i=1}^N x_i²
• N_q(μ, σ²) : the q-normal distribution (Student's t-distribution)
  – maximization of the Tsallis entropy (M. Tanaka (2002) ?)
  – infinite mixtures of normal distributions:
    p_q(x; μ, σ²) = ∫_0^∞ N(μ, 1/t) Gamma( t; (3 − q)/(2(q − 1)), (q − 1)/(3 − q) · 2/σ² ) dt
    : a Bayesian expression of the q-normal distribution
• escort distributions, deformed algebras: these are natural objects from the viewpoint of differential geometry.
• q-MLE for q-normal distributions =⇒ η̂_1 = (1/N) Σ_{i=1}^N x_i,  η̂_2 = (1/N) Σ_{i=1}^N x_i²
A weight for the parameter space and a weight for the sample space are well balanced.
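The Gamma-mixture parameters above are our reading of a garbled formula, so the sketch below (names ours) checks them numerically: mixing N(μ, 1/t) densities against a Gamma with shape (3 − q)/(2(q − 1)) and scale 2(q − 1)/((3 − q)σ²) should be proportional to the q-normal density, so density ratios at two points must agree.

```python
import math

def q_normal_unnorm(x, mu, sigma, q):
    # unnormalized q-normal: [1 - (1-q)/(3-q) (x-mu)^2/sigma^2]^(1/(1-q))
    base = 1.0 - (1.0 - q) / (3.0 - q) * (x - mu) ** 2 / sigma ** 2
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0

def gamma_pdf(t, shape, scale):
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def mixture(x, mu, sigma, q, n=4000, tmax=40.0):
    # midpoint-rule integral of N(mu, 1/t) against the assumed Gamma mixing law
    shape = (3.0 - q) / (2.0 * (q - 1.0))
    scale = 2.0 * (q - 1.0) / ((3.0 - q) * sigma ** 2)
    h = tmax / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        normal = math.sqrt(t / (2.0 * math.pi)) * math.exp(-t * (x - mu) ** 2 / 2.0)
        total += normal * gamma_pdf(t, shape, scale) * h
    return total

q = 1.5
r_qn = q_normal_unnorm(1.0, 0.0, 1.0, q) / q_normal_unnorm(0.0, 0.0, 1.0, q)
r_mix = mixture(1.0, 0.0, 1.0, q) / mixture(0.0, 0.0, 1.0, q)
print(r_qn, r_mix)   # the two density ratios agree
```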
Summary
Bayesian statistics: a prior distribution is an n-th differential form (a volume element) on a statistical model
∇^{(α)} ω^{(α)} = 0 : an α-parallel prior
Tchebychev form: τ(X) := trace_g{(Y, Z) ↦ T(X, Y, Z)}
Tchebychev vector field: the difference between the MLE and the projected Bayes estimator
anomalous statistics (Tsallis statistics):
the q-expectation and the escort distribution of p(x):
E_{q,p}[f(x)] = ∫ f(x) P_q(x) dx = (1/Z_q(p)) ∫ f(x) p(x)^q dx
dual coordinates {η_i}: η_i = E_{q,p}[F_i(x)]
deformed independence: p(x) ⊗_q p(y) := [ p(x)^{1−q} + p(y)^{1−q} − 1 ]^{1/(1−q)}
Bayesian statistics: a weight on a parameter space
anomalous statistics: a weight on a sample space