deBruijn identities: from Shannon, Kullback–Leibler and Fisher to Salicrú, Csizàr and generalized Fisher informations

Steeve Zozor & Jean-Marc Brossier
Laboratoire Grenoblois d’Image, Parole, Signal et Automatique (GIPSA-Lab, CNRS), Grenoble, France

Abstract

We propose a generalization of the usual deBruijn identity that links the Shannon differential entropy (or the Kullback–Leibler divergence) and the Fisher information (or the Fisher divergence) of the output of a Gaussian channel. The generalization makes use of Salicrú entropies on the one hand, and of divergences of the Csizàr class on the other hand, as generalizations of the Shannon entropy and of the Kullback–Leibler divergence respectively. The generalized deBruijn identities induce the definition of generalized Fisher informations and generalized Fisher divergences (some such generalizations exist in the literature). Moreover, we provide results that go beyond the Gaussian channel: we are then able to characterize a noisy channel using general measures of mutual information, both for Gaussian and non-Gaussian channels.

From the classical “quadriptych” to extensions?

Shannon:  H(p) = −∫_Ω p(x) log p(x) dx
Kullback–Leibler:  D_kl(p‖q) = ∫_Ω p(x) log( p(x)/q(x) ) dx
Fisher (θ a position parameter, so that ∂/∂θ ↔ −∂/∂x):
  J^(θ)(p) = ∫_Ω ( ∂/∂θ log p(x) )² p(x) dx
  J^(θ)(p‖q) = ∫_Ω ( ∂/∂θ log( p(x)/q(x) ) )² p(x) dx

[Panel: the classical “quadriptych” links Shannon entropy, Kullback–Leibler divergence, Fisher information/divergence and variance/MSE through the variance–entropy inequality, Stam’s inequality, deBruijn’s identity (Fisher as curvature of D_kl), Verdú’s identity and the Cramér–Rao inequality. Replacing the corners by generalized entropies/divergences, generalized Fisher informations and generalized moments raises the open links: generalized Stam? generalized deBruijn? generalized Verdú? generalized Cramér–Rao? moment–entropy inequalities? curvature with respect to the Gaussian?]

e.g., Bercher’12 or Lutwak et al.’12:
• Stretched Gaussian:  p(x) ∝ [ 1 − (λ−1)|x|^α ]_+^(1/(1−λ))
• Rényi–Tsallis entropies:  E[ p(X)^(λ−1) ]
• Generalized Fisher (& notation J):  E[ ( ∇_x log p(X) )^β p(X)^(β(λ−1)) ]
• Generalized moments:  E[ |X|^α ]
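To make the classical definitions concrete, here is a minimal numerical sketch (quadrature on a grid; the Gaussian pair and all numeric values are illustrative assumptions, not taken from the poster) that reproduces the closed forms H = (1/2) log(2πeσ²), J = 1/σ² and D_kl between two Gaussians:

import numpy as np

# Quadrature versions of the four classical quantities, checked on Gaussians.
x = np.linspace(-30.0, 30.0, 12001)
dx = x[1] - x[0]
gauss = lambda m, v: np.exp(-(x - m)**2 / (2*v)) / np.sqrt(2*np.pi*v)
p, q = gauss(0.0, 2.0), gauss(1.0, 3.0)           # arbitrary example pair

H    = -np.sum(p * np.log(p)) * dx                # Shannon entropy
Dkl  =  np.sum(p * np.log(p / q)) * dx            # Kullback-Leibler divergence
J    =  np.sum(np.gradient(np.log(p), dx)**2 * p) * dx       # Fisher information (position)
Jdiv =  np.sum(np.gradient(np.log(p / q), dx)**2 * p) * dx   # Fisher divergence (position)

print(H,   0.5*np.log(2*np.pi*np.e*2.0))          # closed form for N(0, 2)
print(Dkl, 0.5*(np.log(3/2) + (2 + 1**2)/3 - 1))  # closed form for N(0,2) vs N(1,3)
print(J,   1/2.0)                                 # closed form 1/sigma^2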

The usual deBruijn identity for the Gaussian channel

X (resp. X^(k)): input(s) of a noisy channel, independent of the zero-mean standard Gaussian N and of the parameters t and s (preamplification); output(s) Y ∼ p_Y (resp. Y^(k)):

Y = X + √t N   (noise scaling)        Y = √s X + N   (input preamplification)

deBruijn (Stam’59):  d/dt H(p_Y) = (1/2) J(p_Y)
Barron’86:  d/dt D_kl(p_{Y^(1)} ‖ p_{Y^(0)}) = −(1/2) J(p_{Y^(1)} ‖ p_{Y^(0)})
Consequences: • Stam’s inequality  • Entropy Power Inequality
Guo–Shamai–Verdú’05:  d/ds I(X; Y) = (1/2) MMSE(X|Y) ≡ (1/2) E[ (X − E[X|Y])² ]
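Stam’s identity can be checked numerically; a quick sketch, assuming an arbitrary two-component Gaussian-mixture input (my illustrative choice) and comparing a finite difference of H against (1/2)J on a grid:

import numpy as np

# Numerical check of deBruijn's identity d/dt H(p_Y) = (1/2) J(p_Y).
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
gauss = lambda m, v: np.exp(-(x - m)**2 / (2*v)) / np.sqrt(2*np.pi*v)

def p_Y(t):
    # Y = X + sqrt(t) N: each mixture component's variance grows by t
    return 0.5*gauss(-2.0, 1.0 + t) + 0.5*gauss(3.0, 0.5 + t)

def H(p):   # Shannon differential entropy by quadrature (p > 0 on this grid)
    return -np.sum(p * np.log(p)) * dx

def J(p):   # nonparametric Fisher information, (d/dx log p)^2 weighted by p
    dlogp = np.gradient(np.log(p), dx)
    return np.sum(dlogp**2 * p) * dx

t, h = 1.0, 1e-4
lhs = (H(p_Y(t + h)) - H(p_Y(t - h))) / (2*h)   # finite-difference dH/dt
rhs = 0.5 * J(p_Y(t))
print(lhs, rhs)                                 # the two values should agree closely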

φ−deBruijn identity for the Gaussian channel

φ : R⁺ → R, convex.

φ−entropy of Salicrú:  H_φ(p) = −∫_Ω φ(p(x)) dx
  e.g., • φ(p) = p log p: Shannon  • φ(p) = p^λ: Rényi–Tsallis…

φ−divergence of Csizàr:  D_φ(p‖q) = ∫_Ω φ( p(x)/q(x) ) q(x) dx
  e.g., • φ(l) = l log l: Kullback–Leibler  • φ(l) = (l/2) log l − ((l+1)/2) log( (l+1)/2 ): Jensen–Shannon…

Entropic version:  d/dt H_φ(p_Y) = (1/2) J_φ(p_Y)
⇓ inducing the φ−Fisher information:  J_φ(p) = ∫_Ω ( ∂/∂x log p(x) )² φ″(p(x)) p(x)² dx

Divergence version:  d/dt D_φ(p_{Y^(1)} ‖ p_{Y^(0)}) = −(1/2) J_φ(p_{Y^(1)} ‖ p_{Y^(0)})
⇓ inducing the φ−Fisher divergence:  J_φ(p‖q) = ∫_Ω ( ∂/∂x log( p(x)/q(x) ) )² φ″( p(x)/q(x) ) ( p(x)²/q(x) ) dx

Proof: the pdf of √t N satisfies the heat equation ∂p/∂t = (1/2) ∂²p/∂x² ⇒ p_Y (resp. p_{Y^(k)}) satisfies the same equation. Differentiate H_φ (or D_φ) with respect to t, apply the heat equation, and integrate by parts.
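Written out for the entropic version, the proof is a three-step computation (a sketch, assuming the boundary term φ′(p) ∂p/∂x vanishes on ∂Ω so that the integration by parts leaves no surface term):

\frac{d}{dt} H_\phi(p_Y)
  = -\int_\Omega \phi'(p_Y) \, \frac{\partial p_Y}{\partial t} \, dx
  = -\frac{1}{2} \int_\Omega \phi'(p_Y) \, \frac{\partial^2 p_Y}{\partial x^2} \, dx
  = \frac{1}{2} \int_\Omega \phi''(p_Y) \left( \frac{\partial p_Y}{\partial x} \right)^{2} dx
  = \frac{1}{2} \, J_\phi(p_Y),

the last equality because ( ∂/∂x log p )² φ″(p) p² = φ″(p) ( ∂p/∂x )². The divergence version follows the same pattern, with both p_{Y^(1)} and p_{Y^(0)} evolving under the heat equation.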

The φ−Fisher information/divergence characterizes a gain of entropy (loss of information)/convergence of the output as the noise level varies.

Variation à la Verdú:  d/ds D_φ(p_{X,Y} ‖ p_X p_Y) = (1/2) MMSE_φ(X|Y) ≡ (1/2) ∫_Ω ( x − E[X|Y = y] )² φ″( p_{X,Y}(x,y) / (p_X(x) p_Y(y)) ) ( p_{X,Y}(x,y)² / (p_X(x) p_Y(y)) ) dx dy
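As with the classical identity, the φ−version can be checked on a grid; the sketch below reuses the mixture example above with an assumed quadratic φ(p) = p² (Tsallis-type, λ = 2, so φ″ = 2):

import numpy as np

# Numerical check of d/dt H_phi(p_Y) = (1/2) J_phi(p_Y) for phi(p) = p^2.
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
gauss = lambda m, v: np.exp(-(x - m)**2 / (2*v)) / np.sqrt(2*np.pi*v)
p_Y = lambda t: 0.5*gauss(-2.0, 1.0 + t) + 0.5*gauss(3.0, 0.5 + t)  # same toy input

phi = lambda p: p**2                      # convex on R+, Tsallis-type with lambda = 2
H_phi = lambda p: -np.sum(phi(p)) * dx    # phi-entropy of Salicru

def J_phi(p):                             # phi-Fisher information, with phi''(p) = 2
    dlogp = np.gradient(np.log(p), dx)
    return np.sum(dlogp**2 * 2.0 * p**2) * dx

t, h = 1.0, 1e-4
lhs = (H_phi(p_Y(t + h)) - H_phi(p_Y(t - h))) / (2*h)
print(lhs, 0.5 * J_phi(p_Y(t)))           # the two values should nearly coincide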

To more general channels…

Y = X + ξ_θ, the pdf of ξ_θ satisfying the PDE:

α₂(θ) ∂²p_{ξ_θ}/∂θ² + α₁(θ) ∂p_{ξ_θ}/∂θ = β₂(θ) ∂²p_{ξ_θ}/∂x² + β₁(θ) ∂p_{ξ_θ}/∂x

Entropic version:  α₂(θ) d²/dθ² H_φ(p_Y) + α₁(θ) d/dθ H_φ(p_Y) = β₂(θ) J_φ(p_Y) − α₂(θ) J_φ^(θ)(p_Y)

Divergence version:  α₂(θ) d²/dθ² D_φ(p_{Y^(1)} ‖ p_{Y^(0)}) + α₁(θ) d/dθ D_φ(p_{Y^(1)} ‖ p_{Y^(0)}) = α₂(θ) J_φ^(θ)(p_{Y^(1)} ‖ p_{Y^(0)}) − β₂(θ) J_φ(p_{Y^(1)} ‖ p_{Y^(0)})

• Fokker–Planck equation (state-independent drift and diffusion, α₂ = 0): the parametric & nonparametric φ−Fisher informations characterize the channel.
• Lévy: ξ_θ = θ² L, α₂ = 1, β₁ = 2: the parametric φ−Fisher information characterizes the curvature of the φ−entropy/divergence (see Johnson’04 for D_kl).
• Cauchy: ξ_θ = θ C, α₂ = −β₂ = 1: the sum of the parametric and nonparametric φ−Fisher informations characterizes the curvature of the φ−entropy/divergence (see Johnson’04 for D_kl).
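For instance, taking α₂ = 0, α₁ = 1, β₂ = 1/2 and β₁ = 0 (the heat equation of the Gaussian channel, with θ = t) collapses the entropic version back to the φ−deBruijn identity of the previous panel:

\alpha_1(\theta) \, \frac{d}{d\theta} H_\phi(p_Y) = \beta_2(\theta) \, J_\phi(p_Y)
\quad \Longrightarrow \quad
\frac{d}{dt} H_\phi(p_Y) = \frac{1}{2} \, J_\phi(p_Y).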

Discussion

Generalizations of the usual deBruijn identity, in terms of entropies or in terms of divergences:
• for Salicrú’s φ−entropies & Csizàr’s φ−divergences,
• beyond the Gaussian channel (noise pdf satisfying a well-suited second-order partial differential equation).
They extend with little extra effort to multivariate laws and to a multivariate parameter θ.
• General interpretation?
• Potential implications (e.g., generalized Entropy Power Inequality & Central Limit Theorem)?

References

A. R. Barron, The Annals of Probability, 14: 336 (1986)
J.-F. Bercher, J. Math. Phys., 53: 063303 (2012)
J.-F. Bercher, J. of Phys. A, 45: 255303 (2012)
I. Csizàr, Studia Sci. Math. Hungarica, 2: 299 (1967)
D. Guo, S. Shamai & S. Verdú, IEEE Trans. on IT, 51: 1261 (2005)
O. Johnson, “Information Theory and the Central Limit Theorem”, Imperial College Press (2004)
E. Lutwak et al., IEEE Trans. on IT, 58: 1319 (2012)
M. Salicrú et al., Com. in Stat., 22: 2015 (1993)
A. J. Stam, Information and Control, 2: 101 (1959)