On the estimation of a parameter with incomplete knowledge on a nuisance parameter

Ali Mohammad-Djafari and Adel Mohammadpour
Laboratoire des Signaux et Systèmes (CNRS-ESE-UPS), Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette, France
[email protected]  http://djafari.free.fr
Contents

• Problem statement
• Classical methods
  – Maximum likelihood
  – Bayesian approach
• New approach
Problem statement

• We have a set of data: $x_1, \dots, x_n$.
• We have assigned a parametric probability law to them, with pdf $f_{X|V,\theta}(x|\nu,\theta)$ and cdf $F_{X|V,\theta}(x|\nu,\theta)$.
• There are two sets of parameters: $\theta$ and $\nu$.
• We are interested in $\theta$; $\nu$ is a nuisance parameter.
• We may or may not have some prior knowledge on $\theta$.
• We have some prior knowledge on $\nu$ and we want to account for it when inferring $\theta$.
Case 1: Perfect knowledge of $\nu$: $\nu = \nu_0$

• Maximum Likelihood (ML):
$$\hat\theta_{ML} = \arg\max_\theta \left\{ l_0(\theta) = f_{X|\nu_0,\theta}(x|\nu_0,\theta) \right\}$$

• Bayesian approach: if we also have a prior $f_\Theta(\theta)$ on the parameter of interest $\theta$, then we can use the Bayesian approach:
$$f_{\Theta|X,\nu_0}(\theta|x,\nu_0) \propto f_{X|\nu_0,\theta}(x|\nu_0,\theta)\, f_\Theta(\theta)$$
  – Maximum a posteriori (MAP):
$$\hat\theta_{MAP} = \arg\max_\theta f_{\Theta|X,\nu_0}(\theta|x,\nu_0) = \arg\max_\theta \left\{ l_0(\theta)\, f_\Theta(\theta) \right\}$$
  – Mean Square Estimate (MSE, the posterior mean):
$$\hat\theta_{MSE} = \mathrm{E}\{\Theta|x,\nu_0\} = \int \theta\, f_{\Theta|X,\nu_0}(\theta|x,\nu_0)\, d\theta = \frac{\int \theta\, l_0(\theta)\, f_\Theta(\theta)\, d\theta}{\int l_0(\theta)\, f_\Theta(\theta)\, d\theta}$$
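The MAP and MSE estimates above can be sketched numerically on a grid. The example below assumes a Gaussian model $N(x;\nu_0,\theta)$ with $\theta$ the unknown variance, a single observation $x$, and a flat prior on $\theta$; all of these choices are illustrative, not taken from the slides.

```python
import numpy as np

def posterior_grid(x, nu0, theta_grid, prior):
    """Normalized posterior of theta on a grid, for the model
    f(x|nu0,theta) = N(x; mean=nu0, variance=theta) with known nu0."""
    lik = np.exp(-(x - nu0) ** 2 / (2 * theta_grid)) / np.sqrt(2 * np.pi * theta_grid)
    post = lik * prior(theta_grid)
    return post / np.trapz(post, theta_grid)  # numerical normalization

x, nu0 = 2.0, 0.5
theta = np.linspace(1e-3, 20.0, 20000)
flat = lambda t: np.ones_like(t)              # flat prior: MAP reduces to ML

post = posterior_grid(x, nu0, theta, flat)
theta_map = theta[np.argmax(post)]            # MAP estimate
theta_mse = np.trapz(theta * post, theta)     # posterior mean (MSE estimate)

# With a flat prior, the MAP coincides with the ML estimate (x - nu0)^2
print(theta_map, theta_mse)
```

With an informative prior on $\theta$, only the `prior` argument changes; the grid machinery stays the same.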
Case 2: Incomplete knowledge of $\nu$ via $f_V(\nu)$ or $F_V(\nu)$

• Marginal Maximum Likelihood (MML):
$$\hat\theta_{MML} = \arg\max_\theta \left\{ l_1(\theta) = f_{X|\theta}(x|\theta) \right\}$$
where
$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu$$

• Bayesian approach: if we also have a prior $f_\Theta(\theta)$:
$$\hat\theta_{MMAP} = \arg\max_\theta f_{\Theta|X}(\theta|x) = \arg\max_\theta \left\{ l_1(\theta)\, f_\Theta(\theta) \right\}$$
or
$$\hat\theta_{MMSE} = \mathrm{E}\{\Theta|x\} = \int \theta\, f_{\Theta|X}(\theta|x)\, d\theta = \frac{\int \theta\, l_1(\theta)\, f_\Theta(\theta)\, d\theta}{\int l_1(\theta)\, f_\Theta(\theta)\, d\theta}$$
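The marginalization integral $l_1(\theta)$ can be evaluated by quadrature. The sketch below assumes the Gaussian case $f_{X|V,\theta}(x|\nu,\theta) = N(x;\nu,\theta)$ with $f_V(\nu) = N(\nu;\nu_0,\theta_0)$, for which the closed form $N(x;\nu_0,\theta+\theta_0)$ is known, so the numerical integral can be verified against it.

```python
import numpy as np

def norm_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def marginal_likelihood(x, theta, nu_grid, f_V):
    """l1(theta) = integral of f(x|nu,theta) f_V(nu) dnu, by trapezoidal rule."""
    integrand = norm_pdf(x, nu_grid, theta) * f_V(nu_grid)
    return np.trapz(integrand, nu_grid)

x, nu0, theta0 = 1.5, 0.0, 2.0
nu = np.linspace(-30, 30, 200001)
f_V = lambda n: norm_pdf(n, nu0, theta0)      # Gaussian law on the nuisance mean

theta = 3.0
l1_numeric = marginal_likelihood(x, theta, nu, f_V)
l1_closed = norm_pdf(x, nu0, theta + theta0)  # known closed form for this pair
print(l1_numeric, l1_closed)
```

For non-conjugate pairs, only `norm_pdf` and `f_V` change; the quadrature step is identical.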
Case 3: Incomplete knowledge of ν through the knowledge of a finite number of its moments
• Our prior knowledge on $\nu$ is:
$$\mathrm{E}\{\phi_k(V)\} = \int \phi_k(\nu)\, f_V(\nu)\, d\nu = d_k, \quad k = 1, \dots, K$$
where the $\phi_k$ are known functions. Particular case: $\phi_k(\nu) = \nu^k$, where the $\{d_k\}$ are then the moments of $V$.
• Maximum Entropy (ME): maximize
$$H = -\int f_V(\nu) \ln f_V(\nu)\, d\nu$$
subject to the constraints $\mathrm{E}\{\phi_k(V)\} = d_k$ to obtain $f_V(\nu)$.
• Then proceed as in the previous case.
• Solution:
$$f_V(\nu) = \exp\left[-\lambda_0 - \sum_{k=1}^{K} \lambda_k \phi_k(\nu)\right]$$
The Lagrange parameters $\{\lambda_k,\ k = 0, \dots, K\}$ are the solution of:
$$\int \phi_k(\nu) \exp\left[-\lambda_0 - \sum_{j=1}^{K} \lambda_j \phi_j(\nu)\right] d\nu = d_k, \quad k = 0, \dots, K,$$
where we used $\phi_0(\nu) = 1$ and $d_0 = 1$.
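These coupled equations generally require a numerical solution. Below is a crude grid-based fixed-point sketch (an illustrative solver, not the authors' method), checked on the case $\mathrm{E}\{V\} = \nu_0$ on $\mathbb{R}^+$, where the ME solution is the exponential pdf with multiplier $\lambda_1 = 1/\nu_0$.

```python
import numpy as np

def maxent_pdf(nu_grid, phis, lam):
    """ME-form pdf exp(-lambda_0 - sum_k lambda_k phi_k(nu)); the grid
    normalization plays the role of exp(-lambda_0)."""
    p = np.exp(-sum(l * phi(nu_grid) for l, phi in zip(lam, phis)))
    return p / np.trapz(p, nu_grid)

def solve_lambdas(nu_grid, phis, d, lr=0.2, iters=2000):
    """Crude dual iteration: nudge lambda_k until E{phi_k(V)} matches d_k."""
    lam = np.zeros(len(phis))
    for _ in range(iters):
        p = maxent_pdf(nu_grid, phis, lam)
        moments = np.array([np.trapz(phi(nu_grid) * p, nu_grid) for phi in phis])
        lam += lr * (moments - d)  # a larger lambda_k lowers E{phi_k}
    return lam, p

# Single constraint E{V} = nu0 on S = R+; the known ME solution is the
# exponential pdf E(nu; nu0), whose multiplier is lambda_1 = 1/nu0
nu = np.linspace(1e-6, 60.0, 60001)
nu0 = 2.0
lam, p = solve_lambdas(nu, [lambda n: n], np.array([nu0]))
print(lam[0])
```

In practice one would use a Newton method on the convex dual instead of this fixed step size, but the iteration above is enough to illustrate the structure of the problem.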
Case 4: Incomplete knowledge of ν through the knowledge of its median
• Our prior knowledge on $\nu$ is: $\mathrm{median}(V) = \nu_0$:
$$\int_{-\infty}^{\nu_0} f_V(\nu)\, d\nu = \int_{\nu_0}^{+\infty} f_V(\nu)\, d\nu,$$
or simply $F_V(\nu_0) = 1/2$.
• Maximum Entropy (ME)?
• The new approach is motivated by Marginal Maximum Likelihood (MML):
$$\hat\theta_{MML} = \arg\max_\theta \left\{ l_1(\theta) = f_{X|\theta}(x|\theta) \right\}$$
where
$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu$$
• Main idea: define $\tilde{f}_{X|\theta}(x|\theta)$ as the median of $f_{X|V,\theta}(x|\nu,\theta)$ over $f_V(\nu)$.
New inference tool

• Marginal probability law:
$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu = \mathrm{E}_V\left\{ f_{X|V,\theta}(x|V,\theta) \right\}$$
or equivalently
$$F_{X|\theta}(x|\theta) = \int F_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu = \mathrm{E}_V\left\{ F_{X|V,\theta}(x|V,\theta) \right\},$$
which can be recognized as the mean value of $F_{X|V,\theta}(x|V,\theta)$ over the probability law $f_V(\nu)$.
• The proposed new criterion is then defined as the median value of $f_{X|V,\theta}(x|V,\theta)$ (or $F_{X|V,\theta}(x|V,\theta)$) over the probability law $f_V(\nu)$:
$$\tilde{F}_{X|\theta}(x|\theta):\quad P\left( F_{X|V,\theta}(x|V,\theta) \le \tilde{F}_{X|\theta}(x|\theta) \right) = 1/2$$
• In previous works (MaxEnt03, SPSP04) we showed that, under some mild conditions on $F_{X|V,\theta}(x|\nu,\theta)$ (strictly increasing), the function $\tilde{F}_{X|\theta}(x|\theta)$ has all the properties of a cdf, and thus $\tilde{l}_1(\theta) = \tilde{f}_{X|\theta}(x|\theta)$ has all the properties of a likelihood function. We can therefore use it in place of $l_1(\theta)$:
$$\hat\theta_{MLM} = \arg\max_\theta \left\{ \tilde{l}_1(\theta) = \tilde{f}_{X|\theta}(x|\theta) \right\}$$
• If we also have a prior $f_\Theta(\theta)$, we can define the a posteriori $\tilde{f}_{\Theta|X}(\theta|x)$ and
$$\hat\theta_{MAPM} = \arg\max_\theta \left\{ \tilde{f}_{\Theta|X}(\theta|x) \right\} = \arg\max_\theta \left\{ \tilde{l}_1(\theta)\, f_\Theta(\theta) \right\}$$
• Indeed, we showed that the expression of $\tilde{F}_{X|\theta}(x|\theta)$ is given by
$$\tilde{F}_{X|\theta}(x|\theta) = L\!\left( F_V^{-1}\!\left(\tfrac{1}{2}\right) \right) \quad \text{where} \quad L(\nu) = F_{X|V,\theta}(x|\nu,\theta)$$
• Thus, to obtain the expression of $\tilde{F}_{X|\theta}(x|\theta)$ we only need to know the median value $F_V^{-1}(1/2)$ of the distribution $f_V(\nu)$.
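A Monte Carlo sketch of the median criterion, assuming for concreteness the Gaussian location model $F_{X|V,\theta}(x|\nu,\theta) = \Phi((x-\nu)/\sqrt{\theta})$ with $V \sim N(\nu_0,\theta_0)$; since $L(\nu)$ is monotone in $\nu$ here, the sample median of $F(x|V,\theta)$ should match $L(F_V^{-1}(1/2))$.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Model: F(x|nu,theta) = Phi((x - nu)/sqrt(theta)), nuisance mean V ~ N(nu0, theta0)
x, theta, nu0, theta0 = 1.0, 2.0, 0.3, 1.5

rng = np.random.default_rng(0)
V = rng.normal(nu0, np.sqrt(theta0), size=100000)
F_samples = np.array([norm_cdf((x - v) / sqrt(theta)) for v in V])

# Median-based marginal cdf: the median of F(x|V,theta) over f_V
F_tilde_mc = np.median(F_samples)
# Closed form: a monotone transform maps the median of V to the median of L(V),
# so F_tilde = Phi((x - median(V))/sqrt(theta)) with median(V) = nu0
F_tilde_closed = norm_cdf((x - nu0) / sqrt(theta))
print(F_tilde_mc, F_tilde_closed)
```

This illustrates the key practical point of the slide: only the median of $f_V(\nu)$ enters the criterion, not the full distribution.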
Examples

Example 1: $f_{X|V,\theta}(x|\nu,\theta) = N(x;\nu,\theta)$, where $N(x;m,v)$ denotes a Gaussian with mean $m$ and variance $v$; the nuisance $\nu$ is the mean and the parameter of interest $\theta$ is the variance.

• Perfect knowledge $\nu = \nu_0$: $f_{X|\nu_0,\theta}(x|\nu_0,\theta) = N(x;\nu_0,\theta)$; the ML estimate of $\theta$ is $\hat\theta = (x-\nu_0)^2$.
• Knowledge of $f_V(\nu) = N(\nu;\nu_0,\theta_0)$:
$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu = N(x;\nu_0,\theta+\theta_0)$$
The MML estimate of $\theta$ is $\hat\theta = \max\left((x-\nu_0)^2 - \theta_0,\ 0\right)$.
• Moments knowledge case:
  – $\mathrm{E}\{V\} = \nu_0$ and $S = \mathbb{R}^+$ → ME pdf $f_V(\nu) = E(\nu;\nu_0)$ (exponential)
  – $\mathrm{E}\{|V|\} = \nu_0$ and $S = \mathbb{R}$ → ME pdf $f_V(\nu) = DE(\nu;\nu_0)$ (double exponential)
  – $\mathrm{E}\{V\} = \nu_0$ and $\mathrm{E}\{(V-\nu_0)^2\} = \theta_0$ → ME pdf $f_V(\nu) = N(\nu;\nu_0,\theta_0)$
• Knowledge of the median $\tilde\nu = \nu_0$ of $V$: $\tilde{f}_{X|\theta}(x|\theta) = N(x;\nu_0,\theta)$, and
$$\hat\theta = \arg\max_\theta \left\{ \tilde{f}_{X|\theta}(x|\theta) \right\}$$
gives $\hat\theta = (x-\nu_0)^2$.
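The MML estimate of Example 1 can be checked by grid maximization of the marginal likelihood $N(x;\nu_0,\theta+\theta_0)$ over $\theta \ge 0$; the numerical values below are illustrative.

```python
import numpy as np

def norm_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def mml_estimate(x, nu0, theta0, theta_grid):
    """Maximize l1(theta) = N(x; nu0, theta + theta0) over theta >= 0 on a grid."""
    l1 = norm_pdf(x, nu0, theta_grid + theta0)
    return theta_grid[np.argmax(l1)]

nu0, theta0 = 0.0, 1.0
theta_grid = np.linspace(0.0, 50.0, 500001)

# One case below the threshold (estimate clipped to 0) and one above it
for x in (0.5, 3.0):
    est = mml_estimate(x, nu0, theta0, theta_grid)
    closed = max((x - nu0) ** 2 - theta0, 0.0)
    print(x, est, closed)
```

The clipping at zero appears automatically because the likelihood is decreasing in $\theta$ whenever $(x-\nu_0)^2 < \theta_0$.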
Examples

Example 2: $f_{X|V,\theta}(x|\nu,\theta) = N(x;\theta,\nu)$; the parameter of interest $\theta$ is the mean and the nuisance $\nu$ is the variance.

• Perfect knowledge $\nu = \nu_0$: $\hat\theta = \arg\max_\theta \left\{ f_{X|\nu_0,\theta}(x|\nu_0,\theta) \right\}$; the ML estimate of $\theta$ is $\hat\theta = x$.
• Knowledge of $f_V(\nu) = IG(\nu;\alpha/2,\beta/2)$ (inverse gamma):
$$f_{X|\theta}(x|\theta) = \int N(x;\theta,\nu)\, IG(\nu;\alpha/2,\beta/2)\, d\nu = S(x;\theta,\alpha/\beta,\alpha)$$
The MML estimate of $\theta$ is $\hat\theta = x$.
• Moments knowledge case: $\mathrm{E}\{V\} = \nu_0$; $V$ here is the variance, so $S = \mathbb{R}^+$ → ME pdf $f_V(\nu) = E(\nu;\nu_0)$ and
$$f_{X|\theta}(x|\theta) = \int N(x;\theta,\nu)\, E(\nu;\nu_0)\, d\nu = S(x;\theta,0,1) = C(x;\theta,1)$$
The MML estimate of $\theta$ is $\hat\theta = x$.
• Knowledge of the median $\nu_0$ of $V$: $\tilde{f}_{X|\theta}(x|\theta) = N(x;\theta,\nu_0)$, and $\hat\theta = \arg\max_\theta \tilde{f}_{X|\theta}(x|\theta)$ gives $\hat\theta = x$.
• Note: in all cases $\hat\theta = x$; the estimate of the location parameter $\theta$ does not depend on the kind of knowledge available on the scale $\nu$.
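This invariance can be illustrated numerically: each marginal (Gaussian, Student-t, Cauchy) is a symmetric unimodal location family in $\theta$, so its maximum over $\theta$ sits at $x$ regardless of the scale parameters. The Student and Cauchy expressions below are standard location-scale forms assumed for illustration.

```python
import numpy as np

# Unnormalized location-family likelihoods from Example 2; the arg max over
# theta is unaffected by the normalization constants
def gauss(x, theta, nu0):           # perfect / median knowledge: N(x; theta, nu0)
    return np.exp(-(x - theta) ** 2 / (2 * nu0))

def student(x, theta, scale, dof):  # IG nuisance: Student-t marginal (assumed form)
    return (1 + ((x - theta) / scale) ** 2 / dof) ** (-(dof + 1) / 2)

def cauchy(x, theta, gamma):        # exponential ME nuisance: Cauchy C(x; theta, gamma)
    return 1.0 / (1 + ((x - theta) / gamma) ** 2)

x = 1.7
theta = np.linspace(-10, 10, 200001)
for lik in (gauss(x, theta, 2.0), student(x, theta, 1.5, 3.0), cauchy(x, theta, 1.0)):
    print(theta[np.argmax(lik)])    # each maximum sits at theta = x
```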
Conclusions

• We considered the problem of estimating one of the two parameters of a probability distribution when the other is a nuisance parameter on which we may have some prior information.
• Complete knowledge $\nu = \nu_0$: the simplest case; the classical likelihood-based methods apply.
• Incomplete knowledge via $f_V(\nu)$: integrate out the nuisance parameter to obtain a marginal likelihood, and use it.
• Incomplete knowledge via a finite number of its moments: use the ME principle to translate the prior knowledge into a prior pdf $f_V(\nu)$, which reduces to the previous case.
• Incomplete knowledge via the median value of the nuisance parameter: a new inference tool is presented which can handle this case.