Nested sampling with demons
Michael Habeck
Max Planck Institute for Biophysical Chemistry and Institute for Mathematical Stochastics, Göttingen, Germany
Amboise, September 23, 2014
Bayesian inference
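• Probability rules:
posterior × evidence = likelihood × prior
$$\Pr(\theta \mid D, M) \times \Pr(D \mid M) = \Pr(D \mid \theta, M) \times \Pr(\theta \mid M)$$
$$p(\theta) \times Z = L(\theta) \times \pi(\theta)$$
• Inference:
  • Evidence: $Z = \int L(\theta)\,\pi(\theta)\,d\theta$
  • Posterior: $p(\theta) = \frac{1}{Z}\,L(\theta)\,\pi(\theta)$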
Nested sampling
• The evidence reduces to a one-dimensional integral:
$$Z = \int L(\theta)\,\pi(\theta)\,d\theta = \int_0^1 L(X)\,dX$$
summing over prior mass
$$X(\lambda) = \int_{L(\theta) \ge \lambda} \pi(\theta)\,d\theta, \qquad X(0) = 1, \quad X(\infty) = 0.$$
• Prior masses can be ordered: $X(\lambda) < X(\lambda')$ if $\lambda > \lambda'$
• Idea: We can evaluate $L$ exactly and estimate $X$
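The identity $Z = \int_0^1 L(X)\,dX$ can be checked numerically. The sketch below is a toy setup that is not part of the talk: a uniform prior on $[0, 1]$, a Gaussian likelihood with assumed parameters `mu` and `sigma`, and a fixed grid in place of sampling. It sorts the grid points by decreasing likelihood, accumulates their prior weights into $X$, and integrates $L$ over $X$.

```python
import numpy as np

def toy_loglike(theta, mu=0.3, sigma=0.05):
    # Hypothetical Gaussian log-likelihood, chosen only for illustration
    return -0.5 * ((theta - mu) / sigma) ** 2

def trapezoid(y, x):
    # Plain trapezoidal rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def evidence_direct(theta, like):
    # Z = integral of L(theta) pi(theta) dtheta with pi = Uniform(0, 1)
    return trapezoid(like, theta)

def evidence_prior_mass(like, weights):
    # Sort by decreasing likelihood; the cumulative prior weight is X
    order = np.argsort(like)[::-1]
    L_sorted = like[order]
    X = np.cumsum(weights[order])
    # Start the curve at X = 0 with the maximum likelihood
    X = np.concatenate(([0.0], X))
    L_sorted = np.concatenate(([L_sorted[0]], L_sorted))
    return trapezoid(L_sorted, X)

theta = np.linspace(0.0, 1.0, 20001)
like = np.exp(toy_loglike(theta))
weights = np.full(theta.size, 1.0 / theta.size)  # equal prior mass per grid point

Z_direct = evidence_direct(theta, like)
Z_mass = evidence_prior_mass(like, weights)
print(Z_direct, Z_mass)
```

Both numbers should agree closely with each other and with $\sigma\sqrt{2\pi}$, since the Gaussian lies almost entirely inside the prior range.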
Estimation of prior masses
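• Nested sequence of truncated priors:
$$p(\theta \mid \lambda) = \frac{\Theta[L(\theta) - \lambda]\,\pi(\theta)}{X(\lambda)} \quad \text{where} \quad \Theta(x) = \begin{cases} 0; & x < 0 \\ 1; & x \ge 0 \end{cases}$$
• Distribution of prior masses at likelihood contour $\lambda$: $X \sim \mathrm{Uniform}(0, X(\lambda))$
• Order statistics:
$$X_{\max} \sim N\,\frac{X^{N-1}}{X(\lambda)^N}$$
where $X_{\max}$ is the maximum of $N$ uniformly distributed $X_n \sim \mathrm{Uniform}(0, X(\lambda))$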
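The order-statistics result implies that each iteration shrinks the log prior mass by $1/N$ on average, so $\ln X_k \approx -k/N$. The following is a minimal sketch of nested sampling on a toy problem that is not from the talk: a uniform prior on $[0, 1]$ and an assumed likelihood $L(\theta) = e^{-\theta/s}$. Because $L$ decreases monotonically, the truncated prior is simply $\mathrm{Uniform}(0, \theta^*)$ and the true prior mass of a contour equals $\theta^*$, which allows a direct comparison with the estimate.

```python
import numpy as np

def nested_sampling_toy(n_live=400, n_iter=4000, scale=0.1, seed=0):
    """Toy nested sampling; returns the evidence estimate and the true ln X_k."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(0.0, 1.0, size=n_live)       # theta_n ~ pi = Uniform(0, 1)
    Z, X_prev = 0.0, 1.0
    true_log_x = np.empty(n_iter)
    for k in range(1, n_iter + 1):
        worst = np.argmax(live)                     # lowest likelihood = largest theta
        theta_star = live[worst]
        X_k = np.exp(-k / n_live)                   # estimated prior mass
        Z += np.exp(-theta_star / scale) * (X_prev - X_k)
        X_prev = X_k
        true_log_x[k - 1] = np.log(theta_star)      # here X(lambda) = theta_star exactly
        live[worst] = rng.uniform(0.0, theta_star)  # exact draw from the truncated prior
    Z += np.exp(-live / scale).mean() * X_prev      # contribution of remaining live points
    return Z, true_log_x

Z_est, true_log_x = nested_sampling_toy()
Z_true = 0.1 * (1.0 - np.exp(-10.0))                # analytic evidence for scale = 0.1
print(np.log(Z_est), np.log(Z_true), true_log_x[-1])
```

In realistic problems the exact draw from the truncated prior is not available, which is the difficulty the demons address later in the talk.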
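• likelihood bound $\lambda$ ↔ energy bound $E(\theta) < \epsilon$
• prior mass $X(\lambda)$ ↔ cumulative DOS $X(\epsilon) = \int_{-\infty}^{\epsilon} g(E)\,dE$
• truncated prior ↔ microcanonical ensemble
• evidence $Z = \int L(X)\,dX$ ↔ partition function $Z(\beta) = \int e^{-\beta E}\,g(E)\,dE$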
Microcanonical ensemble
• Density of states (DOS):
$$g(E) = \int \delta[E - E(\theta)]\,\pi(\theta)\,d\theta = \partial_E X(E)$$
• Microcanonical entropy and temperature:
$$S(E) = \ln X(E), \qquad T(E) = 1/\partial_E S(E)$$
• Compression:
$$H(\epsilon' \to \epsilon) = S(\epsilon') - S(\epsilon) = \int_{\epsilon}^{\epsilon'} \beta(E)\,dE$$
where the inverse temperature $\beta = 1/T$ measures the entropy production
Enter the demon
• Implement truncated prior as microcanonical ensemble with additional demon absorbing energy $D$:
$$p(\theta, D \mid \epsilon) = \frac{1}{X(\epsilon)}\,\delta[\epsilon - D - E(\theta)]\,\Theta(D)\,\pi(\theta)$$
and explore constant energy shells
• Creutz algorithm:
Require: ϵ (upper bound on total energy)
θ ∼ π(θ) with energy E = E(θ) ≤ ϵ, D = ϵ − E    ▷ Initialize
while not converged do
    θ′ ∼ π(θ) with energy E′ = E(θ′)    ▷ Generate a candidate
    D′ = D − ∆E where ∆E = E′ − E    ▷ Update demon's state
    if D′ ≥ 0 then
        (θ, D) ← (θ′, D′)    ▷ Accept
    end if
end while
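The Creutz update can be sketched for a small Ising lattice as follows. Several choices here are assumptions of this sketch rather than statements from the talk: candidates are generated by single spin flips (a symmetric move under the uniform prior over spin configurations), boundaries are periodic, and the energy is written as `coupling` times the nearest-neighbour sum with a default of `-1.0` (ferromagnetic convention). The function names are hypothetical.

```python
import numpy as np

def ising_energy(spins, coupling=-1.0):
    """E = coupling * sum over nearest-neighbour pairs, periodic boundaries."""
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return coupling * float(np.sum(spins * (right + down)))

def creutz_demon(spins, eps, n_steps, coupling=-1.0, seed=0):
    """Single-demon Creutz updates at fixed total energy eps = E + D."""
    rng = np.random.default_rng(seed)
    spins = spins.copy()
    n_rows, n_cols = spins.shape
    energy = ising_energy(spins, coupling)
    demon = eps - energy                      # initialize: D = eps - E
    if demon < 0:
        raise ValueError("initial configuration violates E <= eps")
    demon_trace = np.empty(n_steps)
    for step in range(n_steps):
        # Generate a candidate: flip one randomly chosen spin
        i, j = rng.integers(n_rows), rng.integers(n_cols)
        neighbours = (spins[(i + 1) % n_rows, j] + spins[i - 1, j]
                      + spins[i, (j + 1) % n_cols] + spins[i, j - 1])
        delta_e = -2.0 * coupling * spins[i, j] * neighbours
        new_demon = demon - delta_e           # update the demon's state
        if new_demon >= 0:                    # accept only if the demon can pay
            spins[i, j] *= -1
            energy += delta_e
            demon = new_demon
        demon_trace[step] = demon
    return spins, energy, demon_trace

start = np.ones((8, 8), dtype=int)            # ground state for coupling = -1
eps = ising_energy(start) + 40.0
final, energy, trace = creutz_demon(start, eps, n_steps=20000)
print(energy, trace.mean())
```

The sum of system and demon energy stays equal to `eps` throughout, so the chain explores a constant energy shell of the compound system.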
Sampling the Ising model with a single demon
[Figure A: Gibbs entropy $S_G(E)$ and estimated $\ln X_k$ versus energy $E$]
• Nearest-neighbor interaction on a 64 × 64 lattice:
$$E(\theta) = \sum_{\langle i,j \rangle} \theta_i \theta_j \quad \text{where} \quad \theta_i = \pm 1$$
• Nested sampling provides a very accurate estimate of the volume entropy $S = \ln X$
Sampling the Ising model with a single demon
[Figure B: inverse temperature $\beta_G(E)$ and heat capacity versus energy $E$; marked value $\ln(1 + \sqrt{2})/2$]
• $H(\epsilon' \to \epsilon) = S(\epsilon') - S(\epsilon) = \int_{\epsilon}^{\epsilon'} \beta(E)\,dE$
• $\langle H(\epsilon' \to \epsilon) \rangle = 1/N$, therefore $\beta(\epsilon_k)\,(\epsilon_k - \epsilon_{k+1}) \approx 1/N$
• histogram of energy bounds $\epsilon_k$ matches the inverse temperature / entropy production $\beta(E)$
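The relation $\beta(\epsilon_k)\,(\epsilon_k - \epsilon_{k+1}) \approx 1/N$ suggests reading the inverse temperature off the spacing of the energy bounds. The sketch below illustrates this on a toy system that is not from the talk: $E(\theta) = \theta$ with $\theta \sim \mathrm{Uniform}(0, 1)$, so that $X(\epsilon) = \epsilon$ and $\beta(\epsilon) = 1/\epsilon$. Averaging over a window of several iterations (the `window` parameter is an assumption of this sketch) reduces the noise of the single-step estimate.

```python
import numpy as np

def energy_bounds_toy(n_live=200, n_iter=2000, seed=1):
    """Energy bounds eps_k from nested sampling with E(theta) = theta, theta ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(size=n_live)
    bounds = np.empty(n_iter)
    for k in range(n_iter):
        worst = np.argmax(live)                      # highest energy among live points
        bounds[k] = live[worst]
        live[worst] = rng.uniform(0.0, bounds[k])    # exact draw from p(theta | eps_k)
    return bounds

def beta_from_bounds(bounds, n_live, window):
    """Windowed estimate: beta ~ (window / N) / (eps_k - eps_{k+window})."""
    drop = bounds[:-window] - bounds[window:]
    return (window / n_live) / drop

bounds = energy_bounds_toy()
beta_est = beta_from_bounds(bounds, n_live=200, window=200)
# Exact window-averaged beta for this toy system, from S = ln(eps)
beta_exact = (np.log(bounds[:-200]) - np.log(bounds[200:])) / (bounds[:-200] - bounds[200:])
print(np.median(beta_est / beta_exact))
```

The ratio fluctuates around one with a relative spread of roughly $1/\sqrt{\text{window}}$.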
Sampling the Ising model with a single demon
[Figure C: inverse temperature $\beta_G(E)$ and estimated $\beta_B$ versus energy $E$]
• The demon's energy distribution is
$$p(D \mid \epsilon) = \int p(\theta, D \mid \epsilon)\,d\theta = \frac{g(\epsilon - D)}{X(\epsilon)} \approx \frac{1}{T(\epsilon)} \exp\{-D/T(\epsilon)\}$$
• The demon may serve as a thermometer: $D \approx T$
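The exponential form follows from a first-order expansion of the entropy. A short sketch of the step, using only the definitions $g = \partial_E X$, $S = \ln X$ and $\beta = \partial_E S = 1/T$ from the earlier slides, and assuming that $D$ is small on the scale over which $\beta(E)$ varies:

```latex
\begin{align*}
p(D \mid \epsilon)
  &= \frac{g(\epsilon - D)}{X(\epsilon)}
   = \frac{\partial_E X(E)\big|_{E = \epsilon - D}}{X(\epsilon)}
   = \beta(\epsilon - D)\, e^{S(\epsilon - D) - S(\epsilon)} \\
  &\approx \beta(\epsilon)\, e^{-\beta(\epsilon)\, D}
   = \frac{1}{T(\epsilon)} \exp\{-D / T(\epsilon)\}
\end{align*}
```

Under this approximation the demon's energy is exponentially distributed with mean $\langle D \rangle = T(\epsilon)$, which is the sense in which it acts as a thermometer.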
Properties of nested sampling
Pros:
1. Nested sampling is a microcanonical approach: energy $E$ is the control parameter rather than the temperature used in thermal approaches
2. constructs an adaptive "cooling" protocol $\{\epsilon_k\}$
3. progresses at constant thermodynamic speed: $\Delta S \approx 1/N$
4. provides an estimate of the entropy $S$
Cons:
1. Nested sampling requires efficient sampling from
$$p(\theta \mid \epsilon) = \frac{1}{X(\epsilon)}\,\Theta[\epsilon - E(\theta)]\,\pi(\theta) = \frac{1}{X(\epsilon)} \int \delta[\epsilon - D - E(\theta)]\,\Theta(D)\,\pi(\theta)\,dD$$
Releasing more demons
• We would like to preserve nested sampling's adaptive behavior but be more flexible in terms of the ensemble
• Idea: introduce more demons in order to smooth the ensemble
$$p(\theta, D, K \mid \epsilon) = \frac{1}{Y(\epsilon)}\,\delta[\epsilon - D - K - E(\theta)]\,\Theta(D)\,f(K)\,\pi(\theta)$$
where the prior mass of the compound system is
$$Y(\epsilon) = \int \Theta(\epsilon - H)\,(f \star g)(H)\,dH$$
involving the convolution $(f \star g)(H)$
• Nested sampling tracks $Y(\epsilon)$ where $\epsilon$ is an upper bound on the total energy $H = K + E$
Releasing more demons
• Nested sampling estimates the evidence of the extended system
$$Z_H = \int e^{-H}\,(f \star g)(H)\,dH = Z_K\,Z_E$$
from which we can obtain the evidence of the original system $Z_E$
• Marginal distribution of configurations
$$p(\theta \mid \epsilon) = \int p(\theta, D, K \mid \epsilon)\,dD\,dK = \frac{1}{Y(\epsilon)}\,\pi(\theta)\,F[\epsilon - E(\theta)]$$
where $F(K) = \int_{-\infty}^{K} f(t)\,dt$ is the cdf of the demon's energy distribution
• Sampling $(\theta, K)$:
$$\theta \sim p(\theta \mid \epsilon) \propto \pi(\theta)\,F[\epsilon - E(\theta)]$$
$$K \sim p(K \mid \theta, \epsilon) \propto f(K)\,\Theta[\epsilon - E(\theta) - K]$$
Demonic nested sampling of the ten state Potts model
Demon: $f(K) \propto \Theta(K_{\max} - K)\,K^{d/2 - 1}$ (d-dimensional harmonic oscillator where $d$ = dimension of configuration space)
$$p(\theta \mid \epsilon) \propto \Theta[\epsilon - E(\theta)]\,\pi(\theta) \times \begin{cases} [\epsilon - E(\theta)]^{d/2}; & \epsilon - E(\theta) \le K_{\max} \\ K_{\max}^{d/2}; & \epsilon - E(\theta) > K_{\max} \end{cases}$$
[Figure, panels A-C. A: energy bounds $\epsilon_k$ versus iteration $k$ for standard NS and demonic NS. B: horizontal axis energy $E$. C: relative accuracy of $\log Z$ [%] versus demon capacity $K_{\max}$.]
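For the harmonic-oscillator demon $f(K) \propto \Theta(K_{\max} - K)\,K^{d/2-1}$ on $K \ge 0$, both ingredients of the sampling scheme have closed forms: the cdf is $F(K) = (\min(K, K_{\max})/K_{\max})^{d/2}$, and the conditional $p(K \mid \theta, \epsilon)$ can be sampled by inverting it. The sketch below illustrates this; the function names are hypothetical, and `excess` stands for $\epsilon - E(\theta)$.

```python
import numpy as np

def log_demon_cdf(excess, k_max, dim):
    """ln F(excess) for f(K) ∝ Θ(K_max − K) K^(dim/2 − 1), normalized so F(K_max) = 1."""
    excess = np.asarray(excess, dtype=float)
    capped = np.minimum(excess, k_max)
    safe = np.maximum(capped, 1e-300)          # avoid log(0) for non-positive excess
    return np.where(excess > 0, 0.5 * dim * np.log(safe / k_max), -np.inf)

def sample_demon_energy(excess, k_max, dim, rng):
    """Draw K ~ p(K | θ, ε) ∝ f(K) Θ[excess − K] by inverse-cdf sampling."""
    upper = min(excess, k_max)                 # K is confined to [0, min(ε − E, K_max)]
    if upper <= 0:
        raise ValueError("need ε − E(θ) > 0")
    return upper * rng.uniform() ** (2.0 / dim)

rng = np.random.default_rng(0)
draws = np.array([sample_demon_energy(5.0, 2.0, 4, rng) for _ in range(20000)])
print(draws.mean(), float(log_demon_cdf(1.0, 2.0, 4)))
```

The weight `log_demon_cdf` is what a sampler for $\theta$ would add to the log prior; it is flat once $\epsilon - E(\theta)$ exceeds $K_{\max}$, which recovers the piecewise marginal above.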
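Nested sampling in phase space
• In continuous configuration spaces, it is convenient to unfold the demon and introduce momenta
$$f(K) = \int \delta[K - K(\xi)]\,d\xi \quad \text{where} \quad K(\xi) = \frac{1}{2} \sum_{i=1}^{d} \xi_i^2 \quad \text{(kinetic energy)}$$
• The marginal distribution in configuration space is
$$p(\theta \mid \epsilon) \propto \Theta[\epsilon - E(\theta)]\,\pi(\theta) \times \begin{cases} [\epsilon - E(\theta)]^{d/2}; & \epsilon - E(\theta) \le K_{\max} \\ K_{\max}^{d/2}; & \epsilon - E(\theta) > K_{\max} \end{cases}$$
• Hamiltonian dynamics for exploration: $(\theta, \xi) \xrightarrow{L} (\theta', \xi')$ where $L$ is an integrator (e.g. the leapfrog)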
Microcanonical Hamiltonian Monte Carlo
• 2(d + 1) dimensional phase space: implement demon $D$ as harmonic oscillator with energy $D = (\xi_{d+1}^2 + \theta_{d+1}^2)/2$
• Algorithm:
Require: ϵ (total energy), configuration θ with E(θ) < ϵ
θ_{d+1} = 0    ▷ Initialize demon D
while not converged do
    ξ ∼ N(0, 1)    ▷ Draw momenta from (d + 1)-dim Gaussian
    ξ ← ξ × √(ϵ − E − D) / ∥ξ∥    ▷ Scale momenta so as to match excess energy
    (θ, ξ) → (θ′, ξ′) via L    ▷ Run the leapfrog algorithm
    H′ = E(θ′) + K(ξ′)    ▷ Compute total energy of candidate
    if H′ < ϵ then
        θ ← θ′, E ← E(θ)    ▷ Accept
    end if
end while
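A simplified variant of this scheme can be sketched as follows. It is not the algorithm on the slide. Instead of an explicit oscillator demon with rescaled Gaussian momenta, it redraws the momenta uniformly inside the ball $K(\xi) < \epsilon - E(\theta)$, which corresponds to marginalizing the demon $D \ge 0$. It also assumes a flat prior $\pi(\theta)$. A leapfrog proposal is accepted whenever the total energy stays below $\epsilon$, so the chain targets the uniform distribution on $\{E(\theta) + K(\xi) < \epsilon\}$, whose $\theta$-marginal is proportional to $[\epsilon - E(\theta)]^{d/2}$. The quadratic test energy and all function names are hypothetical.

```python
import numpy as np

def leapfrog(theta, xi, grad_energy, step, n_steps):
    """Leapfrog integrator for H = E(theta) + |xi|^2 / 2."""
    theta, xi = theta.copy(), xi.copy()
    xi -= 0.5 * step * grad_energy(theta)
    for _ in range(n_steps - 1):
        theta += step * xi
        xi -= step * grad_energy(theta)
    theta += step * xi
    xi -= 0.5 * step * grad_energy(theta)
    return theta, xi

def sample_energy_ball(theta, energy, grad_energy, eps, n_iter,
                       step=0.1, n_leapfrog=10, seed=0):
    """Sample (theta, xi) uniformly on E(theta) + K(xi) < eps; returns theta samples."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    d = theta.size
    e_cur = energy(theta)
    if e_cur >= eps:
        raise ValueError("need E(theta) < eps")
    samples = np.empty((n_iter, d))
    n_accept = 0
    for it in range(n_iter):
        # Momenta uniform in the ball K(xi) < eps - E(theta)
        direction = rng.standard_normal(d)
        direction /= np.linalg.norm(direction)
        radius = np.sqrt(2.0 * (eps - e_cur)) * rng.uniform() ** (1.0 / d)
        xi = radius * direction
        theta_new, xi_new = leapfrog(theta, xi, grad_energy, step, n_leapfrog)
        e_new = energy(theta_new)
        h_new = e_new + 0.5 * float(xi_new @ xi_new)   # total energy of candidate
        if h_new < eps:                                # accept if still below the bound
            theta, e_cur = theta_new, e_new
            n_accept += 1
        samples[it] = theta
    return samples, n_accept / n_iter

# Toy check with E(theta) = |theta|^2 / 2 in d = 2 and eps = 1
energy = lambda t: 0.5 * float(t @ t)
grad_energy = lambda t: t
samples, rate = sample_energy_ball(np.array([0.1, 0.0]), energy, grad_energy,
                                   eps=1.0, n_iter=5000)
print(rate, np.mean(np.sum(samples ** 2, axis=1)))
```

For this quadratic energy the target is a uniform ball in four-dimensional phase space, for which $\langle \|\theta\|^2 \rangle = \epsilon\,d/(d+1) = 2/3$. This gives a simple sanity check on the sampler.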
Application to GS peptide
[Figure, panels A-C. B: energy $E(\theta_k)$ versus iteration $k$. C: horizontal axis RMSD [Å].]
• A: Native structure of the GS peptide
• B: Evolution of the energy (goodness-of-fit) during nested sampling
• C: Structure's accuracy measured by the root-mean-square deviation (RMSD) to the crystal structure
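Other demons
Distribution of system's energy:
$$p(E \mid \epsilon) = \frac{1}{Y(\epsilon)}\,\Theta(\epsilon - E)\,g(E)\,F(\epsilon - E)$$
Demons with pdf $f(K)$ and cdf $F(K)$:
• Gauss: $f(K) = \sqrt{\frac{\beta}{2\pi}}\,e^{-\frac{\beta}{2} K^2}$, $F(K) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\sqrt{\beta/2}\,K\right)\right]$
• Logarithmic oscillator, $K(\xi_1, \xi_2) = \frac{1}{2}\xi_1^2 + \beta^{-1} \ln|\xi_2|$: $f(K) = \sqrt{8\pi\beta}\,e^{\beta K}$, $F(K) = \sqrt{8\pi/\beta}\,e^{\beta K}$
• Fermi: $f(K) = \frac{\beta\,e^{-\beta K}}{(1 + e^{-\beta K})^2}$, $F(K) = \frac{1}{1 + e^{-\beta K}}$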
Application to SH3 domain
•
Structure determination from sparse distance data measured by NMR spectroscopy
•
Structure ensemble as accurate and precise as with parallel tempering
Summary
•
Nested sampling is a powerful method to study the microcanonical ensemble
•
By means of demons we can smooth the microcanonical ensemble, which eases the exploration of configuration space
•
All of the desired features of nested sampling are preserved