Bayesian particle flow for estimation, decisions & transport
Fred Daum & Jim Huang
23 September 2014
Copyright © 2014 Raytheon Company. All rights reserved.
nonlinear filter problem*

dynamical model of the state:
dx = F(x, t) dt + G(x, t) dw,  or  x(t_{k+1}) = F(x(t_k), t_k, w(t_k))
x(t) = state vector at time t
w(t) = process noise vector at time t

estimate x given noisy measurements:
z_k = H(x(t_k), t_k, v_k)
z_k = measurement vector at time t_k
v_k = measurement noise vector at time t_k
p(x, t_k | Z_k) = probability density of x at time t_k given Z_k
Z_k = set of all measurements, Z_k = {z_1, z_2, ..., z_k}

*"The Oxford handbook of nonlinear filtering," edited by Dan Crisan and Boris Rozovskii, Oxford University Press, 2011.
curse of dimensionality for classic particle filter*
optimal accuracy: r = 1.0
*Daum, IEEE AES Systems Magazine, August 2005.
nonlinear filter*

prediction of the conditional probability density from t_{k-1} to t_k: solution of the Fokker-Planck equation

measurement update, Bayes' rule:
p(x, t_k | Z_k) = p(x, t_k | Z_{k-1}) p(z_k | x, t_k) / p(z_k | Z_{k-1})

*Yu-Chi Ho & R. C. K. Lee, "A Bayesian approach to problems in stochastic estimation and control," IEEE Transactions on Automatic Control, pages 333-339, October 1964.
particle degeneracy*

[Figure: prior density g(x), likelihood h(x), and particles drawn to represent the prior; few particles land where the likelihood is large.]

*Daum & Huang, "Particle degeneracy: root cause & solution," SPIE Proceedings 2011.
chicken & egg problem How do you pick a good way to represent the product of two functions before you compute the product itself?
induced flow of particles for Bayes' rule

prior = g(x), posterior = g(x)h(x)/K(1)

flow of density (pdf):
log p(x, λ) = log g(x) + λ log h(x) - log K(λ)
λ = continuous parameter (like time), running from λ = 0 (prior) to λ = 1 (posterior)

flow of particles (samples from the density):
dx/dλ = f(x, λ)
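As a numerical illustration of the log-homotopy above, the following sketch (with assumed Gaussian prior and likelihood values) evaluates p(x, λ) ∝ g(x) h(x)^λ on a 1-D grid and shows the density deforming from prior to posterior as λ runs from 0 to 1:

```python
import numpy as np

# Illustrative 1-D sketch (assumed example values): the homotopy
# log p(x, lam) = log g(x) + lam*log h(x) - log K(lam)
# deforms the prior g into the posterior proportional to g*h.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

log_g = -0.5 * x**2                   # prior ~ N(0, 1), unnormalized
log_h = -0.5 * (x - 2.0)**2 / 0.5     # likelihood centered at z = 2, variance 0.5

def homotopy_density(lam):
    """Normalized p(x, lam) on the grid; the grid sum plays the role of K(lam)."""
    log_p = log_g + lam * log_h
    p = np.exp(log_p - log_p.max())   # subtract max for numerical safety
    return p / (p.sum() * dx)

def mean(lam):
    return np.sum(x * homotopy_density(lam)) * dx

# The mean drifts from the prior mean (0) toward the posterior mean
# (precision-weighted combination, here 4/3 for the assumed values).
print(mean(0.0), mean(0.5), mean(1.0))
```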
root cause of the curse of dimensionality: particle degeneracy. The prior density g and the likelihood h are combined by flowing the particles that represent the prior:

flow of density:
log p(x, λ) = log g(x) + λ log h(x) - log K(λ)

flow of particles (samples from the density), from λ = 0 to λ = 1:
dx/dλ = f(x, λ)

we design the particle flow by solving the following PDE for f:
div(pf) = -p(log h - d log K(λ)/dλ)
div(pf) = -p(log h - d log K(λ)/dλ)

let q = pf:
∂q_1/∂x_1 + ∂q_2/∂x_2 + ... + ∂q_d/∂x_d = η

(1) linear PDE in the unknown f or q
(2) constant coefficient PDE in q
(3) first order PDE
(4) highly underdetermined PDE
(5) same as the Gauss divergence law in Maxwell's equations
(6) same as Euler's equation in fluid dynamics
(7) a solution exists if and only if the volume integral of η is zero (i.e., neutral charge density for a plasma; satisfied automatically)
many possible particle flows:
- irrotational flow
- Coulomb's law flow
- small curvature flow
- constant curvature flows (e.g. zero)
- exponential family
- Knothe-Rosenblatt flow
- non-zero diffusion flow
- geodesic flows
- Fourier transform flow
- direct integration
- stabilized flows
- finite dimensional flow
- optimal Monge-Kantorovich transports
- method of characteristics
- renormalization group flow for log K(λ) inspired by QFT
- renormalization group flow for log g(x) inspired by QFT
- renormalization group flow for log K(λ) and log g(x)
- exponential family with non-zero diffusion
- Gibbs sampler like flow (inspired by direct integration)
- non-singular Jacobian flow (inspired by proof)
- maximum entropy flow
- Moser coupling flow
- suboptimal Monge-Kantorovich
- incompressible flow
- Gaussian densities
exact particle flow for Gaussian densities: dx/dλ does not depend on K(λ), despite the fact that the PDE does!

dx/dλ = f(x, λ)
f ∂log p/∂x = -div(f) - (log h - d log K(λ)/dλ)

for g & h Gaussian, we can solve for f exactly:
f = Ax + b
A = -(1/2) P Hᵀ (λ H P Hᵀ + R)⁻¹ H
b = (I + 2λA) [ (I + λA) P Hᵀ R⁻¹ z + A x̄ ]
(P = prior covariance, x̄ = prior mean)

automatically stable under very mild conditions & extremely fast
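The exact Ax + b flow above can be checked numerically. A minimal scalar sketch (illustrative values assumed for the prior, H, R, and z) integrates dx/dλ = A(λ)x + b(λ) from λ = 0 to 1 and compares the transported particle cloud with the Kalman posterior:

```python
import numpy as np

# Scalar sketch of the exact Gaussian flow (assumed example values):
# prior N(xbar, P), measurement z = H x + v, v ~ N(0, R).
rng = np.random.default_rng(0)
xbar, P, H, R, z = 0.0, 1.0, 1.0, 1.0, 1.0

def A(lam):
    return -0.5 * P * H / (lam * H * P * H + R) * H

def b(lam):
    a = A(lam)
    return (1 + 2 * lam * a) * ((1 + lam * a) * P * H / R * z + a * xbar)

# sample the prior, then integrate the ODE in lambda (Euler with midpoint lambda)
particles = xbar + np.sqrt(P) * rng.standard_normal(5000)
n_steps = 1000
dlam = 1.0 / n_steps
for i in range(n_steps):
    lam = (i + 0.5) * dlam
    particles = particles + dlam * (A(lam) * particles + b(lam))

# Kalman posterior for comparison
K = P * H / (H * P * H + R)
post_mean, post_var = xbar + K * (z - H * xbar), (1 - K * H) * P
print(particles.mean(), particles.var(), post_mean, post_var)
```

With enough integration steps the particle mean and variance should match the Kalman posterior to within Monte Carlo noise, without ever computing weights or resampling.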
incompressible particle flow (for d ≥ 2):

dx/dλ = -log h(x) (∂log p(x,λ)/∂x)ᵀ / ‖∂log p(x,λ)/∂x‖²  for non-zero gradient
dx/dλ = 0  otherwise

dx/dλ does not depend on K(λ), despite the fact that the PDE does!
[Figure sequence, 10 frames: flow of particles for one noisy measurement of sin(θ) with Bayes' rule, λ = 0.0, 0.1, ..., 1.0. Frame 1 (λ = 0) shows the initial probability distribution of particles; frame 10 (λ = 1) the final distribution. Axes: Angle (deg), -200 to 200; Angle Rate (deg/sec), -300 to 400.]
new particle flow*:

dx/dλ = -(∂²log p/∂x²)⁻¹ (∂log h/∂x)ᵀ

dx/dλ does not depend on K(λ), despite the fact that the PDE does!

If we approximate the density p as Gaussian, then the observed Fisher information matrix can be computed using the sample covariance matrix C over the set of particles:
dx/dλ ≈ C (∂log h/∂x)ᵀ

for Gaussian densities we get the EKF for each particle:
dx/dλ ≈ C (∂θ(x)/∂x)ᵀ R⁻¹ (z - θ(x))

*Daum & Huang, "Particle flow with non-zero diffusion for nonlinear filters," Proceedings of SPIE Conference, San Diego, August 2013.
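A minimal scalar sketch of the EKF-like flow above, with an assumed nonlinear measurement θ(x) and the sample covariance C recomputed along the flow (a modeling choice for this sketch, not prescribed by the slide):

```python
import numpy as np

# Scalar sketch (assumed example): move each particle with
# dx/dlam ~ C * dtheta/dx * (z - theta(x)) / R, where C is the sample
# variance of the particle cloud, recomputed at every step.
rng = np.random.default_rng(1)
theta = lambda x: x + 0.2 * x**3            # assumed nonlinear measurement function
dtheta = lambda x: 1 + 0.6 * x**2
R, z = 0.25, 1.5                            # assumed noise variance & observed value

particles = rng.standard_normal(4000)       # prior ~ N(0, 1)
resid0 = np.mean(np.abs(z - theta(particles)))

dlam, n_steps = 1e-3, 1000                  # integrate lambda from 0 to 1
for _ in range(n_steps):
    C = particles.var()
    particles = particles + dlam * C * dtheta(particles) * (z - theta(particles)) / R

resid1 = np.mean(np.abs(z - theta(particles)))
print(resid0, resid1, particles.var())
```

The cloud should concentrate near the set θ(x) = z, so the average measurement residual shrinks and the particle spread contracts.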
[Figures: (1) d = 42 states, nonlinear dynamics & nonlinear measurements, median error over 100 Monte Carlo runs: incompressible flow with N = 1,000 particles vs. Metropolis-adjusted Langevin with N = 10,000 particles. (2) d = 17 states, 100 Monte Carlo trials, SNR = 20 dB: error vs. time for the new flow, Gaussian flow & incompressible flow compared with the regularized bootstrap, auxiliary particle filter, MALA & Hamiltonian Monte Carlo.]
new particle flow:

dx/dλ = -(∂²log p/∂x²)⁻¹ (∂log h/∂x)ᵀ

If we approximate the prior density g(x) by an exponential family, with unnormalized prior
g(x) ≈ q(x) exp(ψᵀ θ(x)),
then the flow is:
dx/dλ ≈ -[ ∂²log q(x)/∂x² + ψᵀ ∂²θ(x)/∂x² + λ ∂²log h/∂x² ]⁻¹ (∂log h/∂x)ᵀ
BIG DIG (17 million cubic yards of dirt, one million truckloads & $24 billion)*

*Daum & Huang, "Particle flow & Monge-Kantorovich transport," Proceedings of FUSION Conference, Singapore, July 2012.
| item | particle flow | transport |
|---|---|---|
| 1. purpose | fix particle degeneracy caused by Bayes' rule | move physical objects from one probability density to another |
| 2. optimality criterion | NONE! | minimize convex functional (e.g., dirt mover's metric or Wasserstein metric) |
| 3. how to pick a unique solution | many different methods (e.g. min L² norm, given form, most stable flow, min convex function, more smoothness, max entropy, arbitrary, random, etc.) | minimize convex functional |
| 4. computational complexity | numerical integration of ODE for each particle | solution of PDE (Poisson's, Monge-Ampère, HJB, etc.) |
| 5. high dimensional problems | d = 1 to 42 | d = 1, 2, 3 |
| 6. solution of nice special cases | Gaussian, incompressible, irrotational, geodesic, exponential family, etc. | Moser coupling (1990), Brenier (1991-2011), Knothe-Rosenblatt (1952), etc. |
| 7. homotopy of densities | log-homotopy | homotopy rarely |
| 8. stability of flow considered (e.g., Lyapunov stability) | often explicitly designed into algorithm | --- |
| 9. existence of flows proved | adapt proofs from transport theory | Shnirelman irrotational flow d > 2, Moser & Dacorogna d > 1, Brenier any d, et al. |
| 10. conservation of probability mass along the flow | yes | yes |
| 11. avoiding normalization of probability density is crucial for practical algorithms | yes | no |
| 12. explicitly compute normalized densities | no, log of unnormalized densities | yes |
| 13. stiff ODEs mitigated | yes | rarely |
| 14. stochastic flows | no | rarely |
superb books on transport theory:
- Cédric Villani, "Topics in optimal transportation," AMS Press, 2003. Very clear & accessible introduction; wonderful book!
- Cédric Villani, "Optimal transport: old & new," Springer-Verlag, 2009. More detailed & rigorous math; free on the internet!
new nonlinear filter: particle flow

| new particle flow filter | standard particle filters |
|---|---|
| many orders of magnitude faster than standard particle filters | suffers from curse of dimensionality due to particle degeneracy |
| 3 to 4 orders of magnitude faster code per particle for any d ≥ 3 problems | requires resampling using a proposal density |
| 3 to 4 orders of magnitude fewer particles required to achieve optimal accuracy for d ≥ 6 problems | requires millions or billions of particles for high dimensional problems |
| Bayes' rule is computed using particle flow (like physics) | Bayes' rule is computed using a pointwise multiplication of two functions |
| no proposal density | depends on proposal density (e.g., Gaussian from EKF or UKF or other) |
| no resampling of particles | resampling is needed to repair the damage done by Bayes' rule |
| embarrassingly parallelizable | suffers from bottleneck due to resampling |
| computes log of unnormalized density | suffers from severe numerical problems due to computation of normalized density |
history of mathematics
1. creation of the integers
2. invention of counting
3. invention of addition as a fast method of counting
4. invention of multiplication as a fast method of addition
5. invention of particle flow as a fast method of multiplication*
BACKUP
REFERENCES:
(1) Fred Daum, "Nonlinear filters: beyond the Kalman filter," IEEE Aerospace & Electronic Systems Magazine special tutorial, pages 57-69, August 2005.
(2) Fred Daum and Jim Huang, "Particle flow with non-zero diffusion for nonlinear filters," Proceedings of SPIE Conference, San Diego, August 2013.
(3) Arnaud Doucet and A. M. Johansen, "A tutorial on particle filtering and smoothing: fifteen years later," in "The Oxford handbook of nonlinear filtering," edited by Dan Crisan and Boris Rozovskii, pages 656-704, 2011.
(4) Fred Daum and Jim Huang, "Particle flow and Monge-Kantorovich transport," Proceedings of IEEE FUSION Conference, Singapore, July 2012.
(5) Fred Daum and Jim Huang, "How to avoid normalization of particle flow for nonlinear filters, Bayesian decisions and transport," Proceedings of SPIE Conference, Baltimore, May 2014.
| university or company | researchers | topic | papers |
|---|---|---|---|
| Connecticut | Peter Willett & Sora Choi | numerical experiments | 2011 |
| McGill | Mark Coates & Ding | numerical experiments | 2012 |
| New Orleans | Jilkov & Wu & Chen | GPUs, numerical experiments | 2013 |
| Melbourne | Mark Morelande | generalization of theory | 2011 |
| Goteborg | Svensson, et al. | generalization of theory | 2011 |
| Scientific Systems | Lingji Chen & Raman Mehra | analysis of singularities in incompressible flow for d = 1 | 2010 |
| Mitsubishi | Grover & Sato | theory & numerical experiments | 2012 |
| Cambridge | Peter Bunch & Simon Godsill | theory & numerical experiments | 2014 |
| Liverpool | Simon Maskell & Flávio De Melo | relation to MCMC & numerical experiments | 2014 |
| London | Simon Julier | numerical experiments | 2014 |
| METRON | Kristine Bell & Larry Stone | numerical experiments & theory | 2014 |
| Lockheed Martin | Nima Moshtagh & Moses Chan | numerical experiments | --- |
| STR | Shozo Mori | theory & numerical experiments | --- |
| BU | Castanon, et al. | numerical experiments | --- |
| Toulouse | Marcelo Pereyra | numerical experiments & theory | --- |
| Bonn | Martin Ulmke & Ahmed | numerical experiments | --- |
| Istanbul | Serdar Aslan | numerical experiments | --- |
| Wuhan | Lanlan Pang | numerical experiments | --- |
| Tufts & Hartford | Umarov & Nelson | fractional Brownian motion | --- |
| Raytheon Boston | Daum & Huang & Noushin | theory & numerical experiments | 2007-2014 |
| Raytheon Arizona | Frankot & Reid & Kyle | theory & numerical experiments | --- |
| Raytheon California | Ploplys & Casey | theory & numerical experiments | --- |
[Figure: regions of applicability vs. dimension of the state vector (1 to 1000, log scale) and degree of nonlinearity or non-Gaussianity: extended Kalman filters, particle flow filters, standard particle filters.]
many applications of particle flow:
- tracking
- guidance & navigation
- robotics
- communications
- control of chemical, mechanical, electrical & nuclear plants
- weather & climate prediction
- predicting ionosphere, thermosphere, troposphere
- science
- imaging
- medicine (e.g., MRI, surgical planning, drug design, diagnosis)
- transport
- oil & mineral exploration
- financial engineering
- adaptive antennas
- audio & video signal processing
- nonlinear filtering & smoothing
- multi-sensor data fusion
- compressive sensing
- crypto
- Bayesian decisions & learning
exact flow filter is many orders of magnitude faster per particle than standard particle filters

[Figure: median computation time for 30 updates (sec) vs. number of particles (10² to 10⁵) for d = 5, 10, 20, 30; filters: bootstrap particle filter, bootstrap with EKF proposal, incompressible flow, exact flow; 25 Monte Carlo trials. Intel Core 2 CPU, 1.86 GHz, 0.98 GB of RAM, PC-MATLAB version 7.7.]
particle flow filter is many orders of magnitude faster in real time computation (for the same or better estimation accuracy):
- 3 or 4 orders of magnitude faster per particle
- 3 or 4 orders of magnitude fewer particles
- avoids the bottleneck in parallel processing due to resampling
comparison of estimation accuracy for three filters:

[Figure: velocity error (m/sec) vs. time (0 to 100 sec) for the extended Kalman filter, a standard particle filter, and particle flow; N = 1,000 particles, 100 Monte Carlo trials, 20 dB SNR, 10% tropo & SDMB, d = 6.]
new filter improves angle rate estimation accuracy by two or three orders of magnitude

highly nonlinear dynamics (Euler's equations for rigid body rotation):
I₁ ω̇₁ + (I₃ - I₂) ω₃ ω₂ = M₁
I₂ ω̇₂ + (I₁ - I₃) ω₁ ω₃ = M₂
I₃ ω̇₃ + (I₂ - I₁) ω₂ ω₁ = M₃

[Figure: median error in angle rates (rad/sec) vs. time (0 to 10 sec) for the extended Kalman filter, a standard particle filter, the incompressible particle flow filter and the Ax+b particle flow filter; N = 500 particles, SNR = 20 dB, wideband range & Doppler data, d = 6.]

the extended Kalman filter diverges because it cannot model multimodal conditional probability densities accurately
derivation of PDE for exact particle flow:

dx/dλ = f(x, λ)

continuity equation:
∂p(x, λ)/∂λ = -Tr(∂(pf)/∂x) = -div(pf)

definition of log p(x, λ):
log p(x, λ) = log g(x) + λ log h(x) - log K(λ)
⟹ p(x, λ)(log h(x) - ∂log K(λ)/∂λ) = -div(pf)

definition of η:
div(q) = η, with q = pf and η = -p(x, λ)(log h(x) - ∂log K(λ)/∂λ)
Fokker-Planck equation*:

∂p/∂t = -Tr(∂(pf)/∂x) + (1/2) Tr(Q ∂²p/∂x²)

p = p(x, t) = probability density of x at time t
dx/dt = f(x, t) + dw/dt
Q = covariance matrix of the white noise w(t)

*Andrew Jazwinski, "Stochastic processes and filtering theory," Dover Books, 1998.
Bayes' rule:

p(x, t_k | Z_k) = p(x, t_k | Z_{k-1}) p(z_k | x, t_k) / p(z_k | Z_{k-1})

p(x, t_k | Z_k) = probability density of x at time t_k given Z_k
x = state vector
t_k = time of the k-th measurement
z_k = k-th measurement vector
Z_k = set of all measurements up to & including time t_k, Z_k = {z_1, z_2, z_3, ..., z_k}
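Bayes' rule above can be demonstrated with a tiny grid computation (assumed Gaussian prior and measurement model for illustration), where the evidence p(z_k | Z_{k-1}) is just the normalizing sum:

```python
import numpy as np

# Minimal grid-based illustration of Bayes' rule (assumed toy model):
# posterior = prior * likelihood / evidence.
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2)
prior /= prior.sum() * dx                   # p(x, t_k | Z_{k-1}) ~ N(0, 1)

z, R = 1.0, 0.5                             # measurement z_k = x + v, v ~ N(0, R)
likelihood = np.exp(-0.5 * (z - x)**2 / R)  # p(z_k | x, t_k), unnormalized

evidence = (prior * likelihood).sum() * dx  # p(z_k | Z_{k-1}), up to the N(0,R) constant
posterior = prior * likelihood / evidence   # p(x, t_k | Z_k)

post_mass = posterior.sum() * dx
post_mean = (posterior * x).sum() * dx
print(post_mass, post_mean)
```

For these assumed values the posterior integrates to one and its mean is the usual precision-weighted combination of prior mean and measurement.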
induced flow of particles for Bayes' rule (why log?)

prior = g(x), posterior = g(x)h(x)/K(1)

flow of density (pdf):
log p(x, λ) = log g(x) + λ log h(x) - log K(λ)

flow of particles (samples from the density), from λ = 0 to λ = 1:
dx/dλ = f(x, λ)
convergence with N for particle filters:

σ² ≈ c/N

N = number of particles; c = so-called "constant," which depends on:
(1) dimension of the state vector x
(2) initial uncertainty in the state vector
(3) measurement accuracy
(4) shape of probability densities (e.g., log-concave or multimodal etc.)
(5) Lipschitz constants of log densities
(6) stability of the plant
(7) curvature of nonlinear dynamics & measurements
(8) ill-conditioning of the Fisher information matrix
(9) smoothness of densities & dynamics & measurements
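The σ² ≈ c/N scaling can be seen directly in a plain Monte Carlo experiment (illustrative setup: estimating the mean of a standard normal): growing N by a factor of 100 should shrink the estimator's standard deviation by roughly a factor of 10.

```python
import numpy as np

# Empirical check of sigma^2 ~ c/N for a simple Monte Carlo mean estimate.
rng = np.random.default_rng(2)

def estimator_std(n_particles, n_reps=400):
    """Std of the sample-mean estimator over repeated experiments."""
    estimates = [rng.standard_normal(n_particles).mean() for _ in range(n_reps)]
    return np.std(estimates)

s_small = estimator_std(100)
s_big = estimator_std(10000)
print(s_small, s_big, s_small / s_big)   # ratio should be near sqrt(100) = 10
```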
Oh's formula for Monte Carlo errors:

σ² ≈ (1/N) [ (1 + k)^d / (1 + 2k)^{d/2} ] exp( ε²/(1 + 2k) )

Assumptions:
(1) Gaussian density (zero mean & unit covariance matrix)
(2) d-dimensional random variable
(3) proposal density is also Gaussian, with mean ε and covariance matrix (1 + k)I; it is exact only for k = 0 and ε = 0
(4) N = number of Monte Carlo trials
nonlinear filter performance (accuracy relative to optimal & computational complexity) depends on:
- DIMENSION of the state vector
- process noise
- initial uncertainty of the state vector
- measurement noise
- sparseness
- smoothness
- exploiting structure (e.g. exact filters)
- concentration of measure
- quality of the proposal density
- ill-conditioning
- stability & mixing of dynamics
- multi-modality
- nonlinearity
variation in initial uncertainty of x:

[Figure: dimensionless error vs. time (0 to 30) for huge, large, medium & small initial uncertainty; N = 1000, stable plant, d = 10, quadratic nonlinearity, λ = 0.6; 25 Monte Carlo trials.]
variation in eigenvalues of the plant (λ):

[Figure: dimensionless error vs. time (0 to 30) for λ = 0.1, 0.5, 1.0, 1.1, 1.2; N = 1000, d = 10, cubic nonlinearity; 25 Monte Carlo trials.]
variation in dimension of x:

[Figure: dimensionless error vs. time (0 to 30) for dimension = 5, 10, 15, 20; N = 1000, λ = 1.0, cubic nonlinearity; 25 Monte Carlo trials.]
quadratic measurement nonlinearity: particle flow filter beats EKF by orders of magnitude

[Figure: dimensionless error vs. number of particles (10² to 10⁵) for EKF and particle flow; d = 12, n = 3, y = x², SNR = 20 dB.]
exact flow: performance vs. number of particles (extremely unstable plant):

[Figure: dimensionless error after 30 updates vs. number of particles (10² to 10⁴) for dimension = 5, 10, 15, 20, 30; λ = 1.2, linear, large initial uncertainty; 25 Monte Carlo trials.]
all roads lead to the new flow:
- zero curvature & solution of a vector Riccati equation rather than a PDE
- non-zero diffusion & clever choice of Q to avoid the PDE
- maximum likelihood estimation with Newton's method
- maximum likelihood estimation with homotopy

Svensson & Morelande et al., Hanebeck et al., Daum & Huang, Girolami & Calderhead, etc.
computing the Hessian of log p:

log p(x, λ) = log g(x) + λ log h(x) - log K(λ)

∂²log p/∂x² = ∂²log g(x)/∂x² + λ ∂²log h(x)/∂x²

(one can compute the Hessians using calculus or 2nd differences)

∂²log p/∂x² ≈ -C⁻¹ + λ ∂²log h(x)/∂x²
C = sample covariance matrix of particles for the prior (λ = 0), with Tychonov regularization; or EKF or UKF covariance matrix

∂²log p/∂x² ≈ -P⁻¹
P = sample covariance matrix of particles for p(x, λ), with Tychonov regularization; or EKF or UKF covariance matrix
formula that avoids the inverse of the sample covariance matrix:

dx/dλ = -(∂²log p/∂x²)⁻¹ (∂log h/∂x)ᵀ

∂²log p/∂x² = ∂²log g(x)/∂x² + λ ∂²log h(x)/∂x² ≈ -C⁻¹ + λ ∂²log h(x)/∂x²  for g(x) ≈ Gaussian

but Woodbury's matrix inversion lemma gives us:
(A + B)⁻¹ = A⁻¹ - A⁻¹ B (I + A⁻¹ B)⁻¹ A⁻¹  for arbitrary B and non-singular A

hence
(∂²log p/∂x²)⁻¹ ≈ -[ C + CB (I - CB)⁻¹ C ]
in which B = λ ∂²log h(x)/∂x²
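The matrix inversion lemma quoted above is easy to verify numerically; the following sketch checks both sides on arbitrary random matrices (A chosen diagonally dominant so it is non-singular):

```python
import numpy as np

# Numerical check of the lemma used above:
# (A + B)^-1 = A^-1 - A^-1 B (I + A^-1 B)^-1 A^-1.
rng = np.random.default_rng(3)
d = 5
A = 2.0 * np.eye(d) + 0.1 * rng.standard_normal((d, d))  # non-singular
B = 0.3 * rng.standard_normal((d, d))                    # arbitrary

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + B)
rhs = Ainv - Ainv @ B @ np.linalg.inv(np.eye(d) + Ainv @ B) @ Ainv
print(np.max(np.abs(lhs - rhs)))
```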
how to mitigate stiffness in ODEs for certain particle flows*

| method | computational complexity | filter accuracy | comments |
|---|---|---|---|
| 1. use a stiff ODE solver (e.g., implicit integration rather than explicit) | large to extremely large | uncertain | textbook advice & many papers |
| 2. use very small integration steps everywhere | extremely large | good | brute force solution |
| 3. use very small integration steps only where needed (adaptively determined) | large | --- | --- |
| 4. use very small integration steps only where needed (determined non-adaptively) | small | 2nd best | easy to do with particle flow |
| 5. transform to principal coordinates or approximately principal coordinates | small | best | easy to do for certain applications |
| 6. Battin's trick (i.e., sequential scalar measurement updates) | small | very bad | destroys the benefit of particle flow |
| 7. Tychonov regularization of the Hessian of log p | very small | uncertain | --- |

*Daum & Huang, "Seven dubious methods to mitigate stiffness in particle flow for nonlinear filters," Proceedings of SPIE Conference, May 2014.
RED flows are extremely stiff*:
incompressible flow, irrotational flow, Coulomb's law flow, small curvature flow, Gaussian densities, exponential family, Fourier transform flow, constant curvature flow (e.g. zero), Knothe-Rosenblatt flow, non-zero diffusion flow, method of characteristics, geodesic flow, stabilized flows, finite dimensional flow, direct integration, Monge-Kantorovich transport

*non-stiff flows work well with Euler integration, Δλ = 0.1
new particle flow:

dx/dλ = -(∂²log p/∂x²)⁻¹ (∂log h/∂x)ᵀ

If we approximate the density p as Gaussian, then the observed Fisher information matrix can be computed using the sample covariance matrix P over the set of particles:
dx/dλ ≈ P (∂log h/∂x)ᵀ

for Gaussian densities we get the EKF for each particle:
dx/dλ ≈ P (∂θ(x)/∂x)ᵀ R⁻¹ (z - θ(x))
importance of avoiding explicit computation of the normalization K(λ)*:
(1) our 3 best flows do not explicitly compute the normalization of the conditional density
(2) small errors in computing K(λ) can ruin the filter accuracy (e.g., Coulomb's law flow & Fourier transform flow)
(3) similar effect in numerical weather prediction using transport theory (Bath University); one must compute K(λ) to machine precision!
(4) exploit this effect in designing flows

*Daum & Huang, "How to avoid normalization of particle flow for nonlinear filters," Proceedings of SPIE Conference, May 2014.
RED flows do not explicitly compute the normalization K(λ):
incompressible flow, irrotational flow, Coulomb's law flow, small curvature flow, constant curvature flow (e.g. zero curvature), Gaussian densities, exponential family, Fourier transform flow, Knothe-Rosenblatt flow, non-zero diffusion flow, method of characteristics, geodesic flow, stabilized flows, finite dimensional flow, direct integration, Monge-Kantorovich transport
exact particle flow for Gaussian densities:

dx/dλ = f(x, λ)
f ∂log p/∂x = -div(f) - (log h - d log K(λ)/dλ)

f does not depend on K(λ), despite the fact that the PDE does!

for g & h Gaussian, we can solve for f exactly:
f = Ax + b
A = -(1/2) P Hᵀ (λ H P Hᵀ + R)⁻¹ H
b = (I + 2λA) [ (I + λA) P Hᵀ R⁻¹ z + A x̄ ]
incompressible particle flow (for d ≥ 2):

dx/dλ = -log h(x) (∂log p(x,λ)/∂x)ᵀ / ‖∂log p(x,λ)/∂x‖²  for non-zero gradient
dx/dλ = 0  otherwise

f does not depend on K(λ), despite the fact that the PDE does!
new particle flow:

dx/dλ = -(∂²log p/∂x²)⁻¹ (∂log h/∂x)ᵀ

f does not depend on K(λ), despite the fact that the PDE does!

If we approximate the density p as Gaussian, then the observed Fisher information matrix can be computed using the sample covariance matrix P over the set of particles:
dx/dλ ≈ P (∂log h/∂x)ᵀ

for Gaussian densities we get the EKF for each particle:
dx/dλ ≈ P (∂θ(x)/∂x)ᵀ R⁻¹ (z - θ(x))
most general solution for particle flow:

dx/dλ = f(x, λ)
f ∂log p/∂x = -div(f) - (log h - d log K(λ)/dλ)

the most general solution is:
f = -C^#[ log h - d log K(λ)/dλ ] + (I - C^#C) y
in which C is the linear differential operator C f = (∂log p/∂x) f + div(f)
C^# = generalized inverse of C
y = arbitrary d-dimensional vector field (could pick y to robustly stabilize the filter, or random, or zero, or other)
idea #1 inspired by renormalization group flow:

dx/dλ = -C^#[ log h - d log K(λ)/dλ ] + (I - C^#C) y
f = Γ + Π y,  Π = projection onto the null-space of C
require ∂f/∂L = ∂Γ/∂L + (∂Π/∂L) y = 0
⟹ y = -(∂Π/∂L)^# (∂Γ/∂L),  with L = d log K(λ)/dλ
(.)^# = generalized inverse of (.)

work with L rather than K (just like QFT): f is linear in L but not in K; the result does not depend on K itself; avoids singularity; slightly different & it works
idea #2 inspired by renormalization group flow:

dx/dλ = -C^#[ log h - d log K(λ)/dλ ] + (I - C^#C) y
f = Γ + Π y,  Π = projection onto the null-space of C
require ∂f/∂L = ∂Γ/∂L + (∂Π/∂L) y = 0
⟹ y = -(∂Π/∂L)^# (∂Γ/∂L),  with L = log g(x)
(.)^# = generalized inverse of (.)

work with L rather than g (just like QFT): f is linear in L but not in g; the result does not depend on g itself; avoids the singularity at g = 0; slightly different & it works
idea #3 inspired by renormalization group flow*:

dx/dλ = -C^#[ log h - d log K(λ)/dλ ] + (I - C^#C) y
f = Γ + Π y,  Π = projection onto the null-space of C
require ∂f/∂L = ∂Γ/∂L + (∂Π/∂L) y = 0
⟹ y = -(∂Π/∂L)^# (∂Γ/∂L),  with L = {log g(x), d log K(λ)/dλ}ᵀ
(.)^# = generalized inverse of (.)

*Daum & Huang, "Renormalization group flow & other ideas inspired by physics for nonlinear filters," Proceedings of SPIE Conference, May 2014.
computation of the normalization using the Fourier transform:

div(pf) = -p[ log h - d log K(λ)/dλ ]

take the Fourier transform:
iωᵀ ℱ(pf) = -ℱ( p[ log h - d log K(λ)/dλ ] )
i.e., iωᵀ ∫ p(x,λ) f(x,λ) exp(-iωᵀx) dx = -∫ p(x,λ)[ log h(x) - d log K(λ)/dλ ] exp(-iωᵀx) dx

evaluate the above at ω = 0 (assuming that E(f) is finite):
0 = ∫ p(x,λ)[ log h(x) - d log K(λ)/dλ ] dx
⟹ d log K(λ)/dλ = E[ log h(x) ]

approximate the integral using the Monte Carlo sum over particles:
d log K(λ)/dλ ≈ (1/N) Σ_{j=1..N} log h(x_j)
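The identity d log K(λ)/dλ = E[log h(x)] can be checked in a scalar Gaussian case where K(λ) is available in closed form (assumed example: g = N(0,1), log h(x) = -(z - x)²/2, so K(λ) = (1+λ)^{-1/2} exp(-λz²/(2(1+λ))) and p(x, λ) = N(λz/(1+λ), 1/(1+λ))):

```python
import numpy as np

# Monte Carlo check of d log K / d lambda = E[log h(x)] for an assumed
# scalar Gaussian example (values of z and lambda are arbitrary).
rng = np.random.default_rng(4)
z, lam, N = 1.5, 0.4, 200000

# closed-form derivative of log K(lam) for this example
analytic = -0.5 / (1 + lam) - z**2 / (2 * (1 + lam) ** 2)

# draw particles from p(x, lam) = N(lam*z/(1+lam), 1/(1+lam)),
# then average log h over the particles
xs = lam * z / (1 + lam) + rng.standard_normal(N) / np.sqrt(1 + lam)
mc = np.mean(-0.5 * (z - xs) ** 2)
print(analytic, mc)
```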
MOVIES
exact particle flow for Gaussian densities:

dx/dλ = f(x, λ)
f ∂log p/∂x = -div(f) - (log h - d log K(λ)/dλ)

for g & h Gaussian, we can solve for f exactly:
f = Ax + b
A = -(1/2) P Hᵀ (λ H P Hᵀ + R)⁻¹ H
b = (I + 2λA) [ (I + λA) P Hᵀ R⁻¹ z + A x̄ ]

automatically stable under very mild conditions & extremely fast
[Movie, 10 frames, exact (Ax+b) particle flow: Magenta = truth, Green = PF estimate, Black = KF. The fraction of particles "inside" grows from 5 percent to 27.6 percent as the flow proceeds.]
incompressible particle flow:

dx/dλ = -log h(x) (∂log p/∂x)ᵀ / ‖∂log p/∂x‖²  for non-zero gradient
dx/dλ = 0  for zero gradient
[Movie, 9 frames, Hessian-based particle flow: Magenta = truth, Green = PF estimate, Black = KF. The fraction of particles "inside" varies from about 6 percent to about 13 percent as the flow proceeds.]
QUADRATIC MEASUREMENTS
quadratic measurement nonlinearity:

[Figure: dimensionless error vs. number of particles (10² to 10⁵) for EKF and particle flow; d = 12, n = 3, y = x², SNR = 20 dB.]
[Movie, 10 frames, quadratic measurements, Time = 1: Magenta = truth, Green = PF estimate, Black = KF; axes on the order of ±10⁵.]
CUBIC MEASUREMENTS
cubic measurement nonlinearity:

[Figure: dimensionless error vs. number of particles (10² to 10⁵) for EKF and particle flow; d = 12, n = 3, y = x³, SNR = 20 dB.]
[Movie, 10 frames, cubic measurements, Time = 1: Magenta = truth, Green = PF estimate, Black = KF; axes on the order of ±10⁵.]
STABILITY 117
Stability of nonlinear filters

stability of the Kalman filter (1963 paper by Kalman): the Kalman filter is stable under very mild conditions (e.g., controllability & observability); the Kalman filter is stable for unstable plants*.

stability of extended Kalman filters: the EKF is unstable for unstable plants in typical nonlinear examples**.

convergence of particle filters: papers by Ramon van Handel assume ergodicity, which implies stability of the plant; Dan Crisan (2009) makes extremely strong assumptions (the complete state vector is measured).

incompressible flow of particles: explicitly stabilize the filter & flow for unstable plants, without measuring the complete state vector.

exact (compressible) particle flow: automatically stable filter & flow.
118
most general solution for flow:

dx/dλ = ?

by the chain rule:

d log p(x,λ)/dλ = [∂ log p(x,λ)/∂x] (dx/dλ) + ∂ log p(x,λ)/∂λ = 0

dx/dλ = −A# ∂ log p(x,λ)/∂λ + [I − AᵀA/Tr(AᵀA)] y

pick y to maximize stability of the filter, in which:
A = ∂ log p(x,λ)/∂x
y = arbitrary d-vector
A# = Aᵀ/(AAᵀ) for AAᵀ > 0, and A# = 0 otherwise.
119
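The formula above is a minimum-norm particular solution plus a null-space term: any f of this form satisfies the chain-rule constraint A (dx/dλ) = −∂log p/∂λ, no matter how y is chosen. A minimal numpy sketch (the gradient A, the scalar ∂log p/∂λ, and y are all made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((1, d))      # row vector: A = d(log p)/dx at one particle
dlogp_dlam = 2.7                     # stand-in value for d(log p)/d(lambda)
y = rng.standard_normal((d, 1))      # arbitrary d-vector (the free homogeneous term)

A_pinv = A.T / float(A @ A.T)        # A# = A^T / (A A^T)
f = -A_pinv * dlogp_dlam + (np.eye(d) - A_pinv @ A) @ y

# chain-rule constraint: A f + d(log p)/d(lambda) = 0 for every choice of y
print(float(A @ f) + dlogp_dlam)
```

For a row vector A, the bracketed matrix I − AᵀA/Tr(AᵀA) equals the projector I − A#A, so the homogeneous term never disturbs the constraint.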
Pick y to maximize stability of the filter:

dx/dλ = −A# ∂ log p(x,λ)/∂λ + [I − AᵀA/Tr(AᵀA)] y

For example, linearize about each particle:

dx/dλ ≈ Bx + Dy, with y = Kx

Φ_filter ≈ Φ_plant Φ_Bayes, where Φ_Bayes ≈ exp(B + DK)

Pick K to minimize the following stability measure (Schur's inequality):

Σ_{j=1}^{d} |λ_j(Φ_filter)|² ≤ Tr(Φ_Bayesᵀ Φ_plantᵀ Φ_plant Φ_Bayes)

optimal K ≈ −(I + B), and thus optimal y ≈ −x − Bx; but we can "delinearize" Bx, resulting in y ≈ −x − {Bx}_delinearized.

Hence, optimal y ≈ −x + A# ∂ log p(x,λ)/∂λ; but DA# = 0, and thus the optimal y results in:

dx/dλ ≈ −A# ∂ log p(x,λ)/∂λ − Dx
120
[Slide 121: scatter plot of eigenvalues (real part vs. imaginary part, both from −4 to 4) of random 10x10 real non-singular matrices, colored by the fractional error in the estimated sum of squared eigenvalue magnitudes; errors in Schur's inequality range from 100% to 600%.]
121
Exactly the same feedback is derived using standard control theory:

y = −Dᵀ W⁻¹ x

in which the controllability Grammian is:

W = ∫₀¹ exp(−sB) DDᵀ exp(−sBᵀ) ds

where B is the linearization of the particle flow:

dx/dλ = −A# log(h) + (I − AᵀA/AAᵀ) y
dx/dλ ≈ Bx + Dy

Using the facts that D = D², that D kills A# (DA# = 0), and a little algebra:

y = −Dx
122
Particle filter accuracy depends on the plant stability & mixing

[Slide 123: dimensionless error vs. time (0 to 30) for plant eigenvalues λ = 0.1, 0.5, 1.0, 1.1, 1.2; d = 6, ny = 3, N = 500, #(MC) = 10, without −Dx.]
123
New theory (general flow) improves filter accuracy dramatically

[Slide 124: dimensionless error vs. time (0 to 30) for plant eigenvalues λ = 0.1, 0.5, 1.0, 1.1, 1.2; d = 6, ny = 3, N = 500, #(MC) = 10, with −Dx.]
124
renormalization group flow (RGF) in quantum field theory vs. particle flow for Bayes' rule:

1. purpose | RGF: avoid infinite integrals at all energy scales (µ) | particle flow: fix particle degeneracy in particle filters
2. PDE | RGF: linear first order PDE | particle flow: linear first order PDE
3. method | RGF: "the trick of doing an integral a little bit at a time" (Tony Zee, QFTNS p. 346) | particle flow: homotopy of log-density
4. efficacy | RGF: "the most important conceptual advance in QFT over the last 3 or 4 decades" (Tony Zee, QFTNS p. 337) | particle flow: reduces computational complexity by many orders of magnitude for high dimensional problems
5. algorithm | RGF: ODE for motion of particles in N-dimensional space (Tony Zee, QFTNS p. 340) | particle flow: ODE f = dx/dλ, x = particle in d dimensions
6. derivation of PDE | RGF: dH(x)/dµ = 0, H(x) = Hamiltonian | particle flow: Fokker-Planck equation & definition of log p
7. new idea for particle flow inspired by RGF | dH(x)/dµ = 0 (we want H to be scale invariant) | ∂f/∂g = 0, g = prior density
125
two steps in renormalization group flow & particle flow:

1. regularization | physics: regularization (e.g., cut-off of integral, or dimensional regularization d − ε) | nonlinear filters: homotopy of log-density
2. renormalization | physics: modify effective charge & mass as the energy scale varies from high to low (integrate out degrees of freedom to maintain symmetries & a finite number of parameters); scale invariant flow of parameters: dH/dµ = 0 | nonlinear filters: compute a flow of particles that is invariant to errors in the prior density & the normalization constant: ∂f/∂g = 0
126
exact recursive filters*

filter | conditional density | special condition on the dynamics dx/dt = f + w

Kalman (1960) | η = Gaussian | ∂f/∂x = A(t)
Beneš (1983) | η exp(∫ f(x) dx) | f(x) = ∂V/∂x and div(f) + ‖f‖² = xᵀAx + bx + c
Daum (1986) | exponential family p(x|Z) = p(x) exp[θ(x)Ψ(Z)] | ∂θ/∂t = (∂θ/∂x) Q r + ½ξ − Aθ + j, where r = [∂ log p(x)/∂x]ᵀ − f and j = Tr(Q ∂²θ/∂x²)

*Daum, IEEE AES Systems Magazine, August 2005.
127
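As a concrete instance of the Beneš condition: in one dimension the drift f(x) = tanh(x) satisfies div(f) + ‖f‖² = sech²x + tanh²x = 1, i.e., the required quadratic form with A = 0, b = 0, c = 1. A quick numerical check:

```python
import math

def benes_lhs(x, h=1e-6):
    """div(f) + ||f||^2 in 1-D for the Benes drift f(x) = tanh(x)."""
    f = math.tanh
    df = (f(x + h) - f(x - h)) / (2 * h)   # central-difference derivative
    return df + f(x) ** 2

# sech^2(x) + tanh^2(x) = 1 for all x, so the condition holds with c = 1
for x in (-3.0, -0.5, 0.0, 1.2, 4.0):
    print(round(benes_lhs(x), 6))          # 1.0 at every point
```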
a miracle*
p( x, t Z t ) = p( x, t Ψt ( Z t )) dimension of Z grows with time
dimension of Ψ is fixed and finite for all time
*Daum, IEEE AES Systems Magazine, August 2005.
128
Monge-Ampère highly nonlinear PDE

y = T(x)
p(x) dx = p(y) dy
p(x) = p(y) det(∂y/∂x)

Let T(x) = ∂V/∂x. Hence:

p(x) = p(y) det(∂²V/∂x²)

One-shot transport requires a nonlinear PDE (and we cannot evaluate the functions at good points!), whereas particle flow only needs an extremely simple linear PDE.
129
computing the Hessian of log p:

log p(x,λ) = log g(x) + λ log h(x) − log K(λ)

∂² log p/∂x² = ∂² log g(x)/∂x² + λ ∂² log h(x)/∂x²

∂² log p/∂x² ≈ −C⁻¹ + λ ∂² log h(x)/∂x²
C = sample covariance matrix of the particles for the prior (λ = 0) with Tychonov regularization; or EKF or UKF covariance matrix

∂² log p/∂x² ≈ −P⁻¹
P = sample covariance matrix of the particles for p(x,λ) with Tychonov regularization; or EKF or UKF covariance matrix

(we can compute the Hessians using calculus or 2nd differences)
130
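Both options (calculus or 2nd differences) can be sketched in a few lines: a central second-difference Hessian, checked against the calculus answer for a hypothetical linear-Gaussian log-likelihood (the measurement matrix Hm, noise precision Rinv, and measurement z below are all made up):

```python
import numpy as np

def hessian_2nd_diff(func, x, h=1e-4):
    """Hessian of a scalar function by central second differences."""
    d = x.size
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            H[i, j] = (func(x + ei + ej) - func(x + ei - ej)
                       - func(x - ei + ej) + func(x - ei - ej)) / (4 * h * h)
    return H

# hypothetical Gaussian log-likelihood: log h(x) = -0.5 (z - Hm x)^T Rinv (z - Hm x) + const
Hm = np.array([[1.0, 0.5], [0.0, 2.0]])
Rinv = np.diag([4.0, 1.0])
z = np.array([0.3, -1.1])
log_h = lambda x: -0.5 * (z - Hm @ x) @ Rinv @ (z - Hm @ x)

H_num = hessian_2nd_diff(log_h, np.array([0.2, 0.7]))
H_exact = -Hm.T @ Rinv @ Hm          # calculus gives the Hessian in closed form
print(np.max(np.abs(H_num - H_exact)))   # small
```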
formula that avoids the inverse of the sample covariance matrix:

dx/dλ = −[∂² log p/∂x²]⁻¹ [∂ log h/∂x]ᵀ

∂² log p/∂x² = ∂² log g(x)/∂x² + λ ∂² log h(x)/∂x² ≈ −C⁻¹ + λ ∂² log h(x)/∂x² for g(x) ≈ Gaussian

but Woodbury's matrix inversion lemma gives us:

(A + B)⁻¹ = A⁻¹ − A⁻¹B(I + A⁻¹B)⁻¹A⁻¹ for arbitrary B and non-singular A

hence

[∂² log p/∂x²]⁻¹ ≈ −C − CB(I − CB)⁻¹C

in which B = λ ∂² log h(x)/∂x²
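A numerical check of this rearrangement, with a made-up SPD prior covariance C and a made-up negative-semidefinite stand-in for the Hessian of log h:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
M = rng.standard_normal((d, d))
C = M @ M.T + d * np.eye(d)        # prior covariance (SPD, made-up values)
lam = 0.4
Hh = -(M.T @ M) / 10.0             # stand-in for the Hessian of log h (neg. semidef.)
B = lam * Hh

direct = np.linalg.inv(-np.linalg.inv(C) + B)               # (d^2 log p / dx^2)^{-1}
woodbury = -C - C @ B @ np.linalg.inv(np.eye(d) - C @ B) @ C
print(np.max(np.abs(direct - woodbury)))                    # ~0
```

The point of the identity is that the right-hand side uses C itself, never C⁻¹, which matters when C is an ill-conditioned sample covariance.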
formula that avoids computing the Hessian of g(x):

∂² log g(x)/∂x² ≈ −(1/N) Σ_{j=1}^{N} [∂ log g(xⱼ)/∂x]ᵀ [∂ log g(xⱼ)/∂x]

derivation of the above:

E{[∂ log g(x)/∂x]ᵀ [∂ log g(x)/∂x]} = −E[∂² log g(x)/∂x²]

E[∂² log g(x)/∂x²] ≈ ∂² log g(x)/∂x²

E{[∂ log g(x)/∂x]ᵀ [∂ log g(x)/∂x]} ≈ (1/N) Σ_{j=1}^{N} [∂ log g(xⱼ)/∂x]ᵀ [∂ log g(xⱼ)/∂x]
132
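The derivation rests on the Fisher-information identity E[(∂log g/∂x)ᵀ(∂log g/∂x)] = −E[∂²log g/∂x²]. A Monte Carlo check for a Gaussian prior with a made-up covariance P, for which the Hessian of log g is exactly −P⁻¹:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 2, 200_000
P = np.array([[2.0, 0.3], [0.3, 1.0]])   # made-up prior covariance
Pinv = np.linalg.inv(P)
X = rng.multivariate_normal(np.zeros(d), P, size=N)

# score of the zero-mean Gaussian prior: d(log g)/dx = -P^{-1} x
scores = -X @ Pinv                        # N x d

est = -(scores.T @ scores) / N            # -(1/N) sum of score outer products
print(np.max(np.abs(est - (-Pinv))))      # approximates Hessian of log g = -P^{-1}
```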
method to solve PDE
how to pick unique solution
comments
1. generalized inverse of linear differential operator
minimum L² norm
Coulomb’s law or fast Poisson solver
2. Poisson’s equation
irrotational flow
Coulomb’s law or fast Poisson solver
3. generalized inverse of gradient of log-homotopy
incompressible flow
workhorse for multimodal densities
4. stabilized version of method #3
most robustly stable filter
workhorse for multimodal densities
5. separation of variables (Gaussian)
pick solution of specific form
extremely fast & hard to beat in accuracy
6. separation of variables (exponential family)
pick solution of specific form
generalization of Gaussian flow
7. variational formulation (Gauss & Hertz)
convex function minimization
generalization of minimum L² norm
8. optimal transport formulation (Monge-Kantorovich)
convex functional minimization (e.g., least action or Wasserstein metric, etc.)
very high computational complexity (e.g. Monge-Ampere fully nonlinear PDE)
9. direct integration (of first order linear PDE in divergence form)
choice of d-1 arbitrary functions
should work with enforcement of neutral charge density & importance sampling
10. method of characteristics (or generalized method of characteristics)
more conditions (e.g., small curvature or specify curl, or use Lorentz invariance)
can solve any first order linear PDE except for the one of interest to us!
11. another homotopy of the PDE (inspired by Gromov’s h-principle)
initial condition of ODE & uniqueness of solution to ODE
like Feynman’s perturbation for QED
12. finite dimensional parametric flow (e.g., f = Ax+b with A & b parameters)
non-singular matrix to invert
avoids PDE completely
13. Fourier transform of PDE (divergence form of linear PDE has constant coefficients!)
minimum L² norm or most stable flow
generalized inverse & Monte Carlo integration avoids inverse Fourier transform at random points in d dimensions
14. small “curvature” flow
set certain 2nd derivatives of flow to zero
solve d x d system of linear equations or numerically integrate ODE (like Feynman)
15. zero “curvature” flow
set acceleration of particles to zero
solve vector Riccati equation exactly in closed form (rather than solve PDE)!
16. constant “curvature” flow etc. etc.
set acceleration of particle to constant
solve polynomial multivariate equations (rather than PDE); maybe use homotopy
17. upper triangular Jacobian flow
set certain lower triangular terms in Jacobian to zero (but not all terms to zero)
inspired by Knothe-Rosenblatt rearrangement in transport theory
18. non-zero process noise in flow for Bayes’ rule, with clever choice of f & Q to avoid PDE
compute gradient of PDE to obtain d equations in d unknowns
Q = covariance matrix of diffusion in flow: dx = f(x, λ)dλ + √Q dw
derivation of the PDE for particle flow with Q ≠ 0:

dx/dλ = f(x,λ) + √(Q(x,λ)) dw/dλ

∂p(x,λ)/∂λ = −div(pf) + ½ div(Q ∂p/∂x)

[∂ log p(x,λ)/∂λ] p(x,λ) = −div(pf) + ½ div(Q ∂p/∂x)

log p(x,λ) = log g(x) + λ log h(x) − log K(λ)

[log h(x) − d log K(λ)/dλ] p(x,λ) = −div(pf) + ½ div(Q ∂p/∂x)

[log h − d log K/dλ] p = −p div(f) − (∂p/∂x) f + ½ div(Q ∂p/∂x)

log h − d log K/dλ = −div(f) − (∂ log p/∂x) f + (1/2p) div(Q ∂p/∂x)
134
derivation of the first new particle flow with Q ≠ 0:

log h − d log K/dλ = −div(f) − (∂ log p/∂x) f + (1/2p) div(Q(x) ∂p/∂x)

take the gradient with respect to x:

∂ log h/∂x = −f ᵀ ∂² log p/∂x² − ∂ div(f)/∂x − (∂ log p/∂x)(∂f/∂x) + ½ ∂/∂x [div(Q(x) ∂p/∂x) / p]

pick Q such that the three last terms sum to zero, and solve for f:

f = −[∂² log p/∂x²]⁻¹ [∂ log h/∂x]ᵀ
135
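A sanity check of this flow in the scalar linear-Gaussian case, where the Bayes posterior is available in closed form: integrating dx/dλ = f over λ ∈ [0, 1] with Euler steps moves the particle mean to the Kalman posterior mean. Only the drift term is integrated here (the Q-diffusion term of the slide is omitted), so only the mean, not the spread, is expected to match; all numbers are made up:

```python
import numpy as np

# scalar Gaussian prior g and scalar linear-Gaussian likelihood h (made-up numbers)
m0, P0 = 0.0, 4.0          # prior mean / variance
z, R = 3.0, 1.0            # measurement and its noise variance

# conjugate (Kalman) posterior mean, for reference
K = P0 / (P0 + R)
m_post = m0 + K * (z - m0)

rng = np.random.default_rng(3)
x = m0 + np.sqrt(P0) * rng.standard_normal(5000)   # particles drawn from the prior

n_steps = 1000
dlam = 1.0 / n_steps
for step in range(n_steps):
    lam = step * dlam
    # d^2 log p / dx^2 = -1/P0 - lam/R ;  d log h / dx = (z - x)/R
    hess = -1.0 / P0 - lam / R
    grad_h = (z - x) / R
    x += dlam * (-grad_h / hess)                   # Euler step of dx/dlam = f

print(x.mean())            # close to the Kalman posterior mean m_post
```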
why does the new flow work so well?

item | new flow | old flows

1. solution for flow | d x d matrix inverse | Moore-Penrose inverse of 1 equation in d unknowns
2. normalization of probability density | we killed the normalization | explicitly computed
3. exploits smoothness of density functions | smoother (2nd derivatives wrt x) | less smooth (only first derivatives wrt x)
4. exploits calculus to compute Hessian & gradient of likelihood | yes | no
5. exploits greater freedom with non-zero diffusion in flow | yes | no
6. depends on Monte Carlo approximation | no | yes
7. generality | more | less
136
small curvature flow:

Gaussian flow: f = A(λ)x + b(λ), so div(f) = Tr(A)

incompressible flow: div(f) = 0

small curvature flow: ∂ div(f)/∂x = 0
137
linear first order highly underdetermined PDE:

dx/dλ = f(x,λ)

div(pf) = η; let q = pf (p = known & f = unknown):

div(q(x,λ)) = η

η(x,λ) = −p(x,λ) [log h(x) − d log K(λ)/dλ]

η = ∂q₁/∂x₁ + ∂q₂/∂x₂ + … + ∂q_d/∂x_d

like Gauss' divergence law in electromagnetics, with η analogous to electric charge density.

We want: (1) a stable flow; (2) a full rank flow; (3) a fast algorithm to approximate f accurately (roughly 1%).
138
irrotational particle flow:

dx/dλ = f(x,λ) = [∂V(x,λ)/∂x]ᵀ / p(x,λ)

Tr(∂²V(x,λ)/∂x²) = η(x,λ)    (Poisson's equation)

V(x,λ) = −∫ η(y,λ) c/‖x − y‖^(d−2) dy for d ≥ 3

V(x,λ) = ∫ p(y,λ) [log h(y) − ∂ log K(λ)/∂λ] c/‖x − y‖^(d−2) dy

∂V(x,λ)/∂x = ∫ p(y,λ) [log h(y) − ∂ log K(λ)/∂λ] c(2 − d)(x − y)ᵀ/‖x − y‖^d dy

∂V(x,λ)/∂x = E{[log h(y) − ∂ log K(λ)/∂λ] c(2 − d)(x − y)ᵀ/‖x − y‖^d}

∂V(xᵢ,λ)/∂x ≈ (1/M) Σ_{j∈Sᵢ} [log h(xⱼ) − ∂ log K(λ)/∂λ] c(2 − d)(xᵢ − xⱼ)ᵀ/‖xᵢ − xⱼ‖^d

like Coulomb's law
139
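The last line is a Monte Carlo Coulomb sum over nearby particles (the set Sᵢ); a direct all-pairs version of that sum, with made-up particles and a made-up log-likelihood (and the ∂log K/∂λ term replaced by the sample mean of log h), looks like:

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 3, 400
X = rng.standard_normal((N, d))                  # particles (stand-ins for samples of p)
log_h = -0.5 * np.sum((X - 1.0) ** 2, axis=1)    # hypothetical log-likelihood at particles
charge = log_h - log_h.mean()                    # "charge" per particle
c = 1.0 / (4.0 * np.pi)                          # Coulomb constant for d = 3

def grad_V(i):
    """Monte Carlo Coulomb sum for dV/dx at particle i (all pairs j != i)."""
    diff = X[i] - np.delete(X, i, axis=0)        # x_i - x_j
    r = np.linalg.norm(diff, axis=1)
    q = np.delete(charge, i)
    # c (2 - d) (x_i - x_j) / ||x_i - x_j||^d, weighted by charge, averaged over j
    return (q[:, None] * c * (2 - d) * diff / r[:, None] ** d).mean(axis=0)

print(grad_V(0))                                  # d-vector: the Coulomb "force" at particle 0
```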
derivation of the Fourier transform particle flow:

div(pf) = −p [log h − d log K(λ)/dλ]

take the Fourier transform:

iωᵀ ℑ(pf) = −ℑ{p [log h − d log K(λ)/dλ]}

iωᵀ ∫ p(x,λ) f(x,λ) exp(−iωᵀx) dx = −∫ p(x,λ) [log h(x) − E(log h)] exp(−iωᵀx) dx

approximate the integrals using the Monte Carlo sum over the particles:

iωᵀ (1/N) Σ_{j=1}^{N} f(xⱼ,λ) exp(−iωᵀxⱼ) ≈ −(1/N) Σ_{j=1}^{N} [log h(xⱼ) − E(log h)] exp(−iωᵀxⱼ)

evaluate the above at k points in ω (e.g., k = d or 2d) and write this as a linear operator on the unknown function f:

L(ω) f = y(ω)
f(x) = L# y, in which L# = Lᵀ(LLᵀ)⁻¹ = generalized inverse of L
Lf = y written out explicitly: stack the unknown flow values into a dN x 1 vector f = [f(x₁); f(x₂); …; f(x_N)]. The 2k x dN matrix L consists of k pairs of rows, one pair per frequency ω_m:

row 2m−1: [ω_mᵀ sin(ω_mᵀx₁), ω_mᵀ sin(ω_mᵀx₂), …, ω_mᵀ sin(ω_mᵀx_N)]
row 2m:   [ω_mᵀ cos(ω_mᵀx₁), ω_mᵀ cos(ω_mᵀx₂), …, ω_mᵀ cos(ω_mᵀx_N)]

and the 2k x 1 right-hand side y has the corresponding entries:

y_{2m−1} = −Σⱼ [log h(xⱼ) − E(log h)] cos(ω_mᵀxⱼ)
y_{2m}  = +Σⱼ [log h(xⱼ) − E(log h)] sin(ω_mᵀxⱼ)

for m = 1, …, k.
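The assembled system can then be solved with a generalized inverse; a numpy sketch that builds L and y from the sin/cos rows and uses numpy's pseudo-inverse in place of the explicit Lᵀ(LLᵀ)⁻¹ (particles, likelihood, and ω points are all made up):

```python
import numpy as np

rng = np.random.default_rng(5)
d, N, k = 2, 30, 4                       # dimension, particles, frequency points
X = rng.standard_normal((N, d))          # particles
log_h = -0.5 * np.sum(X ** 2, axis=1)    # hypothetical log-likelihood at particles
q = log_h - log_h.mean()                 # log h - E(log h)
Omega = rng.standard_normal((k, d))      # k points in omega-space (could be optimized)

L = np.zeros((2 * k, d * N))
y = np.zeros(2 * k)
for m in range(k):
    w = Omega[m]
    phase = X @ w                        # omega^T x_j for each particle
    for j in range(N):
        L[2 * m, j * d:(j + 1) * d] = w * np.sin(phase[j])
        L[2 * m + 1, j * d:(j + 1) * d] = w * np.cos(phase[j])
    y[2 * m] = -np.sum(q * np.cos(phase))
    y[2 * m + 1] = np.sum(q * np.sin(phase))

f = (np.linalg.pinv(L) @ y).reshape(N, d)    # minimum-norm flow at each particle
print(np.max(np.abs(L @ f.ravel() - y)))     # residual ~0 (underdetermined system)
```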
[Figure: optimization of points in k-space for the Fourier transform flow; time = 1, iteration 5; blue = best, red = worst; both axes span −0.2 to 0.2.]
most general solution for incompressible flow:

dx/dλ = ?

by the chain rule:

d log p(x,λ)/dλ = [∂ log p(x,λ)/∂x] (dx/dλ) + ∂ log p(x,λ)/∂λ = 0

dx/dλ = −A# log h(x) + [I − A#A] y

pick y to maximize robustness of the filter, in which:
A = ∂ log p(x,λ)/∂x
y = arbitrary d-vector
A# = Aᵀ/(AAᵀ) for AAᵀ > 0, and A# = 0 otherwise.
143
[Slides 144-151: dimensionless error vs. time (0 to 30) comparing EKF, incompressible flow, Ax+b flow, no flow, and zero curvature flow; N = 1000; the panels sweep σ₀ from 10 up to 1e7 and Δλ₀ from 1e-3 down to 1e-15.]
Fisher information matrix:

J = −E[∂² log p/∂x²]
152
zero curvature flow: we want to solve the following PDE for the flow f:

div(f) + (∂ log p/∂x) f = −log h + d log K/dλ

assume that the flow has zero curvature:

d²x/dλ² = 0, but f = dx/dλ, hence df/dλ = 0

using this condition (after several pages of calculations) results in:

fᵀ (∂² log p/∂x²) f + 2 (∂ log h/∂x) f = d² log K/dλ²
153
zero curvature flow:

fᵀ (∂² log p/∂x²) f + 2 (∂ log h/∂x) f = d² log K/dλ²

(1) a vector Riccati equation for f rather than a PDE for f!
(2) a highly underdetermined algebraic equation for f
(3) we can solve for f exactly in closed form!
(4) the Hessian of log p is similar to the Fisher information matrix (we can exploit this to solve for f exactly in closed form)
(5) for nonlinear measurements with Gaussian noise, it is easy & fun to solve for f explicitly!
154
exploit the non-singular symmetric pre-Fisher information matrix:

fᵀ (∂² log p/∂x²) f + 2 (∂ log h/∂x) f = d² log K/dλ²

we can always write the above in the following canonical form:

‖f̃‖² − 2 (∂ log h/∂x) (√H)⁻¹ f̃ = −d² log K/dλ²

in which:

H = −∂² log p/∂x²
f̃ = √H f
155
solution of the general vector Riccati equation:

‖f̃ − b‖² = 0

this is a single scalar-valued equation in d unknowns, but it nevertheless has the obvious unique solution f̃ = b:

(f̃ − b)ᵀ(f̃ − b) = 0
‖f̃‖² − 2bᵀf̃ + ‖b‖² = 0

obviously this equation also has the unique solution f̃ = b.

Encouraged by the above simple example, now consider our equation:

‖f̃‖² + bᵀf̃ + c = 0

let f̃ = kb, in which k is a scalar:

k²‖b‖² + k‖b‖² + c = 0
k² + k + c/‖b‖² = 0, which has the solution: k = [−1 ± √(1 − 4c/‖b‖²)] / 2
156
geometrical interpretation of the solution:

‖f̃‖² + bᵀf̃ + c = 0; let f̃ = kb, in which k is a scalar:

k²‖b‖² + k‖b‖² + c = 0
k² + k + c/‖b‖² = 0, which has the solution: k = [−1 ± √(1 − 4c/‖b‖²)] / 2

[Figure: the solution f̃ = kb lies along the direction of the vector b.]
157
158
solution of our vector Riccati equation:

f̃ = √H f
f̃ = kb
b = −2 (√H)⁻¹ (∂ log h/∂x)ᵀ
k = [−1 ± √(1 − 4c/‖b‖²)] / 2, and c = d² log K/dλ²

f = (√H)⁻¹ f̃ = −2k (√H)⁻¹ (√H)⁻¹ (∂ log h/∂x)ᵀ = −2k H⁻¹ (∂ log h/∂x)ᵀ

f = 2k (∂² log p/∂x²)⁻¹ (∂ log h/∂x)ᵀ
159
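A numerical check that this closed-form f actually satisfies the zero-curvature Riccati equation fᵀ(∂²log p/∂x²)f + 2(∂log h/∂x)f = d²log K/dλ², with made-up values for H, the likelihood gradient, and c:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
M = rng.standard_normal((d, d))
H = M @ M.T + np.eye(d)          # H = -(d^2 log p / dx^2), SPD (made-up values)
g = rng.standard_normal(d)       # (d log h / dx)^T  (made up)
c = 0.01                         # d^2 log K / d lambda^2  (made up)

Hinv_g = np.linalg.solve(H, g)
b_sq = 4.0 * g @ Hinv_g          # ||b||^2 = 4 g^T H^{-1} g
k = (-1.0 + np.sqrt(1.0 - 4.0 * c / b_sq)) / 2.0   # root of k^2 + k + c/||b||^2 = 0
f = -2.0 * k * Hinv_g            # = 2k (d^2 log p/dx^2)^{-1} (d log h/dx)^T

# residual of f^T (d^2 log p/dx^2) f + 2 (d log h/dx) f - d^2 log K/d lambda^2
residual = -f @ H @ f + 2.0 * g @ f - c
print(residual)                  # ~0
```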
fast Ewald’s method vs. Coulomb’s law

item | fast Ewald method in physics & chemistry* | Coulomb’s law with fast approximate k-NN | comments

1. dimension of x | d = 3 | d = 3 to 30 | rapid decay of the Coulomb kernel in higher d helps!
2. relative error desired | 0.0001 or better | 1% to 10% | all Ewald methods are the same for 1% accuracy
3. cut-off in x space | fixed distance | random per k-NN | automatic space-taper to weight convolution
4. desired force | on mesh | at particles | big difference!
5. neutral charge | locally enforced | locally enforced | crucial!!!
6. smoothing charge | Gaussian | no explicit smoothing | Debye kernel
7. k-space or real-space | both | real space (x) | no FFT needed for Coulomb

*Shan, Klepeis, Eastwood, Dror & Shaw, “Gaussian split Ewald: a fast Ewald mesh method for molecular simulation,” Journal of Chemical Physics, 2005.
160
particle flow vs. Monge-Kantorovich transport

item | particle flow | Monge-Kantorovich transport

1. purpose | fix particle degeneracy due to Bayes’ rule | move physical objects with minimal effort from one probability density to another
2. conservation of probability mass along flow | yes | yes
3. deterministic | yes* | yes
4. homotopy of density | no | yes
5. log-homotopy of density | yes | no
6. optimality criteria | none | Wasserstein metric (dirt mover’s metric) or minimum action, etc.
7. how to pick a solution | 24 distinct methods | minimize a convex functional
8. stability of flow explicitly considered | yes | rarely
9. high dimensional applications | yes (d ≤ 42) | no (d = 1, 2 or 3)
10. computational complexity | numerical integration of an ODE for each particle | Poisson’s PDE or HJB PDE or Monge-Ampère PDE, etc.
11. solution of PDE for nice special cases | incompressible, irrotational, Gaussian, geodesic, etc. | Moser (1965 & 1990), Brenier (1991), Knothe-Rosenblatt (1952)
12. math theory for existence of incompressible flow, etc. | borrow Shnirelman’s theorem, and Moser & Dacorogna (1990) for d ≥ 3 | Shnirelman’s theorem, Moser & Dacorogna (1990)
161