Bayesian particle flow for estimation, decisions & transport

Fred Daum & Jim Huang, 23 September 2014. Copyright © 2014 Raytheon Company. All rights reserved. Customer Success Is Our Mission is a trademark of Raytheon Company.

nonlinear filter problem*

dynamical model of state:
dx = F(x, t) dt + G(x, t) dw   or   x(t_{k+1}) = F(x(t_k), t_k, w(t_k))

x(t) = state vector at time t
w(t) = process noise vector at time t
z_k = measurement vector at time t_k

estimate x given noisy measurements

z_k = H(x(t_k), t_k, v_k)
v_k = measurement noise vector at time t_k
p(x, t_k | Z_k) = probability density of x at time t_k given Z_k
Z_k = set of all measurements: Z_k = {z_1, z_2, ..., z_k}

*"The Oxford handbook of nonlinear filtering," edited by Dan Crisan and Boris Rozovskii, Oxford University Press, 2011.


curse of dimensionality for classic particle filter*

[figure: number of particles required vs dimension of the state vector; optimal accuracy: r = 1.0]

*Daum, IEEE AES Systems Magazine, August 2005.

nonlinear filter*

prediction of the conditional probability density from t_{k-1} to t_k: solution of the Fokker-Planck equation

measurement update via Bayes' rule:
p(x, t_k | Z_k) = p(x, t_k | Z_{k-1}) p(z_k | x, t_k) / p(z_k | Z_{k-1})

*Yu-Chi Ho & R. C. K. Lee, "A Bayesian approach to problems in stochastic estimation and control," IEEE Transactions on Automatic Control, pages 333-339, October 1964.


particle degeneracy*

[figure: prior density g(x), likelihood h(x), and particles to represent the prior, plotted vs x; the particles fall where the likelihood is negligible]

*Daum & Huang, “Particle degeneracy: root cause & solution,” SPIE Proceedings 2011.


chicken & egg problem How do you pick a good way to represent the product of two functions before you compute the product itself?


induced flow of particles for Bayes' rule

prior = g(x), posterior = g(x)h(x)/K(1)

flow of density (sample from the density at each λ):
log p(x, λ) = log g(x) + λ log h(x) − log K(λ)

λ = continuous parameter (like time), running from λ = 0 (prior) to λ = 1 (posterior)

flow of particles:
dx/dλ = f(x, λ)
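The log-homotopy above is simple to write down concretely. Below is a minimal sketch in Python for a scalar Gaussian prior g and Gaussian likelihood h (the specific means and variances are made-up toy values, not from the slides); log K(λ) is omitted, since the flows that follow only need the unnormalized density:

```python
def log_g(x, mean=0.0, var=1.0):
    # unnormalized log prior: Gaussian N(mean, var)
    return -0.5 * (x - mean) ** 2 / var

def log_h(x, z=2.0, r=1.0):
    # unnormalized log likelihood of a measurement z with noise variance r
    return -0.5 * (x - z) ** 2 / r

def log_p(x, lam):
    # log-homotopy: interpolates from the prior (lam = 0)
    # to the unnormalized posterior (lam = 1)
    return log_g(x) + lam * log_h(x)
```

At λ = 0 this returns log g(x) exactly, and at λ = 1 it returns log g(x) + log h(x), the unnormalized log posterior.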

curse of dimensionality: root cause

prior density g, likelihood of measurement h, particles to represent the prior

flow of density (sample from the density at each λ):
log p(x, λ) = log g(x) + λ log h(x)

flow of particles, from λ = 0 to λ = 1:
dx/dλ = f(x, λ)

We design the particle flow by solving the following PDE for f:
div(pf) = −p[ log h − d log K/dλ ]

div(pf) = −p[ log h − d log K/dλ ];  let q = pf:
∂q_1/∂x_1 + ∂q_2/∂x_2 + ... + ∂q_d/∂x_d = η

(1) linear PDE in unknown f or q
(2) constant coefficient PDE in q
(3) first order PDE
(4) highly underdetermined PDE
(5) same as the Gauss divergence law in Maxwell's equations
(6) same as Euler's equation in fluid dynamics
(7) existence of a solution if and only if the volume integral of η is zero (i.e., neutral charge density for plasma; satisfied automatically)

irrotational flow

Coulomb’s law flow

small curvature flow

constant curvature flows (e.g. zero)

exponential family

Knothe-Rosenblatt flow

non-zero diffusion flow

geodesic flows

Fourier transform flow

direct integration

stabilized flows

finite dimensional flow

optimal Monge-Kantorovich transports

method of characteristics

renormalization group flow for log K(λ) inspired by QFT

renormalization group flow for log g(x) inspired by QFT

renormalization group flow for log K(λ) and log g(x)

exponential family with non-zero diffusion

Gibbs sampler like flow (inspired by direct integration)

non-singular Jacobian flow (inspired by proof)

maximum entropy flow

Moser coupling flow

suboptimal Monge-Kantorovich transport

incompressible flow

Gaussian densities


exact particle flow for Gaussian densities: dx/dλ does not depend on K(λ), despite the fact that the PDE does!

dx/dλ = f(x, λ)
log h − d log K(λ)/dλ = −div(f) − f^T ∂log p/∂x

for g & h Gaussian, we can solve for f exactly:
f = Ax + b
A = −(1/2) P H^T (λ H P H^T + R)^{-1} H
b = (I + 2λA) [ (I + λA) P H^T R^{-1} z + A x̄ ]   (x̄ = prior mean)

automatically stable under very mild conditions & extremely fast
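The exact Gaussian flow is easy to check numerically. The sketch below is my own illustration (not the authors' code; all parameter values are toy choices): it Euler-integrates dx/dλ = A(λ)x + b(λ) for a cloud of particles in the scalar case P = H = R = 1, z = 2, x̄ = 0, and compares the flowed particles with the Kalman posterior (mean 1.0, variance 0.5):

```python
import numpy as np

def exact_flow_step(x, lam, dlam, P, H, R, z, xbar):
    # scalar specialization of f = A x + b with
    #   A = -(1/2) P H (lam H P H + R)^-1 H
    #   b = (I + 2 lam A) [ (I + lam A) P H z / R + A xbar ]
    A = -0.5 * P * H * H / (lam * H * P * H + R)
    b = (1 + 2 * lam * A) * ((1 + lam * A) * P * H * z / R + A * xbar)
    return x + dlam * (A * x + b)

rng = np.random.default_rng(0)
P, H, R, z, xbar = 1.0, 1.0, 1.0, 2.0, 0.0
x = rng.normal(xbar, np.sqrt(P), size=2000)   # particles drawn from the prior
dlam = 1e-3
for k in range(1000):                          # flow lam from 0 to 1
    x = exact_flow_step(x, k * dlam, dlam, P, H, R, z, xbar)

print(x.mean(), x.var())   # near the Kalman posterior mean 1.0 and variance 0.5
```

Because the flow is affine in x, it transports the whole Gaussian prior to the Gaussian posterior, not just the mean.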

incompressible particle flow (for d ≥ 2):

dx/dλ = −log(h(x)) · (∂log p(x, λ)/∂x)^T / ‖∂log p(x, λ)/∂x‖²   for non-zero gradient
dx/dλ = 0   otherwise

dx/dλ does not depend on K(λ), despite the fact that the PDE does!

[figure sequence: flow of particles for one noisy measurement of sin(θ) with Bayes' rule; frames at λ = 0.0 (initial probability distribution of particles), 0.1, 0.2, ..., 0.9, 1.0 (final probability distribution); axes: Angle (deg), −200 to 200, vs Angle Rate (deg/sec), −300 to 400]

new particle flow*:

dx/dλ = −[ ∂²log p/∂x² ]^{-1} (∂log h/∂x)^T

dx/dλ does not depend on K(λ), despite the fact that the PDE does!

If we approximate the density p as Gaussian, then the observed Fisher information matrix can be computed using the sample covariance matrix (C) over the set of particles:
dx/dλ ≈ C (∂log h/∂x)^T

for Gaussian densities we get the EKF for each particle:
dx/dλ ≈ C (∂θ(x)/∂x)^T R^{-1} (z − θ(x))

*Daum & Huang, "Particle flow with non-zero diffusion for nonlinear filters," SPIE conference proceedings, San Diego, August 2013.
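The EKF-like flow above can be sketched for a scalar linear measurement θ(x) = Hx. In this illustration of mine, the covariance C is replaced by the analytic covariance of the homotopy density, (P⁻¹ + λH²/R)⁻¹, as a stand-in for the sample covariance over particles (an assumption for illustration only; all numbers are toy values):

```python
def ekf_flow(x0, z, P=1.0, H=1.0, R=1.0, steps=1000):
    """Euler-integrate dx/dlam = C(lam) * H/R * (z - H*x) from lam = 0 to 1,
    with C(lam) the covariance of the homotopy density (an analytic
    stand-in for the sample covariance over the particles)."""
    x = x0
    dlam = 1.0 / steps
    for k in range(steps):
        lam = k * dlam
        C = 1.0 / (1.0 / P + lam * H * H / R)
        x += dlam * C * H / R * (z - H * x)
    return x

# starting from the prior mean 0 with z = 2, the trajectory follows the
# homotopy mean 2*lam/(1+lam), ending near the Kalman posterior mean 1.0
print(ekf_flow(0.0, 2.0))
```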

[figures: median error over 100 Monte Carlo runs vs time; (a) d = 42 states, N = 1,000 and N = 10,000 particles, incompressible flow vs Metropolis-adjusted Langevin, nonlinear dynamics & nonlinear measurements; (b) dimension of state vector = 17, SNR = 20 dB, comparing regularized bootstrap, auxiliary particle filter, Hamiltonian Monte Carlo, MALA, Gaussian flow, incompressible flow & new flow]

new particle flow:

dx/dλ = −[ ∂²log p/∂x² ]^{-1} (∂log h/∂x)^T

If we approximate the density g(x) from the exponential family, then the flow is:
dx/dλ ≈ −[ ∂²log q(x)/∂x² + ψ^T ∂²θ(x)/∂x² + λ ∂²log h/∂x² ]^{-1} (∂log h/∂x)^T

in which the unnormalized prior density is:
g(x) ≈ q(x) exp[ ψ^T θ(x) ]

BIG DIG (17 million cubic yards of dirt, one million truckloads & $24 billion)*

*Daum & Huang, “particle flow & Monge-Kantorovich transport,” proceedings of FUSION conference, Singapore, July 2012.


item | particle flow | transport
1. purpose | fix particle degeneracy caused by Bayes' rule | move physical objects from one probability density to another*
2. optimality criterion | NONE! | minimize convex functional
3. how to pick a unique solution | many different methods (e.g. min L² norm, given form, most stable flow, min convex function, more smoothness, max entropy, arbitrary, random, etc.) | minimize convex functional (e.g., dirt mover's metric or Wasserstein metric)
4. computational complexity | numerical integration of ODE for each particle | solution of PDE (Poisson's, Monge-Ampère, HJB, etc.)
5. high dimensional problems | d = 1 to 42 | d = 1, 2, 3
6. solution of nice special cases | Gaussian, incompressible, irrotational, geodesic, exponential family, etc. | Moser coupling (1990), Brenier (1991-2011), Knothe-Rosenblatt (1952), etc.
7. homotopy of densities | log-homotopy | homotopy rarely
8. stability of flow considered (e.g., Lyapunov stability) | often explicitly designed into algorithm | —
9. existence of flows proved | adapt proofs from transport theory | Shnirelman irrotational flow d > 2, Moser & Dacorogna d > 1, Brenier any d, et al.
10. conservation of probability mass along the flow | yes | yes
11. avoiding normalization of probability density is crucial for practical algorithms | yes | no
12. explicitly compute normalized densities | no, log of unnormalized densities | yes
13. stiff ODEs mitigated | yes | rarely
14. stochastic flows | no | rarely

superb books on transport theory

Cédric Villani, "Topics in optimal transportation," AMS Press, 2003. Very clear & accessible introduction; wonderful book!
Cédric Villani, "Optimal transport: old & new," Springer-Verlag, 2009. More detailed & rigorous math; free on the internet!

new nonlinear filter: particle flow

new particle flow filter | standard particle filters
many orders of magnitude faster than standard particle filters | suffers from curse of dimensionality due to particle degeneracy
3 to 4 orders of magnitude faster code per particle for any d ≥ 3 problems | requires resampling using a proposal density
3 to 4 orders of magnitude fewer particles required to achieve optimal accuracy for d ≥ 6 problems | requires millions or billions of particles for high dimensional problems
Bayes' rule is computed using particle flow (like physics) | Bayes' rule is computed using a pointwise multiplication of two functions
no proposal density | depends on proposal density (e.g., Gaussian from EKF or UKF or other)
no resampling of particles | resampling is needed to repair the damage done by Bayes' rule
embarrassingly parallelizable | suffers from bottleneck due to resampling
computes log of unnormalized density | suffers from severe numerical problems due to computation of normalized density

history of mathematics
1. creation of the integers
2. invention of counting
3. invention of addition as a fast method of counting
4. invention of multiplication as a fast method of addition
5. invention of particle flow as a fast method of multiplication*

BACKUP

REFERENCES:
(1) Fred Daum, "Nonlinear filters: beyond the Kalman filter," IEEE Aerospace & Electronic Systems Magazine special tutorial, pages 57-69, August 2005.
(2) Fred Daum and Jim Huang, "Particle flow with non-zero diffusion for nonlinear filters," Proceedings of SPIE conference, San Diego, August 2013.
(3) Arnaud Doucet and A. M. Johansen, "A tutorial on particle filtering and smoothing: fifteen years later," in "The Oxford handbook of nonlinear filtering," edited by Dan Crisan and Boris Rozovskii, pages 656-704, 2011.
(4) Fred Daum and Jim Huang, "Particle flow and Monge-Kantorovich transport," Proceedings of IEEE FUSION Conference, Singapore, July 2012.
(5) Fred Daum & Jim Huang, "How to avoid normalization of particle flow for nonlinear filters, Bayesian decisions and transport," Proceedings of SPIE conference, Baltimore, May 2014.

university or company | researchers | topic | papers
Connecticut | Peter Willett & Sora Choi | numerical experiments | 2011
McGill | Mark Coates & Ding | numerical experiments | 2012
New Orleans | Jilkov & Wu & Chen | GPUs, numerical experiments | 2013
Melbourne | Mark Morelande | generalization of theory | 2011
Goteborg | Svensson, et al. | generalization of theory | 2011
Scientific Systems | Lingji Chen & Raman Mehra | analysis of singularities in incompressible flow for d = 1 | 2010
Mitsubishi | Grover & Sato | theory & numerical experiments | 2012
Cambridge | Peter Bunch & Simon Godsill | theory & numerical experiments | 2014
Liverpool | Simon Maskell & Flávio De Melo | relation to MCMC & numerical experiments | 2014
London | Simon Julier | numerical experiments | 2014
METRON | Kristine Bell & Larry Stone | numerical experiments & theory | 2014
Lockheed Martin | Nima Moshtagh & Moses Chan | numerical experiments | ---
STR | Shozo Mori | theory & numerical experiments | ---
BU | Castanon, et al. | numerical experiments | ---
Toulouse | Marcelo Pereyra | numerical experiments & theory | ---
Bonn | Martin Ulmke & Ahmed | numerical experiments | ---
Istanbul | Serdar Aslan | numerical experiments | ---
Wuhan | Lanlan Pang | numerical experiments | ---
Tufts & Hartford | Umarov & Nelson | fractional Brownian motion | ---
Raytheon Boston | Daum & Huang & Noushin | theory & numerical experiments | 2007-2014
Raytheon Arizona | Frankot & Reid & Kyle | theory & numerical experiments | ---
Raytheon California | Ploplys & Casey | theory & numerical experiments | ---

[figure: regions of applicability vs dimension of the state vector (1 to 1000, log scale) and degree of nonlinearity or non-Gaussianity: extended Kalman filters, standard particle filters, particle flow filters]

many applications of particle flow

robotics

communications

control of chemical, mechanical, electrical & nuclear plants

tracking

weather & climate prediction

predicting ionosphere, thermosphere, troposphere

science

imaging

medicine (e.g., MRI, surgical planning, drug design, diagnosis)

transport

oil & mineral exploration

financial engineering

adaptive antennas

audio & video signal processing

nonlinear filtering & smoothing

multi-sensor data fusion

compressive sensing

CRYPTO

Bayesian decisions & learning

guidance & navigation


exact flow filter is many orders of magnitude faster per particle than standard particle filters

[figure: median computation time for 30 updates (sec, 10⁻¹ to 10⁷) vs number of particles (10² to 10⁵), for d = 30, 20, 10, 5; curves: bootstrap particle filter, EKF proposal, incompressible flow, exact flow; 25 Monte Carlo trials]

*Intel Core 2 CPU, 1.86 GHz, 0.98 GB of RAM, PC-MATLAB version 7.7

particle flow filter is many orders of magnitude faster in real time computation (for the same or better estimation accuracy):
3 or 4 orders of magnitude faster per particle
3 or 4 orders of magnitude fewer particles
avoids the bottleneck in parallel processing due to resampling
hence many orders of magnitude faster overall

comparison of estimation accuracy for three filters:

[figure: velocity error (m/sec, 10⁰ to 10⁷) vs time (sec, 0 to 100) for the standard particle filter, extended Kalman filter & particle flow; N = 1,000 particles, 100 Monte Carlo trials, 20 dB SNR, 10% tropo & SDMB, d = 6]

new filter improves angle rate estimation accuracy by two or three orders of magnitude

highly nonlinear dynamics (Euler's equations for rigid-body rotation):
I_1 dω_1/dt + (I_3 − I_2) ω_3 ω_2 = M_1
I_2 dω_2/dt + (I_1 − I_3) ω_1 ω_3 = M_2
I_3 dω_3/dt + (I_2 − I_1) ω_2 ω_1 = M_3

[figure: median error in angle rates (rad/sec, 10⁻³ to 10²) vs time (sec, 0 to 10) for the extended Kalman filter, standard particle filter, incompressible particle flow & Ax+b particle flow; N = 500 particles, SNR = 20 dB, WB range & Doppler data, d = 6]

extended Kalman filter diverges because it cannot model multimodal conditional probability densities accurately

derivation of PDE for exact particle flow:

dx/dλ = f(x, λ)

continuity equation:
∂p(x, λ)/∂λ = −Tr( ∂(pf)/∂x )

so, using log p,
p(x, λ) ∂log p(x, λ)/∂λ = −Tr( ∂(pf)/∂x )

definition of log p(x, λ):
log p(x, λ) = log g(x) + λ log h(x) − log K(λ)

hence
p(x, λ) [ log h(x) − ∂log K(λ)/∂λ ] = −div(pf)

definition of η: div(q) = η, with q = pf and
η = −p(x, λ) [ log h(x) − ∂log K(λ)/∂λ ]
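The PDE div(pf) = −p[ log h − d log K/dλ ] can be sanity-checked numerically in the scalar Gaussian case, where p(x, λ), the exact flow f = Ax + b, and d log K/dλ are all available in closed form. The concrete numbers below (prior N(0,1), likelihood centered at z = 2, P = H = R = 1) are my own toy choices:

```python
import math

def p(x, lam):
    # homotopy density: Gaussian with precision 1+lam, mean 2*lam/(1+lam)
    m, var = 2 * lam / (1 + lam), 1 / (1 + lam)
    return math.exp(-0.5 * (x - m) ** 2 / var) / math.sqrt(2 * math.pi * var)

def f(x, lam):
    # exact flow f = A x + b specialized to P = H = R = 1, z = 2, xbar = 0
    A = -0.5 / (lam + 1)
    b = (1 + 2 * lam * A) * (1 + lam * A) * 2.0
    return A * x + b

def eta(x, lam):
    # eta = -p * (log h - d log K / d lam), where for this toy problem
    # log K(lam) = -0.5*log(1+lam) - 2*lam/(1+lam)
    log_h = -0.5 * (x - 2) ** 2
    dlogK = -0.5 / (1 + lam) - 2 / (1 + lam) ** 2
    return -p(x, lam) * (log_h - dlogK)

# finite-difference div(p f) in x and compare with eta
x0, lam, eps = 1.0, 0.5, 1e-6
div_pf = (p(x0 + eps, lam) * f(x0 + eps, lam)
          - p(x0 - eps, lam) * f(x0 - eps, lam)) / (2 * eps)
print(div_pf, eta(x0, lam))   # the two sides of the PDE agree
```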

Fokker-Planck equation*

∂p/∂t = −(∂p/∂x) f − Tr( ∂f/∂x ) p + (1/2) Tr( (∂²p/∂x²) Q )

p = p(x, t) = probability density of x at time t
dx/dt = f(x, t) + dw/dt
Q = covariance matrix of the white noise w(t)

*Andrew Jazwinski, "Stochastic processes and filtering theory," Dover Books, 1998.

Bayes' rule:

p(x, t_k | Z_k) = p(x, t_k | Z_{k-1}) p(z_k | x, t_k) / p(z_k | Z_{k-1})

p(x, t_k | Z_k) = probability density of x at time t_k given Z_k
x = state vector
t_k = time of the k-th measurement
z_k = k-th measurement vector
Z_k = set of all measurements up to & including time t_k: Z_k = {z_1, z_2, z_3, ..., z_k}

induced flow of particles for Bayes' rule (why log?)

prior = g(x), posterior = g(x)h(x)/K(1)

flow of density (sample from the density at each λ):
log p(x, λ) = log g(x) + λ log h(x) − log K(λ)

flow of particles, from λ = 0 (prior) to λ = 1 (posterior):
dx/dλ = f(x, λ)

convergence with N for particle filters:

σ² ≈ c/N
N = number of particles
c = so-called "constant" which depends on:
(1) dimension of the state vector (x)
(2) initial uncertainty in the state vector
(3) measurement accuracy
(4) shape of probability densities (e.g., log-concave or multimodal etc.)
(5) Lipschitz constants of log densities
(6) stability of the plant
(7) curvature of nonlinear dynamics & measurements
(8) ill-conditioning of Fisher information matrix
(9) smoothness of densities & dynamics & measurements

Oh's formula for Monte Carlo errors

σ² ≈ [ (1 + k²)/(1 + 2k) ]^d · exp( ε²/(1 + 2k) ) / N

Assumptions:
(1) Gaussian density (zero mean & unit covariance matrix)
(2) d-dimensional random variable
(3) proposal density is also Gaussian, with mean ε and covariance matrix kI, but it is not exact for k ≠ 1 or ε ≠ 0
(4) N = number of Monte Carlo trials

nonlinear filter performance (accuracy wrt optimal & computational complexity) depends on: DIMENSION, sparseness, smoothness, process noise, initial uncertainty of state vector, exploitable structure (e.g. exact filters), concentration of measure, quality of proposal density, ill-conditioning, measurement noise, stability & mixing of dynamics, multi-modality, nonlinearity

variation in initial uncertainty of x

[figure: dimensionless error (10⁻⁵ to 10²⁰, log scale) vs time (0 to 30) for huge, large, medium & small initial uncertainty; N = 1000, stable plant, d = 10, quadratic measurements, λ = 0.6; 25 Monte Carlo trials]

variation in eigenvalues of the plant (λ)

[figure: dimensionless error (10⁻² to 10¹²) vs time (0 to 30) for plant eigenvalues λ = 0.1, 0.5, 1.0, 1.1, 1.2; N = 1000, d = 10, cubic measurements; 25 Monte Carlo trials]

variation in dimension of x

[figure: dimensionless error (10⁻² to 10¹²) vs time (0 to 30) for dimension = 5, 10, 15, 20; N = 1000, λ = 1.0, cubic measurements; 25 Monte Carlo trials]

quadratic measurement nonlinearity: particle flow filter beats EKF by orders of magnitude

[figure: dimensionless error vs number of particles (10² to 10⁵) for EKF & particle flow filter; d = 12, n = 3, y = x², SNR = 20 dB]

exact flow: performance vs. number of particles (extremely unstable plant)

[figure: dimensionless error after 30 updates vs number of particles (10² to 10⁴) for dimension = 5, 10, 15, 20, 30; λ = 1.2, linear measurements, large initial uncertainty; 25 Monte Carlo trials]

all roads lead to new flow:
(1) zero curvature & solution of vector Riccati equation rather than PDE
(2) non-zero diffusion & clever choice of Q to avoid PDE
(3) maximum likelihood estimation with Newton's method
(4) maximum likelihood estimation with homotopy

Svensson & Morelande et al., Hanebeck et al., Daum & Huang, Girolami & Calderhead, etc.

computing the Hessian of log p:

log p(x, λ) = log g(x) + λ log h(x) − log K(λ)

∂²log p/∂x² = ∂²log g(x)/∂x² + λ ∂²log h(x)/∂x²
(can compute Hessians using calculus or 2nd differences)

∂²log p/∂x² ≈ −C⁻¹ + λ ∂²log h(x)/∂x²
C = sample covariance matrix of particles for the prior (λ = 0) with Tychonov regularization; or EKF or UKF covariance matrix

∂²log p/∂x² ≈ −P⁻¹
P = sample covariance matrix of particles for p(x, λ) with Tychonov regularization; or EKF or UKF covariance matrix

formula that avoids the inverse of the sample covariance matrix:

dx/dλ = −[ ∂²log p/∂x² ]⁻¹ (∂log h/∂x)^T
∂²log p/∂x² = ∂²log g(x)/∂x² + λ ∂²log h(x)/∂x²
∂²log p/∂x² ≈ −C⁻¹ + λ ∂²log h(x)/∂x²   for g(x) ≈ Gaussian

but Woodbury's matrix inversion lemma gives us:
(A + B)⁻¹ = A⁻¹ − A⁻¹B(I + A⁻¹B)⁻¹A⁻¹   for arbitrary B and non-singular A

hence
[ ∂²log p/∂x² ]⁻¹ ≈ −C − CB(I − CB)⁻¹C
in which
B = λ ∂²log h(x)/∂x²
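The matrix inversion lemma in the form used above is easy to verify numerically; here is a toy check of my own with a random well-conditioned A and an arbitrary perturbation B:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.normal(size=(d, d))
A = A @ A.T + d * np.eye(d)          # symmetric, well-conditioned, non-singular
B = 0.1 * rng.normal(size=(d, d))    # arbitrary small perturbation

I = np.eye(d)
Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + B)
rhs = Ainv - Ainv @ B @ np.linalg.inv(I + Ainv @ B) @ Ainv
print(np.max(np.abs(lhs - rhs)))     # agrees to near machine precision
```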

how to mitigate stiffness in ODEs for certain particle flows*

method | computational complexity | filter accuracy | comments
1. use a stiff ODE solver (e.g., implicit integration rather than explicit) | large to extremely large | uncertain | textbook advice & many papers
2. use very small integration steps everywhere | extremely large | good | brute force solution
3. use very small integration steps only where needed (adaptively determined) | large | — | —
4. use very small integration steps only where needed (determined non-adaptively) | small | 2nd best | easy to do with particle flow
5. transform to principal coordinates or approximately principal coordinates | small | best | easy to do for certain applications
6. Battin's trick (i.e., sequential scalar measurement updates) | small | very bad | destroys the benefit of particle flow
7. Tychonov regularization of the Hessian of log p | very small | uncertain | —

*Daum & Huang, "Seven dubious methods to mitigate stiffness in particle flow for nonlinear filters," Proceedings of SPIE Conference, May 2014.

RED flows are extremely stiff*

[diagram: taxonomy of flows (incompressible, irrotational, Coulomb's law, small curvature, constant curvature (e.g. zero), Gaussian densities, exponential family, Fourier transform, Knothe-Rosenblatt, non-zero diffusion, method of characteristics, geodesic, stabilized, finite dimensional, direct integration, Monge-Kantorovich transport) with the extremely stiff flows shown in red]

*non-stiff flows work well with Euler integration, ∆λ = 0.1

new particle flow:

dx/dλ = −[ ∂²log p/∂x² ]⁻¹ (∂log h/∂x)^T

If we approximate the density p as Gaussian, then the observed Fisher information matrix can be computed using the sample covariance matrix (P) over the set of particles:
dx/dλ ≈ P (∂log h/∂x)^T

for Gaussian densities we get the EKF for each particle:
dx/dλ ≈ P (∂θ(x)/∂x)^T R⁻¹ (z − θ(x))

importance of avoiding explicit computation of normalization K(λ)*
(1) our 3 best flows do not explicitly compute the normalization of the conditional density
(2) small errors in computing K(λ) can ruin the filter accuracy (e.g., Coulomb's law flow & Fourier transform flow)
(3) similar effect in numerical weather prediction using transport theory (Bath University); must compute K(λ) to machine precision!
(4) exploit this effect in designing flows

*Daum & Huang, "How to avoid normalization of particle flow for nonlinear filters," Proceedings of SPIE Conference, May 2014.

RED flows do not explicitly compute normalization K(λ)

[diagram: taxonomy of flows (incompressible, irrotational, Coulomb's law, small curvature, constant curvature (e.g. zero curvature), Gaussian densities, exponential family, Fourier transform, Knothe-Rosenblatt, non-zero diffusion, method of characteristics, geodesic, stabilized, finite dimensional, direct integration, Monge-Kantorovich transport) with the flows that avoid explicit computation of K(λ) shown in red]

exact particle flow for Gaussian densities:

dx/dλ = f(x, λ)
log h − d log K(λ)/dλ = −div(f) − f^T ∂log p/∂x

for g & h Gaussian, we can solve for f exactly:
f = Ax + b
A = −(1/2) P H^T (λ H P H^T + R)⁻¹ H
b = (I + 2λA) [ (I + λA) P H^T R⁻¹ z + A x̄ ]   (x̄ = prior mean)

f does not depend on K(λ), despite the fact that the PDE does!

incompressible particle flow (for d ≥ 2):

dx/dλ = −log(h(x)) · (∂log p(x, λ)/∂x)^T / ‖∂log p(x, λ)/∂x‖²   for non-zero gradient
dx/dλ = 0   otherwise

f does not depend on K(λ), despite the fact that the PDE does!

new particle flow:

dx/dλ = −[ ∂²log p/∂x² ]⁻¹ (∂log h/∂x)^T

If we approximate the density p as Gaussian, then the observed Fisher information matrix can be computed using the sample covariance matrix (P) over the set of particles:
dx/dλ ≈ P (∂log h/∂x)^T

(f does not depend on K(λ), despite the fact that the PDE does!)

for Gaussian densities we get the EKF for each particle:
dx/dλ ≈ P (∂θ(x)/∂x)^T R⁻¹ (z − θ(x))

most general solution for particle flow:

dx/dλ = f(x, λ)
log h − d log K(λ)/dλ = −div(f) − f^T ∂log p/∂x

the most general solution is:
f = −C^#[ log h − d log K(λ)/dλ ] + (I − C^#C) y

in which C is a linear differential operator:
C(·) = (∂log p/∂x)(·) + div(·)
C^# = generalized inverse of C
y = arbitrary d-dimensional vector (could pick y to robustly stabilize the filter, or random, or zero, or other)

idea #1 inspired by renormalization group flow:

dx/dλ = −C^#[ log h − d log K(λ)/dλ ] + (I − C^#C) y
f = Γ + Πy
Π = projection into null-space of C
set ∂f/∂L = ∂Γ/∂L + (∂Π/∂L) y = 0
y = −(∂Π/∂L)^# (∂Γ/∂L)
L = d log K(λ)/dλ
(.)^# = generalized inverse of (.)

L rather than K (just like QFT): linear in L but not K; result does not depend on K itself; avoids singularity; slightly different & it works

idea #2 inspired by renormalization group flow:

dx/dλ = −C^#[ log h − d log K(λ)/dλ ] + (I − C^#C) y
f = Γ + Πy
Π = projection into null-space of C
set ∂f/∂L = ∂Γ/∂L + (∂Π/∂L) y = 0
y = −(∂Π/∂L)^# (∂Γ/∂L)
L = log g(x)
(.)^# = generalized inverse of (.)

L rather than g (just like QFT): linear in L but not g; result does not depend on g itself; avoids singularity at g = 0; slightly different & it works

idea #3 inspired by renormalization group flow*

dx/dλ = −C^#[ log h − d log K(λ)/dλ ] + (I − C^#C) y
f = Γ + Πy
Π = projection into null-space of C
set ∂f/∂L = ∂Γ/∂L + (∂Π/∂L) y = 0
y = −(∂Π/∂L)^# (∂Γ/∂L)
L = {log g(x), d log K(λ)/dλ}^T
(.)^# = generalized inverse of (.)

*Daum & Huang, "Renormalization group flow & other ideas inspired by physics for nonlinear filters," Proceedings of SPIE Conference, May 2014.

computation of normalization using Fourier transform:

div(pf) = −p[ log h − d log K(λ)/dλ ]

take the Fourier transform:
iω^T ℑ(pf) = −ℑ( p[ log h − d log K(λ)/dλ ] )
i.e. iω^T ∫ p(x, λ) f(x, λ) exp(−iω^T x) dx = −∫ p(x, λ)[ log h(x) − d log K(λ)/dλ ] exp(−iω^T x) dx

evaluate the above at ω = 0 (assuming that E(f) is finite):
0 = ∫ p(x, λ)[ log h(x) − d log K(λ)/dλ ] dx
d log K(λ)/dλ = E[ log h(x) ]

approximate the integral using the Monte Carlo sum over particles:
d log K(λ)/dλ ≈ (1/N) Σ_{j=1}^{N} log h(x_j)
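The last line is easy to sanity-check at λ = 0, where the particles come from the prior. For a hypothetical toy case of mine, a Gaussian prior N(0,1) with log h(x) = −(x−2)²/2, the exact value is E[log h] = −(1 + 4)/2 = −2.5:

```python
import numpy as np

def dlogK_dlam(particles, log_h):
    # Monte Carlo estimate: d log K / d lam ≈ (1/N) sum_j log h(x_j)
    return np.mean(log_h(particles))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=100_000)        # particles from the prior (lam = 0)
log_h = lambda x: -0.5 * (x - 2.0) ** 2       # unnormalized log likelihood

est = dlogK_dlam(x, log_h)
print(est)   # close to the exact value -2.5
```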

MOVIES


exact particle flow for Gaussian densities:

dx/dλ = f(x, λ)
log h − d log K(λ)/dλ = −div(f) − f^T ∂log p/∂x

for g & h Gaussian, we can solve for f exactly:
f = Ax + b
A = −(1/2) P H^T (λ H P H^T + R)⁻¹ H
b = (I + 2λA) [ (I + λA) P H^T R⁻¹ z + A x̄ ]   (x̄ = prior mean)

automatically stable under very mild conditions & extremely fast

[movie frames: exact (Ax+b) particle flow; magenta: truth, green: PF estimate, black: KF; Inside = 5, 10.6, 13.8, 16.6, 17.6, 21, 21, 23.8, 26.4, 27.6 percent over the frames]

incompressible particle flow

dx/dλ = −log(h(x)) · (∂log p/∂x)^T / ‖∂log p/∂x‖²   for non-zero gradient
dx/dλ = 0   for zero gradient

[movie frames: Hessian (new) particle flow; magenta: truth, green: PF estimate, black: KF; Inside = 6, 5.8, 8.2, 9.2, 11.2, 11.8, 12.8, 12, 11.6 percent over the frames]

QUADRATIC MEASUREMENTS

quadratic measurement nonlinearity

[figure: dimensionless error vs number of particles (10² to 10⁵) for EKF & particle flow filter; d = 12, n = 3, y = x², SNR = 20 dB]

[movie frames: particle flow for quadratic measurements at Time = 1; magenta: truth, green: PF estimate, black: KF; axis scale ×10⁵]

CUBIC MEASUREMENTS 105

[Figure, slide 106: dimensionless error vs. number of particles (10² to 10⁵) for the cubic measurement nonlinearity, y = x³, with d = 12, ny = 3, SNR = 20 dB; curves for EKF and PF]

106

[Figures, slides 107-116: particle clouds at time = 1 for the cubic case (axes ×10⁵); magenta: truth, green: PF estimate, black: KF]

107-116

STABILITY 117

Stability of nonlinear filters

stability of the Kalman filter (1963 paper by Kalman): the Kalman filter is stable under very mild conditions (e.g., controllability & observability); the Kalman filter is stable even for unstable plants*.

stability of extended Kalman filters: the EKF is unstable for unstable plants in typical nonlinear examples**.

convergence of particle filters: papers by Ramon van Handel & Dan Crisan (2009) assume ergodicity, which implies stability of the plant; this makes extremely strong assumptions (the complete state vector is measured).

incompressible flow of particles: explicitly stabilize the filter & flow for unstable plants, without measuring the complete state vector.

exact (compressible) particle flow: automatically stable filter & flow.

118

most general solution for flow: dx/dλ = ?

by the chain rule:
d log p(x,λ)/dλ = [∂log p(x,λ)/∂x] dx/dλ + ∂log p(x,λ)/∂λ = 0

dx/dλ = −A# ∂log p(x,λ)/∂λ + [I − A^T A / Tr(A^T A)] y

in which A = ∂log p(x,λ)/∂x, y = arbitrary d-vector, and
A# = A^T / (AA^T) for AA^T > 0, and A# = 0 otherwise.

pick y to maximize stability of the filter.

119

Most general solution for flow

Pick y to maximize stability of the filter:
dx/dλ = −A# ∂log p(x,λ)/∂λ + [I − A^T A / Tr(A^T A)] y

For example, linearize about each particle:
dx/dλ ≈ Bx + Dy with y = Kx
Φ_filter ≈ Φ_plant Φ_Bayes, where Φ_Bayes ≈ exp(B + DK)

pick K to minimize the following stability measure (Schur's inequality):
Σ_{j=1}^{d} |λ_j(Φ_filter)|² ≤ Tr(Φ_plant^T Φ_plant Φ_Bayes^T Φ_Bayes)

optimal K ≈ −(I + B), and thus optimal y ≈ −x − Bx;
but we can "delinearize" Bx, resulting in y ≈ −x − {Bx}_delinearized.
Hence optimal y ≈ −x + A# ∂log p(x,λ)/∂λ; but DA# = 0, and thus the
optimal y results in dx/dλ ≈ −A# ∂log p(x,λ)/∂λ − Dx

120

[Figure, slide 121: errors in Schur's inequality; fractional error in the estimated sum of squared eigenvalue magnitudes (100% to 600%), shown over the complex plane (real vs. imaginary parts of the eigenvalues) for random 10×10 real non-singular matrices]

121

Exactly the same feedback is derived using standard control theory:
y = −D^T W⁻¹ x
in which the controllability Grammian is:
W = ∫₀¹ exp(−sB) D D^T exp(−sB^T) ds
where B is the linearization of the particle flow:
dx/dλ = −A# log(h) + (I − A^T A / AA^T) y
dx/dλ ≈ Bx + Dy
Using the facts that D = D², D kills A#, and a little algebra:
y = −Dx

122

[Figure, slide 123: particle filter accuracy depends on the plant stability & mixing; dimensionless error vs. time (0 to 30) for plant eigenvalues λ = 0.1, 0.5, 1.0, 1.1, 1.2; d = 6, ny = 3, N = 500, #(MC) = 10, without −Dx]

123

[Figure, slide 124: the new theory (general flow) improves filter accuracy dramatically; dimensionless error vs. time (0 to 30) for plant eigenvalues λ = 0.1, 0.5, 1.0, 1.1, 1.2; d = 6, ny = 3, N = 500, #(MC) = 10, with −Dx]

124

renormalization group flow in quantum field theory vs. particle flow for Bayes' rule:

1. purpose: avoid infinite integrals at all energy scales (µ) / fix particle degeneracy in particle filters
2. PDE: linear first order PDE / linear first order PDE
3. method: "the trick of doing an integral a little bit at a time" (Tony Zee QFTNS p. 346) / homotopy of log-density
4. efficacy: "the most important conceptual advance in QFT over the last 3 or 4 decades" (Tony Zee QFTNS p. 337) / reduces computational complexity by many orders of magnitude for high dimensional problems
5. algorithm: ODE for motion of particles in N-dimensional space (Tony Zee QFTNS p. 340) / ODE f = dx/dλ for x = particle in d dimensions
6. derivation of PDE: dH(x)/dµ = 0, H(x) = Hamiltonian / Fokker-Planck equation & definition of log p
7. new idea for particle flow inspired by RGF: ∂f/∂g = 0, g = prior density (just as we want H to be scale invariant: dH(x)/dµ = 0)

125

two steps in renormalization group flow & particle flow:

1. regularization: physics: regularization (e.g., cut-off of integral or dimensional regularization d−ε); nonlinear filters: homotopy of log-density.

2. renormalization: physics: modify effective charge & mass as the energy scale varies from high to low (integrate out degrees of freedom to maintain symmetries & a finite number of parameters); scale invariant flow of parameters: dH/dµ = 0. nonlinear filters: compute a flow of particles that is invariant to errors in the prior density & the normalization constant: ∂f/∂g = 0.

126

exact recursive filters*

Kalman (1960): conditional density η = Gaussian; special condition on dynamics: dx/dt = f + w with ∂f/∂x = A(t)

Beneš (1983): conditional density η exp(∫f(x)dx); special condition: f(x) = ∂V/∂x and div(f) + ║f║² = x^T Ax + bx + c

Daum (1986): conditional density = exponential family, p(x|Z) = p(x) exp[θ(x)Ψ(Z)]; special condition:
∂θ/∂t = (∂θ/∂x) Q r + (1/2)(ξ − Aθ)
with r = [∂log p(x)/∂x]^T − f and ξ_j = Tr(Q ∂²θ_j/∂x²)

*Daum, IEEE AES Systems Magazine, August 2005.

127
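The Beneš condition can be checked numerically for the textbook Beneš drift f(x) = tanh(x) (an assumed example, not taken from the slide): since d tanh/dx = 1 − tanh², we get div(f) + ║f║² = 1, a (degenerate) quadratic with A = 0, b = 0, c = 1.

```python
import numpy as np

# Benes drift: f(x) = tanh(x). Check that div(f) + f^2 is quadratic (here, constant = 1).
x = np.linspace(-5.0, 5.0, 1001)
f = np.tanh(x)
dx = x[1] - x[0]
div_f = np.gradient(f, dx)          # numerical derivative df/dx
benes = div_f + f**2                # should equal 1 everywhere
print(np.max(np.abs(benes - 1.0)))  # small: only numerical-differentiation error
```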

a miracle*

p(x, t | Z_t) = p(x, t | Ψ_t(Z_t))

the dimension of Z grows with time, but the dimension of Ψ is fixed and finite for all time.

*Daum, IEEE AES Systems Magazine, August 2005.

128

Monge-Ampère: a highly nonlinear PDE

y = T(x)
p(x)dx = p(y)dy
p(x) = p(y) det(∂y/∂x)
Let T(x) = ∂V/∂x. Hence p(x) = p(y) det(∂²V/∂x²)

one-shot transport requires a fully nonlinear PDE (and we cannot evaluate the functions at good points!), whereas particle flow only needs an extremely simple linear PDE.

129

computing the Hessian of log p:

log p(x,λ) = log g(x) + λ log h(x) − log K(λ)
∂²log p/∂x² = ∂²log g(x)/∂x² + λ ∂²log h(x)/∂x²

we can compute Hessians using calculus or 2nd differences, or approximate:

∂²log p/∂x² ≈ −C⁻¹ + λ ∂²log h(x)/∂x²
with C = sample covariance matrix of the particles for the prior (λ = 0) with Tychonov regularization, or the EKF or UKF covariance matrix;

∂²log p/∂x² ≈ −P⁻¹
with P = sample covariance matrix of the particles for p(x,λ) with Tychonov regularization, or the EKF or UKF covariance matrix.

130

formula that avoids the inverse of the sample covariance matrix:

dx/dλ = −[∂²log p/∂x²]⁻¹ [∂log h/∂x]^T
∂²log p/∂x² = ∂²log g(x)/∂x² + λ ∂²log h(x)/∂x²
∂²log p/∂x² ≈ −C⁻¹ + λ ∂²log h(x)/∂x² for g(x) ≈ Gaussian

but Woodbury's matrix inversion lemma gives us:
(A + B)⁻¹ = A⁻¹ − A⁻¹B(I + A⁻¹B)⁻¹A⁻¹ for arbitrary B and non-singular A

hence
[∂²log p/∂x²]⁻¹ ≈ −C − CB(I − CB)⁻¹C
in which B = λ ∂²log h(x)/∂x²
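The Woodbury rearrangement above is an exact matrix identity (given invertibility), so it can be verified numerically. A sketch with an arbitrary SPD matrix standing in for C and a small symmetric matrix standing in for B = λ ∂²log h/∂x²:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
# C: SPD stand-in for the prior sample covariance.
M = rng.standard_normal((d, d))
C = M @ M.T + d * np.eye(d)
# B: small symmetric stand-in for lam * Hessian of log h.
S = rng.standard_normal((d, d))
B = 0.001 * (S + S.T)

I = np.eye(d)
lhs = np.linalg.inv(-np.linalg.inv(C) + B)       # direct inverse of the Hessian of log p
rhs = -C - C @ B @ np.linalg.inv(I - C @ B) @ C  # Woodbury form: no inverse of C needed
print(np.max(np.abs(lhs - rhs)))                 # machine-precision agreement
```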

formula that avoids computing the Hessian of g(x):

∂²log g(x)/∂x² ≈ −(1/N) Σ_{j=1}^{N} [∂log g(x_j)/∂x]^T [∂log g(x_j)/∂x]

derivation of the above:
E{[∂log g(x)/∂x]^T [∂log g(x)/∂x]} = −E{∂²log g(x)/∂x²}
∂²log g(x)/∂x² ≈ E{∂²log g(x)/∂x²}
E{[∂log g(x)/∂x]^T [∂log g(x)/∂x]} ≈ (1/N) Σ_{j=1}^{N} [∂log g(x_j)/∂x]^T [∂log g(x_j)/∂x]

132
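A quick Monte Carlo check of this Fisher-type identity, as a sketch for an assumed Gaussian prior g = N(0, C): the score is −C⁻¹x and the true Hessian of log g is −C⁻¹, so the average outer product of the scores should converge to C⁻¹.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 3, 400000
A = rng.standard_normal((d, d))
C = A @ A.T + np.eye(d)              # prior covariance; g = N(0, C)
Cinv = np.linalg.inv(C)

x = rng.multivariate_normal(np.zeros(d), C, size=N)
scores = -x @ Cinv                   # row j: gradient of log g at x_j, i.e. -C^{-1} x_j
outer = scores.T @ scores / N        # (1/N) * sum of score outer products
# Fisher identity: E[score score^T] = -E[Hessian of log g] = C^{-1}
print(np.max(np.abs(outer - Cinv)))  # small Monte Carlo error
```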

methods to solve the PDE (method / how to pick a unique solution / comments):

1. generalized inverse of linear differential operator / minimum L² norm / Coulomb's law or fast Poisson solver
2. Poisson's equation / irrotational flow / Coulomb's law or fast Poisson solver
3. generalized inverse of gradient of log-homotopy / incompressible flow / workhorse for multimodal densities
4. stabilized version of method #3 / most robustly stable filter / workhorse for multimodal densities
5. separation of variables (Gaussian) / pick solution of specific form / extremely fast & hard to beat in accuracy
6. separation of variables (exponential family) / pick solution of specific form / generalization of Gaussian flow
7. variational formulation (Gauss & Hertz) / convex function minimization / generalization of minimum L² norm
8. optimal transport formulation (Monge-Kantorovich) / convex functional minimization (e.g., least action or Wasserstein metric, etc.) / very high computational complexity (e.g., Monge-Ampère fully nonlinear PDE)
9. direct integration (of first order linear PDE in divergence form) / choice of d−1 arbitrary functions / should work with enforcement of neutral charge density & importance sampling
10. method of characteristics (or generalized method of characteristics) / more conditions (e.g., small curvature or specify curl, or use Lorentz invariance) / can solve any first order linear PDE except the one of interest to us!
11. another homotopy of the PDE (inspired by Gromov's h-principle) / initial condition of ODE & uniqueness of solution to ODE / like Feynman's perturbation for QED
12. finite dimensional parametric flow (e.g., f = Ax + b with A & b parameters) / non-singular matrix to invert / avoids the PDE completely
13. Fourier transform of the PDE (divergence form of the linear PDE has constant coefficients!) / minimum L² norm or most stable flow / generalized inverse & Monte Carlo integration avoids the inverse Fourier transform at random points in d dimensions
14. small "curvature" flow / set certain 2nd derivatives of the flow to zero / solve a d × d system of linear equations or numerically integrate an ODE (like Feynman)
15. zero "curvature" flow / set acceleration of particles to zero / solve a vector Riccati equation exactly in closed form (rather than solve a PDE)!
16. constant "curvature" flow, etc. / set acceleration of particles to a constant / solve polynomial multivariate equations (rather than a PDE); maybe use homotopy
17. upper triangular Jacobian flow / set certain lower triangular terms in the Jacobian to zero (but not all terms) / inspired by the Knothe-Rosenblatt rearrangement in transport theory
18. non-zero process noise in the flow for Bayes' rule, with clever choice of f & Q to avoid the PDE / compute the gradient of the PDE to obtain d equations in d unknowns / Q = covariance matrix of the diffusion in the flow: dx = f(x,λ)dλ + √Q dw

derivation of the PDE for particle flow with Q ≠ 0:

dx/dλ = f(x,λ) + √Q(x,λ) dw/dλ
∂p(x,λ)/∂λ = −div(pf) + (1/2) div[Q(x,λ) ∂p/∂x]
p(x,λ) ∂log p(x,λ)/∂λ = −div(pf) + (1/2) div[Q ∂p/∂x]
log p(x,λ) = log g(x) + λ log h(x) − log K(λ)
[log h(x) − d log K(λ)/dλ] p(x,λ) = −div(pf) + (1/2) div[Q ∂p/∂x]
[log h − d log K/dλ] p = −p div(f) − (∂p/∂x) f + (1/2) div[Q ∂p/∂x]
log h − d log K/dλ = −div(f) − (∂log p/∂x) f + (1/(2p)) div[Q ∂p/∂x]

134

derivation of the first new particle flow with Q ≠ 0:

log h − d log K/dλ = −div(f) − (∂log p/∂x) f + (1/(2p)) div[Q(x) ∂p/∂x]

take the gradient with respect to x:

∂log h/∂x = −f^T ∂²log p/∂x² − ∂div(f)/∂x − (∂log p/∂x)(∂f/∂x) + (1/2) ∂/∂x {div[Q(x) ∂p/∂x] / p}

pick Q such that the last three terms sum to zero, and solve for f:

f = −[∂²log p/∂x²]⁻¹ [∂log h/∂x]^T

135
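A hedged sanity check of this flow in the scalar linear-Gaussian case (an assumed example; only the drift f is integrated here, so the √Q dw diffusion term from the derivation is omitted and the particle spread is understated, but the particle mean should land on the Kalman posterior mean, which the exact flow reproduces).

```python
import numpy as np

rng = np.random.default_rng(2)
# Scalar linear-Gaussian case: prior N(m0, p0), measurement z = x + v, v ~ N(0, r).
m0, p0, r, z = 0.0, 4.0, 1.0, 3.0
N, steps = 5000, 1000

x = rng.normal(m0, np.sqrt(p0), N)
dlam = 1.0 / steps
for i in range(steps):
    lam = i * dlam
    hess = -(1.0 / p0) - lam / r              # d2 log p / dx2 under the lam-homotopy
    grad_h = (z - x) / r                      # d log h / dx at each particle
    x = x + dlam * (-(1.0 / hess) * grad_h)   # f = -(Hessian)^{-1} (d log h / dx)

k_gain = p0 / (p0 + r)
posterior_mean = m0 + k_gain * (z - m0)       # Kalman answer: 2.4
print(x.mean(), posterior_mean)               # particle mean matches Kalman mean
```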

why does the new flow work so well?

1. solution for flow: new flow: d × d matrix inverse; old flows: Moore-Penrose inverse of 1 equation in d unknowns
2. normalization of probability density: new: we killed the normalization; old: explicitly computed
3. exploits smoothness of density functions: new: smoother (2nd derivatives wrt x); old: less smooth (only first derivatives wrt x)
4. exploits calculus to compute Hessian & gradient of likelihood: new: yes; old: no
5. exploits greater freedom with non-zero diffusion in flow: new: yes; old: no
6. depends on Monte Carlo approximation: new: no; old: yes
7. generality: new: more; old: less

136

small curvature flow: ∂div(f)/∂x = 0

Gaussian flow: f = A(λ)x + b(λ), so div(f) = Tr(A)
incompressible flow: div(f) = 0

137

linear first order highly underdetermined PDE:

dx/dλ = f(x,λ)
div(pf) = η; let q = pf (p = known & f = unknown), so div(q(x,λ)) = η
η(x,λ) = −p(x,λ)[log h(x) − d log K(λ)/dλ]
η = ∂q₁/∂x₁ + ∂q₂/∂x₂ + ... + ∂q_d/∂x_d

(1) we want a stable flow
(2) we want a full rank flow
(3) we want a fast algorithm to approximate f accurately (roughly 1%)

like Gauss' divergence law in electromagnetics, with η analogous to electric charge density

138

irrotational particle flow:

dx/dλ = f(x,λ) = [∂V(x,λ)/∂x]^T / p(x,λ)
Tr[∂²V(x,λ)/∂x²] = η(x,λ)   (Poisson's equation)
V(x,λ) = −∫ η(y,λ) c/║x−y║^(d−2) dy for d ≥ 3
V(x,λ) = ∫ p(y,λ)[log h(y) − ∂log K(λ)/∂λ] c/║x−y║^(d−2) dy
∂V(x,λ)/∂x = ∫ p(y,λ)[log h(y) − ∂log K(λ)/∂λ] c(2−d)(x−y)^T/║x−y║^d dy
∂V(x,λ)/∂x = E{[log h(y) − ∂log K(λ)/∂λ] c(2−d)(x−y)^T/║x−y║^d}
∂V(x_i,λ)/∂x ≈ (1/M) Σ_{j∈S_i} [log h(x_j) − ∂log K(λ)/∂λ] c(2−d)(x_i−x_j)^T/║x_i−x_j║^d

like Coulomb's law

139
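A minimal sketch of the Coulomb-style Monte Carlo sum (assumed toy setup: c = 1, every other particle in the neighborhood set S_i, and an arbitrary Gaussian log-likelihood). Subtracting the mean of log h plays the role of the ∂log K/∂λ term, making the total "charge" neutral, which the Ewald comparison later calls crucial.

```python
import numpy as np

rng = np.random.default_rng(3)
d, M = 4, 300                           # d >= 3 for the Coulomb kernel
x = rng.standard_normal((M, d))         # particles from the prior
z = 1.0
log_h = -0.5 * (x[:, 0] - z) ** 2       # assumed Gaussian log-likelihood

# "Charge" of each particle: log h minus its mean, so total charge is neutral.
charge = log_h - log_h.mean()
print(abs(charge.sum()))                # zero up to round-off

def force(i):
    """Monte Carlo Coulomb sum for dV/dx at particle i (c = 1)."""
    diff = x[i] - np.delete(x, i, axis=0)             # x_i - x_j for j != i
    dist = np.linalg.norm(diff, axis=1)
    q = np.delete(charge, i)
    return (q[:, None] * (2 - d) * diff / dist[:, None] ** d).sum(axis=0) / (M - 1)

print(force(0))                         # gradient of the potential V at particle 0
```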

derivation of the Fourier transform particle flow:

div(pf) = −p[log h − d log K(λ)/dλ]

take the Fourier transform:
iω^T ℑ(pf) = −ℑ{p[log h − d log K(λ)/dλ]}
iω^T ∫ p(x,λ) f(x,λ) exp(−iω^T x) dx = −∫ p(x,λ)[log h(x) − E(log h)] exp(−iω^T x) dx

approximate the integrals using the Monte Carlo sum over particles:
iω^T (1/N) Σ_{j=1}^{N} f(x_j,λ) exp(−iω^T x_j) ≈ −(1/N) Σ_{j=1}^{N} [log h(x_j) − E(log h)] exp(−iω^T x_j)

evaluate the above at k points in ω (e.g., k = d or 2d) and write this as a linear operator on the unknown function f:

L(ω) f = y(ω)
f(x) = L# y, in which L# = generalized inverse of L, L# = L^T (LL^T)⁻¹

Lf = y written out explicitly: separating real & imaginary parts gives, for each ω_m (m = 1, ..., k),

Σ_j ω_m^T f(x_j) sin(ω_m^T x_j) = −Σ_j [log h(x_j) − E(log h)] cos(ω_m^T x_j)
Σ_j ω_m^T f(x_j) cos(ω_m^T x_j) = Σ_j [log h(x_j) − E(log h)] sin(ω_m^T x_j)

stacking these 2k real equations, L is the 2k × dN matrix with row blocks ω_m^T sin(ω_m^T x_j) and ω_m^T cos(ω_m^T x_j) acting on the stacked unknowns f(x₁), ..., f(x_N), and y is the corresponding 2k × 1 right-hand side.

[Figure: optimization of points in k-space for the Fourier transform flow; time = 1, iteration 5, blue = best, red = worst; both axes from −0.2 to 0.2]

most general solution for incompressible flow: dx/dλ = ?

by the chain rule:
d log p(x,λ)/dλ = [∂log p(x,λ)/∂x] dx/dλ + ∂log p(x,λ)/∂λ = 0

dx/dλ = −A# log h(x) + [I − A# A] y

in which A = ∂log p(x,λ)/∂x, y = arbitrary d-vector, and
A# = A^T / (AA^T) for AA^T > 0, and A# = 0 otherwise.

pick y to maximize robustness of the filter.

143
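A one-line numerical check of this construction (a sketch with random values standing in for the gradient A and log h at one particle): for any choice of the arbitrary vector y, the flow satisfies the chain-rule constraint A (dx/dλ) = −log h, since the second term lies in the null space of A.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
A = rng.standard_normal(d)              # stand-in for the gradient of log p at a particle
log_h = -1.7                            # stand-in for the log-likelihood there
y = rng.standard_normal(d)              # arbitrary d-vector in the null-space term

A_sharp = A / (A @ A)                   # A# = A^T / (A A^T)
f = -A_sharp * log_h + (y - A_sharp * (A @ y))   # -A# log h + [I - A# A] y

# Chain-rule constraint: A f = -log h, for *any* choice of y.
print(A @ f + log_h)                    # ~ 0
```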

[Figures, slides 144-151: dimensionless error vs. time (0 to 30) with N = 1000, comparing EKF, incompressible flow, Ax+b flow, no flow, and zero curvature flow; parameter sweep over initial uncertainty and initial step size: (σ₀, Δλ₀) = (100, -), (10, 1e-3), (100, 1e-5), (1000, 1e-7), (1e4, 1e-9), (1e5, 1e-11), (1e6, 1e-13), (1e7, 1e-15)]

144-151

Fisher information matrix

J = −E[∂²log p/∂x²]

152

zero curvature flow:

we want to solve the following PDE for the flow f:
div(f) + (∂log p/∂x) f = −log h + d log K/dλ

assume that the flow has zero curvature:
d²x/dλ² = 0
but f = dx/dλ, hence df/dλ = 0

using this condition (after several pages of calculations) results in:
f^T [∂²log p/∂x²] f + 2 [∂log h/∂x] f = d²log K/dλ²

153

zero curvature flow:
f^T [∂²log p/∂x²] f + 2 [∂log h/∂x] f = d²log K/dλ²

(1) a vector Riccati equation for f, rather than a PDE for f!
(2) a highly underdetermined algebraic equation for f
(3) we can solve for f exactly in closed form!
(4) the Hessian of log p is similar to the Fisher information matrix (we can exploit this to solve for f exactly in closed form)
(5) for nonlinear measurements with Gaussian noise, it is easy & fun to solve for f explicitly!

154

exploit the non-singular symmetric pre-Fisher information matrix:

f^T [∂²log p/∂x²] f + 2 [∂log h/∂x] f = d²log K/dλ²

we can always write the above in the following canonical form:
║f̃║² − 2 [∂log h/∂x] (√H)⁻¹ f̃ = −d²log K/dλ²
in which H = −∂²log p/∂x² and f̃ = √H f

155

solution of the general vector Riccati equation:

║f̃ − b║² = 0
this is a single scalar-valued equation in d unknowns, but it nevertheless has the obvious unique solution f̃ = b:
(f̃ − b)^T (f̃ − b) = 0
║f̃║² − 2b^T f̃ + ║b║² = 0
obviously this equation also has the unique solution f̃ = b.

Encouraged by the above simple example, now consider our equation:
║f̃║² + b^T f̃ + c = 0
let f̃ = kb, in which k is a scalar:
k²║b║² + k║b║² + c = 0
k² + k + c/║b║² = 0, which has the solution k = [−1 ± √(1 − 4c/║b║²)] / 2

156

geometrical interpretation of the solution:

║f̃║² + b^T f̃ + c = 0
let f̃ = kb, in which k is a scalar:
k²║b║² + k║b║² + c = 0
k² + k + c/║b║² = 0, which has the solution k = [−1 ± √(1 − 4c/║b║²)] / 2

[Figure: the solution f̃ lies along the vector b]

157

158

solution of our vector Riccati equation:

f̃ = √H f
f̃ = kb
b = −2 (√H)⁻¹ [∂log h/∂x]^T
k = [−1 ± √(1 − 4c/║b║²)] / 2, with c = d²log K/dλ²

f = (√H)⁻¹ {−2k (√H)⁻¹ [∂log h/∂x]^T}
f = −2k H⁻¹ [∂log h/∂x]^T
f = 2k [∂²log p/∂x²]⁻¹ [∂log h/∂x]^T

159
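The scalar quadratic for k can be verified directly (a sketch with a random b and a c chosen so the discriminant 1 − 4c/║b║² is positive): the ansatz f̃ = kb with either root drives the Riccati residual to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5
b = rng.standard_normal(d)
c = -0.3                                # c < 0 guarantees a real root

bb = b @ b
k = (-1.0 + np.sqrt(1.0 - 4.0 * c / bb)) / 2.0   # one of the two roots
f_tilde = k * b                          # ansatz: f~ lies along b

residual = f_tilde @ f_tilde + b @ f_tilde + c   # |f~|^2 + b^T f~ + c
print(residual)                          # ~ 0
```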

fast Ewald's method vs. Coulomb's law
(fast Ewald method in physics & chemistry* vs. Coulomb's law with fast approximate k-NN)

1. dimension of x: d = 3 vs. d = 3 to 30; rapid decay of the Coulomb kernel in higher d helps!
2. relative error desired: 0.0001 or better vs. 1% to 10%; all Ewald methods are the same for 1% accuracy
3. cut-off in x space: fixed distance vs. random per k-NN; automatic space-taper to weight the convolution
4. desired force: on mesh vs. at particles; big difference!
5. neutral charge: locally enforced vs. locally enforced; crucial!!!
6. smoothing charge: Gaussian vs. no explicit smoothing; Debye kernel
7. k-space or real-space: both vs. real space (x); no FFT needed for Coulomb

*Shan, Klepeis, Eastwood, Dror & Shaw, "Gaussian split Ewald: a fast Ewald mesh method for molecular simulation," Journal of Chemical Physics, 2005.

160

particle flow vs. Monge-Kantorovich transport

1. purpose: fix particle degeneracy due to Bayes' rule vs. move physical objects with minimal effort from one probability density to another
2. conservation of probability mass along flow: yes vs. yes
3. deterministic: yes* vs. yes
4. homotopy of density: no vs. yes
5. log-homotopy of density: yes vs. no
6. optimality criteria: none vs. Wasserstein metric (dirt mover's metric) or minimum action, etc.
7. how to pick a solution: 24 distinct methods vs. minimize a convex functional
8. stability of flow explicitly considered: yes vs. rarely
9. high dimensional applications: yes (d ≤ 42) vs. no (d = 1, 2 or 3)
10. computational complexity: numerical integration of an ODE for each particle vs. Poisson's PDE or HJB PDE or Monge-Ampère PDE, etc.
11. solution of PDE for nice special cases: incompressible, irrotational, Gaussian, geodesic, etc. vs. Moser (1965 & 1990), Brenier (1991), Knothe-Rosenblatt (1952)
12. math theory for existence of flow: borrow Shnirelman's theorem, and Moser & Dacorogna (1990) for incompressible flow, etc. vs. Shnirelman's theorem for d ≥ 3, Moser & Dacorogna

161