Lecture 3: Linear FIR Adaptive Filtering


Gradient based adaptation: Steepest Descent Method

Adaptive filtering: Problem statement

• Consider the family of variable parameter FIR filters, computing their output according to

  y(n) = w_0(n)u(n) + w_1(n)u(n-1) + ... + w_{M-1}(n)u(n-M+1)
       = Σ_{k=0}^{M-1} w_k(n) u(n-k) = w(n)^T u(n),    n = 0, 1, 2, ...

  where the parameters w(n) are allowed to change at every time step, u(n) is the input signal, d(n) is the desired signal, and {u(n), d(n)} are jointly stationary. (A minimal computation sketch is given after this problem statement.)

• Given the parameters w(n) = [w_0(n), w_1(n), w_2(n), ..., w_{M-1}(n)]^T, find an adaptation mechanism

  w(n+1) = w(n) + δw(n),

  or written componentwise

  w_0(n+1) = w_0(n) + δw_0(n)
  w_1(n+1) = w_1(n) + δw_1(n)
  ...
  w_{M-1}(n+1) = w_{M-1}(n) + δw_{M-1}(n)

  such that the adaptation process converges to the parameters of the optimal Wiener filter, w(n+1) → w_o, no matter where the iterations are initialized (i.e. for all w(0)).
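As a quick illustration of the filter-output computation in the first bullet, here is a minimal Matlab sketch; the filter length, the input signal, and the parameter trajectory are arbitrary placeholders, not values from the lecture:

M = 3;                          % filter length (arbitrary choice)
N = 20;                         % number of time steps (arbitrary choice)
u = randn(N,1);                 % some input signal
W = randn(M,N);                 % one parameter vector w(n) per column (arbitrary values)
y = zeros(N,1);
for n = M:N
    un   = u(n:-1:n-M+1);       % tap-input vector u(n) = [u(n); u(n-1); ...; u(n-M+1)]
    y(n) = W(:,n)' * un;        % filter output y(n) = w(n)^T u(n)
end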


The key to the adaptation process is the existence of an error between the output of the filter y(n) and the desired signal d(n):

e(n) = d(n) - y(n) = d(n) - w(n)^T u(n)

If the parameter vector w(n) were used at all time instants, the performance criterion would be (see last Lecture)

J(n) = J_{w(n)} = E[e(n)e(n)] = E[(d(n) - w^T(n)u(n))(d(n) - u^T(n)w(n))]
     = E[d^2(n)] - 2E[d(n)u^T(n)]w(n) + w^T(n)E[u(n)u^T(n)]w(n)
     = σ_d^2 - 2p^T w(n) + w^T(n) R w(n)
     = σ_d^2 - 2 Σ_{i=0}^{M-1} p(-i) w_i + Σ_{l=0}^{M-1} Σ_{i=0}^{M-1} w_l w_i R_{i,l}          (1)
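To make the quadratic form (1) concrete, the following Matlab sketch evaluates J(w) for the numerical R and p used later in this lecture; the value of σ_d^2 is an assumed placeholder, since it only shifts the criterion by a constant and does not affect the minimizer:

R   = [1.1 0.5; 0.5 1.1];          % autocorrelation matrix (example values used below)
p   = [0.5271; -0.4458];           % cross-correlation vector (example values used below)
sd2 = 1;                           % sigma_d^2, assumed here
J   = @(w) sd2 - 2*p'*w + w'*R*w;  % J(w) = sigma_d^2 - 2 p^T w + w^T R w
w_o = R \ p;                       % Wiener solution, the minimizer of J
[J([0;0])  J([-1;-1])  J(w_o)]     % any other w gives a larger value than J(w_o)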


Expressions of the gradient vector


The gradient of the criterion J(n) with respect to the parameters w(n) is (column vectors are written here with rows separated by semicolons)

∇_{w(n)} J(n) = [ ∂J(n)/∂w_0(n) ; ∂J(n)/∂w_1(n) ; ... ; ∂J(n)/∂w_{M-1}(n) ]

= [ -2p(0)    + 2 Σ_{i=0}^{M-1} R_{0,i} w_i(n) ;
    -2p(-1)   + 2 Σ_{i=0}^{M-1} R_{1,i} w_i(n) ;
    ... ;
    -2p(-M+1) + 2 Σ_{i=0}^{M-1} R_{M-1,i} w_i(n) ]

= -2p + 2Rw(n)

= [ -2E[d(n)u(n)]     + 2 Σ_{i=0}^{M-1} E[u(n)u(n-i)] w_i(n) ;
    -2E[d(n)u(n-1)]   + 2 Σ_{i=0}^{M-1} E[u(n-1)u(n-i)] w_i(n) ;
    ... ;
    -2E[d(n)u(n-M+1)] + 2 Σ_{i=0}^{M-1} E[u(n-M+1)u(n-i)] w_i(n) ]

= [ -2E[u(n)     (d(n) - Σ_{i=0}^{M-1} u(n-i) w_i(n))] ;
    -2E[u(n-1)   (d(n) - Σ_{i=0}^{M-1} u(n-i) w_i(n))] ;
    ... ;
    -2E[u(n-M+1) (d(n) - Σ_{i=0}^{M-1} u(n-i) w_i(n))] ]          (2)

= [ -2E[u(n)e(n)] ;
    -2E[u(n-1)e(n)] ;
    ... ;
    -2E[u(n-M+1)e(n)] ]

= -2E[ [u(n); u(n-1); ... ; u(n-M+1)] e(n) ] = -2E[u(n)e(n)]          (3)

(Expressions (2) and (3) give the solution of exercise 7, page 229 [Haykin 2002].)

Steepest descent algorithm:

w(n+1) = w(n) - (1/2) µ ∇_{w(n)} J(n) = w(n) + µ E[u(n)e(n)]
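As a quick check of expression (2), the gradient -2p + 2Rw(n) vanishes exactly at the Wiener solution. A minimal Matlab sketch, using the example values from the following slides (the handle name gradJ is just illustrative):

R = [1.1 0.5; 0.5 1.1];  p = [0.5271; -0.4458];  % example values used later
gradJ = @(w) -2*p + 2*R*w;    % gradient expression (2)
w_o = R \ p;                  % Wiener solution of R*w = p
gradJ(w_o)                    % = [0; 0] up to rounding: w_o is the minimizer of J
gradJ([-1; -1])               % nonzero elsewhere, e.g. at the w(0) used in Figure 1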


SD ALGORITHM: Steepest descent search algorithm for finding the optimal Wiener FIR filter


Given
• the autocorrelation matrix R = E[u(n)u^T(n)]
• the cross-correlation vector p = E[u(n)d(n)]

Initialize the algorithm with an arbitrary parameter vector w(0).

Iterate for n = 0, 1, 2, 3, ..., n_max

   w(n+1) = w(n) + µ[p - Rw(n)]

Stop iterations if ||p - Rw(n)|| < ε

Designer degrees of freedom: µ, ε, n_max
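A minimal Matlab sketch of this procedure (the function name sd_wiener, the zero initialization, and the argument order are illustrative choices, not part of the lecture), saved e.g. as sd_wiener.m:

function [w, n] = sd_wiener(R, p, mu, eps_tol, nmax)
% Steepest-descent search for the solution of R*w = p (the Wiener filter)
    w = zeros(size(p));            % arbitrary initialization w(0)
    for n = 0:nmax-1
        g = p - R*w;               % p - R*w(n), i.e. -(1/2) of the gradient of J
        if norm(g) < eps_tol       % stop iterations if ||p - R*w(n)|| < epsilon
            break
        end
        w = w + mu*g;              % w(n+1) = w(n) + mu*[p - R*w(n)]
    end
end

Called, for instance, as sd_wiener([1.1 0.5; 0.5 1.1], [0.5271; -0.4458], 0.01, 1e-6, 5000), it should return a vector close to the Wiener solution [0.8362; -0.7854] of the example below.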


Equivalent forms of the adaptation equations

We have the following equivalent forms of the adaptation equations, each showing one facet (or one interpretation) of the adaptation process.

1. Solving p = Rw_o for w_o using iterative schemes:

   w(n+1) = w(n) + µ[p - Rw(n)]

2. or componentwise

   w_0(n+1) = w_0(n) + µ(p(0) - Σ_{i=0}^{M-1} R_{0,i} w_i(n))
   w_1(n+1) = w_1(n) + µ(p(-1) - Σ_{i=0}^{M-1} R_{1,i} w_i(n))
   ...
   w_{M-1}(n+1) = w_{M-1}(n) + µ(p(-M+1) - Σ_{i=0}^{M-1} R_{M-1,i} w_i(n))

3. Error driven adaptation process (see Figure at page 6):

   w(n+1) = w(n) + µE[e(n)u(n)]

4. or componentwise

   w_0(n+1) = w_0(n) + µE[e(n)u_0(n)]
   w_1(n+1) = w_1(n) + µE[e(n)u_1(n)]
   ...
   w_{M-1}(n+1) = w_{M-1}(n) + µE[e(n)u_{M-1}(n)]

   where u_k(n) = u(n-k) denotes the k-th element of the tap-input vector u(n).


Example: Running the SD algorithm to solve for the Wiener filter derived in the example of Lecture 2, Rw = p:

[ r_u(0)  r_u(1) ] [ w_0 ]   [ E d(n)u(n)   ]
[ r_u(1)  r_u(0) ] [ w_1 ] = [ E d(n)u(n-1) ]

[ 1.1  0.5 ] [ w_0 ]   [  0.5271 ]
[ 0.5  1.1 ] [ w_1 ] = [ -0.4458 ]

with the solution

[ w_0 ]   [  0.8362 ]
[ w_1 ] = [ -0.7854 ]

We will use the Matlab code

R_u = [1.1 0.5 ; 0.5 1.1];   % Define autocorrelation matrix
p   = [0.5271 ; -0.4458];    % Define cross-correlation vector
W   = [-1; -1];              % Initial values for the adaptation algorithm
mu  = 0.01;
Wt  = W;                     % Wt will record the evolution of vector W
for k = 1:1000               % Iterate 1000 times the adaptation step
    W  = W + mu*(p - R_u*W); % Adaptation equation! Quite simple!
    Wt = [Wt W];             % Wt records the evolution of vector W
end
% Is W the Wiener filter?
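After running the loop above, one quick way to check whether W has reached the Wiener filter, and to reproduce the error curves shown in the figures below (this check is not part of the original code):

w_o = R_u \ p;                     % direct solution of R*w = p: [0.8362; -0.7854]
norm(W - w_o)                      % parameter error after 1000 iterations (compare with Figure 1)
err = sqrt(sum((Wt - w_o*ones(1,size(Wt,2))).^2));   % ||w(n) - w_o|| for every recorded n
semilogy(err), xlabel('n (iteration)'), ylabel('Parameter error ||w(n) - w_o||')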


[Figure: left panel, "W_1 - W_2 Criterion Surface and Adaptation process", shows the contours of the criterion surface in the (W(1), W(2)) plane together with the adaptation trajectory; right panel shows the parameter error ||w(n) - w_o|| versus n (iteration) on a logarithmic scale.]

Figure 1: Adaptation step µ = 0.01. Left: the adaptation process starts from an arbitrary point w(0) = [-1, -1]^T (marked with a cross) and converges to the Wiener filter parameters w_o = [0.8362, -0.7854]^T ("x" point, in the center of the ellipses). Right: the convergence rate is exponential (note the logarithmic scale for the parameter error). In 1000 iterations the parameter error reaches 10^{-3}.


[Figure: criterion surface with adaptation trajectory (left) and parameter error ||w(n) - w_o|| versus n (iteration), logarithmic scale (right).]

Figure 2: Adaptation step µ = 1. Left: stable oscillatory adaptation process starting from an arbitrary point w(0) = [-1, -1]^T (marked with a cross) and converging to the Wiener filter parameters w_o = [0.8362, -0.7854]^T ("x" point, in the center of the ellipses). Right: the convergence rate is exponential. In about 70 iterations the parameter error reaches 10^{-16}.


[Figure: criterion surface with adaptation trajectory (left) and parameter error ||w(n) - w_o|| versus n (iteration), logarithmic scale (right).]

Figure 3: Adaptation step µ = 4. Left: unstable oscillatory adaptation process starting from an arbitrary point w(0) = [-1, -1]^T (marked with a cross) and diverging from the Wiener filter parameters w_o = [0.8362, -0.7854]^T ("x" point, in the center of the ellipses). Right: the divergence rate is exponential. In about 400 iterations the parameter error reaches 10^{300}.


[Figure: criterion surface with adaptation trajectory (left) and parameter error ||w(n) - w_o|| versus n (iteration), logarithmic scale (right).]

Figure 4: Adaptation step µ = 0.001. Left: very slow adaptation process, starting from an arbitrary point w(0) = [-1, -1]^T (marked with a cross). It will eventually converge to the Wiener filter parameters w_o = [0.8362, -0.7854]^T ("x" point, in the center of the ellipses). Right: the convergence rate is exponential (note the logarithmic scale for the parameter error). In 1000 iterations the parameter error only reaches 0.7.


Stability of the Steepest Descent algorithm

Write the adaptation algorithm as

w(n+1) = w(n) + µ[p - Rw(n)]                                        (4)

But from the Wiener-Hopf equation p = Rw_o, and therefore

w(n+1) = w(n) + µ[Rw_o - Rw(n)] = w(n) + µR[w_o - w(n)]             (5)

Now subtract the Wiener optimal parameters, w_o, from both sides:

w(n+1) - w_o = w(n) - w_o + µR[w_o - w(n)] = (I - µR)[w(n) - w_o]

and, introducing the vector c(n) = w(n) - w_o, we have

c(n+1) = (I - µR)c(n)

Let λ_1, λ_2, ..., λ_M be the (real and positive) eigenvalues and let q_1, q_2, ..., q_M be the (generally complex) eigenvectors of the matrix R, thus satisfying

Rq_i = λ_i q_i                                                      (6)

Then the matrix Q = [q_1 q_2 ... q_M] transforms R to the diagonal form Λ = diag(λ_1, λ_2, ..., λ_M):

R = QΛQ^H

where the superscript H means complex conjugation and transposition. Thus

c(n+1) = (I - µR)c(n) = (I - µQΛQ^H)c(n)                            (7)


Left-multiplying by Q^H:

Q^H c(n+1) = Q^H (I - µQΛQ^H)c(n) = (Q^H - µQ^H QΛQ^H)c(n) = (I - µΛ)Q^H c(n)

We denote by ν(n) the rotated and translated version of w(n),

ν(n) = Q^H c(n) = Q^H (w(n) - w_o)

and now we have

ν(n+1) = (I - µΛ)ν(n)

where the initial value is ν(0) = Q^H (w(0) - w_o). Since Λ is diagonal, we can write the recursions for ν(n) componentwise,

ν_k(n+1) = (1 - µλ_k) ν_k(n),    k = 1, 2, ..., M

which can be solved easily to give

ν_k(n) = (1 - µλ_k)^n ν_k(0),    k = 1, 2, ..., M

For the stability of the algorithm we need

-1 < 1 - µλ_k < 1,    k = 1, 2, ..., M

or

0 < µ < 2/λ_k,    k = 1, 2, ..., M,

that is, 0 < µ < 2/λ_max, where λ_max is the largest eigenvalue of R.
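For the numerical example used in this lecture the bound can be checked directly; the following Matlab sketch shows why µ = 0.01, µ = 1 and µ = 0.001 converge in Figures 1, 2 and 4 while µ = 4 diverges in Figure 3:

R = [1.1 0.5; 0.5 1.1];       % autocorrelation matrix of the example
lambda = eig(R)               % eigenvalues: 0.6 and 1.6 (real and positive)
mu_max = 2/max(lambda)        % stability bound 2/lambda_max = 1.25
% mu = 0.001, 0.01 and 1 are below the bound, so all modes (1 - mu*lambda_k)^n decay;
% mu = 4 exceeds it: 1 - mu*lambda_max = -5.4, so the error oscillates and grows (Figure 3).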