Lecture 3: Linear FIR Adaptive Filtering

Gradient-based adaptation: the Steepest Descent Method

Adaptive filtering: Problem statement

• Consider the family of variable-parameter FIR filters, computing their output according to

y(n) = w_0(n)u(n) + w_1(n)u(n − 1) + . . . + w_{M−1}(n)u(n − M + 1) = Σ_{k=0}^{M−1} w_k(n)u(n − k) = w(n)^T u(n),    n = 0, 1, 2, . . .

where the parameters w(n) are allowed to change at every time step, u(n) is the input signal, d(n) is the desired signal, and u(n) and d(n) are jointly stationary.

• Given the parameters w(n) = [w_0(n), w_1(n), w_2(n), . . . , w_{M−1}(n)]^T, find an adaptation mechanism

w(n + 1) = w(n) + δw(n)

or, written componentwise,

w_0(n + 1) = w_0(n) + δw_0(n)
w_1(n + 1) = w_1(n) + δw_1(n)
. . .
w_{M−1}(n + 1) = w_{M−1}(n) + δw_{M−1}(n)

such that the adaptation process converges to the parameters of the optimal Wiener filter, w(n + 1) → w_o, no matter where the iterations are initialized (i.e. for all w(0)).
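As a minimal sketch of one such filtering step (the numerical values below are hypothetical placeholders, not taken from the lecture), the output y(n) = w(n)^T u(n) is computed in Matlab as:

    w = [0.5; -0.2; 0.1];      % current parameters w(n), placeholder values (M = 3)
    u_vec = [1.0; 0.3; -0.7];  % regressor [u(n); u(n-1); u(n-2)], placeholder samples
    y = w' * u_vec;            % y(n) = w(n)' u(n)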
The key to the adaptation process is the existence of an error between the output of the filter y(n) and the desired signal d(n):

e(n) = d(n) − y(n) = d(n) − w(n)^T u(n)

If the parameter vector w(n) were used at all time instants, the performance criterion would be (see last lecture)

J(n) = J_{w(n)} = E[e(n)e(n)] = E[(d(n) − w^T(n)u(n))(d(n) − u^T(n)w(n))]
     = E[d^2(n)] − 2E[d(n)u^T(n)]w(n) + w^T(n)E[u(n)u^T(n)]w(n)
     = σ_d^2 − 2p^T w(n) + w^T(n)Rw(n)
     = σ_d^2 − 2 Σ_{i=0}^{M−1} p(−i)w_i + Σ_{l=0}^{M−1} Σ_{i=0}^{M−1} w_l w_i R_{i,l}        (1)
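Equation (1) is a quadratic function of the parameters, which is what makes gradient-based search work. A small sketch evaluating it (σ_d^2 is an assumed placeholder here; R and p are taken from the numerical example later in this lecture):

    sigma2_d = 1;                          % variance of d(n) -- assumed value
    R = [1.1 0.5; 0.5 1.1];                % autocorrelation matrix (example below)
    p = [0.5271; -0.4458];                 % cross-correlation vector (example below)
    J = @(w) sigma2_d - 2*p'*w + w'*R*w;   % criterion J as a function of w
    J([0.8362; -0.7854])                   % J at the Wiener solution: the minimum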
Expressions of the gradient vector
The gradient of the criterion J(n) with respect to the parameters w(n) is

∇_{w(n)} J(n) = [ ∂J(n)/∂w_0(n),  ∂J(n)/∂w_1(n),  . . . ,  ∂J(n)/∂w_{M−1}(n) ]^T

= [ −2p(0) + 2 Σ_{i=0}^{M−1} R_{0,i} w_i(n)
    −2p(−1) + 2 Σ_{i=0}^{M−1} R_{1,i} w_i(n)
    . . .
    −2p(−M + 1) + 2 Σ_{i=0}^{M−1} R_{M−1,i} w_i(n) ]

= −2p + 2Rw(n)

Writing the correlations p(−k) and R_{k,i} as expectations,

∇_{w(n)} J(n) = [ −2E[d(n)u(n)] + 2 Σ_{i=0}^{M−1} E[u(n)u(n − i)] w_i(n)
                  −2E[d(n)u(n − 1)] + 2 Σ_{i=0}^{M−1} E[u(n − 1)u(n − i)] w_i(n)
                  . . .
                  −2E[d(n)u(n − M + 1)] + 2 Σ_{i=0}^{M−1} E[u(n − M + 1)u(n − i)] w_i(n) ]

= [ −2E[u(n)(d(n) − Σ_{i=0}^{M−1} u(n − i)w_i(n))]
    −2E[u(n − 1)(d(n) − Σ_{i=0}^{M−1} u(n − i)w_i(n))]
    . . .
    −2E[u(n − M + 1)(d(n) − Σ_{i=0}^{M−1} u(n − i)w_i(n))] ]        (2)
= [ −2E[u(n)e(n)]
    −2E[u(n − 1)e(n)]
    . . .
    −2E[u(n − M + 1)e(n)] ]

= −2E[ [u(n), u(n − 1), . . . , u(n − M + 1)]^T e(n) ] = −2E[u(n)e(n)]        (3)

((2) and (3) give the solution of Exercise 7, page 229, [Haykin 2002].)

Steepest descent algorithm:

w(n + 1) = w(n) − (1/2)µ∇_{w(n)} J(n) = w(n) + µE[u(n)e(n)]
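A quick numerical sanity check of the closed form ∇J = −2p + 2Rw(n) in (2), comparing it against central finite differences of the criterion (again with the example R and p, and an assumed σ_d^2):

    sigma2_d = 1;                          % assumed value
    R = [1.1 0.5; 0.5 1.1]; p = [0.5271; -0.4458];
    J = @(w) sigma2_d - 2*p'*w + w'*R*w;
    w = [-1; -1];
    g_exact = -2*p + 2*R*w;                % gradient from equation (2)
    h = 1e-6; g_num = zeros(2,1);
    for i = 1:2
        dw = zeros(2,1); dw(i) = h;
        g_num(i) = (J(w + dw) - J(w - dw)) / (2*h);  % central difference
    end
    disp([g_exact g_num])                  % the two columns should agree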
SD ALGORITHM

Steepest descent search algorithm for finding the optimal Wiener FIR filter
Given
• the autocorrelation matrix R = E[u(n)u^T(n)]
• the cross-correlation vector p = E[u(n)d(n)]

Initialize the algorithm with an arbitrary parameter vector w(0).

Iterate for n = 0, 1, 2, 3, . . . , n_max:

w(n + 1) = w(n) + µ[p − Rw(n)]

Stop the iterations if ‖p − Rw(n)‖_2 < ε.

Designer degrees of freedom: µ, ε, n_max.
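A direct transcription of this algorithm into Matlab (the values of µ, ε and n_max are arbitrary designer choices; R and p are taken from the example below):

    R = [1.1 0.5; 0.5 1.1];
    p = [0.5271; -0.4458];
    mu = 0.01; epsilon = 1e-6; n_max = 10000;  % designer degrees of freedom
    w = [-1; -1];                              % arbitrary initialization w(0)
    for n = 0:n_max
        g = p - R*w;                           % p - R w(n)
        if norm(g) < epsilon, break; end       % stopping test
        w = w + mu*g;                          % w(n+1) = w(n) + mu [p - R w(n)]
    end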
Equivalent forms of the adaptation equations

We have the following equivalent forms of the adaptation equations, each showing one facet (or one interpretation) of the adaptation process; a numerical check of the equivalence of forms 1 and 3 is sketched after the list.

1. Solving p = Rw_o for w_o using an iterative scheme:

w(n + 1) = w(n) + µ[p − Rw(n)]

2. or componentwise

w_0(n + 1) = w_0(n) + µ(p(0) − Σ_{i=0}^{M−1} R_{0,i} w_i(n))
w_1(n + 1) = w_1(n) + µ(p(−1) − Σ_{i=0}^{M−1} R_{1,i} w_i(n))
. . .
w_{M−1}(n + 1) = w_{M−1}(n) + µ(p(−M + 1) − Σ_{i=0}^{M−1} R_{M−1,i} w_i(n))

3. Error-driven adaptation process (see Figure at page 6):

w(n + 1) = w(n) + µE[e(n)u(n)]

4. or componentwise

w_0(n + 1) = w_0(n) + µE[e(n)u(n)]
w_1(n + 1) = w_1(n) + µE[e(n)u(n − 1)]
. . .
w_{M−1}(n + 1) = w_{M−1}(n) + µE[e(n)u(n − M + 1)]
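Forms 1 and 3 coincide because E[e(n)u(n)] = p − Rw(n) for a fixed w(n). A Monte Carlo sketch of this identity, under a signal model that is an assumption made only for illustration (u(n) unit-variance white noise, d(n) = 0.9u(n) − 0.6u(n − 1) + measurement noise, so that R = I and p = [0.9, −0.6]^T in closed form):

    N = 1e6;
    u = randn(N,1);                        % white input, so R = eye(2)
    d = filter([0.9 -0.6], 1, u) + 0.1*randn(N,1);
    R = eye(2); p = [0.9; -0.6];           % closed-form correlations of this model
    w = [-1; -1];                          % a frozen parameter vector w(n)
    U = [u(2:N) u(1:N-1)];                 % rows are u(n)' = [u(n) u(n-1)]
    e = d(2:N) - U*w;                      % e(n) = d(n) - w(n)' u(n)
    Eeu = (U'*e)/(N-1);                    % sample average of e(n)u(n)
    disp([Eeu, p - R*w])                   % the two columns should nearly coincide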
Example: Running the SD algorithm to solve for the Wiener filter derived in the example of Lecture 2, Rw = p:

⎡ r_u(0)  r_u(1) ⎤ ⎡ w_0 ⎤   ⎡ E[d(n)u(n)]     ⎤
⎣ r_u(1)  r_u(0) ⎦ ⎣ w_1 ⎦ = ⎣ E[d(n)u(n − 1)] ⎦

⎡ 1.1  0.5 ⎤ ⎡ w_0 ⎤   ⎡  0.5271 ⎤
⎣ 0.5  1.1 ⎦ ⎣ w_1 ⎦ = ⎣ −0.4458 ⎦

with the solution

⎡ w_0 ⎤   ⎡  0.8362 ⎤
⎣ w_1 ⎦ = ⎣ −0.7854 ⎦
We will use the Matlab code

    R_u = [1.1 0.5; 0.5 1.1];     % define autocorrelation matrix
    p = [0.5271; -0.4458];        % define cross-correlation vector
    W = [-1; -1];                 % initial values for the adaptation algorithm
    mu = 0.01;
    Wt = W;                       % Wt will record the evolution of vector W
    for k = 1:1000                % iterate 1000 times the adaptation step
        W = W + mu*(p - R_u*W);   % adaptation equation! Quite simple!
        Wt = [Wt W];              % Wt records the evolution of vector W
    end
    % Is W the Wiener filter?
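To answer the question in the last comment, W can be compared with the direct solution of the normal equations (continuing with the variables above):

    w_o = R_u \ p;                % exact Wiener solution of R w = p
    disp([W w_o])                 % the two columns should nearly coincide
    norm(W - w_o)                 % parameter error; about 1e-3 after 1000
                                  % steps with mu = 0.01 (cf. Figure 1)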
[Figure 1, two panels. Left: "Criterion Surface and Adaptation process" — contours of J over the (W(1), W(2)) plane with the trajectory of w(n). Right: parameter error ‖w(n) − w_o‖ versus n (iteration), logarithmic scale.]
Figure 1: Adaptation step µ = 0.01. Left: the adaptation process starts from an arbitrary point w(0) = [−1, −1]^T (marked with a cross) and converges to the Wiener filter parameters w_o = [0.8362, −0.7854]^T ("x" point, in the center of the ellipses). Right: the convergence rate is exponential (note the logarithmic scale for the parameter error). In 1000 iterations the parameter error reaches 10^{−3}.
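The right panel can be reproduced from the trajectory Wt recorded by the code above; a sketch:

    w_o = R_u \ p;
    err = sqrt(sum((Wt - w_o*ones(1, size(Wt,2))).^2, 1));  % ||w(n) - w_o||
    semilogy(0:size(Wt,2)-1, err)   % exponential decay shows as a straight line
    xlabel('n (iteration)'); ylabel('Parameter error ||w(n) - w_o||')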
[Figure 2, two panels, same layout as Figure 1: criterion surface with the oscillating trajectory of w(n), and parameter error ‖w(n) − w_o‖ versus n on a logarithmic scale.]
Figure 2: Adaptation step µ = 1. Left: stable oscillatory adaptation process starting from an arbitrary point w(0) = [−1, −1]^T (marked with a cross) and converging to the Wiener filter parameters w_o = [0.8362, −0.7854]^T ("x" point, in the center of the ellipses). Right: the convergence rate is exponential. In about 70 iterations the parameter error reaches 10^{−16}.
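This rate agrees with the modal analysis derived at the end of this lecture: the eigenvalues of R are λ_1 = 1.6 and λ_2 = 0.6, so for µ = 1 the error modes are scaled per step by the factors 1 − µλ_k = −0.6 and 0.4; the negative factor explains the oscillations, and the dominant magnitude is 0.6. Then

0.6^{70} = 10^{70 log_{10} 0.6} ≈ 10^{−15.5},

consistent with the error reaching 10^{−16} in about 70 iterations.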
[Figure 3, two panels, same layout: criterion surface with the diverging trajectory of w(n), and parameter error ‖w(n) − w_o‖ versus n on a logarithmic scale, up to 10^{300}.]
Figure 3: Adaptation step µ = 4. Left: unstable oscillatory adaptation process starting from an arbitrary point w(0) = [−1, −1]^T (marked with a cross) and diverging from the Wiener filter parameters w_o = [0.8362, −0.7854]^T ("x" point, in the center of the ellipses). Right: the divergence rate is exponential. In about 400 iterations the parameter error reaches 10^{300}.
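The divergence rate follows from the same modal analysis: for µ = 4 the dominant factor is |1 − 4 · 1.6| = 5.4, and

5.4^{400} = 10^{400 log_{10} 5.4} ≈ 10^{293},

i.e. of the order 10^{300}, as observed in the figure.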
[Figure 4, two panels, same layout: criterion surface with the slowly moving trajectory of w(n), and parameter error ‖w(n) − w_o‖ versus n on a logarithmic scale.]
Figure 4: Adaptation step µ = 0.001. Left: very slow adaptation process, starting from an arbitrary point w(0) = [−1, −1]^T (marked with a cross). It will eventually converge to the Wiener filter parameters w_o = [0.8362, −0.7854]^T ("x" point, in the center of the ellipses). Right: the convergence rate is exponential (note the logarithmic scale for the parameter error). In 1000 iterations the parameter error only reaches 0.7.
Stability of the Steepest Descent algorithm

Write the adaptation algorithm as

w(n + 1) = w(n) + µ[p − Rw(n)]        (4)

But from the Wiener–Hopf equation p = Rw_o, and therefore

w(n + 1) = w(n) + µ[Rw_o − Rw(n)] = w(n) + µR[w_o − w(n)]        (5)

Now subtract the Wiener optimal parameters w_o from both sides,

w(n + 1) − w_o = w(n) − w_o + µR[w_o − w(n)] = (I − µR)[w(n) − w_o]

and, introducing the vector c(n) = w(n) − w_o, we have

c(n + 1) = (I − µR)c(n)

Let λ_1, λ_2, . . . , λ_M be the (real and positive) eigenvalues and let q_1, q_2, . . . , q_M be the (generally complex) eigenvectors of the matrix R, thus satisfying

Rq_i = λ_i q_i        (6)

Then the matrix Q = [q_1 q_2 . . . q_M] transforms R to the diagonal form Λ = diag(λ_1, λ_2, . . . , λ_M):

R = QΛQ^H

where the superscript H denotes complex conjugation and transposition. Hence

c(n + 1) = (I − µR)c(n) = (I − µQΛQ^H)c(n)        (7)
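For the 2 × 2 example above the decomposition R = QΛQ^H is easy to compute numerically; a short sketch:

    R = [1.1 0.5; 0.5 1.1];
    [Q, Lambda] = eig(R);           % columns of Q are the eigenvectors q_i
    diag(Lambda)'                   % eigenvalues: 0.6 and 1.6
    norm(R - Q*Lambda*Q')           % ~0; here Q is real, so Q^H = Q'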
Left-multiplying by Q^H,

Q^H c(n + 1) = Q^H(I − µQΛQ^H)c(n) = (Q^H − µQ^H QΛQ^H)c(n) = (I − µΛ)Q^H c(n)

We denote by ν(n) the rotated and translated version of w(n),

ν(n) = Q^H c(n) = Q^H(w(n) − w_o)

and now we have

ν(n + 1) = (I − µΛ)ν(n)

where the initial value is ν(0) = Q^H(w(0) − w_o). We can write the recursions for ν(n) componentwise,

ν_k(n + 1) = (1 − µλ_k)ν_k(n),        k = 1, 2, . . . , M

which can be solved easily to give

ν_k(n) = (1 − µλ_k)^n ν_k(0),        k = 1, 2, . . . , M

For the stability of the algorithm we need

−1 < 1 − µλ_k < 1,        k = 1, 2, . . . , M

or

0 < µ < 2/λ_k,        k = 1, 2, . . . , M,

that is, 0 < µ < 2/λ_max.
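For the numerical example, λ_max = 1.6, so the bound is µ < 2/1.6 = 1.25; this is consistent with the figures above: µ = 0.001, 0.01 and 1 converge, while µ = 4 diverges. A one-line check:

    R = [1.1 0.5; 0.5 1.1];
    mu_max = 2/max(eig(R))          % stability bound: mu_max = 1.25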