Decomposing a Third-Order Tensor in Rank-(L,L,1) Terms by Means of Simultaneous Matrix Diagonalization
Dimitri Nion & Lieven De Lathauwer, K.U. Leuven, Kortrijk campus, Belgium
2009 SIAM Conference on Applied Linear Algebra, Session MS33 “Computational Methods for Tensors” Monterey, USA, October 26-29, 2009
Roadmap
I. Introduction. Tensor decompositions: PARAFAC, Tucker, Block-Component Decompositions
II. Block-Component Decomposition in Rank-(L,L,1) Terms. Definition of the BCD-(L,L,1), uniqueness bound, ALS algorithm
III. Reformulation of BCD-(L,L,1) in terms of simultaneous matrix diagonalization. New algorithm, relaxed uniqueness bound
IV. An application of the BCD-(L,L,1): blind source separation in telecommunications
V. Conclusion and Future Research
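Tucker/HOSVD and PARAFAC
Tucker/HOSVD [Tucker, 1966] / [De Lathauwer, 2000]
$$\mathcal{Y} = \mathcal{H} \times_1 U \times_2 V \times_3 W$$
with $\mathcal{Y}$ of size $I \times J \times K$, core tensor $\mathcal{H}$ of size $L \times M \times N$, $U$ of size $I \times L$, $V$ of size $J \times M$ and $W$ of size $K \times N$.
PARAFAC [Harshman, 1970]
$$\mathcal{Y} = \mathcal{H} \times_1 A \times_2 B \times_3 C$$
with $A$ of size $I \times R$, $B$ of size $J \times R$, $C$ of size $K \times R$, and an $R \times R \times R$ core $\mathcal{H}$ that is diagonal ($h_{ijk} = 1$ if $i = j = k$, else $h_{ijk} = 0$). Equivalently,
$$\mathcal{Y} = a_1 \circ b_1 \circ c_1 + \dots + a_R \circ b_R \circ c_R$$
where $\circ$ denotes the outer product: a sum of R rank-1 tensors $\mathcal{Y}_1 + \dots + \mathcal{Y}_R$.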
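From PARAFAC/HOSVD to Block Component Decompositions (BCD) [De Lathauwer and Nion, SIMAX 2008]
In all three cases $\mathcal{Y}$ is of size $I \times J \times K$ and $\circ$ denotes the outer product.
BCD in rank-$(L_r, L_r, 1)$ terms:
$$\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r$$
with $A_r$ of size $I \times L_r$, $B_r$ of size $J \times L_r$ and $c_r$ of size $K \times 1$.
BCD in rank-$(L_r, M_r, \cdot)$ terms:
$$\mathcal{Y} = \sum_{r=1}^{R} \mathcal{H}_r \times_1 A_r \times_2 B_r$$
with $\mathcal{H}_r$ of size $L_r \times M_r \times K$, $A_r$ of size $I \times L_r$ and $B_r$ of size $J \times M_r$.
BCD in rank-$(L_r, M_r, N_r)$ terms:
$$\mathcal{Y} = \sum_{r=1}^{R} \mathcal{H}_r \times_1 A_r \times_2 B_r \times_3 C_r$$
with $\mathcal{H}_r$ of size $L_r \times M_r \times N_r$, $A_r$ of size $I \times L_r$, $B_r$ of size $J \times M_r$ and $C_r$ of size $K \times N_r$.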
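The BCD-(L,L,1) as a generalization of PARAFAC
$$\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r$$
where $\mathcal{Y}$ is of size $I \times J \times K$ and $\circ$ denotes the outer product.
Generalization of PARAFAC [De Lathauwer, de Baynast, 2003]: BCD-(1,1,1) = PARAFAC.
Unknown matrices:
$A = [A_1 \ \dots \ A_R]$ of size $I \times LR$, with blocks $A_r$ of size $I \times L$
$B = [B_1 \ \dots \ B_R]$ of size $J \times LR$, with blocks $B_r$ of size $J \times L$
$C = [c_1 \ \dots \ c_R]$ of size $K \times R$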
The BCD-(L,L,1) is said to be essentially unique if the only remaining ambiguities are:
an arbitrary permutation of the blocks in A and B and of the columns of C;
rotational freedom of each block (block-wise subspace estimation), plus a scaling ambiguity on the columns of C.
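The ambiguities listed above can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: real-valued random factors, arbitrary small dimensions, and a hypothetical helper `bcd_ll1`. It applies an invertible L × L matrix to each block, a scaling to each column of C, and a block permutation, and the tensor is unchanged.

```python
import numpy as np

def bcd_ll1(A_blocks, B_blocks, C):
    """Build Y[i, j, k] = sum_r (A_r B_r^T)[i, j] * C[k, r] (hypothetical helper)."""
    return sum(np.einsum('ij,k->ijk', A @ B.T, C[:, r])
               for r, (A, B) in enumerate(zip(A_blocks, B_blocks)))

rng = np.random.default_rng(0)
I, J, K, L, R = 5, 6, 7, 2, 3            # illustrative sizes (assumption)
A = [rng.standard_normal((I, L)) for _ in range(R)]
B = [rng.standard_normal((J, L)) for _ in range(R)]
C = rng.standard_normal((K, R))
Y = bcd_ll1(A, B, C)

# Block-wise freedom: A_r -> A_r F_r, B_r -> B_r F_r^{-T} / alpha_r, c_r -> alpha_r c_r
F = [rng.standard_normal((L, L)) for _ in range(R)]
alpha = rng.uniform(0.5, 2.0, size=R)
A2 = [A[r] @ F[r] for r in range(R)]
B2 = [B[r] @ np.linalg.inv(F[r]).T / alpha[r] for r in range(R)]
C2 = C * alpha                            # scales column r of C by alpha[r]

# Permutation of the R terms (same reordering for the blocks and the columns of C)
perm = [2, 0, 1]
A3 = [A2[p] for p in perm]
B3 = [B2[p] for p in perm]
C3 = C2[:, perm]

Y3 = bcd_ll1(A3, B3, C3)
print(np.allclose(Y, Y3))
```

Only the column spaces of the blocks A_r and B_r are therefore identifiable, which is why the slide speaks of block-wise subspace estimation.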
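The BCD-(L,L,1) as a constrained Tucker model
The BCD-(L,L,1) can be seen as a particular case of the Tucker model, where the core tensor is «block-diagonal», with L by L blocks on its diagonal:
$$\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r = \mathcal{H} \times_1 A \times_2 B \times_3 C$$
with $A = [A_1 \ \dots \ A_R]$ of size $I \times LR$, $B = [B_1 \ \dots \ B_R]$ of size $J \times LR$, $C$ of size $K \times R$, and a block-diagonal core $\mathcal{H}$ of size $LR \times LR \times R$.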
BCD-(L,L,1): existing results on algorithms and uniqueness
Several standard algorithms for computing PARAFAC have been adapted to the BCD-(L,L,1).
Example 1: ALS algorithm (alternates between least-squares updates of the unknowns A, B and C).
Example 2: ALS with Enhanced Line Search to speed up convergence.
Example 3: Gauss-Newton based algorithms (Levenberg-Marquardt).
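A minimal sketch of the ALS idea for the BCD-(L,L,1) follows. It is not the authors' implementation: the function names `reconstruct` and `als_sweep`, the update order, the random initialization, the real-valued data and the dimensions are all assumptions made for illustration. Each update solves a linear least-squares problem in one factor with the other two fixed, so the fitting error cannot increase from one sweep to the next.

```python
import numpy as np

def reconstruct(A, B, C, L):
    """Y[i,j,k] = sum_r (A_r B_r^T)[i,j] C[k,r], with A = [A_1 ... A_R], B = [B_1 ... B_R]."""
    R = C.shape[1]
    Ab = A.reshape(A.shape[0], R, L)      # Ab[i, r, l] = A[i, r*L + l]
    Bb = B.reshape(B.shape[0], R, L)
    return np.einsum('irl,jrl,kr->ijk', Ab, Bb, C)

def als_sweep(Y, A, B, C, L):
    """One ALS sweep: least-squares update of A, then B, then C."""
    I, J, K = Y.shape
    R = C.shape[1]
    # solve(M, T) returns the transpose of the least-squares solution of M X = T
    solve = lambda M, T: np.linalg.lstsq(M, T, rcond=None)[0].T
    Bb = B.reshape(J, R, L)
    M = np.einsum('jrl,kr->jkrl', Bb, C).reshape(J * K, R * L)
    A = solve(M, Y.reshape(I, J * K).T)                     # update A
    Ab = A.reshape(I, R, L)
    M = np.einsum('irl,kr->ikrl', Ab, C).reshape(I * K, R * L)
    B = solve(M, Y.transpose(1, 0, 2).reshape(J, I * K).T)  # update B
    Bb = B.reshape(J, R, L)
    M = np.einsum('irl,jrl->ijr', Ab, Bb).reshape(I * J, R)
    C = solve(M, Y.transpose(2, 0, 1).reshape(K, I * J).T)  # update C
    return A, B, C

rng = np.random.default_rng(0)
I, J, K, L, R = 6, 7, 8, 2, 2             # illustrative sizes (assumption)
Y = reconstruct(rng.standard_normal((I, R * L)), rng.standard_normal((J, R * L)),
                rng.standard_normal((K, R)), L)
A, B, C = (rng.standard_normal(shape) for shape in [(I, R * L), (J, R * L), (K, R)])
errs = [np.linalg.norm(Y - reconstruct(A, B, C, L))]
for _ in range(50):
    A, B, C = als_sweep(Y, A, B, C, L)
    errs.append(np.linalg.norm(Y - reconstruct(A, B, C, L)))
print(errs[0], errs[-1])
```

The monotone decrease of the error is guaranteed, but convergence to the exact decomposition from a random start is not, which is one motivation for the line-search and Gauss-Newton variants listed above.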
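First result on essential uniqueness, in the generic sense [De Lathauwer, 2006]:
$$LR \le IJ \quad \text{and} \quad \min\!\left(\left\lfloor \tfrac{I}{L} \right\rfloor, R\right) + \min\!\left(\left\lfloor \tfrac{J}{L} \right\rfloor, R\right) + \min(K, R) \ge 2(R+1) \qquad (1)$$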
Starting point of this work
In 2005, De Lathauwer showed that, under certain assumptions on the dimensions, PARAFAC can be reformulated as a simultaneous diagonalization (SD) problem. This yields:
A very fast and accurate algorithm to compute PARAFAC
A new, relaxed, uniqueness bound
Is it possible to generalize these results to the BCD-(L,L,1)?
If so, does it also yield a fast algorithm and a new uniqueness bound (more relaxed than the one on the previous slide)?
The answer is YES
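Reformulation of BCD-(L,L,1) in terms of SD: overview (1)
$$\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r = \sum_{r=1}^{R} X_r \circ c_r$$
where $\mathcal{Y}$ is of size $I \times J \times K$, $\circ$ denotes the outer product, and each $X_r = A_r B_r^T$ is an $I \times J$ matrix of rank L.
Assumption: $R \le \min(IJ, K)$, i.e., K has to be a sufficiently long dimension.
Build $\mathbf{Y}$, the $JI \times K$ matrix unfolding of $\mathcal{Y}$.
BCD-(L,L,1) in matrix format:
$$\mathbf{Y} = [\mathrm{vec}(X_1) \ \dots \ \mathrm{vec}(X_R)] \cdot C^T = \tilde{X} \cdot C^T \qquad (1)$$
SVD of $\mathbf{Y}$ (generically rank-R):
$$\mathbf{Y} = U \cdot \Sigma \cdot V^H = E \cdot V^H \qquad (2)$$
From (1) and (2), there exists $W \in \mathbb{C}^{R \times R}$ such that
$$\tilde{X} = E \cdot W, \qquad C^T = W^{-1} \cdot V^H$$
Goal: find W, i.e., find the linear combinations of the columns of E that yield vectorized rank-L matrices.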
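The unfolding and SVD steps can be illustrated with a short NumPy sketch. The sizes, the real-valued random factors and the column-major convention for vec are assumptions. Note that W is computed here from the true X̃, which is of course unknown in practice: the point is only to check that such a W exists and that C follows from it.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, L, R = 4, 5, 6, 2, 3                 # illustrative sizes, R <= min(I*J, K)
A = rng.standard_normal((R, I, L))            # A[r] is the block A_r
B = rng.standard_normal((R, J, L))            # B[r] is the block B_r
C = rng.standard_normal((K, R))

X = np.einsum('ril,rjl->rij', A, B)           # X[r] = A_r B_r^T, rank L
# Column-major vec: entry j*I + i of vec(X_r) is X_r[i, j]; X_tilde is JI x R
X_tilde = X.transpose(0, 2, 1).reshape(R, J * I).T
Y_mat = X_tilde @ C.T                         # JI x K unfolding, equation (1)

U, s, Vh = np.linalg.svd(Y_mat, full_matrices=False)
E = U[:, :R] * s[:R]                          # E = U Sigma, truncated to rank R
Vh_R = Vh[:R]                                 # first R rows of V^H

# Oracle W (uses the true X_tilde, for verification only)
W = np.linalg.lstsq(E, X_tilde, rcond=None)[0]
C_from_W = (np.linalg.inv(W) @ Vh_R).T        # C^T = W^{-1} V^H

print(np.linalg.matrix_rank(Y_mat), np.allclose(E @ W, X_tilde), np.allclose(C_from_W, C))
```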
Reformulation of BCD-(L,L,1) in terms of SD: overview (2)
Note 1: Once W has been found, the unknown matrices A, B, C of the BCD-(L,L,1) follow from
$$\tilde{X} = E \cdot W = [\mathrm{vec}(X_1) \ \dots \ \mathrm{vec}(X_R)] = [\mathrm{vec}(A_1 B_1^T) \ \dots \ \mathrm{vec}(A_R B_R^T)]$$
$$C^T = W^{-1} \cdot V^H, \quad \text{i.e.,} \quad C = V^{*} \cdot W^{-T}$$
For each r = 1, ..., R: matricize column r of $\tilde{X}$ and estimate $A_r$ and $B_r$ from its best rank-L approximation.
Note 2: For PARAFAC (i.e., L = 1), we have
$$\tilde{X} = [\mathrm{vec}(a_1 b_1^T), \dots, \mathrm{vec}(a_R b_R^T)] = [b_1 \otimes a_1, \dots, b_R \otimes a_R] = B \odot A$$
where $\odot$ is the Khatri-Rao product. Hence $\tilde{X} = E \cdot W$ is a Khatri-Rao structure recovery problem, which can be solved by simultaneous diagonalization [De Lathauwer, 2005].
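Both notes can be checked on a small example. The sketch below assumes real-valued data, column-major vectorization and arbitrary sizes. It recovers a pair of factors from one vectorized rank-L matrix through a truncated SVD, then verifies the identity vec(a bᵀ) = b ⊗ a behind the Khatri-Rao structure for L = 1. The recovered factors match the true ones only up to the block-wise invertible transformation, so the check compares products and column spaces.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, L = 5, 4, 2                                # illustrative sizes (assumption)
A_r = rng.standard_normal((I, L))
B_r = rng.standard_normal((J, L))

# Note 1: matricize a column of X_tilde and take its best rank-L approximation
x = (A_r @ B_r.T).reshape(-1, order='F')         # vec(X_r), column-major
X_r = x.reshape(I, J, order='F')                 # matricize back to I x J
U, s, Vh = np.linalg.svd(X_r, full_matrices=False)
A_hat = U[:, :L] * s[:L]                         # one valid choice of factor
B_hat = Vh[:L].T                                 # so that X_r ~ A_hat B_hat^T

same_product = np.allclose(A_hat @ B_hat.T, X_r)
same_subspace = np.linalg.matrix_rank(np.hstack([A_hat, A_r])) == L

# Note 2: for L = 1, vec(a b^T) equals the Kronecker product b (x) a
a = rng.standard_normal(I)
b = rng.standard_normal(J)
kr_identity = np.allclose(np.outer(a, b).reshape(-1, order='F'), np.kron(b, a))

print(same_product, same_subspace, kr_identity)
```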
Reformulation of BCD-(L,L,1) in terms of SD: overview (3)
Remark on typical matrix factorization problems in signal processing
Problem formulation: given only an (M × N) rank-R observed matrix X, find the (M × R) matrix H and the (R × N) matrix S such that X = HS.
There is an infinite number of solutions, since X = (HF)(F⁻¹S) for any invertible F, so extra constraints are needed. Examples:
ICA (Independent Component Analysis): find H that makes the R source signals in S as statistically independent as possible. Application: blind source separation.
FIR filter estimation: H holds the impulse response of an FIR filter, and S is Toeplitz. Application: blind channel estimation in telecommunications.
Source localization: H is Vandermonde and holds the individual response of the M antennas to the R source signals, each signal impinging with a Direction Of Arrival (DOA).
Non-negative matrix factorization.
Finite alphabet projection: S holds numerical symbols.
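The non-uniqueness of an unconstrained factorization X = HS is easy to reproduce. The following lines are a small sketch with arbitrary sizes and random real matrices (both assumptions): any invertible F gives a second, different pair of factors with the same product.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, R = 6, 8, 3                       # illustrative sizes (assumption)
H = rng.standard_normal((M, R))
S = rng.standard_normal((R, N))
X = H @ S

F = rng.standard_normal((R, R))         # generic, hence invertible
H2 = H @ F                              # H F
S2 = np.linalg.solve(F, S)              # F^{-1} S

print(np.allclose(X, H2 @ S2), np.allclose(H, H2))
```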
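Reformulation of BCD-(L,L,1) in terms of SD: overview (4)
$$\tilde{X} = E \cdot W, \qquad \tilde{X} = [\mathrm{vec}(X_1) \ \dots \ \mathrm{vec}(X_R)], \qquad E = [\mathrm{vec}(E_1) \ \dots \ \mathrm{vec}(E_R)]$$
where $\tilde{X}$ and $E$ are of size $JI \times R$, $W$ is of size $R \times R$, and $E_r$ is the $I \times J$ matrix obtained by matricizing column r of E. For r = 1, ..., R:
$$X_r = W_{1r} E_1 + \dots + W_{Rr} E_R$$
How to find the coefficients of the linear combinations of the $E_r$ that yield rank-L matrices?
Tool: a mapping $\phi_L$ for rank-L detection. Let $X_r \in \mathbb{C}^{I \times J}$; then
$$\phi_L(X_r, X_r, \dots, X_r) = 0 \quad \text{if and only if} \quad X_r \text{ is at most rank-}L.$$
After several algebraic manipulations, one can show that W is the solution of an SD problem:
$$Q_1 = W \cdot D_1 \cdot W^T, \quad Q_2 = W \cdot D_2 \cdot W^T, \quad \dots, \quad Q_R = W \cdot D_R \cdot W^T$$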
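To give a feel for what a simultaneous diagonalization by congruence involves, here is a deliberately simplified sketch: two exact real matrices Q1 = W D1 Wᵀ and Q2 = W D2 Wᵀ with diagonal D1, D2. In that special case Q1 Q2⁻¹ = W (D1 D2⁻¹) W⁻¹, so the columns of W are eigenvectors of Q1 Q2⁻¹, recovered up to scaling and permutation. This is a textbook shortcut under stated assumptions (noise-free data, distinct ratios of diagonal entries, a well-conditioned W chosen arbitrarily), not the algorithm used by the authors, which exploits all R matrices jointly.

```python
import numpy as np

rng = np.random.default_rng(4)
R = 4
W = np.eye(R) + 0.2 * rng.standard_normal((R, R))   # assumed well-conditioned W
d1 = np.array([1.0, 2.0, 3.0, 4.0])
d2 = np.array([2.0, 1.0, -1.0, 0.5])                 # ratios d1/d2 are distinct
Q1 = W @ np.diag(d1) @ W.T
Q2 = W @ np.diag(d2) @ W.T

# Q1 Q2^{-1} = W diag(d1/d2) W^{-1}: eigenvectors are the columns of W
eigvals, V = np.linalg.eig(Q1 @ np.linalg.inv(Q2))
V = np.real(V)

# Compare directions: normalize the columns and look at |cosine| of the angles
Wn = W / np.linalg.norm(W, axis=0)
Vn = V / np.linalg.norm(V, axis=0)
match = np.abs(Wn.T @ Vn)        # match[p, q] = |cos| between W[:, p] and V[:, q]
assignment = match.argmax(axis=0)

print(np.sort(np.real(eigvals)), assignment)
```

Each estimated column lines up with exactly one true column of W, which is the permutation and scaling indeterminacy expected from the essential uniqueness discussion.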
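Reformulation of BCD-(2,2,1) in terms of SD: technical details
Trilinear mapping $\phi_2$ for rank-2 detection:
$$\phi_2 : (X, Y, Z) \in \mathbb{C}^{I \times J} \times \mathbb{C}^{I \times J} \times \mathbb{C}^{I \times J} \;\to\; \phi_2(X, Y, Z) \in \mathbb{C}^{I \times I \times I \times J \times J \times J}$$
$$
\begin{aligned}
[\phi_2(X, Y, Z)]_{i_1 i_2 i_3 j_1 j_2 j_3} ={}&
\begin{vmatrix} x_{i_1 j_1} & x_{i_1 j_2} & x_{i_1 j_3} \\ y_{i_2 j_1} & y_{i_2 j_2} & y_{i_2 j_3} \\ z_{i_3 j_1} & z_{i_3 j_2} & z_{i_3 j_3} \end{vmatrix}
+ \begin{vmatrix} x_{i_1 j_1} & x_{i_1 j_2} & x_{i_1 j_3} \\ z_{i_2 j_1} & z_{i_2 j_2} & z_{i_2 j_3} \\ y_{i_3 j_1} & y_{i_3 j_2} & y_{i_3 j_3} \end{vmatrix}
+ \begin{vmatrix} y_{i_1 j_1} & y_{i_1 j_2} & y_{i_1 j_3} \\ x_{i_2 j_1} & x_{i_2 j_2} & x_{i_2 j_3} \\ z_{i_3 j_1} & z_{i_3 j_2} & z_{i_3 j_3} \end{vmatrix} \\
&+ \begin{vmatrix} y_{i_1 j_1} & y_{i_1 j_2} & y_{i_1 j_3} \\ z_{i_2 j_1} & z_{i_2 j_2} & z_{i_2 j_3} \\ x_{i_3 j_1} & x_{i_3 j_2} & x_{i_3 j_3} \end{vmatrix}
+ \begin{vmatrix} z_{i_1 j_1} & z_{i_1 j_2} & z_{i_1 j_3} \\ x_{i_2 j_1} & x_{i_2 j_2} & x_{i_2 j_3} \\ y_{i_3 j_1} & y_{i_3 j_2} & y_{i_3 j_3} \end{vmatrix}
+ \begin{vmatrix} z_{i_1 j_1} & z_{i_1 j_2} & z_{i_1 j_3} \\ y_{i_2 j_1} & y_{i_2 j_2} & y_{i_2 j_3} \\ x_{i_3 j_1} & x_{i_3 j_2} & x_{i_3 j_3} \end{vmatrix}
\end{aligned}
$$
Then we have $\phi_2(X, X, X) = 0$ if and only if X is at most rank-2.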
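The rank-2 detection property can be verified numerically on small matrices. The sketch below is one possible way to code the mapping as a sum of six "mixed" 3 × 3 determinants, one per ordering of (X, Y, Z) over the rows i1, i2, i3; the function names `perm_sign`, `mixed_det` and `phi2`, the real-valued test matrices and the 4 × 4 size are assumptions made for illustration.

```python
import itertools
import numpy as np

def perm_sign(p):
    """Sign of a permutation p of (0, ..., n-1), from its number of inversions."""
    inv = sum(p[a] > p[b] for a in range(len(p)) for b in range(a + 1, len(p)))
    return -1 if inv % 2 else 1

def mixed_det(U, V, T):
    """D[i1,i2,i3,j1,j2,j3] = determinant of the 3x3 matrix whose rows are
    U[i1, (j1,j2,j3)], V[i2, (j1,j2,j3)] and T[i3, (j1,j2,j3)] (Leibniz formula)."""
    prod = np.einsum('ia,jb,kc->ijkabc', U, V, T)
    out = np.zeros_like(prod)
    for p in itertools.permutations(range(3)):
        out += perm_sign(p) * prod.transpose(0, 1, 2, 3 + p[0], 3 + p[1], 3 + p[2])
    return out

def phi2(X, Y, Z):
    """Symmetric trilinear map: sum of mixed determinants over the 6 orderings."""
    return sum(mixed_det(*m) for m in itertools.permutations((X, Y, Z)))

rng = np.random.default_rng(6)
I = J = 4
X_rank2 = rng.standard_normal((I, 2)) @ rng.standard_normal((2, J))
X_full = rng.standard_normal((I, J))

norm_rank2 = np.linalg.norm(phi2(X_rank2, X_rank2, X_rank2))
norm_full = np.linalg.norm(phi2(X_full, X_full, X_full))

# phi2(X, X, X) holds 6 times each 3x3 minor of X
minor = np.linalg.det(X_full[np.ix_([0, 1, 2], [0, 1, 3])])
entry = phi2(X_full, X_full, X_full)[0, 1, 2, 0, 1, 3]
print(norm_rank2, norm_full, entry / minor)
```

Since every entry of φ2(X, X, X) is six times a 3 × 3 minor of X, the mapping vanishes exactly when all 3 × 3 minors vanish, i.e., when X has rank at most 2.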
Reformulation of BCD-(2,2,1) in terms of SD: technical details
For r = 1, ..., R:
$$E_r = (W^{-1})_{1r} X_1 + \dots + (W^{-1})_{Rr} X_R$$
Build the set of $R^3$ tensors $P_{rst} = \phi_2(E_r, E_s, E_t)$, with r = 1, ..., R, s = 1, ..., R, t = 1, ..., R.
Since $\phi_2$ is trilinear, we have:
$$P_{rst} = \sum_{u,v,w=1}^{R} (W^{-1})_{ur} (W^{-1})_{vs} (W^{-1})_{wt} \, \phi_2(X_u, X_v, X_w)$$
One can show that, if the $(C_{R+2}^{3} - R)$ tensors of the set
$$\Omega = \{\phi_2(X_u, X_v, X_w),\ 1 \le u \le v \le w \le R\} - \{\phi_2(X_u, X_u, X_u),\ 1 \le u \le R\}$$
are linearly independent, then W is a solution of
$$\mathcal{Q} = \mathcal{D} \times_1 W \times_2 W \times_3 W$$
where $\mathcal{D}$ is an arbitrary diagonal tensor and $\mathcal{Q}$ is a symmetric tensor satisfying
$$\sum_{r,s,t=1}^{R} q_{rst} P_{rst} = 0$$
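As a sanity check on the last two relations, the easy direction can be worked out in a few lines: any tensor of the form Q = D ×1 W ×2 W ×3 W with D diagonal does satisfy the linear constraint. The diagonal entries are written d_p here, a notation introduced only for this derivation; the converse direction is the one that needs the linear independence of the set Ω.

```latex
\begin{aligned}
q_{rst} &= \sum_{p=1}^{R} d_p \, W_{rp} W_{sp} W_{tp}, \\
\sum_{r,s,t} q_{rst} P_{rst}
 &= \sum_{p} d_p \sum_{u,v,w}
    \Big(\sum_{r} (W^{-1})_{ur} W_{rp}\Big)
    \Big(\sum_{s} (W^{-1})_{vs} W_{sp}\Big)
    \Big(\sum_{t} (W^{-1})_{wt} W_{tp}\Big)
    \phi_2(X_u, X_v, X_w) \\
 &= \sum_{p} d_p \sum_{u,v,w} \delta_{up}\,\delta_{vp}\,\delta_{wp}\,
    \phi_2(X_u, X_v, X_w) \\
 &= \sum_{p=1}^{R} d_p \, \phi_2(X_p, X_p, X_p) \;=\; 0 ,
\end{aligned}
```

where the last equality holds because each X_p = A_p B_pᵀ has rank at most 2.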
Reformulation of BCD-(2,2,1) in terms of SD: a new uniqueness bound
Crucial assumption in the reformulation: «The $(C_{R+2}^{3} - R)$ tensors of the set Ω are linearly independent».
One can show that this is generically true if
$$R \le \min(IJ, K) \quad \text{and} \quad C_I^{3} \cdot C_J^{3} \ge C_{R+2}^{3} - R, \qquad \text{where } C_n^k = \frac{n!}{k!\,(n-k)!}$$
The generalization to any value of L yields that the BCD-(L,L,1) is unique if
$$R \le \min(IJ, K) \quad \text{and} \quad C_I^{L+1} \cdot C_J^{L+1} \ge C_{R+L}^{L+1} - R$$
To be compared to the old uniqueness bound:
$$LR \le IJ \quad \text{and} \quad \min\!\left(\left\lfloor \tfrac{I}{L} \right\rfloor, R\right) + \min\!\left(\left\lfloor \tfrac{J}{L} \right\rfloor, R\right) + \min(K, R) \ge 2(R+1)$$
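The two bounds are easy to compare for given dimensions. The helper functions below are hypothetical names introduced for illustration, and the old bound is coded on the assumption that I/L and J/L are rounded down. For each bound, the code scans R and reports the largest number of terms for which the condition holds.

```python
from math import comb

def new_bound_holds(I, J, K, L, R):
    """New bound: R <= min(IJ, K) and C(I, L+1) * C(J, L+1) >= C(R+L, L+1) - R."""
    return R <= min(I * J, K) and comb(I, L + 1) * comb(J, L + 1) >= comb(R + L, L + 1) - R

def old_bound_holds(I, J, K, L, R):
    """Old bound: LR <= IJ and min(I//L, R) + min(J//L, R) + min(K, R) >= 2(R+1)."""
    return L * R <= I * J and min(I // L, R) + min(J // L, R) + min(K, R) >= 2 * (R + 1)

def max_rank(holds, I, J, K, L):
    """Largest R in 1..IJ satisfying the given condition (0 if none does)."""
    ok = [R for R in range(1, I * J + 1) if holds(I, J, K, L, R)]
    return max(ok) if ok else 0

I, J, K, L = 4, 4, 10, 2                          # illustrative dimensions (assumption)
print(max_rank(new_bound_holds, I, J, K, L))      # 4
print(max_rank(old_bound_holds, I, J, K, L))      # 2
```

For this example the SD-based condition admits twice as many terms as the earlier one, in line with the "relaxed bound" claim of the slides.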
Reformulation of BCD-(2,2,1) in terms of SD: uniqueness
[Figure: new bound versus old bound, for L = 2, 3, 4]
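Data model: DS-CDMA system
R users transmit at the same time.
Spatial dimension: K receiving antennas; $a_r$ is the array steering vector (response of the K antennas) for user r.
Fast time: I = number of samples within a symbol period.
Slow time: observation during J symbol periods.
$$\mathcal{Y} = \sum_{r=1}^{R} (H_r S_r^T) \circ a_r$$
where $\mathcal{Y}$ is of size $I \times J \times K$, $\circ$ denotes the outer product, and
$H_r$ ($I \times L$) holds the channel impulse response of user r (it spans L symbol periods for each user);
$S_r$ ($J \times L$) holds the symbols of user r and has a Toeplitz structure (convolution).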
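A toy version of this data model can be generated as follows. Everything specific here is an assumption made for illustration and is not taken from the slides: BPSK symbols, real-valued random channels and steering vectors, the sizes, and the hypothetical helper `toeplitz_symbols`. The sketch only shows how the Toeplitz symbol matrices give each user a rank-(L,L,1) contribution.

```python
import numpy as np

def toeplitz_symbols(s, J, L):
    """J x L Toeplitz matrix with entry (j, l) = s[j + L - 1 - l] (hypothetical helper)."""
    return np.array([[s[j + L - 1 - l] for l in range(L)] for j in range(J)])

rng = np.random.default_rng(5)
I, J, K, L, R = 4, 10, 6, 2, 3                        # illustrative sizes (assumption)

Y = np.zeros((I, J, K))
for r in range(R):
    H_r = rng.standard_normal((I, L))                 # channel of user r (assumed real)
    s_r = rng.choice([-1.0, 1.0], size=J + L - 1)     # BPSK symbol sequence (assumption)
    S_r = toeplitz_symbols(s_r, J, L)                 # column l is the sequence delayed by l
    a_r = rng.standard_normal(K)                      # steering vector (assumed real)
    Y += np.einsum('ij,k->ijk', H_r @ S_r.T, a_r)     # (H_r S_r^T) outer a_r

# With R <= min(IJ, K), the IJ x K unfolding generically has rank R
unfolding_rank = np.linalg.matrix_rank(Y.reshape(I * J, K))
print(Y.shape, unfolding_rank)
```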
Performance: comparison between ALS and SD algorithms
Conclusion
Reformulation of PARAFAC in terms of Simultaneous Diagonalization (SD) yields a fast and accurate algorithm, with improved identifiability results [De Lathauwer, 2005]. The starting point for this reformulation is that one dimension is long enough: R ≤ min(IJ, K), where I, J and K can be interchanged.
The BCD-(L,L,1), which is a generalization of PARAFAC, can also be formulated in terms of SD, which also yields a fast and accurate algorithm and an improved identifiability result. The starting point for this reformulation is that the third dimension (K) is long enough: R ≤ min(IJ, K). Here I, J and K cannot be interchanged.
When the long dimension is I or J, i.e., R ≤ min(JK, I) or R ≤ min(IK, J), we have recently shown (CAMSAP 2009) that the BCD-(L,L,1) can be reformulated as a Joint Block Diagonalization problem. This yields a new set of identifiability results.