Decomposing a Third-Order Tensor in Rank-(L,L,1) Terms by Means of Simultaneous Matrix Diagonalization

Dimitri Nion & Lieven De Lathauwer K.U. Leuven, Kortrijk campus, Belgium E-mails: [email protected] [email protected]

2009 SIAM Conference on Applied Linear Algebra, Session MS33 “Computational Methods for Tensors” Monterey, USA, October 26-29, 2009

Roadmap

I. Introduction: tensor decompositions (PARAFAC, Tucker, Block-Component Decompositions)

II. Block-Component Decomposition in rank-(L,L,1) terms: definition of the BCD-(L,L,1), uniqueness bound, ALS algorithm

III. Reformulation of the BCD-(L,L,1) in terms of simultaneous matrix diagonalization: new algorithm, relaxed uniqueness bound

IV. An application of the BCD-(L,L,1): blind source separation in telecommunications

V. Conclusion and future research


Tucker / HOSVD [Tucker, 1966] / [De Lathauwer, 2000]

[Figure: the I × J × K tensor Y written as an L × M × N core tensor H multiplied by factor matrices U (I × L), V (J × M) and W (K × N).]

Y = H ×1 U ×2 V ×3 W

PARAFAC [Harshman, 1970]: the special case where the core H is R × R × R and diagonal (hijk = 1 if i = j = k, else hijk = 0):

[Figure: Y written with factors A (I × R), B (J × R), C (K × R) and a diagonal core.]

Y = a1 ∘ b1 ∘ c1 + … + aR ∘ bR ∘ cR,

i.e., a sum of R rank-1 tensors: Y1 + … + YR.
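As a quick numerical illustration (a sketch in NumPy, not code from the slides), the PARAFAC model above can be assembled either with a single einsum contraction or as an explicit sum of R rank-1 outer products; the two constructions agree:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A = rng.standard_normal((I, R))   # factor matrices A, B, C
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Y[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]
Y = np.einsum('ir,jr,kr->ijk', A, B, C)

# Equivalently: sum of R rank-1 terms a_r o b_r o c_r
Y_sum = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))
print(np.allclose(Y, Y_sum))
```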

From PARAFAC/HOSVD to Block-Component Decompositions (BCD) [De Lathauwer and Nion, SIMAX 2008]

BCD in rank-(Lr, Lr, 1) terms:

Y = (A1 B1^T) ∘ c1 + … + (AR BR^T) ∘ cR,

with Ar of size I × Lr and Br of size J × Lr.

BCD in rank-(Lr, Mr, ·) terms:

Y = H1 ×1 A1 ×2 B1 + … + HR ×1 AR ×2 BR,

with Hr of size Lr × Mr × K, Ar of size I × Lr and Br of size J × Mr.

BCD in rank-(Lr, Mr, Nr) terms:

Y = H1 ×1 A1 ×2 B1 ×3 C1 + … + HR ×1 AR ×2 BR ×3 CR,

with Hr of size Lr × Mr × Nr and Cr of size K × Nr.


The BCD-(L,L,1) as a generalization of PARAFAC

Y = (A1 B1^T) ∘ c1 + … + (AR BR^T) ∘ cR,   with Ar (I × L) and Br (J × L).   [BCD-(L,L,1)]

 Generalization of PARAFAC [De Lathauwer, de Baynast, 2003]: BCD-(1,1,1) = PARAFAC.

 Unknown matrices: A = [A1 … AR] (I × LR), B = [B1 … BR] (J × LR), C = [c1 … cR] (K × R).

 The BCD-(L,L,1) is said to be essentially unique if the only remaining ambiguities are:
  - arbitrary permutation of the blocks in A and B and of the columns of C;
  - rotational freedom within each block (block-wise subspace estimation), plus a scaling ambiguity on the columns of C.
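The definition above can be checked numerically. The following sketch (illustrative names, not the authors' code) builds a BCD-(L,L,1) tensor as a sum of R terms (Ar Br^T) ∘ cr and verifies that every frontal slice is a linear combination of the rank-L matrices Xr = Ar Br^T:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, L, R = 6, 7, 8, 2, 3
A = [rng.standard_normal((I, L)) for _ in range(R)]   # blocks A_r (I x L)
B = [rng.standard_normal((J, L)) for _ in range(R)]   # blocks B_r (J x L)
c = [rng.standard_normal(K) for _ in range(R)]        # columns c_r of C

# Y = sum_r (A_r B_r^T) outer c_r : each term has mode ranks (L, L, 1)
Y = sum(np.einsum('ij,k->ijk', A[r] @ B[r].T, c[r]) for r in range(R))

# each frontal slice Y[:,:,k] is a linear combination of the rank-L X_r
X = [A[r] @ B[r].T for r in range(R)]
slab0 = sum(c[r][0] * X[r] for r in range(R))
print(np.allclose(Y[:, :, 0], slab0))
```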

The BCD-(L,L,1) as a constrained Tucker model

The BCD-(L,L,1) can be seen as a particular case of the Tucker model, where the core tensor is « block-diagonal », with L by L blocks on its diagonal:

Y = (A1 B1^T) ∘ c1 + … + (AR BR^T) ∘ cR = H ×1 [A1 … AR] ×2 [B1 … BR] ×3 C,

where the LR × LR × R core H has an L × L block in the r-th diagonal position of its r-th frontal slice.
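The equivalence with a constrained Tucker model can be verified directly. In this sketch (my construction, not from the slides) the r-th frontal slice of the core holds an L × L identity in its r-th diagonal block, which reproduces the sum of rank-(L,L,1) terms:

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, L, R = 6, 7, 8, 2, 3
Ab = rng.standard_normal((I, L * R))   # A = [A_1 ... A_R]
Bb = rng.standard_normal((J, L * R))   # B = [B_1 ... B_R]
C = rng.standard_normal((K, R))

# Block-diagonal core: slice H[:,:,r] has an L x L identity in block (r, r)
H = np.zeros((L * R, L * R, R))
for r in range(R):
    H[r*L:(r+1)*L, r*L:(r+1)*L, r] = np.eye(L)

# Tucker product Y = H x1 A x2 B x3 C
Y_tucker = np.einsum('pqs,ip,jq,ks->ijk', H, Ab, Bb, C)

# Direct sum of rank-(L,L,1) terms
Y_bcd = sum(np.einsum('ij,k->ijk',
                      Ab[:, r*L:(r+1)*L] @ Bb[:, r*L:(r+1)*L].T, C[:, r])
            for r in range(R))
print(np.allclose(Y_tucker, Y_bcd))
```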

BCD-(L,L,1): existing results on algorithms and uniqueness

Several algorithms commonly used to compute PARAFAC have been adapted to the BCD-(L,L,1):

 Example 1: ALS (alternate between least-squares updates of the unknowns A, B and C).
 Example 2: ALS with Enhanced Line Search to speed up convergence.
 Example 3: Gauss-Newton based algorithms (Levenberg-Marquardt).

First result on essential uniqueness, in the generic sense [De Lathauwer, 2006]:

LR ≤ IJ and min(⌊I/L⌋, R) + min(⌊J/L⌋, R) + min(K, R) ≥ 2(R + 1).    (1)
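To make the ALS idea concrete, here is a hedged sketch of one possible ALS loop for the BCD-(L,L,1) (helper names `build` and `ls` are mine; this is an illustration, not the authors' implementation). Each of the three updates is an ordinary linear least-squares problem obtained from a matrix unfolding of Y; the run is initialized near the true factors so a short loop converges:

```python
import numpy as np

rng = np.random.default_rng(8)
I, J, K, L, R = 6, 7, 8, 2, 3

def build(A, B, C):
    # assemble the BCD-(L,L,1) tensor from A = [A_1...A_R], B, C
    return sum(np.einsum('ij,k->ijk',
                         A[:, r*L:(r+1)*L] @ B[:, r*L:(r+1)*L].T, C[:, r])
               for r in range(R))

A0 = rng.standard_normal((I, L * R))
B0 = rng.standard_normal((J, L * R))
C0 = rng.standard_normal((K, R))
Y = build(A0, B0, C0)

# initialize close to the true factors so the short run below converges
A = A0 + 0.01 * rng.standard_normal(A0.shape)
B = B0 + 0.01 * rng.standard_normal(B0.shape)
C = C0 + 0.01 * rng.standard_normal(C0.shape)

def ls(Y_mat, F):
    # least-squares solve of Y_mat ~= Z @ F for Z
    return np.linalg.lstsq(F.T, Y_mat.T, rcond=None)[0].T

for _ in range(100):
    # update A: Y_(1) (I x JK) = A @ F_A with F_A[m] = vec(B[:,m] outer C[:,m//L])
    F_A = np.vstack([np.outer(B[:, m], C[:, m // L]).reshape(1, -1)
                     for m in range(L * R)])
    A = ls(Y.reshape(I, J * K), F_A)
    # update B symmetrically from the mode-2 unfolding
    F_B = np.vstack([np.outer(A[:, m], C[:, m // L]).reshape(1, -1)
                     for m in range(L * R)])
    B = ls(Y.transpose(1, 0, 2).reshape(J, I * K), F_B)
    # update C from the IJ x K unfolding: Y_unf = Xt @ C^T
    Xt = np.column_stack([(A[:, r*L:(r+1)*L] @ B[:, r*L:(r+1)*L].T).reshape(-1)
                          for r in range(R)])
    C = np.linalg.lstsq(Xt, Y.reshape(I * J, K), rcond=None)[0].T

rel_err = np.linalg.norm(build(A, B, C) - Y) / np.linalg.norm(Y)
print(rel_err)  # decreases toward 0 as the iterations proceed
```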

Starting point of this work

In 2005, De Lathauwer showed that, under certain assumptions on the dimensions, PARAFAC can be reformulated as a simultaneous diagonalization (SD) problem. This yields:

 a very fast and accurate algorithm to compute PARAFAC;
 a new, relaxed uniqueness bound.

Is it possible to generalize these results to the BCD-(L,L,1)? If so, does it also yield a fast algorithm and a uniqueness bound more relaxed than (1)?

The answer is YES.


Reformulation of the BCD-(L,L,1) in terms of SD: overview (1)

Y = Σ_{r=1}^{R} (Ar Br^T) ∘ cr = Σ_{r=1}^{R} Xr ∘ cr,  where Xr = Ar Br^T is an I × J matrix of rank L.

Assumption: R ≤ min(IJ, K), i.e., K has to be a sufficiently long dimension.

Build Y, the JI × K matrix unfolding of the tensor Y. In matrix format, the BCD-(L,L,1) reads

Y = [vec(X1) … vec(XR)] · C^T = X̃ · C^T.    (1)

SVD of Y (generically of rank R):

Y = U · Σ · V^H = E · V^H.    (2)

Comparing (1) and (2): E and X̃ span the same R-dimensional column space, so there exists W ∈ C^{R×R} such that

X̃ = E · W,  C^T = W^{−1} · V^H.

Goal: find W, i.e., find the linear combinations of the columns of E that yield vectorized rank-L matrices.
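The unfolding and SVD steps above can be sketched numerically (a row-major vec is used here so the conventions stay internally consistent; the slides use the JI × K unfolding, same idea). The check confirms that Y = X̃ C^T and that some invertible W links the SVD factor E to X̃:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K, L, R = 5, 6, 10, 2, 3          # note R <= min(I*J, K)
A = [rng.standard_normal((I, L)) for _ in range(R)]
B = [rng.standard_normal((J, L)) for _ in range(R)]
C = rng.standard_normal((K, R))
X = [A[r] @ B[r].T for r in range(R)]    # rank-L matrices X_r = A_r B_r^T
Y = sum(np.einsum('ij,k->ijk', X[r], C[:, r]) for r in range(R))

# unfolding: column k stacks the entries of the k-th frontal slice
Y_unf = Y.reshape(I * J, K)
Xt = np.column_stack([X[r].reshape(-1) for r in range(R)])
print(np.allclose(Y_unf, Xt @ C.T))      # Y = Xt @ C^T

# rank-R truncated SVD: E spans the same column space as Xt, so Xt = E @ W
U, s, Vh = np.linalg.svd(Y_unf, full_matrices=False)
E = U[:, :R] * s[:R]
W = np.linalg.lstsq(E, Xt, rcond=None)[0]          # R x R, invertible generically
print(np.allclose(E @ W, Xt))
print(np.allclose(np.linalg.inv(W) @ Vh[:R], C.T))  # C^T = W^{-1} V^H
```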

Reformulation of the BCD-(L,L,1) in terms of SD: overview (2)

Note 1: once W is found, the unknown matrices A, B, C of the BCD-(L,L,1) follow:

X̃ = E · W = [vec(A1 B1^T) … vec(AR BR^T)],  C = V* · W^{−T}.

Matricize each column of X̃ and estimate Ar and Br from its best rank-L approximation, r = 1, …, R.

Note 2: for PARAFAC (i.e., L = 1), we have

X̃ = [vec(a1 b1^T), …, vec(aR bR^T)] = [b1 ⊗ a1, …, bR ⊗ aR] = B ⊙ A,

where ⊙ is the Khatri-Rao product. X̃ = E · W is then a Khatri-Rao structure recovery problem, which can be solved by simultaneous diagonalization [De Lathauwer, 2005].
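The rank-L recovery step in Note 1 is a plain truncated SVD. This sketch (illustrative, not from the slides) matricizes one column vec(Ar Br^T) and recovers the factors; as the slides note, Ar and Br are only identified up to an invertible L × L factor, but their product is exact:

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, L = 5, 6, 2
Ar = rng.standard_normal((I, L))
Br = rng.standard_normal((J, L))
x = (Ar @ Br.T).reshape(-1)              # one column of Xt: vec(A_r B_r^T)

# matricize and take the best rank-L approximation (truncated SVD)
Xr = x.reshape(I, J)
U, s, Vh = np.linalg.svd(Xr)
Ar_hat = U[:, :L] * s[:L]                # A_r up to an invertible L x L factor
Br_hat = Vh[:L].T

print(np.allclose(Ar_hat @ Br_hat.T, Ar @ Br.T))   # the product is exact
```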

Reformulation of the BCD-(L,L,1) in terms of SD: overview (3)

Remark on typical matrix factorization problems in signal processing. Problem formulation: given only an (M × N) rank-R observed matrix X, find the (M × R) and (R × N) matrices H and S such that X = HS.

There is an infinite number of solutions, X = (HF)(F^{−1}S), so extra constraints are needed. Examples:

 ICA (Independent Component Analysis): find H that makes the R source signals in S as statistically independent as possible. Blind source separation.
 FIR filter estimation: H holds the impulse response of an FIR filter, and S is Toeplitz. Blind channel estimation in telecommunications.
 Source localization: H is Vandermonde and holds the individual responses of the M antennas to the R source signals, each impinging with its own Direction Of Arrival (DOA).
 Non-negative matrix factorization.
 Finite-alphabet projection: S holds numerical symbols.

Reformulation of the BCD-(L,L,1) in terms of SD: overview (4)

From X̃ = E · W, each column gives, for r = 1, …, R,

Xr = w1r E1 + … + wRr ER,

where Ek is the I × J matricization of the k-th column of E. How can we find the coefficients of the linear combinations of the Ek that yield rank-L matrices?

Tool: a mapping φL for rank-L detection. Let Xr ∈ C^{I×J}; then

φL(Xr, Xr, …, Xr) = 0  iff  Xr is at most rank L.

After several algebraic manipulations, one can show that W is the solution of an SD problem:

Q1 = W · D1 · W^T
Q2 = W · D2 · W^T
⋮
QR = W · DR · W^T
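To see why the structure Qr = W · Dr · W^T pins down W, here is a minimal sketch (my illustration, not the algorithm of the slides): with just two such matrices, W can be recovered, up to the inherent column scaling and permutation, from the eigenvectors of Q1 Q2^{−1} = W (D1 D2^{−1}) W^{−1}:

```python
import numpy as np

rng = np.random.default_rng(7)
R = 4
W = rng.standard_normal((R, R))
d1 = rng.uniform(0.5, 2.0, R)            # well-conditioned diagonals
d2 = rng.uniform(0.5, 2.0, R)
Q1 = W @ np.diag(d1) @ W.T
Q2 = W @ np.diag(d2) @ W.T

# Q1 @ inv(Q2) = W diag(d1/d2) W^{-1}: its eigenvectors are the columns of W,
# up to column scaling and permutation (the unavoidable ambiguities)
vals, vecs = np.linalg.eig(Q1 @ np.linalg.inv(Q2))

# check: every column of W matches some recovered eigenvector direction
W_n = W / np.linalg.norm(W, axis=0)
V_n = vecs / np.linalg.norm(vecs, axis=0)
corr = np.abs(W_n.T @ V_n)
print(np.allclose(corr.max(axis=1), 1.0))
```

With more than two matrices Qr, the same diagonalization is solved jointly for robustness, which is the SD problem referred to above.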

Reformulation of the BCD-(2,2,1) in terms of SD: technical details

Trilinear mapping φ2 for rank-2 detection:

φ2 : (X, Y, Z) ∈ (C^{I×J}, C^{I×J}, C^{I×J}) → φ2(X, Y, Z) ∈ C^{I×I×I×J×J×J},

[φ2(X, Y, Z)]_{i1 i2 i3 j1 j2 j3} = Σ over the six orderings (P, Q, R) of (X, Y, Z) of

det | p_{i1 j1}  p_{i1 j2}  p_{i1 j3} |
    | q_{i2 j1}  q_{i2 j2}  q_{i2 j3} |
    | r_{i3 j1}  r_{i3 j2}  r_{i3 j3} |.

Then φ2(X, X, X) = 0 iff X is at most rank 2: with X = Y = Z every entry reduces to a 3 × 3 minor of X, and all 3 × 3 minors vanish exactly when rank(X) ≤ 2.
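A direct (unoptimized) implementation of the six-determinant formula above; the helper name `phi2` is mine. On a tiny example it confirms that φ2(X, X, X) vanishes for a rank-2 matrix and not for a generic full-rank one:

```python
import numpy as np
from itertools import permutations

def phi2(X, Y, Z):
    # Trilinear rank-2 detection map: each entry is the sum, over the six
    # orderings of (X, Y, Z), of the 3x3 determinant with rows taken at
    # indices i1, i2, i3 and columns at j1, j2, j3.
    I, J = X.shape
    out = np.zeros((I, I, I, J, J, J))
    for i1 in range(I):
        for i2 in range(I):
            for i3 in range(I):
                for j1 in range(J):
                    for j2 in range(J):
                        for j3 in range(J):
                            cols = [j1, j2, j3]
                            s = 0.0
                            for P, Q, R in permutations((X, Y, Z)):
                                M = np.array([P[i1, cols], Q[i2, cols], R[i3, cols]])
                                s += np.linalg.det(M)
                            out[i1, i2, i3, j1, j2, j3] = s
    return out

rng = np.random.default_rng(5)
I, J = 3, 4   # keep tiny: the loops cost O(I^3 J^3)
X2 = rng.standard_normal((I, 2)) @ rng.standard_normal((2, J))   # rank 2
X3 = rng.standard_normal((I, J))                                 # rank 3 generically
print(np.allclose(phi2(X2, X2, X2), 0))   # rank <= 2: all 3x3 minors vanish
print(np.allclose(phi2(X3, X3, X3), 0))   # full rank: some minor is nonzero
```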

Reformulation of the BCD-(2,2,1) in terms of SD: technical details (continued)

Inverting X̃ = E · W gives, for r = 1, …, R,

Er = (W^{−1})_{1r} X1 + … + (W^{−1})_{Rr} XR.

Build the set of R^3 tensors Prst = φ2(Er, Es, Et), r, s, t = 1, …, R. Since φ2 is trilinear, we have

Prst = Σ_{u,v,w=1}^{R} (W^{−1})_{ur} (W^{−1})_{vs} (W^{−1})_{wt} φ2(Xu, Xv, Xw).

One can show that, if the (C^3_{R+2} − R) tensors of the set

Ω = {φ2(Xu, Xv, Xw), 1 ≤ u ≤ v ≤ w ≤ R} − {φ2(Xu, Xu, Xu), 1 ≤ u ≤ R}

are linearly independent, then W is a solution of

Q = D ×1 W ×2 W ×3 W,

where D is an arbitrary diagonal R × R × R tensor and Q is a symmetric tensor satisfying Σ_{r,s,t} q_{rst} Prst = 0.

Reformulation of the BCD-(2,2,1) in terms of SD: a new uniqueness bound

 Crucial assumption in the reformulation: « the (C^3_{R+2} − R) tensors of the set Ω are linearly independent ».

 One can show that this is generically true if

R ≤ min(IJ, K) and C^3_I · C^3_J ≥ C^3_{R+2} − R,  where C^k_n = n! / (k! (n − k)!).

 The generalization to any value of L yields that the BCD-(L,L,1) is generically unique if

R ≤ min(IJ, K) and C^{L+1}_I · C^{L+1}_J ≥ C^{L+1}_{R+L} − R.

 To be compared to the old uniqueness bound (1):

LR ≤ IJ and min(⌊I/L⌋, R) + min(⌊J/L⌋, R) + min(K, R) ≥ 2(R + 1).

Reformulation of the BCD-(2,2,1) in terms of SD: uniqueness

[Figure: maximum R admitted by the new bound vs. the old bound, for L = {2, 3, 4}.]


Data model: DS-CDMA system

R users transmit at the same time; the received data form an I × J × K tensor Y with:

 spatial dimension: K receiving antennas;
 slow time: observation during J symbol periods;
 fast time: I = number of samples within a symbol period.

Y = Σ_{r=1}^{R} (Hr Sr^T) ∘ ar,

where ar is the array steering vector of user r (response of the K antennas), Hr (I × L) is the channel impulse response of user r (it spans L symbol periods for each user), and Sr (J × L) holds the symbols of user r with a Toeplitz structure (convolution).
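The data model above fits the BCD-(L,L,1) directly. This sketch (hypothetical parameters and BPSK symbols, not the experiment of the slides) synthesizes such a tensor, building each user's Toeplitz symbol matrix from a symbol stream:

```python
import numpy as np

rng = np.random.default_rng(6)
I, J, K, L, R = 8, 20, 4, 3, 2   # samples/symbol, symbols, antennas, channel length, users

Y = np.zeros((I, J, K))
for r in range(R):
    a_r = rng.standard_normal(K)                     # array steering vector (K antennas)
    H_r = rng.standard_normal((I, L))                # channel response, spans L symbols
    sym = rng.choice([-1.0, 1.0], size=J + L - 1)    # hypothetical BPSK symbol stream
    # Toeplitz symbol matrix: row j holds the L symbols overlapping period j
    S_r = np.array([sym[j:j + L][::-1] for j in range(J)])
    Y += np.einsum('ij,k->ijk', H_r @ S_r.T, a_r)    # one rank-(L,L,1) term per user

print(Y.shape)
```

Decomposing Y with the BCD-(L,L,1) then separates the users blindly, up to the essential ambiguities discussed earlier.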

Performance: comparison between ALS and SD algorithms

[Figure: performance curves comparing the ALS and SD algorithms.]

Conclusion

 Reformulating PARAFAC in terms of simultaneous diagonalization (SD) yields a fast and accurate algorithm, with improved identifiability results [De Lathauwer, 2005]. The starting point for this reformulation is that one dimension is long enough: R ≤ min(IJ, K), where I, J and K can be interchanged.

 The BCD-(L,L,1), which is a generalization of PARAFAC, can also be reformulated in terms of SD, which again yields a fast and accurate algorithm and improved identifiability results. The starting point is that the third dimension K is long enough: R ≤ min(IJ, K); here I, J and K cannot be interchanged.

 When the long dimension is I or J, i.e., R ≤ min(JK, I) or R ≤ min(IK, J), we have recently shown (CAMSAP 2009) that the BCD-(L,L,1) can be reformulated as a joint block-diagonalization problem. This yields a new set of identifiability results.