Efficient approximate conditional density in a Gaussian mixture

Pierre Sendorek

January 20, 2016

Abstract

We present an efficient method to approximate the conditional probability density when the joint density is a Gaussian mixture. The approximation is arbitrarily close to the exact conditional density, and the tolerance on the maximal difference is chosen by the user.

1 Introduction

When the joint probability density $f(x_P, x_Q)$ is a Gaussian mixture (GM), then given a query vector $x_Q$, the conditional density of the predicted vector, $x_P \mapsto f(x_P \mid x_Q)$, is also a GM, proportional to the joint density at this fixed value of $x_Q$. However, for this fixed value of $x_Q$, some Gaussian components contribute very little to the density $x_P \mapsto f(x_P \mid x_Q)$. We take advantage of this fact and propose an algorithm which yields, for each $x_Q$, the expression of an arbitrarily good approximation of $x_P \mapsto f(x_P \mid x_Q)$ in which the components with a negligible contribution are ignored. This algorithm speeds up computations and is especially well suited when the joint density:

• is a GM with a large number of components;
• is formed by groups of Gaussian components, such that the components within a group are not far from each other (with respect to their covariances);
• and such that the groups themselves are far from each other (with respect to their covariances).

2 Notations

We write
$$f(x) = \sum_{i=1}^{n_g} \nu_i\, \mathcal{N}(x; \mu_i, C_i)$$
where $x \in \mathbb{R}^D$. Let $E \subset [1, D] \cap \mathbb{Z}$ and let $P$ and $Q$ be two matrices such that $x \mapsto Px = (x_i)_{i \in E}$ and $x \mapsto Qx = (x_i)_{i \in E^C}$. These matrices extract the components of the vector $x$ in a complementary manner. Hereafter, we will use the fact that $I = P^T P + Q^T Q$, and we write $x_P = Px$ and $x_Q = Qx$. Also, we write $[|1, N|] = \{1, \ldots, N\}$ for short.
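As a concrete illustration (a minimal sketch, not part of the paper; indices are 0-based here while the paper uses 1-based), $P$ and $Q$ can be realized as row selections of the identity matrix, and the identity $I = P^T P + Q^T Q$ follows because each coordinate is extracted exactly once:

```python
import numpy as np

D = 5
E = [1, 3]                      # indices extracted by P
E_c = [d for d in range(D) if d not in E]

I = np.eye(D)
P = I[E, :]                     # P x = (x_i) for i in E
Q = I[E_c, :]                   # Q x = (x_i) for i in E^C

x = np.arange(D, dtype=float)   # example vector (0, 1, 2, 3, 4)
x_P, x_Q = P @ x, Q @ x

# Complementary extraction: P^T P + Q^T Q reassembles the identity,
# so x can be recovered from (x_P, x_Q).
assert np.allclose(P.T @ P + Q.T @ Q, np.eye(D))
assert np.allclose(P.T @ x_P + Q.T @ x_Q, x)
```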

3 Principle

3.1 Overview

The final algorithm yields the indexes of the hyperplane-non-negligible components. Roughly speaking, the hyperplane-non-negligible components are the components of the density $x_P \mapsto f(x_P \mid x_Q)$ which have a large amplitude. This set of components depends on $x_Q$. In this paper, we distinguish the hyperplane-non-negligible components from the box-non-negligible components, which are the components of $f(x) = f(x_P, x_Q)$ that have a large amplitude on a subset of the space we call a "box". The hyperplane-non-negligible components are defined as the components which are box-non-negligible for at least one of the boxes which intersect the hyperplane $\{(x_P, x_Q),\ x_P \in \mathbb{R}^{|E|}\}$. Our algorithm therefore first determines the box-non-negligible components in order to determine the hyperplane-non-negligible ones. To determine the box-non-negligible components, for each box we derive a lower bound and an upper bound of the value taken by each Gaussian component on the box. Thus, for a box $B$ and for $x \in B$, we have $L_B(i) \le \nu_i \mathcal{N}(x; \mu_i, C_i) \le U_B(i)$. Then, the sets of indexes of the box-negligible components are built according to the following requirement:

$$K \sum_{i \in \text{box-negligible}(B)} U_B(i) \ \le \sum_{i \notin \text{box-negligible}(B)} L_B(i) \qquad (1)$$

where $K \gg 0$ is a constant chosen by the user. The larger $K$ is, the fewer components are neglected. Finally, when a query point $x_Q$ is provided, the set of hyperplane-non-negligible components is computed as the union of the box-non-negligible components over the boxes which have a non-empty intersection with the hyperplane $\{(x_P, x_Q),\ x_P \in \mathbb{R}^{|E|}\}$:

$$\text{hyperplane-negligible}(x_Q)^C = \bigcup_{B\,:\,\exists x_P \in \mathbb{R}^{|E|},\ (x_P, x_Q) \in B} \text{box-negligible}(B)^C$$

We now give the details of this algorithm.
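This union can be sketched as follows, assuming the gridding of Section 3.2 (boxes identified by multi-indices over per-dimension breakpoints) and per-box non-negligible sets already computed; all names are illustrative:

```python
from bisect import bisect_left

def hyperplane_non_negligible(x_Q, q_dims, breakpoints, box_non_negligible):
    """Union of the box-non-negligible sets over the boxes meeting the
    hyperplane {(x_P, x_Q) : x_P free}: a grid box meets it iff, on every
    query dimension d, its interval contains x_Q(d).

    breakpoints[d]: sorted breakpoints (y_i(d))_i of dimension d;
    box_non_negligible: dict mapping a box multi-index to its set of
    non-negligible component indexes."""
    # locate, per query dimension, the unique interval containing x_Q(d)
    q_index = tuple(bisect_left(breakpoints[d], xq)
                    for d, xq in zip(q_dims, x_Q))
    out = set()
    for box_idx, non_negligible in box_non_negligible.items():
        if tuple(box_idx[d] for d in q_dims) == q_index:
            out |= non_negligible
    return out
```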

3.2 Gridding the space

For each dimension $d$, let $(y_i(d))_i$ be a sorted vector, i.e. a vector such that $y_i(d) < y_{i+1}(d)$. Let $I_i(d) = ]y_i(d), y_{i+1}(d)]$ be the $i$-th interval on dimension $d$, with $y_{-1}(d) = -\infty$ and $y_{n_d+1}(d) = +\infty$, where $n_d$ is a function of the coordinate number $d$. We now grid the space $\mathbb{R}^D$ with Cartesian products of these intervals, so as to obtain a partition of $\mathbb{R}^D$ into boxes of the form $I_{i_1} \times \ldots \times I_{i_D}$. Thus each point $x \in \mathbb{R}^D$ is contained in exactly one of these boxes.
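With half-open intervals as above, locating the box containing a point reduces to one binary search per dimension. A minimal sketch (illustrative names; the returned index $i$ labels the interval $]y_{i-1}(d), y_i(d)]$, a shifted but equivalent labeling of the paper's intervals):

```python
from bisect import bisect_left

def box_index(x, breakpoints):
    """Multi-index of the unique grid box containing x.

    breakpoints[d] is the sorted vector (y_i(d))_i for dimension d; the
    implicit outer intervals ]-inf, y_0(d)] and ]y_{n_d}(d), +inf[ get
    indices 0 and len(breakpoints[d]). With half-open intervals
    ]y_{i-1}, y_i], bisect_left returns the index i such that
    y_{i-1}(d) < x(d) <= y_i(d)."""
    return tuple(bisect_left(y, x_d) for x_d, y in zip(x, breakpoints))
```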

3.3 Bounding the contribution of each Gaussian component on a box

For each box $B = I_{i_1} \times \ldots \times I_{i_D}$, we can compute an upper bound and a lower bound of each Gaussian component.

• Finding the upper bound of $\nu_i \mathcal{N}(x; \mu_i, C_i)$ requires finding the minimum of $(x - \mu_i)^T C_i^{-1} (x - \mu_i)$ under the constraint $x \in B$. This can be achieved with a convex optimization library.
• The lower bound of $\nu_i \mathcal{N}(x; \mu_i, C_i)$ on $B = \{x \in \mathbb{R}^D : \forall d \in [|1, D|],\ z_L(d) < x(d) \le z_U(d)\}$ is reached at a point of the form $z = (z_{\gamma(d)}(d))_{d \in [|1,D|]}$ where, for each $d$, $\gamma(d) \in \{L, U\}$, i.e. at a corner of the box. It can be found by an exhaustive search over the corners.¹ However, when the dimensionality $D$ is high, we suggest using a looser lower bound, as proposed hereafter.

3.3.1 Looser lower bound



We write $m = \frac{z_U + z_L}{2}$ and $r = \frac{\|z_U - z_L\|}{2}$. Since
$$B = \left\{x \in \mathbb{R}^D : \forall d \in [|1, D|],\ z_L(d) < x(d) \le z_U(d)\right\} \subset \bar{B} = \left\{x \in \mathbb{R}^D : \|x - m\| \le r\right\},$$
then for all $x$ in $B$
$$-(x - \mu)^T C^{-1} (x - \mu) \ \ge\ -\lambda_1 \|x - \mu\|^2 \ \ge\ -\lambda_1 \left\| m - r \frac{\mu - m}{\|\mu - m\|} - \mu \right\|^2$$
where $\left\| m - r \frac{\mu - m}{\|\mu - m\|} - \mu \right\| = \|\mu - m\| + r$ is the distance between $\mu$ and the farthest point of $\bar{B}$, and $\lambda_1$ is the greatest eigenvalue of $C^{-1}$, or equivalently the inverse of the lowest eigenvalue of $C$.

3.3.2 Looser upper bound

If $\mu \in B$ then the maximum of $-(x - \mu)^T C^{-1} (x - \mu)$ is reached for $x = \mu$. If $\mu \notin \bar{B}$ then
$$-(x - \mu)^T C^{-1} (x - \mu) \ \le\ -\lambda_D \|x - \mu\|^2 \ \le\ -\lambda_D \left\| m + r \frac{\mu - m}{\|\mu - m\|} - \mu \right\|^2$$
where $\left\| m + r \frac{\mu - m}{\|\mu - m\|} - \mu \right\| = \|\mu - m\| - r$ is the distance between $\mu$ and the closest point of $\bar{B}$, and $\lambda_D$ is the smallest eigenvalue of $C^{-1}$.

3.4 Finding the set of box-non-negligible components

Let $(L(i))_i$ be the sorted vector of $\{L_B(i),\ i \in [|1, n_g|]\}$ and $(U(i))_i$ be the sorted vector of $\{U_B(i),\ i \in [|1, n_g|]\}$.

¹ It may be useful to keep the results of this computation in memory, since each point $z = (z_{\gamma(d)}(d))_{d \in [|1,D|]}$ is common to several boxes.

Now, for each $\beta \in [|1, n_g|]$ we can find a value $\alpha(\beta)$ (the greatest possible) such that
$$K \sum_{i \le \alpha(\beta)} U(i) \ \le \sum_{i \ge \beta} L(i) \qquad (2)$$

The proposed algorithm starts at $\beta = n_g$ and decreases its value as long as inequality (2) is not met. Once this inequality is met, the box-negligible components are those associated with the values $U(i)$ for $i \le \alpha(\beta)$. The set of box-non-negligible components is defined as the complement of this set. It is straightforward to check that inequality (1) is met with this choice of the box-negligible components.
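A possible implementation of this selection is sketched below (illustrative names; it assumes that, for each candidate $\beta$, the greatest $\alpha(\beta)$ satisfying (2) is read off a cumulative sum of the sorted upper bounds, and returns at the first $\beta$ for which some components can be neglected):

```python
import numpy as np

def box_negligible(L_B, U_B, K):
    """Indexes of box-negligible components on a box: the smallest upper
    bounds U(i), i <= alpha(beta), whose sum multiplied by K stays below
    the sum of the largest lower bounds L(i), i >= beta.
    L_B, U_B: per-component lower/upper bounds on the box (same order)."""
    order_U = np.argsort(U_B)            # U(i): upper bounds, ascending
    U = np.asarray(U_B)[order_U]
    L = np.sort(L_B)                     # L(i): lower bounds, ascending
    cum = np.cumsum(U)                   # cum[a] = sum_{i <= a} U(i)
    n_g = len(U)
    for beta in range(n_g - 1, -1, -1):  # start at beta = n_g, decrease
        kept = L[beta:].sum()            # sum_{i >= beta} L(i)
        # greatest alpha such that K * sum_{i <= alpha} U(i) <= kept
        alpha = np.searchsorted(cum, kept / K, side="right") - 1
        if alpha >= 0:                   # inequality (2) is met
            return set(order_U[: alpha + 1].tolist())
    return set()                         # nothing can be neglected
```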

3.5 Usage

Although the set of box-non-negligible components is defined by the algorithm described in the previous section, it would be costly to compute this set for each possible box, especially in a high-dimensional setting. Indeed, most of the boxes might never contain any random sample once the algorithm is running, since the probability of drawing a sample in them may be very small. Thus we recommend performing this computation only when needed (i.e. when a sample falls in a box for which the set of box-non-negligible indexes is not yet known), and/or precomputing it for the regions of high probability (which can be found approximately by random sampling according to the GM).
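The on-demand strategy can be sketched as a memoizing lookup (illustrative names; `compute_non_negligible` stands for the procedure of Section 3.4 applied to a given box):

```python
from bisect import bisect_left

class LazyBoxCache:
    """Compute the set of box-non-negligible component indexes lazily:
    only when a query lands in a box not visited before."""

    def __init__(self, breakpoints, compute_non_negligible):
        self.breakpoints = breakpoints          # per-dimension sorted grid
        self.compute = compute_non_negligible   # box multi-index -> set
        self.cache = {}

    def non_negligible(self, x):
        # multi-index of the box containing x (one binary search per dim)
        idx = tuple(bisect_left(y, xd) for xd, y in zip(x, self.breakpoints))
        if idx not in self.cache:               # first visit: compute, store
            self.cache[idx] = self.compute(idx)
        return self.cache[idx]
```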
