Convergence of a Reactive Planning Algorithm

Vincent DECUGIS & Bernard BEAUZAMY

Groupe d'Etudes Sous-Marine de l'Atlantique (GESMA), BP, 29240 BREST NAVAL
Société de Calcul Mathématique, 111, Faubourg Saint-Honoré, 75008 Paris

October 29, 1997

1 Introduction

One of the main research themes in Artificial Intelligence is the construction of planning algorithms. These algorithms aim to build a series of actions on a system so that it reaches, in finite time, a given configuration, called the goal. This problem is very general, and its solutions may differ greatly depending on how well the consequences of the actions are known, and on whether the system is deterministic or not.

Unfortunately, for the great majority of the systems to be controlled, little is known about the consequences of the actions. Quite often, the original plan becomes inapplicable because of an unforeseen event, or because the knowledge of the system was too incomplete or approximate. Replanning is therefore a central problem, widely discussed in the AI literature. One solution has been proposed by P. Maes and can be called "reactive planning". Reactive planning makes it possible to correct the plan of actions instantly whenever an unexpected event occurs or one of the planned actions fails. We propose here a new version of this algorithm, better suited to the decision-making problems of autonomous robots.

The mathematical expression of this algorithm shows that the choice of the action used to control the system is conditioned by the convergence of a discrete-time dynamical system. The algorithm first used by P. Maes was not expressed in mathematical terms, and the question of the convergence of the underlying dynamical system had therefore been completely overlooked. The expression of the potential limit was unknown, even though it would have been of great interest in order to verify the proposed analysis of empirical data. In a first version of our reactive planning algorithm, we observed empirically that convergence effectively occurred in every configuration we faced, relying on Maes' results. In this article, we demonstrate this convergence and give sufficient conditions for its occurrence.

First, we briefly explain the principles of reactive planning algorithms. Then we establish a practical form of the underlying dynamical system. We next state a condition and show that it is sufficient for the convergence of the dynamical system. Finally, we give the expression of the limit.

2 Description of the reactive planning algorithm

The reactive planning algorithm is implemented in the control architecture of a mobile autonomous robot. The control architecture we are considering takes its information on the environment of the robot from sensors, whose values over time can be described by a point in the sensor space $S$. The influence that the control architecture can have on the environment is modeled by a point, whose value can change over time, in the effector space $E$ of the robot, i.e. the set of all motor commands available, taking into account the mechanical architecture of the robot. The control architecture for reactive planning is organized as a network of several components, loosely inspired by Maes' architecture ([4], [5]):

- perceptions, which are boolean functions of the sensor values, $v_P : S \to \{0, 1\}$, where $S$ is the space of all possible combinations of the values of the sensors. A scalar quantity called the "activation level" $\alpha_P \in \mathbb{R}$, varying with time, is attached to each perception. This activation level reflects the importance of the perception for the choice of the current action.

- reflex behaviors, which also have an activation level $\alpha_R$ and a sensori-motor coupling function $b_R : S \to E$ which maps the sensor space $S$ onto the effector space $E$, i.e. the set of all possible motor commands or actions that the control architecture can use.

- goals, which have a strength $\lambda_G$ and a set of perceptions and reflex behaviors they try to make active, $G_P$ and $G_B$.

The components of the network interact through oriented links of two types: perception towards reflex, or reflex towards perception. The links enable the activation level of each component to diffuse to the neighboring components. Each link has a strength that reflects its permeability to activation diffusion. The activation can diffuse in the main direction of the link, following the equation:

$$\alpha_{ij} = \alpha_i\, f_{ij}\, \gamma_{\mathrm{direct}}$$

where $i$ is the origin of the link (perception or reflex), $j$ its target (perception or reflex), and $\gamma_{\mathrm{direct}}$ a global coefficient of the network regulating the direct diffusion of activation. It can also diffuse against the main direction of the link (retropropagation):

$$\alpha'_{ij} = \alpha_j\, f_{ij}\, \gamma_{\mathrm{inverse}}$$

with $\gamma_{\mathrm{inverse}}$ regulating the global diffusion in this direction.

Some activation is introduced in the network of components by two sources:

- goals, which raise the activation level of their target behavior and perception nodes of $G$ at each time step. This enables the network to influence the choice of action, in order to reach the specified goals.

- perceptions, which raise their own level of activation by 1 at each time step where their truth value is true. This enables the network to take the actual situation of the robot into account, in order to choose appropriate actions.

Since there is no dissipation of activation, the total activation in the network would clearly diverge as time elapses. In order to prevent this divergence, we normalize the activations of the perceptions and reflexes at each time step. The combination of the activation diffusion process, of the introduction of activation and of the normalization leads to the equations of the discrete-time dynamical system:

$$\begin{cases}
\alpha^{t+1}_{P_j} = \alpha^{t}_{P_j} + \gamma_{\mathrm{direct}} \displaystyle\sum_{R_i \in \mathcal{R}} f_{R_i P_j}\, \alpha^{t}_{R_i} + \gamma_{\mathrm{inverse}} \displaystyle\sum_{R_k \in \mathcal{R}} f_{P_j R_k}\, \alpha^{t}_{R_k} + \displaystyle\sum_{G \in \mathcal{G}_{P_j}} \lambda_G + v_{P_j}(s_t) \\[3mm]
\alpha^{t+1}_{R_j} = \alpha^{t}_{R_j} + \gamma_{\mathrm{direct}} \displaystyle\sum_{P_i \in \mathcal{P}} f_{P_i R_j}\, \alpha^{t}_{P_i} + \gamma_{\mathrm{inverse}} \displaystyle\sum_{P_k \in \mathcal{P}} f_{R_j P_k}\, \alpha^{t}_{P_k} + \displaystyle\sum_{G \in \mathcal{G}_{R_j}} \lambda_G
\end{cases}$$

where $\mathcal{P}$ is the set of all perceptions, $\mathcal{R}$ the set of all reflexes, $\mathcal{G}_{P_j}$ the set of goals pointing to perception $P_j$, $\mathcal{G}_{R_j}$ the set of goals pointing to reflex $R_j$, and $s_t$ is the value of the sensors at time $t$ (the normalization is applied to the right-hand sides at each step).

The aim of the different processes regulating the activation levels in the network is to reach a global equilibrium after some time of evolution. This equilibrium, provided that we choose the strengths of the links correctly, will distribute the activation level between the reflexes so that the most active one maximizes the probability of achieving one of the goals. This property has been tested empirically, with computer simulations, under several forms ([4], [5], [6], [7], [3], [2]). The results are positive enough to consider the algorithm empirically efficient. Our aim here is to prove the convergence of this dynamical system and to express the limit it reaches. This will enable us to prove the validity of the algorithm while showing clearly its limits.
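To make this update concrete, here is a minimal Python sketch of one time step of such a network. It is a toy illustration under our own naming conventions (f_rp, f_pr, gamma_direct, etc.); the array shapes and the final normalization by the Euclidean norm are assumptions consistent with the equations above, not code from the original implementation.

```python
import numpy as np

def update_step(alpha_p, alpha_r, f_rp, f_pr, gamma_direct, gamma_inverse,
                goal_p, goal_r, v_p):
    """One activation-spreading step of the network, followed by normalization.

    alpha_p : activation levels of the n_P perceptions
    alpha_r : activation levels of the n_R reflex behaviors
    f_rp    : (n_R, n_P) strengths of the links from reflexes to perceptions
    f_pr    : (n_P, n_R) strengths of the links from perceptions to reflexes
    goal_p, goal_r : total goal strength injected into each node
    v_p     : current truth values of the perceptions (0 or 1)
    """
    new_p = (alpha_p
             + gamma_direct * (f_rp.T @ alpha_r)   # direct diffusion R -> P
             + gamma_inverse * (f_pr @ alpha_r)    # retropropagation along P -> R links
             + goal_p + v_p)
    new_r = (alpha_r
             + gamma_direct * (f_pr.T @ alpha_p)   # direct diffusion P -> R
             + gamma_inverse * (f_rp @ alpha_p)    # retropropagation along R -> P links
             + goal_r)
    full = np.concatenate([new_p, new_r])
    full /= np.linalg.norm(full)                   # normalization step
    return full[:alpha_p.size], full[alpha_p.size:]
```

Iterating this step is exactly the discrete-time dynamical system whose convergence is studied in the following sections.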


3 Transformation into simpler terms

We express the preceding equations by writing the activation levels as a single vector $A$. We rewrite the input of activation as a vector $E$ and define a new matrix $M$:

$$A = \begin{pmatrix} \alpha_{P_1} \\ \vdots \\ \alpha_{P_i} \\ \vdots \\ \alpha_{P_{n_P}} \\ \alpha_{R_1} \\ \vdots \\ \alpha_{R_j} \\ \vdots \\ \alpha_{R_{n_R}} \end{pmatrix}
\qquad
E = \begin{pmatrix} \lambda_{G_{P_1}} + v_{P_1}(s) \\ \vdots \\ \lambda_{G_{P_i}} + v_{P_i}(s) \\ \vdots \\ \lambda_{G_{P_{n_P}}} + v_{P_{n_P}}(s) \\ \lambda_{G_{R_1}} \\ \vdots \\ \lambda_{G_{R_j}} \\ \vdots \\ \lambda_{G_{R_{n_R}}} \end{pmatrix}
\qquad
M = \begin{pmatrix} I_{n_P} & M_1 \\ M_2 & I_{n_R} \end{pmatrix}$$

where $\lambda_{G_{P_i}} = \sum_{G \in \mathcal{G}_{P_i}} \lambda_G$ and $\lambda_{G_{R_j}} = \sum_{G \in \mathcal{G}_{R_j}} \lambda_G$ denote the total goal strength received by each node, and with

$$M_1 = \begin{pmatrix}
\gamma_{\mathrm{direct}} f_{R_1 P_1} + \gamma_{\mathrm{inverse}} f_{P_1 R_1} & \cdots & \gamma_{\mathrm{direct}} f_{R_{n_R} P_1} + \gamma_{\mathrm{inverse}} f_{P_1 R_{n_R}} \\
\vdots & \ddots & \vdots \\
\gamma_{\mathrm{direct}} f_{R_1 P_{n_P}} + \gamma_{\mathrm{inverse}} f_{P_{n_P} R_1} & \cdots & \gamma_{\mathrm{direct}} f_{R_{n_R} P_{n_P}} + \gamma_{\mathrm{inverse}} f_{P_{n_P} R_{n_R}}
\end{pmatrix}$$

$$M_2 = \begin{pmatrix}
\gamma_{\mathrm{inverse}} f_{R_1 P_1} + \gamma_{\mathrm{direct}} f_{P_1 R_1} & \cdots & \gamma_{\mathrm{inverse}} f_{R_1 P_{n_P}} + \gamma_{\mathrm{direct}} f_{P_{n_P} R_1} \\
\vdots & \ddots & \vdots \\
\gamma_{\mathrm{inverse}} f_{R_{n_R} P_1} + \gamma_{\mathrm{direct}} f_{P_1 R_{n_R}} & \cdots & \gamma_{\mathrm{inverse}} f_{R_{n_R} P_{n_P}} + \gamma_{\mathrm{direct}} f_{P_{n_P} R_{n_R}}
\end{pmatrix}$$
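As an illustration of how these objects can be assembled in practice, here is a short sketch (our own array conventions, consistent with the definitions above but not taken from the paper):

```python
import numpy as np

def build_system(f_rp, f_pr, gamma_direct, gamma_inverse, goal_p, goal_r, v_p):
    """Assemble the matrix M and the vector E of Section 3.

    f_rp : (n_R, n_P) link strengths from reflexes to perceptions
    f_pr : (n_P, n_R) link strengths from perceptions to reflexes
    """
    n_p, n_r = f_pr.shape
    M1 = gamma_direct * f_rp.T + gamma_inverse * f_pr      # (n_P, n_R) block
    M2 = gamma_direct * f_pr.T + gamma_inverse * f_rp      # (n_R, n_P) block
    M = np.block([[np.eye(n_p), M1],
                  [M2, np.eye(n_r)]])
    E = np.concatenate([goal_p + v_p, goal_r])
    return M, E
```

With $M$ and $E$ in hand, one step of the network reduces to $A \leftarrow (MA + E)/\|MA + E\|$, which is exactly the dynamical system studied in the next section.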

4 Demonstration of the convergence

We consider the following sequence of vectors of $\mathbb{R}^n$, defined by the induction

$$A_{k+1} = \frac{M A_k + E}{\|M A_k + E\|} \qquad (1)$$

for each $k \in \mathbb{N}$, where:

- $M$ is an $n \times n$ real matrix of the form
$$M = \begin{pmatrix} I & M_1 \\ M_2 & I \end{pmatrix}$$
where $M_1$ is an $n_1 \times n_2$ positive matrix and $M_2$ an $n_2 \times n_1$ positive matrix,
- $E$ is a constant vector of $\mathbb{R}^n$ with positive or null coefficients,
- $A_0$ has positive or null coefficients,
- $\|\cdot\|$ is the Euclidean norm of $\mathbb{R}^n$.

According to these definitions, we will prove the following theorem:

Theorem 1
a) If the vector $E$ has all its components strictly positive, or, more generally, if any of the vectors $E$, $ME$, $M^2E$, ..., has all its components strictly positive, the sequence $(A_k)$ is convergent for the Euclidean norm of $\mathbb{R}^n$.
b) The limit $A$ is a fixed point of the application
$$A \longmapsto \varphi(A) = \frac{MA + E}{\|MA + E\|} \qquad (2)$$


c) If the vector $E$ has a large enough norm, more precisely if $\|E\| > 2\|M\|$, then (2) has only one fixed point, and the limit $A$ can be expressed as the sum of the convergent series
$$A = \sum_{k=0}^{\infty} \frac{M^k}{\mu^{k+1}} E$$
where $\mu$ is the only positive real number satisfying
$$1 = \frac{\|E\|}{\mu} + \frac{\|ME\|}{\mu^2} + \frac{\|M^2 E\|}{\mu^3} + \cdots$$
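Before turning to the proof, a small numerical experiment may help fix ideas. The sketch below (toy dimensions and random positive blocks, all of our own choosing) iterates (1) and checks that the limit satisfies (2):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3
M1 = rng.random((n1, n2))                    # positive n1 x n2 block
M2 = rng.random((n2, n1))                    # positive n2 x n1 block
M = np.block([[np.eye(n1), M1], [M2, np.eye(n2)]])
E = rng.random(n1 + n2) + 0.1                # strictly positive input vector

A = np.ones(n1 + n2)                         # nonnegative starting point
for _ in range(200):
    A = M @ A + E
    A /= np.linalg.norm(A)                   # iteration (1)

phi_A = (M @ A + E) / np.linalg.norm(M @ A + E)
print("fixed-point residual:", np.linalg.norm(phi_A - A))   # ~ machine precision
```

In such runs the residual is at machine precision; note that part c) additionally requires $\|E\| > 2\|M\|$, which a randomly drawn $E$ of this size need not satisfy.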

The proof we give here follows some of the ideas used by Jean-Bernard Baillon in [1]. Since we are in a finite-dimensional vector space, all norms are equivalent to the Euclidean norm. The first step of the demonstration is therefore to replace the Euclidean norm in our theorem by a more suitable one, namely the $\ell^1$ norm, defined by
$$|x|_1 = \sum_i |x_i| \qquad \text{if } x = (x_i).$$

This norm has an advantage for us: the intersection of the cone of $\mathbb{R}^n$ whose vectors have all their components positive or null with the unit sphere of $\ell^1$ is a convex subset of $\mathbb{R}^n$.

Lemma 1 The convergence of the sequence $(A_k)$ is equivalent to the convergence of $(B_k)$ defined by
$$B_{k+1} = \frac{M B_k + E}{|M B_k + E|_1}$$

More generally, for a sequence $(a_k)$, the convergence of $a_k/\|a_k\|$ is equivalent to the convergence of $a_k/|a_k|_1$. These two convergences reflect a convergence in direction of the vector, which is independent of the norm. More formally, it is sufficient to consider the case where $\|a_k\| = 1$ for all $k$. Then, if $(a_k)$ converges for $\|\cdot\|$, it converges for $|\cdot|_1$, because all norms are equivalent in a finite-dimensional vector space. So $|a_k|_1$ converges and $a_k/|a_k|_1$ converges also. The converse is established the same way.

As a consequence, we can consider the following sequence instead of (1):
$$A_{k+1} = \frac{M A_k + E}{|M A_k + E|_1}$$

Let $K$ be
$$K = \left\{ x \in \mathbb{R}^n;\ x = (x_i),\ x_i \geq 0 \text{ for all } i \text{ and } \sum_{i=1}^{n} x_i = 1 \right\}$$

$K$ is the positive unit sphere of $\mathbb{R}^n$ endowed with the $\ell^1$ norm. It is, as we said, a convex compact set, and a simplex: the closed convex hull of the points of the canonical basis. Let $\varphi$ be the function defined on $K$ by
$$\varphi(A) = \frac{MA + E}{|MA + E|_1}$$
Since $M$ and $E$ have all components positive, $\varphi$ maps $K$ into $K$. Let us now suppose that all components of $E$ are strictly positive. We denote by $K^0$ the interior of $K$.

Lemma 2 If $E$ has all its components strictly positive, $\varphi$ operates from $K$ into $K^0$.

Demonstration of Lemma 2: The set $K \setminus K^0$, written $\partial K$, is the border of $K$: it is the union of the points that have at least one of their components equal to zero. $K^0$ is, on the opposite, constituted by the points that have all their components strictly positive. This is the case of $E$, and so of $MA + E$, which ends the demonstration.

The proof we present now uses the notion of inner distance of a convex set. This tool, which is classical in such a frame, was also shown to us by J.-B. Baillon, and was used in [1].

Definition (the an-harmonic ratio): Let $A$, $B$, $C$ and $D$ be four points aligned on an axis $\Delta$. We can associate to these four points the following ratio, called "an-harmonic":
$$(A, B, C, D) = \frac{BD}{BA} : \frac{CD}{CA}$$


Properties: The an-harmonic ratio is invariant under affine and projective transformations (we consider the Euclidean distance). A typical example of projective transformation is the projection onto a straight line, centered at a point. As a consequence, if we consider four concurrent or parallel straight lines $\Delta_\alpha$, $\Delta_\beta$, $\Delta_\gamma$ and $\Delta_\delta$ in a plane, crossed by another straight line $\Delta$ respectively at $A$, $B$, $C$ and $D$, then $(A, B, C, D)$ does not depend on $\Delta$ (see figure).

(Figure: four lines $\Delta_\alpha$, $\Delta_\beta$, $\Delta_\gamma$, $\Delta_\delta$ cut by a transversal $\Delta$ at the points $A$, $B$, $C$, $D$.)
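To make the invariance concrete, the following small sketch (a purely illustrative check of our own) computes the an-harmonic ratio of four aligned points, given by their abscissae on the axis, and verifies that an affine or projective change of coordinate leaves it unchanged:

```python
def anharmonic(a, b, c, d):
    """An-harmonic ratio of four aligned points given by their abscissae:
    (A, B, C, D) = (BD / BA) : (CD / CA)."""
    return ((d - b) / (a - b)) / ((d - c) / (a - c))

A, B, C, D = 0.0, 1.0, 2.5, 4.0
print(anharmonic(A, B, C, D))                              # reference value

affine = lambda x: 3.0 * x + 7.0                           # affine change of coordinate
homography = lambda x: (2.0 * x + 1.0) / (x + 5.0)         # projective change of coordinate
print(anharmonic(*map(affine, (A, B, C, D))))              # same value
print(anharmonic(*map(homography, (A, B, C, D))))          # same value again
```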

Definition: We consider a closed bounded convex set $K$ in a Euclidean space. We define a distance on $K^0$ in the following way. Let $A$ and $B$ be two points of $K^0$. We draw the straight line passing through $A$ and $B$. This straight line crosses the border of $K$ at two points, denoted by $a$ and $b$, so that $a$, $A$, $B$, $b$ are aligned in this order. We set:
$$d_K(A, B) = \log\left(\frac{Ab}{Aa} : \frac{Bb}{Ba}\right)$$

(Figure: the chord through $A$ and $B$ meets the border of $K$ at $a$ and $b$.)

It is well known that $d_K$ is a distance on $K^0$. For the sake of completeness, the proof is given in the Appendix.

Notes:
- on $K^0$, $d_K$ induces the usual topology,
- $K^0$ is not bounded for $d_K$,
- the closer a point $x$ is to the border of $K$, the smaller the balls $B_{d_K}(x, r)$ are for the Euclidean distance,
- $K^0$ is a complete metric space for the distance $d_K$.

We will now study $\varphi$ relatively to the distance $d_K$ (we will denote it by $d$ further on).

Proposition 1 For the distance $d$, the function $\varphi$ is a strict contraction of $K^0$ into itself: there exists a constant $c < 1$ such that
$$d(\varphi(A), \varphi(B)) < c\, d(A, B)$$
for any couple of points $A$ and $B$ in $K^0$.

Demonstration: Let $A$ and $B$ be two points of $K^0$. Let $a$ and $b$ be the intersections of the straight line $(A, B)$ with the border of $K$; $a$, $A$, $B$, $b$ are aligned in this order. Through the linear operator $M$, they are transformed into four other points that are still aligned and have the same an-harmonic ratio. If we now add the constant vector $E$ to these four points and project them onto $K$, they remain aligned. We have therefore:
$$(\varphi(a), \varphi(A), \varphi(B), \varphi(b)) = (a, A, B, b).$$


Let $a'$, $b'$ be the intersections of the straight line $(\varphi(A), \varphi(B))$ with the border of $K$. We have, according to Lemma 3 in the Appendix:
$$\frac{\varphi(A)\varphi(b)}{\varphi(A)\varphi(a)} : \frac{\varphi(B)\varphi(b)}{\varphi(B)\varphi(a)} \;\geq\; \frac{\varphi(A)b'}{\varphi(A)a'} : \frac{\varphi(B)b'}{\varphi(B)a'}$$
and so
$$d(A, B) \geq d(\varphi(A), \varphi(B))$$
which proves that $\varphi$ does not increase the distance $d$.

We notice that the inequality we derived from Lemma 3 is in fact strict: $\varphi(a)$ and $\varphi(b)$ are in $K^0$, and so are different from $a'$ and $b'$. Since, on the other hand, $\varphi(K) \subset K^0$ and $\varphi(K)$ is itself a compact set, the Euclidean distance from $\varphi(K)$ to $\partial K$ is strictly positive.

(Figure: the chord through $A$ and $B$ with endpoints $a$, $b$ on $\partial K$, its image points $\varphi(a)$, $\varphi(A)$, $\varphi(B)$, $\varphi(b)$, and the points $a'$, $b'$ where the line through $\varphi(A)$ and $\varphi(B)$ meets $\partial K$.)

We will now show that there exists a $c < 1$ such that
$$d(\varphi(A), \varphi(B)) < c\, d(A, B).$$
We set
$$c = \sup_{A, B \in K^0} \frac{d(\varphi(A), \varphi(B))}{d(A, B)}$$
Let us suppose $c = 1$. Then there exist two sequences of points $(A_n)$ and $(B_n)$ with
$$\frac{d(\varphi(A_n), \varphi(B_n))}{d(A_n, B_n)} \longrightarrow 1.$$
If $(A_n)$ and $(B_n)$ have two accumulation points $A$ and $B$ in $K^0$, we get $d(\varphi(A), \varphi(B)) = d(A, B)$, which contradicts our first inequality. If $A_n \to A \in \partial K$ and $B_n \to B \in K^0$, then
$$\frac{d(\varphi(A_n), \varphi(B_n))}{d(A_n, B_n)} \longrightarrow 1$$
is impossible, because $\varphi(A_n), \varphi(B_n) \in K^0$, and so $d(\varphi(A_n), \varphi(B_n))$ has a finite limit, whereas $d(A_n, B_n) \to +\infty$ since $A_n$ gets closer to the border.


The same holds if $A_n \to A \in \partial K$ and $B_n \to B \in \partial K$. So $c < 1$ and, for all $A, B \in K^0$,
$$d(\varphi(A), \varphi(B)) \leq c\, d(A, B).$$
If we iterate this inequality, we get
$$d(\varphi^{(k)}(A), \varphi^{(k)}(B)) \leq c^k\, d(A, B).$$
The set $K_k = \varphi^{(k)}(K)$ is therefore compact, and the diameter of $K_k$ converges towards 0 when $k \to \infty$. The intersection of these sets is therefore a single point $A_0$:
$$\bigcap_{k \in \mathbb{N}} K_k = \{A_0\}.$$
This point is a fixed point of $\varphi$, for if $A \in \bigcap_{k \in \mathbb{N}} \varphi^{(k)}(K)$, we have also $\varphi(A) \in \bigcap_{k \in \mathbb{N}} \varphi^{(k)}(K)$, and so $\varphi(A) = A_0$. Furthermore, for any initial point $A$,
$$d(\varphi(A), A_0) \leq c\, d(A, A_0)$$
and so
$$d(\varphi^{(k)}(A), A_0) \leq c^k\, d(A, A_0)$$
which proves that, for any initial point $A$, $\varphi^{(k)}(A) \to A_0$. This achieves the demonstration of the theorem in the case where $E$ has all its components strictly positive.

Let us now suppose that, for a certain natural number $m \geq 0$, the vector $M^m E$ has all its components strictly positive. Then $\varphi^{(m)}$ operates from $K$ into $K^0$, so we can apply to $\varphi^{(m)}$ the reasoning we made for $\varphi$, and we obtain
$$d(\varphi^{(m)}(A), \varphi^{(m)}(B)) \leq c\, d(A, B).$$
We now choose $N \in \mathbb{N}$. The Euclidean division of $N$ by $m$ gives $N = mk + p$, $p < m$. We infer
$$d(\varphi^{(N)}(A), \varphi^{(N)}(B)) \leq d(\varphi^{(mk)}(A), \varphi^{(mk)}(B)) \leq c^k\, d(A, B)$$
and so $d(\varphi^{(N)}(A), \varphi^{(N)}(B)) \to 0$ when $N \to \infty$: the diameter of $K_N$ converges to 0 when $N \to \infty$, and we end the proof as before.

We now finish the proof of the theorem by giving the form of the limit, which is a fixed point of $\varphi$:
$$\frac{MA + E}{\|MA + E\|} = A$$
or
$$(\mu I - M) A = E \qquad \text{with} \quad \mu = \|MA + E\|.$$
We get $\|\mu A - MA\| = \|E\|$. But, for a fixed $A$, the function $f(\mu) = \|\mu A - MA\|$ is convex, and if $\|E\| > \|M\|$, we have $f(0) = \|MA\| \leq \|M\| < \|E\|$, whereas $f(\mu) \to \infty$ when $\mu \to \infty$. Therefore, there exists a unique positive $\mu$ satisfying the preceding equation. If $\mu > \|M\|$, $\mu I - M$ is invertible and we get
$$A = \sum_{k=0}^{\infty} \frac{M^k}{\mu^{k+1}} E.$$
The condition $\mu > \|M\|$ is satisfied as soon as $\|E\| > 2\|M\|$, since $\mu = \|MA + E\| \geq \|E\| - \|MA\|$ and $\|A\| = 1$. Finally, taking the $\ell^1$ norm of both members of the equation giving the expression of $A$, and using the additivity of this norm on the positive unit ball of $\ell^1$, we get
$$1 = \frac{\|E\|}{\mu} + \frac{\|ME\|}{\mu^2} + \frac{\|M^2 E\|}{\mu^3} + \cdots$$
as we announced.
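Numerically, the limit and the scalar $\mu$ can be obtained by iterating the normalized map, and then compared with the series given by the theorem. The sketch below uses toy data of our own, with $E$ rescaled so that $\|E\| > 2\|M\|$ holds:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 4, 3
M1, M2 = rng.random((n1, n2)), rng.random((n2, n1))
M = np.block([[np.eye(n1), M1], [M2, np.eye(n2)]])
E = rng.random(n1 + n2) + 0.1
E *= 3.0 * np.linalg.norm(M, 2) / np.linalg.norm(E)    # enforce ||E|| > 2 ||M||

A = np.ones(n1 + n2)
for _ in range(500):                                   # limit of the normalized iteration
    A = M @ A + E
    A /= np.linalg.norm(A)

mu = np.linalg.norm(M @ A + E)                         # the scalar of the theorem
series = sum(np.linalg.matrix_power(M, k) @ E / mu ** (k + 1) for k in range(60))
print(np.linalg.norm(series - A))                      # agreement up to truncation error
```

The printed difference is at the level of the truncation error, confirming that the series reproduces the fixed point.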


5 Discussion

We need to verify that our algorithm fulfills the two conditions of Theorem 1 in order to guarantee its convergence in any case. The first condition is that $\|E\| > 2\|M\|$. $M$ and $E$ are computed, in our model, from two different sets of parameters:

- $M$ depends on $\gamma_{\mathrm{direct}}$, $\gamma_{\mathrm{inverse}}$ and on the link strengths $f_{P_iR_j}$ and $f_{R_iP_j}$,
- $E$ depends on $(\lambda_{G_{P_i}})_{1 \leq i \leq n_P}$ and $(\lambda_{G_{R_i}})_{1 \leq i \leq n_R}$.

All these parameters can be adjusted, provided that we keep several ratios constant:
$$\frac{\gamma_{\mathrm{direct}}}{\gamma_{\mathrm{inverse}}}, \qquad \frac{f_{P_i R_j}}{f_{P_{i'} R_{j'}}}, \qquad \frac{f_{R_i P_j}}{f_{R_{i'} P_{j'}}}, \qquad \frac{\lambda_{G_{(P\ \mathrm{or}\ R)_i}}}{\lambda_{G_{(P\ \mathrm{or}\ R)_j}}}.$$

It is then clear that a good choice of $\gamma_{\mathrm{direct}}$, $\gamma_{\mathrm{inverse}}$, the link strengths, $(\lambda_{G_{P_i}})_{1 \leq i \leq n_P}$ and $(\lambda_{G_{R_i}})_{1 \leq i \leq n_R}$ will lead to the fulfillment of the condition.

The second condition applies to the vector $E$, whose components should all be strictly positive. This may look rather constraining at first glance, since in the great majority of cases we are interested in having only one or two goals simultaneously in the network. But the preceding remark on the ratios to be maintained shows that we can, with little trouble, set these null-valued goals to a value as small as necessary to preserve the global coherence of the decision making.

The last thing we need to prove is that we can stop the computation of the series at some finite, fixed index, while making the rest of the series arbitrarily small. The definition of $\mu$,
$$1 = \frac{\|E\|}{\mu} + \frac{\|ME\|}{\mu^2} + \frac{\|M^2 E\|}{\mu^3} + \cdots$$

shows that the series is absolutely convergent. Its general term therefore has a null limit:
$$\frac{M^k}{\mu^{k+1}} E \longrightarrow 0.$$
This gives us the opportunity to stop the iteration at some fixed index in the algorithm. Experience has shown the convergence to be very quick in practical cases.

We shall conclude by noticing that the shape of the limit $A$ is rather reassuring:

$$A = \sum_{k=0}^{\infty} \frac{M^k}{\mu^{k+1}} E$$

The term $\frac{M^k}{\mu^{k+1}} E$ represents the impact of the fixed input vector $E$, which combines the goal effects and the perception truth values, after $k$ iterations through the network. This is in fact what we called the propagation of the perceptual situation for the perception part of the vector $E$, and the retropropagation of the goals for the remaining part of the vector $E$ [2]. So the results proven here are sufficient to ensure the determinism and the correct convergence of our algorithm. There is in fact no need to prove other results in the particular cases where $E$ has some null components. Moreover, the theorem gives us a precise framework to set the various and redundant parameters of our model, which was a serious lack in the empirical method we first used.
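As a sketch of this stopping rule (our own, merely indicative criterion, not one prescribed by the paper): truncate the series as soon as the current term falls below a tolerance.

```python
import numpy as np

def truncated_limit(M, E, mu, tol=1e-12, max_terms=1000):
    """Sum A = sum_k M^k E / mu^(k+1), stopping once the current term is
    below `tol` (an illustrative stopping rule; the discarded tail is then
    of the same order because the terms decay at least geometrically)."""
    term = E / mu                      # k = 0 term
    total = term.copy()
    for _ in range(max_terms):
        term = (M @ term) / mu         # next term of the series
        total += term
        if np.linalg.norm(term) < tol:
            break
    return total
```

Used with the $M$, $E$ and $\mu$ of the previous sketch, this returns the same limit up to the chosen tolerance.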

Appendix A: $d_K$ is a distance

The function $d_K$ is a distance on $K^0$. Indeed:

- $d_K(A, B) = \log\left(\dfrac{Ab}{Aa}\cdot\dfrac{Ba}{Bb}\right) = d_K(B, A)$;
- since $a$, $A$, $B$, $b$ are aligned in this order, $Ab = AB + Bb$ and $Ba = AB + Aa$, so
$$\frac{Ab}{Bb}\cdot\frac{Ba}{Aa} = \frac{AB + Bb}{Bb}\cdot\frac{AB + Aa}{Aa} \geq 1$$
and so $d_K(A, B) \geq 0$. The equality $d_K(A, B) = 0$ is satisfied if and only if
$$\frac{AB + Bb}{Bb} = \frac{AB + Aa}{Aa} = 1,$$
which happens if and only if $AB = 0$;
- the triangular inequality is a consequence of the following lemma.


Lemma 3 If $a$, $a_1$, $A$, $B$, $b_1$, $b$ are six points aligned in this order, we have
$$\frac{Ab}{Aa} : \frac{Bb}{Ba} \;\leq\; \frac{Ab_1}{Aa_1} : \frac{Bb_1}{Ba_1}$$
with a strict inequality except for $a = a_1$ and $b = b_1$.
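A quick numerical sanity check (our own, using abscissae on the line) illustrates the inequality:

```python
def ratio(A, B, a, b):
    """(Ab / Aa) : (Bb / Ba) for aligned points given by their abscissae."""
    return ((b - A) / (A - a)) / ((b - B) / (B - a))

a, a1, A, B, b1, b = 0.0, 1.0, 2.0, 3.0, 4.0, 5.0   # aligned in this order
print(ratio(A, B, a, b), ratio(A, B, a1, b1))        # 2.25 <= 4.0, as stated
```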

Lemma 4 We consider the following figure. Then we have:
$$\frac{Ac'}{Aa'} : \frac{Cc'}{Ca'} \;=\; \left(\frac{Ab}{Aa} : \frac{Bb}{Ba}\right) \cdot \left(\frac{Bc}{Bb'} : \frac{Cc}{Cb'}\right)$$

(Figure: configuration of Lemma 4, with the points $a$, $a'$, $A$, $b$, $b'$, $B$, $c$, $c'$, $C$.)

Proof of Lemma 4: We extend $(a, b)$ and $(c', b')$ until their intersection point $I$. We draw the line $(I, B)$, which crosses $(a', c')$ at $m$.

(Figure: the same configuration, together with the intersection point $I$ of $(a, b)$ and $(c', b')$ and the point $m$ where $(I, B)$ crosses $(a', c')$.)

We consider the pencil of straight lines $((Ia), (IA), (IB), (Ic'))$. Then
$$\frac{Ab}{Aa} : \frac{Bb}{Ba} \;=\; \frac{Ac'}{Aa'} : \frac{mc'}{ma'}$$
Doing the same with the other pencil of straight lines, we get
$$\frac{Bc}{Bb'} : \frac{Cc}{Cb'} \;=\; \frac{mc'}{ma'} : \frac{Cc'}{Ca'}$$

By making the product of these two equalities, we get the expected relationship. We can now prove the triangular inequality


(Figure: three points $A$, $B$, $C$ of $K^0$; the chord through $A$ and $B$ meets $\partial K$ at $a$ and $b$, the chord through $B$ and $C$ meets $\partial K$ at $b'$ and $c$, and the points $a'$, $a'_1$, $c'_1$, $c'$ lie on the line $(A, C)$, with $a'_1$ and $c'_1$ on $\partial K$.)

According to Lemma 3,
$$\frac{Ac'_1}{Aa'_1} : \frac{Cc'_1}{Ca'_1} \;\leq\; \frac{Ac'}{Aa'} : \frac{Cc'}{Ca'}$$
According to Lemma 4,
$$\frac{Ac'}{Aa'} : \frac{Cc'}{Ca'} \;=\; \left(\frac{Ab}{Aa} : \frac{Bb}{Ba}\right) \cdot \left(\frac{Bc}{Bb'} : \frac{Cc}{Cb'}\right)$$

We only need to take the logarithm of the preceding inequality to get the triangular inequality.

References

[1] J.-B. Baillon. Non densité des itérés des directions par des matrices à coefficients positifs. Technical report, ICM contract DGA/ETCA/CREA/20388-92, June 1993.
[2] V. Decugis and J. Ferber. Architecture multi-agents de robots mobiles autonomes. In Journées de Rochebrune 97. École Nationale Supérieure des Télécommunications, 1997.
[3] S. Giszter. Reinforcement tuning of action synthesis in a virtual frog. In D. Cliff, P. Husbands, J.-A. Meyer, and S.W. Wilson, editors, From Animals to Animats 3, pages 293-300, 1994.
[4] P. Maes. How to do the right thing. Connection Science Journal, 1(3), February 1990.
[5] P. Maes. Situated agents can have goals. Robotics and Autonomous Systems, 6:49-70, 1990.
[6] T. Tyrrell. Computational Mechanisms for Action Selection. PhD thesis, University of Edinburgh, 1993.
[7] T. Tyrrell. An evaluation of Maes's bottom-up mechanism for action selection. Adaptive Behavior, 2(4):307-348, 1994.