Patrolling Ants: Convergence Results, Cycle Detection and an Improved Behavior

Arnaud Glad, Olivier Buffet, Olivier Simonin, and François Charpillet
MAIA Team / LORIA, INRIA / Nancy University
Campus Scientifique, BP 239, 54506 Vandœuvre-lès-Nancy
[email protected]

Abstract: We consider multi-agent patrolling as the task, for a group of agents, of repeatedly visiting all the cells of a discrete environment. ? introduced patrolling ant algorithms, in which each agent can only mark and move according to its local perception of the environment. This approach has led to a number of theoretical and experimental results. In particular, it has been observed that the agents often self-organize into stable cycles. The present paper focuses on the convergence behavior of a typical ant-based algorithm. We first prove convergence to cycles under certain hypotheses, and discuss how this behavior depends on implementation details. In order to observe this convergence behavior experimentally, we then solve the difficult cycle-detection problem. The very slow self-organization led us to propose a new behavior that dramatically speeds up the algorithm.

Keywords: Multi-Agent Patrolling, Ant algorithms.

1 Introduction

Efficient exploration and surveillance of large environments is a central issue in mobile robotics, but multi-agent patrolling algorithms can also be used in other applications, such as the surveillance of a network or of web sites (??). Various approaches have been proposed, as described and evaluated in (?), based for example on agents following the solution of a Traveling Salesman Problem, applying heuristic rules, negotiating objectives, or learning by reinforcement. The performance of patrolling algorithms is usually measured through the average idleness over the environment (the time between two consecutive visits of a cell).

In this context we are particularly interested in ant covering and patrolling algorithms, introduced by ?, which guarantee that all the cells of a discrete environment are visited repeatedly, with competitive performance in terms of idleness (?).1 These algorithms do not require information about the environment, as each agent only marks and moves according to its local perception. This approach is known to exhibit another interesting global behavior: it has been observed that it often self-organizes into stable cycles. These cycles, which are Hamiltonian or quasi-Hamiltonian, provide an optimal solution to the multi-agent patrolling problem.

In this paper, we do not focus on the performance of the patrol in terms of idleness, but investigate the proof of this swarm property. Previous work focused on specific problems, such as proving: that a covering is guaranteed (?) in the multi-agent case; that a single agent that follows a Hamiltonian cycle once will repeat it forever; or that multiple agents can converge to separate cycles only if those cycles have the same length (?). This paper completes these results by characterizing the convergence of the multi-agent patrol under various hypotheses.

The paper is organized as follows. Section 2 introduces the notations and the EVAW algorithm. In Section 3 we study the convergence of the system towards stable cycles. In Section 4 we introduce an algorithm allowing the detection of these cycles (and therefore of the convergence). In Section 5 we propose an improved behavior which dramatically speeds up the convergence time. Section 6 presents experimental results that show the convergence behavior of some variants of EVAW. Finally, Section 7 concludes on this work and presents its perspectives.

1 We do not know of any work in the literature comparing ant algorithms (also called real-time search algorithms by ?) with other types of patrolling algorithms (?).

JFPDA 2009

2 Background

2.1 Patrolling Problem

Here, the discrete (and finite) environment E is modeled as a strongly connected directed graph whose vertices are cells cj ∈ C, each with a neighborhood N(cj). The state of an ant agent ai ∈ A is described by its current cell: s(ai) = cj. In this context, we consider multi-agent patrolling as the problem, for the agents, of repeatedly visiting each cell of the environment.

2.2 Ant-based Approaches

In a typical ant-based approach, the ant agents do not know their environment and are memoryless, and each cell cj carries a marking m(cj). Agents can move to a neighboring cell, mark their current cell, and perceive the markings of neighboring cells. This marking of the environment E is inspired by real-world ant pheromones. From this, we can define:
• the state of the environment s(E) as the marking of all cells;
• the state of the system as the tuple s = (s(E), s(a1), ..., s(a|A|)).
This model fits a variety of ant-based approaches: some Vertex-Ant-Walk (VAW) algorithms (?, Appendix II-A), an exploration algorithm by ?, LRTA* and Node Counting (?), and EVAP/EVAW (?). They differ mainly in the semantics of the marking. We consider VAW/EVAW's model, where agent ai marks its current cell by setting m(s(ai)) ← t (the current time step). Two environment states whose markings differ by a constant are then equivalent, so we will often identify a state s(E) with the equivalent state s∗(E) whose minimum cell marking is 0 (minj m(cj) = 0). Moreover, up to |A|! system states map to the same environment state (this upper bound is not reached when multiple agents occupy the same cell).
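The state model above can be sketched in Python; the function names below are ours, chosen for illustration only.

```python
# Sketch of the EVAW state model: the environment state is the full marking,
# and markings that differ by a constant are identified by shifting the
# minimum mark to 0.

def normalize(marking):
    """Return the equivalent marking s*(E) whose minimum cell mark is 0."""
    lo = min(marking.values())
    return {cell: m - lo for cell, m in marking.items()}

def system_state(marking, agent_cells):
    """System state s = (s(E), s(a1), ..., s(a|A|)) as a hashable tuple."""
    return (tuple(sorted(normalize(marking).items())), tuple(agent_cells))
```

With this normalization, two system states reached at different absolute times compare equal whenever their relative markings and agent positions coincide, which is what makes cycle detection over system states possible later on.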

2.3 EVAW

This paper focuses on the EVAW algorithm (?) because it is typical of ant patrolling and its formalism is convenient for deriving theoretical results. As detailed in Algorithm 1 and as illustrated by Figure 1, at each time step an EVAW agent simply 1) moves to the neighboring cell with the smallest marking and 2) marks this cell with the current time t.

Algorithm 1: The EVAW algorithm executed by one agent
1  c ← random cell in C                  /* initialization */
2  forall t = 1, ..., ∞ do
3      c ← arg min_{d∈N(c)} m(d)         /* descent */
4      m(c) ← t                          /* marking */

Figure 1: One iteration of the EVAW algorithm (Alg. 1): a) current state, b) descent (line 3), c) marking (line 4). [Figure: markings on a small grid, not reproduced here.]
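Algorithm 1 can be transcribed almost directly. The sketch below is our own code, with a random tie-break standing in for the unspecified hesitation rule; the graph is a dict mapping each cell to its neighbor list.

```python
import random

def evaw_step(graph, marks, c, t, rng):
    """One EVAW iteration: descend to a lowest-marked neighbor, mark it with t."""
    lo = min(marks[d] for d in graph[c])
    c = rng.choice([d for d in graph[c] if marks[d] == lo])  # break ties at random
    marks[c] = t
    return c

def run_evaw(graph, steps, seed=0):
    """Run one agent from a random cell; return the set of visited cells."""
    rng = random.Random(seed)
    marks = {c: 0 for c in graph}
    c = rng.choice(sorted(graph))
    visited = {c}
    for t in range(1, steps + 1):
        c = evaw_step(graph, marks, c, t, rng)
        visited.add(c)
    return visited
```

On a small 4-connected grid, every cell is visited well within the covering bound discussed in Section 2.4.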


This first algorithm is very simple, but it does not specify how multiple agents running in parallel interact. We assume here that an agent observes its vicinity and moves without any other agent acting in the meantime, which is rather realistic for an implementation on robots (this prevents two agents from ending up on the same cell). Algorithm 2 gives a centralized version of the multi-agent algorithm. One can observe that two lines involve non-deterministic choices:
• Line 3 – Conflicts: a conflict occurs when multiple agents "would like" to move to the same cell and the result depends on who moves first, as in Fig. 2-a.
• Line 4 – Hesitations: an agent ai hesitates when it must choose between two cells that were last visited simultaneously (by two different agents), as in Figure 2-b.
In practice, conflicts and hesitations often occur simultaneously.

Algorithm 2: Centralized EVAW for multiple agents
1  forall i ∈ {1, ..., |A|} do ci ← random cell in C
2  forall t = 1, ..., ∞ do
3      forall i ∈ {1, ..., |A|} do
4          ci ← arg min_{d∈N(ci)} m(d)
5          m(ci) ← t
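A minimal Python sketch of Algorithm 2 (our own code): updating the agents in a fixed list order inside each time step is one concrete way to resolve conflicts, and the `prefer` parameter stands in for a deterministic hesitation rule.

```python
def evaw_multi_step(graph, marks, cells, t, prefer=min):
    """Advance every agent once at time step t, in list order (agent 0 acts first)."""
    for i in range(len(cells)):
        lo = min(marks[d] for d in graph[cells[i]])
        cells[i] = prefer(d for d in graph[cells[i]] if marks[d] == lo)
        marks[cells[i]] = t  # later agents see this agent's fresh mark
    return cells
```

Because marks are written inside the loop, an agent acting later in the same time step already sees the marks left by earlier agents, which is exactly what makes the acting order matter in a conflict.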

Figure 2: Two system states corresponding to uncertain situations (agent positions are represented by brackets): a) two conflicting agents, both of which would like to move to the '0' cell between them; b) one hesitating agent: the right agent has a choice between two '7' cells, the one on its left (last visited by the other agent) and the one below it (last visited by the right agent itself).

2.4 Some Known Theoretical Results

For each of the algorithms mentioned in Sec. 2.2, it has been proven that a group of ants repeatedly visits each cell of its environment, i.e. performs the patrolling task. To our knowledge, an upper bound on the covering time is known for each algorithm except Node Counting; in the case of EVAW, we denote this bound Tvis. It is also known that VAW0, EVAP and EVAW are quasi-equivalent and may converge to stable cycles in which each agent repeats a sequence of cells infinitely often. Only limited results are known about these cycles, e.g.: 1) in the single-agent case, an agent that follows a Hamiltonian cycle2 once will repeat it forever (?); 2) in the multi-agent case, agents on separate cycles have equal-length cycles (?).

3 Convergence Results

In this section, we characterize the long-term behavior of the EVAW algorithm. This study follows a different approach: rather than looking at cycles from the agents' viewpoint, we consider cycles from the environment's viewpoint. From now on, a cycle ζ is thus defined as a sequence of system states that repeats forever.

2 A Hamiltonian cycle visits each vertex of the graph exactly once.


3.1 Single-Agent Case

Theorem 3.1 (Convergence in the Single-Agent Case) Considering an environment E and a single agent a, EVAW converges to a cycle in finite time.

Proof. For any cell c, the time between two successive visits of c is bounded by Tvis (as explained in Sec. 2.4). With "equivalent" states s∗(E), this leads to: ∀c ∈ C, 0 ≤ m(c) ≤ Tvis. The set of reachable environment states is therefore finite, since its size is bounded from above by (Tvis + 1)^|C|. The same holds for system states since, with a single agent, there is a one-to-one mapping between environment and system states.

Agent a cannot visit, and therefore mark, two cells at the same time. Thus, two cells have the same marking if and only if this marking is 0 (i.e. they have not been visited yet). So, as soon as the agent has covered the environment (after at most Tvis steps), the agent cannot hesitate anymore: its behavior becomes deterministic. After time step Tvis, EVAW therefore deterministically generates a recurrent sequence of system states (st+1 = f(st)). Because the number of reachable system states is finite, this sequence reaches a cycle in finite time. □
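The argument can be illustrated empirically. The sketch below (our own code, not from the paper) runs a determinized single agent, records the normalized system state at every step, and reports the cycle length once a state repeats; on a 4-cell ring it finds the Hamiltonian cycle of length 4.

```python
def single_agent_cycle_length(graph, max_steps=100000):
    """Run deterministic EVAW until a normalized system state repeats."""
    marks = {c: 0 for c in graph}
    c = sorted(graph)[0]
    seen = {}
    for t in range(1, max_steps + 1):
        lo = min(marks[d] for d in graph[c])
        c = min(d for d in graph[c] if marks[d] == lo)  # deterministic tie-break
        marks[c] = t
        m0 = min(marks.values())  # identify states differing by a constant
        state = (tuple(sorted((k, v - m0) for k, v in marks.items())), c)
        if state in seen:
            return t - seen[state]  # cycle length in steps
        seen[state] = t
    return None
```

Storing every state is only viable on tiny examples; Section 4 replaces this table with a constant-memory cycle-detection scheme.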

3.2 Multi-Agent Case

In the multi-agent case, all cells are still visited infinitely often, and an upper bound on the number of reachable system states is |A|! × (Tvis + 1)^|C|. Yet having visited all cells once does not make the system's dynamics deterministic: both conflicts and hesitations can still be encountered. These sources of non-determinism can be tackled in various ways:

Conflicts: one may specify a deterministic protocol deciding in which order agents act. This protocol may depend, for example, on the agents' ID numbers or on their relative positions in the environment.

Hesitations: here, one may pick a cell depending on its direction (e.g. an agent may prefer not turning) or depending on which agent left the last marking (if this information is available, an agent may prefer its own trace).

How decisions are made deterministic will of course influence the patrolling behavior.

Theorem 3.2 (Convergence in the Multi-Agent Case) If the system is made deterministic, it converges to a cycle in finite time. If the system is not made deterministic, its behavior can be modeled as a Markov chain over a finite state space. It therefore necessarily ends up in an absorbing subgraph, cycles being particular cases of such subgraphs.

These two theorems confirm the observations of cycles and clarify the behavior of EVAW agents. Yet we do not know how fast cycles or absorbing subgraphs are reached, nor how many states they include. In the absence of theoretical results on such points, this paper looks for experimental answers.

4 Cycle Detection

A key issue when studying the convergence of patrolling behaviors experimentally is the ability to detect cycles or absorbing subgraphs. This section presents a novel algorithm we introduce for that purpose.

4.1 The Tortoise and the Hare (?)

Let us consider a set X, a function f : X → X, and a sequence u defined by u_0 ∈ X and, for all n ≥ 0, u_{n+1} = f(u_n). The cycle-detection algorithm proposed by ? detects whether the sequence u ends up on a cycle by comparing the evolution of u (called "the tortoise") with that of v, defined by v_n = u_{2n} (called "the hare"). As shown in Algorithm 3, a cycle is detected when u_n = v_n for some n > 0, i.e. when the tortoise and the hare meet at the same point.

Algorithm 3: The Tortoise and the Hare Algorithm
1  v ← u
2  repeat
3      u ← f(u)
4      v ← f(f(v))
5  until u = v
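In Python, Algorithm 3 reads as follows (a standard Floyd implementation; the function name is ours):

```python
def tortoise_and_hare(f, u0):
    """Return the first n > 0 with u_n == u_{2n} for the orbit of u0 under f,
    together with the state where the tortoise and the hare meet."""
    tortoise, hare = f(u0), f(f(u0))
    n = 1
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
        n += 1
    return n, tortoise
```

If the sequence has a tail of length μ before entering a cycle of length λ, the meeting happens after at most μ + λ steps, using only constant memory, which is what makes the method practical for large state spaces.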

4.2 Handling Uncertain Dynamics

We now consider using this cycle-detection algorithm to detect whether a group of EVAW agents has reached a convergence point. A first important point is that the hare and the tortoise must evolve identically, making the same random choices; this requires that they both rely on the same sequence of random numbers.

With "determinized" versions of EVAW, a cycle may occur only after the coverage phase, so Alg. 3 can be used as-is. If non-deterministic transitions remain, the algorithm may be fooled and will stop in case:
• of a true/proper cycle,
• of an absorbing subgraph,
• of a transitory subgraph.
Checking whether what has been detected is a true cycle requires simulating the system until the same state is reached without any random choice having to be made. Distinguishing an absorbing subgraph from a transitory phenomenon is more difficult. Two possible methods are:
• constructing this subgraph, using "chance states" as vertices and following all possible paths;
• performing simulations to statistically evaluate whether the phenomenon is transitory.
We have developed an algorithm following the former approach. Yet, due to the size of the subgraphs being generated, it has not been used in our experiments; the detection algorithm simply reports that it has detected a fake cycle.
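One way to give both copies the same random choices (our own construction, not necessarily the paper's implementation) is to derive the tie-break for step n from a seed and n, so that u_n is identical no matter whether the tortoise or the hare computes it:

```python
import random

def choice_at(seed, n, options):
    """Reproducible 'random' pick for step n: both copies get the same answer,
    independent of how many times or in which order the step is replayed."""
    return random.Random(seed * 1000003 + n).choice(sorted(options))
```

With such a scheme the transition function becomes a deterministic function of (state, n), so the hare can safely recompute steps the tortoise already played.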

5 Improved EVAW

This section is motivated by the combinatorial explosion of the convergence time with respect to the environment size and the number of agents. In order to propose an improved algorithm, we first introduce an appropriate visualization tool that lets us analyze path formation. From this analysis, we design a new heuristic that relies on detecting singular patterns.

5.1 Path Visualization

The numerical marks form a potential field, so an intuitive idea is to visualize the associated gradient vector field, i.e. to draw, from each cell c, an arrow in the steepest-ascent directions (towards each cell in arg max_{d∈N(c)} m(d)). Yet, as illustrated by Fig. 3-a, paths are then not visible. We therefore propose a new representation that fixes this problem: at each time step we draw an arrow from each cell c to the neighboring cells reached by following the flattest-ascent directions, i.e. the cells in arg min_{d∈N(c) s.t. m(d)≥m(c)} m(d). As illustrated in Fig. 3-b, the result clearly exhibits paths, and Hamiltonian cycles when they appear.
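The flattest-ascent arrows can be computed per cell as below (our own sketch of the rule just stated):

```python
def flattest_ascent(graph, marks, c):
    """Neighbors of c with the smallest mark among those marked at least m(c);
    these are the targets of the arrows drawn from c."""
    ups = [d for d in graph[c] if marks[d] >= marks[c]]
    if not ups:
        return []
    lo = min(marks[d] for d in ups)
    return sorted(d for d in ups if marks[d] == lo)
```

Since an agent that just left cell c marked its new cell with a larger time stamp, the flattest-ascent arrow from c tends to point along the route the agent actually took, which is why this field exhibits paths.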


Figure 3: Two vector-field representations: a) gradient-based field, b) field exhibiting paths.

5.2 Observation

We can see experimentally (and it is even more visible on larger environments) that EVAW builds cycles from partial Hamiltonian paths appearing during the convergence process (darker arrows exhibit such a partial Hamiltonian path in Fig. 3). We focus on cells that are local minima of the marking (m(tail) ≤ min_{c∈N(tail)} m(c)), which we call tails since they correspond to most of the path beginnings we observe. An interesting property is that, when an agent enters an existing path by its tail (dark cell in Fig. 3), it may follow the path to its end without reorganizing the whole pheromone field, thus favoring convergence to cycles. Because of their limited knowledge of the environment (perception limited to the neighboring cells, no memory), agents are not able to identify paths, and thus:
• do not always enter a path by its tail when possible,
• destroy paths when not entering them by their tails.
While identifying whole paths is difficult, an agent can easily detect a tail among its neighbors using an extended perception area (13 cells instead of 4 in our examples). Indeed, to detect this pattern an agent only needs to see, in addition to its current perception, the neighborhood of its adjacent cells (i.e. all the cells within a Manhattan distance of 2), as illustrated by Fig. 4.

Figure 4: Perception area of an EVAW+ agent on a grid. The grey cross represents the vicinity used to determine whether the cell to the North of the agent is a tail or not.

5.3 Speeding Up Cycle Formation

Based on these observations and on the ability to detect tails, we extend EVAW by forcing the agents to move to a tail whenever one is detected. This variant, which is not limited to grid environments, is called EVAW+ and is shown in Alg. 4, where tails(N(c)) denotes the set of neighboring cells of c that match the tail pattern.

5.4 How to Maintain the Patrolling Property

Although this improvement of the EVAW algorithm speeds up convergence (see Section 6), it leads to the loss of the patrolling property. Indeed, we have observed rare cases where some cells of the environment are no longer visited infinitely often. One can see in Fig. 5 a), c) and e) that the agent has no choice, according to Alg. 4, but to follow path tails (darker arrows). As a consequence, the upper-left cell and its two neighbors will never be visited again.


Algorithm 4: An enhanced version of EVAW
1  c ← random cell in C
2  forall t = 1, ..., ∞ do
3      if tails(N(c)) ≠ ∅ then
4          c ← arg min_{d∈tails(N(c))} m(d)
5      else
6          c ← arg min_{d∈N(c)} m(d)
7      m(c) ← t
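Algorithm 4 as a Python sketch (our own code, with a deterministic tie-break standing in for the unspecified one):

```python
def evaw_plus_step(graph, marks, c, t):
    """Move to the lowest-marked detected tail if any; otherwise fall back
    to the plain EVAW descent of Algorithm 1."""
    tails = [d for d in graph[c] if marks[d] <= min(marks[e] for e in graph[d])]
    pool = tails if tails else graph[c]
    lo = min(marks[d] for d in pool)
    c = min(d for d in pool if marks[d] == lo)  # deterministic tie-break
    marks[c] = t
    return c
```

The only change from plain EVAW is the candidate pool: when a tail is visible, the descent is restricted to the detected tails.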

This bug is easily fixed: we only need to allow the agent to bypass the heuristic when the difference between the mark of the candidate tail and the mark of some other neighboring cell is greater than a chosen threshold (for example 10 × |C|, so as not to inhibit EVAW+'s heuristic in normal operation). With this correction, all the theoretical results (e.g. the convergence theorems) remain valid for the EVAW+ algorithm.
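The correction can be sketched as a guard on the heuristic (the function name and the way the threshold is applied are our own reading of the fix described above):

```python
def tail_allowed(graph, marks, c, tail, threshold):
    """Bypass the tail heuristic when the candidate tail's mark exceeds the
    lowest neighboring mark by more than `threshold` (e.g. 10 * |C|):
    a starved neighbor then takes priority over the tail."""
    return marks[tail] - min(marks[d] for d in graph[c]) <= threshold
```

With this guard, a cell that has gone unvisited for a long time (a very low mark relative to the tail) eventually forces the agent back to plain descent, restoring the patrolling property.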

Figure 5: Loss of the patrolling property

6 Experiments

6.1 Experimental Setting

We need to detail how conflicts and hesitations are handled in the implementations of EVAW and EVAW+ used in our experiments. As we use the Madkit/turtlekit framework (?) with the default turtlekit scheduling, where the agents always act in a fixed order, the conflict problem is solved at no cost. Then, depending on how they behave in case of hesitations, we distinguish two types of agents: 1) non-deterministic agents, which act randomly, and 2) deterministic agents, which prefer moving right rather than backward, then left, and finally forward. The former are used by default in our experiments (to observe the effect of this non-determinism), while the latter are only employed to verify some specific theoretical results.

Most experiments are denoted (n, s, α), where n is the number of agents, s × s is the size of an empty square grid, and α denotes the algorithm ('-' = EVAW, 'd' = deterministic EVAW, '+' = EVAW+)3. At least 1000 (and up to 20 000) simulation runs of at most 2.5 × 10^6 iterations were used. Agents were initially placed randomly. Complete results are presented in an associated technical report (?), along with some videos.

3 Due to a lack of time, we only experimented with the buggy version of EVAW+.


6.2 Results

6.2.1 Convergence Behavior

The first experiments aimed at verifying the convergence results of Sec. 3 using four settings: (1, 8, −), (1, 8, d), (3, 8, −) and (3, 8, d). As expected, the statistics gathered in Table 1 show that the only case where convergence to cycles is not guaranteed (i.e. where "fake cycles" are found) is with multiple non-deterministic agents.

| setting | # runs | Ham. cycles | length (Ham.) | non-H. cycles | length (non-H.) | fake cycles | no convergence | time to conv. | stdev(time to conv.) |
|---------|--------|-------------|---------------|---------------|-----------------|-------------|----------------|---------------|----------------------|
| 1,8,d   | 20000  | 18.53 %     | 64            | 81.47 %       | 68              | .00 %       | .00 %          | 615           | 428                  |
| 1,8,-   | 20000  | 29.62 %     | 64            | 70.37 %       | 69              | .00 %       | .00 %          | 589           | 438                  |
| 1,8,+   | 20000  | 59.20 %     | 64            | 40.76 %       | 66              | .00 %       | .03 %          | 331           | 173                  |
| 1,10,-  | 20000  | 18.46 %     | 100           | 81.53 %       | 107             | .00 %       | .00 %          | 2565          | 2259                 |
| 1,10,+  | 20000  | 46.95 %     | 100           | 53.01 %       | 102             | .00 %       | .04 %          | 716           | 432                  |
| 1,12,-  | 20000  | 11.02 %     | 144           | 88.97 %       | 155             | .00 %       | .00 %          | 14520         | 14130                |
| 1,12,+  | 20000  | 33.29 %     | 144           | 66.66 %       | 146             | .00 %       | .04 %          | 1327          | 842                  |
| 1,14,-  | 13003  | 6.22 %      | 196           | 93.77 %       | 211             | .00 %       | .00 %          | 116004        | 115374               |
| 1,14,+  | 20000  | 21.23 %     | 196           | 78.72 %       | 199             | .00 %       | .04 %          | 2263          | 1494                 |
| 1,18,+  | 20000  | 7.08 %      | 324           | 92.86 %       | 328             | .00 %       | .05 %          | 5213          | 1076                 |
| 1,40,+  | 1000   | .00 %       | 0             | 100.00 %      | 1617            | .00 %       | .00 %          | 75347         | 58401                |
| 2,8,-   | 20000  | 37.40 %     | 32            | 49.30 %       | 46              | 13.29 %     | .00 %          | 3512          | 4052                 |
| 2,8,+   | 20000  | 24.26 %     | 32            | 53.59 %       | 64              | 22.13 %     | .01 %          | 500           | 476                  |
| 2,10,-  | 20000  | 15.07 %     | 50            | 76.48 %       | 80              | 8.44 %      | .00 %          | 30987         | 36058                |
| 2,10,+  | 20000  | 15.32 %     | 50            | 69.10 %       | 103             | 15.56 %     | .02 %          | 1063          | 2474                 |
| 2,12,-  | 16737  | 7.94 %      | 72            | 85.82 %       | 121             | 6.22 %      | .00 %          | 233479        | 257010               |
| 2,12,+  | 20000  | 10.90 %     | 72            | 76.91 %       | 143             | 12.13 %     | .05 %          | 2137          | 10618                |
| 2,14,-  | 2580   | 3.10 %      | 98            | 81.08 %       | 175             | 3.21 %      | 12.59 %        | 1586569       | 1263880              |
| 2,14,+  | 20000  | 6.48 %      | 98            | 84.38 %       | 190             | 9.08 %      | .04 %          | 4162          | 34745                |
| 3,8,d   | 5000   | .00 %       | 0             | 100.00 %      | 40              | .00 %       | .00 %          | 31283         | 30971                |
| 3,8,-   | 10000  | .00 %       | 0             | 70.89 %       | 27              | 29.11 %     | .00 %          | 59404         | 70841                |
| 3,8,+   | 10000  | .00 %       | 0             | 77.77 %       | 78              | 21.73 %     | .50 %          | 3263          | 3471                 |
| 3,10,-  | 1001   | .00 %       | 0             | 40.15 %       | 53              | 10.18 %     | 49.65 %        | 226581        | 145568               |
| 3,10,+  | 1001   | .00 %       | 0             | 86.51 %       | 109             | 12.48 %     | .99 %          | 4197          | 4462                 |
| 3,12,-  | 1000   | 5.00 %      | 48            | 14.80 %       | 117             | 4.50 %      | 75.70 %        | 1274407       | 664516               |
| 3,12,+  | 20000  | 14.12 %     | 48            | 67.92 %       | 141             | 17.93 %     | .02 %          | 4506          | 5690                 |
| 3,14,-  | 250    | .00 %       | 0             | 3.20 %        | 176             | .80 %       | 96.00 %        | 1332858       | 499218               |
| 3,14,+  | 10000  | .00 %       | 0             | 88.80 %       | 181             | 10.24 %     | .96 %          | 7148          | 7125                 |

Table 1: Summary of experiments conducted in various settings.

6.2.2 EVAW vs EVAW+

We then ran experiments with a variety of settings (n ∈ {1, 2, 3}, s ∈ {8, 10, 12, 14}, α ∈ {−, +}) to gather quantitative results. Table 1 provides part of the collected statistics. In addition, Figures 6, 7 and 8 show convergence curves (the probability of having converged to a cycle after N iterations); they only take true cycles into account, so they do not always reach 100%. One can observe that EVAW+ always converges faster than EVAW, the task being more difficult when more agents or larger environments are involved. In fact, it would not be practical to experiment with more complex settings using the original EVAW algorithm. Also, EVAW+'s bug appears only rarely in these experiments.

6.2.3 Various Observations

Looking at the various convergence curves, as could be expected they all look like cumulative distribution functions (CDFs) of exponential distributions. They differ from these CDFs 1) at the beginning, during an initial exploration phase, and 2) by the fact that "steps" appear (especially in the single-agent case), i.e. there are periodic phases with no convergence. This periodic phenomenon is not explained for now. Finally, one can also observe that the probability of converging to a Hamiltonian cycle decreases when the grid size increases. It is not clear whether this reflects a property of the environment (e.g. that there are fewer Hamiltonian cycles among the possible absorbing subgraphs) or a property of EVAW (its exploration behavior is less likely to find Hamiltonian cycles).

Figure 6: Convergence curves for one agent (percentage of converged episodes vs. number of iterations before convergence, for EVAW+, EVAW and dEVAW): a) on an 8x8 grid, b) on a 14x14 grid.

7 Discussion / Conclusion

First, let us recall that the theoretical results and algorithms in this paper are not restricted to a particular graph topology, although most examples and experiments use grid environments. Other regular tilings (triangular or hexagonal) could be considered, but also irregular and even changing topologies, as in a website whose pages and links evolve over time.

Figure 7: Convergence curves for two agents (percentage of converged episodes vs. number of iterations before convergence, for EVAW+ and EVAW): a) on an 8x8 grid, b) on a 14x14 grid.

Figure 8: Convergence curves for three agents (percentage of converged episodes vs. number of iterations before convergence, for EVAW+, EVAW and dEVAW): a) on an 8x8 grid, b) on a 14x14 grid.

This paper studied the convergence behavior of a typical ant-based patrolling algorithm. Theorems 3.1 and 3.2 show that the system self-organizes: agents follow repetitive patterns, which are guaranteed to be stable in the single-agent case or if the agents are deterministic. These results can be generalized to the other ant-based algorithms we mentioned, except Node Counting (because no upper bound on its covering time is known).

We have also extended the Tortoise and the Hare algorithm to detect cycles in systems with non-deterministic dynamics. Brent's cycle-detection algorithm (?) could be extended in the exact same way. Distinguishing transitory subgraphs from absorbing ones is feasible (at least by constructing all possible futures from a given point) but often too expensive. A more appropriate approach would be to progressively build a partial graph representing the path followed by the system in its state space until it is trapped in a cycle or an absorbing subgraph.

A third contribution is the proposal of the EVAW+ algorithm, which proves to converge to cycles much faster than EVAW, and more often to Hamiltonian cycles when they exist. When a Hamiltonian cycle is reached, it is an optimal solution in terms of visit frequency (idleness). Moreover, we observe that the other, non-Hamiltonian cycles are often near-optimal (their length is usually close to a multiple of the number of cells). This means that the performance in terms of idleness is very good once convergence is attained.

More experiments need to be conducted to further evaluate EVAW and EVAW+, including during the exploration phase. In fact, finding or designing the "best" ant-based patrolling algorithm requires studying the influence of a variety of parameters: the value (marking) update rule, the ordering of actions, the synchronization between agents, the determinization of conflicts and hesitations, etc. It would also be interesting


to see whether they are competitive with other approaches in both offline and online settings.