An Architecture for Multi-Agent Systems in the Robot-Soccer Field

Sebastian Rodriguez and Vincent Hilaire
Université de Technologie de Belfort-Montbéliard
90010 Belfort CEDEX, FRANCE
email: {sebastian.rodriguez, vincent.hilaire}@utbm.fr

Abstract

Problem solving in complex domains often involves dynamic environments and the need to learn from previous experience and feedback. It is easy to imagine several situations where cooperative behavior is needed in order to achieve a global objective among agents. The Micro-Robot World Soccer Tournament (MiroSot) initiative provides a good arena for multi-agent research: robot soccer makes heavy demands in all the key areas of robot technology, namely mechanics, sensors and intelligence.

1 Introduction

Problem solving in complex domains often involves dynamic environments and the need to learn from previous experience and feedback. It is easy to imagine several situations where cooperative behavior is needed in order to achieve a global objective among agents. Robot soccer is an example where real-time cooperative behavior is needed: the dynamic environment of a match requires immediate responses from the system, while at the same time the agents must learn to cooperate with each other. With the ever-increasing number of robots in industrial environments, scientists and technologists are often faced with issues of cooperation and coordination among different robots and their self-governance in a workspace. This has led to developments in multi-robot cooperative autonomous systems. The Micro-Robot World Soccer Tournament (MiroSot) initiative provides a good arena for multi-agent research. Robot soccer makes heavy demands in all the key areas of robot technology: mechanics, sensors and intelligence. The robots used in MiroSot are small (7.5cm x 7.5cm x 7.5cm), fully or semi autonomous and without any human operators. MiroSot involves multiple robots that need to collaborate in an adversarial environment to achieve specific objectives.

We will present the system's structure, along with its main components, in Section 2. Then the main roles for a robot-soccer player will be detailed in Section 3. Finally, some conclusions are presented in Section 4.

Figure 1. System Flow (camera, vision system, stimulus coder, decision system, STNs, RF card, robot)

2 System Architecture

In this section we present the system's architecture. A basic schema of the data flow is shown in Figure 1.

2.1 Vision System

The vision system was developed by Yujin Robotics Co., Ltd. Its output is the position of each robot, home and opponent, together with its rotation angle, and the position of the ball.

2.2 Stimulus Coder

In order to understand the evolution of the game, we need to recognize situations instead of simply using the positions of the robots. The first thing to identify is whether a situation is potentially dangerous for the team or not. The criterion chosen to make this distinction is the ownership of the ball: the owner is considered to be the team whose robot is closest to it. With this information a first distinction is made; from here on, we will say that we have a defensive situation when the opponent owns the ball, and an offensive one when our team owns it.

It is important to notice that the Stimulus Coder must also be able to deal with the uncertainty involved in this kind of system. These uncertainties have various sources. One of the principal ones is the Vision System itself (according to Yujin Robotics, the vision system may present an uncertainty of between two and four centimeters), which must capture an image and process it before delivering its result to the Stimulus Coder. We must also consider other sources; for example, the robots are constantly in movement, so they never actually arrive at (or depart from) the exact point used in the calculations.

However, the owner of the ball is not enough to determine a situation: it is also necessary to associate the situation with a physical location in the field. The field was divided into nine zones, as shown in Figure 2. The Stimulus Coder considers that the home goal is in zone 1B. The position of the ball in the field therefore provides the second parameter needed to identify the situation.

Figure 2. Field Division

The chosen number of zones is actually a compromise. Although more zones would locate the game situation more precisely in the field, they could also introduce a dangerous error, since the ball could be associated with the wrong zone due to the uncertainties mentioned above. A smaller number of zones, on the other hand, would not be accurate enough.

Summing up, a game situation (the terms game situation and stimulus will be used interchangeably) is defined by two parameters:

1. Owner of the ball
2. Position of the ball in the field

Recognizing the Ball Owner

The algorithm considers the team whose robot is closest to the ball to be its owner. However, certain conditions must be satisfied before the search for a new owner is started. For example, in the case of a pass to a team player, the ball could pass close to an opponent player without being intercepted; in this case the owner of the ball must not change, even though the closest robot belonged to the other team. To solve this problem, the Stimulus Coder detects the collisions that the ball may suffer, and only when the ball collides is the search for a new owner launched (a minimal sketch of this idea is given below). This policy allows the system to deal with situations like the one described above and reduces the calculation time.

When it is impossible to determine the owner of the ball, a rather defensive approach is selected. There are currently only two possible causes for failing to identify the ball owner:

1. Lack of information: the vision system could not determine the positions of all the robots.
2. Timeout: a timeout is defined in the Stimulus Coder for several purposes, the most important ones being not to lose time and to preserve coherence.

A second coder is currently being developed. Its objective is to recognize sub-categories of situations, taking into account the positions of the home and opponent robots around the ball.
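As a minimal sketch of the collision-gated ownership test described above, the following fragment re-evaluates the owner only after a detected collision; the names (detect_collision, ball_owner) and the angle-based collision test are assumptions made for illustration, not taken from the actual system.

    import math

    def dist(a, b):
        """Euclidean distance between two (x, y) points."""
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def detect_collision(prev_heading, curr_heading, threshold=0.3):
        """Hypothetical collision test: the ball is assumed to have collided
        when its heading (in radians) changes by more than the threshold."""
        return abs(curr_heading - prev_heading) > threshold

    def ball_owner(ball, robots, prev_owner, collided):
        """robots is a list of (team, position) pairs, team in {'home', 'opponent'}.
        Ownership is only re-evaluated after a collision, so a pass travelling
        close to an opponent does not change the owner."""
        if not collided:
            return prev_owner
        if not robots:                 # lack of information: leave it undecided,
            return None                # the caller falls back to a defensive approach
        team, _ = min(robots, key=lambda r: dist(r[1], ball))
        return team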

2.3 Decision System and Strategy DB

Using the game situation obtained from the Stimulus Coder, the Decision System is in a position to decide the most suitable strategy for the current situation. Each strategy is coded as a Simple Temporal Network (STN for short). In a first experimental stage only short-term strategies were used; this kind of strategy will be called a Plan from here on. In this way we were able to evaluate the most successful sequences of plans, and these sequences later became more complex strategies. The Decision System uses Reinforcement Learning in order to choose the best strategy for a specific game situation.



There are three principal parts in this module that collaborate to obtain a dynamically adapted strategy for the detected game situation.

2.3.1 Classifier


The Classifier is in charge of actually choosing the strategy that will be used for the detected stimulus. The selection is based on a probabilistic classifier with reinforcement learning. Each strategy is given a score, called its fitness from here on. The fitness describes in a global manner how "good" a strategy is, so it is vital that the reinforcement rules assign a coherent value to the new fitness.

Choosing a Strategy

When we analyze a soccer match, a team can assume different position configurations, so it is possible to create several strategies for each game situation. However, it is very improbable that one of these strategies is absolutely useless or 100% effective, and it is the Classifier's job to learn this difference. As mentioned, each strategy has a fitness associated with it; this value actually represents the probability of that strategy being elected for the situations it is associated with. After a stimulus is detected, the classifier stops the execution of the previous strategy, updates its fitness and chooses a new one using a probabilistic algorithm based on the previous situation and the current one. This algorithm creates a link between the different strategies, allowing the classifier to choose a sequence of successful strategies.

Updating the Fitness

Let us imagine that the ball is in zone 2B (see Figure 2), the initial position, in an offensive situation. If we have moved forward after the execution of the strategy, we can say that this is an effective strategy for this zone. However, if we now have a defensive situation in zone 3B, the final position, the strategy is not as effective as one that leaves us in an offensive situation. From this simple example we can see that there are two basic parameters to use in the reinforcement rules: the displacement of the ball and the ownership of the ball.

In a soccer match there are some zones of the field that can become really dangerous if the adversary owns the ball, or very advantageous if our team owns it. Using this idea, we assigned a value to each zone, as shown in Table 1 (recall that the home goal is in zone 1B).

Table 1. Value given to each zone of the field

         A   B   C
Zone 1   2   0   2
Zone 2   6   5   6
Zone 3   8   9   8

The resulting value, P, is the first component of what we will call the "boost" of the strategy (the boost is a value indicating how much the fitness of the strategy should increase or decrease). The formula is

P = V(final) - V(initial)

where V(final) is the value associated with the final position of the ball and V(initial) is the value associated with its initial position.

The second component of the boost is what we will call the Bonus B of the strategy, presented in Table 2. The bonus represents the relationship between the previous and posterior game situations, thus abstracting completely from the actual position of the ball in the field.

Table 2. Bonus considering the previous and posterior game situations

Previous Situation   Posterior Situation   Bonus
Offensive            Offensive             +2
Defensive            Offensive             +1
Defensive            Defensive              0
Offensive            Defensive             -1

Using these two values, the formula for the boost is

Boost = k1 * P + k2 * B

where k1 and k2 are constants representing the importance of each value. Considering k1 = k2 = 1, the boost for the example presented previously will be Boost = 6 if we move to an offensive situation, while it will only be 3 if we move to a defensive one: in both cases P equals 4, but in the first case the bonus is +2 while in the second it is -1. At the final stage, the classifier ensures that the sum of all the fitness values for a specific situation equals 100, meaning 100%.
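As an illustration of the mechanism just described, the sketch below computes the boost from Tables 1 and 2 and applies it to the fitness of the chosen strategy; the renormalization to 100, the clamping of negative fitness values and all function names are assumptions made for this example.

    # Zone values from Table 1, keyed by (row, column).
    ZONE_VALUE = {
        ('1', 'A'): 2, ('1', 'B'): 0, ('1', 'C'): 2,
        ('2', 'A'): 6, ('2', 'B'): 5, ('2', 'C'): 6,
        ('3', 'A'): 8, ('3', 'B'): 9, ('3', 'C'): 8,
    }

    # Bonus from Table 2, keyed by (previous situation, posterior situation).
    BONUS = {
        ('offensive', 'offensive'): +2,
        ('defensive', 'offensive'): +1,
        ('defensive', 'defensive'):  0,
        ('offensive', 'defensive'): -1,
    }

    def boost(initial_zone, final_zone, previous, posterior, k1=1.0, k2=1.0):
        """Boost = k1 * P + k2 * B, with P the difference of zone values."""
        p = ZONE_VALUE[final_zone] - ZONE_VALUE[initial_zone]
        b = BONUS[(previous, posterior)]
        return k1 * p + k2 * b

    def update_fitness(fitness, chosen, boost_value):
        """Apply the boost to the chosen strategy, then renormalize so that the
        fitness values of all strategies for this situation sum to 100 (assumed)."""
        fitness[chosen] = max(fitness[chosen] + boost_value, 0.0)   # assumed clamp
        total = sum(fitness.values())
        if total > 0:
            for name in fitness:
                fitness[name] = 100.0 * fitness[name] / total
        return fitness

    # Example from the text: ball moves from zone 2B to 3B, offensive -> offensive.
    assert boost(('2', 'B'), ('3', 'B'), 'offensive', 'offensive') == 6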

2.3.2 Team

In a dynamic environment like a robot-soccer match it would be impossible to create a strategy for every possibility according to the actual locations of the robots. Therefore, the strategies were conceived to assign roles to the robots instead of specific actions. Although this solves one of the problems, it is also necessary that these roles be dynamically assigned to the physical robots. To address this last problem the Team role was implemented. This role is fundamental to the planning structure of the system; its job is to decide which physical robot is the most suitable to perform a given role. Roles must be requested in priority order, which means that the Team agent (the agent performing the Team role) understands the first request received as the most important role to assign. Depending on the role that was requested, different algorithms are used to choose the physical robot. For instance,




the closest robot to the ball is not necessarily the most suitable one; depending on its direction and speed, assigning this role to another robot could save time.
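As a rough sketch of this idea, the following fragment assigns the requested roles, in priority order, to the free robot with the lowest estimated time to reach the role's target point; the cost estimate and all names are hypothetical simplifications of the algorithms actually used.

    import math

    def time_to_reach(robot, target, max_speed=1.0):
        """Crude cost estimate: straight-line distance divided by maximum speed.
        A finer estimate could account for the robot's current heading and speed."""
        d = math.hypot(robot['x'] - target[0], robot['y'] - target[1])
        return d / max_speed

    def assign_roles(requests, robots):
        """requests: list of (role_name, target_point), ordered by priority.
        robots: dict robot_id -> {'x': ..., 'y': ...}.
        Returns a dict role_name -> robot_id; each robot is used at most once."""
        free = dict(robots)
        assignment = {}
        for role, target in requests:          # first request = highest priority
            if not free:
                break
            best = min(free, key=lambda rid: time_to_reach(free[rid], target))
            assignment[role] = best
            del free[best]
        return assignment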

2.3.3 STN

Simple Temporal Networks [8] have proved useful in planning and scheduling applications that involve quantitative time constraints because they allow fast checking of temporal consistency. A Simple Temporal Network (STN) is a graph in which the edges are labeled with upper and lower numerical bounds. The nodes in the graph represent timepoints, while the edges correspond to constraints on the durations between the events.

Mathematical Representation

Formally, an STN may be described as a 4-tuple (N, E, l, u), where N is a set of nodes, E is a set of edges, and l and u are functions mapping the edges into the extended real numbers; they give the lower and upper bounds of the interval of possible durations. Each STN is associated with a distance graph [8] derived from the upper and lower bound constraints. An STN is consistent if and only if the distance graph does not contain a negative cycle, which can be determined by a shortest-path propagation algorithm. To avoid confusion with edges in the distance graph, we will refer to edges in the STN as links [?].
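The consistency test mentioned above can be sketched as follows: build the distance graph from the links and run Bellman-Ford to detect a negative cycle. This is a generic illustration of the standard technique, not the propagation algorithm used in the actual system.

    def distance_graph(nodes, links):
        """links: list of (i, j, lower, upper), meaning lower <= t_j - t_i <= upper.
        Returns the edges of the associated distance graph: t_j - t_i <= upper
        becomes edge (i, j, upper) and t_j - t_i >= lower becomes (j, i, -lower)."""
        edges = []
        for i, j, lower, upper in links:
            edges.append((i, j, upper))
            edges.append((j, i, -lower))
        return edges

    def is_consistent(nodes, links):
        """Bellman-Ford negative-cycle detection on the distance graph."""
        edges = distance_graph(nodes, links)
        dist = {n: 0.0 for n in nodes}          # virtual source at distance 0
        for _ in range(len(nodes)):
            for i, j, w in edges:
                if dist[i] + w < dist[j]:
                    dist[j] = dist[i] + w
        # one more relaxation round: any improvement implies a negative cycle
        return all(dist[i] + w >= dist[j] for i, j, w in edges)

    # Example: two timepoints with 5 <= t_B - t_A <= 10 is consistent,
    # while additionally requiring 0 <= t_A - t_B <= 1 makes it inconsistent.
    assert is_consistent(['A', 'B'], [('A', 'B', 5, 10)])
    assert not is_consistent(['A', 'B'], [('A', 'B', 5, 10), ('B', 'A', 0, 1)])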





STN as a Strategy

After a strategy has been chosen, it can be directly executed. Several Plans and Strategies were implemented to test the functioning of the system. The STN works closely with the Team: as said, the Team (Section 2.3.2) creates a dynamic adaptation between the strategy represented in the STN and the physical robots. However, it is the STN that must execute the strategy itself and check that the temporal constraints are respected. In order to start its execution, in the first node the STN requests the physical robots from the Team and memorizes these values. Apart from this exception, the STN follows a modified algorithm based on the one presented by Muscettola and Morris [7, 12].

3 A Robot-Soccer Player Role

Each agent is considered to be a fully functional soccer player; consequently, it must be able to assume the different roles a player could perform in a soccer match. The actions are divided into two groups, the basic primitives and the roles. The latter are based on the former, in order to allow more complex collective behaviors.

3.1 Basic Primitives

As explained, the robots are able to interpret only the velocities that should be applied to each of their wheels. However, it is impossible to design complex behaviors without the use of higher-level functions. This is the goal of the basic primitives: they provide a layer of services to the Roles.

3.1.1 Linear Displacement

It is easy to imagine several situations where the robots must be able to position themselves as quickly as possible at a specific point in the field; a linear displacement primitive was developed for this purpose [?].

3.1.2 Parabolic Displacement

Although the linear displacement is the fastest way to arrive at a point, in certain situations it is more important to arrive at this point with a desired angle. This is the case for Make a Pass or Goal Kick: in these cases the robot must give the ball a direction, so the arrival angle is really important. The parabolic displacement was conceived for this purpose. Using available information, such as the robot's direction and the desired direction of the ball after impact, it is possible for the robot to describe a trajectory that lets it impact the ball with the correct angle.

Figure 3. Parabolic Displacement Schema

3.2 Roles


3.2.1 Goal Kick and Make a pass


These roles are actually two different ones if we consider them from the point of view of their goal. However, both use the same underlying idea to approach and impact the ball. From this point of view, we can see the Goal Kick role as a particular case of the pass in which the target point is fixed and known.


As shown in Figure 4, it is possible to divide a pass into three parts (as the Goal Kick follows the same pattern, we will only refer to the pass):

1. Approach phase, the first phase that the robot must perform.
2. Alignment phase, in which the robot tries to reach the right angle to impact the ball.
3. Shoot phase, in which the robot impacts the ball.

Figure 4. Ball Approach

The first phase must be performed as quickly as possible; its goal is to position the robot behind the ball, while also trying to simplify the alignment phase. The second phase aligns the robot, the ball and the target; the entry point is defined according to the robot's speed. The last phase rectifies the trajectory; it could be omitted, but it ensures the correct position of the robot. In this phase the robot also begins to accelerate in order to impact the ball. The rectification of the trajectory can be necessary in certain conditions, for example if the ball is moving at high speed. Two points, the "interception" point and the "placement" point, are calculated during the first phase. Notice that these points are not actually fixed: they move at the same speed as the ball. The situation is shown in Figure 4.
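The following sketch illustrates how the two moving points of the first phase could be computed for a moving ball; the constant-velocity prediction, the fixed offset behind the ball and all names are assumptions for illustration, not the actual computation used by the system.

    import math

    def approach_points(ball_pos, ball_vel, target, lookahead=0.5, offset=0.15):
        """Predict where the ball will be after `lookahead` seconds (the
        interception point) and place a point `offset` meters behind it, on the
        line from the target through the ball, so that arriving there leaves the
        robot aligned ball-target (the placement point). Both points move with
        the ball, so they are recomputed on every control cycle."""
        # interception point: simple constant-velocity prediction of the ball
        ix = ball_pos[0] + ball_vel[0] * lookahead
        iy = ball_pos[1] + ball_vel[1] * lookahead
        # unit vector from the target towards the (predicted) ball position
        dx, dy = ix - target[0], iy - target[1]
        norm = math.hypot(dx, dy) or 1.0
        ux, uy = dx / norm, dy / norm
        # placement point: behind the ball with respect to the target
        px, py = ix + ux * offset, iy + uy * offset
        return (ix, iy), (px, py)

    # Example: ball at (1.0, 0.5) moving along +x, target at the opponent goal (2.2, 0.0)
    interception, placement = approach_points((1.0, 0.5), (0.2, 0.0), (2.2, 0.0))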

3.3 Defend the Goal

This is one of the vital functions a player must be able to perform in order to provide a high level of security in a defensive situation. Two different types of defense were implemented. The robots performing these roles can count on the help of at least one robot in Defensive Support (explained later on).

3.3.1 Close

The robot provides a close defense, positioning itself on the uncovered side of the goal using the goal keeper as a reference. The basic idea behind this behavior is to eliminate open areas. The robot describes an arc, acting as a "second goal keeper"; this way, if the ball crosses the field, a robot will be in a defensive position blocking an opponent player.

3.3.2 Far

The robot tries to position itself between the home goal and the ball in order to provide a protection cone. The defender respects a certain distance until it is able to push the ball away from the opponent player. This action will cause the game situation to change, and the decision system will then find the best strategy.

3.4 Goal Keeper

The goal keeper presents three different behaviors according to the situation:

1. Following the y-coordinate of the ball.
2. Using the ball and the posts as reference.
3. Clearing the ball from the goal zone.

The first two behaviors are explained under the Classic behavior, while the third is explained under the Path behavior.

3.4.1 Classic behavior

When the ball is on our side of the field the goal keeper must assume a position that ensures a quick response. Figure 5 shows a schema that helps follow the idea: the goal keeper is positioned so that it stays at an acceptable distance from both posts. Although this provides good coverage of the home goal, it consumes an important amount of processor time. So, when the ball is on the opponent's side of the field, the goal keeper simply uses the y-coordinate of the ball as a reference; this minimizes processor use when the situation is not critical.

Figure 5. Goal Keeper's Classical Behaviour

3.4.2 Path behavior

The FIRA rules specify that the ball must be cleared from the goal area within ten seconds, otherwise the team will be charged with a penalty. To avoid this situation, the goal keeper pushes the ball outside its area as soon as the ball enters the path. This also reduces the duration of extremely dangerous situations.

Using Fuzzy Logic

The basic idea is to use potentials to position the goal keeper and the close defender. The ball's direction is used to estimate its arrival point, where a negative potential is produced. Then, using fuzzy-logic principles, the goal keeper and the defender compensate this potential.
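To make the classic behavior of Section 3.4.1 concrete, the sketch below switches between the cheap y-tracking mode and the ball-and-posts mode depending on which side of the field the ball is on; the geometry values and the blending rule used when the ball is on our side are purely illustrative assumptions.

    def goalkeeper_target(ball, posts_y=(-0.2, 0.2), goal_line_x=-1.05, half_line_x=0.0):
        """Return a target (x, y) for the goal keeper on its goal line.
        Opponent's side: follow the ball's y-coordinate, clamped between the posts.
        Our side: stay between the posts while biasing towards the ball."""
        low, high = posts_y
        clamped_y = min(max(ball[1], low), high)
        if ball[0] > half_line_x:                   # ball on the opponent's side
            y = clamped_y                           # cheap mode: just track y
        else:                                       # ball on our side: keep an
            center = (low + high) / 2.0             # acceptable distance from both
            y = 0.5 * center + 0.5 * clamped_y      # posts, shifted towards the ball
        return (goal_line_x, y)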


3.5 Pass reception

This role is based on a simple idea: to maintain the same distance between the robot and its destination point as between the ball and the reception point. However, several cases must be considered in order to abort this behavior if necessary. Aborting the normal behavior could be necessary if, for example, the ball is intercepted by an opponent or its direction is not as expected. Several tests were performed on the real system and in the simulator to ensure that the robot behaves as expected even in the most improbable situations. The receiver will change its ideal trajectory (Figure 6) if the ball slows down, putting the correct arrival at risk. This role implements a simpler version of the algorithm used by the Stimulus Coder to detect collisions or direction changes.

Figure 6. Receive Pass
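As a sketch of the reception rule just described, the fragment below derives a speed command so that the robot and the ball close their remaining distances at the same rate and arrive together; the proportional rule, the names and the constant-speed assumption for the ball are illustrative only.

    import math

    def reception_speed(robot, destination, ball_pos, ball_vel, reception_point, max_speed=1.0):
        """Command a speed so that the robot reaches its destination roughly when
        the ball reaches the reception point."""
        d_robot = math.hypot(destination[0] - robot[0], destination[1] - robot[1])
        d_ball = math.hypot(reception_point[0] - ball_pos[0],
                            reception_point[1] - ball_pos[1])
        ball_speed = math.hypot(ball_vel[0], ball_vel[1])
        if d_ball <= 1e-6 or ball_speed <= 1e-6:
            return max_speed          # ball already there or stopped: hurry up
        t = d_ball / ball_speed       # time the ball needs, at constant speed
        return min(d_robot / t, max_speed)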

3.6 Defensive support

This role was conceived to provide additional support in defensive situations. As with other roles, the defensive support can be performed on the left, center or right side of the field. Normally, an agent will be in right defensive support when the ball is on the left side of the field.

3.7 Offensive support

In the case of a missed goal kick, it is really important to have a home player that can recover the ball, if necessary, or try to score, if possible. As with the previous roles, the robot performing this role will normally be on the opposite side of the field. The role was implemented respecting the MiroSot and SimuroSot rules, which clearly prohibit entering the opponent's goal zone with more than one robot. The robot will therefore wait outside the goal zone until the ball leaves this area or its role changes, for example to Goal Kick.


4 Conclusion

The architecture presented is capable of adapting generic strategies to specific game situations. At the same time, Simple Temporal Networks proved to be an effective coordination method for multi-agent systems. At present, other approaches are also being tested, such as purely reactive systems and an improvement of the STNs.


References

[1] J. Ferber. Les Systèmes Multi-Agents : Vers une Intelligence Collective. InterEditions, 1995.

[2] R. A. Brooks. "A Robot that Walks; Emergent Behavior from a Carefully Evolved Network". Neural Computation, 1:2, Summer 1989, pp. 253-262. Also in IEEE International Conference on Robotics and Automation, Scottsdale, AZ, May 1989, pp. 292-296.

3.5 Pass reception This Role is based in a simple idea, to maintain the same distance between the robot and the destination point, and the ball and reception point. However, several cases must be consider in order to abort this behavior if necessary. Aborting the normal behavior could be necessary if, for example, the ball is intercepted by an opponent or the direction is not as expected. Several test were performed in the real system and in the simulator to ensure that the robot will behave as expected even in the more unprovable situations. The receptor will change his ideal trajectory (Figure ??) if the ball slows down risking the correct arrival. This role implements a simpler version of the algorithm used by the stimulus coder to detect collision or direction modifications.

[3] S. Rodriguez. Coordinating Multi-Agent Systems in Dynamic Environments. Master Thesis, 2002.

[4] S. Bussmann and Y. Demazeau. An agent model combining reactive and cognitive capabilities. In Proc. of the IEEE International Conference on Intelligent Robots and Systems (IROS '94), München, 1994.

[5] P. Riley and M. Veloso. Coaching a Simulated Soccer Team by Opponent Model Recognition. In Proceedings of the Fifth International Conference on Autonomous Agents (Agents-2001), 2001.

[6] P. Riley and M. Veloso. Planning for Distributed Execution Through Use of Probabilistic Opponent Models. IJCAI-2001 Workshop PRO-2: Planning under Uncertainty and Incomplete Information, 2001.

[7] N. Muscettola and P. Morris. Execution of temporal plans with uncertainty. In AAAI-2000, pages 491-496. AAAI Press / The MIT Press, 2000.

[8] R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artificial Intelligence, 49:61-95, 1991.

[9] E. Schwalb and R. Dechter. Processing Disjunctions in Temporal Constraint Networks. Artificial Intelligence, 1995.

[10] E. Santos Jr. and J. D. Young. Probabilistic Temporal Networks. Department of Electrical and Computer Engineering, Air Force Institute of Technology, 1996.

[11] Vidal and Bidot. Dynamic Sequencing of Tasks in Simple Temporal Networks with Uncertainty. CUW'01 Constraints and Uncertainty, p. 39, 2001.

[12] P. Morris and N. Muscettola. Managing temporal uncertainty through waypoint controllability. In Proc. of the Sixteenth Int. Joint Conf. on Artificial Intelligence (IJCAI-99), 1999.
