Report on: Optimizing the layout of phylogenetic graphs
Philippe Gambette
3rd May 2005

Abstract
The most common evolutionary model is the phylogenetic tree, in which molecular sequences evolve from a common ancestor through point mutations. However, this model cannot take recombination events into account. This is why phylogenetic networks and splitsgraphs were introduced. These graphs can be created with the program SplitsTree [2]. We have improved the layout of the graphs produced by this program, which were previously drawn with the equal angle algorithm [4]. The method has been implemented in SplitsTree4.

1 Pre-optimizing the layout of the graph

1.1 The taxa circle

To illustrate the equal angle algorithm, we can represent the splits on a taxa circle on which the taxa are placed at regular intervals.

Figure 1: A splitsgraph and its associated taxa circle

We can therefore try to improve the layout of small graphs by changing the positions of P1, …, Pn on the taxa circle. A sufficient condition for the graph to remain planar is to keep the order of the ti (the taxa), and to keep each Pi halfway between its two neighbouring taxa:

1. $(\overrightarrow{Ot_i}, \overrightarrow{OP_i}) = (\overrightarrow{OP_i}, \overrightarrow{Ot_{i+1}})$ for $i < n$, and $(\overrightarrow{Ot_n}, \overrightarrow{OP_n}) = (\overrightarrow{OP_n}, \overrightarrow{Ot_0})$
2. $(\overrightarrow{Ot_0}, \overrightarrow{Ot_1}) < (\overrightarrow{Ot_1}, \overrightarrow{Ot_2}) < \dots < (\overrightarrow{Ot_{n-1}}, \overrightarrow{Ot_n})$

By moving the taxa according to these conditions, we can improve the splitsgraph of figure 1 to obtain the one shown in figure 2.
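The placement rule for the Pi, and the requirement that the taxa keep their order, can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the SplitsTree4 code: taxon positions are given as angles in radians around O, listed in circle order t1, …, tn; each Pi is placed halfway (counter-clockwise) between ti and the next taxon, with Pn wrapping around to the first one; and "keeping the order" is checked by requiring every counter-clockwise gap to be positive and the gaps to add up to exactly one turn. The function names are hypothetical.

```python
import math
from typing import List, Sequence

TWO_PI = 2 * math.pi


def split_point_angles(taxa_angles: Sequence[float]) -> List[float]:
    """Angle of each P_i, halfway (counter-clockwise) from t_i to the next taxon.

    taxa_angles: angles (radians, around O) of t_1, ..., t_n in circle order.
    P_n wraps around and lies halfway between t_n and the first taxon.
    """
    n = len(taxa_angles)
    points = []
    for i in range(n):
        a = taxa_angles[i]
        # Counter-clockwise gap from t_i to t_{i+1} (wrapping at the end).
        gap = (taxa_angles[(i + 1) % n] - a) % TWO_PI
        points.append((a + gap / 2) % TWO_PI)
    return points


def keeps_order(taxa_angles: Sequence[float]) -> bool:
    """True if the taxa are still in their original cyclic order.

    Every counter-clockwise gap must be positive and the gaps must add up
    to exactly one turn (a swap makes them add up to two or more turns).
    """
    n = len(taxa_angles)
    gaps = [(taxa_angles[(i + 1) % n] - taxa_angles[i]) % TWO_PI for i in range(n)]
    return all(g > 0 for g in gaps) and math.isclose(sum(gaps), TWO_PI)
```

For four equally spaced taxa at 0, π/2, π and 3π/2, the points P1, …, P4 land at π/4, 3π/4, 5π/4 and 7π/4.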

Figure 2: After pre-optimization

1.2 Optimizing the taxa circle

We will move the taxa on the taxa circle to improve the layout of the graph, but we first need a criterion to evaluate how well the graph is drawn. Let us define a box as the parallelogram created by two incompatible splits. The area of a box is then easily computed from the weights of its two splits and one angle. For example, for Box 2 in figure 3:

$$\text{Area}(\text{Box 2}) = w_{\{2,3,4,5\}|\{6,7,8,1\}} \, w_{\{4,5,6,7,8\}|\{1,2,3\}} \, \sin a$$
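The box-area criterion can be illustrated with a short sketch. The assumptions here are ours: a split is represented by the frozenset of taxa on one side, each split has a single edge direction given in radians, the box angle a is taken to be the angle between the two split directions (so sin a = |sin(angle1 − angle2)|), and, for illustration only, every pair of incompatible splits is assumed to contribute exactly one box. Two splits A|Ā and B|B̄ are incompatible when all four intersections A∩B, A∩B̄, Ā∩B, Ā∩B̄ are non-empty. All names are hypothetical.

```python
import math
from typing import FrozenSet, Sequence


def incompatible(a: FrozenSet[int], b: FrozenSet[int], taxa: FrozenSet[int]) -> bool:
    """Splits a|taxa-a and b|taxa-b are incompatible iff all four intersections are non-empty."""
    a_bar, b_bar = taxa - a, taxa - b
    return all(x & y for x in (a, a_bar) for y in (b, b_bar))


def box_area(w1: float, angle1: float, w2: float, angle2: float) -> float:
    """Area of the parallelogram spanned by two split edges: w1 * w2 * sin(a)."""
    return w1 * w2 * abs(math.sin(angle1 - angle2))


def total_box_area(splits: Sequence[FrozenSet[int]], weights: Sequence[float],
                   angles: Sequence[float], taxa: FrozenSet[int]) -> float:
    """Sum of box areas, assuming one box per pair of incompatible splits."""
    total = 0.0
    for i in range(len(splits)):
        for j in range(i + 1, len(splits)):
            if incompatible(splits[i], splits[j], taxa):
                total += box_area(weights[i], angles[i], weights[j], angles[j])
    return total
```

For instance, the splits {2,3,4,5}|{6,7,8,1} and {4,5,6,7,8}|{1,2,3} of figure 3 are incompatible, so with weights 2 and 3 at right angles they span a box of area 6.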


Figure 3: The criterion to optimize: the area of the boxes

The splitsgraph will look nice if the total area of the boxes is maximized. Our algorithm therefore considers each taxon in turn and tries to move it halfway towards each of its two neighbouring taxa. Then the angle of each split (the middle angle between the two surrounding taxa) and the total area of the boxes are recomputed. If the total area increases, the new position is kept, and if the total area is also the best found so far, the positions of all taxa are stored. If the total area does not increase, the new position is kept only with probability p. This loop over all the taxa is repeated n times. We ran tests on a small graph to choose the values of the parameters p and n; the results are shown in figure 4.

This algorithm has a complexity of O(number of taxa × number of splits), and it does not improve big graphs: as there are many taxa on the taxa circle, the angles do not change much. Moreover, as it only considers the boxes, it moves the edges leading to the leaves without taking them into account. Applying the daylight algorithm after this pre-optimization corrects this problem. Nevertheless, this pre-optimization works quite well for small graphs (figure 5).

Figure 4: Choice of the parameters p and n. First, we set n = 500 and run the algorithm 3 times; the best areas obtained are plotted on the left graph, which identifies p = 0.8 as a good value. Then we set p = 0.8 and run the algorithm again for different values of n: choosing n = 500 is sufficient.

Figure 5: Example of results obtained by the pre-optimization algorithm.
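The randomized search of this section can be sketched as follows. It is a hedged illustration, not the SplitsTree4 implementation. `optimize_taxa_circle` and its arguments are hypothetical names. `total_area` stands for any function that returns the total box area of a placement of the taxa. We read "move it halfway to its two neighbouring taxa" as trying both the position halfway towards the previous taxon and the position halfway towards the next one. Because a taxon never moves past the neighbour it is moving towards, the cyclic order of the taxa is preserved.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

TWO_PI = 2 * math.pi


def optimize_taxa_circle(
    angles: Sequence[float],
    total_area: Callable[[List[float]], float],
    p: float = 0.8,
    n: int = 500,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Randomized local search over taxon angles (hypothetical sketch).

    A move that increases the total area is always kept; any other move is
    kept with probability p. The best placement seen so far is returned.
    """
    rng = rng or random.Random(0)
    current = list(angles)
    current_area = total_area(current)
    best, best_area = list(current), current_area
    k = len(current)
    for _ in range(n):                       # n passes over all the taxa
        for i in range(k):
            for direction in (-1, 1):        # towards the previous, then the next taxon
                if direction < 0:
                    gap = (current[i] - current[(i - 1) % k]) % TWO_PI  # clockwise gap
                else:
                    gap = (current[(i + 1) % k] - current[i]) % TWO_PI  # counter-clockwise gap
                candidate = list(current)
                candidate[i] = (current[i] + direction * gap / 2) % TWO_PI
                area = total_area(candidate)
                if area > current_area or rng.random() < p:
                    current, current_area = candidate, area
                    if area > best_area:     # best total area found so far
                        best, best_area = list(candidate), area
    return best, best_area
```

The gaps are recomputed before each move, so that the second move of a taxon starts from its updated position. Because `best_area` is never lower than `current_area`, storing the best placement only when an improving move is accepted matches the rule described above.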

2 Optimizing the layout of the graph

2.1 Two critical angles

Once the pre-optimization is over, the angles of the splits are assigned to their edges and the nodes can be placed: the graph can be drawn. We can then try to optimize the layout by automatically doing what a SplitsTree user would do manually, namely changing the angles of the splits to get a nicer graph. Our algorithm iterates the following loop over all the splits of the graph: for each split, we change its angle so as to maximize the area of the split. For the graph to remain planar, the angle of a split S cannot take arbitrary values: it must stay between two critical angles $\alpha_c(S)$ and $\alpha_t(S)$, as shown in figure 6, where $a_0$ is the current angle of S and the $a_i$ are the angles of the edges i:

$$\alpha_t(S) = a_0 + \min_{\text{edges } i \text{ of } S} \{(a_i - a_0) \bmod 2\pi\}$$

$$\alpha_c(S) = a_0 - \min_{\text{edges } i \text{ of } S} \{(a_0 - a_i) \bmod 2\pi\}$$

Figure 6: Example of a critical situation when the split angle reaches αc (here, a6 = αc).
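The two critical angles can be evaluated directly from the formulas. In this minimal sketch, `critical_angles` is a hypothetical helper. We assume that `edge_angles` lists the angles ai of the edges over which the minimum is taken, and that it excludes edges parallel to the split itself, since an edge with ai = a0 would make both minima zero.

```python
import math
from typing import Iterable, Tuple

TWO_PI = 2 * math.pi


def critical_angles(a0: float, edge_angles: Iterable[float]) -> Tuple[float, float]:
    """Return (alpha_c, alpha_t) for a split whose current angle is a0.

    edge_angles: angles a_i of the edges the split must not cross
    (assumed to exclude edges whose angle equals a0).
    """
    angles = list(edge_angles)
    up = min((ai - a0) % TWO_PI for ai in angles)    # smallest counter-clockwise turn to an edge
    down = min((a0 - ai) % TWO_PI for ai in angles)  # smallest clockwise turn to an edge
    return a0 - down, a0 + up
```

With a0 = 1.0 and edges at 0.5, 1.7 and 3.0 radians, the split angle may vary between αc = 0.5 and αt = 1.7.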

2.2 Detecting collisions

When we change the angle a0 of a split S, we can consider that one part of the splitsgraph stays in place while the other part, across the split, is translated. This movement can create a collision if two edges of the graph end up intersecting. We want to detect these collisions efficiently and to determine by how much we can change the angle of the split. Collisions can occur on the left or on the right of S, when we increase or decrease the angle of the split, so we run a similar algorithm on 4 points: the extreme nodes of the split. Let E be the node in the lower-right corner of S and E′ the node in the upper-right corner. S splits the graph into two parts, so, as shown in figure 7, we visit each node of the lower part (resp. upper part) to find the node P1 such that the angle $(\overrightarrow{EP_1}, \overrightarrow{EE'})$ is minimal (resp. the node P2 such that $(\overrightarrow{EP_2}, \overrightarrow{EE'})$ is maximal). We call P1 the defender and P2 the striker. In general, if we decrease the angle of the split by less than $(\overrightarrow{EP_1}, \overrightarrow{EP_2})$, nothing will collide. However, there are some rare cases where a collision does occur; we identify them later.

This method is quite fast: to identify the striker and the defender, we need to go through all the nodes N of the graph, but we compute only one angle per node, $(\overrightarrow{EN}, \overrightarrow{EE'})$. When we run the algorithm on the 4 extreme nodes, we get 4 new critical angles, which are compared with the previously found αt and αc and may replace them. Then we can compute the area of the split and choose the angle between αc and αt that maximizes this area.

Figure 7: Example of a critical situation where we try to avoid a collision on the right of the split.

Note that when we change the angle of the split, the strikers and defenders may change: for example, in figure 7, the striker is P2 in the left picture, but in the right picture it is no longer P2′, and E′′ then gives the optimal angle. So, after optimizing the angles of all the splits of the graph once, some split angles may still be improvable.

As mentioned above, this method of identifying the critical angles to avoid collisions does not always work. However, it only fails in cases that never occur if the layout of the graph being optimized was computed by the equal angle algorithm. We therefore identify a zone such that collisions occur if it contains nodes of the graph. To compute the border of this zone, we again consider only one of the 4 extreme nodes of the split, and identify the nodes P2 such that, if we decrease the angle α of the split by $\theta = (\overrightarrow{EP_1}, \overrightarrow{EP_2})$, then $P_2 \in (EP_1)$ (the striker lies on the straight line through the extreme node and the defender).
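The defender/striker search for one extreme node can be sketched as below. It is a hedged illustration, not the SplitsTree4 code. Nodes are assumed to be plain (x, y) tuples, and oriented angles are taken counter-clockwise positive in [−π, π] via `math.atan2`. The two parts of the graph are passed in as lists, and the function names are hypothetical. The sketch shows why the search is linear in the number of nodes: only the angle (EN, EE′) is computed for each node N. The rare failure cases discussed above are not handled.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def vec(a: Point, b: Point) -> Point:
    """Vector from a to b."""
    return (b[0] - a[0], b[1] - a[1])


def oriented_angle(u: Point, v: Point) -> float:
    """Oriented angle (u, v) in [-pi, pi], counter-clockwise positive."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.atan2(cross, dot)


def defender_and_striker(e: Point, e_prime: Point,
                         lower: Sequence[Point], upper: Sequence[Point]):
    """Return (P1, P2, theta) for the extreme node E (hypothetical helper).

    P1 (defender): node of the lower part minimizing (EP1, EE').
    P2 (striker): node of the upper part maximizing (EP2, EE').
    theta = (EP1, EP2): in general, decreasing the split angle by less
    than theta creates no collision.
    """
    ee = vec(e, e_prime)

    def angle_to_ee(n: Point) -> float:
        return oriented_angle(vec(e, n), ee)  # the single angle computed per node

    p1 = min((n for n in lower if n != e), key=angle_to_ee)
    p2 = max((n for n in upper if n != e), key=angle_to_ee)
    theta = oriented_angle(vec(e, p1), vec(e, p2))
    return p1, p2, theta
```

Running this helper on each of the 4 extreme nodes yields the 4 candidate critical angles that are then compared with αc and αt.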

Figure 8: Critical conditions to decrease the angle α of a split.

In this case, as shown in figure 8, with l = EE′ and R = EP2, we have $R \sin\theta = l(\sin\alpha - \sin(\alpha - \theta))$, which gives the equation of the border of the zone:

$$R = \frac{l\,(\sin\alpha - \sin(\alpha - \theta))}{\sin\theta}$$
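The border equation can be evaluated numerically, and it also simplifies. By the sum-to-product identity, sin α − sin(α − θ) = 2 cos(α − θ/2) sin(θ/2), and sin θ = 2 sin(θ/2) cos(θ/2). Hence R = l cos(α − θ/2) / cos(θ/2) for 0 < θ < π, which also shows that R tends to l cos α as θ tends to 0. The sketch below, with hypothetical function names, computes both forms so that they can be checked against each other.

```python
import math


def zone_border_radius(l: float, alpha: float, theta: float) -> float:
    """R(theta) = l (sin(alpha) - sin(alpha - theta)) / sin(theta), for 0 < theta < pi."""
    return l * (math.sin(alpha) - math.sin(alpha - theta)) / math.sin(theta)


def zone_border_radius_simplified(l: float, alpha: float, theta: float) -> float:
    """Equivalent form obtained with the sum-to-product identity:
    R(theta) = l cos(alpha - theta/2) / cos(theta/2).
    It stays well defined as theta -> 0, where R -> l cos(alpha).
    """
    return l * math.cos(alpha - theta / 2) / math.cos(theta / 2)
```

For example, with l = 1 and α = θ = π/2, both forms give R = 1.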

Some examples of these zones, depending on the angle α, are shown in figure 9.

Figure 9: Examples of zones that must not contain nodes for our optimization algorithm to work properly, depending on the angle α.

3 Conclusion

Our optimization algorithm works very well and is quite fast, as shown in figure 10. However, it only works on planar graphs, so it still has to be integrated correctly with the convex hull algorithm, which produces non-planar graphs.

References
[1] A.W.M. Dress and D.H. Huson: Constructing splits graphs, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), volume 1(3), 2004, 109-115.
[2] Daniel H. Huson and David Bryant: Estimating phylogenetic trees and networks using SplitsTree4, in preparation.
[3] Daniel H. Huson, Tobias Dezulian, Tobias Klöpper, Mike A. Steel: Phylogenetic Super-Networks from Partial Trees, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), volume 1(4), 2004, 151-158.
[4] Joseph Felsenstein: Inferring Phylogenies, Sinauer, August 2003, Ch. 34.

Figure 10: Results obtained with the optimization algorithm. a) Original splitsgraph. b) After 3 iterations of the optimization loop (approx. 30 seconds on a Pentium 4 2.6 GHz). c) After 30 iterations of the optimization loop and one run of the daylight algorithm to improve the edges leading to the leaves (approx. 5 minutes).