Reconstructing the duplication history of tandemly repeated genes
Olivier Elemento (1,2), Olivier Gascuel (1), Marie-Paule Lefranc (2)
(1) LIRMM Montpellier, Méthodes et Algorithmes pour l’Analyse de Séquences
(2) LIGM, IMGT, the international ImMunoGeneTics database, http://imgt.cines.fr

Olivier Elemento, [email protected]

Slide 1

1. Introduction
2. Mathematical model
3. Reconstructing duplication trees
4. Experimental results
5. Perspectives

Slide 2

1. Introduction

Tandemly repeated sequences
– two or more adjacent copies of a stretch of DNA
– they exist in several forms: microsatellites (neurodegenerative diseases), minisatellites, larger sequences (genes)

Slide 4

1. Introduction

Tandemly repeated genes
Example: the human TRGV locus
– it contains 9 adjacent copies of the same gene
– each of them is 4-5 kb long
– they share between 85% and 97% sequence identity

[Figure: DNA (chromosome 7) carrying the genes V1 V2 V3 V4 V5 V5P V6 V7 V8 in this order]

Slide 5

1. Introduction

Recombination

[Figure: recombination, with two boxes labelled "gene"]

Slide 8

1. Introduction

Unequal recombination (step 1)

[Figure: unequal recombination, with two boxes labelled "gene"]

The initial duplication is caused by the presence of short repeated sequences.

Slide 11

1. Introduction

Unequal recombination (step 2)

[Figure: unequal recombination between two strands that each carry copy 1 and copy 2]

Having several copies of the same sequence favors additional duplications.

Slide 14

1. Introduction

Unequal recombination (step 3)

[Figure: unequal recombination between two strands that each carry copy 1, copy 2 and copy 3]

“Block” duplication, i.e. simultaneous duplication of several copies.

Slide 17

1. Introduction

Preliminary hypotheses
– unequal recombination is the sole generating mechanism
– there were no gene conversions
– there were “no gene deletions”

Slide 20

2. Mathematical model

The duplication events
– 1-duplication: the copy 1 becomes 1’ 1’’
– 2-duplication: the adjacent copies 1 2 become 1’ 2’ 1’’ 2’’
– n-duplication: the adjacent copies 1 2 ... n become 1’ 2’ ... n’ 1’’ 2’’ ... n’’

Slide 29
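The three duplication events above can be mimicked on an ordered list of copies. This is only a minimal sketch: the function name `duplicate` and the use of strings with prime marks are assumptions made for illustration, not notation taken from the talk.

```python
def duplicate(copies, start, k):
    """Apply a k-duplication to the block copies[start:start + k].

    Returns a new list in which the block is replaced by its primed
    copies followed by its double-primed copies.
    """
    if k < 1 or start < 0 or start + k > len(copies):
        raise ValueError("the duplicated block must lie inside the locus")
    block = copies[start:start + k]
    first = [c + "'" for c in block]    # 1' 2' ... k'
    second = [c + "''" for c in block]  # 1'' 2'' ... k''
    return copies[:start] + first + second + copies[start + k:]


locus = ["1", "2", "3"]
after_one = duplicate(locus, 0, 1)  # 1-duplication of copy 1
after_two = duplicate(locus, 0, 2)  # 2-duplication of copies 1 and 2
```

The only point of the sketch is the ordering: the two images of a duplicated block stay adjacent, and the rest of the locus is untouched.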

2. Mathematical model

Time-valued duplication history (reality)
– it implies a rooted phylogeny
– its taxa are ordered
– its branches are time valued
– the root is situated between the most distant taxa

[Figure: rooted tree drawn along a time axis, with leaves in the order 1 2 3 4 5 6 7 8 9]

Slide 41

2. Mathematical model

Duplication tree (what can be inferred)
– it is an unrooted phylogeny
– its taxa are ordered
– its branches are mutation rate-valued
– its topology is compatible with at least one duplication history
– the root is situated somewhere in the tree between the most distant taxa

[Figure: unrooted tree on leaves 1 to 9, with the potential roots marked]

Slide 46

2. Mathematical model

Ordinal duplication history
– obtained when rooting a duplication tree
– it is the topological version of the time-valued duplication history
– it is a rooted phylogeny
– its taxa are ordered
– its branch lengths have no special meaning
– the duplication events are partially ordered

[Figure: rooted tree with leaves in the order 1 2 3 4 5 6 7 8 9 and duplication events labelled a, b, c]

Slide 52

2. Mathematical model

Not all phylogenies are duplication trees

[Figure: unrooted phylogeny on leaves 1 to 5 with candidate roots (a), (b), (c), (d), and the rooted tree obtained from root (a) on the ordered leaves 1 2 3 4 5]

2 and 5 are not adjacent!

Slide 55

2. Mathematical model

Not all potential roots lead to correct ordinal duplication histories

[Figure: unrooted tree on leaves 1 to 5 with potential roots (a), (b), (c), and the rooted tree obtained from root (c) on the ordered leaves 1 2 3 4 5]

Root (c): correct ordinal duplication history

Slide 56

2. Mathematical model

Not all potential roots lead to correct ordinal duplication histories

[Figure: the same unrooted tree with potential roots (a), (b), (c), and the rooted tree obtained from root (a) on the ordered leaves 1 2 3 4 5]

Root (a): incorrect ordinal duplication history

Slide 57

2. Mathematical model

Definition
A phylogeny is a duplication tree if at least one of its potential roots leads to a correct ordinal duplication history.

Slide 58

2. Mathematical model

The PDT algorithm
– it takes as input a rooted phylogeny with ordered leaves
– it recursively agglomerates each terminal pair belonging to correct duplication events
– it stops and returns:
  (true) when the tree has been reduced to its root
  (false) when it cannot go further

Slide 61

2. Mathematical model

The PDT algorithm
– we apply the PDT algorithm to each potential root of the considered phylogeny
– if PDT returns “true” at least once, the phylogeny is a duplication tree

Slide 63
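The PDT procedure described above can be sketched as follows. This is an illustration, not the authors' implementation. It rests on these assumptions:
– a rooted binary tree is encoded as nested 2-tuples with distinct non-tuple leaf labels;
– a "correct duplication event" is read as a run of 2k adjacent terminal nodes in which node j and node j + k are siblings;
– the sketch collapses the first such event it finds and never backtracks, which is assumed here to be sufficient.
The names `pdt`, `_find_event` and `is_duplication_tree` are hypothetical.

```python
def _find_event(current, parent):
    """Return (start, k) for a run of 2k adjacent terminal nodes whose
    j-th and (j + k)-th members are siblings, or None if there is none."""
    n = len(current)
    for k in range(1, n // 2 + 1):
        for start in range(n - 2 * k + 1):
            if all(
                parent.get(current[start + j]) is not None
                and parent.get(current[start + j]) == parent.get(current[start + k + j])
                for j in range(k)
            ):
                return start, k
    return None


def pdt(tree, order):
    """Return True if `tree` can be reduced to its root.

    `tree` is a rooted binary tree written as nested 2-tuples;
    `order` lists its leaves from left to right along the locus.
    """
    # Map every non-root node to its parent.
    parent = {}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                stack.append(child)

    current = list(order)  # terminal nodes, in locus order
    while len(current) > 1:
        event = _find_event(current, parent)
        if event is None:
            return False  # cannot go further
        start, k = event
        # Replace the 2k terminal nodes by their k parents, keeping the order.
        current[start:start + 2 * k] = [parent[current[start + j]] for j in range(k)]
    return current[0] == tree


def is_duplication_tree(rootings, order):
    """True if at least one candidate rooting passes PDT."""
    return any(pdt(rooted, order) for rooted in rootings)
```

For instance, `pdt(((1, 3), (2, 4)), [1, 2, 3, 4])` succeeds through a single 2-duplication, whereas `pdt(((1, 3), 2), [1, 2, 3])` fails because 1 and 3 are siblings but not adjacent. Listing the rootings of an unrooted tree is left to the caller in this sketch.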

2. Mathematical model

The PDT algorithm

[Figure: step-by-step run of PDT on a rooted tree with internal nodes a to h and leaves 1 to 9; terminal pairs are agglomerated until only the root a remains]

true!

Slide 76

2. Mathematical model

The PDT algorithm

[Figure: step-by-step run of PDT on a second rooted tree with internal nodes a to h and leaves 1 to 9; the reduction stops while the terminal nodes 4, 5, 6, 7 and h are still present]

false! 7 is between 6 and h

Slide 84

2. Mathematical model

Counting duplication trees
– we used PDT to count (or estimate) the number of duplication trees
– the number of duplication trees is far smaller than the number of distinct phylogenies
– the number of phylogenies grows faster than the number of duplication trees as the number of taxa increases

Slide 87
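For scale, the number of distinct binary phylogenies on n labelled taxa has a standard closed form: (2n - 5)!! unrooted trees for n >= 3 and (2n - 3)!! rooted trees for n >= 2. The sketch below only evaluates these textbook formulas. It does not count duplication trees, and the function names are chosen for illustration.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 or 2 (1 when m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def count_unrooted_phylogenies(n):
    """Number of unrooted binary trees on n labelled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("n must be at least 3")
    return double_factorial(2 * n - 5)


def count_rooted_phylogenies(n):
    """Number of rooted binary trees on n labelled leaves: (2n - 3)!!."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return double_factorial(2 * n - 3)


unrooted_9 = count_unrooted_phylogenies(9)  # 13!! = 135135
rooted_9 = count_rooted_phylogenies(9)      # 15!! = 2027025
```

With 9 taxa, as in the TRGV locus, there are 135135 unrooted phylogenies.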

3. Reconstructing duplication trees

Reconstructing duplication trees
– the goal is to reconstruct the optimal duplication tree(s) from a given set of aligned and ordered DNA sequences
– we use an exhaustive search approach
– we assess the optimality of the reconstruction using a parsimony criterion

Slide 90
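The slides do not say which parsimony computation is used. As one common possibility, the sketch below scores a rooted binary tree with Fitch's small-parsimony algorithm, which counts the minimum number of substitutions needed to explain the aligned sequences on that tree. The choice of Fitch, the nested-tuple tree encoding and the name `fitch_score` are assumptions. Gap characters are simply treated as one more state.

```python
def _fitch_site(node, sequences, site):
    """Return (candidate state set, substitution count) for one alignment column."""
    if not isinstance(node, tuple):
        return {sequences[node][site]}, 0
    left_set, left_cost = _fitch_site(node[0], sequences, site)
    right_set, right_cost = _fitch_site(node[1], sequences, site)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: keep both options and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree, sequences):
    """Parsimony score of `tree` (nested 2-tuples of leaf labels) given
    `sequences`, a dict mapping each leaf label to its aligned sequence."""
    lengths = {len(s) for s in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    return sum(_fitch_site(tree, sequences, site)[1] for site in range(length))


aligned = {1: "AA", 2: "AA", 3: "CC", 4: "CC"}
score_grouped = fitch_score(((1, 2), (3, 4)), aligned)  # identical copies are siblings
score_mixed = fitch_score(((1, 3), (2, 4)), aligned)    # identical copies are split
```

On this toy alignment the tree that groups identical copies needs 2 substitutions and the other needs 4, so a parsimony criterion would prefer the first.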

3. Reconstructing duplication trees

Exhaustive approach
– we generate every possible duplication tree, using a simulation of the duplication process
– we select the trees that minimize the parsimony criterion

[Figure: simulated duplication process, shown as successive 1-duplications (1-d) followed by a 2-duplication (2-d)]

Slide 100
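The generation step can be sketched as a forward simulation: start from one copy, apply every possible k-duplication to every block of adjacent copies, and stop when the locus holds n copies. Under the assumptions of this sketch, the result is a rooted ordinal history rather than an unrooted duplication tree. Trees are nested 2-tuples whose leaves are numbered 1..n in locus order, and `ordinal_histories` is a hypothetical name. The search is exponential, so it is only practical for small n.

```python
def ordinal_histories(n):
    """Enumerate the rooted trees on ordered leaves 1..n that can be
    produced by a sequence of tandem k-duplications."""
    if n < 1:
        raise ValueError("n must be at least 1")
    found = set()

    def to_tuple(node, children, labels):
        # Convert the internal node ids into nested tuples of leaf labels.
        if node not in children:
            return labels[node]
        left, right = children[node]
        return (to_tuple(left, children, labels), to_tuple(right, children, labels))

    def expand(order, children, next_id):
        m = len(order)
        if m == n:
            labels = {node: pos + 1 for pos, node in enumerate(order)}
            found.add(to_tuple(0, children, labels))
            return
        # A k-duplication adds k copies, so k may not exceed n - m.
        for k in range(1, min(m, n - m) + 1):
            for start in range(m - k + 1):
                new_children = dict(children)
                firsts, seconds = [], []
                fresh = next_id
                for node in order[start:start + k]:
                    new_children[node] = (fresh, fresh + 1)
                    firsts.append(fresh)
                    seconds.append(fresh + 1)
                    fresh += 2
                new_order = (order[:start] + tuple(firsts)
                             + tuple(seconds) + order[start + k:])
                expand(new_order, new_children, fresh)

    expand((0,), {}, 1)
    return found


histories_3 = ordinal_histories(3)
```

Different event orders can yield the same tree, which is why the results are collected in a set. Each generated tree could then be scored with the chosen parsimony criterion and the minimum kept.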

4. Experimental results

Experimental results
– we applied this reconstruction procedure to the TRGV locus
– only 1 duplication tree is found by exhaustive search

Slide 102

4. Experimental results

First validation
– this duplication tree is identical to the most parsimonious phylogeny, reconstructed from the same data but without restriction to duplication trees
– the probability that a phylogeny is a duplication tree is less than 0.04 for 9 taxa

[Figure: phylogenies and duplication trees]

Slide 105

4. Experimental results

Second validation
– we root the duplication tree using both the molecular clock hypothesis on functional genes and an outgroup
– both methods root the tree at the same branch
– the root belongs to the “potential roots”

[Figure: duplication tree of the TRGV genes V1, V2, V3, V4, V5, V5P, V6, V7, V8, and its rooted version with leaves in the locus order V1 V2 V3 V4 V5 V5P V6 V7 V8]

Slide 111

4. Experimental results

Third validation
– the ordinal duplication history is in agreement with a polymorphism that exists for this locus

Slide 112

4. Experimental results

Polymorphism for the TRGV locus

[Figure: polymorphism of the TRGV locus, drawn on the gene order V1 V2 V3 V4 V5 V5P V6 V7 V8]

Slide 115

4. Experimental results

Conclusion
– these results validate the duplication model
– they show that our reconstruction procedure can provide a valid solution
– they are robust to gene deletions (most duplications are 1-duplications)

Slide 118

5. Perspectives

Perspectives
– development of fast heuristics to improve the reconstruction speed
– comparison with other criteria such as minimum evolution, ...
– better mathematical characterisation of duplication trees (for example, can we enumerate them?)
– applying our methods and algorithms to other datasets
– more complex duplication models

Slide 123