Reconstructing the duplication history of tandemly repeated genes
Olivier Elemento (1,2), Olivier Gascuel (1), Marie-Paule Lefranc (2)
(1) LIRM Montpellier, Méthodes et Algorithmes pour l'Analyse de Séquences
(2) LIGM, IMGT, the international ImMunoGeneTics database, http://imgt.cines.fr
Olivier Elemento,
[email protected]
Slide 1
1. Introduction
2. Mathematical model
3. Reconstructing duplication trees
4. Experimental results
5. Perspectives
Slide 2
1. Introduction
Tandemly repeated sequences
– two or more adjacent copies of a stretch of DNA
– they exist in several forms: microsatellites (involved in neurodegenerative diseases), minisatellites, and larger sequences (genes)
Slide 4
1. Introduction
Tandemly repeated genes
Example: the human TRGV locus (chromosome 7)
– it contains 9 adjacent copies of the same gene
– each of them is 4-5 kb long
– they share between 85 and 97% sequence identity
[Figure: DNA of chromosome 7 with the genes in order V1, V2, V3, V4, V5, V5P, V6, V7, V8]
Slide 5
1. Introduction
Recombination
[Figure (animation over three slides); labels: gene, gene]
Slide 8
1. Introduction
Unequal recombination (step 1)
– initial duplication caused by the presence of short repeated sequences
[Figure (animation over three slides); labels: gene, gene]
Slide 11
1. Introduction
Unequal recombination (step 2)
– the presence of several instances of the same copy favors additional duplications
[Figure (animation over three slides); labels: copy 1, copy 2]
Slide 14
1. Introduction
Unequal recombination (step 3)
– "block" duplication, i.e. simultaneous duplication of several copies
[Figure (animation over three slides); labels: copy 1, copy 2, copy 3]
Slide 17
1. Introduction
Preliminary hypotheses
– unequal recombination is the sole generating mechanism
– there were no gene conversions
– there were no gene deletions
Slide 20
2. Mathematical model
The duplication events
– 1-duplication: 1 → 1' 1''
– 2-duplication: 1 2 → 1' 2' 1'' 2''
– n-duplication: 1 2 ... n → 1' 2' ... n' 1'' 2'' ... n''
[Figure (animation over nine slides): each event drawn as a block of adjacent copies replaced by two adjacent copies of the block]
Slide 29
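The duplication events above can be sketched as an operation on an ordered list of gene copies. This is only an illustration: the function name `duplicate`, its parameters, and the use of prime marks on string labels are assumptions made for the example, not notation fixed by the slides beyond the 1', 1'' labels.

```python
def duplicate(copies, start, n):
    """Apply an n-duplication to the block copies[start:start + n].

    The block is replaced by two adjacent copies of itself, labelled
    with one and two prime marks (hypothetical labelling scheme).
    """
    if n < 1 or start < 0 or start + n > len(copies):
        raise ValueError("block out of range")
    block = copies[start:start + n]
    first = [c + "'" for c in block]    # left copy of the block
    second = [c + "''" for c in block]  # right copy of the block
    return copies[:start] + first + second + copies[start + n:]


# 1-duplication of a single gene, then a 2-duplication of a pair.
print(duplicate(["1"], 0, 1))
print(duplicate(["1", "2"], 0, 2))
```

In a 2-duplication the two copies of gene 1 end up separated by a copy of gene 2, so sister copies are no longer adjacent in the locus.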
2. Mathematical model
Time-valued duplication history (reality)
[Figure (animation over eight slides): duplication history drawn along a time axis]
Slide 37
2. Mathematical model
Time-valued duplication history (reality)
– it implies a rooted phylogeny
– its taxa are ordered
– its branches are time-valued
– the root is situated between the most distant taxa
[Figure: rooted tree drawn along a time axis, with leaves 1 2 3 4 5 6 7 8 9 in order]
Slide 41
2. Mathematical model
Duplication tree (what can be inferred)
– it is an unrooted phylogeny
– its taxa are ordered
– its branches are mutation rate-valued
– its topology is compatible with at least one duplication history
– the root is situated somewhere in the tree between the most distant taxa
[Figure: unrooted tree with leaves 1 to 9; the potential roots are marked]
Slide 46
2. Mathematical model
Ordinal duplication history
– obtained when rooting a duplication tree
– it is the topological version of the time-valued duplication history
– it is a rooted phylogeny
– its taxa are ordered
– its branch lengths have no special meaning
– the duplication events are partially ordered
[Figure: rooted tree with leaves 1 2 3 4 5 6 7 8 9 in order; duplication events labelled a, b, c]
Slide 52
2. Mathematical model
Not all phylogenies are duplication trees
[Figure (animation over three slides): unrooted phylogeny with leaves 1 to 5 and candidate root positions (a), (b), (c), (d); the tree rooted at (a) is drawn with leaves 1 2 3 4 5 in order]
– 2 and 5 are not adjacent!
Slide 55
2. Mathematical model
Not all potential roots lead to correct ordinal duplication histories
[Figure: unrooted tree with leaves 1 to 5 and potential roots (a), (b), (c), shown next to the rooted tree obtained for each choice]
– root (c): correct ordinal duplication history
– root (a): incorrect ordinal duplication history
Slide 57
2. Mathematical model
Definition
A phylogeny is a duplication tree if at least one of its potential roots leads to a correct ordinal duplication history
Slide 58
2. Mathematical model
The PDT algorithm
– it takes as input a rooted phylogeny with ordered leaves
– it recursively agglomerates each terminal pair belonging to correct duplication events
– it stops and returns:
  (true) when the tree has been reduced to its root
  (false) when it cannot go further
Slide 61
2. Mathematical model
The PDT algorithm
– we apply the PDT algorithm to each potential root of the considered phylogeny
– if PDT returns "true" at least once, the phylogeny is a duplication tree
Slide 63
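The agglomeration described for PDT can be sketched as follows. This is a reconstruction from the bullets, not the authors' code. Several things are assumed for the example: the name `pdt`, the encoding of a rooted binary tree as nested pairs with integer leaves numbered in locus order, and the reading of a "correct duplication event" as k sibling pairs laid out in the current leaf order as l1 ... lk r1 ... rk. The sketch covers only the rooted check; trying each potential root of an unrooted tree is left out.

```python
def pdt(tree):
    """Return True if the rooted tree can be reduced to its root by
    repeatedly agglomerating visible duplication events."""
    parent = {}

    def index(node):
        # Record the parent of every subtree (subtrees are hashable tuples).
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                index(child)

    def leaves(node):
        if isinstance(node, tuple):
            return leaves(node[0]) + leaves(node[1])
        return [node]

    index(tree)
    seq = sorted(leaves(tree))  # current terminal nodes, in locus order

    def find_event():
        # Look for k sibling pairs arranged as l1..lk r1..rk.
        n = len(seq)
        for k in range(1, n // 2 + 1):
            for i in range(n - 2 * k + 1):
                if all(parent[seq[i + j]] == parent[seq[i + k + j]]
                       for j in range(k)):
                    return i, k
        return None

    while len(seq) > 1:
        event = find_event()
        if event is None:
            return False  # cannot go further
        i, k = event
        # Replace the 2k terminal nodes by their k parents.
        seq[i:i + 2 * k] = [parent[seq[i + j]] for j in range(k)]
    return True  # reduced to the root


print(pdt(((1, 3), (2, 4))))  # one 1-duplication, then a 2-duplication
print(pdt(((1, 4), (2, 3))))  # 1 and 4 can never become adjacent siblings
```

The scan is quadratic per step, which is adequate for a handful of genes. The sketch relies on the fact that a node has a single sibling, so two visible events can never share a node.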
2. Mathematical model
The PDT algorithm
[Figure (animation over thirteen slides): PDT applied to a rooted tree with internal nodes a to h and leaves 1 to 9; terminal pairs are agglomerated step by step until only the root a remains]
– true!
Slide 76
2. Mathematical model
The PDT algorithm
[Figure (animation over eight slides): PDT applied to a second rooted tree with internal nodes a to h and leaves 1 to 9; agglomeration proceeds until no further step is possible]
– false! 7 is between 6 and h
Slide 84
2. Mathematical model
Counting duplication trees
– we used PDT to count (or estimate) the number of duplication trees
– the number of duplication trees is much smaller than the number of distinct phylogenies
– the number of phylogenies grows faster than the number of duplication trees
Slide 87
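For scale, the number of distinct binary phylogenies on n labelled taxa has a standard closed form: (2n-5)!! unrooted and (2n-3)!! rooted. The helper names below are hypothetical, and the slides do not state which count they compare against.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 (1 if m <= 0)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_phylogenies(n):
    """Number of unrooted binary trees on n >= 3 labelled leaves."""
    return double_factorial(2 * n - 5)


def rooted_phylogenies(n):
    """Number of rooted binary trees on n >= 2 labelled leaves."""
    return double_factorial(2 * n - 3)


for n in (4, 6, 9):
    print(n, unrooted_phylogenies(n), rooted_phylogenies(n))
```

For the 9 genes of the TRGV locus this gives 135135 unrooted and 2027025 rooted topologies.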
3. Reconstructing duplication trees
Reconstructing duplication trees
– the goal is to reconstruct the optimal duplication tree(s) from a given set of aligned and ordered DNA sequences
– we use an exhaustive search approach
– we assess the optimality of the reconstruction using a parsimony criterion
Slide 90
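One common way to evaluate a parsimony criterion on a fixed tree is Fitch's algorithm with unit substitution costs. The slides do not say which parsimony variant was used, so the sketch below is an assumption. The name `fitch_score` and the input format (nested pairs of leaf names plus a dictionary of aligned sequences) are also hypothetical.

```python
def fitch_score(tree, seqs):
    """Minimum number of substitutions needed on `tree` (nested pairs of
    leaf names) for the aligned sequences in `seqs` (leaf -> string)."""
    length = len(next(iter(seqs.values())))
    total = 0

    for site in range(length):

        def state_set(node):
            # Bottom-up pass: intersect the child sets when possible,
            # otherwise take the union and count one substitution.
            nonlocal total
            if not isinstance(node, tuple):
                return {seqs[node][site]}
            left = state_set(node[0])
            right = state_set(node[1])
            common = left & right
            if common:
                return common
            total += 1
            return left | right

        state_set(tree)

    return total


seqs = {1: "AC", 2: "AC", 3: "GC", 4: "GT"}
print(fitch_score(((1, 2), (3, 4)), seqs))
print(fitch_score(((1, 3), (2, 4)), seqs))
```

With unit costs the score does not depend on where the tree is rooted, so an unrooted duplication tree can be scored from any convenient rooting.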
3. Reconstructing duplication trees
Exhaustive approach
– we generate every possible duplication tree, using a simulation of the duplication process
[Figure (animation over eight slides): simulated duplication process; steps labelled 1-d and 2-d]
– we select the trees that minimize the parsimony criterion
Slide 100
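The generation step can be sketched as a forward simulation that starts from a single copy and tries every block duplication until n copies exist. This is an illustration under assumptions, not the authors' procedure: it enumerates rooted histories rather than unrooted duplication trees, and the name `duplication_histories` and the path-string encoding of copies are invented for the example.

```python
def duplication_histories(n):
    """Return the set of rooted trees (nested pairs, leaves 1..n in locus
    order) reachable by tandem block duplications from a single copy."""
    results = set()

    def to_tree(prefix, leaf_of):
        # Rebuild the tree from the binary paths of the final copies.
        if prefix in leaf_of:
            return leaf_of[prefix]
        return (to_tree(prefix + "0", leaf_of),
                to_tree(prefix + "1", leaf_of))

    def simulate(seq):
        # seq holds one path per copy: "0"/"1" record left/right copies.
        if len(seq) == n:
            leaf_of = {path: pos + 1 for pos, path in enumerate(seq)}
            results.add(to_tree("", leaf_of))
            return
        # A k-duplication adds k copies, so k cannot exceed what is missing.
        for k in range(1, min(len(seq), n - len(seq)) + 1):
            for i in range(len(seq) - k + 1):
                block = seq[i:i + k]
                copies = [p + "0" for p in block] + [p + "1" for p in block]
                simulate(seq[:i] + copies + seq[i + k:])

    simulate([""])
    return results


for tree in sorted(duplication_histories(4), key=str):
    print(tree)
```

For four copies this yields 6 of the 15 possible rooted topologies. The search is exponential and revisits states, so it is only practical for small loci; each generated tree would then be scored with the chosen parsimony criterion and the minimum kept.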
4. Experimental results
Experimental results
– we applied this reconstruction procedure to the TRGV locus
– only 1 duplication tree is found by exhaustive search
Slide 102
4. Experimental results
First validation
– this duplication tree is identical to the most parsimonious phylogeny, reconstructed from the same data but without restriction to duplication trees
– the probability that a phylogeny is a duplication tree is less than 0.04 for 9 taxa
[Figure; labels: phylogenies, duplication trees]
Slide 105
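As a rough order of magnitude for the 0.04 bound, assume it refers to unrooted binary phylogenies drawn uniformly (the slides do not say so). Using the standard count (2n-5)!! of such trees:

```latex
\[
(2 \cdot 9 - 5)!! = 13!! = 135135,
\qquad
0.04 \times 135135 \approx 5405
\]
```

Under that assumption, fewer than about 5,400 of the 135,135 nine-taxon topologies would be duplication trees.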
4. Experimental results
Second validation
– we root the duplication tree using both the molecular clock hypothesis on functional genes and an outgroup
– both methods root the tree at the same branch
– the root belongs to the "potential roots"
[Figure (animation over three slides): the duplication tree with leaves V1, V2, V3, V4, V5, V5P, V6, V7, V8, then the rooted tree with leaves in locus order V1 V2 V3 V4 V5 V5P V6 V7 V8]
Slide 111
4. Experimental results
Third validation
– the ordinal duplication history is in agreement with a polymorphism that exists for this locus
Slide 112
4. Experimental results
Polymorphism for the TRGV locus
[Figure (animation over three slides): gene order V1 V2 V3 V4 V5 V5P V6 V7 V8]
Slide 115
4. Experimental results
Conclusion
– these results validate the duplication model
– they show that our reconstruction procedure can provide a valid solution
– they are robust to gene deletions (most duplications are 1-duplications)
Slide 118
5. Perspectives
Perspectives
– development of fast heuristics to improve the reconstruction speed
– comparison with other criteria such as minimum evolution, ...
– better mathematical characterisation of duplication trees (for example, can we enumerate them?)
– applying our methods and algorithms to other datasets
– more complex duplication models
Slide 123