Phylogenetic Reconstruction
Yves Desdevises
Université Pierre et Marie Curie (Paris 6), Observatoire Océanologique de Banyuls, France
[email protected]
http://desdevises.free.fr
http://desdevises.free.fr/Phylogenetic_reconstruction
1
References
• Felsenstein J. 2004. Inferring phylogenies. Sinauer.
• Lemey P., Salemi M. & Vandamme A.-M. 2009. The phylogenetic handbook. Second Edition. Cambridge University Press.
• Hall B. 2007. Phylogenetic trees made easy. Third Edition. Sinauer.
• Page R. & Holmes E. 1998. Molecular evolution: a phylogenetic approach. Blackwell.
• Nei M. & Kumar S. 2000. Molecular Evolution and Phylogenetics. Oxford University Press.
2
• Goal: propose a hypothesis of relationships among several taxa
• Phylogeny = tree (≠ ladder)
• Speciation: binary
• Based on homology: similarity inherited from a common ancestor
  • Indicates the existence of a common ancestor
  • Identified from a phylogenetic tree, and also the basis for building it!
3
Phylogenetic trees
[Figure: phylogenetic trees of fish species (Symphodus, Labrus, Ctenolabrus, Cheilinus, Halichoeres, Thalassoma and others, plus Pagrus major) drawn in several layouts]
4
• Cladogram
  • No branch lengths
  • Clades
• Phylogram
  • Branch lengths
[Figure: an ultrametric tree and an additive tree]
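The difference between ultrametric and additive trees can be checked on tip-to-tip distances. The sketch below is only an illustration: the function names and the two small distance tables are made up for this example. It uses the three-point condition (ultrametric) and the four-point condition (additive).

```python
from itertools import combinations

def dist(d, x, y):
    """Look up a symmetric distance stored under either key order."""
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def is_ultrametric(taxa, d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for a, b, c in combinations(taxa, 3):
        s = sorted([dist(d, a, b), dist(d, a, c), dist(d, b, c)])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

def is_additive(taxa, d, tol=1e-9):
    """Four-point condition: of the three pairings of a quartet, the two largest sums are equal."""
    for a, b, c, e in combinations(taxa, 4):
        s = sorted([
            dist(d, a, b) + dist(d, c, e),
            dist(d, a, c) + dist(d, b, e),
            dist(d, a, e) + dist(d, b, c),
        ])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

taxa = ["A", "B", "C", "D"]

# Hypothetical distances from a tree whose tips are all equally far from the root.
clock_like = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
              ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 3}

# Hypothetical distances from a tree with one long branch (leading to B).
unequal = {("A", "B"): 4, ("A", "C"): 4, ("A", "D"): 4,
           ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4}

print(is_ultrametric(taxa, clock_like), is_additive(taxa, clock_like))  # True True
print(is_ultrametric(taxa, unequal), is_additive(taxa, unequal))        # False True
```

Every ultrametric distance table is also additive, but not the reverse, which is what the second example shows.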
5
[Figure: tree with tips A to J, annotated with: leaves = terminal taxa, clade, terminal branches, internal branches, polytomy, node, root]
6
• Speciation
7
Hypothesis
[Figure: taxa A, B and C]
8
Rooting
• Gives the branching order
• Use of an outgroup
• Rest = ingroup
[Figure: an unrooted tree and the rooted tree obtained by adding an outgroup]
9
• Outgroup: sister taxon of the ingroup
• Characters shared between outgroup and ingroup = ancestral characters
• Sometimes no outgroup is available: root at equal distance from the most distant tree tips (requires branch lengths) = midpoint rooting
[Figure: two trees with tips A to F illustrating midpoint rooting]
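Midpoint rooting can be sketched in a few lines: find the two tips that are farthest apart, then place the root halfway along the path between them. The edge list, the internal node labels "x" and "y", and the function names below are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def build_adjacency(edges):
    """Turn a list of (node, node, branch length) into an adjacency map."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    return adj

def path_between(adj, start, end):
    """Unique path start -> end in a tree, as a list of (node, cumulative distance)."""
    stack = [(start, None, [(start, 0.0)])]
    while stack:
        node, parent, path = stack.pop()
        if node == end:
            return path
        for nxt, length in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, path + [(nxt, path[-1][1] + length)]))
    raise ValueError("nodes are not connected")

def midpoint_root(edges):
    """Return (branch carrying the root, distance from its first node, the two most distant tips)."""
    adj = build_adjacency(edges)
    tips = sorted(n for n, neighbours in adj.items() if len(neighbours) == 1)
    longest = max((path_between(adj, a, b) for a, b in combinations(tips, 2)),
                  key=lambda p: p[-1][1])
    half = longest[-1][1] / 2
    for (u, du), (v, dv) in zip(longest, longest[1:]):
        if du <= half <= dv:
            return (u, v), half - du, (longest[0][0], longest[-1][0])

# Hypothetical unrooted tree: tips A, B, C, D; internal nodes x and y.
edges = [("A", "x", 1), ("B", "x", 4), ("x", "y", 1), ("C", "y", 2), ("D", "y", 1)]

print(midpoint_root(edges))  # (('B', 'x'), 3.5, ('B', 'C'))
```

Here B and C are the most distant tips (7 units apart), so the root falls on the branch leading to B, 3.5 units from both B and C.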
10
• Groups
  • Monophyletic (clade): natural group
    • Mammals
  • Paraphyletic
    • Reptiles
  • Polyphyletic
    • Algae, protozoans
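Whether a group is monophyletic on a given tree can be tested mechanically: the group must be exactly the set of tips below some node. The sketch below assumes a tree written as nested tuples; the topology and the taxa "Crocodile" and "Bird" are added for illustration only.

```python
def collect_clades(tree, clades):
    """Return the tips below `tree` and record the tip set of every node in `clades`."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.add(tips)
    return tips

def is_monophyletic(tree, group):
    """True when `group` is exactly the set of tips below one node of the tree."""
    clades = set()
    collect_clades(tree, clades)
    return frozenset(group) in clades

# Illustrative tree (assumed topology, not taken from the slides).
tree = ("Frog", (("Dog", "Human"), ("Lizard", ("Crocodile", "Bird"))))

print(is_monophyletic(tree, {"Dog", "Human"}))         # True: mammals form a clade
print(is_monophyletic(tree, {"Lizard", "Crocodile"}))  # False: "reptiles" without birds
```

The check only separates monophyletic from non-monophyletic groups; telling paraphyly from polyphyly needs additional information about the ancestor.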
11
Characters
• Organisms are composed of different features
• These features differ among taxa: character states
• All the states of a feature together form a character
• These states are produced by heritable changes
• Phylogenetic inference is based on differences between character states
12
• We want to establish the ancestor-descendant link from the presence/absence of character states
• We look for the appearance of new character states in descendants
• The different character states are homologies
• Taxa sharing this new (derived) character state form clades
  • Example: hair in mammals
• Characters can be differentially weighted
13
• Homology
14
15
• Homoplasy
16
• Ancestral characters: plesiomorphies
• Shared ancestral characters: symplesiomorphies
• Derived characters: apomorphies
• Shared derived characters: synapomorphies
  • Ideally, identify clades
• Non-shared derived characters = particular to a given taxon: autapomorphies
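The vocabulary above can be illustrated with a small sketch. It assumes that the state found in the outgroup is the ancestral one (outgroup comparison); the function name and the example states are hypothetical.

```python
from collections import Counter

def classify_states(ingroup_states, outgroup_state):
    """Label each state seen in the ingroup, taking the outgroup state as ancestral."""
    counts = Counter(ingroup_states.values())
    labels = {}
    for state, n in counts.items():
        if state == outgroup_state:
            # Ancestral state: shared by several ingroup taxa or carried by one.
            labels[state] = "symplesiomorphy" if n > 1 else "plesiomorphy"
        else:
            # Derived state: shared (possible clade marker) or unique to one taxon.
            labels[state] = "synapomorphy (candidate)" if n > 1 else "autapomorphy"
    return labels

# Hypothetical coding of the character HAIR, with Frog as outgroup (hair absent).
hair = {"Dog": "present", "Human": "present", "Lizard": "absent"}

print(classify_states(hair, "absent"))
```

A shared derived state is only a candidate synapomorphy: it may still turn out to be a homoplasy once all characters are considered.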
17
18
Homology
• Homologies are expected to show similarities in:
  • position
  • structure
  • development
• A recognized criterion to support homology is congruence with other characters
19
[Figure: tree of dog, lizard, frog and human with the change in the character HAIR (absent/present) mapped on it]
20
Homoplasy
• Non-homologous similarities
• Result from independent evolution
  • Convergence
  • Parallelism
  • Reversion
• Blurs the phylogenetic signal: may lead to false evolutionary relationships
21
[Figure: panels illustrating parallelism, convergence and reversion]
22
[Figure: two trees of lizard, human, frog and dog with the character TAIL (absent/present) mapped on each]
23
• Without homoplasy, phylogenetic inference would be easy
• Main problem of phylogenetic reconstruction: discriminating homoplasy (noise) from homology (signal)
• Data quality (“good” phylogenetic signal) is more important than the method used
24
• If there is only one correct tree, then when characters support different trees, at least one of these characters is homoplastic
[Figure: a tree of dog, lizard, frog and human supported by HAIR (absent/present) and a conflicting tree supported by TAIL (absent/present)]
25
Congruence
• The chosen tree is the one that maximises the number of congruent characters
[Figure: tree of dog, human, lizard and frog; the changes in HAIR, MILK, ... are placed on the branch leading to MAMMALS (dog, human)]
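One way to make the congruence argument concrete is to count, for each candidate tree, the minimum number of changes each character requires. The sketch below uses Fitch's parsimony counting, which is a technique brought in here for illustration and not one named on the slides. The character states are an assumed reading of the HAIR, MILK and TAIL examples, and the two candidate trees and all names are hypothetical.

```python
def fitch_steps(tree, states):
    """Return (possible states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    # No state in common: one change is needed between the two subtrees.
    return lset | rset, lsteps + rsteps + 1

# Assumed character states (presence/absence).
characters = {
    "HAIR": {"Dog": "present", "Human": "present", "Lizard": "absent", "Frog": "absent"},
    "MILK": {"Dog": "present", "Human": "present", "Lizard": "absent", "Frog": "absent"},
    "TAIL": {"Dog": "present", "Human": "absent", "Lizard": "present", "Frog": "absent"},
}

# Two candidate trees written as nested pairs.
trees = {
    "dog_with_human": ("Frog", ("Lizard", ("Dog", "Human"))),
    "dog_with_lizard": ("Frog", ("Human", ("Dog", "Lizard"))),
}

def total_steps(tree):
    """Sum the minimum number of changes over all characters."""
    return sum(fitch_steps(tree, states)[1] for states in characters.values())

totals = {name: total_steps(tree) for name, tree in trees.items()}
print(totals)  # {'dog_with_human': 4, 'dog_with_lizard': 5}
```

On the tree grouping dog with human, HAIR and MILK each need a single change and agree with each other, while TAIL needs two changes and is therefore interpreted as homoplastic.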
26
Case of molecular data
• Homoplasy is more common with molecular than with morphological data
  • Few states (4 for DNA: A, G, C, T)
  • Chemically close
  • Evolutionary rates can be high
  • No identification of homoplasy via structure or development
27
Data
• Fossils: rare
• Morphological characters
• Molecular characters: DNA, proteins, ...
  • By far the most used now: models, numerous characters, less subjective, ...
  • But... phylogeny of the DNA fragment (≠ taxa)
  • Future: genomes ➙ phylogenomics
• Others (behaviour, hosts, habitat, ...)
28
Morphological data
• Homology not easy to identify
• Characters often few: a problem when studying many taxa, especially if they are closely related
• Some subjective decisions
• Evolutionary processes poorly known: limits the choice of method
• Require coding
  • Sometimes difficult
  • Hypotheses on character evolution
29
Coding
• Binary: presence/absence = 0/1
• Multiple states (ordered or not): definition of the number of steps between states
  • Additive binary coding: e.g. 00, 01, 10, 11
  • Linear coding: e.g. 0, 1, 2
  • Both can be combined
30
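As an illustration of how coding fixes the number of steps between states, the sketch below recodes an ordered (linear) character 0, 1, 2 into binary columns. It assumes one common convention for a linear series, in which column j is 1 once the state exceeds j; other additive binary schemes exist, and the function names are hypothetical.

```python
def additive_binary(state, n_states):
    """Code ordered state k (0 .. n_states-1) as n_states-1 binary columns."""
    return tuple(1 if state > j else 0 for j in range(n_states - 1))

def hamming(a, b):
    """Number of columns in which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

codes = {state: additive_binary(state, 3) for state in range(3)}
print(codes)  # {0: (0, 0), 1: (1, 0), 2: (1, 1)}

# The number of differing columns equals the number of steps in the linear coding.
for i in range(3):
    for j in range(3):
        assert hamming(codes[i], codes[j]) == abs(i - j)
```

Under this convention, going from state 0 to state 2 costs two steps in both codings, which is what "ordered" means in practice.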
31
Molecular data
• Nucleotides or amino acids (for ancient divergences)
• Characters = base (or AA) positions
• Character states = base (or AA) identity
• Important step: alignment
  • Sometimes manual
  • Automated methods: manual editing required
  • No test: no null hypothesis
  • Can use information on secondary structure or coding nature
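Once sequences are aligned, each column of the alignment is a character and the bases in it are the character states. The toy alignment and function names below are invented for illustration.

```python
def columns(alignment):
    """Return the alignment column by column (one tuple of states per position)."""
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [tuple(alignment[name][i] for name in names) for i in range(lengths.pop())]

def variable_sites(alignment):
    """Positions (0-based) where at least two different bases occur, ignoring gaps."""
    return [i for i, col in enumerate(columns(alignment)) if len(set(col) - {"-"}) > 1]

# Toy alignment: 4 hypothetical taxa, 6 aligned positions, "-" marks a gap.
alignment = {
    "taxon1": "ACGTAC",
    "taxon2": "ACGTTC",
    "taxon3": "ACATTC",
    "taxon4": "AC-TTC",
}

print(columns(alignment)[2])      # ('G', 'G', 'A', '-')
print(variable_sites(alignment))  # [2, 4]
```

Only the variable positions can carry phylogenetic signal, which is why a wrong alignment (wrong columns) directly yields wrong characters.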
32
• Nucleotides: only 4 states (in 2 types)
  • Evolution can be modelled
  • Homoplasy “easy”
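Reading the "2 types" as purines (A, G) and pyrimidines (C, T), differences between two aligned sequences can be sorted into transitions (within a type) and transversions (between types). The sketch below is illustrative; the sequences and function names are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(x, y):
    """Classify a pair of aligned bases."""
    if x == y:
        return "identical"
    same_type = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_type else "transversion"

def count_differences(seq1, seq2):
    """Count transitions and transversions between two aligned, gap-free sequences."""
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(seq1, seq2):
        kind = substitution_type(x, y)
        if kind != "identical":
            counts[kind] += 1
    return counts

print(substitution_type("A", "G"))            # transition
print(substitution_type("A", "C"))            # transversion
print(count_differences("ACGTAC", "GCTTAA"))  # {'transition': 1, 'transversion': 2}
```

With only four states, two lineages can easily reach the same base independently, which is one reason homoplasy is frequent in nucleotide data.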
33
• Amino acids
  • 20 states
  • 5 categories
  • Evolution much more difficult to model
• Codons
  • 61 states!
34
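The figure of 61 codon states can be recovered by enumeration, assuming the standard genetic code with its three stop codons (TAA, TAG, TGA); other genetic codes have different stop codons.

```python
from itertools import product

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}

all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))    # 64
print(len(sense_codons))  # 61
```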
• Gene tree ≠ species tree
• Genes: orthologous or paralogous
[Figure: gene tree showing a duplication of an ancestral gene, with the resulting gene copies (a, b, c and A, B, C) labelled as orthologs or paralogs]
35
Alignment