Phylogenetic Reconstruction
Yves Desdevises
Université Pierre et Marie Curie (Paris 6), Observatoire Océanologique de Banyuls, France
[email protected] http://desdevises.free.fr http://desdevises.free.fr/Phylogenetic_reconstruction
References
• Felsenstein J. 2004. Inferring phylogenies. Sinauer.
• Lemey P., Salemi M. & Vandamme A.-M. 2009. The phylogenetic handbook. Second Edition. Cambridge University Press.
• Hall B. 2007. Phylogenetic trees made easy. Third Edition. Sinauer.
• Page R. & Holmes E. 1998. Molecular evolution: a phylogenetic approach. Blackwell.
• Nei M. & Kumar S. 2000. Molecular Evolution and Phylogenetics. Oxford University Press.
• Goal: propose a hypothesis of relationships among several taxa
• Phylogeny = tree (≠ ladder)
• Speciation: binary
• Based on homology: similarity inherited from a common ancestor
• Homology indicates the existence of a common ancestor
• Homology is identified from a phylogenetic tree, and is also the basis for building it!
[Figure: the same phylogenetic tree drawn in several layouts. Taxa: Symphodus roissali, Symphodus cinereus, Symphodus tinca, Symphodus ocellatus, Symphodus mediterraneus, Symphodus melanocercus, Ctenolabrus rupestris, Labrus merula, Labrus viridis, Cheilinus trilobatus, Cheilinus chlorourus, Epibulus incidiator, Stetojulis albovittata, Stetojulis bandanensis, Halichoeres hortulanus, Halichoeres margaritaceus, Labropsis australis, Halichoeres marginatus, Anampses geographicus, Anampses caeruleopunctatus, Labroides dimidiatus, Labrichthys unilineatus, Coris julis, Hemigymnus melapterus, Hemigymnus fasciatus, Thalassoma bifasciatum, Thalassoma lunare, Thalassoma lutescens, Pictilabrus laticlavius, Notolabrus tetricus, Bodianus rufus, Clepticus parrae, Pagrus major]
Phylogenetic trees
• Cladogram
• No branch lengths
• Clades
• Phylogram
• Branch lengths
• Additive tree
• Ultrametric tree
[Figure: tree with tips A to J, annotated with: leaves = terminal taxa, clade, terminal branches, internal branches, polytomy, node, root]
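The difference between an additive and an ultrametric tree can be illustrated with a small check. The sketch below assumes the usual definition of a rooted ultrametric tree (all tips at the same distance from the root); the tuple-based tree representation, the function names and the branch lengths are made up for illustration and do not come from the slides.

```python
import math

# A node is (name, branch_length, children); tips have an empty child list.
def tip_depths(node, depth=0.0):
    """Return {tip name: distance from the root} for the subtree at node."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(tip_depths(child, depth))
    return depths

def is_ultrametric(root, tol=1e-9):
    """True if every tip lies at the same distance from the root."""
    depths = list(tip_depths(root).values())
    return all(math.isclose(d, depths[0], abs_tol=tol) for d in depths)

# Hypothetical trees: same topology, different branch lengths.
additive = ("root", 0.0, [
    ("A", 1.0, []),
    ("n1", 0.5, [("B", 2.0, []), ("C", 0.3, [])]),
])
ultrametric = ("root", 0.0, [
    ("A", 2.0, []),
    ("n1", 0.5, [("B", 1.5, []), ("C", 1.5, [])]),
])

print(is_ultrametric(additive))     # False
print(is_ultrametric(ultrametric))  # True
```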
• Speciation
[Figure: three-taxon tree (A, B, C), labelled "Hypothesis"]
Rooting
• Gives the branching order
• Use of an outgroup
• Rest = ingroup
[Figure: unrooted tree ➙ add an outgroup ➙ rooted tree, with the outgroup labelled]
• Outgroup: sister taxon (or taxa) of the ingroup
• Characters shared between outgroup and ingroup = ancestral characters
• Sometimes no outgroup: rooting at equal distance from the tree tips (needs branch lengths) = midpoint rooting
[Figure: midpoint rooting of a tree with tips A to F]
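Midpoint rooting is usually implemented by placing the root halfway along the longest path between two tips. The following is a minimal sketch of that idea, not a reference implementation: the edge-list representation, the function names and the branch lengths are assumptions chosen for illustration.

```python
from collections import defaultdict

def build_graph(edges):
    """Turn (node, node, length) triples into a symmetric adjacency map."""
    graph = defaultdict(dict)
    for u, v, length in edges:
        graph[u][v] = length
        graph[v][u] = length
    return graph

def farthest(graph, start):
    """Return (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nxt, length in graph[node].items():
            if nxt != parent:
                stack.append((nxt, node, dist + length, path + [nxt]))
    return best

def midpoint_root(edges):
    """Return ((u, v), d): the root lies on edge u-v, at distance d from u."""
    graph = build_graph(edges)
    tip1, _, _ = farthest(graph, next(iter(graph)))
    _, total, path = farthest(graph, tip1)   # longest tip-to-tip path
    half, walked = total / 2, 0.0
    for u, v in zip(path, path[1:]):
        if walked + graph[u][v] >= half:
            return (u, v), half - walked
        walked += graph[u][v]

# Hypothetical unrooted tree: tips A, B, C, D; internal nodes x, y.
edges = [("A", "x", 1.0), ("B", "x", 1.0), ("x", "y", 1.0),
         ("C", "y", 1.0), ("D", "y", 4.0)]
print(midpoint_root(edges))  # (('D', 'y'), 3.0)
```

In this toy tree the longest path runs from D to A (or B) and has length 6, so the root falls on the long branch leading to D, 3 units from that tip.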
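Groups
• Monophyletic (clade): natural group, i.e. an ancestor and all of its descendants
• Example: mammals
• Paraphyletic: an ancestor and only some of its descendants
• Example: reptiles
• Polyphyletic: taxa grouped without their most recent common ancestor
• Examples: algae, protozoans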
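Whether a group is monophyletic can be checked directly on a rooted tree: the group must match the tip set of exactly one clade. The sketch below uses nested tuples as a hypothetical tree representation and a simplified toy tree; it only separates monophyletic from non-monophyletic groups and does not tell paraphyly from polyphyly.

```python
def tips(node):
    """Set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))

def clades(node):
    """Yield the tip set of every clade (subtree), single tips included."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if the group is exactly the tip set of one clade."""
    return frozenset(group) in set(clades(tree))

# Simplified toy tree, for illustration only.
tree = ("frog", (("dog", "human"), ("lizard", ("crocodile", "bird"))))

print(is_monophyletic(tree, {"dog", "human"}))                # True
print(is_monophyletic(tree, {"lizard", "crocodile"}))         # False
print(is_monophyletic(tree, {"lizard", "crocodile", "bird"})) # True
```

On this toy tree, "reptiles" defined as lizard plus crocodile fail the test because birds, which descend from the same ancestor, are left out.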
Characters
• Organisms are composed of different features
• These features differ among taxa: character states
• The set of alternative states forms a character
• These states are produced by heritable changes
• Phylogenetic inference is based on differences between character states
• We want to establish ancestor-descendant links from the presence/absence of character states
• We look for the appearance of new character states in descendants
• The different character states are homologies
• Taxa sharing a new (derived) character state form clades
• Example: hair in mammals
• Characters can be differentially weighted
• Homology
• Homoplasy
• Ancestral characters: plesiomorphies
• Shared ancestral characters: symplesiomorphies
• Derived characters: apomorphies
• Shared derived characters: synapomorphies
• Ideally, they identify clades
• Derived characters that are not shared, i.e. particular to a given taxon: autapomorphies
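The vocabulary above can be made concrete with a small binary matrix. The sketch below assumes that polarity is already known (0 = ancestral, 1 = derived, for example from an outgroup); the taxa, characters and function name are illustrative choices, and a shared derived state is only called a candidate synapomorphy because homoplasy cannot be ruled out from the matrix alone.

```python
# 0 = ancestral state, 1 = derived state (polarity assumed known).
matrix = {
    "frog":   {"hair": 0, "milk": 0, "bipedalism": 0},
    "lizard": {"hair": 0, "milk": 0, "bipedalism": 0},
    "dog":    {"hair": 1, "milk": 1, "bipedalism": 0},
    "human":  {"hair": 1, "milk": 1, "bipedalism": 1},
}

def classify(matrix):
    """Label each character by how many taxa carry the derived state."""
    characters = next(iter(matrix.values())).keys()
    result = {}
    for char in characters:
        carriers = sorted(t for t, states in matrix.items() if states[char] == 1)
        if len(carriers) == 0:
            kind = "ancestral in all taxa"
        elif len(carriers) == 1:
            kind = "autapomorphy"
        else:
            kind = "candidate synapomorphy"
        result[char] = (kind, carriers)
    return result

for char, (kind, carriers) in classify(matrix).items():
    print(char, kind, carriers)
```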
Homology
• Homologies are expected to show similarities in:
• position
• structure
• development
• A recognized criterion for supporting homology is congruence with other characters
[Figure: character HAIR (absent/present) mapped on a tree of dog, lizard, frog and human, with the change marked]
Homoplasy
• Non-homologous similarities
• Result from independent evolution
• Convergence
• Parallelism
• Reversal
• Blurs the phylogenetic signal: may lead to false evolutionary relationships
[Figure: panels Parallelism / Convergence and Reversal; character TAIL (absent/present) mapped on trees of frog, lizard, dog and human]
• Without homoplasy, phylogenetic inference would be easy
• Main problem of phylogenetic reconstruction: discriminating homoplasy (noise) from homology (signal)
• Data quality (“good” phylogenetic signal) is more important than the method used
• If there is only one correct tree, then when characters support different trees, at least one of them contains homoplasies
[Figure: characters HAIR (absent/present) and TAIL (absent/present) mapped on trees of dog, lizard, frog and human]
Congruence
• The chosen tree is the tree maximising the number of congruent characters
[Figure: clade MAMMALS (dog, human) supported by changes in HAIR, MILK, ...; lizard and frog fall outside the clade]
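One way to make the congruence criterion concrete is to count, for each candidate tree, the number of changes the characters require and to prefer the tree needing the fewest. The sketch below uses Fitch's small-parsimony algorithm for unordered characters, a technique the slides do not name here; the nested-tuple trees, the state coding (including a tailless adult frog) and the function names are assumptions for illustration.

```python
def fitch(node, states):
    """Return (state set, number of changes) for one character on a rooted binary tree."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    lset, lsteps = fitch(left, states)
    rset, rsteps = fitch(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    # No state in common: one change is needed at this node.
    return lset | rset, lsteps + rsteps + 1

def tree_length(tree, characters):
    """Total number of changes over all characters."""
    return sum(fitch(tree, states)[1] for states in characters.values())

# Illustrative coding: 1 = present, 0 = absent.
characters = {
    "hair": {"dog": 1, "human": 1, "lizard": 0, "frog": 0},
    "milk": {"dog": 1, "human": 1, "lizard": 0, "frog": 0},
    "tail": {"dog": 1, "human": 0, "lizard": 1, "frog": 0},
}

tree_a = ("frog", ("lizard", ("dog", "human")))   # groups dog with human
tree_b = ("frog", ("human", ("dog", "lizard")))   # groups dog with lizard

print(tree_length(tree_a, characters))  # 4
print(tree_length(tree_b, characters))  # 5
```

Hair and milk are congruent and each need a single change on the first tree, so TAIL is the character interpreted as homoplastic there.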
Case of molecular data
• Homoplasy is more common with molecular than with morphological data
• Few states (4 for DNA: A, G, C, T)
• Chemically close
• Evolutionary rates can be high
• No identification of homoplasy via structure or development
Data
• Fossils: rare
• Morphological characters
• Molecular characters: DNA, proteins, ...
• By far the most used now: models, numerous characters, less subjective, ...
• But... the result is the phylogeny of the DNA fragment (≠ taxa)
• Future: genomes ➙ phylogenomics
• Others (behaviour, hosts, habitat, ...)
Morphological data
• Homology difficult to identify
• Characters often few: a problem when studying many taxa, especially if they are closely related
• Some subjective decisions
• Evolutionary processes poorly known: limits the choice of method
• Requires coding
• Sometimes difficult
• Hypotheses on character evolution
Coding
• Binary: absence/presence = 0/1
• Multiple states (ordered or not): definition of the number of steps between states
• Additive binary coding: e.g. 00, 01, 10, 11
• Linear coding: e.g. 0, 1, 2
• Both can be combined
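The "number of steps between states" can be written as a step matrix. The sketch below contrasts the two usual conventions, assuming that an ordered (linear) character costs one step per intermediate state and that an unordered character costs one step for any change; the function name is hypothetical.

```python
def step_matrix(n_states, ordered):
    """Cost, in steps, of changing from state i (row) to state j (column)."""
    return [[abs(i - j) if ordered else int(i != j) for j in range(n_states)]
            for i in range(n_states)]

# Three states coded 0, 1, 2.
print(step_matrix(3, ordered=True))   # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(step_matrix(3, ordered=False))  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

With ordered coding, a change from state 0 to state 2 counts as two steps; with unordered coding it counts as one.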
Molecular data
• Nucleotides or amino acids (for ancient divergences)
• Characters = base (or AA) positions
• Character states = base (or AA) identity
• Important step: alignment
• Sometimes manual
• Automated methods: manual editing required
• No test: no null hypothesis
• Can use information on secondary structure or coding nature
• Nucleotides: only 4 states (in 2 types: purines and pyrimidines)
• Evolution can be modelled
• Homoplasy “easy”
• Amino acids
• 20 states
• 5 categories
• Evolution much more difficult to model
• Codons
• 61 states!
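The figure of 61 codon states follows from simple counting: 4 bases give 4³ = 64 triplets, and the three stop codons are excluded. A short sketch, assuming the standard genetic code (other genetic codes have different stop codons):

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(BASES), len(codons), len(sense_codons))  # 4 64 61
```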
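• Gene tree ≠ species tree
• Genes: orthologous (diverged by speciation) or paralogous (diverged by duplication)
[Figure: labels Ancestral gene, Duplication, Orthologs, Paralogs, Tree; gene copies a, b, c and A, B, C, with A*, b* and C* marked with an asterisk]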
Alignment