Using Genomics to investigate Cospeciation and ... - Yves Desdevises


Using Genomics to investigate Cospeciation and Lateral Gene Transfer in symbiotic systems

http://desdevises.free.fr/MEEG_YD

Yves Desdevises
Observatoire Océanologique de Banyuls
Université Pierre et Marie Curie
France

Who am I?

• Professor
• Marine Station of Banyuls/Mer (Deputy Director)
• UMR 7232 Integrative Biology of Marine Organisms
• Head of the team Marine Interactions - Evolution and Adaptation

What kind of research am I doing?

• Evolution and diversity in (marine) symbiotic systems
  • Microalgae-Viruses
  • Fish-Platyhelminthes, …
• Development of analytical methods
  • Cophylogenetics
  • Comparative method

The tools I use

• Phylogenetics, mainly molecular data
• Comparative genomics
• Environmental genomics
• Statistics and numerical ecology

Morphology and Phylogeny in Lamellodiscus

Figure 1. Phylogeny of several Lamellodiscus species obtained by maximum likelihood and Bayesian inference. As topologies obtained with both reconstruction methods gave congruent topologies and similar branch lengths, the most resolved tree, obtained by maximum likelihood, was retained and is presented here. Bootstrap values (1000 replicates) and posterior probabilities (>0.5; dashes correspond to values <0.5) are indicated at each node. The clusters of individuals for which the alignment of ITS1 was possible are outlined in grey boxes. Thick black lines indicate ergensi and ignoratus groups. doi:10.1371/journal.pone.0026252.g001 (PLoS ONE, October 2011, Volume 6, Issue 10, e26252)

[Figure: correspondence analysis plot (CA1: 32.62%, CA2: 31.46%) with the total amount of sequences (0 to 60000) per taxon: Mamiella, Mantoniella_unknown, Micromonas_A, Micromonas_B, Micromonas_C, Micromonas_D, Micromonas_E, Micromonas_BC, Micromonas_unknown, Ostreococcus_ABC, Ostreococcus_D, Ostreococcus_unknown, Bathycoccus, Bathycoccaceae_unknown, Mamiellaceae_unknown, Crustomastix, Dolichomastix, Dolichomastigales_unknown, Monomastix_unknown, Monomastigales_unknown]

Symbiotic systems

• Symbiotic associations in a broad sense (involving eukaryotes, prokaryotes or viruses): parasitism, mutualism, commensalism, ...
• Closely interacting partners ➡ Closely interacting genomes: coevolution
• Common phylogenetic history?
• Lateral gene transfers between partners?

Outline

1. Methods
  1.1. Assessing cophylogenetic history
  1.2. Finding lateral gene transfer
2. Case study: a microalgae-virus system
  2.1. Cophylogeny
  2.2. Lateral gene transfer

Finding cophylogenetic patterns in symbiotic associations

[Figure: host-parasite associations linking a parasite tree to a host tree]

• Cospeciation; coevolution; cophylogeny; parallel cladogenesis; cocladogenesis; cophylogenetic descent; cophylogenetic maps ...

• Here: macroevolutionary context
• How to reconstruct the common evolutionary history of two clades, for example hosts and parasites?
• Some key dates
  • 1981: Brooks (see Klassen 1992)
  • 1994: Page; Hafner et al.
• Books
  • Page (ed.). 2003. Tangled trees. University of Chicago Press.
  • Garamszegi L. Z. (ed.). 2014. Modern comparative methods and their application in evolutionary biology. Chap. 20. Springer.

Four cophylogenetic events

• Cospeciation
• Transfer
• Duplication
• Sorting

Bellec L, Desdevises Y. Quand virus et hôtes évoluent ensemble : la fidélité est-elle la règle ? [When viruses and hosts evolve together: is fidelity the rule?] Virologie 2015; xx(x): 1-10. doi:10.1684/vir.2015.0612

Laure Bellec: Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie des organismes et écosystèmes aquatiques (BOREA, UMR 7208), Muséum national d’Histoire naturelle, Université Pierre et Marie Curie, Université de Caen Basse-Normandie, CNRS, IRD, CP26 75231, 43 rue Cuvier, Paris cedex 5, France
Yves Desdevises: Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie intégrative des organismes marins (BIOM, UMR 7232), Observatoire océanologique, 66650, Banyuls/Mer, France

Abstract. Viruses display strong interactions with their hosts, from physiological and ecological points of view, often leading to strict patterns of host specificity. It is then tempting to consider that viruses evolve in the same way as their hosts, behaving more or less like hosts’ characters. However, the cospeciation between viruses and their hosts, that is the degree to which their evolutionary trees are similar, has been the subject of relatively few studies, in a field otherwise very dynamic. The main concepts and methods to study the patterns of cospeciation, and more generally cophylogeny, are reviewed here. Their uses with host-virus systems suggest that, contrary to a common belief, the joint evolutionary history of viruses and their hosts is often complex. Without a rigorous cophylogeny study, it is then very risky to consider that the evolutionary history of viruses mirrors that of their hosts.

Key words: cophylogeny, cospeciation, host switch, evolution, specificity

Introduction. The study of the evolutionary relationships between symbionts, or parasites such as viruses (as will be the case in this article), and their hosts is referred to in the scientific literature by many more or less complex terms: coevolution, cophylogeny, cospeciation, codivergence, cocladogenesis, cophylogenetic descent... This reflects the dynamism of this research field, which really took off in the 1980s and 1990s with the development of dedicated analytical methods.

Generally, cospeciation refers to the process of concomitant speciation of two taxonomic groups of closely associated organisms, resulting in two congruent (displaying a high degree of similarity) if not identical phylogenies [1]. However, it is important to distinguish between the terms cospeciation and coevolution. Since its introduction by Ehrlich and Raven in 1964 [2], the concept of coevolution has been restricted to microevolutionary phenomena [3], in other words to a kind of arms race between two interacting species over the course of their evolution, such as a parasite and its host. The term coevolution can thus refer to the reciprocal evolution of adaptations between hosts and their viruses, but this system can evolve without cospeciation [...]

Methods

• Event-based methods
  • Fit symbiont tree onto host tree by adequately mixing the four types of events = reconciled trees
  • Optimality criterion: maximise number of cospeciation events or minimise global cost (each kind of event is attributed a cost)
  • Computationally intensive: practical only for simple problems (quite small trees and/or specific parasites and/or extensive cospeciation)

• With complex cases, solutions cannot be computed in polynomial time, unless
  • host-switching is precluded, and/or
  • time constraints are considered, to limit the number of possible switches (that requires fully dated trees)

• Global fit methods
  • Global congruence between trees
  • Influence of individual events
  • Host-parasite links
  • Importance of the null hypothesis
    • Cospeciation (e.g. Johnson et al. 2001)
    • Random associations (Legendre et al. 2002)

Theoretical prerequisites

• Well known and fully resolved trees needed for event methods
  • Branch lengths: molecular phylogenies
  • Exhaustive sampling
  • Monophyletic groups = evolutionary entities
  • ... quite difficult
• Topological congruence does not necessarily mean cospeciations!
• Different causes for congruence/incongruence

Some methods

• Event-based methods
  • Brooks parsimony analysis (BPA; Brooks 1981)
  • Reconciled trees: Component, TreeMap 1, 2 and 3 (Page 1993, 1994; Charleston 1998); Tarzan (Merkle and Middendorf 2005); CoRe-PA (Merkle et al. 2010); Jane (Conow et al. 2010); TreeCollapse (Drinkwater and Charleston 2014)
  • Generalised parsimony (TreeFitter; Ronquist 1995)
  • Probabilistic methods: ML, Bayesian inference (Huelsenbeck et al. 1997, 2000)
• Global fit methods
  • Homogeneity test (Johnson et al. 2001)
  • Congruence tests (ParaFit; Legendre et al. 2002, Hommola et al. 2009)

• Most methods work well if
  • Widespread cospeciation
  • ≈ 1 host / 1 parasite
  • Small phylogenies
• Else: event-based methods are computationally very intensive, optimal solution not guaranteed
• Event-based methods all require fully-resolved trees
• Different methods: different results

Reconciled trees

TreeMap (Page 1994)

• Goal: fitting parasites tree onto host tree by adequately mixing the 4 types of events
• Criterion: maximise number of cospeciations (TM 1)
• Test against a random distribution
• Can take branch lengths into account

• TreeMap 1: problems
  • Transfers added a posteriori
  • Limited optimality criterion (can generate many optimal solutions)
  • Difficulty with widespread parasites (several host species)
• TreeMap 2 (Charleston & Page, 2002)
  • Jungles algorithm: introduces event costs
  • Optimisation of global cost
  • Finds optimal solutions but is very computationally intensive
  • Many modifiable parameters
  • Tests:
    • Global cost
    • Cherry Picking Test: influence of individual associations
  • Each host must have at least one parasite (!!)
  • Needs fully resolved trees
• TreeMap 3 is in development: in Java, for all platforms, with a few new functions
• TreeCollapse is a fast heuristic (greedy) algorithm to compute the solution

Example

[Figure: COXI phylogenies of pocket gophers (hosts) and chewing lice (parasites), with numbered internal nodes and host-parasite links]

TreeMap 1

[Figure: six tree panels, (a) to (f), showing the pocket gopher and chewing louse tip labels]

TreeMap 2

• 6 optimal solutions (out of 9); Co = cospeciations, Sw = switches, Du = duplications, Lo = losses
  • Co = 8, Sw = 4 (total distance: 3.565), Du = 10, Lo = 0; total cost = 14
  • Co = 8, Sw = 4 (total distance: 3.67), Du = 10, Lo = 0; total cost = 14
  • Co = 12, Sw = 3 (total distance: 3.009), Du = 6, Lo = 1; total cost = 10
  • Co = 12, Sw = 3 (total distance: 3.009), Du = 6, Lo = 1; total cost = 10
  • Co = 12, Sw = 2 (total distance: 1.9345), Du = 6, Lo = 3; total cost = 11
  • Co = 12, Sw = 2 (total distance: 1.9345), Du = 6, Lo = 3; total cost = 11

[Figure: one reconciled tree per solution, with the pocket gopher and chewing louse tip labels]
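The total costs reported for the TreeMap 2 solutions are consistent with a cost of 0 per cospeciation and 1 per switch, duplication and loss. That cost scheme is inferred from the numbers on the slide, not stated there, so the sketch below treats it as an assumption; the names `EVENT_COSTS` and `global_cost` are illustrative only.

```python
# Event counts (Co, Sw, Du, Lo) for the six solutions listed on the slide.
SOLUTIONS = [
    {"Co": 8, "Sw": 4, "Du": 10, "Lo": 0},
    {"Co": 8, "Sw": 4, "Du": 10, "Lo": 0},
    {"Co": 12, "Sw": 3, "Du": 6, "Lo": 1},
    {"Co": 12, "Sw": 3, "Du": 6, "Lo": 1},
    {"Co": 12, "Sw": 2, "Du": 6, "Lo": 3},
    {"Co": 12, "Sw": 2, "Du": 6, "Lo": 3},
]

# Assumed cost scheme (inferred from the reported totals, not given in the source).
EVENT_COSTS = {"Co": 0, "Sw": 1, "Du": 1, "Lo": 1}


def global_cost(events, costs=EVENT_COSTS):
    """Sum of (event count x event cost) over the four event types."""
    return sum(costs[kind] * count for kind, count in events.items())


totals = [global_cost(solution) for solution in SOLUTIONS]
print(totals)
```

Changing the entries of `EVENT_COSTS` shows how the ranking of candidate reconstructions depends on the costs chosen for each event type.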

• Test against a null distribution (from randomised trees) of inferred number of cospeciations (or global cost with TreeMap 2)
• Comparison with the observed value

[Figure: histogram of the number of cospeciations (x axis, 1 to 12) inferred from randomised trees against frequency (y axis, 0 to 250), with the observed value marked (*)]

P < 0.05: the observed number of cospeciations is higher than in 95% of random iterations
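The comparison of an observed value with a null distribution can be sketched as follows. This is a generic randomisation test, not TreeMap's own code: the null values are simulated here only to make the example runnable, and the observed count is hypothetical.

```python
import random


def randomisation_p_value(observed, null_values):
    """Proportion of values at least as large as the observed one.

    The observed value is counted as one member of the distribution,
    so the smallest possible P is 1 / (len(null_values) + 1).
    """
    at_least = sum(1 for value in null_values if value >= observed)
    return (at_least + 1) / (len(null_values) + 1)


# Hypothetical null distribution: numbers of cospeciations inferred for
# 999 randomised trees (simulated values, not real results).
rng = random.Random(42)
null_counts = [rng.randint(1, 9) for _ in range(999)]

observed_count = 10  # hypothetical observed number of cospeciations
p = randomisation_p_value(observed_count, null_counts)
print(p)
```

For a test on global cost rather than on the number of cospeciations, the comparison would be reversed (`<=`), since a lower cost indicates a better fit.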

Temporal congruence

• TreeMap can be used to compare divergences in cospeciating pairs

• Evolutionary rates can be compared (e.g. parasites usually evolve faster than their hosts)
• Temporal congruence of speciation events can be assessed; this is a condition for true cospeciation
• Useful to discriminate evolutionary scenarios
• Simultaneity of speciation events?

No clock: additive trees, independent branch lengths

[Figure: host tree (tips 1 to 4) and parasite tree (tips i to iv), with parasite copaths plotted against host copaths for the pairs 1,i; 2,ii; 3,iii; 4,iv]

• Copaths
• Fewer changes in ii: does it evolve more slowly? Did it diverge later?

Molecular clock: ultrametric trees, dependent branch lengths

[Figure: host tree (tips 1 to 4) and parasite tree (tips i to iv), with parasite coalescence times plotted against host coalescence times for the nodes 1-4,i-iv; 2-3,ii-iii; 3-4,iii-iv]

• Coalescence times: intervals between speciation events
• If intercept = 0: cospeciation
• Slope: compare evolutionary rates for hosts and parasites

Page 1996

• Tests in TreeMap
  • Branch lengths must be correctly estimated on the tree (e.g. with an evolutionary model)
  • Additive trees
    • Copaths based on reconstruction
    • Correlation coefficient r between copaths tested via branch lengths randomisation, because copaths are not independent (via phylogeny)
  • Ultrametric trees
    • Coalescence times can be used
    • Same test
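A correlation between host and parasite copaths tested by randomisation can be sketched as below. This is a simplified stand-in: TreeMap randomises branch lengths on the trees, whereas this sketch merely shuffles the paired parasite values, and the copath lengths are hypothetical numbers chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test_r(host, parasite, n_perm=999, seed=0):
    """One-tailed permutation test of r (simplification: shuffles pairs)."""
    rng = random.Random(seed)
    r_obs = pearson_r(host, parasite)
    shuffled = list(parasite)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(host, shuffled) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical copath lengths for six cospeciating host-parasite pairs.
host_copaths = [0.10, 0.22, 0.35, 0.41, 0.58, 0.73]
parasite_copaths = [0.25, 0.40, 0.52, 0.95, 1.10, 1.60]

r_obs, p_value = permutation_test_r(host_copaths, parasite_copaths)
print(round(r_obs, 3), p_value)
```

A permutation approach is used because, as the slide notes, copaths share branches and are therefore not independent observations.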

• Example with additive trees

[Figure: pocket gopher (host) and chewing louse (parasite) trees with numbered nodes, and a plot of parasite against host copaths for cospeciating pairs such as talpoides-thomomyus, bottae-actuosi, hispidus-chapini, cavator-panamensis and cherriei-cherriei; r = 0.5663]

• Example with ultrametric trees
  • Slope = compared rates (if same gene)
  • If the intercept is not different from 0: simultaneous speciation events = cospeciation

Hafner et al. 2003
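The slope and intercept discussed here can be estimated with a simple regression of parasite coalescence times on host coalescence times. The sketch below uses ordinary least squares for simplicity, which may differ from the regression model used in the cited study, and the data are hypothetical.

```python
def least_squares(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical coalescence times for four pairs of cospeciating nodes.
host_times = [1.0, 2.0, 3.0, 4.0]
parasite_times = [2.1, 3.9, 6.2, 7.8]

slope, intercept = least_squares(host_times, parasite_times)
print(round(slope, 2), round(intercept, 2))
```

With these toy values the slope is close to 2, which would be read as parasites evolving about twice as fast as their hosts, and the intercept is close to 0, as expected under simultaneous speciation. Deciding whether an intercept differs from 0 requires a proper statistical test, which this sketch does not perform.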

• Example: aphids and bacteria
  • Significant cospeciation (P < 0.01)
    • TreeMap (topologies)
    • ParaFit (distances)
    • ML (sequences)

Tarzan and Jane

• Connected to Jungles (TreeMap 2 and 3), but faster due to heuristic algorithms, with statistical tests
• Tarzan can consider time ranges for nodes in the parasite tree, to preclude switches that are impossible in time (e.g. to an ancestor)
• Jane can consider time ranges for nodes in the host and parasite trees, and can modify switch cost according to phylogenetic distances between hosts

ParaFit

• Assess congruence between distance matrices (potentially) computed from phylogenies of hosts and parasites, via host-parasite associations
• Statistical tests (via permutations)
  • Global congruence between two trees/matrices (H0: random associations)
  • Contribution of each individual association to this congruence (structuring effect)

[Figure: host-parasite associations (A) linking the parasite tree (B) to the host tree (C)]

• Phylogenetic distances are transformed into principal coordinates

[Figure: aligned sequences (ACGTTCGGA, ACTGTCGGA, AGTGTCCGA) or binary characters (010010100, 010110110, 001110110) → raw or patristic distances (n × n matrix) → principal coordinates analysis → n-1 (max) principal coordinates]

• Production of a maximum number of n-1 independent variables (principal coordinates) fully equivalent to phylogenetic distances
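The first steps of this transformation can be sketched in plain Python, using the three aligned sequences shown on the slide as a toy alignment. The sketch computes raw distances (numbers of differing sites) and the Gower-centred matrix used in principal coordinates analysis; the final eigen-decomposition, which yields the coordinates themselves, is left to a linear algebra library and is not shown. Function names are illustrative.

```python
SEQUENCES = ["ACGTTCGGA", "ACTGTCGGA", "AGTGTCCGA"]


def raw_distance(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise raw distances."""
    return [[raw_distance(a, b) for b in seqs] for a in seqs]


def gower_centre(dist):
    """Double-centre the matrix of -0.5 * d^2 values (first step of PCoA)."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [a[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


dist = distance_matrix(SEQUENCES)
centred = gower_centre(dist)
print(dist)
```

Because every row and column of the centred matrix sums to zero, at most n-1 of its eigenvalues can be non-zero, which is where the maximum of n-1 principal coordinates comes from.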

• Matrix A: absence/presence of host-parasite associations (0/1 data); parasites × hosts
• Matrix B: principal coordinates (columns) describing the parasite phylogenetic tree
• Matrix C: principal coordinates (rows) describing the host phylogenetic tree
• Matrix D: SSCP parameters to be estimated; host tree principal coordinates × parasite tree principal coordinates

Example: pocket gophers (hosts) and chewing lice (parasites)

• Pocket gophers: T. talpoides, T. bottae, Z. trichopus, P. bulleri, O. hispidus, O. underwoodi, O. cavator, O. cherriei, O. heterodus, C. merriami, C. castanops, G. bursarius majus., G. bursarius halli, G. breviceps, G. personatus
• Chewing lice: T. barbarae, T. minor, G. trichopi, G. nadleri, G. chapini, G. setzeri, G. panamensis, G. cherriei, G. costaricensis, G. thomomyus, G. perotensis, G. actuosi, G. expansus, G. geomydis, G. oklahomensis, G. ewingi, G. texanus
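How matrices A, B and C combine into D can be sketched as follows. The formulas D = C A' B and a global statistic equal to the sum of squared elements of D, as well as the permutation of each parasite's row of A, are written here from memory of the ParaFit description and should be read as assumptions rather than as the reference implementation. The coordinates are toy values, not results of a real principal coordinates analysis.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def parafit_global(a, b, c):
    """Assumed global statistic: sum of squared elements of D = C A' B.

    a: parasites x hosts (0/1), b: parasites x coordinates,
    c: coordinates x hosts.
    """
    d = matmul(matmul(c, transpose(a)), b)
    return sum(v * v for row in d for v in row)


def parafit_test(a, b, c, n_perm=999, seed=0):
    """Permutation test under H0 of random associations (assumed scheme)."""
    rng = random.Random(seed)
    observed = parafit_global(a, b, c)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the hosts of each parasite independently.
        permuted = [rng.sample(row, len(row)) for row in a]
        if parafit_global(permuted, b, c) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 3 parasites, 3 hosts, 2 coordinates per tree (hypothetical).
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
B = [[1.0, 0.0], [-0.5, 0.8], [-0.5, -0.8]]
C = [[1.0, -0.5, -0.5], [0.0, 0.8, -0.8]]

observed, p_value = parafit_test(A, B, C)
print(round(observed, 4), p_value)
```

With only three host-parasite links the test has almost no power; the example only illustrates the matrix layout described on the slide.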

• Drawbacks
  • Events not considered
  • No scenarios
• Advantages
  • Statistical tests, and tested via simulations
  • Adapted to complex problems
  • Various numbers of hosts/parasite and parasites/host
  • Use distance matrices: no problem with polytomies, or multiple trees
• ParaFit implemented in CopyCat
• Use different methods in cophylogenetic studies

Looking for Lateral Gene Transfers between symbionts

• LGT in the tree of life
• Lateral gene transfer is increasingly recognized as an important factor shaping the evolution of life
• The current debate is no longer about the existence of LGT but about its importance: can we still consider that the evolution of life is mainly tree-like?
  • No (?) in Prokaryotes
  • Yes (?) in Eukaryotes
• LGT needs to be identified and removed before concatenating genes to build trees from genomes
• Books
  • Gogarten et al. (eds). 2009. Horizontal gene transfer. Humana Press.
  • Pagel and Pomiankowski (eds). 2008. Evolutionary genomics and proteomics. Sinauer.

Possible mechanisms • "Prokaryotes"



55 Eukaryotes

REPORTS 28. W. D. Koenig, D. Van Vuren, P. N. Hooge, Trends Ecol. Evol. 11, 514 (1996). 29. C. M. Arnaud, F. S. Dobson, J. O. Murie, Mol. Ecol. 21, 493 (2012). 30. K. B. Armitage, D. H. Van Vuren, A. Ozgul, M. K. Oli, Ecology 92, 218 (2011). Acknowledgments: I thank J. Hoogland for encouraging me to reexamine my information on dispersal; my 150+ research assistants over the 31 years of research (especially my four offspring); and D. Boesch, K. Fuller, R. Gardner, R. Morgan, and L. Pitelka of the University of the Maryland Center for Environmental Science (UMCES) for the opportunity for long-term comparative research. Financial support was provided by Colorado Parks and Wildlife, the Denver Zoological Foundation, Environmental Defense, the Eppley Foundation, the Harry Frank Guggenheim Foundation, the National Fish and Wildlife Foundation, the National Geographic Society,

Gene Transfer from Bacteria and Archaea Facilitated Evolution of an Extremophilic Eukaryote Gerald Schönknecht,1,2*† Wei-Hua Chen,3,4† Chad M. Ternes,1† Guillaume G. Barbier,5†‡ Roshan P. Shrestha,5†§ Mario Stanke,6 Andrea Bräutigam,2 Brett J. Baker,7 Jillian F. Banfield,8 R. Michael Garavito,9 Kevin Carr,10 Curtis Wilkerson,5,10 Stefan A. Rensing,11|| David Gagneul,12 Nicholas E. Dickenson,13 Christine Oesterhelt,14 Martin J. Lercher,3,15 Andreas P. M. Weber2,5,15*

NSF, Princeton University, the Ted Turner Foundation, UMCES, and the Universities of Michigan and Minnesota. For help with the manuscript, I thank R. Alexander, D. Blumstein, D. Bowler, C. Brown, J. Clobert, T. H. Clutton-Brock, A. Davis-Robosky, F. S. Dobson, L. Handley, K. Holekamp, A. Hoogland, S. Keller, X. Lambin, M. Oli, P. Sherman, N. Solomon, and D. Van Vuren. Data for this report are archived as supplementary materials on Science Online.

Supplementary Materials

18 October 2012; accepted 23 January 2013 10.1126/science.1231689

lead to expansion of existing gene families (8). In contrast, archaea and bacteria commonly adapt through horizontal gene transfer (HGT) from other lineages (9). HGT has also been observed in some unicellular eukaryotes (10); however, to our knowledge, horizontally acquired genes have not been linked to fitness-relevant traits in freeliving eukaryotes (11). Phylogenetic analyses of G. sulphuraria genes using highly stringent criteria indicate at least 75 separate gene acquisitions from archaea and bacteria (supplementary materials). The origin of these G. sulphuraria genes from HGT is supported by the finding that compared to the genomic average, they have

• LGT is viewed as the main adaptive mechanism in

Some microbial eukaryotes, such as the extremophilic red alga Galdieria sulphuraria, live in hot, toxic metal-rich, acidic environments. To elucidate the underlying molecular mechanisms of 1 adaptation, we sequenced the 13.7-megabase genome of G. sulphuraria. This alga shows an Department of Botany, Oklahoma State University, Stillwater, REPORTS enormous metabolic flexibility, growing either photoautotrophically or heterotrophically on more OK 74078, USA. 2Institute of Plant Biochemistry, HeinrichHeine-Universität Düsseldorf, 40225 Düsseldorf, Germany. than 50 carbon sources. Environmental adaptation seems to have been facilitated by horizontal 3 Ecol. Evol. 17. J. L. Hoogland, Science 215, 1639 (1982). 28. W. D. Koenig, D. Van Vuren, P. N. Hooge, Trends NSF, Science, Princeton University, the Ted Turner Foundation, Institute for Computer Heinrich-Heine-Universität gene transfer from various bacteria and archaea, often followed by gene family expansion. At least 4 18. J. L. Hoogland, The Black-tailed Prairie Dog: Social Life of a 11, 514 (1996). UMCES, and the Universities of Michigan and Minnesota. Düsseldorf, 40225 Düsseldorf, Germany. European Molecular 5%Burrowing of protein-coding genes of G. sulphuraria were probably acquired horizontally. These proteins Biology (EMBL) Heidelberg, Meyerhofstrasse Mammal (Univ. of Chicago Press, Chicago, 1995). 29. C. M. Arnaud, F. S. Dobson, J. O. Murie, Mol. Ecol.Laboratory 21, For help with theEMBL, manuscript, I thank R. Alexander, 5 are J.involved in ecologically processes ranging from 493 heavy-metal 1, 69117 Heidelberg,D.Germany. Department of C. Plant Biology, 19. L. Hoogland, J. Mammal. important 80, 243 (1999). (2012). detoxification to Blumstein, D. Bowler, Brown, J. Clobert, T. H. Clutton-Brock, Wilson State University, Lansing, MI glycerol uptake and metabolism. Thus, findings show that a pan-domain poolVuren, has A. Ozgul, 612 20. J. L. Hoogland, in Rodent Societies, J. O.our Wolff, 30. K. B. Armitage, D.gene H. 
Van M. K. Oli, Road, Michigan A. Davis-Robosky, F. S.East Dobson, L. Handley, K. Holekamp, 48824, USA. 6Institut Mathematik und Informatik, Ernst facilitated environmental adaptation this Chicago, unicellular eukaryote. P. W. Sherman, Eds. (Univ. of ChicagoinPress, Ecology 92, 218 (2011). A. für Hoogland, S. Keller, X. Lambin, M. Oli, P. Sherman,

• This remains to be studiedAin Eukaryotes, where

Moritz Arndt Universität Greifswald, Walther-Rathenau-Straße 2007), pp. 438–450. N. Solomon, and D. Van Vuren. Data for this report are 47, 17487 Greifswald, Germany. 7Department of Earth and Envi21. Details about bacteria my materials and methods aredomavailable(6). as TheAcknowledgments: J. Hoogland for encouraging archived as supplementary only member ofI thank the Cyanidiophyceae lthough and archaea usually ronmental me Sciences, 4011 CC Little Building, 1100 materials North Uni-on Science Online. supplementary materials on Science hot Online. to reexamine my sequence informationwas on dispersal; my 150+ versityresearch Avenue, University of Michigan, Ann Arbor, MI 48109, a genome previously inate extreme environments, and ex- for which 22. J. B. Silk, Philos. Trans. R. Soc. London Ser. B 362, 539 assistants over the 31 years of research (especially my8 four Earth and Planetary Science, Department tremely acidic habitats are typically devoid available, Cyanidioschyzon merolae (7), diverged USA. Department ofSupplementary Materials (2007). offspring); and D. Boesch, K. Fuller, R. Gardner, of R.Environmental Morgan, Science, Policy, and Management, University 9 www.sciencemag.org/cgi/content/full/339/6124/1205/DC1 sulphuraria about 1 billion years ago, of California, of photosynthetic bacteria. Instead, eukaryotic from G.and 23. J. C. Mitani, J. Call, P. M. Kappeler, R. Palombit, J. B. Silk, L. Pitelka of the University of the Maryland Center for Berkeley CA 94720–4767, USA. Department of Materials and Methods which approximates the evolutionary distance beunicellular red algae of the Cyanidiophyceae are Biochemistry and Molecular Biology, 603 Wilson Road, Michigan The Evolution of Primate Societies (Univ. of Chicago Environmental Science (UMCES) for the opportunity for S1 MI 48824, USA. 10Research TechLansing, fliescomparative and humans (see fig. S1 and the Press, principal photosynthetic organisms in these tween fruit Chicago, 2012). long-term research. 
Financial supportState wasUniversity, East Fig. Tables and Laboratories, S2 nology Support Facility, PlantS1 Biology 612 Wilson merolae maintains ecological niches (1). Cyanidiophyceae 24. J. L. Hoogland, Behaviour 69, 1 (1979).can grow supplementary provided materials). by Colorado C. Parks and Wildlife, the Denver Zoological Road, Michigan StateReferences University, East Lansing, MI 48824, USA. (31–38) 25. J. L.0 Hoogland, Behav. Ecol. Sociobiol. 63, 1621 Environmental Defense, the Eppley a strictlyFoundation, photoautotrophic lifestyle and does not Foundation, at pH to 4 and temperatures up to 56°C, close(2009). 11 Faculty of Biology and BIOSS Centre for Biological Signalling 18 October 2012; accepted January 2013 26. P. J.upper Greenwood, Anim. limit Behav.for28,eukaryotic 1140 (1980). Harry Guggenheim Foundation, the National Fish high saltFrank or metal concentrations; it difto the temperature life toleratethe Studies, University of Freiburg, Schänzlestrasse 1, 79104 23 Freiburg, 10.1126/science.1231689 27. M. Waser,sulphuraria W. T. Jones,isQ.a Rev. Biol.member 58, 355of(1983). and Wildlife the in National Geographic Society,12UMR USTL-INRA 1281 “Stress Abiotiques et Differs markedly from Foundation, G. sulphuraria ecology, cell Germany. (2). P.Galdieria unique the Cyanidiophyceae, displaying high salt and biology, and physiology. Accordingly, we find férenciation des Végétaux cultivés,”13 Université de Lille 1, 59650 Villeneuve d'Ascq Cédex, France. Department of Microbiology metal tolerance and exhibiting extensive meta- orthologs for only 42% of the 6623 G. sulphuraria and Molecular Genetics, Oklahoma State University, Stillwater, bolic versatility (3, 4). G. sulphuraria naturally proteins in C. merolae, and only 25% of both ge- OK 74078, USA. 14CyanoBiofuels GmbH, Magnusstrasse 11, lead to expansion of existing gene families (8). In inhabits volcanic hot sulfur springs, solfatara soils, nomes constitute syntenic blocks (fig. S2). 
gene duplication and evolution is an important process (but see Schönknecht et al. 2013, Science)

Gene Transfer from Bacteria and Archaea Facilitated Evolution of an Extremophilic Eukaryote

Gerald Schönknecht,* Wei-Hua Chen, Chad M. Ternes, Guillaume G. Barbier, Roshan P. Shrestha, Mario Stanke, Andrea Bräutigam, Brett J. Baker, Jillian F. Banfield, R. Michael Garavito, Kevin Carr, Curtis Wilkerson, Stefan A. Rensing, David Gagneul, Nicholas E. Dickenson, Christine Oesterhelt, Martin J. Lercher, Andreas P. M. Weber*

Science, vol. 339, 8 March 2013 (www.sciencemag.org)

Some microbial eukaryotes, such as the extremophilic red alga Galdieria sulphuraria, live in hot, toxic metal-rich, acidic environments. To elucidate the underlying molecular mechanisms of adaptation, we sequenced the 13.7-megabase genome of G. sulphuraria. This alga shows an enormous metabolic flexibility, growing either photoautotrophically or heterotrophically on more than 50 carbon sources. Environmental adaptation seems to have been facilitated by horizontal gene transfer from various bacteria and archaea, often followed by gene family expansion. At least 5% of protein-coding genes of G. sulphuraria were probably acquired horizontally. These proteins are involved in ecologically important processes ranging from heavy-metal detoxification to glycerol uptake and metabolism. Thus, our findings show that a pan-domain gene pool has facilitated environmental adaptation in this unicellular eukaryote.

Eukaryotic innovations usually arise through gene duplications and neofunctionalizations, which … In contrast, archaea and bacteria commonly adapt through horizontal gene transfer (HGT) from other lineages (9). HGT has also been observed in some unicellular eukaryotes (10); however, to our knowledge, horizontally acquired genes have not been linked to fitness-relevant traits in free-living eukaryotes (11).

In habitats with high concentrations of arsenic, aluminum, cadmium, mercury, and other toxic metals, G. sulphuraria frequently represents up to 90% of total biomass and almost all the eukaryotic biomass (1, 5). To understand the molecular mechanisms underlying G. sulphuraria's extremophilic and metabolically flexible lifestyle (Fig. 1), we determined its genome sequence (13.7 Mb; table S1) …

Coding sequences make up 77.5% of the G. sulphuraria genome, resulting in a median intergenic distance of 20 base pairs (bp) (fig. S3). Protein-coding genes contain on average two introns (fig. S4), with median lengths of 55 bp (fig. S5). Thus, the G. sulphuraria genome is highly condensed by comparison with that of C. merolae and most other eukaryotes.

Phylogenetic analyses of G. sulphuraria genes using highly stringent criteria indicate at least 75 separate gene acquisitions from archaea and bacteria (supplementary materials). The origin of these G. sulphuraria genes from HGT is supported by the finding that compared to the genomic average, they have …

Bacteria and Archaea

56


57

Methods to unveil LGT

• Compositional methods
• Comparison of evolutionary rates
• Look for similar sequences in databases through BLAST
• Phylogenetic approach

Compositional methods

58

• Look via bioinformatics in complete genomes for
  • atypical nucleotide composition in putatively transferred genes
  • atypical codon usage patterns
• Only for recent transfer events (before homogenization)
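
A minimal Python sketch of the compositional idea, assuming the genes are available as plain coding sequences. The function names and the z-score cutoff of 2 are illustrative choices, not part of any published pipeline; GC content at third codon positions is used here only as a crude stand-in for codon usage.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    return gc_content(cds[2::3])


def flag_atypical(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Names of genes whose GC content lies more than z_cutoff standard
    deviations away from the mean over all genes (hypothetical cutoff)."""
    values = {name: gc_content(seq) for name, seq in genes.items()}
    mean = sum(values.values()) / len(values)
    variance = sum((v - mean) ** 2 for v in values.values()) / len(values)
    sd = variance ** 0.5
    if sd == 0:
        return []
    return [name for name, v in values.items() if abs(v - mean) / sd > z_cutoff]
```

A gene flagged this way is only a candidate: as noted above, the signal fades once a transferred gene has been homogenized to the composition of its new genome.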

Evolutionary rates

59

• Compare pairwise distances between gene orthologs within families vs distances between genomes (from a reference tree): if no LGT, these distances should be roughly equal
• Problem: many factors other than LGT can cause substitution rates (and hence distances) to differ
• Requires orthologs to be present in the genomes under study: better within than between phyla
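
The comparison can be sketched in Python as a ratio of gene distance to reference distance for every pair of taxa. This is an illustration under simplifying assumptions: uncorrected p-distances on an existing alignment, and reference distances supplied by the user (for example read from a reference tree). The names `p_distance` and `distance_ratios` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_ratios(gene_aln: dict[str, str],
                    ref_dist: dict[frozenset, float]) -> dict[tuple, float]:
    """Gene distance divided by reference (genome) distance for each taxon pair.
    Reference distances must be non-zero."""
    ratios = {}
    for t1, t2 in combinations(sorted(gene_aln), 2):
        ref = ref_dist[frozenset((t1, t2))]
        ratios[(t1, t2)] = p_distance(gene_aln[t1], gene_aln[t2]) / ref
    return ratios
```

Ratios close to 1 match the no-LGT expectation; a ratio far below 1 for taxa that are distant on the reference tree is the pattern that would suggest a transfer, subject to the caveat about other sources of rate variation.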

60

• Another possibility is to compare instantaneous substitution matrices in genes vs genomes
• In case of LGT, these rates should differ
• Problem: difficult to accurately estimate the substitution matrix for short DNA stretches (e.g. single genes)
• Supposes LGT is rare within genomes, because genome rate computation includes transferred genes

BLAST and similarity

61

• Find homologs of a query sequence in databases, genomes, ... via a similarity search (e.g. BLAST)

• Pattern of gene presence/absence in organisms = phyletic pattern

• Identification of LGT for genes with unusual affiliation

• Drawback: similarity does not necessarily mean evolutionary proximity

• Sensitive to taxa/gene representation in databases

62

• BLAST can be used to estimate the amount of transferred genes in closely related organisms using their genomes

• Can find if genes specific to each genome are putatively transferred
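
A phyletic pattern can be tabulated from BLAST tabular output (the default 12-column `-outfmt 6` layout, where column 1 is the query, column 2 the subject and column 11 the e-value). The sketch below is an assumption-laden illustration: it supposes that subject identifiers are written as `taxon|protein`, which is a hypothetical naming convention, and the e-value cutoff is arbitrary.

```python
def phyletic_pattern(blast_lines, taxa, max_evalue=1e-5):
    """Presence (1) / absence (0) of a significant hit per query gene and taxon.

    blast_lines: iterable of tab-separated BLAST tabular lines.
    taxa: taxon labels expected as the prefix of subject IDs ("taxon|protein").
    """
    pattern = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        taxon = subject.split("|", 1)[0]
        row = pattern.setdefault(query, {t: 0 for t in taxa})
        if taxon in row and evalue <= max_evalue:
            row[taxon] = 1
    return pattern
```

A gene present only in distantly related taxa shows up as an unusual row in this table, which is the kind of affiliation the slide refers to. The database-representation caveat applies directly: an absent taxon cannot produce a hit.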

Phylogenetic approach

63

• Look for individual gene trees incongruent with a reference phylogeny

• Reference tree: rDNA, genomes, gene concatenation, consensus tree, supertree, ...
• Need well supported trees
• Test for incongruence between topologies (KH, SH, AU, SOWH, …)
• Cannot detect LGT between neighbours
• Mix phylogenetic and phyletic approaches (BLAST): find putatively close sequences using BLAST and include them in a phylogenetic analysis
• Between symbionts with complete genomes available: blast symbiont ORFs against host genome to identify putatively transferred genes
• Cophylogenetic event-based methods (gene tree within species tree) can be used to infer a scenario for the LGT (= host-switch)
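
To make "incongruent with a reference phylogeny" concrete, here is a small Python sketch that counts the clades not shared by two rooted trees (a rooted Robinson-Foulds distance). This is a purely topological measure swapped in for illustration; it is not one of the likelihood-based tests listed above (KH, SH, AU, SOWH) and it ignores branch support. Trees are written as nested tuples of leaf names, an assumed toy representation.

```python
def clades(tree):
    """Set of non-trivial clades (frozensets of leaf names) in a rooted tree
    given as nested tuples, e.g. ((("A", "B"), "C"), "D")."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by construction
    return found


def rooted_rf(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


reference = ((("A", "B"), "C"), "D")
gene_tree = ((("A", "C"), "B"), "D")
print(rooted_rf(reference, gene_tree))
```

A distance of zero means identical topologies. A non-zero distance only signals a candidate: the slide's requirement of well supported trees still applies before LGT is invoked.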

64

65

Examples

• Antifreeze protein (AFP) gene in fish

66

[Figure: comparison of the herring AFP gene with the homologous gene in rainbow smelt and in zebrafish; AFP tree compared with the 16S tree]

• Carbohydrate-active enzymes transferred from marine bacteria to microbiota of Japanese people
• Transfer from seaweed bacteria (Porphyra in sushi)

67

• TOP6B transferred from Archaea to photosynthetic eukaryotes and δ-proteobacteria

68

• frp gene acquired in red algae and green plants from δ-proteobacteria

69

• Multiple transfers virus-to-host and host-to-virus between Emiliania huxleyi and EhV86

70

71

72

Case study: Prasinophyte microalgae and their viruses

Hosts: Prasinophyceae

73

• Chlorophyta: green algae (Order Mamiellales, ubiquitous picoplankton)
• 3 main genera, 6 complete genomes to date
• Ostreococcus (3 genomes)
  • Smallest free-living eukaryote and photosynthetic genome
• Bathycoccus (1 genome)
  • Scales
• Micromonas (2 genomes)
  • Flagellum

74

Ostreococcus

Bathycoccus

Micromonas

75

Viruses

• Phycodnavirus
• Prasinovirus
• Important role in the regulation of phytoplanktonic populations

ML Escande, OOB

76

77

• Hosts and viruses mainly sampled in the Gulf of Lion


• Evolution
• Diversity
• Specificity (viruses) and richness (hosts)
• Coevolution
• Lateral gene transfers


• Large viral genomes: about 200 kb
• 14 complete genomes to date
• Ostreococcus virus: 2 OtV, 7 OlV, 1 OmV, 1 OxV
• Bathycoccus virus: 2 BpV
• Micromonas virus: 1 MpV

78

79

• OtV5 genome

Genome Flexibility of Ostreococcus Viruses

TABLE 4 Percent identity of the core genes between the seven Ostreococcus lucimarinus viruses (a)

% identity to:

Virus   OlV1  OlV2  OlV3  OlV4  OlV5  OlV6  OlV7
OlV1    100
OlV2     66   100
OlV3     67    95   100
OlV4     92    68    68   100
OlV5     67    94    95    69   100
OlV6     67    94    95    68    97   100
OlV7     92    69    68    92    69    69   100
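
The identity values of Table 4 can be grouped with a few lines of Python. The sketch below links two viruses whenever their core-gene identity reaches a threshold (single linkage); the 90% threshold is an assumption chosen for illustration, not a value taken from the paper.

```python
NAMES = ["OlV1", "OlV2", "OlV3", "OlV4", "OlV5", "OlV6", "OlV7"]

# Lower triangle of Table 4 (percent identity of core genes), row by row.
LOWER = [
    [100],
    [66, 100],
    [67, 95, 100],
    [92, 68, 68, 100],
    [67, 94, 95, 69, 100],
    [67, 94, 95, 68, 97, 100],
    [92, 69, 68, 92, 69, 69, 100],
]


def identity_groups(names, lower, threshold=90):
    """Single-linkage groups: two viruses share a group if a chain of
    pairwise identities >= threshold connects them."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, row in enumerate(lower):
        for j, identity in enumerate(row[:i]):  # skip the diagonal
            if identity >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return sorted(clusters.values())


print(identity_groups(NAMES, LOWER))
```

With these values the viruses fall into two groups, one containing OlV1, OlV4 and OlV7 and the other OlV2, OlV3, OlV5 and OlV6.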

FIG 4 Number of core and specific genes among the seven Ostreococcus luci-

• Genome comparisons

types present in the English Channel (not investigated in either study) and successfully cross-infects O. tauri (30). The morphology and size of the six new OlVs sequenced here is typical of other characterized prasinoviruses that infect Micromonas or Ostreococcus (17, 18). The particles also are morphologically similar to the much larger (165 to 190 nm) Chlorella viruses except for the spike structure at the vertex, which is not observed in prasinoviruses (45, 48). Globally, they show icosahedral symmetry, like the great majority of the other dsDNA aquatic eukaryote viruses currently described (49), without any tail, in contrast to many archaeal and bacterial bacteriophages (50, 51). The size of their genomes (around 200 kb) and the number of potential ORFs also are similar to those of the other sequenced prasinoviruses (17–20). Several lines of evidence support the delineation of two distinct OlV groups that we term type I and type II. In phylogenetic analysis of 22 proteins shared across the analyzed viruses, the OlVs formed two bootstrap-supported groups (Fig. 8). Concordance of the two reconstructions performed here (Fig. 8) indicates


FIG 5 Synteny among the seven Ostreococcus lucimarinus virus genomes. Synteny analysis is based on the alignment between annotated open reading frames translated into amino acids for each of the seven OlVs. Each red line represents one orthologous gene. Window, 20 amino acids. Each blue line represents an inverted orthologous gene sequence.

that PolB is a reliable phylogenetic marker for investigating natural diversity of chlorella- and prasinoviruses (9, 13). One O. tauri virus (OtV2) was grouped with OlV type II viruses. However, this virus was isolated using Ostreococcus sp. strain RCC393, a clade B strain that is present primarily in oligotrophic waters (25), rather than a strain from clade C (represented by O. tauri). Hence, while it is unknown whether the so-called OtV2 also infects clade A or C strains, its name is misleading, since it was not isolated against O. tauri. In addition to the observed phylogenetic relationships, nucleotide identities were higher for gene orthologs from OlVs in the same phylogenetic group than between groups, and gene presence/absence patterns were more similar within each group than between groups. Types I and II do not appear to correlate with geographical origins of the viruses. This indicates that the inversion and sequence divergence arose before the dispersion of the two groups. Moreover, the fact that the bona fide O. tauri viruses grouped with type I OlVs suggests these viruses cospeciated with their hosts from a common ancestor with OlV2, OlV3, OlV5, and OlV6. Among type II OlVs, the percent nucleotide identity between orthologous genes of viruses isolated from the same location (OlV5 and OlV6) (Table 1) is higher (97%) than that with the other type II viruses (94 to 95%), suggesting that inside each subgroup the sequence distance reflects the geographical distance, or that the viruses infecting Mediterranean Sea host strains have become more specialized for these hosts. However, additional sampling of viruses will be required to test this hypothesis. The presence of the sequence inversion in the two virus subgroups suggests that this inversion is an ancient rare event that occurred before the separation of the two groups. 
This hypothesis also is supported by the sequence divergence observed inside the inverted fragment that is similar to the divergence found in the rest of the OlV genomes. Furthermore, the phylogeny suggests that the most parsimonious explanation is that O. tauri viruses have arisen from a type I O. lucimarinus viral ancestor (with the inversion) by host switching. Among the genes which are shared by at least two but fewer than the seven OlVs, 2-oxoglutarate-Fe(II) oxygenase genes (52) are present in multiple copies in several viruses. These highly similar copies are located toward the 3= ends of the genomes, suggesting gene duplications from a single or a limited number of initial acquisitions. This gene family also has been described in multiple copies in viruses infecting cyanobacteria (53). The authors proposed that these genes were involved in the regulation of the cellular nitrogen metabolism or in DNA repair for the benefit of the virus. However, they also could be involved in controlling host translation during the infection (54, 55). Interestingly, the pho4

80

Downloaded from http://jvi.asm.org/ on May 6, 2015 by guest

marinus viruses. Pale blue circle, OlV core genes; gray circle, genes present in more than one OlV but not in all seven of the genomes; dark blue triangles, genes specific to only one OlV; yellow external circles, number of these specific genes shared between this OlV and at least one other prasinovirus. See Materials and Methods for the determination of the orthologous genes.

(a) Boldface numbers indicate comparisons between viruses of the same type.

81

• Genome stats


• Phylogeny

82

• Specificity

83

Some of our questions

• What is the genetic diversity of viruses and their hosts? Are they correlated in some way? Is it linked to environmental variables?
• What are the features of host and virus genomes?
• What are the resistance mechanisms in viruses?
• Virus evolution: are prasinoviruses monophyletic? Are they coevolving with their hosts? Are there any gene transfers between hosts and viruses? Are evolutionary rates different between hosts and their viruses? What is the origin of viral genes?

84

85

Cophylogeny

86

• Trees are based on the analysis of the partial DNA polymerase gene (about 600 bp) for viruses, and (generally) 18S rDNA for hosts

• Host specificity is assessed experimentally

87

Cospeciation

• Significant cospeciation with reduced dataset
• Need more data
• Longer sequences
• Specificity
• New strains
• More complete dataset (51 viruses on 22 hosts): too long to compute exact tests

[Figure: tanglegram linking the host tree (ALGAE: Bathycoccus, Ostreococcus and Micromonas strains) to the parasite tree (VIRUSES) through their associations; node labels give support values (e.g. 1/100, 0.96/79)]

88

• Rational approach: compute congruence test with ParaFit and propose reconstruction with topology-based algorithms

89
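
The logic of a permutation-based congruence test can be sketched in Python. This is not ParaFit itself: ParaFit works on principal coordinates of the two distance matrices, whereas the simplified stand-in below correlates host and parasite distances over pairs of host-parasite links (a Mantel-style statistic) and permutes the links. All names are hypothetical, and the distance matrices are assumed to be nested dictionaries.

```python
import random
from itertools import combinations


def link_correlation(host_d, para_d, links):
    """Pearson correlation between host and parasite distances,
    computed over all pairs of host-parasite links."""
    xs, ys = [], []
    for (h1, p1), (h2, p2) in combinations(links, 2):
        xs.append(host_d[h1][h2])
        ys.append(para_d[p1][p2])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_test(host_d, para_d, links, n_perm=999, seed=0):
    """Observed statistic and one-sided permutation p-value.
    Parasites are reshuffled among hosts to build the null distribution."""
    rng = random.Random(seed)
    observed = link_correlation(host_d, para_d, links)
    hosts = [h for h, _ in links]
    parasites = [p for _, p in links]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(parasites)
        permuted = list(zip(hosts, parasites))
        if link_correlation(host_d, para_d, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Counting the observed arrangement among the permutations, as in `(hits + 1) / (n_perm + 1)`, keeps the p-value strictly above zero; its smallest possible value is set by the number of permutations.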

• A reconstruction from Jane

90

• Statistical testing

91

Original instance: P ≤ 0.000

• Difficult to define evolutionary or geographical entities for viruses, because of their global distribution ("everything is everywhere"): distance approaches are less biased than topology-based approaches

92

93

• No physical barriers between hosts and viruses
• If significant cophylogenetic pattern: adaptation ≠ lack of opportunity for transfer
• "Real" cospeciation

➡ ... Need to study this dataset in a more thorough way (e.g. by partitioning the data)... your job!

94

Lateral Gene Transfer

95

• Special case of LGT here: between hosts and their viruses

• Viruses are known as "bags of genes" or "gene robbers", stealing genes from their hosts

• Suspected to be vectors of gene transfers between eukaryotes

General methodology for identifying LGT

• Define candidate gene for transfer via BLAST: present in host and viruses
• Find same genes in different taxa (using BLAST, GenBank, ...): make a dataset with most closely related hits (BLAST), reference taxa, candidate gene in host and virus
• Align sequences and make tree
• Look at the tree to identify LGT

96

Host-virus LGT in OtV5?

97

• Blast each viral ORF against host genome and keep ORF meeting specific criteria (AA ID > 45 % on > 50 AA)

• Blastp against GenBank nr, keep all viral ORFs with host in the 50 best blast hits, and get these BBHs

• Keep these sequences if similar known gene function in Phycodnaviruses

• 6 candidates for LGT
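
The first filtering step (amino acid identity > 45% over > 50 residues) can be sketched in Python on BLAST tabular output. The sketch assumes the default 12-column `-outfmt 6` layout of a protein search of viral ORFs against the host, where column 3 is percent identity and column 4 the alignment length; the function name is hypothetical, and the later steps (nr search, functional check) are not covered.

```python
def candidate_orfs(blast_lines, min_identity=45.0, min_length=50):
    """Viral ORFs with at least one host hit above both thresholds.

    blast_lines: tab-separated BLAST tabular lines, query = viral ORF.
    Both comparisons are strict, matching "> 45 %" and "> 50 AA".
    """
    keep = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        identity, length = float(fields[2]), int(fields[3])
        if identity > min_identity and length > min_length:
            keep.add(fields[0])
    return sorted(keep)
```

ORFs returned here are only candidates; the slide's remaining criteria and the phylogenetic trees decide whether a transfer is plausible.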

• Make phylogenetic tree for each candidate, adding host and virus sequences in the alignment + other BBHs (and reference sequences)

98

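[Figure: candidate gene trees. Panels: Pyrophosphatase, GDP-mannose, Unknown; verdicts shown: NO, NO, ?]

99

[Figure: candidate gene trees. Panels: Topoisomerase (NO), Ribonucleoside-diphosphate reductase (NO)]

100

? Maybe...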

101

[Figure: phylogenetic trees of hosts, prasinoviruses and reference taxa. Titles: Phylogenetic tree of an "evolutionary marker gene", the DNA polymerase; Prasinovirus genomes; Zoom based on the concatenation of 5 genes (same for hosts and viruses). Scale bars: 0.1]

102

• Different amino acid (AA) metabolism in related viral genomes

Only in MpV and OtV: LGT?

Only in MpV and OtV: LGT?

Only in OtV: LGT?

103

• An HSP70 gene exists only in the BpV genome: LGT from its host?

• This gene is thought to have been frequently and independently recruited in taxonomically distant viruses: what happened here?

➡ ... Find out for yourself!

104

LGT of inteins in prasinoviruses

• Virus-to-virus transfer
• Inteins are selfish genetic elements that can insert in genes and disrupt them without affecting their activity (self-excise after translation)
• Scattered phylogenetic distribution
• Comparison with reference tree (from DNA polymerase)

FIG. 3. Phylogenetic tree of polB sequences belonging to Prasinoviruses. The phylogenetic tree was built using only the PolB sequences of intein-containing viruses and from reference sequences lacking inteins, BI (codon model) and ML (codon model; 297 nucleotides; 100 bootstrap replicates). Numbers show posterior probabilities (BI) and bootstrap proportions …

105

Base composition and codon usage

FIG. 4. GC content (A) and NC codon usage statistic (B) of the PolB gene and associated inteins. ■: BpV-like (Bathycoccus virus-like); other symbols: OtV-like (Ostreococcus virus-like) and MpV-like (Micromonas virus-like). The straight lines have a slope of 1 and correspond to PolB-intein couples without recent transfer signal.

Tree comparison

FIG. 5. Tanglegram of PolB and associated inteins belonging to Prasinovirus. Phylogenetic trees were built using codon models in BI and ML (426 positions for the PolB tree; 438 positions for the inteins tree; 100 bootstrap replicates). Numbers show posterior probabilities (BI) and bootstrap proportions (ML) reflecting clade support. Trees were rooted according to Fig. 3. PolB and intein trees are shown on the left and on the right, respectively.

Tree reconciliation

FIG. 6. Cophylogenetic scenario. Black and grey trees represent PolB and intein sequences, respectively. ●: Switch; ○: Codivergence; - - -: Loss. This scenario was produced with Jane 3 using the following costs: Codivergence: 0; Duplication: 1; Switch: 1; Loss/Sorting: 2; Failure to diverge: 1.

Array of methods

106

• GC content and codon usage point out (recent) transfers
• Cophylogenetic analyses identify the players

Identification of probable LGTs

• between OtV
• between putative BpV
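
The event costs quoted in the FIG. 6 caption define how an event-based reconciliation is scored: each scenario is summarized by counts of events, and the total cost is the weighted sum. The Python sketch below only illustrates that arithmetic; it does not reproduce Jane's search for the cheapest scenario, and the event counts in the example are invented.

```python
# Costs as given in the FIG. 6 caption.
JANE_COSTS = {
    "codivergence": 0,
    "duplication": 1,
    "switch": 1,
    "loss": 2,
    "failure_to_diverge": 1,
}


def scenario_cost(event_counts, costs=JANE_COSTS):
    """Total cost of a reconciliation given the number of each event type."""
    unknown = set(event_counts) - set(costs)
    if unknown:
        raise KeyError(f"unknown event types: {sorted(unknown)}")
    return sum(costs[event] * n for event, n in event_counts.items())


# Hypothetical scenario: 5 codivergences, 3 switches, 2 losses.
example = {"codivergence": 5, "switch": 3, "loss": 2}
print(scenario_cost(example))
```

Because codivergence costs nothing under this scheme, scenarios with many switches or losses are penalized relative to scenarios that explain the same associations by codivergence.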

© 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution. Evolution 67-1: 18–33

doi:10.1111/j.1558-5646.2012.01738.x


GENETIC EXCHANGES OF INTEINS BETWEEN PRASINOVIRUSES (PHYCODNAVIRIDAE)

Camille Clerissi,1,2,3 Nigel Grimsley,1,3 and Yves Desdevises1,3

1 UPMC Univ Paris 06, UMR 7232, Observatoire Océanologique, Avenue du Fontaulé, 66650, Banyuls-sur-Mer, France

2 E-mail: [email protected]

3 CNRS, UMR 7232, Observatoire Océanologique, Avenue du Fontaulé, 66650, Banyuls-sur-Mer, France

Received April 3, 2012 Accepted June 29, 2012 Phylogenetic diversity in the Phycodnaviridae (double-stranded DNA viruses infecting photosynthetic eukaryotes) is most often studied using their DNA polymerase gene (PolB). This gene and its translated protein product can harbor a selfish genetic element called an “intein” that disrupts the sequence of the host gene without affecting its activity. After translation, the intein peptide sequence self-excises precisely, producing a functional ligated host protein. In addition, inteins can encode homing endonuclease (HEN) domains that permit the possibility of lateral transfers to intein-free alleles. However, no clear evidence for their transfer between viruses has previously been shown. The objective of this paper was to determine whether recent transfers of inteins have occurred between prasinoviruses (Phycodnaviridae) that infect the Mamiellophyceae, an abundant and widespread class of unicellular green algae, by using DNA sequence analyses and cophylogenetic methods. Our results suggest that transfer among prasinoviruses is a dynamic ongoing process and, for the first time in the Phycodnaviridae family, we showed a recombination event within an intein. KEY WORDS:

Gene conversion, lateral gene transfer, Mamiellophyceae, recombination, virus.

Microbes are at the base of food networks in the oceans and thus they shape the structure and function of ecosystems (Azam et al. 1983). Marine viruses are one order of magnitude more abundant than the microbial hosts they predate (Suttle 2005). Hence, they have a strong influence on biogeochemical cycles and on the community structure of microorganisms (Proctor and Fuhrman 1990; Thingstad and Lignell 1997). The majority of them are prokaryotic viruses (bacteriophages) (Suttle and Chan 1993; Sullivan et al. 2003; Weinbauer 2004), but there is increasing interest in eukaryotic viruses, especially because (1) picoeukaryotes make a large contribution to planktonic primary production (Moon-van der Staay et al. 2001), (2) some eukaryotic algae are toxic and kill shellfish (Tarutani et al. 2001), and (3) aquatic environments are likely to harbor an uncharacterized diversity of large algal viruses that have recently been detected by virtue of their DNA sequence similarities to Mimivirus, the largest virus ever discovered (La Scola et al. 2003; Monier et al. 2008). Thus, these algal


viruses were included in the Mimiviridae family (Monier et al. 2008; Fischer et al. 2010), recently proposed to be reclassified as the Megaviridae family (Arslan et al. 2011). To date, the most extensively studied viruses infecting eukaryotic plankton belong to the Phycodnaviridae and Megaviridae families (Van Etten et al. 2002; Dunigan et al. 2006; Iyer et al. 2006; Monier et al. 2008; Fischer et al. 2010; Arslan et al. 2011). Both families contain large, double-stranded DNA viruses (sometimes called “Giruses,” short for Giant viruses [Claverie et al. 2006; Claverie and Ogata 2009; Ogata et al. 2009; Forterre 2010]), and form monophyletic groups within the nucleocytoplasmic large DNA viruses (NCLDV; Iyer et al. 2006; Fischer et al. 2010). Although five marine plankton-infecting viruses have been described so far for the Megaviridae (Phaeocystis phouchetii virus [PpV-01], Jacobsen et al. 1996; Chrysochromulina ericina virus [CeV-01], Pyramimonas orientalis virus [PoV-01], Sandaa et al. 2001; Cafeteria roenbergensis virus [CroV], Fischer et al. 2010;
