Intra-host viral variability in children clinically infected with H1N1

Infection, Genetics and Evolution 33 (2015) 47–54


Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Short communication

Intra-host viral variability in children clinically infected with H1N1 (2009) pandemic influenza

Vincent Bourret a,b,⇑, Guillaume Croville a,b, Jean-Michel Mansuy c,d, Catherine Mengelle c,d, Jérôme Mariette e, Christophe Klopp e, Clémence Genthon f, Jacques Izopet c,d, Jean-Luc Guérin a,b

a Université de Toulouse, INP, ENVT, Toulouse F-31076, France
b INRA, UMR 1225, IHAP, Toulouse F-31076, France
c INSERM, UMR 1043, Toulouse F-31300, France
d Laboratoire de Virologie, CHU Purpan, 330 Avenue de Grande Bretagne, F-31300 Toulouse, France
e Plateforme Bioinformatique Toulouse Midi-Pyrénées, UBIA, INRA, 31326 Auzeville Castanet-Tolosan, France
f GeT-PlaGe, Genotoul, INRA Auzeville, F-31326 Castanet-Tolosan, France

Article info

Article history:
Received 8 January 2015
Received in revised form 3 April 2015
Accepted 9 April 2015
Available online 16 April 2015

Keywords:
Influenza A
Intra-host viral diversity
2009 H1N1 pandemic
Pyrosequencing
Next generation sequencing
Deep sequencing

Abstract

Recent in-depth genetic analyses of influenza A virus samples have revealed patterns of intra-host viral genetic variability in a variety of relevant systems. These have included laboratory infected poultry, horses, pigs, chicken eggs and swine respiratory cells, as well as naturally infected poultry and horses. In humans, next generation sequencing techniques have enabled the study of genetic variability at specific positions of the viral genome. The present study investigated how 454 pyrosequencing could help unravel intra-host genetic diversity patterns on the full-length viral hæmagglutinin and neuraminidase genes from human H1N1 (2009) pandemic influenza clinical cases. This approach revealed unexpected patterns of co-infection in a 3-week-old infant, arising from rapid and complex reassortment phenomena on a local epidemiological scale. It also suggested the possible existence of very low frequency mutants resistant to neuraminidase inhibitors in two untreated patients. As well as revealing patterns of intra-host viral variability, this report highlights technical challenges in the appraisal of scientifically and medically relevant topics such as the natural occurrence of homologous recombination or very low frequency drug-resistant variants in influenza virus populations. © 2015 Elsevier B.V. All rights reserved.

⇑ Corresponding author at: Service de Maladies Contagieuses, ENVT, 23 chemin des Capelles, 31076 Toulouse, France. Tel.: +33 561 192 312; fax: +33 561 193 974. E-mail address: [email protected] (V. Bourret). http://dx.doi.org/10.1016/j.meegid.2015.04.009 1567-1348/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Influenza A viruses probably exist within individual hosts as swarms of many variant viruses whose genomes closely conform to a population consensus sequence. This was highlighted by recent in-depth analyses of samples revealing patterns of intra-host viral genetic variability in a variety of systems. These have included laboratory infected poultry (Iqbal et al., 2009), horses (Murcia et al., 2010, 2013), pigs (Murcia et al., 2012), chicken eggs and swine respiratory cells (Bourret et al., 2013) as well as naturally infected poultry (Croville et al., 2012; Monne et al., 2014) and horses (Hughes et al., 2012).

In humans, the emergence of the 2009 H1N1 pandemic strain has coincided in time with the increased use of next generation sequencing techniques to investigate intra-host viral variability. Such techniques have often been used to assess the occurrence of polymorphisms at specific positions of the viral genome. For instance, a customized pyrosequencing assay was specifically designed to detect intra-host polymorphisms at hæmagglutinin amino acid 222 (H1 numbering excluding the signal peptide), deemed to alter receptor binding (Levine et al., 2011). In an epidemiological study, intra-host polymorphism at the same site was positively correlated with severity of the associated clinical condition (Resende et al., 2014). Others have studied the emergence of an oseltamivir resistance mutation in the neuraminidase gene and distinct neuraminidase haplotypes over the course of a prolonged infection in an immunocompromised patient (Ghedin et al., 2011). Finally, some reported on the emergence of the same oseltamivir resistance mutation (H275Y, N1 numbering) in a patient suspected to have been infected by a donor harbouring a pure oseltamivir-sensitive viral population (Fordyce et al., 2013).

The influenza A virus hæmagglutinin (HA) and neuraminidase (NA) proteins have essential biological functions and are considered the most protective antigens and most variable proteins of this virus (Ellebedy and Webby, 2009). Here, we applied next-generation sequencing on a limited number of clinical specimens to unravel intra-host viral genetic diversity patterns over the complete hæmagglutinin and neuraminidase genes in young patients clinically infected with H1N1 pandemic viruses.

2. Materials and methods

2.1. Case history and sampling

Seven patients aged 3 weeks–23 months were admitted to the Toulouse children's hospital between 26th December 2010 and 16th January 2011 for clinical signs compatible with influenza (Table 1). These patients were confirmed with influenza A virus infection using the RealTime Ready Influenza A/H1N1 Detection Set (Roche diagnostics, Meylan, France) as described in Mansuy et al. (2012). Nasal swabs were taken 0–5 days after the onset of illness, except for one patient for whom the interval was 34 days and another for whom the date of onset of illness was not recorded. Samples were resuspended in Eagle’s Minimum Essential Medium supplemented with antibiotics and frozen at −80 °C until laboratory processing. Viral samples were not cultured at any time.

Table 1
Summary of the clinical and demographic information of the seven patients.

Patient | Age at sampling | Clinical signs | Interval between onset of clinical signs and sampling (days)
A | 8 months | Fever, rhinopharyngitis | Not recorded
C | 23 months | Fever, vomiting and coughing | 5
E | 23 months | Fever | 34
G | 8 weeks | Fever | 0
H | 4 months | “Flu-like” syndrome including fever | 1
I | 3 weeks | Fever | 2
J | 13 months | Febrile coughing and dyspnoea | 3

2.2. Deep pyrosequencing

2.2.1. Laboratory protocol

The full length hæmagglutinin (segment 4, 1777 nt) and neuraminidase (segment 6, 1458 nt) viral genes were purified and amplified from the liquid samples following a protocol described elsewhere (Croville et al., 2012). Care was taken to avoid cross-contamination, notably by using individual tubes for PCR and changing blades between samples when extracting bands from electrophoresis gels. The purified PCR products were then quantified using the Quant-iT PicoGreen kit (Molecular Probes). One pool was generated for each patient, containing 500 ng DNA comprising equal molar proportions of the HA and NA influenza gene segments (275 ng of segment 4 and 225 ng of segment 6). Pools were subsequently treated as in Bourret et al. (2013). Sequencing was carried out on one region of an 8-region PicoTiterPlate.

2.2.2. Bioinformatic treatment

Sequences were demultiplexed using Roche’s SFF file tool without allowing any error per multiplex identifier. Reads were then filtered using Pyrocleaner (Mariette et al., 2011) on several criteria, including length (reads shorter or longer than the mean read length ±2 standard deviations were discarded). Reads were also filtered based on their complexity, computed using the compressed string length (library zip) on several sub-sequences generated using a sliding window approach. The Pyrocleaner script removed sequences for which all base pairs had a Phred quality value under 20, or for which the rate of undetermined bases was higher than 4%. The remaining sequences were aligned on reference sequences from the A/Paris/2573/2009 strain using the Burrows-Wheeler Alignment package (BWA) (Li and Durbin, 2009). The

Fig. 1. Within-host viral variability (defined as the average percentage of viral nucleotides differing from the sample’s consensus at any position) as a function of patient age for the hæmagglutinin (○) and neuraminidase (+) genes.
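The variability measure defined in this caption can be made concrete with a short sketch. The Python below is not the authors' script: it assumes a hypothetical list of per-position base counts (toy numbers), takes the consensus at each position to be the most frequent base, and averages the non-consensus fraction over positions. The names `position_variability` and `global_variability` are illustrative, and averaging per position (rather than pooling all nucleotides) is an assumption of this sketch.

```python
from statistics import mean

def position_variability(counts):
    """Fraction of reads at one position whose base differs from the consensus.

    `counts` maps each base to its read count; the consensus is taken to be
    the most frequent base (an assumption of this sketch).
    """
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("position has no coverage")
    return 1 - max(counts.values()) / depth

def global_variability(positions):
    """Mean non-consensus fraction over all covered positions."""
    return mean(position_variability(c) for c in positions)

# Toy data: three positions of a hypothetical sample (not from the study).
toy = [
    {"A": 990, "G": 10},            # 1% non-consensus
    {"C": 1000},                    # no variation
    {"T": 950, "C": 30, "A": 20},   # 5% non-consensus
]
print(f"{100 * global_variability(toy):.2f}% average non-consensus nucleotides")
```

Plotting `position_variability` against position would give the kind of per-position profile shown for each patient in Figs. 2 and 3.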


Fig. 2. Levels of intra-host viral variability at each position for the influenza hæmagglutinin gene. The x-axis represents a map of the gene segment, while vertical bars indicate the level of intra-host variability at each position for the seven patients (A, C, E, G, H, I, J as indicated).

Genome Analysis ToolKit (GATK) (McKenna et al., 2010) was then used for base quality score recalibration, indel realignment, and duplicate removal, and the SAMtools software package (Li et al., 2009) was used for single nucleotide polymorphism (SNP) and indel discovery.

2.3. Viral genome analyses

2.3.1. Statistical analyses

A matrix indicating the frequency of each allele at each genome position for each patient was generated using a custom script in Python and subsequently analysed using R (R Core Team, 2014). Averages of intra-host variability estimates were compared between genes using the Mann–Whitney test as implemented in R. Gene by gene visual analysis of pyrosequencing alignments was carried out using the Integrative Genomics Viewer (IGV) graphic interface (Robinson et al., 2011).

2.3.2. Phylogenetic reconstructions

Evolutionary relationships between viral consensus sequences from the seven patients and a viral minority sequence from patient

I were inferred using the Maximum Likelihood method as implemented in MEGA6 (Tamura et al., 2013). For HA, we used the Tamura-Nei substitution model (Tamura and Nei, 1993), which allows a different rate for each of the two transition types as well as a different frequency for each of the four bases. For NA, we used the Tamura 3-parameter model (Tamura, 1992), which allows only a common transition rate and two different base frequencies. Those models were deemed to be the most suitable for these datasets by the maximum likelihood method implemented in MEGA6. The homologous sequences from an unrelated avian H1 virus (A/mallard/Netherlands/10-Nmkt/1999) were used as out-groups. A total of 1752 (HA) and 1428 (NA) positions were included in the final analyses. Initial trees for the heuristic searches were obtained automatically by applying Neighbour-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The final trees with the highest log likelihood (−3931 for HA, −2770 for NA) were kept. Statistical support for the clades was estimated after 500 bootstrap replicates. Evolutionary analyses were conducted in MEGA6 (Tamura et al., 2013).
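As a check on the pooling step in Section 2.2.1, the 275 ng and 225 ng figures appear to follow from splitting 500 ng in proportion to segment length: equal molar amounts of two double-stranded amplicons have masses roughly proportional to their lengths. The sketch below reproduces that arithmetic. The function name `equimolar_masses` is illustrative, and the calculation assumes that mass scales linearly with amplicon length (ignoring base composition) and that the amplicons have the stated segment lengths.

```python
def equimolar_masses(total_ng, lengths_nt):
    """Split a total DNA mass so each fragment is present in equal molar amount.

    Assumes mass per molecule is proportional to fragment length, so each
    fragment's share of the mass equals its share of the summed length.
    """
    total_length = sum(lengths_nt.values())
    return {name: total_ng * length / total_length
            for name, length in lengths_nt.items()}

# Segment lengths as stated in Section 2.2.1.
masses = equimolar_masses(500, {"HA (segment 4)": 1777, "NA (segment 6)": 1458})
for name, ng in masses.items():
    print(f"{name}: {ng:.1f} ng")
```

Rounded to the nearest nanogram, the two values match the amounts reported for each pool.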


Fig. 3. Levels of intra-host viral variability at each position for the influenza neuraminidase gene. The x-axis represents a map of the gene segment, while vertical bars indicate the level of intra-host variability at each position for the seven patients (A, C, E, G, H, I, J as indicated).

3. Results

3.1. Coverage overview

We used 454 pyrosequencing to describe intra-host viral variability patterns in the hæmagglutinin (HA) and neuraminidase (NA) genes of influenza virus samples from seven H1N1 (2009) pandemic patients aged 3 weeks–23 months. Mean read length obtained was 436.6 nucleotides. After complete sequence cleanup and bioinformatic treatment (cf. Section 2), 12,461,774 and 20,615,839 nucleotides were included in the viral genome analyses of the HA and NA genes respectively. Sequences obtained were mapped against positions 1–1752 of segment 4 and positions 16–1443 of segment 6, covering both entire coding regions. Average coverage depth ranged across patients from 530 to 2356 reads per position for HA and from 1311 to 3401 reads per position for NA.

3.2. Global levels of intra-host variability

We estimated global intra-host viral variability as the average proportion of nucleotides differing from the sample’s consensus

nucleotide at any position. We studied whether, in patients under 2 years of age, this would be affected by age in relation to immunological maturation processes. No clear age-related pattern emerged (Fig. 1), but the youngest patient (aged 23 days), here named patient I, showed the greatest viral variability for both genes while estimates were very stable across the other six patients for both segments. The average intra-host variability was higher for segment 4 than for segment 6 (p < 0.001, or p = 0.002 when excluding patient I).

3.3. Patterns of genetic variability along the viral genes

We then mapped the intra-host viral genetic variability on the two viral gene segments to identify possible regional variability patterns and hot spots. Figs. 2 and 3 show the levels of intra-host viral variability at each position on the influenza segments 4 and 6, respectively, for the seven patients. Three main patterns emerged. (i) On segment 4, five main variable regions visually stood out (positions 10–34, 502–552, 982–1018, 1070–1153 and 1273–1316) (Fig. 2). (ii) On both genes, several positions showed a remarkable polymorphism pattern whereby the main alternative base was the same for all patients and its frequency was high and


Fig. 4. Ratio of linked to segregated minority alleles (r) as a function of physical distance (d) in pairs of successive polymorphic sites for the HA (○) and NA (+) genes in patient I. Each plot could be modelled with a decreasing exponential equation: $r = 227 \times d^{-0.73}$ for HA ($R^2 = 0.93$, solid line) and $r = 80 \times d^{-0.86}$ for NA ($R^2 = 0.87$, dotted line), which suggested a more efficient recombination in NA than HA.
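To see what the fitted curves in this caption imply, the sketch below evaluates them at a few distances. It assumes the models read r = 227 × d^(−0.73) for HA and r = 80 × d^(−0.86) for NA, i.e. with the multiplication and minus signs restored, which makes them power functions of d. The function name `linkage_ratio` and the chosen distances are illustrative only.

```python
def linkage_ratio(d, scale, exponent):
    """Modelled ratio of linked to segregated minority alleles at distance d (nt)."""
    return scale * d ** exponent

# Fitted parameters as read from the Fig. 4 caption (signs assumed, see above).
MODELS = {"HA": (227, -0.73), "NA": (80, -0.86)}

for gene, (scale, exponent) in MODELS.items():
    values = [linkage_ratio(d, scale, exponent) for d in (10, 100, 1000)]
    # Fold drop in r for every tenfold increase in distance.
    fold_drop = 10 ** -exponent
    print(gene, [round(v, 2) for v in values], f"x{fold_drop:.1f} drop per decade")
```

Under these assumptions the NA curve loses linkage faster per tenfold increase in distance than the HA curve, which is the feature the caption reads as more efficient recombination in NA.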

comparable in all patients (e.g. positions 10, 505, 1003, 1105 and 1449 on segment 4; positions 981, 1040, and 1048 on segment 6). Visual inspection of the individual reads showed that these polymorphisms could be accounted for by insertions or deletions in stretches of several identical bases, a common artefact in 454 datasets (reviewed in Kunin et al., 2009). This emphasizes the need for careful visual inspection of the reads when analysing genetic diversity at select positions even when stringent clean-up programs such as the GATK are used. (iii) At nine positions on both segments, patient I showed minority alleles present at a roughly comparable proportion across positions while the other patients showed little or no polymorphism at these particular positions. Visual inspection of the aligned individual reads showed that those minority alleles tended to occur on the same strands (distinct from the strands bearing the consensus alleles at these positions), suggesting the existence of at least one well defined minority haplotype.

3.4. Minority alleles of patient I

We sought to quantify the degree to which patient I’s distinctive minority alleles were physically linked, i.e. located on the same reads. To this end, we analysed all the reads covering pairs of successive polymorphic sites among the nine sites highlighted on each of the two segments for this patient. Fig. 4 shows that the proportion of reads bearing two linked minority alleles tended to decrease when physical distance between the two positions increased. This was strongly suggestive of some form of recombination between two parental haplotypes, more efficient in the case of NA than HA (Fig. 4). Such recombination can be either natural

(occurring in patients) or artefactual (caused by PCR; see Section 4).

3.5. Phylogenetic reconstruction

In patient I, recombination (artefactual or natural) likely occurred between two parental haplotypes for each gene: a majority one (the consensus) and a minority one differing from the consensus at nine positions. We investigated the phylogenetic relationships between the minority haplotype and the consensus of the seven patients, all sampled from the same urban area within a few weeks of one another. Fig. 5 shows that patient I’s minority haplotypes did not simply evolve from their respective consensus haplotypes within that patient. Rather, in the case of HA, the minority haplotype diverged early and was later combined with the majority haplotype upon co-infection at some point of the transmission chain leading to patient I. In the case of NA, patient I’s consensus haplotype apparently diverged first, and was later combined with patient J’s consensus haplotype, the latter persisting in patient I as a minority (patients I and J live 9 km apart in a densely populated area).

3.6. Low frequency neuraminidase inhibitor resistance alleles

Very deep pyrosequencing may enable identification of low frequency alleles in viral populations. We therefore examined whether alleles conferring some resistance to neuraminidase inhibitors were present at low levels within the individual hosts. We detected that patient A harboured a guanine instead of an adenine in 3 of 1760 reads at position 904, resulting in amino acid


Fig. 5. Maximum likelihood phylogenetic analysis for the HA (top) and NA (bottom) genes of the seven patients, including both the consensus and minority haplotypes of patient I. Percent bootstrap support values are shown next to the corresponding nodes when >70. Scale bars indicate the estimated number of substitutions per site. Filled arrows indicate patient I’s consensus haplotypes (csn), while open arrows indicate this patient’s minority haplotypes (mi). Although various nodes in the HA phylogeny had a bootstrap support value under 70, that exact same topology was yielded by the maximum parsimony method (data not shown).

difference N295S (N1 numbering). This substitution has been associated with reduced sensitivity to oseltamivir and zanamivir (Collins et al., 2008; Kiso et al., 2004; Le et al., 2005). These were the only polymorphisms detected at that otherwise completely homogeneous position, and they were not accounted for by an insertion or a deletion. The Phred quality scores for the three guanines were 62, 66 and 70 respectively (meaning an estimated sequencing error probability in the $10^{-7}$–$10^{-6.2}$ range for these bases). We also detected that patient C had an adenine instead of a guanine in 3 of 2172 reads at position 760, resulting in amino acid difference S247N (N1 numbering). These polymorphisms were not accounted for by indels, and the adenines had Phred qualities of 46, 62 and 68 respectively. (One single thymine was also present at that position, but its read had multiple upstream indels.) The S247N substitution has been shown to cause an increase in oseltamivir resistance, with remarkable synergy with the H275Y mutation, as well as modest increases in zanamivir and peramivir resistance (Hurt et al., 2011).

4. Discussion

We investigated intra-host viral variability in seven children infected with the influenza A H1N1 (2009) pandemic virus using pyrosequencing, with the aim of describing some intra-host viral variability patterns in these human patients. Global levels of intra-host viral diversity were not clearly correlated with age in these very young patients, though a dataset consisting of a larger number of patients of different ages would provide greater statistical power for such analysis. As global estimates of intra-host viral diversity, we used the average proportions of non-consensus nucleotides. This

measurement is similar in nature to Iqbal et al.’s “substitution frequency” (Iqbal et al., 2009) or Murcia et al.’s and Hughes et al.’s “mutation frequency” (Murcia et al., 2010, 2012, 2013; Hughes et al., 2012). The intra-host diversity values reported here for the 2009 pandemic hæmagglutinin in humans ($4.2$–$6.9 \times 10^{-3}$ substitutions per nucleotide, s/nt) are higher than those reported for H5 and H7 in birds ($1.7 \times 10^{-4}$–$7.0 \times 10^{-4}$ s/nt; Iqbal et al., 2009), H3 in horses ($1.8$–$7.5 \times 10^{-4}$ s/nt; Murcia et al., 2010, 2013; Hughes et al., 2012) and H1 in swine ($2.8 \times 10^{-4}$–$1.3 \times 10^{-3}$ s/nt; Murcia et al., 2012). This could be attributed to several non-exclusive effects, for instance human babies being more permissive to a more diverse swarm of viruses, or the swarms being more prone to diversification. As suggested by Murcia et al. (2012), it could also be an effect of deeper coverage in the present study causing increased detection of lower frequency variants. As pyrosequencing was used here while others used Sanger sequencing of clones, higher diversity estimates could also reflect higher sequencing error rates. It is clear that errors can contribute to some degree to the variability estimates. However, the fact that these estimates differed for one of the patients, although all patients were processed in a single sequencing reaction, suggests that at least part of the variability observed was not accounted for by sequencing errors. Inter-study comparisons should nonetheless remain very cautious given the differences in technology used.

General patterns of intra-host variability along the HA gene in our dataset (Fig. 2) did not closely mimic patterns of inter-host (Gog et al., 2007) or intra-host (Murcia et al., 2012) variability detected in other, non-pandemic H1 hæmagglutinins. This suggests that the hæmagglutinin of the 2009 pandemic virus may have its own variability hot spots; this hypothesis would be reinforced by examination of a large number of diverse samples at the intra-host or inter-host level.
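Because the argument above turns on how much sequencing error could contribute to the observed variability, it may help to convert the Phred scores quoted in Section 3.6 into error probabilities. The sketch uses the standard Phred definition, Q = −10 log10(p). The helper name `phred_to_error` is illustrative, and the scores are those reported for the minority bases in patients A and C.

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

# Quality scores reported in Section 3.6 for the minority bases.
reported = {"patient A, position 904": [62, 66, 70],
            "patient C, position 760": [46, 62, 68]}

for label, scores in reported.items():
    probabilities = ", ".join(f"{phred_to_error(q):.1e}" for q in scores)
    print(f"{label}: {probabilities}")
```

These are per-base estimates from the base caller only; they say nothing about errors introduced earlier, for example during reverse transcription or PCR, so they bound one source of error rather than all of them.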


Patient I, despite their very young age (23 days at sampling), harboured two distinct haplotypes for both HA and NA. The HA and NA minority haplotypes did not simply evolve from patient I’s respective consensus haplotypes; they evolved separately and were combined at different points in the transmission chains, suggesting complex and rapid reassortment phenomena on a local epidemiological scale. For both genes in this patient, the two distinct alleles showed signs of homologous recombination. Although this phenomenon could be natural (i.e. occurring in the patient), it could also be artefactual, occurring during conventional PCR in sample preparation (see discussions in e.g. Görzer et al., 2010; Bourret et al., 2013). Artefactual chimeras are believed to occur when incomplete extension events result in partially elongated segments that later act as mismatching primers (reviewed in Judo et al., 1998; Smyth et al., 2010). In patient I’s sample preparation, an identical absolute elongation time was used for all segments, and the longer HA was amplified less efficiently than the shorter NA (data not shown). So while PCR recombination would in this model be expected to affect HA more than NA, NA recombination appeared more efficient (Fig. 4). This is not a formal proof that recombination was natural however (see e.g. debates in Gibbs et al., 2001; Worobey et al., 2002). Low frequency mutations (