
Supplemental Data


High Coding Density on the Largest Paramecium tetraurelia Somatic Chromosome

Marek Zagulski, Jacek K. Nowak, Anne Le Mouël, Mariusz Nowacki, Andrzej Migdalski, Robert Gromadka, Benjamin Noël, Isabelle Blanc, Philippe Dessen, Patrick Wincker, Anne-Marie Keller, Jean Cohen, Eric Meyer, and Linda Sperling

Dinucleotide Frequencies

The frequencies of dinucleotides for the megabase chromosome sequence were calculated taking into account the frequencies of each individual nucleotide and are given as the ratio of observed to expected frequencies, fobs/exp. We found both CpG and TpA dinucleotides to be underrepresented with respect to their expected frequencies (ratios of 0.4 and 0.84, respectively). In contrast, the reciprocal GpC and ApT frequencies did not deviate from expected values, nor did the frequencies for any of the other dinucleotides.

CpG dinucleotide depression is observed in organisms in which DNA can be methylated on cytosine residues, most notably vertebrates with CpG frequencies of 0.2–0.4 of the expected values [S1]. CpG dinucleotide frequency is thought to be depressed because 5MeC is unstable and readily deaminated to thymine, so that the mutation rate for methylated CpG dinucleotides to TpG (and CpA on the opposite strand) is high. Methylation of cytosines, usually but not always in CpG dinucleotides, has long been correlated with inactive genes and a "closed" chromatin conformation, although whether the methylation is cause or consequence of transcriptional (in)activity is still an open question. An attractive hypothesis is that DNA methylation is linked to histone methylation and the formation of heterochromatin [S2, S3].
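The observed-to-expected ratio fobs/exp defined at the start of this section can be illustrated with a minimal Python sketch. The text does not give the exact formula, so this sketch assumes the common convention that the expected frequency of a dinucleotide XpY is the product of the single-nucleotide frequencies of X and Y. The function name `dinucleotide_obs_exp` and the toy sequence are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def dinucleotide_obs_exp(seq: str) -> dict[str, float]:
    """Return observed/expected ratios for all 16 dinucleotides.

    Assumption (not stated in the source): expected frequency of XpY is
    f(X) * f(Y), where f is the single-nucleotide frequency in the same
    sequence. The sequence is assumed to contain only A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)                                # single-nucleotide counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))   # overlapping dinucleotides
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n) * (mono[y] / n)
            observed = di[x + y] / (n - 1)
            # A nucleotide absent from the sequence gives an undefined ratio.
            ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    # Toy AT-rich sequence, for illustration only.
    toy = "ATTAAATGCATTTAAGATTACGATAAATTT"
    ratios = dinucleotide_obs_exp(toy)
    for name in ("CG", "GC", "TA", "AT"):
        print(name, round(ratios[name], 2))
```

A ratio near 1 means the dinucleotide occurs about as often as its base composition predicts; values well below 1, such as the 0.4 reported for CpG, indicate depression.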
It has been established in recent years, in organisms from fission yeast to man and including ciliates, that methylation of histone H3 on particular lysine residues (usually K9) can be targeted by small noncoding RNAs such as those produced by the RNase III-like enzyme Dicer and that histone H3 methylation is a sufficient condition for heterochromatin formation. Small double-stranded RNA molecules that arise from large double-stranded RNA molecules (such as those produced from transcription of multicopy elements randomly inserted in the genome) can target heterochromatin formation and gene silencing, for example of centromeric genes [S4] or transposons [S5, S6]. In ciliates, it is now clear that such a mechanism is intimately involved in developmental DNA elimination processes [S7–S9] and can explain the non-Mendelian heredity of alternative rearrangement patterns [S10, S11]. At least in plants, DNA methylation on cytosine residues and histone H3 methylation on K9 are interdependent processes (each can be both cause and consequence of the other) and together ensure heterochromatin formation [S12, S13]. The emerging picture is that in organisms that do not methylate DNA, histone H3 methylation can suffice to target heterochromatin formation, but in organisms that can methylate DNA, or that methylate DNA at a specific developmental stage [S14], DNA methylation provides an extra level of regulation and/or stabilization.

Before considering whether there could be cytosine methylation in Paramecium, it is important to ensure that the low fobs/exp value of 0.4 calculated for the entire megabase chromosome sequence is not a consequence of coding constraints, given that ~77% of this sequence is coding (compared to 1%–3% coding density for vertebrate euchromatic DNA). Calculation of fobs/exp is not a fruitful approach for each of the codon positions. In the first two codon positions, CpG is strongly depressed: Paramecium uses AGA and AGG arginine codons almost exclusively and there are very few CGN arginine codons. However, this strong bias in codon usage may be unrelated to CpG depression. In the second and third codon positions, NCG codons are much less frequently used than synonymous NCA, NCT, or NCC codons, but this could still result from some unrelated codon usage bias. We therefore examined dinucleotides that span two codons, and the result of this analysis is presented in Supplemental Table S1. We asked what percentage of the codons for a given amino acid would end in C if the first nucleotide of

Figure S1. Restriction Digestion of the Megabase Chromosome
Isolated megabase chromosome DNA used for shotgun library construction was digested with BsiWI, and the size of the restriction products measured by CHEF gel electrophoresis. The sizes were compared to those of conceptual digestion products and the data fit by linear regression (r = 0.998).

Table S1. Frequencies of Dinucleotides that Span Two Codons

(A) Amino Acids with Four Codons

Last nucleotide      First nucleotide of second codon
of first codon       T       C       A       G
T                    0.37    0.49    0.31    0.43
C                    0.16    0.12    0.18    0.05
A                    0.40    0.30    0.41    0.44
G                    0.07    0.09    0.10    0.08

(B) Amino Acids with Two Codons (XYT, XYC)

Last nucleotide      First nucleotide of second codon
of first codon       T       C       A       G
T                    0.77    0.81    0.67    0.88
C                    0.23    0.19    0.33    0.12

These tables assess the frequency of dinucleotides that span two codons. The values presented in the table are the fraction of codons ending in a particular nucleotide, as a function of the first nucleotide of the following codon. The rows give the last nucleotide of the first codon and the columns give the first nucleotide of the second codon. Codons were counted for each amino acid and then summed with respect to the last nucleotide of the first codon, for all amino acids specified by four codons (A) and for all amino acids specified by two codons ending in C and T (B). Fractions were then calculated (each column sums to 1.0). Calculations were made for the entire set of annotated megabase CDSs (244,209 codons). TCN Ser codons, CGN Arg codons, and CTN Leu codons were included in the four-codon category.
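The tabulation described in the Table S1 legend can be sketched in Python for part (A). This is a simplified illustration under stated assumptions, not the procedure actually used: it pools all four-codon families directly instead of counting per amino acid first (the summed counts are the same), and it identifies those families by their first two nucleotides. The names `FOURFOLD_PREFIXES` and `cross_codon_fractions` and the toy CDS are hypothetical.

```python
# Codon prefixes whose four XYN codons all encode the same amino acid.
# Following the legend, TCN Ser, CGN Arg and CTN Leu are included.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
NUCLEOTIDES = "TCAG"


def cross_codon_fractions(cds_list):
    """Fraction of four-codon-family codons ending in each nucleotide,
    as a function of the first nucleotide of the following codon.

    Returns fractions[next_first][last]; each inner dict sums to 1.0
    (it corresponds to one column of Table S1A) when any pairs were counted.
    Assumes every CDS is given in frame, starting at codon position 1.
    """
    counts = {nxt: {last: 0 for last in NUCLEOTIDES} for nxt in NUCLEOTIDES}
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3           # ignore a trailing partial codon
        codons = [cds[i:i + 3] for i in range(0, usable, 3)]
        for first, second in zip(codons, codons[1:]):
            if (first[:2] in FOURFOLD_PREFIXES
                    and first[2] in NUCLEOTIDES
                    and second[0] in NUCLEOTIDES):
                counts[second[0]][first[2]] += 1
    fractions = {}
    for nxt, column in counts.items():
        total = sum(column.values())
        fractions[nxt] = {
            last: (c / total if total else float("nan"))
            for last, c in column.items()
        }
    return fractions


if __name__ == "__main__":
    toy_cds = ["GCTGAAGCCGAAGCAGAAGCATAA"]          # toy data, illustration only
    result = cross_codon_fractions(toy_cds)
    print(result["G"])                              # column for "next codon starts with G"
```

With real CDS data, CpG avoidance across codon boundaries would show up as the entry for codons ending in C being lowest in the column for a following codon that starts with G, as in Table S1.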


Table S2. Clusters of Paralogous Genes

Gene 1       Gene 2       Predicted Product            Amino Acid Identity
PTMB.200     PTMB.201c    actin or actin-like          34%
PTMB.200     PTMB.202     actin or actin-like          38%
PTMB.200     PTMB.202     actin or actin-like          47%
PTMB.254c    PTMB.255c    trichocyst matrix protein    <20% (a)
PTMB.260c    PTMB.261c    hypothetical protein         30%
PTMB.290c    PTMB.291c    peptidase                    25%
PTMB.310c    PTMB.311c    hypothetical protein         40%
PTMB.345c    PTMB.346c    steroid dehydrogenase        56%

(a) Both genes are similar to blastp subjects from the same protein family (Tetrahymena Granule Lattice Proteins GRLP4 and GRLP1) and can be aligned with each other and with other Paramecium Trichocyst Matrix Proteins; however, the amino acid identity between the two proteins is very low.

the next codon was a G. In each case where the comparison could be meaningful, i.e., for four-codon amino acids and for two-codon amino acids ending with C and T, we did find that the codon choice for the first amino acid was least likely to end in a C if the first nucleotide of the next codon was a G. We conclude that CpG dinucleotide frequency is depressed independently of coding constraints.

The question therefore arises whether CpG dinucleotide frequencies are depressed in Paramecium because of DNA methylation. The only methylated base that has been detected in Paramecium or other ciliates by biochemical means (chromatography of DNA reduced to nucleosides) is N6-methyl-adenine. However, a recent report in Stylonychia, using more sensitive molecular assays that rely on bisulfite modification of methylated bases followed by specific PCR amplification, did reveal cytosine methylation of transposon sequences at an early stage of macronuclear development [S15], suggesting that cytosine methylation might be involved in heterochromatin formation and DNA elimination in this ciliate. It is not known whether the CpG dinucleotide frequency is depressed in Stylonychia. If CpG frequency is depressed because of a specific type of mutation, namely deamination of 5MeC→T, then DNA methylation must occur in the micronucleus or in the zygotic nucleus for the mutation to be fixed. However, cytosine methylation in the developing macronucleus for the purpose of marking DNA for heterochromatin formation and elimination should constitute selective pressure to avoid CpGs in macronucleus-destined sequences, leading to depressed CpG ratios, even in the absence of any specific mutation mechanism. In Paramecium, N6-methyl-adenine is found in both macronuclear and micronuclear DNA [S16]. The same authors were unable to detect 5-methyl cytosine in macronuclear DNA. However, no experiments have yet been performed to try to identify cytosine methylation either in micronuclear DNA or at particular stages of macronuclear development.

TpA depression was also observed on the megabase chromosome sequence, with a ratio of fobs/exp of 0.84. This could be a direct consequence of CpG depression, as discussed by Duret and Galtier [S17], since CpG depression owing to mutation of 5MeC to T tends to increase the frequencies of both T and A (but not of TpA), thus lowering the TpA dinucleotide ratio fobs/exp. Alternatively, the TpA depression observed on the megabase chromosome may reflect the key role of this dinucleotide in macronuclear development. It is well established that TpA dinucleotides are important signals for recombination, both in the precise excision of IESs [S18] and in the deletion of multicopy elements leading to chromosome fragmentation [S11]. DNA elimination occurs between direct repeats (or degenerate reverse repeats in the case of IES excision) always containing TA dinucleotides, of which one copy remains in the macronucleus. It may be an advantage for the organism to avoid the TpA signal whenever possible in macronucleus-destined sequences, especially coding sequences. We have already observed a few instances of possible deletion of pseudo-IES elements from coding sequences, both in the megabase shotgun reads and in Genoscope primary sequence data, by internal comparison of the reads with each other (data not shown).

Supplemental References

S1. Bird, A. (1980). DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8, 1499–1504.
S2. Bird, A. (2001). Molecular biology. Methylation talk between histones and DNA. Science 294, 2113–2115.
S3. Nakayama, J., Rice, J., Strahl, B., Allis, C., and Grewal, S. (2001). Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly. Science 292, 110–113.
S4. Volpe, T., Kidner, C., Hall, I., Teng, G., Grewal, S., and Martienssen, R. (2002). Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science 297, 1833–1837.
S5. Ketting, R., Haverkamp, T., van Luenen, H., and Plasterk, R. (1999). Mut-7 of C. elegans, required for transposon silencing and RNA interference, is a homolog of Werner syndrome helicase and RNaseD. Cell 99, 133–141.
S6. Sijen, T., and Plasterk, R. (2003). Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi. Nature 426, 310–314.
S7. Mochizuki, K., Fine, N., Fujisawa, T., and Gorovsky, M. (2002).

Table S3. Accuracy of GlimmerM Predictions

                            Exons
Data Set    Nucleotides     Initial    Internal    Terminal    Unique     Genes
429 CDS     87%             52%        47%         47%         49%        29%
141 CDS     97%             72%        63%         60%         56%        40%

GlimmerM predictions were compared to two sets of manually annotated genes. The first set (429 CDS) is the whole set of annotated CDS at the end of the manual phase of megabase chromosome annotation. The second set (141 CDS) is a subset of the first and consists of the CDS that were included in the reference set for GlimmerM training (see Experimental Procedures). Nucleotide accuracy is the percentage of the nucleotides in the manually annotated genes that were also in the GlimmerM predictions. The remaining columns present the sensitivity of the predictions for different types of exons and for genes, i.e., the percentage of each type of element in the manual annotation that is exactly predicted by GlimmerM.
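The measures defined in the Table S3 legend can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code used for the table: genes are represented as tuples of (start, end) exon coordinates (inclusive, listed in transcription order), an exon or gene counts as exactly predicted only if its coordinates match an element of the prediction set, and an annotated exon's type is taken from its position in the annotated gene. All function names and the toy coordinates are hypothetical.

```python
from collections import Counter


def nucleotide_accuracy(annotated_genes, predicted_genes):
    """Fraction of annotated coding nucleotides also covered by predictions."""
    def covered(genes):
        positions = set()
        for gene in genes:
            for start, end in gene:
                positions.update(range(start, end + 1))   # inclusive coordinates
        return positions

    annotated = covered(annotated_genes)
    return len(annotated & covered(predicted_genes)) / len(annotated)


def exon_type(index, n_exons):
    """Classify an exon by its position within its gene."""
    if n_exons == 1:
        return "unique"
    if index == 0:
        return "initial"
    if index == n_exons - 1:
        return "terminal"
    return "internal"


def exon_sensitivity_by_type(annotated_genes, predicted_genes):
    """Per exon type, fraction of annotated exons predicted with exact coordinates."""
    predicted_exons = {exon for gene in predicted_genes for exon in gene}
    total, found = Counter(), Counter()
    for gene in annotated_genes:
        for i, exon in enumerate(gene):
            kind = exon_type(i, len(gene))
            total[kind] += 1
            if exon in predicted_exons:
                found[kind] += 1
    return {kind: found[kind] / total[kind] for kind in total}


def gene_sensitivity(annotated_genes, predicted_genes):
    """Fraction of annotated genes whose whole exon structure is predicted exactly."""
    predicted = set(predicted_genes)
    return sum(gene in predicted for gene in annotated_genes) / len(annotated_genes)


if __name__ == "__main__":
    # Toy data: the second exon of the first gene is predicted 10 nt too short.
    annotated = [((1, 100), (151, 300)), ((501, 800),)]
    predicted = [((1, 100), (161, 300)), ((501, 800),)]
    print(nucleotide_accuracy(annotated, predicted))
    print(exon_sensitivity_by_type(annotated, predicted))
    print(gene_sensitivity(annotated, predicted))
```

The toy case shows why nucleotide accuracy can be high while exact exon and gene sensitivity are much lower, the pattern seen in Table S3: a single misplaced exon boundary costs few nucleotides but invalidates both the exon and the gene.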


Analysis of a piwi-related gene implicates small RNAs in genome rearrangement in Tetrahymena. Cell 110, 689–699.
S8. Taverna, S., Coyne, R., and Allis, C. (2002). Methylation of histone H3 at lysine 9 targets programmed DNA elimination in Tetrahymena. Cell 110, 701–711.
S9. Yao, M., Fuller, P., and Xi, X. (2003). Programmed DNA deletion as an RNA-guided system of genome defense. Science 300, 1581–1584.
S10. Meyer, E., and Garnier, O. (2002). Non-Mendelian inheritance and homology-dependent effects in ciliates. Adv. Genet. 46, 305–337.
S11. Le Mouël, A., Butler, A., Caron, F., and Meyer, E. (2003). Developmentally regulated chromosome fragmentation linked to imprecise elimination of repeated sequences in paramecia. Eukaryot. Cell 2, 1076–1090.
S12. Soppe, W., Jasencakova, Z., Houben, A., Kakutani, T., Meister, A., Huang, M.S., Jacobsen, S.E., Schubert, I., and Fransz, P.F. (2002). DNA methylation controls histone H3 lysine 9 methylation and heterochromatin assembly in Arabidopsis. EMBO J. 21, 6549–6559.
S13. Tariq, M., Saze, H., Probst, A.V., Lichota, J., Habu, Y., and Paszkowski, J. (2003). Erasure of CpG methylation in Arabidopsis alters patterns of histone H3 methylation in heterochromatin. Proc. Natl. Acad. Sci. USA 100, 8823–8827.
S14. Lyko, F., Ramsahoye, B., and Jaenisch, R. (2000). DNA methylation in Drosophila melanogaster. Nature 408, 538–540.
S15. Juranek, S., Wieden, H., and Lipps, H.J. (2003). De novo cytosine methylation in the differentiating macronucleus of the stichotrichous ciliate Stylonychia lemnae. Nucleic Acids Res. 31, 1387–1391.
S16. Cummings, D., and Goddard, J. (1974). Methylated bases in DNA from Paramecium aurelia. Biochim. Biophys. Acta 374, 1–11.
S17. Duret, L., and Galtier, N. (2000). The covariation between TpA deficiency, CpG deficiency, and G+C content of human isochores is due to a mathematical artifact. Mol. Biol. Evol. 17, 1620–1625.
S18. Gratias, A., and Bétermier, M. (2003). Processing of double-strand breaks is involved in the precise excision of Paramecium internal eliminated sequences. Mol. Cell. Biol. 23, 7152–7162.