Supplemental Data
High Coding Density on the Largest
Paramecium tetraurelia Somatic Chromosome
Marek Zagulski, Jacek K. Nowak, Anne Le Mouël,
Mariusz Nowacki, Andrzej Migdalski,
Robert Gromadka, Benjamin Noël, Isabelle Blanc,
Philippe Dessen, Patrick Wincker, Anne-Marie Keller,
Jean Cohen, Eric Meyer, and Linda Sperling
Dinucleotide Frequencies
The frequencies of dinucleotides for the megabase chromosome
sequence were calculated taking into account the frequencies of
each individual nucleotide and are given as the ratio of observed
to expected frequencies, fobs/exp. We found both CpG and TpA dinucleotides to be underrepresented with respect to their expected
frequencies (ratios of 0.4 and 0.84, respectively). In contrast, the
reciprocal GpC and ApT frequencies did not deviate from expected
values, nor did the frequencies for any of the other dinucleotides.
CpG dinucleotide depression is observed in organisms in which
DNA can be methylated on cytosine residues, most notably vertebrates with CpG frequencies of 0.2–0.4 of the expected values [S1].
CpG dinucleotide frequency is thought to be depressed because
5MeC is unstable and readily deaminated to thymine, so that the
mutation rate for methylated CpG dinucleotides to TpG (and CpA
on the opposite strand) is high. Methylation of cytosines, usually
but not always in CpG dinucleotides, has long been correlated with
inactive genes and a “closed” chromatin conformation, although
whether the methylation is cause or consequence of transcriptional
(in)activity is still an open question. An attractive hypothesis is that
DNA methylation is linked to histone methylation and the formation
of heterochromatin [S2, S3].
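The observed-to-expected ratio used above can be sketched in a few lines. This is a minimal illustration, not the authors' script: it assumes the expected frequency of a dinucleotide XpY is the product of the single-nucleotide frequencies of X and Y on one strand, and the function name `dinucleotide_ratios` is hypothetical. The text does not say whether one or both strands were counted.

```python
from collections import Counter


def dinucleotide_ratios(seq):
    """Return {dinucleotide: observed/expected frequency} for a DNA string.

    Assumption: expected frequency of XpY = f(X) * f(Y), using the
    mononucleotide frequencies of the same sequence (single strand).
    The input is assumed to contain only A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n) * (mono[y] / n)
            observed = di[x + y] / (n - 1)
            # Undefined when one of the two nucleotides is absent.
            ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios
```

Applied to a chromosome-scale sequence, `dinucleotide_ratios(seq)["CG"]` and `dinucleotide_ratios(seq)["TA"]` would correspond to the CpG and TpA ratios discussed in the text.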
It has been established in recent years, in organisms from fission
yeast to man and including ciliates, that methylation of histone H3
on particular lysine residues (usually K9) can be targeted by small
noncoding RNAs such as those produced by the RNase III-like enzyme Dicer and that histone H3 methylation is a sufficient condition
for heterochromatin formation. Small double-stranded RNA molecules that arise from large double-stranded RNA molecules (such as
those produced from transcription of multicopy elements randomly
inserted in the genome) can target heterochromatin formation and
gene silencing, for example of centromeric genes [S4] or transposons [S5, S6]. In ciliates, it is now clear that such a mechanism is
intimately involved in developmental DNA elimination processes
[S7–S9] and can explain the non-Mendelian heredity of alternative
rearrangement patterns [S10, S11].
At least in plants, DNA methylation on cytosine residues and
histone H3 methylation on K9 are interdependent processes (each
can be both cause and consequence of the other) and together
ensure heterochromatin formation [S12, S13]. The emerging picture
is that in organisms that do not methylate DNA, histone H3 methylation can suffice to target heterochromatin formation, but in organisms that can methylate DNA, or that methylate DNA at a specific
developmental stage [S14], DNA methylation provides an extra level
of regulation and/or stabilization.
Before considering whether there could be cytosine methylation
in Paramecium, it is important to ensure that the low fobs/exp value of
0.4 calculated for the entire megabase chromosome sequence is
not a consequence of coding constraints, given that ~77% of this
sequence is coding (compared to 1%–3% coding density for vertebrate euchromatic DNA). Calculating fobs/exp separately for each of the codon positions is not a fruitful approach. In the first two codon positions, CpG is strongly depressed: Paramecium uses AGA and AGG
arginine codons almost exclusively and there are very few CGN
arginine codons. However, this strong bias in codon usage may
be unrelated to CpG depression. In the second and third codon
positions, NCG codons are much less frequently used than synonymous NCA, NCT, or NCC codons, but this could still result from
some unrelated codon usage bias. We therefore examined dinucleotides that span two codons, and the result of this analysis is presented in Supplemental Table S1. We asked what percentage of the
codons for a given amino acid would end in C if the first nucleotide of
the next codon was a G. In each case where the comparison could
be meaningful, i.e., for four-codon amino acids and for two-codon
amino acids ending with C and T, we did find that the codon choice
for the first amino acid was least likely to end in a C if the first
nucleotide of the next codon was a G. We conclude that CpG dinucleotide frequency is depressed independently of coding constraints.

Table S1. Frequencies of Dinucleotides that Span Two Codons
(A) Amino Acids with Four Codons
        T       C       A       G
T       0.37    0.49    0.31    0.43
C       0.16    0.12    0.18    0.05
A       0.40    0.30    0.41    0.44
G       0.07    0.09    0.10    0.08
(B) Amino Acids with Two Codons (XYT, XYC)
        T       C       A       G
T       0.77    0.81    0.67    0.88
C       0.23    0.19    0.33    0.12
These tables assess the frequency of dinucleotides that span two
codons. The values presented in the table are the fraction of codons
ending in a particular nucleotide, as a function of the first nucleotide
of the following codon. The rows give the last nucleotide of the first
codon and the columns give the first nucleotide of the second codon. Codons were counted for each amino acid and then summed
with respect to the last nucleotide of the first codon, for all amino
acids specified by four codons (A) and for all amino acids specified
by two codons ending in C and T (B). Fractions were then calculated
(the sum of the columns is 1.0). Calculations were made for the
entire set of annotated megabase CDSs (244,209 codons). TCN Ser
codons, CGN Arg codons, and CTN Leu codons were included in
the four-codon category.

Figure S1. Restriction Digestion of the Megabase Chromosome
Isolated megabase chromosome DNA used for shotgun library construction was digested with BsiWI, and the size of the restriction
products measured by CHEF gel electrophoresis. The sizes were
compared to those of conceptual digestion products and the data
fit by linear regression (r = 0.998).

Table S2. Clusters of Paralogous Genes
Gene 1       Gene 2       Predicted Product            Amino Acid Identity
PTMB.200     PTMB.201c    actin or actin-like          34%
PTMB.200     PTMB.202     actin or actin-like          38%
PTMB.200     PTMB.202     actin or actin-like          47%
PTMB.254c    PTMB.255c    trichocyst matrix protein    <20% (a)
PTMB.260c    PTMB.261c    hypothetical protein         30%
PTMB.290c    PTMB.291c    peptidase                    25%
PTMB.310c    PTMB.311c    hypothetical protein         40%
PTMB.345c    PTMB.346c    steroid dehydrogenase        56%
(a) Both genes are similar to blastp subjects from the same protein family (Tetrahymena Granule Lattice Proteins GRLP4 and GRLP1) and can
be aligned with each other and with other Paramecium Trichocyst Matrix Proteins; however, the amino acid identity between the two proteins
is very low.
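The cross-codon tabulation described for Table S1 could be reproduced along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: it covers only the four-codon category (panel A), identifies those families by their first two nucleotides (TCN, CTN, CCN, CGN, ACN, GTN, GCN, GGN), pools all such codons instead of summing per amino acid (which gives the same totals), and uses a hypothetical function name, `cross_codon_fractions`.

```python
from collections import defaultdict

# First two nucleotides of codon families in which all four third-position
# variants encode the same amino acid (includes TCN Ser, CGN Arg, CTN Leu).
FOUR_FOLD_PREFIXES = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}


def cross_codon_fractions(cds_list):
    """For each first nucleotide of the following codon (a column), return
    the fraction of four-codon-family codons ending in T, C, A or G (rows).

    cds_list: iterable of in-frame coding sequences (strings of A/C/G/T).
    """
    # counts[first nucleotide of next codon][last nucleotide of this codon]
    counts = defaultdict(lambda: defaultdict(int))
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
        codons = [cds[i:i + 3] for i in range(0, usable, 3)]
        for codon, following in zip(codons, codons[1:]):
            if codon[:2] in FOUR_FOLD_PREFIXES:
                counts[following[0]][codon[2]] += 1
    fractions = {}
    for nxt, by_last in counts.items():
        total = sum(by_last.values())
        # Each column sums to 1.0, as in the Table S1 legend.
        fractions[nxt] = {last: by_last.get(last, 0) / total for last in "TCAG"}
    return fractions
```

With real CDS data, `cross_codon_fractions(cds_list)["G"]["C"]` would be the fraction of four-codon-family codons ending in C when the next codon starts with G, the quantity the text reports as the lowest in its column.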
The question therefore arises whether CpG dinucleotide frequencies are depressed in Paramecium because of DNA methylation. The
only methylated base that has been detected in Paramecium or
other ciliates by biochemical means (chromatography of DNA reduced to nucleosides) is N6-methyl-adenine. However, a recent report in Stylonychia, using more sensitive molecular assays that rely
on bisulfite modification of methylated bases followed by specific
PCR amplification, did reveal cytosine methylation of transposon
sequences at an early stage of macronuclear development [S15],
suggesting that cytosine methylation might be involved in heterochromatin formation and DNA elimination in this ciliate. It is not
known whether the CpG dinucleotide frequency is depressed in
Stylonychia. If CpG frequency is depressed because of a specific
type of mutation, namely deamination of 5MeC→T, then DNA methylation must occur in the micronucleus or in the zygotic nucleus
for the mutation to be fixed. However, cytosine methylation in the
developing macronucleus for the purpose of marking DNA for heterochromatin formation and elimination should constitute selective
pressure to avoid CpGs in macronucleus-destined sequences, leading to depressed CpG ratios, even in the absence of any specific
mutation mechanism.
In Paramecium, N6-methyl-adenine is found in both macronuclear
and micronuclear DNA [S16]. The same authors were unable to
detect 5-methyl cytosine in macronuclear DNA. However, no experiments have yet been performed to try to identify cytosine methylation either in micronuclear DNA or at particular stages of macronuclear development.
TpA depression was also observed on the megabase chromosome sequence, with a ratio of fobs/exp of 0.84. This could be a direct
consequence of CpG depression, as discussed by Duret and Galtier
[S17], since CpG depression owing to mutation of 5MeC to T tends
to increase the frequencies of both T and A (but not of TpA), thus
lowering the TpA dinucleotide ratio fobs/exp. Alternatively, the TpA
depression observed on the megabase chromosome may reflect
the key role of this dinucleotide in macronuclear development. It is
well established that TpA dinucleotides are important signals for
recombination, both in the precise excision of IESs [S18] and in the
deletion of multicopy elements leading to chromosome fragmentation [S11]. DNA elimination occurs between direct repeats (or degenerate reverse repeats in the case of IES excision) always containing
TA dinucleotides, of which one copy remains in the macronucleus.
It may be an advantage for the organism to avoid the TpA signal
whenever possible in macronucleus-destined sequences, especially
coding sequences. We have already observed a few instances of
possible deletion of pseudo-IES elements from coding sequences,
both in the megabase shotgun reads and in Genoscope primary
sequence data, by internal comparison of the reads with each other
(data not shown).
Supplemental References
S1. Bird, A. (1980). DNA methylation and the frequency of CpG in
animal DNA. Nucleic Acids Res. 8, 1499–1504.
S2. Bird, A. (2001). Molecular biology. Methylation talk between
histones and DNA. Science 294, 2113–2115.
S3. Nakayama, J., Rice, J., Strahl, B., Allis, C., and Grewal, S.
(2001). Role of histone H3 lysine 9 methylation in epigenetic
control of heterochromatin assembly. Science 292, 110–113.
S4. Volpe, T., Kidner, C., Hall, I., Teng, G., Grewal, S., and Martienssen, R. (2002). Regulation of heterochromatic silencing and
histone H3 lysine-9 methylation by RNAi. Science 297, 1833–
1837.
S5. Ketting, R., Haverkamp, T., van Luenen, H., and Plasterk, R.
(1999). Mut-7 of C. elegans, required for transposon silencing
and RNA interference, is a homolog of Werner syndrome helicase and RNaseD. Cell 99, 133–141.
S6. Sijen, T., and Plasterk, R. (2003). Transposon silencing in the
Caenorhabditis elegans germ line by natural RNAi. Nature 426,
310–314.
S7. Mochizuki, K., Fine, N., Fujisawa, T., and Gorovsky, M. (2002).
Analysis of a piwi-related gene implicates small RNAs in genome rearrangement in Tetrahymena. Cell 110, 689–699.
S8. Taverna, S., Coyne, R., and Allis, C. (2002). Methylation of
histone H3 at lysine 9 targets programmed DNA elimination
in Tetrahymena. Cell 110, 701–711.
S9. Yao, M., Fuller, P., and Xi, X. (2003). Programmed DNA deletion
as an RNA-guided system of genome defense. Science 300,
1581–1584.
S10. Meyer, E., and Garnier, O. (2002). Non-Mendelian inheritance
and homology-dependent effects in ciliates. Adv. Genet. 46,
305–337.
S11. Le Mouël, A., Butler, A., Caron, F., and Meyer, E. (2003). Developmentally regulated chromosome fragmentation linked to imprecise elimination of repeated sequences in paramecia. Eukaryot. Cell 2, 1076–1090.
S12. Soppe, W., Jasencakova, Z., Houben, A., Kakutani, T., Meister,
A., Huang, M.S., Jacobsen, S.E., Schubert, I., and Fransz,
P.F. (2002). DNA methylation controls histone H3 lysine 9 methylation and heterochromatin assembly in Arabidopsis. EMBO
J. 21, 6549–6559.
S13. Tariq, M., Saze, H., Probst, A.V., Lichota, J., Habu, Y., and
Paszkowski, J. (2003). Erasure of CpG methylation in Arabidopsis alters patterns of histone H3 methylation in heterochromatin. Proc. Natl. Acad. Sci. USA 100, 8823–8827.
S14. Lyko, F., Ramsahoye, B., and Jaenisch, R. (2000). DNA methylation in Drosophila melanogaster. Nature 408, 538–540.
S15. Juranek, S., Wieden, H., and Lipps, H.J. (2003). De novo cytosine methylation in the differentiating macronucleus of the
stichotrichous ciliate Stylonychia lemnae. Nucleic Acids Res.
31, 1387–1391.
S16. Cummings, D., and Goddard, J. (1974). Methylated bases in
DNA from Paramecium aurelia. Biochim. Biophys. Acta 374,
1–11.
S17. Duret, L., and Galtier, N. (2000). The covariation between TpA
deficiency, CpG deficiency, and G+C content of human isochores is due to a mathematical artifact. Mol. Biol. Evol. 17,
1620–1625.
S18. Gratias, A., and Bétermier, M. (2003). Processing of double-strand breaks is involved in the precise excision of Paramecium internal eliminated sequences. Mol. Cell. Biol. 23, 7152–
7162.

Table S3. Accuracy of GlimmerM Predictions
                             Exons
Data Set    Nucleotides    Initial    Internal    Terminal    Unique    Genes
429 CDS     87%            52%        47%         47%         49%       29%
141 CDS     97%            72%        63%         60%         56%       40%
GlimmerM predictions were compared to two sets of manually annotated genes. The first set (429 CDS) is the whole set of annotated CDS
at the end of the manual phase of megabase chromosome annotation. The second set (141 CDS) is a subset of the first and consists of the
CDS that were included in the reference set for GlimmerM training (see Experimental Procedures). Nucleotide accuracy is the percentage of
the nucleotides in the manually annotated genes that were also in the GlimmerM predictions. The remaining columns present the sensitivity
of the predictions for different types of exons and for genes, i.e., the percentage of each type of element in the manual annotation that is
exactly predicted by GlimmerM.
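The two measures defined in the Table S3 legend can be illustrated with a short sketch. It is not the evaluation code used for the paper: the representation of exons as inclusive (start, end) coordinate pairs on a single strand, and the function names `exact_sensitivity` and `nucleotide_accuracy`, are assumptions made for the example.

```python
def exact_sensitivity(annotated, predicted):
    """Percentage of annotated elements reproduced exactly by the prediction.

    annotated, predicted: lists of (start, end) tuples, inclusive coordinates.
    """
    predicted_set = set(predicted)
    hits = sum(1 for element in annotated if element in predicted_set)
    return 100.0 * hits / len(annotated)


def nucleotide_accuracy(annotated, predicted):
    """Percentage of annotated nucleotides also covered by the prediction."""
    annotated_pos = set()
    for start, end in annotated:
        annotated_pos.update(range(start, end + 1))
    predicted_pos = set()
    for start, end in predicted:
        predicted_pos.update(range(start, end + 1))
    return 100.0 * len(annotated_pos & predicted_pos) / len(annotated_pos)


# Toy example: one exon predicted exactly, one with a shifted end.
manual = [(1, 10), (21, 30)]
glimmer = [(1, 10), (21, 28)]
print(exact_sensitivity(manual, glimmer))    # 50.0
print(nucleotide_accuracy(manual, glimmer))  # 90.0
```

The toy example shows why nucleotide accuracy can be much higher than exon or gene sensitivity: a prediction that misses an exon boundary by a few bases still covers most of the annotated nucleotides.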