* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Ancient Ciphers: Minireview Translation in
Epigenetics of neurodegenerative diseases wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
RNA interference wikipedia , lookup
Metagenomics wikipedia , lookup
Protein moonlighting wikipedia , lookup
Polyadenylation wikipedia , lookup
Gene expression profiling wikipedia , lookup
History of RNA biology wikipedia , lookup
Minimal genome wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Genome evolution wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Point mutation wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Expanded genetic code wikipedia , lookup
Messenger RNA wikipedia , lookup
Primary transcript wikipedia , lookup
Non-coding RNA wikipedia , lookup
Transfer RNA wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Cell, Vol. 89, 1007–1010, June 27, 1997, Copyright 1997 by Cell Press Ancient Ciphers: Translation in Archaea Patrick P. Dennis Department of Biochemistry and Molecular Biology and Canadian Institute for Advanced Research Program in Evolutionary Biology University of British Columbia Vancouver, British Columbia V6T 1Z3 Canada The process of translation occurs on subcellular particles called ribosomes and involves the decoding of nucleotide sequence information carried on mRNA into amino acid sequence of protein. The central and universal features of translation, decoding and amino acid polymerization, utilize RNA-based chemistry that is aided by the presence or activity of numerous structural and catalytic proteins. Although translation is sequential and repetitive, the complexity of the machinery involved and the sophistication at each step in the process are extraordinary. Our understanding of this process is greatly enhanced through comparative studies using organisms from different phylogenetic lineages. Of particular interest are the Archaea because they are the closest prokaryotic relatives to the nuclear component of eucaryal cells. As such, they provide a glimpse of bacterial features mixed with rudimentary features that have become more highly elaborated within the eucaryal lineage. Two instructive examples of this are protein synthesis initiation and precursor rRNA processing. Archaea frequently utilize Shine-Dalgarno sequences to identify AUG, GUG, or UUG translation initiation codons. Remarkably, the regiment of initiation factors employed are clearly homologous to those of eukaryotes that participate in the identification of the AUG initiation codon using a scanning mechanism. How are these bacterial and eucaryal features integrated and what special advantage does such a mixed system provide to the Archaea? Similarly, archaeal pre-rRNA processing generally follows the bacterial paradigm. But Archaea also possess a protein homologous to eucaryal fibrillarin and, in at least one instance, require a trans-acting RNA for the endonucleolytic cleavage of pre-rRNA. Is the trans-acting RNA related to the small nucleolar RNAs (snoRNAs) of Eucarya, and how wide is its distribution? Is it also present in bacteria, and does it possibly function in the 59 end maturation of small subunit rRNA? rRNA Transcription Units and rRNA Processing Within the Archaea, there is considerable variation in the number and structure of rRNA transcription units. The number of units range between one and four per genome, with multiple units always well separated from each other within the genome. In the euryarchaeota branch of Archaea (see figure 1 in Olsen and Woese, 1997 [this issue of Cell]), the units are generally bacteriallike, containing a spacer tRNAAla and a distal tRNACys gene as well as a cotranscribed 5S gene. In the crenarchaeota, the units are more eucaryal-like, generally containing only 16S and 23S rRNA genes. In this instance, the 5S gene is relocated and independently transcribed. Minireview All operons share the common feature of long inverted repeats surrounding both the 16S and 23S rRNA genes. In bacteria, these repeats form helical structures within the primary transcript that are recognized and cleaved by the duplex-specific endonuclease, RNaseIII. Although RNaseIII is not an essential activity in E. coli, the alternate routes for pre-rRNA processing are much less efficient. These initial cleavages excise pre-16S and pre23S rRNA from this primary transcript. Archaea also possess a helix-specific endonuclease, but it is unrelated to bacterial RNaseIII (Figure 1). The substrate requirements of this enzyme have been shown to consist of a helical structure with two 3-base loops on opposite strands separated by four base pairs (Thompson and Daniels, 1988). Endonucleolytic cleavage occurs between the second and third bases of the two loops and thereby releases pre-16S or pre-23S rRNA (Figure 1). Interestingly, this bulge-helix-bulge (BHB) endonuclease is also employed to excise archaeal introns from the transcripts of intron-containing tRNA and rRNA genes (Belfort and Weiner, 1997 [this issue of Cell]). In these instances, the exon–intron junctions are capable of folding into the requisite bulge-helix-bulge structure. As more rRNA operon sequences become available, it is becoming evident that many of the processing helices contain aberrant bulge-helix-bulge motifs. Without this motif, how is pre-rRNA processing initiated? The best studied case is in Sulfolobus acidocaldarius, where the 16S helix contains only a 2-base bulge on the 59 side. Analysis of in vivo rRNA processing intermediates indicated that little or no cleavage occurs within this region of the helical stem. Instead, cleavage occurs at two sites further upstream. An in vitro processing system to study the two cleavages in the 59 ETS and the cleavage at the 59 ETS-16S junction has been developed. The cleavage activity is sensitive to digestion by micrococcal nuclease, implying that the responsible endonuclease contains one or more essential RNA components (Potter et al., 1995). We previously reported the cDNA cloning of a U3-like RNA from the processing activity (Potter et al., 1995). However, we have been unsuccessful in locating this sequence within the genome of S. acidocaldarius and now have evidence that it arose through contamination of the Taq DNA polymerase used in the original amplification protocol. Nevertheless, the endonucleolytic processing activity is sensitive to nuclease digestion, and we believe that it may contain RNAs related to the snoRNAs that have been implicated in the processing of pre-rRNA in Eucarya. The gene encoding a protein related to eucaryal fibrillarin was first identified in Methanococcus vannielii and has now been found in other Archaea. In Eucarya, fibrillarin is localized to the fibrillar region of the nucleolus and is essential for pre-rRNA processing and ribosome subunit biogenesis. Antibodies against fibrillarin coprecipitate a large number of snoRNAs that have been implicated in pre-rRNA processing. Among these, U3, U14, snR30, and MRP RNA are essential in yeast. The presence of an archaeal fibrillarin-like protein and an RNAcontaining endonuclease activity implicated in pre-rRNA Cell 1008 Figure 2. Ribosomal Protein L1 Binding Site Figure 1. Structure of Bulge-Helix-Bulge Motifs The structure of the BHB motifs are shown for the 16S processing stem of H. cutirubrum (Hcu) in pre-rRNA, the exon-intron junctions of Haloferax volcanii (Hvo) in pre-tRNA trp, the 16S processing stem of S. acidocaldarius (Sac) in pre-rRNA, and the 16S processing stem of M. jannaschii (Mja) in pre-rRNA. The first two motifs are cleaved by the BHB endonuclease at the indicated positions (arrows), whereas the third is not cleaved; cleavage on the fourth has not been examined. processing indicates that at least some Archaea may contain the rudiments of a eucaryotic-like pre-rRNA processing system. The archaeal RNPs may be involved in or required for one or more of the maturations at the 59 and 39 ends of small and large subunit rRNAs. Perhaps the elusive 16S rRNA maturase of bacteria also contains a small RNA component. r-Protein Genes and Translational Regulation The complete sequence of the Methanococcus jannaschii genome provides the first clear and comprehensive picture of archaeal organization of ribosomal protein genes (Bult et al., 1996). A total of 60 genes exhibiting sequence similarity to known bacterial, archaeal, and eukaryotic ribosomal protein encoding genes has been identified. In general, the archaeal r-proteins are more similar in sequence to their eucaryal than to their bacterial homologs. However, as in bacteria, 53 of the 60 genes are in 15 separate multigene clusters, the largest of which contains 16 r-protein genes. Many of the clusters also contain genes encoding other essential components such as transcription and translation factors and subunits of RNA polymerase. The nucleotide spacing between genes within a cluster is generally between 10 and 70 nucleotides or more. Evidence from other archaeal species indicates that at least some of the genes within these clusters are cotranscribed. The most interesting and best studied example is the L1, L10, L12 gene cluster. Expression of the proteins from this operon appears to be subject to a bacterial-like autogenous translational control, whereas the proteins themselves are more eucaryal in character. In E. coli, protein L1 is the translational regulator of the L11, L1 bicistronic operon. In the absence of sufficient 23S rRNA, the L1 protein can bind to the translational operator—a structural mimic of the 23S rRNA binding site that is present in the 59 leader of the L11, L1 mRNA—and prevent further translation. In halophilic and methanogenic Archaea, the L11 gene is in a separate transcription unit and the L1 gene is cotranscribed with the L10 The structural mimic of the L1 binding site in 23S rRNA that is present in the coding region of the M. vannielii L1 gene is shown on the left. The amino acid sequence encoded by the RNA is also shown from residue 10 to residue 22. A consensus L1 binding motif for archaeal (methanogens and halophiles) and bacterial mRNAs is illustrated on the right with the conserved core nucleotides indicated. and L12 genes. In halophiles, the L1 translational operator is located in the unusually long (64 nt) 59 untranslated leader of L1, L10, L12 mRNA (Figure 2). The situation in M. vannielii is even more intriguing. Here the site of the L1 translational operator has been transposed from the leader to a position between nucleotides 30 and 60 inside the L1 coding region (Hanner et al., 1994). Sitedirected mutagenesis has been used to demonstrate the importance of this motif in translational regulation. A nearly identical motif is present at the corresponding position in the M. jannaschii L1 gene. As one might expect, the translational operator in Methanococcus genes lies in a region that is not highly conserved in other members of the L1 protein family. Further work is required to determine if there are other r-protein translational operators in Archaea. Translation Initiation Signals One of the major challenges in the analysis of raw DNA sequence data is the identification of functional ORFs. Generally, identification is based on similarity to known sequences in the database, although additional features such as ORF length and codon utilization are also used. The limit of a coding region is then set according to the position of in-frame translation initiation and termination codons. In Archaea, the situation is further complicated by the frequent use of alternate GUG and UUG initiation codons. Since N-terminal amino acid sequence is seldom available, identification of the correct in-frame AUG, GUG, or UUG initiation codon is problematic. How serious is this problem and what are its consequences? Based on a partial survey of Methanococcus ORF assignments, it appears that AUG, GUG, and UUG are used about 70%, 25%, and 5% of the time, respectively, as initiation codons. In about half of the cases, the putative initiation codons are preceded at the appropriate 3–10 nt distance by at least a 4-base complementary match to the pyrimidine-rich sequence (59-AUCACCUCCU-39) at the 39 end of 16S rRNA. In the putative intergenic regions of closely spaced genes (i.e., in r-protein gene clusters), AUG, GUG, and UUG sequences are very prevalent but are seldom preceded at Minireview 1009 the appropriate distance by good Shine-Dalgarno sequences. In the limited instances where the authentic initiation codon is known with a high degree of certainty (i.e., from N-terminal protein sequence), a good ShineDalgarno sequence is also present. This raises the distinct possibility (i) that methanogenic Archaea may rely strongly on a Shine-Dalgarno sequence to recognize and utilize proper initiation codons, and (ii) that many of the assignments of putative initiation codons that lack Shine-Dalgarno sequences may be incorrect. A good example of this is the proposed M. jannaschii L1 r-protein ORF. The assigned AUG initiation codon is not preceded by any Shine-Dalgarno-like sequence. An inframe AUG triplet that is preceded by a very good ribosome binding site occurs at codon 14. The protein generated from this new initiation codon shows high similarity to the N-terminus of the L1 protein from M. vannielii. A similar situation exists with Sulfolobus. The 39 end of 16S rRNA contains the sequence 59-AUCACCUC-39, which is complementary to 59-GAGGUGAU-39 sequences in mRNA. In at least two instances, the L1 and L10 ribosomal protein genes of S. acidocaldarius, it has been suggested that the Shine-Dalgarno sequence overlaps with the GUG initiation codon rather than preceding it by the standard distance of 3 to 10 nt. It is difficult to imagine how this region of the mRNA could interact simultaneously with both the 39 end of 16S rRNA and the anticodon region of the initiator tRNA. More careful examination revealed the presence of an in-frame GUG and an in-frame UUG triplet at codon position 4 in the proposed L1 and L10 reading frames, respectively. It seems likely that these are the authentic initiation codons and that the normal spacing between the ShineDalgarno sequence and the translation initiation codon is maintained. In halophilic Archaea, 59 mRNA leader sequences are often very short or even absent. In these instances, translation initiation may occur by a scanning mechanism utilizing the first available AUG, without the aid of a Shine-Dalgarno interaction. However, within polycistronic mRNAs or RNAs with long 59 leaders, initiation codons are almost always preceded by sequences with some complementarity to the 39 end of 16S rRNA. Clearly, a full understanding of the mechanism used for selection of translation initiation codons will require a more detailed and systematic biochemical analysis of this initiation process. tRNAs and Amino Acyl tRNA Synthetases The genome of M. jannaschii contains a full complement of tRNA but only 16 of the expected 20 amino acyl tRNA synthetases (Bult et al., 1996). Missing are the enzymes for asparagine, cysteine, glutamine, and lysine. Generation of Gln tRNAGln and Asn tRNAAsn in halophilic Archaea is known to involve postcharging conversion of glutamate to glutamine and aspartate to asparagine, respectively (Curnow et al., 1996). It is possible that a similar pathway might be used to produce Cys tRNACys. The tRNACys could be charged with serine using seryl tRNA synthetase and then modified to cysteine. It is more difficult to imagine how a postcharging conversion could generate a Lys tRNALys. More likely, a lysyl tRNA synthetase exists but has escaped detection because of low sequence similarity to other lysyl tRNA synthetases. Archaea also possess a small number of proteins containing the unusual amino acid selenocysteine. As in nonarchaeal organisms, there is no selenocysteinyl tRNA synthetase; selenocysteinyl tRNASec is generated by first charging the tRNASec with serine using seryl tRNA synthetase and then converting the attached serine to selenocysteine with selenocysteine synthetase (Wilting et al., 1997). Initiation, Elongation, and Release Examination of the M. jannaschii genome sequence revealed the presence of eleven genes encoding proteins implicated in translation initiation, three implicated in elongation, and one implicated in termination-release (Bult et al., 1996). Surprisingly, of the 11 putative initiation factor proteins, 10 appear homologous to eucaryal initiation factors (see Table 1). The first is an eIF1A-like protein. In Eucarya, eIF1A binds to free 40S ribosomal subunits and prevents premature reassociation with 60S subunits. Three other proteins are the a, b, and g subunits of an eIF2-like complex. In Eucarya, eIF2 forms a ternary complex with met-tRNAinit and GTP. The complex binds to the 40S subunit to form a 43S preinitiation complex. Normally, the 43S complex then binds mRNA via its 7 methyl G cap; binding is mediated by the cap recognition factor eIF4E and the organizing factor eIF4G. Archaeal mRNAs do not contain the 59 end 7 methyl G modification and neither eIF4E nor 4G homologs appear to be present in M. jannaschii. There are three separate genes encoding members of the eIF4A family. These proteins have ATP-dependent RNA helicase activity; they function to remove secondary structure in eucaryl mRNA and permit scanning for the initiation codon by the 43S preinitiation complex. At the present time, it is uncertain if any or all of the M. jannaschii helicase proteins are required for protein synthesis initiation, if they are used exclusively for this purpose, or if they are employed to unwind or rearrange RNA in other cellular processes. Another initiation factor is homologous to eIF5. In Eucarya, this factor associates with the eIF2 ternary complex on the surface of the 40S subunit; when the initiation codon is properly aligned, it mediates hydrolysis of the GTP bound to eIF2 and facilitates the ejection of the eIF2·GDP and the initiation factors from the 40S subunit. This opens the way for entry of the 60S subunit to form the 80S initiation complex. A common feature of eucaryl eIF2 complexes is that they lack a guanine nucleotide exchange domain and therefore require an extrinsic guanine nucleotide exchange factor. Not surprisingly, two M. jannaschii genes encode homologs to the a and d subunits of the guanine nucleotide exchange factor eIF2B (see Table 1). The d subunit participates directly in the nucleotide exchange reaction whereas the a subunit is involved in regulation of initiation. The M. jannaschii eIF2a subunit contains a conserved serine at position 51, which in Eucarya can be phosphorylated by one of a number of different eIF2specific protein kinases. The eIF2Ba subunit acts as a sensor for phosphoserine at position 51 of EIF2a and, when present, prevents the guanine nucleotide exchange reaction. This blocks the formation of eIF2·GTP·mettRNAinit ternary complex and prevents further translation initiation. Since both the serine at position 51 of eIF2a Cell 1010 Table 1. Translation Initiation Factors Encoded by the Genome of M. jannaschii Gene Number Homolog Proposed Function 0445 0117 0097 1261 eIF1A eIF2a subunit eIF2b subunit eIF2g subunit antiassociation, binds to 40S subunit ternary complex with met-tRNAinit and GTP 1574 1505 0669 eIF4A eIF4A eIF4A ATP-dependent RNA helicase activity ATP-dependent RNA helicase activity ATP-dependent RNA helicase activity 1228 eIF5 stimulates GTP hydrolysis on eIF2 when initiation codon is properly aligned 0454 0122 eIF2Ba subunit eIF2Bd subunit guanine nucleotide exchange factor for eIF2 0262 IF2 (bacterial) ternary complex with fmet-tRNAinit and GTP (bacterial type containing a guanine nucleotide exchange domain) and the sensing eIF2Ba subunit are present in M. jannaschii, it is possible that this important exchange cycle is used to regulate protein synthesis initiation, perhaps in response to environmental stresses such as nutrient or energy limitation. No eIF2-specific protein kinases have yet been identified. The three other eucaryal subunits of eIF2B are apparently not present in M. jannaschii. The last initiation factor gene encodes a protein member of the bacterial IF2 family. In protein synthesis, bacterial IF2 carries out the same function as eIF2. However, it possesses a guanine nucleotide exchange domain and therefore does not require a guanine nucleotide exchange factor for regeneration. Is this IF2-like protein involved in initiation of protein synthesis in M. jannaschii or has it been recruited to carry out a separate and unrelated function? Saccharomyces cerevisiae also encodes a protein of this bacterial IF2 family. The gene is essential for yeast viability but its function is unknown. There is no evidence that it is required for protein synthesis. From the above list of factors, it would appear that Archaea utilize a hybrid system for translation initiation; the factors are mostly eucaryal-like whereas the basic mechanism for initiation codon selection in many and perhaps most instances is bacterial, utilizing an interaction of mRNA with the 39 end of 16S rRNA. There are three genes encoding elongation factor proteins. The first encodes a eucaryl-like EF1a, the second a eucaryl-like EF2 and the third, a bacterial-like EFTu. EF1a forms a ternary complex with aa tRNA and GTP and is responsible for the codon-dependent deposition of aa tRNA into the A site of an elongating ribosome. This protein is expected to require a guanine nucleotide exchange factor as it lacks its own nucleotide recycling domain. No candidate for a recycling protein has been identified in the M. jannaschii genome. The second factor, EF2, carries out GTP-dependent ribosome translocation. This protein possesses a recycling domain and no exchange protein is required. The third bacterial-like factor is a highly specialized Tu factor that deposits selenocysteinyl tRNA into the A site of the ribosome in the presence of a UGA codon and a selenocysteine insertion (SECIS) element somewhere on the mRNA (Wilting et al., 1997). In Methanococcus and eukaryotes, this element is never in the coding region but rather is usually located in the 39 untranslated region of the mRNA. In bacteria, the SECIS element is located immediately 39 to the UGA selenocysteine codon. A single eucaryal-like release factor capable of recognizing all three translation termination codons has been identified. In Eucarya, a second protein with GTPase activity is also required for efficient recognition of termination codons and hydrolysis of the nascent polypeptide from the p site peptidyl tRNA. The homolog of this auxillary protein has not been identified in M. jannaschii. In summary, the Archaea appear to utilize an intriguing mixture of bacterial and rudimentary eucaryl features in their translation apparatus. Recognizing how these features interface within Archaea will lead to a more thorough understanding of the disparate characteristics of bacterial and eucaryal translation and help to create a universal or general model for translation, applicable to all organisms. High on the list of topics demanding intellectual and detailed biochemical investigations are pre-rRNA processing, ribosomal subunit assembly, and translation initiation. The availability of archaeal genome sequences provides an excellent starting position for these studies but many challenges remain. Where there is uncertainty, there are bound to be surprises. Selected Reading Belfort, M., and Weiner, A. (1997). Cell, this issue. Bult, C.J., White, O., Olsen, G.J., Zhou, L., Fleischmann, R.D., Sutton, G.G., Blake, J.A., FitzGerald, L.M., Clayton, R.A., Gocayne, J.D., et al. (1996). Science 273, 1058–1072. Curnow, A.W., Ibba, M., and Söll, D. (1996). Nature 382, 589–590. Hanner, M., Mayer, C., Köhrer, C., Golderer, G., Gröbner, P., and Piendl, W. (1994). J. Bacteriol. 176, 409–418. Olsen, G.J., and Woese, C.R. (1997). Cell, this issue. Potter, S., Durovic, P., and Dennis, P.P. (1995). Science 268, 1056– 1060. Thompson, L.B., and Daniels, C.J. (1988). J. Biol. Chem. 263, 17951– 17959. Wilting, R., Schorling, S., Persson, B.C., and Böck, A. (1997). J. Mol. Biol. 266, 637–641.