known to encode double-strand DNA endonucleases, one might speculate that these enzymes impart a selective advantage. If one assumes that these extremely active, site-specific endonucleases cleave at multiple secondary sites, albeit at low efficiency, then the recombinogenicity of the phage may be enhanced. Increased recombination might improve the genetic adaptability of the phage, thereby providing a selective advantage in evolving phage populations. However, all these scenarios remain speculative, and until there is a clear demonstration of the increased fitness of intron-containing over intronless phage variants in a particular environment or host cell, the parasite/symbiont debate will continue.

Acknowledgements
I thank Debbie Bell-Pedersen, Mary Bryk, Tim Coetzee, Francois Michel, Sue Quirk, Jill Salvo, Joe Salvo, Renee Schroeder and David Shub for challenging discussions and for their critical reading of the manuscript. Expert preparation of the manuscript by Carolyn S. Wieland is much appreciated. Work in our laboratory is supported by grants from the NIH (GM39422) and NSF (DMB8502961).

M. Belfort is in the Wadsworth Center for Laboratories and Research, New York State Department of Health, Empire State Plaza, PO Box 509, Albany, NY 12201-0509, USA, and the School of Public Health Sciences, University at Albany, State University of New York, Empire State Plaza, Albany, NY 12237, USA.

How were introns inserted into nuclear genes?

JOHN H. ROGERS

There is now abundant evidence that many introns have been inserted into nuclear genes after the divergence of multigene families, sometimes in a semi-regular pattern with respect to pre-existing domains. This note examines ways in which these insertions might have occurred using known molecular mechanisms.

TIG July 1989, Vol. 5, No. 7. ©1989 Elsevier Science Publishers Ltd (UK) 0168-9479/89/$03.50

A widely held view concerning classical introns (that is, introns in nuclear genes for mRNAs, beginning with GT and ending with AG) is that most or all of them were present in the earliest ancestors of genes, and some have been removed or rearranged to produce the present distribution (Refs 1, 2). Some introns, such as those flanking immunoglobulin-like domains, clearly are very ancient. But this cannot be true of the discordant introns that are found in some sets of homologous genes or domains: introns whose positions are similar but not identical, differing relative to codons for conserved amino acids or relative to the phase of the reading frame.

Evidence for insertion

The first clear evidence for intron insertions came from the serine protease family (Ref. 3). Similar conclusions followed from the variety of discordant intron positions in several genes for proteins with tandemly repeated domains (reviewed in Ref. 4): non-fibrillar collagens, transcription factor IIIA and fibronectin. In all these cases, introns fall at different though similar positions in different domains.

The calcium-binding proteins of the calmodulin superfamily provide another extensive data set of discordant intron positions (Refs 5-7). The most graphic case is in the family that includes calmodulin and myosin alkali light chain, where four genes have common introns in domains I, II and IV, but each has an intron at a different place in domain III, apparently inserted after the separation of the four genes, close to the middle of what would then have been the longest exon.

Discordant introns have even been discovered in the immunoglobulin superfamily. The immunoglobulin-like domain is the archetypal example of a domain encoded by an ancestral exon, as it is bounded by introns in homologous positions in all the genes of the superfamily; but in the NCAM (neural cell adhesion molecule) gene (Ref. 8), the mouse CD4 gene (Ref. 9) and the rat P0 gene (Ref. 10), domains of this type are also split near the middle by introns that can fall in any phase of the reading frame.
Tubulin and actin genes may also have acquired their introns by insertion, as they show unrelated intron patterns in different phyla (N. Dibb and A. Newman, EMBO J., in press).

The discordant positions of introns cannot reasonably be attributed either to removal or to movement. They cannot be accounted for purely by removal of ancestral introns, as some genes would have to have started off with many introns separated by only one or a few nucleotides. For example, there are pairs of serine protease genes with intron positions separated by 4 bp and by 1 bp. Moreover, because all the other introns in these genes are also in different places, one would have to postulate that these surviving introns are only a small proportion of the original number. Assuming random removal, the binomial distribution predicts that some introns would have been left in coinciding positions in different genes unless the original number had been >50. Across a whole serine protease gene, this would mean an average exon length of <14 bp, which could not encode a structural motif even if it were plausible.

[Fig. 1, panels not reproduced. Recoverable labels: (a) AGGT, DNA duplication, AGGT ... AGGT, splicing, AGGU; (b) amino acids D K D N G D N S G ... E, nucleotides GAYAAxRAYGGxxAYGGx ... GARYTx.]
Fig. 1. An intron could be created by duplication of exon sequences containing a cryptic, bidirectional splice site (a); but such sites could not have been present at some intron insertion sites within calcium-binding domains (b). The canonical sequence of calcium-binding domains of the calmodulin superfamily is shown, with the nucleotides required to encode it. Bold type, almost invariant residues (o, hydrophobic); arrowheads, positions of inserted introns in various genes (see Ref. 7).

Nor can the discordant positions of introns be accounted for by movement, as many of them would have to have moved across a nonintegral number of codons, often within strongly conserved coding sequence.
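The point about moving across a nonintegral number of codons can be illustrated with a minimal sketch. It assumes a hypothetical coordinate convention (an intron position is the number of coding nucleotides upstream of it, so its phase is that number modulo 3), and the function names and the reference coordinate of 300 are invented for illustration. Only the 1 bp and 4 bp separations are taken from the text.

```python
def intron_phase(coding_offset: int) -> int:
    """Phase of an intron: 0 if it lies between codons, 1 or 2 if it splits a codon.

    coding_offset is the number of coding nucleotides upstream of the intron
    (a hypothetical coordinate convention chosen for this sketch).
    """
    return coding_offset % 3


def shift_preserves_frame(shift_bp: int) -> bool:
    """True if moving an intron by shift_bp leaves it in the same phase."""
    return shift_bp % 3 == 0


# Illustrative coordinates only: a reference intron after 300 coding
# nucleotides, compared with positions 1 bp and 4 bp further downstream
# (the separations quoted for pairs of serine protease genes).
reference = 300
for separation in (1, 4):
    moved = reference + separation
    print(separation, intron_phase(reference), intron_phase(moved),
          shift_preserves_frame(separation))
```

Under these assumptions neither separation is a multiple of three, so an intron that "moved" by either distance would end up in a different phase of the reading frame.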
Such an event would require separate frameshifting mutations at each end of the intron, which would seem to be excessively improbable given normal constraints on gene function and splicing. According to models where the two ends frameshifted sequentially, there would be an intermediate stage in which the gene was inactive, or in which it underwent alternative splicing with at least half the transcripts frameshifted, which is not likely for an essential gene. In a model where the two ends frameshifted simultaneously, if the frequency of one neutral frameshift were (for example) a generous 1 in 10^8 generations, the frequency of the required pair would be 1 in 10^16 generations, which is longer than the age of the universe. Other scenarios would require esoteric circumstances which would also be highly unlikely.

The consequent view of phylogeny

As a consequence, it is doubtful whether intron positions can give reliable information about early evolution. The very fact that they cannot all be in original positions implies that they do not in all cases clearly define ancestral gene elements.

Moreover, the positions of inserted introns are clearly not random. In the serine protease genes, they tend to map to variable surface loops in the proteins (Ref. 11). In the TFIIIA gene, they tend to map to the loops between domains (Ref. 12). In other genes, they tend to fall near the middles of pre-existing exons (Refs 7, 8, 13). This behaviour explains the general tendency of genes to have a rather uniform size of exons (Ref. 14), which is exemplified by all the genes so far mentioned. So, apparent regularities in intron distribution do not necessarily imply that the introns were present in the ancestral gene.

As some introns certainly have been cleanly removed in the course of evolution, there must be a long-term balance between removal and insertion.
A consequence (regrettable from a Popperian point of view) is that in this situation one can give a probable explanation of any intron distribution but a definite explanation of none. But the operation of the proposed equilibrium is exemplified by the calmodulin gene: in comparison with related genes, it appears to have gained a gene-specific intron in domain III in vertebrates, but lost a common intron from domain I in insects (Ref. 6). Given this dynamic equilibrium, it is possible that a rare case of apparent frameshifting of an intron, in a carbonic anhydrase gene (Ref. 15), might actually be due to removal and re-insertion.

In general, the balance between insertion and deletion seems to have shifted according to selective pressures on the size of the genome. Large-genome organisms such as mammals and plants retain many of their ancestral and inserted introns, whereas small-genome organisms such as Drosophila and yeast have lost most of the introns that they had (Ref. 16). They have probably entered an equilibrium such that the few introns which they do have (often in different positions from any in mammals) are most likely to be recently inserted ones.

Mechanisms of insertion

It is not yet clear how introns were inserted. When they were first discovered, it was widely supposed that they might be a form of transposable element which, by means of RNA splicing, avoided doing damage to genes in which it inserted itself (summarized in Ref. 17). However, as information accumulated about the characteristic sequences at the boundaries of classical introns and of transposable elements, it became clear that they did not resemble each other at all. Some transposons in maize, such as Ds, can get themselves excised by RNA splicing, but only imprecisely (Ref. 18).
Ds contains an upstream splice site close to its 5' end, and can find cryptic downstream splice sites that happen to exist in the target sequence shortly after the insertion site. But transposons of this sort could not produce clean intron insertions with no alteration in the surrounding sequences.

[Fig. 2, panel (a) not reproduced. Recoverable labels: reverse transcription, DNA insertion of group II intron, splicing. Panel (b) sequences:]
Group II consensus     xxx/GTGCGYx ...... RRRRGGx...xYTAYYYYAY/xxx
Example (Podospora)    CAG/GTGCGCC ...... AGGAGAG...CTTATCCTAC/ATA
Classical consensus    xAG/GTRAGTA ........ CURAY...YYYYYYxYAG/Gxx
Fig. 2. A group II intron could mutate into a classical intron. (a) Proposed sequence of events. (b) Example of a group II intron (from Ref. 22) which would have classical splice signals given a single-base mutation (*).

The discovery of self-splicing group I introns gave rise to renewed speculation about intron insertion (Ref. 19), and group I introns are now known to insert themselves (see below), but group I introns remain resolutely unlike the classical nuclear introns. It is possible that classical introns were inserted by a mechanism that no longer exists in the limited range of phyla that have been studied, although more extensive study of Protista, and of their very diverse genetic processes, might well uncover such a mechanism. But it would be more satisfactory if one could identify a plausible mechanism among known molecular processes. Two possibilities are discussed here.

The first mechanism (Fig. 1a) requires no extraneous genetic element. In principle, an intron could be created by tandem duplication of exonic sequences within which there happened to be a cryptic 'bidirectional' splice site: that is, a sequence resembling (Y)nNCAGGTAAGT, where the nucleotides AGGT (bold in the original) would be obligatory. This sequence could function as an upstream splice site in the 5' copy and as a downstream splice site in the 3' copy, so the sequence duplication could be counteracted by RNA splicing without need for further mutations. Such a mechanism would neatly explain the tendency for genes to become subdivided into exons of uniform size (Ref. 14), usually between one and two times the minimum size for an intron. (I thank K. Kato and N. Dibb for discussion concerning this hypothesis.)

Unfortunately for this hypothesis, it is testable, and it is false, at least as regards the majority of inserted introns. Many of the introns in the genes for serine proteases and calcium-binding proteins have apparently been inserted into highly conserved coding regions, within which certain nucleotides must always have been present to encode essential amino acids. So one can ask whether these nucleotides adhere to the consensus for a bidirectional splice site as given above. An analysis of 14 of these introns (not shown) shows that they do not; and there are several examples where even the required AGGT sequence could not have been present (Fig. 1b). Although one could invoke special circumstances to evade this conclusion in any particular case, the scarcity of the splice site consensus implies that the model does not generally apply.

The second hypothesis is that individual classical introns have evolved from self-inserted group II introns (Fig. 2a). The evidence that some group II introns, in mitochondria and chloroplasts, are capable of self-insertion is circumstantial but persuasive (Ref. 20). Some of them encode a polypeptide with homology to reverse transcriptase (Ref. 21). For one such intron, in mitochondria of the fungus Podospora, there is also a corresponding DNA plasmid which is a precisely circularized copy of the intron (Ref. 22). The existence of reverse transcriptase activity in mitochondria is also suggested by the high frequency of mutations in which several group I and II introns are precisely and simultaneously removed from the mitochondrial DNA (Refs 23, 24).
Some group II introns are self-splicing and so would not inactivate a gene into which they inserted themselves (Refs 25, 26). And many relics are known of DNA transfer between mitochondria, chloroplasts and the nucleus, so it is probable that group II introns would be able to invade nuclear DNA, although this has not yet been shown, even for the Podospora intron plasmid (Ref. 27).

The exact mechanism of group II intron insertion is not known and is not crucial to the present argument. It might be as shown in Fig. 2a, with reverse transcription of the excised intron RNA, following re-opening of the lariat structure by a 'de-branching enzyme' such as has been described (Ref. 28). Or there might be reverse transcription of a primary transcript RNA, producing a cDNA with introns which could insert by homologous recombination. Alternatively, insertion might take place at the DNA level, as with the group I introns. Many group I introns encode site-specific endonucleases that cleave DNA homologous to the site in which the intron resides, and thus trigger intron insertion as a gene conversion event (reviewed in Refs 29, 30). A similar activity is shown by the protein product of a retroposon (R2Bm) which has homology to reverse transcriptase (Ref. 31). It is not known whether this is due to an additional endonuclease domain, or whether the reverse-transcriptase-like domain itself could have endonuclease activity; unfortunately, endonucleases cannot always be recognized by homology alone. If introns do spread by site-specific cDNA insertion or by site-specific homologous gene conversion, as suggested by these examples, insertion into non-homologous sites might occur as an occasional error in the process.

Whatever the mechanism of group II intron insertion, once such an intron is inserted, it might take only a single base change to convert the group II intron into a classical intron (Fig. 2b).
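The single-base claim can be checked against the Podospora example in Fig. 2b with a short sketch. It rests on a deliberately crude assumption: an intron counts as "classical" here if it merely begins with GT and ends with AG, which is far weaker than the full consensus and ignores everything else the nuclear splicing machinery requires. The end fragments are transcribed from the figure, and the function names are hypothetical.

```python
# Intron-end fragments transcribed from the Podospora example in Fig. 2b.
FIVE_PRIME_END = "GTGCGCC"      # first bases of the intron
THREE_PRIME_END = "CTTATCCTAC"  # last bases of the intron


def has_classical_ends(five_prime: str, three_prime: str) -> bool:
    """Minimal GT...AG test (an assumption of this sketch, not the full consensus)."""
    return five_prime.startswith("GT") and three_prime.endswith("AG")


def single_base_conversions(five_prime: str, three_prime: str):
    """List every single substitution that makes the ends pass the GT...AG test."""
    hits = []
    for label, seq in (("5'", five_prime), ("3'", three_prime)):
        for i, old in enumerate(seq):
            for new in "ACGT":
                if new == old:
                    continue
                mutated = seq[:i] + new + seq[i + 1:]
                if label == "5'":
                    ok = has_classical_ends(mutated, three_prime)
                else:
                    ok = has_classical_ends(five_prime, mutated)
                if ok:
                    hits.append((label, i, old, new))
    return hits


print(has_classical_ends(FIVE_PRIME_END, THREE_PRIME_END))
print(single_base_conversions(FIVE_PRIME_END, THREE_PRIME_END))
```

Under this minimal criterion the unmutated intron fails the test, and the only single substitution that passes it is a C to G change at the last base of the intron, which turns the terminal AC into AG.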
Both types of intron have similar consensus sequences for splicing, and an identical mechanism in which the 5' end of the intron is joined to the 2' hydroxyl of an internal adenosine to form a lariat (Refs 25, 26). One pyrimidine to guanine substitution might effect the conversion. And there might be considerable selective pressure in favour of such a conversion, since it would transfer control of the splicing operation from the autonomous intron (whose self-abnegation might not be entirely efficient from the point of view of the host) to the host's own nuclear splicing mechanism. The characteristic group II intron sequences would then be superfluous and would rapidly decay. It has already been proposed that the nuclear snRNP machinery evolved from group-II-like introns acting in trans (Ref. 19); it may be that individual nuclear introns subsequently evolved from individual group II insertions.

The apparently non-random distribution of intron insertions could be produced by a variety of factors. There might well be a degree of sequence specificity in the insertions themselves. Even if not, there would probably be selection after insertion. Inserts close to pre-existing introns might be disfavoured because of the risk that both introns would be spliced out as a single unit, with the loss of the exonic sequences between them. Also, flanking exonic sequences are known to affect the splicing of group II introns (Refs 32, 33), so a new insert in some positions might disrupt existing splicing patterns, and a new insert might itself be spliceable only in certain positions in the gene. There may also be constraints on the pre-mRNA secondary structure. Thus there is plenty of scope for setting up non-random patterns of inserted introns.

Insertion of autonomous group II introns, which are eventually captured and put under the control of the nucleus, could well explain the variety of intron positions seen in present-day genes.

References
1 Doolittle, W.F. (1978) Nature 272, 581-582
2 Gilbert, W., Marchionni, M. and McKnight, G. (1986) Cell 46, 151-154
3 Rogers, J. (1985) Nature 315, 458-459
4 Rogers, J. (1986) Trends Genet. 2, 223
5 Berchtold, M.W. et al. (1987) J. Biol. Chem. 262, 8696-8701
6 Smith, V.L. et al. (1987) J. Mol. Biol. 196, 471-485
7 Wilson, P.W. et al. (1988) J. Mol. Biol. 200, 615-625
8 Owens, G.C., Edelman, G.M. and Cunningham, B.A. (1987) Proc. Natl Acad. Sci. USA 84, 294-298
9 Littman, D.R. and Gettner, S.N. (1987) Nature 325, 453-455
10 Lemke, G., Lamar, E. and Patterson, J. (1988) Neuron 1, 73-83
11 Craik, C.S., Rutter, W.J. and Fletterick, R. (1983) Science 220, 1125-1129
12 Tso, J.Y., Van den Berg, J. and Korn, L.J. (1986) Nucleic Acids Res. 14, 2187-2199
13 Odermatt, E., Tamkun, J.W. and Hynes, R.O. (1985) Proc. Natl Acad. Sci. USA 82, 6571-6575
14 Naora, H. and Deacon, N.J. (1982) Proc. Natl Acad. Sci. USA 79, 6196-6200
15 Yoshihara, C.M., Lee, J-D. and Dodgson, J.B. (1987) Nucleic Acids Res. 15, 753-770
16 Fink, G.R. (1987) Cell 49, 5-6
17 Cavalier-Smith, T. (1985) Nature 314, 283-284
18 Wessler, S.R., Baran, G. and Varagona, M. (1987) Science 237, 916-918
19 Sharp, P.A. (1985) Cell 42, 397-399
20 Flavell, A. (1985) Nature 316, 574-575
21 Michel, F. and Lang, B.F. (1985) Nature 316, 641-643
22 Osiewacz, H.D. and Esser, K. (1984) Curr. Genet. 8, 299-305
23 Jacq, C. et al. (1982) in Mitochondrial Genes (Slonimski, P.P. et al., eds), pp. 155-184, Cold Spring Harbor Laboratory Press
24 Gargouri, A., Lazowska, J. and Slonimski, P.P. (1983) in Mitochondria 1983 (Schweyen, R.J., Wolf, K. and Kaudewitz, F., eds), pp. 259-268, W. de Gruyter
25 Peebles, C.L. et al. (1986) Cell 44, 213-223
26 Van der Veen, R. et al. (1986) Cell 44, 225-234
27 Koll, F. (1986) Nature 324, 597-599
28 Ruskin, B. and Green, M.R. (1985) Science 229, 135-140
29 Lambowitz, A.M. (1989) Cell 56, 323-326
30 Scazzocchio, C. (1989) Trends Genet. 5, 168-172
31 Xiong, Y. and Eickbush, T.H. (1988) Cell 55, 235-246
32 Michel, F. and Jacquier, A. (1987) Cold Spring Harbor Symp. Quant. Biol. 52, 201-212
33 Van der Veen, R., Arnberg, A.C. and Grivell, L.A. (1987) EMBO J. 6, 1079-1084

J.H. Rogers is in the Department of Physiology, University of Cambridge, Cambridge CB2 3EG, UK.