REVIEWS
known to encode double-strand DNA endonucleases,
one might speculate that these enzymes impart a
selective advantage. If one assumes that these
extremely active, site-specific endonucleases cleave at
multiple secondary sites, albeit at low efficiency, then
the recombinogenicity of the phage may be enhanced.
Increased recombination might improve the genetic
adaptability of the phage, thereby providing a selective advantage in evolving phage populations.
However, all these scenarios remain speculative, and
until there is a clear demonstration of the increased
fitness of intron-containing over intronless phage
variants in a particular environment or host cell, the
parasite/symbiont debate will continue.
Acknowledgements
I thank Debbie Bell-Pedersen, Mary Bryk, Tim Coetzee,
Francois Michel, Sue Quirk, Jill Salvo, Joe Salvo, Renee
Schroeder and David Shub for challenging discussions and
for their critical reading of the manuscript. Expert preparation of the manuscript by Carolyn S. Wieland is much appreciated. Work in our laboratory is supported by grants from
the NIH (GM39422) and NSF (DMB8502961).
M. BELFORT IS IN THE WADSWORTH CENTER FOR LABORATORIES AND RESEARCH, NEW YORK STATE DEPARTMENT OF HEALTH, EMPIRE STATE PLAZA, PO BOX 509, ALBANY, NY 12201-0509, USA AND SCHOOL OF PUBLIC HEALTH SCIENCES, UNIVERSITY AT ALBANY, STATE UNIVERSITY OF NEW YORK, EMPIRE STATE PLAZA, ALBANY, NY 12237, USA.

How were introns inserted into nuclear genes?
JOHN H. ROGERS
There is now abundant evidence that many introns have been inserted into nuclear genes after the divergence of multigene families, sometimes in a semi-regular pattern with respect to pre-existing domains. This note examines ways in which these insertions might have occurred using known molecular mechanisms.
TIG JULY 1989 VOL. 5, NO. 7
©1989 Elsevier Science Publishers Ltd (UK) 0168 - 9479/89/$03.50

A widely held view concerning classical introns - that is, introns in nuclear genes for mRNAs, beginning with GT and ending with AG - is that most or all of them were present in the earliest ancestors of genes, and some have been removed or rearranged to produce the present distribution 1,2. Some introns, such as those flanking immunoglobulin-like domains, clearly are very ancient. But this cannot be true of the discordant introns that are found in some sets of homologous genes or domains - introns whose positions are similar but not identical, differing relative to codons for conserved amino acids or relative to the phase of the reading frame.
Evidence for insertion
The first clear evidence for intron insertions came from the serine protease family 3. Similar conclusions followed from the variety of discordant intron positions in several genes for proteins with tandemly repeated domains (reviewed in Ref. 4): non-fibrillar collagens, transcription factor IIIA and fibronectin. In all these cases, introns fall at different though similar positions in different domains. The calcium-binding proteins of the calmodulin superfamily provide another extensive data set of discordant intron positions 5-7. The most graphic case is in the family that includes calmodulin and myosin alkali light chain, where four genes have common introns in domains I, II and IV, but each has an intron at a different place in domain III - apparently inserted after the separation of the four genes, close to the middle of what would then have been the longest exon.
Discordant introns have even been discovered in the immunoglobulin superfamily. The immunoglobulin-like domain is the archetypal example of a domain encoded by an ancestral exon - as it is bounded by introns in homologous positions in all the genes of the superfamily - but in the NCAM (neural cell adhesion molecule) gene 8, the mouse CD4 gene 9, and the rat P0 gene 10, domains of this type are also split near the middle by introns that can fall in any phase of the reading frame.
Tubulin and actin genes may also have acquired their introns by insertion, as they show unrelated intron patterns in different phyla (N. Dibb and A. Newman, EMBO J., in press).
The discordant positions of introns cannot reasonably be attributed either to removal or to movement. They cannot be accounted for purely by removal of ancestral introns, as some genes would have to have started off with many introns separated by only one or a few nucleotides. For example, there are pairs of serine protease genes with intron positions separated by 4 bp and by 1 bp. Moreover, because all the other introns in these genes are also in different places, one would have to postulate that these surviving introns are only a small proportion of the original number. Assuming random removal, the binomial distribution predicts that some introns would have been left in coinciding positions in different genes unless the original number had been >50. Across a whole serine protease gene, this would mean an average exon length of <14 bp, which could not encode a structural motif even if it were plausible.
FIG. 1
An intron could be created by duplication of exon sequences containing a cryptic, bidirectional splice site (a); but such sites could not have been present at some intron insertion sites within calcium-binding domains (b). The canonical sequence of calcium-binding domains of the calmodulin superfamily is shown, with the nucleotides required to encode it. Bold type, almost invariant residues (o, hydrophobic); arrowheads, positions of inserted introns in various genes (see Ref. 7).
(a) AGGT -> DNA duplication -> AGGT AGGT -> splicing -> AGGU
(b) Residues: D K D N G D N S G ..... E o
    Nucleotides: GAYAAxRAYGGxxAYGGx ....... GARYTx
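The duplication model of Fig. 1a, and the kind of check made against Fig. 1b, can be sketched in a few lines of Python. This is only a minimal illustration, not the author's analysis: the toy exon sequence, the segment coordinates and the function names are hypothetical, and the degenerate sequence is copied from Fig. 1b on the assumption that x stands for any nucleotide, R for a purine and Y for a pyrimidine.

```python
# Degenerate nucleotide symbols as used in Fig. 1b (assumed meanings).
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T",
           "R": "AG", "Y": "CT", "x": "ACGT"}


def tandem_duplicate(seq, start, end):
    """Return seq with the segment seq[start:end] duplicated in tandem."""
    return seq[:end] + seq[start:end] + seq[end:]


def splice_duplication(seq, start, end, site="AGGT"):
    """Duplicate seq[start:end], then splice from the site in the 5' copy
    to the same site in the 3' copy. Returns (intron, spliced)."""
    dup = tandem_duplicate(seq, start, end)
    pos = seq.index(site, start, end)     # site must lie inside the segment
    donor = pos + 2                       # intron starts at the GT of the 5' copy
    acceptor = pos + (end - start) + 2    # intron ends after the AG of the 3' copy
    return dup[donor:acceptor], dup[:donor] + dup[acceptor:]


def compatible_offsets(pattern, site="AGGT"):
    """Offsets at which `site` could be encoded by a degenerate pattern."""
    return [i for i in range(len(pattern) - len(site) + 1)
            if all(base in SYMBOLS[sym]
                   for sym, base in zip(pattern[i:], site))]


# (a) Hypothetical exon carrying one cryptic bidirectional site.
exon = "ATGGCTCAGGTAAGTGCTGACTAA"
intron, spliced = splice_duplication(exon, 4, 16)
print(intron)            # begins with GT and ends with AG
print(spliced == exon)   # splicing restores the original message

# (b) The two specified fragments of the canonical sequence in Fig. 1b.
for fragment in ("GAYAAxRAYGGxxAYGGx", "GARYTx"):
    print(fragment, compatible_offsets(fragment))
```

In this toy case the duplication is exactly undone by splicing, while neither specified fragment of the canonical calcium-binding sequence admits AGGT at any offset. Windows that span the unspecified region between the two fragments are not tested here.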
Nor can the discordant positions of introns be
accounted for by movement, as many of them would
have to have moved across a nonintegral number of
codons, often within strongly conserved coding
sequence. Such an event would require separate
frameshifting mutations at each end of the intron,
which would seem to be excessively improbable given
normal constraints on gene functions and splicing.
According to models where the two ends frameshifted
sequentially, there would be an intermediate stage in
which the gene was inactive, or in which it underwent
alternative splicing with at least half the transcripts
frameshifted - not likely for an essential gene. In a
model where the two ends frameshifted simultaneously,
if the frequency of one neutral frameshift were (for
example) a generous 1 in 10^5 generations, the frequency of the required pair would be 1 in 10^10 generations, which is longer than the age of the universe.
Other scenarios would require esoteric circumstances
which would also be highly unlikely.
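The two probability arguments above (random removal and simultaneous frameshifts) can be followed numerically with a short Python sketch. It is an illustration under stated assumptions rather than the author's calculation: the number of introns surviving per gene (7) and the coding length (700 bp) are hypothetical values, each gene is assumed to keep its introns independently, and a single-frameshift frequency of 1 in 10^5 generations is taken as the illustrative value.

```python
from fractions import Fraction


def expected_coincidences(n_original, kept_per_gene):
    """Expected number of shared surviving positions in two genes, if each
    gene independently keeps `kept_per_gene` of `n_original` introns."""
    q = Fraction(kept_per_gene, n_original)   # per-intron survival chance
    return n_original * q * q


def p_no_coincidence(n_original, kept_per_gene):
    """Binomial probability that no surviving position is shared."""
    q = Fraction(kept_per_gene, n_original)
    return float((1 - q * q) ** n_original)


def mean_exon_length(coding_bp, n_introns):
    """Average exon length when n_introns split coding_bp of sequence."""
    return coding_bp / (n_introns + 1)


def paired_frequency(single):
    """Frequency of two independent events that must occur together."""
    return single * single


KEPT = 7           # hypothetical number of introns surviving per gene
CODING_BP = 700    # hypothetical coding length of a serine protease gene

for n in (10, 25, 50, 100):
    print(n,
          float(expected_coincidences(n, KEPT)),
          round(p_no_coincidence(n, KEPT), 3),
          round(mean_exon_length(CODING_BP, n), 1))

print(paired_frequency(Fraction(1, 10**5)))   # 1/10000000000
```

With these assumed values, the expected number of coinciding introns falls below one only when the original number approaches 50, at which point the mean exon length is already under 14 bp.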
The consequent view of phylogeny
As a consequence, it is doubtful whether intron
positions can give reliable information about early
evolution. The very fact that they cannot all be in original positions implies that they do not in all cases
clearly define ancestral gene elements.
Moreover, the positions of inserted introns are
clearly not random. In the serine protease genes, they
tend to map to variable surface loops in the proteins 11.
In the TFIIIA gene, they tend to map to the loops
between domains 12. In other genes, they tend to fall
near the middles of pre-existing exons 7,8,13. This behaviour explains the general tendency of genes to have
a rather uniform size of exons 14, which is exemplified
by all the genes so far mentioned. So, apparent regularities in intron distribution do not necessarily imply that
the introns were present in the ancestral gene.
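The link between insertion near the middle of exons and uniform exon size can be seen in a toy model. The sketch below is purely illustrative: the starting exon lengths are hypothetical, and the rule that every new intron lands exactly at the midpoint of the currently longest exon is a deliberate simplification of the tendency described above.

```python
def insert_into_longest(exon_lengths, n_insertions):
    """Split the longest exon at its midpoint, n_insertions times."""
    exons = list(exon_lengths)
    for _ in range(n_insertions):
        # Index of the (first) longest exon.
        i = max(range(len(exons)), key=exons.__getitem__)
        left = exons[i] // 2
        # Replace that exon by its two halves; total length is unchanged.
        exons[i:i + 1] = [left, exons[i] - left]
    return exons


def size_ratio(exons):
    """Ratio of the longest to the shortest exon."""
    return max(exons) / min(exons)


start = [1000, 100, 300]                 # hypothetical exon lengths in bp
after = insert_into_longest(start, 4)
print(after)
print(size_ratio(start), size_ratio(after))
```

Under this rule the ratio of longest to shortest exon shrinks with each insertion, even though no intron was present at these positions in the starting gene.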
As some introns certainly have been cleanly
removed in the course of evolution, there must be a
long-term balance between removal and insertion. A
consequence (regrettable from a Popperian point of
view) is that in this situation one can give a probable
explanation of any intron distribution but a definite
explanation of none. But the operation of the proposed equilibrium is exemplified by the calmodulin
gene: in comparison with related genes, it appears to
have gained a gene-specific intron in domain III in
vertebrates, but lost a common intron from domain I
in insects 6. Given this dynamic equilibrium, it is possible that a rare case of apparent frameshifting of an
intron, in a carbonic anhydrase gene 15, might actually
be due to removal and re-insertion.
In general, the balance between insertion and
deletion seems to have shifted according to selective
pressures on the size of the genome. Large-genome
organisms such as mammals and plants retain many of
their ancestral and inserted introns, whereas small-genome organisms such as Drosophila and yeast have
lost most of the introns that they had 16.
They have probably entered an equilibrium such that the few introns which
they do have - often in different positions from any in mammals - are most
likely to be recently inserted ones.
Mechanisms of insertion
It is not yet clear how introns were inserted. When they were first discovered, it was widely supposed that they might be a form of transposable element which, by means of RNA splicing, avoided doing damage to genes in which it inserted itself (summarized in Ref. 17). However, as information accumulated about the characteristic sequences at the boundaries of classical introns and of transposable elements, it became clear that they did not resemble each other at all. Some transposons in maize, such as Ds, can get themselves excised by RNA splicing, but only imprecisely 18. Ds contains an upstream splice site close to its 5' end, and can find cryptic downstream splice sites that happen to exist in the target sequence shortly after the insertion site. But transposons of this sort could not produce clean intron insertions with no alteration in the surrounding sequences.
The discovery of self-splicing group I introns gave rise to renewed speculation about intron insertion 19, and group I introns are now known to insert themselves (see below), but group I introns remain resolutely unlike the classical nuclear introns. It is possible that classical introns were inserted by a mechanism that no longer exists in the limited range of phyla that have been studied - although more extensive study of Protista, and of their very diverse genetic processes, might well uncover such a mechanism. But it would be more satisfactory if one could identify a plausible mechanism among known molecular processes. Two possibilities are discussed here.
The first mechanism (Fig. 1a) requires no extraneous genetic element. In principle, an intron could be created by tandem duplication of exonic sequences within which there happened to be a cryptic 'bidirectional' splice site - that is, a sequence resembling (Y)nNCAGGTAAGT, where the bold nucleotides (AGGT) would be obligatory. This sequence could function as an upstream splice site in the 5' copy and as a downstream splice site in the 3' copy, so the sequence duplication could be counteracted by RNA splicing without need for further mutations. Such a mechanism would neatly explain the tendency for genes to become subdivided into exons of uniform size 14, usually between one and two times the minimum size for an intron. (I thank K. Kato and N. Dibb for discussion concerning this hypothesis.)
Unfortunately for this hypothesis, it is testable, and it is false, at least as regards the majority of inserted introns. Many of the introns in the genes for serine proteases and calcium-binding proteins have apparently been inserted into highly conserved coding regions, within which certain nucleotides must always have been present to encode essential amino acids. So one can ask whether these nucleotides adhere to the consensus for a bidirectional splice site as given above. An analysis of 14 of these introns (not shown) shows that they do not; and there are several examples where even the required AGGT sequence could not have been present (Fig. 1b). Although one could invoke special circumstances to evade this conclusion in any particular case, the scarcity of the splice site consensus implies that the model does not generally apply.
The second hypothesis is that individual classical introns have evolved from self-inserted group II introns (Fig. 2a). The evidence that some group II introns, in mitochondria and chloroplasts, are capable of self-insertion is circumstantial but persuasive 20. Some of them encode a polypeptide with homology to reverse transcriptase 21. For one such intron, in mitochondria of the fungus Podospora, there is also a corresponding DNA plasmid which is a precisely circularized copy of the intron 22. The existence of reverse transcriptase activity in mitochondria is also suggested by the high frequency of mutations in which several group I and II introns are precisely and simultaneously removed from the mitochondrial DNA 23,24. Some group II introns are self-splicing and so would not inactivate a gene into which they inserted themselves 25,26. And many relics are known of DNA transfer between mitochondria, chloroplasts and the nucleus, so it is probable that group II introns would be able to invade nuclear DNA - although this has not yet been shown, even for the Podospora intron plasmid 27.
FIG. 2
A group II intron could mutate into a classical intron. (a) Proposed sequence of events. (b) Example of a group II intron (from Ref. 22) which would have classical splice signals given a single-base mutation (*).
(a) Reverse transcription -> DNA insertion of group II intron -> GT ... AG -> splicing
(b) Group II consensus:  xxx/GTGCGYx ...... RRRRGGx...xYTAYYYYAY/xxx
    Example (Podospora): CAG/GTGCGCC ...... AGGAGAG...CTTATCCTAC/ATA
    Classical consensus: xAG/GTRAGTA ........ CURAY...YYYYYYxYAG/Gxx
The exact mechanism of group II intron insertion is
not known and is not crucial to the present argument.
It might be as shown in Fig. 2a, with reverse transcription of the excised intron RNA, following re-opening
of the lariat structure by a 'de-branching enzyme' such
as has been described 28. Or there might be reverse
transcription of a primary transcript RNA, producing a
cDNA with introns which could insert by homologous
recombination.
Alternatively, insertion might take place at the DNA
level, as with the group I introns. Many group I introns
encode site-specific endonucleases that cleave DNA
homologous to the site in which the intron resides,
and thus trigger intron insertion as a gene conversion
event (reviewed in Refs 29, 30). A similar activity is
shown by the protein product of a retroposon (R2Bm)
which has homology to reverse transcriptase 31. It is not
known whether this is due to an additional endonuclease domain, or whether the reverse-transcriptase-like domain itself could have endonuclease activity;
unfortunately, endonucleases cannot always be recognized by homology alone. If introns do spread by site-specific cDNA insertion or by site-specific homologous
gene conversion, as suggested by these examples,
insertion into non-homologous sites might occur as an
occasional error in the process.
Whatever the mechanism of group II intron insertion, once such an intron is inserted, it might take only
a single base change to convert the group II intron
into a classical intron (Fig. 2b). Both types of intron
have similar consensus sequences for splicing, and an
identical mechanism in which the 5' end of the intron
is joined to the 2' hydroxyl of an internal adenosine to
form a lariat 2s.2~'. One pyrimidine to guanine substitution might effect the conversion. And there might be
considerable selective pressure in favour of such a
conversion, since it would transfer control of the splicing operation from the autonomous intron (whose
self-abnegation might not be entirely efficient from the
point of view of the host) to the host's own nuclear
splicing mechanism. The characteristic group II intron
sequences would then be superfluous and would
rapidly decay. It has already been proposed that the
nuclear snRNP machinery evolved from group-II-like
introns acting in trans 19; it may be that individual
nuclear introns subsequently evolved from individual
group II insertions.
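The single-base conversion can be checked against the Podospora example of Fig. 2b with a small Python sketch. This is an illustration only: it looks just at the two ends of the intron, the function name is hypothetical, and the position it reports is an inference from the sequence, since the marked base in the figure is not legible in this copy.

```python
BASES = "ACGT"


def single_changes_to_ag(three_prime_end):
    """List (index, old, new) single-base substitutions after which the
    sequence ends in AG, the classical 3' intron boundary."""
    hits = []
    for i, old in enumerate(three_prime_end):
        for new in BASES:
            if new != old:
                mutated = three_prime_end[:i] + new + three_prime_end[i + 1:]
                if mutated.endswith("AG"):
                    hits.append((i, old, new))
    return hits


# Intron ends of the Podospora example in Fig. 2b (exon flanks omitted).
five_prime_end = "GTGCGCC"
three_prime_end = "CTTATCCTAC"

print(five_prime_end.startswith("GT"))          # True
print(single_changes_to_ag(three_prime_end))    # [(9, 'C', 'G')]
```

The intron already begins with GT, and the only single substitution that makes it end in AG is a change of the final C to G, which agrees with the pyrimidine to guanine substitution described above.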
The apparently non-random distribution of intron
insertions could be produced by a variety of factors.
There might well be a degree of sequence specificity
in the insertions themselves. Even if not, there would
probably be selection after insertion. Inserts close to
pre-existing introns might be disfavoured because of
the risk that both introns would be spliced out as a
single unit, with the loss of the exonic sequences
between them. Also, flanking exonic sequences are
known to affect the splicing of group II introns 32,33, so
a new insert in some positions might disrupt existing
splicing patterns, and a new insert might itself be
spliceable only in certain positions in the gene. There
may also be constraints on the pre-mRNA secondary
structure. Thus there is plenty of scope for setting up
non-random patterns of inserted introns. Insertion of
autonomous group II introns, which are eventually
captured and put under the control of the nucleus,
could well explain the variety of intron positions seen
in present-day genes.
References
1 Doolittle, W.F. (1978) Nature 272, 581-582
2 Gilbert, W., Marchionni, M. and McKnight, G. (1986) Cell 46, 151-154
3 Rogers, J. (1985) Nature 315, 458-459
4 Rogers, J. (1986) Trends Genet. 2, 223
5 Berchtold, M.W. et al. (1987) J. Biol. Chem. 262, 8696-8701
6 Smith, V.L. et al. (1987) J. Mol. Biol. 196, 471-485
7 Wilson, P.W. et al. (1988) J. Mol. Biol. 200, 615-625
8 Owens, G.C., Edelman, G.M. and Cunningham, B.A. (1987) Proc. Natl Acad. Sci. USA 84, 294-298
9 Littman, D.R. and Gettner, S.N. (1987) Nature 325, 453-455
10 Lemke, G., Lamar, E. and Patterson, J. (1988) Neuron 1, 73-83
11 Craik, C.S., Rutter, W.J. and Fletterick, R. (1983) Science 220, 1125-1129
12 Tso, J.Y., Van den Berg, J. and Korn, L.J. (1986) Nucleic Acids Res. 14, 2187-2199
13 Odermatt, E., Tamkun, J.W. and Hynes, R.O. (1985) Proc. Natl Acad. Sci. USA 82, 6571-6575
14 Naora, H. and Deacon, N.J. (1982) Proc. Natl Acad. Sci. USA 79, 6196-6200
15 Yoshihara, C.M., Lee, J-D. and Dodgson, J.B. (1987) Nucleic Acids Res. 15, 753-770
16 Fink, G.R. (1987) Cell 49, 5-6
17 Cavalier-Smith, T. (1985) Nature 314, 283-284
18 Wessler, S.R., Baran, G. and Varagona, M. (1987) Science 237, 916-918
19 Sharp, P.A. (1985) Cell 42, 397-399
20 Flavell, A. (1985) Nature 316, 574-575
21 Michel, F. and Lang, B.F. (1985) Nature 316, 641-643
22 Osiewacz, H.D. and Esser, K. (1984) Curr. Genet. 8, 299-305
23 Jacq, C. et al. (1982) in Mitochondrial Genes (Slonimski, P.P. et al., eds), pp. 155-184, Cold Spring Harbor Laboratory Press
24 Gargouri, A., Lazowska, J. and Slonimski, P.P. (1983) in Mitochondria 1983 (Schweyen, R.J., Wolf, K. and Kaudewitz, F., eds), pp. 259-268, W. de Gruyter
25 Peebles, C.L. et al. (1986) Cell 44, 213-223
26 Van der Veen, R. et al. (1986) Cell 44, 225-234
27 Koll, F. (1986) Nature 324, 597-599
28 Ruskin, B. and Green, M.R. (1985) Science 229, 135-140
29 Lambowitz, A.M. (1989) Cell 56, 323-326
30 Scazzocchio, C. (1989) Trends Genet. 5, 168-172
31 Xiong, Y. and Eickbush, T.H. (1988) Cell 55, 235-246
32 Michel, F. and Jacquier, A. (1987) Cold Spring Harbor Symp. Quant. Biol. 52, 201-212
33 Van der Veen, R., Arnberg, A.C. and Grivell, L.A. (1987) EMBO J. 6, 1079-1084
J.H. ROGERS IS IN THE DEPARTMENT OF PHYSIOLOGY, UNIVERSITY OF CAMBRIDGE, CAMBRIDGE CB2 3EG, UK.