4628-4634 Nucleic Acids Research, 1995, Vol. 23, No. 22
© 1995 Oxford University Press

R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes

William D. Burke, Fritz Muller1 and Thomas H. Eickbush*

Department of Biology, University of Rochester, Rochester, NY 14627, USA and 1Institute of Zoology, University of Fribourg, Perolles, CH-1700 Fribourg, Switzerland

Received August 14, 1995; Revised and Accepted October 6, 1995

ABSTRACT

A 4.7 kb sequence-specific insertion in the 26S ribosomal RNA gene of Ascaris lumbricoides, named R4, is shown to be a non-long terminal repeat (non-LTR) retrotransposable element. The R4 element inserts at a site in the large subunit rRNA gene which is midway between two other sequence-specific non-LTR retrotransposable elements, R1 and R2, found in most insect species. Based on the structure of its open reading frame and the sequence of its reverse transcriptase domain, R4 elements do not appear to be a family of R1 or R2 elements that have changed their insertion site. R4 is most similar in structure and in sequence to the element Dong, which is not specialized for insertion into rDNA units. Thus R4 represents a separate non-LTR retrotransposable element that has become specialized for insertion in the rRNA genes of its host. Using oligonucleotide primers directed to a conserved region of the reverse transcriptase encoding domain, insertions in the R4 site were also amplified from Parascaris equorum and Haemonchus contortus. Why several non-LTR retrotransposable elements have become specialized for insertion into a short (87 bp) region of the large subunit rRNA gene is discussed.

INTRODUCTION

Most transposable elements have insertion specificities that extend only a few base pairs along the DNA, thus they insert at numerous locations throughout the host genome. When these elements fall within or near transcription units they can cause significant detrimental effects on the host.
Less well appreciated is the risk that this random method of insertion entails for the transposable elements themselves. Chromosomes bearing these detrimental mutations will be eliminated from the population, while many other insertions may be in genomic locations (e.g. heterochromatin) that will not allow proper expression. Thus there are ample evolutionary reasons to suspect that some transposable elements would evolve target site specificity. Indeed, numerous examples of sequence or site specificity have been found among the largest class of mobile elements, the retrotransposable elements. R1, R2 and RT elements insert within the 28S rRNA genes of insects (1,2). CRE1, SLACS and CZAR elements insert within the spliced leader exons of trypanosomes (3). Tx1 inserts into another mobile element in the genome of frogs (4). Ty1, Ty3 and DRE elements insert adjacent to tRNA genes of yeast and slime mold (5-7). Finally, TRAS elements insert into the telomere repeat sequences of Bombyx mori (8) and TART and HetA actually form the telomeres of chromosomes in Drosophila melanogaster (9,10).

* To whom correspondence should be addressed
GenBank accession nos U29445, U29456, U29590

The mechanisms controlling this specificity are best known for the R2 and Ty3 elements. The R2 element encodes a polypeptide capable of acting as a sequence-specific endonuclease that nicks the target site and a reverse transcriptase that uses the nick to prime cDNA synthesis (11). The Ty3 element encodes an integrase that requires association with the transcription factors TFIIIB and TFIIIC for specific integration (12).

The nematode Ascaris lumbricoides has been shown to contain an ~4.7 kb insertion in a small fraction of its large subunit (26S) rRNA genes (13). As shown in Figure 1, the location of this insertion is approximately midway between the R1 and R2 insertion sites found in most insects (14-16).
Another element, R3, was found upstream of the R2 site in two insect species; however, the structure of R3 elements is poorly defined at present (17,18). Several properties of the A.lumbricoides insertion originally suggested that it might be an R1 element (13,19). In this report we present the sequence of a full-length A.lumbricoides insertion, hereafter referred to as the R4 element. Like R1 and R2, the R4 element is a non-LTR retrotransposable element. However, based on its structure and a phylogenetic analysis of its reverse transcriptase domain, R4 is not related to either R1 or R2. An insertion at the R4 site was also found in two other nematode species, suggesting that R4 may be widespread in this phylum.

MATERIALS AND METHODS

Sequence analysis of the A.lumbricoides element

Based on the restriction map of genomic clone pAlr20 (13), restriction fragments were cloned into the corresponding sites of the M13mp18 and M13mp19 sequencing vectors. The Universal sequencing primer and nested oligonucleotide primers were used to sequence both strands of the element (20). The sequence of the R4 element was submitted to GenBank under the accession no. U29445. The sequence relationship of R4 to the other non-LTR retrotransposable elements was determined based on the amino acid sequence of the reverse transcriptase domains (21). The distance method neighbor joining (22) was used, as made available in the Clustal V program package (23).

[Figure 1 graphic not recoverable: rDNA insertion elements R1, R2, R3 and R4 (the A.lumbricoides insertion) marked on the 28S rRNA gene sequence.]

Figure 1. Location of non-LTR retrotransposable elements in the large subunit rRNA genes. The sequence of a portion of the D.melanogaster 28S rRNA gene is shown (40). Two base pair differences in the A.lumbricoides 26S rRNA gene sequence are shown below the insect sequence (13). Arrows indicate the insertion site of the various elements based on the 3' junction of the element with the rRNA gene. Vertical lines within the 28S sequence represent probable top and bottom strand cleavage sites generated by the endonuclease encoded by each element based on the R2 model of integration (11). Cleavage of the top strand downstream of the bottom strand generates a target site duplication upon insertion, while cleavage of the top strand upstream of the bottom strand generates a deletion of the target site. The 5'→3' orientation of the R1-R4 elements is the same as the rDNA transcription unit.

Construction of a vector to clone PCR fragments

PCR amplified products were cloned into an mp18 derivative, mp18T2, engineered for the direct cloning of PCR-amplified DNA. The mp18T2 vector was made by removing two restriction fragments from the multiple cloning site of M13mp18 and replacing them with synthetic double-stranded oligonucleotides having restriction sites that yield a 3' T overhang when digested with XcmI restriction endonuclease. This vector was constructed by first replacing the EcoRI-SstI fragment from the multiple cloning site of mp18 with the preannealed oligonucleotides 5'-ATTCCATGCATAGATTGGTTACGT-3' and 5'-AACCAATCTATGCATGG-3'. This replacement preserves the EcoRI site but destroys the SstI site. The PstI-HindIII fragment of the multiple cloning site was replaced with the preannealed oligonucleotides 5'-CCATCATACTTATGGAA-3' and 5'-AGCTTTCCATAAGTATGATGGTGCA-3'. This second replacement preserves the HindIII site but destroys the PstI site. The double replacement preserves the reading frame of the lacZ gene and eliminates all but the HindIII and EcoRI restriction sites from the original multiple cloning region. Cleavage of the two XcmI sites within the newly added sequence generates 3' T overhangs at each end suitable for direct cloning of PCR-amplified DNA.
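As a quick consistency check on the two oligonucleotide pairs listed above, the following Python sketch anneals each pair in silico and reports the single-stranded ends left on the longer oligonucleotide. The helper names (`revcomp`, `duplex_overhangs`) are hypothetical and introduced here only for illustration; they are not part of the original methods. The sequences are copied exactly as printed.

```python
# Illustrative check only: helper names are hypothetical, sequences are as printed above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_overhangs(long_oligo: str, short_oligo: str) -> tuple[str, str]:
    """Anneal short_oligo to long_oligo and return the unpaired 5' and 3' ends
    of long_oligo. Raises ValueError if short_oligo is not fully complementary
    to a stretch of long_oligo."""
    start = long_oligo.find(revcomp(short_oligo))
    if start < 0:
        raise ValueError("oligonucleotides do not anneal")
    end = start + len(short_oligo)
    return long_oligo[:start], long_oligo[end:]


# (longer oligonucleotide, shorter oligonucleotide) for each replacement
PAIRS = {
    "EcoRI-SstI replacement": ("ATTCCATGCATAGATTGGTTACGT", "AACCAATCTATGCATGG"),
    "PstI-HindIII replacement": ("AGCTTTCCATAAGTATGATGGTGCA", "CCATCATACTTATGGAA"),
}

for name, (long_oligo, short_oligo) in PAIRS.items():
    five_prime, three_prime = duplex_overhangs(long_oligo, short_oligo)
    print(f"{name}: 5' end {five_prime}, 3' end {three_prime}")
```

Both pairs anneal over the full 17 nt of the shorter oligonucleotide. For the second pair the unpaired ends are AGCT and TGCA, the cohesive ends produced by HindIII and PstI; for the first pair the printed sequences give ATT and ACGT.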
PCR amplification and sequencing of elements from other species

Parascaris equorum and Dirofilaria immitis DNAs were obtained from Timothy P.Friedlander (University of Maryland Biotechnology Institute) and Steven A.Nadler (Northern Illinois University) and Caenorhabditis elegans DNA was obtained from Scott Emmons (Albert Einstein School of Medicine). Haemonchus contortus was obtained from Raymond Fetterer (USDA). To PCR amplify the 3'-half of R4 elements from other nematode species the degenerate primer 5'-TTYTWYATGGAYGAYNT-3' (N, any nucleotide; Y, T or C; W, A or T) encoding the amino acid sequence YMDDI/V located in the reverse transcriptase domain was used in combination with a second primer complementary to the 26S gene sequence 77-97 bp downstream of the R4 insertion site, 5'-GCCAGATTAGAGTCAAGCTC-3'. To clone uninserted 26S gene sequences of A.lumbricoides, P.equorum and H.contortus the primer 5'-CTAAGTCGACTGCCCAGTGCTCTGAATGTC-3', complementary to the 26S gene ~100 bp upstream of the R4 insertion site, was used in combination with the primer 5'-AAGAGCCGACATCGAAGGATC-3', complementary to sequences ~700 bp downstream of the R4 site. To clone the 5'-ends of A.lumbricoides and P.equorum elements the upstream 26S primer was used in combination with the primer 5'-TAGAACTTCCGGTTGCG-3', complementary to sequence ~1460 bp from the 5'-end of the R4 insertion.

BRL Taq polymerase was used for PCR amplifications under conditions specified by the supplier. Approximately 0.2 µg genomic DNA was amplified in 30 cycles of 94°C for 1 min, 60°C for 1 min and 72°C for 3 min. Clones containing the two orientations of the PCR product from P.equorum and H.contortus were sequenced using the Universal sequencing primers and nested primers ~0.5 and 1.0 kb downstream of the ends of the 1.5-1.8 kb fragments. The P.equorum and H.contortus 3' sequences are available in GenBank under the accession nos U29456 and U29590 respectively.
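To make the complexity of the degenerate reverse transcriptase primer described above concrete, the following Python sketch counts and enumerates the sequences in the primer pool. It uses only the ambiguity codes defined in the text (N, Y and W); the function names `degeneracy` and `expand` are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Ambiguity codes as defined in the text: N, any nucleotide; Y, T or C; W, A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "TC", "W": "AT", "N": "ACGT"}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every unambiguous sequence in the degenerate primer pool."""
    return ["".join(combo) for combo in product(*(IUPAC[base] for base in primer))]


PRIMER = "TTYTWYATGGAYGAYNT"
print(degeneracy(PRIMER))   # 128
print(expand(PRIMER)[:3])   # first few members of the pool
```

With four Y positions, one W and one N, the pool contains 2^4 x 2 x 4 = 128 distinct 17-mers.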
RESULTS

Sequence of a complete A.lumbricoides insertion

The rRNA genes of A.lumbricoides have been previously studied by Southern blots and analysis of cloned rDNA units (13,19). Two major size classes of rDNA units were identified, 8.4 and 8.8 kb, which differed by the presence or absence of a 400 bp segment in the intergenic (non-transcribed spacer) region of the unit. Approximately 5% of the rDNA (~15 units/haploid genome) were found to be >8.8 kb in length as a result of an insertion at a unique location within the 26S gene. Most of these insertions were 4.7 kb in length, but shorter length versions of the insertion were also identified. The nucleotide sequence of the 5' and 3' junctions of a 4.7 kb insertion (clone pAlr20), a 4.2 kb insertion (pAlr22) and a 119 bp insertion (pAlr23) revealed that the insertions had neither direct nor inverted terminal repeats (13). The shorter copies had 3' junctions identical to the full-length version, while their 5'-ends were truncated. The insertions were flanked by a duplication of 26S gene sequences present only once in uninserted genes. The lengths of these target duplications were 6 bp in pAlr23, 13 bp in pAlr20 and 14 bp in pAlr22. As defined by the 3' junction of the insertion with the 26S gene, all three sequenced copies were inserted into the identical site in the rRNA genes, which is 26 bp downstream of the R2 insertion site and 34 bp upstream of the R1 insertion site (Fig. 1).

The 4686 bp full-length R4 insertion in clone pAlr20 was completely sequenced on both strands (see Materials and Methods). The nucleotide sequence of R4 revealed that it encoded a single open reading frame (ORF) beginning 366 bp from the left (5') end of the element and ending only 173 bp from the right (3') end. A comparison of the structural features of R4 with the R1 and R2 elements of D.melanogaster is shown in Figure 2. Centrally located in the ORF of the R4 element is a reverse transcriptase domain containing the conserved amino acid motifs found in all reverse transcriptases (24). Protease, RNase H and integrase domains, which are present in LTR-containing retrotransposable elements but absent in non-LTR retrotransposable elements, could not be detected in the R4 element.

Extensive homology searches between R4 and retrotransposable elements of both the LTR and non-LTR classes revealed only one additional region of similarity, a putative nucleic acid binding motif containing three cysteine (C) and one histidine (H) residues downstream of the reverse transcriptase domain. As shown in Figure 3, this CCHC motif is similar to motifs found in R1 and R2 elements, as well as many other non-LTR retrotransposable elements. In all elements this CCHC motif is located downstream of the reverse transcriptase domain. The spacing between the first two C residues varies from one to three residues in the different non-LTR retrotransposable elements. The R4 element has two consecutive C residues near the second C position. Thus, depending on which C is used, the spacing of the C and H residues in the R4 motif is the same as R1 and Cin4 elements or the same as Tx1, Dong and I elements. Like many other non-LTR retrotransposable elements, R1 and R2 also contain one or more putative nucleic acid binding motifs composed of cysteine and histidine residues upstream of the reverse transcriptase domain (Fig. 2). While the ORF upstream of the reverse transcriptase domain in the R4 element is extensive, no nucleic acid binding motifs could be detected.

[Figure 2 graphic not recoverable: schematic maps of R4 (A.lumbricoides), R1 (D.melanogaster), R2 (D.melanogaster) and Dong (B.mori), with the 5'- and 3'-UTRs, ORFs, RT domains and CCHC motifs labelled.]

Figure 2. Comparison of the structure of R4 with three other non-LTR retrotransposable elements. The horizontal bars represent the total length of each element. Areas with no shading represent the 5'- and 3'-UTR. Shaded areas represent ORFs. R1 elements contain two slightly overlapping ORFs, while R2, R4 and Dong each contain a single ORF. Darker shading within these ORFs indicates the location of the reverse transcriptase domain (RT) and various putative nucleic acid binding motifs composed of cysteine (C) and histidine (H) residues. All four elements contain a motif (CCHC) downstream of the RT domain. The sequences of these motifs are shown in Figure 3. Upstream of the RT domain R1 elements contain three closely spaced CCHC motifs near the C-terminal end of the first ORF. R2 elements contain a single CCHH motif near the N-terminal end of the ORF. Data for R1 and R2 are from Jakubczak et al. (16), while the data for Dong are from Xiong and Eickbush (25).

[Figure 3 graphic not recoverable: amino acid alignment of the CCHC motifs of R4 and other non-LTR retrotransposable elements.]

Figure 3. Comparison of the putative nucleic acid binding motif in R4 with that of other non-LTR retrotransposable elements. In all cases a single CCHC motif is located downstream of the reverse transcriptase domain. The critical C and H residues of the motif are shaded. This list is not intended to be comprehensive for all non-LTR retrotransposable elements. Sequences are derived from the following sources: B.mori, R1Bm (15), R2Bm (14) and Dong (25); D.melanogaster, R1Dm and R2Dm (16) and I (41); A.gambiae, RT1 (2); Nasonia vitripennis, R1Nv and R2Nv (18); Popillia japonica, R1Pj and R2Pj (18); Zea mays, Cin4 (42); Xenopus laevis, Tx1 (4); Mus domesticus, L1Md (43).

Phylogenetic analysis of the R4 element

The only sequence that can be readily identified in all non-LTR retrotransposable elements, and thus be used to resolve their phylogenetic relationship, is the reverse transcriptase domain.
The ORF of the R4 insertion contains the seven conserved segments identified in reverse transcriptase sequences from all eukaryotic and prokaryotic sources (21). The R4 reverse transcriptase domain also contains an additional ~30 amino acid region between segments 2 and 3 which is unique to non-LTR retrotransposable elements, group II introns and bacterial reverse transcriptases. Based on a phylogenetic analysis of the universally conserved segments of the reverse transcriptase domain, the R4 element falls within the non-LTR retrotransposable group of elements (data not shown). Shown in Figure 4 are the results of a phylogenetic analysis including only the non-LTR retrotransposable elements. The analysis was based on the neighbor joining method (22) and the tree is rooted using group II intron sequences, which are the closest known retroelements outside this group.

The many non-LTR retrotransposable elements found in animals, plants and protists are highly divergent in sequence (21,24). As a consequence it is difficult to completely resolve the relationships between these elements. This failure to resolve their relationship can be seen in Figure 4 by the very low bootstrap values on all nodes at the deep branches of the tree. Only bootstrap values >50 are shown (i.e. >50% of the time the sequences to the right of the node branch together). Thus nodes with no bootstrap values are not considered significant.

R4 is clearly not on the same branch of the tree as either the R1 or R2 elements. Instead R4 is most closely related to Dong, an element previously identified in B.mori (25). Dong elements were first identified as insertions in the non-transcribed spacer region of the rDNA unit; however, Dong elements do not appear specific for this spacer region, because their insertion specificity appears to involve only tandemly repeated TAA sequences and many copies of Dong are present outside the rDNA units. Because of the similarity in sequence of their reverse transcriptase domains we have also compared the structures of the R4 and Dong ORFs in Figure 2. Dong is similar to R4 in a number of properties. Both elements encode a single ORF which contains a CCHC motif downstream of the reverse transcriptase domain, but no cysteine/histidine motifs upstream of the reverse transcriptase domain. The Dong CCHC motif is highly similar in sequence to R4 (11/18 positions, Fig. 3). Indeed, Dong and R4 elements share limited amino acid sequence identity (18%) throughout the region downstream of the reverse transcriptase domain. This identity is highly significant, because in general no similarities outside the CCHC motif are detected between different non-LTR elements. The greater similarity of R4 to Dong than to either R1 or R2 suggests that R4 represents the independent specialization of a non-LTR element for insertion into the rDNA unit.

[Figure 4 graphic not recoverable: neighbor joining tree of non-LTR retrotransposable elements. Legible taxon labels include R4, Dong, Tx1, Cin4, L1Md, R2Dm, R2Bm, Doc, Jockey, R1Dm, RT1, R1Bm, T1, SLACS and CRE1; legible bootstrap values include 98 at the R4/Dong node.]

Figure 4. Phylogenetic relationship of R4 to other non-LTR retrotransposable elements. The phylogeny is based on 178 amino acid positions of the reverse transcriptase domain using neighbour joining algorithms as described in Xiong and Eickbush (21). The numbers given at certain nodes represent the bootstrap values per 100 replications. Bootstrap values <50 are not given, thus the reliability of nodes with no bootstrap values is low. Two group II intron sequences from fungal mitochondria were used as an outgroup to root the tree. References to the various reverse transcriptase sequences used can be found in Xiong and Eickbush (21), except for Cr1 (44), Doc (45) and RT1 (2).

Identification of R4 elements in other nematode species

The specific location of R1 and R2 elements within the 28S rRNA genes of the host simplifies their identification in other species by either Southern blotting or PCR amplification (1,18). PCR is the more sensitive of the two approaches and we are currently able to detect R1 and/or R2 elements in insects that we previously scored by Southern analysis as being negative. The PCR approach uses a degenerate oligonucleotide primer to highly conserved sequences in the reverse transcriptase domain in combination with a non-degenerate primer complementary to 28S gene sequences downstream of the insertion site (18).

A similar scheme was designed to test whether R4 elements are present in the rDNA units of other nematode species. We chose a degenerate primer (see Materials and Methods) capable of encoding the amino acid sequence YMDDV/I, present in both Dong and R4 elements. The 26S primer was complementary to a sequence starting 77 bp downstream of the R4 insertion site. PCR amplification was conducted with genomic DNA from four nematode species: the parasitic nematodes P.equorum, D.immitis and H.contortus and the free-living nematode Caenorhabditis elegans. It was unlikely that R4 elements were present in C.elegans, as variant rDNA units had been previously characterized from this species and no insertion elements were found (26). Only the PCR amplifications with P.equorum and H.contortus DNA gave rise to an appropriately sized band on agarose gels (1.5-1.8 kb). These PCR fragments were cloned and the 3' junction with the 26S gene was sequenced from multiple clones (see Materials and Methods). The results of this sequence analysis are shown in Figure 5.

The insertion in H.contortus initially appeared to be located within the 26S gene 1 bp downstream of that in A.lumbricoides and P.equorum (Fig. 5B). However, when we determined the sequence of this region of the 26S gene from uninserted units in all three species (see Materials and Methods) the H.contortus 26S gene sequence was found to have a base substitution at the first position downstream of the insertion site. Thus the insertions in H.contortus are at the identical position as the insertions in A.lumbricoides and P.equorum. Other variations detected within the 26S genes were single substitutions within P.equorum, 1 and 4 bp downstream of the insertion site in clones with R4 insertions, but not in the uninserted 26S genes.

We sequenced the entire PCR product from one P.equorum and one H.contortus insertion. The P.equorum insertion was similar in all respects to the R4 element of A.lumbricoides (Fig. 5). Within the 558 codons of the ORF that could be compared nucleotide identity between the two elements was 83.9%, with only one 3 bp insertion/deletion event. The rate of nucleotide substitution at synonymous codon positions (Ks = 0.55) was higher than the rate of substitution at replacement positions (Ka = 0.10) (see 27 for a discussion of Ka and Ks values). The 5.5-fold faster rate of nucleotide substitution at synonymous positions suggests that the ORFs of R4 elements are under selective pressure. The short 3'-UTR of the R4 elements had 78.4% nucleotide identity and seven insertion/deletion differences.

The H.contortus insertion was very different in sequence from the A.lumbricoides and P.equorum R4 insertions. The H.contortus insertion did not encode a long ORF and a portion of the sequence contained a tandemly repeated A-rich sequence which was unlikely to have ever been part of an ORF. Thus it is not clear whether this insertion is a remnant of an intact R4 element. Highly defective R1 elements have been detected in D.melanogaster (28,29). These defective elements are associated with fragments of rDNA units located in the centromeric heterochromatin. A similar situation may explain the insertions we cloned from H.contortus, because the 26S gene sequence downstream of the insertion exhibited nearly 4% nucleotide sequence differences between copies (data not shown), indicating that these 26S genes were unlikely to be part of normal rDNA units found in the rDNA loci. Clearly a more extensive set of PCR primers is needed to determine if intact R4 elements are still present in H.contortus and a greater number of species will need to be tested to determine the distribution of R4 elements in nematodes.

Finally, based on the sequence of their 5' and 3' junctions with the 26S gene the three previously cloned copies of R4 elements in A.lumbricoides had target site duplications of 6, 13 and 14 bp (13). As in the case of R1 and R2 elements (14-18), the 3' junctions of all R4 elements were identical (13, see also Fig. 5), while it is the sequence of the 5' junction that determined the length of each target site duplication. To test whether R4 target site duplications are variable in length we PCR amplified and sequenced (see Materials and Methods) the 5' junctions of additional R4 insertions from A.lumbricoides and P.equorum (data not shown). The six clones characterized from each species represented full-length elements and each contained a 5' junction consistent with a 13 bp target site duplication. Nucleotide sequence variation of 0.5-2.0% between the different clones, well above the PCR error rate (30), indicated that most of these PCR clones represented different R4 elements in each species. Thus variable length target site duplications associated with A.lumbricoides R4 are only associated with 5' truncated elements. The integration of full-length R4 elements appears to be a precise event resulting in 13 bp target site duplications.

[Figure 5 graphic not recoverable. Legible junction data: A.lumbricoides, 26S sequence ATGACGCGCATGAATG (3 clones); P.equorum, R4 3'-end ...TACCAAAAACCA joined to 26S sequence ATGACGCGCATGAATG, ATGGCGCGCATGAATG or GTGACGCGCATGAATG (1 clone each); H.contortus, 26S sequence CTGACGCGCATGAATG (4 clones).]

Figure 5. Structural diagram of the amplified region of the R4 elements from three nematode species. Boxes and their shading are identical to that in Figure 2. Arrows above these boxes represent the PCR primers YMDD and -77. No long ORF could be detected in the H.contortus insertion. One region of this insertion contains an A-rich repeat (diagonal shading) that is unlikely to have ever been part of an ORF. Junction sequences of the insertions with the 26S gene are shown in the expanded view of the 3'-end of each element. R4 sequences are in italics, while 26S gene sequences are in bold. Numbers in parentheses represent the number of clones obtained for each junction sequence.

DISCUSSION

In this report we have shown that a 4.7 kb insertion in the 26S rRNA genes of A.lumbricoides is a non-LTR retrotransposable element. We have termed the nematode element R4 because it represents the fourth element to be identified that is specialized for insertion into the rRNA genes of its host (Fig. 1). The first rDNA-specific elements characterized, R1 and R2, are widely distributed in insects (1) and considerable information is known about their mechanisms of insertion and stability within a lineage. R3 is also an insect element, but is poorly defined at present (17,18). Site-specific insertions in the 28S rRNA genes of the mosquito Anopheles gambiae, termed RT1 and RT2, have also been described by Collins and co-workers (2,31). RT elements insert 634 bp downstream of the R1 insertion site. The sequence of two complete elements has shown that these elements have two overlapping ORFs that are very similar in structure to the ORFs of R1 elements (2). Based on the similarities of their ORFs and the phylogenetic relationship of their reverse transcriptase domains (Fig.
4) RT elements appear to be R1 elements that have changed their target specificity on the 28S gene.

Unlike RT elements, R4 elements in nematodes do not appear to be R1 or R2 elements that have changed their insertion specificity to another location in the rDNA repeat. The organization of their ORFs is quite distinct and the sequence of the reverse transcriptase domain of these elements (Fig. 4) suggests that R1, R2 and R4 are no more related to each other than they are to any other non-LTR element. Indeed, R4 is most related to the Dong element of B.mori (25). However, because the deep phylogeny of the non-LTR elements is poorly resolved it remains formally possible that only one retrotransposable element became specialized for insertion in the large subunit rRNA gene and that different lineages of this single element changed their insertion specificity for sites in this gene. This issue will only be settled by further refinement of the phylogenetic analysis to better resolve the deeper phylogeny of the non-LTR retrotransposable elements.

The relationship of these rDNA insertions to another non-LTR retrotransposable element should also be mentioned. Most G elements of D.melanogaster have been shown to insert specifically into the non-transcribed spacer region of defective rDNA units located within the centromeric heterochromatin (32,33). The sequence of the spacer region target site exhibits considerable similarity (45/52 bp with one 18 bp deletion) to the region of the 28S gene containing the R1, R2 and R4 insertion sites (32). The location of the G insertion site in this 28S-derived sequence is 1 bp downstream of the R4 insertion site. Based on its reverse transcriptase sequence, G is located well within the branch of non-LTR elements containing Doc and Jockey elements (Fig. 4).
It seems doubtful that a retrotransposable element would have become highly specialized for insertion into a sequence of the non-transcribed spacer, because these sequences are poorly conserved in evolution. It is more likely that G represents a fifth non-LTR retrotransposable element that has become specialized for insertion into the 28S gene. In this model G was only found in the non-transcribed spacer because its 28S gene target site has become part of the spacer region in D.melanogaster. Consistent with this model, R1 elements are also occasionally found inserted into this spacer region sequence in D.melanogaster (32).

In conclusion, evidence to date suggests as many as five different non-LTR elements have become specific for insertion into different sites in the large subunit rRNA gene. As we have previously discussed for the R1 and R2 elements (30), there are several advantages to being specialized for insertion into the rDNA transcription unit, even when such insertions inactivate functioning of that unit. First, specificity for the rDNA unit ensures a population of uniform target sites for the insertion of new copies that can be regulated in an identical manner to that of the donor element. Random insertions by a transposable element along a chromosome can lead to copies that cannot be appropriately expressed. Second, if one assumes that the host species has more than sufficient rDNA units for its survival, then when low copy numbers of an rDNA insertion make transposition necessary the insertion of new copies will have a minimal effect on the fitness of the host. Random insertions by transposable elements within a genome always run a risk of being deleterious. Third, recombination between copies of a transposable element inserted in the rDNA locus will be no different from recombinations involving the rDNA sequences themselves.
Recombination between transposable elements inserted at random along a chromosome can cause detrimental chromosomal rearrangements (34).

It is interesting to note that the multiple transposable elements that have become specialized for insertion into the rDNA unit are all non-LTR retrotransposable elements. Why have no examples been found of rDNA-specific elements derived from the equally widespread classes of LTR-containing retrotransposable elements or DNA-mediated transposable elements? Because non-LTR retrotransposable elements do not have terminal repeats their transcription must be regulated by either an external promoter, an internal promoter located downstream of the transcription initiation site or both (35-37). Thus it may be easier for non-LTR elements to adapt to a read-through transcript coming from the rDNA unit itself. R1, R2, R3 and R4 are all organized such that their transcription is in the same direction as the rDNA unit. G and the RT family of R1 elements, on the other hand, are inserted in the opposite orientation to that of the rDNA unit, suggesting they must encode their own promoter. It is interesting to note that the only difference between RT and R1 elements is that RT elements contain 2 kb of additional untranslated sequences at their 5'-end (2). This insertion may include such promoter sequences.

A third question that should be addressed is why are the many elements specialized for insertion into the rDNA unit clustered in this one small region of the large subunit rRNA gene? There are many highly conserved regions of the 28S and 18S gene, yet as many as five elements may have become specialized for a region spanning only 87 bp. Even a number of group I introns have been identified in the R1-R4 insertion region of the large rRNA subunit gene in various protozoans. For example, the well-characterized self-splicing intron of various Tetrahymena species is located only 6 bp upstream of the R2 insertion site (38).
Perhaps the advantage of inserting in this region of the 28S gene involves regulation of transcription. Insertions of R1, R2 and R4 are known to down-regulate transcription of the rDNA unit, thus it is possible that there is a polymerase I enhancer located in this region. In such a model disruption of this region would automatically down-regulate transcription, and each element would only need the ability to up-regulate this transcription at appropriate times in the germ cells. An alternative scenario is one in which this region of the 28S gene may be a favorable site for expression of an element's RNA transcript. This advantage to the element could involve processing of the element RNA from a read-through transcript or the ability of the element's RNA to be transported out of the nucleus for translation in the cytoplasm without excision from the 28S gene. Whatever advantages this small region of the 28S gene offers as a site for insertion, it offers them at multiple sites which can apparently be occupied at a tolerable cost to the host.

All of the non-LTR retrotransposable elements that insert into the large subunit rRNA gene have precise 3' junctions with chromosomal DNA, while their 5'-ends are sometimes truncated. This feature is readily explained if these elements use a mechanism of integration similar to that of R2 elements (11). R2 elements encode an endonuclease which cleaves (nicks) the DNA strand used as template for rRNA transcription (bottom strand in Fig. 1). The 3' hydroxyl group released by this nick primes reverse transcription of the R2 transcript, thus defining the eventual 3' junction of the element with the chromosome. RNA sequences within the short 3'-UTR of the R2 transcript are necessary and sufficient for precise recognition by the reverse transcriptase (39). After reverse transcription the endonuclease cleaves the second (top) strand of the 28S gene.
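The integration mechanism just described, together with its application to R4 in the paragraph that follows, can be illustrated with a toy model in which the stagger between the two strand cleavages decides whether target sequence is duplicated or deleted. This is only a minimal sketch under stated assumptions: the function name `integrate`, the top-strand coordinate convention, and the placeholder sequences are hypothetical and are not taken from the paper; the offsets of +13 and -2 are the values quoted in the text for R4 and R2.

```python
def integrate(target, element, bottom_nick, top_offset):
    """Toy model of staggered-cleavage integration (illustrative only).

    Coordinates are positions between bases on the top strand (assumption).
    bottom_nick fixes the 3' junction of the element; the top strand is cut
    top_offset bp downstream (+) or upstream (-) of that position.
    """
    top_cut = bottom_nick + top_offset
    if not (0 <= bottom_nick <= len(target) and 0 <= top_cut <= len(target)):
        raise ValueError("cleavage sites must lie within the target")
    # Upstream flank ends at the top-strand cut; downstream flank starts
    # at the bottom-strand nick, where reverse transcription was primed.
    product = target[:top_cut] + element + target[bottom_nick:]
    lo, hi = sorted((bottom_nick, top_cut))
    return {
        "product": product,
        "duplicated": target[lo:hi] if top_offset > 0 else "",
        "deleted": target[lo:hi] if top_offset < 0 else "",
    }


# Hypothetical 40 nt target and placeholder element, not real rDNA or R4 sequence.
TARGET = "ACGTTGCAAGGCTTACCGATAGGCTTAACCGGTTAGCATG"
ELEMENT = "nnnnnnnn"

r4_like = integrate(TARGET, ELEMENT, bottom_nick=15, top_offset=13)
r2_like = integrate(TARGET, ELEMENT, bottom_nick=15, top_offset=-2)
print(len(r4_like["duplicated"]), len(r2_like["deleted"]))  # prints: 13 2
```

In this sketch a top-strand cut downstream of the bottom-strand nick leaves the intervening bases on both sides of the element (a target site duplication), while a cut upstream of the nick removes them (a deletion). Truncation at the 5'-end of the element is not modeled.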
Application of the R2 model to R4 insertions makes predictions about where cleavage occurs on the two strands of the target DNA. In the first step an R4-encoded endonuclease cleaves the bottom strand of the target DNA (Fig. 1). Cleavage of the top strand by the endonuclease (either before or after reverse transcription) occurs at a site 13 bp downstream of the bottom strand site. Following synthesis of the second DNA strand and repair of the ends, the size of the target site duplication generated by the insertion is 13 bp. In the case of R2, cleavage of the upper DNA strand occurs 2 bp upstream of the bottom strand cleavage (11) and as a consequence a 2 bp deletion of the target DNA is generated upon insertion (14,16).

As has already proven to be the case with the R1 and R2 elements of insects, analysis of R4 in other nematode species is likely to reveal conserved and variable features of their sequences and their insertions that will be useful in further refining the general integration mechanism used by non-LTR retrotransposable elements.

ACKNOWLEDGEMENTS

We thank Tim Friedlander, Steven Nadler, Scott Emmons and Raymond Fetterer for supplying nematode materials. We also thank Warren Lathe III for help with the phylogenetic analysis and Danna Eickbush and Janet George for their helpful comments on the manuscript. This work was supported by National Science Foundation grant MCB-9219123.

REFERENCES

1 Jakubczak,J.L., Burke,W.D. and Eickbush,T.H. (1991) Proc. Natl Acad. Sci. USA, 88, 3295-3299.
2 Besansky,N.J., Paskewitz,S.M., Hamm,D.M. and Collins,F.H. (1992) Mol. Cell. Biol., 12, 5102-5110.
3 Aksoy,S. (1991) Parasitol. Today, 7, 281-285.
4 Garrett,J.E., Knutzon,D.S. and Carroll,D. (1989) Mol. Cell. Biol., 9, 3018-3027.
5 Chalker,D.L. and Sandmeyer,S.B. (1990) Genetics, 126, 837-850.
6 Marschalek,R., Hofmann,J., Schumann,G., Gosseringer,R. and Dingermann,T. (1992) Mol. Cell. Biol., 12, 229-239.
7 Ji,H., Moore,D.P., Blomberg,M.A., Braiterman,L.T., Voytas,D.F., Natsoulis,G. and Boeke,J.D. (1993) Cell, 73, 1007-1018.
8 Okazaki,S., Ishikawa,H. and Fujiwara,H. (1995) Mol. Cell. Biol., 15, 4545-4552.
9 Levis,R.W., Ganesan,R., Houtchens,K., Tolar,L.A. and Sheen,F.M. (1993) Cell, 75, 1083-1093.
10 Biessmann,H., Mason,J.M., Ferry,K., d'Hulst,M., Valgeirsdottir,K., Traverse,K.L. and Pardue,M.-L. (1990) Cell, 61, 663-673.
11 Luan,D.D., Korman,M.H., Jakubczak,J.L. and Eickbush,T.H. (1993) Cell, 72, 595-605.
12 Kirchner,J., Connolly,C.M. and Sandmeyer,S.B. (1995) Science, 267, 1488-1491.
13 Back,E., Van Meir,E., Müller,F., Schaller,D., Neuhaus,H., Aeby,P. and Tobler,H. (1984) EMBO J., 3, 2523-2529.
14 Burke,W.D., Calalang,C.C. and Eickbush,T.H. (1987) Mol. Cell. Biol., 7, 2221-2230.
15 Xiong,Y. and Eickbush,T.H. (1988) Mol. Cell. Biol., 8, 114-123.
16 Jakubczak,J.L., Xiong,Y. and Eickbush,T.H. (1990) J. Mol. Biol., 212, 37-52.
17 Kerrebrock,A.W., Srivastava,R. and Gerbi,S.A. (1989) J. Mol. Biol., 210, 1-13.
18 Burke,W.D., Eickbush,D.G., Xiong,Y., Jakubczak,J.L. and Eickbush,T.H. (1993) Mol. Biol. Evol., 10, 163-185.
19 Neuhaus,H., Müller,F., Etter,A. and Tobler,H. (1987) Nucleic Acids Res., 15, 7689-7707.
20 Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl Acad. Sci. USA, 74, 5463-5467.
21 Xiong,Y. and Eickbush,T.H. (1990) EMBO J., 9, 3353-3362.
22 Saitou,N. and Nei,M. (1987) Mol. Biol. Evol., 4, 406-425.
23 Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) Comput. Appl. Biosci., 8, 189-191.
24 Eickbush,T.H. (1994) In Morse,S.S. (ed.), The Evolutionary Biology of Viruses. Raven Press, New York, NY, pp. 121-157.
25 Xiong,Y. and Eickbush,T.H. (1993) Nucleic Acids Res., 21, 1318.
26 Files,J.G. and Hirsh,D. (1981) J. Mol. Biol., 149, 223-240.
27 Perler,F., Efstratiadis,A., Lomedico,P., Gilbert,W., Kolodner,R. and Dodgson,J. (1980) Cell, 20, 555-566.
28 Roiha,H., Miller,J.R., Woods,L.C. and Glover,D.M. (1981) Nature, 290, 749-753.
29 Kidd,S.J. and Glover,D.M. (1980) Cell, 19, 103-119.
30 Eickbush,D.G. and Eickbush,T.H. (1995) Genetics, 139, 671-684.
31 Paskewitz,S.M. and Collins,F.H. (1989) Nucleic Acids Res., 17, 8125-8133.
32 Di Nocera,P.P., Graziani,F. and Lavorgna,G. (1986) Nucleic Acids Res., 14, 675-691.
33 Di Nocera,P.P. (1988) Nucleic Acids Res., 16, 4041-4052.
34 Charlesworth,B. and Langley,C.H. (1989) Annu. Rev. Genet., 23, 251-287.
35 Mizrokhi,L.J., Georgieva,S.G. and Ilyin,Y.V. (1988) Cell, 54, 685-691.
36 Swergold,G.D. (1990) Mol. Cell. Biol., 10, 6718-6729.
37 Chaboissier,M.C., Busseau,I., Prosser,J., Finnegan,D.J. and Bucheton,A. (1990) EMBO J., 9, 3557-3563.
38 Cech,T.R. and Bass,B.L. (1986) Annu. Rev. Biochem., 55, 599-629.
39 Luan,D.D. and Eickbush,T.H. (1995) Mol. Cell. Biol., 15, 3882-3891.
40 Tautz,D., Hancock,J.M., Webb,D.A., Tautz,C. and Dover,G.A. (1988) Mol. Biol. Evol., 5, 366-376.
41 Fawcett,D.H., Lister,C.K., Kellett,E. and Finnegan,D.J. (1986) Cell, 47, 1007-1015.
42 Schwarz-Sommer,Z., Leclercq,L., Gobel,E. and Saedler,H. (1987) EMBO J., 6, 3873-3880.
43 Loeb,D.D., Padgett,R.W., Hardies,S.C., Shehee,W.R., Comer,M.B., Edgell,M.H. and Hutchison,C.A. (1986) Mol. Cell. Biol., 6, 168-182.
44 Burch,J.B.E., Davis,D.L. and Haas,N.B. (1993) Proc. Natl Acad. Sci. USA, 90, 8199-8203.
45 O'Hare,K., Alley,M.R.K., Culingford,T.E., Driver,A. and Sanderson,M.J. (1991) Mol. Gen. Genet., 225, 17-24.