4628-4634
Nucleic Acids Research, 1995, Vol. 23, No. 22
© 1995 Oxford University Press
R4, a non-LTR retrotransposon specific to the large
subunit rRNA genes of nematodes
William D. Burke, Fritz Muller1 and Thomas H. Eickbush*
Department of Biology, University of Rochester, Rochester, NY 14627, USA and 1Institute of Zoology, University of
Fribourg, Perolles, CH-1700 Fribourg, Switzerland
Received August 14, 1995; Revised and Accepted October 6, 1995
ABSTRACT
A 4.7 kb sequence-specific insertion in the 26S
ribosomal RNA gene of Ascaris lumbricoides, named
R4, is shown to be a non-long terminal repeat (non-LTR) retrotransposable element. The R4 element inserts at a site in the large subunit rRNA gene which is
midway between two other sequence-specific non-LTR
retrotransposable elements, R1 and R2, found in most
insect species. Based on the structure of its open
reading frame and the sequence of its reverse transcriptase domain, R4 elements do not appear to be a
family of R1 or R2 elements that have changed their
insertion site. R4 is most similar in structure and in
sequence to the element Dong, which is not specialized for insertion into rDNA units. Thus R4 represents
a separate non-LTR retrotransposable element that
has become specialized for insertion in the rRNA
genes of its host. Using oligonucleotide primers
directed to a conserved region of the reverse transcriptase encoding domain, insertions in the R4 site were
also amplified from Parascaris equorum and Haemonchus contortus. Why several non-LTR retrotransposable elements have become specialized for insertion
into a short (87 bp) region of the large subunit rRNA
gene is discussed.
INTRODUCTION
Most transposable elements have insertion specificities that
extend only a few base pairs along the DNA, thus they insert at
numerous locations throughout the host genome. When these
elements fall within or near transcription units they can cause
significant detrimental effects on the host. Less well appreciated
is the risk that this random method of insertion entails for the
transposable elements themselves. Chromosomes bearing these
detrimental mutations will be eliminated from the population,
while many other insertions may be in genomic locations (e.g.
heterochromatin) that will not allow proper expression. Thus there
are ample evolutionary reasons to suspect that some transposable
elements would evolve target site specificity. Indeed, numerous
examples of sequence or site specificity have been found among
the largest class of mobile elements, the retrotransposable
* To whom correspondence should be addressed
GenBank accession nos U29445, U29456, U29590
elements. R1, R2 and RT elements insert within the 28S rRNA
genes of insects (1,2). CRE1, SLACS and CZAR elements insert
within the spliced leader exons of trypanosomes (3). Tx1 inserts
into another mobile element in the genome of frogs (4). Ty1, Ty3
and DRE elements insert adjacent to tRNA genes of yeast and
slime mold (5-7). Finally, TRAS elements insert into the
telomere repeat sequences of Bombyx mori (8) and TART and
HetA actually form the telomeres of chromosomes in Drosophila
melanogaster (9,10). The mechanisms controlling this specificity
are best known for the R2 and Ty3 elements. The R2 element
encodes a polypeptide capable of acting as a sequence-specific
endonuclease that nicks the target site and a reverse transcriptase
that uses the nick to prime cDNA synthesis (11). The Ty3 element
encodes an integrase that requires association with the transcription factors TFIIIB and TFIIIC for specific integration (12).
The nematode Ascaris lumbricoides has been shown to contain
an ~4.7 kb insertion in a small fraction of its large subunit (26S)
rRNA genes (13). As shown in Figure 1, the location of this
insertion is approximately midway between the R1 and R2
insertion sites found in most insects (14-16). Another element,
R3, was found upstream of the R2 site in two insect species;
however, the structure of R3 elements is poorly defined at present
(17,18). Several properties of the A.lumbricoides insertion
originally suggested that it might be an R1 element (13,19). In this
report we present the sequence of a full-length A.lumbricoides
insertion, hereafter referred to as the R4 element. Like R1 and R2,
the R4 element is a non-LTR retrotransposable element. However, based on its structure and a phylogenetic analysis of its
reverse transcriptase domain, R4 is not related to either R1 or R2.
An insertion at the R4 site was also found in two other nematode
species, suggesting that R4 may be widespread in this phylum.
MATERIALS AND METHODS
Sequence analysis of the A.lumbricoides element
Based on the restriction map of genomic clone pAlr20 (13),
restriction fragments were cloned into the corresponding sites of
the M13mp18 and M13mp19 sequencing vectors. The Universal
sequencing primer and nested oligonucleotide primers were used
to sequence both strands of the element (20). The sequence of the
R4 element was submitted to GenBank under the accession no.
U29445. The sequence relationship of R4 to the other non-LTR
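[Figure 1 graphic: rDNA insertion elements R1, R2, R3 and R4 marked on the 28S rRNA gene sequence, with the A. lumbricoides base differences; the sequence text is not recoverable from the extraction.]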
Figure 1. Location of non-LTR retrotransposable elements in the large subunit rRNA genes. The sequence of a portion of the D.melanogaster 28S rRNA gene is shown
(40). Two base pair differences in the A.lumbricoides 26S rRNA gene sequence are shown below the insect sequence (13). Arrows indicate the insertion site of the
various elements based on the 3' junction of the element with the rRNA gene. Vertical lines within the 28S sequence represent probable top and bottom strand cleavage
sites generated by the endonuclease encoded by each element based on the R2 model of integration (11). Cleavage of the top strand downstream of the bottom strand
generates a target site duplication upon insertion, while cleavage of the top strand upstream of the bottom strand generates a deletion of the target site. The 5'→3'
orientation of the R1-R4 elements is the same as the rDNA transcription unit.
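The staggered-cleavage rule stated in this legend can be written as a small calculation. The sketch below is illustrative only: the function name and the coordinate convention (cleavage positions counted 5'→3' along the top strand) are assumptions introduced here and are not part of the original analysis.

```python
def target_site_outcome(top_cut: int, bottom_cut: int) -> tuple:
    """Classify the result of a staggered cleavage of the target site.

    Both arguments are hypothetical coordinates counted 5'->3' along the
    top strand. A top-strand cut downstream of the bottom-strand cut gives
    a target site duplication; a top-strand cut upstream gives a deletion.
    """
    offset = top_cut - bottom_cut
    if offset > 0:
        return ("duplication", offset)
    if offset < 0:
        return ("deletion", -offset)
    return ("none", 0)


# A top-strand cut 13 bp downstream of the bottom-strand cut.
print(target_site_outcome(13, 0))
```

Under this convention, the length of the duplication or deletion is simply the distance between the two cleavage positions.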
retrotransposable elements was determined based on the amino
acid sequence of the reverse transcriptase domains (21). The
distance method neighbor joining (22) was used, as made available
in the Clustal V program package (23).
Construction of a vector to clone PCR fragments
PCR amplified products were cloned into an mp18 derivative,
mp18T2, engineered for the direct cloning of PCR-amplified
DNA. The mp18T2 vector was made by removing two restriction
fragments from the multiple cloning site of M13mp18 and
replacing them with synthetic double-stranded oligonucleotides
having restriction sites that yield a 3' T overhang when digested
with XcmI restriction endonuclease. This vector was constructed
by first replacing the EcoRI-SstI fragment from the multiple
cloning site of mp18 with the preannealed oligonucleotides
5'-ATTCCATGCATAGATTGGTTACGT-3' and 5'-AACCAATCTATGCATGG-3'. This replacement preserves the EcoRI site
but destroys the SstI site. The PstI-HindIII fragment of the
multiple cloning site was replaced with the preannealed oligonucleotides 5'-CCATCATACTTATGGAA-3' and 5'-AGCTTTCCATAAGTATGATGGTGCA-3'. This second replacement
preserves the HindIII site but destroys the PstI site. The double
replacement preserves the reading frame of the lacZ gene and
eliminates all but the HindIII and EcoRI restriction sites from the
original multiple cloning region. Cleavage of the two XcmI sites
within the newly added sequence generates 3' T overhangs at each
end suitable for direct cloning of PCR-amplified DNA.
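As a consistency check on the oligonucleotide design described above, the following sketch tests whether each short oligonucleotide is the reverse complement of an internal stretch of its partner and whether each pair carries an XcmI site. The sequences are copied from the text; the helper names are hypothetical, and the XcmI recognition pattern CCA(N)9TGG is general knowledge about the enzyme rather than something stated in this paper.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# (longer oligo, shorter oligo) for each preannealed pair, written 5'->3'.
PAIRS = [
    ("ATTCCATGCATAGATTGGTTACGT", "AACCAATCTATGCATGG"),
    ("AGCTTTCCATAAGTATGATGGTGCA", "CCATCATACTTATGGAA"),
]

# Assumed XcmI recognition pattern: CCA, nine unspecified bases, TGG.
XCMI_SITE = re.compile(r"CCA[ACGT]{9}TGG")


def annealed_core(long_oligo: str, short_oligo: str):
    """Locate the region of the long oligo that pairs with the short one.

    Returns (start index, paired sequence) or None if they do not anneal.
    """
    core = reverse_complement(short_oligo)
    start = long_oligo.find(core)
    return (start, core) if start >= 0 else None


for long_oligo, short_oligo in PAIRS:
    site = XCMI_SITE.search(long_oligo)
    print(long_oligo, annealed_core(long_oligo, short_oligo),
          site.group(0) if site else None)
```

The unpaired bases flanking each annealed core correspond to the single-stranded ends used to ligate the duplex into the cut vector.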
PCR amplification and sequencing of elements from
other species
Parascaris equorum and Dirofilaria immitis DNAs were obtained from Timothy P.Friedlander (University of Maryland
Biotechnology Institute) and Steven A.Nadler (Northern Illinois
University) and Caenorhabditis elegans DNA was obtained from
Scott Emmons (Albert Einstein School of Medicine). Haemonchus contortus was obtained from Raymond Fetterer (USDA). To
PCR amplify the 3'-half of R4 elements from other nematode
species the degenerate primer 5'-TTYTWYATGGAYGAYNT-3'
(N, any nucleotide; Y, T or C; W, A or T) encoding the amino acid
sequence YMDDI/V located in the reverse transcriptase domain
was used in combination with a second primer complementary to
the 26S gene sequence 77-97 bp downstream of the R4 insertion
site, 5'-GCCAGATTAGAGTCAAGCTC-3'. To clone uninserted
26S gene sequences of A.lumbricoides, P.equorum and H.contortus the primer 5'-CTAAGTCGACTGCCCAGTGCTCTGAATGTC-3', complementary to the 26S gene ~100 bp upstream of the
R4 insertion site, was used in combination with the primer
5'-AAGAGCCGACATCGAAGGATC-3', complementary to
sequences ~700 bp downstream of the R4 site. To clone the 5'-ends
of A.lumbricoides and P.equorum elements the upstream 26S
primer was used in combination with the primer 5'-TAGAACTTCCGGTTGCG-3', complementary to sequence ~1460 bp
from the 5'-end of the R4 insertion. BRL Taq polymerase was
used for PCR amplifications under conditions specified by the
supplier. Approximately 0.2 µg genomic DNA was amplified in
30 cycles of 94°C for 1 min, 60°C for 1 min and 72°C for 3 min.
Clones containing the two orientations of the PCR product from
P.equorum and H.contortus were sequenced using the Universal
sequencing primers and nested primers ~0.5 and 1.0 kb downstream of the ends of the 1.5-1.8 kb fragments. The P.equorum
and H.contortus 3' sequences are available in GenBank under the
accession nos U29456 and U29590 respectively.
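The degenerate primer given above can be unpacked computationally. The sketch below, which is only an illustration, counts how many distinct sequences the primer represents and lists the amino acids encoded by its complete codons. The function names and the partial codon table (restricted to the codons this primer can produce) are assumptions added here.

```python
from itertools import product

# IUPAC codes used in the primer (N, any nucleotide; Y, T or C; W, A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "CT", "W": "AT", "N": "ACGT"}

PRIMER = "TTYTWYATGGAYGAYNT"

# Partial standard genetic code, limited to codons this primer can yield.
MINI_CODE = {"TTT": "F", "TTC": "F", "TAT": "Y", "TAC": "Y",
             "ATG": "M", "GAT": "D", "GAC": "D"}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list:
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def encoded_residues(primer: str) -> list:
    """Amino acids encoded by each complete codon, read from position 1."""
    usable = len(primer) - len(primer) % 3
    residues = []
    for i in range(0, usable, 3):
        options = {MINI_CODE[codon] for codon in expand(primer[i:i + 3])}
        residues.append("/".join(sorted(options)))
    return residues


print(degeneracy(PRIMER))
print(encoded_residues(PRIMER))
```

The final two bases of the primer (NT) cover only the first two positions of the codon following the two aspartates, so they are left untranslated in this sketch.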
RESULTS
Sequence of a complete A.lumbricoides insertion
The rRNA genes of A.lumbricoides have been previously studied
by Southern blots and analysis of cloned rDNA units (13,19).
Two major size classes of rDNA units were identified, 8.4 and 8.8
kb, which differed by the presence or absence of a 400 bp segment
in the intergenic (non-transcribed spacer) region of the unit.
Approximately 5% of the rDNA (~15 units/haploid genome)
were found to be >8.8 kb in length as a result of an insertion at a
unique location within the 26S gene. Most of these insertions
were 4.7 kb in length, but shorter length versions of the insertion
were also identified. The nucleotide sequence of the 5' and 3'
junctions of a 4.7 kb insertion (clone pAlr20), a 4.2 kb insertion
(pAlr22) and a 119 bp insertion (pAlr23) revealed that the
insertions had neither direct nor inverted terminal repeats (13).
The shorter copies had 3' junctions identical to the full-length
version, while their 5'-ends were truncated. The insertions were
flanked by a duplication of 26S gene sequences present only once
in uninserted genes. The lengths of these target duplications were
6 bp in pAlr23, 13 bp in pAlr20 and 14 bp in pAlr22. As defined
by the 3' junction of the insertion with the 26S gene, all three
sequenced copies were inserted into the identical site in the rRNA
[Figure 2 graphic: ORF structure diagrams for R4 (A. lumbricoides), R1 (D. melanogaster), R2 (D. melanogaster) and Dong (B. mori), with UTR, ORF, RT and CCHC regions labeled; the diagram itself is not recoverable from the extraction.]
Figure 2. Comparison of the structure of R4 with three other non-LTR
retrotransposable elements. The horizontal bars represent the total length of
each element. Areas with no shading represent the 5'- and 3'-UTR. Shaded areas
represent ORFs. R1 elements contain two slightly overlapping ORFs, while R2,
R4 and Dong each contain a single ORF. Darker shading within these ORFs
indicates the location of the reverse transcriptase domain (RT) and various
putative nucleic acid binding motifs composed of cysteine (C) and histidine (H)
residues. All four elements contain a motif (CCHC) downstream of the RT
domain. The sequences of these motifs are shown in Figure 3. Upstream of the
RT domain R1 elements contain three closely spaced CCHC motifs near the
C-terminal end of the first ORF. R2 elements contain a single CCHH motif near
the N-terminal end of the ORF. Data for R1 and R2 are from Jakubczak et al.
(16), while the data for Dong is from Xiong and Eickbush (25).
genes, which is 26 bp downstream of the R2 insertion site and 34
bp upstream of the R1 insertion site (Fig. 1).
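The spacing of the insertion sites quoted above can be checked with simple arithmetic. The sketch below places the R2 site at coordinate 0 (an arbitrary choice made here for illustration) and derives the R4 and R1 coordinates from the two distances given in the text; the helper name is hypothetical.

```python
# Coordinates in bp relative to the R2 insertion site (set to 0 here).
R2_SITE = 0
R4_SITE = R2_SITE + 26   # R4 lies 26 bp downstream of R2
R1_SITE = R4_SITE + 34   # R1 lies 34 bp downstream of R4


def fractional_position(site: int, upstream: int, downstream: int) -> float:
    """Position of a site between two flanking sites (0 = upstream, 1 = downstream)."""
    return (site - upstream) / (downstream - upstream)


print(R1_SITE - R2_SITE)
print(round(fractional_position(R4_SITE, R2_SITE, R1_SITE), 2))
```

A fractional position close to 0.5 corresponds to the description of the R4 site as approximately midway between the R2 and R1 sites.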
The 4686 bp full-length R4 insertion in clone pAlr20 was
completely sequenced on both strands (see Materials and
Methods). The nucleotide sequence of R4 revealed that it encoded
a single open reading frame (ORF) beginning 366 bp from the left
(5') end of the element and ending only 173 bp from the right (3')
end. A comparison of the structural features of R4 with the R1 and
R2 elements of D.melanogaster is shown in Figure 2. Centrally
located in the ORF of the R4 element is a reverse transcriptase
domain containing the conserved amino acid motifs found in all
reverse transcriptases (24). Protease, RNase H and integrase
domains, which are present in LTR-containing retrotransposable
elements but absent in non-LTR retrotransposable elements,
could not be detected in the R4 element. Extensive homology
searches between R4 and retrotransposable elements of both the
LTR and non-LTR classes revealed only one additional region of
similarity, a putative nucleic acid binding motif containing three
cysteine (C) and one histidine (H) residues downstream of the
reverse transcriptase domain. As shown in Figure 3, this CCHC
motif is similar to motifs found in R1 and R2 elements, as well as
many other non-LTR retrotransposable elements. In all elements
this CCHC motif is located downstream of the reverse transcriptase
domain. The spacing between the first two C residues varies from
one to three residues in the different non-LTR retrotransposable
elements. The R4 element has two consecutive C residues near the
second C position. Thus, depending on which C is used, the
spacing of the C and H residues in the R4 motif is the same as R1
and Cin4 elements or the same as Tx1, Dong and I elements. Like
many other non-LTR retrotransposable elements, R1 and R2 also
contain one or more putative nucleic acid binding motifs composed
of cysteine and histidine residues upstream of the reverse
[Figure 3 graphic: alignment of the CCHC motifs from R4 and other non-LTR retrotransposable elements; the aligned sequences are not recoverable from the extraction.]
Figure 3. Comparison of the putative nucleic acid binding motif in R4 with that
of other non-LTR retrotransposable elements. In all cases a single CCHC motif
is located downstream of the reverse transcriptase domain. The critical C and H
residues of the motif are shaded. This list is not intended to be comprehensive
for all non-LTR retrotransposable elements. Sequences are derived from the
following sources: B.mori, R1Bm (15), R2Bm (14) and Dong (25); D.melanogaster, R1Dm and R2Dm (16) and I (41); A.gambiae, RT1 (2); Nasonia
vitripennis, R1Nv and R2Nv (18); Popillia japonica, R1Pj and R2Pj (18); Zea
mays, Cin4 (42); Xenopus laevis, Tx1 (4); Mus domesticus, L1Md (43).
transcriptase domain (Fig. 2). While the ORF upstream of the
reverse transcriptase domain in the R4 element is extensive, no
nucleic acid binding motifs could be detected.
Phylogenetic analysis of the R4 element
The only sequence that can be readily identified in all non-LTR
retrotransposable elements, and thus be used to resolve their
phylogenetic relationship, is the reverse transcriptase domain. The
ORF of the R4 insertion contains the seven conserved segments
identified in reverse transcriptase sequences from all eukaryotic and
prokaryotic sources (21). The R4 reverse transcriptase domain
also contains an additional ~30 amino acid region between
segments 2 and 3 which is unique to non-LTR retrotransposable
elements, group II introns and bacterial reverse transcriptases.
Based on a phylogenetic analysis of the universally conserved
segments of the reverse transcriptase domain, the R4 element falls
within the non-LTR retrotransposable group of elements (data not
shown). Shown in Figure 4 are the results of a phylogenetic
analysis including only the non-LTR retrotransposable elements.
The analysis was based on the neighbor joining method (22) and
the tree is rooted using group II intron sequences, which are the
closest known retroelements outside this group. The many
non-LTR retrotransposable elements found in animals, plants and
protists are highly divergent in sequence (21,24). As a consequence it is difficult to completely resolve the relationships
between these elements. This failure to resolve their relationship
can be seen in Figure 4 by the very low bootstrap values on all
nodes at the deep branches of the tree. Only bootstrap values >50
are shown (i.e. >50% of the time the sequences to the right of the
node branch together). Thus nodes with no bootstrap values are not
considered significant.
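The meaning of a bootstrap value can be illustrated with a short resampling sketch. This is not the procedure used in the paper, which rebuilt neighbor joining trees with the Clustal V package: as a deliberately simplified stand-in, the code below resamples alignment columns with replacement and asks only how often a chosen pair of sequences remains the closest pair by uncorrected p-distance. The sequence names and residues are invented toy data, and all function names are assumptions.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment: dict) -> tuple:
    """Names of the two sequences with the smallest p-distance.

    Ties are broken by name order, which is an arbitrary choice.
    """
    names = sorted(alignment)
    scored = [(p_distance(alignment[m], alignment[n]), m, n)
              for i, m in enumerate(names) for n in names[i + 1:]]
    _, m, n = min(scored)
    return (m, n)


def bootstrap_support(alignment: dict, pair: tuple,
                      replicates: int = 100, seed: int = 0) -> float:
    """Percentage of column-resampled replicates in which `pair` is closest."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns with replacement, keeping the original length.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()}
        if closest_pair(resampled) == target:
            hits += 1
    return 100.0 * hits / replicates


# Invented toy alignment: seqA and seqB are identical, the others differ
# from them (and from each other) at every column.
TOY = {"seqA": "MKLVDDAG", "seqB": "MKLVDDAG",
       "seqC": "QRTSEEWH", "seqD": "HWEESTRQ"}
print(bootstrap_support(TOY, ("seqA", "seqB")))
```

With real, divergent sequences the support for a grouping drops below 100, and values at or below 50 correspond to the nodes left unlabeled in the tree.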
Identification of R4 elements in other nematode species
The specific location of R1 and R2 elements within the 28S rRNA
genes of the host simplifies their identification in other species by
either Southern blotting or PCR amplification (1,18). PCR is the
more sensitive of the two approaches and we are currently able to
detect R1 and/or R2 elements in insects that we previously scored
by Southern analysis as being negative. The PCR approach uses
a degenerate oligonucleotide primer to highly conserved sequences
in the reverse transcriptase domain in combination with a
non-degenerate primer complementary to 28S gene sequences
downstream of the insertion site (18). A similar scheme was
designed to test whether R4 elements are present in the rDNA
units of other nematode species. We chose a degenerate primer
(see Materials and Methods) capable of encoding the amino acid
sequence YMDDV/I, present in both Dong and R4 elements. The
26S primer was complementary to a sequence starting 77 bp
downstream of the R4 insertion site. PCR amplification was
conducted with genomic DNA from four nematode species: the
parasitic nematodes P.equorum, D.immitis and H.contortus and
the free-living nematode Caenorhabditis elegans. It was unlikely
that R4 elements were present in C.elegans, as variant rDNA units
had been previously characterized from this species and no
insertion elements were found (26).
Only the PCR amplifications with P.equorum and H.contortus
DNA gave rise to an appropriately sized band on agarose gels
(1.5-1.8 kb). These PCR fragments were cloned and the 3'
junction with the 26S gene was sequenced from multiple clones
(see Materials and Methods). The results of this sequence analysis
are shown in Figure 5. The insertion in H.contortus initially
appeared to be located within the 26S gene 1 bp downstream of
that in A.lumbricoides and P.equorum (Fig. 5B). However, when
we determined the sequence of this region of the 26S gene from
uninserted units in all three species (see Materials and Methods)
the H.contortus 26S gene sequence was found to have a base
substitution at the first position downstream of the insertion site.
Thus the insertions in H.contortus are at the identical position as
the insertions in A.lumbricoides and P.equorum. Other variations
detected within the 26S genes were single substitutions within
P.equorum, 1 and 4 bp downstream of the insertion site in clones
with R4 insertions, but not in the uninserted 26S genes.
We sequenced the entire PCR product from one P.equorum and
one H.contortus insertion. The P.equorum insertion was similar
in all respects to the R4 element of A.lumbricoides (Fig. 5).
Within the 558 codons of the ORF that could be compared
nucleotide identity between the two elements was 83.9%, with
only one 3 bp insertion/deletion event. The rate of nucleotide
substitution at synonymous codon positions (Ks = 0.55) was
higher than the rate of substitution at replacement positions (Ka
= 0.10) (see 27 for a discussion of Ka and Ks values). The 5.5-fold
faster rate of nucleotide substitution at synonymous positions
suggests that the ORFs of R4 elements are under selective
pressure. The short 3'-UTR of the R4 elements had 78.4%
nucleotide identity and seven insertion/deletion differences.
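The 5.5-fold figure quoted above follows directly from the two reported rates. The sketch below reproduces that arithmetic and applies the conventional reading of the Ka/Ks ratio (below 1 suggests purifying selection, above 1 suggests positive selection). The function names are hypothetical and the classification rule is a general rule of thumb, not a method taken from this paper.

```python
KS = 0.55  # reported substitution rate at synonymous positions
KA = 0.10  # reported substitution rate at replacement positions


def fold_difference(ks: float, ka: float) -> float:
    """How many times faster synonymous positions change than replacement ones."""
    return ks / ka


def selection_regime(ka: float, ks: float) -> str:
    """Conventional interpretation of the Ka/Ks ratio."""
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


print(round(fold_difference(KS, KA), 1))
print(round(KA / KS, 2), selection_regime(KA, KS))
```

A ratio well below 1, as here, is what would be expected if the encoded protein is still functional and constrained.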
The H.contortus insertion was very different in sequence from
the A.lumbricoides and P.equorum R4 insertions. The H.contor-
CCHC motif are detected between different non-LTR elements.
The greater similarity of R4 to Dong than to either R1 or R2
suggests that R4 represents the independent specialization of a
non-LTR element for insertion into the rDNA unit.
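[Figure 4 graphic: neighbour joining tree of non-LTR retrotransposable elements. Legible taxon labels include R4, Dong, Tx1, Cin4, L1Md, R2Dm, R2Bm, Doc, Jockey, R1Dm, RT1, R1Bm, SLACS and CRE1; a bootstrap value of 98 appears beside R4 and Dong. The branch structure is not recoverable from the extraction.]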
Figure 4. Phylogenetic relationship of R4 to other non-LTR retrotransposable
elements. The phylogeny is based on 178 amino acid positions of the reverse
transcriptase domain using neighbour joining algorithms as described in Xiong
and Eickbush (21). The numbers given at certain nodes represent the bootstrap
values per 100 replications. Bootstrap values <50 are not given, thus the
reliability of nodes with no bootstrap values is low. Two group II intron
sequences from fungal mitochondria were used as an outgroup to root the tree.
References to the various reverse transcriptase sequences used can be found in
Xiong and Eickbush (21), except for CR1 (44), Doc (45), and RT1 (2).
R4 is clearly not on the same branch of the tree as either the R1
or R2 elements. Instead R4 is most closely related to Dong, an
element previously identified in B.mori (25). Dong elements
were first identified as insertions in the non-transcribed spacer
region of the rDNA unit. However, Dong elements do not appear specific
for this spacer region, because their insertion specificity appears
to involve only tandemly repeated TAA sequences and many
copies of Dong are present outside the rDNA units. Because of the
similarity in sequence of their reverse transcriptase domains we
have also compared the structures of the R4 and Dong ORFs in
Figure 2. Dong is similar to R4 in a number of properties. Both
elements encode a single ORF which contains a CCHC motif
downstream of the reverse transcriptase domain, but no cysteine/
histidine motifs upstream of the reverse transcriptase domain.
The Dong CCHC motif is highly similar in sequence to R4 (11/18
positions, Fig. 3). Indeed, Dong and R4 elements share limited
amino acid sequence identity (18%) throughout the region
downstream of the reverse transcriptase domain. This identity is
highly significant, because in general no similarities outside the
[Figure 5 graphic: structural diagrams and 3' junction sequences of the R4 insertions from A. lumbricoides, P. equorum and H. contortus; the sequence panels are not fully recoverable from the extraction.]
Figure 5. Structural diagram of the amplified region of the R4 elements from
three nematode species. Boxes and their shading are identical to that in Figure
2. Arrows above these boxes represent the PCR primers YMDD and -77. No
long ORF could be detected in the H.contortus insertion. One region of this
insertion contains an A-rich repeat (diagonal shading) that is unlikely to have
ever been part of an ORF. Junction sequences of the insertions with the 26S gene
are shown in the expanded view of the 3'-end of each element. R4 sequences
are in italics, while 26S gene sequences are in bold. Numbers in parentheses
represent the number of clones obtained for each junction sequence.
tus insertion did not encode a long ORF and a portion of the
sequence contained a tandemly repeated A-rich sequence which
was unlikely to have ever been part of an ORF. Thus it is not clear
whether this insertion is a remnant of an intact R4 element. Highly
defective R1 elements have been detected in D.melanogaster
(28,29). These defective elements are associated with fragments
of rDNA units located in the centromeric heterochromatin. A
similar situation may explain the insertions we cloned from
H.contortus, because the 26S gene sequence downstream of the
insertion exhibited nearly 4% nucleotide sequence differences
between copies (data not shown), indicating that these 26S genes
were unlikely to be part of normal rDNA units found in the rDNA
loci. Clearly a more extensive set of PCR primers is needed to
determine if intact R4 elements are still present in H.contortus and
a greater number of species will need to be tested to determine the
distribution of R4 elements in nematodes.
Finally, based on the sequence of their 5' and 3' junctions with
the 26S gene the three previously cloned copies of R4 elements
in A.lumbricoides had target site duplications of 6, 13 and 14 bp
(13). As in the case of R1 and R2 elements (14-18), the 3'
junctions of all R4 elements were identical (13, see also Fig. 5),
while it is the sequence of the 5' junction that determined the
length of each target site duplication. To test whether R4 target
site duplications are variable in length we PCR amplified and
sequenced (see Materials and Methods) the 5' junctions of
additional R4 insertions from A.lumbricoides and P.equorum
(data not shown). The six clones characterized from each species
represented full-length elements and each contained a 5' junction
consistent with a 13 bp target site duplication. Nucleotide
sequence variation of 0.5-2.0% between the different clones, well
above the PCR error rate (30), indicated that most of these PCR
clones represented different R4 elements in each species. Thus
variable length target site duplications associated with A.lumbricoides R4 are only associated with 5' truncated elements. The
integration of full-length R4 elements appears to be a precise event
resulting in 13 bp target site duplications.
DISCUSSION
In this report we have shown that a 4.7 kb insertion in the 26S
rRNA genes of A.lumbricoides is a non-LTR retrotransposable
element. We have termed the nematode element R4 because it
represents the fourth element to be identified that is specialized
for insertion into the rRNA genes of its host (Fig. 1). The first
rDNA-specific elements characterized, R1 and R2, are widely
distributed in insects (1) and considerable information is known
about their mechanisms of insertion and stability within a lineage.
R3 is also an insect element, but is poorly defined at present
(17,18). Site-specific insertions in the 28S rRNA genes of the
mosquito Anopheles gambiae, termed RT1 and RT2, have also
been described by Collins and co-workers (2,31). RT elements
insert 634 bp downstream of the R1 insertion site. The sequence
of two complete elements has shown that these elements have two
overlapping ORFs that are very similar in structure to the ORFs
of R1 elements (2). Based on the similarities of their ORFs and
the phylogenetic relationship of their reverse transcriptase
domains (Fig. 4) RT elements appear to be R1 elements that have
changed their target specificity on the 28S gene.
Unlike RT elements, R4 elements in nematodes do not appear
to be R1 or R2 elements that have changed their insertion
specificity to another location in the rDNA repeat. The organization of their ORFs is quite distinct and the sequence of the
reverse transcriptase domain of these elements (Fig. 4) suggests
that R1, R2 and R4 are no more related to each other than they are
to any other non-LTR element. Indeed, R4 is most related to the
Dong element of B.mori (25). However, because the deep
phylogeny of the non-LTR elements is poorly resolved it remains
formally possible that only one retrotransposable element became
specialized for insertion in the large subunit rRNA gene and that
different lineages of this single element changed their insertion
specificity for sites in this gene. This issue will only be settled by
further refinement of the phylogenetic analysis to better resolve
the deeper phylogeny of the non-LTR retrotransposable elements.
The relationship of these rDNA insertions to another non-LTR
retrotransposable element should also be mentioned. Most G
elements of D.melanogaster have been shown to insert specifically into the non-transcribed spacer region of defective rDNA
units located within the centromeric heterochromatin (32,33).
The sequence of the spacer region target site exhibits considerable
similarity (45/52 bp with one 18 bp deletion) to the region of the
28S gene containing the R1, R2 and R4 insertion sites (32). The
location of the G insertion site in this 28S-derived sequence is 1
bp downstream of the R4 insertion site. Based on its reverse
transcriptase sequence, G is located well within the branch of
non-LTR elements containing Doc and Jockey elements (Fig. 4).
It seems doubtful that a retrotransposable element would have
become highly specialized for insertion into a sequence of the
non-transcribed spacer, because these sequences are poorly
conserved in evolution. It is more likely that G represents a fifth
non-LTR retrotransposable element that has become specialized
for insertion into the 28S gene. In this model G was only found in
the non-transcribed spacer because its 28S gene target site has
become part of the spacer region in D.melanogaster. Consistent
with this model, R1 elements are also occasionally found inserted
into this spacer region sequence in D.melanogaster (32). In
conclusion, evidence to date suggests as many as five different
non-LTR elements have become specific for insertion into different
sites in the large subunit rRNA gene.
As we have previously discussed for the R1 and R2 elements
(30), there are several advantages to being specialized for insertion
into the rDNA transcription unit, even when such insertions
inactivate functioning of that unit. First, specificity for the rDNA
unit ensures a population of uniform target sites for the insertion of
new copies that can be regulated in an identical manner to that of
the donor element. Random insertions by a transposable element
along a chromosome can lead to copies that cannot be appropriately expressed. Second, if one assumes that the host species has more
than sufficient rDNA units for its survival, then when low copy
numbers of an rDNA insertion make transposition necessary the
insertion of new copies will have a minimal effect on the fitness of
the host. Random insertions by transposable elements within a
genome always run a risk of being deleterious. Third, recombination between copies of a transposable element inserted in the rDNA
locus will be no different from recombinations involving the rDNA
sequences themselves. Recombination between transposable elements inserted at random along a chromosome can cause
detrimental chromosomal rearrangements (34).
It is interesting to note that the multiple transposable elements
that have become specialized for insertion into the rDNA unit are
all non-LTR retrotransposable elements. Why have no examples
been found of rDNA-specific elements derived from the equally
widespread classes of LTR-containing retrotransposable elements
or DNA-mediated transposable elements? Because non-LTR
retrotransposable elements do not have terminal repeats their
transcription must be regulated by either an external promoter, an
internal promoter located downstream of the transcription initiation site or both (35-37). Thus it may be easier for non-LTR
elements to adapt to a read-through transcript coming from the
rDNA unit itself. R1, R2, R3 and R4 are all organized such that
their transcription is in the same direction as the rDNA unit. G and
the RT family of R1 elements, on the other hand, are inserted in the
opposite orientation to that of the rDNA unit, suggesting they must
encode their own promoter. It is interesting to note that the only
difference between RT and R1 elements is that RT elements
contain 2 kb of additional untranslated sequences at their 5'-end
(2). This insertion may include such promoter sequences.
A third question that should be addressed is why are the many
elements specialized for insertion into the rDNA unit clustered in
this one small region of the large subunit rRNA gene? There are
many highly conserved regions of the 28S and 18S gene, yet as
many as five elements may have become specialized for a region
spanning only 87 bp. Even a number of group I introns have been
identified in the R1-R4 insertion region of the large rRNA subunit
gene in various protozoans. For example, the well-characterized
self-splicing intron of various Tetrahymena species is located only
6 bp upstream of the R2 insertion site (38). Perhaps the advantage
of inserting in this region of the 28S gene involves regulation of
transcription. Insertions of R1, R2 and R4 are known to
down-regulate transcription of the rDNA unit; thus it is possible
that there is a polymerase I enhancer located in this region. In such
a model, disruption of this region would automatically down-regulate
transcription, and each element would only need the ability to
up-regulate this transcription at appropriate times in the germ
cells. An alternative scenario is one in which this region of the 28S
gene may be a favorable site for expression of an element's RNA
transcript. This advantage to the element could involve processing of the element RNA from a read-through transcript or the
ability of the element's RNA to be transported out of the nucleus
for translation in the cytoplasm without excision from the 28S
gene. Whatever advantages this small region of the 28S gene
offers as a site for insertion, it offers them at multiple sites which
can apparently be occupied at a tolerable cost to the host.
All of the non-LTR retrotransposable elements that insert into
the large subunit rRNA gene have precise 3' junctions with
chromosomal DNA, while their 5'-ends are sometimes truncated.
This feature is readily explained if these elements use a
mechanism of integration similar to that of R2 elements (11). R2
elements encode an endonuclease which cleaves (nicks) the DNA
strand used as template for rRNA transcription (bottom strand in
Fig. 1). The 3' hydroxyl group released by this nick primes reverse
transcription of the R2 transcript, thus defining the eventual 3'
junction of the element with the chromosome. RNA sequences
within the short 3'-UTR of the R2 transcript are necessary and
sufficient for precise recognition by the reverse transcriptase (39).
After reverse transcription the endonuclease cleaves the second
(top) strand of the 28S gene.
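Why priming from the nicked target gives a fixed 3' junction but a variable 5'-end can be illustrated with a minimal sketch. It assumes, purely for illustration, that the transcript is a plain string and that reverse transcription may stop after copying any number of nucleotides counted from the 3'-end; the function name and the toy sequence are hypothetical and are not taken from the R2 or R4 data.

```python
def copy_from_three_prime(transcript: str, n_copied: int) -> str:
    """Return the part of the element that ends up in the chromosome.

    Assumption: synthesis starts at the 3'-end of the transcript (the end
    primed by the nicked target DNA) and proceeds toward the 5'-end, so an
    early stop removes sequence from the 5'-end only.
    """
    if not 0 <= n_copied <= len(transcript):
        raise ValueError("n_copied must lie between 0 and the transcript length")
    # Slice from an explicit start index so that n_copied == 0 yields "".
    return transcript[len(transcript) - n_copied:]


toy_transcript = "AAACCCGGGTTT"  # arbitrary placeholder, written 5' to 3'
full_copy = copy_from_three_prime(toy_transcript, len(toy_transcript))
truncated_copy = copy_from_three_prime(toy_transcript, 5)

print(full_copy)       # complete element
print(truncated_copy)  # 5'-truncated element sharing the same 3'-end
```

In this simplified picture every copy, however short, ends with the same 3' nucleotides, which is the pattern described above for the rDNA-specific elements.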
Application of the R2 model for R4 insertions makes predictions about where cleavage occurs on the two strands of the target
DNA. In the first step an R4-encoded endonuclease cleaves the
bottom strand of the target DNA (Fig. 1). Cleavage of the top
strand by the endonuclease (either before or after reverse
transcription) occurs at a site 13 bp downstream of the bottom
strand site. Following synthesis of the second DNA strand and
repair of the ends, the size of the target site duplication generated
by the insertion is 13 bp. In the case of R2, cleavage of the upper
DNA strand occurs 2 bp upstream of the bottom strand cleavage
(11) and, as a consequence, a 2 bp deletion of the target DNA is
generated upon insertion (14,16). As has already proven to be the
case with the R1 and R2 elements of insects, analysis of R4 in
other nematode species is likely to reveal conserved and variable
features of their sequences and their insertions that will be useful
in further refining the general integration mechanism used by
non-LTR retrotransposable elements.
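The relationship between the position of the two cleavages and the resulting duplication or deletion can be sketched as simple string arithmetic. This is an illustrative model only: it treats the target as the top strand written 5' to 3', ignores the details of second-strand synthesis and repair, and uses a hypothetical function name, an arbitrary toy target and a placeholder element rather than real 28S or R4 sequence.

```python
def integrate(target: str, element: str, bottom_nick: int, top_offset: int):
    """Return (product, outcome, size) for a staggered-cleavage insertion.

    target:      target site, top strand written 5' to 3'
    bottom_nick: index in `target` of the bottom-strand nick
    top_offset:  position of the top-strand cut relative to the nick
                 (positive = downstream, negative = upstream)
    """
    top_cut = bottom_nick + top_offset
    if not (0 <= bottom_nick <= len(target) and 0 <= top_cut <= len(target)):
        raise ValueError("both cleavage sites must fall inside the target")
    # Target sequence left of the top-strand cut precedes the element;
    # target sequence right of the bottom-strand nick follows it.
    product = target[:top_cut] + element + target[bottom_nick:]
    if top_offset > 0:
        outcome = "duplication"  # bases between the cuts flank both ends
    elif top_offset < 0:
        outcome = "deletion"     # bases between the cuts are lost
    else:
        outcome = "blunt"
    return product, outcome, abs(top_offset)


toy_target = "ACGTTGCAAGCTTAGGCATCGATCCGTAAGTC"  # arbitrary 32 nt placeholder
toy_element = "nnnnnnnn"                          # stands in for the element

r4_like = integrate(toy_target, toy_element, bottom_nick=10, top_offset=13)
r2_like = integrate(toy_target, toy_element, bottom_nick=10, top_offset=-2)
print(r4_like)
print(r2_like)
```

With a top-strand cut 13 bp downstream of the nick, the 13 bases between the two cuts appear on both sides of the placeholder element, while a cut 2 bp upstream removes 2 bases of target, matching the outcomes described for R4 and R2 respectively.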
ACKNOWLEDGEMENTS
We thank Tim Friedlander, Steven Nadler, Scott Emmons and
Raymond Fetterer for supplying nematode materials. We also
thank Warren Lathe III for help with the phylogenetic analysis
and Danna Eickbush and Janet George for their helpful comments
on the manuscript. This work was supported by National Science
Foundation grant MCB-9219123.
REFERENCES
1 Jakubczak,J.L., Burke,W.D. and Eickbush,T.H. (1991) Proc. Natl. Acad. Sci. USA, 88, 3295-3299.
2 Besansky,N.J., Paskewitz,S.M., Hamm,D.M. and Collins,F.H. (1992) Mol. Cell. Biol., 12, 5102-5110.
3 Aksoy,S. (1991) Parasitol. Today, 7, 281-285.
4 Garrett,J.E., Knutzon,D.S. and Carroll,D. (1989) Mol. Cell. Biol., 9, 3018-3027.
5 Chalker,D.L. and Sandmeyer,S.B. (1990) Genetics, 126, 837-850.
6 Marschalek,R., Hofmann,J., Schumann,G., Gosseringer,R. and Dingermann,T. (1992) Mol. Cell. Biol., 12, 229-239.
7 Ji,H., Moore,D.P., Blomberg,M.A., Braiterman,L.T., Voytas,D.F., Natsoulis,G. and Boeke,J.D. (1993) Cell, 73, 1007-1018.
8 Okazaki,S., Ishikawa,H. and Fujiwara,H. (1995) Mol. Cell. Biol., 15, 4545-4552.
9 Levis,R.W., Ganesan,R., Houtchens,K., Tolar,L.A. and Sheen,F.-M. (1993) Cell, 75, 1083-1093.
10 Biessmann,H., Mason,J.M., Ferry,K., d'Hulst,M., Valgeirsdottir,K., Traverse,K.L. and Pardue,M.-L. (1990) Cell, 61, 663-673.
11 Luan,D.D., Korman,M.H., Jakubczak,J.L. and Eickbush,T.H. (1993) Cell, 72, 595-605.
12 Kirchner,J., Connolly,C.M. and Sandmeyer,S.B. (1995) Science, 267, 1488-1491.
13 Back,E., Van Meir,E., Müller,F., Schaller,D., Neuhaus,H., Aeby,P. and Tobler,H. (1984) EMBO J., 3, 2523-2529.
14 Burke,W.D., Calalang,C.C. and Eickbush,T.H. (1987) Mol. Cell. Biol., 7, 2221-2230.
15 Xiong,Y. and Eickbush,T.H. (1988) Mol. Cell. Biol., 8, 114-123.
16 Jakubczak,J.L., Xiong,Y. and Eickbush,T.H. (1990) J. Mol. Biol., 212, 37-52.
17 Kerrebrock,A.W., Srivastava,R. and Gerbi,S.A. (1989) J. Mol. Biol., 210, 1-13.
18 Burke,W.D., Eickbush,D.G., Xiong,Y., Jakubczak,J.L. and Eickbush,T.H. (1993) Mol. Biol. Evol., 10, 163-185.
19 Neuhaus,H., Müller,F., Etter,A. and Tobler,H. (1987) Nucleic Acids Res., 15, 7689-7707.
20 Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
21 Xiong,Y. and Eickbush,T.H. (1990) EMBO J., 9, 3353-3362.
22 Saitou,N. and Nei,M. (1987) Mol. Biol. Evol., 4, 406-425.
23 Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) Comput. Appl. Biosci., 8, 189-191.
24 Eickbush,T.H. (1994) In Morse,S.S. (ed.), The Evolutionary Biology of Viruses. Raven Press, New York, NY, pp. 121-157.
25 Xiong,Y. and Eickbush,T.H. (1993) Nucleic Acids Res., 21, 1318.
26 Files,J.G. and Hirsh,D. (1981) J. Mol. Biol., 149, 223-240.
27 Perler,F., Efstratiadis,A., Lomedico,P., Gilbert,W., Kolodner,R. and Dodgson,J. (1980) Cell, 20, 555-566.
28 Roiha,H., Miller,J.R., Woods,L.C. and Glover,D.M. (1981) Nature, 290, 749-753.
29 Kidd,S.J. and Glover,D.M. (1980) Cell, 19, 103-119.
30 Eickbush,D.G. and Eickbush,T.H. (1995) Genetics, 139, 671-684.
31 Paskewitz,S.M. and Collins,F.H. (1989) Nucleic Acids Res., 17, 8125-8133.
32 Di Nocera,P.P., Graziani,F. and Lavorgna,G. (1986) Nucleic Acids Res., 14, 675-691.
33 Di Nocera,P.P. (1988) Nucleic Acids Res., 16, 4041-4052.
34 Charlesworth,B. and Langley,C.H. (1989) Annu. Rev. Genet., 23, 251-287.
35 Mizrokhi,L.J., Georgieva,S.G. and Ilyin,Y.V. (1988) Cell, 54, 685-691.
36 Swergold,G.D. (1990) Mol. Cell. Biol., 10, 6718-6729.
37 Chaboissier,M.C., Busseau,I., Prosser,J., Finnegan,D.J. and Bucheton,A. (1990) EMBO J., 9, 3557-3563.
38 Cech,T.R. and Bass,B.L. (1986) Annu. Rev. Biochem., 55, 599-629.
39 Luan,D.D. and Eickbush,T.H. (1995) Mol. Cell. Biol., 15, 3882-3891.
40 Tautz,D., Hancock,J.M., Webb,D.A., Tautz,C. and Dover,G.A. (1988) Mol. Biol. Evol., 5, 366-376.
41 Fawcett,D.H., Lister,C.K., Kellett,E. and Finnegan,D.J. (1986) Cell, 47, 1007-1015.
42 Schwarz-Sommer,Z., Leclercq,L., Gobel,E. and Saedler,H. (1987) EMBO J., 6, 3873-3880.
43 Loeb,D.D., Padgett,R.W., Hardies,S.C., Shehee,W.R., Comer,M.B., Edgell,M.H. and Hutchison,C.A. (1986) Mol. Cell. Biol., 6, 168-182.
44 Burch,J.B.E., Davis,D.L. and Haas,N.B. (1993) Proc. Natl. Acad. Sci. USA, 90, 8199-8203.
45 O'Hare,K., Alley,M.R.K., Cullingford,T.E., Driver,A. and Sanderson,M.J. (1991) Mol. Gen. Genet., 225, 17-24.