Determination of the Entire Sequence of Turtle CR1: The First Open Reading Frame of the Turtle CR1 Element Encodes a Protein with a Novel Zinc Finger Motif

Masaki Kajikawa, Kazuhiko Ohshima, and Norihiro Okada
Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Japan
CR1 elements are a family of retroposons. They are classified as long interspersed elements (LINEs), or non-long-terminal-repeat (non-LTR) retrotransposons, and they have been found in the genomes of many vertebrates. However, they have been only partially characterized, and only a 2-kb region of the 3’ end of chicken CR1 has been sequenced. In the present study, we determined the entire consensus sequence of CR1 elements in the turtle genome, designated PsCR1. The first open reading frame (ORF1) of PsCR1 has two unusual arrangements of Cys residues. One of them includes a zinc finger motif, CX2CX14CX2C. The putative zinc finger has cysteine residues with identical spacing and a similar amino acid composition to those found in the species-specific transcription initiation factors SL1 and TIF-IB. The 5’ untranslated region (5’ UTR) of PsCR1 contains a sequence similar to part of the human L1 promoter, L1 site A, and several cis elements of the type found in eukaryotic genes. Within a region of about 500 bp, there are nine "E boxes," cis elements that are recognized by the basic helix-loop-helix (bHLH) family of proteins. This observation raises the possibility that cellular transcription factors that bind to these sequences might act in concert to regulate the expression of PsCR1. The sequence divergence of the 3’ UTR of CR1 between species was found to be lower than the rate of nonsynonymous substitutions per site in ORF2, suggesting that a strict functional constraint must exist for this region. This result strongly suggests that the conserved 3’-end sequence of CR1 is the recognition site for the reverse transcriptase of CR1. We discuss a possible mechanism for the integration of CR1 elements and the intriguing possibility that this reverse transcriptase is recruited for the retroposition of SINEs.
Introduction
The reverse flow of genetic information from RNA to DNA is known as retroposition, and each transposed informational element is known as a retroposon (Rogers 1985; Weiner, Deininger, and Efstratiadis 1986). Retroposons that encode a reverse transcriptase (RTase) for replication of their genomes can be divided into three groups: non-long-terminal-repeat (non-LTR) retrotransposons (also known as LINEs, the term used hereafter), LTR retrotransposons, and retroviruses (Fanning and Singer 1987; Doolittle et al. 1989; Eickbush 1994; Smit 1996). Retroviruses and LTR retrotransposons replicate their genomes via a complex reverse-transcription process, and the corresponding mechanism for retrotransposition is well understood (Boeke and Chapman 1991; Whitcomb and Hughes 1992). By contrast, the mechanism responsible for retrotransposition of LINEs remains to be fully elucidated (Eickbush 1994).
An essential step in the retrotransposition of LINEs is their initial transcription. Several LINEs, such as Drosophila jockey and I and human LINE-1 (L1), have been shown to carry promoter sequences within their 5’-end regions that can initiate transcription at the first nucleotide of the element (Mizrokhi, Georgieva, and Ilyin 1988; Swergold 1990; Minakami et al. 1992; McLean, Bucheton, and Finnegan 1993; Minchiotti, Contursi, and Di Nocera 1997). Drosophila jockey is transcribed by RNA polymerase II via an internal promoter (Mizrokhi, Georgieva, and Ilyin 1988), while the human L1 promoter is pol III-dependent (Kurose et al. 1995). In contrast, it has been suggested that LINEs that have evolved with target-site specificity must be inserted adjacent to a reliable exogenous promoter sequence for their transcription (Eickbush 1994). Thus, the molecular mechanisms of transcription of most LINEs remain to be defined.

Key words: CR1 element, LINE, retrotransposon, zinc finger, SL1/TIF-IB, SINE.

Address for correspondence and reprints: Norihiro Okada, Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama 226, Japan. E-mail: [email protected].

Mol. Biol. Evol. 14(12):1206-1217. 1997
© 1997 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Additional critical steps in retrotransposition are reverse transcription and integration. Most LINE elements are truncated at various positions in their 5’ regions, the lengths of which range from 100 to 1,000 bp (Hutchison et al. 1989; Eickbush 1994). The existence of truncated forms indicates that an RTase encoded by a LINE must recognize the 3’ end of the RNA template and that it might use the free 3’ ends of breaks in chromosomal DNA as primers for initiation of first-strand synthesis (Schwarz-Sommer et al. 1987; Eickbush 1992). Luan et al. verified this model in elegant experiments with R2Bm of Bombyx mori (Luan et al. 1993; Luan and Eickbush 1995). In the case of R2Bm, the R2 protein makes a specific nick in one of the DNA strands at the insertion site in the 28S rRNA gene and uses the 3’ hydroxyl group exposed by this nick to prime reverse transcription of its RNA transcript. Furthermore, the recent finding that reverse transcription of a group II intron, aI2, of yeast mitochondrial DNA is also accomplished by analogous target-DNA-primed reverse transcription supports the generality of such a mechanism (Zimmerly et al. 1995). However, in contrast to the results for R2Bm, it was shown recently that the 3’ untranslated region (UTR) of human L1 is not essential for its retrotransposition in cultured mammalian cells (Moran et al. 1996). Therefore, it remains essential to determine how the RTases of L1 and other LINEs recognize their RNA templates.
In R2Bm, a single open reading frame (ORF) encodes a protein with both sequence-specific endonuclease activity and RTase activity (Luan et al. 1993). Several LINEs, including human L1, encode in their second ORF an endonuclease-like domain that resembles the amino acid sequences of AP endonucleases (Feng et al. 1996; Martin, Olivares, and López 1996). The endonucleolytic activity encoded by human L1 has been verified biochemically (Feng et al. 1996). These findings raise the possibility that generation of a nick at the target site and reverse transcription might be coupled, even in the case of LINEs that have no apparent target-site specificity.
CR1 elements were first described as members of a SINE family (Stumph et al. 1981). The number of CR1 elements in the chicken genome was estimated as 7,000-30,000 from the results of hybridization experiments (Stumph et al. 1981; Burch, Davis, and Haas 1993) and as 100,000 by sequence analysis (Vandergon and Reitman 1994). Because most members of this family were extensively truncated at their 5’ ends, no ORF with significant similarity to ORFs that encode known polypeptides was identified (Haché and Deeley 1988). CR1 elements were subsequently detected in representatives of nine orders that encompass a wide spectrum of species in the class Aves (Chen et al. 1991). More recently, long members of the CR1 family encoding an ORF segment were isolated, and the consensus sequence of CR1 elements was extended up to 2,200 bp from the 3’ end (Burch, Davis, and Haas 1993). Because CR1 elements have common 3’ ends and variable 5’ truncations and contain a pol-like ORF, Burch, Davis, and Haas (1993) concluded that CR1 elements are members of a LINE family. The 2.2-kb consensus sequence of CR1 was estimated to correspond to roughly half of the entire length of CR1. Burch, Davis, and Haas (1993) also found that CR1-like sequences from mouse and human (mammals), a frog (an amphibian), and a ray (a cartilaginous fish) had been deposited in DNA databases. Vandergon and Reitman (1994) detected sequences similar to avian CR1 elements in a lizard (a reptile). In addition, they noted the similarity between the avian CR1 element and the tortoise SINE. Therefore, CR1-like elements appear to exist in all classes of vertebrates. As far as we know, the CR1 element provides the only example of a LINE family with a phylum-wide distribution.
We recently reported the isolation of CR1-like elements with 5’ truncations from the turtle genome (Ohshima et al. 1996). We showed that the sequence at the 3’ end of tortoise SINEs was identical to that of the CR1 element in the turtle genome (Ohshima et al. 1996). This result suggested that the tortoise SINEs might have borrowed enzymes for their retroposition from the CR1 elements in the turtle genome (Ohshima et al. 1996; see also a recent review by Okada et al. 1997). In the present work, we determined the entire consensus sequence of CR1 in the turtle genome to obtain further information about the sharing of retropositional machinery between SINEs and LINEs.
Materials and Methods
Determination of the Consensus Sequence of PsCR1 Elements by Genomic DNA Walking
To determine the sequence upstream of the 5’ end of 4-a, which contains 2.1 kb of the CR1 element from the 3’ end (Ohshima et al. 1996), we used the genomic DNA walking method known as cassette PCR. In principle, the method was applied as described in our previous study (Ohshima et al. 1996), with the following modifications. Restriction enzymes (EcoRI, HindIII, or PstI) and the corresponding cassettes were purchased from Takara (Shiga, Japan). PCR products of approximately 500-1,000 bp were isolated from an agarose gel and fractionated on SizeSep 400 Spun Columns (Pharmacia, Uppsala, Sweden) to remove fragments shorter than 400 bp. The longer fragments were ligated into the pUC18 or pUC19 vector, and the nucleotide sequences of the cloned DNAs were determined. The consensus sequence of PsCR1 was determined from these sequences. We repeated this series of experiments eight times and thereby determined the entire consensus sequence of PsCR1.
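The round-by-round 5’ extension can be illustrated with a toy model. This is only a sketch. It assumes error-free clone sequences and exact-match overlaps. The function names, the `min_overlap` parameter, and the rule of keeping the single longest extension per round are hypothetical simplifications: in the actual procedure, each round's extension came from a consensus of several clones.

```python
def extend_5prime(consensus: str, clones: list[str], min_overlap: int = 20) -> str:
    """Prepend upstream sequence taken from clones that overlap the 5' end
    of the current consensus (exact-match overlap only)."""
    anchor = consensus[:min_overlap]  # 5'-most known stretch of the consensus
    best_upstream = ""
    for clone in clones:
        pos = clone.find(anchor)
        if pos <= 0:
            continue  # anchor absent, or the clone adds nothing upstream
        if not consensus.startswith(clone[pos:]):
            continue  # clone disagrees with the already-known sequence
        if pos > len(best_upstream):
            best_upstream = clone[:pos]  # keep the longest extension
    return best_upstream + consensus


def walk(seed: str, rounds: list[list[str]], min_overlap: int = 20) -> str:
    """Apply one extension per round of cassette PCR and cloning."""
    consensus = seed
    for clones in rounds:
        consensus = extend_5prime(consensus, clones, min_overlap)
    return consensus


# Made-up toy data: each round's clones end inside the known 5' region.
rounds = [["CCGTAGATTACA", "AAAAGATTC"], ["TGAACCGTAG"]]
print(walk("GATTACAGCT", rounds, min_overlap=4))  # TGAACCGTAGATTACAGCT
```

The second clone in the first round is rejected because its sequence downstream of the anchor does not match the known consensus. This mirrors the need to check new clones against the sequence already determined.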
For the last 2.1-kb region at the 3’ end of PsCR1, we constructed the consensus sequence from the sequences of several genomic clones. These had been isolated from a genomic library of the side-necked turtle (Platemys spixii) using oligonucleotides complementary to 2(Ps) as probes.
Designations of Nucleotides in the Consensus Sequence of PsCR1
The nucleotide at every position in the consensus sequence of PsCR1 was determined from the results obtained from at least three clones. When two nucleotides predominated at a certain position, they were represented by the two-base ambiguity code (IUB single-letter code): K, M, R, S, W, or Y. When a dinucleotide varied among clones as CG, TG, or CA, it was written as CG in the consensus sequence. This divergence may have resulted from methylation and subsequent deamination: deamination of 5-methylcytosine turns CG into TG on one strand, which appears as CA on the complementary strand. The entire consensus sequence of PsCR1 has been deposited in the DDBJ, EMBL, and GenBank nucleotide databases with accession number ABOO 1.
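The calling rules above can be sketched as a small function over aligned clone sequences. This is a minimal illustration, not the authors' procedure. The thresholds that define "predominant" (`single_frac`, `pair_frac`) are hypothetical. The CpG rule is applied here whenever a dinucleotide column shows at least two of CG, TG, and CA, which is one reading of the text.

```python
from collections import Counter

# Two-base IUB/IUPAC ambiguity codes used in the consensus.
IUB_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
# CpG and its deamination products (TpG on one strand, CpA on the other).
CPG_VARIANTS = {"CG", "TG", "CA"}


def call_column(bases, single_frac=0.6, pair_frac=0.8):
    """Call one alignment column; both thresholds are hypothetical."""
    counts = Counter(b for b in bases if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"
    ranked = counts.most_common(2)
    top, n_top = ranked[0]
    if n_top / total >= single_frac:
        return top  # one predominant base
    if len(ranked) == 2 and (n_top + ranked[1][1]) / total >= pair_frac:
        return IUB_PAIRS[frozenset((top, ranked[1][0]))]  # two predominant bases
    return "N"


def consensus(aligned, min_clones=3, **thresholds):
    """Consensus of equal-length aligned clones ('-' marks a gap)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("clones must be aligned to the same length")
    calls = []
    for i in range(length):
        column = [s[i] for s in aligned if s[i] != "-"]
        enough = len(column) >= min_clones  # at least three clones per position
        calls.append(call_column(column, **thresholds) if enough else "N")
    # CpG rule: a dinucleotide seen as a mixture of CG, TG and CA is written CG.
    for i in range(length - 1):
        dinucs = {s[i:i + 2] for s in aligned if "-" not in s[i:i + 2]}
        if len(dinucs) >= 2 and dinucs <= CPG_VARIANTS:
            calls[i], calls[i + 1] = "C", "G"
    return "".join(calls)


print(call_column(list("AAGG")))      # R
print(consensus(["CG", "TG", "TG"]))  # CG (a plain majority vote would give TG)
```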
Search of Databases and Phylogenetic Analysis
A search was made for sequences homologous to PsCR1 at both the nucleotide and the amino acid levels using the BLAST program (Altschul et al. 1990). Construction of a phylogenetic tree and calculation of bootstrap values were performed using programs in the PHYLIP package (Felsenstein 1995).
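The tree in Figure 4 was built by the neighbor-joining method (Saitou and Nei 1987). The sketch below shows the neighbor-joining criterion on a distance matrix. It is a from-scratch illustration, not PHYLIP's `neighbor` program. It does not compute distances from the aligned RT domains and does no bootstrap resampling. The five-taxon matrix in the example is a standard textbook case, not data from this study.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Neighbor-joining on a symmetric distance matrix.

    Returns the joins as (node_a, node_b, branch_a, branch_b) and a Newick
    string for the resulting unrooted tree.
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, k] for k in nodes) for a in nodes}  # row sums
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        la = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a, b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"  # new internal node, labelled by its subtree
        for k in nodes:
            if k not in (a, b):
                d[u, k] = d[k, u] = (d[a, k] + d[b, k] - d[a, b]) / 2
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, la, lb))
    x, y = nodes
    return joins, f"({x},{y}:{d[x, y]:g});"


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
joins, newick = neighbor_joining(dist, names)
print(newick)  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

Rooting the tree with an outgroup, as was done with the group II intron in Figure 4, is a separate step that this sketch leaves out.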
[Figure 1: A, bars representing the sequenced clones of PsCR1 with various 5’ truncations; B, map of PsCR1 (scale in kb) showing the 5’ UTR, ORF1, ORF2, 3’ UTR, and 3’-terminal repeats, with positions 4393 and 4480 bp marked]
FIG. 1.-Determination of the entire consensus sequence of CR1 elements from the turtle, designated PsCR1. A, Nucleotide sequences of respective clones of PsCR1 with various 5’ truncations (represented by bars) were determined. From these sequences, the consensus sequence for PsCR1 was determined. B, Schematic representation of PsCR1 (see text for details).

Estimation of the Copy Number of PsCR1
Dot blot analysis was performed to estimate the copy numbers of PsCR1 with various 5’ truncations. Progressively decreasing amounts of genomic DNA from P. spixii and of cloned DNA were dotted on a membrane. Four kinds of probe were prepared as follows. A DNA fragment of approximately 160 bp was amplified by PCR in the presence of [α-32P]dCTP. The template was a cloned DNA nearly identical to the consensus sequence of PsCR1, and one of four primer sets amplified a particular region of PsCR1: nucleotides 310-471, 1031-1197, 2354-2511, or 3600-3774. Hybridization was performed at 42°C in 50% formamide. Washing was performed in a solution of 2 × SSC and 1% SDS at 55°C for 60 min. By comparing the intensities of the spots obtained with the genomic DNA and with the cloned DNA, we estimated the copy number. The haploid genome of the turtle was assumed to contain 2 × 10^9 bp.
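The copy-number estimate from the dot blots reduces to a ratio. The sketch below assumes that spot intensity is linear in the amount of target DNA over the dilution range. It also assumes that each cloned molecule carries one copy of the probed region. Per nanogram, the clone then carries 1/clone_bp copies and the genome carries N/genome_bp copies, so N = (genomic signal per ng / clone signal per ng) × (genome_bp / clone_bp). The function names, the least-squares fit through the origin, the 5,000-bp clone length, and the signal values are all hypothetical. Only the 2 × 10^9-bp genome size comes from the text.

```python
def slope_through_origin(amounts_ng, signals):
    """Least-squares signal per ng for a dilution series, forced through zero."""
    num = sum(x * y for x, y in zip(amounts_ng, signals))
    den = sum(x * x for x in amounts_ng)
    return num / den


def copies_per_haploid_genome(genomic_slope, clone_slope, clone_bp, genome_bp=2e9):
    """Copies of the probed region per haploid genome.

    Assumes one copy of the probed region per cloned molecule, so the ratio
    of per-ng signals (genomic / clone) equals N * clone_bp / genome_bp.
    """
    return (genomic_slope / clone_slope) * (genome_bp / clone_bp)


# Hypothetical dilution series (arbitrary intensity units).
g = slope_through_origin([250, 500, 1000], [125, 250, 500])   # genomic DNA
c = slope_through_origin([0.1, 0.2, 0.4], [50, 100, 200])     # cloned DNA
print(round(copies_per_haploid_genome(g, c, clone_bp=5000)))  # 400
```

Repeating the calculation with each of the four probes would show how copy number falls toward the 5’ end, which is what the probe set was designed to measure.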
Results and Discussion
Determination of the Entire Consensus Sequence of the CR1 Elements in the Turtle
In a previous study, we isolated CR1-like elements from the turtle genome (Ohshima et al. 1996). One clone, designated 4-I, exhibited extensive similarity to chicken CR1 in the region of the 2.1-kb EcoRI fragment (64% similarity over the entire 2.1 kb). We set out to determine the sequence farther upstream in order to characterize the entire structure of this family. These elements have 5’ truncations at various positions, and only 2 kb of the chicken CR1 sequence had been reported from the 3’ end. We therefore adopted a strategy of gradual extension in the 5’ direction. First, the genomic DNA of the side-necked turtle was digested with a restriction enzyme, and cassettes were ligated to the fragments. Next, we synthesized oligonucleotides complementary to the turtle CR1 sequence that had already been determined. Using these oligonucleotides and those complementary to the cassette, we performed nested PCR. The PCR products were cloned and sequenced. The clones that we obtained had variable 5’ truncations, and we determined the consensus sequence from these sequences (fig. 1A). We repeated this series of experiments until we had constructed the entire consensus sequence of turtle CR1 elements.

The First ORF of Turtle CR1 Encodes a Protein with a Novel Zinc Finger Motif
Figure 1B shows the structure of turtle CR1, designated PsCR1 (Ps stands for Platemys spixii). PsCR1 contains two overlapping ORFs. ORF1 begins 474 bp from the 5’ terminus and encodes a protein of 334 amino acids from the first ATG codon.
To date, more than 30 full-length sequences of LINEs have been determined. They generally encode one or two types of cysteine-rich motif. One motif is CX2CX4HX4C, which is characteristic of retroviral gag genes and has been identified in ORF1 of many of the LINEs described to date (Jakubczak, Xiong, and Eickbush 1990; Leeton and Smyth 1993). The gag protein is a nucleocapsid protein, and its zinc-finger-like motif (Berg 1986, 1990; Sanchez-Garcia and Rabbitts 1994) is essential for the specific packaging of viral RNA (Gorelick et al. 1988). Although most LINEs encode this motif, several do not, such as L1 (Hohjoh and Singer 1996), Dong (Xiong and Eickbush 1993), and R4 (Burke, Müller, and Eickbush 1995). Another cysteine-rich motif is CX1-3CX7-8HX4C. It has been identified at the carboxyl terminus of the protein encoded by ORF2 of many LINEs, downstream of the reverse transcriptase domain (Jakubczak, Xiong, and Eickbush 1990; Leeton and Smyth 1993). The function of this cysteine-rich motif is currently unknown. Several LINEs lack this motif, such as F (Di Nocera and Casari 1987), Jockey (Priimägi, Mizrokhi, and Ilyin 1988), Doc (O’Hare et al. 1991), Juan-A (Mouchès, Bensaadi, and Salvado 1992), Juan-C (Agarwal et al. 1993), NLR1Cth (Blinov et al. 1993), TART (Sheen and Levis 1994), and BS1 (Udomkit et al. 1995). At present, only a few LINEs appear to lack both motifs, such as T1 (Besansky 1990) and Q (Besansky, Bedell, and Mukabayire 1994).
[Figure 2: A, map of ORF1 (334 aa) with the constantly spaced Cys clusters at residues 11-74 and 120-217 and the zinc finger motif CX2CX14CX2C at residues 11-32; B, alignment of the zinc finger region of PsCR1 with SL1 (hTAFI63) and TIF-IB (mTAFI68)]
FIG. 2.-The first ORF (ORF1) of PsCR1 encodes a protein with a zinc finger motif that resembles that of the species-specific transcription factors SL1 and TIF-IB. A, ORF1 of PsCR1 has cysteine residues with unusual spacings. The constantly spaced cysteine residues are denoted by "C," and the spacings are shown by numbers beside "X," which stands for any amino acid. At the beginning of the cysteine cluster, from residues 11 to 32, there is a zinc-finger-like motif, CX2CX14CX2C. The numbers above the line are residue numbers counted from the 5’ end of ORF1. B, The zinc-finger-like motif has cysteine residues with identical spacings and a similar amino acid composition to those found in the transcription factors SL1 (Comai et al. 1994) and TIF-IB (Heix et al. 1997). Identical or similar amino acid residues in the three sequences are shaded. The cysteine residues that can potentially form a zinc finger are emphasized by shaded boxes. The numbers at the beginning and end of each sequence indicate residue numbers counted from the 5’ end of the deduced protein.
PsCR1 also lacks both of the motifs discussed above. Instead, PsCR1 has two unusual arrangements of Cys residues. ORF1 of PsCR1 encodes a protein with constant spacing between Cys residues as follows: CX20CX21CX19C (from residues 11 to 74) and CX&X&X& (from residues 120 to 217). The former includes a zinc-finger-like motif, CX2CX14CX2C (Berg 1986, 1990; Sanchez-Garcia and Rabbitts 1994) (fig. 2A). This motif has cysteine residues with identical spacings and a similar amino acid composition to those found in the transcription factors SL1 (Comai et al. 1994) and TIF-IB (Heix et al. 1997) (fig. 2B). SL1 consists of a TATA-binding protein (TBP) and three TBP-associated factors (TAFs). One of these, TAFI63, contains two putative zinc fingers, CX2CX14CX2C and CX2HX15HX3C (Comai et al. 1994). TAFI63 can be cross-linked to the rDNA promoter and has been shown to be involved in the binding of SL1 to this promoter (Beckmann et al. 1995). mTAFI68, the murine homolog of human TAFI63, also includes the corresponding zinc finger (fig. 2B). This similarity suggests that the putative zinc finger of PsCR1 might play a role in DNA binding, but no such activity has been demonstrated for ORF1 proteins. It remains to be seen whether the zinc finger of PsCR1 functions in DNA recognition or participates in an alternative mode of RNA binding.
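Motifs written in the CXnC... notation translate directly into regular expressions, so a protein sequence can be scanned for a given Cys spacing. The sketch below builds such a pattern from a list of gap lengths and reports every match, including overlapping ones. The helper names are hypothetical. The toy protein is made up because the ORF1 sequence itself is not reproduced here.

```python
import re


def spacing_regex(gaps):
    """Regex for C-X(g1)-C-X(g2)-C-... with exactly g residues in each gap."""
    return "C" + "".join(f".{{{g}}}C" for g in gaps)


def find_cys_motif(protein, gaps):
    """1-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({spacing_regex(gaps)}))")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


# Made-up protein fragment carrying one CX2CX14CX2C motif starting at residue 3.
toy = "MK" + "CAAC" + "G" * 14 + "CRRC" + "LL"
print(spacing_regex([2, 14, 2]))        # C.{2}C.{14}C.{2}C
print(find_cys_motif(toy, [2, 14, 2]))  # [3]
```

Calling `find_cys_motif(orf1, [20, 21, 19])` on the real ORF1 translation would test the longer CX20CX21CX19C arrangement in the same way.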
The Product of ORF2 of PsCR1 Contains a Putative Endonuclease Domain in Its Amino-Terminal Region
ORF2 (from position 1475 to position 4393) encodes a protein of 963 amino acids, starting from the first ATG codon at position 1505. The beginning of ORF2 overlaps ORF1 by 22 bp in the -1 reading frame. The putative protein product contains the conserved domains found in all reverse transcriptases (Xiong and Eickbush 1990), from residue 509 to residue 773 (fig. 3). The amino acid sequence of this PsCR1 RTase indicates that PsCR1 is most closely related to a group of LINE families that includes chicken CR1 (Burch, Davis, and Haas 1993) and the mosquito T1 elements (Besansky 1990). Sequences from Caenorhabditis elegans that encode putative reverse transcriptase domains were identified in the nucleic acid database. They are also closely related to this group (accession numbers and locations: U46668, F38E9.3; U57054, B0478.2; U64846, F47D2.2; and many others; figs. 3 and 4). These sequences also contain a region that corresponds to the endonuclease domain (see below). We suggest that they belong to a family of LINEs in the genome of C. elegans. We refer to these elements collectively as CeCRT, which stands for the LINEs of C. elegans that resemble CR1 and T1.
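The coordinates given for the two ORFs can be checked with simple arithmetic. This sketch assumes 1-based, inclusive positions. It also assumes that position 4393 is the last nucleotide of ORF2's final sense codon; the text does not say whether the stop codon is included. The helper names are hypothetical.

```python
def codon_count(start, end):
    """Number of codons in a 1-based, inclusive nucleotide interval."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"{start}..{end} is not a whole number of codons")
    return length // 3


def frame_offset(ref_start, other_start):
    """Offset (0, 1 or 2 nt) of one reading frame relative to another."""
    return (other_start - ref_start) % 3


print(codon_count(1505, 4393))  # 963, matching the 963-residue product
print(frame_offset(474, 1505))  # 2
print(frame_offset(474, 1475))  # 2
```

An offset of 2 nt downstream is the same frame as an offset of 1 nt upstream. This agrees with ORF2 lying in the -1 frame relative to ORF1, which starts at position 474.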
[Figure 3: alignment of reverse transcriptase domains I-VII of PsCR1 (turtle), CR1 (chicken), CR1-like (frog), T1 (mosquito), and CeCRT (nematode); PsCR1 residues 509-773]
FIG. 3.-Comparison of amino acid sequences in the reverse transcriptase domains encoded by PsCR1 and several LINEs that gave the highest homology scores in this region. Dots denote amino acids identical to those in the PsCR1 product. Amino acids with chemical properties similar to those in the PsCR1 product are indicated in boldface. Gaps (-) have been introduced to maximize homology. The conserved domains found in all reverse transcriptases (Xiong and Eickbush 1990) are boxed. Residue numbers counted from the 5’ end of ORF2 are shown. Sources of sequences are as follows: CR1 (Burch, Davis, and Haas 1993), T1 (Besansky 1990), CeCRT (DDBJ/EMBL/GenBank, accession number U46668; gene location F38E9.3), and frog CR1-like element (a consensus of sequences with accession numbers M24187 and X71067; 5’ truncations are found).

The amino-terminal amino acid sequence encoded by ORF2 of PsCR1 (residues 49-259) was remarkably similar to the corresponding regions encoded by the T1 element and the Q element of Anopheles (Besansky, Bedell, and Mukabayire 1994). It was similar, to a lesser extent, to the regions encoded by NLR1Cth of Chironomus (Blinov et al. 1993), Juan-C of Culex (Agarwal et al. 1993), and the putative LINE family of C. elegans mentioned above (fig. 5). It was recently reported that the corresponding regions encoded by several LINE families, including the Q element, have several domains that are highly homologous to members of the AP endonuclease family. These domains include the active residues of exonuclease III (Martin, Olivares, and López 1996). Figure 5 shows an alignment of the amino acid sequence encoded by PsCR1 and the sequences encoded by other LINEs, together with part of human AP endonuclease I for comparison. The deduced amino acid sequence encoded by PsCR1 corresponds closely to the domains defined by Martin, Olivares, and López (1996), in particular to domains I, II, III, V, VI, VIII, and IX. These similarities suggest that this region of the PsCR1 protein may have endonucleolytic activity.
The 5’ Untranslated Region of PsCR1 Contains a Sequence that Resembles the Human L1 Promoter and Several cis Elements Found in Eukaryotic Genes
The 5’ untranslated region (5’ UTR) of PsCR1 is 473 bp long. This region contains three sets of direct repeats (DRs) (fig. 6A). One is 48 bp long with an interval of 304 bp (double underlined), another is 19 bp long with an interval of 19 bp (underlined), and the third is 23 bp long with an interval of 28 bp (dashed underline). In various clones, we found deletions or insertions ranging from several nucleotides to several dozen nucleotides in the 5’ UTR, particularly in the DR regions (not shown). These observations suggest that the region might have undergone frequent recombinational events.
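Direct repeats of the kind described above can be found by indexing k-mers and extending shared seeds. This is a minimal sketch, assuming exact repeats only. The repeats in the PsCR1 5’ UTR were not necessarily identified this way. "Interval" is taken here to mean the number of nucleotides between the end of the first copy and the start of the second, which is an assumption about the convention used. The sequence in the example is made up.

```python
from collections import defaultdict


def direct_repeats(seq, k):
    """Exact, non-overlapping direct repeats of at least k nt.

    Returns sorted tuples (start1, start2, length, interval), 1-based, where
    interval is the number of nt between the end of copy 1 and start of copy 2.
    """
    seq = seq.upper()
    index = defaultdict(list)  # k-mer -> start positions (0-based)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    found = []
    for positions in index.values():
        for n, i in enumerate(positions):
            for j in positions[n + 1:]:
                if j < i + k:
                    continue  # the two copies would overlap
                if i > 0 and seq[i - 1] == seq[j - 1]:
                    continue  # same repeat, reported from its leftmost start
                length = k
                while (j + length < len(seq) and i + length < j
                       and seq[i + length] == seq[j + length]):
                    length += 1  # extend the match to the right
                found.append((i + 1, j + 1, length, j - (i + length)))
    return sorted(found)


toy = "GG" + "ACGTTGCA" + "TTTTT" + "ACGTTGCA" + "CC"
print(direct_repeats(toy, 6))  # [(3, 16, 8, 5)]
```

Imperfect repeats, like those disrupted by the indels seen among clones, would need an alignment-based search instead of exact seed extension.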
[Figure 4: neighbor-joining tree of LINE reverse transcriptases, including PsCR1, CeCRT, XlCR1, T1, and other LINEs, rooted with the group II intron al-pa, with bootstrap values on the branches]
FIG. 4.-Phylogenetic relationships among the products of PsCR1, CeCRT, and other LINEs. The phylogenetic tree is based on seven amino acid domains that contain a total of 178 residues and have been identified in all reverse transcriptases (Xiong and Eickbush 1990). The tree was constructed by the neighbor-joining method (Saitou and Nei 1987). The numbers above the branches indicate the bootstrap values per 100 replications, which provide an indication of the statistical significance of the nodes. A group II intron was used as an outgroup to root the tree, as described by Burke, Müller, and Eickbush (1995). Sources of sequences are as follows: R2Bm (Burke, Calalang, and Eickbush 1987); R2Dm (Jakubczak, Xiong, and Eickbush 1990); L1Hs (Hattori et al. 1986); L1Md (Loeb et al. 1986); Jockey (Priimägi, Mizrokhi, and Ilyin 1988); NLR1Cth (Blinov et al. 1993); Juan-A (Mouchès, Bensaadi, and Salvado 1992); Q (Besansky, Bedell, and Mukabayire 1994); and al-pa (Osiewacz and Esser 1984). The CR1-like elements in the genome of Xenopus laevis are designated XlCR1 in this figure.
Minakami et al. (1992) showed that the nucleotide sequence of human L1 from position 3 to position 26 promoted expression of the gene for chloramphenicol acetyltransferase (CAT) in HeLa cells. They designated this region L1 site A of the human L1 promoter (fig. 6B). In the nucleotide sequence of PsCR1, we found that the sequence from nucleotide (nt) 58 to nt 65 was identical to the first eight nucleotides of L1 site A, as shown by a hatched box in figure 6A. This coincidence suggests that a common transcription factor might bind to the corresponding sites of PsCR1 and human L1. However, the nine nucleotides downstream of this site in PsCR1 show no significant homology to the corresponding region in L1 site A, which is the target core element for the pol II transcription factor YY1 (Becker et al. 1993; Kurose et al. 1995). Therefore, the putative protein that might bind to the common site in PsCR1 and human L1 is probably different from YY1. In PsCR1, the region corresponding to the YY1-binding core element of human L1 is replaced exactly by an "E box." This is the cis element bound by the basic helix-loop-helix (bHLH) family of proteins (Murre, McCaw, and Baltimore 1989; Murre et al. 1994), regulatory factors essential for determination of cell type, such as members of the MyoD family. These proteins bind as dimers to DNA sequences that generally share the consensus CANNTG (the E box; Murre et al. 1989) (fig. 6B). Putative E boxes, including the one mentioned above, are clustered in the 5’ UTR of PsCR1 (boxed in fig. 6A). Within a region of about 500 bp, there are nine E boxes. In addition to the E boxes, potential binding sites for c-Myb (Howe, Reakes, and Watson 1990) are also found in this region (fig. 6A).
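Locating candidate E boxes is a direct pattern search for the consensus CANNTG. The sketch below is illustrative only. The function name and the toy sequence are hypothetical, and a real scan of the PsCR1 5’ UTR would need the deposited consensus sequence. Because CANNTG is its own reverse complement, scanning one strand also finds every E box on the other strand.

```python
import re

# Zero-width lookahead so that overlapping E boxes are all reported.
E_BOX = re.compile(r"(?=CA[ACGT]{2}TG)")


def e_box_positions(seq):
    """1-based start positions of CANNTG E boxes.

    CANNTG is its own reverse complement, so scanning one strand is enough.
    """
    return [m.start() + 1 for m in E_BOX.finditer(seq.upper())]


print(e_box_positions("AACAGCTGTTCATATGCC"))  # [3, 11]
```

Counting the positions that fall inside a sliding 500-bp window would reproduce the kind of clustering reported for the 5’ UTR.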
[Figure 5: alignment of the amino-terminal ORF2 regions of PsCR1, CeCRT, Q, T1, NLR1Cth, and Juan-C with part of human AP endonuclease I (AP ENase I) across nuclease domains I, II, III, V, VI, VIII, and IX]
FIG. 5.-The amino-terminal region of the deduced product of ORF2 contains a putative endonuclease domain. The amino acid sequence of the amino-terminal region of PsCR1 is aligned with those encoded by several LINEs with strong similarity to PsCR1 in this region. When at least four of the six amino acids at a given position have similar chemical properties, they are highlighted in gray. The sequences are also compared with part of human endonuclease I (AP ENase I). When a residue in human endonuclease I has chemical properties similar to those of the gray-highlighted LINE residues, it is also highlighted. The conserved nuclease domains defined by Martin, Olivares, and López (1996) are indicated. Numbers in parentheses are the numbers of amino acid residues between conserved domains. The numbers at the beginning and end of each sequence are residue numbers counted from the 5’ end of the ORF.
[Figure 6: A, 5’ UTR sequence of PsCR1 annotated with E boxes, c-Myb sites, the three sets of direct repeats, and the putative initiation codon; B, comparison of L1 site A with the corresponding PsCR1 sequence and E box]
FIG. 6.-The 5’ untranslated region (5’ UTR) of PsCR1 contains sequences similar to those in the human L1 promoter and several cis elements that have been found in eukaryotic genes. A, E boxes and several binding sites for transcription factors are found in this region. The 5’ UTR of PsCR1 contains three sets of direct repeats, which are indicated by double underlining, underlining, and dashed underlining, respectively. The putative initiation codon is highlighted in black. B, A nucleotide sequence in the 5’ UTR of PsCR1 is identical to the first eight nucleotides of L1 site A (shaded box). The numbers at the beginning and end of L1 site A are residue numbers counted from the 5’ end of human L1 (Minakami et al. 1992). The region of PsCR1 corresponding to the core element for binding of YY1 in human L1 is replaced exactly by the E box, a cis element bound by members of the basic helix-loop-helix family of proteins.
The biological significance of such sequences in PsCR1 is unknown. However, the possibility that cellular transcription factors that bind to these sequences act in concert to regulate the expression of PsCR1 is clearly of interest. Cooperation between different E boxes on the same promoter has been widely recognized in the regulation of tissue-specific genes. So has cooperative binding of bHLH proteins with another class of transactivators (Weintraub et al. 1990; Genetta, Ruezinsky, and Kadesch 1994; Di Rocco et al. 1997).
The 3′ Untranslated Regions of CR1 from Reptiles and a Bird Exhibit Strong Conservation Among Species
Figure 7 shows an alignment of the 3′-end sequences of CR1 elements from reptiles and a bird. The predicted amino acid sequences of the carboxy-terminal regions encoded by ORF2 from these four species (top four lines in fig. 7) appear to be strongly conserved.
To determine whether the 3′ UTRs of these CR1s might be under some selective constraint, we calculated the nucleotide sequence divergences among the regions
[Figure 7: alignment of the 3′-end sequences and carboxy-terminal ORF2 regions of CR1 elements from turtle, snake, lizard, and chicken.]
FIG. 7. The 3′-end sequences of the CR1 elements from reptiles and a bird are compared. Dots indicate nucleotides identical to those in the sequence from turtle. The highly conserved regions in the 3′ UTR are shaded. The 8-bp direct repeats at the 3′ termini are indicated by arrows. The carboxy-terminal regions are also compared: where at least three of the four amino acids at a given position have similar chemical properties, they are boxed. Numbers above the amino acids are the numbers of residues from the 5′ end of the protein encoded by ORF2 of PsCR1. Sources of sequences are as follows: turtle (PsCR1), this paper; snake, a consensus of the sequences with accession numbers D31777, D13384, D31782, and D31779 (5′ truncations are found); lizard, accession number L31503 (5′ truncation is found); and chicken (CR1; Burch, Davis, and Haas 1993).
Table 1
Rates of Nonsynonymous Substitution (dN) for the CR1 Element

Element    Snake            Lizard           Chicken
Turtle     0.471 ± 0.064    0.359 ± 0.052    0.340 ± 0.052
Snake                       0.357 ± 0.050    0.491 ± 0.067
Lizard                                       0.463 ± 0.062

NOTE. dN values were calculated by the method of Ina (1995) from the last 288 bp (96 codons) of ORF2.
that encode ORF2 and the 3′ UTRs from these species that are available (tables 1 and 2). The values for nonsynonymous substitutions per site (dN) ranged from 0.34 to 0.47 (table 1). In most cases, the value for synonymous substitutions per site (dS) was saturated, that is, too high to be estimated reliably (not shown). The dN value for the β-globin gene in birds and mammals is 0.24 (Li, Wu, and Luo 1985), and that for the β-crystallin gene is 0.07-0.12 (Aarts et al. 1989). Most protein-encoding genes of mammals have dN values that range from 0.005 to 0.211 (Ohta 1995). Because the value for synonymous substitutions is much higher than that for nonsynonymous substitutions, the results suggest that CR1 in each lineage has been under selective pressure to maintain its protein product.
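The saturation of dS mentioned above can be illustrated with the simplest substitution-model correction, the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This is not Ina's (1995) method used for table 1; it is swapped in here only because it shows the effect most directly. As p approaches 0.75, the corrected distance grows without bound, so heavily substituted sites no longer yield a usable estimate. The function name is illustrative.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor (JC69) distance for an observed proportion p of differing sites.

    Returns math.inf once p >= 0.75, where the log argument is no longer
    positive and the correction breaks down (the sites are saturated).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)


if __name__ == "__main__":
    for p in (0.10, 0.30, 0.50, 0.70, 0.74, 0.75):
        print(f"p = {p:.2f}  ->  d = {jukes_cantor(p):.3f}")
```

The printed distances rise slowly at first, climb steeply near p = 0.74, and become infinite at 0.75, which is why saturated synonymous distances are usually reported as not estimable rather than as numbers.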
We were surprised that the sequence divergence of the 3′ UTR between species was even lower than the dN value of ORF2 (table 2). These results suggest the presence of some strict functional constraint in this region. They also recall results for the R2 elements of Drosophila species: the sequence divergence of the 3′ UTR of R2 elements between species was only twice the dN value and one third of the dS value in the coding region (Eickbush et al. 1995). During integration of R2 elements by the target-DNA-primed mechanism, the R2 protein binds specifically to a region in the 3′ UTR of the RNA template to prime reverse transcription (Luan et al. 1993; Eickbush et al. 1995; Luan and Eickbush 1995; Mathews et al. 1997). Our demonstration that the 3′ UTR of CR1s has been under strict selective constraint suggests that the conserved 3′-end sequence of CR1s is also the recognition site for their reverse transcriptase.
The Possible Recruitment of CR1 Enzymes by the Tortoise SINE
The 3′ end of the chicken CR1 element is defined by the presence of an 8-bp direct repeat, 5′-(CATTCTRT)(GATTCTRT)-3′ (Silva and Burch 1989). Almost the same repeat, 5′-(TATTCTAT)(GATTCTAT)-3′, is found in reptilian CR1s (fig. 7). Chicken CR1 elements have been integrated into preferred target sites that resemble the 3′ repeat units (Silva and Burch 1989).
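The repeat notation above uses IUPAC degenerate bases (R stands for A or G). A minimal sketch, assuming plain A/C/G/T input, translates such a pattern into a regular expression and reports every match. The chicken pattern is the one quoted above; the relaxed pattern YATTCTRTGATTCTRT (Y = C or T), which accepts both the chicken and the reptilian forms, is a hypothetical combination introduced only for illustration, as are the toy sequence and the function names.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> re.Pattern:
    """Compile an IUPAC pattern into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in pattern.upper())
    return re.compile(f"(?=({body}))")


# Chicken CR1 3' repeat (CATTCTRT)(GATTCTRT), as quoted in the text.
CHICKEN_REPEAT = iupac_to_regex("CATTCTRT" + "GATTCTRT")
# Hypothetical relaxed pattern covering both the chicken and reptilian forms.
RELAXED_REPEAT = iupac_to_regex("YATTCTRT" + "GATTCTRT")


def find_repeat(seq: str, regex: re.Pattern = CHICKEN_REPEAT) -> list[tuple[int, str]]:
    """Return (1-based start, matched sequence) for each match in seq."""
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    reptilian = "TATTCTATGATTCTAT"  # the reptilian repeat quoted in the text
    print(find_repeat("GGGCATTCTATGATTCTGTAAA"))   # toy chicken-like site
    print(find_repeat(reptilian))                  # missed by the chicken pattern
    print(find_repeat(reptilian, RELAXED_REPEAT))  # found by the relaxed pattern
```

Characters outside the IUPAC table raise a KeyError, which is a deliberate choice here so that malformed patterns fail loudly.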
In the present study, we found that the amino-terminal region of the deduced product of ORF2 of turtle CR1 contains a putative endonuclease domain. The endonucleolytic activity of the product of the second ORF of human L1 was recently demonstrated biochemically (Feng et al. 1996; see Introduction). The CR1 endonuclease might cleave sequences that resemble the 3′ repeat units. The RTase of CR1 might then prime reverse transcription from the free 3′ ends at nicked target sites that can hybridize to a repeat unit within the CR1 transcript. In this process, the conserved sequence in the 3′ UTR of CR1, mentioned above, might provide the recognition site for the RTase on the RNA template. The involvement of the target-DNA-primed mechanism in the reverse transcription of CR1 was first proposed by Burch, Davis, and Haas (1993).

Table 2
Nucleotide Sequence Divergence of the 3′ UTR of the CR1 Element

Element    Snake            Lizard           Chicken
Turtle     0.045 ± 0.033    0.160 ± 0.063    0.139 ± 0.066
Snake                       0.205 ± 0.072    0.135 ± 0.059
Lizard                                       0.299 ± 0.087

NOTE. Distances were calculated by the method of Adachi and Hasegawa (1996) on the basis of the region shaded in figure 7 (56 informative sites).
We reported recently that the sequence at the 3′ end of CR1 in the turtle genome is nearly identical to that of a family of tortoise SINEs (tortoise Pol III/SINE; Ohshima et al. 1996). SINEs are short (approximately 80-400 bp) repetitive elements with a composite structure consisting of a tRNA-related region, a tRNA-unrelated region, and an AT-rich region (Okada 1991a, 1991b; Ohshima et al. 1993; Okada and Ohshima 1995). SINEs do not encode the enzymes required for their amplification, such as RTases, so they must "borrow" these enzymes from other sources. The general finding that 3′ ends are shared by SINEs and LINEs has been reinforced by examples other than the tortoise Pol III/SINE and CR1 pair. Thus, it seems likely that each SINE family recruited the enzymatic machinery for retroposition from the corresponding LINE through a common "tail" sequence (Ohshima et al. 1996; see also a recent review by Okada et al. 1997).
As discussed above, the conserved 3′-end sequences of CR1s probably serve as the recognition sites for their RTase (fig. 7). It is noteworthy that only this conserved region is shared with the tortoise Pol III/SINE (fig. 8). This observation supports our hypothesis that the tortoise SINE might have acquired retropositional activity by gaining the 3′-end sequence of the CR1 element (Ohshima et al. 1996). However, it should be noted that the 8-bp direct repeat, which is prominent in the 3′-terminal region of CR1s, is not found in the 3′-terminal region of the tortoise Pol III/SINE. In the latter case, an AT-rich sequence of variable length is found instead (not shown). The molecular mechanism responsible for this difference is unknown. The difference might reflect the ability of the CR1 RTase to add "nontemplated" nucleotides to the target DNA before it engages the RNA template (such an activity has been found in R2Bm; Luan and Eickbush 1995), and/or it might reflect the participation of other cellular components in the integration of SINEs (Rogers 1985).
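The comparison in figure 8 asks which stretch of sequence the CR1 3′ end and the SINE have in common. A minimal sketch of the simplest form of that question, the longest exact shared segment, uses the standard dynamic-programming recurrence for the longest common substring. Comparisons such as figure 8 rely on alignments that tolerate mismatches and gaps, so this is only an illustration; the toy sequences and the function name are hypothetical.

```python
def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start in a, start in b, length), 0-based, of the longest exact shared segment.

    Dynamic programming: curr[j] holds the length of the common suffix of
    a[:i] and b[:j]. Only two rows are kept in memory at a time.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences standing in for a LINE 3' end and a SINE 3' end.
    line_tail = "AAAAGGAGATTGGTATCCCC"
    sine_tail = "TTTGGAGATTGGTATGGG"
    i, j, n = longest_common_substring(line_tail, sine_tail)
    print(i, j, n, line_tail[i:i + n])
```

For the toy pair, the shared segment GGAGATTGGTAT starts at index 4 in the first sequence and index 3 in the second. Keeping only two rows makes memory use proportional to the length of the second sequence.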
[Figure 8: schematic of PsCR1 (ORF2, 3′ UTR) and the tortoise Pol III/SINE (top); nucleotide comparison of the shared 3′ region (bottom).]
FIG. 8. The sequence conserved at the 3′ ends of CR1 elements from several species is also found in the tortoise Pol III/SINE. Structures of PsCR1 and the tortoise Pol III/SINE are shown schematically (top). The sequence common to PsCR1 and the tortoise Pol III/SINE is denoted by boxes with oblique shading. The nucleotide sequences of PsCR1 and the SINE in this region are compared (bottom). Common sequences are boxed. The region in the 3′ UTR of the CR1 element that is strongly conserved among species (fig. 7) is shaded.
The PsCR1 Elements Form at Least Two Subfamilies
In general, LINEs frequently have 5′ truncations of various lengths (Hutchison et al. 1989; Eickbush 1994). To estimate the copy numbers of PsCR1 elements of various lengths, we performed dot blot analysis using several probes that corresponded to distinct blocks of PsCR1 (fig. 9). We estimated that, in each haploid genome, about 400 copies of CR1 were nearly full length (4,000 bp), whereas about 10,000 copies of CR1 were truncated at positions as far as 3,500 bp from their 5′ ends. It seems that more than 40% of PsCR1 elements extend as much as 2 kb from their 3′ ends.
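Dot-blot copy-number estimates of this kind rest on a simple proportion: if a mass m_g of genomic DNA gives the same hybridization signal as a mass m_t of cloned probe-region DNA of length L, the probed region makes up m_t/m_g of the genome, so the haploid copy number is (m_t/m_g) × G/L, where G is the haploid genome size. The sketch below assumes equal hybridization efficiency for genomic and cloned targets and counts only the insert mass of the clone; the genome size and masses are hypothetical placeholders, not values from this study, and the function name is illustrative. The last line repeats the arithmetic behind the figures quoted above: about 400 near-full-length copies out of roughly 10,000 copies is about 4%.

```python
def copies_per_haploid_genome(
    probe_equiv_ng: float, genomic_ng: float, genome_bp: float, region_bp: float
) -> float:
    """Haploid copy number from a dot-blot signal match.

    probe_equiv_ng: mass of cloned probe-region DNA (insert only) whose
        signal equals that of the genomic spot.
    genomic_ng: mass of genomic DNA in that spot.
    genome_bp: haploid genome size in bp (an input assumption).
    region_bp: length of the probed region in bp.
    """
    if min(probe_equiv_ng, genomic_ng, genome_bp, region_bp) <= 0:
        raise ValueError("all inputs must be positive")
    genome_fraction = probe_equiv_ng / genomic_ng
    return genome_fraction * genome_bp / region_bp


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    n = copies_per_haploid_genome(1.0, 1000.0, 2.0e9, 500)
    print(f"estimated copies per haploid genome: {n:,.0f}")
    # Arithmetic from the text: ~400 near-full-length of ~10,000 copies.
    print(f"near-full-length fraction: {400 / 10_000:.1%}")
```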
[Figure 9: estimated haploid copy number (vertical axis, 0 to 12,000) plotted against probe position (horizontal axis, 4 to 0 kb).]
FIG. 9. Estimated copy numbers of PsCR1 elements with various 5′ truncations in the turtle genome. The vertical axis represents the haploid copy number, and the horizontal axis represents the position (in kb) of the probe used in the dot blot analysis. The 3′ end of PsCR1 is at the right. Each point indicates the total copy number of PsCR1 elements, with various 5′ truncations, that are longer than the length indicated on the horizontal axis.
This result contrasts with that for chicken CR1: only 0.1% of chicken CR1 elements extend as much as 2 kb from their 3′ ends (Burch, Davis, and Haas 1993).
During the course of our efforts to construct the consensus sequence of PsCR1, we found that PsCR1 can be divided into two subfamilies on the basis of correlated changes at particular nucleotides, which can be considered diagnostic nucleotides (table 3; Smit et al. 1995). The two subfamilies can be distinguished from each other by 10 diagnostic nucleotides in a region of approximately 400 bp that corresponds to part of the region that encodes the RTase. At 4 of the 10 sites, the substitutions result in amino acid changes.
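Assigning an element to a subfamily from diagnostic nucleotides amounts to counting, at each diagnostic position, whether the element carries the type I or the type II consensus base. A minimal sketch follows. The ten positions are those listed in table 3, but the consensus bases are hypothetical placeholders rather than the table's values, and the scoring rule (simple majority, ties reported as ambiguous) and the function name are assumptions.

```python
# Diagnostic positions (residues from the 5' end of PsCR1), from table 3.
DIAGNOSTIC_POSITIONS = (3030, 3061, 3091, 3094, 3106, 3146, 3199, 3222, 3229, 3442)

# Hypothetical consensus bases; NOT the values in table 3.
TYPE_I = dict(zip(DIAGNOSTIC_POSITIONS, "GCCGAAGATA"))
TYPE_II = dict(zip(DIAGNOSTIC_POSITIONS, "AAGTGGAGCG"))


def classify(clone: dict[int, str]) -> tuple[str, int, int]:
    """Return (label, type I matches, type II matches) for one element.

    clone maps diagnostic position -> observed base (uppercase); a position
    missing from clone (e.g. in a truncated copy) matches neither consensus.
    """
    score_i = sum(clone.get(p) == TYPE_I[p] for p in DIAGNOSTIC_POSITIONS)
    score_ii = sum(clone.get(p) == TYPE_II[p] for p in DIAGNOSTIC_POSITIONS)
    if score_i > score_ii:
        label = "type I"
    elif score_ii > score_i:
        label = "type II"
    else:
        label = "ambiguous"
    return label, score_i, score_ii


if __name__ == "__main__":
    # Hypothetical element: type I bases except at the last position.
    clone = dict(zip(DIAGNOSTIC_POSITIONS, "GCCGAAGATG"))
    print(classify(clone))
```

Because the two subfamilies are defined by correlated changes, a real element that scores near a tie would deserve a closer look rather than a forced assignment.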
The nucleotide divergences between members of the type I subfamily, as determined by pairwise comparison, range from 3.9% to 12.8% (average 9.5%), and those of the type II subfamily range from 14.2% to 20.3% (average 17.1%), suggesting that type II is older than type I. Chicken CR1 was previously classified by phylogenetic analysis into six subfamilies, designated A through F (Vandergon and Reitman 1994). Some CR1 elements from avian species other than chicken, such as duck, were grouped with members of different chicken subfamilies rather than with members of their own species, demonstrating that multiple subfamilies must have existed early in avian evolution. We examined the relationships of the subfamilies of PsCR1 to the chicken subfamilies. Phylogenetic analysis indicated that the two subfamilies of PsCR1 are more closely related to each other than to any subfamily in the chicken and, moreover, that the turtle CR1 lineage might have diverged from the avian CR1 lineage at an early time, although the statistical support for these results was not particularly high (not shown). The CR1s of reptiles and birds might have evolved from a few ancestral elements in the genome of a progenitor common to reptiles and birds, retaining their identity during the divergence of their respective hosts, which is estimated to have occurred more than 250 MYA, and generating multiple lineages of descendants in each species.
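The subfamily comparison above rests on pairwise divergences summarized as a range and an average: copies that have been accumulating independent mutations for longer since insertion tend to differ more from one another, so a higher average suggests an older subfamily. A minimal sketch of that summary follows. It uses the uncorrected p-distance over aligned sites and skips gapped columns; the text does not state which distance was used, so that choice, the toy alignment, and the function names are assumptions.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def summarize(alignment: dict[str, str]) -> tuple[float, float, float]:
    """Return (minimum, maximum, mean) pairwise p-distance within one subfamily."""
    dists = [
        p_distance(alignment[i], alignment[j])
        for i, j in combinations(sorted(alignment), 2)
    ]
    return min(dists), max(dists), mean(dists)


if __name__ == "__main__":
    # Hypothetical toy alignment of three copies from one subfamily.
    toy = {"copy_a": "ACGTACGTAC", "copy_b": "ACGTACGTTC", "copy_c": "ACGAACGTTC"}
    low, high, avg = summarize(toy)
    print(f"range {low:.1%} to {high:.1%}, average {avg:.1%}")
```

With real data one would usually apply a substitution-model correction before comparing subfamilies, since uncorrected distances underestimate divergence as it grows.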
Table 3
Two Subfamilies of PsCR1 Elements Can Be Distinguished on the Basis of Diagnostic Nucleotides
Class      Clone name
Type I     Consensus; CR1 4-2; Ps 5-3; Ps 2-7; Ps 2-6
Type II    Consensus; Ps 4-5; Ps 2-3; Ps 4-3; Ps 2-2; Ps 2-4
Positions^a: 3030, 3061, 3091, 3094, 3106, 3146, 3199, 3222, 3229, 3442
[Nucleotide entries for each clone at these positions (dots^b for identity with the type I consensus) could not be recovered from the extracted table layout.]
^a "Position" indicates the number of residues from the 5′ end of PsCR1.
^b Dots indicate nucleotides identical to those in the consensus sequence of type I elements.
Acknowledgments
The authors thank Ying Cao for calculations of nucleotide sequence divergence of CR1 elements and Takesi Sasayama for sequencing the SINEs from the soft-shelled turtle. The authors also thank Dr. Ren Hirayama for identification of the turtle species. This work was supported by a Grant-in-Aid for Specially Promoted Research from the Ministry of Education, Science, Sports and Culture of Japan.
LITERATURE CITED
AARTS, H. J. M., E. H. M. JACOBS, G. VAN WILLIGEN, N. H. LUBSEN, and J. G. G. SCHOENMAKERS. 1989. Different evolution rates within the lens-specific β-crystallin gene family. J. Mol. Evol. 28:313-321.
ADACHI, J., and M. HASEGAWA. 1996. Computer science monographs for molecular phylogenetics based on maximum likelihood. Institute of Statistical Mathematics, Tokyo.
AGARWAL, M., N. BENSAADI, J.-C. SALVADO, K. CAMPBELL, and C. MOUCHÈS. 1993. Characterization and genetic organization of full-length copies of a LINE retroposon family dispersed in the genome of Culex pipiens mosquitoes. Insect Biochem. Mol. Biol. 23:621-629.
ALTSCHUL, S. F., W. GISH, W. MILLER, E. W. MYERS, and D. J. LIPMAN. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.
BECKER, K. G., G. D. SWERGOLD, K. OZATO, and R. E. THAYER. 1993. Binding of the ubiquitous nuclear transcription factor YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Hum. Mol. Genet. 10:1697-1702.
BECKMANN, H., J.-L. CHEN, T. O'BRIEN, and R. TJIAN. 1995. Coactivator and promoter-selective properties of RNA polymerase I TAFs. Science 270:1506-1509.
BERG, J. M. 1986. Potential metal-binding domains in nucleic acid binding proteins. Science 232:485-487.
---. 1990. Zinc fingers and other metal-binding domains. J. Biol. Chem. 265:6513-6516.
BESANSKY, N. J. 1990. A retrotransposable element from the mosquito Anopheles gambiae. Mol. Cell. Biol. 10:863-871.
BESANSKY, N. J., J. A. BEDELL, and O. MUKABAYIRE. 1994. Q: a new retrotransposon from the mosquito Anopheles gambiae. Insect Mol. Biol. 3:49-56.
BLINOV, A. G., Y. V. SOBANOV, S. S. BOGACHEV, A. P. DONCHENKO, and M. A. FILIPPOVA. 1993. The Chironomus thummi genome contains a non-LTR retrotransposon. Mol. Gen. Genet. 237:412-420.
BOEKE, J. D., and K. B. CHAPMAN. 1991. Retrotransposition mechanisms. Curr. Opin. Cell Biol. 3:502-507.
BURCH, J. B. E., D. L. DAVIS, and N. B. HAAS. 1993. Chicken repeat 1 elements contain a pol-like open reading frame and belong to the non-long terminal repeat class of retrotransposons. Proc. Natl. Acad. Sci. USA 90:8199-8203.
BURKE, W. D., C. C. CALALANG, and T. H. EICKBUSH. 1987. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol. Cell. Biol. 7:2221-2230.
BURKE, W. D., F. MUELLER, and T. H. EICKBUSH. 1995. R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes. Nucleic Acids Res. 23:4628-4634.
CHEN, Z.-Q., R. G. RITZEL, C. C. LIN, and R. B. HODGETTS. 1991. Sequence conservation in avian CR1: an interspersed repetitive DNA family evolving under functional constraints. Proc. Natl. Acad. Sci. USA 88:5814-5818.
COMAI, L., J. C. B. M. ZOMERDIJK, H. BECKMANN, S. ZHOU, A. ADMON, and R. TJIAN. 1994. Reconstitution of transcription factor SL1: exclusive binding of TBP by SL1 or TFIID subunits. Science 266:1966-1972.
DI NOCERA, P. P., and G. CASARI. 1987. Related polypeptides are encoded by Drosophila F elements, I factors, and mammalian L1 sequences. Proc. Natl. Acad. Sci. USA 84:5843-5847.
DI ROCCO, G., M. PENNUTO, B. ILLI et al. (13 co-authors). 1997. Interplay of the E box, the cyclic AMP response element, and HTF4/HEB in transcriptional regulation of the neurospecific, neurotrophin-inducible vgf gene. Mol. Cell. Biol. 17:1244-1253.
DOOLITTLE, R. F., D.-F. FENG, M. S. JOHNSON, and M. A. MCCLURE. 1989. Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64:1-30.
EICKBUSH, D. G., W. C. LATHE III, M. P. FRANCINO, and T. H. EICKBUSH. 1995. R1 and R2 retrotransposable elements of Drosophila evolve at rates similar to those of nuclear genes. Genetics 139:685-695.
EICKBUSH, T. H. 1992. Transposing without ends: the non-LTR retrotransposable elements. New Biol. 4:430-440.
---. 1994. Origin and evolutionary relationships of retroelements. Pp. 121-157 in S. S. MORSE, ed. The evolutionary biology of viruses. Raven Press, New York.
FANNING, T. G., and M. F. SINGER. 1987. LINE-1: a mammalian transposable element. Biochim. Biophys. Acta 910:203-212.
FELSENSTEIN, J. 1995. PHYLIP (phylogeny inference package). Version 3.57c. University of Washington, Seattle.
FENG, Q., J. V. MORAN, H. H. KAZAZIAN, and J. D. BOEKE. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905-916.
GENETTA, T., D. RUEZINSKY, and T. KADESCH. 1994. Displacement of an E-box-binding repressor by basic helix-loop-helix proteins: implications for B-cell specificity of the immunoglobulin heavy-chain enhancer. Mol. Cell. Biol. 14:6153-6163.
GORELICK, R. J., L. E. HENDERSON, J. P. HANSER, and A. REIN. 1988. Point mutants of Moloney murine leukemia virus that fail to package viral RNA: evidence for specific RNA recognition by a "zinc finger-like" protein sequence. Proc. Natl. Acad. Sci. USA 85:8420-8424.
HACHÉ, R. J. G., and R. G. DEELEY. 1988. Organization, sequence and nuclease hypersensitivity of repetitive elements flanking the chicken apoVLDLII gene: extended sequence similarity to elements flanking the chicken vitellogenin gene. Nucleic Acids Res. 16:97-113.
HATTORI, M., S. KUHARA, O. TAKENAKA, and Y. SAKAKI. 1986. L1 family of repetitive DNA sequences in primates may be derived from a sequence encoding a reverse transcriptase-related protein. Nature 321:625-628.
HEIX, J., J. C. B. M. ZOMERDIJK, A. RAVANPAY, R. TJIAN, and I. GRUMMT. 1997. Cloning of murine RNA polymerase I-specific TAF factors: conserved interactions between the subunits of the species-specific transcription initiation factor TIF-IB/SL1. Proc. Natl. Acad. Sci. USA 94:1733-1738.
HOHJOH, H., and M. F. SINGER. 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 15:630-639.
HOWE, K. M., C. F. L. REAKES, and R. J. WATSON. 1990. Characterization of the sequence-specific interaction of mouse c-myb protein with DNA. EMBO J. 9:161-169.
HUTCHISON, C. A. III, S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, and M. H. EDGELL. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eucaryotic genome. Pp. 593-617 in D. E. BERG and M. M. HOWE, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
INA, Y. 1995. New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190-226.
JAKUBCZAK, J. L., Y. XIONG, and T. H. EICKBUSH. 1990. Type I (R1) and type II (R2) ribosomal DNA insertions of Drosophila melanogaster are retrotransposable elements closely related to those of Bombyx mori. J. Mol. Biol. 212:37-52.
KUROSE, K., K. HATA, M. HATTORI, and Y. SAKAKI. 1995. RNA polymerase III dependence of the human L1 promoter and possible participation of the RNA polymerase II factor YY1 in the RNA polymerase III transcription system. Nucleic Acids Res. 23:3704-3709.
LEETON, P. R. J., and D. R. SMYTH. 1993. An abundant LINE-like element amplified in the genome of Lilium speciosum. Mol. Gen. Genet. 237:97-104.
LI, W.-H., C.-I. WU, and C.-C. LUO. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.
LOEB, D. D., R. W. PADGETT, S. C. HARDIES, W. R. SHEHEE, M. B. COMER, M. H. EDGELL, and C. A. HUTCHISON III. 1986. The sequence of a large L1Md element reveals a tandemly repeated 5′ end and several features found in retrotransposons. Mol. Cell. Biol. 6:168-182.
LUAN, D. D., and T. H. EICKBUSH. 1995. RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol. Cell. Biol. 15:3882-3891.
LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK, and T. H. EICKBUSH. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595-605.
MCLEAN, C., A. BUCHETON, and D. J. FINNEGAN. 1993. The 5′ untranslated region of I factor, a long interspersed nuclear element-like retrotransposon of Drosophila melanogaster, contains an internal promoter and sequences that regulate expression. Mol. Cell. Biol. 13:1042-1050.
MARTIN, F., M. OLIVARES, and M. C. LOPEZ. 1996. Do non-long terminal repeat retrotransposons have nuclease activity? Trends Biochem. Sci. 21:283-285.
MATHEWS, D. H., A. R. BANERJEE, D. D. LUAN, T. H. EICKBUSH, and D. H. TURNER. 1997. Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element. RNA 3:1-16.
MINAKAMI, R., K. KUROSE, K. ETOH, Y. FURUHATA, M. HATTORI, and Y. SAKAKI. 1992. Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element. Nucleic Acids Res. 20:3139-3145.
MINCHIOTTI, G., C. CONTURSI, and P. P. DI NOCERA. 1997. Multiple downstream promoter modules regulate the transcription of the Drosophila melanogaster I, Doc and F elements. J. Mol. Biol. 267:37-46.
MIZROKHI, L. J., S. G. GEORGIEVA, and Y. V. ILYIN. 1988. Jockey, a mobile Drosophila element similar to mammalian LINEs, is transcribed from the internal promoter by RNA polymerase II. Cell 54:685-691.
MORAN, J. V., S. E. HOLMES, T. P. NAAS, R. J. DEBERARDINIS, J. D. BOEKE, and H. H. KAZAZIAN. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87:917-927.
MOUCHÈS, C., N. BENSAADI, and J.-C. SALVADO. 1992. Characterization of a LINE retroposon dispersed in the genome of three non-sibling Aedes mosquito species. Gene 120:183-190.
MURRE, C., G. BAIN, M. A. VAN DIJK, I. ENGEL, B. A. FURNARI, M. E. MASSARI, J. R. MATTHEWS, M. W. QUONG, R. R. RIVERA, and M. H. STUIVER. 1994. Structure and function of helix-loop-helix proteins. Biochim. Biophys. Acta 1218:129-135.
MURRE, C., P. S. MCCAW, and D. BALTIMORE. 1989. A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD, and myc proteins. Cell 56:777-783.
MURRE, C., P. S. MCCAW, H. VAESSIN et al. (12 co-authors). 1989. Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58:537-544.
O'HARE, K., M. R. K. ALLEY, T. E. CULLINGFORD, A. DRIVER, and M. J. SANDERSON. 1991. DNA sequence of the Doc retroposon in the white-one mutant of Drosophila melanogaster and of secondary insertions in the phenotypically altered derivatives white-honey and white-eosin. Mol. Gen. Genet. 225:17-24.
OHSHIMA, K., M. HAMADA, Y. TERAI, and N. OKADA. 1996. The 3′ ends of tRNA-derived short interspersed repetitive elements are derived from the 3′ ends of long interspersed repetitive elements. Mol. Cell. Biol. 16:3756-3764.
OHSHIMA, K., R. KOISHI, M. MATSUO, and N. OKADA. 1993. Several short interspersed repetitive elements (SINEs) in distant species may have originated from a common ancestral retrovirus: characterization of a squid SINE and a possible mechanism for generation of tRNA-derived retroposons. Proc. Natl. Acad. Sci. USA 90:6260-6264.
OHTA, T. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.
OKADA, N. 1991a. SINEs. Curr. Opin. Genet. Dev. 1:498-504.
---. 1991b. SINEs: short interspersed repeated elements of the eukaryotic genome. Trends Ecol. Evol. 6:358-361.
OKADA, N., M. HAMADA, I. OGIWARA, and K. OHSHIMA. 1997. SINEs and LINEs share common 3′ sequences: a review. Gene (in press).
OKADA, N., and K. OHSHIMA. 1995. Evolution of tRNA-derived SINEs. Pp. 61-79 in R. J. MARAIA, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Austin, Tex.
OSIEWACZ, H. D., and K. ESSER. 1984. The mitochondrial plasmid of Podospora anserina: a mobile intron of a mitochondrial gene. Curr. Genet. 8:299-305.
PRIIMÄGI, A. F., L. J. MIZROKHI, and Y. V. ILYIN. 1988. The Drosophila mobile element jockey belongs to LINEs and contains coding sequences homologous to some retroviral proteins. Gene 70:253-262.
ROGERS, J. H. 1985. The structure and evolution of retroposons. Int. Rev. Cytol. 93:231-279.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SANCHEZ-GARCIA, I., and T. H. RABBITTS. 1994. The LIM domain: a new structural motif found in zinc-finger-like proteins. Trends Genet. 10:315-320.
SCHWARZ-SOMMER, Z., L. LECLERCQ, E. GÖBEL, and H. SAEDLER. 1987. Cin4, an insert altering the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons. EMBO J. 6:3873-3880.
SHEEN, F.-M., and R. W. LEVIS. 1994. Transposition of the LINE-like retrotransposon TART to Drosophila chromosome termini. Proc. Natl. Acad. Sci. USA 91:12510-12514.
SILVA, R., and J. B. E. BURCH. 1989. Evidence that chicken CR1 elements represent a novel family of retroposons. Mol. Cell. Biol. 9:3563-3566.
SMIT, A. F. A. 1996. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6:743-748.
SMIT, A. F. A., G. TOTH, A. D. RIGGS, and J. JURKA. 1995. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246:401-417.
STUMPH, W. E., P. KRISTO, M.-J. TSAI, and B. W. O'MALLEY. 1981. A chicken middle-repetitive DNA sequence which shares homology with mammalian ubiquitous repeats. Nucleic Acids Res. 9:5383-5397.
SWERGOLD, G. D. 1990. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. Biol. 10:6718-6729.
UDOMKIT, A., S. FORBES, G. DALGLEISH, and D. J. FINNEGAN. 1995. BS, a novel LINE-like element in Drosophila melanogaster. Nucleic Acids Res. 23:1354-1358.
VANDERGON, T. L., and M. REITMAN. 1994. Evolution of chicken repeat 1 (CR1) elements: evidence for ancient subfamilies and multiple progenitors. Mol. Biol. Evol. 11:886-898.
WEINER, A. M., P. L. DEININGER, and A. EFSTRATIADIS. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631-661.
WEINTRAUB, H., R. DAVIS, D. LOCKSHON, and A. LASSAR. 1990. MyoD binds cooperatively to two sites in a target enhancer sequence: occupancy of two sites is required for activation. Proc. Natl. Acad. Sci. USA 87:5623-5627.
WHITCOMB, J. M., and S. H. HUGHES. 1992. Retroviral reverse transcription and integration: progress and problems. Annu. Rev. Cell Biol. 8:275-306.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.
---. 1993. Dong, a non-long terminal repeat (non-LTR) retrotransposable element from Bombyx mori. Nucleic Acids Res. 21:1318.
ZIMMERLY, S., H. GUO, P. S. PERLMAN, and A. M. LAMBOWITZ. 1995. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82:545-554.
MITIKO GO, reviewing editor
Accepted August 14, 1997