Determination of the Entire Sequence of Turtle CR1: The First Open Reading Frame of the Turtle CR1 Element Encodes a Protein with a Novel Zinc Finger Motif

Masaki Kajikawa, Kazuhiko Ohshima, and Norihiro Okada
Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Japan

CR1 elements are a family of retroposons. They are classified as long interspersed elements (LINEs) or non-long-terminal-repeat (non-LTR) retrotransposons, and they have been found in the genomes of many vertebrates. However, they have been only partially characterized, and only a 2-kb region of the 3’ end of chicken CR1 has been sequenced. In the present study, we determined the entire consensus sequence of CR1 elements in the turtle genome, designated PsCR1. The first open reading frame (ORF1) of PsCR1 has two unusual arrangements of Cys residues. One of them includes a zinc finger motif, CX2CX14CX2C. The putative zinc finger has cysteine residues with identical spacing and a similar amino acid composition to those found in the species-specific transcription initiation factors SL1 and TIF-IB. The 5’ untranslated region (5’ UTR) of PsCR1 contains a sequence similar to part of the human L1 promoter, L1 site A, and several cis elements of the type found in eukaryotic genes. Within a region of about 500 bp, there are nine “E boxes,” cis elements that are recognized by the basic helix-loop-helix (bHLH) family of proteins. This observation raises the possibility that cellular transcription factors that bind to these sequences might act in concert to regulate the expression of PsCR1. The extent of the sequence divergence of the 3’ UTR of CR1 between species was found to be lower than the rate of nonsynonymous substitutions per site in ORF2, suggesting that a strict functional constraint must exist for this region. This result strongly suggests that the conserved 3’-end sequence of CR1 is the recognition site for the reverse transcriptase of CR1.
A discussion is presented of a possible mechanism for the integration of CR1 elements and also of the intriguing possible recruitment of the reverse transcriptase for the retroposition of SINEs.

Key words: CR1 element, LINE, retrotransposon, zinc finger, SL1/TIF-IB, SINE.

Address for correspondence and reprints: Norihiro Okada, Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, 226, Japan. E-mail: [email protected].

Mol. Biol. Evol. 14(12):1206-1217. 1997
© 1997 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

The reverse flow of genetic information from RNA to DNA is known as retroposition, and each transposed informational element is known as a retroposon (Rogers 1985; Weiner, Deininger, and Efstratiadis 1986). Retroposons that encode a reverse transcriptase (RTase) for replication of their genomes can be divided into three groups, namely, non-long-terminal-repeat (non-LTR) retrotransposons (also known as LINEs; hereafter this nomenclature will be used), LTR retrotransposons, and retroviruses (Fanning and Singer 1987; Doolittle et al. 1989; Eickbush 1994; Smit 1996). Retroviruses and LTR retrotransposons replicate their genomes via a complex reverse-transcription process, and the corresponding mechanism for retrotransposition is well understood (Boeke and Chapman 1991; Whitcomb and Hughes 1992). By contrast, the mechanism responsible for retrotransposition of LINEs remains to be fully elucidated (Eickbush 1994).

An essential step in the retrotransposition of LINEs is their initial transcription. Several LINEs, such as Drosophila jockey and I and human LINE-1 (L1), have been shown to have promoter sequences within the 5’-end regions of LINEs that can initiate transcription at the first nucleotide of the element (Mizrokhi, Georgieva, and Ilyin 1988; Swergold 1990; Minakami et al. 1992; McLean, Bucheton, and Finnegan 1993; Minchiotti, Contursi, and Di Nocera 1997). Drosophila jockey is transcribed by RNA polymerase II via an internal promoter (Mizrokhi, Georgieva, and Ilyin 1988), while the human L1 promoter is pol III-dependent (Kurose et al. 1995). It has been suggested, in contrast, that LINEs that have evolved with target-site specificity must be inserted adjacent to a reliable exogenous promoter sequence for their transcription (Eickbush 1994). Thus, the details of the molecular mechanisms of transcription of most LINEs remain to be defined.

Additional critical steps in retrotransposition are reverse transcription and integration. Most LINE elements are truncated at various positions in their 5’ regions, the lengths of which range from 100 to 1,000 bp (Hutchison et al. 1989; Eickbush 1994). The existence of truncated forms indicates that an RTase encoded by a LINE must recognize the 3’ end of the RNA template and that it might use the free 3’ ends of breaks in chromosomal DNA as primers for initiation of first-strand synthesis (Schwarz-Sommer et al. 1987; Eickbush 1992). This model was verified by Luan et al. in elegant experiments with R2Bm of Bombyx mori (Luan et al. 1993; Luan and Eickbush 1995). In the case of R2Bm, the R2 protein makes a specific nick in one of the DNA strands at the insertion site in the 28S rRNA gene and uses the 3’ hydroxyl group exposed by this nick to prime reverse transcription of its RNA transcript. Furthermore, the recent finding that the reverse transcription of a group II intron, aI2, of yeast mitochondrial DNA is also accomplished by analogous target-DNA-primed reverse transcription supports the generality of such a mechanism (Zimmerly et al. 1995). However, in contrast to the results for R2Bm, it was shown recently that the 3’ untranslated region (UTR) of human L1 is not essential for its retrotransposition in cultured mammalian cells (Moran et al. 1996). Therefore, it remains essential to determine how the RTases of L1 and other LINEs recognize their RNA templates.
In R2Bm, a protein with both sequence-specific endonuclease activity and RTase activity is encoded by a single open reading frame (ORF) (Luan et al. 1993). Several LINEs, including human L1, encode an endonuclease-like domain in the second ORF, which resembles amino acid sequences of AP endonucleases (Feng et al. 1996; Martin, Olivares, and López 1996). The encoded endonucleolytic activity of human L1 has been verified biochemically (Feng et al. 1996). These findings raise the possibility that generation of a nick at the target site and reverse transcription might be coupled, even in the case of LINEs that have no apparent target site specificity.

CR1 elements were first described as members of a SINE family (Stumph et al. 1981). The number of CR1 elements in the chicken genome was estimated as 7,000-30,000 from the results of hybridization experiments (Stumph et al. 1981; Burch, Davis, and Haas 1993) and as 100,000 by sequence analysis (Vandergon and Reitman 1994). Since most members of this family were extensively truncated at their 5’ ends, no ORF with significant similarity to ORFs that encode known polypeptides was identified (Haché and Deeley 1988). CR1 elements were subsequently detected in representatives of nine orders that encompass a wide spectrum of species in the class Aves (Chen et al. 1991). More recently, long members of the CR1 family encoding an ORF segment were isolated, and the consensus sequence of CR1 elements was extended for up to 2,200 bp from the 3’ end (Burch, Davis, and Haas 1993). In view of the fact that CR1 elements have common 3’ ends and variable 5’ truncations and of the finding that they contain a pol-like ORF, Burch, Davis, and Haas (1993) concluded that CR1 elements are members of a LINE family. The consensus sequence of 2.2 kb of CR1 was estimated to correspond to roughly half of the entire length of CR1.
Burch, Davis, and Haas (1993) also found that CR1-like sequences in mouse and human (mammals), a frog (an amphibian), and a ray (a cartilaginous fish) had been deposited in DNA databases. Vandergon and Reitman (1994) detected sequences similar to avian CR1 elements in a lizard (a reptile). In addition, they noted the similarity between the avian CR1 element and the tortoise SINE. Therefore, CR1-like elements appear to exist in all classes of vertebrates. As far as we know, the CR1 element provides the only example of a LINE family with phylum-wide distribution.

We recently reported the isolation of CR1-like elements with 5’ truncations from the turtle genome (Ohshima et al. 1996). We showed that the sequence at the 3’ end of tortoise SINEs was identical to that of the CR1 element in the turtle genome (Ohshima et al. 1996). This result suggested that the tortoise SINEs might have borrowed enzymes for their retroposition from the CR1 elements in the turtle genome (Ohshima et al. 1996; see also a recent review by Okada et al. 1997). In the present work, we determined the entire consensus sequence of CR1 in the turtle genome in order to obtain further information about the sharing of retropositional machinery between SINEs and LINEs.

Materials and Methods

Determination of the Consensus Sequence of PsCR1 Elements by Genomic DNA Walking

To determine the 5’ upstream sequence from the 5’ end of 4-a, which contains 2.1 kb of the CR1 element from the 3’ end (Ohshima et al. 1996), we employed the method of genomic DNA walking that is known as cassette PCR. In principle, the method was employed as described in our previous study (Ohshima et al. 1996), with modifications as follows. Restriction enzymes, EcoRI, HindIII or PstI, and the corresponding cassettes were purchased from Takara (Shiga, Japan).
Fragments of approximately 500-1,000 bp, obtained as products of PCR, were isolated from an agarose gel and fractionated on SizeSep 400 Spun Columns (Pharmacia, Uppsala, Sweden) to remove fragments shorter than 400 bp. Longer fragments were ligated into the pUC18 or pUC19 vector, and then the nucleotide sequences of the cloned DNAs were determined. The consensus sequence of PsCR1 was determined from these sequences. We repeated this series of experiments eight times and determined the entire consensus sequence of PsCR1. For the last 2.1-kb region of the 3’ end of PsCR1, we constructed the consensus sequence from sequences of several genomic clones that had been isolated from a genomic library of the side-necked turtle (Platemys spixii) with oligonucleotides complementary to 4-2(Ps) as probes.

Designations of Nucleotides in the Consensus Sequence of PsCR1

The nucleotide of every position in the consensus sequence of PsCR1 was determined from the results obtained from at least three clones. When two nucleotides were present predominantly at a certain position in the consensus sequence, the nucleotides were represented by the two-base ambiguity code (IUB single-letter code): K, M, R, S, W or Y. When dinucleotides in the consensus sequence were CG, TG, or CA between clones, the consensus sequence of the dinucleotide was shown to be CG. The divergence might possibly have resulted from methylation and subsequent deamination. The entire consensus sequence of PsCR1 has been deposited in the DDBJ, EMBL, and GenBank nucleotide databases with accession number ABOO 1.

Search of Databases and Phylogenetic Analysis

A search was made for sequences homologous to PsCR1 both at the nucleotide level and at the amino acid level using the BLAST program (Altschul et al. 1990). Construction of a phylogenetic tree and calculation of bootstrap values were performed using programs in the PHYLIP package (Felsenstein 1995).
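The rule for designating consensus nucleotides described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the 30% cutoff used to decide that a second nucleotide is "predominant," and the toy clone sequences are assumptions made only for illustration, and the CG/TG/CA rule for dinucleotides is not implemented.

```python
from collections import Counter

# IUB two-base ambiguity codes named in the text.
TWO_BASE_CODES = {
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("CT"): "Y",
}


def consensus_base(column, min_clones=3, second_fraction=0.3):
    """Return a consensus symbol for one column of aligned clones.

    min_clones follows the "at least three clones" rule in the text;
    second_fraction is a hypothetical cutoff, not taken from the paper.
    """
    bases = [b for b in column.upper() if b in "ACGT"]  # skip gaps
    if len(bases) < min_clones:
        return "N"  # too few clones cover this position
    counts = Counter(bases).most_common()
    top, _ = counts[0]
    if len(counts) > 1:
        second, second_n = counts[1]
        if second_n / len(bases) >= second_fraction:
            return TWO_BASE_CODES[frozenset((top, second))]
    return top


def consensus(aligned_clones):
    """Build a consensus string from equal-length aligned sequences."""
    return "".join(consensus_base("".join(col)) for col in zip(*aligned_clones))


# Hypothetical aligned clones (not data from the paper).
print(consensus(["ACGT", "ACGT", "ATGT", "ATGA"]))
```

In this toy alignment the second column is split evenly between C and T, so it is reported as Y, while the single A in the last column falls below the assumed cutoff and the column stays T.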
Estimation of the Copy Number of PsCR1

Dot blot analysis was performed to estimate the copy numbers of PsCR1 with various 5’ truncations. Progressively decreasing amounts of genomic DNA from P. spixii and of cloned DNA were dotted on a membrane. Four kinds of probe were prepared as follows. A DNA fragment of approximately 160 bp was amplified by PCR, with a cloned DNA that was nearly identical to the consensus sequence of PsCR1 as template and a set of primers for amplification of a particular region of PsCR1 (nucleotides 310-471, 1031-1197, 2354-2511, and 3600-3774, respectively), in the presence of [α-32P]dCTP. Hybridization was performed at 42°C in 50% formamide. Washing was performed in a solution of 2× SSC and 1% SDS at 55°C for 60 min. From comparisons of the intensities of spots obtained with the genomic DNA and the cloned DNA, we were able to estimate the copy number. The haploid genome of the turtle was assumed to contain 2 × 10^9 bp.

[Figure 1: panel A, clones with various 5’ truncations drawn as bars against a scale in kb; panel B, schematic of PsCR1 showing the 5’ UTR, ORF1, ORF2, the 3’ UTR, and the 3’-terminal repeats, with positions 4393 and 4480 bp marked.]

FIG. 1. Determination of the entire consensus sequence of CR1 elements from the turtle, designated PsCR1. A, Nucleotide sequences of respective clones of PsCR1 with various 5’ truncations (represented by bars) were determined. From these sequences, the consensus sequence for PsCR1 was determined. B, Schematic representation of PsCR1 (see text for details).

Results and Discussion

Determination of the Entire Consensus Sequence of the CR1 Elements in the Turtle

In a previous study, we isolated CR1-like elements from the turtle genome (Ohshima et al. 1996). One clone, designated 4-I, exhibited extensive similarity to chicken CR1 in the region of the 2.1-kb EcoRI fragment (64% similarity over the entire 2.1 kb). We tried to determine the farther-upstream sequence of CR1 elements to characterize the entire structure of this family.
Because these elements have 5’ truncations at various positions and, moreover, because only 2 kb of the sequence of chicken CR1 had been reported from the 3’ end, we adopted the strategy of gradual extension in the 5’ direction. First, the genomic DNA of the side-necked turtle was digested with a restriction enzyme, and cassettes were ligated to the fragments. Next, we synthesized oligonucleotides complementary to the turtle CR1 sequence that had already been determined. Using these oligonucleotides and those complementary to the cassette, we performed nested PCR. Products of PCR were cloned and their sequences were determined. The clones that we obtained had variable 5’ truncations, and we determined the consensus sequence from these sequences (fig. 1A). We repeated this series of experiments until we had constructed the entire consensus sequence of turtle CR1 elements.

The First ORF of Turtle CR1 Encodes a Protein with a Novel Zinc Finger Motif

Figure 1B shows the structure of turtle CR1, designated PsCR1 (Ps stands for Platemys spixii). PsCR1 contains two overlapping ORFs. ORF1 begins at a position 474 bp from the 5’ terminus and encodes a protein of 334 amino acids from the first ATG codon. To date, more than 30 full-length sequences of LINEs have been determined. The sequences generally encode one or two type(s) of cysteine-rich motif. One motif is CX2CX4HX4C, which is characteristic of retroviral gag genes and has been identified in ORF1 of many of the LINEs described to date (Jakubczak, Xiong, and Eickbush 1990; Leeton and Smyth 1993). The gag protein is a nucleocapsid protein, and the zinc-finger-like motif (Berg 1986, 1990; Sanchez-Garcia and Rabbitts 1994) in the gag protein is essential for the specific packaging of viral RNA (Gorelick et al. 1988). Although most LINEs encode this motif, several LINEs, such as L1 (Hohjoh and Singer 1996), Dong (Xiong and Eickbush 1993) and R4 (Burke, Müller, and Eickbush 1995), do not.
Another cysteine-rich motif is CX1-3CX7-8HX4C, which has been identified at the carboxyl terminus of the protein encoded by ORF2 of many LINEs downstream of the reverse transcriptase domain (Jakubczak, Xiong, and Eickbush 1990; Leeton and Smyth 1993). The function of this cysteine-rich motif is currently unknown. Several LINEs, such as F (Di Nocera and Casari 1987), Jockey (Priimagi, Mizrokhi, and Ilyin 1988), Doc (O’Hare et al. 1991), Juan-A (Mouches, Bensaadi, and Salvado 1992), Juan-C (Agarwal et al. 1993), NLR1Cth (Blinov et al. 1993), TART (Sheen and Levis 1994), and BS1 (Udomkit et al. 1995), lack this motif. At present, it appears that only a few LINEs, such as T1 (Besansky 1990) and Q (Besansky, Bedell, and Mukabayire 1994), lack both these motifs.

[Figure 2: panel A, schematic of ORF1 (334 aa) with the two cysteine clusters and the zinc finger motif CX2CX14CX2C marked; panel B, alignment of the zinc-finger-like motifs of PsCR1, SL1 (hTAFI63), and TIF-IB (mTAFI68).]

FIG. 2. The first ORF (ORF1) of PsCR1 encodes a protein with a zinc finger motif which resembles that of the species-specific transcription factors SL1 and TIF-IB. A, ORF1 of PsCR1 has cysteine residues with unusual spacings. The constantly spaced cysteine residues are denoted by “C” and the spacings are shown by numbers beside “X,” which stands for any amino acid. At the beginning of the cysteine cluster, from residues 11 to 32, there is a zinc-finger-like motif, CX2CX14CX2C. The numbers above the line are numbers of residues from the 5’ end of ORF1. B, The zinc-finger-like motif has cysteine residues with identical spacings and a similar amino acid composition to those found in transcription factors SL1 (Comai et al. 1994) and TIF-IB (Heix et al. 1997). Identical or similar amino acid residues in the three sequences are shaded. The cysteine residues that can potentially form a zinc finger are emphasized by shaded boxes.
The numbers at the beginning and end of each sequence indicate the numbers of residues from the 5’ end of the deduced protein.

PsCR1 also lacks both of the two motifs discussed above. Instead, PsCR1 has two unusual arrangements of Cys residues. ORF1 of PsCR1 encodes a protein with constant spacing between Cys residues as follows: CX20CX21CX19C (from residues 11 to 74) and CX&X&X& (from residues 120 to 217). The former motif includes a zinc-finger-like motif, CX2CX14CX2C (Berg 1986, 1990; Sanchez-Garcia and Rabbitts 1994) (fig. 2A). This zinc-finger-like motif has cysteine residues with identical spacings and a similar amino acid composition to those found in transcription factors SL1 (Comai et al. 1994) and TIF-IB (Heix et al. 1997) (fig. 2B). SL1 consists of a TATA-binding protein (TBP) and three TBP-associated factors (TAFs). One of the latter factors, TAFI63, contains two putative zinc fingers, CX2CX14CX2C and CX,HX1sHX3C (Comai et al. 1994). TAFI63 can be cross-linked to the rDNA promoter, and it has been shown to be involved in the binding of SL1 to this promoter (Beckmann et al. 1995). mTAFI68, the murine homolog of human TAFI63, also includes the corresponding zinc finger (fig. 2B). Although this similarity suggests that the putative zinc finger of PsCR1 might play a role in DNA binding, such an activity of ORF1 proteins has not been demonstrated. It remains to be seen whether the zinc finger of PsCR1 has a function in DNA recognition or participates in an alternative method of RNA binding.

The Product of ORF2 of PsCR1 Contains a Putative Endonuclease Domain in its Amino-Terminal Region

ORF2 (from position 1475 to position 4393) encodes a protein of 963 amino acids, which starts from the first ATG codon at position 1505. The beginning of ORF2 overlaps ORF1 by 22 bp in the -1 reading frame. The putative protein product contains the conserved domains found in all reverse transcriptases (Xiong and Eickbush 1990) from residue 509 to residue 773 (fig. 3).
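The cysteine-spacing notation used for the motifs in this section (for example CX2CX14CX2C, CX2CX4HX4C, and CX1-3CX7-8HX4C) maps directly onto a regular expression, which gives one way to scan a deduced protein sequence for such arrangements. The following is a minimal sketch, not the search the authors performed; the helper names and the toy protein sequence are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Translate spacing notation such as 'CX2CX14CX2C' into a regex.

    'X<n>' stands for n residues of any kind; 'X<a>-<b>' for a to b residues.
    """
    def spacer(match):
        low, high = match.group(1), match.group(2)
        return ".{%s,%s}" % (low, high) if high else ".{%s}" % low

    return re.sub(r"X(\d+)(?:-(\d+))?", spacer, motif)


def find_motif(protein, motif):
    """Return (start, end) residue numbers (1-based, inclusive) of each hit.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(%s))" % motif_to_regex(motif))
    return [(m.start() + 1, m.start() + len(m.group(1)))
            for m in pattern.finditer(protein.upper())]


# Hypothetical toy sequence (not the PsCR1 ORF1 product).
toy_protein = "MK" + "CAAC" + "G" * 14 + "CSSC" + "LL"
print(find_motif(toy_protein, "CX2CX14CX2C"))
```

A CX2CX14CX2C hit always spans 22 residues, which agrees with the span reported in the text for the PsCR1 motif (residues 11 to 32).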
The amino acid sequence of this PsCR1 RTase indicates that PsCR1 is most closely related to a group of LINE families that includes the chicken CR1 (Burch, Davis, and Haas 1993) and the mosquito T1 elements (Besansky 1990). Sequences from Caenorhabditis elegans that encode putative reverse transcriptase domains were identified in the nucleic acid database, and they are also closely related to this group of sequences (accession numbers and locations: U46668, F38E9.3; U57054, B0478.2; U64846, F47D2.2; and many others; figs. 3 and 4). These sequences also contain a region that corresponds to the endonuclease domain (see below). We suggest that they belong to a family of LINEs in the genome of C. elegans. We shall refer to these elements collectively as CeCRT, which stands for the LINEs of C. elegans that resemble CR1 and T1.

The amino-terminal amino acid sequence encoded by ORF2 of PsCR1 (49-259) was remarkably similar to the corresponding regions encoded by the T1 element and Q element of Anopheles (Besansky, Bedell, and Mukabayire 1994) and, to a lesser extent, by NLR1Cth of Chironomus (Blinov et al. 1993), Juan-C of Culex (Agarwal et al. 1993) and the putative LINE family of C. elegans mentioned above (fig. 5). Recently, it was reported that the corresponding regions encoded by several LINE families, including the Q element, have several domains that are highly homologous to members of the AP endonuclease family and that the active residues of exonuclease III are included in these domains (Martin, Olivares, and López 1996). Figure 5 shows an alignment of the amino acid sequence encoded by PsCR1 and the sequences encoded by LINEs, together with part of human endonuclease I for comparison. The deduced amino acid sequence encoded by PsCR1 corresponds closely to the domains defined by Martin, Olivares, and López (1996), in particular to domains I, II, III, V, VI, VIII, and IX. These similarities suggest the potential endonucleolytic activity of the PsCR1 protein in this region.

[Figure 3: alignment of the reverse transcriptase domains of PsCR1 (turtle), CR1 (chicken), CR1-like (frog), T1 (mosquito), and CeCRT (nematode).]

FIG. 3. Comparison of amino acid sequences in the reverse transcriptase domains encoded by PsCR1 and several LINEs that gave the highest homology scores in this region. Dots denote amino acids identical to those in the PsCR1 product. Amino acids with chemical properties similar to those in the PsCR1 product are indicated in boldface. Gaps (-) have been introduced to maximize homology. The conserved domains found in all reverse transcriptases (Xiong and Eickbush 1990) are boxed. The numbers of residues from the 5’ end of ORF2 are shown. Sources of sequences are as follows: CR1 (Burch, Davis, and Haas 1993), T1 (Besansky 1990), CeCRT (DDBJ/EMBL/GenBank, accession number U46668; gene location F38E9.3), and frog CR1-like element (a consensus sequence in sequences with accession numbers M24187 and X71067; 5’ truncations are found).

[Figure 4: phylogenetic tree of PsCR1, CeCRT, and other LINEs.]

FIG. 4. Phylogenetic relationships among products of PsCR1 and CeCRT and other LINEs. The phylogenetic tree is based on seven amino acid domains that contain a total of 178 residues and have been identified in all reverse transcriptases (Xiong and Eickbush 1990). The tree was constructed by the neighbor-joining method (Saitou and Nei 1987). The numbers above the branches indicate the bootstrap values per 100 replications, which provide an indication of the statistical significance of the nodes. A group II intron was used as an outgroup to root the tree, as described by Burke, Müller, and Eickbush (1995). Sources of sequences are as follows: R2Bm (Burke, Calalang, and Eickbush 1987); R2Dm (Jakubczak, Xiong, and Eickbush 1990), L1Hs (Hattori et al. 1986), L1Md (Loeb et al. 1986), Jockey (Priimagi, Mizrokhi, and Ilyin 1988), NLR1Cth (Blinov et al. 1993), Juan-A (Mouches, Bensaadi, and Salvado 1992), Q (Besansky, Bedell, and Mukabayire 1994), and al-pa (Osiewacz and Esser 1984). The CR1-like elements in the genome of Xenopus laevis are designated XlCR1 in this figure.

The 5’ Untranslated Region of PsCR1 Contains a Sequence that Resembles the Human L1 Promoter and Several cis Elements Found in Eukaryotic Genes

The 5’ untranslated region (5’ UTR) of PsCR1 is 473 bp long. This region contains three sets of direct repeats (DRs) (fig. 6A). One is 48 bp long with an interval of 304 bp (double underlined), another is 19 bp long with an interval of 19 bp (underlined), and the other is 23 bp long with an interval of 28 bp (dashed line). Several deletions or insertions, ranging from several nucleotides to several dozen nucleotides, were found in the 5’ UTR and, in particular, in the regions of DRs of various clones (not shown). These observations suggest that the region might have undergone frequent recombinational events.

Minakami et al. (1992) showed that the nucleotide sequence of human L1 from position 3 to position 26 promoted expression of the gene for chloramphenicol acetyltransferase (CAT) in HeLa cells, and they designated this region L1 site A of the human L1 promoter (fig. 6B). In the nucleotide sequence of PsCR1, we found that the sequence from nucleotide (nt) 58 to nt 65 was identical to that of the first eight nucleotides of L1 site A, as shown by a hatched box in figure 6A. This coincidence suggests the presence of a common transcription factor that binds to the corresponding sites of PsCR1 and human L1. However, the nine nucleotides downstream of this site in PsCR1 show no significant homology to the corresponding region in L1 site A, which is the target core element for the pol II transcription factor YY1 (Becker et al. 1993; Kurose et al. 1995). Therefore, the putative protein that might bind to the common site in PsCR1 and human L1 is probably different from YY1.

The region of PsCR1 corresponding to the core element for binding of YY1 to human L1 is replaced exactly by an “E box,” the cis element for binding of the basic helix-loop-helix (bHLH) family of proteins (Murre, McCaw, and Baltimore 1989; Murre et al. 1994), which are regulatory factors essential for determination of cell type, such as members of the MyoD family. These proteins bind as dimers to DNA sequences that generally share the consensus CANNTG (the E box; Murre et al. 1989) (fig. 6B). Putative E boxes, including the one mentioned above, are clustered in the 5’ UTR of PsCR1 (boxed in fig. 6A). Within a region of about 500 bp, there are nine E boxes. In addition to the E boxes, other potential binding sites for c-myb (Howe, Reakes, and Watson 1990) are also found in this region (fig. 6A).
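Because the E box is defined by the short consensus CANNTG, putative sites can be located in a nucleotide sequence with a simple pattern scan. The sketch below is one assumed way to do this, not the procedure used in the paper; the function name and the test sequence are hypothetical, and positions carrying ambiguity codes (such as R or Y in a consensus sequence) are simply not matched.

```python
import re

# CANNTG, with N restricted to unambiguous bases. The lookahead lets
# overlapping E boxes be reported. As a pattern, CANNTG is its own
# reverse complement, so scanning one strand is sufficient.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(sequence):
    """Return (position, site) pairs for each putative E box.

    Positions are 1-based and refer to the first nucleotide of the site.
    """
    sequence = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in E_BOX.finditer(sequence)]


# Hypothetical test sequence with two overlapping sites.
print(find_e_boxes("ttCACATGTGaa"))
```

Counting the returned sites within a window of interest (for example, the roughly 500-bp 5’ UTR) gives the kind of tally reported in the text.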
The biological significance of such sequences in PsCR1 is unknown. However, the possibility that cellular transcription factors that bind to these sequences might act in concert to regulate the expression of PsCR1 is clearly of interest. Cooperation between different E boxes on the same promoter and cooperative binding of bHLH proteins with another class of transactivators have been generally recognized in the regulation of transcription of tissue-specific genes (Weintraub et al. 1990; Genetta, Ruezinsky, and Kadesch 1994; Di Rocco et al. 1997).

[Figure 5: alignment of the amino-terminal regions encoded by ORF2 of PsCR1, CeCRT, Q, T1, NLR1Cth, and Juan-C with part of human AP ENase I; domains I, II, III, V, VI, VIII, and IX are marked.]

FIG. 5. The amino-terminal region of the deduced product of ORF2 contains a putative endonuclease domain. The amino acid sequence of the amino-terminal region of PsCR1 and those encoded by several LINEs with strong similarity to PsCR1 in this region are aligned. When, of six amino acids at a certain position, at least four amino acids have similar chemical properties, they are highlighted in gray. The sequences are also compared with part of human endonuclease I (AP ENase I). When a residue in human endonuclease I has similar chemical properties to those of residues encoded by LINEs that are highlighted in gray, the residue in the endonuclease is also highlighted. The conserved nuclease domains defined by Martin, Olivares, and López (1996) are indicated. Numbers in parentheses are the numbers of amino acid residues between conserved domains. The numbers at the beginning and end of each sequence are the numbers of residues from the 5’ end of the ORF.

[Figure 6: panel A, nucleotide sequence of the 5’ UTR of PsCR1 with E boxes and c-Myb sites labeled; panel B, comparison of L1 site A with the corresponding region of PsCR1, including the E box.]

FIG. 6. The 5’ untranslated region (5’ UTR) of PsCR1 contains sequences similar to those in the human L1 promoter and several cis elements that have been found in eukaryotic genes. A, E boxes and several binding sites for transcription factors are found in this region. The 5’ UTR of PsCR1 contains three sets of direct repeats, which are indicated by double underlining, underlining, and dashed underlining, respectively. The putative initiation codon is highlighted in black. B, A nucleotide sequence in the 5’ UTR of PsCR1 is identical to that of the first eight nucleotides of L1 site A (shaded box). The numbers at the beginning and end of L1 site A are the numbers of residues from the 5’ end of human L1 (Minakami et al. 1992). The region of PsCR1 corresponding to the core element for binding of YY1 in human L1 is replaced exactly by the E box, a cis element for binding of members of the basic helix-loop-helix family of proteins.

The 3’ Untranslated Regions of CR1 from Reptiles and a Bird Exhibit Strong Conservation Among Species

Figure 7 shows an alignment of the 3’-end sequences of CR1 elements from reptiles and a bird. The predicted amino acid sequences of the carboxy-terminal regions encoded by ORF2 from these four species (four top lines in fig. 7) seem to be strongly conserved.
[Figure 7: alignment of the 3’-end nucleotide sequences, and of the carboxy-terminal amino acid sequences encoded by ORF2, of CR1 elements from turtle, snake, lizard, and chicken.]

FIG. 7.-The 3’-end sequences of the CR1 elements from reptiles and a bird are compared. Dots indicate nucleotides identical to those in the sequence from turtle. The highly conserved regions in the 3’ UTR are shaded. The 8-bp direct repeats in the 3’ termini are indicated by arrows. In addition, the carboxy-terminal regions are compared. When, of four amino acids at a certain position, at least three amino acids have similar chemical properties, they are boxed. Numbers above the amino acids are the numbers of residues from the 5’ end of the protein encoded by ORF2 of PsCR1. Sources of sequences are as follows: turtle (PsCR1), this paper; snake, a consensus sequence for sequences with accession numbers D31777, D13384, D31782, and D31779 (5’ truncations are found); lizard, accession number L31503 (5’ truncation is found); and chicken (CR1; Burch, Davis, and Haas 1993).

Table 1
Rates of Nonsynonymous Substitution (dN) for the CR1 Element
Species compared: turtle, snake, lizard, and chicken.
Pairwise values, in the order printed: 0.471 ± 0.064, 0.359 ± 0.052, 0.357 ± 0.050, 0.340 ± 0.052, 0.491 ± 0.067, 0.463 ± 0.062.
NOTE.-dN values were calculated by the method of Ina (1995) from the last 288 bp (96 codons) of ORF2.

To determine whether the 3’ UTRs of these CR1s might be under some selective constraint, we calculated the nucleotide sequence divergences among the regions
that encode ORF2 and the 3’ UTRs from these species that are available (tables 1 and 2). The values for nonsynonymous substitutions per site (dN) ranged from 0.34 to 0.47 (table 1). In most cases, the value for synonymous substitutions per site (dS) was saturated (not shown). The dN value for the gene for β-globin in birds and mammals is 0.24 (Li, Wu, and Luo 1985), and that of the gene for β-crystallin is 0.07-0.12 (Aarts et al. 1989). Most protein-encoding genes of mammals have dN values that range from 0.005 to 0.211 (Ohta 1995). The results suggest that CR1 in each lineage has been under selective pressure with respect to expression of the protein product, since the value for synonymous substitutions is much higher than that for nonsynonymous substitutions. We were surprised that the value for the sequence divergence of the 3’ UTR between species was even lower than the dN value of ORF2 (table 2). These results suggest the presence of some strict functional constraint in this region. The results also reminded us of results for the R2 elements of Drosophila species. The value for the sequence divergence of the 3’ UTR of R2 elements between species was only twice the dN value and one third of the dS value in the coding region (Eickbush et al. 1995). During integration of R2 elements by the target DNA-primed mechanism, the R2 protein binds specifically to a region in the 3’ UTR of the RNA template to prime reverse transcription (Luan et al. 1993; Eickbush et al. 1995; Luan and Eickbush 1995; Mathews et al. 1997). Our demonstration that the 3’ UTR of CR1s has been under strict selective constraint suggests that the conserved 3’-end sequence of CR1s is also the recognition site for their reverse transcriptase.

The Possible Recruitment of CR1 Enzymes by the Tortoise SINE

The 3’ end of the chicken CR1 element is defined by the presence of an 8-bp direct repeat, 5’-(CATTCTRT)(GATTCTRT)-3’ (Silva and Burch 1989).
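The chicken repeat consensus uses the IUPAC code R (A or G). As a minimal sketch, assuming plain-string sequences, such a consensus can be compared position by position with an observed repeat unit. The helper name `mismatch_positions` is hypothetical and is not part of the original analysis. The reptilian units tested here are those reported in the text for fig. 7.

```python
import re

# Subset of IUPAC nucleotide codes; R = purine (A or G), Y = pyrimidine (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def mismatch_positions(consensus: str, unit: str) -> list[int]:
    """Return 0-based positions where `unit` is not allowed by `consensus`."""
    if len(consensus) != len(unit):
        raise ValueError("consensus and unit must have the same length")
    return [i for i, (code, base) in enumerate(zip(consensus, unit))
            if re.fullmatch(IUPAC[code], base) is None]

# Chicken CR1 repeat units (Silva and Burch 1989), as quoted in the text.
CHICKEN_UNITS = ("CATTCTRT", "GATTCTRT")
# Reptilian CR1 repeat units, as reported in the text (fig. 7).
REPTILE_UNITS = ("TATTCTAT", "GATTCTAT")

for consensus, unit in zip(CHICKEN_UNITS, REPTILE_UNITS):
    print(consensus, unit, mismatch_positions(consensus, unit))
```

Under these assumptions, the second reptilian unit fits the chicken consensus exactly and the first differs only at its first nucleotide.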
Almost the same repeat, 5’-(TATTCTAT)(GATTCTAT)-3’, is found in reptilian CR1s (fig. 7). Chicken CR1 elements have been integrated into preferred target sites that resemble the 3’ repeat units (Silva and Burch 1989). In the present study, we found that the amino-terminal region of the deduced product of ORF2 of turtle CR1 contains a putative endonuclease domain. The endonucleolytic activity of the product of the second ORF of human L1 was recently demonstrated biochemically (Feng et al. 1996; see Introduction). The CR1 endonuclease might cleave sequences that resemble the 3’ repeat units. Then the RTase of CR1 might prime reverse transcription from the free 3’ ends at nicked target sites that can hybridize to a repeat unit within the CR1 transcript. In this process, the conserved sequence in the 3’ UTR of CR1, mentioned above, might provide the recognition site for the RTase on the RNA template. The involvement of the target-DNA-primed mechanism in the reverse transcription of CR1 was first proposed by Burch, Davis, and Haas (1993).

We reported recently that the sequence at the 3’ end of CR1 in the turtle genome is nearly identical to that of a family of tortoise SINEs (tortoise Pol III/SINE; Ohshima et al. 1996). SINEs are short (approximately 80-400 bp) repetitive elements which have a composite structure with regions homologous to a tRNA region, a tRNA-unrelated region, and an AT-rich region (Okada 1991a, 1991b; Ohshima et al. 1993; Okada and Ohshima 1995).

Table 2
Nucleotide Sequence Divergence of the 3’ UTR of the CR1 Element
Species compared: turtle, snake, lizard, and chicken.
Pairwise values, in the order printed: 0.045 ± 0.033, 0.160 ± 0.063, 0.205 ± 0.072, 0.139 ± 0.066, 0.135 ± 0.059, 0.299 ± 0.087.
NOTE.-Distances were calculated by the method of Adachi and Hasegawa (1996) on the basis of the region shaded in figure 7 (56 informative sites).
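The statement that the 3’ UTR divergences are lower than the dN values of ORF2 can be checked against the point estimates in tables 1 and 2. The following is only a sketch: it uses the values in the order they are printed, without species-pair labels, and it ignores the standard errors. The function name `ranges_separated` is hypothetical.

```python
# Point estimates as printed in table 1 (dN, last 288 bp of ORF2).
DN = [0.471, 0.359, 0.357, 0.340, 0.491, 0.463]
# Point estimates as printed in table 2 (3' UTR sequence divergence).
UTR_DIVERGENCE = [0.045, 0.160, 0.205, 0.139, 0.135, 0.299]

def ranges_separated(low_group: list[float], high_group: list[float]) -> bool:
    """True if every value in low_group is below every value in high_group."""
    return max(low_group) < min(high_group)

# The largest 3' UTR divergence is compared with the smallest dN value.
print(max(UTR_DIVERGENCE), min(DN))
print(ranges_separated(UTR_DIVERGENCE, DN))
```

Because the check compares the two extremes only, it does not depend on which species pair each value belongs to. It says nothing about statistical significance.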
SINEs do not encode the enzymes required for their amplification, such as RTases, so they must “borrow” these enzymes from other sources. The general finding that 3’ ends are shared by SINEs and LINEs has been reinforced by the finding of examples other than the pair of tortoise Pol III/SINE and CR1. Thus, it seems likely that each SINE family recruited the enzymatic machinery for retroposition from the corresponding LINE through a common “tail” sequence (Ohshima et al. 1996; see also a recent review by Okada et al. 1997). As discussed above, the conserved 3’-end sequences of CR1s probably serve as the recognition sites for their RTase (fig. 7). It is noteworthy that only the conserved region is shared with the tortoise Pol III/SINE (fig. 8). This observation supports our hypothesis that the tortoise SINE might have acquired retropositional activity by gaining the 3’-end sequence of the CR1 element (Ohshima et al. 1996). However, it should be noted that the 8-bp direct repeat, which is prominent in the 3’-terminal region of CR1s, is not found in the 3’-terminal region of the tortoise Pol III/SINE. In the latter case, an AT-rich sequence of variable length is found (not shown). The molecular mechanism responsible for this difference is unknown. The difference might reflect the ability of the CR1 RTase to add “nontemplated” nucleotides to the target DNA before it engages the RNA template (such an activity has been found in R2Bm; Luan and Eickbush 1995), and/or it might reflect the participation of other cellular components in the integration of SINEs (Rogers 1985).

[Figure 8: schematic structures of turtle PsCR1 (ORF2 and 3’ UTR) and tortoise Pol III/SINE (top), and alignment of their 3’-end nucleotide sequences (bottom).]

FIG. 8.-The sequence conserved at the 3’ ends of CR1 elements from several species is also found in the tortoise Pol III/SINE. Structures of PsCR1 and tortoise Pol III/SINE are shown schematically (top).
The common sequence in PsCR1 and tortoise Pol III/SINE is denoted by boxes with oblique shading. The nucleotide sequences of PsCR1 and the SINE in this region are compared (bottom). Common sequences are boxed. The region in the 3’ UTR of the CR1 element that is strongly conserved among species (fig. 7) is shaded.

The PsCR1 Elements Form at Least Two Subfamilies

In general, LINE elements have frequent 5’ truncations of various lengths (Hutchison et al. 1989; Eickbush 1994). To estimate the copy numbers of PsCR1 elements of various lengths, we performed dot blot analysis using several probes that corresponded to distinct blocks of PsCR1 (fig. 9). We estimated that, in each haploid genome, about 400 copies of CR1 were nearly full length (4,000 bp), whereas about 10,000 copies of CR1 were truncated at positions as far as 3,500 bp from their 5’ ends. It seems that more than 40% of elements of PsCR1 extend as much as 2 kb from their 3’ ends. This result contrasts with the result for chicken CR1: only 0.1% of elements of chicken CR1 extend as much as 2 kb from their 3’ ends (Burch, Davis, and Haas 1993).

[Figure 9: plot of haploid copy number (vertical axis, 0 to 12,000 copies) against probe position (horizontal axis, 4 to 0 kb).]

FIG. 9.-Estimated copy numbers of PsCR1 elements with various 5’ truncations in the turtle genome. The vertical axis represents the haploid copy number, and the horizontal axis represents the position (in kb) of the probe used in the dot blot analysis. The 3’ end of PsCR1 is at the right. Each point indicates the total copy number of PsCR1 elements longer than the indicated length on the horizontal axis with various 5’ truncations.

During the course of our efforts to construct the consensus sequence of PsCR1, we found that PsCR1 can be divided into two subfamilies on the basis of correlated changes in particular nucleotides which can be considered diagnostic nucleotides (table 3; Smit et al. 1995).
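The idea of diagnostic nucleotides can be illustrated with a small sketch. It assumes that the copies are already aligned and already assigned to two groups, and it uses a strict criterion: a column counts as diagnostic only if each group is uniform and the two groups differ. The toy sequences and the function name `diagnostic_sites` are hypothetical and are not taken from table 3; real data would need a criterion that tolerates occasional deviating copies.

```python
def diagnostic_sites(group_a: list[str], group_b: list[str]) -> list[int]:
    """Return 0-based alignment columns that are uniform within each group
    but differ between the two groups."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for col in range(length):
        bases_a = {seq[col] for seq in group_a}
        bases_b = {seq[col] for seq in group_b}
        # Strict criterion: one base per group, and the two bases differ.
        if len(bases_a) == 1 and len(bases_b) == 1 and bases_a != bases_b:
            sites.append(col)
    return sites

# Hypothetical toy alignment, for illustration only.
toy_type_i = ["ACGTAC", "ACGTAC", "ACGTTC"]
toy_type_ii = ["ATGTAA", "ATGTAA", "ATGTTA"]
print(diagnostic_sites(toy_type_i, toy_type_ii))
```

In the toy alignment, column 4 varies within both groups and is therefore not reported, whereas columns 1 and 5 separate the two groups cleanly.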
The two subfamilies can be distinguished from each other in terms of 10 diagnostic nucleotides in a region of approximately 400 bp that corresponds to part of the region that encodes the RTase. Among the 10 sites, 4 substitutions result in changes in amino acids. The values for the nucleotide divergences between members of the type I subfamily, as determined by pairwise comparison, range from 3.9% to 12.8% (average 9.5%), and those of the type II subfamily range from 14.2% to 20.3% (average 17.1%), suggesting that type II is older than type I. Chicken CR1 was classified previously into six subfamilies, designated A through F, by phylogenetic analysis (Vandergon and Reitman 1994). Some CR1 elements from avian species other than chicken, such as duck, were grouped with members of different subfamilies from the chicken and not with members of the respective species, demonstrating that multiple subfamilies must have existed early in avian evolution. We examined the relationships of the subfamilies of PsCR1 to the chicken subfamilies. Phylogenetic analysis indicated that the two subfamilies of PsCR1 are more closely related to each other than to any subfamilies in the chicken and, moreover, that the turtle CR1 lineage might have diverged at an early time from the avian CR1 lineage, even though the statistical significance of the results was not particularly high (not shown). The CR1s of reptiles and birds might have evolved from a few ancestral elements in the genome of a progenitor common to reptiles and birds. They might then have retained their identity during the divergence of their respective hosts, which is estimated to have occurred more than 250 MYA, with the generation of multiple lineages of descendants in each species.

Table 3
Two Subfamilies of PsCR1 Elements Can Be Distinguished on the Basis of Diagnostic Nucleotides
Positions (a): 3030, 3061, 3091, 3094, 3106, 3146, 3199, 3222, 3229, 3442
Type I clones: Consensus, CR1 4-2, Ps 5-3, Ps 2-7, Ps 2-6
Type II clones: Consensus, Ps 4-5, Ps 2-3, Ps 4-3, Ps 2-2, Ps 2-4
Nucleotides, in the order printed: G C C G .b 0 0 0 A A A A A A 0 0 0 A 0 A 0 G 0 0 0 0 0 0 A 0 0 G 0 0 0 A 0 T 0 0 T T T 0 A 0 0 0 0 G G G G G G 0 0 A A A A G G G G T T T 0 T T T T T T 0 A A A A A A 0 G G G G G G G G G G
a “Position” indicates the number of residues from the 5’ end of PsCR1.
b Dots indicate nucleotides identical to those in the consensus sequence of type I elements.

Acknowledgments

The authors thank Ying Cao for calculations of nucleotide sequence divergence of CR1 elements and Takesi Sasayama for sequencing the SINEs from the soft-shelled turtle. The authors also thank Dr. Ren Hirayama for identification of the turtle species. This work was supported by a Grant-in-Aid for Specially Promoted Research from the Ministry of Education, Science, Sports and Culture of Japan.

LITERATURE CITED

AARTS, H. J. M., E. H. M. JACOBS, G. VAN WILLIGEN, N. H. LUBSEN, and J. G. G. SCHOENMAKERS. 1989. Different evolution rates within the lens-specific β-crystallin gene family. J. Mol. Evol. 28:313-321.
ADACHI, J., and M. HASEGAWA. 1996. Computer science monographs for molecular phylogenetics based on maximum likelihood. Institute of Statistical Mathematics, Tokyo.
AGARWAL, M., N. BENSAADI, J.-C. SALVADO, K. CAMPBELL, and C. MOUCHÈS. 1993. Characterization and genetic organization of full-length copies of a LINE retroposon family dispersed in the genome of Culex pipiens mosquitoes. Insect Biochem. Mol. Biol. 23:621-629.
ALTSCHUL, S. F., W. GISH, W. MILLER, E. W. MYERS, and D. J. LIPMAN. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.
BECKER, K. G., G. D. SWERGOLD, K. OZATO, and R. E. THAYER. 1993. Binding of the ubiquitous nuclear transcription factor YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Hum. Mol. Genet. 10:1697-1702.
BECKMANN, H., J.-L. CHEN, T. O’BRIEN, and R. TJIAN. 1995.
Coactivator and promoter-selective properties of RNA polymerase I TAFs. Science 270:1506-1509.
BERG, J. M. 1986. Potential metal-binding domains in nucleic acid binding proteins. Science 232:485-487.
-. 1990. Zinc fingers and other metal-binding domains. J. Biol. Chem. 265:6513-6516.
BESANSKY, N. J. 1990. A retrotransposable element from the mosquito Anopheles gambiae. Mol. Cell. Biol. 10:863-871.
BESANSKY, N. J., J. A. BEDELL, and O. MUKABAYIRE. 1994. Q: a new retrotransposon from the mosquito Anopheles gambiae. Insect Mol. Biol. 3:49-56.
BLINOV, A. G., Y. V. SOBANOV, S. S. BOGACHEV, A. P. DONCHENKO, and M. A. FILIPPOVA. 1993. The Chironomus thummi genome contains a non-LTR retrotransposon. Mol. Gen. Genet. 237:412-420.
BOEKE, J. D., and K. B. CHAPMAN. 1991. Retrotransposition mechanisms. Curr. Opin. Cell Biol. 3:502-507.
BURCH, J. B. E., D. L. DAVIS, and N. B. HAAS. 1993. Chicken repeat 1 elements contain a pol-like open reading frame and belong to the non-long terminal repeat class of retrotransposons. Proc. Natl. Acad. Sci. USA 90:8199-8203.
BURKE, W. D., C. C. CALALANG, and T. H. EICKBUSH. 1987. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol. Cell. Biol. 7:2221-2230.
BURKE, W. D., F. MUELLER, and T. H. EICKBUSH. 1995. R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes. Nucleic Acids Res. 23:4628-4634.
CHEN, Z.-Q., R. G. RITZEL, C. C. LIN, and R. B. HODGETTS. 1991. Sequence conservation in avian CR1: an interspersed repetitive DNA family evolving under functional constraints. Proc. Natl. Acad. Sci. USA 88:5814-5818.
COMAI, L., J. C. B. M. ZOMERDIJK, H. BECKMANN, S. ZHOU, A. ADMON, and R. TJIAN. 1994. Reconstitution of transcription factor SL1: exclusive binding of TBP by SL1 or TFIID subunits. Science 266:1966-1972.
DI NOCERA, P. P., and G. CASARI. 1987.
Related polypeptides are encoded by Drosophila F elements, I factors, and mammalian L1 sequences. Proc. Natl. Acad. Sci. USA 84:5843-5847.
DI ROCCO, G., M. PENNUTO, B. ILLI et al. (13 co-authors). 1997. Interplay of the E box, the cyclic AMP response element, and HTF4/HEB in transcriptional regulation of the neurospecific, neurotrophin-inducible vgf gene. Mol. Cell. Biol. 17:1244-1253.
DOOLITTLE, R. F., D.-F. FENG, M. S. JOHNSON, and M. A. McCLURE. 1989. Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64:1-30.
EICKBUSH, D. G., W. C. LATHE III, M. P. FRANCINO, and T. H. EICKBUSH. 1995. R1 and R2 retrotransposable elements of Drosophila evolve at rates similar to those of nuclear genes. Genetics 139:685-695.
EICKBUSH, T. H. 1992. Transposing without ends: the non-LTR retrotransposable elements. New Biol. 4:430-440.
-. 1994. Origin and evolutionary relationships of retroelements. Pp. 121-157 in S. S. MORSE, ed. The evolutionary biology of viruses. Raven Press, New York.
FANNING, T. G., and M. F. SINGER. 1987. LINE-1: a mammalian transposable element. Biochim. Biophys. Acta 910:203-212.
FELSENSTEIN, J. 1995. PHYLIP (phylogeny inference package). Version 3.57c. University of Washington, Seattle.
FENG, Q., J. V. MORAN, H. H. KAZAZIAN, and J. D. BOEKE. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905-916.
GENETTA, T., D. RUEZINSKY, and T. KADESCH. 1994. Displacement of an E-box-binding repressor by basic helix-loop-helix proteins: implications for B-cell specificity of the immunoglobulin heavy-chain enhancer. Mol. Cell. Biol. 14:6153-6163.
GORELICK, R. J., L. E. HENDERSON, J. P. HANSER, and A. REIN. 1988. Point mutants of Moloney murine leukemia virus that fail to package viral RNA: evidence for specific RNA recognition by a “zinc finger-like” protein sequence. Proc. Natl. Acad. Sci. USA 85:8420-8424.
HACHÉ, R. J. G., and R. G. DEELEY. 1988.
Organization, sequence and nuclease hypersensitivity of repetitive elements flanking the chicken apoVLDLII gene: extended sequence similarity to elements flanking the chicken vitellogenin gene. Nucleic Acids Res. 16:97-113.
HATTORI, M., S. KUHARA, O. TAKENAKA, and Y. SAKAKI. 1986. L1 family of repetitive DNA sequences in primates may be derived from a sequence encoding a reverse transcriptase-related protein. Nature 321:625-628.
HEIX, J., J. C. B. M. ZOMERDIJK, A. RAVANPAY, R. TJIAN, and I. GRUMMT. 1997. Cloning of murine RNA polymerase I-specific TAF factors: conserved interactions between the subunits of the species-specific transcription initiation factor TIF-IB/SL1. Proc. Natl. Acad. Sci. USA 94:1733-1738.
HOHJOH, H., and M. F. SINGER. 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 15:630-639.
HOWE, K. M., C. F. L. REAKES, and R. J. WATSON. 1990. Characterization of the sequence-specific interaction of mouse c-myb protein with DNA. EMBO J. 9:161-169.
HUTCHISON, C. A. III, S. C. HARDIES, D. D. LOEB, W. R. SHEHEE, and M. H. EDGELL. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eucaryotic genome. Pp. 593-617 in D. E. BERG and M. M. HOWE, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.
INA, Y. 1995. New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190-226.
JAKUBCZAK, J. L., Y. XIONG, and T. H. EICKBUSH. 1990. Type I (R1) and type II (R2) ribosomal DNA insertions of Drosophila melanogaster are retrotransposable elements closely related to those of Bombyx mori. J. Mol. Biol. 212:37-52.
KUROSE, K., K. HATA, M. HATTORI, and Y. SAKAKI. 1995. RNA polymerase III dependence of the human L1 promoter and possible participation of the RNA polymerase II factor YY1 in the RNA polymerase III transcription system. Nucleic Acids Res. 23:3704-3709.
LEETON, P. R. J., and D. R. SMYTH. 1993.
An abundant LINE-like element amplified in the genome of Lilium speciosum. Mol. Gen. Genet. 237:97-104.
LI, W.-H., C.-I. WU, and C.-C. LUO. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.
LOEB, D. D., R. W. PADGETT, S. C. HARDIES, W. R. SHEHEE, M. B. COMER, M. H. EDGELL, and C. A. HUTCHISON III. 1986. The sequence of a large L1Md element reveals a tandemly repeated 5’ end and several features found in retrotransposons. Mol. Cell. Biol. 6:168-182.
LUAN, D. D., and T. H. EICKBUSH. 1995. RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol. Cell. Biol. 15:3882-3891.
LUAN, D. D., M. H. KORMAN, J. L. JAKUBCZAK, and T. H. EICKBUSH. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595-605.
MCLEAN, C., A. BUCHETON, and D. J. FINNEGAN. 1993. The 5’ untranslated region of I factor, a long interspersed nuclear element-like retrotransposon of Drosophila melanogaster, contains an internal promoter and sequences that regulate expression. Mol. Cell. Biol. 13:1042-1050.
MARTIN, F., M. OLIVARES, and M. C. LOPEZ. 1996. Do non-long terminal repeat retrotransposons have nuclease activity? Trends Biochem. Sci. 21:283-285.
MATHEWS, D. H., A. R. BANERJEE, D. D. LUAN, T. H. EICKBUSH, and D. H. TURNER. 1997. Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element. RNA 3:1-16.
MINAKAMI, R., K. KUROSE, K. ETOH, Y. FURUHATA, M. HATTORI, and Y. SAKAKI. 1992. Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element. Nucleic Acids Res. 20:3139-3145.
MINCHIOTTI, G., C. CONTURSI, and P. P. DI NOCERA. 1997.
Multiple downstream promoter modules regulate the transcription of the Drosophila melanogaster I, Doc and F elements. J. Mol. Biol. 267:37-46.
MIZROKHI, L. J., S. G. GEORGIEVA, and Y. V. ILYIN. 1988. Jockey, a mobile Drosophila element similar to mammalian LINEs, is transcribed from the internal promoter by RNA polymerase II. Cell 54:685-691.
MORAN, J. V., S. E. HOLMES, T. P. NAAS, R. J. DEBERARDINIS, J. D. BOEKE, and H. H. KAZAZIAN. 1996. High frequency retrotransposition in cultured mammalian cells. Cell 87:917-927.
MOUCHÈS, C., N. BENSAADI, and J.-C. SALVADO. 1992. Characterization of a LINE retroposon dispersed in the genome of three non-sibling Aedes mosquito species. Gene 120:183-190.
MURRE, C., G. BAIN, M. A. VAN DIJK, I. ENGEL, B. A. FURNARI, M. E. MASSARI, J. R. MATTHEWS, M. W. QUONG, R. R. RIVERA, and M. H. STUIVER. 1994. Structure and function of helix-loop-helix proteins. Biochim. Biophys. Acta 1218:129-135.
MURRE, C., P. S. MCCAW, and D. BALTIMORE. 1989. A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD, and myc proteins. Cell 56:777-783.
MURRE, C., P. S. MCCAW, H. VAESSIN et al. (12 co-authors). 1989. Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58:537-544.
O’HARE, K., M. R. K. ALLEY, T. E. CULLINGFORD, A. DRIVER, and M. J. SANDERSON. 1991. DNA sequence of the Doc retroposon in the white-one mutant of Drosophila melanogaster and of secondary insertions in the phenotypically altered derivatives white-honey and white-eosin. Mol. Gen. Genet. 225:17-24.
OHSHIMA, K., M. HAMADA, Y. TERAI, and N. OKADA. 1996. The 3’ ends of tRNA-derived short interspersed repetitive elements are derived from the 3’ ends of long interspersed repetitive elements. Mol. Cell. Biol. 16:3756-3764.
OHSHIMA, K., R. KOISHI, M. MATSUO, and N. OKADA. 1993.
Several short interspersed repetitive elements (SINEs) in distant species may have originated from a common ancestral retrovirus: characterization of a squid SINE and a possible mechanism for generation of tRNA-derived retroposons. Proc. Natl. Acad. Sci. USA 90:6260-6264.
OHTA, T. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.
OKADA, N. 1991a. SINEs. Curr. Opin. Genet. Dev. 1:498-504.
-. 1991b. SINEs: short interspersed repeated elements of the eukaryotic genome. Trends Ecol. Evol. 6:358-361.
OKADA, N., M. HAMADA, I. OGIWARA, and K. OHSHIMA. 1997. SINEs and LINEs share common 3’ sequences: a review. Gene (in press).
OKADA, N., and K. OHSHIMA. 1995. Evolution of tRNA-derived SINEs. Pp. 61-79 in R. J. MARAIA, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Austin, Tex.
OSIEWACZ, H. D., and K. ESSER. 1984. The mitochondrial plasmid of Podospora anserina: a mobile intron of a mitochondrial gene. Curr. Genet. 8:299-305.
PRIIMÄGI, A. F., L. J. MIZROKHI, and Y. V. ILYIN. 1988. The Drosophila mobile element jockey belongs to LINEs and contains coding sequences homologous to some retroviral proteins. Gene 70:253-262.
ROGERS, J. H. 1985. The structure and evolution of retroposons. Int. Rev. Cytol. 93:231-279.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
SANCHEZ-GARCIA, I., and T. H. RABBITTS. 1994. The LIM domain: a new structural motif found in zinc-finger-like proteins. Trends Genet. 10:315-320.
SCHWARZ-SOMMER, Z., L. LECLERCQ, E. GÖBEL, and H. SAEDLER. 1987. Cin4, an insert altering the structure of the A1 gene in Zea mays, exhibits properties of nonviral retrotransposons. EMBO J. 6:3873-3880.
SHEEN, F.-M., and R. W. LEVIS. 1994. Transposition of the LINE-like retrotransposon TART to Drosophila chromosome termini. Proc. Natl. Acad. Sci.
USA 91:12510-12514.
SILVA, R., and J. B. E. BURCH. 1989. Evidence that chicken CR1 elements represent a novel family of retroposons. Mol. Cell. Biol. 9:3563-3566.
SMIT, A. F. A. 1996. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6:743-748.
SMIT, A. F. A., G. TOTH, A. D. RIGGS, and J. JURKA. 1995. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246:401-417.
STUMPH, W. E., P. KRISTO, M.-J. TSAI, and B. W. O’MALLEY. 1981. A chicken middle-repetitive DNA sequence which shares homology with mammalian ubiquitous repeats. Nucleic Acids Res. 9:5383-5397.
SWERGOLD, G. D. 1990. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. Biol. 10:6718-6729.
UDOMKIT, A., S. FORBES, G. DALGLEISH, and D. J. FINNEGAN. 1995. BS, a novel LINE-like element in Drosophila melanogaster. Nucleic Acids Res. 23:1354-1358.
VANDERGON, T. L., and M. REITMAN. 1994. Evolution of chicken repeat 1 (CR1) elements: evidence for ancient subfamilies and multiple progenitors. Mol. Biol. Evol. 11:886-898.
WEINER, A. M., P. L. DEININGER, and A. EFSTRATIADIS. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631-661.
WEINTRAUB, H., R. DAVIS, D. LOCKSHON, and A. LASSAR. 1990. MyoD binds cooperatively to two sites in a target enhancer sequence: occupancy of two sites is required for activation. Proc. Natl. Acad. Sci. USA 87:5623-5627.
WHITCOMB, J. M., and S. H. HUGHES. 1992. Retroviral reverse transcription and integration: progress and problems. Annu. Rev. Cell Biol. 8:275-306.
XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.
-. 1993. Dong, a non-long terminal repeat (non-LTR) retrotransposable element from Bombyx mori. Nucleic Acids Res. 21:1318.
ZIMMERLY, S., H.
GUO, P. S. PERLMAN, and A. M. LAMBOWITZ. 1995. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82:545-554.

MITIKO GO, reviewing editor
Accepted August 14, 1997