* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download The Euglena gracilis chloroplast rpoB gene
Messenger RNA wikipedia , lookup
Genomic library wikipedia , lookup
Gene therapy wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Molecular ecology wikipedia , lookup
Transposable element wikipedia , lookup
Gene desert wikipedia , lookup
Polyadenylation wikipedia , lookup
Gene nomenclature wikipedia , lookup
RNA interference wikipedia , lookup
RNA polymerase II holoenzyme wikipedia , lookup
Biochemistry wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Non-coding DNA wikipedia , lookup
Eukaryotic transcription wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Gene regulatory network wikipedia , lookup
Epitranscriptome wikipedia , lookup
Genetic code wikipedia , lookup
RNA silencing wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Point mutation wikipedia , lookup
Transcriptional regulation wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Real-time polymerase chain reaction wikipedia , lookup
Biosynthesis wikipedia , lookup
Gene expression wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Community fingerprinting wikipedia , lookup
Chloroplast DNA wikipedia , lookup
Silencer (genetics) wikipedia , lookup
© 1990 Oxford University Press Nucleic Acids Research, Vol. 18, No. 7 1869 The Euglena gracilis chloroplast rpoB gene. Novel gene organization and transcription of the RNA polymerase subunit operon Gloria M.Yepiz-Plascencia, Catherine A.Radebaugh and Richard B.Hallick* Department of Biochemistry, University of Arizona, Tucson, AZ 85721, USA Received September 27, 1989; Revised and Accepted December 7, 1989 EMBL accession no. X17191 ABSTRACT The rpoB gene coding for a /3-like subunit of the chloroplast DNA-dependent RNA polymerase has been located on the chloroplast genome of Euglena gracilis distal to the rrnC ribosomal RNA operon. We have determined 5760 base-pairs of DNA sequence, including 97 bp of the 5S rRNA gene, an intergenic spacer of 1264 bp, the rpoB gene of 4249 bp, 84 bp spacer and 67 bp of the rpoC1 gene. The rpoB gene is of the same polarity as the rRNA operons. The organization of the rpoB and rpoC genes resembles the E. coll rpoB-rpoC and higher plant chloroplast rpoBrpoC1-rpoC2 operons. The Euglena rpoB gene (1082 codons) encodes a polypeptlde with a predicted molecular weight of 124,288. The rpoB gene is interrupted by seven Group III introns of 93, 95, 94, 99, 101, 110 and 99 bp respectively and a Group II intron of 309 bp. All other known rpoB genes lack introns. All the exon-exon junctions were experimentally determined by cDNA cloning and sequencing or direct primer extension RNA sequencing. Transcripts from the rpoB locus were characterized by Northern hybridization. Fully-spliced, monocistronic rpoB mRNA, as well as rpoB-rpoC1 and rpoB1-rpoC1-rpoC2 mRNAs were identified. INTRODUCTION Chloroplast genes are transcribed, and the resulting mRNAs are translated via plastid-specific RNA polymerase(s) and ribosomes, respectively. The genes for tRNAs, rRNAs and several messenger RNAs are chloroplast encoded (I, 2, 3, 4). In the chloroplast of the unicellular protist Euglena gracilis, two different RNA polymerase activities have been reported (4, 5, 6, 7). One of the polymerases is tightly bound to the chloroplast DNA. This complex is known as the transcriptionally active chromosome ('TAC'). A different RNA polymerase activity is found in a soluble extract of the chloroplast ('soluble'). In E. coli the RNA polymerase is a protein complex composed of four different polypeptides designated a (MW 37 000), (3 (MW 151 000), B' (MW 155 000) and a (MW 70 000) that are encoded by the genes rpoA, rpoB, rpoC and rpoD, respectively. The * To whom correspondence should be addressed subunit composition is a-$(5'o (8). Eukaryotic RNA polymerase genes coding for the largest and second largest subunits, homologues to the E. coli /3 and /3'-subunits have been described (9, 10, 11, 12). It has been suggested that the genes for the chloroplast RNA polymerase are nuclear encoded (13). Subsequently, the equivalent of the E. coli rpoA (14), rpoB and rpoC (15) genes were reported in the spinach chloroplast genome. These three genes have been also identified in tobacco (3), liverwort (2), and rice (16). In chloroplasts, the rpoC-like genetic information appears to be encoded in two genes, designated rpoCl-rpoC2 (15). We are interested in the relationship between chloroplast genes for RNA polymerase subunits and the known chloroplast polymerase activities. Antibodies against fusion proteins that contained fragments of the chloroplast genes rpoA from spinach, rpoB from tobacco, and rpoC2 from Euglena, were able to immobilize a chloroplast RNA polymerase from spinach, pea and Euglena gracilis (17). The antibodies also inhibited the 'soluble' enzyme active in tRNA and mRNA synthesis but had almost no effect on the activity of the E. gracilis 'TAC (17). It has been recently shown (18) that fusion protein antibodies prepared against the pea chloroplast rpoA gene product detect a 43 kDa polypeptide in a chloroplast RNA polymerase preparation. To better understand the differences in enzymatic activity of Euglena chloroplast 'soluble' and 'TAC RNA polymerases, we have characterized the Euglena chloroplast rpoB locus. This is a first step toward identification of the corresponding enzyme subunit in RNA polymerase preparations. We found that the rpoB gene organization was so unusual that cDNA cloning and sequencing was also required to determine the structure of the mature mRNA. We present the complete nucleotide sequence of the Euglena gracilis chloroplast rpoB gene, the upstream region of the gene including 97 nucleotides of the 3'-end of the 5S rRNA gene from the rrnC operon for orientation (19), the spacer region between the rpoB and rpoCl genes and 67 nucleotides of the 5'-end of the rpoCl gene. The rpoB gene is interrupted by 8 introns. Intron 8 (309 nt) belongs to the Group II category. The remaining 7 introns belong to a new class of introns, designated as 'Group III' (20) similar to the small introns described in E. gracilis tufA (21) and ribosomal protein genes (20,22). All the exon-exon boundaries were experimentally determined by cDNA 1870 Nucleic Acids Research, Vol. 18, No. 7 primer extension sequencing of the mRNA, or by cDNA synthesis, PCR amplification, cloning and sequencing of the cDNA. The relatedness of Euglena gracilis chloroplast rpoB product was evaluated by comparing its derived amino acid sequence with the amino acid sequences of the E. coli 0-subunit gene and the chloroplast homologues from tobacco, liverwort and spinach chloroplast genomes using computer-assisted, multiple sequence alignment algorithms. MATERIALS AND METHODS Materials Enzymes, chemicals and [a35S]dATP were purchased from BRL, (Gaithersburg, MD), NEN, Dupont, (Boston, MA), BIO RAD, (Richmond, CA) and Sigma Chemical Company (St. Louis, MO). Bluescript and Bluescribe (+) and ( - ) vectors were obtained from Vector Cloning Systems, (San Diego, CA). Sequenase kits were purchased from U.S. Biochemical Co. (Cleveland, OH.) DNA subcloning and exonuclease IIT/S1 deletions Chloroplast DNA from Euglena gracilis Pringsheim, strain Z was isolated according to the procedure described in (23). Recombinant plasmid pPGl 1, containing the EcoRI restriction fragment EcoF (Fig.l) (19), was used as a source of the 5.14 kb EcoRI-BamHI fragment of EcoF. This fragment was cloned into Bluescript and Bluescribe (—) vectors using JM101 or XLl-blue E. coli cells as hosts (24). The resulting recombinant plasmids were designated pEZC931 and pEZC932, respectively. The EcoRI fragment EcoR (Fig. 1) was subcloned from plasmid pPG671 (22) into Bluescribe ( - ) in both orientations. The new plasmids are designated pEZC929 and pEZC93O. A 2.0 kb Hindm fragment, that overlaps the EcoF-EcoR (Fig. 1) junction was isolated from HindHI-digested chloroplast DNA by agarose gel electrophoresis, eluted from the agarose using GeneClean (BIO 101, La Jolla, CA.), and cloned into Bluescript ( - ) in both orientations. The new recombinant DNAs are designated pEZC935 and pEZC936. Plasmid DNAs were purified using a cleared lysis method (25) and linearized with restriction enzymes for exonuclease HI/SI digestions. Overlapping, unidirectional deletion subclones were generated according to the procedure of Henikoff (26) with the following three steps: (i) unidirectional 3'-exonuclease III deletions into the chloroplast insert, (ii) SI nuclease digestion, and (iii) intramolecular blunt end religation and transformation. Single stranded template DNAs were prepared using the defective phage M13KO7 as a helper phage (27), and sequenced using [o^SJdATP and the dideoxy-chain termination method (28) with the Klenow fragment of DNA polymerase I or Sequenase (28). DNA sequence analysis Analysis of DNA sequence data was performed on IBM-PC/XT and PC/AT computers using the DNA and protein analysis programs of Mount and Conrad (30). The program FASTP (31) was used for initial homology searches with the derived amino acid sequence in the Protein Identification Resource (P.I.R.) of the National Biomedical Research Foundation. A progressive multiple alignment method (32, 33), was utilized to achieve the multiple sequence alignments of the /3-like subunit amino acid sequences on a DEC-Microvax 2 computer. cDNA synthesis, DNA amplification and cDNA cloning Chloroplasts were purified from cell lysates by differential centrifugation and sucrose flotation (22). RNA was isolated by resuspending the chloroplasts in lysis buffer (0.5% SDS, 10 mM Tris-HCl pH 7.5, 1 mM EDTA, 5 mM DTT), extracting three times with an equal volume of phenol (saturated with 10 mM Tris-HCl pH 8.0, 1 mM EDTA), followed by two extractions with chloroform-isoamyl alcohol (24:1), and collected by ethanol precipitation. Before the RNA was utilized for cDNA synthesis, the DNA was digested with RQ1 RNase-free DNase (Promega Biotechnology, Madison,WI), followed by phenol and chloroform extractions and ethanol precipitation as previously described. Synthetic oligodeoxynucleotides for cDNA synthesis and polymerase chain reactions (PCR)-amplification were purchased from Promega Biotechnology. The primers 5'-Cl TlGAAGAAGTTCACC-3' (positions 2986-2970, Figs. 1 and 2) designated Cl and 5'-GCTTTAATCTCTGAACCT-3' (positions 4651-4634, Figs. 1 and 2) designated C2, complementary to exons 8 and 9 respectively, were used to perform two separate cDNA synthesis reactions. The reactions containing 10 fig of DNA-free RNA and 280 ng of the appropriate primer were performed using a cDNA synthesis kit (BRL). The resulting cDNAs were amplified by PCR using the Taq polymerase (Perkin-Elmer Cetus, Emeryville, CA.). The reactions contained the cDNA synthesis product from 5 fig of chloroplast RNA and 0.66 ng of a pair of cDNA and PCR primers (Cl-Pl and C2-P2 respectively). The oligodeoxynucleotide 5'<X3TTTGGTAGAAGAGTTAAG-3' (positions 1528-1548, exon 2, Figs. 1 and 2) was designated PI and 5'-GCTTAGTTCCTTTTTTGG-3' (positions 3622-3639, exon 9, Figs. 1 and 2) was designated P2. The amplification cycle consisted of 1 min of denaturation at 95°C, followed by 2 min of annealing at 45°C and 3 min of polymerization at 72°C. Amplification was repeated for 30 cycles. Amplified DNA fragments were digested with SI nuclease (70 units per /tg of DNA) to produce blunt ends, electrophoresed through a 1% agarose gel, eluted from the agarose using GeneClean and cloned into Smal digested Bluescript or Bluescribe vectors. The cDNA clones were designated pEZClOOO (Cl-Pl primers) and pEZClOOl (C2-P2 primers), respectively. Analysis of rpoB transcripts Whole cell RNA was isolated using aurintricarboxilic acid as a nuclease inhibitor (34). For Northern analysis, 20 fig of total cell RNA from photoautotrophically grown cells were electrophoresed through 1.0% agarose gels containing 0.66 M formaldehyde. The RNA was transferred to GeneScreen membranes (NEN, Dupont), (35). RNAs of known size (BRL-RNA ladder) were used as molecular weight standards. Hybridization probe was synthesized using a plasmid DNA deletion clone of pECZ932. It was linearized at the Seal restriction site at position 1176 and a [32P]-labeled RNA transcript of 3.9 kb was synthesized using T7 RNA polymerase and pEZC932 as a template (Promega, Technical bulletin 002). The probe was complementary to exon 1 through the 5'-end of exon 9. Hybridization was carried out in 50% formamide, 5x SSPE (SSPE: 0.18 M NaCl, 0.01 M sodium-phosphate, 1 mM EDTA), 1 % SDS, 0.5 mg/ml Ficoll, 0.5 mg/ml polyvinyl pyrrolidone, 0.5 mg/ml BSA, and 100 /tg/ml herring sperm DNA at 55°C for 24 hours. Following hybridization, the filters were washed in 2 x SSC (SSC: 0.15 M NaCl, 0.15 M sodium-citrate) at room temperature for 15 minutes, two times in 2 x SSC, 2 % SDS at 65°C for 20 minutes and once in 0.1 x SSC at room temperature for 15 minutes, blotted dry and exposed to Kodak SB-5 X ray film. Nucleic Acids Research, Vol. 18, No. 7 1871 rpoC1 3' rpoB 5" •II C^ 5S 23S III pi-.-c CDNA, cDNA, R F EcoRI Hdlll BamHI 36 C 1 6 S 4 3 2 1 0 Kbp Figure 1. Organization and partial restnction map for the rpoB coding locus. The 5S and 23S rRNA genes from the rmC locus arc included as reference (19). Exons are shown as filled boxes and introns as open boxes. The transcription of the gene is from right to left. Relative positions and polarity of the synthetic oligodeoxynucleotkk primers used for cDNA cloning and for RNA sequencing are indicated by arrows. cDNA, and cDNA2 represent the cDNA-PCR amplified DNA products. Restriction fragments (46) are labeled with letters or numbers between the restriction sites. Primer extension RNA sequencing The purified oligodeoxynucleotide primer 5'-CCCTTTTTTAAAAAGGAGCG-3'(positions 1530-1511, Fig 1), designated C, and complementary to exon 2, was 5'-end labeled with T4 polynucleotide kinase (25). The primer extension sequencing reactions (20) contained 15 /ig of chloroplast RNA and 2.0 X10* dpm of 5'-end labeled oligodeoxynucleotide. RESULTS Molecular cloning, organization and genomk DNA sequence of the rpoB gene The region of the Euglena gracilis chloroplast genome containing the rpoB gene was cloned as two different restriction fragments. The 5.1 kb EcoRI-BamHI restriction fragment from EcoF (19) and the EcoRI fragment EcoR (restriction map, Fig. 1). In order to sequence across the EcoRI restriction site between EcoF and EcoR (Fig. 1), the 2 kb Hindlll fragment, Hind36, was also sequenced. The complete nucleotide sequence (5760 bp) of 100% of both strands of the rpoB locus, including the upstream spacer region, the 5S rRNA gene, and the 5'-end of the downstream rpoCI gene was determined (Fig. 2). The sequence includes 97 bp of the 5S rRNA gene (19) followed by an intergenic spacer of 1264 bp. The rpoB gene spans a region of 4248 bp. It is of the same polarity as the ribosomal RNA operons (19) and the rpoCI gene (C. Radebaugh, G. Yepiz-Plascencia and R.B. HaJlick, manuscript in preparation). The overall gene organization is shown in Fig. 1. Cloning and sequencing of cDNAs Identification of the exon-intron boundaries in the Euglena rpoB gene was much more difficult than with other Euglena chloroplast genes because of the relatively low conservation of amino acid sequences among chloroplast and prokaryotic rpoB-like gene products. Exons were initially identified via the FASTP search algorithm, as encoding portions of polypeptides similar to other rpoB polypeptides. Introns were detected as very AT rich interruptions in putative protein coding regions that contained in-frame termination codons. However, it was not possible to accurately determine the exon-intron boundaries from the genomic chloroplast DNA sequence alone. Therefore, synthetic oligonucleotide primers were synthesized for PCR amplification of specific rpoB cDNAs. The primers correspond to conserved exon domains in the predicted rpoB polypeptide. The primers Cl, complementary to the RNA sequence of exon 9, positions 2986—2970, and C2, complementary to exon 7, positions 4651 —4634 were used for cDNA synthesis. The resulting cDNA reaction products were amplified by the PCR. For every amplification reaction a pair of primers (Cl-Pl or C2-P2) and the cDNA synthesis products were employed (see materials and methods). A diagram of the cDNA-PCR amplified DNA fragments is shown in Fig. 1. The PCR-amplified DNA designated cDNA-1 is the product of the amplification of rpoB mRNA from the 3'-end of exon 2 through the 5'-end of exon 8. It is a DNA fragment of 850 bp. cDNA-2 is the product of the amplification from the 3'-end of exon 8 to the 5'-end of exon 9. It is a DNA fragment of 720 bp. Both double stranded cDNAs fragments were cloned and sequenced. The assignment of the splice boundaries for introns 2 through 8 is based on comparison of the cDNA and genomic sequences. An example of the data from the genomic and cDNA sequences used to determine the exon 5 and 6 splice boundaries is shown in Fig. 3. Direct sequencing of the spliced mRNA product, using a 20-nt primer complementary to exon 2, positions 1530-1511 (primer C), was employed to determine the sequence at the exon 1-exon 2 junction (data not shown). Thus, all of the splice boundaries for the rpoB introns were experimentally determined. The rpoB gene is interrupted by seven small introns of 93, 95, 94, 99, 101, 110, and 99 nt, and a larger intron of 309 nt beginning at positions 1400, 1720, 1898, 2105, 2423, 2564, 2816 and 4248, respectively (Figs. 1 and 2). 1872 Nucleic Acids Research, Vol. 18, No. 7 Ribosome binding sites Within the 20 bases immediately preceding the start codon of the rpoB gene is a sequence complementary to both the 3' end of the 16S rRNA from Euglena chloroplast (5'CAACUCCC-OH 3') and the experimentally determined ribo-oligonucleotide binding sequences (5'CUCCC-OH 3') for the small ribosomal subunit of Euglena chloroplast ribosomes (36, 37). The sequence 5'GTGAG 3' (—8 to —3) differs by one base from the sequence 5'GGGAG 3' (complementary to 5'CUCCC-OH 3'). The E. gracilis rpoB gene product and its homology to bacterial and chloroplast RNA polymerase /3-subunits The E. gracilis rpoB gene spans a region of 4249 bp. The derived amino acid sequence of the exons is shown in Fig. 2. The ATG initiator codon for rpoB is at position 1363. The mature rpoB mRNA has a minimum size of 3.2 kb. It encodes a polypeptide of 1082 amino acids with a predicted molecular weight of 124,288. This predicted molecular weight is close to one of the prominent polypeptides from the E. gracilis 'TAC RNA polymerase (118,000) (4). Based on the Northern analysis (described below and C. Radebaugh, G. Yepiz-Plascencia and R.B. Hallick, manuscript in preparation), rpoB is the first gene in the tricistronic rpoB-rpoCl-rpoC2 operon. The ATG initiator codon for the rpoCl gene, lies 85 bp downstream of the rpoB termination codon. The FASTP algorithm was used to identify protein sequences in the PIR data base with similarity to the Euglena rpoB gene product. The only sequences that were selected with significant similarity scores were RNA polymerase subunits from chloroplast, bacteria and eukaryotic nuclei. A progressive multiple alignment program (32) was then used to compare the amino acid sequences of the selected bacterial, chloroplast and eukaryotic /3-like subunits. The sequences are aligned progressively, beginning with the most similar pair and continuing with the addition of the next most similar sequence or set of sequences. The E. gracilis chloroplast rpoB gene product was aligned with the /3-subunit sequence of E.coli (38), and S. typhimurium (39) RNA polymerase, and with the predicted polypeptides from chloroplast genes of tobacco (40), spinach (15), Marchantia polymorpha (2), the partial sequence from Saponaria qfficinalis (41) and the homologous eukaryotic polypeptides from the Drosophila melanogaster locus DmRP140 (12) and the Saccharomyces cerevisiae locus RPB2 (11). The multiple protein alignment is shown in Fig. 4. We identified six highly conserved regions present in all the rpoB gene products: region I (114 amino acids, positions 14-128 in £. gracilis), region U (86 amino acids, positions 358-444), region in (46 amino acids, positions 518-564), region IV (152 amino acids, positions 667-819), region V (72 amino acids, positions 823-895) and region VI (93 amino acids, positions GGATCCACTTAAAACATTTCGAACTTGCAAGTTAAACATAAAGCCTAAATGGATACTTGGAAGGTTCCTTTCTGSGAAAAGCTTTTAGTGCCCTTATCGCCAGTTTATTTATTAATATTG 240 AGTATGTTTTTGATTGATTTTATGAATTTTGTCCATTTCTnGTTAAAGTAGTTAAAATCTTTAAAGTTTTGAIAAAAAATTTTTCTCTCAACAAGTTAAATAATAAAAAAACATCATGT TTGAACTTTTTGTAAAGAAATTTTTAAGACAATCrTTGAATAAAGAAAAAGTGACICTTAAAAATTTGAACrCTAATACTTTTTTGGTTATGATTTTTTGTCAATTTAATAATTATTTAT '80 ACTATTGCTTAAAATTTTTTCTAATATTGAGCTAGTT TAAACT TTGTTATTT TACTATAATACCATAAATTTGCCAAACTTTCCTATTACAAGAATTGAAGATTGTTATTTTTTCAAGTT 600 AGC^DWAAATTGTACGTCAATTTTGTTAAAAATACTTTAATTTTTTATGGTTTATTTTCTCTATTGTATGTGTAACACTTGACCTATTTATATTAGTAAAAATGGTTCTAATTTTATCA 720 AATTTCCATCTMTCUCTAMTATAGAATTAGAGCATCGTTTTCGT1ATAAATTTCAATATMTATTTMTGAAAAAATTTTTTTUTAAMGCGUATTTTTATTAAATGGTUGAAC 840 AAGTUTTTTTTTACAAA>MTATAMTTATTAAATTTTTAGAAAACTCCTATGATATGAAAAAAACAAGACAATAATrTTTTAGTATCAAATTTAATAAAGAGGCAAATTTTTCAATTT 960 TTAAAGTTTTGAGTTAATAAAAACAATTTGTTCATATAATACCATTTTAATTTTTTCAACTTTTTTGAAAAACTAATAGITTTATGTAGTGTAATATCAAAATTTATGTTATAAAAAAAC 1080 TATTATTTAACAAGAGTTCGTTTATTTTTTTCGAAAATTTATTTATTTAACTCAGCATCTAATTTGTCTTTAATAATGGAGAGTAATGAAACTAAAAGAGAAGTAAGCATTAACAGGTTG 1200 GTTGCAAACTTCCGTAAGAAAGCCTTTGAAAArAAAGGATTAAAAAAGCTCGTTCTTGGTTAATTAATTTTGGCCATr^UUMTAATGCGAAAAATAGTACTTCTGCAAAATTGTATAAAC 1320 ATCGAAGTAArTGTTTTAAAACATTATTTTTTTAAGGGAACTTTTTGTTTIATTAAATATTAAGGTTAGTTAATTTATATGCATTTAAAATTAATTTTAAAAAATCCATTATACTATTAA 1U0 TGTCAMTCAMGJUUCAGACTGTTTGMCTTTGTGAGAAATGGTAAATGGTTGTAGAGTACGCTCTAATTTATTAGATTTGTGAATTAATTCTTAACTTTGGTTTTATTTATTACAAAT RP08 K V W G C B V R S N L L P 1S60 AAATAAATAATTTTTTTTGGTATGTTTATGTTAGATAJUMTTTTACT6ATGATACMCGAMTAGTTTTC6CTCCTTTTTAAAAMGGGTTTGGTAGAAGAGTTAACAAAAATAAAAGA J O B N S F H S F L K K G L V E E L R K I K D 1660 TATAGCTCATOUGGCTTTAGCATUGCTTTCAJUtCAGATAATGTGAAATATAAAAAGCCTAAAATATCTGCIGMTTTGCATTGAAAAACGGGGAAACATACAGTTTArCTGTTCATAT l A H E G F R I S F Q T D N V K T K K P K I S A E F A L K t t G E T Y S L S V M l 1800 ACCTGTTGAAGTTACCTATMTAATATGTTCCTTGTTAGGAWGATTTTTATTATTTTrCCTATTAATTTTATAATATATTTTTCATTATAATAATAAGTTTTTTTTTAGCAAAAAACAA 1«0 TTTATTATCTTATTAAJ>TAAATATATATTATTTGCAJUUU>TTCCTTTAATMCTGAAAA*GCTACGTTTATTTTTAACGCTAATAAAAGAATTATGGATGTGGTTTATATTATTACTTTT N K T I L F A K I P L M T E K G T F I F N G H K R I t t 2040 AArTTTTTATTMTTATATTTATAAGTTAATTTTTTTTTAATTTTTAGGTTTGAGTTTTTTTAATTCATTTTTAACCAAATTATTCGTAGTCCTGGCGTTTATTTTGAAAAAAATCGTTA V N 0 1 I S S P G V Y F f K P t G T 2160 TAATGATTCrJTATTTGCGACGTTWTACCAACTTTTGGTACTTGGTTAACTTTTAAAATTGJITGTGCGTTTTTAAATGTTATTTAGTAATTTGGATTTTTAAATTATATTTTrACATTA N O S I F A T l l P T F G S W l T F K t D 2280 O D E G I F V K V D K I K T A I P L I M F L K C L G 2400 CCTATCCCGAA>AAAATTTTTTTTGTATCTT«CG*TCCTATTTTTAT«yUlACGTlACAAGAMGTGAGTCTTATGGTATTAGATTGGAATTTTTCGAATTTTATAAA#TATTTTTTCC 6TITTTTTAJJ^lj AAACGAAGTTMTGTTCGTTT M E V H V R F 26*0 AAAGGAAATGCCCGAAAATTTTTACGTTCAAAGTTTATGGATCTTGAGTTTATTCGT6TATTrTTTGTAATrTACAAAAGTTAAATTrTTT«TA&ArTATTTTAAAATTTTTTGATTTAT G H A K K F I Q S K F H D 2760 TATTTTATG^GCATTATTAKCGAGATTAAACACGTAAATATGATTTGGSTGAAGTAtftK P R K Y D L G E V G R F R U H r K T GAGT T AAAGAAAGGUGGGAAAT AA T GAAGAA T AA r i E L K K G P F t) E 1 ATTTATTATAAAAATTTTTTAGCAATAHTTATTAATGA1 i r R S E F F O S H f l T 2880 kTTTAAAAAACAAATTAGTACGTAGTATAGGTGAACTTCTTCAAAGTCAATTCCGAATTA D D TTTTAAATGAATTAGAATCTAGTTTUAAGAAAAATTGA 'ATTTCTTTATAAAAATCCTTCT L :TTTTAG»TTATCTAGATTTTTTAATTCTTACTTTA.IAACTAATCGTATTC TGTCTGAGCTTACACATAAAAGGAAGTTAAGICCCTTTGGTCCTAArGGTCTTAATAAAGAAAGAA Nucleic Acids Research, Vol. 18, No. 7 1873 1360 ITrilTrilMMt6n«TIT«ATTTIATCITTA«CAAA<tATCTAA«AETTCATA l E f l K H A Q L I L S I A K D V R V D 34J0 AGTAICGAnnTACAAA£CCCimiMAMGTTTT& SWmATTTTATTICTfCTKgCAOlAAAAATATTrTAUCTAGeTCCCTTTWTC i l Y F I I S A O E K Y F T V A P F D 1600 TCTTTASAACTTCTCAAASTMTCTTTTAUTAAAAATAAGCTTCTACSTOTTAAAAGAASTAAAATTT l D F I I I I I O V F I t S O S I I l l S I C > I C I . l S V ( t l l t I I 3720 CAAAAnAOATSTAACAGAAATAAATACTACTCAATATCCAACCCTTTeTCCTATTCACAl T ( L D V I E I * T S O r « I V C P I E 3840 nATAAAAACTCa^CSCCAfi6CrnMnAATCCACAAACT6AT6CCACCaTTTTAKAAAAACTACTCCAAJUtfTAATTTATTCTACnTAAAAAAAATAgnAnCAACACCAACAAT F l l t l S I E A L i y i E I D A I V L A l I I C l l V I T J J L C f l V I Q t E E 3960 ATTCTenUTAn(^nniTAATAAAAAIIATTCTIT7TTTUnTATTttaBTaCTIAMiaUUTArCTaUUAAMTTTTIUUUTTATTAAAAAMtIAACCAAA«AAnTATT T C A I I E F F I C a T t F F > L l G C L a E I I O t l F K L I ( ( V I O I ( i r 4080 TTTTAGAi^acCTAACAJUTCQAATCACCeTCTTTArATTCACAAUTTCCTATTCTTCATGAACGTGAfiTGSCTTAGAAAACCACAAATTATAGCTGATCQTATCAfiTACATTAAGAC C C I 4320 ATATCAAAAAATTTAAAJurTTTTATTOTAAATCATGAAAAAAAACCTCTCCSAATTTAAATTTAaTTAATrnCAAAATTTAATTCTTAATnTAnnTCCTAAAnTTaTATAAATA C 4560 AITCAAITAAAAinilTMIIIAAATUUAAAICIIIAIIITATIIAATAAAaCTAAAiaAITAIIACTTI/UAICTTIACITT&UAACOAAIATIAATTTCnAAilUIUtAAA E 4680 ATATTUTAICTITATICCrUTCCAUITTAAAAACTATTAAAAACTTUAAAATAATaTAIIATIAAAATAnTTCASAWTTAAASUCAaUICTTITAATIGCAACAATTAAAS V K L K a T P K U K N L l A F F G a K V R K D V S L l S P K I L V G I V T f V E 4920 TntATaiAAAAAtTCCAAITCTia^TCTTATIDITaTT(KTBMIAACarAM»TACAAAT/>aiMTAAAATTTC»lin»riinMiaTAATAAAMIATAATTTCaAAAATTirtTC ! L C K K S U C t V L I U V A C K > > i a i e O X I A e i t l i a H K C I I S I C I V 5040 CGICTATAfiATATCCCTTTTCTTCCTCATOQTACACCTaTTCATATOATTTTAAATCCTCTACCAATACCATCTAGAATCUTQrTQQTCAAGTCTTTQAAAaTTTfiTTAAACTTCTaT P S I D H P F L P D G T P V D H I l l l P I . S I P ! » l i a V < 1 0 V F E I L I . I I I . I 5160 CITTATTTTTQAAACAAASeTATAAMTTCAACCTTTT(aTaAASTCCAAACAACTATCAATTCAAAATCTTTT0TCTATAAAAAATTAAAT6AACCCCCTAAAA6AACTAAAAAACATI i l . F L K E I Y K i a P F D E V O T S I I I I I I C S F V V ( K I . > E A I K I I T I C I C D 5280 a^TTTTAATCCAAATTATCCAGflAAAACCCTTTTTATATQATMTACAAATTGTACaXTTnCATCACCCTBTACCTTTTOOTTATCCCTATATniAAAATTAATACATATCCTTA U l F a P a Y P 0 K A F L Y 0 8 « « C « P F D « P V A F G Y A Y I L i : L l l f > I V 5*00 AAUaUAATTUTSCGAaUTTACaXCCCTIATTCITCAGTAACTUACAACCTTTAaTOlUAATCSAAAUTWTGCAI^UUUSTTTGGSCAM ( D K I H A I I V T e P r S S V i a 0 P l > Q I I I i : a S G a > F D E H E V U A I E 5520 OAtTIMTGC7aaTA7TIGTT(KAUaOITAriAA<:TATIAAATCTGAI«I8TAII/UWIABAICAawsa:iIATTTAGITTAATAAAIG«TACTTAITITTCTAAeCCAAAIAITC C F 0 J « r i L O E L l . I I [ I D D V l « I I E A L F l L I l l C I T f J [ P I I I 5640 CKAAeCTTICAAGTIATITAIIIIAGAAATCKAATCTIIAItlAIIGAIAIIAAAATITrTACAAATAATTAIAAAAAAITCtATTAGITAAArTIIIGAtllATIIIITTIIIIAIGA P E A F I L F I L E H O S L C I D I K I F T U K T t l C F D * 5760 AGTTTTTATaTUATTAAQACtTUTAnTATTTITATTTAITTTTBTATTTATGAAAaATTATQTaASAATAAAaATACnTaaXACAACAACTTTTAAOTTGCACCGAAAGATCTT pcti N K D Y V I I K I A I P O O V L I U T E I I L Figure 2. Nucleotide sequence of the RNA-like strand of the E. gracilii chloroplast rpoB gene and flanking regions. The DNA sequence is given in the 5' to 3' direction. The amino acid sequence as predicted from the nucleotide sequence using the universal genetic code is shown below the second nucleotide in each tnplet codon. The intron splice boundaries are underlined. The first 97 nucleotides are the 3'-end of the 55 rRNA gene and the last 67 nucleotides are the 5'-end of the rpoCl gene (see text). 940—1033) (Fig. 4). All the chloroplast sequences, including Euglena, have small deletions at the amino termini and a larger deletion centered at position 1000 when compared to the bacterial genes. Particularly interesting is an insertion of 35 amino acids only present in the Euglena gene at positions 598-633. This is not a PCR artifact since the region was sequenced from two independently isolated cDNA clones. It is flanked by a poorly conserved upstream region, whereas, the downstream sequence has a higher amino acid similarity (region IV, amino acids 641 -812 in Euglena). It corresponds to amino acids 756-931 in the E. coli polypeptide that presumably take part in the formation of the enzyme-DNA binding site of the allosteric regulation center responsible for the interaction with ppGpp (42). In the E. gracilis polypeptide, the analogue to the E. coli Cys 764 residue (42), essential for the enzyme interaction with the DNA template is substituted by Tyr 649, but two Cys residues, Cys 616 and Cys 680 are present in close proximity. In the case of the E. coli polypeptide, most of the mutations conferring rifampicin and streptolydigin resistance have been localized to amino acids 511 -576 (42). This region corresponds to the E. gracilis amino acid residues 365-429, region II. It contains 27 identical amino acids and 24 conservative replacements. A region of 118 residues (positions 932-1050) in E. coli, is substituted by only 10 residues in E. gracilis (positions 813—823) and also in the higher plant chloroplast homologues. This region was shown to be redundant in E. coli, since its deletion did not sufficiently influence enzymatic properties (42). Two of the most highly conserved regions, corresponding to regions IV and V, amino acids 796-819 and 861-877, in E. gracilis are homologous to the E. coli domains involved in nucleoside triphosphate binding (positions 1047-1070 and 1228-1244 in the E. coli polypeptide). These domains have been identified in E. coli by affinity labelling of the polypeptide with an initiating substrate analogue. They contain the lysine residues and histidine 1237 that are situated in the nearest neighborhood to, or directly involved in, the formation of the active center of initiating substrate binding (43). The Euglena rpoB gene product has about 30% identity with homologous bacterial and chloroplast polypeptides at the amino terminus and a higher conservation of 48% at the carboxy terminus. The overall amino acid sequence identities with the bacterial, chloroplast and eukaryotic genes are summarized in table 1. Expression of the chloroplast rpoB gene Transcripts of 3.2, 4.7 and 7.7 kb were detected with the rpoBspecific probe. The probe contained exons 1 through 8, approximately 60% of exon 9 and all of the introns. The probe hybridized to transcripts of approximately 3.2, 4.7 and 7.7 kb 1874 Nucleic Acids Research, Vol. 18, No. 7 ct DNAs sequence cDNA order is found in bacteria (38, 44) and in chloroplast from spinach (15), tobacco (3), liverwort (2) rice (16) and Euglena gracilis. The location of the Euglena chloroplast rpoB-rpoCl-rpoCl operon distal to, and in the same polarity as, the ribosomal RNA operon has some additional similarity to the arrangement of these same genes in E. coli. The E. coli rpoB-rpoC genes are within the rif cluster (44,45). The gene arrangement is rmB operon, 4 tRNA genes, tufB, rplK-rplA, rpU-rplL-rpoB-rpoC. The chloroplast equivalent of rplK (rplll), rplA (rpll), rpLJ (rpllO), and rplL (rpl7/rpll2) are all believed to be nuclear encoded in plants. The juxtapositioninenomes.g of the Euglena RNA polymerase operon distal to the ribosomal RNA operons is perhaps noteworthy because of the overall similarity to the E. coli rif cluster. The large intercistronic region of 1.2 kbp between rrnC and rpoB is at present unexplained. Large intercistronic regions are uncommon in chloroplast genomes. In Euglena chloroplast DNA, most intercistronic spacers are less than 100 bp length (46). Although a protein coding locus for this region cannot be ruled out at present, we do not find any open reading frames in this region with similarity to known proteins in the P.I.R. data base. In addition, transcripts from this region could not be detected from either DNA strand by Northern blot hybridization (data not shown). sequence ..A C C T A G G T G C A A A T C T C G G Figure 3. Chloroplast genomic and cDNA sequence analysis of the splice junctions for introns 5 and 6. The DNA sequences for the genomic and cDNA clones are given to the left of the lanes from which they are read. Two arrows point to the 5' and 3'-splice junctions in the genomic sequence and converge at the exonexon junctions as a single arrow in the cDNA sequence. The sequence given on the left side is the complement of the RNA-like strand. Lanes A,T,C,G, contain the corresponding dideoxynucleotides. that were present in all of the RNA samples (Fig. 5). Differences in migration by the RNA samples are due to salt concentration. The RNA species of 3.2 kb most likely represents the mature fully spliced rpoB mRNA. This size is in very close agreement with the minimum size message of 3.2 kb predicted from the nucleotide sequence. An unspliced, rpoB pre-mRNA would be 4.2—4.3 kb in size. Transcripts of this size were not detected in any of the RNA samples. In contrast, discrete transcripts of 4.7 and 7.7 kb, (larger than the region encoding the rpoB gene) were detected (Fig. 5). The 4.7 and 7.7 kb transcripts are interpreted as fully spliced di- and tri-cistronic mRNAs of rpoBrpoCl, rpoB-rpoCl-rpoC2 mRNAs, respectively. DISCUSSION Gene organization The organization of the rpoB-rpoC (or rpoB-rpoCl-rpoCl) operon has been conserved throughout evolution. The same gene cDNA synthesis It would not have been possible to predict the rpoB exon-intron boundaries from the DNA sequence alone. Traditional techniques for cDNA synthesis involving oligo-dT primers were not suitable, since chloroplast mRNAs are not polyadenylated. The techniques of cDNA cloning and PCR amplification proved to be very useful for characterizing chloroplast RNA processing products. To our knowledge, this represents one of the first examples of cloned chloroplast cDNAs by PCR amplification. This approach should have wide applicability to future studies on organelle RNA maturation pathways. Gene expression The rpoB and rpoC genes are co-transcribed in E. coli (44). In spinach and pea chloroplast it was not possible to detect cotranscription of rpoB-rpoCl-rpoC2, using Northern analysis and [32P]-DNA-labeled probes (15, 47), although, it was concluded from S1 mapping analysis that the genes were co-transcribed in spinach (15). It was proposed that the failure to detect these transcripts in pea was due to a very low abundance of the transcripts or that the RNA polymerase genes were pseudogenes (47). The 3.2, 4.7 and 7.7 kb transcripts detected via Northern analysis with the rpoB gene-specific probe are much larger than the predicted unspliced rpoB precursor mRNA. Properties of the rpoB introns The presence of introns in Euglena protein genes has been extensively documented (46). The number, position and type of introns appears to be distinct from the higher plant chloroplast genomes. In particular, all the higher plant chloroplast rpoB genes characterized thus far do not contain any introns ( 1 , 2 , 46). Introns 1-7 of size 93, 95, 94, 99, 101, 110, and 99 nt respectively, are of a category recently identified by Christopher and Hallick (20) as 'Group IE'. The 309 nt intron 8 is a Group II intron. Until recently most of the introns found in Euglena protein genes were relatively large compared to the rpoB introns (46). The rpoB group EQ introns most closely resemble the three Nucleic Acids Research, Vol. 18, No. 7 1875 83 HLGDGBECUTIPBFIB) n CO. eiritPtrac HV--ncmff IOFECFWJIW I0FEG/CIFIDO lofEtnaFim LLDIQWIFtlFUDC o «.TEEL«cf«i«tTOQtiKFaLFvrrTom£mmc a.TEEinCfhrJB>TO0EUMLFVtTTOLVtPtIKa o.scriP(ffiiB)iMEFtFQiFffffnaA£MJ.)ca ttvmttllCDIAHECf lUtPanWWYK- -OTCItM 83 67 47 61 61 MinSPGVFFMOKOrTVtSS 168 E.C. l.T. region I E.C. t.T. 1.0. K.I. M.F*. E.G. --DOIT " *0CHT --11V--OIL--DCI- V«CVTAVA^ i.o. H.T. H.P. E.C. V E.C. r I.I. p S.O. K.T. N.P. E.C. RP140 csir»i«TEiviv»QL»w«vTHrtotaai«a 168 DAVTEtlT f O T L T V l A a 1UKTB---• ' - K m c Q T ( L i G M i n j * n a T F i v K i r a i v i n i i a s r a i r T t t t - - - u ) m 1*2 DA v r r n T T SSEITVIACL i uori - • •- •R0HQE0TIFlOIIPU«St.CTSIVItCIT1IVIIiaiL0Si*6irTitSC-"LDn8 142 DAVTOtllTtSOVYVPAOtTODat-- - • 136 FAiD«IYM.tVBIPVEVTTiai 136 tirnrTnTTWTiTnrTirTriHniirnrn rrrmrmn irirrnTurn n r 663 663 650 693 ...KELPASIMSIVAILCTT . . .KELPASQHAIVAIAat ml 252 252 197 197 191 187 E.C. l.T. S.O. B.T. H P. E G. GEi ICAA1HE LSUM.UUasa«Uat [ EIL F ICAAJK LSUHUUCLJCKJOOt IE T L F —vtrreiFLSF uocEkaai — --vcYPiiFLtF L ana m a n — E C. J.T. J.O. I T. H.P. E.G. E.C. S.T. t.O. H.P. E.C. --VCTF1CIFLEF lOOTTDCET -OflF TH>U)«-PrittIIJrvOPTttRLtAlrtIYI»t«--PflEmt£A«UFE ICHJW-F^ BSHM[LErTQOFACVeg>fVFgIlCgH.QtKfF-gC»CELaiICJUUIIMIUJfU>lP-ElWT CS*IMIUFtOQFACVGCOFVFtt«iaCEl«0CFF-0QtaL«ICMlt«ItU(UlIP-QinrT P»f$TEDA[VELTian.TCigm.FFttIIIKEL0OCFF-0Q«C£LaiQUjaJCDajlU*VP-Dttl E.C. l.T. t.O. N.T. n.f. E.Q E.C. S.T. S.O. I.T. M.P. E.G. 337 337 216 216 210 195 743 778 E.C. S.T. S.O. H.T. H.P. E.O. E.C. S.T. S.O. N.T. N.P. E.G. IPHO 499 499 362 362 337 353 LSLt^ 891 891 enwEMviui.. QTHGrWIHN.. 418 418 279 279 274 268 E.C. S.T. S.O. H.T. N.P. E.O. HP140 RPI2 (795-S20) (091-916) ..nOOOMIIAPOlRVSCDOWlCnl. ..WODEDO.IAPCVtVSGEDVI ] « T T . .FE«ll 976 a F S « I l 976 795 795 790 819 AVLVAMVEA*ni>ttFTOtVLELaTOEEm(MJCUtE^ 1061 6OTM* arm* WAMTA CTIlfVTIJOaEU 815 ETirvmantEU 815 QIIHITILOiOKIQ 810 ktCt VLIHVAEKMIQ (868-939) (964-1035)... 834 CXirVttVRlPQ VtVrrtTTICIM rtglon V E.C. S.T. S.O. H.T. K.P. E.G. IPUO IPU 504 584 447 447 442 438 AAVKE AAVKE TTFEJ TTTEJ HTFCEI HR1IKI <446 490) (511 555) (719-7U) (615-542) TPOH0U.1AFFGlDCV--«niVS1.U«Sl.KiyiSVlILaaO rvglon I I E.C. J.I. f.O N.T H.P. E.G. KP140 742 742 604 604 599 60B PtOtraiLUOtCPAfTPtOAAVCLVOOJU. HAYISV PtOUOnirUCLVCPAETPE0aACGLVDtLSL MSCItV CFLEIFT1wv--T0€VVTMlHYLUIEiC«TVIAttAHSML0£EMFVED--LVTasiCG£SSlFSttOV0YIl)VST0OVVSVCA GFLETfT1<rv--V«VVTMlKTLUlEEG»TVI/iaAHS«U»€G»FVtD--L»TCtSI(aSS1.FS«tlOvtlTMJVSTOOVVSVCA CSUSf^EI--S£HSatVQK.rLSnn)rma>SCHSlJU.nGIQCEO--VVfAKrtOEFLTIAUEOVVFniFSFQrFSIGA GSiESPFYEl--S£JtSTGVKm.rLSnaD£YTMVAAGNSlJU.IIQ01QEEO--VVPARrtO€FLT[MJEOV11JtSIFPFOTFS[GA CCUSPF«I--StLS«LUMHLSA*EMTniAT0l«lAUI0IISQEE0--lTPA«TrMDFVAIAUEOVBJtSlFPl.orFSWA CFLESPFTiyi.ltaiETmGirFISJMtlCrFTVAPHJVF1tO«»UIlDlIUGVntSl(IFS1SFSD(IOFISISTDOFTSlCT GS0MP1L... GTWWM... WDICVAaKaiKGnS«[LLWCl«nrLaCW»PVMIFMP1.CVP«IWVCaiFEC»LGlACt VflOCVAMHQrtGMKILPIKBBfO'LOOasVWWFW'tfiWI.IOWVCOIFECJtGtASS lGDtVAC«KaaOIlSI[LPWO«PFLO«npiWRSn.CVPS»«VCOIFEaLGlACJ lC0KlAG«KareGIlKIVPtlW*»FLPOCTPVMIU(P1.GIPt»«VCOVFEn.LnSSt IGDUASXKOQtQTCGlOTtQOHAFTCEGVAPDIIIHPHAIPSMTIGMlIEaOGKLO... LLD LLD FLI FLK 1146 879 879 874 898 tAmGAOVtOICVllLSTFSDCEVHnAEinJbUWIATttFlXAJCEAEICEU 1231 E.C. S.T HHTRIAP FM ltEOtAttU-VFSELTEAttOTWffVVF---EPETPaniFOaTGOPFEOfVIICI<FY 946 S.O. H.T. BHTRIAP FM BTEOtAOia-Vf HLrEAOCQTAJIPWVf ---EPtTPOC«IF0C«TOrPFE0PV1 IOCPT 946 H.P. OifRIIP FDC RtEtEAttia-VFttLTKAttKTT*^F---EPtWPaOil(LIDaTCE[FEOflTICKAT 941 965 E.G (973-1049) ...OtEVMYMtDTTGaiClKAOVnCPTT RP140 IF«2 (1068-11U) ...CfEVHTHCHTClOaAUIFFGrTr 665 665 528 528 523 523 rsion VI E.C l.T i.o ».T H.P E.G KUCUin.V0C«lIAI^ClTSlVTIX«>1.Oa«nFOMftSevUAUArGAArTL0tKTVI<SDl)VlliaTI0n)a 1316 hUOJn&VWIMmT0tmVTOOf^CCMaF«MFtBtril«UAYGAAYTiaDa.TV1ClI^^ 1316 uaiiovMtiHCMs«jrrAivioct^«iauuraecotv8SCvuAi.EGF(jvA«aooaT«iwi[iauK«^eiiiiGCTiP> 1031 IUClII«M»<ll«SSO«AlVTG<JI>lJtGllA)tOtM«VO£JCVWlECfOVAI(ILOeJl.TrlCSOHlllAMEVl«ITIl^ 1031 M.iaillGVDOt[«A»ISC^AiVTIXyUM»SMa««Vg)CVUUJG>tVAtlLQB«.T[aDHItA»TtVl.CAIVTGglPt 1026 i u a i i « v > » i u u m G n i t v T O < i P u n c t a c c o i F G O C v w i E E r e A A n L a c u T i a » v u n i C A i F s t . i i a n F s i ( 1049 TG1UUHVDHaiAtAJtGFKlV1.TtQPVEG«StOGajtFfi«EltlCHIAHGAA.. E.C. l.T. I.O. H.T. H.P. E.C. 0€-PO(-PflF«V1.LI0imitGIHIEl. E D££PCM-PfSFirvUamsl.G!l(ieL E KEAPOAMSTO.LVMl.lBJU.EL«IFLVStI)(FOIlr« ICEAfai*l>tlFIlLVlfl.imjU.EunFLVHDIFOm ttAPHIAftSFn.lVttlKUU.tlireVlICIDn.ll-t ---PHI-PEAFiaFlLfJIOItCIOUIFTIOmacro--- 1342 1343 1070 1070 1069 1082 Figure 4. Multiple alignment of the predicted polypeptide of the E. gradlis chloroplast rpoB gene with bacterial, chloroplast and eukaryotic homologues. The amino acid sequences of the proteins from £. coH, E.C. (38), 5. typhimwium, S.T. (39), the chloroplast analogues from spinach, S.O. (15), tobacco, N.T. (40) and liverwort, M.P. (2). Also included are regions of strong amino acid similarity from Drosophila DmRPI40 (12) and yeast RPB2 (11). An asterisk (*) refers to identical amino acids in bacteria and chloroplast polypeptides. A period (.) refers to conservative replacements of amino acids in chloroplast and bacterial sequences. Vertical arrowheads indicate the location of the introns in the E. gradlis rpoB gene. The six highly conserved regions are underlined. small introns reported for the tufA gene (21), the six small introns found in the ribosomaJ proteins genes, rpl23, rps3 and rpsJ9 (22) and four introns from rpsl4, rpU4, rps8 and rpU6 (20). With the addition of the seven small rpoB introns, a list of 20 small introns can now be assembled. In addition to the Euglena Group III introns reported thus far (20), several Group III introns have been found in the rpoCl-rpoCl operon (C. Radebaugh, G. YepizPlascencia and R.B. Hallick, manuscript in preparation), the rps4-rpsll operon (J. Stevenson, R. Drager, K. Nelson and R.B. Hallick, manuscript in preparation), and the rps2-atpI-atpH genes (R. Drager and R.B. Hallick, unpublished observations). The properties of the small rpoB introns that warrant their classification in Group IE are as follows: 1) the introns are small and uniform in size (93 — 111 nt), 2) they have degenerate versions of the group II intron boundary sequences with the consensus sequences 5'-NTNNG (N = nucleotide) and ANNTNNNN-3' (table 2) and 3) they lack the conserved secondary structure domains characteristic of group II introns. 1876 Nucleic Acids Research, Vol. 18, No. 7 Table 1. Overall amino acid similarities of the predicted E. gracilis rpoB gene products with bacterial {E. coli and S. typhlmurlum) , chloroplast, (spinach, tobacco and liverwort) and eukaryotic polypeptides. Percent amlno acid identity Gene E. colt rpoB 38.0 S. typhimurium rpoB 37.7 Spinach rpoB 39.0 Tobacco rpoB 39.1 Liverwort rpoB 35.2 Drosophila DmRP140 22.7 Yeast RPB2 21.9 Table 2. Comparison of the intron-exon boundaries of the eight introns of the E. gracilis rpoB gene. exon intron intron exon Split Codons rpoB-1 TAGAT TTGTGAATTA 73NT TTTACTG'ATG ATACAA NONE rpoB-2 GTTAG GAGAGATTTT 75NT TTATCTTATT AAATAA AG-A"ARG rpoB-3 TATGG- ATGTGGTTTA 71NT TAATTCATTT TTAACC G-TT~VAL rpoB-4 TTGAT GTGCGTTTTT 79NT ATATTTGGGT AAGACG NONE rpoB-5 GTTTT TTGAGATTTT 81NT TTATCTGAAA GAAATG NONE rpoB-6 GGATC TTGAGTTTAT 90NT CGAGATTAAA CACGTA C-CA=PR0 rpoB-7 AAAGG TTGGGAAATA 79NT ATATTTTATT AATGAT GG-A~GLY rpoB-8 AAGGT GTGCGAATTT 289NT TAAGTTTTAA GAAAAT NONE Conserved 5 # . T . . G. . . . V To date, group HI introns are only known in Euglena chloroplast genes. Group HI introns appear to be found predominantly in low abundance, constitutively expressed genes. Only the 309 nt rpoB intron 8 is a Group II intron. The rpoB gene is not the only Euglena gene that contains both Group II and Group HI introns. Other examples are rps3, rpll6 and rps8 (20). The properties of Group II introns include highly conserved 5'- and 3'-boundary sequences, a minimum size of > 300 nt, and conservation of a core structural feature. The boundary sequences for rpoB intron 5" . .A. .T V 8, 5-'GTGCGA and AGTTTTAT-3', match the consensus sequences for Group II intron splice boundaries (48). This intron may be folded into a RNA secondary structure typical of group II introns (48), with six helical domains radiating from a central core. The fifth helix is a 14 base-paired stem-loop and the sixth stem contains an unpaired adenine residue that has been proposed to be a branch point for lariat formation (48). Helical domains V and VI are the most diagnostic feature of Euglena chloroplast Group II introns. A potential secondary structure for domains Nucleic Acids Research, Vol. 18, No. 7 1877 Kb ACKNOWLEDGEMENTS 1 2 9.49 7.46 rpoB-C1-C2 4.40 -I rpoB-C1 We wish to thank Coco Whelchel for technical assistance during the screening and characterization of the genomic clones, David Christopher for helpful discussions during the course of the experiments and our colleagues in the laboratory for critical reading of the manuscript. This work was supported by a grant to R.B.H. from the NTH. rpoB 2.37 1.35 REFERENCES 0.24 -T Figure 5. Analysis of the E. gradlis rpoB gene expression via Northern hybridization. Lanes: 1) 20 y.g photoautotrophic grown total cell RNA; 2) 20 ^g heterotrophic grown total cell RNA. BRL's RNA ladder markers were used as molecular weight standards (kb). Tncistronic, dicistronic and monocistronic transcripts are indicated at the right of the figure. VI A T A A T G T T c 6 T T T T S A A A A IA A • Figure 6. A potential secondary structure model proposed for the 3'-end of intron 8. Structures labeled domains V and VIresemblethe hypothetical Group II domains five and six (48). The arrow indicates the 3'-splice junction. The asterisk marks the potential unpaired adenine proposed to be the branch point for lariat formation. The brackets enclose a base-paired region characteristic of domain V from Group II introns. V and VI for intron 8 of rpoB is shown in Fig. 6. It has been suggested (20, 22) that the conserved nucleotides at the intron boundary sequences could play an important role in the splicing mechanism, and that perhaps Group III introns are a highly degenerate version of Group II introns, representing the minimum size required for correct splicing. We are presently characterizing events in the rpoB mRNA maturation pathway, and studying the relationship among the RNA polymerase gene products, and the two different DNAdependent RNA polymerase activities present in the Euglena chloroplast. 1. Ozeki.H., Ohyama.K., Inokuchi.H., Fukuzama.H., Kochi.H., Sano.T., Nakahigashi,K.,Umesono,K. (1987) In. Cold Spring Harbor Symposia on Quantitative Biology, Vol LJI, Cold Spring Harbor Laboratory, pp. 791 -804. 2. Ohyama.K., Fukuzawa,H., Kohchi.T., Shirai, H., Sano,T., Sano.S., Umesono,K., Shiki.Y., Takeuchi.M., Chang.Z., Aota.S., Inokuchi.H., Ozeki.H. (1986) Nature 322:572-574. 3. Shinozaki.K., Ohme.M., Tanaka.M., Wakasugi.T., Hayashida.N., Matsubayashi.T., Zaita.N., ChungwongseJ., ObokataJ., YamaguchiShinozaki.K., Ohto,C, Torazawa.K., Meng.B.Y., Sugita.M., Deno.H., Kamogashira.T., Yamada,L., KusudaJ., Takaiwa.F., Kato.A., Todoh.N., Shimada.M., Sigiura M. (1986) EMBO J. 5:2043-2049. 4. Narita.J.O., Rushlow.K.E., Hallick.R.B. (1985) J. Biol. Chem. 260:11194-11199. 5. Greenberg.B.M., Narita.J.O., DeLuca-Flaherty.C.R., Hallick.R.B. (1985) In Molecular Biology of the Photosynthetk Apparatus. Cold Spring Harbor Laboratory, pp. 303-309. 6. Rushlow.K.E., Orozco.E.M., Upper.C, Halbck.R.B., (1980) J. Biol. Chem. 255:3786-3792. 7. Gruissem.W., Greenberg.B M., Zurawski,G., Prescott.D.M., Hallick.R.B. (1983) Cell 35:815-828. 8. Burgess,R.R. (1976) In RNA Polymerase, ed by R.Losick and M. Chamberlain. Cold Spring Harbor Press, pp 90-100. 9. Allison, L.A., Moyle,M., Shales.M., Ingles,C.J. (1985) Cell 42:599-610. 10. CordenJ.L., Cadena.D.L., Aheam.J.M., Dahmus.M.E. (1985) Proc. Natl. Acad. Sci. U.S.A. 82:7934-7938. 11. Sweetser.D., Nonet.M., Young,R.A. (1987) Proc. Natl. Acad. Sci. U.S.A. 84:1192-11%. 12. Falkenburg,D.,Dwormczack,B., Faust.D., Bautz.E.K.F. (1987) J. Mol. Biol. 195:929-937. 13. Lerbs.S., Brautigam.E., Parthier.B. (1985) EMBO J. 4:1661-1666. 14. Sijben-MQller,G., Hallick.R.B., AltJ., Westhoff.P., Herrman.R.G. (1986) Nucleic Acids Res. 14:1029-1044. 15. Hudson.G.S., Holton.T.A., Whitfeld.P.R., Bottomley,W. (1988) J. Mol. Biol. 200:639-654. 16. HiratsukaJ., Shimada.H., Whrttier.R., Ishibasni.T., Sakamoto.M., Mon,M., Kondo.C, Honji.Y., Sun,C-R., Meng,B-Y., Li,Y-Q., Kanno.A., Nishizawa.Y., Hirai.A., Shinozaki.K., Sigiura.M. (1989) Mol. Gen. Genet. 217:185-194. 17. Little, M.C., Hallick.R.B. (1988) J. Biol. Chem. 263:14302-14307. 18. Purton.S., Gray,J. (1989) Mol. Gen. Genet. 217:77-84. 19. Karabin,G.D., Narita, J.O., DoddJ.R., Hallick.R.B. (1983) J. Biol. Chem. 258:14790-14796. 20. Christopher.D.A., Halltck.R.B. (1989) Nucleic Acids Res. in press. 21. Montandon, P.E., Knuchel-Aegerter.C, Stutz.E. (1987) Nucleic Acids Res. 15:7809-7822. 22. Christopher.D.A., Cushman.J.C, Price.C.A., Hallick.R.B. (1988) Curr. Genet. 14:275-286. 23. Hallick.R.B., Richards,O.C, Gray.P.W. (1982) In Edelman.M., Hallick.R.B., Chua,N-H. (eds). Methods in Chloroplast Molecular Biology. Elsevier Biomedical, New York, pp. 281-294. 24. Bullock.K.W., FernandezJ.M., Short.I.M. (1987) Biotechniques 5:376-379. 25. Maniatis.T., Fritsch.E.F., Sambrook^l. (1982) Molecular Cloning: A laboratory Manual. Cold Spring Harbor, New York. 26. Henikoff.S. (1984) Gene 28:351-359. 27. VieriaJ., MessingJ. (1987) Method. Enzymol. 153:3-11. 28. Sanger.F., Nicklen.S., Coulson.A.R. (1977) Proc.NaU. Acad. Sci. U.S.A. 74:5463-5467. 29. Tabor,S., Richardson.C.C. (1987) Proc. Natl. Acad. Sci. U.S.A. 84:4767-4771. 30. Mount.D.W. and Conrad.B. (1986) Nucleic Acids Res. 14:443-454. 3L LipmanJ)J. and Pearson^WJi. 0985) Science 227-1435-144L 32. Feng.D.F. and Doolittle.R.F. (1987) J. Mol. Evol. 25:351-360. 33. Higgins,D.G. and Sharp.P.M. (1988) Gene 73:237-244. 1878 Nucleic Acids Research, Vol. 18, No. 7 34. Hallick.R.B., Chelm, B.K., Gray, P.W., Orozco, E.M. (1977) Nucleic Acids Res. 4:3055-3064. 35. Foumey.R.M., Miyakoshi.J., Day HI.R.S., Patterson,M.C. (1987) Focus 10:5-7. 36. Graf.L., Roux.E., Stutz.E., Kossel.H. (1982) Nucleic Acid Res. 10:6369-6381. 37. Steege, D.A.,Graves,M.C, Spermulli.L.L. (1982) J. Biol. Chem. 257:10430-10439. 38. Ovchinnikov.Y.A., Monastyrskaya.G.S., Gubanov.V.V., Guryev.S.O., Chertov,O.Y., Modyanov.N.N., Gnnkevich.V.A., Makarova.I.A., Marchenko, T.V., Polovnikova.I.N., Lipkin.V.M., Sverlov.E.D. (1981) Eur. J. Biochem. 116:621-629. 39. Sverlov.E.D., Lisjtyn.N.A., Guryer,S.O.,Monastyrskaya,G.S. (1986) DoklBiochem. Sect. (English tranls.) 287:232-236. 40. Ohme.M., Tanaka M., ChunwongseJ., Shinozaki.K., Sigiura.M. (1986) FEBS Lett. 200:87-90. 41. Daru M., Benatti.L., Lorenzetti.R., Martini.D., Mingati.C., Sassano.M., Sidoli.A., Soria.M. (1988) Nucleic Acids Res. 16:3103. 42. Lisitsyn.N.A., Monastyrskaya.G.S., Sverdlov.E.D. (1988) Eur. J. Biochem. 117:363-3969. 43. Grachev,M.A., Lukhtanov.E.A.,Mustaev,A.A.,Zaychikov,E.F., Abdukayumov,M.N., Rabinov.I.V., Richter.V.I., Skoblov.Y.S., Chistyakov.P.G. (1989) Eur. J. Biochem. 180:577-585. 44. Nomura.M. and Morgan.E.A. (1977) Ann. Rev. Genet. 11:297-347. 45. Yamamoto.M. and Nomura.M. (1979) J. Bact 137:584-594. 46. Hallick,R.B. and Buetow.D.E. (1989) in The Biology of Euglena, Vol. IV (Buetow,D.E., ed.), Academic Press, New York, pp. 315-414. 47. Woodbury.N.W., Roberts.L.L., PalmerJ.D., Thompson.W.F. (1988) Curr. Genet. 14:75-89. 48. Michel,F. and Dujon.B. (1983) EMBO J. 2:33-38.