© 1990 Oxford University Press
Nucleic Acids Research, Vol. 18, No. 7 1869
The Euglena gracilis chloroplast rpoB gene. Novel gene
organization and transcription of the RNA polymerase
subunit operon
Gloria M.Yepiz-Plascencia, Catherine A.Radebaugh and Richard B.Hallick*
Department of Biochemistry, University of Arizona, Tucson, AZ 85721, USA
Received September 27, 1989; Revised and Accepted December 7, 1989
EMBL accession no. X17191
ABSTRACT
The rpoB gene coding for a β-like subunit of the
chloroplast DNA-dependent RNA polymerase has been
located on the chloroplast genome of Euglena gracilis
distal to the rrnC ribosomal RNA operon. We have
determined 5760 base-pairs of DNA sequence,
including 97 bp of the 5S rRNA gene, an intergenic
spacer of 1264 bp, the rpoB gene of 4249 bp, 84 bp
spacer and 67 bp of the rpoC1 gene. The rpoB gene
is of the same polarity as the rRNA operons. The
organization of the rpoB and rpoC genes resembles the
E. coli rpoB-rpoC and higher plant chloroplast rpoB-rpoC1-rpoC2 operons. The Euglena rpoB gene (1082
codons) encodes a polypeptide with a predicted
molecular weight of 124,288. The rpoB gene is
interrupted by seven Group III introns of 93, 95, 94, 99,
101, 110 and 99 bp respectively and a Group II intron
of 309 bp. All other known rpoB genes lack introns. All
the exon-exon junctions were experimentally
determined by cDNA cloning and sequencing or direct
primer extension RNA sequencing. Transcripts from
the rpoB locus were characterized by Northern
hybridization. Fully-spliced, monocistronic rpoB mRNA,
as well as rpoB-rpoC1 and rpoB-rpoC1-rpoC2 mRNAs
were identified.
INTRODUCTION
Chloroplast genes are transcribed, and the resulting mRNAs are
translated via plastid-specific RNA polymerase(s) and ribosomes,
respectively. The genes for tRNAs, rRNAs and several
messenger RNAs are chloroplast encoded (1, 2, 3, 4). In the
chloroplast of the unicellular protist Euglena gracilis, two
different RNA polymerase activities have been reported (4, 5,
6, 7). One of the polymerases is tightly bound to the chloroplast
DNA. This complex is known as the transcriptionally active
chromosome ('TAC'). A different RNA polymerase activity is
found in a soluble extract of the chloroplast ('soluble').
In E. coli the RNA polymerase is a protein complex composed of four different polypeptides designated α (MW 37 000), β (MW 151 000), β' (MW 155 000) and σ (MW 70 000) that are encoded by the genes rpoA, rpoB, rpoC and rpoD, respectively. The subunit composition is α2ββ'σ (8). Eukaryotic RNA polymerase genes coding for the largest and second largest subunits, homologous to the E. coli β- and β'-subunits, have been described (9, 10, 11, 12). It has been suggested that the genes for the chloroplast RNA polymerase are nuclear encoded (13). Subsequently, the equivalents of the E. coli rpoA (14), rpoB and rpoC (15) genes were reported in the spinach chloroplast genome. These three genes have also been identified in tobacco (3), liverwort (2) and rice (16). In chloroplasts, the rpoC-like genetic information appears to be encoded in two genes, designated rpoC1 and rpoC2 (15).
* To whom correspondence should be addressed
We are interested in the relationship between chloroplast genes
for RNA polymerase subunits and the known chloroplast
polymerase activities. Antibodies against fusion proteins that
contained fragments of the chloroplast genes rpoA from spinach,
rpoB from tobacco, and rpoC2 from Euglena, were able to
immobilize a chloroplast RNA polymerase from spinach, pea and
Euglena gracilis (17). The antibodies also inhibited the 'soluble'
enzyme active in tRNA and mRNA synthesis but had almost no
effect on the activity of the E. gracilis 'TAC' (17). It has been
recently shown (18) that fusion protein antibodies prepared against
the pea chloroplast rpoA gene product detect a 43 kDa polypeptide
in a chloroplast RNA polymerase preparation.
To better understand the differences in enzymatic activity of
Euglena chloroplast 'soluble' and 'TAC' RNA polymerases, we
have characterized the Euglena chloroplast rpoB locus. This is
a first step toward identification of the corresponding enzyme
subunit in RNA polymerase preparations. We found that the rpoB
gene organization was so unusual that cDNA cloning and
sequencing were also required to determine the structure of the
mature mRNA. We present the complete nucleotide sequence
of the Euglena gracilis chloroplast rpoB gene, the upstream region
of the gene including 97 nucleotides of the 3'-end of the 5S rRNA
gene from the rrnC operon for orientation (19), the spacer region
between the rpoB and rpoC1 genes and 67 nucleotides of the
5'-end of the rpoC1 gene. The rpoB gene is interrupted by 8
introns. Intron 8 (309 nt) belongs to the Group II category. The
remaining 7 introns belong to a new class of introns, designated
as 'Group III' (20) similar to the small introns described in E.
gracilis tufA (21) and ribosomal protein genes (20,22). All the
exon-exon boundaries were experimentally determined by cDNA
primer extension sequencing of the mRNA, or by cDNA
synthesis, PCR amplification, cloning and sequencing of the
cDNA. The relatedness of Euglena gracilis chloroplast rpoB
product was evaluated by comparing its derived amino acid
sequence with the amino acid sequences of the E. coli β-subunit
gene and the chloroplast homologues from tobacco, liverwort
and spinach chloroplast genomes using computer-assisted,
multiple sequence alignment algorithms.
MATERIALS AND METHODS
Materials
Enzymes, chemicals and [α-35S]dATP were purchased from BRL (Gaithersburg, MD), NEN DuPont (Boston, MA), Bio-Rad (Richmond, CA) and Sigma Chemical Company (St. Louis, MO). Bluescript and Bluescribe (+) and (-) vectors were obtained from Vector Cloning Systems (San Diego, CA). Sequenase kits were purchased from U.S. Biochemical Co. (Cleveland, OH).
DNA subcloning and exonuclease III/S1 deletions
Chloroplast DNA from Euglena gracilis Pringsheim, strain Z, was isolated according to the procedure described in (23). Recombinant plasmid pPG11, containing the EcoRI restriction fragment EcoF (Fig. 1) (19), was used as a source of the 5.14 kb EcoRI-BamHI fragment of EcoF. This fragment was cloned into Bluescript and Bluescribe (-) vectors using E. coli JM101 or XL1-Blue cells as hosts (24). The resulting recombinant plasmids were designated pEZC931 and pEZC932, respectively. The EcoRI fragment EcoR (Fig. 1) was subcloned from plasmid pPG671 (22) into Bluescribe (-) in both orientations. The new plasmids are designated pEZC929 and pEZC930. A 2.0 kb HindIII fragment that overlaps the EcoF-EcoR junction (Fig. 1) was isolated from HindIII-digested chloroplast DNA by agarose gel electrophoresis, eluted from the agarose using GeneClean (BIO 101, La Jolla, CA), and cloned into Bluescript (-) in both orientations. The new recombinant DNAs are designated pEZC935 and pEZC936.
Plasmid DNAs were purified using a cleared lysis method (25) and linearized with restriction enzymes for exonuclease III/S1 digestions. Overlapping, unidirectional deletion subclones were generated according to the procedure of Henikoff (26) in three steps: (i) unidirectional 3'-exonuclease III deletions into the chloroplast insert, (ii) S1 nuclease digestion, and (iii) intramolecular blunt-end religation and transformation. Single-stranded template DNAs were prepared using the defective phage M13KO7 as a helper phage (27) and sequenced using [α-35S]dATP and the dideoxy chain-termination method (28) with the Klenow fragment of DNA polymerase I or Sequenase (28).
DNA sequence analysis
Analysis of DNA sequence data was performed on IBM-PC/XT
and PC/AT computers using the DNA and protein analysis
programs of Mount and Conrad (30). The program FASTP (31)
was used for initial homology searches with the derived amino
acid sequence in the Protein Identification Resource (P.I.R.) of
the National Biomedical Research Foundation. A progressive
multiple alignment method (32, 33) was used to generate the
multiple sequence alignments of the β-like subunit amino acid
sequences on a DEC MicroVAX II computer.
cDNA synthesis, DNA amplification and cDNA cloning
Chloroplasts were purified from cell lysates by differential
centrifugation and sucrose flotation (22). RNA was isolated by
resuspending the chloroplasts in lysis buffer (0.5% SDS, 10 mM
Tris-HCl pH 7.5, 1 mM EDTA, 5 mM DTT), extracting three
times with an equal volume of phenol (saturated with 10 mM
Tris-HCl pH 8.0, 1 mM EDTA), followed by two extractions
with chloroform-isoamyl alcohol (24:1), and collected by ethanol
precipitation. Before the RNA was utilized for cDNA synthesis,
the DNA was digested with RQ1 RNase-free DNase (Promega
Biotechnology, Madison, WI), followed by phenol and chloroform
extractions and ethanol precipitation as previously described.
Synthetic oligodeoxynucleotides for cDNA synthesis and polymerase chain reaction (PCR) amplification were purchased from Promega Biotechnology. The primers 5'-CTTTGAAGAAGTTCACC-3' (positions 2986-2970, Figs. 1 and 2), designated C1, and 5'-GCTTTAATCTCTGAACCT-3' (positions 4651-4634, Figs. 1 and 2), designated C2, complementary to exons 8 and 9, respectively, were used to perform two separate cDNA synthesis reactions. The reactions, containing 10 µg of DNA-free RNA and 280 ng of the appropriate primer, were performed using a cDNA synthesis kit (BRL). The resulting cDNAs were amplified by PCR using Taq polymerase (Perkin-Elmer Cetus, Emeryville, CA). The reactions contained the cDNA synthesis product from 5 µg of chloroplast RNA and 0.66 ng of a pair of cDNA and PCR primers (C1-P1 and C2-P2, respectively). The oligodeoxynucleotide 5'-GGGTTTGGTAGAAGAGTTAAG-3' (positions 1528-1548, exon 2, Figs. 1 and 2) was designated P1, and 5'-GCTTAGTTCCTTTTTTGG-3' (positions 3622-3639, exon 8, Figs. 1 and 2) was designated P2. The amplification cycle consisted of 1 min of denaturation at 95°C, followed by 2 min of annealing at 45°C and 3 min of polymerization at 72°C. Amplification was repeated for 30 cycles. Amplified DNA fragments were digested with S1 nuclease (70 units per µg of DNA) to produce blunt ends, electrophoresed through a 1% agarose gel, eluted from the agarose using GeneClean and cloned into SmaI-digested Bluescript or Bluescribe vectors. The cDNA clones were designated pEZC1000 (C1-P1 primers) and pEZC1001 (C2-P2 primers), respectively.
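The primer sequences and coordinates given in the text can be cross-checked with a few lines of Python. This is an illustrative sketch and not part of the original methods. The helper names are our own, and the 33-nt excerpt is transcribed from the stretch of the Fig. 2 sequence that contains the C1 target. The script reverse-complements C1, confirms that the result occurs in the RNA-like strand, and compares each primer's length with the span implied by its stated positions.

```python
# Minimal consistency check for the primers listed in the text (a sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def coordinate_span(first: int, last: int) -> int:
    """Nucleotides covered by an inclusive range written in either direction."""
    return abs(last - first) + 1


# name: (sequence 5'->3', first position, last position), as given in the text
PRIMERS = {
    "C": ("CCCTTTTTTAAAAAGGAGCG", 1530, 1511),
    "C1": ("CTTTGAAGAAGTTCACC", 2986, 2970),
    "C2": ("GCTTTAATCTCTGAACCT", 4651, 4634),
    "P2": ("GCTTAGTTCCTTTTTTGG", 3622, 3639),
}

# Excerpt of the RNA-like strand containing the C1 target (from Fig. 2).
EXCERPT = "GTAGTATAGGTGAACTTCTTCAAAGTCAATTCC"

for name, (seq, first, last) in PRIMERS.items():
    # primer length vs. length implied by its coordinates
    print(name, len(seq), coordinate_span(first, last))

c1_target = reverse_complement(PRIMERS["C1"][0])
print(c1_target, c1_target in EXCERPT)  # GGTGAACTTCTTCAAAG True
```

A primer whose length disagreed with its coordinate span would point to a transcription or OCR error in either the sequence or the positions.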
Analysis of rpoB transcripts
Whole cell RNA was isolated using aurintricarboxylic acid as a nuclease inhibitor (34). For Northern analysis, 20 µg of total cell RNA from photoautotrophically grown cells were electrophoresed through 1.0% agarose gels containing 0.66 M formaldehyde. The RNA was transferred to GeneScreen membranes (NEN, DuPont) (35). RNAs of known size (BRL RNA ladder) were used as molecular weight standards. The hybridization probe was synthesized from a plasmid DNA deletion clone of pEZC932, linearized at the ScaI restriction site at position 1176; a [32P]-labeled RNA transcript of 3.9 kb was synthesized using T7 RNA polymerase and pEZC932 as a template (Promega, Technical bulletin 002). The probe was complementary to exon 1 through the 5'-end of exon 9. Hybridization was carried out in 50% formamide, 5× SSPE (SSPE: 0.18 M NaCl, 0.01 M sodium phosphate, 1 mM EDTA), 1% SDS, 0.5 mg/ml Ficoll, 0.5 mg/ml polyvinylpyrrolidone, 0.5 mg/ml BSA and 100 µg/ml herring sperm DNA at 55°C for 24 hours. Following hybridization, the filters were washed in 2× SSC (SSC: 0.15 M NaCl, 0.15 M sodium citrate) at room temperature for 15 minutes, twice in 2× SSC, 2% SDS at 65°C for 20 minutes and once in 0.1× SSC at room temperature for 15 minutes, blotted dry and exposed to Kodak SB-5 X-ray film.
Figure 1. Organization and partial restriction map for the rpoB coding locus. The 5S and 23S rRNA genes from the rrnC locus are included as reference (19). Exons are shown as filled boxes and introns as open boxes. The transcription of the gene is from right to left. Relative positions and polarity of the synthetic oligodeoxynucleotide primers used for cDNA cloning and for RNA sequencing are indicated by arrows. cDNA-1 and cDNA-2 represent the cDNA-PCR amplified DNA products. Restriction fragments (46) are labeled with letters or numbers between the restriction sites (EcoRI, HindIII and BamHI; scale in kbp).
Primer extension RNA sequencing
The purified oligodeoxynucleotide primer 5'-CCCTTTTTTAAAAAGGAGCG-3' (positions 1530-1511, Fig. 1), designated C and complementary to exon 2, was 5'-end labeled with T4 polynucleotide kinase (25). The primer extension sequencing reactions (20) contained 15 µg of chloroplast RNA and 2.0 × 10* dpm of 5'-end labeled oligodeoxynucleotide.
RESULTS
Molecular cloning, organization and genomic DNA sequence
of the rpoB gene
The region of the Euglena gracilis chloroplast genome containing the rpoB gene was cloned as two different restriction fragments: the 5.1 kb EcoRI-BamHI restriction fragment from EcoF (19) and the EcoRI fragment EcoR (restriction map, Fig. 1). In order to sequence across the EcoRI restriction site between EcoF and EcoR (Fig. 1), the 2 kb HindIII fragment, Hind36, was also sequenced. The complete nucleotide sequence (5760 bp) of the rpoB locus was determined on both strands (Fig. 2), including the upstream spacer region, the 5S rRNA gene and the 5'-end of the downstream rpoC1 gene. The sequence includes 97 bp of the 5S rRNA gene (19) followed by an intergenic spacer of 1264 bp. The rpoB gene spans a region of 4249 bp. It is of the same polarity as the ribosomal RNA operons (19) and the rpoC1 gene (C. Radebaugh, G. Yepiz-Plascencia and R.B. Hallick, manuscript in preparation). The overall gene organization is shown in Fig. 1.
Cloning and sequencing of cDNAs
Identification of the exon-intron boundaries in the Euglena rpoB gene was much more difficult than with other Euglena chloroplast genes because of the relatively low conservation of amino acid sequences among chloroplast and prokaryotic rpoB-like gene products. Exons were initially identified with the FASTP search algorithm as encoding portions of polypeptides similar to other rpoB polypeptides. Introns were detected as very AT-rich interruptions in putative protein coding regions that contained in-frame termination codons. However, it was not possible to determine the exon-intron boundaries accurately from the genomic chloroplast DNA sequence alone. Therefore, oligonucleotide primers were synthesized for PCR amplification of specific rpoB cDNAs. The primers correspond to conserved exon domains in the predicted rpoB polypeptide. The primers C1, complementary to the RNA sequence of exon 8 (positions 2986-2970), and C2, complementary to exon 9 (positions 4651-4634), were used for cDNA synthesis. The resulting cDNA reaction products were amplified by PCR. Each amplification reaction used a pair of primers (C1-P1 or C2-P2) and the cDNA synthesis products (see Materials and Methods). A diagram of the cDNA-PCR amplified DNA
fragments is shown in Fig. 1.
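The screening heuristic described above flags AT-rich stretches that introduce in-frame stop codons into a putative coding region. It can be sketched in Python as follows. This is an illustration under stated assumptions. The toy sequence, the 18-nt window and the function names are hypothetical, and this is not the Mount and Conrad software (30) used by the authors. As the text notes, such a scan localizes candidate introns only approximately, and exact splice sites still require cDNA data.

```python
# Sketch of the intron-screening heuristic: in-frame stops plus AT richness.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """0-based start positions of stop codons read in the given frame."""
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def at_fraction(seq: str) -> float:
    """Fraction of A plus T bases in a sequence."""
    return sum(base in "AT" for base in seq) / len(seq)


def windowed_at(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(window start, AT fraction) for successive windows along the sequence."""
    return [(start, at_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Hypothetical toy gene: two GC-richer "exons" flanking an AT-rich insertion.
exon_a = "ATGGCTGGCCGCGAAGCC"
insertion = "TAATTTATATAAATTTAT"
exon_b = "GGCGCTGCCGAAGGCCGC"
toy = exon_a + insertion + exon_b

print(in_frame_stops(toy))  # [18, 27]
for start, frac in windowed_at(toy, window=18, step=18):
    print(start, round(frac, 2))
# 0 0.28
# 18 1.0
# 36 0.17
```

Both stop codons fall inside the window whose AT fraction jumps to 1.0, which is the combined signal the authors used to flag candidate introns.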
The PCR-amplified DNA designated cDNA-1 is the product
of the amplification of rpoB mRNA from the 3'-end of exon 2
through the 5'-end of exon 8. It is a DNA fragment of 850 bp.
cDNA-2 is the product of the amplification from the 3'-end of
exon 8 to the 5'-end of exon 9. It is a DNA fragment of 720
bp. Both double-stranded cDNA fragments were cloned and
sequenced. The assignment of the splice boundaries for introns
2 through 8 is based on comparison of the cDNA and genomic
sequences. An example of the data from the genomic and cDNA
sequences used to determine the exon 5 and 6 splice boundaries
is shown in Fig. 3. Direct sequencing of the spliced mRNA
product, using a 20-nt primer complementary to exon 2, positions
1530-1511 (primer C), was employed to determine the sequence
at the exon 1-exon 2 junction (data not shown). Thus, all of the
splice boundaries for the rpoB introns were experimentally
determined. The rpoB gene is interrupted by seven small introns
of 93, 95, 94, 99, 101, 110, and 99 nt, and a larger intron of
309 nt beginning at positions 1400, 1720, 1898, 2105, 2423,
2564, 2816 and 4248, respectively (Figs. 1 and 2).
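The exon map implied by these coordinates can be reconstructed with the short Python sketch below. It rests on our own interpretation: each listed position is taken as the first nucleotide of its intron, coordinates are inclusive, and the gene runs 4249 bp from the ATG at position 1363, as stated in the Abstract. Under those assumptions several values follow:

- The introns total 1000 nt, matching the roughly 1 kb difference between the 3.2 kb spliced mRNA and the predicted 4.2-4.3 kb pre-mRNA.
- The exons total 3249 nt, which is 1082 codons plus a stop codon.
- The spliced P1-C1 and P2-C2 intervals are 861 and 721 nt, close to the reported 850 bp (cDNA-1) and 720 bp (cDNA-2).

The function names are hypothetical.

```python
# Sketch: exon coordinates and derived lengths from the intron data in the text.
GENE_START = 1363   # position of the rpoB ATG
GENE_LENGTH = 4249  # rpoB span in bp (Abstract)
# (first intron nucleotide, intron length) for introns 1-8
INTRONS = [(1400, 93), (1720, 95), (1898, 94), (2105, 99),
           (2423, 101), (2564, 110), (2816, 99), (4248, 309)]


def exon_map(gene_start: int, gene_length: int,
             introns: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Inclusive (start, end) coordinates of each exon, in gene order."""
    exons = []
    cursor = gene_start
    for intron_start, intron_len in introns:
        exons.append((cursor, intron_start - 1))
        cursor = intron_start + intron_len
    exons.append((cursor, gene_start + gene_length - 1))
    return exons


def spliced_length(first: int, last: int,
                   introns: list[tuple[int, int]]) -> int:
    """Length between two genomic positions (inclusive) after removing
    introns that lie entirely inside that interval."""
    lo, hi = sorted((first, last))
    removed = sum(length for start, length in introns
                  if start >= lo and start + length - 1 <= hi)
    return hi - lo + 1 - removed


exons = exon_map(GENE_START, GENE_LENGTH, INTRONS)
total_intron = sum(length for _, length in INTRONS)
coding = sum(end - start + 1 for start, end in exons)

print(total_intron)                          # 1000
print(coding, coding // 3)                   # 3249 1083 (1082 codons + stop)
print(spliced_length(1528, 2986, INTRONS))   # 861 (P1 to C1, cDNA-1)
print(spliced_length(3622, 4651, INTRONS))   # 721 (P2 to C2, cDNA-2)
```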
Ribosome binding sites
Within the 20 bases immediately preceding the start codon of
the rpoB gene is a sequence complementary to both the 3' end
of the 16S rRNA from Euglena chloroplast (5'CAACUCCC-OH
3') and the experimentally determined ribo-oligonucleotide
binding sequences (5'CUCCC-OH 3') for the small ribosomal
subunit of Euglena chloroplast ribosomes (36, 37). The sequence
5'GTGAG 3' (-8 to -3) differs by one base from the sequence
5'GGGAG 3' (complementary to 5'CUCCC-OH 3').
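The complementarity argument can be made explicit with a small Python sketch. It takes the 3'-terminal rRNA motif written 5' to 3' and reverse-complements it to obtain the DNA sense-strand sequence it would pair with. It then counts mismatches against the motif found upstream of the rpoB start codon. The function names are hypothetical, and only Watson-Crick pairing is assumed (no G·U wobble).

```python
# Watson-Crick complements for RNA bases (no G-U wobble considered).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def dna_target_of_rna(rna: str) -> str:
    """DNA sense-strand sequence (5'->3') that an RNA written 5'->3' can pair
    with: reverse-complement the RNA, then write U as T."""
    paired = "".join(RNA_COMPLEMENT[base] for base in reversed(rna))
    return paired.replace("U", "T")


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(dna_target_of_rna("CAACUCCC"))       # GGGAGTTG (16S rRNA 3' end)
ideal = dna_target_of_rna("CUCCC")         # ribo-oligonucleotide binding sequence
observed = "GTGAG"                         # motif upstream of the rpoB ATG
print(ideal, mismatches(ideal, observed))  # GGGAG 1
```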
The E. gracilis rpoB gene product and its homology to
bacterial and chloroplast RNA polymerase β-subunits
The E. gracilis rpoB gene spans a region of 4249 bp. The derived
amino acid sequence of the exons is shown in Fig. 2. The ATG
initiator codon for rpoB is at position 1363. The mature rpoB
mRNA has a minimum size of 3.2 kb. It encodes a polypeptide
of 1082 amino acids with a predicted molecular weight of
124,288. This predicted molecular weight is close to one of the
prominent polypeptides from the E. gracilis 'TAC' RNA
polymerase (118,000) (4). Based on the Northern analysis
(described below and C. Radebaugh, G. Yepiz-Plascencia and
R.B. Hallick, manuscript in preparation), rpoB is the first gene
in the tricistronic rpoB-rpoC1-rpoC2 operon. The ATG initiator
codon for the rpoC1 gene lies 85 bp downstream of the rpoB
termination codon.
The FASTP algorithm was used to identify protein sequences
in the PIR database with similarity to the Euglena rpoB gene
product. The only sequences that were selected with significant
similarity scores were RNA polymerase subunits from
chloroplast, bacteria and eukaryotic nuclei. A progressive
multiple alignment program (32) was then used to compare the
amino acid sequences of the selected bacterial, chloroplast and
eukaryotic β-like subunits. The sequences are aligned
progressively, beginning with the most similar pair and continuing
with the addition of the next most similar sequence or set of
sequences. The E. gracilis chloroplast rpoB gene product was
aligned with the β-subunit sequence of E. coli (38) and S.
typhimurium (39) RNA polymerase, and with the predicted
polypeptides from chloroplast genes of tobacco (40), spinach (15),
Marchantia polymorpha (2), the partial sequence from Saponaria
officinalis (41) and the homologous eukaryotic polypeptides from
the Drosophila melanogaster locus DmRP140 (12) and the
Saccharomyces cerevisiae locus RPB2 (11). The multiple protein
alignment is shown in Fig. 4.
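The progressive strategy described above aligns the most similar pair first and then adds the next most similar sequence. The Python sketch below illustrates the ordering step only, as a simplified greedy stand-in rather than the program of references (32, 33). Similarity is a crude longest-common-subsequence score, no profile alignment is performed, and the four peptide strings are hypothetical toys rather than rpoB sequences.

```python
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Crude similarity: LCS length divided by the shorter sequence length."""
    return lcs_length(a, b) / min(len(a), len(b))


def addition_order(seqs: dict[str, str]) -> list[str]:
    """Greedy order: most similar pair first, then repeatedly the sequence
    with the highest mean similarity to the sequences already placed."""
    sim = {frozenset(p): similarity(seqs[p[0]], seqs[p[1]])
           for p in combinations(seqs, 2)}
    order = list(max(combinations(seqs, 2), key=lambda p: sim[frozenset(p)]))
    remaining = [name for name in seqs if name not in order]
    while remaining:
        best = max(remaining,
                   key=lambda n: sum(sim[frozenset((n, m))] for m in order)
                   / len(order))
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical toy peptides (not real RNA polymerase sequences).
toy = {
    "toy_c": "MRVLESGKT",
    "toy_d": "QWHPNYCFE",
    "toy_a": "MKVLDAGRT",
    "toy_b": "MKVLDSGRT",
}
print(round(similarity(toy["toy_a"], toy["toy_b"]), 3))  # 0.889
print(addition_order(toy))  # ['toy_a', 'toy_b', 'toy_c', 'toy_d']
```

A real progressive aligner would also merge each newly added sequence into a growing alignment (a profile) at every step. This sketch stops at deciding the order.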
Figure 2. Nucleotide sequence of the RNA-like strand of the E. gracilis chloroplast rpoB gene and flanking regions. The DNA sequence is given in the 5' to 3' direction. The amino acid sequence as predicted from the nucleotide sequence using the universal genetic code is shown below the second nucleotide in each triplet codon. The intron splice boundaries are underlined. The first 97 nucleotides are the 3'-end of the 5S rRNA gene and the last 67 nucleotides are the 5'-end of the rpoC1 gene (see text).
We identified six highly conserved regions present in all the rpoB gene products: region I (114 amino acids, positions 14-128 in E. gracilis), region II (86 amino acids, positions 358-444), region III (46 amino acids, positions 518-564), region IV (152 amino acids, positions 667-819), region V (72 amino acids, positions 823-895) and region VI (93 amino acids, positions 940-1033) (Fig. 4). All the chloroplast sequences, including
Euglena, have small deletions at the amino termini and a larger
deletion centered at position 1000 when compared to the bacterial
genes. Particularly interesting is an insertion of 35 amino acids
only present in the Euglena gene at positions 598-633. This is
not a PCR artifact since the region was sequenced from two
independently isolated cDNA clones. It is flanked by a poorly
conserved upstream region, whereas, the downstream sequence
has a higher amino acid similarity (region IV, amino acids
641-812 in Euglena). It corresponds to amino acids 756-931
in the E. coli polypeptide that presumably take part in the
formation of the enzyme-DNA binding site of the allosteric
regulation center responsible for the interaction with ppGpp (42).
In the E. gracilis polypeptide, the analogue to the E. coli Cys
764 residue (42), essential for the enzyme interaction with the
DNA template is substituted by Tyr 649, but two Cys residues,
Cys 616 and Cys 680 are present in close proximity. In the case
of the E. coli polypeptide, most of the mutations conferring
rifampicin and streptolydigin resistance have been localized to
amino acids 511-576 (42). This region corresponds to the E.
gracilis amino acid residues 365-429, region II. It contains 27
identical amino acids and 24 conservative replacements. A region
of 118 residues (positions 932-1050) in E. coli is replaced
by only 10 residues in E. gracilis (positions 813-823) and also
in the higher plant chloroplast homologues. This region was
shown to be redundant in E. coli, since its deletion did not
significantly influence enzymatic properties (42). Two of the most
highly conserved regions, corresponding to regions IV and V,
amino acids 796-819 and 861-877, in E. gracilis are
homologous to the E. coli domains involved in nucleoside
triphosphate binding (positions 1047-1070 and 1228-1244 in
the E. coli polypeptide). These domains have been identified in
E. coli by affinity labelling of the polypeptide with an initiating
substrate analogue. They contain the lysine residues and histidine
1237 that are situated in the nearest neighborhood to, or directly
involved in, the formation of the active center of initiating
substrate binding (43). The Euglena rpoB gene product has about
30% identity with homologous bacterial and chloroplast
polypeptides at the amino terminus and a higher conservation
of 48% at the carboxy terminus. The overall amino acid sequence
identities with the bacterial, chloroplast and eukaryotic genes are
summarized in Table 1.
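Identity figures such as the 30% and 48% quoted above depend on how aligned columns are counted, and the text does not state the convention used. The Python sketch below assumes one common choice: only columns in which neither row has a gap are counted. The aligned toy rows and the function name are hypothetical. Regional values, for example amino- versus carboxy-terminal identity, follow by slicing the same columns from both rows.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a multiple alignment,
    counting only columns where neither row has a gap (an assumption)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(x == y for x, y in compared)
    return 100.0 * same / len(compared)


# Hypothetical aligned fragments (not rpoB data).
row_1 = "MKV-LDAGRTQ"
row_2 = "MRVELDSGKTQ"
print(percent_identity(row_1, row_2))          # 70.0
print(percent_identity(row_1[:5], row_2[:5]))  # 75.0 (first five columns only)
```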
Expression of the chloroplast rpoB gene
Transcripts of 3.2, 4.7 and 7.7 kb were detected with the rpoB-specific probe. The probe contained exons 1 through 8, approximately 60% of exon 9 and all of the introns, and it hybridized to these transcripts in all of the RNA samples (Fig. 5). Differences in the migration of the RNA samples are due to differences in salt concentration. The RNA species of 3.2 kb most likely represents the mature, fully spliced rpoB mRNA; this size is in very close agreement with the minimum message size of 3.2 kb predicted from the nucleotide sequence. An unspliced rpoB pre-mRNA would be 4.2-4.3 kb in size, and transcripts of this size were not detected in any of the RNA samples. In contrast, discrete transcripts of 4.7 and 7.7 kb, larger than the region encoding the rpoB gene, were detected (Fig. 5). The 4.7 and 7.7 kb transcripts are interpreted as fully spliced dicistronic rpoB-rpoC1 and tricistronic rpoB-rpoC1-rpoC2 mRNAs, respectively.
[Figure 3 image: sequencing ladders labeled "ct DNA sequence" and "cDNA sequence", lanes A, T, C, G]
Figure 3. Chloroplast genomic and cDNA sequence analysis of the splice junctions for introns 5 and 6. The DNA sequences for the genomic and cDNA clones are given to the left of the lanes from which they are read. Two arrows point to the 5'- and 3'-splice junctions in the genomic sequence and converge at the exon-exon junctions as a single arrow in the cDNA sequence. The sequence given on the left side is the complement of the RNA-like strand. Lanes A, T, C and G contain the corresponding dideoxynucleotides.
DISCUSSION
Gene organization
The organization of the rpoB-rpoC (or rpoB-rpoC1-rpoC2) operon has been conserved throughout evolution. The same gene order is found in bacteria (38, 44) and in the chloroplasts of spinach (15), tobacco (3), liverwort (2), rice (16) and Euglena gracilis. The location of the Euglena chloroplast rpoB-rpoC1-rpoC2 operon distal to, and in the same polarity as, the ribosomal RNA operon bears some additional similarity to the arrangement of these same genes in E. coli. The E. coli rpoB-rpoC genes are within the rif cluster (44, 45), where the gene arrangement is rrnB operon, four tRNA genes, tufB, rplK-rplA, rplJ-rplL-rpoB-rpoC. The chloroplast equivalents of rplK (rpl11), rplA (rpl1), rplJ (rpl10) and rplL (rpl7/rpl12) are all believed to be nuclear encoded in plants. The juxtaposition of the Euglena RNA polymerase operon distal to the ribosomal RNA operons is perhaps noteworthy because of its overall similarity to the E. coli rif cluster.
The large intercistronic region of 1.2 kbp between rrnC and rpoB is at present unexplained. Large intercistronic regions are uncommon in chloroplast genomes; in Euglena chloroplast DNA, most intercistronic spacers are less than 100 bp in length (46). Although a protein-coding locus in this region cannot be ruled out at present, we do not find any open reading frames in this region with similarity to known proteins in the PIR database. In addition, transcripts from this region could not be detected from either DNA strand by Northern blot hybridization (data not shown).
cDNA synthesis
It would not have been possible to predict the rpoB exon-intron
boundaries from the DNA sequence alone. Traditional techniques
for cDNA synthesis involving oligo-dT primers were not suitable,
since chloroplast mRNAs are not polyadenylated. The techniques
of cDNA cloning and PCR amplification proved to be very useful
for characterizing chloroplast RNA processing products. To our knowledge, this is one of the first examples of chloroplast cDNAs cloned by PCR amplification. This approach should
have wide applicability to future studies on organelle RNA
maturation pathways.
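The cDNA clones matter because comparing them with the genomic sequence reveals where an intron was removed, as at the exon-exon junctions in Fig. 3. The sketch below is minimal and makes several assumptions: there is exactly one intron, the sequences are error-free, and the cDNA equals the genomic sequence with one contiguous segment deleted. `locate_intron` is a hypothetical helper, not the authors' software. When the nucleotides at a junction are repeated, the boundary can be placed in more than one way. Because this sketch extends the 5' exon as far as possible, it reports the 3'-most placement. Assigning the real boundary needs further evidence, such as conserved boundary sequences.

```python
def locate_intron(genomic: str, cdna: str) -> tuple[int, int]:
    """Locate one intron by comparing a cDNA with its genomic sequence.

    Returns 0-based, end-exclusive (start, end) coordinates in `genomic`.
    Assumes `cdna` is `genomic` with exactly one contiguous segment removed.
    """
    if len(cdna) >= len(genomic):
        raise ValueError("cDNA must be shorter than the genomic sequence")
    # Longest common prefix: the 5' exon, read up to the first mismatch.
    p = 0
    while p < len(cdna) and genomic[p] == cdna[p]:
        p += 1
    # Longest common suffix: the 3' exon, capped so the exons cannot overlap.
    s = 0
    while s < len(cdna) - p and genomic[-1 - s] == cdna[-1 - s]:
        s += 1
    start, end = p, len(genomic) - s
    # Confirm that removing the segment really reproduces the cDNA.
    if genomic[:start] + genomic[end:] != cdna:
        raise ValueError("cDNA is not the genomic sequence minus one segment")
    return start, end


# Toy sequences (hypothetical): exon ATGAAA, intron GTTTGT, exon CCTTAA.
genomic = "ATGAAA" + "GTTTGT" + "CCTTAA"
start, end = locate_intron(genomic, "ATGAAACCTTAA")
print(start, end, genomic[start:end])  # 6 12 GTTTGT
```

If the intron instead ended in A (GTTTGA), removing AGTTTG one position to the left would yield the same cDNA. This sketch would still report (6, 12), which illustrates why cDNA data alone can leave a boundary ambiguous by a few nucleotides.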
Gene expression
The rpoB and rpoC genes are co-transcribed in E. coli (44). In spinach and pea chloroplasts, it was not possible to detect co-transcription of rpoB-rpoC1-rpoC2 by Northern analysis with 32P-labeled DNA probes (15, 47), although it was concluded from S1 mapping analysis that the genes are co-transcribed in spinach (15). It was proposed that the failure to detect these transcripts in pea was due either to a very low abundance of the transcripts or to the RNA polymerase genes being pseudogenes (47). In Euglena, the 4.7 and 7.7 kb transcripts detected by Northern analysis with the rpoB gene-specific probe are larger than
the predicted unspliced rpoB precursor mRNA.
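The size argument in this section is simple arithmetic. The eight intron lengths given in the next subsection (93, 95, 94, 99, 101, 110, 99 and 309 nt) sum to 1,000 nt. Added to the 3.2 kb mature message, this reproduces the 4.2 kb lower end of the predicted unspliced precursor. The sketch below makes that explicit and sorts the observed Northern bands. The sizing tolerance `tol_kb` is an assumption, not a value from the paper, and the function names are hypothetical.

```python
# Intron lengths (nt) of rpoB introns 1-8, as given in the text.
INTRON_LENGTHS_NT = [93, 95, 94, 99, 101, 110, 99, 309]
MATURE_MRNA_KB = 3.2  # minimum mature message size predicted from the sequence


def precursor_size_kb(mature_kb: float, intron_lengths_nt: list[int]) -> float:
    """Size of an unspliced precursor: mature message plus all intron sequence."""
    return mature_kb + sum(intron_lengths_nt) / 1000


def classify(observed_kb: float, mature_kb: float, precursor_kb: float,
             tol_kb: float = 0.1) -> str:
    """Compare a Northern band with the expected sizes.

    tol_kb is an assumed gel-sizing tolerance, not a value from the paper.
    """
    if abs(observed_kb - mature_kb) <= tol_kb:
        return "mature rpoB size"
    if abs(observed_kb - precursor_kb) <= tol_kb:
        return "unspliced precursor size"
    if observed_kb > precursor_kb:
        return "larger than unspliced precursor"
    return "intermediate"


precursor_kb = precursor_size_kb(MATURE_MRNA_KB, INTRON_LENGTHS_NT)
print(f"predicted precursor: {precursor_kb:.1f} kb")  # 4.2 kb
for band_kb in (3.2, 4.7, 7.7):
    print(band_kb, classify(band_kb, MATURE_MRNA_KB, precursor_kb))
```

The 3.2 kb band falls at the mature size, while the 4.7 and 7.7 kb bands fall above the precursor size. This matches their interpretation as rpoB-rpoC1 and rpoB-rpoC1-rpoC2 co-transcripts.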
Properties of the rpoB introns
The presence of introns in Euglena protein genes has been extensively documented (46). The number, position and type of introns appear to be distinct from those of higher plant chloroplast genomes. In particular, none of the higher plant chloroplast rpoB genes characterized thus far contains introns (1, 2, 46). Introns 1-7, of 93, 95, 94, 99, 101, 110 and 99 nt, respectively, belong to a category recently identified by Christopher and Hallick (20) as 'Group III'. The 309 nt intron 8 is a Group II intron. Until recently, most of the introns found in Euglena protein genes were relatively large compared with the rpoB introns (46). The rpoB Group III introns most closely resemble the three
small introns reported for the tufA gene (21), the six small introns found in the ribosomal protein genes rpl23, rps3 and rps19 (22), and the four introns from rps14, rpl14, rps8 and rpl16 (20). With the addition of the seven small rpoB introns, a list of 20 small introns can now be assembled. In addition to the Euglena Group III introns reported thus far (20), several Group III introns have been found in the rpoC1-rpoC2 operon (C. Radebaugh, G. Yepiz-Plascencia and R.B. Hallick, manuscript in preparation), the rps4-rps11 operon (J. Stevenson, R. Drager, K. Nelson and R.B. Hallick, manuscript in preparation) and the rps2-atpI-atpH genes (R. Drager and R.B. Hallick, unpublished observations). The properties of the small rpoB introns that warrant their classification in Group III are as follows: 1) the introns are small and uniform in size (93-111 nt); 2) they have degenerate versions of the Group II intron boundary sequences, with the consensus sequences 5'-NTNNG (N = any nucleotide) and ANNTNNNN-3' (Table 2); and 3) they lack the conserved secondary structure domains characteristic of Group II introns.
[Figure 4 image: aligned rows E.C., S.T., S.O., N.T., M.P. and E.G., with DmRP140 and RPB2 segments; conserved regions I to VI marked]
Figure 4. Multiple alignment of the predicted polypeptide of the E. gracilis chloroplast rpoB gene with bacterial, chloroplast and eukaryotic homologues. The amino acid sequences are those of the proteins from E. coli, E.C. (38), and S. typhimurium, S.T. (39), and of the chloroplast analogues from spinach, S.O. (15), tobacco, N.T. (40) and liverwort, M.P. (2). Also included are regions of strong amino acid similarity from Drosophila DmRP140 (12) and yeast RPB2 (11). An asterisk (*) marks amino acids identical in the bacterial and chloroplast polypeptides. A period (.) marks conservative replacements of amino acids in the chloroplast and bacterial sequences. Vertical arrowheads indicate the locations of the introns in the E. gracilis rpoB gene. The six highly conserved regions are underlined.
Table 1. Overall amino acid similarities of the predicted E. gracilis rpoB gene product with bacterial (E. coli and S. typhimurium), chloroplast (spinach, tobacco and liverwort) and eukaryotic polypeptides.
Gene                    Percent amino acid identity
E. coli rpoB            38.0
S. typhimurium rpoB     37.7
Spinach rpoB            39.0
Tobacco rpoB            39.1
Liverwort rpoB          35.2
Drosophila DmRP140      22.7
Yeast RPB2              21.9
Table 2. Comparison of the intron-exon boundaries of the eight introns of the E. gracilis rpoB gene.
Intron   Exon    Intron 5' end   Interior   Intron 3' end   Exon     Split codon
rpoB-1   TAGAT   TTGTGAATTA      73 nt      TTTACTGATG      ATACAA   none
rpoB-2   GTTAG   GAGAGATTTT      75 nt      TTATCTTATT      AAATAA   AG-A (Arg)
rpoB-3   TATGG   ATGTGGTTTA      71 nt      TAATTCATTT      TTAACC   G-TT (Val)
rpoB-4   TTGAT   GTGCGTTTTT      79 nt      ATATTTGGGT      AAGACG   none
rpoB-5   GTTTT   TTGAGATTTT      81 nt      TTATCTGAAA      GAAATG   none
rpoB-6   GGATC   TTGAGTTTAT      90 nt      CGAGATTAAA      CACGTA   C-CA (Pro)
rpoB-7   AAAGG   TTGGGAAATA      79 nt      ATATTTTATT      AATGAT   GG-A (Gly)
rpoB-8   AAGGT   GTGCGAATTT      289 nt     TAAGTTTTAA      GAAAAT   none
Conserved (Group III consensus, see text): intron 5' end NTNNG; intron 3' end ANNTNNNN
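Criterion 2) in the list of Group III properties above can be checked mechanically against Table 2. The sketch below tests the 5' consensus NTNNG against the first five nucleotides of each Group III intron, which assumes the consensus is anchored at the 5' splice site. The text describes the boundary sequences as degenerate, so a single departure is not by itself disqualifying. One example is position 2 of rpoB-2 as listed in the table. The 3' consensus ANNTNNNN could be tested the same way, but the table does not show how it is anchored relative to the 10-nt columns, so it is omitted. The helper name is hypothetical.

```python
# 5'-terminal intron nucleotides of the Group III rpoB introns, from Table 2.
GROUP_III_5P_ENDS = {
    "rpoB-1": "TTGTGAATTA",
    "rpoB-2": "GAGAGATTTT",
    "rpoB-3": "ATGTGGTTTA",
    "rpoB-4": "GTGCGTTTTT",
    "rpoB-5": "TTGAGATTTT",
    "rpoB-6": "TTGAGTTTAT",
    "rpoB-7": "TTGGGAAATA",
}


def consensus_mismatches(seq: str, consensus: str) -> list[int]:
    """1-based positions where seq departs from a consensus read from its 5' end.

    'N' in the consensus matches any nucleotide; other letters must match exactly.
    """
    if len(seq) < len(consensus):
        raise ValueError("sequence shorter than consensus")
    return [i + 1 for i, (base, want) in enumerate(zip(seq, consensus))
            if want != "N" and base != want]


for name, seq in GROUP_III_5P_ENDS.items():
    bad = consensus_mismatches(seq, "NTNNG")
    print(name, "fits 5'-NTNNG" if not bad else f"differs at position(s) {bad}")
```

With the sequences as listed, six of the seven Group III introns fit 5'-NTNNG exactly. rpoB-2 carries an A where the consensus has T.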
To date, Group III introns are known only in Euglena chloroplast genes, and they appear to be found predominantly in low-abundance, constitutively expressed genes. Only the 309 nt rpoB intron 8 is a Group II intron. The rpoB gene is not the only Euglena gene that contains both Group II and Group III introns; other examples are rps3, rpl16 and rps8 (20). The properties of Group II introns include highly conserved 5'- and 3'-boundary sequences, a size of more than 300 nt, and conservation of a core structural feature. The boundary sequences for rpoB intron 8, 5'-GTGCGA and AGTTTTAT-3', match the consensus sequences for Group II intron splice boundaries (48). This intron may be folded into an RNA secondary structure typical of Group II introns (48), with six helical domains radiating from a central core. The fifth helix is a stem-loop of 14 base pairs, and the sixth stem contains an unpaired adenine residue that has been proposed to be the branch point for lariat formation (48). Helical domains V and VI are the most diagnostic feature of Euglena chloroplast Group II introns. A potential secondary structure for domains
V and VI of rpoB intron 8 is shown in Fig. 6. It has been suggested (20, 22) that the conserved nucleotides at the intron boundary sequences could play an important role in the splicing mechanism, and that Group III introns may be a highly degenerate version of Group II introns, representing the minimum size required for correct splicing.
We are presently characterizing events in the rpoB mRNA maturation pathway and studying the relationship between the RNA polymerase gene products and the two different DNA-dependent RNA polymerase activities present in the Euglena chloroplast.
[Figure 5 image: Northern blot, lanes 1 and 2; size markers 9.49, 7.46, 4.40, 2.37, 1.35 and 0.24 kb; bands labeled rpoB-C1-C2, rpoB-C1 and rpoB]
Figure 5. Analysis of E. gracilis rpoB gene expression by Northern hybridization. Lanes: 1) 20 µg of total cell RNA from photoautotrophically grown cells; 2) 20 µg of total cell RNA from heterotrophically grown cells. BRL's RNA ladder markers were used as molecular weight standards (kb). Tricistronic, dicistronic and monocistronic transcripts are indicated at the right of the figure.
[Figure 6 image: proposed secondary structure of domains V and VI at the 3' end of rpoB intron 8]
Figure 6. A potential secondary structure model proposed for the 3'-end of intron 8. The structures labeled domains V and VI resemble the hypothetical Group II domains five and six (48). The arrow indicates the 3'-splice junction. The asterisk marks the potential unpaired adenine proposed to be the branch point for lariat formation. The brackets enclose a base-paired region characteristic of domain V of Group II introns.
ACKNOWLEDGEMENTS
We wish to thank Coco Whelchel for technical assistance during the screening and characterization of the genomic clones, David Christopher for helpful discussions during the course of the experiments, and our colleagues in the laboratory for critical reading of the manuscript. This work was supported by a grant to R.B.H. from the NIH.
REFERENCES
1. Ozeki,H., Ohyama,K., Inokuchi,H., Fukuzawa,H., Kohchi,T., Sano,T., Nakahigashi,K., Umesono,K. (1987) In Cold Spring Harbor Symposia on Quantitative Biology, Vol. LII. Cold Spring Harbor Laboratory, pp. 791-804.
2. Ohyama,K., Fukuzawa,H., Kohchi,T., Shirai,H., Sano,T., Sano,S., Umesono,K., Shiki,Y., Takeuchi,M., Chang,Z., Aota,S., Inokuchi,H., Ozeki,H. (1986) Nature 322:572-574.
3. Shinozaki,K., Ohme,M., Tanaka,M., Wakasugi,T., Hayashida,N., Matsubayashi,T., Zaita,N., Chunwongse,J., Obokata,J., Yamaguchi-Shinozaki,K., Ohto,C., Torazawa,K., Meng,B.Y., Sugita,M., Deno,H., Kamogashira,T., Yamada,K., Kusuda,J., Takaiwa,F., Kato,A., Tohdoh,N., Shimada,H., Sugiura,M. (1986) EMBO J. 5:2043-2049.
4. Narita,J.O., Rushlow,K.E., Hallick,R.B. (1985) J. Biol. Chem. 260:11194-11199.
5. Greenberg,B.M., Narita,J.O., DeLuca-Flaherty,C.R., Hallick,R.B. (1985) In Molecular Biology of the Photosynthetic Apparatus. Cold Spring Harbor Laboratory, pp. 303-309.
6. Rushlow,K.E., Orozco,E.M., Lipper,C., Hallick,R.B. (1980) J. Biol. Chem. 255:3786-3792.
7. Gruissem,W., Greenberg,B.M., Zurawski,G., Prescott,D.M., Hallick,R.B. (1983) Cell 35:815-828.
8. Burgess,R.R. (1976) In Losick,R. and Chamberlin,M. (eds), RNA Polymerase. Cold Spring Harbor Press, pp. 90-100.
9. Allison,L.A., Moyle,M., Shales,M., Ingles,C.J. (1985) Cell 42:599-610.
10. Corden,J.L., Cadena,D.L., Ahearn,J.M., Dahmus,M.E. (1985) Proc. Natl. Acad. Sci. U.S.A. 82:7934-7938.
11. Sweetser,D., Nonet,M., Young,R.A. (1987) Proc. Natl. Acad. Sci. U.S.A. 84:1192-1196.
12. Falkenburg,D., Dworniczak,B., Faust,D., Bautz,E.K.F. (1987) J. Mol. Biol. 195:929-937.
13. Lerbs,S., Brautigam,E., Parthier,B. (1985) EMBO J. 4:1661-1666.
14. Sijben-Müller,G., Hallick,R.B., Alt,J., Westhoff,P., Herrmann,R.G. (1986) Nucleic Acids Res. 14:1029-1044.
15. Hudson,G.S., Holton,T.A., Whitfeld,P.R., Bottomley,W. (1988) J. Mol. Biol. 200:639-654.
16. Hiratsuka,J., Shimada,H., Whittier,R., Ishibashi,T., Sakamoto,M., Mori,M., Kondo,C., Honji,Y., Sun,C.-R., Meng,B.-Y., Li,Y.-Q., Kanno,A., Nishizawa,Y., Hirai,A., Shinozaki,K., Sugiura,M. (1989) Mol. Gen. Genet. 217:185-194.
17. Little,M.C., Hallick,R.B. (1988) J. Biol. Chem. 263:14302-14307.
18. Purton,S., Gray,J. (1989) Mol. Gen. Genet. 217:77-84.
19. Karabin,G.D., Narita,J.O., Dodd,J.R., Hallick,R.B. (1983) J. Biol. Chem. 258:14790-14796.
20. Christopher,D.A., Hallick,R.B. (1989) Nucleic Acids Res., in press.
21. Montandon,P.E., Knuchel-Aegerter,C., Stutz,E. (1987) Nucleic Acids Res. 15:7809-7822.
22. Christopher,D.A., Cushman,J.C., Price,C.A., Hallick,R.B. (1988) Curr. Genet. 14:275-286.
23. Hallick,R.B., Richards,O.C., Gray,P.W. (1982) In Edelman,M., Hallick,R.B., Chua,N.-H. (eds), Methods in Chloroplast Molecular Biology. Elsevier Biomedical, New York, pp. 281-294.
24. Bullock,W.O., Fernandez,J.M., Short,J.M. (1987) Biotechniques 5:376-379.
25. Maniatis,T., Fritsch,E.F., Sambrook,J. (1982) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor, New York.
26. Henikoff,S. (1984) Gene 28:351-359.
27. Vieira,J., Messing,J. (1987) Methods Enzymol. 153:3-11.
28. Sanger,F., Nicklen,S., Coulson,A.R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467.
29. Tabor,S., Richardson,C.C. (1987) Proc. Natl. Acad. Sci. U.S.A. 84:4767-4771.
30. Mount,D.W. and Conrad,B. (1986) Nucleic Acids Res. 14:443-454.
31. Lipman,D.J. and Pearson,W.R. (1985) Science 227:1435-1441.
32. Feng,D.F. and Doolittle,R.F. (1987) J. Mol. Evol. 25:351-360.
33. Higgins,D.G. and Sharp,P.M. (1988) Gene 73:237-244.
34. Hallick,R.B., Chelm,B.K., Gray,P.W., Orozco,E.M. (1977) Nucleic Acids Res. 4:3055-3064.
35. Fourney,R.M., Miyakoshi,J., Day,R.S.,III, Patterson,M.C. (1987) Focus 10:5-7.
36. Graf,L., Roux,E., Stutz,E., Kossel,H. (1982) Nucleic Acids Res. 10:6369-6381.
37. Steege,D.A., Graves,M.C., Spremulli,L.L. (1982) J. Biol. Chem. 257:10430-10439.
38. Ovchinnikov,Y.A., Monastyrskaya,G.S., Gubanov,V.V., Guryev,S.O., Chertov,O.Y., Modyanov,N.N., Grinkevich,V.A., Makarova,I.A., Marchenko,T.V., Polovnikova,I.N., Lipkin,V.M., Sverdlov,E.D. (1981) Eur. J. Biochem. 116:621-629.
39. Sverdlov,E.D., Lisitsyn,N.A., Guryev,S.O., Monastyrskaya,G.S. (1986) Dokl. Biochem. Sect. (English transl.) 287:232-236.
40. Ohme,M., Tanaka,M., Chunwongse,J., Shinozaki,K., Sugiura,M. (1986) FEBS Lett. 200:87-90.
41. Daru,M., Benatti,L., Lorenzetti,R., Martini,D., Mingati,C., Sassano,M., Sidoli,A., Soria,M. (1988) Nucleic Acids Res. 16:3103.
42. Lisitsyn,N.A., Monastyrskaya,G.S., Sverdlov,E.D. (1988) Eur. J. Biochem. 177:363-369.
43. Grachev,M.A., Lukhtanov,E.A., Mustaev,A.A., Zaychikov,E.F., Abdukayumov,M.N., Rabinov,I.V., Richter,V.I., Skoblov,Y.S., Chistyakov,P.G. (1989) Eur. J. Biochem. 180:577-585.
44. Nomura,M. and Morgan,E.A. (1977) Ann. Rev. Genet. 11:297-347.
45. Yamamoto,M. and Nomura,M. (1979) J. Bacteriol. 137:584-594.
46. Hallick,R.B. and Buetow,D.E. (1989) In Buetow,D.E. (ed.), The Biology of Euglena, Vol. IV. Academic Press, New York, pp. 315-414.
47. Woodbury,N.W., Roberts,L.L., Palmer,J.D., Thompson,W.F. (1988) Curr. Genet. 14:75-89.
48. Michel,F. and Dujon,B. (1983) EMBO J. 2:33-38.