Volume 9 Number 24 1981
Nucleic Acids Research
The complete nucleotide sequence of the tryptophan operon of Escherichia coli
C.Yanofsky1, T.Platt2, I.P.Crawford3, B.P.Nichols1, G.E.Christie2, H.Horowitz2, M.VanCleemput1 and A.M.Wu2
1Department of Biological Sciences, Stanford University, Stanford, CA 94305, 2Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510, and 3Department of Microbiology, University of Iowa, Iowa City, IA 52242, USA
Received 11 November 1981
ABSTRACT
The tryptophan (trp) operon of Escherichia coli has become the basic
reference structure for studies on tryptophan metabolism. Within the past
five years the application of recombinant DNA and sequencing methodologies
has permitted the characterization of the structural and functional elements
in this gene cluster at the molecular level. In this summary report we
present the complete nucleotide sequence for the five structural genes of the
trp operon of E. coli together with the internal and flanking regions of
regulatory information.
I. INTRODUCTION AND OPERON ORGANIZATION
The pathway of tryptophan biosynthesis was the subject of some of the earliest studies with biochemical mutants of micro-organisms. Insight gained in these investigations provided the foundation for extensive genetic and biochemical analyses that established the genes, enzymes and reactions of tryptophan biosynthesis as favored subjects for study. Over the years the tryptophan system has been used to investigate virtually every aspect of amino acid metabolism, gene and operon structure, and gene function.
A schematic representation of the trp operon is given in Figure 1, with the regulatory region preceding the first structural gene expanded below. Within the promoter region (trp p) is an operator site (trp o) at which a tryptophan-activated repressor protein can bind and regulate transcription initiation (1,2). Beyond the transcription initiation site, a transcribed leader region (trpL) of 162 bp contains a regulated site of transcription termination, the attenuator (trp a) (3). Transcription into the five structural genes may therefore be regulated at both the operator and attenuator sites (4) (see below). Those RNA polymerase molecules that transcribe through the attenuator (depending on metabolic conditions) generally continue to the end of the operon. The full-length polycistronic trp mRNA encoding the five trp polypeptides (5) is about 6800 nucleotides in length. Rho-dependent transcription termination occurs in the region following the last gene, trpA (6). In the following sections we will discuss the nucleotide sequence of the operon in the context of structurally significant features and biologically important functions.
©IRL Press Limited, 1 Falconberg Court, London W1V 5FG, U.K.
[Figure 1 graphic: map of the operon (trpL, trpE, trpD, trpC, trpB, trpA, with promoters p and p2), the polypeptide products (anthranilate synthetase Components I and II, PR anthranilate transferase, PR anthranilate isomerase-indoleglycerol phosphate synthetase, and the two tryptophan synthetase subunits) and the reactions chorismate → anthranilate → PR anthranilate → CdRP → InGP → L-tryptophan, with glutamine, PRPP and L-serine as co-substrates. The expanded regulatory region marks the promoter, operator, trpL, attenuator, trpE, and the transcription start, pause and termination sites.]
Figure 1. Organization of the structural and regulatory regions of the trp operon of E. coli. The 5 polypeptide products and the reactions they catalyze are indicated. Nucleotides are numbered from the transcription start site. The number preceding each gene corresponds to the A of the respective translation start codon. For other definitions see text.
A. Regulatory regions
1. The major promoter and operator regions: The sequence of the promoter-operator segment of the operon is presented in Figure 2. Methylation-protection studies with the near-identical promoter-operator region of Salmonella typhimurium have identified sites of close contact between RNA polymerase and promoter, and trp repressor and operator (7). Mutational analyses (1, 8, 9) summarized in the figure support the conclusion that the -35 and -10 regions of the promoter play key roles in polymerase recognition. Binding-protection studies using promoter DNA and various restriction enzymes also establish the importance of the -35 and -10 regions to polymerase binding (9, 10). The TA base pair at -8 in particular appears to be essential for efficient initiation (9). Some mutational changes in the promoter region have little or no effect on promoter function, e.g., none of the operator constitutive mutations listed in the figure, except TA→CG at position -13, alters promoter efficiency. Deletion analyses indicate that the region on the transcribed side of -5 may be replaced without appreciably affecting promoter function. Deletions extending to -8, however, essentially eliminate promoter activity unless a TA base pair is introduced at bp -7 or -8 (9). The sequence of the -35 region of the trp promoter is conserved in several enteric species and resembles closely the consensus prokaryotic -35 sequence (11). The -10 region of the promoter also is highly conserved; however, this region of the trp promoter, which is also a segment of trp o, does not resemble the consensus -10 promoter sequence (12).
[Figure 2 graphic: the two strands below, with coordinate ticks at -40, -30, -20, -10 and +1. Tracks above and below the sequence mark methylation protection by repressor, methylation protection by RNA polymerase, promoter-down mutations, and operator-constitutive mutations; the individual base annotations are not recoverable from the extracted text.]
GCTGTTGACAATTAATCATCGAACTAGTTAACTAGTACGCAAG
CGACAACTGTTAATTAGTAGCTTGATCAATTGATCATGCGTTC
Figure 2. The 40 bp preceding the transcription start site, their protection by repressor or RNA polymerase, and mutational changes that affect promoter or operator function. G or G indicates hypermethylation of this base.
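The two strands shown in Figure 2 can be checked against each other mechanically. The Python sketch below assumes the strings are transcribed correctly from the figure (the bottom strand carries one unreadable character in the scan, read here as G); the helper name `complement` is illustrative and not from the paper. It tests whether the strands are base-by-base complements and locates the consensus -35 hexamer TTGACA (11) in the top strand.

```python
# Strands as read from Figure 2; transcription from the scan is an assumption.
TOP = "GCTGTTGACAATTAATCATCGAACTAGTTAACTAGTACGCAAG"
BOTTOM = "CGACAACTGTTAATTAGTAGCTTGATCAATTGATCATGCGTTC"

PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the base-by-base complement (same left-to-right order)."""
    return strand.translate(PAIRS)

strands_pair = complement(TOP) == BOTTOM
hexamer_offset = TOP.find("TTGACA")  # 0-based offset within TOP, -1 if absent

print("strands complementary:", strands_pair)
print("TTGACA offset in top strand:", hexamer_offset)
```

The offset is an index into the printed string only; converting it to a coordinate relative to the transcription start site would require the exact alignment of the figure's tick marks, which is not assumed here.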
The repressor-binding site, the 20 bp preceding the transcription start site, is clearly within the promoter (2), explaining the in vitro observation that trp repressor and RNA polymerase binding are mutually exclusive (13). The sequence of the operator region is highly conserved in other enterobacterial trp operons. Binding sites for the trp repressor also exist in the aroH (2, 12) and trpR promoters (2, 14) of E. coli; ten of the 18 bp in the operators of these three operons are invariant (2).
2. The minor internal promoter: A secondary promoter, trp p2, in the distal portion of trpD, is responsible for 60-80% of the basal levels of distal gene products (trpC, trpB, and trpA polypeptides) produced when E. coli is grown with excess tryptophan, i.e. under conditions of maximum repression (15). A comparison of the nucleotide sequence of the trp p2 region to the consensus promoter sequence of E. coli (Figure 3) raises some interesting questions (16). The best match overall between trp p2 and the canonical region eliminates what is generally regarded as the "invariant T" of the Pribnow box, but shifting register by two nucleotides to regain a T in this position substantially reduces the consensus alignment. The location of the 5' end of the transcript does not help to discriminate between the two possibilities (16). It is noteworthy that the codon changes necessary to improve homology would have severe consequences for the polypeptide itself, and that translation across this region must already accommodate a series of codons that are rarely used in E. coli (16, 17). We believe that these observations reflect the dual constraints of amino acid sequence requirement and promoter function on the sequence of this site.
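The trade-off between the two -10 registers can be made concrete by scoring 7-nucleotide windows of the trp p2 region against the consensus TATAATG shown in Figure 3. The following Python sketch is illustrative only: the simple identity count and the function name `score` are assumptions, not the method of reference (16); the two windows compared are our reading of the registers discussed in the text; and the "invariant T" is taken to be the sixth position of the consensus.

```python
# trp p2 region as printed in Figure 3 (a segment of trpD).
P2_REGION = "CCGTGACATTTTAACACGTTTGTTACAAGGTAAAGGCGACG"
CONSENSUS_10 = "TATAATG"  # consensus -10 sequence from Figure 3

def score(window: str, consensus: str = CONSENSUS_10) -> int:
    """Count positions at which a window is identical to the consensus."""
    return sum(a == b for a, b in zip(window, consensus))

# Score every 7-nt window, keyed by its 0-based offset in P2_REGION.
scores = {
    i: score(P2_REGION[i:i + 7])
    for i in range(len(P2_REGION) - 6)
}

register_a = P2_REGION[23:30]  # assumed register without a T at position 6
register_b = P2_REGION[25:32]  # the same register shifted by two nucleotides

for name, win in (("register A", register_a), ("register B", register_b)):
    print(name, win, "matches:", score(win), "sixth base:", win[5])
```

With the sequence as printed, the first window (TACAAGG) matches the consensus at five of seven positions but carries G at the sixth position, while the shifted window (CAAGGTA) regains the T and matches at only two positions.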
The function of trp p2 in vivo has been postulated to be a mechanism to aid the cell if it encounters a rapid shift from an environment with plentiful tryptophan to one of severe tryptophan starvation (16, 18). The appreciable levels of trpC, trpB and trpA polypeptides in cells grown with excess tryptophan, and de novo synthesis of the tryptophan-deficient trpE and tryptophan-poor trpD polypeptides when cells are shifted to a tryptophan-free medium, would allow the bacterium to recover quickly from a decreased capacity to synthesize the amino acid. The precise physiological importance of trp p2 remains unknown, but its preservation among numerous species of enteric bacteria (19) argues that its function may be an important one.
[Figure 3 graphic: the trp p2 region aligned with the consensus promoter sequences.]
Consensus -35: TTGACA
Consensus -10: TATAATG (two alternative alignments)
TRP P2: CCGTGACATTTTAACACGTTTGTTACAAGGTAAAGGCGACG* (positions 3130 and 3156 marked)
Figure 3. Minor internal promoter, trp p2. Numbers are nucleotide positions in the trpD segment of the operon. The arrow represents the start-point of transcription from trp p2 in vitro. Consensus sequences for the -35 and -10 regions are shown (there are two alternatives for the -10 region - see text).
3. Leader-attenuator: The trp operon of E. coli is typical of many amino acid biosynthetic operons in being regulated by attenuation, that is, transcription termination control at a site preceding the major structural genes of the operon (3). The leader region of the trp operon (Figure 4) encodes a 14-residue peptide with adjacent tryptophan residues at positions 10 and 11 (3). Gene fusion studies have shown that translation initiation does occur at the leader peptide start codon in vivo (20, 21), but efforts to isolate and identify the peptide have been unsuccessful. The transcript of the leader region can form alternate secondary structures (Figure 4) (3, 22, 23) which are believed to determine whether transcription terminates or continues through the attenuator.
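The residue count and the positions of the tandem tryptophans can be read directly off the peptide sequence printed in Figure 4. A minimal Python sketch, assuming the three-letter string below is transcribed correctly from the figure (the helper name `split_residues` is illustrative):

```python
# Putative leader peptide as printed above the sequence in Figure 4.
LEADER_PEPTIDE = "MetLysAlaIlePheValLeuLysGlyTrpTrpArgThrSer"

def split_residues(three_letter: str) -> list[str]:
    """Split a run of three-letter amino acid codes into a list."""
    if len(three_letter) % 3:
        raise ValueError("length is not a multiple of three")
    return [three_letter[i:i + 3] for i in range(0, len(three_letter), 3)]

residues = split_residues(LEADER_PEPTIDE)
# 1-based positions, numbering from the initiator Met.
trp_positions = [i for i, res in enumerate(residues, start=1) if res == "Trp"]

print(len(residues), "residues; Trp at positions", trp_positions)
```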
Models of attenuation are based on the premise that one particular secondary structure (C, Figure 4) signals transcription termination (3, 23, 24, 25). If tryptophan is abundant, translation of the leader peptide coding region will be closely coupled to transcription of the leader segment of the operon. The translating ribosome will prevent formation of the proximal secondary structures (A) and (B), permitting (C), and thus termination. When the cell is starved for tryptophan, however, the translating ribosome will stall at one of the tandem Trp codons, preventing formation of structure (A) but permitting alternate structure (B) to form. This will preclude formation of structure (C), and polymerase will transcribe through the attenuator site into the structural genes.
[Figure 4 graphic: the leader transcript, numbered from 20 to 160, with the leader peptide sequence written above its coding region and the alternative secondary structures (A), (B) and (C) drawn as base-paired stems; the stem drawings are not recoverable from the extracted text.]
Leader peptide: MetLysAlaIlePheValLeuLysGlyTrpTrpArgThrSer
Sequence segment legible in the extraction: ATGCGTAAAGCAATCAGATACCCAGCCCGCCTAATGAGCGGGCTTTTTTTTGAACAAAATTAGAGAATAACA
Figure 4. The leader region of the trp operon (transcript-equivalent strand) and the amino acid sequence of the putative leader peptide. The three alternative RNA secondary structures that are believed to form are drawn below the sequence.
If the ribosome fails to initiate translation soon after transcription has occurred, e.g., due to a deficiency of fMet-tRNA, structure (A) would form, and hence structure (C). Under these conditions termination would be expected to be maximal. In fact, when cells cannot initiate synthesis of the leader peptide, termination is 5-fold greater than when they have an adequate supply of tryptophan (25, 26). Thus ribosome location on the leader segment of the transcript while this segment is being synthesized determines the secondary structure of the transcript. This in turn directs the transcribing RNA polymerase to terminate or to continue transcription.
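The three cases described above amount to a small decision table. The Python sketch below encodes it as stated in the text; the enum and function names are hypothetical labels chosen for illustration, and the model is qualitative only (it says nothing about the relative frequency of termination).

```python
from enum import Enum

class Ribosome(Enum):
    """State of the ribosome on the leader transcript (labels are ours)."""
    TRANSLATING = "tryptophan abundant, leader peptide translated"
    STALLED_AT_TRP = "stalled at a tandem Trp codon"
    NOT_INITIATED = "no translation initiation"

# Structures of Figure 4 expected to form in each case, per the text.
STRUCTURES = {
    Ribosome.TRANSLATING: {"C"},         # ribosome blocks (A) and (B)
    Ribosome.STALLED_AT_TRP: {"B"},      # (A) blocked; (B) precludes (C)
    Ribosome.NOT_INITIATED: {"A", "C"},  # (A) forms, and hence (C)
}

def terminates(state: Ribosome) -> bool:
    """Termination at the attenuator is signalled by structure (C)."""
    return "C" in STRUCTURES[state]

for state in Ribosome:
    outcome = "terminate" if terminates(state) else "read through"
    print(f"{state.name}: structures {sorted(STRUCTURES[state])} -> {outcome}")
```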
The pause in transcription observed at a site located between bp 50 and 90 (29,30) may ensure that the newly-loaded ribosome can remain coupled to the transcription complex. Thus the two major factors to which attenuation responds are the level of tryptophan in the cell (via charged tRNATrp), and the functional efficiency of the translation apparatus. Considerable in vivo and in vitro evidence exists for this model of transcription control among many of the amino acid biosynthetic operons (3-6, 24-28).
4. Termination at the end of the operon: The DNA following the last structural gene in the trp operon encodes what appears to be a complex set of transcription termination signals. In vivo, the mRNA transcribed from the trp operon ends efficiently at a site called trp t, 36 bp past trpA (Figure 5) (31). The 3' terminal sequence CAUUUU is immediately preceded by a GC-rich region of dyad symmetry in the template, which allows formation of the RNA hairpin characteristic of most prokaryotic terminators (11). Termination in this region can be affected by lesions in RNA polymerase, rho factor, and in DNA itself (6, 32-35). Analysis of mutations of the latter class gave surprising results: several deletions which relieved termination had endpoints that were distal to the point corresponding to the normal 3' end of the transcript (33).
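The dyad symmetry preceding the CAUUUU terminus can be verified from the first line of sequence in Figure 5. The Python sketch below assumes that the arms of the stem are the two 8-nucleotide segments named in the code; that choice is our reading of the sequence, not an annotation taken from the paper.

```python
# First 36 nt of the Figure 5 sequence, beginning with the trpA stop codon.
TERMINATOR = "TAATCCCACAGCCGCCAGTTCCGCTGGCGGCATTTT"

PAIRS = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(PAIRS)[::-1]

LEFT_ARM = "GCCGCCAG"   # assumed 5' arm of the stem
RIGHT_ARM = "CTGGCGGC"  # assumed 3' arm of the stem

left_at = TERMINATOR.find(LEFT_ARM)
right_at = TERMINATOR.find(RIGHT_ARM)
loop = TERMINATOR[left_at + len(LEFT_ARM):right_at]
tail = TERMINATOR[right_at + len(RIGHT_ARM):]
gc_fraction = sum(base in "GC" for base in LEFT_ARM + RIGHT_ARM) / 16

print("arms pair:", reverse_complement(LEFT_ARM) == RIGHT_ARM)
print("loop:", loop, "tail:", tail, "GC fraction of stem:", gc_fraction)
```

Under these assumptions the two arms are exact reverse complements, 14 of the 16 stem nucleotides are G or C, and the stem is followed by A and the run of four T residues (U in the transcript) at which the transcript ends.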
Transcription studies in vitro revealed that upon addition of rho factor, polymerase terminates at a second site, trp t', located about 250 nucleotides downstream from the first (Figure 5) (34). This site resides in a region of the genome removed by all eight of the distal deletions that have been examined in detail, and its presence explains the failure of the initial selection to detect mutations in the trp t region alone (read-through transcription would be halted at trp t' with high efficiency). We do not yet understand how the removal of only the distal site (trp t') can affect the behavior of RNA polymerase at the intact proximal site (trp t) in vivo. Nor is it clear why rho factor is required for producing the correct 3' terminus at trp t in vivo, yet in vitro the presence of rho has no effect on termination at this first site. The explanation may lie in structural interactions within the DNA, as well as in the participation of additional factors in the cell.
[Figure 5 graphic: nucleotide sequence of the termination region with coordinate ticks every 20 nucleotides from 6690 to 7010. The sequence lines legible in the extraction are reproduced below in their original order.]
TAATCCCACAGCCGCCAGTTCCGCTGGCGGCATTTTAACTTTCTTTAATGAAGCCGGAAAAATCCTAAATTCATTTAATATTTATCTTTT
CACCAGTAAAATCAATAATTTTCTCTAAGTCACTTATTCCTCAGGTCCTTGTTAATATATCCAGAATGTTCCTCAAAATATATTTTCCCT
CTATCTTCTCGTTGCGCTTAATTTGACTAATTCTCATTAGCGACTAATTTTAATGAGTGTCGAC
[Left inset: RNA hairpin at trp t, with positions 6694 and 6710 labelled on the stem, followed by AUUUU. Right inset: the trp t' region near position 6989, UUAAUUUGACUAAUUCUCAUU.]
Figure 5. Termination region at the end of the trp operon. The sequence begins with the stop codon at the end of trpA and is shown through the Sal I site. Left - RNA secondary structure corresponding to trp t; the transcript terminates at the fourth U. Right - arrows indicate the sites of rho-dependent termination at trp t'.
Although the responses of the two sites are qualitatively different in vitro (34), both appear to be required for correct function in vivo. The primary structure of the trp t' region (Figure 5) is also highly unusual, bearing little similarity to a normal termination site. Although it is very AT-rich, it is notable for its lack of other defining features, in particular the absence of significant symmetry elements. Enhanced efficiency and specificity of termination at trp t' is observed when rho is used in the presence of nusA protein (6), but the contribution of this latter factor is only beginning to be studied. It seems likely that control of transcription termination at the end of the operon is more intricate than previously thought.
B. Structural genes and punctuation
1. The coding region for the five polypeptides: The nucleotide sequences of the 5 structural genes of the operon (18, 36-40) and their intercistronic regions are presented in Figure 6. The deduced amino acid sequences are also indicated in the figure. [Codon usage in the 5 structural genes and amino acid sequence comparisons are discussed in section II]. The five structural genes of the operon code for two bifunctional polypeptides and two pairs of polypeptides that form tetrameric enzyme complexes. The bifunctional trpD polypeptide consists of a glutamine amidotransferase domain, designated the trpG portion by analogy to the monofunctional trpG polypeptide of other organisms, followed by an anthranilate phosphoribosyl transferase or trpD domain (Figure 1) (41, 42). Similarly the trpC polypeptide has two domains, the amino terminal half corresponding to the monofunctional trpC polypeptide of other organisms and catalyzing the indole-3-glycerol phosphate synthetase reaction, and the distal half analogous to trpF polypeptides and catalyzing the isomerization of phosphoribosylanthranilate (42-44). The trpE and trpD polypeptides form a tetrameric functional complex (45,46) that catalyzes the reactions: chorismate + glutamine → anthranilate and anthranilate + PRPP → phosphoribosylanthranilate. The trpB and trpA polypeptides similarly are components of an α2β2 complex that catalyzes the reaction: indole-3-glycerol phosphate + L-serine → L-tryptophan (47, 48). The order of the structural genes and their domain-encoding segments resembles the sequential order of the functions of the corresponding polypeptide domains in tryptophan synthesis with the exceptions that the trpC region precedes the trpF region, and trpB precedes trpA. It is not known whether trp p2 might be responsible for the gene order that has evolved in the enteric bacteria.
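The comparison of gene order with pathway order can be written out explicitly. In the Python sketch below, the domain order along the operon follows the text; the pathway order list is an assumption inferred from the reactions and exceptions named above (in particular, the relative placement of the trpE and trpG functions, which act in the same reaction, is arbitrary).

```python
from itertools import combinations

# Domain-encoding segments in their order along the operon (from the text).
GENE_ORDER = ["trpE", "trpG", "trpD", "trpC", "trpF", "trpB", "trpA"]

# Assumed order in which the corresponding functions act in the pathway.
PATHWAY_ORDER = ["trpE", "trpG", "trpD", "trpF", "trpC", "trpA", "trpB"]

step = {name: i for i, name in enumerate(PATHWAY_ORDER)}

# Pairs whose order on the DNA is the reverse of their order in the pathway.
inversions = [
    (first, second)
    for first, second in combinations(GENE_ORDER, 2)
    if step[first] > step[second]
]

print(inversions)
```

Under the assumed pathway order, the only inverted pairs are (trpC, trpF) and (trpB, trpA), the two exceptions named in the text.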
2. Ribosome binding sites and intercistronic regions: The six known ribosome binding sites in the operon are illustrated in Figure 7. In all cases the initiating AUG codon is immediately preceded by a pyrimidine-purine pair, and all possess a purine rich sequence complementary to the 3' end of 16S rRNA, as expected. However, these regions are more extensive than
TRP E
170
ATG CAA AC*
MET GIN THR
1
260
GAT CGT CCG
ASP ARG PRO
ISO
190
CAA AAA CCG ACT CTC G»A CTG CTA
GLN LYS PRO THS LEU GLU LEU LEU
10
270
2QO
GCA ACG CTG CTG CTG GAA TCC GCA
ALA THR LEU LEU LEU GLU SER ALA
200
210
220
2J0
240
250
ACC TGC GAA GGC GCT TAT CGC GAC AAT CCC ACC GCG CTT TTT CAC CAG TTG TGT GGG
THR CYS GLU GLY ALA TYR ARG ASP ASM PRO THR ALA LEU PHE H I S GLN LEU CYS GLY
20
10
290
300
310
320
330
340
GAT ATC GAC AGC AAA GAT GAT TTA AAA AGC CTG CTG CTG GTA GAC AGT GCG CTG CGC
ASP I L E ASP SER LYS ASP ASP LEU LYS SER LEU LEU LEU VAL ASP SCR ALA LEU ARG
40
50
60
350
360
370
380
390
GOO
410
420
430
ATT ACA GCT TTA GGT GAC ACT GTC ACA ATC CAG GCA CTT TCC GGC AAC GGC GAA GCC CTC CTG GCA CTA CTG GAT AAC GCC CTG CCT GCG
ILE THR ALA LEU GLY ASP THR VAL THR I L E GLN ALA LEU SER GLY ASM GLY GLU ALA LEU LEU ALA LEU LEU ASP ASN ALA LEU PRO ALA
70
60
90
WO
$50
460
470
480
490
500
510
520
GGT GTG GAA AGT GAA CAA TCA CCA AAC TGC CGT GTG CTG CGC TTC CCC CCT GTC AGT CCA CTG CTG GAT GAA GAC GCC CGC TTA TGC TCC
GLY VAL GLU SER GLU GLN SER PRO ASN CYS ARG VAL LEU ARG PKE PRO PRO VAL SER PRO LEU LEU ASP GLU ASP ALA APG LEU CYS SER
100
tlO
120
530
540
550
560
570
560
590
600
610
CTT TCG GTT TTT GAC GCT TTC CGT TTA TTG CAG AAT CTG TTG AAT GTA CCS AAG GAA GAA CGA GAA GCC ATG TTC TTC AGC GGC CTG TTC
LEU SER VAL PHE ASP ALA PHE ARG LEU LEU GLN ASN LEU LEU ASN VAL PRO LYS GLU GLU ARG GLU ALA HET PKE PHE SER GLY LEU PHE
130
140
150
'20
630
640
650
660
670
6S0
690
700
TCT TAT GAC CTT GTG GCG GGA TTT GAA GAT TTA CCG CAA CTG TCA GCG GAA AAT AAC TGC CCT GAT TTC TGT TTT TAT CTC GCT GAA ACG
SER TYR ASP LEU VAL ALA GLY PHE GLU ASP LEU PRO GLN LEU SER ALA GLU ASN ASN CYS PRO ASP PHE CYS PHE TYH LEU ALA GLU THR
160
170
180
710
720
730
740
750
760
770
760
790
CTG ATG GTG ATT GAC CAT CAG AAA AAA AGC ACC CGT ATT CAG GCC AGC CTG TTT GCT CCG AAT GAA GAA GAA AAA CAA CGT CTC ACT GCT
LEU HET VAl ILE ASP HIS GLN LYS LYS SER THS ARG ILE GLN ALA SER LEU PHE ALA PRO ASN GLU GLU GLU LYS GLN ARG LEU THR ALA
190
200
210
800
810
820
630
640
650
860
670
880
CGC CTG AAC GAA CTA CGT CAG CAA CTG ACC GAA GCC GCG CCG CCG CTG CCA GTG GTT TCC GTG CCG CAT ATG CGT TGT GAA TGT AAT CAG
ARG LEU ASN GLU LEU ARG GIN GLN LEU THR GLU ALA ALA PRO PRO LEU PRO VAL VAL SER VAL PRO HIS MET ARG CYS GIU CYS ASN GLN
220
230
240
690
900
910
920
930
940
950
960
970
AGC GAT GAA GAG TTC GGT GGC GTA GTG CGT TTG TTG CAA AAA GCG ATT CGC GCT GGA GAA ATT TTC CAG GTG GTG CCA TCT CGC CGT TTC
5ER ASP GLU GLU PHE GLY GLY VAL VAL ARG LEU LEU GLN LYS ALA H E ARG ALA GLY GLU ILE PHE GLN VAL VAL PRO SER ARG ARG PHE
250
260
270
980
990
1000
1010
1020
1030
1040
1050
1060
TCT CTG CCC TGC CCG TCA CCG CTG GCG GCC TAT TAC GTG CTG AAA AAG AGT AAT CCC AGC CCG TAC ATG TTT TTT ATG CAG GAT AAT GAT
5ER LEU PRO CYS PRO SER PRO LEU ALA ALA TYR TYR VAL,LEU LYS LYS SER ASN PRO SER PRO TYP MET PHE PHE MET GLH ASP ASN ASP
280
290
300
1070
1080
1090
1100
1110
1120
1130
1140
1150
TTC ACC CTA TTT GGC GCG TCG CCG GAA AGC TCG CTC AAG TAT GAT GCC ACC AGC CGC CAG ATT GAG ATC TAC CCG ATT GCC GGA ACA CGC
PHE THR LEU PHE GLY ALA SER PRO GLU SER SER LEU LYS TYR ASP ALA TKR SER ARG GLN ILE GLU ILE TYR PRO ILE ALA GLY THR ARG
310
JCO
330
"60
1170
1180
1190
1200
1210
1220
1230
1240
CCA CGC GGT CGT CGC GCC GAT GGT TCA CTG GAC AGA GAT CTC GAC AGC CGT ATT GAA CTG GAA ATG CGT ACC GAT CAT AAA GAG CTG TCT
PRO ARG GLY ARG ARG ALA ASP GLY SER LEU ASP ARG ASP LEU ASP SER ARG ILE GLU LEU GLU HET ARG THR ASP HIS LYS GLU LEU SER
340
350
360
1250
1260
1270
1280
1290
1300
1310
1320
1330
GAA CAT CTG ATG CTG GTT GAT CTC GCC CGT AAT GAT CTG GCA CGC ATT TGC ACC CCC GGC AGC CGC TAC GTC GCC GAT CTC ACC AAA GTT
GLU HIS LEU MET LEU VAL ASP LEU ALA ARG ASN ASP LEU ALA ARG ILE CYS THR PRO GLY SER ARG TYR VAL ALA ASP LEU THR LYS VAL
370
300
390
1340
1350
1360
1370
1380
1390
1400
1410
1420
GAC CGT TAT TCC TAT GTG ATG CAC CTC GTC TCT CGC GTA GTC GGC GAA CTG CGT CAC GAT CTT GAC GCC CTG CAC GCT TAT CGC GCC TGT
ASP ARG TYR SER TYR VAL MET HIS LEU VAL SER ARG VAL VAL GLY GLU LEU ARG HIS ASP LEU ASP ALA LEU HIS ALA TYR ARG ALA CYS
400
410
420
1430
1440
1450
1460
1470
1460
t490
1500
1510
ATG AAT ATG GGG ACG TTA AGC GGT GCG CCG AAA GTA CGC GCT ATG CAG TTA ATT GCC GAG GCG GAA GGT CGT CGC CGC GGC AGC TAC GGC
MET ASN (1ET GLY THR LEU SER GLY ALA PRO LYS VAL ARG ALA MET GLN LEU ILE ALA GLU ALA GLU GLY ARG ARG ARG GLY SER TYR GLY
430
440
450
1520
1530
1540
1550
1560
1570
1580
1590
1600
GGC GCG GTA GGT TAT TTC ACC GCG CAT GGC GAT CTC GAC ACC TGC ATT GTG ATC CGC TCG GCG CTG GTG GAA AAC GGT ATC GCC ACC GIG
GLY ALA VAL GLY TVR PHE THR ALA HIS GLY ASP LEU ASP THR CYS ILE VAL ILE ARG SER ALA LEU VAL GLU ASN GLY ILE ALA THR VAL
460
470
480
1610
1620
1630
1640
1650
1660
1670
1680
1690
CAA GCG GGT GCT GGT GTA GTC CTT GAT TCT GTT CCG CAG TCG GAA GCC GAC GAA ACC CGT AAC AAA GCC CGC GCT GTA CTG CGC GCT ATT
GLN ALA GLY ALA GLY VAL VAL LEU ASP SER VAL PRO GLN SER GLU ALA ASP GIU THR ARG ASN LYS ALA ARG ALA VAL LEU ARG ALA U E
490
500
510
1700
1710
1720
GCC ACC GCG CAT CAT GCA CAG GAG ACT TTC TG
ALA THR ALA HIS HIS ALA GLN GLU THR PHE END
520
Figure 6(i)
required for base-pairing to 16S RNA. A consensus sequence A-G-G-Pu-Pu-A occurs four to seven nucleotides before the initiator codon, and only 3 out of 36 nucleotides in the six sites differ from this pattern (49). The extent of strict base-pairing between mRNA and rRNA would vary over a wide range among these sites, but other factors including translation of the preceding
TRP D
1730
ATG CCT
MET ALA
1
1820
TAC CGC
TYR ARG
1740
1750
1760
1770
1780
1790
1800
1610
GAC »TT CTG CTG CTC GAT »AT ATC GAC TCT TTT »CG TAC AAC CTG GCA GAT CAG TTG CGC AGC AAT GGG CAT AAC GTG GTG ATT
ASP ILE LEU LEU LEU ASP ASN ILE ASP SER PHE THR TrR A5N LEU ALA ASP GLN LEU ARG SEP. ASN GLr HIS ASN VAL VAL ILE
10
10
30
1830
1840
1850
I860
1870
1880
1890
1900
AAC CAT ATA CCG GCG CAA ACC TTA ATT GAA CGC TTG GCG ACC ATG ACT AAT CCG GTG CTG ATG CTT TCT CCT GGC CCC GGT GTG
ASN HIS ILE PRO ALA GLN THR LEU ILE GLU ARG LEU ALA THR MET SER ASN PRO VAL LEU MET LEU SER PRO GLY PRO GLY VAL
40
50
60
1910
1920
1930
1910
1950
1960
1970
1900
1990
CCG AGC GAA GCC GGT TGT ATG CCG GAA CTC CTC ACC CGC TTG CGT GGC AAG CTG CCC ATT ATT GGC ATT TGC CTC GGA CAT CAG GCG ATT
PRO SER GLU ALA GLY CYS MET PRO GLU LEU LEU THR ARG LEU ARG GLY LYS LEU PRO ILE ILE GLY ILE CYS LEU GLY HIS GLN ALA ILE
70
80
90
£000
2010
2020
2030
2040
2050
2060
2070
2080
GTC GAA GCT TAC GGG GGC TAT GTC GGT CAG GCG GGC GAA ATT CTC CAC GGT AAA GCC TCC AGC ATT GAA CAT GAC GGT CAG GCG ATG TTT
VAL GLU ALA TYR GLY GLY TYR VAL GLY GLN ALA GLY GLU H E LEU HIS GLY LYS ALA SER SER ILE GLU HIS ASP GLY GLN ALA HET PHE
100
110
120
2090
2100
2110
2120
2130
2140
2150
2160
2170
GCC GGA TTA ACA AAC CCG CTG CCG GTG GCG CGT TAT CAC TCG CTG GTT GGC AGT AAC ATT CCG GCC GGT TTA ACC ATC AAC GCC CAT TTT
ALA GLY LEU THR ASN PRO LEU PRO VAL ALA ARG TVR HIS SER LEU VAL GLY SER ASN ILE PRO ALA GLY LEU 1HR ILE ASN ALA HIS PHE
130
140
150
2180
2190
£200
2210
2220
2230
2240
2250
2260
AAT GGC ATG GTG ATG GCA GTA CGT CAC GAT GCG G«T CGC GTT TGT GGA TTC CAG TTC CAT CCG GAA TCC ATT CTC ACC ACC CAG GGC GCT
ASN GLY MET VAL MET ALA VAL ARG HIS ASP ALA ASP ARG VAL CYS GLY PHE GLN PHE HIS PRO GLU SER ILE LEU THR THR GLN GLY ALA
160
170
180
2270
2280
2290
2300
2310
2320
2330
2340
2350
CGC CTG CTG GAA CAA ACG CTG GCC TGG GCG CAG CAT AAA CTA GAG CCA GCC AAC ACG CTG CAA CCG ATT CTG GAA AAA CTG TAT CAG GCG
ARG LEU LEU GLU GLN THR LEU ALA TRP ALA GLH HIS LYS LEU GLU PRO ALA ASN THH LEU GLN PRO ILE LEU GLU LYS LEU TVR GLN ALA
190
200
210
2360
2370
2380
2390
2400
2410
2420
2430
2440
CAG ACG CTT AGC CAA CAA GAA AGC CAC CAG CTG TTT TCA GCG GTG GTG CGT GGC GAG CTG AAG CCG GAA CAA CTG GCG GCG GCG CTG GTG
GLN THR LEU SER GLN GLN GLU SER HIS GLN LEU PHE SER ALA VAL VAL ARG GLY GLU LEU LYS PRO GLU GLN LEU ALA ALA ALA LEU VAL
220
230
240
2450
2460
2470
2480
2490
2500
2510
2520
2530
AGC ATG AAA ATT CGC GGT GAG CAC CCG AAC GAG ATC GCC GGG GCA GCA ACC GCG CTA CTG GAA AAC GCA GCG CCG TTC CCG CGC CCG GAT
SER MET LYS ILE ARG GLY GLU HIS PRO ASN GLU ILE ALA GLY ALA ALA THR ALA LEU LEU GLU ASN ALA ALA PRO PHE PRO ARG PRO ASP
250
260
270
2540
2550
2560
2570
2560
2590
2600
2610
2620
TAT CTG TTT GCT GAT ATC GTC GGT ACT GGC GGT GAC GGC AGC AAC AGT ATC AAT ATT TCT ACC GCC AGT GCG TTT GTC GCC GCG GCC TGT
TYR LEU PHE ALA ASP ILE VAL GLY THR GLY GLY ASP GLY SER ASN SER ILE ASN ILE SER TKR ALA SER ALA PHE VAL ALA ALA ALA ;Y5
280
290
300
2630
2640
2650
2660
2670
2680
2690
2700
2710
GGG C7G AAA GTG GCG AAA CAC GGC AAC CGT AGC GTC TCC AGT AAA TCT GGT TCG TCC GAT CTG CTG GCG GCG TTC GGT ATT AAT CTT GAT
GLY LEU LYS VAL ALA LYS HIS GL V ASN ARG SER VAL SER SER LYS SER GLY SER SER ASP LEU LEU ALA ALA PHE GLY ILE ASN LEU ASP
310
320
330
2720
2730
2740
2750
2760
2770
2760
2790
2800
ATG AAC GCC GAT AAA TCG CGC CAG GCG CTG GAT GAG TTA GGT GTA TGT TTC CTC TTT GCG CCG AAG TAT CAC ACC GGA TTC CGC CAC GCG
MET ASN ALA ASP LVS SER ARG GLN ALA LEU ASP GLU LEU GLY VAL CYS PHE LEU PHE ALA FRO LYS TYR HIS THR GLY PHE ARG HIS ALA
340
350
360
2810
2820
2630
2840
2850
2860
2670
2680
2890
ATG CCG GTT CGC CAG CAA CTG AAA ACC CGC ACC CIG TTC AAT GIG CTG GGG CCA TTG ATT AAC CCG GCG CAT CCG CCG CTG GCG TTA ATT
MET PRO VAL ARG GLN GLN LEU LYS THR ARG THR LEU PNE ASN VAL LEU GLY PRO LEU ILE ASN PRO ALA HIS PRO PRO LEU ALA LEU ILE
370
360
390
2900
2910
2920
2930
2940
2950
2960
2970
2980
GGT GTT TAT AGT CCG GAA CTG GTG CTG CCG ATT GCC GAA ACC TTG CGC GTG CTG GGG TAT CAA CGC GCG GCG GTG GTG CAC AGC GGC GGG
GLY VAL TYR SER PRO GLU LEU VAL LEU PRO ILE ALA GLU 1HR LEU ARG VAL LEU GLY TYR GLN ARG ALA ALA VAL VAL HIS SER GLY GLY
400
410
420
2990
3000
3010
3020
3030
3040
3050
3060
3070
ATG GAT GAA GTT TCA TTA CAC GCG CCG ACA ATC GTT GCC GAA CTG CAT GAC GGC GAA ATT AAA AGC TAT CAG CTC ACC GCA GAA GAC TTT
MET ASP GLU VAL SER LEU HIS ALA PRO TKR ILE VAL ALA GLU LEU HIS ASP GLY GLU ILE LYS SER TYR GLN LEU THR ALA GLU ASP PHE
430
440
450
3030
3090
3100
3110
3120
3130
3140
3150
3160
GCC CTG ACA CCC TAC CAC CAG GAG CAA CTG GCA GGC GGA ACA CCG GAA GAA AAC CGT GAC ATT TTA ACA CGT TTG TTA CAA GGT AAA GGC
GLY LEU THR PRO TYR HIS GLN GLU GLN LEU ALA GLY GLY THR PRO GLU GLU ASH ARG ASP H E LEU THR ARG LEU LEU GLN GLY LYS GLY
460
470
480
3170
3180
3190
3200
3210
3220
3230
3240
3250
GAC GCC GCC CAT GAA GCA GCC GTC GCT GCG AAC GTC GCC ATG TTA ATG CGC CTG CAT GGC CAT GAA GAT CTG CAA GCC AAT GCG CAA ACC
ASP ALA ALA HIS GLU ALA ALA VAL ALA ALA ASN VAL ALA MET LEU MET ARG LEU HIS GLY HIS GLU ASP LEU GLN ALA ASN ALA GLN THR
490
500
510
3260
3270
3260
3290
3300
3310
3320
GTT CTT GAG GTA CTG CGC AGT GGT TCC GCT TAC GAC AGA GTC ACC GCA CTG GCG GCA CGA GGG TAA
VAL LEU GLU VAL LEU ARG SER GLY SER ALA TYR ASP ARG VAL THR ALA LEU ALA ALA ARG GLY END
520
530
Figure 6(ii)
gene almost certainly affect the efficiency of initiation. A more detailed comparative analysis of these regions is presented elsewhere (49).
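The consensus A-G-G-Pu-Pu-A and its spacing from the initiator codon lend themselves to a simple pattern search. The Python sketch below is an illustration, not the analysis of reference (49): the test string is the end of trpE joined to the trpD initiator codon as read from Figure 6, and the regular expression and function name are assumptions.

```python
import re

# End of trpE through the trpD initiator codon, as read from Figure 6
# (the TGA stop codon and the ATG start codon share one nucleotide).
TRPE_TRPD_JUNCTION = "CATGCACAGGAGACTTTCTGATG"

# A-G-G-Pu-Pu-A, then a spacer of four to seven nucleotides, then ATG.
RBS_PATTERN = re.compile(r"(AGG[AG][AG]A)([ACGT]{4,7})ATG")

def find_rbs(dna: str):
    """Return (consensus match, spacer length) or None if absent."""
    hit = RBS_PATTERN.search(dna)
    if hit is None:
        return None
    return hit.group(1), len(hit.group(2))

print(find_rbs(TRPE_TRPD_JUNCTION))
```

For this junction the search finds AGGAGA separated from the ATG by seven nucleotides, which falls within the four-to-seven range stated above.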
The four intercistronic regions in the polycistronic trp mRNA are also shown in Figure 7, and display some unusual features. Two of them consist of a chain termination codon that overlaps the subsequent initiator codon by one
3330
3340
3350
3360
3370
ATG ATG CAA ACC GTT TTA GCG AAA ATC GTC GCA GAC AAG GCG ATT TGG GTA
MET GUI THR VAL LEU ALA LYS ILE VAL ALA ASP LYS ALA ILE TRP VAL
10
3420
3430
3440
3450
3460
GAG GTT CAG CCG AGC ACG CGA CAT TTT TAT GAT GCG CTA CAG GGT GCG CGC
GLU VAL GUI PRO SER THR ARG H I S PHE TYR ASP ALA LEU GLN GLY ALA ARG
40
3510
3520
3530
3540
3550
AAA GGC GTG ATC CGT GAT GAT TTC GAT CCA GCA CGC ATT GCC GCC ATT TAT
LYS GLT VAL I L E ARG ASP ASP PHE ASP PRO ALA ARG I L E ALA ALA I L E TYR
60
70
3600
TTC AGG GGT
LYS TYR PHE ARG GLT
90
3690
TAC CAG ATC TAT CTG
TYR GLN I L E TYR LEU
120
3780
GCC GTC GCT CAC AGT
ALA VAL ALA H I S SER
150
3370
GTT GGC ATC AAC AAC
VAL GLY I L E ASN ASH
160
3960
GTA ATC AGC GAA TCC
VAL I L E SER GLU SER
4050
GCC CAT GAC GAT
ALA HIS ASP ASP
3610
3620
AGC TTT AAT TTC CTC
SER PHE ASH PHE LEU
3710
TAC CAG GCC GAT
TYR GLH ALA ASP
130
3800
3790
CTG GAG ATG GGG GTG CTG ACC
LEU GLU MET GLY VAL LEU THR
160
3890
3830
CGC GAT CTG CGT GAT TTG TCG
ARG ASP LEU ARG ASP LEU SER
19Q
3980
3970
GGC ATC AAT ACT TAC GCT CAG
GLY I L E ASN THR TYR ALA GLN
220
4070
4060
3700
GCG CGC TAT
ALA ARG TYR
GCC GTG CGC CGG GTG
ALA VAL ARG ARG VAL
250
4160
4150
GCG ATT TAC GGT GGG TTG ATT
ALA I L E TYR GLY GLY LEU ILE
280
4250
4240
TAT GTT GGC GTG TTC CGC AAT
TYR VAL GLY VAL PHE ARG ASN
310
4340
4330
GAA GAA CAG CTG TAT ATC GAT
GLU GLU GLH LEU TYR ILE ASP
340
4430
4420
GCC CGC GAG TTT CAG CAC GTT
ALA ARG GLU PHE GLN HIS VAL
370
4520
4510
CAA ACG CTT GGC AAC GTT CTG
GLN THR LEU GLY ASN VAL LEU
400
4610
4600
AAT TCT GCT GTA GAG TCG CAA
ASN SER ALA VAL GLU SER GLN
430
TTG CAC GCC
LEU H I S ALA
4140
GCT TAT GAC GCG GGC
ALA TYR ASP ALA GLY
270
4230
GCG GCA CCG TTG CAG
ALA ALA PRO LEU GLN
300
4320
CAA CTG CAT GGT AAT
GLN LEU H I S GLY ASN
330
4410
GGT GAA ACC CTG CCC
GLY GLU THR LEU PRO
360
4500
TCA CTA TTA AAT GGT
SER LEU LEU ASN GLY
390
4540
GCC GGA CTT GAT TTT
ALA GLY LEU ASP PHE
420
CCC ATC
FRO I L E
100
3380
GAA GCC CGC AAA CAG
GLU ALA ARG LYS GLH
20
3470
ACG GCG TTT ATT CTG
THR ALA PHE ILE LEO
50
3560
AAA CAT TAC GCT TCG
LYS HIS TYR ALA SER
80
3650
3630
3640
GTC AGC CAA ATC GCC CCG CAG CCG ATT TTA
VAL SER GLN ILE ALA PRO GLN PRO ILE LEU
110
3740
3720
3730
GCC TGC TTA TTA ATG CTT TCA GTA CTG GAT
ALA CYS LEU LEU MET LEU SER VAL LEU ASP
t40
3830
3310
3820
GAA GTC AGT AAT GAA GAG GAA CAG GAG CGC
GLU VAL SER ASH GLU GLU GLU GLN GLU ARG
170
3920
3900
3910
ATT GAT CTC AAC CGT ACC CGC GAG CTT GCG
I L E ASP LEU ASN ARG THR ARG GLU LEU ALA
200
3990
4010
4000
GTG CGC GAG TTA AGC CAC TTC GCT AAC GGT
VAL ARG GLU LEU SER HIS PHE ALA ASN GLY
230
4080
4090
4100
TTG CTG GGT GAG AAT AAA GTA TGT
CTG
LEU LEU GLY GLU ASH LYS VAL CYS
LEU
3390
3400
3410
CAG CAA CCG CTG GCC AGT TTT CAG
GLH GLN PRO LEU ALA SER PHE GLN
3480
3490
3500
GAG TGC AAG AAA GCG TCG CCG TCA
GLU CYS LYS LYS ALA 5ER PRO SER
3570
3580
3590
GCA ATT TCG GTG CTG ACT GAT GAG
ALA ILE SER VAL LEU THR ASP GLU
3660
3670
3680
TGT AAA GAC TTC ATT ATC GAC CCT
CYS LYS ASP PHE ILE ILE ASP PRO
3750
3760
3770
GAC GAC CAA TAT CGC CAG CTT GCC
ASP ASP GLH TYR ARG GLN LEU ALA
3640
3650
3660
GCC ATT GCA TTG GGA GCA AAG GTC
ALA ILE ALA LEU GLY ALA LYS VAL
3930
3940
3950
CCG AAA CTG GGG CAC AAC GTG ACG
PRO LYS LEU GLY HIS ASN VAL THR
4020
4030
4040
TTT CTG ATT GGT TCG GCG TTG ATG
PHE LEU ILE GLY SER ALA LEU MET
4110
4120
4130
ACG CGT GGG CAA GAT GCT AAA GCA
THR ARG GLY GLH ASP ALA LYS ALA
4170
4180
4190
TTT GCG ACA TCA CCG CGT TGC GTC
PHE ALA THR SER PRO ARG CYS VAL
4200
4210
4220
GAT GAA CAG GCG CAG GAA GTG ATG GCT
ASP GLU GLN ALA GLN GLU VAL MET ALA
4260
4270
4280
CAC GAT ATT GCC GAT GTG GTG GAC
HIS ASP ILE ALA ASP VAL VAL ASP
4290
4300
4310
GCT AAG GTG TTA TCG CTG GTG GCA GTG
ALA LYS VAL LEU SER LEU VAL ALA VAL
4350
4360
4370
ACG CTG CGT GAA GCT CTG CCA GCA
THR LEU ARG GLU ALA LEU PRO ALA
4300
4390
4400
GTT GCC ATC TGG AAA GCA TTA AGC GTC
VAL ALA ILE TRP LYS ALA LEU SER VAL
4440
4450
4460
GAT AAA TAT GTT TTA GAC AAC GGC
ASP LYS TYR VAL LEU ASP ASH GLY
4470
4480
4490
GGT GGA AGC GGG CAA CGT TTT GAC TGG
GLY GLY SER GLY GLN ARG PHE ASP TRP
4530
4540
4550
CTG GCG GGG GGC TTA GGC GCA GAT
LEU ALA GLY GLY LEU GLY ALA ASP
4560
4570
4580
! TGC GTG GAA GCG GCA CAA ACC GCC TGC
I CYS VAL GLU ALA ALA GLN THR GLY CYS
I
4650
4660
4670
I GCC TCG GTT TTC CAG ACG CTG CGC GCA
I ALA SER VAL PHE GLN THR LEU ARG ALA
4620
4630
4640
CCG GGC ATC AAA GAC GCA CGT CTT
PRO GLY ILE LYS ASP ALA ARG LEU
TAT TAA
TrR END
450
Figure 6(iii)
nucleotide.
It is curious that this occurs with the pairs of genes whose
products are associated in multisubunit enzyme complexes.
The possibility
that the juxtaposition of translational signals may play a role in the
coordinate synthesis of enzyme subunits is supported by observations on
trpE-trpD expression (50).
The trpC gene, whose product is unique among the
five trp operon polypeptides in not being part of a multisubunit complex, is
flanked by larger untranslated regions - 6 nucleotides at the 5' end, and 14
nucleotides at the 31 end.
Though trpC is translated in the same reading
6657
Nucleic Acids Research
4680
GGAAAGGAACA
SPACER
TRP B
1690
ATG ACA
MET THH
I
4780
GCT TTT
ALA PKE
4870
AAA
LYS
4960
GTG
VAL
5050
CTG
LEU
5140
TTA
LEU
5230
TAC
TYR
5320
GAA
GLU
5410
TTT
PHE
5500
CTA
LEU
5590
TCC
SER
5680
GAA
GLU
5770
ATG
MET
5660
TTG
LEU
4700
4710
4720
4730
4740
4750
4760
4770
ACA TTA CTT AAC CCC TAT TTT GGT GAG TTT GGC GGC ATG TAC GTG CCA CAA ATC CTG ATG CCT GCT CTG CGC CAG CTG GAA GAA
THR LEU LEU ASH PRO TYR PHE GLY GLU PHE GLY GLY MET TYR VAL PRO GLN ILE LEU MET PRO ALA LEU ARC GLN LEU GLU GLU
10
20
30
4790
4800
4810
4820
4830
4840
4650
4860
GTC ACT GCG CAA AAA GAT CCT GAA TTT CAG GCT CAG TTC AAC GAC CTG CTG AAA AAC TAT GCC GGG CGT CCA ACC GCG CTG ACC
VAL SER ALA GLN LYS ASP PRO GLU PHE GLN ALA GLN PHE ASN ASP LEU LEU LYS ASN TYR ALA GLY ARG PRO THR ALA LEU THR
50
60
40
4880
4890
4900
4910
4920
4930
4940
4950
TGC CAG AAC ATT ACA GCC GGG ACG AAC ACC ACG CTG TAT CTC AAG CGT GAA GAT TTG CTG CAC GGC GGC GCG CAT AAA ACT AAC CAG
CYS GLN ASN ILE TH3 ALA GLY THR ASN THR TKR LEU TYR LEU LYS ARG GLU ASP LEU LEU HIS GLY GLY ALA HIS LYS THR ASN GLN
70
60
90
4970
4980
4990
5000
5010
5020
5030
5040
CTG GGG CAG GCG TTG CTG GCG AAG CGG ATG GGT AAA ACC GAA ATC ATC GCC GAA ACC GGT GCC GGT CAG CAT GGC GTG GCG TCG GCC
LEU GLY GLN ALA LEU LEU ALA LYS ARG MET GLY LYS THR GLU ILE ILE ALA GLU THR GLY ALA GLY GLN HIS GLY VAL ALA SER ALA
100
110
120
5060
507Q
5080
5090
5100
5110
5120
5130
GCC AGC GCC CTG CTC GGC CTG AAA TGC CGT ATT TAT ATG GGT GCC AAA GAC GTT GAA CGC CAG TCG CCT AAC GIT TTT CGT ATG CGC
ALA SER ALA LEU LEU GLY LEU LYS CYS ARG ILE TYR MET GLY ALA LYS ASP VAL GLU ARG GLN SER PRO ASN VAL PHE ARG MET ARG
130
140
150
5150
5160
5170
5180
5190
5200
5210
5220
ATG GGT GCG GAA GTG ATC CCG GTG CAT AGC GGT TCC GCG ACG CTG AAA GAT GCC TGT AAC GAG GCG CTG CGC GAC TGG TCC GGT AGT
MET GLY ALA GLU VAL ILE PRO VAL HIS SER GLY SER ALA THR LEU LYS ASP ALA CYS ASN GLU ALA LEU ARG ASP TPP SER GLY SER
160
170
160
5240
5250
5260
5270
5260
5290
5300
5310
GAA ACC GCG CAC TAT ATG CTG GGC ACC GCA GCT GGC CCG CAT CCT TAT CCG ACC ATT GTG CGT GAG TTT CAG CGG ATG ATT GGC GAA
GLU THR ALA HIS TYR MET LEU GLY THR ALA ALA GLY PRO HIS PRO TYR PRO THR ILE VAL ARG GLU PHE GLN ARG MET ILE GLY GLU
190
200
210
5330
5340
5350
5360
5370
5380
5390
5400
ACC AAA GCG CAG ATT CTG GAA AGA GAA GGT CGC CTG CCG GAT GCC GTT ATC GCC TGT GTT GGC GGC GGT TCG AAT GCC ATC GGC ATG
THR LYS ALA GLN ILE LEU GLU ARG GLU GLY ARG LEU PRO ASP ALA VAL ILE ALA CYS VAL GLY GLY GLY SER ASN ALA ILE GLY MET
220
230
240
5420
5430
5440
5450
5460
5470
5460
5490
GCT GAT TTC ATC AAT GAA ACC AAC GTC GGC CTG ATT GGT GTG GAG CCA GGT GGT CAC GGT ATC GAA ACT GGC GAG CAC GGC GCA CCG
ALA ASP PHE ILE ASN GLU THR ASH VAL GLY LEU ILE GLY VAL GLU PRO GLY GLY HIS GLY ILE GLU THR GLY GLU HIS GLY ALA PRO
250
260
270
5510
5520
5530
5540
5550
5560
5570
5S80
AAA CAT GGT CGC GTG GGT ATC TAT TTC GGT ATG AAA GCG CCG ATG ATG CAA ACC GAA GAC GGG CAG ATT GAA GAA TCT TAC TCC ATC
LYS HIS GLY ARG VAL GLY ILE TYR PHE GLY MET LYS ALA PRO MET MET GLN THR GLU ASP GLY GLN ILE GLU GLU SER TYR SER ILE
260
290
300
5600
5610
5620
5630
5640
5650
5660
5670
GCC GGA CTG GAT TTC CCG TCT GTC GGC CCA CAA CAC GCG TAT CTT AAC AGC ACT GGA CGC GCT GAT TAC GTG TCT ATT ACC GAT GAT
ALA GLY LEU ASP PHE PRO SER VAL GLY PRO GLN HIS ALA TYR LEU ASN SER THR GLY ARG ALA ASP TYR VAL SER ILE THR ASP ASP
310
320
330
5690
5700
5710
5720
5730
5740
5750
5760
GCC CTT GAA GCC TTC AAA ACG CTG TGC CTG CAC GAA GGG ATC ATC CCG GCG CTG GAA TCC TCC CAC GCC TTG GCC CAT GCG TTG AAA
ALA LEU GLU ALA PHE LYS THR LEU CYS LEU HIS GLU GLY ILE ILE PRO ALA LEU GLU SER SER HIS ALA LEU ALA HIS ALA LEU LYS
340
350
360
5760
5790
5600
5810
5620
5830
5840
5650
ATG CGC GAA AAC CCG GAT AAA GAG CAG CTA CTG GTG GTT AAC CTT TCC GGT CGC GGC GAT AAA GAC ATC TTC ACC GTT CAC GAT ATT
MET ARG GLU ASN PRO ASP LYS GLU GLN LEU LEU VAL VAL ASN LEU SER GLY ARG GLY ASP LYS ASP ILE PHE THR VAL HIS ASP ILE
370
360
390
5670
5860
AAA GCA CGA GGG GAA ATC TG
LYS ALA ARG GLY GLU ILE EHO
Figure 6(iv)
frame as trpD preceding i t ,
pair of tandem AUG codons.
i n i t i a t i o n e v i d e n t l y begins at the second of a
The spacer between trpC and trpB has only one
pyrimidine in a s t r e t c h of 14 n u c l e o t i d e s , far more purines than required
for
the Shine-Dalgarno i n t e r a c t i o n .
Whether t h i s region possesses some
a d d i t i o n a l function i s unknown.
In S_. typhimurium, t h i s region has only 12
n u c l e o t i d e s , and i s capable of forming some secondary s t r u c t u r e t h a t i s not
p o s s i b l e in the I!, c o l i case ( 5 1 ) .
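The stop-start relationships described above can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it scans a junction sequence in a given reading frame for the first stop codon and then reports how far downstream the nearest ATG begins. The function name `stop_to_start` and the `frame` parameter are hypothetical; the three junction strings are read from Figures 6 and 7, and the distance is counted from the first base of the stop codon, which appears to be the convention behind the "14 nucleotides" quoted for the 3' side of trpC.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_to_start(seq, frame=0):
    """Locate the first in-frame stop codon and the nearest downstream ATG.

    Returns (overlap, distance), where `distance` is the number of nucleotides
    from the first base of the stop codon to the first base of the ATG, and
    `overlap` is the number of bases the two codons share. Returns None if
    either signal is missing.
    """
    seq = seq.upper().replace("U", "T")
    for stop in range(frame, len(seq) - 2, 3):
        if seq[stop:stop + 3] in STOP_CODONS:
            break
    else:
        return None  # no in-frame stop codon found
    start = seq.find("ATG", stop + 1)
    if start == -1:
        return None  # no initiator codon downstream of the stop
    distance = start - stop
    overlap = max(0, 3 - distance)
    return overlap, distance

# Junction sequences as read from Figures 6 and 7 (gene labels inferred from the text).
junctions = {
    "trpE-trpD": ("CTGATGGCT", 1),
    "trpC-trpB": ("TAAGGAAAGGAACAATGACA", 0),
    "trpB-trpA": ("CGAGGGGAAATCTGATGGAA", 0),
}
for name, (sequence, frame) in junctions.items():
    print(name, stop_to_start(sequence, frame))
```

An overlap of 1 corresponds to the TGATG arrangement, in which the final A of the stop codon is also the first base of the initiator codon. The sketch takes the nearest ATG, so it would not select the second of the tandem ATG codons at the start of trpC.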
[Figure 6(v): nucleotide sequence of the trpA coding region with its
translated amino acid sequence.]
Figure 6. The complete nucleotide sequence of the coding region for the five
trp polypeptides including the intercistronic regions. The nucleotide
sequence is numbered relative to the start of transcription. The amino acid
residues in each polypeptide are numbered from the amino-terminal Met. The
complete nucleotide sequence and associated restriction sites are available
in the Sumex-Molgen data bank at Stanford University.
[Figure 7: boxed codons are shown in square brackets; (A) marks the
nucleotide shared by an overlapping stop codon and initiator codon.]
leader peptide   AAAGGGTATCGACA [ATG] AAA
trpE             ATTAGAGAATAACA [ATG] CAA
trpE-trpD        C [TG(A)TG] GCT
trpD-trpC        CACGAGGG [TAA] [ATG] [ATG] CAA
trpC-trpB        [TAA] GGAAAGGAACA [ATG] ACA
trpB-trpA        CGAGGGGAAATC [TG(A)TG] GAA
Figure 7. Ribosome binding sites and intercistronic regions. Initiator and
stop codons are boxed; Shine-Dalgarno sequences are underlined.
II.
EVOLUTIONARY CONSIDERATIONS
A.
Codon usage
Table I summarizes the frequency of codon utilization in each of the five
structural genes of the E. coli trp operon. The numbers in parentheses show
the proportional use of particular codons; that is, in cases where more than
one codon specifies a particular amino acid, the relative frequency of usage
of each codon is given. There is considerable consistency in the
proportional use of each codon throughout the operon. Although every sense
codon is used at least once somewhere in the structural sequence, the usage
is far from equal. The seldom used codons for arginine (AGG, AGA, CGG and
CGA used 1, 3, 3, and 5 times) and isoleucine (ATA used twice) contrast with
certain favored codons for leucine (CTG, 53% usage with 6 choices), valine
(GTG, 43% usage with 4 choices), proline (CCG, 57% usage with 4 choices),
threonine (ACC, 52% usage with 4 choices), lysine (AAA, 82% usage with 2
choices), glutamic acid (GAA, 72% usage with 2 choices) and arginine (CGC,
57% usage with 6 choices).
[Table I: frequency of codon utilization in each of the five structural
genes of the E. coli trp operon, with the proportional use of each codon in
parentheses.]
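The counts and proportions of the kind reported in Table I can be derived from any coding sequence. The sketch below is illustrative only and is not the authors' program: `codon_usage` is a hypothetical helper, the genetic code table is the standard one, and the input is just the first 30 codons of trpB as read from Figure 6(iv), so its proportions will not match the operon-wide values quoted above.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_usage(cds):
    """Return {codon: (count, share among codons for the same amino acid)}."""
    cds = cds.upper().replace(" ", "")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: (n, n / per_amino_acid[CODON_TABLE[codon]])
        for codon, n in counts.items()
    }

# First 30 codons of trpB, as read from Figure 6(iv).
TRPB_START = (
    "ATG ACA ACA TTA CTT AAC CCC TAT TTT GGT GAG TTT GGC GGC ATG "
    "TAC GTG CCA CAA ATC CTG ATG CCT GCT CTG CGC CAG CTG GAA GAA"
)

usage = codon_usage(TRPB_START)
for codon in ("CTG", "TTA", "CTT"):
    count, share = usage[codon]
    print(codon, count, round(share, 2))
```

Applying the same function to each complete structural gene, and then to the concatenation of all five, would give per-gene and operon-wide proportions comparable to those in parentheses in Table I.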
This non-random pattern of codon usage is characteristic of
intermittently or moderately expressed genes in E. coli.
A more restricted
pattern is seen with highly expressed genes such as those for ribosomal
proteins (52) and outer membrane components (53).
In such cases there may be
a requirement for codons with intermediate binding energies to ensure short
ribosomal transit times (17).
The trp operon genes, which may be expressed
maximally by the cell only on occasion, seem to show a preference but not a
requirement for the same codons.
Since the genes of the trp operon are permitted some freedom in codon
utilization it is not surprising that the third positions are strongly
influenced by the overall G + C content of the whole genome.
A comparison of
the sequences of trpA (268 codons; 38) and the proximal third of trpD (194
codons; 37) in several enteric bacteria shows this most clearly.
The entire
E. coli trp operon has 58% G + C in the third codon position while the genome
G + C content is 51%.
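The third-position G + C fraction quoted here is simple to compute from a coding sequence. This is a minimal sketch under stated assumptions: `gc3` is a hypothetical helper, and the example input is only the first 21 codons of trpA as read from Figure 6(v), so the value it gives is for that fragment and not the 58% reported for the whole operon.

```python
def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    cds = cds.upper().replace(" ", "")
    third_positions = cds[2::3][: len(cds) // 3]  # ignore a trailing partial codon
    if not third_positions:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)

# First 21 codons of trpA, as read from Figure 6(v).
TRPA_START = (
    "ATG GAA CGC TAC GAA TCT CTG TTT GCC CAG TTG "
    "AAG GAG CGC AAA GAA GGC GCA TTC GTT CCT"
)
print(round(gc3(TRPA_START), 3))
```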
Those regions of the operon sequenced in Klebsiella aerogenes (genome
G + C content 56%) and Serratia marcescens (genomic G + C content 59%) show
83 and 82% G + C in the third codon position, respectively (38,39).
B.
Amino acid sequence homology
A computer search for repeated sequences within the trp operon revealed
no evidence for an ancestral duplication of any segment of the operon during
its evolution (Deeley, M., unpublished). This result is not wholly
unexpected.
Since each reaction of the pathway is chemically different, a
plausible scheme for the origin of the operon involves recruitment and
modification of individual genes or gene segments originally responsible for
performing chemically similar reactions with different substrates.
Grouping
these genes together behind a single regulatory region may have been a late
evolutionary event in the enteric bacteria (see below).
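The text does not describe how the computer search for repeated sequences was carried out. As one illustration of the general idea, and explicitly not a reconstruction of the unpublished analysis, the sketch below indexes every window of length k in a sequence and reports those that occur more than once. The function name `find_repeats`, the window length and the toy input are all assumptions made for the example.

```python
from collections import defaultdict

def find_repeats(seq, k):
    """Return {k-mer: [start positions]} for every k-mer seen more than once."""
    if k <= 0:
        raise ValueError("k must be positive")
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: where for kmer, where in positions.items() if len(where) > 1}

# Toy input (hypothetical); a real search would use the operon sequence or
# the translated polypeptide sequences.
print(find_repeats("ACGTACGTTT", 4))
```

Exact matching of this kind would only detect recent or well conserved duplications; a search for an ancient duplication would need to tolerate mismatches, for example by comparing translated sequences with a similarity score.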
Under this hypothesis one might expect some amino acid sequence homology
between individual trp genes and genes of related function elsewhere on the
chromosome, rather than homology between different trp genes.
This
possibility has not yet received an adequate test, but tnaA, the gene for
tryptophanase and one candidate for an ancestral relative of trpB, shows no
detectable sequence similarity to trpB (54).
C.
Gene and operon organization in other organisms
The general plan of the trp operon appears to be identical in all enteric
bacteria. In some, however, the two functionally distinct segments of the
trpD gene are separate rather than fused as in E. coli (19). The mechanism
of this fusion has been postulated from a comparison of the DNA sequence in
this region of the S. marcescens and E. coli trp operons (55). The trpC gene
of E. coli and all other enteric bacteria studied apparently represents a
fusion of two genes that are separate in other procaryotic organisms (56,
57), though the precise location and mechanism of fusion are not known.
Enzymatic studies suggest that in both fused genes just described the two
active sites have remained distinct and independent.
In two instances this
pathway does have complex active sites composed of elements on two different
polypeptides:
anthranilate synthetase formed from the trpE gene product and
the trpG domain of the trpD gene product and tryptophan synthetase from the
trpB and trpA gene products.
It is interesting that these cooperating
polypeptides have not been fused in E. coli.
The existence of a fused
trpA-trpB polypeptide in Saccharomyces cerevisiae (58) and Neurospora crassa
(59) indicates that such a fusion of cooperative polypeptides can not be
ruled out for mechanical reasons.
All other major bacterial groups studied have the genes for the
tryptophan pathway at two or three separate chromosomal locations (56, 57);
in many of these instances the separate trp gene clusters are independently
regulated.
It is clear that the enteric bacterial arrangement of these genes
in a single operon, though it is the best studied one, is only one of many
possible solutions to the genetic organization of the elements of the
tryptophan pathway.
D.
Hybrid trp genes and proteins
S. typhimurium is a close relative of E. coli, but differs from it in the
trpA gene in 25% of its nucleotides and 15% of its amino acids.
The
corresponding differences in the trpB gene are 15% of the nucleotides and 4%
of the amino acids.
Most of the nucleotide differences are in the third
codon position and represent synonymous codon changes.
Through the
construction of compatible plasmids containing defective versions of these
two homologous genes, recombinant trpB and trpA genes producing hybrid
proteins were obtained (60). In the trpA case there was no requirement that
these hybrids be enzymatically functional.
Nevertheless, each of six such
recombinants examined possessed normal enzymatic activity, though each had a
different crossover point from the sequence of one organism to the other.
The hybrids differed from either parental protein in 6 to 34 amino acids.
Several of the recombinant proteins were less thermostable than either
parental molecule.
Thus, though none of the divergent amino acids appears to
affect the active site, it can be argued that in each parent some of the
amino acid differences are balanced by others far away in the primary
sequence to obtain a more stable overall conformation.
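The kind of comparison summarized above (percent nucleotide differences, percent amino acid differences, and the share of changes that fall in the third codon position or are synonymous) can be sketched as follows. This is an illustration, not the method of the cited work: `compare_coding` is a hypothetical helper that assumes two pre-aligned coding sequences of equal length, and the second sequence in the example is an invented variant of the first six codons of E. coli trpA, not S. typhimurium data.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def compare_coding(seq_a, seq_b):
    """Compare two aligned, gap-free coding sequences codon by codon."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0 or not seq_a:
        raise ValueError("sequences must be non-empty, equal in length, whole codons")
    nt_diff = sum(x != y for x, y in zip(seq_a, seq_b))
    third_diff = sum(seq_a[i] != seq_b[i] for i in range(2, len(seq_a), 3))
    aa_diff = synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # different codon, same amino acid
        else:
            aa_diff += 1
    return {
        "nt_diff_fraction": nt_diff / len(seq_a),
        "aa_diff_fraction": aa_diff / (len(seq_a) // 3),
        "third_position_fraction": third_diff / nt_diff if nt_diff else 0.0,
        "synonymous_codon_changes": synonymous,
    }

ECOLI_FRAGMENT = "ATGGAACGCTACGAATCT"    # first six codons of trpA, Figure 6(v)
INVENTED_VARIANT = "ATGGAGCGTTACGATTCT"  # hypothetical, for illustration only
print(compare_coding(ECOLI_FRAGMENT, INVENTED_VARIANT))
```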
III.
PROBLEMS FOR THE FUTURE
The ease with which structural genes can now be cloned, fragmented, and
fused to other genes or gene fragments, should make it possible to fabricate
virtually any desired gene sequence by combining preexisting DNA segments.
This application of recombinant DNA technology, combined with the additional
capability of synthesizing and incorporating short DNA fragments of defined
sequence, challenges our ingenuity in the design of meaningful experiments.
A.
Protein structure studies
With present-day techniques one can examine the effects of replacing
individual amino acid residues, or segments of a protein, on the properties
of the catalytic site, on the folding of the molecule, and on susceptibility
of the protein to proteases and environmental conditions.
The potential for
systematically changing the structure of a polypeptide should yield
derivatives that are not readily obtainable using classical mutant production
procedures.
With such tailor-made trp polypeptides one can approach the
classical questions about protein structure:
to what extent are the unique
characteristics of each polypeptide chain (functional domains, substrate
binding sites, catalytic sites, subunit interaction sites, and folding)
associated with different linear segments of the polypeptide?
Although
proteins are maintained in an appropriate three-dimensional conformation by
multiple interactions between different segments of the polypeptide chain, it
is conceivable that much of the structural information for each discrete
functional aspect resides in a distinct linear segment of the polypeptide.
This attractive possibility would provide for the evolution of proteins with
new overall functions by transposition of preexisting gene segments.
In
addition, it would be consistent with the presence of intervening sequences
between domain-encoding regions of some eukaryotic genes.
B.
Transcriptional and translational controls
Using current techniques it should be possible to change any base pair in
the promoter, operator and leader regions of the trp operon of E. coli and
thereby analyze the contribution of each segment of these regulatory
regions.
Such studies should reveal the recognition requirements for
polymerase and repressor binding, and identify the segments of the leader
transcript that participate in attenuation.
This information, paired with
physiological observations, should elucidate the essential features of
transcriptional control as well as illuminate how such controls benefit the
organism.
Comparable studies with the internal promoter may provide an
explanation for its existence in E. coli and related enteric bacteria.
Little is known about the details of translation of the coding regions of
the polycistronic trp messenger RNA.
We suspect that ribosome behavior on
the messenger is optimized relative to the needs of the bacterium but there
is no experimental evidence that accounts for the sequence differences
between the translation punctuation regions.
We particularly would like to
know the significance of the overlapping stop-start codon sequences between
the pairs of genes whose polypeptide products form functional complexes.
Do
translating ribosomes or their 30S subunits simply reinitiate following
termination at the stop codon in such stop-start regions?
This is a key question relevant to all polycistronic messengers.
Also unknown is whether
any of the trp polypeptides or any other cellular protein binds to the
messenger and influences translation.
Recombinant DNA techniques applied to
these problems should provide insight into the structural
requirements for
ribosome binding and translation efficiency, and reveal the significance of
the diverse structures of translation initiation regions.
Termination of transcription can also serve several important functions
in the cell.
In addition to the housekeeping requirement of stopping
transcription at the end of a gene cluster, termination mechanisms are
involved in attenuation
(to modulate transcriptional readthrough) and
mutational polarity (to abort transcription when it is uncoupled from
translation).
Some increasingly complex aspects of termination are
illustrated by control regions in the trp operon, but these by no means
exhaust existing possibilities.
Studies in vivo and in vitro are beginning
to clarify not only the role of nucleotide sequence, but the participation of
additional protein factors (such as rho and nusA), the translational
apparatus, processing events, and higher order structural interactions in
termination and antitermination events.
Future work will be focused on understanding the intricate molecular
details of these mechanisms, and developing accurate mimicking of cellular
regulation on a broader front.
For example, the qualitative resemblance
between termination regions in E. coli such as trp t' and termination regions
in eukaryotic organisms suggests that application of our knowledge in this
direction might be especially fruitful.
Ultimately, we should be able to
combine the contributions of termination regions with those of promoter or
operator regions to enhance the range of control of the expression of hybrid,
foreign, or synthetic genes in any chosen organism.
C.
Evolutionary studies
Just as with the functional and regulatory questions posed above,
recombinant DNA methods should offer ways to study the mechanisms of
evolution.
Selective advantages or disadvantages of certain genetic
modifications should be directly measurable.
Although easiest, measurement
of the growth rate in minimal medium of cells containing a gene or gene
segment from a different species is likely to be too insensitive for this
purpose because of the wide regulatory range available for this operon.
Modifications of the trp promoter and leader to yield a fixed, low level of
expression could be introduced, of course, but the magnitude of a selective
burden, even though compensated for by a derepression of some magnitude,
might best be measured by competition against the wild type in chemostat
experiments under several growth conditions.
Some of the features that could
be tested for evolutionary significance in this manner are:
the value of trp
p2, the trpG-trpD fusion, the trpC-trpF fusion, the ability of heterologous
subunits of anthranilate synthetase and tryptophan synthetase to substitute
for homologous ones, the advantage of a particular transcription termination
sequence at the end of the operon, the selective advantage of specific codon
usage, and many others.
If a modification, such as translational punctuation
between the trpG and trpD domains, is found to be deleterious, stepwise
selection for better function may retrace the presently hypothetical
evolutionary pathway culminating in a gene fusion.
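For the competition experiments proposed above, a common way to express the outcome is a selection coefficient per generation, estimated from the change in the ratio of the modified strain to the wild type. The sketch below uses that standard estimate; it is not a procedure given in this paper, and the function name, argument names and cell counts are hypothetical.

```python
import math

def selection_coefficient(mutant_start, wild_start, mutant_end, wild_end, generations):
    """Estimate s per generation from ln[(m_end/w_end) / (m_start/w_start)] / t."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    if min(mutant_start, wild_start, mutant_end, wild_end) <= 0:
        raise ValueError("all counts must be positive")
    ratio_start = mutant_start / wild_start
    ratio_end = mutant_end / wild_end
    return math.log(ratio_end / ratio_start) / generations

# Hypothetical counts: the modified strain falls from 1:1 to 1:4 in 10 generations.
print(selection_coefficient(100, 100, 50, 200, 10))
```

A negative value indicates a selective burden relative to the wild type, and a value near zero indicates that the modification is effectively neutral under the growth conditions tested.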
Going further afield, the trp genes of non-enteric organisms offer many
unexpected examples of evolutionary diversity.
The level of expression of
trpB and trpA in Pseudomonas aeruginosa and Pseudomonas putida is regulated
over a wide range, as in E. coli, but control is through induction by
indoleglycerol phosphate rather than repression by tryptophan (56). As it is
now known that the regulatory mechanism for this Pseudomonas gene pair is
closely linked to the two structural genes and can be mobilized with it (60,
61),
introduction of this unit into an E. coli strain deleted for trpB and
trpA could provide a test of the superiority of one or the other of these
contrasting regulatory mechanisms in E. coli.
This brief survey only skims the surface of the possibilities made
available by a complete knowledge of the trp operon DNA sequence.
Future
experimentation promises solutions to questions about structure and function
as well as about the mechanisms regulating expression of this cluster of five
genes.
When accompanied by additional exploration of the homologous genes in
other bacteria, there is reason to hope that we can reconstruct the
evolutionary events that resulted in the operon we observe in E. coli today.
Acknowledgement
The studies summarized herein could not have been performed without the
support of the U.S. Public Health Service, the National Science Foundation,
the American Heart Association and the American Cancer Society.
REFERENCES
1. Bennett, G. N. and Yanofsky, C. (1978) J. Mol. Biol. 121, 179-192
2. Gunsalus, R. G. and Yanofsky, C. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 7112-7121
3. Yanofsky, C. (1981) Nature 289, 751-758
4. Crawford, I. P. and Stauffer, G. V. (1980) Ann. Rev. Biochem. 49, 163-195
5. Imamoto, F. (1973) Prog. Nucleic Acid Res. Mol. Biol. 13, 339-407
6. Platt, T. (1981) Cell 24, 10-23
7. Oppenheim, D. S., Bennett, G. N. and Yanofsky, C. (1980) J. Mol. Biol. 144, 133-142
8. Miozzari, G. and Yanofsky, C. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 5580-5584
9. Oppenheim, D. S. and Yanofsky, C. (1980) J. Mol. Biol. 144, 143-161
10. Brown, K. D., Bennett, G. N., Lee, F., Schweingruber, M. E. and Yanofsky, C. (1978) J. Mol. Biol. 121, 153-177
11. Rosenberg, M. and Court, D. (1979) Ann. Rev. Genet. 13, 319-353
12. Zurawski, G., Gunsalus, R., Brown, K. D. and Yanofsky, C. (1981) J. Mol. Biol. 145, 47-73
13. Squires, C. L., Lee, F. and Yanofsky, C. (1975) J. Mol. Biol. 92, 93-111
14. Singleton, C. K., Roeder, W. D., Bogosian, G., Somerville, R. L. and Weither, H. L. (1980) Nucl. Acids Res. 8, 1551-1560
15. Jackson, E. and Yanofsky, C. (1972) J. Mol. Biol. 69, 307-313
16. Horowitz, H. and Platt, T. (1982) J. Mol. Biol. in press
17. Grantham, R., Gautier, C., Gouy, M. and Mercier, R. Nucl. Acids Res. 9, r43-r74
18. Nichols, B. P., VanCleemput, M. and Yanofsky, C. (1981) J. Mol. Biol. 146, 45-54
19. Largen, M. and Belser, W. (1973) Genetics 75, 19-22
20. Schmeissner, U., Ganem, D. and Miller, J. H. (1977) J. Mol. Biol. 109, 303-326
21. Miozzari, G. F. and Yanofsky, C. (1978) J. Bact. 133, 1457-1466
22. Lee, F. and Yanofsky, C. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 4365-4369
23. Oxender, D., Zurawski, G. and Yanofsky, C. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 5524-5528
24. Stauffer, G. V., Zurawski, G. and Yanofsky, C. (1980) J. Mol. Biol. 142, 123-129
25. Zurawski, G. and Yanofsky, C. (1980) J. Mol. Biol. 142, 123-129
26. Stroynowski, I. and Yanofsky, C. In preparation
27. Johnston, H. M., Barnes, W. M., Chumley, F. G., Bossi, L. and Roth, J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 508-512
28. Keller, E. B. and Calvo, J. M. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 6186-6190
29. Farnham, P. J. and Platt, T. (1981) Nucleic Acids Res. 9, 563-577
30. Winkler, M. E. and Yanofsky, C. (1981) Biochemistry 20, 3738-3744
31. Wu, A. M. and Platt, T. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 5442-5446
32. Guarente, L., Beckwith, J., Wu, A. M. and Platt, T. (1979) J. Mol. Biol. 133, 189-197
33. Wu, A. M., Chapman, A. B., Platt, T., Guarente, L. P. and Beckwith, J. (1980) Cell 19, 829-836
34. Wu, A. M., Christie, G. E. and Platt, T. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 2913-2917
35. Guarente, L. P. (1979) J. Mol. Biol. 129, 295-304
36. Nichols, B. P., Miozzari, G. F., VanCleemput, M., Bennett, G. N. and Yanofsky, C. (1981) J. Mol. Biol. 142, 503-517
37. Horowitz, H., Christie, G. E. and Platt, T. (1982) J. Mol. Biol. in press
38. Christie, G. E. and Platt, T. (1980) J. Mol. Biol. 142, 519-530
39. Crawford, I. P., Nichols, B. P. and Yanofsky, C. (1980) J. Mol. Biol. 142, 489-502
40. Nichols, B. P. and Yanofsky, C. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 5244-5248
41. Jackson, E. N. and Yanofsky, C. (1974) J. Bact. 117, 502-508
42. Yanofsky, C., Horn, V., Bonner, M. and Stasiowski, S. (1971) Genetics 69, 409-433
43. Creighton, T. (1970) Biochem. J. 120, 699-707
44. Kirschner, K., Szadkowsky, H., Henschen, A. and Lottspeich, F. (1980) J. Mol. Biol. 143, 395-409
45. Ito, J. and Yanofsky, C. (1966) J. Biol. Chem. 241, 4112-4114
46. Zalkin, H. (1973) Adv. in Enzymology 38, 1-39
47. Crawford, I. P. and Yanofsky, C. (1958) Proc. Natl. Acad. Sci. U.S.A. 44, 1161-1170
48. Miles, E. W. (1979) Adv. in Enzymology 49, 127-186
49. Christie, G. E. and Platt, T. (1980) J. Mol. Biol. 143, 335-341
50. Oppenheim, D. S. and Yanofsky, C. (1980) Genetics 95, 785-795
51. Selker, E. and Yanofsky, C. (1979) J. Mol. Biol. 130, 135-143
52. Post, L. E., Strycharz, G. D., Nomura, M., Lewis, H. and Dennis, P. P. (1977) Proc. Natl. Acad. Sci. U.S.A. 76, 1697-1701
53. Nakamura, K. and Inouye, M. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 1369-1373
54. Deeley, M. and Yanofsky, C. (1981) J. Bacteriol. 147, 782-796
55. Miozzari, G. and Yanofsky, C. (1979) Nature 277, 486-489
56. Crawford, I. P. (1975) Bacteriol. Revs. 39, 87-120
57. Crawford, I. P. (1980) Crit. Revs. Biochem. 8, 175-189
58. Zalkin, H. and Yanofsky, C., J. Biol. Chem. in press
59. Matchett, W. H. and DeMoss, J. A. (1975) J. Biol. Chem. 250, 2941-2946
60. Schneider, W. P., Nichols, B. P. and Yanofsky, C. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 2169-2173
61. Hedges, R. W., Jacob, A. E. and Crawford, I. P. (1977) Nature 267, 283-284
62. Manch, J. N. and Crawford, I. P. (1981) J. Mol. Biol. in press