Volume 9 Number 24 1981 Nucleic Acids Research

The complete nucleotide sequence of the tryptophan operon of Escherichia coli

C.Yanofsky1, T.Platt2, I.P.Crawford3, B.P.Nichols1, G.E.Christie2, H.Horowitz2, M.VanCleemput1 and A.M.Wu2

1Department of Biological Sciences, Stanford University, Stanford, CA 94305, 2Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510, and 3Department of Microbiology, University of Iowa, Iowa City, IA 52242, USA

Received 11 November 1981

ABSTRACT

The tryptophan (trp) operon of Escherichia coli has become the basic reference structure for studies on tryptophan metabolism. Within the past five years the application of recombinant DNA and sequencing methodologies has permitted the characterization of the structural and functional elements in this gene cluster at the molecular level. In this summary report we present the complete nucleotide sequence for the five structural genes of the trp operon of E. coli together with the internal and flanking regions of regulatory information.

I. INTRODUCTION AND OPERON ORGANIZATION

The pathway of tryptophan biosynthesis was the subject of some of the earliest studies with biochemical mutants of micro-organisms. Insight gained in these investigations provided the foundation for extensive genetic and biochemical analyses that established the genes, enzymes and reactions of tryptophan biosynthesis as favored subjects for study. Over the years the tryptophan system has been used to investigate virtually every aspect of amino acid metabolism, gene and operon structure, and gene function.

A schematic representation of the trp operon is given in Figure 1, with the regulatory region preceding the first structural gene expanded below. Within the promoter region (trp p) is an operator site (trp o) at which a tryptophan-activated repressor protein can bind and regulate transcription initiation (1, 2).
Beyond the transcription initiation site, a transcribed leader region (trpL) of 162 bp contains a regulated site of transcription termination, the attenuator (trp a) (3). Transcription into the five structural genes may therefore be regulated at both the operator and attenuator sites (4) (see below). Those RNA polymerase molecules that transcribe through the attenuator (depending on metabolic conditions) generally continue to the end of the operon. The full-length polycistronic trp mRNA encoding the five trp polypeptides (5) is about 6800 nucleotides in length. Rho-dependent transcription termination occurs in the region following the last gene, trpA (6). In the following sections we will discuss the nucleotide sequence of the operon in the context of structurally significant features and biologically important functions.

© IRL Press Limited, 1 Falconberg Court, London W1V 5FG, U.K.

[Figure 1 graphic: map of the operon in the order promoter/operator (trp p), trpL, trpE, trpD (containing trp p2), trpC, trpB, trpA. Gene products: trpE, anthranilate synthetase Component I; trpD, anthranilate synthetase Component II and PR anthranilate transferase; trpC, PR anthranilate isomerase and indoleglycerol phosphate synthetase; trpB, tryptophan synthetase β; trpA, tryptophan synthetase α. Pathway: chorismate (+ glutamine) → anthranilate (+ PRPP) → PR anthranilate → CdRP → InGP (+ L-serine) → L-tryptophan. The expanded regulatory region marks the promoter, operator, trpL, attenuator, start of trpE, transcription start site, transcription pause site and transcription termination site.]

Figure 1. Organization of the structural and regulatory regions of the trp operon of E. coli. The 5 polypeptide products and the reactions they catalyze are indicated. Nucleotides are numbered from the transcription start site. The number preceding each gene corresponds to the A of the respective translation start codon. For other definitions see text.

A. Regulatory regions

1. The major promoter and operator regions: The sequence of the promoter-operator segment of the operon is presented in Figure 2.
Methylation-protection studies with the near-identical promoter-operator region of Salmonella typhimurium have identified sites of close contact between RNA polymerase and promoter, and trp repressor and operator (7). Mutational analyses (1, 8, 9) summarized in the figure support the conclusion that the -35 and -10 regions of the promoter play key roles in polymerase recognition. Binding-protection studies using promoter DNA and various restriction enzymes also establish the importance of the -35 and -10 regions to polymerase binding (9, 10). The TA base pair at -8 in particular appears to be essential for efficient initiation (9). Some mutational changes in the promoter region have little or no effect on promoter function, e.g., none of the operator constitutive mutations listed in the figure, except TA→CG at position 13, alters promoter efficiency. Deletion analyses indicate that the region on the transcribed side of -5 may be replaced without appreciably affecting promoter function. Deletions extending to -8, however, essentially eliminate promoter activity unless a TA base pair is introduced at bp -7 or -8 (9).

[Figure 2 graphic: double-stranded promoter-operator sequence with ruler marks at -40, -30, -20, -10 and +1.
5' GCTGTTGACAATTAATCATCGAACTAGTTAACTAGTACGCAAG 3'
3' CGACAACTGTTAATTAGTAGCTTGATCAATTGATCATGCGTTC 5'
Rows below the sequence mark the bases protected from methylation by repressor, the bases protected from methylation by RNA polymerase, the promoter-down mutations and the operator-constitutive mutations.]

Figure 2. The 40 bp preceding the transcription start site, their protection by repressor or RNA polymerase, and mutational changes that affect promoter or operator function. Marked G residues indicate hypermethylation of the base.
The sequence of the -35 region of the trp promoter is conserved in several enteric species and closely resembles the consensus prokaryotic -35 sequence (11). The -10 region of the promoter is also highly conserved; however, this region of the trp promoter, which is also a segment of trp o, does not resemble the consensus -10 promoter sequence (12). The repressor-binding site, the 20 bp preceding the transcription start site, is clearly within the promoter (2), explaining the in vitro observation that trp repressor and RNA polymerase binding are mutually exclusive (13). The sequence of the operator region is highly conserved in other enterobacterial trp operons. Binding sites for the trp repressor also exist in the aroH (2, 12) and trpR promoters (2, 14) of E. coli; ten of the 18 bp in the operators of these three operons are invariant (2).

2. The minor internal promoter: A secondary promoter, trp p2, in the distal portion of trpD, is responsible for 60-80% of the basal levels of distal gene products (trpC, trpB, and trpA polypeptides) produced when E. coli is grown with excess tryptophan, i.e. under conditions of maximum repression (15). A comparison of the nucleotide sequence of the trp p2 region to the consensus promoter sequence of E. coli (Figure 3) raises some interesting questions (16). The best match overall between trp p2 and the canonical region eliminates what is generally regarded as the "invariant T" of the Pribnow box, but shifting register by two nucleotides to regain a T in this position substantially reduces the consensus alignment. The location of the 5' end of the transcript does not help to discriminate between the two possibilities (16). It is noteworthy that the codon changes necessary to improve homology would have severe consequences for the polypeptide itself, and that translation across this region must already accommodate a series of codons that are rarely used in E. coli (16, 17).
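The comparison of trp p2 with the consensus promoter can be made concrete with a small sketch. It slides each consensus element along the trp p2 region printed in Figure 3 and reports the windows with the fewest mismatches. The plain mismatch count and the function names are illustrative assumptions, not the alignment procedure used in reference 16, and the sketch ignores the spacing between the -35 and -10 elements.

```python
def mismatches(window: str, consensus: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(1 for a, b in zip(window, consensus) if a != b)


def best_matches(seq: str, consensus: str):
    """Slide the consensus along seq.

    Returns (fewest mismatches found, list of offsets that achieve it).
    """
    width = len(consensus)
    scores = [
        (mismatches(seq[i:i + width], consensus), i)
        for i in range(len(seq) - width + 1)
    ]
    best = min(score for score, _ in scores)
    return best, [i for score, i in scores if score == best]


# trp p2 region as printed in Figure 3 (trpD segment of the operon).
TRP_P2_REGION = "CCGTGACATTTTAACACGTTTGTTACAAGGTAAAGGCGACG"

# Consensus elements as printed in Figure 3.
for name, consensus in (("-35", "TTGACA"), ("-10", "TATAATG")):
    score, offsets = best_matches(TRP_P2_REGION, consensus)
    for off in offsets:
        window = TRP_P2_REGION[off:off + len(consensus)]
        print(name, consensus, "vs", window, "offset", off, "mismatches", score)
```

A fuller treatment would score the two elements jointly with a spacing constraint, which is where the choice between the two -10 registers discussed in the text would show up.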
We believe that these observations reflect the dual constraints of amino acid sequence requirement and promoter function on the sequence of this site. The function of trp p2 in vivo has been postulated to be a mechanism to aid the cell if it encounters a rapid shift from an environment with plentiful tryptophan to one of severe tryptophan starvation (16, 18). The appreciable levels of trpC, trpB and trpA polypeptides in cells grown with excess tryptophan, and de novo synthesis of the tryptophan-deficient trpE and tryptophan-poor trpD polypeptides when cells are shifted to a tryptophan-free medium, would allow the bacterium to recover quickly from a decreased capacity to synthesize the amino acid. The precise physiological importance of trp p2 remains unknown, but its preservation among numerous species of enteric bacteria (19) argues that its function may be an important one.

[Figure 3 graphic: trp p2 region, with positions 3130 and 3156 marked: CCGTGACATTTTAACACGTTTGTTACAAGGTAAAGGCGACG. Consensus -35 sequence: TTGACA. Consensus -10 sequence: TATAATG, drawn in two alternative registers above and below the trp p2 sequence.]

Figure 3. Minor internal promoter, trp p2. Numbers are nucleotide positions in the trpD segment of the operon. The arrow represents the start-point of transcription from trp p2 in vitro. Consensus sequences for the -35 and -10 regions are shown (there are two alternatives for the -10 region - see text).

3. Leader-attenuator: The trp operon of E. coli is typical of many amino acid biosynthetic operons in being regulated by attenuation, that is, transcription termination control at a site preceding the major structural genes of the operon (3). The leader region of the trp operon (Figure 4) encodes a 14-residue peptide with adjacent tryptophan residues at positions 10 and 11 (3). Gene fusion studies have shown that translation initiation does occur at the leader peptide start codon in vivo (20, 21), but efforts to isolate and identify the peptide have been unsuccessful.
[Figure 4 graphic: leader transcript with ruler marks from 20 to 160 and the three alternative RNA secondary structures (A), (B) and (C) drawn below the sequence. Leader peptide: Met Lys Ala Ile Phe Val Leu Lys Gly Trp Trp Arg Thr Ser. Leader sequence (distal portion): ATGCGTAAAGCAATCAGATACCCAGCCCGCCTAATGAGCGGGCTTTTTTTTGAACAAAATTAGAGAATAACA.]

Figure 4. The leader region of the trp operon (transcript-equivalent strand) and the amino acid sequence of the putative leader peptide. The three alternative RNA secondary structures that are believed to form are drawn below the sequence.

The transcript of the leader region can form alternate secondary structures (Figure 4) (3, 22, 23), which are believed to determine whether transcription terminates or continues through the attenuator. Models of attenuation are based on the premise that one particular secondary structure (C, Figure 4) signals transcription termination (3, 23, 24, 25). If tryptophan is abundant, translation of the leader peptide coding region will be closely coupled to transcription of the leader segment of the operon. The translating ribosome will prevent formation of the proximal secondary structures (A) and (B), permitting (C), and thus termination. When the cell is starved for tryptophan, however, the translating ribosome will stall at one of the tandem Trp codons, preventing formation of structure (A) but permitting alternate structure (B) to form. This will preclude formation of structure (C), and polymerase will transcribe through the attenuator site into the structural genes. If the ribosome fails to initiate translation soon after transcription has occurred, e.g., due to a deficiency of fMet-tRNA, structure (A) would form, and hence structure (C). Under these conditions termination would be expected to be maximal.
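The three cases of the attenuation model described above can be summarized as a small decision table. The sketch below is only a restatement of the qualitative model in code; the enum and function names are hypothetical, and no kinetics or termination frequencies are modeled.

```python
from enum import Enum


class RibosomeState(Enum):
    """Position of the ribosome on the leader transcript (labels are illustrative)."""
    TRANSLATING_THROUGH = "tryptophan abundant, translation coupled to transcription"
    STALLED_AT_TRP = "tryptophan starvation, ribosome stalled at a tandem Trp codon"
    NOT_INITIATED = "leader peptide translation not initiated"


def leader_outcome(state: RibosomeState):
    """Return (set of RNA structures that form, True if transcription terminates).

    Structure (C) is taken to be the termination signal, as in the model.
    """
    if state is RibosomeState.STALLED_AT_TRP:
        # Stalled ribosome blocks (A), so (B) forms and precludes (C).
        structures = {"B"}
    elif state is RibosomeState.NOT_INITIATED:
        # No ribosome on the leader: (A) forms, and hence (C).
        structures = {"A", "C"}
    else:
        # Translating ribosome prevents (A) and (B), permitting (C).
        structures = {"C"}
    return structures, "C" in structures


for state in RibosomeState:
    structures, terminates = leader_outcome(state)
    verdict = "terminate at attenuator" if terminates else "read through"
    print(state.name, sorted(structures), verdict)
```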
In fact, when cells cannot initiate synthesis of the leader peptide, termination is 5-fold greater than when they have an adequate supply of tryptophan (25, 26). Thus ribosome location on the leader segment of the transcript while this segment is being synthesized determines the secondary structure of the transcript. This in turn directs the transcribing RNA polymerase to terminate or to continue transcription. The pause in transcription observed at a site located between bp 50 and 90 (29, 30) may ensure that the newly-loaded ribosome can remain coupled to the transcription complex. Thus the two major factors to which attenuation responds are the level of tryptophan in the cell (via charged tRNA), and the functional efficiency of the translation apparatus. Considerable in vivo and in vitro evidence exists for this model of transcription control among many of the amino acid biosynthetic operons (3-6, 24-28).

4. Termination at the end of the operon: The DNA following the last structural gene in the trp operon encodes what appears to be a complex set of transcription termination signals. In vivo, the mRNA transcribed from the trp operon ends efficiently at a site called trp t, 36 bp past trpA (Figure 5) (31). The 3' terminal sequence CAUUUU is immediately preceded by a GC-rich region of dyad symmetry in the template, which allows formation of the RNA hairpin characteristic of most prokaryotic terminators (11). Termination in this region can be affected by lesions in RNA polymerase, rho factor, and in DNA itself (6, 32-35). Analysis of mutations of the latter class gave surprising results: several deletions which relieved termination had endpoints that were distal to the point corresponding to the normal 3' end of the transcript (33). Transcription studies in vitro revealed that upon addition of rho factor, polymerase terminates at a second site, trp t', located about 250 nucleotides downstream from the first (Figure 5) (34).
This site resides in a region of the genome removed by all eight of the distal deletions that have been examined in detail, and its presence explains the failure of the initial selection to detect mutations in the trp t region alone (read-through transcription would be halted at trp t' with high efficiency). We do not yet understand how the removal of only the distal site (trp t') can affect the behavior of RNA polymerase at the intact proximal site (trp t) in vivo. Nor is it clear why rho factor is required for producing the correct 3' terminus at trp t in vivo, yet in vitro the presence of rho has no effect on termination at this first site. The explanation may lie in structural interactions within the DNA, as well as in the participation of additional factors in the cell. Although the responses of the two sites are qualitatively different in vitro (34), both appear to be required for correct function in vivo. The primary structure of the trp t' region (Figure 5) is also highly unusual, bearing little similarity to a normal termination site.

[Figure 5 graphic: sequence with ruler marks from 6690 to 7010: TAATCCCACAGCCGCCAGTTCCGCTGGCGGCATTTTAACTTTCTTTAATGAAGCCGGAAAAATCCTAAATTCATTTAATATTTATCTTTT CACCAGTAAAATCAATAATTTTCTCTAAGTCACTTATTCCTCAGGTCCTTGTTAATATATCCAGAATGTTCCTCAAAATATATTTTCCCT CTATCTTCTCGTTGCGCTTAATTTGACTAATTCTCATTAGCGACTAATTTTAATGAGTGTCGAC. Left inset (labels 6694 and 6710): RNA hairpin with stem pairs G-C, A-U, C-G, C-G, G-C, C-G, C-G, G-C followed by AUUUU. Right inset (label 6989): UUAAUUUGACUAAUUCUCAUU, with arrows marking the termination sites.]

Figure 5. Termination region at the end of the trp operon. The sequence begins with the stop codon at the end of trpA and is shown through the Sal I site. Left - RNA secondary structure corresponding to trp t; the transcript terminates at the fourth U. Right - arrows indicate the sites of rho-dependent termination at trp t'.
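The dyad symmetry at trp t can be checked directly on the Figure 5 sequence. The sketch below searches the first 36 nucleotides (DNA, transcript-equivalent strand) for the longest run of Watson-Crick pairs flanking a short loop. The function name and the loop-length limits are illustrative assumptions; G-U pairs and folding energies are not considered.

```python
# Watson-Crick partners on the DNA (transcript-equivalent) strand.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def longest_hairpin(seq: str, min_loop: int = 3, max_loop: int = 8):
    """Return (stem length, left arm, loop, right arm) of the longest perfect hairpin.

    Every possible loop position and length is tried, and the stem is
    extended outward from the loop while the flanking bases pair.
    """
    best = (0, "", "", "")
    for loop_start in range(1, len(seq)):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1
            right = loop_start + loop_len
            stem = 0
            while left >= 0 and right < len(seq) and PAIRS[seq[left]] == seq[right]:
                stem += 1
                left -= 1
                right += 1
            if stem > best[0]:
                best = (
                    stem,
                    seq[left + 1:loop_start],
                    seq[loop_start:loop_start + loop_len],
                    seq[loop_start + loop_len:right],
                )
    return best


# First 36 nt of the Figure 5 sequence, starting at the trpA stop codon.
TRP_T_REGION = "TAATCCCACAGCCGCCAGTTCCGCTGGCGGCATTTT"

stem, left_arm, loop, right_arm = longest_hairpin(TRP_T_REGION)
print(stem, left_arm, loop, right_arm)
# The bases 3' to the right arm correspond to the run of U residues in the transcript.
print(TRP_T_REGION[TRP_T_REGION.index(right_arm) + len(right_arm):])
```

On this segment the search recovers the eight-pair GC-rich stem drawn in Figure 5, immediately followed by the ATTTT run that corresponds to the 3' terminal U residues of the transcript.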
Although the trp t' region is very AT-rich, it is notable for its lack of other defining features, in particular the absence of significant symmetry elements. Enhanced efficiency and specificity of termination at trp t' are observed when rho is used in the presence of nusA protein (6), but the contribution of this latter factor is only beginning to be studied. It seems likely that control of transcription termination at the end of the operon is more intricate than previously thought.

B. Structural genes and punctuation

1. The coding region for the five polypeptides: The nucleotide sequences of the 5 structural genes of the operon (18, 36-40) and their intercistronic regions are presented in Figure 6. The deduced amino acid sequences are also indicated in the figure. [Codon usage in the 5 structural genes and amino acid sequence comparisons are discussed in section II.] The five structural genes of the operon code for two bifunctional polypeptides and two pairs of polypeptides that form tetrameric enzyme complexes. The bifunctional trpD polypeptide consists of a glutamine amidotransferase domain, designated the trpG portion by analogy to the monofunctional trpG polypeptide of other organisms, followed by an anthranilate phosphoribosyl transferase or trpD domain (Figure 1) (41, 42). Similarly the trpC polypeptide has two domains, the amino terminal half corresponding to the monofunctional trpC polypeptide of other organisms and catalyzing the indole-3-glycerol phosphate synthetase reaction, and the distal half analogous to trpF polypeptides and catalyzing the isomerization of phosphoribosylanthranilate (42-44). The trpE and trpD polypeptides form a tetrameric functional complex (45, 46) that catalyzes the reactions: chorismate + glutamine → anthranilate and anthranilate + PRPP → phosphoribosylanthranilate.
The trpB and trpA polypeptides similarly are components of an α2β2 complex that catalyzes the reaction: indole-3-glycerol phosphate + L-serine → L-tryptophan (47, 48). The order of the structural genes and their domain-encoding segments resembles the sequential order of the functions of the corresponding polypeptide domains in tryptophan synthesis, with the exceptions that the trpC region precedes the trpF region, and trpB precedes trpA. It is not known whether trp p2 might be responsible for the gene order that has evolved in the enteric bacteria.

2. Ribosome binding sites and intercistronic regions: The six known ribosome binding sites in the operon are illustrated in Figure 7. In all cases the initiating AUG codon is immediately preceded by a pyrimidine-purine pair, and all possess a purine-rich sequence complementary to the 3' end of 16S rRNA, as expected. However, these regions are more extensive than

TRP E 170 180 190 200 210 220 230 240 250 ATG CAA ACA CAA AAA CCG ACT CTC GAA CTG CTA ACC TGC GAA GGC GCT TAT CGC GAC AAT CCC ACC GCG CTT TTT CAC CAG TTG TGT GGG MET GLN THR GLN LYS PRO THR LEU GLU LEU LEU THR CYS GLU GLY ALA TYR ARG ASP ASN PRO THR ALA LEU PHE HIS GLN LEU CYS GLY 1 10 20 30 260 270 280 290 300 310 320 330 340 GAT CGT CCG GCA ACG CTG CTG CTG GAA TCC GCA GAT ATC GAC AGC AAA GAT GAT TTA AAA AGC CTG CTG CTG GTA GAC AGT GCG CTG CGC ASP ARG PRO ALA THR LEU LEU LEU GLU SER ALA ASP ILE ASP SER LYS ASP ASP LEU LYS SER LEU LEU LEU VAL ASP SER ALA LEU ARG 40 50 60 350 360 370 380 390 400 410 420 430 ATT ACA GCT TTA GGT GAC ACT GTC ACA ATC CAG GCA CTT TCC GGC AAC GGC GAA GCC CTC CTG GCA CTA CTG GAT AAC GCC CTG CCT GCG ILE THR ALA LEU GLY ASP THR VAL THR ILE GLN ALA LEU SER GLY ASN GLY GLU ALA LEU LEU ALA LEU LEU ASP ASN ALA LEU PRO ALA 70 80 90 440 450 460 470 480 490 500 510 520 GGT GTG GAA AGT GAA CAA TCA CCA AAC TGC CGT GTG CTG CGC TTC CCC CCT
GTC AGT CCA CTG CTG GAT GAA GAC GCC CGC TTA TGC TCC GLY VAL GLU SER GLU GLN SER PRO ASN CYS ARG VAL LEU ARG PKE PRO PRO VAL SER PRO LEU LEU ASP GLU ASP ALA APG LEU CYS SER 100 tlO 120 530 540 550 560 570 560 590 600 610 CTT TCG GTT TTT GAC GCT TTC CGT TTA TTG CAG AAT CTG TTG AAT GTA CCS AAG GAA GAA CGA GAA GCC ATG TTC TTC AGC GGC CTG TTC LEU SER VAL PHE ASP ALA PHE ARG LEU LEU GLN ASN LEU LEU ASN VAL PRO LYS GLU GLU ARG GLU ALA HET PKE PHE SER GLY LEU PHE 130 140 150 '20 630 640 650 660 670 6S0 690 700 TCT TAT GAC CTT GTG GCG GGA TTT GAA GAT TTA CCG CAA CTG TCA GCG GAA AAT AAC TGC CCT GAT TTC TGT TTT TAT CTC GCT GAA ACG SER TYR ASP LEU VAL ALA GLY PHE GLU ASP LEU PRO GLN LEU SER ALA GLU ASN ASN CYS PRO ASP PHE CYS PHE TYH LEU ALA GLU THR 160 170 180 710 720 730 740 750 760 770 760 790 CTG ATG GTG ATT GAC CAT CAG AAA AAA AGC ACC CGT ATT CAG GCC AGC CTG TTT GCT CCG AAT GAA GAA GAA AAA CAA CGT CTC ACT GCT LEU HET VAl ILE ASP HIS GLN LYS LYS SER THS ARG ILE GLN ALA SER LEU PHE ALA PRO ASN GLU GLU GLU LYS GLN ARG LEU THR ALA 190 200 210 800 810 820 630 640 650 860 670 880 CGC CTG AAC GAA CTA CGT CAG CAA CTG ACC GAA GCC GCG CCG CCG CTG CCA GTG GTT TCC GTG CCG CAT ATG CGT TGT GAA TGT AAT CAG ARG LEU ASN GLU LEU ARG GIN GLN LEU THR GLU ALA ALA PRO PRO LEU PRO VAL VAL SER VAL PRO HIS MET ARG CYS GIU CYS ASN GLN 220 230 240 690 900 910 920 930 940 950 960 970 AGC GAT GAA GAG TTC GGT GGC GTA GTG CGT TTG TTG CAA AAA GCG ATT CGC GCT GGA GAA ATT TTC CAG GTG GTG CCA TCT CGC CGT TTC 5ER ASP GLU GLU PHE GLY GLY VAL VAL ARG LEU LEU GLN LYS ALA H E ARG ALA GLY GLU ILE PHE GLN VAL VAL PRO SER ARG ARG PHE 250 260 270 980 990 1000 1010 1020 1030 1040 1050 1060 TCT CTG CCC TGC CCG TCA CCG CTG GCG GCC TAT TAC GTG CTG AAA AAG AGT AAT CCC AGC CCG TAC ATG TTT TTT ATG CAG GAT AAT GAT 5ER LEU PRO CYS PRO SER PRO LEU ALA ALA TYR TYR VAL,LEU LYS LYS SER ASN PRO SER PRO TYP MET PHE PHE MET GLH ASP ASN ASP 280 290 300 1070 1080 1090 1100 1110 1120 1130 1140 1150 TTC ACC CTA TTT GGC GCG TCG CCG GAA 
AGC TCG CTC AAG TAT GAT GCC ACC AGC CGC CAG ATT GAG ATC TAC CCG ATT GCC GGA ACA CGC PHE THR LEU PHE GLY ALA SER PRO GLU SER SER LEU LYS TYR ASP ALA TKR SER ARG GLN ILE GLU ILE TYR PRO ILE ALA GLY THR ARG 310 JCO 330 "60 1170 1180 1190 1200 1210 1220 1230 1240 CCA CGC GGT CGT CGC GCC GAT GGT TCA CTG GAC AGA GAT CTC GAC AGC CGT ATT GAA CTG GAA ATG CGT ACC GAT CAT AAA GAG CTG TCT PRO ARG GLY ARG ARG ALA ASP GLY SER LEU ASP ARG ASP LEU ASP SER ARG ILE GLU LEU GLU HET ARG THR ASP HIS LYS GLU LEU SER 340 350 360 1250 1260 1270 1280 1290 1300 1310 1320 1330 GAA CAT CTG ATG CTG GTT GAT CTC GCC CGT AAT GAT CTG GCA CGC ATT TGC ACC CCC GGC AGC CGC TAC GTC GCC GAT CTC ACC AAA GTT GLU HIS LEU MET LEU VAL ASP LEU ALA ARG ASN ASP LEU ALA ARG ILE CYS THR PRO GLY SER ARG TYR VAL ALA ASP LEU THR LYS VAL 370 300 390 1340 1350 1360 1370 1380 1390 1400 1410 1420 GAC CGT TAT TCC TAT GTG ATG CAC CTC GTC TCT CGC GTA GTC GGC GAA CTG CGT CAC GAT CTT GAC GCC CTG CAC GCT TAT CGC GCC TGT ASP ARG TYR SER TYR VAL MET HIS LEU VAL SER ARG VAL VAL GLY GLU LEU ARG HIS ASP LEU ASP ALA LEU HIS ALA TYR ARG ALA CYS 400 410 420 1430 1440 1450 1460 1470 1460 t490 1500 1510 ATG AAT ATG GGG ACG TTA AGC GGT GCG CCG AAA GTA CGC GCT ATG CAG TTA ATT GCC GAG GCG GAA GGT CGT CGC CGC GGC AGC TAC GGC MET ASN (1ET GLY THR LEU SER GLY ALA PRO LYS VAL ARG ALA MET GLN LEU ILE ALA GLU ALA GLU GLY ARG ARG ARG GLY SER TYR GLY 430 440 450 1520 1530 1540 1550 1560 1570 1580 1590 1600 GGC GCG GTA GGT TAT TTC ACC GCG CAT GGC GAT CTC GAC ACC TGC ATT GTG ATC CGC TCG GCG CTG GTG GAA AAC GGT ATC GCC ACC GIG GLY ALA VAL GLY TVR PHE THR ALA HIS GLY ASP LEU ASP THR CYS ILE VAL ILE ARG SER ALA LEU VAL GLU ASN GLY ILE ALA THR VAL 460 470 480 1610 1620 1630 1640 1650 1660 1670 1680 1690 CAA GCG GGT GCT GGT GTA GTC CTT GAT TCT GTT CCG CAG TCG GAA GCC GAC GAA ACC CGT AAC AAA GCC CGC GCT GTA CTG CGC GCT ATT GLN ALA GLY ALA GLY VAL VAL LEU ASP SER VAL PRO GLN SER GLU ALA ASP GIU THR ARG ASN LYS ALA ARG ALA VAL LEU ARG ALA U E 490 500 510 
1700 1710 1720 GCC ACC GCG CAT CAT GCA CAG GAG ACT TTC TG ALA THR ALA HIS HIS ALA GLN GLU THR PHE END 520

Figure 6(i)

required for base-pairing to 16S rRNA. A consensus sequence A-G-G-Pu-Pu-A occurs four to seven nucleotides before the initiator codon, and only 3 out of 36 nucleotides in the six sites differ from this pattern (49). The extent of strict base-pairing between mRNA and rRNA would vary over a wide range among these sites, but other factors including translation of the preceding

TRP D 1730 1740 1750 1760 1770 1780 1790 1800 1810 ATG CCT GAC ATT CTG CTG CTC GAT AAT ATC GAC TCT TTT ACG TAC AAC CTG GCA GAT CAG TTG CGC AGC AAT GGG CAT AAC GTG GTG ATT MET ALA ASP ILE LEU LEU LEU ASP ASN ILE ASP SER PHE THR TYR ASN LEU ALA ASP GLN LEU ARG SER ASN GLY HIS ASN VAL VAL ILE 1 10 20 30 1820 1830 1840 1850 1860 1870 1880 1890 1900 TAC CGC AAC CAT ATA CCG GCG CAA ACC TTA ATT GAA CGC TTG GCG ACC ATG ACT AAT CCG GTG CTG ATG CTT TCT CCT GGC CCC GGT GTG TYR ARG ASN HIS ILE PRO ALA GLN THR LEU ILE GLU ARG LEU ALA THR MET SER ASN PRO VAL LEU MET LEU SER PRO GLY PRO GLY VAL 40 50 60 1910 1920 1930 1940 1950 1960 1970 1980 1990 CCG AGC GAA GCC GGT TGT ATG CCG GAA CTC CTC ACC CGC TTG CGT GGC AAG CTG CCC ATT ATT GGC ATT TGC CTC GGA CAT CAG GCG ATT PRO SER GLU ALA GLY CYS MET PRO GLU LEU LEU THR ARG LEU ARG GLY LYS LEU PRO ILE ILE GLY ILE CYS LEU GLY HIS GLN ALA ILE 70 80 90 2000 2010 2020 2030 2040 2050 2060 2070 2080 GTC GAA GCT TAC GGG GGC TAT GTC GGT CAG GCG GGC GAA ATT CTC CAC GGT AAA GCC TCC AGC ATT GAA CAT GAC GGT CAG GCG ATG TTT VAL GLU ALA TYR GLY GLY TYR VAL GLY GLN ALA GLY GLU ILE LEU HIS GLY LYS ALA SER SER ILE GLU HIS ASP GLY GLN ALA MET PHE 100 110 120 2090 2100 2110 2120 2130 2140 2150 2160 2170 GCC GGA TTA ACA AAC CCG CTG CCG GTG GCG CGT TAT CAC TCG CTG GTT GGC AGT AAC ATT CCG GCC GGT TTA ACC ATC AAC GCC CAT TTT ALA GLY LEU THR
ASN PRO LEU PRO VAL ALA ARG TVR HIS SER LEU VAL GLY SER ASN ILE PRO ALA GLY LEU 1HR ILE ASN ALA HIS PHE 130 140 150 2180 2190 £200 2210 2220 2230 2240 2250 2260 AAT GGC ATG GTG ATG GCA GTA CGT CAC GAT GCG G«T CGC GTT TGT GGA TTC CAG TTC CAT CCG GAA TCC ATT CTC ACC ACC CAG GGC GCT ASN GLY MET VAL MET ALA VAL ARG HIS ASP ALA ASP ARG VAL CYS GLY PHE GLN PHE HIS PRO GLU SER ILE LEU THR THR GLN GLY ALA 160 170 180 2270 2280 2290 2300 2310 2320 2330 2340 2350 CGC CTG CTG GAA CAA ACG CTG GCC TGG GCG CAG CAT AAA CTA GAG CCA GCC AAC ACG CTG CAA CCG ATT CTG GAA AAA CTG TAT CAG GCG ARG LEU LEU GLU GLN THR LEU ALA TRP ALA GLH HIS LYS LEU GLU PRO ALA ASN THH LEU GLN PRO ILE LEU GLU LYS LEU TVR GLN ALA 190 200 210 2360 2370 2380 2390 2400 2410 2420 2430 2440 CAG ACG CTT AGC CAA CAA GAA AGC CAC CAG CTG TTT TCA GCG GTG GTG CGT GGC GAG CTG AAG CCG GAA CAA CTG GCG GCG GCG CTG GTG GLN THR LEU SER GLN GLN GLU SER HIS GLN LEU PHE SER ALA VAL VAL ARG GLY GLU LEU LYS PRO GLU GLN LEU ALA ALA ALA LEU VAL 220 230 240 2450 2460 2470 2480 2490 2500 2510 2520 2530 AGC ATG AAA ATT CGC GGT GAG CAC CCG AAC GAG ATC GCC GGG GCA GCA ACC GCG CTA CTG GAA AAC GCA GCG CCG TTC CCG CGC CCG GAT SER MET LYS ILE ARG GLY GLU HIS PRO ASN GLU ILE ALA GLY ALA ALA THR ALA LEU LEU GLU ASN ALA ALA PRO PHE PRO ARG PRO ASP 250 260 270 2540 2550 2560 2570 2560 2590 2600 2610 2620 TAT CTG TTT GCT GAT ATC GTC GGT ACT GGC GGT GAC GGC AGC AAC AGT ATC AAT ATT TCT ACC GCC AGT GCG TTT GTC GCC GCG GCC TGT TYR LEU PHE ALA ASP ILE VAL GLY THR GLY GLY ASP GLY SER ASN SER ILE ASN ILE SER TKR ALA SER ALA PHE VAL ALA ALA ALA ;Y5 280 290 300 2630 2640 2650 2660 2670 2680 2690 2700 2710 GGG C7G AAA GTG GCG AAA CAC GGC AAC CGT AGC GTC TCC AGT AAA TCT GGT TCG TCC GAT CTG CTG GCG GCG TTC GGT ATT AAT CTT GAT GLY LEU LYS VAL ALA LYS HIS GL V ASN ARG SER VAL SER SER LYS SER GLY SER SER ASP LEU LEU ALA ALA PHE GLY ILE ASN LEU ASP 310 320 330 2720 2730 2740 2750 2760 2770 2760 2790 2800 ATG AAC GCC GAT AAA TCG CGC CAG GCG CTG GAT GAG TTA GGT 
GTA TGT TTC CTC TTT GCG CCG AAG TAT CAC ACC GGA TTC CGC CAC GCG MET ASN ALA ASP LYS SER ARG GLN ALA LEU ASP GLU LEU GLY VAL CYS PHE LEU PHE ALA PRO LYS TYR HIS THR GLY PHE ARG HIS ALA 340 350 360 2810 2820 2830 2840 2850 2860 2870 2880 2890 ATG CCG GTT CGC CAG CAA CTG AAA ACC CGC ACC CTG TTC AAT GTG CTG GGG CCA TTG ATT AAC CCG GCG CAT CCG CCG CTG GCG TTA ATT MET PRO VAL ARG GLN GLN LEU LYS THR ARG THR LEU PHE ASN VAL LEU GLY PRO LEU ILE ASN PRO ALA HIS PRO PRO LEU ALA LEU ILE 370 380 390 2900 2910 2920 2930 2940 2950 2960 2970 2980 GGT GTT TAT AGT CCG GAA CTG GTG CTG CCG ATT GCC GAA ACC TTG CGC GTG CTG GGG TAT CAA CGC GCG GCG GTG GTG CAC AGC GGC GGG GLY VAL TYR SER PRO GLU LEU VAL LEU PRO ILE ALA GLU THR LEU ARG VAL LEU GLY TYR GLN ARG ALA ALA VAL VAL HIS SER GLY GLY 400 410 420 2990 3000 3010 3020 3030 3040 3050 3060 3070 ATG GAT GAA GTT TCA TTA CAC GCG CCG ACA ATC GTT GCC GAA CTG CAT GAC GGC GAA ATT AAA AGC TAT CAG CTC ACC GCA GAA GAC TTT MET ASP GLU VAL SER LEU HIS ALA PRO THR ILE VAL ALA GLU LEU HIS ASP GLY GLU ILE LYS SER TYR GLN LEU THR ALA GLU ASP PHE 430 440 450 3080 3090 3100 3110 3120 3130 3140 3150 3160 GCC CTG ACA CCC TAC CAC CAG GAG CAA CTG GCA GGC GGA ACA CCG GAA GAA AAC CGT GAC ATT TTA ACA CGT TTG TTA CAA GGT AAA GGC GLY LEU THR PRO TYR HIS GLN GLU GLN LEU ALA GLY GLY THR PRO GLU GLU ASN ARG ASP ILE LEU THR ARG LEU LEU GLN GLY LYS GLY 460 470 480 3170 3180 3190 3200 3210 3220 3230 3240 3250 GAC GCC GCC CAT GAA GCA GCC GTC GCT GCG AAC GTC GCC ATG TTA ATG CGC CTG CAT GGC CAT GAA GAT CTG CAA GCC AAT GCG CAA ACC ASP ALA ALA HIS GLU ALA ALA VAL ALA ALA ASN VAL ALA MET LEU MET ARG LEU HIS GLY HIS GLU ASP LEU GLN ALA ASN ALA GLN THR 490 500 510 3260 3270 3280 3290 3300 3310 3320 GTT CTT GAG GTA CTG CGC AGT GGT TCC GCT TAC GAC AGA GTC ACC GCA CTG GCG GCA CGA GGG TAA VAL LEU GLU VAL LEU ARG SER GLY SER ALA TYR ASP ARG VAL THR ALA LEU ALA ALA ARG GLY END 520 530

Figure 6(ii)

gene almost certainly affect the efficiency of initiation. A more detailed comparative analysis of these regions is presented elsewhere (49). The four intercistronic regions in the polycistronic trp mRNA are also shown in Figure 7, and display some unusual features. Two of them consist of a chain termination codon that overlaps the subsequent initiator codon by one

3330 3340 3350 3360 3370 ATG ATG CAA ACC GTT TTA GCG AAA ATC GTC GCA GAC AAG GCG ATT TGG GTA MET GLN THR VAL LEU ALA LYS ILE VAL ALA ASP LYS ALA ILE TRP VAL 10 3420 3430 3440 3450 3460 GAG GTT CAG CCG AGC ACG CGA CAT TTT TAT GAT GCG CTA CAG GGT GCG CGC GLU VAL GLN PRO SER THR ARG HIS PHE TYR ASP ALA LEU GLN GLY ALA ARG 40 3510 3520 3530 3540 3550 AAA GGC GTG ATC CGT GAT GAT TTC GAT CCA GCA CGC ATT GCC GCC ATT TAT LYS GLY VAL ILE ARG ASP ASP PHE ASP PRO ALA ARG ILE ALA ALA ILE TYR 60 70 3600 TTC AGG GGT LYS TYR PHE ARG GLY 90 3690 TAC CAG ATC TAT CTG TYR GLN ILE TYR LEU 120 3780 GCC GTC GCT CAC AGT ALA VAL ALA HIS SER 150 3370 GTT GGC ATC AAC AAC VAL GLY ILE ASN ASN 160 3960 GTA ATC AGC GAA TCC VAL ILE SER GLU SER 4050 GCC CAT GAC GAT ALA HIS ASP ASP 3610 3620 AGC TTT AAT TTC CTC SER PHE ASN PHE LEU 3710 TAC CAG GCC GAT TYR GLN ALA ASP 130 3800 3790 CTG GAG ATG GGG GTG CTG ACC LEU GLU MET GLY VAL LEU THR 160 3890 3830 CGC GAT CTG CGT GAT TTG TCG ARG ASP LEU ARG ASP LEU SER 190 3980 3970 GGC ATC AAT ACT TAC GCT CAG GLY ILE ASN THR TYR ALA GLN 220 4070 4060 3700 GCG CGC TAT ALA ARG TYR GCC GTG CGC CGG GTG ALA VAL ARG ARG VAL 250 4160 4150 GCG ATT TAC GGT GGG TTG ATT ALA ILE TYR GLY GLY LEU ILE 280 4250 4240 TAT GTT GGC GTG TTC CGC AAT TYR VAL GLY VAL PHE ARG ASN 310 4340 4330 GAA GAA CAG CTG TAT ATC GAT GLU GLU GLN LEU TYR ILE ASP 340 4430 4420 GCC CGC GAG TTT CAG CAC GTT ALA ARG GLU PHE GLN HIS VAL 370 4520 4510 CAA ACG CTT GGC AAC GTT CTG GLN THR LEU GLY ASN VAL LEU 400 4610 4600 AAT TCT GCT GTA GAG TCG CAA ASN SER ALA VAL GLU SER GLN 430 TTG
[Figure 6(iii): nucleotide sequence and deduced amino acid sequence of the trpC coding region (continued), through the TAA stop codon.]

nucleotide. It is curious that this occurs with the pairs of genes whose products are associated in multisubunit enzyme complexes. The possibility that the juxtaposition of translational signals may play a role in the coordinate synthesis of enzyme subunits is supported by observations on trpE-trpD expression (50). The trpC gene, whose product is unique among the five trp operon polypeptides in not being part of a multisubunit complex, is flanked by larger untranslated regions: 6 nucleotides at the 5' end and 14 nucleotides at the 3' end.
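The stop-start arrangements described here can be classified mechanically. The Python sketch below is illustrative only: the function name `classify_junction` and its input convention (a boundary string plus the index of the first base of the upstream stop codon) are assumptions made for this example, not part of the original analysis. It simply takes the first ATG that begins after the first base of the stop codon, which is a naive rule; as the trpD-trpC boundary shows, the first AUG is not always the one used for initiation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_junction(seq, stop_index):
    """Describe how an upstream stop codon relates to the next ATG.

    seq        -- DNA string spanning a gene boundary (hypothetical input format)
    stop_index -- 0-based index of the first base of the upstream stop codon
    Returns ("overlap", n) when the stop codon and the ATG share n bases,
    or ("spacer", n) when n bases separate the stop codon from the ATG.
    """
    seq = seq.upper()
    stop = seq[stop_index:stop_index + 3]
    if stop not in STOP_CODONS:
        raise ValueError(f"{stop!r} is not a stop codon")
    # Naive rule: first ATG that starts after the first base of the stop codon.
    start = seq.find("ATG", stop_index + 1)
    if start == -1:
        raise ValueError("no ATG found downstream of the stop codon")
    stop_end = stop_index + 3  # index just past the stop codon
    if start < stop_end:
        return ("overlap", stop_end - start)
    return ("spacer", start - stop_end)


# trpB-trpA boundary as read from Figure 6: ...GAA ATC TGA overlapping ATG GAA...
print(classify_junction("GAAATCTGATGGAA", 6))
# Made-up boundary with an untranslated spacer (not a trp operon sequence).
print(classify_junction("TTTTAAGGAGGAACAATGACA", 3))
```

The first call reports a one-base overlap (TGATG), the arrangement noted for gene pairs whose products form complexes; the second, artificial, string reports a nine-base spacer.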
Though trpC is translated in the same reading frame as trpD preceding it, initiation evidently begins at the second of a pair of tandem AUG codons. The spacer between trpC and trpB has only one pyrimidine in a stretch of 14 nucleotides, far more purines than required for the Shine-Dalgarno interaction. Whether this region possesses some additional function is unknown. In S. typhimurium, this region has only 12 nucleotides, and is capable of forming some secondary structure that is not possible in the E. coli case (51).

[Figure 6(iv): nucleotide sequence and deduced amino acid sequence of the trpC-trpB spacer and trpB.]

[Figure 6(v): nucleotide sequence and deduced amino acid sequence of trpA.]

Figure 6. The complete nucleotide sequence of the coding region for the five trp polypeptides including the intercistronic regions. The nucleotide sequence is numbered relative to the start of transcription. The amino acid residues in each polypeptide are numbered from the amino-terminal Met. The complete nucleotide sequence and associated restriction sites are available in the Sumex-Molgen data bank at Stanford University.

II. EVOLUTIONARY CONSIDERATIONS

A. Codon usage

Table I summarizes the frequency of codon utilization in each of the five structural genes of the E. coli trp operon. The numbers in parentheses show
the proportional use of particular codons; that is, in cases where more than one codon specifies a particular amino acid, the relative frequency of usage of each codon is given. There is considerable consistency in the proportional use of each codon throughout the operon. Although every sense codon is used at least once somewhere in the structural sequence, the usage is far from equal. The seldom used codons for arginine (AGG, AGA, CGG and CGA used 1, 3, 3, and 5 times) and isoleucine (ATA used twice) contrast with certain favored codons for leucine (CTG, 53% usage with 6 choices), valine (GTG, 43% usage with 4 choices), proline (CCG, 57% usage with 4 choices), threonine (ACC, 52% usage with 4 choices), lysine (AAA, 82% usage with 2 choices), glutamic acid (GAA, 72% usage with 2 choices) and arginine (CGC, 57% usage with 6 choices).

[Figure 7: sequences of the ribosome binding sites and intercistronic regions of the trp operon.]

Figure 7. Ribosome binding sites and intercistronic regions. Initiator and stop codons are boxed; Shine-Dalgarno sequences are underlined.

[Table I: codon usage in each of the five structural genes of the trp operon, given as number of occurrences with the proportional use among synonymous codons in parentheses.]
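Counts and proportions of the kind reported in Table I can be tallied from any in-frame coding sequence. The following Python sketch is a minimal illustration, assuming the standard genetic code and an input made up only of A, C, G and T; the function names (`codon_usage`, `gc3`) and the toy sequence are invented for this example and are not taken from the trp operon. The `gc3` helper gives the G + C fraction at third codon positions, a quantity the codon usage discussion goes on to compare between genomes.

```python
from collections import Counter, defaultdict

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def codon_usage(cds):
    """Return {codon: (count, proportion among its synonymous codons)}."""
    counts = Counter(split_codons(cds))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: (n, n / per_amino_acid[CODON_TABLE[codon]])
        for codon, n in counts.items()
    }


def gc3(cds):
    """Fraction of codons whose third base is G or C."""
    codons = split_codons(cds)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


# Toy sequence for illustration only: Met Leu Leu Leu Lys Lys Lys End.
toy = "ATG" "CTG" "CTG" "CTA" "AAA" "AAA" "AAG" "TAA"
for codon, (count, share) in sorted(codon_usage(toy).items()):
    print(codon, CODON_TABLE[codon], count, f"({share:.2f})")
print("third-position G + C:", gc3(toy))
```

Proportions are computed within each amino acid's set of synonymous codons, which matches the parenthesised values described for Table I; stop codons are grouped together under `*`.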
This non-random pattern of codon usage is characteristic of intermittently or moderately expressed genes in E. coli. A more restricted pattern is seen with highly expressed genes such as those for ribosomal proteins (52) and outer membrane components (53). In such cases there may be a requirement for codons with intermediate binding energies to ensure short ribosomal transit times (17). The trp operon genes, which may be expressed maximally by the cell only on occasion, seem to show a preference but not a requirement for the same codons.

Since the genes of the trp operon are permitted some freedom in codon utilization it is not surprising that the third positions are strongly influenced by the overall G + C content of the whole genome. A comparison of the sequences of trpA (268 codons; 38) and the proximal third of trpD (194 codons; 37) in several enteric bacteria shows this most clearly. The entire E. coli trp operon has 58% G + C in the third codon position while the genome G + C content is 51%. Those regions of the operon sequenced in Klebsiella aerogenes (genome G + C content 56%) and Serratia marcescens (genomic G + C content 59%) show 83 and 82% G + C in the third codon position, respectively (38,39).

B. Amino acid sequence homology

A computer search for repeated sequences within the trp operon revealed no evidence for an ancestral duplication of any segment of the operon during its evolution (Deeley, M., unpublished). This result is not wholly unexpected. Since each reaction of the pathway is chemically different, a plausible scheme for the origin of the operon involves recruitment and modification of individual genes or gene segments originally responsible for performing chemically similar reactions with different substrates. Grouping these genes together behind a single regulatory region may have been a late evolutionary event in the enteric bacteria (see below).
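The method used for the repeat search mentioned above is not described, so the following Python sketch should be read only as a toy illustration of the general idea: it lists every exact k-mer that occurs more than once in a sequence. The function name `repeated_kmers` and the example string are assumptions made for this illustration. A realistic search for an ancient duplication would have to tolerate mismatches and would more likely compare translated amino acid sequences with a scoring scheme.

```python
from collections import defaultdict


def repeated_kmers(seq, k):
    """Map each k-mer occurring more than once to its 0-based start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Made-up sequence: the 4-mer ACGT occurs at positions 0 and 6.
print(repeated_kmers("ACGTTTACGT", 4))
```

Exact matching is cheap but finds only recent or perfectly conserved repeats, which is why it serves here as a sketch and not as a substitute for a homology search.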
Under this hypothesis one might expect some amino acid sequence homology between individual trp genes and genes of related function elsewhere on the chromosome, rather than homology between different trp genes. This possibility has not yet received an adequate test, but tnaA, the gene for tryptophanase and one candidate for an ancestral relative of trpB, shows no detectable sequence similarity to trpB (54).

C. Gene and operon organization in other organisms

The general plan of the trp operon appears to be identical in all enteric bacteria. In some, however, the two functionally distinct segments of the trpD gene are separate rather than fused as in E. coli (19). The mechanism of this fusion has been postulated from a comparison of the DNA sequence in this region of the S. marcescens and E. coli trp operons (55). The trpC gene of E. coli and all other enteric bacteria studied apparently represents a fusion of two genes that are separate in other procaryotic organisms (56, 57), though the precise location and mechanism of fusion is not known. Enzymatic studies suggest that in both fused genes just described the two active sites have remained distinct and independent. In two instances this pathway does have complex active sites composed of elements on two different polypeptides: anthranilate synthetase formed from the trpE gene product and the trpG domain of the trpD gene product, and tryptophan synthetase from the trpB and trpA gene products. It is interesting that these cooperating polypeptides have not been fused in E. coli. The existence of a fused trpA-trpB polypeptide in Saccharomyces cerevisiae (58) and Neurospora crassa (59) indicates that such a fusion of cooperative polypeptides cannot be ruled out for mechanical reasons.
All other major bacterial groups studied have the genes for the tryptophan pathway at two or three separate chromosomal locations (56, 57); in many of these instances the separate trp gene clusters are independently regulated. It is clear that the enteric bacterial arrangement of these genes in a single operon, though it is the best studied one, is only one of many possible solutions to the genetic organization of the elements of the tryptophan pathway.

D. Hybrid trp genes and proteins

S. typhimurium is a close relative of E. coli, but differs from it in the trpA gene in 25% of its nucleotides and 15% of its amino acids. The corresponding differences in the trpB gene are 15% of the nucleotides and 4% of the amino acids. Most of the nucleotide differences are in the third codon position and represent synonymous codon changes. Through the construction of compatible plasmids containing defective versions of these two homologous genes, recombinant trpB and trpA genes producing hybrid proteins were obtained (60). In the trpA case there was no requirement that these hybrids be enzymatically functional. Nevertheless, each of six such recombinants examined possessed normal enzymatic activity, though each had a different crossover point from the sequence of one organism to the other. The hybrids differed from either parental protein in 6 to 34 amino acids. Several of the recombinant proteins were less thermostable than either parental molecule. Thus, though none of the divergent amino acids appears to affect the active site, it can be argued that in each parent some of the amino acid differences are balanced by others far away in the primary sequence to obtain a more stable overall conformation.

III. PROBLEMS FOR THE FUTURE

The ease with which structural genes can now be cloned, fragmented, and fused to other genes or gene fragments should make it possible to fabricate virtually any desired gene sequence by combining preexisting DNA segments.
This application of recombinant DNA technology, combined with the additional capability of synthesizing and incorporating short DNA fragments of defined sequence, challenges our ingenuity in the design of meaningful experiments.

A. Protein structure studies

With present-day techniques one can examine the effects of replacing individual amino acid residues, or segments of a protein, on the properties of the catalytic site, on the folding of the molecule, and on susceptibility of the protein to proteases and environmental conditions. The potential for systematically changing the structure of a polypeptide should yield derivatives that are not readily obtainable using classical mutant production procedures. With such tailor-made trp polypeptides one can approach the classical questions about protein structure: to what extent are the unique characteristics of each polypeptide chain (functional domains, substrate binding sites, catalytic sites, subunit interaction sites, and folding) associated with different linear segments of the polypeptide? Although proteins are maintained in an appropriate three-dimensional conformation by multiple interactions between different segments of the polypeptide chain, it is conceivable that much of the structural information for each discrete functional aspect resides in a distinct linear segment of the polypeptide. This attractive possibility would provide for the evolution of proteins with new overall functions by transposition of preexisting gene segments. In addition, it would be consistent with the presence of intervening sequences between domain-encoding regions of some eukaryotic genes.

B. Transcriptional and translational controls

Using current techniques it should be possible to change any base pair in the promoter, operator and leader regions of the trp operon of E. coli and thereby analyze the contribution of each segment of these regulatory regions.
Such studies should reveal the recognition requirements for polymerase and repressor binding, and identify the segments of the leader transcript that participate in attenuation. This information, paired with physiological observations, should elucidate the essential features of transcriptional control as well as illuminate how such controls benefit the organism. Comparable studies with the internal promoter may provide an explanation for its existence in E. coli and related enteric bacteria.

Little is known about the details of translation of the coding regions of the polycistronic trp messenger RNA. We suspect that ribosome behavior on the messenger is optimized relative to the needs of the bacterium but there is no experimental evidence that accounts for the sequence differences between the translation punctuation regions. We particularly would like to know the significance of the overlapping stop-start codon sequences between the pairs of genes whose polypeptide products form functional complexes. Do translating ribosomes or their 30S subunits simply reinitiate following termination at the stop codon in such stop-start regions? This is a key question relevant to all polycistronic messengers. Also unknown is whether any of the trp polypeptides or any other cellular protein binds to the messenger and influences translation. Recombinant DNA techniques applied to these problems should provide insight into the structural requirements for ribosome binding and translation efficiency, and reveal the significance of the diverse structures of translation initiation regions.

Termination of transcription can also serve several important functions in the cell. In addition to the housekeeping requirement of stopping transcription at the end of a gene cluster, termination mechanisms are involved in attenuation (to modulate transcriptional readthrough) and mutational polarity (to abort transcription when it is uncoupled from translation).
Some increasingly complex aspects of termination are illustrated by control regions in the trp operon, but these by no means exhaust existing possibilities. Studies in vivo and in vitro are beginning to clarify not only the role of nucleotide sequence, but the participation of additional protein factors (such as rho and nusA), the translational apparatus, processing events, and higher order structural interactions in termination and antitermination events. Future work will be focused on understanding the intricate molecular details of these mechanisms, and developing accurate mimicking of cellular regulation on a broader front. For example, the qualitative resemblance between termination regions in E. coli such as trp t' and termination regions in eukaryotic organisms suggests that application of our knowledge in this direction might be especially fruitful. Ultimately, we should be able to combine the contributions of termination regions with those of promoter or operator regions to enhance the range of control of the expression of hybrid, foreign, or synthetic genes in any chosen organism.

C. Evolutionary studies

Just as with the functional and regulatory questions posed above, recombinant DNA methods should offer ways to study the mechanisms of evolution. Selective advantages or disadvantages of certain genetic modifications should be directly measurable. Although easiest, measurement of the growth rate in minimal medium of cells containing a gene or gene segment from a different species is likely to be too insensitive for this purpose because of the wide regulatory range available for this operon.
Modifications of the trp promoter and leader to yield a fixed, low level of expression could be introduced, of course, but the magnitude of a selective burden, even though compensated for by a derepression of some magnitude, might best be measured by competition against the wild type in chemostat experiments under several growth conditions. Some of the features that could be tested for evolutionary significance in this manner are: the value of trp p2, the trpG-trpD fusion, the trpC-trpF fusion, the ability of heterologous subunits of anthranilate synthetase and tryptophan synthetase to substitute for homologous ones, the advantage of a particular transcription termination sequence at the end of the operon, the selective advantage of specific codon usage, and many others. If a modification, such as translational punctuation between the trpG and trpD domains, is found to be deleterious, stepwise selection for better function may retrace the presently hypothetical evolutionary pathway culminating in a gene fusion.

Going further afield, the trp genes of non-enteric organisms offer many unexpected examples of evolutionary diversity. The level of expression of trpB and trpA in Pseudomonas aeruginosa and Pseudomonas putida is regulated over a wide range, as in E. coli, but control is through induction by indoleglycerol phosphate rather than repression by tryptophan (56). As it is now known that the regulatory mechanism for this Pseudomonas gene pair is closely linked to the two structural genes and can be mobilized with it (60, 61), introduction of this unit into an E. coli strain deleted for trpB and trpA could provide a test of the superiority of one or the other of these contrasting regulatory mechanisms in E. coli.

This brief survey only skims the surface of the possibilities made available by a complete knowledge of the trp operon DNA sequence.
Future experimentation promises solutions to questions about structure and function as well as about the mechanisms regulating expression of this cluster of five genes. When accompanied by additional exploration of the homologous genes in other bacteria, there is reason to hope that we can reconstruct the evolutionary events that resulted in the operon we observe in E. coli today.

Acknowledgement

The studies summarized herein could not have been performed without the support of the U.S. Public Health Service, the National Science Foundation, the American Heart Association and the American Cancer Society.

REFERENCES

1. Bennett, G. N. and Yanofsky, C. (1978) J. Mol. Biol. 121, 179-192
2. Gunsalus, R. G. and Yanofsky, C. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 7112-7121
3. Yanofsky, C. (1981) Nature 289, 751-758
4. Crawford, I. P. and Stauffer, G. V. (1980) Ann. Rev. Biochem. 49, 163-195
5. Imamoto, F. (1973) Prog. Nucleic Acid Res. Mol. Biol. 13, 339-407
6. Platt, T. (1981) Cell 24, 10-23
7. Oppenheim, D. S., Bennett, G. N. and Yanofsky, C. (1980) J. Mol. Biol. 144, 133-142
8. Miozzari, G. and Yanofsky, C. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 5580-5584
9. Oppenheim, D. S. and Yanofsky, C. (1980) J. Mol. Biol. 144, 143-161
10. Brown, K. D., Bennett, G. N., Lee, F., Schweingruber, M. E. and Yanofsky, C. (1978) J. Mol. Biol. 121, 153-177
11. Rosenberg, M. and Court, D. (1979) Ann. Rev. Genet. 13, 319-353
12. Zurawski, G., Gunsalus, R., Brown, K. D. and Yanofsky, C. (1981) J. Mol. Biol. 145, 47-73
13. Squires, C. L., Lee, F. and Yanofsky, C. (1975) J. Mol. Biol. 92, 93-111
14. Singleton, C. K., Roeder, W. D., Bogosian, G., Somerville, R. L. and Weither, H. L. (1980) Nucl. Acids Res. 8, 1551-1560
15. Jackson, E. and Yanofsky, C. (1972) J. Mol. Biol. 69, 307-313
16. Horowitz, H. and Platt, T. (1982) J. Mol. Biol. in press
17. Grantham, R., Gautier, C., Gouy, M. and Mercier, R. Nucl. Acids Res. 9, r43-r74
18. Nichols, B. P., VanCleemput, M. and Yanofsky, C. (1981) J. Mol. Biol. 146, 45-54
19. Largen, M. and Belser, W. (1973) Genetics 75, 19-22
20. Schmeissner, U., Ganem, D. and Miller, J. H. (1977) J. Mol. Biol. 109, 303-326
21. Miozzari, G. F. and Yanofsky, C. (1978) J. Bact. 133, 1457-1466
22. Lee, F. and Yanofsky, C. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 4365-4369
23. Oxender, D., Zurawski, G. and Yanofsky, C. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 5524-5528
24. Stauffer, G. V., Zurawski, G. and Yanofsky, C. (1980) J. Mol. Biol. 142, 123-129
25. Zurawski, G. and Yanofsky, C. (1980) J. Mol. Biol. 142, 123-129
26. Stroynowski, I. and Yanofsky, C. In preparation
27. Johnston, H. M., Barnes, W. M. M., Chumley, F. G., Bossi, L. and Roth, J. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 508-512
28. Keller, E. B. and Calvo, J. M. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 6186-6190
29. Farnham, P. J. and Platt, T. (1981) Nucleic Acids Res. 9, 563-577
30. Winkler, M. E. and Yanofsky, C. (1981) Biochemistry 20, 3738-3744
31. Wu, A. M. and Platt, T. (1978) Proc. Natl. Acad. Sci. U.S.A. 75, 5442-5446
32. Guarente, L., Beckwith, J., Wu, A. M. and Platt, T. (1979) J. Mol. Biol. 133, 189-197
33. Wu, A. M., Chapman, A. B., Platt, T., Guarente, L. P. and Beckwith, J. (1980) Cell 19, 829-836
34. Wu, A. M., Christie, G. E. and Platt, T. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 2913-2917
35. Guarente, L. P. (1979) J. Mol. Biol. 129, 295-304
36. Nichols, B. P., Miozzari, G. F., VanCleemput, M., Bennett, G. N. and Yanofsky, C. (1981) J. Mol. Biol. 142, 503-517
37. Horowitz, H., Christie, G. E. and Platt, T. (1982) J. Mol. Biol. in press
38. Christie, G. E. and Platt, T. (1980) J. Mol. Biol. 142, 519-530
39. Crawford, I. P., Nichols, B. P. and Yanofsky, C. (1980) J. Mol. Biol. 142, 489-502
40. Nichols, B. P. and Yanofsky, C. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 5244-5248
41. Jackson, E. N. and Yanofsky, C. (1974) J. Bact. 117, 502-508
42. Yanofsky, C., Horn, V., Bonner, M. and Stasiowski, S. (1971) Genetics 69, 409-433
43. Creighton, T. (1970) Biochem. J. 120, 699-707
44. Kirschner, K., Szadkowsky, H., Henschen, A. and Lottspeich, F. (1980) J. Mol. Biol. 143, 395-409
45. Ito, J. and Yanofsky, C. (1966) J. Biol. Chem. 241, 4112-4114
46. Zalkin, H. (1973) Adv. in Enzymology 38, 1-39
47. Crawford, I. P. and Yanofsky, C. (1958) Proc. Natl. Acad. Sci. U.S.A. 44, 1161-1170
48. Miles, E. W. (1979) Adv. in Enzymology 49, 127-186
49. Christie, G. E. and Platt, T. (1980) J. Mol. Biol. 143, 335-341
50. Oppenheim, D. S. and Yanofsky, C. (1980) Genetics 95, 785-795
51. Selker, E. and Yanofsky, C. (1979) J. Mol. Biol. 130, 135-143
52. Post, L. E., Strycharz, G. D., Nomura, M., Lewis, H. and Dennis, P. P. (1977) Proc. Natl. Acad. Sci. U.S.A. 76, 1697-1701
53. Nakamura, K. and Inouye, M. (1980) Proc. Natl. Acad. Sci. U.S.A. 77, 1369-1373
54. Deeley, M. and Yanofsky, C. (1981) J. Bacteriol. 147, 782-796
55. Miozzari, G. and Yanofsky, C. (1979) Nature 277, 486-489
56. Crawford, I. P. (1975) Bacteriol. Revs. 39, 87-120
57. Crawford, I. P. (1980) Crit. Revs. Biochem. 8, 175-189
58. Zalkin, H. and Yanofsky, C., J. Biol. Chem. in press
59. Matchett, W. H. and DeMoss, J. A. (1975) J. Biol. Chem. 250, 2941-2946
60. Schneider, W. P., Nichols, B. P. and Yanofsky, C. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 2169-2173
61. Hedges, R. W., Jacob, A. E. and Crawford, I. P. (1977) Nature 267, 283-284
62. Manch, J. N. and Crawford, I. P. (1981) J. Mol. Biol. in press