Nucleic Acids Research, Volume 9, Number 9, 1981

DNA sequence of the rat growth hormone gene: location of the 5' terminus of the growth hormone mRNA and identification of an internal transposon-like element

Guy S. Page, Susan Smith and Howard M. Goodman

Howard Hughes Medical Institute Laboratories, Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143, USA

Received 29 January 1981

ABSTRACT

The present communication describes the molecular cloning and DNA sequence determination of the rat growth hormone (rGH) gene. The rGH gene was cloned on an 11 kilobase EcoRI fragment of total rat DNA; it has four intervening sequences which correspond in position to those of the human growth hormone (hGH) gene. One of the intervening sequences in the rGH gene contains a possible transposable element: a 200 base pair direct repeat that is itself flanked by an exact 15 base pair direct repeat. The DNA sequence was used to estimate the location of the 5' end of the mature growth hormone mRNA. By S1 nuclease mapping it was located approximately 25 bases "downstream" from a TATAAA sequence presumed to play a role in initiation of transcription of the rGH gene.

INTRODUCTION

Many of the processes of development, tissue differentiation, and the responses of an organism to environmental changes occur at the primary genetic level, that is, at the level of genetic transcription. However, while we know that gene transcription can show tissue specificity, we have as yet very little direct information about the way (or ways) in which such specificity is conferred, and to what extent it depends upon the primary structures (i.e., DNA sequences) of the genes involved. As part of our approach to these general questions we have been studying the rat growth hormone (rGH) gene, whose tissue-specific expression can be examined both in vivo and in cultured cell lines. Growth hormone (GH)
is produced by a specialized subset of the cells of the anterior pituitary in response to specific signals from the hypothalamus. In addition, GH is produced and secreted by cells of the closely related rat pituitary tumor lines GH1, GH3, and GC (1). In these cells there is preliminary evidence that the level of GH mRNA can be controlled by both glucocorticoid and thyroid hormones (2). Thus an excellent system is available for an examination of hormone and tissue-specific control of genetic transcription and its relation to gene structure.

In this paper we report the molecular cloning and complete DNA sequence determination of the rGH gene. We also report the localization of the 5' end of the mature rGH mRNA, i.e. the probable site of initiation of transcription, and the finding of a possible transposon-like element located in one of the intervening sequences.

© IRL Press Limited, 1 Falconberg Court, London W1V 5FG, U.K.

MATERIALS and METHODS

Restriction Enzyme Analyses

All digestions were done with enzymes purchased from either New England Biolabs, Bethesda Research Laboratories, or Boehringer Mannheim. Digestions were usually done with a substantial excess of enzyme and approximately in accordance with the conditions provided by the manufacturer. Gel electrophoretic separations, unless specified otherwise, were performed either on 1% agarose gels or [illegible]% acrylamide gels in TBE buffer ([illegible] M Tris, [illegible] M boric acid, [illegible] M EDTA, pH 8.3). Gels were stained with ethidium bromide and visualized by illumination with UV light.

RPC-5 Fractionation of DNA Fragments

A 0.9 x 22 cm column of RPC-5 (3) was packed under pressure (400 p.s.i.) and equilibrated with 1.25M NaOAc, 10mM Tris.HCl pH 7.5, 1mM EDTA. Three milligrams of EcoRI-cleaved Hooded rat DNA (gift of A.
Ullrich) was loaded onto the column in the same buffer, and the fragments eluted with a 200 x 200 ml gradient of 1.45 to 1.55M NaOAc (10mM Tris.HCl pH 7.5, 1 mM EDTA) at a flow rate of 1.5 ml/min (~400 p.s.i.). Fractions were assayed for absorbance at 260 nm and aliquots of selected fractions electrophoresed on an agarose gel. Five micrograms of DNA from each DNA-containing fraction were assayed by Southern hybridization for rat growth hormone (rGH) sequences. The elution peak was spread broadly across some thirty fractions. The rGH sequences eluted in four fractions, which were pooled and the DNA precipitated from them.

Sucrose Gradient Fractionation of Restriction Fragments

Sedimentation in sucrose gradients was used as a purification step both for the "arms" of the cloning vector λCharon-4A and for the 11kb rGH EcoRI fragment. The gradients were run essentially as described in Lawn, et al., (4). Gradients of 10% to 40% sucrose (10 x 16 ml) in 1M NaCl, [illegible] Tris (pH 7.4), [illegible] EDTA, 0.5 µg/ml ethidium bromide were centrifuged at 27,000 rpm in a Beckman SW27 rotor. Time of centrifugation depended on the size of the fragment to be purified. Specific bands visualized in the centrifuge tube by UV illumination were collected from the side of the tube with a syringe. More disperse fragment populations were fractionated through a hole punched in the bottom of the tube.

Molecular Cloning of the rGH Gene

Ligation of purified λCharon-4A "arms" to the 11kb EcoRI fragments containing rGH, in vitro packaging of the recombinant molecules, transfection, and identification of products were all carried out exactly as described elsewhere (5, 6). A description of the methods used for subcloning the 11kb rGH EcoRI fragment and specific parts of it into pBR322 may be found in Ullrich, et al. (7).

Subcloning of rGH Sequences into M13-derived Vectors

The 1.5 kb PvuII-D fragment containing most of the rGH gene was cloned from the 11kb rGH EcoRI fragment in pBR322 (p.gRGH) into the vector M13mp5 as described by Cordell, et al., (6). Recombinant phage were propagated in the bacterial strain 79.02 (gift of B. Gronenborg), and single-stranded template DNA for sequencing prepared from them as described by Winter and Fields (8). The clones were designated MP5.gRGH.1 (+) and MP5.gRGH.2 (-) depending upon the orientation of the PvuII-D fragment in the single-stranded vector.

DNA Sequencing

Most of the DNA sequencing was carried out by the chain-termination method of Sanger (9) using as templates MP5.gRGH.1 and MP5.gRGH.2 (see above). Specific DNA primers were prepared from the rGH cDNA clone (10) and from the gene PvuII-D fragment by digestion with selected restriction endonucleases and purified by polyacrylamide gel electrophoresis. The DNA sequencing reactions were carried out as described by Cordell, et al., (6). The Maxam and Gilbert procedure (11, 12) was used to sequence regions flanking the PvuII-D fragment and to confirm selected sequences obtained by the chain-termination method within the PvuII-D fragment. All manipulations closely followed the published protocols.

Growth of GH Cells and RNA Extraction

GH1 cells (1) were grown either in suspension or in monolayers in Dulbecco's Modified Eagle's medium supplemented with 10% Fetal Calf Serum, 1 nM triiodothyronine and 50 [unit illegible] dexamethasone (13). In order to obtain cytoplasmic RNA, cells (either trypsinized from monolayers or directly from suspension culture) were pelleted, washed once with 10 mM Tris.HCl pH 7.4, 5 mM NaCl, 1 mM MgCl2 (RSB), and suspended in cold RSB containing 0.4% NP-40 and 0.06% sodium deoxycholate. The suspension was kept on ice for five minutes, after which the lysed cells were removed by centrifugation. The supernatant was brought to 0.25M NaCl, 0.05 M Tris.HCl pH 7.4, 0.05 M EDTA and extracted once each with phenol and chloroform. RNA was precipitated from the final supernatant with ethanol.

RNA-DNA Hybridizations and S1 Nuclease Mapping

The RNA and DNA fragments to be hybridized together (see Results for details) were first mixed and coprecipitated with ethanol. The pellet was dried under vacuum and dissolved in 20 µl of a buffer containing 80% HCONH2, 0.2 M NaCl, 20 mM PIPES pH 6.5, and 0.5 mM EDTA. The solution was boiled for three minutes, then placed at 50°C for three hours. The resultant hybrids were diluted to 300 µl in S1 buffer (0.25M NaCl, 0.03M NaOAc pH 4.6, 0.001M ZnSO4) and 300 units of S1 nuclease added. Unhybridized nucleic acids were digested at room temperature for 30 minutes. The products of the digestion were analyzed on an 8% polyacrylamide DNA sequencing gel (see description of sequencing protocols, above).

RESULTS

Molecular Cloning and Restriction Map of the Rat Growth Hormone Gene

In order to facilitate identification of a rat growth hormone (rGH) gene clone, total rat DNA was enriched for rGH gene sequences prior to cloning using a two-step process. DNA was extracted from total tissue of Hooded rats and digested to completion with EcoRI. The resultant fragments were fractionated on an RPC-5 column (14; 15), and aliquots of the column fractions were assayed by hybridization with a labeled rGH cDNA probe (10); a single peak of hybridization was observed. Appropriate column fractions were pooled and the restriction fragments separated by sedimentation through a sucrose gradient (4). The rGH sequences were again located by hybridization with the cDNA probe, and the peak fractions were used for cloning. Enrichment for rGH-specific sequences by these procedures was approximately 50- to 100-fold.
The rat genomic EcoRI fragments enriched for rGH gene sequences were then ligated to purified λCharon-4A "vector arms", packaged in vitro and plaques screened as described elsewhere (16, 5, 6). Several putative rGH-containing clones were isolated. The rGH isolate chosen for detailed study, designated λ.gRGH, contained an 11 kilobase (kb) EcoRI restriction fragment that hybridized to the rGH cDNA probe. A restriction map of this fragment is shown in Fig. 1. Map positions for XbaI, SstI, BglII, HindIII, SalI, and BamHI sites were obtained from λ.gRGH by standard methods.

[Figure 1: restriction map of the 11 kb EcoRI fragment for XbaI, BglII, BamHI, SalI, HindIII, PstI and SstI. Key: mRNA coding sequence; intervening sequence; repeated sequence; direction of transcription.]

Figure 1. Restriction Map of the rGH Gene: The locations of the restriction sites indicated were determined as described in Materials and Methods. An expansion of the rGH section of the 11 kb EcoRI fragment is shown at the top with the location of the rGH gene, its intervening sequences, and the 200 bp repeated sequences. The number under each restriction fragment indicates its size in kilobase pairs. The left-most and right-most (except for PstI and PvuII) vertical bars are always the EcoRI sites.

To simplify more detailed restriction mapping, the approximately 6kb EcoRI-to-HindIII fragment (HindIII-A in Fig. 1) in λ.gRGH was transferred to the plasmid vector pBR322. This sub-clone, designated p.gRGH, was used to reconfirm the map positions of those enzymes listed above and to position the PstI and PvuII sites. Each of the enzymes BglII, XbaI, and SstI was found to separate portions of the rGH protein-coding sequence, i.e., to cleave within the gene.
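The logic of this observation can be pictured with a short Python sketch: an enzyme whose site occurs in the genomic sequence but not in the cDNA must be cutting in sequence that is absent from the mature message. This is only an illustration and not the authors' procedure. The recognition sequences are the standard ones for XbaI, BglII and SstI; the function name and the toy sequences are hypothetical.

```python
# Standard recognition sequences for the three enzymes named in the text.
SITES = {"XbaI": "TCTAGA", "BglII": "AGATCT", "SstI": "GAGCTC"}


def sites_only_in_gene(genomic: str, cdna: str) -> list[str]:
    """Return the enzymes whose site occurs in the genomic sequence
    but not in the cDNA sequence."""
    genomic, cdna = genomic.upper(), cdna.upper()
    return sorted(
        name for name, site in SITES.items()
        if site in genomic and site not in cdna
    )


# Hypothetical toy sequences: two "exons" that are contiguous in the cDNA
# and separated in the gene by an "intron" carrying an XbaI site.
exon1, exon2 = "ATGGCTGCAG", "ACTCCCTGGCTC"
intron = "GTAAGCATTCTAGACCTTTCTAG"
toy_gene = exon1 + intron + exon2
toy_cdna = exon1 + exon2

print(sites_only_in_gene(toy_gene, toy_cdna))
```

A site found this way only suggests an intervening sequence; the hybridization and sequence data described in the paper are what establish it.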
As sites for these enzymes are not found in the cDNA sequence, it seemed likely that they indicated the presence of intervening sequences. Furthermore, the cDNA hybridized to regions between the XbaI site and the distal BglII site, and between this BglII site and the SstI site. This observation demonstrated the presence of protein-coding sequence in the XbaI-BglII and BglII-SstI intervals. As such, these three sites were taken to indicate the presence of at least three distinct intervening sequences.

Location of the rGH Gene and Determination of the Orientation of Transcription

The approximate location of the rGH gene on the cloned 11kb EcoRI fragment was determined by hybridization of selected digests of the clone with nick-translated rGH cDNA probe (17). For example, the probe was found to hybridize to BglII fragments C and D, but not to BglII-B (see Fig. 1 for fragment nomenclature). It hybridized to HindIII-A but not to HindIII-B. These data limit the location of the rGH gene to between the BglII-B/BglII-C junction and the HindIII-A/HindIII-B junction. By similar hybridization analyses the location of the rGH gene was determined (Fig. 1).

The orientation of transcription of the rGH gene was determined by hybridization analysis with 5'- and 3'-specific probes prepared from the rGH cDNA clone. Briefly, the 800-base pair HindIII fragment containing the cloned rGH cDNA was purified from the plasmid p.cRGH (pRGH-1 of Seeburg, et al., (11)) and cleaved with HhaI. The 220-base pair 5'-end fragment and the 275-base pair 3'-end fragment were isolated by preparative gel electrophoresis. Each of these fragments was labeled by nick-translation and hybridized to BglII, SstI and double digests of the genomic clone (Fig. 2). The 5'-specific cDNA fragment hybridizes to BglII-C, SstI-A, and to a fragment the same size as BglII-C in the double digest (Fig. 2,b).
The 3'-specific cDNA fragment hybridizes to BglII-D, SstI-B, and to a 3.8 kb BglII-SstI fragment in the double digest (Fig. 2,c). These results specify the orientation of transcription as shown in Fig. 1. The genomic clone therefore contains about [illegible] kb of DNA sequence "upstream" from the putative 5' end of the gene.

Figure 2. Orientation of Transcription of the rGH Gene: In each lane 2 µg of DNA from the plasmid clone p.gRGH was digested with the following enzymes: the first lane, with BglII; the second, with SstI; and the third, with both enzymes. The three sets are (a) ethidium bromide staining pattern; (b) autoradiograph of hybridization with 5'-specific probe; and (c) autoradiograph of hybridization with 3'-specific probe. Numbers at the left refer to restriction fragment sizes in kilobase pairs. The conditions of hybridization were those described by Cordell, et al., (6).

Determination of the Number of rGH Genes

There are multiple growth hormone genes in the human genome (10). A comparable multiple gene structure does not appear to be found for the growth hormone gene in rat. Digestion of rat DNA with EcoRI and hybridization with the rGH cDNA probe yields a single band corresponding to the cloned 11 kb restriction fragment described above. However, to determine whether the hybridization seen with the rGH cDNA probe is in fact due to the presence of a single rGH gene, a direct comparison was made between restriction digests of rat genomic DNA and the cloned genomic rGH sequence. Rat genomic DNA and DNA from the rGH clone were digested with PvuII and PstI, each of which cleaves the cloned rGH gene several times (Fig. 1). The genomic digests and appropriate amounts of the digests of the cloned DNA were electrophoresed on the same gel, transferred to nitrocellulose, and hybridized with nick-translated probe prepared from the cloned rGH cDNA sequence. The results of this comparison are shown in Fig. 3.
As there are no restriction fragments in the digests of the rat genomic DNA that cannot be accounted for by fragments from the rGH clone, it seems most plausible to conclude that there is only one growth hormone gene in the rat genome. This conclusion is supported by more detailed restriction mapping data (10). The data do not, however, rule out the possibility of several identical genes in identical sequence environments.

[Figure 3: autoradiogram of genomic and cloned DNA digests; marked fragment sizes 11, 1.5/1.6, 1.1 and 0.85 kb.]

Figure 3. Hybridization Analysis of Rat Genomic DNA: A comparison was made between selected restriction digests of the 11 kb λ.gRGH clone (vis. Fig. 1) and total rat genomic DNA. Each hybridization is to either 10 µg of genomic DNA or 39 pg of the cloned DNA. Particular digests were as indicated in the figure. All digests were electrophoresed together. After electrophoresis the gel was treated for 20 min. in 50 mM HCl (this may explain the poor recovery of the smaller restriction fragments). All digests were transferred to nitrocellulose and hybridized together. Hybridization was under described conditions (6), for seven days at a probe concentration of 5 x 10^[exponent illegible] cpm/ml (specific activity 1-2 x 10^[exponent illegible] cpm/µg). The autoradiogram was exposed for five days at -70° with a Dupont Cronex Lightning Plus intensifying screen. The numbers refer to size in kilobase pairs of indicated restriction fragments. Hybridization to the [size illegible] kb genomic PstI fragment was visible in the original autoradiogram, although at a reduced intensity. This hybridization band did not reproduce.

DNA Sequence of the Rat Growth Hormone Gene

Both the chain-termination method of DNA sequence determination (9) and the chemical degradation method developed by Maxam & Gilbert (11, 12) were used to obtain the complete sequence of the rGH gene. A diagram of the complete sequencing strategy is given in Fig. 4.
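The single-gene argument above amounts to a set comparison: every hybridizing band in the genomic digest should be matched by a band from the cloned DNA. The Python sketch below illustrates that check under stated assumptions. The band sizes are hypothetical placeholders, and the tolerance value and function name are illustrative choices rather than anything given in the paper.

```python
def unexplained_bands(genomic_kb, clone_kb, tolerance=0.1):
    """Return genomic band sizes (kb) that have no clone band
    within `tolerance` kb."""
    return [
        g for g in genomic_kb
        if not any(abs(g - c) <= tolerance for c in clone_kb)
    ]


# Hypothetical band sizes in kilobase pairs, for illustration only.
clone_bands = [1.6, 1.5, 1.1, 0.85]
genomic_bands = [1.55, 1.1, 0.85]

# An empty list means every genomic band is accounted for by the clone.
print(unexplained_bands(genomic_bands, clone_bands))
```

As the text notes, an empty result is consistent with a single gene but cannot exclude several identical genes in identical sequence environments, since those would give the same band pattern.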

[Figure 4: sequencing strategy diagram.]

Figure 4. DNA Sequencing Scheme for the rGH Gene: The bottom line shows the location of restriction sites used in the DNA sequence determination (N.B. This is not a complete restriction map. Such a map is available on request). The numbers refer to the distance in base pairs from the BglII site and correspond to those used in Fig. 5. Above the restriction map is shown a representation of the rGH gene. Protein-coding portions of the gene are shown as open boxes; 5'- and 3'-untranslated regions of the mRNA are shown as cross-hatched boxes; and the intervening sequences are the single line regions designated A through D. The direction of transcription of the rGH gene is left to right. The uppermost portion of the figure represents the DNA sequencing scheme. Each arrow corresponds in position, direction, and length to one sequence determination. Thin arrows represent sequence obtained by the chain-termination method, and the thick arrows, sequence obtained by the Maxam and Gilbert method.

The chain-termination method of DNA sequencing relies on the availability of a single-stranded template for the DNA synthesis reaction. Such a template was obtained for the PvuII-D fragment (Fig. 1), which spans most of the rGH gene, by transferring this fragment to the single-stranded cloning vector M13mp5 (20). A decanucleotide "linker" (CCAAGCTTGG) containing the HindIII restriction site was ligated to a PvuII digest of p.gRGH. The products were cloned into M13mp5 and isolates containing the PvuII-D fragment identified by hybridization and restriction analyses. This approach yielded recombinant phage clones containing each of the strands of the PvuII-D fragment for use as sequencing templates.
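Two properties of the linker quoted above can be checked directly: it contains the HindIII recognition sequence (AAGCTT), and it is self-complementary, so a single oligonucleotide anneals with itself to form a duplex. The Python sketch below is an illustrative check only; the helper name `reverse_complement` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


LINKER = "CCAAGCTTGG"   # decanucleotide linker quoted in the text
HINDIII = "AAGCTT"      # HindIII recognition sequence

has_site = HINDIII in LINKER
self_complementary = reverse_complement(LINKER) == LINKER
print(has_site, self_complementary)
```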

Primers for the chain-termination sequencing reactions were prepared from two sources: selected fragments from the cDNA clone were used to prime reactions from protein-coding portions of the gene; fragments from p.gRGH were used to obtain sequence within the intervening sequences (IVS). The Maxam-Gilbert method was used to sequence regions 5' and 3' to the PvuII-D fragment, as well as to clarify any ambiguous sequence within this fragment.

The entire sequence of the rGH gene is shown in Fig. 5. Approximately half of the gene was sequenced on both strands. Those portions that were not sequenced on both strands were either sequenced on the same strand by both methods (Fig. 4; circa nucleotide 850) or on the same strand with different primers (Fig. 4; circa nucleotide 1150). The data for those few portions for which a single sequence determination was made were wholly unambiguous in interpretation.

Localization of the "CAP" Site of Mature rGH mRNA

From the DNA sequence presented in Fig. 5 we were able to make a prediction of the location of the rGH mRNA "CAP" site and to test this prediction. At position 200-205 is the sequence TATAAA, which has been identified as part of the signal for initiation of transcription by RNA Polymerase II (21). The initiation site is usually located at an A residue 25 ± 1 bases from the TATAAA sequence (22). In order to experimentally determine the position of the "CAP" site of the rGH mRNA we extracted cytoplasmic RNA from the rat pituitary tumor line GH1 after the cells had been grown in triiodothyronine and dexamethasone for three days (23). Induced GH1 mRNA was expected to contain about 1%-5% rGH message (2). PstI fragments of p.gRGH were end-labeled with γ-[32P]ATP and T4 polynucleotide kinase and the 0.6 kb PstI-B fragment (Fig. 1) purified by polyacrylamide gel electrophoresis. This PstI fragment overlaps the region where the "CAP" site should occur (i.e., ca.
nucleotide 231 in Fig. 5). The labeled PstI-B fragment was hybridized to GH1 mRNA under high formamide conditions that favor DNA-RNA hybridization over reannealing of the labeled probe (24), and the hybrids were digested with S1 nuclease. The S1-resistant material was sized on an 8% polyacrylamide DNA sequencing gel. A DNA sequence ladder prepared from the rGH XhoI site was used as a size marker. The results of the experiment are shown in Fig. 6.

[Figure 5: nucleotide sequence listing, positions 1 to 2243, with the deduced amino acid sequence above the protein-coding regions; the listing is not legible in this copy.]

Figure 5. DNA Sequence of the rGH Gene: The region sequenced was from the 5' end of BglII-C to the 3' end of PvuII-E. The sequence in the figure is differentiated as follows: protein-coding sequence, upper case letters with amino acid designations above them (numbers above the line in these regions refer to amino acids -26 through 190); intervening sequences, lower case letters; and 5'- and 3'-untranslated regions, lower case, underlined. The two diamonds at positions 804 and 999 mark the beginnings of the two 200 bp direct repeat sequences found in IVS-B. The 15 bp direct repeats are indicated by horizontal arrows. All numbers below the lines refer to distance in base pairs from the first base pair presented in the sequence.

The length of the DNA strand protected from digestion with S1 by the rGH mRNA is about 65 bases. Thus the "CAP" site of rGH mRNA appears to be approximately 65 bases from the PstI site, i.e., at position 230, an A residue 25 base pairs downstream from the TATAAA sequence. The variation in length of the S1-resistant DNA fragments probably results from variation in the extent of digestion with S1. Any fragments not digested to completion will be slightly longer than the correct length and any overdigestion will slightly shorten the fragments.
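The prediction tested here (an A residue about 25 bases past the end of TATAAA) can be expressed as a short Python search. This is a hedged sketch of the rule as the text states it, not the authors' analysis: the function name, its parameters and the toy sequence are hypothetical. The toy sequence places TATAAA at positions 200-205 and an A at position 230 to mirror the numbers given in the text.

```python
def candidate_cap_sites(seq, tata="TATAAA", distance=25, slack=1):
    """List 1-based positions of A residues lying `distance` +/- `slack`
    bases past the last base of each TATAAA occurrence."""
    seq = seq.upper()
    hits = []
    start = seq.find(tata)
    while start != -1:
        tata_end = start + len(tata)  # 1-based position of the last TATAAA base
        for pos in range(tata_end + distance - slack,
                         tata_end + distance + slack + 1):
            if pos <= len(seq) and seq[pos - 1] == "A":
                hits.append(pos)
        start = seq.find(tata, start + 1)
    return hits


# Hypothetical toy sequence: filler C's, TATAAA at 200-205, an A at 230.
toy = "C" * 199 + "TATAAA" + "C" * 24 + "A" + "C" * 10

print(candidate_cap_sites(toy))
```

A sequence-based prediction of this kind only nominates candidate residues; in the paper the assignment rests on the S1 nuclease protection experiment.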
Most of the label is found in the 65-base fragment; however, we cannot, by these data, rule out the possibility of a slight variation in the rGH mRNA "CAP" site, e.g., initiation at the A residue at position 233, 28 bases from the end of the TATAAA sequence.

[Figure 6: autoradiogram of the S1-resistant material beside a sequencing ladder; size marks at 70, 65, 60, 50 and 40 bases.]

Figure 6. Location of the rGH mRNA "CAP" Site: The end-labeled 0.6 kb PstI fragment ([amount and radioactivity illegible]) from p.gRGH was hybridized to [amount illegible] of cytoplasmic RNA extracted from GH1 cells. The hybrids were digested with S1 nuclease as described in Materials and Methods. The S1-resistant material was electrophoresed on an 8% polyacrylamide DNA sequencing gel at 30 mA for 2 hours. A DNA sequencing ladder prepared from the XhoI site of p.gRGH was used as a size standard. (So as to equalize autoradiographic intensities, the S1-resistant band and the sequencing ladder were photographed separately and the photos reassembled.) All numbers in the figure refer to lengths in bases of the indicated DNA fragments.

DISCUSSION

Comparison of the rGH Gene and rGH cDNA Sequences

Comparison of the cloned rGH cDNA sequence with the protein sequence (9) showed two discrepancies. One occurred at the amino-terminal amino acid. Protein sequences for rat, as well as human, bovine, equine, and ovine growth hormones, placed a Phe residue at this position (25). The reported cDNA sequence predicted a Leu residue (codon UUA) as the amino-terminal amino acid. The DNA sequence of the cloned gene is consistent with the protein sequence, rather than the cDNA sequence, in that it specifies a Phe residue (codon UUC) at the amino-terminal position of the mature hormone. We have repeated the DNA sequence of the original cDNA clone and have confirmed the initial characterization.
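The Leu/Phe discrepancy comes down to a single base in the third codon position, which a few lines of Python make explicit. The sketch is illustrative only: the lookup table holds just the two codons named in the text (both assignments are from the standard genetic code), and the helper name is hypothetical.

```python
# The two codons named in the text, with their standard-code assignments.
CODONS = {"UUA": "Leu", "UUC": "Phe"}


def differing_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length codons differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


cdna_codon, gene_codon = "UUA", "UUC"
print(CODONS[cdna_codon], CODONS[gene_codon],
      differing_positions(cdna_codon, gene_codon))
```

A single substitution of this kind is the sort of change a copying error could introduce, which is the explanation the authors go on to favor.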
It seems most likely, therefore, that the discrepancy between the cDNA and the protein sequence at the amino-terminal position resulted from an error in reverse transcriptase copying of the rGH mRNA during the cDNA preparation. This is consistent with the high error frequency that has been observed for reverse transcriptase (26).

Wallis and Davies (25) placed a Gly residue at position 8 of the mature hormone sequence, while the cDNA sequence predicts a Ser. The sequence obtained here for the rGH gene is in agreement with the cDNA sequence, placing a Ser at position 8. This same amino acid has been seen in the equivalent position of other GHs (25).

Finally, at position 749 in the cDNA (9) a T residue is found that is not found at the corresponding position in the gene sequence (between bases 2196 and 2197). This portion of the original cDNA clone was resequenced and was also found to lack this base. The discrepancy thus appears to be attributable to an error in the original sequence determination.

Intervening Sequences in the Growth Hormone Gene

Comparison of the rGH cDNA sequence with that of the cloned gene shows that in the gene the protein-coding portion is divided by four intervening sequences, designated A through D. They are located within amino acid -23 of the pre-hormone (A), and between amino acids 31 and 32 (B), 70 and 71 (C), and 124 and 125 (D). The location of intervening sequence A is unambiguous. Each of the other three could conceivably be placed one or more bases removed from its given position. However, the locations shown are the only ones that place a GT dinucleotide at the 5' junction of each intervening sequence and an AG dinucleotide at each 3' junction. Since these two dinucleotides seem to be a general feature of intervening sequence junctions (27), we feel the positions given are reliable. Our observations are consistent with those of Chien and Thompson (28), who described by heteroduplex analysis intervening sequences in an independently obtained clone of the rat growth hormone gene.

The four intervening sequences in rGH are in approximately the same locations and are approximately the same sizes (with the exception of intervening sequence B) as in human growth hormone (18). The entire difference in size between the human and rat B intervening sequences may be accounted for by the presence of a 200-base pair direct repeat that is found in the rat gene. The first unit of the repeat is located between bases 804 and 996, and the second between bases 999 and 1206. The difference in size of the repeat units results from a repetition of the sequence CAAAA at the end of the second repeat unit. Other than this, there are only 8 base-pair differences between the two repeat units. Furthermore, there is an identical 15-base pair sequence (CACJTAATGACAGAGA) located just before the first repeat unit (bases 789 to 803) and just after the second repeat unit (bases 1207 to 1221), i.e., the 200-base pair direct repeat is itself flanked by a 15-base pair direct repeat.

These observations strongly suggest that the large size of the rat B intervening sequence is due to the transposition of a 200-base pair direct repeat into that sequence. Whether this repeat was once present in the ancestral human gene and has been lost in the course of evolution, or represents a more recent event occurring uniquely in the rat gene, cannot be determined from the observations made here. However, that its presence is the result of a transposition event seems very likely (29). It is worth noting that a sequence similar to the repeat units identified above, and inverted with respect to them, is found just 3' to the rGH gene (see Figure 2).
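The junction-placement argument made above for intervening sequences B through D (several placements fit the cDNA, but only one puts GT at the 5' end and AG at the 3' end) can be illustrated with a small Python sketch. The sequences below are hypothetical toy strings, not rGH sequence, and the function names are ours; the sketch handles a single intervening sequence only.

```python
def candidate_introns(genomic, cdna):
    """Return every (start, end) such that deleting genomic[start:end]
    reproduces the cDNA exactly (single intervening sequence assumed)."""
    gap = len(genomic) - len(cdna)
    return [
        (start, start + gap)
        for start in range(len(cdna) + 1)
        if genomic[:start] + genomic[start + gap:] == cdna
    ]


def obeys_gt_ag(genomic, start, end):
    """True if the excised segment begins with GT and ends with AG."""
    intron = genomic[start:end]
    return intron.startswith("GT") and intron.endswith("AG")


# Hypothetical toy sequences: exon 1 + intervening sequence + exon 2.
genomic = "ATGGCAG" + "GTAAGTTTCTCAG" + "GCTTAA"
cdna = "ATGGCAG" + "GCTTAA"

candidates = candidate_introns(genomic, cdna)
consistent = [c for c in candidates if obeys_gt_ag(genomic, *c)]
print(candidates)
print(consistent)
```

In this toy case five different placements of the excised segment all regenerate the cDNA, because bases at the ends of the intervening sequence repeat bases in the adjoining coding sequence; only one of the five satisfies the GT/AG rule. This is the same kind of ambiguity, and the same resolution, described for the rGH junctions.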
This 3' repeat unit has not been sequenced, but has been identified and mapped by electron microscopy (30).

From the DNA sequence and the S1 nuclease mapping data we can make an initial estimate of the size of the primary transcript of the rGH gene. The mRNA "CAP" site, identified for other eucaryotic transcriptional units as being close to if not coincident with the site of initiation of transcription (31, 32, 33), is located at position 230. The poly-A addition site was located by comparison with the rGH cDNA at position 2210. The distance between the two locations, our estimate of the size of the rGH primary transcript, is 1980 bases. On the basis of hybridization data, Maurer et al. (34) have identified a 2.3 kb nuclear RNA species as a potential precursor to the rGH message. Considering the experimental errors involved, this estimate is in reasonable agreement with our own. However, they also identified a 5.6 kb and a 6.7 kb species as potential precursors. In consideration of the data we have presented above, and in the absence of any direct structural characterization of the putative precursors, we feel that the identification of the larger RNA species as rGH precursors is premature.

REFERENCES

1. Tashjian, Jr., A.H., Yasumura, Y., Levine, L., Sato, G.H., and Parker, M.L. (1968) Endocrinology 82, 342-352.
2. Martial, J., Baxter, J., Goodman, H.M., and Seeburg, P. (1977) Proc. Natl. Acad. Sci. USA 74, 1816.
3. Pearson, R.L., Weiss, J.F., and Kelmers, A.D. (1971) Biochim. Biophys. Acta 228, 770-774.
4. Lawn, R.M., Fritsch, E.F., Parker, R.C., Blake, G., and Maniatis, T. (1978) Cell 15, 1157-1174.
5. Fiddes, J.C., Seeburg, P.H., DeNoto, F.M., Hallewell, R.A., Baxter, J.D., and Goodman, H.M. (1979) Proc. Natl. Acad. Sci. USA 76, 4294-4298.
6. Cordell, B., Bell, G., Tischer, E., DeNoto, F.M., Ullrich, A., Pictet, R., Rutter, W.J., and Goodman, H.M. (1979) Cell 18, 533-543.
7. Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Rutter, W.J., and Goodman, H.M. (1977) Science 196, 1313-1319.
8. Winters, G., and Fields, S. (1980) Nucleic Acids Research 8, 1965.
9. Sanger, F., Nicklen, S., and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
10. Seeburg, P.H., Shine, J., Martial, J.A., Baxter, J.D., and Goodman, H.M. (1977) Nature 270, 486-494. 61-70.
11. Maxam, A., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
12. Maxam, A., and Gilbert, W. (1980) in Methods in Enzymology (L. Grossman and K. Moldave, eds.) Vol. 65, pp. 499-560, Academic Press, New York.
13. Samuels, H.H., Klein, D., Stanley, F., and Casanova, J. (1978) J. Biol. Chem. 253, 5895.
14. Hardies, S.C., and Wells, R.D. (1976) Proc. Natl. Acad. Sci. USA 73, 3117-3121.
15. Leder, P., Tiemeier, D., and Enquist, L. (1977) Science 196, 175-177.
16. Blattner, F.R., Blechl, A.E., Denniston-Thompson, K., Faber, H.E., Richards, J.E., Slightom, J.L., Tucker, P.W., and Smithies, O. (1978) Science 202, 1279-1284.
17. Southern, E.M. (1975) J. Mol. Biol. 98, 503.
18. Moore, D., manuscript in preparation.
19. Diamond, D.J., and Goodman, H.M., unpublished information.
20. Gronenborn, B., and Messing, J. (1978) Nature 272, 375-377.
21. Gannon, F., O'Hare, K.O., Perrin, F., Le Pennec, J.P., Benoist, C., Cochet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B., and Chambon, P. (1979) Nature 278, 428-434.
22. Goldberg, M. (1979) Ph.D. Thesis, Stanford University.
23. Yu, L.Y., Tushinski, R.J., and Bancroft, F.C. (1977) J. Biol. Chem. 252.
24. Weaver, R.F., and Weissman, C. (1979) Nucleic Acids Research 7, 1175-1193.
25. Wallis, M., and Davies, R.V., in Growth Hormone and Related Peptides (eds. Pecile, A., and Muller, E.E.) 1-14 (Elsevier, New York, 1976).
26. Gopinathan, K.P., Weymouth, L.A., Kunkel, T.A., and Loeb, L.A. (1979) Nature 278, 857.
27. Seif, I., Khoury, G., and Dhar, R. (1979) Nucleic Acids Research 6, 3387-3398.
28. Chien, Y-H., and Thompson, E.B. (1980) Proc. Natl. Acad. Sci. USA 77, 4583.
29. Potter, S., Truett, M., Phillips, M., and Maher, A. (1980) Cell 20, 639-647.
30. Goodman, H.M. et al., manuscript in preparation.
31. Ziff, E.B., and Evans, R.M. (1978) Cell 15, 1463-1475.
32. Baker, C.C., and Ziff, E.B. (1980) Cold Spring Harbor Symp. Quant. Biol. Vol. XLIV, 415-428.
33. Luse, D.S., and Roeder, R.G. (1980) Cell 20, 691-699.
34. Maurer, R.A., Gubbins, E.J., Erwin, C.R., and Donelson, J.E. (1980) J. Biol. Chem. 255, 2243-2246.