* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download The complete nucleotide sequence of the RNA coding for the
Survey
Document related concepts
Silencer (genetics) wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Point mutation wikipedia , lookup
Epitranscriptome wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Gene expression wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Protein structure prediction wikipedia , lookup
Proteolysis wikipedia , lookup
Homology modeling wikipedia , lookup
Biosynthesis wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Genomic library wikipedia , lookup
Biochemistry wikipedia , lookup
Polyadenylation wikipedia , lookup
Transcript
Volume 12 Number 5 1984 Nucleic Acids Research The complete nucteotide sequence of the RNA coding for the primary translation product of foot and mouth disease virus A.R.CarroU*, D.J.Rowlands* and B.E.Clarke* Animal Virus Research Institute, Ash Road, Pirbright, Woking, Surrey, UK Received 10 January 1984; Revised and Accepted 16 February 1984 ABSTRACT The complete nucelotide sequence of the coding region of foot and mouth disease virus RNA (strain A . Q 6 1 ) is presented. The sequence extends from the primary initiation site, approximately 1200 nucleotide from the 5' end of the genome, in an open translational reading frame of 6,999 nucleotides to a termination codon 93 nucleotides from the 3' terminal poly (A). Available amino acid sequence data correlates with that predicted from the nucleotide sequence. The amino acid sequence around cleavage sites in the polyproteln shows no consistency, although a number of the virus-coded protease cleavage sites are between glutamate and glycine residues. INTRODUCTION Foot and mouth disease (FMD) is a highly contagious viral disease of clovenhooved animals and is of paramount economic importance. The virus (genus: aphthovirus; family: Picornaviridae) has a single stranded positive sense RNA genome of approximately 8,500 nucleotides. The genome is 3' polyadenylated, can serve directly as a messenger and contains a poly (C) tract (100-200 nucleotides long) located approximately 400 nucleotides from the 5' end (1). Unlike most eukaryotlc mRNAs, FMDV RNA is not capped at its 51 end but has a small viral-coded protein (VPg) covalently attached to the terminal 5' uridine residue (2, 3 i The VPg protein is not required for translation and, by correlation with poliovirus is probably involved in replication (A, 5). The RNA of FMDV, like other picornaviruses, appears to have a long 5' untranslated leader sequence. Removal of the poly C tract of FMDV and all nucleotides to its 5' side has no effect on the spectrum of proteins generated by in vitro translation. This indicates that the site for initiation of translation is located to the 3' side of the poly (C) tract (6X The initial translational product of FMDV RNA would be a polyprotein of about 250,000 daltons. However, this protein Is not observed as it is nascently processed into the viral structural and non-structural proteins as shown in Figure L. We describe here the complete nucleotide sequence of the region coding for this poly protein from one strain of FMDV. © IRL Press Limited, Oxford, England. 2461 Nucleic Acids Research MATERIALS AND METHODS Recombinant Plasmids The construction of recombinant plasmids has been described previously (7). Briefly double stranded cDNA was synthesised using RNA from FMDV strain A . n 61 and inserted by G:C tailing into the Pst I site of pAT 153. Two of the resulting recombinanta (pFA76 and pFA206) contained inserts corresponding to =85% of the FMDV genome. Detailed restriction maps of these recombinants were generated by standard procedures. pFA206 was subcloned into two more manageable size clones pFA206a or pFA2O60. Initially pFA206 was digested with EcoRI, recirculari8ed using T4 DNA ligase and used to transform E. coli MC1061 by standard procedures (8, 9). This resulted in the clone pFA206a representing those sequences to the 3' side of the EcoRI site in pFA206 (see Figure IX pFA206B was generated by digestion of pFA206 with EcoRI and PstI, gel purification of the required band and ligation into EcoRI/PstI digested pAT 153. DNA sequencing (10). All DNA sequencing was carried out by the method of Maxam and Gilbert Recombinant DNA was sequenced using uniquely 51 or 3' 32P labelled restriction fragments as described previously (11). Primer extension sequencing 32 was carried out as described by Rowlands et al (12). Briefly, a 5' P-labelled primer (Cell Tech. Ltd., UK) was used to direct the synthesis of cONA using reverse transcriptase and viral RNA as a template. The resulting 5' labelled cDNA was size fractionated and sequenced by the method of Maxam and Gilbert (10). Analysis of Sequence Data DNA sequence was analysed using an Apple II microcomputer as described previously (13). RESULTS AND DISCUSSION Sequencing Strategy The construction of cONA clones from FMDV RNA (serotype A1Q 61) has been described elsewhere (7). These clones represented in total about 85% of the viral genome (Figure IX The nucleotide sequence of the region coding for the structural protein (VPs 1-4) has been previously reported (11). As shown In Figure 1 a large clone (pFA2O6) having a cONA Insert of 5.4Kb represented the major part of the genome coding for the non-structural proteins. To simplify sequence analysis this insert was sub-cloned into two halves (pFA206B and pFA2O6a) as described In methods. Detailed restriction analysis of these clones (see Figure 1) allowed a sequencing strategy to be carried out so that most regions of the genome were sequenced in both strands. Those regions which were not sequenced in both strands are clearly indicated in Figure 1 and these regions were determined from 2462 Nucleic Acids Research 0 FMDV RNA 1 2 3 SCALE Kb 4 5 7 6 8 VPg pdy(C) E potylA) ^ Primary Product! L p20a P1 P8S P3 plOO P2 PE2 Ctowagt p2(Wp16 VP4 VP2 VP3 VP1 p12 p34 VPgi p20b pFA76 XX ML I i pFA206 Saqutncing Stringy | f EQR i i i XTTPDHHPBVFP i i i I I i i 11 i ' i primtf FIGURE 1 Organisation of the FMDV genome. The positions of the gene products is based on previously published work (1). The naming of the polypeptides has been done using both existing names and the recommendations agreed at the 3rd European Study Group on moleculear biology of picornaviruses, Urbino, Italy, September, 1983. The new nomenclature is indicated above the corresponding polypeptides and the old nomenclature is indicated below. The alignment of two cDNA clones with respect to the viral genome la shown. The Eco R l site was used to sub-clone pFA206 into a and B as indicated. The sequence of 2820 nucleotides from the 5' end of the clone pFA76 has been reported previously (7X The sequence strategy is illustrated by the bars and arrows indicating the restriction sites used and the extent of the sequence obtained. Primer extension sequencing was used to obtain sequence to the 5' side of pFA76 and the location of the primer and sequence obtained is shown. The abbreviations used for restriction enzymes are: B, Bam HI; D, Hind III: E, Dde I; F, Hin FI; G, Bgl II; H, Hpa II; L, Bgl I; M, Sma I; P, Pst I; R, Eco RI; S, Sal I; T, Taq I; V, Pvu II; X, Xho I. several independent sequencing experiments. Previous mapping studies using defined T, oligonucleotide probes had indicated that the clone pFA206 did not include sequences extending to the poly (A) at the 3' end of the genome (7). The sequence data show however that this clone does in fact contain the whole of the 31 end of the genome including 33 bases of the poly (A) tract. None of the cDNA clones hybridised with T. oligonucleotides derived from the 5' end of the genome suggesting that this region was not represented in our clone banks. In order to obtain 5' end sequence beyond the region represented in the clones a primer extension method was employed. A synthetic oligonucleotide (Cell Tech Ltd., UK) whose sequence was derived from the 51 proximal region of pFA76 (see Figure 1) was used to prime cDNA transcripts which were sequenced as described in methods. Approximately 450 bases were sequenced using this primer. 2463 Nucleic Acids Research 35 £5- 3 >, is 35 & 35 §3 §3 85 m SI 35 Sfc OH =3tfl o c OO o i_ UH O o >, 83 35 Is §3 SI 83 O «} P3 §3 fe §n 55 35 s3 a£ mm 53 83 u h OO o L O. * aCcfl <>» QH Oo O 3 U Q. 3.3 33 3fc 35 o c IM 35 3.3 g35 °H < 1 35 35 3 o*e O-5! =i^ S? %Z 5C 5l= ^ 5 e> rnD u orD« Ua o t> •< * g£ 5i 83 33 Ux 1 3^_J OC OttT « 35 U C S 3 5^3 S3 C 55 55 O c> (J O •—OO ^4* O DO "33 Hi li ACA Thr §5 O >, s3 33 S3 35 83 35 OO OO *!3 O« 55 00 §5 ^£ S5 <n J< as 33 «5 00 u x no -Si O C 00 mun !S3 U 3 = -1 OO O °- S> 3 *Z 0 u 55 ^3 83 33 oct ^ 83 00 00 35 8" 83 OO «" oV SS3 SS5 5f3 o^f a& H i o> o3 Q3 y a. 30 33 35 l a isi n? t- ino §3 §1 S3 81 §3 $5 gf o a. o>- o^i =>>v Oo O g> u< ^l "g-3 ~B? as S5 Oi U 83 Ss 35 31. o c OO O l_ o • Or-J If 33 t^U gl o n B6 1 o « *~Q n o e ^ §5 3S Is IJ-H §5 33 O ^-1 84 o> 81 o£ II 35 35 2464 o> Si? 8P 335 3-**« a 5^ U 5 00 od: 3_ oV Dt- O-H Oo §3 00 31 si 35 15 £ O>, o-S ^ ^ oa o< O tl DC I3C?5 35 ye as- 35 O Q. OrH 35 §3 35 35 00 §£*• Or-, St O3^ 35 t oo 2 n O DO 84 O 3 •< 3 8i O C O C So §5 Si OjOo. 35 3£ 55 §3 SS3 000 oo JO. 3.3 CTCKO S3 13 33 II ^ f»5 >> f^O -^ y ^ i o L, on §ta. ye eg n o c o «» 3£ 33 35 33 §£ o Q. on Ju 84 gj 33 55 O C 83 if on oc 3S 33 §3 33 Nucleic Acids Research it 1*3 ps m £5 §3 ^3 §5 §3 ^ §3 83 15 33 i! 3S 83 *i-t (JO 85 a t o3 o3 <->o. U L . oo S-J o^ =>t- o£ <££ £$Z £8£ £S£ £33 3*P ^ 3 3 £j"<P T«F Jpoa "u CM <M °J "J gs a.2 85 a " 5E O I O 3 5 5 afc 3H 3S " o , o o 3.3 8t »3 s i 84 SfT 81 3a & 00 c £3hH if ^^3 f'T'Jj SS 83 3^ 8? 2 i & op« o-< z "33 SS^ ss| ^ 3 33 as 35 s-a 83 84 §3 I 33 ga aJi 35 Sa 8 t3o Si? i^ 33 1$ 85 "•3 SS si U'H <-> 3 On UC 3j Go •Kin *I-J SS SS n _> CD BS 83 :^i =3 CJ O E II ^ 5l 00. oc (_>M O1* ^-*s Ui ac ^n 2* 133 "i-H 00 C4,C *P t 3 CD U L, QI-. O O O3 O 0 5S 8-fe o^^nj -'^i u< Oo <P u^t £c3£ gap 3 4) 33 3 1) o a. IIL * ; *) (j 3 ^<—1 8fe §S hj ry < n 31 § 3 . 1 ! 83 8i= o n Or-4 =J5 * • * t_)ra UL. O t_ 3£ aji P3 < 3 3i 53 3^ at S3 3^ (_) C Q3 iS 3a 5S 3i O 3 C7^3O °§*> ^ o a ™=,« ^ y - as 3n O-3 35 jc •< aj 31— § Is 3S SS g3 SI S3 81 §3 §3 as 3S 83 3>3 •< L n a 3co ua. ^F- <-\ Q3 L) L. (J 3 a > %-< 00 no Cn 3« O_j r-.O J UQ. 3L OC 3D. as i Stv *<>*£ 3£ s ^ 35 < » 35 <->« ^^ ^ 13 i s s Ri^a a& as "as "^« is as 35 » SS B5 §1 S3 33 53 Ojc 4 ^ P 33i^ ^ SI 33 §4§5 §1 3<-t ^ - ^ <»H UX S5 35 13 33 Si <<|-i O£ U3 y a OH o 3 O"C ' ^ ^ff Sid 00 oi! 5*B| IDID 3u PO "5 « (J >. 3JT 8 3 ^ ^ UP 83 33 83 3i 35 8£ s pa,~ §3 3a 2e 1 ! 35 Ss S^ §S SS as§1 8£ S3 3i 8E 33 as33 §3 m 85 S3 as §3 13 SS S3 33 8S 3S as §1 SS 33 5_S 2465 Nucleic Acids Research 31 2466 5115 5130 5415 5160 5175 5190 5505 5520 CUC LRAJ GCC UAC AAJ GCC GCC ACC AGC G&J CCC UAC X U CCA CCA CCC CUU CUU GCC AAC CAC CCC CCU CAC ACA UUC AUC CUC CCC ACU CAC UCU CCA CCU CGC AAU CCA GUU CCA DAC Leu Phe A l l T y r Lys A l l A l l Thr Arg A l l Gly T y r Cys C l y C l y A l l V i l Leu Ala Lys Asp Gly A l l Aap Thr Phe I^e V i l Cly Thr H i s Ser A l l Cly C l y Asn C l y V i l C l y Tyr UGC UCA UGC CUll O K AGC UCC A X CUU 1 ) 8 AAC A X AAC C O ) 5 ^ GUC CAC CCU CAA5^C? CAC CAC CAC GGG5u8c AUU GUU GAU A C C ^ S GAU G X CAA G A Q 5 ^ CUC CAC GUC AUC5?^? Cys Ser Cys V i l Ser Arg Ser I € T Leu G i n Lys HE? Lys A l l H i s V a l top Pro C l u Pro H i s H i s C l u C l y Leu H e v i T top Thr Arg Asp V i l G l u G l u Arg V i l H i s V i l MET Arg AAA ACA AAG CUU ^ U CCC AC: GUI) CCA UAC CCC GUG UUC AACCCU GAC UUU CCC CCU CCC GCC UUG UCA AAC AAC CAC COG CCC C X AAC CAC GGA CUU CUU Cut GAC G AC GUC AUU UUC Lys T h r Lys L . o A l a Pro Thr V i l A l l T y r Gly V i l Phe ton P r o ' C l u Phe Gly Pro A l l A l l Leu Ser Asn Lys top Pro Arg Leu Asn G l u C l y V i l V i l Leu top top V i l H e Phe U K AAA CAC AAA S i CAC CCA AAC AUC At? CAA CAC GAC AAA5Gffi CUC UUC CCC C C C l S ? CCC CCU CAC UAC5Gffi UCA OGC CUC CAC 5 !?? GUC C X CCU A O C 5 ^ AAU GCC CCA UUG 5 *!? S«r Lys H i s Lys C l y top A l a Lys MET T h r G l u G l u top Lys A l l Leu Phe Arg Arg Cys Ala Ala top T y r A l l Ser Arg Leu H i s Ser v i l Leu Gly Thr Ala ton A l l Pro Leu S e r 5895 5910 5925 5910 5955 5970 5985 6QOQ AUC UAC GAG GCA AOC AAC GGC GUU GAU GGA CUC GAC GCC A X GAC CCG CAC ACU GCA CCU GCC CUC CCC UGG CCC QJC CAG GCA AAA CGC OGC CCU GCG CUC AUC GAC UUC GAG AAC CCC l i e T y r G l u A l l H e Lys G l y V a l top C l y Leu top A l a HET G l u Pro top Thr Ala Pro C l y Leu Pro T r p A l l Leu G i n Gly Lys Arg Arg G l y A l l Leu H e top Phe G l u ton Gly 6015 6030 6015 6060 6075 6090 6105 6120 ACC GUC GCA CCC CAA GUU GAC GCU GCC UUG AAC CUC A X GAG AAA AGA GAA UAC AAC UUU GCU X U CAC ACC UUC C X AAC CAC CAG AUU CGA CCC A X GAG AAA CUA CCC GCC GGC AAG Thr V i l Gly Pro C l u V a l G l u Ala Ala Leu Lys Leu HET G l u Lys Arg G l u T y r Lys Phe Ala Cys G i n Thr Phe Leu Lys top G l u l i e Arg Pro HET G l u Lys V i l Arg A l a C l y Lys 6135 6150 6165 6180 6195 6210 6225 6240 ACU OGC AUC CUC CAD GUU U X CCU GUU GAA CAC AUU CUU UAC ACC AAG A X A X AUU GGC AGC UUC X U GCA CAA A X CAC UCA AAC AAC GGA CCA CAA AUU CCC UQJ GOG GUC GCU UGC Thr Arg l i e V a l top V a l Leu Pro V i l G l u H i s H e Leu T y r Thr Lys HET HET H e Gly Arg Phe Cys Ala G i n HET H i s Ser ton Asn Gly Pro C l n H e Gly Ser A l l V a l Gly Cys 6255 6270 6285 6300 6315 6330 6315 6360 AAC CCU GAC GUU (jAD UGC CAA AGA UUU 5CC ACA CAU UUC G C C C A l UAC AGA AAC G X X G GAU G X GAC UAC OCC GCC UUU GAU CCA AAC CAC X C ACU GAC CCU A X AAC AUC A X WJU Asn Pro top V a l top T r p G i n Arg Phe C l y Thr H i s Phe Ala G i n T y r Arg Asn V a l T r p top V a l top T y r Ser Ala Phe top A l l Asn H i s Cys Ser top Ala HET Asn H e HET Phe 6375 6390 64Q5 6420 6435 6450 6165 6480 CAG GAG GUG UUC CGC ACA GAC UUU GCC UUC CAC CCA AAU GCU GlG X G AUC C X AAA ACC CUC GUG AAC ACG GAA CAC GCC UAU GAG AAC AAG CCC AUC ACU GUU GAA CCC CGC A X CCA G l u C l u V a l Phe Arg Thr Asp Phe G l y Phe H i s Pro Asn A l a G l u T r p H e Leu Lys Thr Leu V a l Asn Thr G l u H i s A l l Tyr G l u Asn Lys Arg H e Thr V a l G l u G l y C l y HET Pro 6495 6510 6525 6510 6S55 6570 6585 66O0 UCU GCU UGU UCC GCA ACA AGC AUC AUU AAC ACA AUU U X AAC AAC AUC UAC GUG CUC UAC GCU U X CGU AGA CAC UAU GAG GGA GUU GAG CUG CAC ACU UAC ACt A X AUC UCC UAC GGA Ser C l y Cys Ser A l l Thr Ser H e H e Asn Thr H e Leu Asn Asn H e T y r V i l Leu T y r A l l Leu Arg Arg H i s T y r G l u Gly V a l G l u Leu top Thr T y r Thr HET H e Ser T y r Gly 6615 6630 6645 6660 6675 6690 6705 6720 GAC GAC AUC G X GUG GCA ACU GAU UAC CAU C X CAC UUU GJC CCU CUC AAC CCC CAC UUC AAA UCU CUU GCC CAA ACC AUC ACU CCA GCU GAC AAA ACC GAC AAA GCU UUU CUU CUU CCU top top H e V i l V a l A l a Ser top T y r top Leu top Pht C l u Ala Leu Lys Pro H i s Phe Lys Ser Leu Gly Gin Thr H e Thr Pro Ala Asp Lys Ser Asp Lys Gly Phe V i l Leu C l y 6735 6750 6765 6780 6795 6810 6825 6840 CAC UCC AUU ACC GAU GUC ACU UUC CUC AAA AGA CAC UUC CAC A X GAU UAU GCA ACU CGC UUU UAC AAA CCU G X A X GCC UCA AAG ACC CUU G C GCU AUC CUC UCC UUU CCA CGC CGU C l n Ser l i e T h r top V i l Thr Phe Leu Lys Arg H i s Phe H i s HET top T y r G l y Thr C l y Phe T y r Lys Pro V a l HET A l l Ser Lys Thr Leu G l u A l a H e Leu Ser Phe Ala Arg Arg 6955 6870 6885 6900 6915 6930 6945 6960 GGC ACC AUA CAC GAG AAG U X AUC UCC C X GCA GGA CUC GCC GUC CAC UCU GCA CCA GAC GAG UAC COG OGU CUC UUU GAC CCC UUC CAC GGC CUC UUU CAG AUC CCA AGC UAC AGA OCA Cly Thr l i e G i n G l u Lys Leu H e Ser V a l A l l C l y Leu Ale V a l H i s Ser Cly Pro top G l u T y r Arg Arg Leu Phe G l u Pro Phe Gin Gly Leu Phe G l u H e Pro Ser T y r Arg Ser 6975 6990 7005 7020 7035 7050 7065 7080 CUU UAC C X OGU CGU OCC OGC GUG AAC GCC G X OGC GU GAC GCA UAA UCC CCC AGA GAU GAC AAU X C CAC UUC GCC UCU GGG GCC OGC GAC GCC GUA GGA G X AAA K C UCC AAA GAG CUU DOC OGC CGU Leu T y r Leu Arg T r p V a l Asn Ala V a l Cys Gly top Ala , , , C CCGCUUCCUCAAUOCAAAAAAAAAAAA FIGURE 2 The complete nucleotide sequence of the coding region of FMDV type A,.. 61 and the corresponding amino acid sequence from the major open reading frame. Putative cleavage sites for some of the viral proteins are indicated and those for which there is supportive amino acid sequence data are indicated with a superfix referring to the following references: 1) 11, and references therein; 2) 2, 2L 3) 14. </> ~" Nucleic Acids Research Nucleotide Sequence The nucleotide sequence determined from the recombinant clones and by primer extension sequencing is shown in Figure 2. The sequence includes an open translational reading frame of 2,333 codons terminating at a UAA codon 93 bases from the 3' poly (A) sequence. Two lines of evidence suggest that this represents the sequence coding for the FMDV polyprotein. Firstly, no other reading frame of significant length is found in the sequence. Secondly, protein sequence data is available for several viral proteins (11 and refs. therein, 14) and these sequences correlate with the major open reading frame. Further sequence towards the 51 end from the major open reading frame has termination codone in all three reading frames (15). Examination of our sequence gels allows ua to estimate the poly (C) tract to be at least 500 nucleotides from the initiation site for translation. This would mean that FMDV RNA has a 5' untranslated region in excess of 1,000 nucleotides long and a total genome length of at least 8,100 nucleotides. The nucleotide composition and dlnucleotide frequency of the coding region are consistent with the data previously obtained for the region of the genome coding for the structural proteins (11). In particular the base composition shows a bias In favour of C+G over A+U. This is also reflected in chemical analysis of the RNA (16). The relatively high frequency of CpG is also maintained throughout the genome being 83% of the expected frequency compared to 37% for eukaryotes in general (17) and 47% for poliovirus (18). Predicted Amino Acid Sequence The processing of picornaviral proteins is a complex process and appears highly variable between different members of the family. However, the general genomic organisation and the primary cleavage products derived from the polyprotein are similar, (1, 19) with one important exception. The region to the 5' side of the genome coding for the structural genes in FMDV and encephalomyocarditis virus code for additional proteins which have no identifiable counterparts in poliovirus. It has been shown that two proteins are coded for in this part of the genome in FMDV; namely p20a (Lab) and pl6 (Lb) (for polypeptide nomenclature see Figure 1). Analysis of these proteins has shown them to have similar tryptic peptide maps and limited proteolysls patterns (6, 15). Both of these proteins can be labelled in vitro using N-formyl ( -S) methionine t RNAf" 181 (6) and studies involving a protease inhibitor suggest that the C-terrnini of these proteins are the same (20). These data suggest that there are two initiation sites in FMDV RNA. Examination of the sequence data presented here shows the presence of a second AUG codon located at nucleotide position 85 and in the same reading 2468 Nucleic Acids Research frame as the initial AUG at position 1 (see Figure 2). Initiation at these two sites would result in two similar proteins with a molecular weight difference of 3,800, i.e. the difference in molecular weight between p20a (Lab) and pl6 (Lb). Beck et al (21) have also recently reported the presence of these two putative initiation sites in two other serotypes of the virua. Although no protein sequence data are available to confirm the predicted sequence of the three VPg proteins, their charge, amino acid composition and tryptic peptide composition (2) are entirely consistent with the sequences presented here. These data also agree with those of Forss and Schaller (22) on the predicted sequence of the VPgs from FMDV serotypes O and C. Although in general FMDV and polio show a high degree of similarity in genome organisation the VPg genes are an example where they differ significantly. Polio has a single VPg gene whereas FMDV has three, which, moreover are highly conserved in three serotypes (22). All three VPgs from FMDV are equally represented in RNA extracted from virus particles (2). The poliovirus VPg has been found in a precursor molecule of molecular weight 12,000 of which the VPg protein itself is contained in the C-terminal portion. The non-VPg sequences In this precursor contain a hydrophobic region of 22 amino acids situated 7 amino acids to the N-terminal side of VPg. This hydrophobic area is thought to be responsible for the association of the VPg precursor with cellular membranes, since the VPg precursor is found in membrane bound complexes (23). Comparison of hydrophilicity plots representing the VPg region from FMDV and polio show no hydrophobic region in FMDV comparable to that in polio. The functional precursor, (if any), containing the FMDV VPgs is at present unknown. Comparison of our sequence with the recently published sequence of the FMDV A._ polymera8e (p56a; 3d) (14) show them to be very similar. There are 16 amino acid changes (out of 470 total) which are distributed throughout the protein. Cleavage sites The proteolytic cleavage sites involved in the processing of the poliovirus polyprotein into the final products show a high degree of uniformity not seen with FMDV. In polio eight of the identified cleavage sites are between gln-gly pairs; of the remaining cleavages one is between tySgly and one is between asn-ser (24). The latter cleavage is between VP4 and VP2 and occurs during the final stages of virus maturation. With FMDV, on the other hand, little homology is apparent in the sequences around the known cleavage sites. The possible cleavage sites involved in processing the polyprotein of FMDV are indicated in Figure 2. Those sites for which some amino acid sequence data are available are also indicated. Both host 2469 Nucleic Acids Research and viral specified proteases are involved in the processing of FMDV proteins (20). The cleavages thought to be caused by a viral enzyme (secondary cleavages) indicate a preference for glu-gly bonds, for example between VP2 and VP3, at the N termini of the VPgs and between p2Ob (3c) and p56a (3d, polymerase). Of the twelve putative cleavage sites (both primary and secondary) five occur between glutamate and glycine. The putative host specified cleavage site (primary cleavage) between p20a/pl6 and p88 (L and PI) has been revised from previous reports (7). This cleavage is now proposed to occur after amino acid 204 between gly and gin on the basis of homology between FMDV and EMC VP4 proteins (A. Palmenberg, personal communication), and our recent observation that FMDV VP4 contains proline (15). With the revised cleavage site VP4 contains a single proline at amino acid position 208 (see Figure 2\ The N-terminus of VP4 is known to be refractory to Edman degradation in all picornaviruses which have been examined (11, 19, 24, 25) and with FMDV and EMC this blockage may be due to deamination and cyclisation of a glutamine. CONCLUDING REMARKS At present we do not have clones which span the poly (C) tract and so it has not been possible to analyse the sequence of the 5' non coding region of the FMDV genome. The reason for the lack of 51 end clones is not clear. It could be simply due to the low frequency of cDNA transcripts which extend to the extreme 51 end, or alternatively the unusual structure of the poly (C) tract may be unstable in E. coli and eliminated during plasmid replication. Direct RNA sequencing of the, extreme 5' end of the genome has been reported (26) and shows that considerable homology exists between the first 27 nucleotides from all seven serotypes of FMDV. This suggests that this region has a critical role in virus replication. Sequence data from the 3' untranslated region of FMDV have also been reported (14, 27). Porter et al (27) analysed the nucleotide sequence adjacent to the poly (A) for five serotypes and found =75% homology in the first 20 nucleotides and a highly conserved sequence of eleven nucleotides at position -7 to -17 from the poly (A) tract. One of these serotypes was identical to the virus used in this study and the sequence of 42 nucleotides reported agrees exactly with that reported here. Recently the 3' sequence of a second type A virus (A._) has been reported (14) and maintains the sequence of the eleven highly conserved nucleotides, however the remainder of the sequence up to the poly (A) is considerably different and includes three additional nucleotides. In the A._ virus there is a single UAA termination codon located 96 nucleotides from the poly (A) 2470 Nucleic Acids Research tract in contrast to the single UAA termination codon located 93 nucleotides from the poly (A) reported here from the A._ virus. A knowledge of the coding sequence of a virus is a necessary pre-requisite for the full understanding of the functions of the encoded proteins in virus structure and replication. Moreover, comparisons of the complete sequences of .different viruses from within the picornavirus family will indicate evolutionary relationships between the virus groups. Finally the sequence data is essential for experiments involving site specific mutagenesis of an infectious clone of virus RNA (28) in order to modify specific regions of the genome. ACKNOWLEDGEMENTS We would like to thank Dr A Makoff for computer analysis and Dr F Brown for helpful discussion and critical reading of the manuscript. We would like to thank Dr M Eaton (Cell Tech Ltd, UK) for kindly providing the synthetic oligonucleotide primer. •Present address: WeUcome Biotechnology Limited, Ash Road, Pirbright, Woking, Surrey, UK REFERENCES L Sangar DV. (1979) J. Gen. Virol. 45, 1-13. 2. King AMQ, Sangar DV, Harris TJR and Brown F. (1980) J. Virol. 34, 627-634. 3. Sangar DV, Rowlands DJ, Harris TJR and Brown F. (1977) Nature 268, 64865a 4. Flanegan JB, Petersson RF, Ambros V, Hewlett MJ and Baltimore D. (1977) PNAS USA 74, 961-965. 5. Nomoto A, Detjen B, Pozzatti R and Wimmer E. (1977) Nature 268, 208-213. 6. Sangar DV, Black DN, Rowlands DJ, Harris TJR and Brown F. (1980) J. Virol. 33, 59-68. 7. Boothroyd JC, Highfleld PE, Cross GAM, Rowlands DJ, Lowe PA, Brown F and Harris TJR. (1981) Nature 290, 800-802. 8. Wensink PC, Finnegan DJ, Donelson JE and Hogness DS. (1974) Cell 3, 315325. 9. Woods DE, Crampton JM, Clarke BE and Williamson R. (1980) Nucleic Acids Research 8, 5157-5168. 10. Maxam A and Gilbert W. (1980) in Methods in Enzymology VoL 65, pp499560, Grossman L and Moldave K (Eds), Academic Press, N.Y. 11. Boothroyd JC, Harris TJR, Rowlands DJ and Lowe PA. (1982) Gene 17, 153161. 12. Rowlands DJ, Clarke BE, Carroll AR, Brown F, Nicholson BH, Bittle JL, Houghten RA and Lerner RA. (1983) Nature 306, 694-697. 13. Makoff AJ, Paynter CA, Rowlands DJ and Boothroyd JC. (1982) Nucleic Acids Research 10, 8285-8295. 14. Robertson BH, Morgan DO, Moore DM, Grubman MJ, Card J, Fischer T, Weddell G, Dowbenko D and Yansura D. (1983) Virology 126, 614-623. 15. Clarke BE (1984) Manuscript in preparation. 16. Newman JFE, Rowlands DJ and Brown F. (1973) J. Gen. ViroL 18, 171-180. 17. Nussinov R. (1981) J. Mol. BioL 149, 125-13L 18. RacanieUo VR and Baltimore D. (1981) PNAS USA 78, 4887-489L 19. Rueckert RR. (1976) in Comprehensive Virology, Fraenkel-Conrat H and 2471 Nucleic Acids Research 20. 21. 22. 23. 24. 25. 26. 27. 28. 2472 Wagner RR (Eds) Vol. 6 ppl31-213, Plenum Press, New York. Burroughs JN, Sangar DV, Clarke BE and Rowlands DJ. (1984) J. ViroL In press. Beck E, Forsa S, Strebel K, Cattaneo R and Fell G. (1983) Nucleic Acids Research 11, 7873-7885. Forss S and Schaller H. (1982) Nucleic Acids Research 10, 6441-6450. Semler BL, Anderson CW, Kitamura N, Rotherberg PG, Wishart WL and Wimmer E. (1981) PNAS USA 78, 3463-3468. Kitamura N, Semler BL, Rothberg PG, Larsen GR, Adler CJ, Dorner AJ, Emini EA, Hanecak R, Lee JJ, van der Werf S, Anderson CW and Wimmer E. (1981) Nature 291, 547-553. Scraba DG. (1979) in The Molecular Biology of Picornaviruses, PerezBercoff R. (Ed) ppl-23, Plenum Press, New York. Harris TJR. (1980) J. Virol. 36, 659-664. Porter AG, Fellner P, Black DN, Rowlands DJ, Harris TJR and Brown F. (1978) Nature 276, 298-301. Racaniello VR and Baltimore D. (1981) Science 214, 916-919.