* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Nucleotide sequence and genome organization of foot-and
RNA interference wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
Metagenomics wikipedia , lookup
Nucleic acid tertiary structure wikipedia , lookup
Protein moonlighting wikipedia , lookup
Genomic library wikipedia , lookup
Expanded genetic code wikipedia , lookup
RNA silencing wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Human genome wikipedia , lookup
Genome evolution wikipedia , lookup
Primary transcript wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Point mutation wikipedia , lookup
Epitranscriptome wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Non-coding RNA wikipedia , lookup
History of RNA biology wikipedia , lookup
Polyadenylation wikipedia , lookup
Genome editing wikipedia , lookup
Genetic code wikipedia , lookup
Helitron (biology) wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
volume 12 Number 16 1984 Nucleic Acids Research Nucleotide sequence and genome organization of foot-and-mouth disease virus S.Forss, K.Strebel, E.Beck and H.Schaller Microbiology, University of Heidelberg, Im Neuenheimer Feld 230, D-6900 Heidelberg, FRG Received 21 May 1984; Revised and Accepted 24 July 1984 ABSTRACT A continuous 7802 nucleotide sequence spanning the 94* of foot and mouth disease virus RNA between the 5 -proximal poly(C) tract and the 3'-terminat poty(A) was obtained from cloned cpNA, and the total size of the RNA genome was corrected to 8450 nucteotides. A long open reading frame was identified within this sequence starting about 1300 bases from the 5 end of the RNA genome and extending to a termination codon 92 bases from its polyadenylated 3 end. The protein sequence of 2332 ammo acids deduced from this coding sequence was correlated with the 260 K FMDV polyprotein. Its processing sites and twelve mature viral proteins were interred from protein data, available for some proteins, a predicted cleavage specificity of an FMDV encoded protease for Gtu / GlyCThr, Ser) linkages, and homologies to related proteins from poliovirus. In addition, a short unlinked reading frame of 92 codons has been identified by sequence homology to the polyprotein initiation signal and by in vitro translation studies. INTRODUCTION Foot and mouth disease viruses which are the (FMDV) or aphthoviruses causative agent of an aggressive are picornaviruses and economically important disease of cloven-footed farm animals. Their virion contains a single-stranded RNA genome of about 8 kb with a small protein (VPg) covalently attached to its 5' end, an internal poly(C) tract, and a poly(A) sequence at the 3' end. This RNA is of positive polarity Protein sythesis involves and can act directly post-translational cleavage as a messenger RNA. of a 260 K potyprotein which is encoded between a single major translation initiation site next to the poly(C) tract and the 3' end of the RNA. This mode of protein synthesis is common to all picornaviruses (1) and makes them interesting model systems for studying the mechanisms of translation initiation and of protein maturation by specific proteolytic cleavages (2). To obtain more information about the FMDV genome, its control signals and its gene products we have cloned cDNA copies of the viral RNA from strain O-jK and determined their nucleotide sequence. Using a set of overlapping clones we obtained a continuous sequence of 7915 nucleotides representing the 3' proximal long L segment of the FMDV RNA which is thought to contain all its c o d ing information. In a previous publication we had already determined the initia- © IRL Press Limited, Oxford, England. 6587 Nucleic Acids Research lion sites for polyprotein synthesis and a long open reading frame encoding the structural proteins of FMDV (3). The present work extends this reading frame by the coding sequence translation product of for the non-structural proteins, 2332 ammo acids which could predicting a single be correlated in many parts with known protein data trom FMDV induced proteins. In addition, we report and discuss a sequence of 713 bases preceding the polyprotein gene to the poly(C) tract. MATERIALS AND METHODS Enzymes Restriction endonucleases were isolated and purified by standard procedures or purchased from Boeringer Mannheim or Biolabs. Avian Myeloblastosis reverse transcriptase was a gift from W. Keller, Heidelberg. Aspergillus oryzae, calf Virus Nuclease S1 from intestinal phosphatase, and T4 polynucleotide kinase were from Boehnnger Mannheim, and terminal deoxynucleotidyl transferase was [ a / - 3 2 P ] - A T P was prepared according to (4). from BRL. FMDV RNA FMDV O.|K RNA, isolated after 7 and 64 passages in BHK-cells of the same field-isolate of the virus (Kaufbeuren 1966-67) was provided by K. Strohmaier and W. Keller. The virus was plaque-purified at the beginning of the passages and again after passage 16. FMDV cDNA clones The clones FMDV-2735 and -2615 were constructed essentially as described by (3) using single-stranded restriction fragments (mapping at pos. 3662-3752 and 3792-3882, cf. Figure 2) trom the existing cDNA clone FMDV-715 as primers for cDNA-synthesis. The double-stranded cDNA was provided with dC- tails and annealed to dG-tailed pBR322, linearized at the Pstl site. tion of clones FMDV-3214a and -3214c was carried out using Construca modified procedure that avoids second strand synthesis and G/C tailing (cf. Figure 1): cDNA synthesis 692-743) from was primed cDNA clone with an Aval/Hindlll FMDV2735 which potyacrytamide strand separation get (8). had restriction fragment been isolated (Pos. from a 6* Reverse transcription was followed by RNase treatment (pane. RNase 20ug/ml, 30 min., 37°C). Oligo-dA tails of 70-80 nucleotides were added to Deoxynucleotidyl Transferase. the 3'-ends of the cDNA using Terminal This ohgo-dA tailed cDNA was annealed to the DNA fragment complementary to the primer fragment. By this a partially d o u - ble stranded molecule was obtained having a single stranded oligo-dA tail at its 3'-side and a double stranded portion at its 5'-side reconstituting the origi- 6588 Nucleic Acids Research nal Hindlll site ot FMDV pos. 743. This DNA molecule was then annealed to the vector pUC9 (6), that had been cleaved by Pstl, dT-tailed, Hindlll cleaved and puntied by agorose gel electrophoresis. The DNA mixture was ligated and transformed into competent E.coli C600 cells. Clones were screened for FMDV inserts by colony hybridization with [ 32 P]-labelled FMDV RNA as described elsewhere (7). Clones FMDV-2735 and -2615 were derived from RNA isolated from low passaged FMDV (7 passages), all other clones were derived from a virus-isolate after 64 passages in BHK cells. Nucleotide sequence analysis Restriction fragments were endlabelted and chemically degraded by the basespecific cleavage methods according to (8). 5' endlabelling was used throughout except for the 3' end of the genome where 3' endlabelling was employed at a Hpall site located 68 nucleotides in front of the poly(A) tail. Thin sequencing gels (0,04 cm), either 40 or 100 cm long, were dried prior to printing, according to (9) to improve resolution of the bands in the sequencing ladders. Computer analyses The derived nucleotide sequences were entered into a data base, where the information was stored and processed using the computer programs of (10). The alignment program from Kriiger and Osterburg (unpublished) was based on the algorithm from (11). The homology comparison in Figure 4b was based on such computer-derived alignments scoring "similar ammo acids" as one third of identical ammo acids. RESULTS AND DISCUSSION Cloning of FMDV cDNA A set of cDNA clones from FMDV strain OiK (FMDV-144, -512, -703, -715, -1034, -1448) has been described which cover the two thirds of the FMDV genome from the VP3 coding region to the 3' end (5, cf. Figure 2). To obtain cloned cDNA copies upstream of the VP3 gene single-stranded restriction fragments from the existing cDNA clones (indicated by I and II in Figure 2) were used in two steps (7, 12) to prime cDNA synthesis close to the missing parts of the FMDV genome (see Methods and Figure 1). As a result the cloned part of the FMDV genome was extended into the 3'-end of the poly(C) tract. Nucleotide sequence analysis In Figure 2 exact map positions of the cloned cDNA inserts used for nucleotide sequence analysis and the sequencing strategy are depicted. The methods of (8) were followed for 5' [ 3 2P]-endlabelling and subsequent partial chemical 6589 Nucleic Acids Research RNA 51 cDNA 3' __ ~ Hindu _=_*' •™ *** TTTTT Transformation — PUC9 Fig.1: S t r a t e g y tor cloning 5'-terminal sequences of FMDV O i K . The minus s t r a n d of an Aval/Hmdlll fragment ( p o s . 6 9 2 - 7 4 3 , hatched b o x ) from cDNA c l o n e pFMDV2735 was used t o s p e c i f i c a l l y prime cDNA synthesis on FMDV RNA (step I). The RNA template was h y d r o l y z e d using p a n e . RNase, o l i g o - d A tails were a d d e d to the 3 ' - e n d s of the cDNA and the cDNA was annealed to the plus s t r a n d of the Aval/Hmdlll fragment ( e m p t y b o x ) , complementary to the primer fragment (step II). The partially double stranded and d A - tailed cDNA was annealed to pUC9 v e c t o r DNA (6) that had been Pstl c l e a v e d , d T - t a i l e d and Hmdlll c l e a v e d in this order (step III) and then t r a n s f e r r e d into competent E.coli C600. degradation of r e s t r i c t i o n fragments. data most In order to obtain unambiguous regions were sequenced in several sequence independent r u n s , and the entire sequence was analyzed on both DNA strands (Figure 2 ) . Using o v e r l a p s of at least hundred a continuous sequence was g e n - erated over poly(C) n u c l e o t i d e s between s u b c t o n e s , the 7915 c l o n e d tract pFMDV-3214c and nucleotides ending in 102 and p F M D V - 5 1 2 , starting with A residues respectively 11 C residues from as (Figure determined 3). All in the clone cleavage sites p i e d i c t e d in this sequence for the restriction enzymes used during the sequence analysis 5026. were a l s o o b s e r v e d experimentally, This site overlaps two Mbol sites e x c e p t for a Clal site at which are known to position be targets for adenosine m e t h y l a t i o n in E. coli (13). Minor h e t e r o l o g i e s in the sequence different c l o n e s (Figure 3 , 14). high probability in populations because of the intrinsically were o b s e r v e d in o v e r l a p p i n g regions from Such point mutations are known to occur of FMDV RNA (15) and in other viral with RNAs imprecise mechanism of RNA replication (16). variation was p a r t i c u l a r l y obvious when cDNA from moderately saged virus (7 and 64 passages in 6HK c e l l s , r e s p e c t i v e l y ) was c o m p a r e d . substitutions were f o u n d in 550 bases o v e r l a p p i n g the c l o n e s This and highly p a s No FMDV-2735 and - 2 6 1 5 from the same l o w passaged virus (14). The n u c l e o t i d e The FMDV sequence genome is divided by the p o l y ( C ) size and f u n c t i o n . The small (S) segment tides (17)) is p r o b a b l y involved only tract into t w o parts of to the 5 ' side ( a p p r o x . different 400 n u c l e o - in initiation of viral RNA r e p l i c a t i o n , and the large (L) segment 3 ' of the p o l y ( C ) contains a l l the protein coding i n f o r m a - 6590 ryib'f"i ** • p». 'I »" ' m I \ | P12 n* ; 'viv'W i; ;'—;B 't' " t y r t i 1 i " i ' i ' i i f t i / f t ^ ' i ^ _ ., - I m | »ni | raoi I'T #N»tfU{'.itf"'' ' 4 ^ 1 , i , . ™-j Fig.2: Physical map ot the FMDV genome, FMDV cDNA clones and strategy for the nucleotide sequence analysis. The top part shows the physical and the gene map of the FMDV RNA. Positions of the restriction sites of endonucleases used for 5' end-labeling are indicated on the second line. Below that, the restriction fragments used to prime cDNA synthesis (I and II) are illustrated as open arrows. Horizontal arrows show direction and extent ot individual sequencing runs, grouped according to the clones they originate from. Dashed lines represent portions of clones or sequencing runs from which no information was obtained. The analysis of the clones FMDV-1034 and FMDV-144 has already been reported (26) and only the extent ot the sequenced areas are indicated here. The restriction enzymes are represented by the letters: A, Aval; B, BamHi; C, Clal; E, Haelll; F, Hind; H, Hhal; I, Hphl; N, Hindlli; P, Hpali; R, EcoRl; T, Taql; U, Pvuli; X, Xhol; Xb, Xbal. CD o' 3) a> O) in CD 0) a Nucleic Acids Research tion (18). The sequence ot 7802 nonhomopolymeric nucleotides shown in Figure 3, represents the complete primary structure ot the L segment. Assuming additional 150 bases tor poly(C) and 400 tor the S segment this corresponds to 94 % of the FMDV genome indicating that the size of the FMDV genome is about 8500 nucleotides, i.e. 500 nucleotides longer than estimated from sizinggets by (17). This size correction is also supported by the sizing and the partial sequence analysis of cDNA copies that cover the as yet uncloned 5'-terminat part of the FMDV genome, i.e. the S segment and the poly(C) tract (7). So tar, all attempts to clone this missing part of the viral RNA have been unsuccessful. These difficulties seem to be related to the internal poly(C) tract, which, although readily copied into cDNA (7), is probably highly unstable in E. coli, as shown for other (GtC)-homopolymers longer than 30 basepairs (19). In accordance with this notion we have only been able to clone 11 C residues from the 3' proximal part of the poly(C) tract which still contains interdispersed non-C nucleotides. The sequence obtained ...GCT(C)i 1AAG... was confirmed by direct sequencing of the cDNA extending into the poly(C) tract (7). It is different, but shows similarities to the sequence ..T(C)2AUUCCAAG... determined for the 3' end ot a T1 oligo nucleotide containing the poly(C) tract from FMDV O-V1 and A61 (17). Coding regions and translation initiation sites From the kinetics of the appearance of FMDV specific gene products it has been predicted that FMDV RNA contains a single long open translational reading frame for a 260 K polyprotein, which in turn is a precursor for all gene products. Our sequence reveals such a reading frame of 7035 nucleotides (pos. 766 to 7800) with a first possible initiaton codon for the polyprotein at position 805 (see Figure 3). The coding region is preceded by 724 nucleotides of known sequence and by some additional 550 nucleotides of yet uncloned RNA comprising the poly(C) tract and the S fragment. It is followed by several atop codons in all three reading frames leaving 92 nucleotides untranslated in front of the poly(A) tail. Usually the 5' proximal AUG is used to initiate translation in an eucaryotic mRNA, which suggests that the ribosomes first recognize the capped 5' end ot the RNA and then traverse downstream until an AUG is encountered (20). Clearly this model is not applicable to FMDV since translation of the major primary gene product starts at position 805 and to a lesser extent also at position 889 (3), which are the ninth and tenth AUGs downstream trom the poly(C) (see Figure 3). These two start sites differ from other AUG codons in the sequence in that they are preceded at a short dis- 6592 Nucleic Acids Research 92 fCTnAACTTTTACCCTCCTTCCCGACGTAAAASCCAGGTAACCACAACCITCAAACCGTCCCCCCCCACCTAA^ ?0« 31S CGGA^GAAACCACAAGACTTACCTTCCCTCCCAAGTAAAACGACAAACACACACAGTTTIGCCCGTTTT^^ TTGTACAAACACCATCTATGCAGCTTTCCOa^ACTGACACAAACCGTGCAACTTCAAACTCCGCCIGCTCT^^ «<0 GATCCACTAGCGAGTCTTAGTAGCGGTACTGCTGTCTCGTAGCGGAGCAJGTTGGCCGTGGGAACACCTCT^ See •I t P2Q&B0& CATGTGTGCAAC<SX>CCACGGCAGCTTTACTCTCAAACCCACTTCAAGGIGACATTGATACTGCTACTCAA C«1ACACTCGGGATCTGAGAACCGGACTGGGACTTCTTTAAAGTGCCCAGTTTAAAAA&CITCTACCCCTGA^^ ATG AAT ACA ACT & * C TCT TTT ATC GCT TTG G T A CAG CCT A T C AGA GAG AIT AAA GCA CTT TTT CTA TCA CGC ACC A C A GGG AAA I ATG GAA P20fi "j.V"J!;'_'J!'_i"i".tri"i*'j_""-*'j_i;i*'jJ!.'.t;ii';_'i'_L."--i"ifi>_'-i"_i'i-if_'i'-r_'_!'j'.M\r:7Ol" " P I O <u C G&h TT C CAA CTG CAT GAG GGT G&h CCA CCT 12SS TTT CCG TGT GTC ACC TCC AAC CCG TGG 13*S GTC CCC TAC CAT CAA GAA CCA CTC AAC GGG CAA Prv Lfu Asn VaI HIS )i5I5 ie<» P'o Tyr Aap Cl u Gin CCC TCC CAG AAC CAA TAC CCT GCG CTC ATT ATC CAC GAT TCG AAA GCC Gly Glu TCT CCClAAT GTG Trp TGG GAG AAC GAC TTC AAG GTT * AI Ly* ATC TAC CAA Lya AAG CAC CCC CCC Val TTG CAC AAA Arg Lya Ltu ACT GGC ACC ATA ATA AAC AAC TAC CTT CCC ACC TCT GCT TTC AGC GGT CTT TTC GCC CCT CTT CTC GGC CCG GGG GCT GCA Lya ATC GGC TCC GAC CAA Gly GTT TCC Ala ACT Gly G C GAA ACC GCC CTG GTC CCA GCG Gin Sar TC TCG AG G f f j i f i GAT CTC C&l TTT ACC Sar Pro A l a Thr TAC ATC CAC CAG TAT CAA AAC GAC AAC GCA ATC ACT GCA GGC TCT AAC CAG GGC TCC ACC CAC ACA ACC AAA T ACC TGG ACG CCG GAC AAG CTC Gin CTC 210 TCC ATG GAC ACA CAC CTT GGT TCC ACC CAC ACA ACC AAC ACC CAC AAC AAT GAC TGG TTC CCC I c A C AAG AAG ACA GAG CAG ACC ACT CTC CTC CAA CAC CCC ATC VP4 TCC VP2 Idti L t l IiLF SlH S±S IE.' !£.' LU l U 41k 410 &!« ill 30 ° 170) CTC ACC ACC CGI AAC GGC CAC ACC ACG TCG ACA ACC CAC TCA AGC GTT GGA GTC A C A TAC GGG TAC CCA ACA CCT GAA GAT TTT CTG ACC Lau Tnf Tnr A^j Ain 61 r H n Thr Tnr &*r Thr Tnr Gin S«' S«' V«l Gly V«l Thr Tyr Gly Tyr Al# Thr Alt Glu Aap Prl* V*l Sar 330 17^^ G^y^ CCG 'UhC ACT TCC GCT CTC ^^rt ACC ACA CTT CTG £P^1 GCA ^^^ CGC TTT TTC AAA ACC f**ftC CTC TTC ^*if TCG CTC ACC ACT f^yf TCA 1 ^f ^ TTC G&^ CGT TGC ^**f CTC CTG ^^^A CTC CCG ACC ^^c CAC '^OV GGT GTC TAC GGC AGC CTG ACT CAC TCG TAT CCA TAT ATG A^y^ *<ff GGC : CIGGA 2336 TGT 2a I S TTG ACC Cyt 2S1S CAC Sar CCG ^AA GCC Aap Cly GGG CGT ff*fi TAT GGT Tyr TTT fL^f* f a r GGC Gly ACC CTC GTG Glv AAC Lau CTC ACC ACG GAC Val CTT CCG AAC Pio T^^ T j ^ A a p CAT GTG GCT ACC f>AO GCA CCT Ly^ TCC CAC CCC GTT Thr A l • Aap CCG ACG TTT TAT Pry CTC GGG VaI CGC Thr Aap T»f Tyi Tnr G i n Tyr Gl r itt Tnr l i t Am n n Ltu Pnr Mai Put Th< GAG GCG CCC CCC CAC i16S T A C fJC^m 2»0S CCC CCT Ala JOSft TCC ATC CCC Sar i i a 3736 ACC 337$ CTC Tnr TAC CTC TCC CCC GCC CAT Lau i n A l a A l a A«o T v Ala P r o Tyr C C C GAA IACC ACT TCT CCC CCC Arg Ai a G I U I T I W C C C f~^* ^AC TCC CTT Trp CCC Pro GCA A m CCC GCC C l y CCC GAG TIW Sa_ ACG fi^c O ' C CCA AAT V a l TAC ACT TTC Tyr G l v ^ _ > TTC G l y *>'O 26 9ft CCA CCA GCC ATC CAC CCG CCC ^^G ACA CCT Pnt CTG GAG GGT AAC CCC VaI GCC Pha GTA CCC AAC Pio Aan CCG CGC TAC ACC CAA TTC GGT CCC CCC GTG CAC TCA CCC CAT Ala Cly Tyr CCT frlu CTC $\t Arg Aan Gin 640 ACG ACC CCC CAA AAC CCC T T C CAC * A C Pro C l u t-ya A l a GTG TTG CCA I L t u A l n r\^c A l t CAT C ACC ACC AID GTT Pip GAA AAC Val TAC Thr GGT CCC Thr Arg Trf T CAA Thr C C A ACT GCT Thr CCT CAC A«n P r o TCC i * ^ ^ Mai Val Alt Tyr A l t S30 TGG fr^f ACT GGC TTA ifeAC TCA AAG A l t rtj^c ACA C A C ATC Vil Tyr GCA M i a CCA Lya CAG Glu A i n CCQ C A ^ TAC CAC AAC Tnr TAC CAA Thr ACA T T T C T C ^%/sG C T C A C A A C C ACC A A C A»p ^CC C T C TAC TGC ATT CCC C A G TTT TAC ACC CCG TCT GfaC GTG GCC GAG ACC ACft J^^T GTG C^flr ^ f ^ TGG GTC S i r Gly V»l A l a Q*u Tnr Tnr Aan V a l Gin Gly Trp Vt I C y l Lau BOO Ala Ala TCC TTC ATC ATC ^ ^ f A l t CGC Thr A I D A l a Lv> Thr ACC TTG V P 1 Thr Gin I li CTC ATC CAC ATT CTC ACC CCC Pit CCC Tirr ACT TGC * ^ ^ ^ ^ ^ A T T ^\^c A T T T T 6 ^AC A l a ^^^^ A A T G C T C T G CJj Gl v Glu CjJJ AfQ f 50 C T T CCC L.au AAC VP3~ CCC CAC Pro CTC ACT TCG G^C AGG CTC CTT CCT CAC TTT CAC ATC TCT TTC CCA CCA AAA CJ^^ ATC T C A ^SAC ACC TTC CTC C C A GCT CTT Aip * ' 9 V t i L a u A l a C l n P n t A a o M a I & i r L a u A4 a A l a L y a G i n Ma I S a r A m T h r P h a L t u A l t G l y L a u A l a G i n 000 Lra TTT AA* AAC ATA Tnr Arg TTC ACA Lau A l t 1 * 0 CGT GcTo Aan Pha Aap Lau Lau Lya Lau Ala Cly Aap Val Civ tar Aan Pro GlylPre Ph* P*na Put 6ar Aap VaI 00 _________________ — — - — — __ — — — — — — J P12 P52n Arg t n I , , )S5 sar TCCfif^ Ago TTT A | a C T C CAC ACC 2A6 CTC C T C Ph. t . r ^^hG ^ . ^ ^ ACC TTC L*> Lau T TG GCiC pna C | H c i u GTC TTC C C ATT L ( u GIG V i l G l u Tnr ^ ^ * A | a AAC AAC CTC Ma *TC Ma M V fj^f Aan Cln Mai ATC *fi* MTf Glv Val Lya TCC TCC CAC Ala CTC Cln GGT Ma TCC Clu CTC Arg ACT ^iTrfi TTC CTC ^Art CTT &CC TCC ACT **P CTC TTC Mai C^^C ^ * B * Tnr Gl r TTT CAC Sar f j fr Lau Tn. L , . fififl A*p GTG CCG TTC ^fi4i CCC Glu GCC H . i TGG C l y Pro Ti^C ^^3 * l « L y i Pro CCC GTC TTC CTT Aap Pnt ATC T ' P Tyr ACT TTC Aan *t^*j Lya GCA TCC ACA CCC f^\A ^^^C CTT Arg CTC Lau CCA Lau Val 990 CTA H i Ly» L * u Lau 1034 CCC ^j^^j fv^^i f y * CAfi *a^W^ CACICTC Ptfi^ ^^~* ^^34 6593 Nucleic Acids Research 1 I4BS GC* CAC CCA ATT TCC 1 < ACC CCC ACA ATC CAC I O CTC TCC TAC TCC CCA CCT CAC CCT CAC CAC TTC CAC CCT TAC AAC AAC CAC CCC ACC AAC TAC CAC : c c c ACC ACC * A C TTC TAC i c e c c c T I C ACC " l i CCO ACC ACC r CCT ffiv; GAT ccc TAC AAA ATT AAC ACC AT< P i O O i l l i AAC CCC CAA < L * . G . f Gl» • ATC CAA CAA ACT i ( p 3 ) : i 4*15 CTT GAT CAC GAC CTC GAG CCC CtC AAA CAC AAC GIG AAC TCC AGA ACC GAC CCT CCC CCT CTA GCC AAC CTC CAA CCA CAC CCT ACC CTT CAC CAC ACC CCC CCC CAC C A C CAC CCC CAA CCT ACC ACC CCA TAC CTT GCC C A A GCT GCT GCT TTT CAA ICCA CCC ATC ACA CCC CAC TAC CAC ACA ACA CCC CAA ACT CTC C C A CCA AAA CCA CCT CAA AAC CCA T CTC CCC C I A AAA CTC CCC CAT CTC ACA CCC TAC ACA CCC ACC AAC CCT CC1 TAC TCC CCA TAC TCC CCA GCC CTT CTT T C A TCC CTT CCC AAA CAC CCA GCT CAC TCC ACC TCC ATC CTT ACT TTC CTT AIC CTC CCC ACT CAC ATT CCC TCi GAC CGC ACA GCC TCC AAC CCT G*T O i l I > 1 TGG CAS A&A T y • > TTT CAC CCC f ^ f t CCA CAC ccc ATC CAA CCA GAC ACT cco ccc ccc I • vAA CCT CAC A A A A T C Af^fi CCA C A C ATT f ^ C : CCi AAA ( GCC GCA CCC AAC CCA GT T GGA '. CAC * -. e n s i ACC ACA CAT GAC TCC Tt.T TTC GC* (AC CCC CAG ATC TAC CAC ACA CCT TCA GCC AAT AAC CCT AAC CTC AAC ACT OCA TCC CAT : ccc TTC CAC « ATG CCC TCT :TC TCC TIT CCA CGC iliAA rcccn 6594 < ASCCGSSCTC Nucleic Acids Research tance by a stretch ot 11 pyrimidines, which are interrupted by no more than one punne residue and which also show significant base complementarity to the 3'-end of the 18S libosomal RNA from eucaryotes (3). These pyrimidine runs are also present at the translational start sites of poliovirus and EMCV (21, 22). We therefore speculate that both features may be of importance tor the recognition by ribosomes or initiation factors of an uncapped mRNA with the long untranslated leader sequence. Sequence complementary to the 18S rRNA terminus has also been noted for less pyrimidine rich sequences in other eucaryotic mRNAs (23). The sequence of 1.3 kb preceding the polyprotein gene seems to be exceptionally long for a leader segment of a small, and otherwise compactly organized viral genome and suggests that additional short coding sequences may exist in this segment. At present the most likely candidate for such an unlinked FMDV gene is a sequence of 92 translatable codons that follows the first AUG after the poly(C) tract (pos. 209 in Fig. 3). The evidence for this hypothesis is twofold. Firstly, a polypeptide ot 10 K (tentatively named P10) has recently been identified among the products of in vitro translation of FMDV RNA by immunoprecipitation with an antiserum directed against the P10 amino acid sequence (Strebel et al, in prep.). Secondly, the translational start of the presumed P10 gene (pos. 209) is structurally very similar to other sites controlling translation initiation at internal positions of picornaviral RNAs in that it is also preceded at a short distance by the pyrimidine rich sequence noted above. The deduced protein map The long open reading frame encodes a polypeptide with a maximal size of 2332 ammo acids and a calculated molecular weight of 258.9 K, in excellent agreement with the 260 k determined experimentally tor the FMDV (1). polyprotein Its deduced amino acid sequence could be correlated in the P1 (P88) segment with all known sequence data from the structural proteins of FMDV O-)K (3, 26, 27, ammo acids underlined in Figure 3). Much less information Fig.3: The nucteotide sequence and encoded information of the L segment of FMDV RNA strain O^K (U residues are shown as T). The amino acid sequence corresponds to the total polyprotein of 2332 amino acids. A translated amino acid sequence is also shown for a hypothetical small polypeptide encoded upstream from the polyprotein. The numbers to the right represent amtno acid positions in the potyprotein. Predicted limits of the primary precursors are indicated to the left, and those of the stable viral potypeptides to the right. Dashed lines represent borders where the supporting data are weak. Underlined amino acids have been determined experimentally for this virus strain (26,27). The nucleotide positions to the left use the BamHI site at position 3000 as a reference (5). Nucleotide 92 (indicated by an arrowhead) is the first nucleotide downstream Irom the poly(C), and is put forward here as nucleotide 1 in an improved numbering system. AUG codons in the 5' region are underlined, a palindromic sequence at the 3' end is underlined by arrows. Nucleotide heterologies are indicated by dots. 6595 Nucleic Acids Research RNA o FMDV P16 ,,P20o VR. VP2 189 69 218 217 92 Poliovirus VP1 220 213 16154 .P12 VP4, VP2 VP3 VP1 69 271 238 302 -f- r^ H—h 153 \ M-& mi , P3-t, 329 182 i 1 - I proteins 3d 3c 2C i 1 P3-ib 661 ii ^2a L proteins p \ $ d P56 213 P2-X U9 97 \ FMDV f~ 2332 aa T P20b P2L 318 i 1 i Poliovirus 2207 aa VP3 / S I mmi m i P2 -capsid proteins - P3 [primer -fprotea lor replication? Fig.4: Comparison of the FMDV and poliovirus genomes. a) Schematic map of gene organization and protein processing Open arrows indicate cleavages probably executed by cellular proteases, while filled arrows represent processing by the viral protease. The dashed arrows illustrate morphogenetic cleavages occurring during virus particle maturation. The FMDV polypeptides are termed as in Figure 3; the number of amino acids is given for each protein below the lines. b) Sequence homology between corresponding parts of the two polyproteins. The degree of homology in different segments (see Methods) is: hatched 25-40'', crosshatched 40-65", filled in 565H. Related gene products are connected by vertical lines and identified using the new general nomenclature for picornaviruses (according to the third Meeting of the European Study Group of Picornaviruses, Urbino, Italy Sept. 5-10, 1983). was available for the non-structural proteins encoded in the P2 (P52) and the P3 (P100) precursors where only the VPg genes had been exactly mapped (24) and only approximate positions had been allocated for P34 and P56a (1) and for P12 and P20b (A.King, pers.comm.). A more complete protein map was established (14), using size estimations from SOS polyacrylamide gel electrophoresis and sequence order was known (21). proteins were homologies to poliovirus polypeptides for which the The exact coding limits ot the individual functional predicted from the cleavage specificity for Glu (GlnVGly (Ser,Thr) linkages of the FMDV protease (see below) and also using very recent data from Grubman and collaborators, who determined amino acid sequences at the N-termini of the polypeptides synthesized in vitro from FMDV A12 RNA and P56 (25). and corresponding to P52, P12, P34, P14, P20b (pers. comm.) The order and limits of FMDV proteins thus obtained (Figure 4) were recently confirmed by immunoprecipitation of polypeptides of the predicted size from FMDV infected BHK cells, using antisera against bacterially synthes- 6596 Nucleic Acids Research Table I: Predicted map positions and biochemical properties of FMDV polypeptides as deduced from the nucleotide sequence. Polypeptide Polyprotein Map co- ,. ordinate ' Map position No. of ami no acids Molecular weight Net charge 258946.8 24367.4 21243.1 +37 Function precursor ? ? 805-7800 805-1455 889-1455 2332 217 lb lc Id 1456-3615 1456-1662 1663-2316 2317-2976 2977-3615 720 69 218 220 213 79305.6 7362.3 24410.4 23746.4 +13 -3 +4 -2 23840.5 +14 P52 (PI 2) P34 P2 2b 2c 3616-5079 3664-4125 4126-5079 488 154 318 54501.1 16255.1 +8 +2 precursor 35892.9 +8 ? 100844.7 17355.2 2604.5 2622.4 +21 -5 +3 +4 2579.3 23012.4 52760.9 +3 +9 +7 P20a P16 L L1 P88 VP4 VP2 VP3 VP1 PI la 189 P100 P3 (P14) VPg-1 3a 3b-l 5080-7800 5080-5538 5539-5607 907 153 23 VPg-2 VPg-3 3b-2 3b-3 5608-5679 5680-5751 P2Ob P56 3c 3d 5752-6390 6391-7800 24 24 213 470 -5 -7 precursor capsid protein n n ii ii ii n ? precursor genome-linked protein protease RNA polymerase 1) Nomenclature suggested for the picornaviral polypeptides at the 3 r Meeting of the European Study Group of Picornaviruses, Urbino Italy, Sept. 5-10 1983. ized polypeptides that correspond to the respective segments of the polyprotein (Strebel et al., in prep.). The biochemical properties of the FMDV proteins predicted from the ammo acid sequences in Figure 3 are summarized in Table I and a schematic protein map is shown in Figure 4a. Our sequence-derived molecular weights often differ from earlier determinations, and correlate better with recent size estimations of FMDV proteins synthesized in infected cells (3, Strebel et al., in prep.). The polyprotein sequence starts with two closely related "leader" proteins, L and L' ("P20a" and "P16") which differ by 28 amino acids at their N-termini and have molecular weights of 24.4 k and 21.2 K, respectively. The primary product following the leader protein L/L' in the polyprotein is P1 (MW 79.3k), formerly called "P88", the precursor of the capsid proteins which are arranged in the order VP4-VP2-VP3-VP1 (1). P2 (formerly "P52"), the precursor from 6597 Nucleic Acids Research the middle part of the potyprotein has a calculated molecular weight of 54.5 K. It contains two stable proteins, 2b ("P12") and 2c ("P34"), with unknown functions. The N-termtnal limit of P2 was originally set next to the carboxy-termi- nus of VP1 (cf. Figure 3) which had been accurately identified for FMDV OiK by (26). However, as shown for FMDV A 1 2 , the proteins P2 and P12 start 16 ammo acids downstream from the C-terminus of VP1 (Grubman pers. comm., cf. Figure 3). The carboxy-termmal precursor P3 (formerly "P100", calculated molecular weight 100.8 K) comprises the sequence between ammo acids 1426 and 2332. It is processed into six proteins: 3a ("P14") of 17.4 K, three VPgs (in tandem) ot 2.3 K each, a candidate for a protease, 3c ("P20b"), of 23.0 K, and an RNA polymerase, 3d ("P56"), of 52.7 K. Pfotease cleavage sites The present map ot the FMDV genome (Figures 2 and 4) predicts that at least 12 sites in the protein sequence need to be cleaved to give rise to mature viral gene pioducts. Seven out of these sites show similar amino acid sequences VP2/VP3 Pro-Ser-Lys-Glu/Giy-Ile-Phe-Pro VP3/VP1 Ala-Arg-Ata-Glu/Thr-Thr-Ser-Ala P14/VPg-1 Pro-Gln-Ala-Glu/Gly-Pro-Tyr-Ala VPg-1/VPg-2 Pro-Gln-Gln-Glu/Gly-Pro-Tyr-Ala VPg-2/VPg-3 VaI-Val-Lys-Glu/Gly-Pro-Tyr-Glu VPg-3/P20b Ite-Val-Thr-Glu/Ser-GIy-Ala-Pro P20b/P56 Pro-His-His-GIu/GIy-Leu-Ile-Val suggesting that they are cleaved by a single (viral) protease, recognizing the consensus sequence Glu/Gly (Ser, Thr). This specificity is similar to the one displayed by the poliovirus protease which cleaves between Gin and Gly residues at eight out ot eleven processing sites in the polio potyprotein ( 2 1 , 28, Fig. 4a). The sequences around the remaining five cleavage sites in FMDV P2 0a/VP4 Ser-Gln/Asn-Gly-Ser-GIy/Asn-Thr-Gly-Ser VP4/VP2 Ala-Leu-Leu-Ala/Asp-Lys-Asn-Thr VP1/P5 2 Lys-GIn-Thr-Leu/Asn-Phe-Asp-Leu P12/P34 Ala-Glu-Lys-Gln/Leu-Lys-Ala-Arg P34/P100 Ile-Phe-Lys-Gln/lle-Ser-lle-Pro show little sequence homology to each other and are thought to be recognized by cellular proteases. However recent experiments from this laboratory (unpublished data) suggest that the L polypeptide may also be involved in the processing of at least the L/P1 junction. 6598 Nucleic Acids Research In FMDV there are 14 Glu-Gly, 3 Glu-Ser, and 9 Glu-Thr dipeptides in the polyprotein sequence which may be substrate for the FMDV protease. Of these only five, one, and one respectively are utilized, according to the protein map (Figures 3 and 4a). Therefore, not simply the primary sequence but also a certain conformation of the amino acid sequence must be recognized by the viral enzyme. Homoloqy to poliovirus Gene organization and processing mechanisms predicted for FMDV from the nucleotide sequence in Figure 3 are summarized in Figure 4 and compared to those from poliovirus, a member of the enterovirus family. As outlined in Fig- ure 4a, the overall organization is well conserved between the two genomes. This map is also similar to the approximate gene map established for EMCV (29), another well studied virus belonging to a third genus of picornaviridae. In an overall comparison FMDV differs from poliovirus by an increase in size of its genome by about 1000 nucleotides 5' pioximal part of the genome. most of which are added in the Within the polyprotein region the two genomes differ drastically only by the addition of the L gene and two extra VPg genes in FMDV, and by the addition of a third extended gene (2a or 2b) in the c e n tral part of the polio genome. Other corresponding genes often differ in size. Thus, the capsid proteins are larger in poliovirus, while FMDV has expanded its nonstructural proteins in the P3 segment (Figure 4). The alignment of the two genomes was refined using sequence homologies between functionally corresponding segments. Using a dot matrix program significant homology was detected on the ammo acid level throughout most part of the coding region, and to a lower extend also on the nucleotide level indicating common structural features of general functional importance (14). As shown in Figure 4 these sequence homologies are very high in certain parts of two non-structural proteins, the polymerase (3d), and protein 2c (x/P34). Less, but still high homology was detected between the protease genes and the capsid proteins VP2 and VP3, indicating that the functional specificity of these proteins was less stringently conserved during picornaviral evolution. The evolutionary divergance is most pronounced in proteins 1d, 2a/2b and 3a. Protein 1d (VP1) is the capsid protein most exposed at the surface of the virion and, as a consequence of the pressure of the host's immune system, its sequence is highly variable also between different aphtoviruses giving rise seven serotypes and many more subtypes. to In contrast proteins 2a/2b and 3a, although varible between FMDV and poliovirus, are highly conserved between two FMDV serotypes as exemplified by a comparison of FMDV O-|K and 6599 Nucleic Acids Research (unpublished results). Therefore, we conclude that these latter proteins play an important role in the FMDV lite cycle but did coevolve with their targets which are most likely FMDV-specified molecules. Following the same argu- ment, we predict that the more generally conserved picornaviral proteins (2c and 3d) depend in their functions on the interaction with host factors of conserved structure. In this context it is of note that sequences common to FMDV and poliovirus in proteins 2c and 3d are also present in Cow Pea Mosaic Virus, a plant virus which has thus been correlated with picornaviruses (30). The comparison of the two picornaviral genomes in Figure 4b also indicates that these differ in size predominantly in regions with low sequence homology. Sometimes extended blocks of nucleotides are also found inserted into well conserved genes like in gene 1b (VP2). Together with the addition of com- plete genes, like the L gene or the extra VPg genes in FMDV, these data indicate that evolution of picornaviral genomes involved the insertion or deletion of RNA segments of several hundred nucleotides. CONCLUSIONS We report the nucleotide sequence of the complete coding part of an aphthovirus genome. This sequence is useful in several ways for further studies of the viral life cycle. It has already allowed exact predictions of the genome organisation of these viruses and of the amino acid sequences of their gene pro- ducts. This information facilitates the identification and characterization of these proteins, e.g. by antisera elicited against synthetic peptides. In addition, cDNA segments from specific parts of the genome can be expressed in E. coli and used as specific antigens free of any other viral protein, or be used as substrate tor processing enzymes. Finally, FMDV genes and genomic signals can be transferred well defined as cDNA copies into animal cells and their functions analyzed. Such studies should also answer questions as to the function of the 1300 nucleotides long leader sequence preceding the polyprotein gene in the FMDV RNA. During preparation of this manuscript, the complete coding sequence for the FMDV polyprotein has been reported for strain A-io (31). While there are extensive serotype related sequence variations in the structural proteins (3, 12) the sequence differs in the non-structural genes from the OiK nucteotides (7.9*0 and 44 predicted amino acids (3.1"), sequence by 331 provided that we neglect three T (U) residues (positions 5978/79, 6013/14, 6016/17), appearing in duplicate in the A 1 0 sequence published. 6600 Nucleic Acids Research ACKNOWLEDGEMENTS We are indepted to Joern Wolters and Oliver Steinau lor the computer analyses, and to Gudrun Feil for technical assistance. We also thank Dr. W. Keller and Dr. K. Strohmaier for gifts of FMDV RNA. Dr. M. Grubman for We acknowledge Dr. A. King and the communication of unpublished data. This work was supported by research grants from Biogen S.A. and the Deutsche Forschungsgemeinschaft (Forschergruppe Genexpression). REFERENCES 1 Sangar, D.V. (1979) J. gen. Virot. 45, 1-13 2 Perez-Bercoff, R. (ed.) (1979), The Molecular Biology of the Picornaviruses. Plenum, New York 3 Beck, E., Foiss, S., Strebel, K., Cattaneo, R. and Fell, G., (1983) Nucleic Acids Res. 11, 7873-7885 4 Johnson, R.A. and Walseth, T.F. (1979) Adv. cyclic Nucteotide Res. 10, 135-137 5 Kupper, H., Keller, W., Kurz, C , Forss, S., Schaller, H., Franze, R., Marquardt, O., Zaslavsky, V . , and Hofschneider, P.-H. (1981) Nature 289, 555-559 6 Vieira, J. and Messing,J. (1982) Gene 19, 259-268 7 Strebel, K. (1982) Diplomarbeit, University of Heidelberg 8 Maxam, A.M. and Gilbert, W. (1980) Methods in Enzymol. 65, 499-560 9 Garoff, H. and Ansorge, W. (1981) Analyt. Biochem. 115, 454-457 10 Osterburg, G., Glatting, K . - H . , and Sommer, R. (1982) Nucleic Acids Res. 10, 207-216 11 Zucker, M. and Stiegler, P. (1981) Nucleic Acids Res. 9, 133-148 12 Beck, E., Feil, G., and Strohmaier, K., (1983a) EMBO Journal 2, 555-559 13 Hattman, S., Brooks, J.E., and Masurekar, M. (1978) J. Mol.Biol. 126, 367-380 14 Forss (1983), P.H. thesis. University of Heidelberg 15 Domingo, E., Dayila, M., and Ortin, J. (1980) Gene 11, 333-346 18 Holland, J . , Spindler, K., Horodyski, F., Grabau, E., Nichol, S., and VandePol, S. (1982) Science 215, 1577-1585 17 Harris, T.J.R., Robson, K.J.H., and Brown, F. (1980) J. gen. Virot. 50, 403-418 18 Sangar, D.V., Black, D.N., Rowlands, D.J., Harris, T.J.R., and Brown, F., (1980) J . Virol. 33, 59-68 19 Peacock, S.L., Mclver, C M . , and Monoham, J.J. (1981) Biochem. Biophys. Acta 655, 243-250 20 Kozak, M. (1983) Microbiological reviews 47, 1-45. 21 Kitamura, N., Semler, B.L., Rothberg, P.G., Larsen, G.R., Adler, C.G., Dorner, A . J . , Emini, E.A., Hanecak, R., Lee, J . J . , VanderWerf, S., Anderson, C.W., and Wimmer, E. (1981) Nature 291, 547-553 22 Palmenberg, A . C . , Kirby, E.M., Janda, M.R., Drake, N.L., Duke, G.M., Potratz, K.F., and Coltett, M.S. (1984) Nucleic Acids Res. 12, 2969-2996 23 Hagenbiichle, O., Sauter, M., Steitz, J.A., and Maus, R.G. (1978) Cetl 13, 551-563 24 Forss, S. and Schaller, H. (1982) Nucleic Acids Res. 10, 6441-6450 25 Robertson, B.H., Morgan, D.O., Moore, D.M., Grubman, M.J., Card, J . , Fischer, T., Weddell, G., Dowbenko, D., and Yansura, D. (1983) Virology 126, 614-623 26 Kurz, C , Forss, S., Kupper, H., Strohmaier, K., and Schaller, H. (1981) Nucleic Acids Res. 9, 1919-1931 27 Strohmaier, K., Wittmann-Liebold, B., and Geissler, A.W. (1978) Biochem. Biophys. Res. Comm. 85, 1840-1645 28 Hanecak, R., Semler, B.L., Anderson, C.W., and Wimmer, E. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 3973-3977 29 Palmenberg, A.C. (1982) J. Virol. 44, 900-906 30 Franssen, H., Lennissen, J . , Goldbach, R., Lomonossoff, G., and Zimmern, D., EMBO Journal, in press. 31 Carroll, A.R., Rowlands, D.J., and Clarke, B.E. (1984) Nucleic Acids Res. 12, 2461-2470 6601