Journal of General Virology (1992), 73, 2067-2077. Printed in Great Britain

Complete nucleotide sequences of two soybean mosaic virus strains differentiated by response of soybean containing the Rsv resistance gene

Ch. Jayaram, John H. Hill* and W. Allen Miller

Department of Plant Pathology, Iowa State University, Ames, Iowa 50011, U.S.A.

0001-0752 © 1992 SGM

The complete nucleotide sequences of the genomic RNAs of strains G2 and G7 of soybean mosaic virus were determined. In both cases, the genome is 9588 nucleotides long, excluding the 3'-terminal poly(A) sequence. A large open reading frame (nucleotides 132 to 9329) encodes a polyprotein of 3066 amino acids with a predicted Mr of either 349542 (strain G2) or 349741 (strain G7). Based on comparison with the proposed locations of cleavage sites of other potyvirus polyproteins, nine mature proteins are predicted. The mature proteins of the two strains share 94 to 100% amino acid identity, with the greatest variability occurring in the 35K and 42K proteins. Differences in local net charge in portions of these proteins as well as differences in amino acid sequence throughout the genome are discussed in relation to resistance and susceptibility of host plants to strains G2 and G7. Comparison with other potyviruses may be useful for taxonomic clarification of viruses and strains.

Introduction

Soybean mosaic virus (SMV), a member of the potyvirus group of plant viruses (Hollings & Brunt, 1981), is the cause of one of the most widespread viral diseases of soybean. Several strains of the virus have been identified on the basis of both phenotypic response of differential soybean lines (Buzzel & Tu, 1984; Chen et al., 1988; Cho & Goodman, 1979) and transmission by aphid species (Lucas & Hill, 1980). Like other potyviruses, SMV genomic RNA encodes a large precursor polyprotein that is processed by a virus-encoded protease(s) (Ghabrial et al., 1990) to yield several proteins (Vance & Beachy, 1984a, b). Unlike those potyviruses whose genomes have been extensively characterized [i.e. tobacco etch virus (TEV; Allison et al., 1986), potato virus Y (PVY; Robaglia et al., 1989), tobacco vein mottling virus (TVMV; Domier et al., 1986) and plum pox virus (PPV; Maiss et al., 1989)], SMV is seed-borne (Hill et al., 1980) and its genome structure had not been fully characterized.

Host resistance can occur by interruption of the virus life cycle at one or more of several stages. Siegel (1979) identified six such steps upon which resistance could act. These are (i) entry into the cell, (ii) uncoating of the nucleic acid, (iii) translation of viral proteins, (iv) replication of the viral nucleic acid, (v) assembly of progeny virions and (vi) spread of the virus, both to new cells and new hosts. At the molecular level, evidence has supported resistance mechanisms involving alterations in the function of virus-encoded protease, movement or replicase proteins. One of the best characterized examples of host resistance is that of the cowpea cultivar Arlington to cowpea mosaic virus. In vitro studies suggested that Arlington leaves contain a protease inhibitor that inhibits proteolytic processing of a virus-encoded polyprotein (Sanderson et al., 1985). A translation inhibitor, although not specific to the viral RNA, may also be involved in the resistance mechanism (Ponz et al., 1988). A different resistance mechanism involves blocking of cell-to-cell movement from the initial site of replication. A single Mr 30 000 (30K) protein encoded by the tobacco mosaic virus (TMV) genome has been identified which potentiates cell-to-cell movement of TMV in tobacco plants (Deom et al., 1987; Meshi et al., 1987). It has been speculated that the 30K protein is either not expressed properly or is unable to act on cells in plants which do not support systemic movement of TMV (Moser et al., 1988; Taliansky et al., 1982).

At least three genes, Rsv, Rsv2 and Rsv3, confer resistance to various strains of SMV (Buzzel & Tu, 1984; Kihl & Hartwig, 1979; Lim, 1985). However, the resistance conferred by each gene can be overcome by different strains. For example, strain G7 overcomes resistance conferred by the Rsv gene in the soybean line PI 96983, whereas several other SMV strains, namely G1 to G6, do not overcome resistance conferred by this gene (Lim, 1985). To improve understanding of the resistance mechanism conferred by the Rsv gene, the genomes of strains G2 (unable to induce disease in plants containing the Rsv gene) and G7 (able to induce disease in plants containing the Rsv gene) have been sequenced, and their amino acid sequences derived. The potential pathogenic relevance of differences found between genomic sequences is described here. The sequence data are consistent with a genome organization similar to that of other potyviruses (see review by Riechmann et al., 1992) and have relevance to potyvirus taxonomy.

Methods

Virus purification, RNA isolation, cDNA synthesis and cloning. The origins of strains G2 and G7 of SMV and their purification have been described (Hill & Benner, 1980a; Hill et al., 1989). Viral RNA was isolated from purified virions according to the method of Vance & Beachy (1984a). cDNA was synthesized by the method of Gubler & Hoffman (1983), using a modified kit (Pharmacia), and cloned into a pGEM3Zf(+) vector (Promega). Initially, a random primed cDNA library was constructed using the RNA of strain G7. Approximately 50 clones were sequenced and mapped to different regions of the genome by comparison with published potyviral sequences. From the data, four oligonucleotide primers were synthesized for use in cloning the different regions of the genomes of strains G2 and G7.

cDNA sequencing.
The cDNA clones were sequenced both manually and with an Applied Biosystems 370A automated DNA sequencing system using the dideoxynucleotide method (Sanger et al., 1977) and Taq polymerase (Promega). Overlapping cDNA clones of different sizes were used to eliminate almost completely the need for subcloning. Every base was determined by sequencing at least two independent clones or by sequencing twice from a single clone.

RNA sequencing and 5' end determination. RNA was sequenced directly using a modified procedure of Mierendorf & Pfeffer (1987). In a volume of 10 µl, a mixture of 1 µg of viral RNA and 10 pmol of a 25-mer primer, which anneals between bases 66 and 90 at the 5' end, was heated for 3 min at 75 °C and allowed to cool to 42 °C. Two µl (16 units) of avian myeloblastosis virus reverse transcriptase (Promega) and 4 µl of [α-32P]dATP (400 Ci/mmol) were added to the mixture. A 3 µl aliquot was removed and added to 3 µl of a solution containing 250 µM each of dCTP, dGTP and dTTP and one of the four dideoxyNTPs (172 µM-ddCTP, 15.3 µM-ddATP, 1 mM-ddTTP or 250 µM-ddGTP). The reactions were incubated at 42 °C for 15 min, after which 1 µl of chase solution containing 2 mM each of all four dNTPs and 2.5 units of terminal deoxynucleotidyl transferase (BRL) was added, and the mixture was incubated at 42 °C for an additional 15 min.

Nucleotide sequence alignment and data analysis were performed using sequence analysis software from the Genetics Computer Group (version 6.0; Madison, Wis., U.S.A.) and an IBM-compatible program by W. R. Bottomley (CSIRO, Division of Plant Industry, Canberra, Australia).

Results

Phenotype of soybean plants inoculated with virus strains

Soybean lines PI 96983 and Williams '82 responded differently to mechanical inoculation with SMV strains G2 and G7 (Fig. 1). PI 96983, containing the resistance gene Rsv, was resistant to strain G2, but systemic necrosis developed in plants inoculated with strain G7. Systemic mottling occurred when Williams '82, which lacks the Rsv gene, was inoculated with either strain.

Nucleotide sequence analysis of SMV strains G2 and G7

The SMV G2 and G7 cDNA inserts from overlapping sets of 42 and 51 cDNA clones, respectively, were chosen for nucleotide sequence analysis. These cDNA inserts cover the entire genomes of G2 and G7 except the 5'-most 27 and 25 nucleotides (strains G2 and G7, respectively), which were determined by direct RNA sequencing.

Genome organization of SMV RNA

The genomic RNA of both strains of SMV is 9588 nucleotides (nt) long, excluding the 3'-terminal poly(A) sequence (Fig. 2). This is comparable to the genomes of TVMV (9471 nt; Domier et al., 1986), TEV (9495 nt; Allison et al., 1986), PVY (9704 nt; Robaglia et al., 1989) and PPV (9741 nt; Maiss et al., 1989). The base composition of both strains was 32% adenine, 24% guanine, 18% cytosine and 26% uracil, in agreement with previous observations for G2 (Hill & Benner, 1980b). The base composition is nearly identical to that of TVMV (Domier et al., 1986). Computer translation of the RNAs and their complements revealed a single, large open reading frame (ORF) beginning at the first AUG on the genome (base 132) and terminating with a UAA codon at position 9330. This differs from PPV and TVMV, which appear to initiate translation of the polyprotein at the second and third AUG codons, respectively, in the genome. The large ORF of SMV encodes a 3066 amino acid polyprotein with an Mr of 349542 (G2) or 349741 (G7) (Fig. 3).
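The coordinates quoted for the ORF can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `orf_summary` is hypothetical, and it assumes that "position 9330" is the first base of the UAA stop codon (which matches the ORF range of 132 to 9329 given in the summary) and that the poly(A) tail is excluded from the 9588 nt length.

```python
def orf_summary(genome_len, orf_start, stop_start):
    """Derive ORF statistics from 1-based genome coordinates.

    Assumption: stop_start is the first base of the stop codon,
    so the coding region runs from orf_start to stop_start - 1.
    """
    coding_nt = stop_start - orf_start
    if coding_nt % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    stop_end = stop_start + 2  # a stop codon is three bases long
    return {
        "orf_end": stop_start - 1,
        "codons": coding_nt // 3,
        "utr5": orf_start - 1,           # bases before the first AUG
        "utr3": genome_len - stop_end,   # bases after the stop codon
    }


summary = orf_summary(genome_len=9588, orf_start=132, stop_start=9330)
print(summary)

# Mean mass per residue implied by the predicted Mr of the G2 polyprotein.
print(round(349542 / summary["codons"], 1))
```

Under these assumptions the coding region contains 3066 codons, in agreement with the stated polyprotein length, and leaves non-coding regions of 131 nt at the 5' end and 256 nt at the 3' end.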
Fig. 1. Phenotypic responses of Williams '82 and PI 96983 soybeans to inoculation with strains G2 and G7 of SMV. Williams '82 plants inoculated with (a) G2 and (b) G7 were susceptible and showed systemic mosaic whereas PI 96983 was immune to (c) G2 but systemic necrosis occurred in plants inoculated with (d) G7.

Polyprotein cleavage sites

Based upon the proposed locations of cleavage sites and the sizes of predicted mature (fully processed) proteins of TEV, TVMV, PVY and PPV (Carrington et al., 1989; Domier et al., 1986; Dougherty & Parks, 1991; Dougherty et al., 1988; Ghabrial et al., 1990; Maiss et al., 1989; Parks & Dougherty, 1991; Robaglia et al., 1989), and upon alignments of amino acid sequences for each protein, nine mature proteins are predicted for SMV (Fig. 4). At least five sites are cleaved by the nuclear inclusion protein a (NIa; 27K) protease (Parks & Dougherty, 1991). The consensus cleavage site for this protease from SMV G6 is (E/N)XVXXQ'(G/S) (Ghabrial et al., 1990). [Amino acids in parentheses represent alternatives at that position relative to the cleavage site, which is shown by the ' symbol. X represents any amino acid.] All sites in Fig. 4 that contain a Q are those predicted to be cleaved by the 27K protease. Cleavage between amino acids 2041 and 2042 is a late event separating the VPg (viral protein-genome linked) (21K) from the protease (27K) (Dougherty & Parks, 1991). Although this cleavage has been shown only for TEV, these authors showed a consensus sequence of (E/Q)(D/E/R)(L/V)XXE'(G/S/A)(E/K)(S/A)(L/V) at this site among known potyviruses. Carrington et al. (1989) identified the cleavage site G'G at the C terminus of the helper component, which catalyses its own cleavage from the polyprotein at this site. Thus, this region is designated HC-PRO (helper component-protease).
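A consensus such as (E/N)XVXXQ'(G/S) translates directly into a regular expression, which makes it easy to scan a polyprotein for candidate NIa sites. The following is a minimal sketch, not the procedure used by the authors: the function names and the 22-residue test string are hypothetical, and a match only indicates agreement with the consensus, not that cleavage actually occurs there.

```python
import re

# (E/N)XVXXQ'(G/S): six residues ending in Q, followed by G or S.
# The lookahead lets overlapping candidates be reported as well.
NIA_SITE = re.compile(r"(?=([EN].V..Q)[GS])")


def find_nia_sites(protein):
    """Return 1-based positions of the Q immediately before each candidate cut."""
    return [m.start(1) + 6 for m in NIA_SITE.finditer(protein)]


def cleave(protein, q_positions):
    """Split the sequence after each listed Q position."""
    cuts = [0, *q_positions, len(protein)]
    return [protein[a:b] for a, b in zip(cuts, cuts[1:])]


# Hypothetical toy sequence containing two sites that fit the consensus.
toy = "MAAEKVHFQSGGTNSVKLQGAA"
sites = find_nia_sites(toy)
print(sites)               # [9, 19]
print(cleave(toy, sites))  # ['MAAEKVHFQ', 'SGGTNSVKLQ', 'GAA']
```

The same pattern-based approach extends to the other consensus sequences quoted in this section by swapping in a different expression.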
The N-terminal protein (35K in SMV) also serves as a protease to cleave itself from the polyprotein (Verchot et al., 1991). A consensus of (Y/F)'S has been reported by Mavankal & Rhoads (1991). Full-length sequences that were published before this information was known used Q'S(G/S/A) (the NIa protease consensus) as the cleavage site for all the mature proteins. This led to some predictions of different termini, which we have revised in the alignments of the 35K and 42K ORFs (Fig. 5).

Fig. 2. Nucleotide sequences of SMV strains G2 and G7. The full sequence of the G2 strain is shown. Bases of the G7 sequence that differ from G2 are shown below the G2 sequence. [Sequence listing not reproduced.]

Table 1. Percentage amino acid sequence identity of predicted mature proteins of SMV strain G7 with those of other potyviruses including SMV strain G2

                            SMV G7*
Virus   35K   HC-PRO   42K   CIP   6K    21K   27K   POL   CP
G2      94    98       94    96    100   99    99    98    99
PPV     14    44       29    72    32    53    48    60    51
TVMV     6    45       13    51    32    45    47    54    52
PVY     13    37       17    52    32    45    35    56    64
TEV      9    43       25    51    45    46    34    55    60

* Abbreviations are as designated in the text.

Comparison of mature proteins with those of other potyviruses

The nucleotide and amino acid sequences of SMV G2 and G7 were 94% and 97% identical, respectively, with changes more pronounced in the 5' region (Tables 1 and 2). Based on the proposed genomic map of SMV, an amino acid sequence comparison of strain G7 with TEV, TVMV, PVY, PPV and strain G2 (Table 1) shows that the most conserved regions among the five potyviruses are the cylindrical inclusion protein (CIP), the putative RNA-dependent RNA polymerase (POL; Robaglia et al., 1989) and the coat protein (CP). POL, CIP and 21K proteins show more similarity to homologous proteins of PPV than to those of the other potyviruses. In contrast, SMV CP shows greater similarity to PVY and TEV CP. Overall, SMV is most similar to PPV.

Fig. 3. Deduced amino acid sequences of SMV strains G2 and G7. The full sequence of the G2 strain is shown. Amino acids of the G7 sequence that differ from G2 are shown below the G2 sequence. Scissors indicate predicted cleavage sites. [Sequence listing not reproduced.]

The POL protein of SMV is analogous to the NIb of TEV (Dougherty & Parks, 1991), but nuclear inclusions are not evident in SMV-infected cells (Edwardson & Christie, 1986). The POL protein was identified as the polymerase because it contains the conserved sequence GXXXTXXXN(X)20-40GDD, in which (X)20-40 denotes 20 to 40 residues of any type, at amino acids 2595 to 2637. This fits the consensus of virtually all known RNA-dependent RNA polymerases (Kamer & Argos, 1984).

The 21K and the 27K proteins of SMV are analogous to the NIa protein of TEV and, by comparison with TEV (Dougherty & Parks, 1991), consist of a VPg and a protein processing activity, respectively. The tripeptide GRD (amino acid positions 2120 to 2122; Fig. 3) in the 27K protein is conserved in TVMV, TEV, PVY and PPV (Domier et al., 1986; Allison et al., 1986; Robaglia et al., 1989; Maiss et al., 1989), with the aspartic acid residue predicted as the active site (Parks & Dougherty, 1991). This tripeptide is also conserved in strain G2. In strain G7, however, the arginine residue at amino acid position 2121 is changed to lysine (GKD).

Fig. 4. Proposed map of SMV polyprotein. The amino acids between which cleavage occurs and their position in the genome are shown above and below the map, respectively. [Map not reproduced; the position labels that survive are 308, 765, 1164, 1798, 1852, 2041, 2284, 2801 and 3066.]
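The position labels of the polyprotein map can be turned into residue ranges for the nine predicted proteins. This is a sketch under stated assumptions rather than a transcription of Fig. 4: it assumes each label is the last residue of a mature protein (the text confirms this only for the cut between 2041 and 2042), and that the proteins occur in the order of the Table 1 columns. The names `segments` and `locate` are hypothetical.

```python
# Assumed order of mature proteins (taken from the Table 1 column order).
PROTEINS = ["35K", "HC-PRO", "42K", "CIP", "6K", "21K", "27K", "POL", "CP"]
# Assumed last residue of each protein (position labels from the map).
LAST_RESIDUE = [308, 765, 1164, 1798, 1852, 2041, 2284, 2801, 3066]


def segments(names, ends):
    """Return (name, first, last, length) for consecutive polyprotein segments."""
    result = []
    start = 1
    for name, end in zip(names, ends):
        result.append((name, start, end, end - start + 1))
        start = end + 1
    return result


def locate(position, segs):
    """Name of the predicted protein that contains a polyprotein position."""
    for name, first, last, _ in segs:
        if first <= position <= last:
            return name
    raise ValueError(f"position {position} lies outside the polyprotein")


segs = segments(PROTEINS, LAST_RESIDUE)
for name, first, last, length in segs:
    print(f"{name:7s} {first:5d}-{last:<5d} {length:4d} aa")

print(locate(2121, segs))  # residue changed from R to K in strain G7
print(locate(2595, segs))  # start of the polymerase motif
```

With these assumptions, residue 2121 falls in the 27K protein and residues 2595 to 2637 fall in POL, which agrees with the assignments made in the text.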
Fig. 5. Amino acid sequence alignments of portions of SMV G2 35K protein and all of 42K protein with those of other potyviruses. Each sequence was aligned with G2 in pairwise fashion using the program BESTFIT (GCG sequence analysis software). Amino acids are shown in bold upper case letters where three or more align. G7 was identical to G2 in all conserved (bold) amino acids. Numbers in the 35K alignment indicate positions of amino acids. Intervening amino acids which showed no significant alignments (indicated by double hyphens) are not shown. [Alignment of the G2, PPV, PVY, TEV and TVMV sequences not recoverable from extraction.]

The CIP protein of SMV shares conserved domains with a group of proteins believed to be helicases (Company et al., 1991; Koonin, 1991), including the P80 protein of bovine viral diarrhoea virus, dengue 4 virus non-structural protein 3, mammalian translation initiation factor 4A and PPV CIP [recently shown to have
RNA helicase and RNA-dependent ATPase activity (Lain et al., 1990, 1991)]. The protein also has a conserved nucleotide binding site at amino acid position 1249, a characteristic of potyviruses (Robaglia et al., 1989).

The HC-PRO, the 35K protein and the 42K protein, all encoded near the 5' end, show markedly less similarity to the homologous proteins of other potyviruses. The HC-PRO shares between 37% and 45% identity whereas the 42K has 13% to 29% identity, and the identity of the 35K protein is an insignificant 6% to 14%. Although the amino acid sequences of both 35K and 42K are known to be highly variable among members of the potyvirus group (e.g. Fig. 5), there are no differences between the two SMV strains in the conserved regions of these proteins (Fig. 3).

Table 2. Nucleotide and amino acid differences between strains G2 and G7 of SMV

Nucleotides: (1) Total; (2) Total differences; (2/1) Percentage differences; (3) Total leading to amino acid changes; (3/2) Percentage differences
Amino acids: (4) Total; (5) Total differences; (5/4) Percentage differences; (6) Total conservative differences†; (7) Total non-conservative differences†

Region*             (1)    (2)  (2/1)    (3)  (3/2)    (4)    (5)  (5/4)    (6)    (7)
5' Non-coding       131     13     10    NA‡     NA     NA     NA     NA     NA     NA
35K                 924     52      6     24     46    308     20      6     13      7
HC-PRO             1371     54      4     14     26    457     11      2      8      3
42K                1197     85      7     29     34    399     22      6     15      7
CIP                1903    155      8     37     24    634     27      4     20      7
6K                  161      9      6      0      0     54      0      0      0      0
21K                 567     36      6      3      8    189      2      1      1      1
27K                 729     45      6      2      4    243      2      1      1      1
POL                1551     78      5     13     17    517     12      2      9      3
CP                  795     31      4      3     10    265      3      1      2      1
3' Non-coding       259     18      7     NA     NA     NA     NA     NA     NA     NA

* Abbreviations are as designated in the text.
† Conservative and non-conservative differences are defined on the basis of physicochemical properties of amino acids and reflect similarity of function in three-dimensional conformation of proteins as discussed by George et al. (1990).
‡ NA, Not applicable.

Discussion

The basis for resistance to strain G2 of soybean plants containing the Rsv gene is unknown.
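The percentage columns of Table 2 follow from its totals, and the amino acid totals can be cross-checked against the cleavage positions of Fig. 4 and the genome length given in the abstract. The sketch below uses only numbers transcribed from Table 2; the variable and function names are illustrative, and rounding half up to whole percentages is an assumption about how the table was prepared.

```python
# Values transcribed from Table 2; region order follows the table.
REGIONS = ["5' non-coding", "35K", "HC-PRO", "42K", "CIP", "6K",
           "21K", "27K", "POL", "CP", "3' non-coding"]
NT_TOTAL = [131, 924, 1371, 1197, 1903, 161, 567, 729, 1551, 795, 259]
NT_DIFF = [13, 52, 54, 85, 155, 9, 36, 45, 78, 31, 18]
# Amino acid columns apply to the nine coding regions only (35K to CP).
AA_TOTAL = [308, 457, 399, 634, 54, 189, 243, 517, 265]
AA_DIFF = [20, 11, 22, 27, 0, 2, 2, 12, 3]


def percent(part, whole):
    """Percentage rounded half up to a whole number (assumed rounding rule)."""
    return int(100 * part / whole + 0.5)


def cumulative(values):
    """Running totals, i.e. the last residue of each mature protein."""
    totals, running = [], 0
    for value in values:
        running += value
        totals.append(running)
    return totals


nt_percent = [percent(d, t) for d, t in zip(NT_DIFF, NT_TOTAL)]
aa_percent = [percent(d, t) for d, t in zip(AA_DIFF, AA_TOTAL)]
cleavage_positions = cumulative(AA_TOTAL)
genome_length = sum(NT_TOTAL)
overall_identity = 100 - percent(sum(AA_DIFF), sum(AA_TOTAL))

for region, pct in zip(REGIONS, nt_percent):
    print(f"{region}: {pct}% nucleotide differences")
print("cleavage positions:", cleavage_positions)
print("genome length:", genome_length, "overall identity:", overall_identity)
```

The running totals of the amino acid column reproduce the positions 308, 765, 1164, 1798, 1852, 2041, 2284, 2801 and 3066, and the nucleotide totals sum to the 9588-nucleotide genome length.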
This report of the complete sequence of two closely related strains of a potyvirus, differentiated by their ability to infect soybean containing the Rsv resistance gene, should provide the basis for correlating host susceptibility/resistance with alterations in nucleotide sequence occurring among the strains. The Y-terminal proteins may be involved in the ability of a TVMV isolate to overcome host resistance (Hellman et al., 1990). We have shown that the region with the greatest number of differences between the two SMV strains is in the 5' region of the genome. In particular, the greatest number of non-conservative amino acid differences between strains G2 and G7 occurs in the 42K protein, followed by the 35K, CIP and HC-PRO proteins (Table 2). Although protease and vector transmission functions have been demonstrated for the 35K and HC-PRO proteins with reasonable certainty, other functions of proteins in the 5' region are only speculative. All of these proposed functions, however, could relate to host plant resistance; they include the suggestions that the 35K, 42K and CIP proteins may be involved in cell-to-cell movement (Domier et al., 1987), regulation of proteolytic processing of the viral polyprotein (Riechmann et al., 1992) and replication (Company et al., 1991; Koonin, 1991; Lain et al., 1990, 1991; Robaglia et al., 1989), respectively.

A previous report showed a strong correlation between the ability of TMV strains to overcome resistance and a change in local net charge, caused by single amino acid changes, in the putative replicase genes encoding the 126K and 183K proteins (Meshi et al., 1988). A comparison of the hydropathy profiles of the 35K, 42K and POL proteins showed differences in only the first two. The 35K protein of strain G7 showed, with respect to G2, an increase in local net charge at amino acid positions 13 to 25 and 244 to 259, and a decrease at positions 47 to 60 and 132 to 137 (Fig. 6).
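The text does not state how local net charge was computed, so the following is only a sketch of one common approach: count lysine and arginine as +1, aspartic and glutamic acid as -1, and sum the charges over a sliding window. The charge assignments, the window length and the function names are all assumptions made for illustration, and the example sequences are hypothetical.

```python
# Assumed side-chain charges at neutral pH; histidine is treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def local_net_charge(seq, window=7):
    """Net charge summed over each window of `window` consecutive residues."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    charges = [CHARGE.get(aa, 0) for aa in seq.upper()]
    return [sum(charges[i:i + window]) for i in range(len(charges) - window + 1)]


def charge_difference(seq_a, seq_b, window=7):
    """Window-by-window change in net charge going from seq_a to seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile_a = local_net_charge(seq_a, window)
    profile_b = local_net_charge(seq_b, window)
    return [b - a for a, b in zip(profile_a, profile_b)]


# Hypothetical peptides: a Glu-to-Lys change raises the local net charge by 2,
# whereas an Arg-to-Lys change leaves it unchanged.
print(charge_difference("AAEAAAA", "AAKAAAA", window=7))
print(charge_difference("GRDAA", "GKDAA", window=3))
```

With this scheme a substitution between two residues of the same charge class leaves the profile unchanged, so only changes that add, remove or invert a charge show up as regional increases or decreases of the kind described for the 35K protein.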
Upon comparison of the 42K protein of strain G7 with that of strain G2, an increase in local net charge at positions 15 to 29 and a decrease at positions 328 to 340 were evident. Although their significance is unknown, differences in local net charge have been proposed to affect electrostatic interactions between a host factor and non-structural viral proteins involved in resistance and susceptibility (Meshi et al., 1988).

Fig. 6. Comparison of hydropathy profiles (Kyte & Doolittle, 1982) of cistrons 35K and 42K of SMV strains G2 and G7. Numbers identify amino acid positions. Boxes outline regions that differ between strains. [Plots not recoverable from extraction.]

We have shown the presence of only three amino acid differences between the CP of G2 and G7, at amino acid positions 2809, 3018 and 3065 (Fig. 3 and Jayaram et al., 1991). Changes at positions 2809 and 3065 occur within the N- and C-terminal regions of the CP, which are known to be highly variable among potyviruses. But the change from methionine in strain G2 to isoleucine in G7 at amino acid position 3018 occurs within the trypsin-resistant core, which displays significant amino acid identity among all potyviruses examined (Ward & Shukla, 1991). A recent report has shown that a change from glycine to proline in the virus CP correlates with ability of a strain of potato virus X to overcome resistance (Kohm et al., 1991). We have also noted that a change in the amino acid tripeptide GRD to GKD (at 2121) in the 27K protein correlates with strain G7 infection of soybean plants containing the Rsv gene. The role (if any) that these differences play in the interaction between the virus and resistance gene product is unknown.
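A hydropathy profile of the kind cited for Fig. 6 (Kyte & Doolittle, 1982) is a moving average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the window length of 9 and the function names are assumptions, since the text does not give the settings used, and the peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each window of `window` consecutive residues."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def profile_shift(seq_a, seq_b, window=9):
    """Window-by-window change in mean hydropathy going from seq_a to seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile_a = hydropathy_profile(seq_a, window)
    profile_b = hydropathy_profile(seq_b, window)
    return [b - a for a, b in zip(profile_a, profile_b)]


# Hypothetical peptides carrying the two substitution types named in the text:
# Met-to-Ile (as at CP position 3018) and Arg-to-Lys (GRD to GKD).
print(profile_shift("AAAAMAAAA", "AAAAIAAAA", window=9))
print(profile_shift("GRD", "GKD", window=3))
```

On this scale a single methionine-to-isoleucine change raises the mean of a 9-residue window by 2.6/9, which illustrates why isolated substitutions produce only small shifts in a smoothed profile while clustered changes produce visible regional differences.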
However, since different viral proteins are involved in different resistance mechanisms, the results of this study provide the basis for determination of specific nucleotide sequences involved in overcoming host resistance. Chimeric full-length infectious transcripts generated by exchanging homologous regions of the two virus strains, as well as site-specific mutagenesis, will facilitate identification of these sequences.

The variability in CP may be a useful criterion for taxonomy of potyviruses (Shukla & Ward, 1989). The sequence identity of CP is greater than 50% among all potyviruses. However, the relative similarity of different viruses, when based on a single protein, is dependent upon which viral protein is compared. For example, based on CP, SMV is more closely related to TEV and PVY than to PPV, but based on CIP, POL, 21K and overall homology, SMV is most closely related to PPV (Table 1). Thus, it may be insufficient to characterize taxonomic relationships based upon a single protein. Furthermore, overall relatedness may be reflected best by comparison of biological properties such as host range as well as viral genes. The results reported here and those of Robaglia et al. (1989) demonstrate that both the 35K and 42K proteins show little similarity among potyviruses. However, because two strains of the same virus, i.e. SMV G7 and SMV G2, share 97% overall identity, comparison of the 35K and 42K proteins of potyvirus genomes may clarify the taxonomic position of closely related potyviruses as, for example, the distinction between SMV and watermelon mosaic virus 2 (Jayaram et al., 1991).

The authors thank J. C. Carrington for access to a manuscript before publication and Carol Manthey of the Iowa State University Nucleic Acids Research Facility for automated DNA sequencing. This research was supported in part by Pioneer Hi-Bred International, Incorporated and by the Iowa Soybean Promotion Board. Journal Paper No.
J-14720 of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa, U.S.A. Project No. 2428.

References

ALLISON, R. F., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
BUZZEL, R. I. & TU, J. C. (1984). Inheritance of soybean resistance to soybean mosaic virus. Journal of Heredity 75, 82.
CARRINGTON, J. C., CARY, S. M., PARKS, T. D. & DOUGHERTY, W. G. (1989). A second proteinase encoded by a plant potyvirus genome. EMBO Journal 8, 365-370.
CHEN, P., BUSS, G. R. & TOLIN, S. A. (1988). Inheritance of reaction to strains G5 and G6 of soybean mosaic virus (SMV) in differential soybean cultivars. Soybean Genetics Newsletter 15, 130-134.
CHO, E. & GOODMAN, R. M. (1979). Strains of soybean mosaic virus: classification based on virulence in resistant soybean cultivars. Phytopathology 69, 467-470.
COMPANY, M., ARENAS, J. & ABELSON, J. (1991). Requirement of the RNA helicase-like protein PRP22 for release of messenger RNA from spliceosomes. Nature, London 349, 487-493.
DEOM, C. M., OLIVER, M. J. & BEACHY, R. N. (1987). The 30kd gene product of tobacco mosaic virus potentiates virus movement. Science 237, 389-394.
DOMIER, L. L., FRANKLIN, K. M., SHAHABUDDIN, M., HELLMANN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F., LOMONOSSOFF, G. P., SHAW, J. G. & RHOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus RNA. Nucleic Acids Research 14, 5417-5430.
DOMIER, L. L., SHAW, J. G. & RHOADS, R. E. (1987). Potyviral proteins share amino acid sequence homology with picorna-, como-, and caulimoviral proteins. Virology 158, 20-27.
DOUGHERTY, W. G. & PARKS, T. D. (1991). Post-translational processing of the tobacco etch virus 49-kDa small nuclear inclusion polyprotein: identification of an internal cleavage site and delimitation of VPg and proteinase domains. Virology 183, 449-456.
DOUGHERTY, W. G., CARRINGTON, J. C., CARY, S. M. & PARKS, T. D. (1988). Biochemical and mutational analysis of a plant virus polyprotein cleavage site. EMBO Journal 7, 1281-1287.
EDWARDSON, J. R. & CHRISTIE, R. G. (1986). Viruses infecting forage legumes, vol. 2. Florida Agricultural Experiment Station Monograph Series no. 14.
GEORGE, D. G., BARKER, W. C. & HUNT, L. T. (1990). Mutation data matrix and its uses. Methods in Enzymology 183, 333-351.
GHABRIAL, S. A., SMITH, H. A., PARKS, T. D. & DOUGHERTY, W. G. (1990). Molecular genetic analyses of the soybean mosaic virus NIa proteinase. Journal of General Virology 71, 1921-1927.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
HELLMAN, G. M., THORNBURY, D. W. & PIRONE, T. P. (1990). Molecular analysis of tobacco vein mottling virus (TVMV) pathogenicity by infectious transcripts of chimeric potyviral cDNA genomes. Phytopathology 80, 1036 (abstract).
HILL, J. H. & BENNER, H. I. (1980a). Properties of soybean mosaic virus and its isolated protein. Phytopathologische Zeitschrift 97, 272-281.
HILL, J. H. & BENNER, H. I. (1980b). Properties of soybean mosaic virus ribonucleic acid. Phytopathology 70, 236-239.
HILL, J. H., BENNER, H. I., PERMAR, T. A., BAILEY, T. B., ANDREWS, R. E., JR, DURAND, D. P. & VAN DEUSEN, R. A. (1989). epidemiology of soybean mosaic virus in Iowa. Phytopathology 70, 536-540.
HILL, J. H., BENNER, H. I., PERMAR, T. A., BAILEY, T. B., ANDREWS, R. E., JR, DURAND, D. P. & VAN DEUSEN, R. A. (1989). Differentiation of soybean mosaic virus isolates by one-dimensional trypsin peptide maps immunoblotted with monoclonal antibodies. Phytopathology 79, 1261-1265.
HOLLINGS, M. & BRUNT, A. A. (1981). Potyvirus group. CMI/AAB Descriptions of Plant Viruses, no. 245.
JAYARAM, CH., HILL, J. H. & MILLER, W. A. (1991). Nucleotide sequences of the coat protein genes of two aphid-transmissible strains of soybean mosaic virus. Journal of General Virology 72, 1001-1003.
KAMER, G. & ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerase from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282.
KIHL, R. A. S. & HARTWIG, E. E. (1979). Inheritance of reaction to soybean mosaic virus in soybeans. Crop Science 19, 372-375.
KOHM, B., SANTA CRUZ, S., GOULDEN, M., KAVANAGH, T. & BAULCOMBE, D. (1991). Molecular study of resistance in Solanum tuberosum cv. Cara and potato virus X (PVX). Abstract No. 1225, Third International Congress of Plant Molecular Biology, Molecular Biology of Plant Growth and Development.
KOONIN, E. V. (1991). Similarities in RNA helicases. Nature, London 352, 290.
KYTE, J. & DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157, 105-132.
LAIN, S., RIECHMANN, J. L. & GARCÍA, J. A. (1990). RNA helicase: a novel activity associated with a protein encoded by a positive strand RNA virus. Nucleic Acids Research 18, 7003-7006.
LAIN, S., MARTIN, M. T., RIECHMANN, J. L. & GARCÍA, J. A. (1991). Novel catalytic activity associated with positive-strand RNA virus infection: nucleic acid-stimulated ATPase activity of the plum pox potyvirus helicase-like protein. Journal of Virology 65, 1-6.
LIM, S. M. (1985). Resistance to soybean mosaic virus in soybeans. Phytopathology 75, 199-201.
LUCAS, B. S. & HILL, J. H. (1980). Characteristics of the transmission of three soybean mosaic virus isolates by Myzus persicae and Rhopalosiphum maidis. Phytopathologische Zeitschrift 97, 47-53.
MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MATTANOVICH, D. & KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. Journal of General Virology 70, 513-524.
MAVANKAL, G. & RHOADS, R. E. (1991). In vitro cleavage at or near the N-terminus of the helper component protein in the tobacco vein mottling virus polyprotein. Virology 185, 721-731.
MESHI, T., WATANABE, Y., SAITO, T., SUGIMOTO, A., MAEDA, T. & OKADA, Y. (1987). Functions of the 30kd protein of tobacco mosaic virus: involvement in cell-to-cell movement and dispensability for replication. EMBO Journal 6, 2557-2563.
MESHI, T., MOTOYOSHI, F., ADACHI, A., WATANABE, Y., TAKAMATSU, N. & OKADA, Y. (1988). Two concomitant base substitutions in the putative replicase genes of tobacco mosaic virus confer the ability to overcome the effects of a tomato resistance gene, Tm-1. EMBO Journal 7, 1575-1581.
MIERENDORF, R. C. & PFEFFER, D. (1987). Sequencing of RNA transcripts synthesized in vitro from plasmids containing bacteriophage promoters. Methods in Enzymology 152, 563-566.
MOSER, O., GAGEY, M.-J., GODEFROY-COLBURN, T., STUSSI-GARAUD, C., ELLWART-TSCHORTZ, M., NITSCHKO, H. & MUNDRY, K.-W. (1988). The fate of the transport protein of tobacco mosaic virus in systemic and hypersensitive tobacco hosts. Journal of General Virology 69, 1367-1373.
PARKS, T. D. & DOUGHERTY, W. G. (1991). Substrate recognition by the NIa proteinase of two potyviruses involves multiple domains: characterization using genetically engineered hybrid proteinase molecules. Virology 182, 17-27.
PONZ, F., GLASCOCK, C. B. & BRUENING, G. (1988). An inhibitor of polyprotein processing with the characteristics of a natural virus resistance factor. Molecular Plant-Microbe Interactions 1, 25-31.
RIECHMANN, J. L., LAIN, S. & GARCÍA, J. A. (1992). Highlights and prospects of potyvirus molecular biology. Journal of General Virology 73, 1-16.
ROBAGLIA, C., DURAND-TARDIF, M., TRONCHET, M., BOUDAZIN, G., ASTIER-MANIFACIER, S. & CASSE-DELBART, F. (1989). Nucleotide sequence of potato virus Y (N strain) genomic RNA. Journal of General Virology 70, 935-947.
SANDERSON, J. L., BRUENING, G. & RUSSELL, M. L. (1985). Possible molecular basis of immunity of cowpeas to cowpea mosaic virus. UCLA Symposia on Molecular and Cell Biology, New Series 22, 401-412.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SHUKLA, D. D. & WARD, C. W. (1989). Identification and classification of potyviruses on the basis of coat protein sequence data and serology. Archives of Virology 106, 171-200.
SIEGEL, A. (1979). Recognition and specificity in plant virus infection. In Plant Resistance to Viruses, pp. 109-113. Edited by D. Evered & S. Harnett. Chichester: John Wiley and Sons.
TALIANSKY, M. E., MALYSHENKO, S. I., PSHENNIKOVA, E. S. & ATABEKOV, J. G. (1982). Plant virus-specific transport functions. II. A factor controlling host range. Virology 122, 327-331.
VANCE, V. B. & BEACHY, R. N. (1984a). Translation of soybean mosaic virus RNA in vitro: evidence for protein processing. Virology 132, 271-281.
VANCE, V. B. & BEACHY, R. N. (1984b). Detection of genomic-length soybean mosaic virus RNA on polyribosomes of infected soybean leaves. Virology 132, 26-36.
VERCHOT, J.-M., KOONIN, E. V. & CARRINGTON, J. C. (1991). The 35-kDa protein from the N-terminus of the potyviral polyprotein functions as a third virus-encoded protease. Virology 185, 527-535.
WARD, C. W. & SHUKLA, D. D. (1991). Taxonomy of potyviruses: current problems and some solutions. Intervirology 32, 269-296.

(Received 25 November 1991; Accepted 31 March 1992)