Journal of General Virology (1992), 73, 2067-2077. Printed in Great Britain
Complete nucleotide sequences of two soybean mosaic virus strains
differentiated by response of soybean containing the Rsv resistance gene
Ch. Jayaram, John H. Hill* and W. Allen Miller
Department of Plant Pathology, Iowa State University, Ames, Iowa 50011, U.S.A.
The complete nucleotide sequences of the genomic RNAs of strains G2 and G7 of soybean mosaic virus were determined. In both cases, the genome is 9588 nucleotides long, excluding the 3'-terminal poly(A) sequence. A large open reading frame (nucleotides 132 to 9329) encodes a polyprotein of 3066 amino acids with a predicted Mr of either 349542 (strain G2) or 349741 (strain G7). Based on comparison with the proposed locations of cleavage sites of other potyvirus polyproteins, nine mature proteins are predicted. The mature proteins of the two strains share 94 to 100% amino acid identity, with the greatest variability occurring in the 35K and 42K proteins. Differences in local net charge in portions of these proteins, as well as differences in amino acid sequence throughout the genome, are discussed in relation to resistance and susceptibility of host plants to strains G2 and G7. Comparison with other potyviruses may be useful for taxonomic clarification of viruses and strains.
Introduction
Soybean mosaic virus (SMV), a member of the potyvirus group of plant viruses (Hollings & Brunt, 1981), causes one of the most widespread viral diseases of soybean. Several strains of the virus have been identified on the basis of both the phenotypic response of differential soybean lines (Buzzel & Tu, 1984; Chen et al., 1988; Cho & Goodman, 1979) and transmission by aphid species (Lucas & Hill, 1980).

Like other potyviruses, SMV genomic RNA encodes a large precursor polyprotein that is processed by a virus-encoded protease(s) (Ghabrial et al., 1990) to yield several proteins (Vance & Beachy, 1984a, b). Unlike those potyviruses whose genomes have been extensively characterized [i.e. tobacco etch virus (TEV; Allison et al., 1986), potato virus Y (PVY; Robaglia et al., 1989), tobacco vein mottling virus (TVMV; Domier et al., 1986) and plum pox virus (PPV; Maiss et al., 1989)], SMV is seed-borne (Hill et al., 1980), and its genome structure had not been fully characterized.

Host resistance can occur by interruption of the virus life cycle at one or more of several stages. Siegel (1979) identified six such steps upon which resistance could act: (i) entry into the cell, (ii) uncoating of the nucleic acid, (iii) translation of viral proteins, (iv) replication of the viral nucleic acid, (v) assembly of progeny virions and (vi) spread of the virus, both to new cells and to new hosts. At the molecular level, evidence has supported resistance mechanisms involving alterations in the function of virus-encoded protease, movement or replicase proteins.

One of the best characterized examples of host resistance is that of the cowpea cultivar Arlington to cowpea mosaic virus. In vitro studies suggested that Arlington leaves contain a protease inhibitor that inhibits proteolytic processing of a virus-encoded polyprotein (Sanderson et al., 1985). A translation inhibitor, although not specific to the viral RNA, may also be involved in the resistance mechanism (Ponz et al., 1988).

A different resistance mechanism involves blocking of cell-to-cell movement from the initial site of replication. A single protein of Mr 30 000 (30K) encoded by the tobacco mosaic virus (TMV) genome has been shown to potentiate cell-to-cell movement of TMV in tobacco plants (Deom et al., 1987; Meshi et al., 1987). It has been speculated that, in plants which do not support systemic movement of TMV, the 30K protein is either not expressed properly or is unable to act on cells (Moser et al., 1988; Taliansky et al., 1982).

At least three genes, Rsv, Rsv2 and Rsv3, confer resistance to various strains of SMV (Buzzel & Tu, 1984; Kihl & Hartwig, 1979; Lim, 1985). However, the resistance conferred by each gene can be overcome by different strains. For example, strain G7 overcomes resistance conferred by the Rsv gene in the soybean line PI 96983, whereas several other SMV strains, namely G1 to G6, do not overcome resistance conferred by this gene (Lim, 1985).

To improve understanding of the resistance mechanism conferred by the Rsv gene, the genomes of strains G2 (unable to induce disease in plants containing the Rsv gene) and G7 (able to induce disease in plants containing the Rsv gene) have been sequenced, and their amino acid sequences deduced. The potential pathogenic relevance of differences found between the genomic sequences is described here. The sequence data are consistent with a genome organization similar to that of other potyviruses (see review by Riechmann et al., 1992) and are relevant to potyvirus taxonomy.

0001-0752 © 1992 SGM
Methods
Virus purification, RNA isolation, cDNA synthesis and cloning. The origins of strains G2 and G7 of SMV and their purification have been described (Hill & Benner, 1980a; Hill et al., 1989). Viral RNA was isolated from purified virions according to the method of Vance & Beachy (1984a). cDNA was synthesized by the method of Gubler & Hoffman (1983), using a modified kit (Pharmacia), and cloned into a pGEM3Zf(+) vector (Promega). Initially, a random-primed cDNA library was constructed using the RNA of strain G7. Approximately 50 clones were sequenced and mapped to different regions of the genome by comparison with published potyviral sequences. From these data, four oligonucleotide primers were synthesized for use in cloning the different regions of the genomes of strains G2 and G7.
cDNA sequencing. The cDNA clones were sequenced both manually and with an Applied Biosystems 370A automated DNA sequencing system using the dideoxynucleotide method (Sanger et al., 1977) and Taq polymerase (Promega). Overlapping cDNA clones of different sizes were used to eliminate almost completely the need for subcloning. Every base was determined by sequencing at least two independent clones or by sequencing a single clone twice.
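The confirmation rule above (every base read at least twice, either from two independent clones or from one clone sequenced twice) amounts to requiring a read depth of at least two at every genome position. A minimal sketch of that check follows, assuming each sequencing read is summarized as hypothetical 1-based, inclusive (start, end) genome coordinates; the function name is illustrative and not part of the authors' workflow.

```python
def underconfirmed_positions(genome_length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Return 1-based positions covered by fewer than two sequencing reads.

    Each read is a (start, end) pair of 1-based, inclusive coordinates.
    Two reads over a base may come from two independent clones or from the
    same clone sequenced twice; either satisfies the rule.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, end in reads:
        if not 1 <= start <= end <= genome_length:
            raise ValueError(f"read ({start}, {end}) lies outside the genome")
        diff[start] += 1
        diff[end + 1] -= 1

    weak = []
    depth = 0
    for pos in range(1, genome_length + 1):
        depth += diff[pos]  # running sum gives the read depth at this base
        if depth < 2:
            weak.append(pos)
    return weak


if __name__ == "__main__":
    # Toy 10-nt "genome" with three reads; positions 7 to 10 are read only once.
    print(underconfirmed_positions(10, [(1, 6), (4, 10), (1, 3)]))  # prints [7, 8, 9, 10]
```

The difference-array approach touches each read once and each position once, so it stays linear even for a 9588-nt genome with many overlapping clones.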
RNA sequencing and 5' end determination. RNA was sequenced directly using a modified procedure of Mierendorf & Pfeffer (1987). In a volume of 10 µl, a mixture of 1 µg of viral RNA and 10 pmol of a 25-mer primer, which anneals between bases 66 and 90 at the 5' end, was heated for 3 min at 75 °C and allowed to cool to 42 °C. Two µl (16 units) of avian myeloblastosis virus reverse transcriptase (Promega) and 4 µl of [α-32P]dATP (400 Ci/mmol) were added to the mixture. A 3 µl aliquot was removed and added to 3 µl of a solution containing 250 µM each of dCTP, dGTP and dTTP and one of the four dideoxyNTPs (172 µM ddCTP, 15.3 µM ddATP, 1 mM ddTTP or 250 µM ddGTP). The reactions were incubated at 42 °C for 15 min, after which 1 µl of chase solution containing 2 mM each of all four dNTPs and 2.5 units of terminal deoxynucleotidyl transferase (BRL) was added, and the mixture was incubated at 42 °C for an additional 15 min.
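Because each 3 µl aliquot is mixed with an equal volume of termination solution, the nucleotide concentrations during chain termination are half the listed stock values. The sketch below makes that C1V1 = C2V2 arithmetic explicit. It assumes volumes are simply additive and ignores the later 1 µl chase, which changes the concentrations again; the names are illustrative.

```python
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Solve C1 * V1 = C2 * V2 for C2 (returned in the units of `stock`)."""
    if added_ul <= 0 or added_ul > total_ul:
        raise ValueError("added volume must be positive and no larger than the total")
    return stock * added_ul / total_ul


# Stock concentrations (µM) of the termination solutions listed in Methods.
DIDEOXY_STOCKS_UM = {"ddCTP": 172.0, "ddATP": 15.3, "ddTTP": 1000.0, "ddGTP": 250.0}
DNTP_STOCK_UM = 250.0  # each of dCTP, dGTP and dTTP

ALIQUOT_UL = 3.0       # primed RNA / reverse transcriptase mixture
TERMINATION_UL = 3.0   # termination solution added to it
TOTAL_UL = ALIQUOT_UL + TERMINATION_UL

final_dideoxy_um = {
    name: final_concentration(stock, TERMINATION_UL, TOTAL_UL)
    for name, stock in DIDEOXY_STOCKS_UM.items()
}
final_dntp_um = final_concentration(DNTP_STOCK_UM, TERMINATION_UL, TOTAL_UL)

if __name__ == "__main__":
    for name, conc in final_dideoxy_um.items():
        print(f"{name}: {conc:g} µM")
    print(f"dCTP/dGTP/dTTP: {final_dntp_um:g} µM each")
```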
Nucleotide sequence alignment and data analysis. Sequence data were compiled and analysed using sequence analysis software from the Genetics Computer Group (version 6.0; Madison, Wis., U.S.A.) and an IBM-compatible program by W. R. Bottomley (CSIRO, Division of Plant Industry, Canberra, Australia).
Results
Phenotype of soybean plants inoculated with virus strains
Soybean lines PI 96983 and Williams '82 responded
differently to mechanical inoculation with strains SMV
G2 and G7 (Fig. 1). PI 96983, containing the resistance
gene Rsv, was resistant to strain G2, but systemic
necrosis developed in plants inoculated with strain G7.
Systemic mottling occurred when Williams '82, which
lacks the Rsv gene, was inoculated with either strain.
Nucleotide sequence analysis of SMV strains G2 and G7
The SMV G2 and G7 cDNA inserts from overlapping sets of 42 and 51 cDNA clones, respectively, were chosen for nucleotide sequence analysis. These cDNA inserts cover the entire genomes of G2 and G7 except the 5'-most 27 (strain G2) and 25 (strain G7) nucleotides, which were determined by direct RNA sequencing.
Genome organization of SMV RNA
The genomic RNA of both strains of SMV is 9588 nucleotides (nt) long, excluding the 3'-terminal poly(A) sequence (Fig. 2). This is comparable to the genomes of TVMV (9471 nt; Domier et al., 1986), TEV (9495 nt; Allison et al., 1986), PVY (9704 nt; Robaglia et al., 1989) and PPV (9741 nt; Maiss et al., 1989). The base composition of both strains was 32% adenine, 24% guanine, 18% cytosine and 26% uracil, in agreement with previous observations for G2 (Hill & Benner, 1980b). The base composition is nearly identical to that of TVMV (Domier et al., 1986).
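Base composition figures such as those above are percentages of each nucleotide over the sequence length. A minimal sketch follows; because the sequence figure is not available in machine-readable form here, it is demonstrated on a toy sequence, the function name is hypothetical, and ambiguity codes (e.g. N) are assumed to be excluded from the denominator.

```python
from collections import Counter


def base_composition(rna: str) -> dict[str, float]:
    """Percentage of A, G, C and U; T is treated as U and other symbols are ignored."""
    seq = rna.upper().replace("T", "U")
    counts = Counter(seq)
    total = sum(counts[base] for base in "AGCU")
    if total == 0:
        raise ValueError("sequence contains no A, G, C or U")
    return {base: 100.0 * counts[base] / total for base in "AGCU"}


if __name__ == "__main__":
    print(base_composition("AAAUCGGUUA"))  # prints {'A': 40.0, 'G': 20.0, 'C': 10.0, 'U': 30.0}
```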
Computer translation of the RNAs and their complements revealed a single, large open reading frame (ORF) beginning at the first AUG on the genome (base 132) and terminating with a UAA codon at position 9330. This differs from PPV and TVMV, which appear to initiate translation of the polyprotein at the second and third AUG codons, respectively, in the genome. The large ORF of SMV encodes a 3066 amino acid polyprotein with an Mr of 349542 (G2) or 349741 (G7) (Fig. 3).
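The ORF coordinates can be cross-checked arithmetically: from the first AUG at base 132 to the UAA at base 9330 there are (9330 - 132)/3 = 3066 sense codons, matching the 3066 amino acid polyprotein. That leaves a 131-nt 5' leader and, since the stop codon ends at base 9332, a 256-nt 3' non-coding region before the poly(A) tail. The sketch below locates the ORF that begins at the first AUG, as described for SMV; it is an illustration on a toy sequence, not the Genetics Computer Group software the authors used, and the function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_aug_orf(rna: str) -> tuple[int, int, int]:
    """Find the ORF starting at the first AUG.

    Returns (start, stop_codon_start, sense_codons) in 1-based coordinates;
    the stop codon itself is not counted.
    """
    seq = rna.upper().replace("T", "U")
    start = seq.find("AUG")
    if start == -1:
        raise ValueError("no AUG codon found")
    for i in range(start, len(seq) - 2, 3):  # walk in frame, one codon at a time
        if seq[i:i + 3] in STOP_CODONS:
            return start + 1, i + 1, (i - start) // 3
    raise ValueError("no in-frame stop codon after the first AUG")


# Cross-check of the coordinates reported for SMV G2 and G7.
GENOME_NT, START, STOP = 9588, 132, 9330
assert (STOP - START) % 3 == 0
POLYPROTEIN_AA = (STOP - START) // 3      # 3066
LEADER_5_NT = START - 1                   # 131
TRAILER_3_NT = GENOME_NT - (STOP + 2)     # 256, excluding poly(A)

if __name__ == "__main__":
    print(first_aug_orf("GGAUGAAACCCUAAGG"))         # prints (3, 12, 3)
    print(POLYPROTEIN_AA, LEADER_5_NT, TRAILER_3_NT)  # prints 3066 131 256
```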
Polyprotein cleavage sites
Based upon the proposed locations of cleavage sites and sizes of predicted mature (fully processed) proteins of TEV, TVMV, PVY and PPV (Carrington et al., 1989; Domier et al., 1986; Dougherty & Parks, 1991; Dougherty et al., 1988; Ghabrial et al., 1990; Maiss et al., 1989; Parks & Dougherty, 1991; Robaglia et al., 1989), and upon alignments of amino acid sequences for each protein, nine mature proteins are predicted for SMV (Fig. 4). At least five sites are cleaved by the nuclear inclusion (NI) protein a (NIa) (27K) protease (Parks & Dougherty, 1991). The consensus cleavage site for this protease from SMV G6 is (E/N)XVXXQ'(G/S) (Ghabrial et al., 1990). [Amino acids in parentheses represent alternatives at that position relative to the cleavage site, which is shown by the ' symbol. X represents any amino acid.] All sites in Fig. 4 that contain a Q are those predicted to be cleaved by the 27K protease. Cleavage between amino acids 2041 and 2042 is a late event separating the VPg (viral protein genome-linked) (21K) from the protease (27K) (Dougherty & Parks, 1991). Although this cleavage has been shown only for TEV, these authors reported a consensus sequence of (E/Q)(D/E/R)(L/V)XXE'(G/S/A)(E/K)(S/A)(L/V) at this site among known potyviruses.

Fig. 1. Phenotypic responses of Williams '82 and PI 96983 soybeans to inoculation with strains G2 and G7 of SMV. Williams '82 plants inoculated with (a) G2 and (b) G7 were susceptible and showed systemic mosaic, whereas PI 96983 was immune to (c) G2 but systemic necrosis occurred in plants inoculated with (d) G7.
Carrington et al. (1989) identified the cleavage site G'G at the C terminus of the helper component, which catalyses its own cleavage from the polyprotein at this site. Thus, this region is designated HC-PRO (helper component-protease). The N-terminal protein (35K in SMV) also serves as a protease to cleave itself from the polyprotein (Verchot et al., 1991). A consensus of (Y/F)'S has been reported by Mavankal & Rhoads (1991). Full-length sequences published before this information was available used Q'S(G/S/A) (the NIa protease consensus) as the cleavage site for all the mature proteins. This led to some predictions of different termini, which we have revised in the alignments of the 35K and 42K ORFs (Fig. 5).
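The cleavage-site notation used above (alternatives in parentheses, X for any residue, ' for the scissile bond) maps directly onto regular expressions. The sketch below converts that notation and scans a protein sequence for candidate sites, using a zero-width lookahead so that overlapping candidates are all reported. It is a hypothetical helper for illustration only: a consensus match flags candidates, whereas the SMV sites were predicted from comparisons with other potyviruses and sequence alignments.

```python
import re


def motif_to_regex(motif: str) -> tuple[str, int]:
    """Convert notation such as "(E/N)XVXXQ'(G/S)" into a regular expression.

    Parenthesized residues are alternatives, X is any residue and the
    apostrophe marks the scissile bond. Returns (regex, residues_before_cut).
    """
    parts: list[str] = []
    cut = None
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            parts.append("[" + motif[i + 1:close].replace("/", "") + "]")
            i = close + 1
            continue
        if ch == "'":
            cut = len(parts)  # every entry in parts stands for exactly one residue
        elif ch == "X":
            parts.append(".")
        else:
            parts.append(ch)
        i += 1
    if cut is None:
        raise ValueError("motif has no ' cleavage marker")
    return "".join(parts), cut


def predicted_cuts(protein: str, motif: str) -> list[int]:
    """1-based positions p for which the motif predicts cleavage between p and p + 1."""
    regex, cut = motif_to_regex(motif)
    # Zero-width lookahead so that overlapping candidate sites are all found.
    return [m.start() + cut for m in re.finditer(f"(?={regex})", protein.upper())]


NIA_SITE = "(E/N)XVXXQ'(G/S)"                                 # NIa protease (SMV G6)
VPG_NIA_SITE = "(E/Q)(D/E/R)(L/V)XXE'(G/S/A)(E/K)(S/A)(L/V)"  # VPg/protease junction
HC_PRO_SITE = "G'G"                                           # HC-PRO C terminus
P1_SITE = "(Y/F)'S"                                           # Mavankal & Rhoads (1991)

if __name__ == "__main__":
    print(motif_to_regex(NIA_SITE))                  # prints ('[EN].V..Q[GS]', 6)
    print(predicted_cuts("MAKEAVRLQGTT", NIA_SITE))  # prints [9]
```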
[Figure: aligned nucleotide sequences of SMV G2 and G7, nt 1 to 9588]
Fig. 2. Nucleotide sequences of SMV strains G2 and G7. The full sequence of the G2 strain is shown. Bases of the G7 sequence that differ from G2 are shown below the G2 sequence.
Table 1. Percentage amino acid sequence identity of predicted mature proteins of SMV strain G7 with those of other potyviruses including SMV strain G2

                       SMV G7 protein*
Virus    35K  HC-PRO  42K  CIP   6K  21K  27K  POL   CP
G2        94      98   94   96  100   99   99   98   99
PPV       14      44   29   72   32   53   48   60   51
TVMV       6      45   13   51   32   45   47   54   52
PVY       13      37   17   52   32   45   35   56   64
TEV        9      43   25   51   45   46   34   55   60

* Abbreviations are as designated in the text.
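The comparisons the text draws from Table 1 (PPV closest for POL, CIP and 21K; PVY and TEV closest for CP; SMV most similar to PPV overall) can be checked directly against the tabulated values. The sketch below transcribes the table and ranks the four other potyviruses. The "overall" ranking uses an unweighted mean over the nine proteins, which is an assumption made for illustration and not necessarily how the authors reached their conclusion.

```python
# Percentage amino acid identity of SMV G7 proteins, transcribed from Table 1.
PROTEINS = ["35K", "HC-PRO", "42K", "CIP", "6K", "21K", "27K", "POL", "CP"]
IDENTITY = {
    "G2":   [94, 98, 94, 96, 100, 99, 99, 98, 99],
    "PPV":  [14, 44, 29, 72, 32, 53, 48, 60, 51],
    "TVMV": [6, 45, 13, 51, 32, 45, 47, 54, 52],
    "PVY":  [13, 37, 17, 52, 32, 45, 35, 56, 64],
    "TEV":  [9, 43, 25, 51, 45, 46, 34, 55, 60],
}
OTHER_POTYVIRUSES = ["PPV", "TVMV", "PVY", "TEV"]


def closest_virus_per_protein() -> dict[str, str]:
    """For each protein, the non-SMV potyvirus with the highest identity to SMV G7."""
    best = {}
    for col, protein in enumerate(PROTEINS):
        best[protein] = max(OTHER_POTYVIRUSES, key=lambda virus: IDENTITY[virus][col])
    return best


def mean_identity(virus: str) -> float:
    """Unweighted mean over the nine proteins (protein length is ignored)."""
    values = IDENTITY[virus]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(closest_virus_per_protein())
    print({v: round(mean_identity(v), 1) for v in OTHER_POTYVIRUSES})
    # prints {'PPV': 44.8, 'TVMV': 38.3, 'PVY': 39.0, 'TEV': 40.9}
```

A length-weighted mean would be a reasonable alternative, since a one-point change in CIP identity covers far more residues than the same change in the 6K protein.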
Comparison of mature proteins with those of other potyviruses
The nucleotide and amino acid sequences of SMV G2 and G7 were 94% and 97% identical, respectively, with changes more pronounced in the 5' region (Tables 1 and 2). Based on the proposed genomic map of SMV, an amino acid sequence comparison of strain G7 with TEV, TVMV, PVY, PPV and strain G2 (Table 1) shows that the most conserved regions among the five potyviruses are the cylindrical inclusion protein (CIP), the putative RNA-dependent RNA polymerase (POL; Robaglia et al., 1989) and the coat protein (CP). The POL, CIP and 21K proteins show more similarity to homologous proteins of PPV than to those of the other potyviruses. In contrast, SMV CP shows greater similarity to PVY and TEV CP. Overall, SMV is most similar to PPV.

[Figure: deduced amino acid sequences of SMV G2 and G7, residues 1 to 3066]
Fig. 3. Deduced amino acid sequences of SMV strains G2 and G7. The full sequence of the G2 strain is shown. Amino acids of the G7 sequence that differ from G2 are shown below the G2 sequence. Scissors indicate predicted cleavage sites.
The POL protein of SMV is analogous to the NIb of
TEV (Dougherty & Parks, 1991), but nuclear inclusions
are not evident in SMV-infected cells (Edwardson &
Christie, 1986). The POL protein was identified as the
polymerase because it contains the conserved sequence
GX₃TX₃N(X)₂₀₋₄₀GDD (X, any amino acid; subscripts give the number of X residues) at amino acids 2595 to
2637. This fits the consensus of virtually all known RNA-dependent RNA polymerases (Kamer & Argos, 1984).
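As an illustration only, the consensus can be written as a regular expression and used to scan a protein sequence. This is a minimal Python sketch; the function name `find_rdrp_motifs` and the toy sequence are hypothetical, and a match merely flags a candidate motif rather than demonstrating polymerase activity.

```python
import re

# GX3TX3N(X)20-40GDD, with "." standing for any amino acid (X).
RDRP_MOTIF = re.compile(r"G.{3}T.{3}N.{20,40}GDD")


def find_rdrp_motifs(protein: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of non-overlapping motif matches."""
    return [(m.start() + 1, m.end()) for m in RDRP_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence: G, 3 residues, T, 3 residues, N, 31 residues, then GDD.
toy = "MKV" + "GAAAT" + "SSSN" + "L" * 31 + "GDD" + "KE"
print(find_rdrp_motifs(toy))  # [(4, 46)]
```

The toy match spans 43 residues, the same length as the stretch from 2595 to 2637 reported for SMV POL. Because `.{20,40}` is greedy, the pattern prefers the most distant GDD that still fits the 20 to 40 residue spacing.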
The 21K and the 27K proteins of SMV are analogous
to the NIa protein of TEV and, by comparison with TEV
(Dougherty & Parks, 1991), consist of a VPg and a
protein processing activity, respectively. The tripeptide
GRD (amino acid positions 2120 to 2122; Fig. 3) in the
27K protein is conserved in TVMV, TEV, PVY and
PPV (Domier et al., 1986; Allison et al., 1986; Robaglia
et al., 1989; Maiss et al., 1989), with the aspartic acid
residue predicted as the active site (Parks & Dougherty,
1991). This tripeptide is also conserved in strain G2. In
strain G7, however, the arginine residue at amino acid
position 2121 is changed to lysine (GKD).

[Fig. 4 map: 35K (1 to 308), HC-PRO (309 to 765), 42K (766 to 1164), CIP (1165 to 1798), 6K (1799 to 1852), 21K (1853 to 2041), 27K (2042 to 2284), POL (2285 to 2801), CP (2802 to 3066)]
Fig. 4. Proposed map of the SMV polyprotein. The amino acids between
which cleavage occurs and their positions in the polyprotein are shown
above and below the map, respectively.
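The map positions follow from the protein lengths listed in Table 2: each predicted cleavage site lies after the running total of the preceding lengths. A minimal Python sketch (the names are illustrative, not from the paper) that reproduces the boundaries and reports which predicted protein contains a given polyprotein position, such as residue 2121:

```python
from bisect import bisect_left
from itertools import accumulate

# Predicted SMV proteins in polyprotein order, lengths in amino acids (Table 2, column 4).
PROTEINS = ["35K", "HC-PRO", "42K", "CIP", "6K", "21K", "27K", "POL", "CP"]
LENGTHS = [308, 457, 399, 634, 54, 189, 243, 517, 265]

# The last residue of each protein is the running total of the lengths.
ENDS = list(accumulate(LENGTHS))


def protein_at(position: int) -> str:
    """Return the predicted protein containing a 1-based polyprotein position."""
    if not 1 <= position <= ENDS[-1]:
        raise ValueError(f"position {position} is outside 1..{ENDS[-1]}")
    # Index of the first protein whose last residue is >= position.
    return PROTEINS[bisect_left(ENDS, position)]


print(ENDS)              # [308, 765, 1164, 1798, 1852, 2041, 2284, 2801, 3066]
print(protein_at(2121))  # 27K
print(protein_at(3018))  # CP
```

`bisect_left` returns the index of the first boundary that is not smaller than the position, so a residue sitting exactly on a boundary (for example 308) is assigned to the protein that ends there.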
The CIP protein of SMV shares conserved domains
with a group of proteins believed to be helicases
(Company et al., 1991; Koonin, 1991), including the P80
protein of bovine viral diarrhoea virus, dengue 4 virus
non-structural protein 3, mammalian translation initiation factor 4A and PPV CIP [recently shown to have
RNA helicase and RNA-dependent ATPase activity
(Lain et al., 1990, 1991)]. The protein also has a
conserved nucleotide binding site at amino acid position
1249, a characteristic of potyviruses (Robaglia et al.,
1989).

[Fig. 5 alignment: portions of the SMV G2 35K protein and the complete 42K protein aligned with the homologous proteins of PPV, PVY, TEV and TVMV]
Fig. 5. Amino acid sequence alignments of portions of the SMV G2 35K protein and all of the 42K protein with those of other potyviruses.
Each sequence was aligned with G2 in pairwise fashion using the program BESTFIT (GCG sequence analysis software). Amino acids
are shown in bold upper-case letters where three or more align. G7 was identical to G2 in all conserved (bold) amino acids. Numbers in
the 35K alignment indicate positions of amino acids. Intervening amino acids that showed no significant alignments (indicated by
double hyphens) are not shown.

Table 2. Nucleotide and amino acid differences between strains G2 and G7 of SMV

Nucleotides: (1) total; (2) total differences; (2/1) percentage differences; (3) total differences leading to amino acid changes; (3/2) percentage of differences leading to amino acid changes.
Amino acids: (4) total; (5) total differences; (5/4) percentage differences; (6) total conservative differences†; (7) total non-conservative differences†.

Region*          (1)   (2) (2/1)   (3) (3/2)   (4)   (5) (5/4)   (6)   (7)
5' Non-coding    131    13    10    NA    NA    NA    NA    NA    NA    NA
35K              924    52     6    24    46   308    20     6    13     7
HC-PRO          1371    54     4    14    26   457    11     2     8     3
42K             1197    85     7    29    34   399    22     6    15     7
CIP             1903   155     8    37    24   634    27     4    20     7
6K               161     9     6     0     0    54     0     0     0     0
21K              567    36     6     3     8   189     2     1     1     1
27K              729    45     6     2     4   243     2     1     1     1
POL             1551    78     5    13    17   517    12     2     9     3
CP               795    31     4     3    10   265     3     1     2     1
3' Non-coding    259    18     7    NA    NA    NA    NA    NA    NA    NA

* Abbreviations are as designated in the text.
† Conservative and non-conservative differences are defined on the basis of the physicochemical properties of amino acids and reflect similarity of
function in the three-dimensional conformation of proteins, as discussed by George et al. (1990).
NA, not applicable.
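The counts in Table 2 can be derived codon by codon from two aligned coding sequences, and the percentage columns are whole-number ratios of those counts. The Python sketch below assumes gap-free, in-frame sequences and the standard genetic code; it treats column (3) as the number of nucleotide mismatches that fall in codons whose amino acid differs, which is one reading of that column rather than the authors' documented procedure. The conservative/non-conservative split of columns (6) and (7) relies on George et al. (1990) and is not attempted here.

```python
# Standard genetic code: codons enumerated in TCAG order, "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def compare_coding_regions(cds_a: str, cds_b: str) -> tuple[int, int, int]:
    """Return (nt differences, nt differences in amino-acid-changing codons, aa differences).

    Assumes two gap-free, in-frame sequences of equal length (T or U accepted);
    ambiguous bases such as N raise KeyError.
    """
    a = cds_a.upper().replace("U", "T")
    b = cds_b.upper().replace("U", "T")
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    nt = nt_changing = aa = 0
    for i in range(0, len(a), 3):
        codon_a, codon_b = a[i:i + 3], b[i:i + 3]
        n = sum(x != y for x, y in zip(codon_a, codon_b))
        nt += n
        if n and CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
            nt_changing += n  # all mismatches in a codon whose residue changes
            aa += 1
    return nt, nt_changing, aa


def percent(part: int, whole: int) -> int:
    """Whole-number percentage, as in the percentage columns of Table 2."""
    return round(100 * part / whole)


# Coding-region rows of Table 2: (region, (1), (2), (3), (4), (5)).
TABLE2 = [
    ("35K", 924, 52, 24, 308, 20),
    ("HC-PRO", 1371, 54, 14, 457, 11),
    ("42K", 1197, 85, 29, 399, 22),
    ("CIP", 1903, 155, 37, 634, 27),
    ("6K", 161, 9, 0, 54, 0),
    ("21K", 567, 36, 3, 189, 2),
    ("27K", 729, 45, 2, 243, 2),
    ("POL", 1551, 78, 13, 517, 12),
    ("CP", 795, 31, 3, 265, 3),
]

for region, nt_total, nt_diff, nt_aa, aa_total, aa_diff in TABLE2:
    # Columns (2/1), (3/2) and (5/4).
    print(region, percent(nt_diff, nt_total), percent(nt_aa, nt_diff), percent(aa_diff, aa_total))

print(compare_coding_regions("ATGCGTGAA", "ATACGCAAA"))  # (3, 2, 2)
```

The printed ratios reproduce columns (2/1), (3/2) and (5/4). Summing the coding rows gives 3066 residues and 99 amino acid differences, roughly 97% identity, in line with the overall figure quoted in the Discussion. In the toy call, the silent CGT to CGC change counts in column (2) but not in column (3).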
The HC-PRO, the 35K protein and the 42K protein,
all encoded near the 5' end, show markedly less similarity
to the homologous proteins of other potyviruses. The
HC-PRO shares between 37% and 45% identity, whereas
the 42K has 13% to 29% identity, and the identity of the
35K protein is an insignificant 6% to 14%. Although the
amino acid sequences of both 35K and 42K are known to
be highly variable among members of the potyvirus
group (e.g. Fig. 5), there are no differences between the
two SMV strains in the conserved regions of these
proteins (Fig. 3).
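The identity ranges above were obtained with BESTFIT. The sketch below shows one common way to express identity for a pair of already-aligned sequences: identical residues divided by the columns in which neither sequence has a gap. It is an assumption that this matches BESTFIT's denominator exactly, and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Counting gap columns in the denominator instead would give lower values for the same alignment, which is one reason identity figures from different programs are not always directly comparable.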
Discussion
The basis for resistance to strain G2 of soybean plants
containing the Rsv gene is unknown. This report of the
complete sequence of two closely related strains of a
potyvirus, differentiated by their ability to infect
soybean containing the Rsv resistance gene, should
provide the basis for correlating host susceptibility/resistance with alterations in nucleotide sequence occurring
among the strains. The 3'-terminal proteins may be
involved in the ability of a TVMV isolate to overcome
host resistance (Hellman et al., 1990). We have shown
that the region with the greatest number of differences
between the two SMV strains lies in the 5' region of the
genome. In particular, the greatest number of non-conservative amino acid differences between strains G2
and G7 occurs in the 42K protein, followed by the 35K,
CIP and HC-PRO proteins (Table 2). Although protease
and vector transmission functions have been demonstrated with reasonable certainty for the 35K and HC-PRO proteins,
other functions of proteins in the 5'
region are only speculative. All, however, could relate to
host plant resistance: it has been suggested that the
35K, 42K and CIP proteins may be involved in cell-to-cell movement (Domier et al., 1987), regulation of
proteolytic processing of the viral polyprotein (Riechmann et al., 1992) and replication (Company et al., 1991;
Koonin, 1991; Lain et al., 1990, 1991; Robaglia et al.,
1989), respectively.
A previous report showed a strong correlation between
the ability of TMV strains to overcome resistance and a
change in local net charge, because of single amino acid
changes, in the putative replicase genes encoding the
126K and 183K proteins (Meshi et al., 1988). A
comparison of the hydropathy profiles of the 35K, 42K
and POL proteins showed differences in only the first
two. The 35K protein of strain G7 showed, with respect
to G2, an increase in local net charge at amino acid
positions 13 to 25 and 244 to 259, and a decrease at
positions 47 to 60 and 132 to 137 (Fig. 6). Upon
comparison of the 42K protein of strain G7 with that of
strain G2, an increase in local net charge at positions 15
to 29 and a decrease at positions 328 to 340 were evident.
Although their significance is unknown, differences in
local net charge have been proposed to affect electrostatic interactions between a host factor and the non-structural
viral proteins involved in resistance and susceptibility
(Meshi et al., 1988).

[Fig. 6 profiles: 35K and 42K proteins of strains G2 and G7; boxed regions differ between strains at 35K positions 13 to 25, 47 to 60, 132 to 137 and 244 to 259, and at 42K positions 15 to 29 and 328 to 340]
Fig. 6. Comparison of hydropathy profiles (Kyte & Doolittle, 1982) of
cistrons 35K and 42K of SMV strains G2 and G7. Numbers identify
amino acid positions. Boxes outline regions that differ between strains.
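Hydropathy profiles of the kind shown in Fig. 6 are sliding-window averages of the Kyte & Doolittle index, and a local net charge profile can be computed the same way. A minimal Python sketch, assuming a 9-residue window (the window used for Fig. 6 is not stated) and a simplified charge model in which K and R count +1, D and E count -1 and every other residue, including H, counts 0; the function names and the toy fragments are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified charge model (assumption): K, R = +1; D, E = -1; all others 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def _window_sums(values: list, window: int) -> list:
    """Sum of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each window; non-standard residues raise KeyError."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [s / window for s in _window_sums(scores, window)]


def net_charge_profile(protein: str, window: int = 9) -> list[int]:
    """Net charge of each window under the simplified charge model."""
    charges = [CHARGE.get(aa, 0) for aa in protein.upper()]
    return _window_sums(charges, window)


def differing_windows(profile_a: list, profile_b: list, tol: float = 1e-9) -> list[int]:
    """1-based window start positions where two equal-length profiles differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(profile_a, profile_b)) if abs(a - b) > tol]


# Hypothetical fragments differing by an E -> K change at the third residue.
g2_like = net_charge_profile("AAEAAAA", window=3)
g7_like = net_charge_profile("AAKAAAA", window=3)
print(g2_like, g7_like)                     # [-1, -1, -1, 0, 0] [1, 1, 1, 0, 0]
print(differing_windows(g2_like, g7_like))  # [1, 2, 3]
```

A single substitution shifts every window that covers it, which is why differences appear as short boxed stretches rather than isolated points. A charge-neutral swap such as R to K would leave the net charge profile unchanged while still altering the hydropathy profile slightly.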
We have shown that the CPs of G2 and G7 differ at
only three amino acid positions: 2809, 3018 and 3065
(Fig. 3; Jayaram et al., 1991). Changes at positions 2809 and 3065 occur within
the N- and C-terminal regions of the CP, which are
known to be highly variable among potyviruses. However, the
change from methionine in strain G2 to isoleucine in G7
at amino acid position 3018 occurs within the trypsin-resistant core, which displays significant amino acid
identity among all potyviruses examined (Ward &
Shukla, 1991). A recent report has shown that a change
from glycine to proline in the virus CP correlates with
the ability of a strain of potato virus X to overcome
resistance (Kohm et al., 1991). We have also noted that the
change of the tripeptide GRD to GKD (at position
2121) in the 27K protein correlates with the ability of strain G7 to
infect soybean plants containing the Rsv gene. The
role (if any) that these differences play in the interaction
between the virus and resistance gene product is
unknown. However, since different viral proteins are
involved in different resistance mechanisms, the results
of this study provide the basis for determination of
specific nucleotide sequences involved in overcoming
host resistance. Chimeric full-length infectious transcripts generated by exchanging homologous regions of
the two virus strains as well as site-specific mutagenesis
will facilitate identification of these sequences.
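Listing strain differences with polyprotein numbering, as done above for positions 2121, 2809, 3018 and 3065, amounts to comparing aligned residues and offsetting the index. A minimal Python sketch; the function name and the `first_position` parameter are illustrative, and the input must already be aligned and gap-free.

```python
def substitutions(protein_a: str, protein_b: str, first_position: int = 1) -> list[tuple[int, str, str]]:
    """Return (position, residue_a, residue_b) for each mismatch between two aligned, gap-free sequences.

    first_position is the polyprotein number of the first residue supplied.
    """
    if len(protein_a) != len(protein_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (first_position + i, a, b)
        for i, (a, b) in enumerate(zip(protein_a.upper(), protein_b.upper()))
        if a != b
    ]


# G2 and G7 tripeptides at positions 2120 to 2122 (GRD versus GKD).
changes = substitutions("GRD", "GKD", first_position=2120)
print([f"{a}{pos}{b}" for pos, a, b in changes])  # ['R2121K']
```

The same call applied to the aligned CP sequences, with `first_position` set to the first CP residue, would list the three CP differences in the notation used here.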
The variability in CP may be a useful criterion for
taxonomy of potyviruses (Shukla & Ward, 1989). The
sequence identity of CP is greater than 50% among all
potyviruses. However, the relative similarity of different
viruses, when based on a single protein, is dependent
upon which viral protein is compared. For example,
based on CP, SMV is more closely related to TEV and
PVY than to PPV, but based on CIP, POL, 21K and overall
homology, SMV is most closely related to PPV (Table 1).
Thus, it may be insufficient to characterize taxonomic
relationships based upon a single protein. Furthermore,
overall relatedness may be reflected best by comparison
of biological properties such as host range as well as viral
genes. The results reported here and those of Robaglia et
al. (1989) demonstrate that both the 35K and 42K
proteins show little similarity among potyviruses. However, because two strains of the same virus, i.e., SMV G7
and SMV G2, share 97% overall identity, comparison of
the 35K and 42K proteins of potyvirus genomes may
clarify the taxonomic position of closely related potyviruses as, for example, the distinction between SMV and
watermelon mosaic virus 2 (Jayaram et al., 1991).
The authors thank J. C. Carrington for access to a manuscript before
publication and Carol Manthey of the Iowa State University Nucleic
Acids Research Facility for automated DNA sequencing. This
research was supported in part by Pioneer Hi-Bred International,
Incorporated and by the Iowa Soybean Promotion Board. Journal
Paper No. J-14720 of the Iowa Agriculture and Home Economics
Experiment Station, Ames, Iowa, U.S.A. Project No. 2428.
References
ALLISON, R. F., JOHNSTON, R. E. & DOUGHERTY, W. G. (1986). The nucleotide sequence of the coding region of tobacco etch virus genomic RNA: evidence for the synthesis of a single polyprotein. Virology 154, 9-20.
BUZZELL, R. I. & TU, J. C. (1984). Inheritance of soybean resistance to soybean mosaic virus. Journal of Heredity 75, 82.
CARRINGTON, J. C., CARY, S. M., PARKS, T. D. & DOUGHERTY, W. G. (1989). A second proteinase encoded by a plant potyvirus genome. EMBO Journal 8, 365-370.
CHEN, P., BUSS, G. R. & TOLIN, S. A. (1988). Inheritance of reaction to strains G5 and G6 of soybean mosaic virus (SMV) in differential soybean cultivars. Soybean Genetics Newsletter 15, 130-134.
CHO, E. & GOODMAN, R. M. (1979). Strains of soybean mosaic virus: classification based on virulence in resistant soybean cultivars. Phytopathology 69, 467-470.
COMPANY, M., ARENAS, J. & ABELSON, J. (1991). Requirement of the RNA helicase-like protein PRP22 for release of messenger RNA from spliceosomes. Nature, London 349, 487-493.
DEOM, C. M., OLIVER, M. J. & BEACHY, R. N. (1987). The 30kd gene product of tobacco mosaic virus potentiates virus movement. Science 237, 389-394.
DOMIER, L. L., FRANKLIN, K. M., SHAHABUDDIN, M., HELLMANN, G. M., OVERMEYER, J. H., HIREMATH, S. T., SIAW, M. F., LOMONOSSOFF, G. P., SHAW, J. G. & RHOADS, R. E. (1986). The nucleotide sequence of tobacco vein mottling virus RNA. Nucleic Acids Research 14, 5417-5430.
DOMIER, L. L., SHAW, J. G. & RHOADS, R. E. (1987). Potyviral proteins share amino acid sequence homology with picorna-, como-, and caulimoviral proteins. Virology 158, 20-27.
DOUGHERTY, W. G. & PARKS, T. D. (1991). Post-translational processing of the tobacco etch virus 49-kDa small nuclear inclusion polyprotein: identification of an internal cleavage site and delimitation of VPg and proteinase domains. Virology 183, 449-456.
DOUGHERTY, W. G., CARRINGTON, J. C., CARY, S. M. & PARKS, T. D. (1988). Biochemical and mutational analysis of a plant virus polyprotein cleavage site. EMBO Journal 7, 1281-1287.
EDWARDSON, J. R. & CHRISTIE, R. G. (1986). Viruses infecting forage legumes, vol. 2. Florida Agricultural Experiment Station Monograph Series no. 14.
GEORGE, D. G., BARKER, W. C. & HUNT, L. T. (1990). Mutation data matrix and its uses. Methods in Enzymology 183, 333-351.
GHABRIAL, S. A., SMITH, H. A., PARKS, T. D. & DOUGHERTY, W. G. (1990). Molecular genetic analyses of the soybean mosaic virus NIa proteinase. Journal of General Virology 71, 1921-1927.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
HELLMAN, G. M., THORNBURY, D. W. & PIRONE, T. P. (1990). Molecular analysis of tobacco vein mottling virus (TVMV) pathogenicity by infectious transcripts of chimeric potyviral cDNA genomes. Phytopathology 80, 1036 (abstract).
HILL, J. H. & BENNER, H. I. (1980a). Properties of soybean mosaic virus and its isolated protein. Phytopathologische Zeitschrift 97, 272-281.
HILL, J. H. & BENNER, H. I. (1980b). Properties of soybean mosaic virus ribonucleic acid. Phytopathology 70, 236-239.
HILL, J. H., BENNER, H. I., PERMAR, T. A., BAILEY, T. B., ANDREWS, R. E., JR, DURAND, D. P. & VAN DEUSEN, R. A. (1989). epidemiology of soybean mosaic virus in Iowa. Phytopathology 70, 536-540.
HILL, J. H., BENNER, H. I., PERMAR, T. A., BAILEY, T. B., ANDREWS, R. E., JR, DURAND, D. P. & VAN DEUSEN, R. A. (1989). Differentiation of soybean mosaic virus isolates by one-dimensional trypsin peptide maps immunoblotted with monoclonal antibodies. Phytopathology 79, 1261-1265.
HOLLINGS, M. & BRUNT, A. A. (1981). Potyvirus group. CMI/AAB Descriptions of Plant Viruses, no. 245.
JAYARAM, CH., HILL, J. H. & MILLER, W. A. (1991). Nucleotide sequences of the coat protein genes of two aphid-transmissible strains of soybean mosaic virus. Journal of General Virology 72, 1001-1003.
KAMER, G. & ARGOS, P. (1984). Primary structural comparison of RNA-dependent polymerase from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282.
KIHL, R. A. S. & HARTWIG, E. E. (1979). Inheritance of reaction to soybean mosaic virus in soybeans. Crop Science 19, 372-375.
KOHM, B., SANTA CRUZ, S., GOULDEN, M., KAVANAGH, T. & BAULCOMBE, D. (1991). Molecular study of resistance in Solanum tuberosum cv. Cara and potato virus X (PVX). Abstract No. 1225, Third International Congress of Plant Molecular Biology, Molecular Biology of Plant Growth and Development.
KOONIN, E. V. (1991). Similarities in RNA helicases. Nature, London 352, 290.
KYTE, J. & DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157, 105-132.
LAIN, S., RIECHMANN, J. L. & GARCÍA, J. A. (1990). RNA helicase: a novel activity associated with a protein encoded by a positive strand RNA virus. Nucleic Acids Research 18, 7003-7006.
LAIN, S., MARTIN, M. T., RIECHMANN, J. L. & GARCÍA, J. A. (1991). Novel catalytic activity associated with positive-strand RNA virus infection: nucleic acid-stimulated ATPase activity of the plum pox potyvirus helicase-like protein. Journal of Virology 65, 1-6.
LIM, S. M. (1985). Resistance to soybean mosaic virus in soybeans. Phytopathology 75, 199-201.
LUCAS, B. S. & HILL, J. H. (1980). Characteristics of the transmission of three soybean mosaic virus isolates by Myzus persicae and Rhopalosiphum maidis. Phytopathologische Zeitschrift 97, 47-53.
MAISS, E., TIMPE, U., BRISSKE, A., JELKMANN, W., CASPER, R., HIMMLER, G., MATTANOVICH, D. & KATINGER, H. W. D. (1989). The complete nucleotide sequence of plum pox virus RNA. Journal of General Virology 70, 513-524.
MAVANKAL, G. & RHOADS, R. E. (1991). In vitro cleavage at or near the N-terminus of the helper component protein in the tobacco vein mottling virus polyprotein. Virology 185, 721-731.
MESHI, T., WATANABE, Y., SAITO, T., SUGIMOTO, A., MAEDA, T. & OKADA, Y. (1987). Functions of the 30kd protein of tobacco mosaic virus: involvement in cell-to-cell movement and dispensability for replication. EMBO Journal 6, 2557-2563.
MESHI, T., MOTOYOSHI, F., ADACHI, A., WATANABE, Y., TAKAMATSU, N. & OKADA, Y. (1988). Two concomitant base substitutions in the putative replicase genes of tobacco mosaic virus confer the ability to overcome the effects of a tomato resistance gene, Tm-1. EMBO Journal 7, 1575-1581.
MIERENDORF, R. C. & PFEFFER, D. (1987). Sequencing of RNA transcripts synthesized in vitro from plasmids containing bacteriophage promoters. Methods in Enzymology 152, 563-566.
MOSER, O., GAGEY, M.-J., GODEFROY-COLBURN, T., STUSSI-GARAUD, C., ELLWART-TSCHORTZ, M., NITSCHKO, H. & MUNDRY, K.-W. (1988). The fate of the transport protein of tobacco mosaic virus in systemic and hypersensitive tobacco hosts. Journal of General Virology 69, 1367-1373.
PARKS, T. D. & DOUGHERTY, W. G. (1991). Substrate recognition by the NIa proteinase of two potyviruses involves multiple domains: characterization using genetically engineered hybrid proteinase molecules. Virology 182, 17-27.
PONZ, F., GLASCOCK, C. B. & BRUENING, G. (1988). An inhibitor of polyprotein processing with the characteristics of a natural virus resistance factor. Molecular Plant-Microbe Interactions 1, 25-31.
RIECHMANN, J. L., LAIN, S. & GARCÍA, J. A. (1992). Highlights and prospects of potyvirus molecular biology. Journal of General Virology 73, 1-16.
ROBAGLIA, C., DURAND-TARDIF, M., TRONCHET, M., BOUDAZIN, G., ASTIER-MANIFACIER, S. & CASSE-DELBART, F. (1989). Nucleotide sequence of potato virus Y (N strain) genomic RNA. Journal of General Virology 70, 935-947.
SANDERSON, J. L., BRUENING, G. & RUSSELL, M. L. (1985). Possible molecular basis of immunity of cowpeas to cowpea mosaic virus. UCLA Symposia on Molecular and Cell Biology, New Series 22, 401-412.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SHUKLA, D. D. & WARD, C. W. (1989). Identification and classification of potyviruses on the basis of coat protein sequence data and serology. Archives of Virology 106, 171-200.
SIEGEL, A. (1979). Recognition and specificity in plant virus infection. In Plant Resistance to Viruses, pp. 109-113. Edited by D. Evered & S. Harnett. Chichester: John Wiley and Sons.
TALIANSKY, M. E., MALYSHENKO, S. I., PSHENNIKOVA, E. S. & ATABEKOV, J. G. (1982). Plant virus-specific transport functions. II. A factor controlling host range. Virology 122, 327-331.
VANCE, V. B. & BEACHY, R. N. (1984a). Translation of soybean mosaic virus RNA in vitro: evidence for protein processing. Virology 132, 271-281.
VANCE, V. B. & BEACHY, R. N. (1984b). Detection of genomic-length soybean mosaic virus RNA on polyribosomes of infected soybean leaves. Virology 132, 26-36.
VERCHOT, J.-M., KOONIN, E. V. & CARRINGTON, J. C. (1991). The 35-kDa protein from the N-terminus of the potyviral polyprotein functions as a third virus-encoded protease. Virology 185, 527-535.
WARD, C. W. & SHUKLA, D. D. (1991). Taxonomy of potyviruses: current problems and some solutions. Intervirology 32, 269-296.
(Received 25 November 1991; Accepted 31 March 1992)