Journal of General Virology (1994), 75, 3569-3579. Printed in Great Britain
Nucleotide sequence of RNA 1, the largest genomic segment of rice stripe
virus, the prototype of the tenuiviruses
Shigemitsu Toriyama,1* Mami Takahashi,1 Yoshitaka Sano,2 Takumi Shimizu2
and Akira Ishihama3
1National Institute of Agro-Environmental Sciences, Kannondai 3, Tsukuba, Ibaraki 305, 2 Graduate School of Science
and Technology, Niigata University, Ikarashi, Niigata 950-21 and 3National Institute of Genetics, Mishima, Shizuoka
411, Japan
The complete nucleotide sequence of RNA 1, the largest
genomic segment of rice stripe virus (RSV), was
determined using two sets of overlapping cDNA clones.
RNA segment 1 comprises 8970 nucleotides and on the
viral complementary sequence has a single long open
reading frame coding for a protein of 2919 amino acids
with an estimated Mr of 336860. Amino acid sequence
comparisons of the putative protein indicated strong
homology (30% amino acid identity over about 1500
residues) with the L protein of the genus Phlebovirus of
the Bunyaviridae, but no detectable similarity with other
members of the Bunyaviridae. However, weak similarity
was detected with the L protein of Tacaribe arenavirus.
The highly homologous sequence domain includes the
conserved motifs of the putative RNA-dependent RNA
polymerase. The data presented here, along with
previous work, clearly show significant similarities in
genome organization, structure and expression between
RSV and members of the genus Phlebovirus of the
Bunyaviridae. Taken together, we propose that tenuiviruses should be included in the Bunyaviridae under the
genus Tenuivirus.
The DDBJ accession number for the sequence of RSV RNA 1 is
D31879.
0001-2698 © 1994 SGM
Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 14 May 2017 00:09:06
Introduction
Rice stripe virus (RSV), the prototype of the genus
Tenuivirus, has a broad host range in the Gramineae and
causes serious damage to rice, particularly Japonica-type
rice varieties (Toriyama, 1983; Francki et al., 1991). RSV
is transmitted by the small brown planthopper Laodelphax striatellus Fallén, and planthoppers of three other
species. In planthoppers, RSV replicates and is transovarially transmitted to a high percentage of the progeny
(reviewed in Toriyama, 1986b). The genome of RSV
comprises four ssRNA segments; as well as low levels of
four dsRNAs, duplexes of vRNA and its complementary
RNA can also be detected (Toriyama & Watanabe,
1989; Ishikawa et al., 1989). The dsRNAs found in
tenuiviruses seem to be artifacts generated by annealing
of complementary strands (Falk & Tsai, 1984). The
complete nucleotide sequences have been determined for
RNAs 3 and 4 from two different isolates (Kakutani et
al., 1990, 1991; Zhu et al., 1991, 1992) and for RNA 2
from one isolate (Takahashi et al., 1993). The results
suggest that all three RNA segments have ambisense
coding strategies. This was also experimentally shown by
in vitro translation of RNA transcribed from the cDNA
sequences (Hamamatsu et al., 1993).
The 3'- and 5'-terminal sequences of approximately 18
nucleotides are conserved among all four RNA segments
and are complementary to each other, except for one
base change (U to A) at the sixth position from the 3' end
of ssRSV RNA 1 (Takahashi et al., 1990). Moreover,
eight terminal nucleotides out of ten conserved nucleotides are identical to those present in the terminal
consensus sequences of the genus Phlebovirus of the
family Bunyaviridae (Elliott, 1990; Elliott et al., 1991;
Kakutani et al., 1990; Takahashi et al., 1990). Weak but
significant amino acid sequence similarity exists between
the nucleocapsid proteins from RSV and Punta Toro
phlebovirus (Kakutani et al., 1990). Likewise, similarity
exists between the putative Mr 94K protein of RSV RNA
segment 2 and the membrane glycoproteins of Punta
Toro and Uukuniemi phleboviruses (Ihara et al., 1985;
Rönnholm & Petterson, 1987; Takahashi et al., 1993).
These observations suggest an evolutionary relationship
between RSV and the phleboviruses.
The tenuiviruses include maize stripe virus (MStV),
rice hoja blanca virus (RHBV), rice grassy stunt virus
(RGSV) and three other possible members (Francki et
al., 1991). Recent nucleotide sequencing studies of RNAs
3 and 4 of MStV (Huiet et al., 1991, 1992) and RNA 4
of RHBV (Ramirez et al., 1993) showed strong homology
in the RNA 3 and 4 sequences between MStV, RHBV
and RSV. The 18 nucleotide terminal sequences are
conserved in RNAs 3 and 4 of MStV, RHBV and RSV.
All of these RNAs have an ambisense coding strategy.
Filamentous particles of RSV and RGSV are associated with a high level of RNA-dependent RNA polymerase activity. A minor polypeptide, Mr 230K, found in purified filamentous virus particles of RSV and
RGSV, is considered to be the RNA polymerase protein
(Toriyama, 1986a, 1987). The largest genome segment,
RNA 1, is presumed to encode this 230K RNA-dependent RNA polymerase, because this RNA segment
alone is large enough to encode the 230K protein. In this
paper, we present the nucleotide sequence of the RNA
segment 1 of RSV. Analysis of the amino acid sequence
of the predicted open reading frame reveals that RNA
segment 1 does indeed encode the RNA polymerase. A
high degree of homology was found between RSV RNA
1 and the L RNAs of phleboviruses. This homology was
even greater than that detected for RNAs 2 and 3 of
RSV.
Methods
Virus and plant. RSV isolate T was propagated in wheat plants with
transmission by the viruliferous small brown planthopper L. striatellus,
and purified as described previously (Toriyama, 1986a; Toriyama &
Watanabe, 1989). The nB component, which contains RNA segment 1,
was further purified at least twice by centrifugation on linear 5 to 35%
sucrose gradients. RSV RNA was prepared as described previously
(Toriyama, 1986a). ssRNA 1 was separated from RNAs 2, 3 and 4 by
electrophoresis in 1% low-melting-point agarose gel (LGT agarose;
Nakarai Chemicals) (Toriyama & Watanabe, 1989).
cDNA synthesis and cloning. vRNA-dependent cDNA synthesis was
done by the method of Gubler & Hoffman (1983) using M-MLV
reverse transcriptase lacking RNase H activity (BRL) and a synthetic
oligonucleotide, primer A, with the sequence 5' AGAGGAAAAAATAATTTTGA 3', which is complementary to the unique nucleotide
sequence located at nucleotides 11 to 30 from the 3' end of RNA 1
(Takahashi et al., 1990). The cDNA was blunt-ended with T4 DNA
polymerase and inserted into the SmaI site of pUC18 (Yanisch-Perron
et al., 1985). Recombinant plasmids were transformed into the
competent Escherichia coli strain JM109 (Nippon Gene Company;
Hanahan, 1985). Four independent clones were obtained, all of which
contained the 3'-proximal sequence of RNA segment 1 (Takahashi et
al., 1990). Two clones, pRS1S61 and pRS1S207, were used for the
sequence determination of the 3' half of RNA 1 (Fig. 1). To obtain
cDNA clones for the 5' half of RNA segment 1, primer B
(5' TATCTTGGGTATCTAAAGAA 3'), from the 3'-proximal region
nucleotide sequence of clone pRS1S61, was used. The ds-cDNA was
tailed with dCTP using terminal deoxynucleotidyl transferase (BRL),
and annealed with PstI-cut pUC19 vector which was previously tailed
with dGTP. The recombinant plasmids were transformed into E. coli
DH5α F' (BRL). Seven independent clones were isolated, of which two
clones, pRS1C17 and pRS1C18, were used for the sequencing (Fig. 1).
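The primer design described above can be illustrated with a short sketch. It takes the two primer sequences as printed, derives the RNA stretch that primer A would anneal to, and converts a position counted from the 3' end of a strand into conventional 5' numbering. The function names (`template_rna`, `from_3prime_to_5prime`) are hypothetical helpers introduced here for illustration; they are not part of the original work or of any software the authors used.

```python
# Primer sequences (5' to 3') as printed in the Methods.
PRIMER_A = "AGAGGAAAAAATAATTTTGA"
PRIMER_B = "TATCTTGGGTATCTAAAGAA"
RNA1_LENGTH = 8970  # nucleotides, as reported in the Results

# Maps each DNA base to the RNA base it pairs with.
_DNA_TO_COMPLEMENTARY_RNA = str.maketrans("ACGT", "UGCA")


def template_rna(primer_dna: str) -> str:
    """Return the RNA stretch, written 5' to 3', that a DNA primer anneals to."""
    return primer_dna.upper().translate(_DNA_TO_COMPLEMENTARY_RNA)[::-1]


def from_3prime_to_5prime(position: int, length: int = RNA1_LENGTH) -> int:
    """Convert a 1-based position counted from the 3' end into the
    1-based position counted from the 5' end of the same strand."""
    if not 1 <= position <= length:
        raise ValueError("position lies outside the molecule")
    return length - position + 1


if __name__ == "__main__":
    # RNA stretch on the viral strand that primer A is complementary to.
    print(template_rna(PRIMER_A))
    # Nucleotides 11 to 30 from the 3' end, expressed in 5' numbering.
    print(from_3prime_to_5prime(30), from_3prime_to_5prime(11))
```

Because the complementary strand runs antiparallel, a position counted from the 3' end of the viral RNA carries the same number as the matching position counted from the 5' end of the complementary RNA.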
DNA sequencing. Four recombinant plasmid clones, pRS1S61,
pRS1S207, pRS1C17 and pRS1C18, were digested with restriction
enzymes and the resulting fragments were subcloned into the M13
mp18 or mp19 phage vectors, or into pUC18 or pUC19 plasmid
vectors. Alternatively, a nested set of deletions was prepared from the
inserted DNAs of subclones of the four clones using the Kilo-Sequence
deletion kit (Takara Shuzou; Henikoff, 1984; Yanisch-Perron et al.,
1985). The ss- and dsDNAs prepared were sequenced using the
Sequenase version 2.0 kit (United States Biochemicals) and [α-35S]dCTP
(Amersham) (Sanger et al., 1977). Sequencing in one direction for clone
pRS1C17 was done with an automated DNA sequencer (model 373A,
Applied Biosystems). The nucleotide sequences were analysed using the
DNASIS program (Hitachi Software Engineering Co.). The GenBank/
EMBL and NBRF/PIR databases were searched for RNA and amino
acid sequence homologies.
Results
Nucleotide sequence of the RSV RNA segment 1
The first step cDNA synthesis was carried out using an
oligonucleotide primer (primer A) complementary to
nucleotides 11 to 30 from the 3' end of RNA segment 1.
One clone contained an insert of about 7000 bp, but after
a few cycles of transfer the size became smaller,
suggesting deletion by intramolecular recombination.
Therefore, we prepared two sets of partial clones
(pRS1S61 and pRS1S207), covering the 3' half of RNA
segment 1 as illustrated in Fig. 1. Based on the 3'-terminal sequence of these 3'-half clones, we prepared an
internal primer and carried out the second step cDNA
synthesis. Clones pRS1C17 and pRS1C18 were chosen
for the sequence determination of the 5' half. The
sequence of RNA segment 1 (nucleotides 21 to 8957) was
determined using two independent overlapping clones,
except for the region between nucleotides 5233 and 5584
(this region was analysed only for clone pRS1S61). The
sequences of both termini were obtained from the data
determined by direct sequencing of viral RNA 1
(Takahashi et al., 1990). The complete nucleotide
sequence of RSV RNA segment 1, expressed as viral
complementary sense, is shown in Fig. 2. RNA segment
1 is composed of 8970 bases, with a base composition of
26.83% A, 34.33% U, 22.02% C and 16.82% G. The
sequence was scanned for AUG-initiated open reading
frames (ORFs). A single large ORF was detected in the
viral complementary sequence (cRNA). Other short
ORFs were identified on viral sense RNA (vRNA),
which may encode Mr 9.3K, 8.1K and 6.3K products.
The large ORF present in cRNA extends from the 5'-proximal AUG codon at positions 58 to 60 to the UGA
stop codon at position 8815 to 8817 (Fig. 2). The non-coding sequences are therefore 57 nucleotides at the 5'
end and 153 nucleotides at the 3' end.
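The scan for AUG-initiated ORFs and the coordinate arithmetic reported above can be sketched as follows. This is a minimal illustration and not the DNASIS procedure used in the study: `longest_aug_orf` and `orf_summary` are hypothetical names, the scan covers only the three forward frames of the strand supplied, and ORFs that run off the end without a stop codon are ignored.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_aug_orf(rna: str):
    """Find the longest complete AUG-initiated ORF in the three forward frames.

    Returns (start, end, n_codons) with 1-based inclusive coordinates, where
    `end` is the last base of the stop codon and `n_codons` excludes the stop.
    Returns None when no AUG is followed by an in-frame stop codon.
    """
    rna = rna.upper().replace("T", "U")
    best = None
    for frame in range(3):
        start = None  # 0-based index of the first AUG of the current ORF
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if best is None or n_codons > best[2]:
                    best = (start + 1, i + 3, n_codons)
                start = None
    return best


def orf_summary(start: int, stop_end: int, length: int):
    """Return (amino acids, 5' non-coding length, 3' non-coding length)."""
    span = stop_end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1, start - 1, length - stop_end


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not taken from RSV RNA 1.
    print(longest_aug_orf("CCAUGGCUAAAUGAGG"))
    # Coordinates reported in the text: AUG at 58, stop ending at 8817, 8970 nt.
    print(orf_summary(58, 8817, 8970))
```

Applied to the reported coordinates, the second function reproduces the 2919 amino acids and the non-coding lengths of 57 and 153 nucleotides stated in the text.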
The amino acid sequence derived from the long ORF
is shown in Fig. 2. The predicted gene product is 2919
amino acids long and has an estimated Mr of 336860.
This predicted Mr is larger than the previously estimated
size (230K) of a minor polypeptide associated with RSV,
which was based on the relative migration in an SDS-polyacrylamide gel (Toriyama, 1986a). Henceforth, we
designate the predicted protein as the 'Pol' protein of
RSV.
Fig. 1 (diagram labels). RSV RNA 1 on a scale of 0 to 9000 nucleotides, total length 8970; terminal regions 1 to 86 and 8925 to 8970; clones pRS1C17 and pRS1C18, each spanning nucleotides 21 to 5250; further clone coordinates 5164 and 5584.
Homologies of the RSV RNA segment 1 and the L
RNA of phleboviruses
The similarity search using the GenBank/EMBL nucleotide and NBRF/PIR protein databases showed clearly
that RSV RNA segment 1 is homologous to the L RNA
of phleboviruses, i.e. Uukuniemi virus (UUKV) (Elliott
et al., 1992), Rift Valley fever virus (RVFV) (Muller et
al., 1991) and Toscana virus (TOSV) (Accardi et al.,
1993). At the amino acid level, the similarity was maximal
between the RSV Pol protein and the L proteins of the
phleboviruses (UUKV, RVFV and TOSV), as shown in
the dot-plot analysis of protein homology (Fig. 3 a, b, c).
Among the L proteins of the phleboviruses, the percentage of identical amino acids was 36.9% between
UUKV and RVFV (Fig. 3d), 35.8% between UUKV
and TOSV, and 51.5% between RVFV and TOSV over
the entire amino acid sequences. An optimal sequence
alignment of the RSV Pol protein with the UUKV L
protein reveals 31.1% identical residues and 71.2%
overall similarity (including conserved amino acids). The
similarity was maximal between residues 493 to 2026
(RSV) with only a few minor gaps, except for a 26 amino
acid gap in the sequence of UUKV L protein between
residues 1333 to 1362. The greatest similarity was
between residues 1362 to 1931 (569 amino acids) where
there is 39.3% identity and 78.3% similarity (Fig. 4).
This region contains the sequence of the putative RNA
polymerase domain, including the four polymerase
motifs proposed by Poch et al. (1989) in the RNA-dependent RNA polymerase and identified in L proteins
of UUK and RVF phleboviruses (Elliott et al., 1992). In
addition, this region contains one distinct homologous
stretch of 21 amino acid residues at 1408 to 1428 and a
leucine zipper motif, L-X6-L-X6-L-X6-L at residues
1531 to 1552, located roughly in the central portion of
the RNA polymerase region. No significant homology
was, however, found between RSV Pol and the L
proteins of Bunyamwera, Hantaan and tomato spotted
wilt viruses (Elliott, 1989; Schmaljohn, 1990; De Haan et
al., 1991) (Fig. 3e, f). A weak similarity was found with
the L protein of Tacaribe arenavirus: 19.2% identity
over 449 amino acids in the region containing the
polymerase motifs (Iapalucci et al., 1989) (Fig. 3g, h).
Fig. 1. Relationships of four cDNA clones used
for determining the nucleotide sequence of RSV
RNA segment 1. cDNAs were synthesized using
two synthetic oligonucleotide primers: primer A,
complementary to nucleotides 11 to 30 from the 3'
end, and primer B, complementary to nucleotides
5233 to 5253. The 3'- and 5'-terminal sequences,
determined by direct RNA sequencing (Takahashi et al., 1990), are indicated by arrows.
Diagram labels: clones pRS1S61 and pRS1S207, each extending to nucleotide 8957.
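The leucine zipper motif reported in this section, four leucines each separated by six arbitrary residues, can be located in a protein sequence with a regular expression. The sketch below is illustrative only: `find_leucine_zippers` is a hypothetical helper, the test sequence is invented, and the study does not state how the motif was identified. The pattern spans 22 residues, which agrees with the reported position of residues 1531 to 1552.

```python
import re

MOTIF_LENGTH = 22  # L + 6 residues, three times, plus the closing L

# A lookahead is used so that overlapping occurrences are all reported.
_LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str):
    """Return 1-based (start, end) positions of every L-X6-L-X6-L-X6-L match."""
    hits = []
    for match in _LEUCINE_ZIPPER.finditer(protein.upper()):
        start = match.start() + 1
        hits.append((start, start + MOTIF_LENGTH - 1))
    return hits


if __name__ == "__main__":
    # Invented sequence for illustration; it is not part of the RSV Pol protein.
    toy_protein = "MK" + "LAAAAAA" * 3 + "L" + "GG"
    print(find_leucine_zippers(toy_protein))
```

A match of this pattern alone does not establish a functional leucine zipper; it only flags candidate positions for closer inspection.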
Discussion
The nucleotide sequences of the RNAs 2, 3 and 4 of RSV
isolate T have been reported previously (Zhu et al., 1991,
1992; Takahashi et al., 1993). The determination of the
sequence of RNA 1 completes the genome sequence of
RSV. As summarized in Fig. 5, the complete genome
comprises 17145 nucleotides, of which 86.8% code for
seven ORFs. Each of RNAs 2, 3 and 4 has an ambisense
coding strategy, and RNA segment 1 is a negative strand
RNA. MStV, another member of the tenuiviruses,
contains five RNA segments, of which the smallest RNA
(RNA 5) is a negative strand and encodes a highly basic
protein (Huiet et al., 1993). Although the existence of a
small RNA has been reported for a different isolate of
RSV (Ishikawa et al., 1989), we have been unable to find
such a small distinct RNA in RSV (Toriyama, 1982;
Toriyama & Watanabe, 1989). Furthermore, a purified
preparation of RSV containing the four RNA species
alone reproduced the original chlorotic stripe symptoms
on rice seedlings, when inoculated through the planthopper vectors (Toriyama, 1982). Among the seven
putative viral-coded proteins, the nucleocapsid protein
and a non-structural protein (S-protein) were shown to
be encoded by cRNA 3 and vRNA 4, respectively
(Hamamatsu et al., 1993). We have now shown that the
predicted Pol protein (336.8K) is encoded by cRNA 1.
This Pol protein is most probably the previously
designated 230K protein which is associated with RSV
nucleoproteins and was considered to be a putative RNA
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Sun, 14 May 2017 00:09:06
3572
S. Toriyama and others
10
~
30
40
50
~
70
~
90
100
110
120
5' ACACATAGTC AGAGGAAAAAATAAIiilGA TTTTGTTTTC CACAAAAGAA"~GAAGGATG ACGACACCACCTCTCG~AT ACCC~GCAT G~CATGGCA GGTCI~FATGA A~G~GGCG
M T T P
P L V I
P L H V H G R S Y E I. I. A
130
140
150
1~
170
180
190
200
210
220
230
240
GGGTATCATG AAG~GA~G GCAGGAGATA 6AAGA6~GG AAGAAACAGATGTCAGAGGA GATGGA~ G T ~ A T C A ~CCATACTA TATAGTATGG GC~GAGCAA GGAGAACTCT
G Y H E V D W Q E I
E E L
E E T D V R G D G F
C L y H
S I L
Y S M G L S K
E N S
2~
260
270
2~
~0
~0
310
320
330
~0
350
3~
CGCACCA~G AATTTATGAT AAAGCTACGA TCGAATCCAG CCATCTGCCA GCTGGATCAA GAAATGCAAC TGAGCC~AT GAAGCAGC~ 6ATCCAAATG A~CATCAGC CTGGGGTGAA
R T T
E F M I
K L R
S N P
A I C Q
L D Q
E M Q
L S L M
K Q L
D P N
D S S A
W G E
370
380
3~
4~
410
420
430
440
450
460
470
480
GATATAGCAA TFGGG~FAT AGCTATAATA TrGAGAA~A AGATAA~GC ~ACCAGACA G~GATGGGA AG~GTITAA GACTAITTAT GGTGCTGAGT ~GAGAGTAC TA~AGAA~
D I A
l G F I
A l 1 L R I
K I I A
Y Q T
V D G K L F K T I Y
G A E
F E S T
l R I
4~
500
510
520
530
540
550
560
570
580
5~
600
AGGAA~ATG GGAA~ACCA C'Iq'CAAGTCA ~ A G A C A G AITVFGATCA TAAAGTAAAG ~CAGATCAA AAA~GAAGA A~C~GAGA ATGCCAG~G AAGA~G~A ATCCAT~CC
R N Y
G N Y H
F K S
L E T
D F D H
K V K
L R S
K I E E
F L R
M P V
E D C E
S I S
610
620
630
~0
650
660
670
680
6~
7~
710
720
~GTGGCATG CATCTGTTTA CAAGC~ATA GTAT~GATA GCCTTTCTGG ACACAAGAGCTTFAGTAATG TGGATGAA~ GATAGG~GC ATAATATCCA GCATGTA~A GATCATGGAC
L W H A S V Y K P I
V S D S L S G H K S
F S N V D E L
1G
S
I 1S
S M Y K ] M D
730
740
750
760
770
780
790
800
810
820
830
~0
~TGGTGATC AATGTTTT~ ~GGAGTGCA ATGAGAATGG TAGCCAGACC CT~GAAAAA CTATATGCCC ~GCAGTG~ TTTGGGA~C AATC~AAGT TCTATCATGT GAGGAAAAGA
N G D Q C F L
~ S A
M R ~
V A R P
S E K
L Y A
L A V F
L G F
N L K
F Y H V R K R
850
8~
870
8~
890
~0
910
920
930
940
950
960
~TGAAAAATTGACGGCAAAACTTGAGAGTGATCATA~AATTTGGGAGTGAACCTGA~ GAGGTATATGAAGTTTCTGAGCCAACCAGAT~ACCTGGGTC~AAACCAGGAGGGAGC
A E K
L T A K
L E S
D H T
N L G V
K L l
E V Y
E V S E
P T R
S T ~
V L K P
G G S
970
9~
990
I~0
1010
1020
1030
1040
1050
1060
1070
i0~
AGAATAACTG AAACAAGAAAI-IIIGTGA~ GAGGAGATAA TAG~AACAG GCG~CTCTG GAGAGCTTAT ~GTGTCAAG CA~GA~AT C~GCAGAGT T A ~ C C C A GAAACTTAGT
R I T
E T R N
F V l
E E I
I D N R
R S L
E S L
F V S S
S E Y
P A E
L C S Q
K L S
I0~
iI~
iii0
1120
1130
1140
Ii~
1160
I170
II~
Ii~
I~0
~CATCAAAG ACAGAATAGC A~AATGTTT GGCTTTATCA ACAGAACCCCTGAAAACAGT GGGAGGGAAC~TACATAAA CACATA~AT ~GAAGAGGA T~ACAGGT GGAAAGAA~
A I K
O R I A
L M F
G F I
N ~ T P
E N S
G R E
L y I N
T Y Y
L K R
I t Q q
E R N
1210
1220
12~
1240
1250
12~
1270
1280
12~
13~
1310
1320
~AA~AGAG A~C'I23"AAG A~ACAGCCT GCT~GGGGA TGATCCAGAT AATCAGA~A CCAACAGCAT ~GGTACATA CAACCCGGAA GTGGGCA~C TG~G~AGC CCAAACTGGA
V I R D S L R S Q P A V G M I Q I
I R L P T A F G T Y N P E V G T L L L A Q T G
1330
i~0
1350
1360
1370
13~
13~
1400
1410
1420
1430
1440
~AATCTATA GAC~GGCAC CACAA~AGA GTfiCAGATGG AGGTCAGGAG AT~CC~CT G~AI~'TCAA G ~ C ~ A A
GATCA~AGT T'CFCCGGAGA CACAAAAACA~ACAACAAT
L [ Y R L G T
T T R
V Q ~
E V R R
S p S
V I S R S B K
] T S F P E
T Q K H N N N
1450
1460
1470
1480
1490
15~
1510
1520
1530
1540
1550
1560
~ G T A ~ A ~ ATGCACCCAG AACACAGGAGACATI']'TATC ACCCAAATGCTGAGATCTAT fiAGG~G~G ATGTAAAGAC TC~AGTG~ A~ACAGAGA ~G~GATAA TCATATAGT6
L Y D
Y A P R
T Q E
T F Y
H P N A
E I y
E A V
D V K T
P S V
I T E
[ V D N
H I V
1570
1580
1590
1600
1610
16~
1630
1640
1650
1660
1670
16~
ATAAAA~GA ACACTGATGA TAAGGG~GG TCAGTCAGTG A~CGATAAA GCAAGAIIII ~ATA~GGA AGAGACTA~T GG~GC~AG AATA~G~C ATGACTFFGT III~ATATC
[ K L
N T O D
~ G ~
S V S
9 S 1K
Q D F
V Y R
K R L M
D A K
N [ V
H D F V
F D [
1690
1700
1710
1720
1730
1740
1750
1760
1770
1780
17~
1800
~ATCAACTG AGA~GACAA GAGCTTTAAG GGTGCTGACT TAT~ATAGG AGGAATCTCA GATAACTGGT CACCAGATGT C A ~ A T ~ C A AGAGAAAGTG ATCCACA~A TGAAGATATC
L S T
E T D K
S F K
G A D
L S I 6
G I S
D N W
S P D V
I I S
R E S
D P Q Y
E D !
1810
1820
1830
I~0
18~
18~
1870
18~
1890
1900
1910
1920
GTTGT~ATG AG~CACAAC AAGGTCCA~ GAGTCTATAG AAT~CTA~ AAGATCAGTA GAGG~AAAA GC~ACGATA TAAAGAAGCA A~CAGGAAA GAGCCATCAC A~AAAGAAG
V V Y E F T T
R S T
E S I
E S L L
R S V
E V K
S L R Y
K E A
I Q E
R A I T
L K K
1930
1940
19~
1960
1970
19~
19~
2000
~i0
~
~30
2040
AGAATATCGT A~ACACAAT A~TGTCAGT CTAGATGCTG TAGCCACAAA TCTGCTATCA C ~ C ~ G ~ G ATG~TGCAG A G A A ~ T A A~CGTTTAA GAG~G~AA TCAGGTGAAG
R I S
Y Y T I
C V S L D A
V A T N
L L S
L P A
D V C R
E L [
I R L
R V A N Q V K
2050
2060
2070
20~
20~
21~
2110
2120
2130
2140
21~
2160
ATCCAG~AG CTGATAACGA TATCAATC17 GACT~GCCA CITFfi~AGC AC~GACA~ TACAfiAATAA AGGAAATG~ TAGGGA~GT ~CCCAA~A ATAAATTTAT A C A ~
I 0 L
A D N D
I N L
D S h
T L L A
P D I
y R I
K E ~ F
R E S
F p N
N K F I H P I
2170
2180
2190
22~
2210
2220
2230
2240
2250
2260
2270
22~
ACTAAGGAAA TGTATGAGCA TTTTGTCAAT CCAATGATTT CAGGAGAAAAAGACTATG~ GCCAATTTAA AG~CATAAT AGACAA~AG ACCAGAGATG AGCAGAG~A GAATTTAGAG
T K E ~ Y E H
F V N P M I
S G E K
D Y V
A N L
K S I I
D K E T R D
E Q R K
N L E
22~
2300
2310
2320
2330
2340
2350
2360
2370
2380
2390
2400
AGTCTGAAAG'~GTGGATGGGAAAAAGTACACAGAGAGAAAAGCAGAAACTGCTCTGA~ GAGATGTCACAAGCAG~AGAGCA~AGAAG~AI~IGAAA~GACAAII~TAG~CC
S L K V V D G K K Y T E R K A E T
A L N
E M S
Q A E E
H Y R
S Y F
E N D N F R S
2410
2420
2430
2440
2450
24~
2470
2480
2490
2500
2510
2520
ACACTAAAAG CTCCAGTCCA ACTTCCC~A ATCATACCGG ATGTGTCAAG TCAGGACAAT CAA~CTCAA ACAAGGAACT ATCTGATAGG ATACGGAAGA AGCCGATCGA CCACCCTA~
T L K
A P V Q
L P L
I 1P
D V S S
Q D N
Q F S
N K E L
S D R
I R K
K P I D
H P [
Fig. 2. F o r l e g e n d s e e p a g e 3 5 7 5 .
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Sun, 14 May 2017 00:09:06
Complete sequence of the RSV genome
3573
25~
2~0
2550
2560
2570
25~
25~
26~
2610
2620
2630
2640
TACAAC~CT CGG~CAAGC AC~AATAAG AGAAA~G~ CGA~GCA~ CGGCCATTTG 6ACGAG~AG AA~ATCTAT G~AGAAGGA CAAG~GCTA AGAAAGTGGA GGAATC~AT
Y N [
W D Q A V N Z R N C S I A L G H L D E L E ] S M L E G O V A Z K V E E S Y
2650
2660
2670
26~
26~
27~
2710
~
2730
2740
2750
2760
~GAAAGATA GGAGTCAGTACAACAGGACAA~GCTAA CTA~ATGAA GGAGGACATCTACITGGCTG AAAGGGGGATAA~G~AAG AAGAGG~GGAAGAACCAGATGTGAAATI-f
K K D R S Q Y N R T T L L T N M Z E D [
Y L A E R G [
N A K K R L E E P D V K F
2770
2780
27~
2800
2810
28~
2830
2840
2850
28~
2870
28~
~TCGAGATC AGTCTAAGAGGCCTTTTCAT CCTTTTG~A GTG~ACCAG AGACATAGAGCAG~CAC~ AGAAAGAGTGC~GGAACTC AATGAAGAGTCAGGACA~GCTCG~G~A
Y R D Q S K R P F H P F V S E T R D I E
O F T O Z E C L E L N E E S G H C S L I
2890
2900
2910
2920
2930
2~0
29~
2~0
29~
~
2990
3~0
~TGTAGAGGAT~AGTGI~AT~GCT~AGAG~GCATGAGGTAGGTGATTTAGAACAC~ATGGAACAACATAAAAGC~A~AAAACAAAGTTTGCA~A~CTAAGTrTA~
N V E D L V L S A L E L ~ E V G D L E ~
L ~ N N I ~ ~ B S K T K F A L Y A K F ~
3010
3020
3030
3040
30~
30~
3070
30~
30~
31~
3110
3120
TCTGATCTTGCC~CCGAG~ AG~AITYCA~AT~CAGAA~AAAGAAGACACCT~ ~GG~AAGAAA~CAGAGA~AG~GCTACUrACTCA~AAACCA~AAAC~AAAG
S D L
A T E L
A I S L S Q N C ~ E D T Y V V ~ ~ L R D F S C Y V L
I K P V N L K
3130
3140
31~
3160
3170
31~
31~
32~
3210
3220
3230
3240
AGTAATGTGTTC~CTTTA~CATACCTTCTAATATTTATAAGTCACACAACACAA~ ~CA~A~CTGATAGGC~GTCCAGA~CAGG~GACTCA~FCGT ~CTG~AAT
S N v F F S L y 1 p S N I
y K S H N T T F K T L I G S P E S G Y M T D F V S A N
3250
3260
3270
32~
32~
33~
3310
3320
3330
3~0
3350
3360
~GAGCAAGT TAGTGAA~G GG~AGA~T GAAG~A~A ~C~GCACA AAGAGGTITC~GCGAGAATTTTATG~GT GGCCC~AGCA~GAGGAAC AAGATGGAATGGCGGAGCCA
v S K
L V N ~
v R C
E A W
~ L A Q
R G F
~ R E
F Y A V
A P S
I
E E
Q D G ~
A E P
3370
33~
~
~
~10
34~
~30
~40
3450
3460
3470
34~
GACTC~T~T GTCAGATGATGAG~GGACA CTC~CATAT TA~AAACGA CAAGCATCAG~AG~GAGA TGATCACAGT~AGGTTT GTCCATA~G AAGGCTTTGTAACIIIICCT
D S V C Q M M S ~ T L L I
L L N D K H Q L E E M I T V S R F V B ¼ E G F V T F P
~
3500
3510
3520
~
3~0
35~
3560
3570
35~
35~
36~
~ATGGCCTAAACC~ATAAAATGTTTGATAAA~ATCAGTAA~CCGAGGTCTAGG~AGA~AGTC~AAAGAGG~CA~ATGCTA~AAGCA~A~C~AAAATCCCA~
A ~ p
3610
K p y
K
3620
M F D
3630
K L S
3~0
V T P R
36~
S R L
36~
E C L
V I
3670
K R
3680
L
I
M
36~
L M K
37~
H Y S E
3710
N P [
3720
~ATTTATGA TAGAAGACGAGAAGAAAAAGTGGTrTGGAT TCAAAAATATG~C~GCTT GA~GTAATG GTAAAC~GCTGATr'rAT~ GATCAGGA~AAATGCTTAA~TCIiI]AT
Z F M I E D E K K K ~ F G F K N M F L L
D C N G K L A D L S D Q D Q M L M L F Y
3730
3740
3750
3760
3770
37~
37~
3800
3810
3820
38~
3~0
CTTGG~ATC TAAAGAACAAAGATGAGGAGGTCGAAGACAATGGCATGGGTCAA~A~G A~AAAATCC ~GGCTT[GA GAG~CCA~
L G Y L K N K D E E V E D N G ~ G Q L L T K I
L G F E S A M
38~
38~
3870
38~
3890
3~0
3910
3920
3930
GATCCTGAGTA~ACAAT CAAGAAGCAT6AG~CCA TAAG~ATGT 6AAGGAC~C TGTG~AAAT TCTTACACAGA~AAA~AG
D p E
y G T I [ K H
E F S
I S Y V
K D L
C D ~
F L D R
L K K
3970
39~
3990
4000
4010
40~
4030
4~0
4050
CCAAAGACAAGAGAC~CTT GGGTATGAAA
P K T R D F L
3~0
39~
G W K
39~
ACACACGGAATCAAAGATCCAA~ACTTAT
T H G
I K D P
I T Y
4060
4070
40~
~GGGCGACA AGATAGCTAAA~CCTTACC A~CAGTTTA ~GAGACGATGGCATCTTTG AAGGCA~ATCTAAC~CTC AGAGGA~ACTA'ITrATACA CACCCA~AG AAGA~AAAA
L G D Z l A K F L S T Q F I E T M A S L Z A S S N F S E D Y Y L Y T P S R R L K
40~
4100
4110
4120
4130
4140
41~
41~
4170
41~
41~
42~
~CCAGGAGCAATCTAGAAGT~ACA~TAATAG~GC~
~G~AATATA~TG~A~
~CA~G~AAGCTGT~CATAGAAGCAAA ~ A A ~ G A G A A G C T C A C ~ C C ~ A A ~ A A A
N Q E Q S R S K H V I b A G G N I
S ~ S V [ G ~ t Y ~ ~ S E V I E Z L T T L I ~
4210
4~0
4230
4240
42~
42~
4270
42~
42~
4300
4310
43~
6ACGAAACAC CAGGAAAAGA A~GAAAATA ~GGTAGATC TCTTAC~AA G G ~ A ~ G A A G~C~AACA AAAATGAA~ TATGCACA~ ~TAITITCA AGAAGAA~A GCATGGAGGC
D E T P G K E L K I
V V D L L P K A M E V t N K N E C M H I
C I F K K N O H G G
43~
4~0
4350
43~
4370
43~
43~
44~
4410
44~
4430
4440
C~AGAGAAAT ~ G ~
TA~CITI" GAAAGAATAATGCAGAAGACAGTGGAAGAT~CTAGAG CCA~CTAGA ATGCTGTC~ A~GAGACAA~ACATCCCC GAAAAACAAG
L R E I Y V L N I F E R I
M O K T V E D F S R A I L E C C P S E T M T S P K N K
44~
4460
4470
44~
44~
45~
4510
45~
4530
4~0
45~
45~
TFTAGAATAC ~GAA~GCACAAC~GGAAGCAAGGAAAA ~ A A A A A A T G A ~ A T G A C A ~ A T ~ A
~AGTGATGATGCATCGAAA~GAA~AAG ~ C A ~ A ~ T A T C T A A A ~ C
F R i
P E L H N M E A ~ K T L K N E Y M T I S T S D D A S K ~ N Q G H Y V S K F
4570
45~
45~
4600
4610
46~
4630
4~0
4650
4660
4670
46~
ATGTGTA~CTA~GAGG~CAC~CAACATA~A~ATGGC~C~T ~AGG~CTTCAA~ATGGCATC~AAGAAG~A~C~A~A~AGCTG~GCAA~ ATTTA~CAA
M C 8 L L R L T P T Y Y ff 6 F L V Q A L O L W H H K K I F L G b Q L L Q L F N Q
46~
47~
4710
47~
4730
4~0
4750
47~
4770
47~
47~
4800
~TG~ATGC TAAATACCAT GGACACAACC CTCATGAAAG TCTFrCAAGC ~ACAAAGGG GAGA~CAAG ~CCTTGGAT GAAGGCAGGT A G A ~ A C A ~ G A G A ~ G A G A C A G G ~
N A M L N T M D T T L M K V F O A Y K G E I 0 V P ~ ~ K A G R S Y I E T E T G ~
4810
4820
4830
4~0
4850
48~
4870
48~
4890
4900
4910
49~
~GCAGGGAA ~ C C A ~ A T A ~ A G ~ ~A~CC~G ~ATC~C~ GGACCAA~GGCTG~GAGTGTAGAAGAGATATAAA~GAGCAA~AAGACAATAA~AATAAAGAAA~
~ Q G I L H Y T S S L F H A I F L b Q L A E E C R R D I N R A I K T I N N K E N
4930
4~0
49~
49~
4970
49~
4990
5000
5010
~20
5030
5040
GAGAAGG~TCATGTATAGTG~CA~ATGGAAAG~GACG~AGTAGCTTCA~A~ A~A~C~AATTTCAAAGAGA~GA~CAGCACAA~ AC~GCT~GT~GG~AAC
E K V S C I V N N M E S S D D S S F [ I
S I P N F K E N E A A Q L Y L L C V V N
Fig. 2. For legend see page 3575.
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Sun, 14 May 2017 00:09:06
3574
S. Toriyama
and others
5050
~60
5070
50~
50~
51~
5110
51~
5130
5140
51~
51~
TCTTGG~CA GAAAGAAAGAG~GC~GGA ACTTATCTTG GGATATATAAATCTCCAAAGAGTACAA~C AGACA~G~ TGTGATGGAA~CAACTCAG AA~C~CTT ~CTGGTGAT
5 W F R K K E
~ L G T Y L
G I Y K S P K S T T
Q T L F
V M E F N S
E F F F S G D
5170
5180
5190
5200
5210
5220
5230
5240
5250
52~
5270
52~
G~CACAGGCCAACTTTTAGG~GG~A~ G C A G C & G T G C T A ~ A G G A G A G C ~ A G A C ~ G ~ A T A C A O G A A G A 6 ~ C ~ ACA~GAAGGATGTAATAGAAGGTGGAGGA
V H R P T F R W V N
k A V
L I G E
O E T
L 5 G I Q E E
k S N T L K D V I E
G G G
5290
53~
5310
5320
5330
5~0
53~
53~
5370
53~
53~
5400
ACATATGCCC TCACT'I~AT A~GCAAG~ GC~AAGCTA TGATACACTATAGAATG~G GGCAGTAGTG CTTCATCAGT GTGGCC~CA TA~AAA~C ~CTGAA~A C~ATATGAT
T Y A L T F I
V Q V A Q A M I H Y R M L G S S
A S S V W P A Y E T
L L K N S Y D
5410
~20
~
5440
CCTGCAC~GGC~C~C~ A ~ G G ~ A ~ C ~ A A ~ G
P A L G F F L M D N P K C
5530
5~0
5550
55~
~50
5460
~70
~
5490
5500
5510
55~
~ G G C ~ G ~ G~k~CA~CT~A~GTTTGGA~GC~GTACGACGACAC~GGGAGAGAA~A~ATGAGATGATA
A G L L G F N Y N V ~ I A C
T T T
P L G E K Y 8
E ~ I
5570
55~
55~
5~0
5610
56~
56~
5~0
CAAGAAGgAATGAAGG~GA GT~CAGAGC ~AAAATCAG TAACAGAAGATACAA~AAC ACGGGA~AGTFrCACGAAC AAC~TGGTG GGCITrGGAAACAAGAAAAGATGGATGAAA
Q E E M K A E
S Q S
L g S
V T E D T I N T G L Y S R T
T ~ V G F G N K K R ~ ~ K
5650
5660
5670
56~
5690
57~
5710
5720
5730
5740
5750
5760
~CATGACCA CACTGAAT~ GAGTGC~ ~GT~GAAA AG~AGAAGAGGAGCCAAGA~ACTTTT TCCACGCAGCAACAGCTGAACAAATAA~C AGAAAA~GC ~AAA~TG
L ~ T T L N L
S A D V Y E K I E E E P R V ~ F P H A A T A E Q 11
Q K I A I K ~
5770
5780
5790
5800
5810
58~
5830
5~0
58,50
5860
58~
5880
AAGAGTCCCGGTGTGATACAG~ACTGTCT AAAGGAAACATG~GGCAAG GAAGATAGCGTCAA~GTAT TCTTCATATC TAGACATATAGTCTTCACAA~TCCGCTI'A ~ATGATGCA
K S P
G V I O S L S
K G N ~ L A R ~ I A S S V F F l S
R H [
V F T
M S A Y Y D A
5890
5900
5910
5920
5930
5940
5950
5960
5970
59~ .
59~
6000
GACCCTGAGACAAGGAAAACATCA~G~GA~GAG~GA~A~AGCTCTAAA~ACCTCAGAGAC~GA~GCAGGAACC~ ACA~GAAGCCAA~AAAGT~AAG~GAT
D P E T R K T
S L L
K E L
I N S S
K I P
Q R H D Y L Q E P H T L K P T ~ V E V D
~I0
6020
6030
6040
60~
~
~70
60~
~
61~
6110 '
6120
GAGGACAGCTGGGAA~CAAG~AGCAAAAGAGGAATGCG~AGA~G~ AAAACAAAGAATCAAAATACACACTGGGAGAGAAGAGAGA~ T A ~ A G ~ TTTTGTI~CGA AAATATGGCT
E D S
~ E F K
S A K
E E C
V R V L
K O R
I K I
H T G R
E E R
S I S
L L F E
N M A
6130
6140
6150
6160
61~
61~
61~
6200
6210
62~
6230
6240
~CAATGA ~GGGAG~GCACGGACCAGT~G~G~kGAGAAAAT~~CCA~AGCAT~GCACTGAAAATGAA ~A~CTATA ~CAAGAAGGATG~GCACCCAATAGGTAT
K S M I G R C T D Q Y D V R E N V
S I L
A C A L K M N Y S I
F K K D A A P N R Y
6250
6260
6270
62~
62~
6300
6310
6320
6330
6~0
63~
63~
~CC~GA~ AGAAGAAC~T~ATA~CA ~GA~GGAAAGGAAGTA~TGTTTATG~ AAGT~GACAAAGTACATATTGAA~CTGAGAAGAA~AAAGG~CAA~AAA~A
L L D E K N L
V Y P
k I G K E V S
V Y V K S D K V H I
E I S
E K K E R L S T K L
63~
63~
6390
64~
~10
~20
~30
~40
64~
~
~70
64~
I'r3"AATAT~ ~ A A A ~ G A A G G ~ A G A A G A G A ~ CA~A~G~ TCCTAG~ATGGAGA~A~ T~CC~GAAAGAAACAA~ GACCAAGTAA~CCAA~ ~CCATACAC
F N [ D K M K
D I E
E T L
S L L F
P S y
G D Y
L S L ~
E T 1
D O V
T F O S
A I H
64~
65~
6510
6520
6530
6540
6550
6560
6570
65~
6590
6~0
~AGTCAACG AGAGAAGAAGAG~AGGGCA GATGTGCA~ TAACAGGGACAGAAGGATTTTCTAAG~GC CAATGTATAC AGCAG~GTC TGGGCCTGGT ~GATGTGAA GA~ATCCCT
X V N
[Fig. 2 (continued). Nucleotide sequence of RSV RNA 1 from about position 6600 to 7560, with the predicted amino acid sequence in single-letter code below each line. For legend see opposite.]
[Fig. 2 (continued). Nucleotide sequence of RSV RNA 1 from about position 7560 to the 3' end at position 8970, with the predicted amino acid sequence in single-letter code below each line.]
Fig. 2. Complete nucleotide sequence of RSV RNA segment 1 and predicted amino acid sequence. The sequence is of the (+) strand
RNA written as DNA. The amino acid sequence, represented as the single-letter amino acid code, is shown below the nucleotide
sequence. The asterisk (*) indicates the (UGA) stop codon.
polymerase (Toriyama, 1986a). A discrepancy in Mr
values has also been reported for the L (RNA polymerase) proteins of Bunyamwera and tomato spotted
wilt viruses (Elliott, 1989; De Haan et al., 1991). The
other four putative viral proteins predicted from the
sequence have not yet been identified, although in vitro
translation experiments indicated the presence of these
genome products (Hamamatsu et al., 1993).
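The size figures behind this discrepancy can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the values reported in this paper for RNA 1 (8970 nucleotides, a 2919-residue ORF, predicted Mr 336860) and takes the "230K" protein of the Fig. 5 legend as roughly 230000. The variable and function names are ours, and the single-stop-codon assumption is a simplification.

```python
# Values quoted in the paper; names are illustrative, not from the authors.
RNA1_LENGTH_NT = 8970          # total length of RNA 1
POL_LENGTH_AA = 2919           # residues in the predicted Pol protein
POL_PREDICTED_MR = 336_860     # Mr predicted from the sequence
OBSERVED_MR = 230_000          # assumption: "230K" taken as about 230000


def orf_length_nt(n_residues: int) -> int:
    """Nucleotides needed to encode n_residues plus one stop codon."""
    return 3 * (n_residues + 1)


coding_nt = orf_length_nt(POL_LENGTH_AA)              # 8760
noncoding_nt = RNA1_LENGTH_NT - coding_nt             # 210, both ends combined
mean_residue_mass = POL_PREDICTED_MR / POL_LENGTH_AA  # about 115.4
observed_to_predicted = OBSERVED_MR / POL_PREDICTED_MR

print(f"ORF incl. stop codon: {coding_nt} nt")
print(f"Non-coding remainder: {noncoding_nt} nt")
print(f"Mean residue mass:    {mean_residue_mass:.1f}")
print(f"Observed/predicted:   {observed_to_predicted:.2f}")
```

On these numbers the ORF occupies all but about 210 nucleotides of the segment, and the 230K estimate is roughly two-thirds of the predicted value.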
RSV and RGSV tenuiviruses have virus-associated
RNA-dependent RNA polymerase activity, the level of
which is comparable to that of vesicular stomatitis virus
(Toriyama, 1986a, 1987). The solubilized proteins exhibit
model template-dependent RNA synthesis in vitro (Barbier et al., 1992). The SDD tripeptide motif (Poch et al.,
1989) occurs twice in the putative RNA polymerase (Pol protein) of
RSV, at amino acid residues 1486 to 1488 and 1634 to
1636. These two SDD motifs are also
present in the L protein of phleboviruses (Elliott et al.,
1992), and the second SDD motif is present in the L
protein of segmented negative strand viruses (Poch et al.,
1989). Other prominent conserved sequences are found
in the extreme 5'- and 3'-terminal nucleotide sequences
of RSV and phleboviruses (Kakutani et al., 1990;
Takahashi et al., 1990). The terminal base-paired, panhandle structure (Takahashi et al., 1990) is presumed to
have an important role in the initiation of transcription
of influenza virus (Hsu et al., 1987). Thus, it is likely
that these highly conserved sequences are essential for
replication and transcription of these viruses.
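Locating a short motif such as SDD in a predicted protein is a simple string search. The following is a minimal sketch, not the procedure used in this study: the function name and the toy sequence are hypothetical, and positions are reported 1-based to match the residue numbering used above.

```python
from typing import List, Tuple


def find_motif(protein: str, motif: str = "SDD") -> List[Tuple[int, int]]:
    """Return 1-based (start, end) positions of every occurrence of motif.

    Overlapping occurrences are reported as well.
    """
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = protein.find(motif, start + 1)
    return hits


# Toy sequence for illustration only; it is not part of the RSV Pol protein.
toy_protein = "MKSDDLAGSDDQ"
for first, last in find_motif(toy_protein):
    print(f"SDD at residues {first} to {last}")
```

Applied to the full 2919-residue Pol sequence, a search of this kind would be expected to report the two positions given in the text, together with any further SDD tripeptides that occur by chance.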
An additional distinguishing similarity of RSV and
phleboviruses is that both have ambisense genome segments:
RNAs 2, 3 and 4 of RSV, and the S RNA of
phleboviruses. The intergenic sequence of the S RNA
centrally located between two ORFs is G-rich in three
phleboviruses, TOSV, Sicilian sandfly fever and RVFV,
and AU-rich in the other two phleboviruses, UUKV and
Punta Toro virus (Giorgi et al., 1991). The intergenic
sequences of the ambisense genome RNA 3 of the
tenuiviruses RSV and MStV are all AU-rich (Kakutani
et al., 1991; Zhu et al., 1991; Huiet et al., 1991),
Fig. 3. Dot-plot comparisons of the predicted Pol protein of RSV and the L proteins of phleboviruses (UUKV, RVFV and TOSV),
tomato spotted wilt virus, Bunyamwera virus and Tacaribe arenavirus, made using the Protein Homology Plot (window 10; stringency
5). The comparisons show (a) RSV and UUKV; (b) RSV and RVFV; (c) RSV and TOSV; (d) UUKV and RVFV; (e) RSV and
Bunyamwera virus (BUNV); (f) RSV and tomato spotted wilt virus (TSWV); (g) and (h), RSV and Tacaribe virus (TV), shown with
expanded scale in (h). Sequence data were obtained from Elliott et al. (1992) (UUKV), Muller et al. (1991) (RVFV), Accardi et al. (1993)
(TOSV), De Haan et al. (1991) (TSWV), Elliott et al. (1989) (BUNV) and Iapalucci et al. (1989) (TV).
suggesting that tenuiviruses, in this respect, are similar to
UUK or Punta Toro phleboviruses.
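Whether an intergenic region is described as AU-rich or G-rich comes down to its base composition. The sketch below shows one way to compute it; the function names and the toy sequence are hypothetical, and no numerical threshold for "rich" is implied, since none is defined in the studies cited.

```python
from collections import Counter
from typing import Dict


def base_composition(rna: str) -> Dict[str, float]:
    """Fraction of A, C, G and U in a sequence (T is read as U)."""
    seq = rna.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    return {base: counts[base] / total for base in "ACGU"}


def au_fraction(rna: str) -> float:
    """Combined fraction of A and U residues."""
    comp = base_composition(rna)
    return comp["A"] + comp["U"]


# Toy sequence for illustration only; not an actual intergenic region.
toy_region = "AUUUAAUAUUGAUUAAAUAU"
print(f"A+U fraction: {au_fraction(toy_region):.2f}")
print(f"G fraction:   {base_composition(toy_region)['G']:.2f}")
```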
Given the strong similarity between RSV and phleboviruses, we propose that RSV and the other tenuiviruses,
MStV, RHBV and RGSV (Francki et al., 1991), should
be classified in the family Bunyaviridae, but in a separate genus,
Tenuivirus, rather than in Phlebovirus. The genetic organization,
expression strategies, and amino acid and nucleotide sequence similarities
strongly suggest that these viruses have evolved from a common ancestor;
however, the genome of tenuiviruses comprises four segments (RSV and RHBV)
or five segments (MStV), whereas all phleboviruses have
three RNA segments.
No significant homology was observed between the
Pol protein and the L proteins of other members of the
Bunyaviridae, including tomato spotted wilt virus, previously the sole plant-infecting member of the Bunyaviridae (De Haan et al., 1991). A weak homology was found
with the L protein of Tacaribe virus of the Arenaviridae
(Iapalucci et al., 1989). It is probable that Tenuivirus is in
a unique position in the evolution of these ambisense
genome viruses.
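The dot-plot comparisons summarized in Fig. 3 (window 10; stringency 5) can be approximated by a sliding-window identity count. The following is a minimal sketch under stated assumptions: "stringency" is taken to mean the minimum number of identical residues within the window, which may differ from the scoring actually used by the Protein Homology Plot program, and the function name is hypothetical.

```python
from typing import List, Tuple


def dot_plot(seq_a: str, seq_b: str,
             window: int = 10, stringency: int = 5) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) window starts where at least `stringency`
    of `window` aligned residues are identical (ungapped comparison)."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.append((i, j))
    return dots


# Toy sequences for illustration only.
toy = "ACDEFGHIKLMNPQ"
print(dot_plot(toy, toy))          # identical sequences give a main diagonal
print(dot_plot("A" * 10, "C" * 10))  # unrelated sequences give no dots
```

A run of dots along a diagonal indicates a colinear region of similarity, as seen between RSV Pol and the phlebovirus L proteins. This naive version scales with the product of the two sequence lengths and the window size, so it is slow for proteins of 2000 to 3000 residues; it is meant to show the principle only.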
One of the differences between tenuiviruses and other
members of the Bunyaviridae is viral particle morphology. Virions of tenuiviruses are thin filamentous
particles which are pleomorphic: partially or completely
unfolded coiled filaments, branched configurations, or
circular filaments (Koganezawa et al., 1975; Toriyama,
1982; Ishikawa et al., 1989). So far, enveloped spherical
particles (the morphology of virions of other Bunyaviridae) have not been observed for tenuiviruses, despite
extensive examination by electron microscopy of infected
plant and insect tissues. Immunogold labelling with anti-IgG to nucleoprotein of RSV resulted in labelling of
amorphous or membranous structures in the cytoplasm
of the small brown planthopper L. striatellus (Suzuki et
al., 1992). Observations of thin filamentous particles or
circular filamentous particles of tenuiviruses seem to
[Fig. 4 alignment: RSV Pol, residues about 1350 to 2000, aligned with the Uukuniemi virus L protein, residues about 900 to 1500.]
Fig. 4. Amino acid sequence homology between the predicted protein Pol of RSV and the L protein of UUKV. Identical residues are
indicated by two dots and are shaded; consensus amino acid similarities are indicated by one dot. Gaps inserted in the sequences to
maximize homology are indicated by dashes. Sequence data were obtained from Elliott et al. (1992), in which the RNA polymerase
motifs of UUKV were indicated by underlining in the amino acid sequence.
[Fig. 5 diagram. Each segment is drawn as vRNA and cRNA. Labels: RNA 1, 8970 nt, Pol (336.8K); RNA 2, 3514 nt, 94K; RNA 3, 2504 nt, N (35.1K); RNA 4, 2157 nt, 32.4K.]
Fig. 5. Genome structure and coding arrangement of RSV. Black lines are genomic RNAs, with the nucleotide numbers on both ends.
The open reading frame and its direction are indicated with arrows on vRNA and cRNA. Shaded arrows show that the corresponding
proteins have been found: Pol, the putative RNA polymerase protein (probably the 230K protein); N, nucleocapsid; Ns, non-structural
protein (S protein).
suggest that these particles might correspond to the
nucleocapsids of enveloped viruses of Bunyaviridae or
Arenaviridae (von Bonsdorff et al., 1969; Palmer et al.,
1977).
Grateful acknowledgement is made to Professor M. Kojima of
Niigata University for his continuing interest, encouragement and
discussions throughout this work. We also thank Dr N. Ogasawara of
Plant Biological Defence System Lab. Co., Ltd for providing DNA
sequencing facilities, and Dr T. Ogasawara of Hitachi Software
Engineering Co., Ltd for the homology search of nucleotide and
protein sequences. This work was supported in part by Grants-in-Aid
from the Ministry of Agriculture, Forestry and Fisheries [Biocosmos
Program-94-(4)].
References
ACCARDI, L., GRO, M. C., BONITO, P. D. & GIORGI, C. (1993). Toscana
virus genomic L segment: molecular cloning, coding strategy and
amino acid sequence in comparison with other negative strand RNA
viruses. Virus Research 27, 119-131.
BARBIER, P., TAKAHASHI, M., NAKAMURA, I., TORIYAMA, S. & ISHIHAMA,
A. (1992). Solubilization and promoter analysis of RNA polymerase
from rice stripe virus. Journal of Virology 66, 6171-6174.
DE HAAN, P., KORMELINK, R., RESENDE, R. O., VAN POELWIJK, F.,
PETERS, D. & GOLDBACH, R. (1991). Tomato spotted wilt virus L
RNA encodes a putative RNA polymerase. Journal of General
Virology 72, 2207-2216.
ELLIOTT, R. M. (1989). Nucleotide sequence analysis of the large (L)
genomic RNA segment of Bunyamwera virus, the prototype of the
family Bunyaviridae. Virology 173, 426-436.
ELLIOTT, R. M. (1990). Molecular biology of the Bunyaviridae. Journal
of General Virology 71, 501-522.
ELLIOTT, R. M., SCHMALJOHN, C. S. & COLLETT, M. S. (1991).
Bunyaviridae genome structure and gene expression. Current Topics
in Microbiology and Immunology 169, 91-141.
ELLIOTT, R. M., DUNN, E., SIMONS, J. F. & PETTERSSON, R. F. (1992).
Nucleotide sequence and coding strategy of the Uukuniemi virus L
RNA segment. Journal of General Virology 73, 1745-1752.
FALK, B. W. & TSAI, J. H. (1984). Identification of single- and double-stranded RNAs associated with maize stripe virus. Phytopathology
74, 909-915.
FRANCKI, R. I. B., FAUQUET, C. M., KNUDSON, D. L. & BROWN, F.
(editors) (1991). Classification and Nomenclature of Viruses. Fifth
Report of the International Committee on Taxonomy of Viruses.
Archives of Virology supplementum 2, 398-399.
GIORGI, C., ACCARDI, L., NICOLETTI, L., GRO, M. C., TAKEHARA, K.,
HILDITCH, C., MORIKAWA, S. & BISHOP, D. H. L. (1991). Sequences
and coding strategies of the S RNAs of Toscana and Rift Valley
fever viruses compared to those of Punta Toro, Sicilian sandfly
fever, and Uukuniemi viruses. Virology 180, 738-753.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient
method for generating cDNA libraries. Gene 25, 263-269.
HAMAMATSU, C., TORIYAMA, S., TOYODA, T. & ISHIHAMA, A. (1993).
Ambisense coding strategy of the rice stripe virus genome: in vitro
translation studies. Journal of General Virology 74, 1125-1131.
HANAHAN, D. (1985). Techniques for transformation of Escherichia
coli. In DNA Cloning: A Practical Approach, vol. 1, pp. 109-135.
Edited by D. M. Glover. Oxford: IRL Press.
HENIKOFF, S. (1984). Unidirectional digestion with exonuclease III
creates targeted breakpoints for DNA sequencing. Gene 28,
351-359.
HSU, M.-T., PARVIN, J. D., GUPTA, S., KRYSTAL, M. & PALESE, P.
(1987). Genomic RNAs of influenza viruses are held in circular
conformation in virions and in infected cells by a terminal panhandle.
Proceedings of the National Academy of Sciences, U.S.A. 84,
8140-8144.
HUIET, L., KLAASSEN, V., TSAI, J. H. & FALK, B. W. (1991). Nucleotide
sequence and RNA hybridization analyses reveal an ambisense
coding strategy for maize stripe virus RNA 3. Virology 182, 47-53.
HUIET, L., TSAI, J. H. & FALK, B. W. (1992). Complete sequence of
maize stripe virus RNA 4 and mapping of its subgenomic RNAs.
Journal of General Virology 73, 1603-1607.
HUIET, L., TSAI, J. H. & FALK, B. W. (1993). Maize stripe virus RNA5
is of negative polarity and encodes a highly basic protein. Journal of
General Virology 74, 549-554.
IHARA, T., SMITH, J., DALRYMPLE, J. M. & BISHOP, D. H. L. (1985).
Complete sequences of the glycoproteins and M RNA of Punta Toro
phlebovirus compared to those of Rift Valley fever virus. Virology
144, 246-259.
ISHIKAWA, K., OMURA, T. & HIBINO, H. (1989). Morphological
characteristics of rice stripe virus. Journal of General Virology 70,
3465-3468.
IAPALUCCI, S., LOPEZ, R., REY, O., LOPEZ, N., FRANZE-FERNANDEZ,
M. T., COHEN, G. N., LUCERO, M., OCHOA, A. & ZAKIN, M. M.
(1989). Tacaribe virus L gene encodes a protein of 2210 amino acid
residues. Virology 170, 40-47.
KAKUTANI, T., HAYANO, Y., HAYASHI, T. & MINOBE, Y. (1990).
Ambisense segment 4 of rice stripe virus: possible evolutionary
relationship with phleboviruses and uukuviruses (Bunyaviridae).
Journal of General Virology 71, 1427-1432.
KAKUTANI, T., HAYANO, Y., HAYASHI, T. & MINOBE, Y. (1991).
Ambisense segment 3 of rice stripe virus: the first instance of a virus
containing two ambisense segments. Journal of General Virology 72,
465-468.
KOGANEZAWA, H., DOI, Y. & YORA, K. (1975). Purification of rice
stripe virus. Annals of the Phytopathological Society of Japan 41,
148-154.
MULLER, R., ARGENTINI, C., BOULOY, M., PREHAUD, C. & BISHOP,
D. H. L. (1991). Completion of the genome sequence of Rift Valley
fever phlebovirus indicates that the L RNA is negative sense or
ambisense and codes for a putative transcriptase-replicase. Nucleic
Acids Research 19, 5433.
PALMER, E. L., OBIJESKI, J. F., WEBB, P. A. & JOHNSON, K. M. (1977).
The circular, segmented nucleocapsid of an arenavirus-Tacaribe
virus. Journal of General Virology 36, 541-545.
POCH, O., SAUVAGET, I., DELARUE, M. & TORDO, N. (1989).
Identification of four conserved motifs among the RNA-dependent
polymerase encoding elements. EMBO Journal 8, 3867-3874.
RAMIREZ, B.-C., LOZANO, I., CONSTANTINO, L.-M., HAENNI, A.-L. &
CALVERT, L. A. (1993). Complete nucleotide sequence and coding
strategy of rice hoja blanca virus RNA4. Journal of General Virology
74, 2463-2468.
RONNHOLM, R. & PETTERSSON, R. F. (1987). Complete nucleotide
sequence of the M RNA segment of Uukuniemi virus encoding the
membrane glycoproteins G1 and G2. Virology 160, 191-202.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing
with chain-terminating inhibitors. Proceedings of the National
Academy of Sciences, U.S.A. 74, 5463-5467.
SCHMALJOHN, C. S. (1990). Nucleotide sequence of the L genome
segment of Hantaan virus. Nucleic Acids Research 18, 6728.
SUZUKI, Y., FUJI, S., TAKAHASHI, Y. & KOJIMA, M. (1992). Immunogold
localization of rice stripe virus particle antigen in thin sections of
insect host cells. Annals of the Phytopathological Society of Japan 58,
480-484.
TAKAHASHI, M., TORIYAMA, S., KIKUCHI, Y., HAYAKAWA, T. &
ISHIHAMA, A. (1990). Complementarity between the 5'- and 3'-terminal sequences of rice stripe virus RNAs. Journal of General
Virology 71, 2817-2821.
TAKAHASHI, M., TORIYAMA, S., HAMAMATSU, C. & ISHIHAMA, A. (1993).
Nucleotide sequence and possible ambisense coding strategy of rice
stripe virus RNA segment 2. Journal of General Virology 74,
769-773.
TORIYAMA, S. (1982). Characterization of rice stripe virus: a heavy
component carrying infectivity. Journal of General Virology 61,
187-195.
TORIYAMA, S. (1983). Rice stripe virus. CMI/AAB Descriptions of Plant
Viruses, no. 269.
TORIYAMA, S. (1986a). An RNA-dependent RNA polymerase associated with the filamentous nucleoproteins of rice stripe virus. Journal
of General Virology 67, 1247-1255.
TORIYAMA, S. (1986b). Rice stripe virus: prototype of a new group of
viruses that replicate in plants and insects. Microbiological Sciences
3, 347-351.
TORIYAMA, S. (1987). Ribonucleic acid polymerase activity in filamentous nucleoproteins of rice grassy stunt virus. Journal of General
Virology 68, 925-929.
TORIYAMA, S. & WATANABE, Y. (1989). Characterization of single- and
double-stranded RNAs in particles of rice stripe virus. Journal of
General Virology 70, 505-511.
VON BONSDORFF, C. H., SAIKKU, P. & OKER-BLOM, N. (1969). The inner
structure of Uukuniemi virus and two Bunyamwera supergroup
arboviruses. Virology 39, 342-344.
YANISCH-PERRON, C., VIEIRA, J. & MESSING, J. (1985). Improved M13
phage cloning vectors and host strains: nucleotide sequences of the
M13mp18 and pUC19 vectors. Gene 33, 103-119.
ZHU, Y., HAYAKAWA, T., TORIYAMA, S. & TAKAHASHI, M. (1991).
Complete nucleotide sequence of RNA 3 of rice stripe virus: an
ambisense coding strategy. Journal of General Virology 72, 763-767.
ZHU, Y., HAYAKAWA, T. & TORIYAMA, S. (1992). Complete nucleotide
sequence of RNA 4 of rice stripe virus isolate T, and comparison
with another isolate and with maize stripe virus. Journal of General
Virology 73, 1309-1312.
(Received 7 June 1994; Accepted 19 August 1994)