Journal of General Virology (1994), 75, 3569-3579. Printed in Great Britain

Nucleotide sequence of RNA 1, the largest genomic segment of rice stripe virus, the prototype of the tenuiviruses

Shigemitsu Toriyama,1 Mami Takahashi,1 Yoshitaka Sano,2 Takumi Shimizu and Akira Ishihama3

1 National Institute of Agro-Environmental Sciences, Kannondai 3, Tsukuba, Ibaraki 305, 2 Graduate School of Science and Technology, Niigata University, Ikarashi, Niigata 950-21 and 3 National Institute of Genetics, Mishima, Shizuoka 411, Japan

The complete nucleotide sequence of RNA 1, the largest genomic segment of rice stripe virus (RSV), was determined using two sets of overlapping cDNA clones. RNA segment 1 comprises 8970 nucleotides and on the viral complementary sequence has a single long open reading frame coding for a protein of 2919 amino acids with an estimated Mr of 336860. Amino acid sequence comparisons of the putative protein indicated strong homology (30% amino acid identity over about 1500 residues) with the L protein of the genus Phlebovirus of the Bunyaviridae, but no detectable similarity with other members of the Bunyaviridae. However, weak similarity was detected with the L protein of Tacaribe arenavirus. The highly homologous sequence domain includes the conserved motifs of the putative RNA-dependent RNA polymerase. The data presented here, along with previous work, clearly show significant similarities in genome organization, structure and expression between RSV and members of the genus Phlebovirus of the Bunyaviridae. Taken together, we propose that tenuiviruses should be included in the Bunyaviridae under the genus Tenuivirus.

The DDBJ accession number for the sequence of RSV RNA 1 is D31879.

0001-2698 © 1994 SGM

Introduction

Rice stripe virus (RSV), the prototype of the genus Tenuivirus, has a broad host range in the Gramineae and causes serious damage to rice, particularly Japonica-type rice varieties (Toriyama, 1983; Francki et al., 1991). RSV is transmitted by the small brown planthopper Laodelphax striatellus Fallén, and planthoppers of three other species. In planthoppers, RSV replicates and is transovarially transmitted to a high percentage of the progeny (reviewed in Toriyama, 1986b).

The genome of RSV comprises four ssRNA segments; as well as low levels of four dsRNAs, duplexes of vRNA and its complementary RNA can also be detected (Toriyama & Watanabe, 1989; Ishikawa et al., 1989). The dsRNAs found in tenuiviruses seem to be artifacts generated by annealing of complementary strands (Falk & Tsai, 1984). The complete nucleotide sequences have been determined for RNAs 3 and 4 from two different isolates (Kakutani et al., 1990, 1991; Zhu et al., 1991, 1992) and for RNA 2 from one isolate (Takahashi et al., 1993). The results suggest that all three RNA segments have ambisense coding strategies. This was also experimentally shown by in vitro translation of RNA transcribed from the cDNA sequences (Hamamatsu et al., 1993).

The 3'- and 5'-terminal sequences of approximately 18 nucleotides are conserved among all four RNA segments and are complementary to each other, except for one base change (U to A) at the sixth position from the 3' end of RSV RNA 1 (Takahashi et al., 1990). Moreover, eight terminal nucleotides out of ten conserved nucleotides are identical to those present in the terminal consensus sequences of the genus Phlebovirus of the family Bunyaviridae (Elliott, 1990; Elliott et al., 1991; Kakutani et al., 1990; Takahashi et al., 1990). Weak but significant amino acid sequence similarity exists between the nucleocapsid proteins from RSV and Punta Toro phlebovirus (Kakutani et al., 1990). Likewise, similarity exists between the putative Mr 94K protein of RSV RNA segment 2 and the membrane glycoproteins of Punta Toro and Uukuniemi phleboviruses (Ihara et al., 1985; Rönnholm & Petterson, 1987; Takahashi et al., 1993). These observations suggest an evolutionary relationship between RSV and the phleboviruses.

The tenuiviruses include maize stripe virus (MStV), rice hoja blanca virus (RHBV), rice grassy stunt virus (RGSV) and three other possible members (Francki et al., 1991). Recent nucleotide sequencing studies of RNAs 3 and 4 of MStV (Huiet et al., 1991, 1992) and RNA 4 of RHBV (Ramirez et al., 1993) showed strong homology in the RNA 3 and 4 sequences between MStV, RHBV and RSV. The 18 nucleotide terminal sequences are conserved in RNAs 3 and 4 of MStV, RHBV and RSV. All of these RNAs have an ambisense coding strategy.

Filamentous particles of RSV and RGSV are associated with a high level of RNA-dependent RNA polymerase activity. A minor polypeptide, Mr 230K, constituting purified filamentous virus particles of RSV and RGSV, is considered to be the RNA polymerase protein (Toriyama, 1986a, 1987). The largest genome segment, RNA 1, is presumed to encode this 230K RNA-dependent RNA polymerase, because this RNA segment alone is large enough to encode the 230K protein.

In this paper, we present the nucleotide sequence of the RNA segment 1 of RSV. Analysis of the amino acid sequence of the predicted open reading frame reveals that RNA segment 1 does indeed encode the RNA polymerase. A high degree of homology was found between RSV RNA 1 and the L RNAs of phleboviruses. This homology was even greater than that detected for RNAs 2 and 3 of RSV.

Methods

Virus and plant. RSV isolate T was propagated in wheat plants with transmission by the viruliferous small brown planthopper L. striatellus, and purified as described previously (Toriyama, 1986a; Toriyama & Watanabe, 1989).
The nB component, which contains RNA segment 1, was further purified at least twice by centrifugation on linear 5 to 35% sucrose gradients. RSV RNA was prepared as described previously (Toriyama, 1986a). ssRNA 1 was separated from RNAs 2, 3 and 4 by electrophoresis in 1% low-melting-point agarose gel (LGT agarose; Nakarai Chemicals) (Toriyama & Watanabe, 1989).

cDNA synthesis and cloning. vRNA-dependent cDNA synthesis was done by the method of Gubler & Hoffman (1983) using M-MLV reverse transcriptase lacking RNase H activity (BRL) and a synthetic oligonucleotide, primer A, with the sequence 5' AGAGGAAAAAATAATTTTGA 3', which is complementary to the unique nucleotide sequence located at nucleotides 11 to 30 from the 3' end of RNA 1 (Takahashi et al., 1990). The cDNA was blunt-ended with T4 DNA polymerase and inserted into the SmaI site of pUC18 (Yanisch-Perron et al., 1985). Recombinant plasmids were transformed into the competent Escherichia coli strain JM109 (Nippon Gene Company; Hanahan, 1985). Four independent clones were obtained, all of which contained the 3'-proximal sequence of RNA segment 1 (Takahashi et al., 1990). Two clones, pRS1S61 and pRS1S207, were used for the sequence determination of the 3' half of RNA 1 (Fig. 1).

To obtain cDNA clones for the 5' half of RNA segment 1, primer B (5' TATCTTGGGTATCTAAAGAA 3'), from the 3'-proximal region nucleotide sequence of clone pRS1S61, was used. The ds-cDNA was tailed with dCTP using terminal deoxynucleotidyl transferase (BRL), and annealed with PstI-cut pUC19 vector which was previously tailed with dGTP. The recombinant plasmids were transformed into E. coli DH5α F' (BRL). Seven independent clones were isolated, of which two clones, pRS1C17 and pRS1C18, were used for the sequencing (Fig. 1).

DNA sequencing.
Four recombinant plasmid clones, pRS1S61, pRS1S207, pRS1C17 and pRS1C18, were digested with restriction enzymes and the resulting fragments were subcloned into the M13 mp18 or mp19 phage vectors, or into pUC18 or pUC19 plasmid vectors. Alternatively, a nested set of deletions was prepared from the inserted DNAs of subclones of the four clones using the Kilo-Sequence deletion kit (Takara Shuzo; Henikoff, 1984; Yanisch-Perron et al., 1985). The ss- and dsDNAs prepared were sequenced using the Sequenase version 2.0 kit (United States Biochemicals) and [α-35S]dCTP (Amersham) (Sanger et al., 1977). Sequencing in one direction for clone pRS1C17 was done with an automated DNA sequencer (model 373A, Applied Biosystems). The nucleotide sequences were analysed using the DNASIS program (Hitachi Software Engineering Co.). The GenBank/EMBL and NBRF/PIR databases were searched for RNA and amino acid sequence homologies.

Results

Nucleotide sequence of the RSV RNA segment 1

The first step cDNA synthesis was carried out using an oligonucleotide primer (primer A) complementary to nucleotides 11 to 30 from the 3' end of RNA segment 1. One clone contained an insert of about 7000 bp, but after a few cycles of transfer the size became smaller, suggesting deletion by intramolecular recombination. Therefore, we prepared two sets of partial clones (pRS1S61 and pRS1S207), covering the 3' half of RNA segment 1 as illustrated in Fig. 1. Based on the 3'-terminal sequence of these 3'-half clones, we prepared an internal primer and carried out the second step cDNA synthesis. Clones pRS1C17 and pRS1C18 were chosen for the sequence determination of the 5' half. The sequence of RNA segment 1 (nucleotides 21 to 8957) was determined using two independent overlapping clones, except for the region between nucleotides 5233 and 5584 (this region was analysed only for clone pRS1S61).
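The relationship between primer A and the viral RNA it anneals to can be illustrated with a short Python sketch. It assumes only the primer sequence quoted in Methods. The helper name reverse_complement is hypothetical, and the target it prints is what base pairing implies for the primer, not a sequence read from the published figure.

```python
# Primer A as quoted in Methods, written 5'->3' as DNA.
PRIMER_A = "AGAGGAAAAAATAATTTTGA"

# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


# Stretch of vRNA implied by base pairing, written 5'->3' with U in place of T.
target_rna = reverse_complement(PRIMER_A).replace("T", "U")

# Nucleotides 11 to 30 inclusive span 20 positions, the same as the primer length.
target_span = 30 - 11 + 1

print("primer length:", len(PRIMER_A))
print("target span:  ", target_span)
print("implied vRNA target (5'->3'):", target_rna)
```

Because the primer is complementary to vRNA, its own sequence reads the same as the viral complementary strand over that stretch, which is why it can be used to prime cDNA synthesis on the vRNA template.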
The sequences of both termini were obtained from the data determined by direct sequencing of viral RNA 1 (Takahashi et al., 1990). The complete nucleotide sequence of RSV RNA segment 1, expressed as viral complementary sense, is shown in Fig. 2. RNA segment 1 is composed of 8970 bases, with a base composition of 26.83% A, 34.33% U, 22.02% C and 16.82% G.

The sequence was scanned for AUG-initiated open reading frames (ORFs). A single large ORF was detected in the viral complementary sequence (cRNA). Other short ORFs were identified on viral sense RNA (vRNA), which may encode Mr 9.3K, 8.1K and 6.3K products. The large ORF present in cRNA extends from the 5'-proximal AUG codon at positions 58 to 60 to the UGA stop codon at position 8815 to 8817 (Fig. 2). The noncoding sequences are therefore 57 nucleotides at the 5' end and 153 nucleotides at the 3' end. The amino acid sequence derived from the long ORF is shown in Fig. 2. The predicted gene product is 2919 amino acids long and has an estimated Mr of 336860. This predicted Mr is larger than the previously estimated size (230K) of a minor polypeptide associated with RSV, which was based on the relative migration in an SDS-polyacrylamide gel (Toriyama, 1986a). Henceforth, we designate the predicted protein as the 'Pol' protein of RSV.

Fig. 1. Relationships of four cDNA clones used for determining the nucleotide sequence of RSV RNA segment 1. cDNAs were synthesized using two synthetic oligonucleotide primers: primer A, complementary to nucleotides 11 to 30 from the 3' end, and primer B, complementary to nucleotides 5233 to 5253. The 3'- and 5'-terminal sequences, determined by direct RNA sequencing (Takahashi et al., 1990), are indicated by arrows. [Map of RSV RNA 1 (8970 nucleotides): directly sequenced termini, 1 to 86 and 8925 to 8970; clones pRS1C17 and pRS1C18, 21 to 5250; clones pRS1S61 and pRS1S207, starting at 5164 and 5584, respectively, and ending at 8957.]

Homologies of the RSV RNA segment 1 and the L RNA of phleboviruses

The similarity search using the GenBank/EMBL nucleotide and NBRF/PIR protein databases showed clearly that RSV RNA segment 1 is homologous to the L RNA of phleboviruses, i.e. Uukuniemi virus (UUKV) (Elliott et al., 1992), Rift Valley fever virus (RVFV) (Muller et al., 1991) and Toscana virus (TOSV) (Accardi et al., 1993). At the amino acid level, the similarity was maximal between the RSV Pol protein and the L proteins of the phleboviruses (UUKV, RVFV and TOSV), as shown in the dot-plot analysis of protein homology (Fig. 3a, b, c). Among the L proteins of the phleboviruses, the percentage of identical amino acids was 36.9% between UUKV and RVFV (Fig. 3d), 35.8% between UUKV and TOSV, and 51.5% between RVFV and TOSV over the entire amino acid sequences.

An optimal sequence alignment of the RSV Pol protein with the UUKV L protein reveals 31.1% identical residues and 71.2% overall similarity (including conserved amino acids). The similarity was maximal between residues 493 to 2026 (RSV) with only a few minor gaps, except for a 26 amino acid gap in the sequence of UUKV L protein between residues 1333 to 1362. The greatest similarity was between residues 1362 to 1931 (569 amino acids) where there is 39.3% identity and 78.3% similarity (Fig. 4). This region contains the sequence of the putative RNA polymerase domain, including the four polymerase motifs proposed by Poch et al. (1989) in the RNA-dependent RNA polymerase and identified in L proteins of UUK and RVF phleboviruses (Elliott et al., 1992). In addition, this region contains one distinct homologous stretch of 21 amino acid residues at 1408 to 1428 and a leucine zipper motif, L-X6-L-X6-L-X6-L at residues 1531 to 1552, located roughly in the central portion of the RNA polymerase region.

No significant homology was, however, found between RSV Pol and the L proteins of Bunyamwera, Hantaan and tomato spotted wilt viruses (Elliott, 1989; Schmaljohn, 1990; De Haan et al., 1991) (Fig. 3e, f). A weak similarity was found with the L protein of Tacaribe arenavirus: 19.2% identity over 449 amino acids in the region containing the polymerase motifs (Iapalucci et al., 1989) (Fig. 3g, h).

Discussion

The nucleotide sequences of the RNAs 2, 3 and 4 of RSV isolate T have been reported previously (Zhu et al., 1991, 1992; Takahashi et al., 1993). The determination of the sequence of RNA 1 completes the genome sequence of RSV. As summarized in Fig. 5, the complete genome comprises 17145 nucleotides, of which 86.8% code for seven ORFs. Each of RNAs 2, 3 and 4 has an ambisense coding strategy, and RNA segment 1 is a negative strand RNA. MStV, another member of the tenuiviruses, contains five RNA segments, of which the smallest RNA (RNA 5) is a negative strand and encodes a highly basic protein (Huiet et al., 1993). Although the existence of a small RNA has been reported for a different isolate of RSV (Ishikawa et al., 1989), we have been unable to find such a small distinct RNA in RSV (Toriyama, 1982; Toriyama & Watanabe, 1989). Furthermore, a purified preparation of RSV containing the four RNA species alone reproduced the original chlorotic stripe symptoms on rice seedlings, when inoculated through the planthopper vectors (Toriyama, 1982).

Among the seven putative viral-coded proteins, the nucleocapsid protein and a non-structural protein (S-protein) were shown to be encoded by cRNA 3 and vRNA 4, respectively (Hamamatsu et al., 1993). We have now shown that the predicted Pol protein (336.8K) is encoded by cRNA 1.
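The figures reported for the large ORF in Results are mutually consistent, and the check can be sketched in Python as follows. The numbers are those stated in the text; the variable names are hypothetical, and the mean residue mass is only a rough plausibility figure derived here, not a value given by the authors.

```python
# Values reported in Results for RSV RNA 1 (cRNA numbering).
SEGMENT_LENGTH = 8970   # total bases in RNA segment 1
ORF_FIRST = 58          # first base of the 5'-proximal AUG
ORF_LAST = 8817         # last base of the UGA stop codon
REPORTED_MR = 336860    # estimated Mr of the predicted protein

# Length of the ORF including the stop codon, in nucleotides and codons.
orf_nt = ORF_LAST - ORF_FIRST + 1
codons = orf_nt // 3

# The stop codon is not translated, so the protein is one residue shorter.
residues = codons - 1

# Noncoding stretches flanking the ORF.
noncoding_5 = ORF_FIRST - 1
noncoding_3 = SEGMENT_LENGTH - ORF_LAST

# Rough plausibility figure: average mass per residue implied by the Mr.
mean_residue_mass = REPORTED_MR / residues

# The reported base composition should account for every base.
base_percent = {"A": 26.83, "U": 34.33, "C": 22.02, "G": 16.82}
total_percent = round(sum(base_percent.values()), 2)

print("ORF length (nt):", orf_nt, "in frame:", orf_nt % 3 == 0)
print("amino acids:", residues)
print("5' noncoding:", noncoding_5, "3' noncoding:", noncoding_3)
print("mean residue mass:", round(mean_residue_mass, 1))
print("base composition total (%):", total_percent)
```

The same arithmetic applies to any single-ORF segment once the start and stop coordinates are known, which makes it a convenient first check when comparing coordinates against a database entry.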
This Pol protein is most probably the previously designated 230K protein which is associated with RSV nucleoproteins and was considered to be a putative RNA Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 14 May 2017 00:09:06 3572 S. Toriyama and others 10 ~ 30 40 50 ~ 70 ~ 90 100 110 120 5' ACACATAGTC AGAGGAAAAAATAAIiilGA TTTTGTTTTC CACAAAAGAA"~GAAGGATG ACGACACCACCTCTCG~AT ACCC~GCAT G~CATGGCA GGTCI~FATGA A~G~GGCG M T T P P L V I P L H V H G R S Y E I. I. A 130 140 150 1~ 170 180 190 200 210 220 230 240 GGGTATCATG AAG~GA~G GCAGGAGATA 6AAGA6~GG AAGAAACAGATGTCAGAGGA GATGGA~ G T ~ A T C A ~CCATACTA TATAGTATGG GC~GAGCAA GGAGAACTCT G Y H E V D W Q E I E E L E E T D V R G D G F C L y H S I L Y S M G L S K E N S 2~ 260 270 2~ ~0 ~0 310 320 330 ~0 350 3~ CGCACCA~G AATTTATGAT AAAGCTACGA TCGAATCCAG CCATCTGCCA GCTGGATCAA GAAATGCAAC TGAGCC~AT GAAGCAGC~ 6ATCCAAATG A~CATCAGC CTGGGGTGAA R T T E F M I K L R S N P A I C Q L D Q E M Q L S L M K Q L D P N D S S A W G E 370 380 3~ 4~ 410 420 430 440 450 460 470 480 GATATAGCAA TFGGG~FAT AGCTATAATA TrGAGAA~A AGATAA~GC ~ACCAGACA G~GATGGGA AG~GTITAA GACTAITTAT GGTGCTGAGT ~GAGAGTAC TA~AGAA~ D I A l G F I A l 1 L R I K I I A Y Q T V D G K L F K T I Y G A E F E S T l R I 4~ 500 510 520 530 540 550 560 570 580 5~ 600 AGGAA~ATG GGAA~ACCA C'Iq'CAAGTCA ~ A G A C A G AITVFGATCA TAAAGTAAAG ~CAGATCAA AAA~GAAGA A~C~GAGA ATGCCAG~G AAGA~G~A ATCCAT~CC R N Y G N Y H F K S L E T D F D H K V K L R S K I E E F L R M P V E D C E S I S 610 620 630 ~0 650 660 670 680 6~ 7~ 710 720 ~GTGGCATG CATCTGTTTA CAAGC~ATA GTAT~GATA GCCTTTCTGG ACACAAGAGCTTFAGTAATG TGGATGAA~ GATAGG~GC ATAATATCCA GCATGTA~A GATCATGGAC L W H A S V Y K P I V S D S L S G H K S F S N V D E L 1G S I 1S S M Y K ] M D 730 740 750 760 770 780 790 800 810 820 830 ~0 ~TGGTGATC AATGTTTT~ ~GGAGTGCA ATGAGAATGG TAGCCAGACC CT~GAAAAA CTATATGCCC ~GCAGTG~ TTTGGGA~C AATC~AAGT TCTATCATGT GAGGAAAAGA N G D Q C F L ~ S A M R ~ V A R P S E K L Y A L A V F L G F N L K F Y H V R K R 850 8~ 
870 8~ 890 ~0 910 920 930 940 950 960 ~TGAAAAATTGACGGCAAAACTTGAGAGTGATCATA~AATTTGGGAGTGAACCTGA~ GAGGTATATGAAGTTTCTGAGCCAACCAGAT~ACCTGGGTC~AAACCAGGAGGGAGC A E K L T A K L E S D H T N L G V K L l E V Y E V S E P T R S T ~ V L K P G G S 970 9~ 990 I~0 1010 1020 1030 1040 1050 1060 1070 i0~ AGAATAACTG AAACAAGAAAI-IIIGTGA~ GAGGAGATAA TAG~AACAG GCG~CTCTG GAGAGCTTAT ~GTGTCAAG CA~GA~AT C~GCAGAGT T A ~ C C C A GAAACTTAGT R I T E T R N F V l E E I I D N R R S L E S L F V S S S E Y P A E L C S Q K L S I0~ iI~ iii0 1120 1130 1140 Ii~ 1160 I170 II~ Ii~ I~0 ~CATCAAAG ACAGAATAGC A~AATGTTT GGCTTTATCA ACAGAACCCCTGAAAACAGT GGGAGGGAAC~TACATAAA CACATA~AT ~GAAGAGGA T~ACAGGT GGAAAGAA~ A I K O R I A L M F G F I N ~ T P E N S G R E L y I N T Y Y L K R I t Q q E R N 1210 1220 12~ 1240 1250 12~ 1270 1280 12~ 13~ 1310 1320 ~AA~AGAG A~C'I23"AAG A~ACAGCCT GCT~GGGGA TGATCCAGAT AATCAGA~A CCAACAGCAT ~GGTACATA CAACCCGGAA GTGGGCA~C TG~G~AGC CCAAACTGGA V I R D S L R S Q P A V G M I Q I I R L P T A F G T Y N P E V G T L L L A Q T G 1330 i~0 1350 1360 1370 13~ 13~ 1400 1410 1420 1430 1440 ~AATCTATA GAC~GGCAC CACAA~AGA GTfiCAGATGG AGGTCAGGAG AT~CC~CT G~AI~'TCAA G ~ C ~ A A GATCA~AGT T'CFCCGGAGA CACAAAAACA~ACAACAAT L [ Y R L G T T T R V Q ~ E V R R S p S V I S R S B K ] T S F P E T Q K H N N N 1450 1460 1470 1480 1490 15~ 1510 1520 1530 1540 1550 1560 ~ G T A ~ A ~ ATGCACCCAG AACACAGGAGACATI']'TATC ACCCAAATGCTGAGATCTAT fiAGG~G~G ATGTAAAGAC TC~AGTG~ A~ACAGAGA ~G~GATAA TCATATAGT6 L Y D Y A P R T Q E T F Y H P N A E I y E A V D V K T P S V I T E [ V D N H I V 1570 1580 1590 1600 1610 16~ 1630 1640 1650 1660 1670 16~ ATAAAA~GA ACACTGATGA TAAGGG~GG TCAGTCAGTG A~CGATAAA GCAAGAIIII ~ATA~GGA AGAGACTA~T GG~GC~AG AATA~G~C ATGACTFFGT III~ATATC [ K L N T O D ~ G ~ S V S 9 S 1K Q D F V Y R K R L M D A K N [ V H D F V F D [ 1690 1700 1710 1720 1730 1740 1750 1760 1770 1780 17~ 1800 ~ATCAACTG AGA~GACAA GAGCTTTAAG GGTGCTGACT TAT~ATAGG AGGAATCTCA GATAACTGGT CACCAGATGT C A ~ A T ~ C A AGAGAAAGTG ATCCACA~A TGAAGATATC L S 
T E T D K S F K G A D L S I 6 G I S D N W S P D V I I S R E S D P Q Y E D ! 1810 1820 1830 I~0 18~ 18~ 1870 18~ 1890 1900 1910 1920 GTTGT~ATG AG~CACAAC AAGGTCCA~ GAGTCTATAG AAT~CTA~ AAGATCAGTA GAGG~AAAA GC~ACGATA TAAAGAAGCA A~CAGGAAA GAGCCATCAC A~AAAGAAG V V Y E F T T R S T E S I E S L L R S V E V K S L R Y K E A I Q E R A I T L K K 1930 1940 19~ 1960 1970 19~ 19~ 2000 ~i0 ~ ~30 2040 AGAATATCGT A~ACACAAT A~TGTCAGT CTAGATGCTG TAGCCACAAA TCTGCTATCA C ~ C ~ G ~ G ATG~TGCAG A G A A ~ T A A~CGTTTAA GAG~G~AA TCAGGTGAAG R I S Y Y T I C V S L D A V A T N L L S L P A D V C R E L [ I R L R V A N Q V K 2050 2060 2070 20~ 20~ 21~ 2110 2120 2130 2140 21~ 2160 ATCCAG~AG CTGATAACGA TATCAATC17 GACT~GCCA CITFfi~AGC AC~GACA~ TACAfiAATAA AGGAAATG~ TAGGGA~GT ~CCCAA~A ATAAATTTAT A C A ~ I 0 L A D N D I N L D S h T L L A P D I y R I K E ~ F R E S F p N N K F I H P I 2170 2180 2190 22~ 2210 2220 2230 2240 2250 2260 2270 22~ ACTAAGGAAA TGTATGAGCA TTTTGTCAAT CCAATGATTT CAGGAGAAAAAGACTATG~ GCCAATTTAA AG~CATAAT AGACAA~AG ACCAGAGATG AGCAGAG~A GAATTTAGAG T K E ~ Y E H F V N P M I S G E K D Y V A N L K S I I D K E T R D E Q R K N L E 22~ 2300 2310 2320 2330 2340 2350 2360 2370 2380 2390 2400 AGTCTGAAAG'~GTGGATGGGAAAAAGTACACAGAGAGAAAAGCAGAAACTGCTCTGA~ GAGATGTCACAAGCAG~AGAGCA~AGAAG~AI~IGAAA~GACAAII~TAG~CC S L K V V D G K K Y T E R K A E T A L N E M S Q A E E H Y R S Y F E N D N F R S 2410 2420 2430 2440 2450 24~ 2470 2480 2490 2500 2510 2520 ACACTAAAAG CTCCAGTCCA ACTTCCC~A ATCATACCGG ATGTGTCAAG TCAGGACAAT CAA~CTCAA ACAAGGAACT ATCTGATAGG ATACGGAAGA AGCCGATCGA CCACCCTA~ T L K A P V Q L P L I 1P D V S S Q D N Q F S N K E L S D R I R K K P I D H P [ Fig. 2. F o r l e g e n d s e e p a g e 3 5 7 5 . 
Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 14 May 2017 00:09:06 Complete sequence of the RSV genome 3573 25~ 2~0 2550 2560 2570 25~ 25~ 26~ 2610 2620 2630 2640 TACAAC~CT CGG~CAAGC AC~AATAAG AGAAA~G~ CGA~GCA~ CGGCCATTTG 6ACGAG~AG AA~ATCTAT G~AGAAGGA CAAG~GCTA AGAAAGTGGA GGAATC~AT Y N [ W D Q A V N Z R N C S I A L G H L D E L E ] S M L E G O V A Z K V E E S Y 2650 2660 2670 26~ 26~ 27~ 2710 ~ 2730 2740 2750 2760 ~GAAAGATA GGAGTCAGTACAACAGGACAA~GCTAA CTA~ATGAA GGAGGACATCTACITGGCTG AAAGGGGGATAA~G~AAG AAGAGG~GGAAGAACCAGATGTGAAATI-f K K D R S Q Y N R T T L L T N M Z E D [ Y L A E R G [ N A K K R L E E P D V K F 2770 2780 27~ 2800 2810 28~ 2830 2840 2850 28~ 2870 28~ ~TCGAGATC AGTCTAAGAGGCCTTTTCAT CCTTTTG~A GTG~ACCAG AGACATAGAGCAG~CAC~ AGAAAGAGTGC~GGAACTC AATGAAGAGTCAGGACA~GCTCG~G~A Y R D Q S K R P F H P F V S E T R D I E O F T O Z E C L E L N E E S G H C S L I 2890 2900 2910 2920 2930 2~0 29~ 2~0 29~ ~ 2990 3~0 ~TGTAGAGGAT~AGTGI~AT~GCT~AGAG~GCATGAGGTAGGTGATTTAGAACAC~ATGGAACAACATAAAAGC~A~AAAACAAAGTTTGCA~A~CTAAGTrTA~ N V E D L V L S A L E L ~ E V G D L E ~ L ~ N N I ~ ~ B S K T K F A L Y A K F ~ 3010 3020 3030 3040 30~ 30~ 3070 30~ 30~ 31~ 3110 3120 TCTGATCTTGCC~CCGAG~ AG~AITYCA~AT~CAGAA~AAAGAAGACACCT~ ~GG~AAGAAA~CAGAGA~AG~GCTACUrACTCA~AAACCA~AAAC~AAAG S D L A T E L A I S L S Q N C ~ E D T Y V V ~ ~ L R D F S C Y V L I K P V N L K 3130 3140 31~ 3160 3170 31~ 31~ 32~ 3210 3220 3230 3240 AGTAATGTGTTC~CTTTA~CATACCTTCTAATATTTATAAGTCACACAACACAA~ ~CA~A~CTGATAGGC~GTCCAGA~CAGG~GACTCA~FCGT ~CTG~AAT S N v F F S L y 1 p S N I y K S H N T T F K T L I G S P E S G Y M T D F V S A N 3250 3260 3270 32~ 32~ 33~ 3310 3320 3330 3~0 3350 3360 ~GAGCAAGT TAGTGAA~G GG~AGA~T GAAG~A~A ~C~GCACA AAGAGGTITC~GCGAGAATTTTATG~GT GGCCC~AGCA~GAGGAAC AAGATGGAATGGCGGAGCCA v S K L V N ~ v R C E A W ~ L A Q R G F ~ R E F Y A V A P S I E E Q D G ~ A E P 3370 33~ ~ ~ ~10 34~ ~30 ~40 3450 3460 3470 34~ GACTC~T~T GTCAGATGATGAG~GGACA CTC~CATAT TA~AAACGA CAAGCATCAG~AG~GAGA 
TGATCACAGT~AGGTTT GTCCATA~G AAGGCTTTGTAACIIIICCT D S V C Q M M S ~ T L L I L L N D K H Q L E E M I T V S R F V B ¼ E G F V T F P ~ 3500 3510 3520 ~ 3~0 35~ 3560 3570 35~ 35~ 36~ ~ATGGCCTAAACC~ATAAAATGTTTGATAAA~ATCAGTAA~CCGAGGTCTAGG~AGA~AGTC~AAAGAGG~CA~ATGCTA~AAGCA~A~C~AAAATCCCA~ A ~ p 3610 K p y K 3620 M F D 3630 K L S 3~0 V T P R 36~ S R L 36~ E C L V I 3670 K R 3680 L I M 36~ L M K 37~ H Y S E 3710 N P [ 3720 ~ATTTATGA TAGAAGACGAGAAGAAAAAGTGGTrTGGAT TCAAAAATATG~C~GCTT GA~GTAATG GTAAAC~GCTGATr'rAT~ GATCAGGA~AAATGCTTAA~TCIiI]AT Z F M I E D E K K K ~ F G F K N M F L L D C N G K L A D L S D Q D Q M L M L F Y 3730 3740 3750 3760 3770 37~ 37~ 3800 3810 3820 38~ 3~0 CTTGG~ATC TAAAGAACAAAGATGAGGAGGTCGAAGACAATGGCATGGGTCAA~A~G A~AAAATCC ~GGCTT[GA GAG~CCA~ L G Y L K N K D E E V E D N G ~ G Q L L T K I L G F E S A M 38~ 38~ 3870 38~ 3890 3~0 3910 3920 3930 GATCCTGAGTA~ACAAT CAAGAAGCAT6AG~CCA TAAG~ATGT 6AAGGAC~C TGTG~AAAT TCTTACACAGA~AAA~AG D p E y G T I [ K H E F S I S Y V K D L C D ~ F L D R L K K 3970 39~ 3990 4000 4010 40~ 4030 4~0 4050 CCAAAGACAAGAGAC~CTT GGGTATGAAA P K T R D F L 3~0 39~ G W K 39~ ACACACGGAATCAAAGATCCAA~ACTTAT T H G I K D P I T Y 4060 4070 40~ ~GGGCGACA AGATAGCTAAA~CCTTACC A~CAGTTTA ~GAGACGATGGCATCTTTG AAGGCA~ATCTAAC~CTC AGAGGA~ACTA'ITrATACA CACCCA~AG AAGA~AAAA L G D Z l A K F L S T Q F I E T M A S L Z A S S N F S E D Y Y L Y T P S R R L K 40~ 4100 4110 4120 4130 4140 41~ 41~ 4170 41~ 41~ 42~ ~CCAGGAGCAATCTAGAAGT~ACA~TAATAG~GC~ ~G~AATATA~TG~A~ ~CA~G~AAGCTGT~CATAGAAGCAAA ~ A A ~ G A G A A G C T C A C ~ C C ~ A A ~ A A A N Q E Q S R S K H V I b A G G N I S ~ S V [ G ~ t Y ~ ~ S E V I E Z L T T L I ~ 4210 4~0 4230 4240 42~ 42~ 4270 42~ 42~ 4300 4310 43~ 6ACGAAACAC CAGGAAAAGA A~GAAAATA ~GGTAGATC TCTTAC~AA G G ~ A ~ G A A G~C~AACA AAAATGAA~ TATGCACA~ ~TAITITCA AGAAGAA~A GCATGGAGGC D E T P G K E L K I V V D L L P K A M E V t N K N E C M H I C I F K K N O H G G 43~ 4~0 4350 43~ 4370 43~ 43~ 44~ 4410 44~ 4430 4440 C~AGAGAAAT ~ G ~ TA~CITI" 
GAAAGAATAATGCAGAAGACAGTGGAAGAT~CTAGAG CCA~CTAGA ATGCTGTC~ A~GAGACAA~ACATCCCC GAAAAACAAG L R E I Y V L N I F E R I M O K T V E D F S R A I L E C C P S E T M T S P K N K 44~ 4460 4470 44~ 44~ 45~ 4510 45~ 4530 4~0 45~ 45~ TFTAGAATAC ~GAA~GCACAAC~GGAAGCAAGGAAAA ~ A A A A A A T G A ~ A T G A C A ~ A T ~ A ~AGTGATGATGCATCGAAA~GAA~AAG ~ C A ~ A ~ T A T C T A A A ~ C F R i P E L H N M E A ~ K T L K N E Y M T I S T S D D A S K ~ N Q G H Y V S K F 4570 45~ 45~ 4600 4610 46~ 4630 4~0 4650 4660 4670 46~ ATGTGTA~CTA~GAGG~CAC~CAACATA~A~ATGGC~C~T ~AGG~CTTCAA~ATGGCATC~AAGAAG~A~C~A~A~AGCTG~GCAA~ ATTTA~CAA M C 8 L L R L T P T Y Y ff 6 F L V Q A L O L W H H K K I F L G b Q L L Q L F N Q 46~ 47~ 4710 47~ 4730 4~0 4750 47~ 4770 47~ 47~ 4800 ~TG~ATGC TAAATACCAT GGACACAACC CTCATGAAAG TCTFrCAAGC ~ACAAAGGG GAGA~CAAG ~CCTTGGAT GAAGGCAGGT A G A ~ A C A ~ G A G A ~ G A G A C A G G ~ N A M L N T M D T T L M K V F O A Y K G E I 0 V P ~ ~ K A G R S Y I E T E T G ~ 4810 4820 4830 4~0 4850 48~ 4870 48~ 4890 4900 4910 49~ ~GCAGGGAA ~ C C A ~ A T A ~ A G ~ ~A~CC~G ~ATC~C~ GGACCAA~GGCTG~GAGTGTAGAAGAGATATAAA~GAGCAA~AAGACAATAA~AATAAAGAAA~ ~ Q G I L H Y T S S L F H A I F L b Q L A E E C R R D I N R A I K T I N N K E N 4930 4~0 49~ 49~ 4970 49~ 4990 5000 5010 ~20 5030 5040 GAGAAGG~TCATGTATAGTG~CA~ATGGAAAG~GACG~AGTAGCTTCA~A~ A~A~C~AATTTCAAAGAGA~GA~CAGCACAA~ AC~GCT~GT~GG~AAC E K V S C I V N N M E S S D D S S F [ I S I P N F K E N E A A Q L Y L L C V V N Fig. 2. For legend see page 3575. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 14 May 2017 00:09:06 3574 S. 
Toriyama and others 5050 ~60 5070 50~ 50~ 51~ 5110 51~ 5130 5140 51~ 51~ TCTTGG~CA GAAAGAAAGAG~GC~GGA ACTTATCTTG GGATATATAAATCTCCAAAGAGTACAA~C AGACA~G~ TGTGATGGAA~CAACTCAG AA~C~CTT ~CTGGTGAT 5 W F R K K E ~ L G T Y L G I Y K S P K S T T Q T L F V M E F N S E F F F S G D 5170 5180 5190 5200 5210 5220 5230 5240 5250 52~ 5270 52~ G~CACAGGCCAACTTTTAGG~GG~A~ G C A G C & G T G C T A ~ A G G A G A G C ~ A G A C ~ G ~ A T A C A O G A A G A 6 ~ C ~ ACA~GAAGGATGTAATAGAAGGTGGAGGA V H R P T F R W V N k A V L I G E O E T L 5 G I Q E E k S N T L K D V I E G G G 5290 53~ 5310 5320 5330 5~0 53~ 53~ 5370 53~ 53~ 5400 ACATATGCCC TCACT'I~AT A~GCAAG~ GC~AAGCTA TGATACACTATAGAATG~G GGCAGTAGTG CTTCATCAGT GTGGCC~CA TA~AAA~C ~CTGAA~A C~ATATGAT T Y A L T F I V Q V A Q A M I H Y R M L G S S A S S V W P A Y E T L L K N S Y D 5410 ~20 ~ 5440 CCTGCAC~GGC~C~C~ A ~ G G ~ A ~ C ~ A A ~ G P A L G F F L M D N P K C 5530 5~0 5550 55~ ~50 5460 ~70 ~ 5490 5500 5510 55~ ~ G G C ~ G ~ G~k~CA~CT~A~GTTTGGA~GC~GTACGACGACAC~GGGAGAGAA~A~ATGAGATGATA A G L L G F N Y N V ~ I A C T T T P L G E K Y 8 E ~ I 5570 55~ 55~ 5~0 5610 56~ 56~ 5~0 CAAGAAGgAATGAAGG~GA GT~CAGAGC ~AAAATCAG TAACAGAAGATACAA~AAC ACGGGA~AGTFrCACGAAC AAC~TGGTG GGCITrGGAAACAAGAAAAGATGGATGAAA Q E E M K A E S Q S L g S V T E D T I N T G L Y S R T T ~ V G F G N K K R ~ ~ K 5650 5660 5670 56~ 5690 57~ 5710 5720 5730 5740 5750 5760 ~CATGACCA CACTGAAT~ GAGTGC~ ~GT~GAAA AG~AGAAGAGGAGCCAAGA~ACTTTT TCCACGCAGCAACAGCTGAACAAATAA~C AGAAAA~GC ~AAA~TG L ~ T T L N L S A D V Y E K I E E E P R V ~ F P H A A T A E Q 11 Q K I A I K ~ 5770 5780 5790 5800 5810 58~ 5830 5~0 58,50 5860 58~ 5880 AAGAGTCCCGGTGTGATACAG~ACTGTCT AAAGGAAACATG~GGCAAG GAAGATAGCGTCAA~GTAT TCTTCATATC TAGACATATAGTCTTCACAA~TCCGCTI'A ~ATGATGCA K S P G V I O S L S K G N ~ L A R ~ I A S S V F F l S R H [ V F T M S A Y Y D A 5890 5900 5910 5920 5930 5940 5950 5960 5970 59~ . 
59~ 6000 GACCCTGAGACAAGGAAAACATCA~G~GA~GAG~GA~A~AGCTCTAAA~ACCTCAGAGAC~GA~GCAGGAACC~ ACA~GAAGCCAA~AAAGT~AAG~GAT D P E T R K T S L L K E L I N S S K I P Q R H D Y L Q E P H T L K P T ~ V E V D ~I0 6020 6030 6040 60~ ~ ~70 60~ ~ 61~ 6110 ' 6120 GAGGACAGCTGGGAA~CAAG~AGCAAAAGAGGAATGCG~AGA~G~ AAAACAAAGAATCAAAATACACACTGGGAGAGAAGAGAGA~ T A ~ A G ~ TTTTGTI~CGA AAATATGGCT E D S ~ E F K S A K E E C V R V L K O R I K I H T G R E E R S I S L L F E N M A 6130 6140 6150 6160 61~ 61~ 61~ 6200 6210 62~ 6230 6240 ~CAATGA ~GGGAG~GCACGGACCAGT~G~G~kGAGAAAAT~~CCA~AGCAT~GCACTGAAAATGAA ~A~CTATA ~CAAGAAGGATG~GCACCCAATAGGTAT K S M I G R C T D Q Y D V R E N V S I L A C A L K M N Y S I F K K D A A P N R Y 6250 6260 6270 62~ 62~ 6300 6310 6320 6330 6~0 63~ 63~ ~CC~GA~ AGAAGAAC~T~ATA~CA ~GA~GGAAAGGAAGTA~TGTTTATG~ AAGT~GACAAAGTACATATTGAA~CTGAGAAGAA~AAAGG~CAA~AAA~A L L D E K N L V Y P k I G K E V S V Y V K S D K V H I E I S E K K E R L S T K L 63~ 63~ 6390 64~ ~10 ~20 ~30 ~40 64~ ~ ~70 64~ I'r3"AATAT~ ~ A A A ~ G A A G G ~ A G A A G A G A ~ CA~A~G~ TCCTAG~ATGGAGA~A~ T~CC~GAAAGAAACAA~ GACCAAGTAA~CCAA~ ~CCATACAC F N [ D K M K D I E E T L S L L F P S y G D Y L S L ~ E T 1 D O V T F O S A I H 64~ 65~ 6510 6520 6530 6540 6550 6560 6570 65~ 6590 6~0 ~AGTCAACG AGAGAAGAAGAG~AGGGCA GATGTGCA~ TAACAGGGACAGAAGGATTTTCTAAG~GC CAATGTATAC AGCAG~GTC TGGGCCTGGT ~GATGTGAA GA~ATCCCT X V N E R R R V R A D V H L T G T E G F S K L P M Y T A A V ~ A ~ F D V K T I P 6610 6620 6630 6640 6650 66~ 6670 66~ 66~ 67~ 6710 6720 GCACATGACAGCATTTATA6 AA~AT~GG AAAOT~ACA AAGAACA~ACTC~66~G TCAG~ACAC TGAAAGAGACA~GGAGAA666ACCATI~A AAACA~ACA AGGTGTGG~ A H D S I Y R T I ~ K V Y K E Q y S ~ L S D T L K E T V E K G P F K T V O G V V 67~ 6~0 6750 6760 6770 67~ 67~ 6800 6810 68~ 68~ 6~0 AACTTCATTT CTAGAGCTGGT~GAGATCG AGAGTCGTCCATCTAGTAGG6TCAITrGGT AAGAATGTCAGGGGTAGCATAAAT~GGTG ACGGCAATAAAAGA~ACTT TAGCAACGGA N F I S R A G V R S R V V H L V G S F G K N V R G S I N L V T A I K D N F S N G 6850 6860 6870 6880 68~ 69~ 6910 69~ 6930 6~0 6950 6960 
~AG']TF~CAAAGGGAATATA~CG~ATCAAGGCAA~AAAE~AGAGAAAG'F~]'GG~A A ~ A C ~ CA~GCACCA~CTCAGGCAC~ATCA~AAGCATG~AAGAAC L V F K G N I F D I K A g K T R E S L D N Y L S ] C T T L S Q A P I T K H D K N 6970 69~ 6990 70~ 7010 70~ 70~ 7040 70~ 70~ 7070 70~ CAGATTT~GC GCTCTCUTTT C~CAG~GT CCAAGAATCCAGTATG~ A~ACAGTTT GGA~AAGAAGAAACAGGATG~AATA~A CAAGAAG~GTGGCAGATGATCCAA~CTA Q I L R S L F V S G P R I Q Y V S S Q F G S R R N R M S I L Q E V V A D D P T L ~ 71~ 7110 7120 7130 7140 7150 71~ 7170 71~ 71~ 72~ C A ~ C ~ G ACCAAGACACAA~CAGAAACAGCTAGAAGACA~CAG AGAA~AGCACACAAGGAGCTCCCAITI~ AACAGAGAAG~GTT~CACG A ~ A T ~ A AAAGATAGAG H ~ P D Q D T S Q K Q L E D K F R E L A H K E L P F L T E K V F B D Y L E K I E 7210 7220 7230 7~0 72~ 72~ 7270 72~ 72~ 7300 7310 73~ CAGCTAATGA AGGAGAACACTCAT~AGGT GGTAGGGATG~GATG~AG CAAAACCCCATATGTGC~G CCAGAGCAAATGATA~GAA ATACA~G~ ATGAGTT~G GAGAGAGTAT Q L M K E N T H L G G R D V D A S K T P Y V L A R A N D I E I H C ¥ E L W R E ¥ 7330 7~0 7350 7360 7370 73~ 7390 7400 7410 7420 7430 7440 GATGAGG~G AAGATGAAGCA~CCAGG~ TA~GCAGTG AAG~GAGGCTGCTA~GAT CAAG~AAAC~AATGCT~ A~AGAGAGATACC~AG ACCCTAAAGCAAA~GGA~ U E D E D E A Y Q A Y C S E V E A A ~ D O E K L N A k I E R Y H V D P K A N W I 74~ ~60 7470 74~ 74~ 7500 7510 75~ 75~ 7~0 7550 75~ CAAATG~AA TGAATGGTGA GA~GAAACA G~GAAGAGCTGAACAAGCTTGACAAGGGGTTTGAGAGCC ACAGACTrGCT~A~CGAA AGAA~GG TGGGGAA~TTGGAATTTTA Q M L M N G E I E T V E E L N K L D K G F E S H R L A L V E R I R V G K L G I L Fig. 2. F o r l e ~ n d s e e o p p o s i t e . 
Fig. 2. Complete nucleotide sequence of RSV RNA segment 1 and predicted amino acid sequence. The sequence is of the (+) strand RNA written as DNA. The amino acid sequence, represented in the single-letter amino acid code, is shown below the nucleotide sequence. The asterisk (*) indicates the (UGA) stop codon.

polymerase (Toriyama, 1986a). A discrepancy in Mr values has also been reported for the L (RNA polymerase) proteins of Bunyamwera and tomato spotted wilt viruses (Elliott, 1989; De Haan et al., 1991). The other four putative viral proteins predicted from the sequence have not yet been identified, although in vitro translation experiments indicated the presence of these genome products (Hamamatsu et al., 1993). RSV and RGSV tenuiviruses have virus-associated RNA-dependent RNA polymerase activity, the level of which is comparable to that of vesicular stomatitis virus (Toriyama, 1986a, 1987).
The solubilized proteins exhibit model template-dependent RNA synthesis in vitro (Barbier et al., 1992). The SDD tripeptide motif (Poch et al., 1989) was found in the putative RNA polymerase of RSV, at amino acid residues 1486 to 1488 and 1634 to 1636 of the RSV Pol protein. These two SDD motifs are also present in the L protein of phleboviruses (Elliott et al., 1992), and the second SDD motif is present in the L protein of segmented negative strand viruses (Poch et al., 1989). Other prominent conserved sequences are found in the extreme 5'- and 3'-terminal nucleotide sequences of RSV and phleboviruses (Kakutani et al., 1990; Takahashi et al., 1990). The terminal base-paired, panhandle structure (Takahashi et al., 1990) is presumed to have an important role in the initiation of transcription of influenza virus (Hsu et al., 1987). Thus, it is likely that these highly conserved sequences are essential for replication and transcription of these viruses. An additional distinguishing similarity of RSV and phleboviruses is that they have an ambisense genome: RNAs 2, 3 and 4 of RSV, and the S RNA of phleboviruses. The intergenic sequence of the S RNA, centrally located between two ORFs, is G-rich in three phleboviruses, TOSV, Sicilian sandfly fever virus and RVFV, and AU-rich in the other two phleboviruses, UUKV and Punta Toro virus (Giorgi et al., 1991). The intergenic sequences of the ambisense genome RNA 3 of the tenuiviruses RSV and MStV are all AU-rich (Kakutani et al., 1991; Zhu et al., 1991; Huiet et al., 1991), suggesting that tenuiviruses, in this respect, are similar to UUK or Punta Toro phleboviruses.

Fig. 3. Dot-plot comparisons of the predicted Pol protein of RSV and the L proteins of phleboviruses (UUKV, RVFV and TOSV), tomato spotted wilt virus, Bunyamwera virus and Tacaribe arenavirus, made using the Protein Homology Plot (window 10; stringency 5). The comparisons show (a) RSV and UUKV; (b) RSV and RVFV; (c) RSV and TOSV; (d) UUKV and RVFV; (e) RSV and Bunyamwera virus (BUNV); (f) RSV and tomato spotted wilt virus (TSWV); (g) and (h) RSV and Tacaribe virus (TV), shown with expanded scale in (h). Sequence data were obtained from Elliott et al. (1992) (UUKV), Muller et al. (1991) (RVFV), Accardi et al. (1993) (TOSV), De Haan et al. (1991) (TSWV), Elliott (1989) (BUNV) and Iapalucci et al. (1989) (TV).

Given the strong similarity between RSV and phleboviruses, we propose that RSV and the other tenuiviruses, MStV, RHBV and RGSV (Francki et al., 1991), should be classified in the family Bunyaviridae, but in the genus Tenuivirus, not Phlebovirus. This is because the genome of tenuiviruses comprises four segments (RSV and RHBV) or five segments (MStV), while all phleboviruses have three RNA segments, although the genetic organization, expression strategies, and amino acid and nucleotide sequence similarities strongly suggest that these viruses have evolved from a common ancestor. No significant homology was observed between the Pol protein and the L protein of other members of the Bunyaviridae, including tomato spotted wilt virus, previously the sole plant-infecting member of the Bunyaviridae (De Haan et al., 1991). A weak homology was found with the L protein of Tacaribe virus of the Arenaviridae (Iapalucci et al., 1989).
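The dot-plot comparisons of Fig. 3 were made with the Protein Homology Plot program (window 10; stringency 5). As a rough illustration of the idea only, the Python sketch below records a dot wherever a window of 10 aligned residues contains at least 5 identities. The function name `dot_plot`, the identity-only scoring and the toy peptides are assumptions made for this illustration; the scoring scheme actually used by the original program is not described in the text and may differ.

```python
from typing import List, Tuple


def dot_plot(seq_a: str, seq_b: str, window: int = 10,
             stringency: int = 5) -> List[Tuple[int, int]]:
    """Return window start positions (i, j) that would be plotted as dots.

    A dot is recorded when at least `stringency` of the `window` residue
    pairs seq_a[i+k], seq_b[j+k] are identical. Identity-only scoring is an
    assumption of this sketch, not a documented property of the program
    used for Fig. 3.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues along the diagonal starting at (i, j).
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Hypothetical toy peptides, not taken from any of the compared proteins.
    peptide_a = "MKTSDDLAGNVRSDDFQ"
    peptide_b = "MRTSDDLSGNIRSDDWQ"
    for i, j in dot_plot(peptide_a, peptide_b):
        print(i, j)
```

Related proteins give runs of dots along a diagonal, which is how the homologous region between the RSV Pol protein and the phlebovirus L proteins appears in a plot of this kind; unrelated sequences give few or no dots at the same settings.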
It is probable that Tenuivirus is in a unique position in the evolution of these ambisense genome viruses.

One of the differences between tenuiviruses and other members of the Bunyaviridae is viral particle morphology. Virions of tenuiviruses are thin filamentous particles which are pleomorphic: partially or completely unfolded coiled filaments, branched configurations, or circular filaments (Koganezawa et al., 1975; Toriyama, 1982; Ishikawa et al., 1989). So far, enveloped spherical particles (the morphology of virions of other Bunyaviridae) have not been observed for tenuiviruses, despite extensive examination by electron microscopy of infected plant and insect tissues. Immunogold labelling with anti-IgG to nucleoprotein of RSV resulted in labelling of amorphous or membranous structures in the cytoplasm of the small brown planthopper L. striatellus (Suzuki et al., 1992). Observations of thin filamentous particles or circular filamentous particles of tenuiviruses seem to suggest that these particles might correspond to the nucleocapsids of enveloped viruses of Bunyaviridae or Arenaviridae (von Bonsdorff et al., 1969; Palmer et al., 1977).

Fig. 4. Amino acid sequence homology between the predicted protein Pol of RSV and the L protein of UUKV. Identical residues are indicated by two dots and are shaded; consensus amino acid similarities are indicated by one dot. Gaps inserted in the sequences to maximize homology are indicated by dashes. Sequence data were obtained from Elliott et al. (1992), in which the RNA polymerase motifs of UUKV were indicated by underlining in the amino acid sequence.

Fig. 5. Genome structure and coding arrangement of RSV. Black lines are genomic RNAs, with the nucleotide numbers on both ends. The open reading frame and its direction are indicated with arrows on vRNA and cRNA. Shaded arrows show that the corresponding proteins have been found: Pol, the putative RNA polymerase protein (probably the 230K protein); N, nucleocapsid; Ns, non-structural protein (S protein).
[Diagram labels legible in the original: RNA 1, 8970 nucleotides, Pol (336.8K); RNA 2, 3514 nucleotides, 94K; RNA 3, 2504 nucleotides, N (35.1K); RNA 4, 2157 nucleotides, 32.4K.]

Grateful acknowledgement is made to Professor M. Kojima of Niigata University for his continuing interest, encouragement and discussions throughout this work. We also thank Dr N. Ogasawara of Plant Biological Defence System Lab. Co., Ltd for providing DNA sequencing facilities, and Dr T. Ogasawara of Hitachi Software Engineering Co., Ltd for the homology search of nucleotide and protein sequences. This work was supported in part by Grants-in-Aid from the Ministry of Agriculture, Forestry and Fisheries [Biocosmos Program-94-(4)].

References

ACCARDI, L., GRO, M. C., BONITO, P. D. & GIORGI, C. (1993). Toscana virus genomic L segment: molecular cloning, coding strategy and amino acid sequence in comparison with other negative strand RNA viruses. Virus Research 27, 119-131.
BARBIER, P., TAKAHASHI, M., NAKAMURA, I., TORIYAMA, S. & ISHIHAMA, A. (1992). Solubilization and promoter analysis of RNA polymerase from rice stripe virus. Journal of Virology 66, 6171-6174.
DE HAAN, P., KORMELINK, R., RESENDE, R. O., VAN POELWIJK, F., PETERS, D. & GOLDBACH, R. (1991). Tomato spotted wilt virus L RNA encodes a putative RNA polymerase. Journal of General Virology 72, 2207-2216.
ELLIOTT, R. M. (1989). Nucleotide sequence analysis of the large (L) genomic RNA segment of Bunyamwera virus, the prototype of the family Bunyaviridae. Virology 173, 426-436.
ELLIOTT, R. M. (1990). Molecular biology of the Bunyaviridae. Journal of General Virology 71, 501-522.
ELLIOTT, R. M., SCHMALJOHN, C. S. & COLLETT, M. S. (1991). Bunyaviridae genome structure and gene expression. Current Topics in Microbiology and Immunology 169, 91-141.
ELLIOTT, R. M., DUNN, E., SIMONS, J. F. & PETTERSSON, R. F. (1992). Nucleotide sequence and coding strategy of the Uukuniemi virus L RNA segment. Journal of General Virology 73, 1745-1752.
FALK, B. W. & TSAI, J. H. (1984). Identification of single- and double-stranded RNAs associated with maize stripe virus. Phytopathology 74, 909-915.
FRANCKI, R. I. B., FAUQUET, C. M., KNUDSON, D. L. & BROWN, F. (editors) (1991). Classification and Nomenclature of Viruses. Fifth Report of the International Committee on Taxonomy of Viruses. Archives of Virology supplementum 2, 398-399.
GIORGI, C., ACCARDI, L., NICOLETTI, L., GRO, M. C., TAKEHARA, K., HILDITCH, C., MORIKAWA, S. & BISHOP, D. H. L. (1991). Sequences and coding strategies of the S RNAs of Toscana and Rift Valley fever viruses compared to those of Punta Toro, Sicilian sandfly fever, and Uukuniemi viruses. Virology 180, 738-753.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
HAMAMATSU, C., TORIYAMA, S., TOYODA, T. & ISHIHAMA, A. (1993). Ambisense coding strategy of the rice stripe virus genome: in vitro translation studies. Journal of General Virology 74, 1125-1131.
HANAHAN, D. (1985). Techniques for transformation of Escherichia coli. In DNA Cloning: A Practical Approach, vol. 1, pp. 109-135. Edited by D. M. Glover. Oxford: IRL Press.
HENIKOFF, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28, 351-359.
HSU, M.-T., PARVIN, J. D., GUPTA, S., KRYSTAL, M. & PALESE, P. (1987). Genomic RNAs of influenza viruses are held in circular conformation in virions and in infected cells by a terminal panhandle. Proceedings of the National Academy of Sciences, U.S.A. 84, 8140-8144.
HUIET, L., KLAASSEN, V., TSAI, J. H. & FALK, B. W. (1991). Nucleotide sequence and RNA hybridization analyses reveal an ambisense coding strategy for maize stripe virus RNA 3. Virology 182, 47-53.
HUIET, L., TSAI, J. H. & FALK, B. W. (1992). Complete sequence of maize stripe virus RNA 4 and mapping of its subgenomic RNAs. Journal of General Virology 73, 1603-1607.
HUIET, L., TSAI, J. H. & FALK, B. W. (1993). Maize stripe virus RNA5 is of negative polarity and encodes a highly basic protein. Journal of General Virology 74, 549-554.
IHARA, T., SMITH, J., DALRYMPLE, J. M. & BISHOP, D. H. L. (1985). Complete sequences of the glycoproteins and M RNA of Punta Toro phlebovirus compared to those of Rift Valley fever virus. Virology 144, 246-259.
ISHIKAWA, K., OMURA, T. & HIBINO, H. (1989). Morphological characteristics of rice stripe virus. Journal of General Virology 70, 3465-3468.
IAPALUCCI, S., LOPEZ, R., REY, O., LOPEZ, N., FRANZE-FERNANDEZ, M. T., COHEN, G. N., LUCERO, M., OCHOA, A. & ZAKIN, M. M. (1989). Tacaribe virus L gene encodes a protein of 2210 amino acid residues. Virology 170, 40-47.
KAKUTANI, T., HAYANO, Y., HAYASHI, T. & MINOBE, Y. (1990). Ambisense segment 4 of rice stripe virus: possible evolutionary relationship with phleboviruses and uukuviruses (Bunyaviridae). Journal of General Virology 71, 1427-1432.
KAKUTANI, T., HAYANO, Y., HAYASHI, T. & MINOBE, Y. (1991). Ambisense segment 3 of rice stripe virus: the first instance of a virus containing two ambisense segments. Journal of General Virology 72, 465-468.
KOGANEZAWA, H., DOI, Y. & YORA, K. (1975). Purification of rice stripe virus. Annals of the Phytopathological Society of Japan 41, 148-154.
MULLER, R., ARGENTINI, C., BOULOY, M., PREHAUD, C. & BISHOP, D. H. L. (1991). Completion of the genome sequence of Rift Valley fever phlebovirus indicates that the L RNA is negative sense or ambisense and codes for a putative transcriptase-replicase. Nucleic Acids Research 19, 5433.
PALMER, E. L., OBIJESKI, J. F., WEBB, P. A. & JOHNSON, K. M. (1977). The circular, segmented nucleocapsid of an arenavirus-Tacaribe virus. Journal of General Virology 36, 541-545.
POCH, O., SAUVAGET, I., DELARUE, M. & TORDO, N. (1989). Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO Journal 8, 3867-3874.
RAMIREZ, B.-C., LOZANO, I., CONSTANTINO, L.-M., HAENNI, A.-L. & CALVERT, L. A. (1993). Complete nucleotide sequence and coding strategy of rice hoja blanca virus RNA4. Journal of General Virology 74, 2463-2468.
RONNHOLM, R. & PETTERSSON, R. F. (1987). Complete nucleotide sequence of the M RNA segment of Uukuniemi virus encoding the membrane glycoproteins G1 and G2. Virology 160, 191-202.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SCHMALJOHN, C. S. (1990). Nucleotide sequence of the L genome segment of Hantaan virus. Nucleic Acids Research 18, 6728.
SUZUKI, Y., FUJI, S., TAKAHASHI, Y. & KOJIMA, M. (1992). Immunogold localization of rice stripe virus particle antigen in thin sections of insect host cells. Annals of the Phytopathological Society of Japan 58, 480-484.
TAKAHASHI, M., TORIYAMA, S., KIKUCHI, Y., HAYAKAWA, T. & ISHIHAMA, A. (1990). Complementarity between the 5'- and 3'-terminal sequences of rice stripe virus RNAs. Journal of General Virology 71, 2817-2821.
TAKAHASHI, M., TORIYAMA, S., HAMAMATSU, C. & ISHIHAMA, A. (1993). Nucleotide sequence and possible ambisense coding strategy of rice stripe virus RNA segment 2. Journal of General Virology 74, 769-773.
TORIYAMA, S. (1982). Characterization of rice stripe virus: a heavy component carrying infectivity. Journal of General Virology 61, 187-195.
TORIYAMA, S. (1983). Rice stripe virus. CMI/AAB Descriptions of Plant Viruses, no. 269.
TORIYAMA, S. (1986a). An RNA-dependent RNA polymerase associated with the filamentous nucleoproteins of rice stripe virus. Journal of General Virology 67, 1247-1255.
TORIYAMA, S. (1986b). Rice stripe virus: prototype of a new group of viruses that replicate in plants and insects. Microbiological Sciences 3, 347-351.
TORIYAMA, S. (1987). Ribonucleic acid polymerase activity in filamentous nucleoproteins of rice grassy stunt virus. Journal of General Virology 68, 925-929.
TORIYAMA, S. & WATANABE, Y. (1989). Characterization of single- and double-stranded RNAs in particles of rice stripe virus. Journal of General Virology 70, 505-511.
VON BONSDORFF, C. H., SAIKKU, P. & OKER-BLOM, N. (1969). The inner structure of Uukuniemi virus and two Bunyamwera supergroup arboviruses. Virology 39, 342-344.
YANISCH-PERRON, C., VIEIRA, J. & MESSING, J. (1985). Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33, 103-119.
ZHU, Y., HAYAKAWA, T., TORIYAMA, S. & TAKAHASHI, M. (1991). Complete nucleotide sequence of RNA 3 of rice stripe virus: an ambisense coding strategy. Journal of General Virology 72, 763-767.
ZHU, Y., HAYAKAWA, T. & TORIYAMA, S. (1992). Complete nucleotide sequence of RNA 4 of rice stripe virus isolate T, and comparison with another isolate and with maize stripe virus. Journal of General Virology 73, 1309-1312.

(Received 7 June 1994; Accepted 19 August 1994)