J. gen. Virol. (1989), 70, 1539-1547. Printed in Great Britain

Key words: RSV, human/nucleotide sequence/evolution

The 1B (NS2), 1C (NS1) and N Proteins of Human Respiratory Syncytial Virus (RSV) of Antigenic Subgroups A and B: Sequence Conservation and Divergence within RSV Genomic RNA

By PHILIP R. JOHNSON1,2 AND PETER L. COLLINS1

1Laboratory of Infectious Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892 and 2Department of Pediatrics, Vanderbilt University, Nashville, Tennessee 37232, U.S.A.

(Accepted 3 March 1989)

SUMMARY

A 2330 nucleotide sequence spanning the 1B (NS2), 1C (NS1) and N genes and intergenic regions of human respiratory syncytial virus strain 18537, representing antigenic subgroup B, was determined by sequencing cloned cDNAs of intracellular mRNAs. Comparison with the previously reported sequences for strain A2 of subgroup A showed that 1B, 1C and N were highly conserved at the nucleotide level (78, 78 and 86% identity, respectively) and at the amino acid level (92, 87 and 96% identity, respectively). The gene-start signals were exactly conserved between subgroups, and the gene-end signals contained only a single nucleotide substitution each in 1B and N. In most cases intergenic and non-coding gene sequences that were not part of presumed transcriptive signals were much less well conserved (generally 50 to 71%) than sequences that were part of translational open reading frames (82 to 86%). The nucleotide and deduced amino acid sequences of the N gene and protein of the Long strain of subgroup A were determined by sequencing cDNA clones of intracellular mRNA; the nucleotide sequence (representing all but the first 10 nucleotides of the gene) contained 15 differences from that of the A2 strain, but the deduced amino acid sequences were identical.
Human respiratory syncytial virus (RSV) is an important, ubiquitous cause of pediatric respiratory tract disease (McIntosh & Chanock, 1985). RSV is an enveloped, RNA-containing virus that is classified in the pneumovirus genus of the paramyxovirus family. RSV genomic RNA (vRNA) is a single negative-sense strand of approximately 15000 nucleotides which is transcribed in a sequential, polar fashion to yield 10 major species of mRNA (Collins et al., 1984, 1985, 1986; Collins & Wertz, 1983, 1985; Dickens et al., 1984, and references cited therein). An additional short non-unique polyadenylated RNA is generated by transcriptional attenuation within the L gene (Collins et al., 1987), but this species has not yet been shown to have messenger activity. The 10 major RSV mRNAs encode 10 major proteins, namely the F and G glycoproteins, the M and 22K (or M2) proteins of the inner surface of the viral envelope, the small integral membrane SH (or 1A) protein, the large nucleocapsid L protein, the nucleocapsid phosphoprotein P, the major nucleocapsid protein N, and the 1B and 1C (or NS2 and NS1) proteins which are thought to be non-structural (Collins et al., 1984; Huang et al., 1985). The gene order determined by sequencing vRNA is 3' 1C (NS1)-1B (NS2)-N-P-M-SH-G-F-22K (M2)-L 5' (Collins et al., 1986) and is the same as the order of sequential gene transcription (Collins & Wertz, 1983; Dickens et al., 1985).

† Present address: Georgetown University School of Medicine and Dentistry, NIH/Twinbrook Facility, Rockville, Maryland 20852, U.S.A.

0000-8794 © 1989 SGM
Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 06:20:56
Short communication

Two distinct antigenic subgroups of RSV have been identified on the basis of differences in reactions with monoclonal and polyclonal antibodies (Coates et al., 1966; Anderson et al., 1985; Mufson et al., 1985, 1987; Gimenez et al., 1986; Hendry et al., 1986; Åkerlind & Norrby, 1986; Johnson et al., 1987a; Morgan et al., 1987; Örvell et al., 1987). Information on the extent of naturally occurring antigenic and structural diversity among RSV strains is important for guiding vaccine development and might also provide additional insight into the structure, function and evolution of RSV vRNA and its gene products. For example, the cross-subgroup relatedness of the F and G proteins has been investigated by monoclonal antibody reactivity (Anderson et al., 1985; Åkerlind & Norrby, 1986; Mufson et al., 1985; Örvell et al., 1987), by analysis of convalescent human and animal sera by neutralization in vitro and by F or G protein-specific ELISA (Coates et al., 1966; Johnson et al., 1987a), by cross-protection studies in which animals were immunized with recombinant vaccinia viruses expressing the F or G protein (Johnson et al., 1987a; Stott et al., 1987) or with purified F or G protein (Walsh et al., 1987) and by cDNA cloning and sequencing of the F and G mRNAs of the subgroup B strain 18537 for comparison with the previously published sequences for the subgroup A strain A2 (Johnson et al., 1987b; Johnson & Collins, 1988a, b). These results established that the F proteins of the two subgroups were highly related antigenically (two-fold difference in antigenic reactivity) and structurally (91% amino acid sequence identity exclusive of the predicted signal peptide), whereas the G proteins were relatively distinct (20- to 40-fold difference in antigenic reactivity and 53% amino acid identity).
The unexpectedly large amount of cross-subgroup diversity in the G protein prompted further analysis and comparison of nucleotide and amino acid sequences for the two prototype RSV subgroup strains A2 and 18537. Recently, we described the additional finding that the intergenic and flanking gene regions of RSV vRNA are poorly conserved between subgroups, compared with the relatively greater conservation of nucleotide sequences that encode protein and the nearly exact conservation of the short gene-start and gene-end sequences that are located at the gene termini and are thought to be polymerase recognition signals (Johnson & Collins, 1988b). In the work described in the present paper, a 2330 nucleotide sequence was determined for intracellular mRNAs representing the 1B, 1C and N genes of strain 18537, and these sequences were compared with their counterparts for the A2 strain. Also, the nucleotide and amino acid sequences of the N gene and protein of the Long strain of subgroup A were determined.

As described previously (Johnson et al., 1987b), cDNA libraries were constructed using as template mRNA isolated from HEp-2 cells that had been infected with RSV strain 18537 or Long. cDNA clones were identified presumptively as viral by differential hybridization with radiolabelled cDNA synthesized by reverse transcription of mRNAs from uninfected cells or from cells infected with the 18537 or Long strain. Virus-specific cDNAs of strain 18537 1B, 1C and N genes and the N gene of the Long strain were identified by the homology of their nucleotide and predicted amino acid sequences with those described for the A2 strain (Collins & Wertz, 1985; Collins et al., 1985). The identities of these cDNAs were also confirmed by hybridization with radiolabelled cDNAs of the 1B, 1C and N genes of strain A2. Dideoxynucleotide sequencing of denatured plasmid DNA using synthetic oligonucleotide primers was performed as described previously (Johnson et al., 1987b; Zagursky et al., 1985).
The sequences for the N gene and protein of the Long strain were determined from a cDNA clone, LD35, that initiated at nucleotide 11 of the mRNA sequence and otherwise contained the complete sequence including polyadenylate. The A2 and Long strains both represent antigenic subgroup A, and the nucleotide sequences of the two N genes contained only 15 nucleotide differences (Table 1). Fourteen of the 15 changes were in the third codon position, and none of the changes resulted in a change in the encoded protein. Thus, the N proteins of the two strains are predicted to be identical, at least for the isolates from which the sequences were determined. Previously, Anderson et al. (1985) showed that an N-specific monoclonal antibody (designated 132-7B) bound in an immunofluorescence assay to cells infected with the A2 strain but not to cells infected with the Long strain. However, in an ELISA, the same antibody bound to cells infected with either strain (Anderson et al., 1985). Taken together with the sequencing data described here, this latter result suggests that the difference in reactivity observed in the immunofluorescence assay should not be interpreted as evidence of antigenic difference, although it also is possible that the Long isolate used in that study contained one or more amino acid differences from the one sequenced here due to intrastrain sequence variability (Collins et al., 1984) or that a difference in another protein affected the binding of the antibody to N protein in that particular assay.

Table 1. Nucleotide differences between the N genes of the A2 and Long strains

Nucleotide sequence     Nucleotide identity according to strain
position*               A2      Long    (18537)†
  75                    C       T       (C)
 225                    G       T       (T)
 327                    T       C       (A)
 333                    A       G       (A)
 471                    C       T       (T)
 522                    T       C       (C)
 528                    A       G       (A)
 540                    C       T       (T)
 627                    C       T       (T)
 711                    T       C       (C)
 783                    G       A       (A)
 787                    T       C       (C)
 903                    T       A       (A)
 975                    C       T       (C)
1110                    C       T       (C)

* Number of nucleotides from the 5' end in the complete mRNA sequence published previously for the A2 strain (Collins et al., 1985).
† The 18537 and A2 strains differed at 169 nucleotides (Fig. 1, Table 2); entries here are only for those positions where the Long strain differed from A2. At these 15 positions, none of the nucleotide differences in any strain resulted in a change in amino acid coding assignment. For these 15 positions, strains A2 and 18537 were identical at five nucleotides, Long and 18537 were identical at nine, and 18537 was unique at one.

The strain 18537 sequences were determined from the previously described (Johnson & Collins, 1988b) 1C-1B dicistronic cDNAs C9 and G15, the 1C-1B-N tricistronic cDNA B53, and the N-P dicistronic cDNAs R4 and F5. Additional cDNAs sequenced for this work were as follows. D35 and C85 initiated at nucleotides 10 and 12, respectively, of the 1C sequence and otherwise contained the complete sequence including polyadenylate; AA75 initiated at nucleotide 30 of the 1B sequence and otherwise contained the complete sequence including polyadenylate; 7N30 initiated at nucleotide 31 of the N sequence and otherwise contained the complete sequence including polyadenylate. In situations where different cDNAs overlapped the same gene region, no sequence differences were observed. The gene and encoded protein sequences of 18537 1C, 1B, N and the upstream region of P are shown in Fig. 1 and aligned with the previously reported A2 sequences. The 1B, 1C and N genes of strain 18537 were each identical in length to their strain A2 counterpart and shared 78, 78 and 86% nucleotide sequence identity, respectively, with the corresponding strain A2 gene (Fig. 1, Table 2).
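The tallies quoted for Table 1 can be rechecked with a few lines of code. The following is a minimal sketch, not part of the original analysis. It rests on two assumptions that are labelled in the comments: the A2 base at position 528, which is illegible in this copy, is taken to be A, and the N open reading frame is taken to begin at mRNA position 16 (a nine nucleotide gene-start sequence followed by the six nucleotide 5' non-coding region listed in Table 2). The variable names are illustrative only.

```python
from collections import Counter

# Data transcribed from Table 1 (mRNA sense; positions use the A2 numbering).
# Assumption: the A2 base at position 528 is illegible in this copy and is
# taken as A, the only value consistent with the tally in the table notes.
POSITIONS = [75, 225, 327, 333, 471, 522, 528, 540, 627, 711, 783, 787, 903, 975, 1110]
A2 = "CGTACTACCTGTTCC"
LONG = "TTCGTCGTTCACATT"
S18537 = "CTAATCATTCACACC"

# Assumption: the N open reading frame starts at mRNA position 16
# (9 nt gene-start sequence + 6 nt 5' non-coding region, per Table 2).
ORF_START = 16


def classify(a2_base, long_base, b_base):
    """Say which subgroup A strain, if any, the 18537 base agrees with."""
    if b_base == a2_base:
        return "matches A2"
    if b_base == long_base:
        return "matches Long"
    return "unique"


tally = Counter(classify(a, l, b) for a, l, b in zip(A2, LONG, S18537))

# Codon position (1, 2 or 3) of each difference within the N reading frame.
codon_positions = {p: (p - ORF_START) % 3 + 1 for p in POSITIONS}
third_position_count = sum(1 for c in codon_positions.values() if c == 3)

print(dict(tally))
print(third_position_count, "of", len(POSITIONS), "differences are in the third codon position")
```

Under these assumptions the tally agrees with the table notes (five, nine and one), and 14 of the 15 differences fall in the third codon position, with position 787 falling in the first.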
Consistent with previous findings, the sequences of the translational open reading frames were more highly conserved (82, 83 and 86% for 1B, 1C and N) than were flanking gene sequences that did not encode protein and were not part of the gene-start or gene-end sequences (apart from the transcriptive signals, the non-coding gene sequences had 50 to 71% identity between subgroups for 1B and 1C). The six nucleotide 5' non-coding region of the N gene immediately following the gene-start sequence was exactly conserved between the subgroups, but the significance of this is unclear because the sequence is short and is not found in the 5' non-coding regions of other RSV genes. The 31 non-coding nucleotides immediately following the gene-start sequence of the 1C gene were also relatively highly conserved (84% identity). It will be of interest to sequence the 3' end of vRNA, which is predicted to be immediately upstream of 1C (Collins et al., 1986; Dickens et al., 1984), and to determine whether the conserved sequence at the start of 1C is part of a larger conserved structure.

Fig. 1. Alignment of the nucleotide and deduced amino acid sequences of the 18537 (subgroup B) and A2 (subgroup A) 1C, 1B and N genes and proteins, including the nucleotide sequences of the intergenic regions and the nucleotide and encoded amino acid sequences of the upstream region of the P gene. For the A2 sequence, only the nucleotides and amino acids that differ from the 18537 sequence are shown. Gaps introduced into the nucleotide sequences during alignment are indicated by short dashes within the sequences, and gaps introduced into the amino acid sequences are indicated by longer dashes. Gene-start and gene-end sequences are boxed. The short N-P intergenic region is underlined with a filled rectangle.

The nine nucleotide gene-start sequences of 1B, 1C, N and P were exactly conserved between subgroups, and the 12 to 13 nucleotide gene-end sequences contained only a single nucleotide substitution each in 1B and N. The deduced amino acid sequences of the strain 18537 1B, 1C and N proteins were identical in length to their A2 counterparts and shared 92, 87 and 96% sequence identity, respectively (Table 3). For the 1B protein in general, the C-terminal third of the molecule (amino acids 78 to 124) was more highly conserved (100% identity) than were the N-terminal two-thirds (amino acids 1 to 77, 87% identity) (Fig. 1). In particular, in the alignment of the 1B proteins (Fig. 1), the N-terminal 11 amino acids contained a single gap in both the A2 and the 18537 sequences (whereas the 1C and N alignments had no gaps), suggesting that this region of 1B could tolerate some variability in segment length and sequence. The situation for the 1C protein in general was the reverse, with the N-terminal two-fifths (residues 1 to 56) of the molecule being relatively more highly conserved (96% identity) and the C-terminal three-fifths (residues 57 to 139) being somewhat less well conserved (81% identity).

Table 2. Summary of nucleotide sequence identities between strains A2 and 18537 for the 1B, 1C and N genes

Sequence domain            Length (nucleotides)    Identity between strains (%)
1B gene
  Complete gene*                  503                      78
  5' Non-coding†                   23                      50
  Open reading frame‡             375                      82
  3' Non-coding§                   84                      57
1C gene
  Complete gene*‖                 523                      78
  5' Non-coding†                   45                      71
  Open reading frame‡             420                      83
  3' Non-coding§                   45                      51
N gene
  Complete gene*                 1203                      86
  5' Non-coding†                    6                     100
  Open reading frame‡            1176                      86
  3' Non-coding§                 None                       -

* Includes the four (in the case of A2 1C, 18537 1C and 18537 1B) or five (A2 1B, A2 N and 18537 N) 3'-terminal A residues (mRNA-sense) that represent the vRNA coding sequences for the poly(A) tail.
† Exclusive of the exactly conserved 5'-terminal nine nucleotide gene-start sequence.
‡ Includes termination codon.
§ Exclusive of the exactly conserved 13 nucleotide (1C) or nearly exactly conserved 12 nucleotide (1B, N) 3'-terminal gene-end sequence.
‖ Exclusive of the nine nucleotide gene-start sequence, which was not determined for 18537 1C.
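The segment lengths in Table 2 can be cross-checked against the complete gene lengths and against the protein lengths given in Table 3. The sketch below is illustrative only and assumes that the listed segments tile each gene without overlap: gene-start sequence (nine nucleotides, omitted for 1C as stated in the table notes), 5' non-coding region, open reading frame including the termination codon, 3' non-coding region, and gene-end sequence (13 nucleotides for 1C, 12 for 1B and N). The dictionary layout and function names are hypothetical.

```python
GENE_START = 9  # nucleotides in the conserved gene-start sequence

# Lengths transcribed from Table 2; protein lengths are from Table 3.
# "start_counted" is False for 1C because the table notes state that the
# complete-gene length for 1C excludes the gene-start sequence.
GENES = {
    "1B": dict(total=503, nc5=23, orf=375, nc3=84, gene_end=12,
               start_counted=True, protein_len=124),
    "1C": dict(total=523, nc5=45, orf=420, nc3=45, gene_end=13,
               start_counted=False, protein_len=139),
    "N": dict(total=1203, nc5=6, orf=1176, nc3=0, gene_end=12,
              start_counted=True, protein_len=391),
}


def segment_sum(gene):
    """Add up the gene segments, assuming they tile the gene without overlap."""
    start = GENE_START if gene["start_counted"] else 0
    return start + gene["nc5"] + gene["orf"] + gene["nc3"] + gene["gene_end"]


def orf_length_from_protein(gene):
    """Open reading frame length implied by the protein length plus a stop codon."""
    return 3 * (gene["protein_len"] + 1)


for name, gene in GENES.items():
    lengths_agree = segment_sum(gene) == gene["total"]
    orf_agrees = orf_length_from_protein(gene) == gene["orf"]
    print(name, "segments sum to complete gene:", lengths_agree,
          "| ORF matches protein length:", orf_agrees)
```

With the values as transcribed, both checks hold for all three genes, which supports the reading of the flattened table used here.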
Previous sequence analysis of the 1B and 1C genes of strain A2 (Collins & Wertz, 1985) showed that an 18 nucleotide sequence spanning the end of the open reading frame (the last four codons, the stop codon and the following three nucleotides) was nearly exactly conserved between the two different genes (one nucleotide difference out of 18), and that the C-terminal four amino acids of the two predicted proteins were identical. Interestingly, antibodies raised against a synthetic peptide containing the C-terminal 12 amino acids of the 1B protein reacted with both the 1C and 1B proteins in immunoprecipitation assays (unpublished results). This suggested that the common four amino acids were part of an antigenic site in each protein and were oriented externally within the folded proteins. However, as shown in Fig. 1, the 1C protein of strain 18537 contained a single amino acid substitution (Pro 139 to Ser 139 in 18537) within this region, suggesting that the duplication of this sequence in the two A2 proteins might be fortuitous rather than indicative of a common functional or structural site. Also, the C terminus of the 1C protein was one of the most divergent regions of the molecule, which also suggested that the C-terminal end was not part of a conserved functional or structural domain. As shown in Table 3, the high degree of amino acid sequence identity between subgroups for the 1B, 1C and N proteins was similar to that described previously for the F1 + F2 protein (Johnson & Collins, 1988a) and for the cytoplasmic and transmembrane domains of the G protein (Johnson et al., 1987b), and contrasted with the extensive divergence described previously for the F protein signal sequence and the G protein ectodomain.
Thus, different polypeptide domains exhibited differences in the extent of cross-subgroup sequence identity, and the boundaries between conserved and non-conserved regions were sometimes sharply delineated. One interpretation was that the selective pressures influencing sequence divergence and conservation were acting primarily at the protein level and were not uniform. To explore this, we examined the open reading frames of the 1B, 1C, N, F and G proteins, tabulated the codons within each that contained single nucleotide changes, and quantified the frequency at which the change was silent at the amino acid level. For simplicity, codons containing more than one change were not considered.

Table 3. Amino acid sequence identity between strains A2 and 18537 for the 1B, 1C, N, F and G proteins

Protein or domain                      Length (amino acids)    Identity between strains A2 and 18537 (%)
1B                                            124                         92
1C                                            139                         87
N                                             391                         96
F1 + F2*                                      551                         91
F signal†                                      23                         35
G ectodomain‡                                 229                         44
G cytoplasmic and transmembrane§               63                         83

* Exclusive of the predicted signal peptide. F1 and F2 are assumed to contain amino acids 24 to 574 of the unmodified F amino acid sequence.
† The exact cleavage site of the F signal has not been determined but is assumed here to follow amino acid 23 of the unmodified F sequence.
‡ The G ectodomain has not been mapped directly but is predicted to begin after amino acid 63. The 18537 sequence is six amino acids shorter than that of A2, and in the aligned sequences the C-terminal seven amino acids of A2 have no 18537 counterparts and are not included in these calculations.
§ Amino acids 1 to 63 of the complete G sequence.
The rationale was that a protein or polypeptide domain whose structural or functional properties were relatively intolerant of amino acid substitution would contain a higher frequency of silent single nucleotide changes, whereas a polypeptide domain that was relatively tolerant of substitution would have a higher frequency of changes in amino acid coding assignments. As shown in Table 4, this analysis identified three groups: (i) the sequences encoding 1B, N and F1 + F2, for which 87 to 100% of the nucleotide changes were silent at the amino acid level, supporting the interpretation that the encoded proteins were relatively intolerant of amino acid substitutions, (ii) the sequences encoding the 1C protein and G cytoplasmic and transmembrane domains, which had 75 to 78% silent changes, and (iii) the F signal sequence and G ectodomain-coding sequences, which exhibited 33 to 48% silent changes, consistent with the interpretation that the encoded polypeptide domains were relatively more tolerant of amino acid substitutions. In the case of the G ectodomain, the comparison included both intra- (Long and A2) and inter- (18537 and A2) subgroup alignments, and the frequencies of silent nucleotide changes were approximately the same (48% and 42%, respectively) even though the sequences within the A subgroup were much more highly related than those of subgroup A and B strains. Also, the N gene exhibited a high percentage of silent nucleotide differences both between (90%) and within (100%) subgroups. These results are consistent with the interpretation that the individual genes, proteins and polypeptide domains have different intrinsic rates of sequence substitution, and that the ability of the encoded polypeptide to tolerate substitutions is an important factor.
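The counting procedure described above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure actually used for Table 4: it assumes two ungapped, in-frame, upper-case DNA reading frames of equal length and the standard genetic code, and the function name and toy sequences are hypothetical. The second part simply recomputes the percentages from the counts reported in Table 4.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def silent_single_changes(orf_a, orf_b):
    """Count codons with exactly one nucleotide difference and how many are silent.

    Codons with no difference or with more than one difference are skipped,
    following the simplification described in the text.
    """
    single = silent = 0
    for i in range(0, min(len(orf_a), len(orf_b)) - 2, 3):
        codon_a, codon_b = orf_a[i:i + 3], orf_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue
        single += 1
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
    return single, silent


# Toy reading frames (hypothetical, not RSV sequences).
toy_a = "ATGCTGAAAGGTCCATAA"
toy_b = "ATGTTGAGAGGCCGTTAA"
toy_single, toy_silent = silent_single_changes(toy_a, toy_b)
print(toy_single, toy_silent)

# Counts reported in Table 4: (codons with a single difference, silent ones).
TABLE4 = {
    "1B": (51, 46), "1C": (55, 43), "N": (152, 137), "F1 + F2": (228, 199),
    "F signal": (9, 3), "G ectodomain": (91, 38),
    "G cytoplasmic and transmembrane": (24, 18),
    "N (A2 and Long)": (16, 16), "G ectodomain (A2 and Long)": (27, 13),
}
percent_silent = {name: round(100 * s / n) for name, (n, s) in TABLE4.items()}
for name, pct in percent_silent.items():
    print(f"{name}: {pct}% silent")
```

In the toy pair, three codons carry a single difference and two of those are silent; the codon with two differences is ignored. The recomputed percentages match those printed in Table 4.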
To place the level of sequence divergence between the RSV antigenic subgroups in perspective, we note that the percentage amino acid identities between individual proteins of a human and bovine strain of parainfluenza virus type 3 (PIV-3) were 86% (NP), 62% (P), 76% (C), 80% (F) and 77% (HN) (Sakai et al., 1987; Suzu et al., 1987). The amino acid sequence identity between the HN proteins of seven independent isolates of human PIV-3 was > 96% (Coelingh et al., 1988). Thus, unlike human PIV-3, human RSV exhibits a substantial amount of sequence diversity. For the RSV 1B, 1C, N and F proteins, the divergence was much greater than that observed among the human PIV-3 HN proteins but was less than that observed between the bovine and human PIV-3 isolates. We previously suggested (Johnson et al., 1987b; Johnson & Collins, 1988a, b) that the two RSV subgroups represent an early stage in divergent evolution.

Table 4. Differences among individual proteins (1B, 1C, N, F and G) in the apparent constraint on amino acid sequence divergence between strains A2 and 18537: frequency of single nucleotide differences within codons that are silent at the amino acid level

Columns: (1) open reading frame; (2) no. of codons having one or more nucleotide differences (total no. of codons); (3) no. of codons having a single nucleotide difference; (4) no. of codons in which the single difference is silent at the amino acid level (%).

(1)                                                   (2)          (3)    (4)
A2 and 18537
  1B                                                  57 (124)      51    46 (90%)
  1C                                                  62 (137)      55    43 (78%)
  N                                                  160 (391)     152   137 (90%)
  F1 + F2 (codons 24-574, exclusive of signal)       258 (551)     228   199 (87%)
  F signal (codons 1-23)                              18 (23)        9     3 (33%)
  G ectodomain (codons 64-292)                       162 (229)      91    38 (42%)
  G cytoplasmic and transmembrane (codons 1-63)       31 (63)       24    18 (75%)
A2 and Long
  N                                                   16 (391)      16    16 (100%)
  G ectodomain (codons 63-298)                        29 (235)      27    13 (48%)
Continued divergent evolution might eventually result, for example, in two or more distinct human RSV types analogous to the different types of human parainfluenza viruses. However, an alternative possibility is that the two subgroups arose during a past episode of divergent evolution and represent relatively stable endpoints. We cannot distinguish between these possibilities at the present time. But we note that the high frequency of amino acid substitution per nucleotide substitution in the G protein and gene, which was characteristic of the inter-subgroup comparison, was also characteristic of the comparison within subgroup A of the Long and A2 strains (Table 4). The Long and A2 strains are nearly identical, having few amino acid and nucleotide sequence differences (Table 1; Johnson et al., 1987a; Lopez et al., 1988). In particular, the low frequency of nucleotide differences in the non-coding gene regions, which are generally thought to be tolerant of nucleotide substitutions, suggests that these viruses have undergone little divergence and probably share a common ancestor that is very recent compared to the ancestor that gave rise to the two subgroups. Thus, the unusual capacity of the G protein to tolerate amino acid substitutions in the ectodomain probably is a continuing characteristic of current strains rather than a characteristic that existed only during a past episode of divergent evolution. However, although this suggests that current strains have the capacity for relatively extensive amino acid substitution in G, it is not known whether the extent of divergence between the A and B subgroups is an intermediate stage or represents the limit of divergence possible for the G protein of human strains. Finally, these results show that the N gene is the most highly conserved of the five genes compared to date. 
The high level of nucleotide sequence identity between subgroups indicates that N would be the gene of choice for use as a hybridization probe for detecting RSV RNAs. This work was performed in the laboratories of Drs Brian R. Murphy and Robert M. Chanock. We thank them for their advice and comments. We thank Linda Jordan, Lori Souza, Christina Fonseca and Sandra Chang for editorial assistance. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sat, 17 Jun 2017 06:20:56 1546 Short communication REFERENCES /Td(ERLIND, B. & NORRBY, E. (1986). Occurrence of respiratory syncytial virus subtypes A and B strains in Sweden. Journal of Medical Virology 19, 241 247. ANDERSON, L. J., HIERHOLZER,J. C., TSOS,C., HENDRY, R. M., FERNIE, B. F., STORE, Y. & MclNTOSH,K. (1985). Antigenic characterization of respiratory syncytial virus strains with monoclonal antibodies. Journal of Infectious Diseases 151, 626-633. COATES, H. V., ALLING, D. W. & CHANOCK,R. M. (1966). A n antigenic analysis of respiratory syncytial virus isolates by a plaque reduction neutralization test. American Journal of Epidemiology 83, 299-313. COELINGH, K. L. V., WINTER, C. C. & MURPHY, B. R. (1988). Nucleotide and deduced amino acid sequence of hemagglutinin-neuraminidase genes of h u m a n type 3 parainfluenza viruses isolated from 1957 to 1983. Virology 162, 137 143. COLLINS, e. L. & WERTZ, O. W. (1983). c D N A cloning and transcriptional m a p p i n g of nine polyadenylated R N A s encoded by the genome of h u m a n respiratory syncytial virus. Proceedings of the National Academy of Sciences, U.S.A. 80, 3208-3212. COLLINS, P. L. & WERTZ, G. W. (1985). Nucleotide sequences of the 1B and 1C nonstructural protein m R N A s of h u m a n respiratory syncytial virus. Virology 143, 442-451. COLLINS, P. L., HUANG, Y. T. & WERTZ, G. W. (1984). Identification of a tenth m R N A of respiratory syncytial virus and assignment of polypeptides to the 10 viral genes. Journal of Virology 49, 572-578. 
COLLINS, P. L., ANDERSON, K., LANGER, S. J. & WERTZ, G. W. (1985). Correct sequence for the major nucleocapsid protein mRNA of respiratory syncytial virus. Virology 146, 69-77.
COLLINS, P. L., DICKENS, L. E., BUCKLER-WHITE, A., OLMSTED, R. A., SPRIGGS, M. K., CAMARGO, E. & COELINGH, K. V. W. (1986). Nucleotide sequences for the gene junctions of human respiratory syncytial virus reveal distinctive features of intergenic structure and gene order. Proceedings of the National Academy of Sciences, U.S.A. 83, 4594-4598.
COLLINS, P. L., OLMSTED, R. A., SPRIGGS, M. K., JOHNSON, P. R. & BUCKLER-WHITE, A. J. (1987). Gene overlap and site-specific attenuation of transcription of the viral polymerase L gene of human respiratory syncytial virus. Proceedings of the National Academy of Sciences, U.S.A. 84, 5134-5138.
DICKENS, L. E., COLLINS, P. L. & WERTZ, G. W. (1984). Transcriptional mapping of human respiratory syncytial virus. Journal of Virology 52, 364-369.
GIMENEZ, H. B., HARDMAN, N., KEIR, H. M. & CASH, P. (1986). Antigenic variation between human respiratory syncytial virus isolates. Journal of General Virology 67, 863-870.
HENDRY, R. M., TALIS, A. L., GODFREY, E., ANDERSON, L. J., FERNIE, B. F. & McINTOSH, K. (1986). Concurrent circulation of antigenically distinct strains of respiratory syncytial virus during community outbreaks. Journal of Infectious Diseases 153, 291-297.
HUANG, Y. T., COLLINS, P. L. & WERTZ, G. W. (1985). Characterization of the 10 proteins of human respiratory syncytial virus: identification of a fourth envelope-associated protein. Virus Research 2, 157-173.
JOHNSON, P. R. & COLLINS, P. L. (1988a). The fusion glycoproteins of human respiratory syncytial virus of subgroups A and B: sequence conservation provides a structural basis for antigenic relatedness. Journal of General Virology 69, 2623-2628.
JOHNSON, P. R. & COLLINS, P. L. (1988b). The A and B subgroups of human respiratory syncytial virus: comparison of intergenic and gene-overlap sequences. Journal of General Virology 69, 2901-2906.
JOHNSON, P. R., OLMSTED, R. A., PRINCE, G. A., MURPHY, B. R., ALLING, D. W., WALSH, E. E. & COLLINS, P. L. (1987a). Antigenic relatedness between the glycoproteins of human respiratory syncytial virus subgroups A and B: evaluation of the contributions of the F and G glycoproteins to immunity. Journal of Virology 61, 3163-3166.
JOHNSON, P. R., SPRIGGS, M. K., OLMSTED, R. A. & COLLINS, P. L. (1987b). The G glycoproteins of human respiratory syncytial virus subgroups A and B: extensive sequence divergence between antigenically related proteins. Proceedings of the National Academy of Sciences, U.S.A. 84, 5625-5629.
LOPEZ, J. A., VILLANUEVA, N., MELERO, J. A. & PORTELA, A. (1988). Nucleotide sequence of the fusion and phosphoprotein genes of human respiratory syncytial (RS) virus Long strain: evidence of subtype genetic heterogeneity. Virus Research 10, 249-262.
McINTOSH, K. M. & CHANOCK, R. M. (1985). Respiratory syncytial virus. In Virology, pp. 1285-1304. Edited by B. N. Fields. New York: Raven Press.
MORGAN, L. A., ROUTLEDGE, E. G., WILLCOCKS, M. M., SAMSON, A. C. R., SCOTT, R. & TOMS, G. L. (1987). Strain variation of respiratory syncytial virus. Journal of General Virology 68, 2781-2788.
MUFSON, M. A., ÖRVELL, C., RAFNAR, B. & NORRBY, E. (1985). Two distinct subtypes of human respiratory syncytial virus. Journal of General Virology 66, 2111-2124.
MUFSON, M. A., BELSHE, R. B., ÖRVELL, C. & NORRBY, E. (1987). Subgroup characteristics of respiratory syncytial virus strains recovered from children with two consecutive infections. Journal of Clinical Microbiology 25, 1535-1539.
ÖRVELL, C., NORRBY, E. & MUFSON, M. A. (1987). Preparation and characterization of monoclonal antibodies directed against five structural components of human respiratory syncytial virus subgroup B. Journal of General Virology 68, 3125-3135.
SAKAI, Y., SUZU, S., SHIODA, T. & SHIBUTA, H. (1987). Nucleotide sequence of the bovine parainfluenza 3 virus genome: its 3' end and the genes of NP, P, C and M proteins. Nucleic Acids Research 15, 2927-2944.
STOTT, E. J., TAYLOR, G., BALL, L. A., ANDERSON, K., YOUNG, K. K.-Y., KING, A. M. Q. & WERTZ, G. W. (1987). Immune and histopathological responses in animals vaccinated with recombinant vaccinia viruses that express individual genes of human respiratory syncytial virus. Journal of Virology 61, 3855-3861.
SUZU, S., SAKAI, Y., SHIODA, T. & SHIBUTA, H. (1987). Nucleotide sequence of the bovine parainfluenza 3 virus genome: the genes of the F and HN glycoproteins. Nucleic Acids Research 15, 2945-2958.
WALSH, E. E., BRANDRISS, M. W. & SCHLESINGER, J. J. (1987). Immunological differences between the envelope glycoproteins of two strains of human respiratory syncytial virus. Journal of General Virology 68, 2169-2176.
ZAGURSKY, R. J., BAUMEISTER, K., LOMAX, N. & BERMAN, M. L. (1985). Rapid and easy sequencing of large linear double stranded DNA and supercoiled plasmid DNA. Gene Analytical Techniques 2, 89-94.

(Received 23 November 1988)