J. gen. Virol. (1989), 70, 1539-1547. Printed in Great Britain
Key words: RSV, human/nucleotide sequence/evolution
The 1B (NS2), 1C (NS1) and N Proteins of Human Respiratory Syncytial
Virus (RSV) of Antigenic Subgroups A and B: Sequence Conservation and
Divergence within RSV Genomic RNA
By PHILIP R. JOHNSON1,2 AND PETER L. COLLINS1
1Laboratory of Infectious Diseases, National Institute of Allergy and Infectious Diseases,
National Institutes of Health, Bethesda, Maryland 20892 and 2Department of Pediatrics,
Vanderbilt University, Nashville, Tennessee 37232, U.S.A.
(Accepted 3 March 1989)
SUMMARY
A 2330 nucleotide sequence spanning the 1B (NS2), 1C (NS1) and N genes and
intergenic regions of human respiratory syncytial virus strain 18537, representing
antigenic subgroup B, was determined by sequencing cloned cDNAs of intracellular
mRNAs. Comparison with the previously reported sequences for strain A2 of subgroup
A showed that 1B, 1C and N were highly conserved at the nucleotide level (78, 78 and
86% identity, respectively) and at the amino acid level (92, 87 and 96% identity,
respectively). The gene-start signals were exactly conserved between subgroups, and
the gene-end signals contained only a single nucleotide substitution each in 1B and N.
In most cases intergenic and non-coding gene sequences that were not part of presumed
transcriptive signals were much less well conserved (generally 50 to 71%) than
sequences that were part of translational open reading frames (82 to 86%). The
nucleotide and deduced amino acid sequences of the N gene and protein of the Long
strain of subgroup A were determined by sequencing cDNA clones of intracellular
mRNA; the nucleotide sequence (representing all but the first 10 nucleotides of the
gene) contained 15 differences from that of the A2 strain, but the deduced amino acid
sequences were identical.
Human respiratory syncytial virus (RSV) is an important, ubiquitous cause of pediatric
respiratory tract disease (McIntosh & Chanock, 1985). RSV is an enveloped, RNA-containing
virus that is classified in the pneumovirus genus of the paramyxovirus family. RSV genomic
RNA (vRNA) is a single negative-sense strand of approximately 15000 nucleotides which is
transcribed in a sequential, polar fashion to yield 10 major species of mRNA (Collins et al., 1984,
1985, 1986; Collins & Wertz, 1983, 1985; Dickens et al., 1984, and references cited therein). An
additional short non-unique polyadenylated RNA is generated by transcriptional attenuation
within the L gene (Collins et al., 1987), but this species has not yet been shown to have messenger
activity. The 10 major RSV mRNAs encode 10 major proteins, namely the F and G
glycoproteins, the M and 22K (or M2) proteins of the inner surface of the viral envelope, the
small integral membrane SH (or 1A) protein, the large nucleocapsid L protein, the nucleocapsid
phosphoprotein P, the major nucleocapsid protein N, and the 1B and 1C (or NS2 and NS1)
proteins which are thought to be non-structural (Collins et al., 1984; Huang et al., 1985). The
gene order determined by sequencing vRNA is 3' 1C (NS1)-1B (NS2)-N-P-M-SH-G-F-22K
(M2)-L 5' (Collins et al., 1986) and is the same as the order of sequential gene transcription
(Collins & Wertz, 1983; Dickens et al., 1985).
Two distinct antigenic subgroups of RSV have been identified on the basis of differences in
† Present address: Georgetown University School of Medicine and Dentistry, NIH/Twinbrook Facility,
Rockville, Maryland 20852, U.S.A.
0000-8794 © 1989 SGM
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Sat, 17 Jun 2017 06:20:56
Short communication
reactions with monoclonal and polyclonal antibodies (Coates et al., 1966; Anderson et al., 1985;
Mufson et al., 1985, 1987; Gimenez et al., 1986; Hendry et al., 1986; Akerlind & Norrby, 1986;
Johnson et al., 1987a; Morgan et al., 1987; Orvell et al., 1987). Information on the extent of
naturally occurring antigenic and structural diversity among RSV strains is important for
guiding vaccine development and might also provide additional insight into the structure,
function and evolution of RSV vRNA and its gene products. For example, the cross-subgroup
relatedness of the F and G proteins has been investigated by monoclonal antibody reactivity
(Anderson et al., 1985; Akerlind & Norrby, 1986; Mufson et al., 1985; Orvell et al., 1987), by
analysis of convalescent human and animal sera by neutralization in vitro and by F or G protein-specific ELISA (Coates et al., 1966; Johnson et al., 1987a), by cross-protection studies in which
animals were immunized with recombinant vaccinia viruses expressing the F or G protein
(Johnson et al., 1987a; Stott et al., 1987) or with purified F or G protein (Walsh et al., 1987) and
by cDNA cloning and sequencing of the F and G mRNAs of the subgroup B strain 18537 for
comparison with the previously published sequences for the subgroup A strain A2 (Johnson et
al., 1987b; Johnson & Collins, 1988a, b). These results established that the F proteins of the two
subgroups were highly related antigenically (two-fold difference in antigenic reactivity) and
structurally (91% amino acid sequence identity exclusive of the predicted signal peptide),
whereas the G proteins were relatively distinct (20- to 40-fold difference in antigenic reactivity
and 53% amino acid identity).
The unexpectedly large amount of cross-subgroup diversity in the G protein prompted further
analysis and comparison of nucleotide and amino acid sequences for the two prototype RSV
subgroup strains A2 and 18537. Recently, we described the additional finding that the
intergenic and flanking gene regions of RSV vRNA are poorly conserved between subgroups,
compared with the relatively greater conservation of nucleotide sequences that encode protein
and the nearly exact conservation of the short gene-start and gene-end sequences that are located
at the gene termini and are thought to be polymerase recognition signals (Johnson & Collins,
1988b). In the work described in the present paper, a 2330 nucleotide sequence was determined
for intracellular mRNAs representing the 1B, 1C and N genes of strain 18537, and these
sequences were compared with their counterparts for the A2 strain. Also, the nucleotide and
amino acid sequences of the N gene and protein of the Long strain of subgroup A were
determined.
As described previously (Johnson et al., 1987b), cDNA libraries were constructed using as
template mRNA isolated from HEp-2 cells that had been infected with RSV strain 18537 or
Long. cDNA clones were identified presumptively as viral by differential hybridization with
radiolabelled cDNA synthesized by reverse transcription of mRNAs from uninfected cells or
from cells infected with the 18537 or Long strain. Virus-specific cDNAs of strain 18537 1B, 1C
and N genes and the N gene of the Long strain were identified by the homology of their
nucleotide and predicted amino acid sequences with those described for the A2 strain (Collins &
Wertz, 1985; Collins et al., 1985). The identities of these cDNAs were also confirmed by
hybridization with radiolabelled cDNAs of the 1B, 1C and N genes of strain A2.
Dideoxynucleotide sequencing of denatured plasmid DNA using synthetic oligonucleotide
primers was performed as described previously (Johnson et al., 1987b; Zagursky et al., 1985).
The sequences for the N gene and protein of the Long strain were determined from a cDNA
clone, LD35, that initiated at nucleotide 11 of the mRNA sequence and otherwise contained the
complete sequence including polyadenylate. The A2 and Long strains both represent antigenic
subgroup A, and the nucleotide sequences of the two N genes contained only 15 nucleotide
differences (Table 1). Fourteen of the 15 changes were in the third codon position, and none of
the changes resulted in a change in the encoded protein. Thus, the N proteins of the two
strains are predicted to be identical, at least for the isolates from which the sequences were
determined. Previously, Anderson et al. (1985) showed that an N-specific monoclonal antibody
(designated 132-7B) bound in an immunofluorescence assay to cells infected with the A2 strain
but not to cells infected with the Long strain. However, in an ELISA, the same antibody bound
to cells infected with either strain (Anderson et al., 1985). Taken together with the sequencing
data described here, this latter result suggests that the difference in reactivity observed in the
Table 1. Nucleotide differences between the N genes of the A2 and Long strains

Nucleotide             Nucleotide identity according to strain
sequence position*     A2       Long     (18537)†
    75                 C        T        (C)
   225                 G        T        (T)
   327                 T        C        (A)
   333                 A        G        (A)
   471                 C        T        (T)
   522                 T        C        (C)
   528                 A        G        (A)
   540                 C        T        (T)
   627                 C        T        (T)
   711                 T        C        (C)
   783                 G        A        (A)
   787                 T        C        (C)
   903                 T        A        (A)
   975                 C        T        (C)
  1110                 C        T        (C)

* Number of nucleotides from the 5' end in the complete mRNA sequence published previously for the A2
strain (Collins et al., 1985).
† The 18537 and A2 strains differed at 169 nucleotides (Fig. 1, Table 2); entries here are only for those positions
where the Long strain differed from A2. At these 15 positions, none of the nucleotide differences in any strain
resulted in a change in amino acid coding assignment. For these 15 positions, strains A2 and 18537 were identical
at five nucleotides, Long and 18537 were identical at nine, and 18537 was unique at one.
immunofluorescence assay should not be interpreted as evidence of antigenic difference,
although it also is possible that the Long isolate used in that study contained one or more amino
acid differences from the one sequenced here due to intrastrain sequence variability (Collins et
al., 1984) or that a difference in another protein affected the binding of the antibody to N protein
in that particular assay.
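The counts given in the second footnote to Table 1 can be checked directly from the tabulated bases. The following is a minimal sketch and not part of the original analysis: the rows are transcribed from Table 1, the A2 base at position 528 is taken to be A (the only reading consistent with the footnote's counts), and the function name `tally_18537` is introduced here purely for illustration.

```python
# Rows transcribed from Table 1: (mRNA position, A2, Long, 18537).
# Assumption: the A2 base at position 528 is read as "A".
TABLE_1 = [
    (75, "C", "T", "C"),
    (225, "G", "T", "T"),
    (327, "T", "C", "A"),
    (333, "A", "G", "A"),
    (471, "C", "T", "T"),
    (522, "T", "C", "C"),
    (528, "A", "G", "A"),
    (540, "C", "T", "T"),
    (627, "C", "T", "T"),
    (711, "T", "C", "C"),
    (783, "G", "A", "A"),
    (787, "T", "C", "C"),
    (903, "T", "A", "A"),
    (975, "C", "T", "C"),
    (1110, "C", "T", "C"),
]


def tally_18537(rows):
    """Count whether the 18537 base matches A2, matches Long, or matches neither."""
    counts = {"matches_A2": 0, "matches_Long": 0, "unique": 0}
    for _position, a2, long_strain, s18537 in rows:
        if s18537 == a2:
            counts["matches_A2"] += 1
        elif s18537 == long_strain:
            counts["matches_Long"] += 1
        else:
            counts["unique"] += 1
    return counts


print(tally_18537(TABLE_1))
```

With the rows as transcribed, the tally is five matches to A2, nine to Long and one position unique to 18537, in agreement with the footnote.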
The strain 18537 sequences were determined from the previously described (Johnson &
Collins, 1988b) 1C-1B dicistronic cDNAs C9 and G15, the 1C-1B-N tricistronic cDNA B53,
and the N-P dicistronic cDNAs R4 and F5. Additional cDNAs sequenced for this work were
as follows. D35 and C85 initiated at nucleotides 10 and 12, respectively, of the 1C sequence and
otherwise contained the complete sequence including polyadenylate; AA75 initiated at
nucleotide 30 of the 1B sequence and otherwise contained the complete sequence including
polyadenylate; 7N30 initiated at nucleotide 31 of the N sequence and otherwise contained the
complete sequence including polyadenylate. In situations where different cDNAs overlapped
the same gene region, no sequence differences were observed. The gene and encoded protein
sequences of 18537 1C, 1B, N and the upstream region of P are shown in Fig. 1 and aligned with
the previously reported A2 sequences.
The 1B, 1C and N genes of strain 18537 were each identical in length to their strain A2
counterpart and shared 78, 78 and 86% nucleotide sequence identity, respectively, with the
corresponding strain A2 gene (Fig. 1, Table 2). Consistent with previous findings, the sequences
of the translational open reading frames were more highly conserved (82, 83 and 86% for 1B, 1C
and N) than were flanking gene sequences that did not encode protein and were not part of the
gene-start or gene-end sequences (apart from the transcriptive signals, the non-coding gene
sequences had 50 to 71% identity between subgroups for 1B and 1C). The six nucleotide 5' non-coding region of the N gene immediately following the gene-start sequence was exactly
conserved between the subgroups, but the significance of this is unclear because the sequence is
short and is not found in the 5' non-coding regions of other RSV genes. The 31 non-coding
nucleotides immediately following the gene-start sequence of the 1C gene were also relatively
highly conserved (84% identity). It will be of interest to sequence the 3' end of vRNA, which is
[Fig. 1: aligned 18537 and A2 sequences, nucleotides 1 to 2330; the alignment itself is not reproduced here.]
Fig. 1. Alignment of the nucleotide and deduced amino acid sequences of the 18537 (subgroup B) and
A2 (subgroup A) 1C, 1B and N genes and proteins, including the nucleotide sequences of the intergenic
regions and the nucleotide and encoded amino acid sequences of the upstream region of the P gene. For
the A2 sequence, only the nucleotides and amino acids that differ from the 18537 sequence are shown.
Gaps introduced into the nucleotide sequences during alignment are indicated by short dashes within
the sequences, and gaps introduced into the amino acid sequences are indicated by longer dashes. Gene-start and gene-end sequences are boxed. The short N-P intergenic region is underlined with a filled
rectangle.
predicted to be immediately upstream of 1C (Collins et al., 1986; Dickens et al., 1984), and to
determine whether the conserved sequence at the start of 1C is part of a larger conserved
structure. The nine nucleotide gene-start sequences of 1B, 1C, N and P were exactly conserved
between subgroups, and the 12 to 13 nucleotide gene-end sequences contained only a single
nucleotide substitution each in 1B and N.
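For readers who wish to reproduce identity figures of the kind quoted above from an alignment, the calculation can be sketched as follows. This is an illustrative sketch only: the text does not state how gapped columns were counted, so skipping them is an assumption, and the function name `percent_identity` and the toy sequences are hypothetical and are not taken from Fig. 1.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both sequences carry the same residue.

    Columns with a gap in either sequence are skipped. This convention is an
    assumption made for the sketch, not one stated in the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not from Fig. 1): 9 comparable columns, 7 identical.
toy_a = "ATGGGC-AAT"
toy_b = "ATGAGCTAAC"
print(round(percent_identity(toy_a, toy_b)))
```

Counting gapped columns as mismatches instead would lower the figure slightly for regions such as the N-terminal part of 1B, where the alignment contains gaps.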
The deduced amino acid sequences of the strain 18537 1B, 1C and N proteins were identical in
length to their A2 counterparts and shared 92, 87 and 96% sequence identity, respectively (Table
3). For the 1B protein in general, the C-terminal third of the molecule (amino acids 78 to 124)
was more highly conserved (100% identity) than were the N-terminal two-thirds (amino acids 1
Table 2. Summary of nucleotide sequence identities between strains A2 and 18537 for the 1B, 1C
and N genes

Sequence                    Length           Identity between
domain                      (nucleotides)    strains (%)
1B gene
  Complete gene*              503               78
  5' Non-coding†               23               50
  Open reading frame‡         375               82
  3' Non-coding§               84               57
1C gene
  Complete gene*‖             523               78
  5' Non-coding†               45               71
  Open reading frame‡         420               83
  3' Non-coding§               45               51
N gene
  Complete gene*             1203               86
  5' Non-coding†                6              100
  Open reading frame‡        1176               86
  3' Non-coding§             None                -

* Includes the four (in the case of A2 1C, 18537 1C and 18537 1B) or five (A2 1B, A2 N and 18537 N) 3'-terminal
A residues (mRNA-sense) that represent the vRNA coding sequences for the poly(A) tail.
† Exclusive of the exactly conserved 5'-terminal nine nucleotide gene-start sequence.
‡ Includes termination codon.
§ Exclusive of the exactly conserved 13 nucleotide (1C) or nearly exactly conserved 12 nucleotide (1B, N) 3'-terminal gene-end sequence.
‖ Exclusive of the nine nucleotide gene-start sequence, which was not determined for 18537 1C.
to 77, 87% identity) (Fig. 1). In particular, in the alignment of the 1B proteins (Fig. 1), the N-terminal 11 amino acids contained a single gap in both the A2 and the 18537 sequences (whereas
the 1C and N alignments had no gaps), suggesting that this region of 1B could tolerate some
variability in segment length and sequence. The situation for the 1C protein in general was the
reverse, with the N-terminal two-fifths (residues 1 to 56) of the molecule being relatively more
highly conserved (96% identity) and the C-terminal three-fifths (residues 57 to 139) being
somewhat less well conserved (81% identity).
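The segment lengths in Table 2 and the protein lengths quoted in the text can be cross-checked with simple arithmetic. The sketch below is offered as a consistency check rather than as part of the original analysis; it assumes that each complete gene is the sum of the gene-start sequence, the 5' non-coding region, the open reading frame (including the termination codon), the 3' non-coding region and the gene-end sequence, with the gene-start sequence omitted for 1C as stated in the table footnote. The dictionary layout and function names are introduced here for illustration.

```python
GENE_START = 9  # nucleotides in the gene-start sequence (Table 2 footnote)

# Lengths in nucleotides, transcribed from Table 2 and its footnotes.
GENES = {
    "1B": {"complete": 503, "nc5": 23, "orf": 375, "nc3": 84, "gene_end": 12, "start_counted": True},
    "1C": {"complete": 523, "nc5": 45, "orf": 420, "nc3": 45, "gene_end": 13, "start_counted": False},
    "N": {"complete": 1203, "nc5": 6, "orf": 1176, "nc3": 0, "gene_end": 12, "start_counted": True},
}

# Protein lengths in amino acids, as given in the text and Table 3.
PROTEIN_LENGTHS = {"1B": 124, "1C": 139, "N": 391}


def segment_sum(gene):
    """Add up the tabulated segments of one gene."""
    start = GENE_START if gene["start_counted"] else 0
    return start + gene["nc5"] + gene["orf"] + gene["nc3"] + gene["gene_end"]


def protein_length(orf_nucleotides):
    """Amino acids encoded by an open reading frame that includes its stop codon."""
    codons, remainder = divmod(orf_nucleotides, 3)
    if remainder:
        raise ValueError("open reading frame length is not a multiple of three")
    return codons - 1  # the termination codon encodes no amino acid


for name, gene in GENES.items():
    print(name, segment_sum(gene) == gene["complete"],
          protein_length(gene["orf"]) == PROTEIN_LENGTHS[name])
```

Under these assumptions the segments add up to the tabulated complete-gene lengths, and the open reading frames of 375, 420 and 1176 nucleotides correspond to proteins of 124, 139 and 391 amino acids.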
Previous sequence analysis of the 1B and 1C genes of strain A2 (Collins & Wertz, 1985)
showed that an 18 nucleotide sequence spanning the end of the open reading frame (this 18
nucleotide sequence consisted of the last four codons, the stop codon and the following three
nucleotides) was nearly exactly conserved (one nucleotide difference out of 18) between the two
different genes, and showed that the C-terminal four amino acids of the two predicted proteins
were identical (Collins & Wertz, 1985). Interestingly, antibodies raised against a synthetic
peptide containing the C-terminal 12 amino acids of the 1B protein reacted with both the 1C and
1B proteins in immunoprecipitation assays (unpublished results). This suggested that the
common four amino acids were part of an antigenic site in each protein and were oriented
externally within the folded proteins. However, as shown in Fig. 1, the 1C protein of strain
18537 contained a single amino acid substitution (Pro 139 to Ser 139 in 18537) within this region,
suggesting that the duplication of this sequence in the two A2 proteins might be fortuitous rather
than indicative of a common functional or structural site. Also, the C terminus of the 1C protein
was one of the most divergent regions of the molecule, which also suggested that the C-terminal
end was not part of a conserved functional or structural domain.
As shown in Table 3, the high degree of amino acid sequence identity between subgroups for
the 1B, 1C and N proteins was similar to that described previously for the F1 + F2 protein
(Johnson & Collins, 1988a) and for the cytoplasmic and transmembrane domains of the G
protein (Johnson et al., 1987b), and contrasted with the extensive divergence described
previously for the F protein signal sequence and the G protein ectodomain. Thus, different
polypeptide domains exhibited differences in the extent of cross-subgroup sequence identity,
Table 3. Amino acid sequence identity between strains A2 and 18537 for the 1B, 1C, N, F
and G proteins

Protein                      Length           Identity between strains
or domain                    (amino acids)    A2 and 18537 (%)
1B                             124               92
1C                             139               87
N                              391               96
F1 + F2*                       551               91
F signal†                       23               35
G ectodomain‡                  229               44
G cytoplasmic
and transmembrane§              63               83

* Exclusive of the predicted signal peptide. F1 and F2 are assumed to contain amino acids 24 to 574 of the
unmodified F amino acid sequence.
† The exact cleavage site of the F signal has not been determined but is assumed here to follow amino acid 23 of
the unmodified F sequence.
‡ The G ectodomain has not been mapped directly but is predicted to begin after amino acid 63. The 18537
sequence is six amino acids shorter than that of A2, and in the aligned sequences the C-terminal seven amino acids
of A2 have no 18537 counterparts and are not included in these calculations.
§ Amino acids 1 to 63 of the complete G sequence.
and the boundaries between conserved and non-conserved regions were sometimes sharply
delineated. One interpretation was that the selective pressures influencing sequence divergence
and conservation were acting primarily at the protein level and were not uniform. To explore
this, we examined the open reading frames of the 1B, 1C, N, F and G proteins, tabulated the
codons within each that contained single nucleotide changes, and quantified the frequency at
which the change was silent at the amino acid level. For simplicity, codons containing more than
one change were not considered. The rationale was that a protein or polypeptide domain whose
structural or functional properties were relatively intolerant of amino acid substitution would
contain a higher frequency of silent single nucleotide changes whereas a polypeptide domain
that was relatively tolerant of substitution would have a higher frequency of changes in amino
acid coding assignments.
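The tabulation described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used for Table 4: it assumes two in-frame, codon-aligned open reading frames written in mRNA sense, uses the standard genetic code, skips codons containing alignment gaps, and the function names and toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_pair(codon_a, codon_b):
    """Return 'identical', 'silent', 'replacement', 'multiple' or 'gapped'."""
    codon_a = codon_a.upper().replace("U", "T")
    codon_b = codon_b.upper().replace("U", "T")
    if "-" in codon_a or "-" in codon_b:
        return "gapped"
    differences = sum(x != y for x, y in zip(codon_a, codon_b))
    if differences == 0:
        return "identical"
    if differences > 1:
        return "multiple"  # codons with more than one change are not considered
    same = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "silent" if same else "replacement"


def silent_tally(orf_a, orf_b):
    """Return (codons with a single difference, how many of those are silent)."""
    if len(orf_a) != len(orf_b) or len(orf_a) % 3:
        raise ValueError("expected codon-aligned sequences of equal length")
    single = silent = 0
    for i in range(0, len(orf_a), 3):
        kind = classify_codon_pair(orf_a[i:i + 3], orf_b[i:i + 3])
        if kind in ("silent", "replacement"):
            single += 1
            silent += kind == "silent"
    return single, silent


# Hypothetical toy sequences: ATG=ATG, CTT/CTC (Leu, silent),
# AAA/AGA (Lys to Arg, replacement), GGA/GCT (two changes, ignored).
print(silent_tally("ATGCTTAAAGGA", "ATGCTCAGAGCT"))
```

The silent count divided by the single-difference count gives the frequency used to compare the apparent constraint on each protein or domain.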
As shown in Table 4, this analysis identified three groups: (i) the sequences encoding 1B, N
and F1 + F2, for which 87 to 100% of the nucleotide changes were silent at the amino acid level
supporting the interpretation that the encoded proteins were relatively intolerant of amino acid
substitutions, (ii) the sequences encoding the 1C protein and G cytoplasmic and transmembrane
domains which had 75 to 78% silent changes and (iii) the F signal sequence and G ectodomain-coding sequences which exhibited 33 to 48% silent changes, consistent with the interpretation
that the encoded polypeptide domains were relatively more tolerant of amino acid substitutions.
In the case of the G ectodomain, the comparison included both intra- (Long and A2) and inter- (18537 and A2) subgroup alignments, and the frequencies of silent nucleotide changes were
approximately the same (48% and 42%, respectively) even though the sequences within the A
subgroup were much more highly related than those of subgroup A and B strains. Also, the N
gene exhibited a high percentage of silent nucleotide differences both between (90%) and within
(100%) subgroups. These results are consistent with the interpretation that the individual genes,
proteins and polypeptide domains have different intrinsic rates of sequence substitution, and
that the ability of the encoded polypeptide to tolerate substitutions is an important factor.
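The percentages in Table 4 follow from the tabulated codon counts, and the three groups described above can be recovered from them. The sketch below is illustrative only: the counts are transcribed from Table 4, while the cut-off values used to assign groups (87% and 75%) are assumptions chosen here to match the ranges quoted in the text, not thresholds defined in the paper.

```python
# (comparison, open reading frame): (codons with a single difference, of which silent)
# Counts transcribed from Table 4.
TABLE_4 = {
    ("A2 and 18537", "1B"): (51, 46),
    ("A2 and 18537", "1C"): (55, 43),
    ("A2 and 18537", "N"): (152, 137),
    ("A2 and 18537", "F1 + F2"): (228, 199),
    ("A2 and 18537", "F signal"): (9, 3),
    ("A2 and 18537", "G ectodomain"): (91, 38),
    ("A2 and 18537", "G cytoplasmic and transmembrane"): (24, 18),
    ("A2 and Long", "N"): (16, 16),
    ("A2 and Long", "G ectodomain"): (27, 13),
}


def silent_percent(single, silent):
    """Percentage of single-difference codons whose change is silent."""
    return round(100 * silent / single)


def group(percent):
    """Assign group (i), (ii) or (iii); the cut-offs are illustrative assumptions."""
    if percent >= 87:
        return "i"
    if percent >= 75:
        return "ii"
    return "iii"


for (comparison, orf), (single, silent) in TABLE_4.items():
    pct = silent_percent(single, silent)
    print(f"{comparison:13s} {orf:32s} {pct:3d}%  group ({group(pct)})")
```

With these counts the computed percentages agree with those printed in Table 4, and the G ectodomain falls in group (iii) for both the inter-subgroup and the intra-subgroup comparison.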
To place the level of sequence divergence between the RSV antigenic subgroups in
perspective, we note that the percentage amino acid identities between individual proteins of a
human and bovine strain of parainfluenza virus type 3 (PIV-3) were 86% (NP), 62% (P), 76%
(C), 80% (F) and 77% (HN) (Sakai et al., 1987; Suzu et al., 1987). The amino acid sequence
identity between the HN proteins of seven independent isolates of human PIV-3 was >96%
(Coelingh et al., 1988). Thus, unlike human PIV-3, human RSV exhibits a substantial amount of
Table 4. Differences among individual proteins (1B, 1C, N, F and G) in the apparent constraint
on amino acid sequence divergence between strains A2 and 18537: frequency of single nucleotide
differences within codons that are silent at the amino acid level

                               No. of codons having     No. of codons     No. of codons in which
                               one or more nucleotide   having a single   the single difference
Open reading                   differences              nucleotide        is silent at the amino
frame                          (total no. codons)       difference        acid level (%)
A2 and 18537
  1B                             57 (124)                  51               46 (90%)
  1C                             62 (137)                  55               43 (78%)
  N                             160 (391)                 152              137 (90%)
  F1 + F2 (codons
  24-574, exclusive
  of signal)                    258 (551)                 228              199 (87%)
  F signal
  (codons 1-23)                  18 (23)                    9                3 (33%)
  G ectodomain
  (codons 64-292)               162 (229)                  91               38 (42%)
  G cytoplasmic
  and transmembrane
  (codons 1-63)                  31 (63)                   24               18 (75%)
A2 and Long
  N                              16 (391)                  16               16 (100%)
  G ectodomain
  (codons 63-298)                29 (235)                  27               13 (48%)
sequence diversity. For the RSV 1B, 1C, N and F proteins, the divergence was much greater
than that observed among the human PIV-3 HN proteins but was less than that observed
between the bovine and human PIV-3 isolates.
We previously suggested (Johnson et al., 1987b; Johnson & Collins, 1988a, b) that the two
RSV subgroups represent an early stage in divergent evolution. Continued divergent evolution
might eventually result, for example, in two or more distinct human RSV types analogous to the
different types of human parainfluenza viruses. However, an alternative possibility is that the
two subgroups arose during a past episode of divergent evolution and represent relatively stable
endpoints. We cannot distinguish between these possibilities at the present time. But we note
that the high frequency of amino acid substitution per nucleotide substitution in the G protein
and gene, which was characteristic of the inter-subgroup comparison, was also characteristic of
the comparison within subgroup A of the Long and A2 strains (Table 4). The Long and A2
strains are nearly identical, having few amino acid and nucleotide sequence differences (Table
1; Johnson et al., 1987a; Lopez et al., 1988). In particular, the low frequency of nucleotide
differences in the non-coding gene regions, which are generally thought to be tolerant of
nucleotide substitutions, suggests that these viruses have undergone little divergence and
probably share a common ancestor that is very recent compared to the ancestor that gave rise to
the two subgroups. Thus, the unusual capacity of the G protein to tolerate amino acid
substitutions in the ectodomain probably is a continuing characteristic of current strains rather
than a characteristic that existed only during a past episode of divergent evolution. However,
although this suggests that current strains have the capacity for relatively extensive amino acid
substitution in G, it is not known whether the extent of divergence between the A and B
subgroups is an intermediate stage or represents the limit of divergence possible for the G
protein of human strains.
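The quantity discussed above, the number of amino acid substitutions per nucleotide substitution, can be illustrated with a minimal sketch. It assumes two coding sequences that are already aligned in frame, of equal length, and free of gaps. The function name `substitution_counts` and the nine-nucleotide example sequences are hypothetical and are not taken from Table 4 or from any RSV gene.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_counts(seq_a, seq_b):
    """Return (nucleotide differences, amino acid differences, ratio).

    Assumes both inputs are in-frame, gap-free DNA coding sequences of
    equal length (an illustrative simplification, not the paper's method).
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")

    # Count positions where the two nucleotide sequences differ.
    nt_diffs = sum(a != b for a, b in zip(seq_a, seq_b))

    # Count codons whose translated amino acids differ.
    aa_diffs = sum(
        CODON_TABLE[seq_a[i:i + 3]] != CODON_TABLE[seq_b[i:i + 3]]
        for i in range(0, len(seq_a), 3)
    )

    ratio = aa_diffs / nt_diffs if nt_diffs else 0.0
    return nt_diffs, aa_diffs, ratio


# Invented toy sequences: Met-Ala-Lys versus Met-Ala-Arg.
print(substitution_counts("ATGGCTAAA", "ATGGCCAGA"))  # (2, 1, 0.5)
```

In this toy case one of the two nucleotide changes is silent (GCT to GCC, both alanine) and one is coding (AAA to AGA, lysine to arginine), giving a ratio of 0.5. A higher ratio, as described above for G, means that a larger share of the nucleotide changes alter the protein.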
Finally, these results show that the N gene is the most highly conserved of the five genes
compared to date. The high level of nucleotide sequence identity between subgroups indicates
that N would be the gene of choice for use as a hybridization probe for detecting RSV RNAs.
This work was performed in the laboratories of Drs Brian R. Murphy and Robert M. Chanock. We thank them
for their advice and comments. We thank Linda Jordan, Lori Souza, Christina Fonseca and Sandra Chang for
editorial assistance.
REFERENCES
ÅKERLIND, B. & NORRBY, E. (1986). Occurrence of respiratory syncytial virus subtypes A and B strains in Sweden.
Journal of Medical Virology 19, 241-247.
ANDERSON, L. J., HIERHOLZER, J. C., TSOU, C., HENDRY, R. M., FERNIE, B. F., STONE, Y. & McINTOSH, K. (1985). Antigenic
characterization of respiratory syncytial virus strains with monoclonal antibodies. Journal of Infectious
Diseases 151, 626-633.
COATES, H. V., ALLING, D. W. & CHANOCK, R. M. (1966). An antigenic analysis of respiratory syncytial virus isolates
by a plaque reduction neutralization test. American Journal of Epidemiology 83, 299-313.
COELINGH, K. L. V., WINTER, C. C. & MURPHY, B. R. (1988). Nucleotide and deduced amino acid sequence of
hemagglutinin-neuraminidase genes of human type 3 parainfluenza viruses isolated from 1957 to 1983.
Virology 162, 137-143.
COLLINS, P. L. & WERTZ, G. W. (1983). cDNA cloning and transcriptional mapping of nine polyadenylated RNAs
encoded by the genome of human respiratory syncytial virus. Proceedings of the National Academy of Sciences,
U.S.A. 80, 3208-3212.
COLLINS, P. L. & WERTZ, G. W. (1985). Nucleotide sequences of the 1B and 1C nonstructural protein mRNAs of
human respiratory syncytial virus. Virology 143, 442-451.
COLLINS, P. L., HUANG, Y. T. & WERTZ, G. W. (1984). Identification of a tenth mRNA of respiratory syncytial virus
and assignment of polypeptides to the 10 viral genes. Journal of Virology 49, 572-578.
COLLINS, P. L., ANDERSON, K., LANGER, S. J. & WERTZ, G. W. (1985). Correct sequence for the major nucleocapsid
protein mRNA of respiratory syncytial virus. Virology 146, 69-77.
COLLINS, P. L., DICKENS, L. E., BUCKLER-WHITE, A., OLMSTED, R. A., SPRIGGS, M. K., CAMARGO, E. & COELINGH, K. V. W.
(1986). Nucleotide sequences for the gene junctions of human respiratory syncytial virus reveal distinctive
features of intergenic structure and gene order. Proceedings of the National Academy of Sciences, U.S.A. 83,
4594-4598.
COLLINS, P. L., OLMSTED, R. A., SPRIGGS, M. K., JOHNSON, P. R. & BUCKLER-WHITE, A. J. (1987). Gene overlap and site-specific attenuation of transcription of the viral polymerase L gene of human respiratory syncytial virus.
Proceedings of the National Academy of Sciences, U.S.A. 84, 5134-5138.
DICKENS, L. E., COLLINS, P. L. & WERTZ, G. W. (1984). Transcriptional mapping of human respiratory syncytial virus.
Journal of Virology 52, 364-369.
GIMENEZ, H. B., HARDMAN, N., KEIR, H. M. & CASH, P. (1986). Antigenic variation between human respiratory
syncytial virus isolates. Journal of General Virology 67, 863-870.
HENDRY, R. M., TALIS, A. L., GODFREY, E., ANDERSON, L. J., FERNIE, B. F. & McINTOSH, K. (1986). Concurrent
circulation of antigenically distinct strains of respiratory syncytial virus during community outbreaks. Journal
of Infectious Diseases 153, 291-297.
HUANG, Y. T., COLLINS, P. L. & WERTZ, G. W. (1985). Characterization of the 10 proteins of human respiratory
syncytial virus: identification of a fourth envelope-associated protein. Virus Research 2, 157-173.
JOHNSON, P. R. & COLLINS, P. L. (1988a). The fusion glycoproteins of human respiratory syncytial virus of subgroups
A and B: sequence conservation provides a structural basis for antigenic relatedness. Journal of General
Virology 69, 2623-2628.
JOHNSON, P. R. & COLLINS, P. L. (1988b). The A and B subgroups of human respiratory syncytial virus: comparison
of intergenic and gene-overlap sequences. Journal of General Virology 69, 2901-2906.
JOHNSON, P. R., OLMSTED, R. A., PRINCE, G. A., MURPHY, B. R., ALLING, D. W., WALSH, E. E. & COLLINS, P. L. (1987a).
Antigenic relatedness between the glycoproteins of human respiratory syncytial virus subgroups A and B:
evaluation of the contributions of the F and G glycoproteins to immunity. Journal of Virology 61, 3163-3166.
JOHNSON, P. R., SPRIGGS, M. K., OLMSTED, R. A. & COLLINS, P. L. (1987b). The G glycoproteins of human respiratory
syncytial virus subgroups A and B: extensive sequence divergence between antigenically related proteins.
Proceedings of the National Academy of Sciences, U.S.A. 84, 5625-5629.
LOPEZ, J. A., VILLANUEVA, N., MELERO, J. A. & PORTELA, A. (1988). Nucleotide sequence of the fusion and
phosphoprotein genes of human respiratory syncytial (RS) virus Long strain: evidence of subtype genetic
heterogeneity. Virus Research 10, 249-262.
McINTOSH, K. M. & CHANOCK, R. M. (1985). Respiratory syncytial virus. In Virology, pp. 1285-1304. Edited by B. N.
Fields. New York: Raven Press.
MORGAN, L. A., ROUTLEDGE, E. G., WILLCOCKS, M. M., SAMSON, A. C. R., SCOTT, R. & TOMS, G. L. (1987). Strain variation
of respiratory syncytial virus. Journal of General Virology 68, 2781-2788.
MUFSON, M. A., ÖRVELL, C., RAFNAR, B. & NORRBY, E. (1985). Two distinct subtypes of human respiratory syncytial
virus. Journal of General Virology 66, 2111-2124.
MUFSON, M. A., BELSHE, R. B., ÖRVELL, C. & NORRBY, E. (1987). Subgroup characteristics of respiratory syncytial virus
strains recovered from children with two consecutive infections. Journal of Clinical Microbiology 25, 1535-1539.
ÖRVELL, C., NORRBY, E. & MUFSON, M. A. (1987). Preparation and characterization of monoclonal antibodies
directed against five structural components of human respiratory syncytial virus subgroup B. Journal of
General Virology 68, 3125-3135.
SAKAI, Y., SUZU, S., SHIODA, T. & SHIBUTA, H. (1987). Nucleotide sequence of the bovine parainfluenza 3 virus
genome: its 3' end and the genes of NP, P, C and M proteins. Nucleic Acids Research 15, 2927-2944.
STOTT, E. J., TAYLOR, G., BALL, L. A., ANDERSON, K., YOUNG, K. K.-Y., KING, A. M. Q. & WERTZ, G. W. (1987). Immune
and histopathological responses in animals vaccinated with recombinant vaccinia viruses that express
individual genes of human respiratory syncytial virus. Journal of Virology 61, 3855-3861.
SUZU, S., SAKAI, Y., SHIODA, T. & SHIBUTA, H. (1987). Nucleotide sequence of the bovine parainfluenza 3 virus
genome: the genes of the F and HN glycoproteins. Nucleic Acids Research 15, 2945-2958.
WALSH, E. E., BRANDRISS, M. W. & SCHLESINGER, J. J. (1987). Immunological differences between the envelope
glycoproteins of two strains of human respiratory syncytial virus. Journal of General Virology 68, 2169-2176.
ZAGURSKY, R. J., BAUMEISTER, K., LOMAX, N. & BERMAN, M. L. (1985). Rapid and easy sequencing of large linear
double-stranded DNA and supercoiled plasmid DNA. Gene Analytical Techniques 2, 89-94.
(Received 23 November 1988)