Journal of General Virology (1991), 72, 443-447.
Printed in Great Britain
The nucleotide sequence of the gene encoding the attachment protein H of
canine distemper virus
M. D. Curran, D. K. Clarke† and B. K. Rima*
Division of Genetic Engineering, School of Biology and Biochemistry, The Queen's University of Belfast,
Belfast BT9 7BL, U.K.
The sequence of the H gene and flanking sequences in
the F and L genes of canine distemper virus (CDV)
have been determined. The H gene of CDV (1946
nucleotides) contains one large open reading frame
starting at position 21 and terminating at position
1835, encoding a protein of 604 amino acid residues.
This protein contains three potential glycosylation
sites in the extracellular domain and, like those of all other
paramyxoviruses, an N-terminal membrane-spanning
hydrophobic anchor domain. The deduced H protein
sequence shows an identity of 36% with those of rinderpest
virus (RPV) and measles virus (MV). The identities at
the nucleotide level are higher (RPV 52% and MV
53%). The amino acid sequence shows conservation of
all the structural determinants with the H proteins of
MV and RPV. The data also show that CDV is
evolutionarily equidistant to RPV and MV with respect
to the H gene.
Canine distemper virus (CDV) belongs to the morbillivirus subgroup of the Paramyxoviridae and is a nonsegmented negative-stranded enveloped RNA virus.
Other established members of the group include measles
virus (MV), rinderpest virus (RPV) and peste-des-petits
ruminants virus. Recently, a fifth member has been
proposed, phocine distemper virus (PDV), responsible
for distemper in seals.
It is now well established that morbillivirus virions
contain six proteins (Rima, 1983): the nucleocapsid (N)
protein, the phosphoprotein (P), the large (L) protein, the
matrix protein (M) and two integral membrane proteins,
namely the fusion protein (F) and the attachment protein
haemagglutinin (H) which in MV carries the haemagglutinating activity. These proteins display varying levels
of serological cross-reactivity among the individual
members (Sheshberadaran et al., 1986).
The morbillivirus genome is a single-stranded negative-sense RNA of 15 to 16 kb in length (Barrett et al.,
1991) and is organized into six transcription units or
genes encoding the N, P, M, F, H and L proteins
separated by almost totally conserved intergenic trinucleotides and preceded and followed by small leader
and trailer sequences. In contrast to MV, where the
complete nucleotide sequence of the genome is known,
much of the CDV genome remains to be sequenced. To
date, almost the entire N, P and M genes and the
complete F gene of CDV have been sequenced (Rozenblatt et al., 1985; Bellini et al., 1986; Barrett et al., 1985,
1987). We report here the nucleotide sequence of the H
gene of the Onderstepoort strain of CDV and compare
the predicted H protein sequence to those published for
MV (Alkhatib & Briedis, 1986; Gerald et al., 1986;
Cattaneo et al., 1989) and RPV (Tsukiyama et al., 1987;
Yamanaka et al., 1988).
An H gene-specific cDNA clone pCDV54 derived
from reverse-transcribed oligo(dT)-primed poly(A)+
RNA extracted from CDV-infected cells (Russell et al.,
1985) was sequenced by the dideoxynucleotide chain-termination method (Sanger et al., 1977). Consistent
with the strategy used to synthesize it, the insert sequence
of pCDV54 contained a stretch of adenine residues at
one end positioning the sequence at the 3' end of the H
mRNA. Two additional H gene-specific clones pCDV9
and pCDV815, which cross-hybridized with pCDV54
and were isolated from the genome of the Onderstepoort
strain of CDV as described earlier (Rima et al., 1986),
were also sequenced. The entire sequence determined for
pCDV9 fell within the sequence determined for
pCDV54; no differences were observed between them.
The sequence of clone pCDV815 overlapped with the
end of the pCDV54 sequence and extended 212
nucleotides into the L gene.
† Present address: Department of Biology, University of California
at San Diego, La Jolla, California 92093, U.S.A.
The nucleotide sequence in this paper will appear in the DDBJ,
EMBL and GenBank nucleotide sequence databases under accession
number D00758: morbillivirus H gene.
0000-9837 © 1991 SGM
Short communication
Since the entire merged sequence of these three cDNA
clones extended only 650 nucleotides from the poly(A)
tail of the H mRNA towards the 5' end and since no
clones overlapping with pCDV54 and extending into the
F gene had been found in our (and others') cDNA
libraries generated from genomic RNA, the polymerase
chain reaction (PCR) was employed to isolate cDNAs of
the remaining H gene sequence. Two oligonucleotide
primers specific to the F gene sequence and the H gene
sequence of pCDV54, both engineered with EcoRI sites,
were used to prime total RNA extracted from CDV-infected Vero cells for reverse transcription and PCR.
The first primer is derived from nucleotides 2073 to 2094
of the F mRNA (Barrett et al., 1987). The sequence of the
second primer was derived from the genomic strand
sequence of pCDV54 (nucleotides 1324 to 1301 in Fig. 1).
After confirmation of its estimated size (about 1400
nucleotides) and H/F gene specificity by Northern blot
analysis, the amplified product was digested with EcoRI,
subcloned into M13tg130/Bluescript plasmids and
sequenced.
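As a rough illustration of how the antisense primer described above relates to the positive-sense numbering, the following Python sketch derives a primer by reverse-complementing a span of the template. The toy template, the helper names and the placement of the EcoRI recognition site (GAATTC) at the 5' end of the primer are assumptions made for illustration; the text gives only the coordinates (1324 to 1301, a 24-nucleotide span) and states that both primers carried EcoRI sites.

```python
# Sketch only: toy template and hypothetical helper names.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_primer(template: str, high: int, low: int,
                     tail: str = ECORI_SITE) -> str:
    """Primer annealing to positive-sense positions low..high (1-based).

    The coordinates are given high-to-low, as in the text, because the
    primer reads along the opposite strand. The 5' tail is an assumption.
    """
    if not 1 <= low <= high <= len(template):
        raise ValueError("coordinates fall outside the template")
    span = template[low - 1:high]
    return tail + reverse_complement(span)


# Made-up 30-nucleotide template, not the CDV sequence.
toy_template = "ATGCTCCCCTACCAAGACAAGGTGGGTGCC"
primer = antisense_primer(toy_template, high=24, low=1)
print(primer)
```

With the real sequence, the call would take `high=1324, low=1301`; the primer body is then 24 nucleotides long before any tail is added.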
All the sequences (existing cDNA clones and the PCR
product) were merged to give the complete sequence of
the end of the F gene, the H gene and the beginning of
the L gene of CDV (see Fig. 1). The numbering in this
sequence begins at the conserved gene start sequence in
the H gene and ends at position 1946 of the consensus 3'-terminal sequence. Genes of similar size have been
described for MV (1954 nucleotides; Alkhatib & Briedis,
1986) and RPV (1953 nucleotides; Tsukiyama et al.,
1987). When compared to the H gene sequences of MV
and RPV, CDV displays a similar identity level (53%
and 52%, respectively), whereas the identity between MV
and RPV is 64%. A comparison between the F gene
sequence determined here and the published sequence
(Barrett et al., 1987) revealed two differences, one at
position 2140 (C to T) and the other at position 2191 (C to
TC). Since the sequence in question represents the 3'
untranslated region of the F mRNA of CDV this
variability is not surprising.
At the end of the sequence there are 212 nucleotides of
the L gene of CDV. The open reading frame (ORF)
encoding the L protein starts at the same position as that
of MV and the sequence of the first 63 residues is 75%
identical to that of MV.
As in MV and RPV the major ORF of the H mRNA
sequence starts at position 21 with the first AUG codon
in a favourable context for translation initiation (Kozak,
1986). This ORF extends to a termination codon (TAA)
at position 1835. The 3' non-translated region is 109
nucleotides in length. It varies in length between the
morbilliviruses. Stable stem-loop structures were not
found in either the 5' or 3' untranslated sequences.
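The ORF coordinates quoted above can be checked against the stated protein length with a short calculation. This is a minimal sketch that assumes position 1835 is the last base of the TAA termination codon; the text does not say which base of the codon the coordinate refers to, and the function name is hypothetical.

```python
def orf_codons(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


H_ORF_START = 21   # first base of the initiating AUG
H_ORF_END = 1835   # ASSUMED to be the last base of the TAA stop codon

codons = orf_codons(H_ORF_START, H_ORF_END)  # 1815 nt -> 605 codons
residues = codons - 1                        # the stop codon is not translated
print(codons, residues)
```

Under that assumption the span is 1815 nucleotides, or 605 codons including the stop, which agrees with the 604 residues reported for the H protein.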
Fig. 1. The nucleotide sequence of the end of the F gene, the H gene
and the start of the L gene of CDV. The positive antigenomic sequence
is displayed in its DNA form as determined from cDNA clones and
PCR products. [The sequence listing, with the deduced amino acid
sequences printed beneath it, is not legible in this transcript; see
accession number D00758.]
The gene boundary sequences of paramyxoviruses are
thought to act as transcriptional signals, directing the
polyadenylation of the newly transcribed mRNA and the
start of transcription of the next. Nucleotide sequence
comparison has revealed a high degree of conservation
within and between individual paramyxovirus species in
these regions. In contrast to MV, where all the gene
boundary sequences are known, only the M/F boundary has been described for CDV (Barrett et al., 1987).
The 3' end sequences of the N and P mRNAs have also
been reported (Rozenblatt et al., 1985; Barrett et al.,
1985). Here we report two more gene boundary sequences of CDV, the F/H and H/L boundaries. These are
shown in Fig. 2 together with the consensus gene
boundary sequences of MV (Cattaneo et al., 1987),
Sendai virus (Gupta & Kingsbury, 1984) and human
parainfluenza virus type 3 (Spriggs & Collins, 1986). As
expected the gene boundary sequences of CDV are also
highly conserved and display striking similarity with the
consensus sequences shown in Fig. 2.
Fig. 2. The gene end, intergenic and gene start sequences of some paramyxoviruses. The data for Sendai virus were from
Gupta & Kingsbury (1984), those for MV from Cattaneo et al. (1987) and those for parainfluenza virus type 3 from Spriggs & Collins (1986).
The data for all known morbillivirus genes are included. Capital letters indicate conserved nucleotides; small letters indicate
conservation in the large majority of cases.
[Figure body not legible in this transcript. Columns: polyadenylation signal sequence, intergenic sequence, mRNA start sequence. Rows: leader, genes N, P, M, F, H and L, and trailer, for CDV, MV, PDV, RPV, RPL and RPK as available, followed by consensus rows for MV, Sendai virus and parainfluenza type 3 virus.]
In contrast to the sequence reported by Barrett et al.
(1987) for the polyadenylation signal (ATTATAn) our
data identified a G residue which interrupts the string of
adenine residues (Fig. 2). A re-examination of the
mRNA-derived cDNA sequence also revealed the presence of the G residue, confirming that the sequence
presented here represents both the antigenomic as well as
the mRNA sequence.
It is interesting to note that as with MV and Sendai
virus, the conserved intergenic trinucleotide CUU (in
the positive antigenome sense) in CDV is altered in the
H/L boundary (CUU to CUA). One could postulate that
this enables the polymerase to attenuate differentially at
this intergenic sequence, explaining the extreme level of
attenuation of transcription observed at the H/L intergenic boundary of MV (Cattaneo et al., 1987). The
completion of the remaining CDV gene boundary
sequences should allow a more comprehensive comparison with its paramyxovirus relatives.
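The two CDV boundaries reported here can be compared programmatically. The sketch below extracts the intergenic trinucleotide from a junction written in the positive (DNA) sense. The junction strings are transcribed from Figs 1 and 2 as best they can be read in this transcript and may carry extraction errors; the regular expression and function name are illustrative assumptions, not a published motif definition.

```python
import re

# Gene end + intergenic trinucleotide + next gene start, positive sense.
# Transcribed from the figures; treat as illustrative only.
CDV_JUNCTIONS = {
    "F/H": "ATTAAAGAAAA" + "CTT" + "AGGGCTCAGGT",
    "H/L": "ATTATAAAAAAA" + "CTA" + "AGGATCCAAGA",
}

# Hypothetical pattern: a run of at least four A residues (end of the
# polyadenylation signal), any trinucleotide, then the AGG that opens
# the next gene.
BOUNDARY = re.compile(r"A{4,}(?P<intergenic>[ACGT]{3})(?=AGG)")


def intergenic_trinucleotide(junction: str):
    """Return the trinucleotide between gene end and gene start, or None."""
    match = BOUNDARY.search(junction.upper())
    return match.group("intergenic") if match else None


for name, junction in CDV_JUNCTIONS.items():
    print(name, intergenic_trinucleotide(junction))
```

The F/H junction yields CTT and the H/L junction yields CTA, the alteration (CUU to CUA in RNA) discussed above.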
The deduced amino acid sequence of the H protein of
CDV contains 604 amino acid residues and is depicted
beneath the sequence in Fig. 1. Since glycosylation is
known to affect Mr estimates on SDS-PAGE, increasing
the estimate by 2000 to 3000 per oligosaccharide chain
(Keil et al., 1979; Horisberger et al., 1980), the
calculated Mr of the translation product (67996) compares well with the SDS-PAGE estimate of the H
protein of CDV at 76K (Rima, 1983) if one takes into
account the predicted glycosylation sites in the H protein
sequence (see below).
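The comparison between the calculated and observed Mr can be made explicit. The following sketch assumes that all three predicted extracellular glycosylation sites carry an oligosaccharide chain, which the paper predicts but does not demonstrate; the variable names are hypothetical.

```python
CALCULATED_MR = 67996            # from the deduced amino acid sequence
OBSERVED_MR = 76000              # SDS-PAGE estimate (76K; Rima, 1983)
PREDICTED_SITES = 3              # ASSUMES every predicted site is used
SHIFT_PER_CHAIN = (2000, 3000)   # apparent Mr increase per chain

low = CALCULATED_MR + PREDICTED_SITES * SHIFT_PER_CHAIN[0]
high = CALCULATED_MR + PREDICTED_SITES * SHIFT_PER_CHAIN[1]

print(f"expected apparent Mr: {low} to {high}")
print("observed value within range:", low <= OBSERVED_MR <= high)
```

Under this assumption the expected apparent Mr lies between 73996 and 76996, a range that contains the 76K SDS-PAGE estimate.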
Fig. 3. Alignment of the sequences of the H proteins of MV, RPV and
CDV. The sequence data for RPV (lapinized strain RPL) are from
Tsukiyama et al. (1987) and those for MV are a consensus sequence
from Cattaneo et al. (1989). (*), Residues conserved in all three viruses;
(-), residues functionally conserved in all three viruses; (!), potential
glycosylation site.
[Alignment body not legible in this transcript. Rows: MCON (MV consensus, ending at residue 617), RPL (ending at residue 609) and CDV (ending at residue 604).]
Considering the absence of other ORFs, the favourable context of the initiation codon and the identities
found with the other morbillivirus H proteins, it is
proposed that the primary sequence of the ORF shown in
Fig. 1 is that of the H protein of CDV. Interestingly, an
ORF encoding a potential product of 70 amino acid
residues containing a hydrophobic domain of 16 residues
and two potential glycosylation sites has previously been
identified near the 3' end of the H mRNA sequence of
MV (Gerald et al., 1986). In the case of RPV (Tsukiyama
et al., 1987; Yamanaka et al., 1988) and CDV, no
counterpart to this ORF was found, ruling out the
possibility of a common function of this as yet
unidentified product in morbillivirus-infected cells per se.
In common with the other Paramyxoviridae H
proteins examined to date (Morrison, 1988), the CDV H
protein appears to be a class II glycoprotein as the only
hydrophobic domain large enough to span the lipid
membrane is located near the N terminus of the protein
(amino acids 35 to 55). This domain is thought to act both
as a signal sequence for membrane transport and as the
anchor of paramyxovirus H (N) proteins (Morrison,
1988). Three potential sites for N-linked glycosylation
were found at amino acid positions 149, 422 and 587 in
agreement with the above mentioned prediction. Another
potential glycosylation site is located at residue 19,
but this is unlikely to be used, since it has been proposed
that this sequence is in the cytoplasmic domain of the H
glycoprotein molecule (Morrison, 1988).
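Potential N-linked glycosylation sites are conventionally identified as Asn-X-Ser/Thr sequons in which X is any residue except proline. The sketch below scans a protein sequence for such sequons. The test string is the N-terminal stretch of the CDV H protein as it can be read from the alignment in Fig. 3 of this transcript, so it may contain extraction errors; the function name is hypothetical.

```python
import re

# N-X-S/T with X != P; the lookahead lets adjacent sequons overlap.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protein: str) -> list:
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# First 37 residues of CDV H as transcribed from Fig. 3 (illustrative).
CDV_H_NTERM = "MLPYQDKVGAFYKDNARANSTKLSLVTEGHGGRRPPY"

print(sequon_positions(CDV_H_NTERM))
```

On this fragment the scan reports a single sequon at residue 19, the cytoplasmic site that the text considers unlikely to be used.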
To facilitate a comparative analysis between the
morbillivirus H glycoprotein primary sequences, the H
proteins of MV, CDV and RPV were aligned and
compared (Fig. 3). Gaps had to be inserted in order to
maximize the alignment of the CDV H sequence with
those of MV and RPV. Most of these were single residue
omissions indicating sporadic insertion and subsequent
deletion downstream of triplet codons in the H gene
sequence. However, two alterations were more substantial. There was a gap of three residues at positions 246 to
248 of the MV and RPV sequence and an eight residue
gap at the C terminus, explaining the larger size of the
MV H primary sequence.
The CDV H protein sequence shows an overall
identity of 36% with the H protein sequences of RPV and
MV, whereas the identity between the latter two is 60%.
Substantial stretches of three-way matched residues are
scattered throughout the sequence, notably between
residues 87 and 109 and between 519 and 535.
Conservative replacements are also in abundance (Fig.
3). The areas of perfect identity probably play an
important role in the structure and function of the H
glycoprotein. Additionally, all the cysteine residues in
the CDV sequence are matched with the sequences of
both MV and RPV. However, one of the matched
cysteine residues in MV and RPV (at position 583) is
absent in the CDV sequence. Of the 32 proline residues
present in the CDV H sequence, 21 are present in
identical positions in all three sequences. It is thus
obvious that the morbillivirus H proteins have a similar
conformation. However, when one compares the levels
of identity observed in the H proteins with those
observed between the other viral proteins of CDV and
MV (N, 66%; P, 44%; C, 44%; M, 76%; F, 66%), it
appears that considerable sequence divergence in the H
proteins exists. It is interesting to note that the
divergence is not as extreme at the nucleotide level.
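Identity figures such as those quoted above depend on how gapped alignment columns are counted, and the paper does not state its convention. The sketch below shows one common choice, in which columns containing a gap in either sequence are excluded from the denominator. The two aligned strings are made-up toy data, not taken from the H protein alignment, and the function name is hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped alignment columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # ASSUMPTION: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 12 columns, 2 gapped, 7 identical of the remaining 10.
toy_a = "MLPYQDKV-GAF"
toy_b = "MSPQ-DRVNGAF"
print(percent_identity(toy_a, toy_b))
```

Counting gapped columns as mismatches instead would lower the value, so comparisons between published identity levels are only meaningful when the same convention is used.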
These data also reinforce the existing view (based on
the sequences of the F genes of these three viruses), that
MV is more closely related to RPV than to CDV. This
divergence is further amplified when one considers the
potential glycosylation sites. Three of the five potential
glycosylation sites in the MV sequence occur in identical
positions in the RPV sequence, whereas in the case of
CDV its three potential glycosylation sites (Fig. 3) occur
in different positions to those of MV and RPV. No doubt
these differences together with the sequence divergence
may result in significant changes in the tertiary structure
of the CDV H protein with respect to MV and RPV and
therefore the antigenic determinants.
Although immunological and sequence data have
clearly shown that the H protein is the most variable of
the morbillivirus proteins, the level of variability
observed with respect to MV and RPV is considerably
higher than anticipated, especially from the immunological studies carried out to date (Norrby et al., 1985;
Sheshberadaran et al., 1986). A more comprehensive
comparison between all the morbillivirus H proteins will
throw some light on the significance of this high degree
of variability between CDV and MV/RPV. This may
reflect the wide host range of CDV.
In summary, determination of the H gene sequence of
CDV has indicated that there is a low level of identity
between CDV and the other two morbilliviruses. The
data on both the H and F proteins indicate that CDV is
equidistant from MV and RPV and therefore this does
not provide evidence for or against the suggestion that
RPV is the archetypal virus in this group (Norrby et al.,
1985). It will be interesting to compare the sequence of
the H glycoprotein of PDV with the one reported here
and work is in progress to complete the sequence of
PDV. The availability of the primary sequence for the H
protein of CDV may now allow studies to determine
antigenic determinants of the CDV H protein.
We thank the Department of Education for Northern Ireland for
studentship support to D.K.C., and the U.K. Medical Research
Council for support under grant number 8604630CA.
References
ALKHATIB, G. & BRIEDIS, D. J. (1986). The predicted structure of the
measles virus hemagglutinin. Virology 150, 479-490.
BARRETT, T., SHRIMPTON, S. B. & RUSSELL, S. E. H. (1985). Nucleotide
sequence of the entire protein coding region of canine distemper
virus polymerase-associated (P) protein mRNA. Virus Research 3,
367-372.
BARRETT, T., CLARKE, D. K., EVANS, S. A. & RIMA, B. K. (1987). The
nucleotide sequence of the gene encoding the F protein of canine
distemper virus: a comparison of the deduced amino acid sequence
with other paramyxoviruses. Virus Research 8, 373-386.
BARRETT, T., SUBBARAO, S. M., BELSHAM, G. J. & MAHY, B. W. J.
(1991). The molecular biology of the morbilliviruses. In The
Paramyxoviruses, pp. 83-102. Edited by D. W. Kingsbury. New
York & London: Plenum Press.
BELLINI, W. J., ENGLUND, G., RICHARDSON, C. D., ROZENBLATT, S. &
LAZZARINI, R. A. (1986). Matrix genes of measles virus and canine
distemper virus: cloning, nucleotide sequences and deduced amino
acid sequences. Journal of Virology 58, 408-416.
CATTANEO, R., REBMANN, G., SCHMID, A., BACZKO, K., TER MEULEN,
V. & BILLETER, M. A. (1987). Altered transcription of a defective
measles virus genome derived from a diseased human brain. EMBO
Journal 6, 681-688.
CATTANEO, R., SCHMID, A., SPIELHOFER, P., KAELIN, K., BACZKO, K.,
TER MEULEN, V., PARDOWITZ, J., FLANAGAN, S., RIMA, B. K., UDEM,
S. A. & BILLETER, M. A. (1989). Mutated and hypermutated genes of
persistent measles viruses which caused lethal human brain diseases.
Virology 173, 415-425.
GERALD, C., BUCKLAND, R., BARKER, R., FREEMAN, G. & WILD, T. F.
(1986). Measles virus haemagglutinin gene: cloning, complete
nucleotide sequence analysis and expression in COS cells. Journal of
General Virology 67, 2695-2703.
GUPTA, K. C. & KINGSBURY, D. W. (1984). Complete sequences of the
intergenic and mRNA start signals in the Sendai virus genome:
homologies with the genome of VSV. Nucleic Acids Research 12,
3829-3841.
HORISBERGER, M. A., DESTRITZ, C. & CONTENT, J. (1980). Intracellular
glycosylation of influenza hemagglutinin: the effect of glucosamine.
Archives of Virology 64, 9-16.
KEIL, W., KLENK, H. D. & SCHWARZ, R. T. (1979). Carbohydrates of
influenza virus. III. Nature of oligosaccharide protein linkage in
viral glycoproteins. Journal of Virology 31, 253-256.
KOZAK, M. (1986). Point mutations define a sequence flanking the
AUG initiator codon that modulate translation by eukaryotic
ribosomes. Cell 44, 283-292.
MORRISON, T. G. (1988). Structure, function and intracellular
processing of paramyxovirus membrane proteins. Virus Research 10,
113-136.
NORRBY, E., SHESHBERADARAN, H., MCCULLOUGH, K. C., CARPENTER,
W. C. & ÖRVELL, C. (1985). Is rinderpest virus the archevirus of the
morbillivirus genus? Intervirology 23, 228-232.
RIMA, B. K. (1983). The proteins of morbilliviruses. Journal of General
Virology 64, 1205-1219.
RIMA, B. K., BACZKO, K., CLARKE, D. K., CURRAN, M. D., MARTIN,
S. J., BILLETER, M. A. & TER MEULEN, V. (1986). Characterization of
clones for the sixth (L) gene and a transcriptional map for
morbilliviruses. Journal of General Virology 67, 1971-1978.
ROZENBLATT, S., EIZENBERG, O., BEN-LEVY, R., LAVIE, V. & BELLINI,
W. J. (1985). Sequence homology within morbilliviruses. Journal of
Virology 53, 684-690.
RUSSELL, S. E. H., CLARKE, D. K., HOEY, E. M., RIMA, B. K. &
MARTIN, S. J. (1985). cDNA cloning of the messenger RNAs of five
genes of canine distemper virus. Journal of General Virology 66, 433-441.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing
with chain-terminating inhibitors. Proceedings of the National
Academy of Sciences, U.S.A. 74, 5463-5467.
SHESHBERADARAN, H., NORRBY, E., MCCULLOUGH, K. C., CARPENTER, W. C. & ÖRVELL, C. (1986). The antigenic relationship between
measles, canine distemper and rinderpest viruses studied with
monoclonal antibodies. Journal of General Virology 67, 1381-1392.
SPRIGGS, M. K. & COLLINS, P. L. (1986). Human parainfluenza virus
type 3: mRNAs, polypeptide coding assignments, intergenic
sequences and genetic map. Journal of Virology 59, 649-654.
TSUKIYAMA, K., SUGIYAMA, M., YOSHIKAWA, Y. & YAMANOUCHI, K.
(1987). Molecular cloning and sequence analysis of the rinderpest
virus mRNA encoding the hemagglutinin protein. Virology 160, 48-54.
YAMANAKA, M., HSU, D., CRISP, T., DALE, B., GRUBMAN, M. &
YILMA, T. (1988). Cloning and sequence analysis of the hemagglutinin gene of the virulent strain of rinderpest virus. Virology 166, 251-253.
(Received 3 August 1990; Accepted 6 November 1990)