Journal of General Virology (1991), 72, 443-447. Printed in Great Britain

The nucleotide sequence of the gene encoding the attachment protein H of canine distemper virus

M. D. Curran, D. K. Clarke† and B. K. Rima*

Division of Genetic Engineering, School of Biology and Biochemistry, The Queen's University of Belfast, Belfast BT9 7BL, U.K.

The sequences of the H gene and of the flanking regions in the F and L genes of canine distemper virus (CDV) have been determined. The H gene of CDV (1946 nucleotides) contains one large open reading frame starting at position 21 and terminating at position 1835, encoding a protein of 604 amino acid residues. This protein contains three potential glycosylation sites in the extracellular domain and, like the H proteins of all other paramyxoviruses, an N-terminal membrane-spanning hydrophobic anchor domain. The deduced H protein sequence shows an identity of 36% with those of rinderpest virus (RPV) and measles virus (MV). The identities at the nucleotide level are higher (RPV 52% and MV 53%). The amino acid sequence shows conservation of all the structural determinants with the H proteins of MV and RPV. The data also show that CDV is evolutionarily equidistant from RPV and MV with respect to the H gene.

Canine distemper virus (CDV) belongs to the morbillivirus subgroup of the Paramyxoviridae and is a non-segmented, negative-stranded, enveloped RNA virus. Other established members of the group include measles virus (MV), rinderpest virus (RPV) and peste-des-petits-ruminants virus. Recently, a fifth member has been proposed: phocine distemper virus (PDV), responsible for distemper in seals. It is now well established that morbillivirus virions contain six proteins (Rima, 1983): the nucleocapsid (N) protein, the phosphoprotein (P), the large (L) protein, the matrix (M) protein and two integral membrane proteins, namely the fusion protein (F) and the attachment protein haemagglutinin (H), which in MV carries the haemagglutinating activity.
These proteins display varying levels of serological cross-reactivity among the individual members (Sheshberadaran et al., 1986). The morbillivirus genome is a single-stranded negative-sense RNA 15 to 16 kb in length (Barrett et al., 1991). It is organized into six transcription units, or genes, encoding the N, P, M, F, H and L proteins, which are separated by almost totally conserved intergenic trinucleotides and are preceded and followed by short leader and trailer sequences. In contrast to MV, for which the complete nucleotide sequence of the genome is known, much of the CDV genome remains to be sequenced. To date, almost the entire N, P and M genes and the complete F gene of CDV have been sequenced (Rozenblatt et al., 1985; Bellini et al., 1986; Barrett et al., 1985, 1987). We report here the nucleotide sequence of the H gene of the Onderstepoort strain of CDV and compare the predicted H protein sequence with those published for MV (Alkhatib & Briedis, 1986; Gerald et al., 1986; Cattaneo et al., 1989) and RPV (Tsukiyama et al., 1987; Yamanaka et al., 1988).

An H gene-specific cDNA clone, pCDV54, derived from reverse-transcribed oligo(dT)-primed poly(A)+ RNA extracted from CDV-infected cells (Russell et al., 1985), was sequenced by the dideoxynucleotide chain-termination method (Sanger et al., 1977). Consistent with the strategy used to synthesize it, the insert of pCDV54 contained a stretch of adenine residues at one end, positioning the sequence at the 3' end of the H mRNA. Two additional H gene-specific clones, pCDV9 and pCDV815, which cross-hybridized with pCDV54 and were isolated from the genome of the Onderstepoort strain of CDV as described earlier (Rima et al., 1986), were also sequenced. The entire sequence determined for pCDV9 fell within the sequence determined for pCDV54, and no differences were observed between them. The sequence of clone pCDV815 overlapped with the end of the pCDV54 sequence and extended 212 nucleotides into the L gene.
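The way the clone sequences were combined (pCDV9 lying wholly within pCDV54, pCDV815 overlapping the end of pCDV54) can be illustrated with a minimal exact-match merge in Python. This is a sketch under simplifying assumptions: both inputs are on the same strand, overlaps must match exactly, and `merge_reads` and `min_overlap` are hypothetical names. The paper does not state which software was used, and real assembly tools also tolerate sequencing errors and check both strands.

```python
from typing import Optional


def merge_reads(a: str, b: str, min_overlap: int = 20) -> Optional[str]:
    """Merge two same-strand sequences (hypothetical helper).

    If one is contained in the other, the longer one is returned (the
    pCDV9-within-pCDV54 case). Otherwise the longest exact overlap between
    the end of `a` and the start of `b` (at least `min_overlap` bases) is
    used to join them (the pCDV54/pCDV815 case). Returns None otherwise.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    a, b = a.upper(), b.upper()
    if b in a:
        return a
    if a in b:
        return b
    # Try the longest possible suffix/prefix overlap first.
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None
```

For example, `merge_reads("ACGTACGTTT", "CGTTTGGAA", min_overlap=3)` joins the two toy reads on their five-base overlap `CGTTT`, giving `ACGTACGTTTGGAA`.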
Since the entire merged sequence of these three cDNA clones extended only 650 nucleotides from the poly(A) tail of the H mRNA towards the 5' end, and since no clones overlapping with pCDV54 and extending into the F gene had been found in our (or others') cDNA libraries generated from genomic RNA, the polymerase chain reaction (PCR) was employed to isolate cDNAs of the remaining H gene sequence. Two oligonucleotide primers, specific to the F gene sequence and to the H gene sequence of pCDV54 and both engineered with EcoRI sites, were used to prime total RNA extracted from CDV-infected Vero cells for reverse transcription and PCR. The first primer is derived from nucleotides 2073 to 2094 of the F mRNA (Barrett et al., 1987). The sequence of the second primer was derived from the genomic strand sequence of pCDV54 (nucleotides 1324 to 1301 in Fig. 1). After confirmation of its estimated size (about 1400 nucleotides) and H/F gene specificity by Northern blot analysis, the amplified product was digested with EcoRI, subcloned into M13tg130/Bluescript plasmids and sequenced. All the sequences (existing cDNA clones and the PCR product) were merged to give the complete sequence of the end of the F gene, the H gene and the beginning of the L gene of CDV (see Fig. 1). The numbering in this sequence begins at the conserved gene start sequence in the H gene and ends at position 1946 of the consensus 3'-terminal sequence. Genes of similar size have been described for MV (1954 nucleotides; Alkhatib & Briedis, 1986) and RPV (1953 nucleotides; Tsukiyama et al., 1987).

† Present address: Department of Biology, University of California at San Diego, La Jolla, California 92093, U.S.A.
The nucleotide sequence in this paper will appear in the DDBJ, EMBL and GenBank nucleotide sequence databases under accession number D00758: morbillivirus H gene.
0000-9837 © 1991 SGM
Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Thu, 03 Aug 2017 16:44:36
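The second PCR primer was taken from the genomic strand at antigenomic positions 1324 to 1301, i.e. it is the reverse complement of positions 1301 to 1324 of the sequence in Fig. 1, with an EcoRI site added. A minimal Python sketch of that derivation follows. The function names and the optional extra 5' `tail` bases are hypothetical; the paper does not give the primer sequences or any flanking bases.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (cuts G^AATTC)
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def genomic_strand_primer(antigenome: str, first: int, last: int, tail: str = "") -> str:
    """Primer annealing to antigenome positions first..last (1-based, inclusive).

    The primer is the reverse complement of that stretch, so it reads 5'->3'
    from `last` down to `first` (cf. "nucleotides 1324 to 1301"). An EcoRI
    site is prepended, optionally preceded by hypothetical extra `tail` bases.
    """
    if not 1 <= first <= last <= len(antigenome):
        raise ValueError("positions outside the sequence")
    return tail + ECORI_SITE + reverse_complement(antigenome[first - 1:last])
```

With the toy antigenome `"ATGGCATTCAGT"`, positions 3 to 7 (`GGCAT`) give the primer `GAATTCATGCC`. Applied to the real sequence, positions 1301 to 1324 would give a 24-nucleotide annealing region.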
When compared to the H gene sequences of MV and RPV, CDV displays a similar identity level (53% and 52%, respectively), whereas the identity between MV and RPV is 64%. A comparison between the F gene sequence determined here and the published sequence (Barrett et al., 1987) revealed two differences, one at position 2140 (C to T) and the other at position 2191 (C to TC). Since the sequence in question represents the 3' untranslated region of the F mRNA of CDV, this variability is not surprising. At the end of the sequence there are 212 nucleotides of the L gene of CDV. The open reading frame (ORF) encoding the L protein starts at the same position as that of MV, and the sequence of the first 63 residues is 75% identical to that of MV.

As in MV and RPV, the major ORF of the H mRNA sequence starts at position 21 with the first AUG codon, which is in a favourable context for translation initiation (Kozak, 1986). This ORF extends to a termination codon (TAA) at position 1835. The 3' non-translated region is 109 nucleotides in length; its length varies between the morbilliviruses. Stable stem-loop structures were not found in either the 5' or the 3' untranslated sequences.

[Fig. 1. The nucleotide sequence of the end of the F gene, the H gene and the start of the L gene of CDV. The positive antigenomic sequence is displayed in its DNA form as determined from cDNA clones and PCR products.]

The gene boundary sequences of paramyxoviruses are thought to act as transcriptional signals, directing the polyadenylation of the newly transcribed mRNA and the start of transcription of the next. Nucleotide sequence comparison has revealed a high degree of conservation within and between individual paramyxovirus species in these regions. In contrast to MV, where all the gene boundary sequences are known, only the M/F boundary has been described for CDV (Barrett et al., 1987). The 3' end sequences of the N and P mRNAs have also been reported (Rozenblatt et al., 1985; Barrett et al., 1985). Here we report two more gene boundary sequences of CDV, for the F/H and H/L genes. These are shown in Fig. 2 together with the consensus gene boundary sequences of MV (Cattaneo et al., 1987), Sendai virus (Gupta & Kingsbury, 1984) and human parainfluenza virus type 3 (Spriggs & Collins, 1986). As expected, the gene boundary sequences of CDV are also highly conserved and display striking similarity with the consensus sequences shown in Fig. 2. In contrast to the sequence reported by Barrett et al. (1987) for the polyadenylation signal (ATTATAn), our data identified a G residue which interrupts the string of adenine residues (Fig. 2). A re-examination of the mRNA-derived cDNA sequence also revealed the presence of the G residue, confirming that the sequence presented here represents both the antigenomic and the mRNA sequence. It is interesting to note that, as with MV and Sendai virus, the conserved intergenic trinucleotide CUU (in the positive antigenome sense) in CDV is altered at the H/L boundary (CUU to CUA). One could postulate that this enables the polymerase to attenuate differentially at this intergenic sequence, explaining the extreme level of attenuation of transcription observed at the H/L intergenic boundary of MV (Cattaneo et al., 1987). The completion of the remaining CDV gene boundary sequences should allow a more comprehensive comparison with its paramyxovirus relatives.

[Fig. 2. The gene end, intergenic sequences and gene start sequences of some paramyxoviruses (columns: polyadenylation signal sequence; intergenic sequence; mRNA start sequence). The data for Sendai virus were from Gupta & Kingsbury (1984), those of MV from Cattaneo et al. (1987) and for parainfluenza virus type 3 from Spriggs & Collins (1986). The data for all known morbillivirus genes are included. Capital letters indicate conserved nucleotides; small letters indicate conservation in the large majority of cases.]
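The ORF analysis of the H mRNA (first AUG at position 21, in-frame TAA ending at position 1835, favourable initiation context) can be sketched in Python as follows, assuming an mRNA-sense input and the standard genetic code. `first_orf` translates from the first AUG to the first in-frame stop, `protein_length` converts 1-based coordinates (the A of the AUG and the last base of the stop codon) into a residue count, and `kozak_context` applies the commonly used rule, derived from Kozak (1986), that a purine at -3 and a G at +4 (A of AUG = +1) favour initiation. All names are hypothetical; this is not the authors' software.

```python
from typing import Optional, Tuple

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def _as_dna(seq: str) -> str:
    """Upper-case and write U as T so mRNA and cDNA are handled alike."""
    return seq.upper().replace("U", "T")


def first_orf(mrna: str) -> Optional[Tuple[int, int, str]]:
    """Translate from the first ATG to the first in-frame stop codon.

    Returns (start, stop_end, protein) with 1-based positions, or None
    if there is no ATG or no in-frame stop.
    """
    seq = _as_dna(mrna)
    begin = seq.find("ATG")
    if begin < 0:
        return None
    protein = []
    for i in range(begin, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous bases
        if residue == "*":
            return begin + 1, i + 3, "".join(protein)
        protein.append(residue)
    return None


def protein_length(start: int, stop_end: int) -> int:
    """Residues encoded from the start codon up to (not including) the stop."""
    coding = stop_end - start + 1
    if coding % 3:
        raise ValueError("coordinates do not span a whole number of codons")
    return coding // 3 - 1


def kozak_context(mrna: str, start: int) -> str:
    """Classify the AUG at 1-based `start` by its -3 and +4 nucleotides."""
    seq = _as_dna(mrna)
    i = start - 1
    purine_at_minus3 = i >= 3 and seq[i - 3] in "AG"
    g_at_plus4 = i + 3 < len(seq) and seq[i + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"
```

`protein_length(21, 1835)` returns 604: the span covers 1815 nucleotides, or 605 codons, one of which is the TAA stop. This agrees with the reported 604 residues if position 1835 is read as the last base of the stop codon. On the toy mRNA `"GCCACCAUGGCUUAAGG"`, `first_orf` returns `(7, 15, "MA")` and `kozak_context(..., 7)` returns `"strong"`.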
The deduced amino acid sequence of the H protein of CDV contains 604 amino acid residues and is depicted beneath the nucleotide sequence in Fig. 1. Glycosylation is known to affect Mr estimates on SDS-PAGE, increasing the estimate by 2000 to 3000 per oligosaccharide chain (Keil et al., 1979; Horisberger et al., 1980). The calculated Mr of the translation product (67996) therefore compares well with the SDS-PAGE estimate of 76K for the H protein of CDV (Rima, 1983) if one takes into account the predicted glycosylation sites in the H protein sequence (see below). Considering the absence of other ORFs, the favourable context of the initiation codon and the identities found with the other morbillivirus H proteins, it is proposed that the primary sequence of the ORF shown in Fig. 1 is that of the H protein of CDV.

Interestingly, an ORF encoding a potential product of 70 amino acid residues, containing a hydrophobic domain of 16 residues and two potential glycosylation sites, has previously been identified near the 3' end of the H mRNA sequence of MV (Gerald et al., 1986). In the case of RPV (Tsukiyama et al., 1987; Yamanaka et al., 1988) and CDV, no counterpart to this ORF was found, ruling out the possibility of a common function of this as yet unidentified product in morbillivirus-infected cells per se.

In common with the other Paramyxoviridae H proteins examined to date (Morrison, 1988), the CDV H protein appears to be a class II glycoprotein, as the only hydrophobic domain large enough to span the lipid membrane is located near the N terminus of the protein (amino acids 35 to 55). This domain is thought to act both as a signal sequence for membrane transport and as the anchor of paramyxovirus H(N) proteins (Morrison, 1988). Three potential sites for N-linked glycosylation were found at amino acid positions 149, 422 and 587, in agreement with the above-mentioned prediction. Another potential glycosylation site is located at residue 19, but this is unlikely to be used, since it has been proposed that this sequence is in the cytoplasmic domain of the H glycoprotein molecule (Morrison, 1988).

To facilitate a comparative analysis of the morbillivirus H glycoprotein primary sequences, the H proteins of MV, CDV and RPV were aligned and compared (Fig. 3). Gaps had to be inserted in order to maximize the alignment of the CDV H sequence with those of MV and RPV. Most of these were single-residue omissions, indicating sporadic insertion and subsequent deletion downstream of triplet codons in the H gene sequence.

[Fig. 3. Alignment of the sequences of the H proteins of MV, RPV and CDV. The sequence data for RPV (lapinized strain RPL) are from Tsukiyama et al. (1987) and those for MV are a consensus sequence (MCON) from Cattaneo et al. (1989). (*), Residues conserved in all three viruses; (-), residues functionally conserved in all three viruses; (!), potential glycosylation site.]
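The glycosylation argument above combines two steps: locating potential N-linked sites and estimating how much they would raise the apparent Mr. A minimal Python sketch follows, assuming the standard N-X-S/T sequon rule (X not proline); the paper does not state how its sites were identified. The `ectodomain_start` cut-off is a hypothetical parameter: passing 56, just after the anchor at residues 35 to 55, would exclude a cytoplasmic site such as the one at residue 19. The 2000 to 3000 per chain range is taken from the text.

```python
import re
from typing import List, Tuple

# N-X-S/T with X not proline; the zero-width lookahead also reports overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str, ectodomain_start: int = 1) -> List[int]:
    """1-based positions of the N of each N-X-S/T sequon at or after ectodomain_start."""
    seq = protein.upper()
    return [m.start() + 1 for m in SEQUON.finditer(seq)
            if m.start() + 1 >= ectodomain_start]


def apparent_mr_range(calculated_mr: float, chains: int,
                      per_chain: Tuple[float, float] = (2000, 3000)) -> Tuple[float, float]:
    """Expected SDS-PAGE Mr range if each N-linked chain adds per_chain[0] to per_chain[1]."""
    low, high = per_chain
    return calculated_mr + chains * low, calculated_mr + chains * high
```

With the calculated Mr of 67996 and three chains, `apparent_mr_range(67996, 3)` gives `(73996, 76996)`, a range that includes the 76K SDS-PAGE estimate.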
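The class II assignment rests on finding a single N-terminal hydrophobic segment long enough to span the membrane. One common way to find such segments is a Kyte-Doolittle hydropathy scan. The Python sketch below uses the window of 19 residues and the mean-hydropathy threshold of 1.6 that Kyte and Doolittle suggested for membrane-spanning segments. This is an assumption: the paper does not say how the domain at residues 35 to 55 was identified, and the function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> List[Tuple[int, float]]:
    """Mean hydropathy of each window, keyed by its 1-based first residue.

    Residues missing from the table (e.g. X) count as 0.
    """
    seq = protein.upper()
    return [
        (i + 1, sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent above-threshold windows into (first, last) spans."""
    spans: List[Tuple[int, int]] = []
    for start, score in hydropathy_profile(protein, window):
        if score < threshold:
            continue
        end = start + window - 1
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)  # extend the current span
        else:
            spans.append((start, end))
    return spans
```

Because every residue of a passing window is included, reported spans are somewhat wider than the hydrophobic core. For a toy sequence of 10 Asp, 20 Leu and 10 Asp, the Leu block occupies residues 11 to 30, but `hydrophobic_segments` returns `[(6, 35)]`.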
Apart from the single-residue gaps, two alterations in the alignment were more substantial: a gap of three residues at positions 246 to 248 of the MV and RPV sequences, and an eight-residue gap at the C terminus, which explains the larger size of the MV H primary sequence. The CDV H protein sequence shows an overall identity of 36% with the H protein sequences of RPV and MV, whereas the identity between the latter two is 60%. Substantial stretches of three-way matched residues are scattered throughout the sequence, notably between residues 87 and 109 and between residues 519 and 535. Conservative replacements are also abundant (Fig. 3). The areas of perfect identity probably play an important role in the structure and function of the H glycoprotein. Additionally, all the cysteine residues in the CDV sequence are matched in the sequences of both MV and RPV. However, one of the cysteine residues matched between MV and RPV (at position 583) is absent from the CDV sequence. Of the 32 proline residues present in the CDV H sequence, 21 are present in identical positions in all three sequences. It is thus obvious that the morbillivirus H proteins have a similar conformation. However, when one compares the levels of identity observed in the H proteins with those observed between the other viral proteins of CDV and MV (N, 66%; P, 44%; C, 44%; M, 76%; F, 66%), it appears that considerable sequence divergence exists in the H proteins. It is interesting to note that the divergence is not as extreme at the nucleotide level. These data also reinforce the existing view (based on the sequences of the F genes of these three viruses) that MV is more closely related to RPV than to CDV. This divergence is further amplified when one considers the potential glycosylation sites. Three of the five potential glycosylation sites in the MV sequence occur in identical positions in the RPV sequence, whereas the three potential glycosylation sites of CDV (Fig. 3) occur in positions different from those of MV and RPV.
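The identity figures and the conservation counts (the '*' line of Fig. 3, and the 21 of 32 prolines at identical positions) can be computed from an alignment as sketched below in Python. The paper does not say how identity was calculated, and conventions differ in the denominator; this sketch assumes identity over columns in which neither sequence has a gap. The function names and toy sequences are hypothetical.

```python
from typing import Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conservation_line(*aligned: str, gap: str = "-") -> str:
    """'*' where every sequence has the same (non-gap) residue, ' ' elsewhere."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*(s.upper() for s in aligned)):
        conserved = column[0] != gap and all(c == column[0] for c in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


def conserved_occurrences(residue: str, reference: str, *others: str) -> Tuple[int, int]:
    """(occurrences of `residue` in the reference, how many sit in fully conserved columns)."""
    line = conservation_line(reference, *others)
    ref, residue = reference.upper(), residue.upper()
    total = ref.count(residue)
    kept = sum(1 for r, mark in zip(ref, line) if r == residue and mark == "*")
    return total, kept
```

Applied to the aligned CDV, MV and RPV H sequences, `conserved_occurrences("P", cdv, mv, rpv)` would perform the proline count reported above. On toy input, `conserved_occurrences("P", "PAPG", "PAPG", "PSAG")` returns `(2, 1)`.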
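The statements that MV is closer to RPV and that CDV is equidistant from both can be checked on the reported identities with the first joining step of UPGMA for three taxa: join the most similar pair, then compare the distances from the remaining taxon to each member of that pair. The Python sketch below uses uncorrected p-distances (1 minus fractional identity), which ignore multiple substitutions; a full phylogenetic analysis would compute distances from the alignments and apply a substitution model. The function name is hypothetical; the identity values are those given in the text.

```python
from typing import Dict, FrozenSet, Tuple


def first_join(identity: Dict[FrozenSet[str], float]) -> Tuple[tuple, str, Dict[str, float]]:
    """For exactly three taxa: the most similar pair, the remaining taxon, and its
    uncorrected p-distances (1 - identity/100) to each member of that pair."""
    taxa = set().union(*identity)
    if len(taxa) != 3 or len(identity) != 3:
        raise ValueError("this sketch handles exactly three taxa")
    pair = max(identity, key=identity.get)  # most similar pair is joined first
    (other,) = taxa - pair
    distances = {m: round(1 - identity[frozenset((other, m))] / 100, 4) for m in sorted(pair)}
    return tuple(sorted(pair)), other, distances


# Percent identities of the H proteins and H genes, as reported in the text.
PROTEIN_IDENTITY = {
    frozenset(("CDV", "MV")): 36,
    frozenset(("CDV", "RPV")): 36,
    frozenset(("MV", "RPV")): 60,
}
NUCLEOTIDE_IDENTITY = {
    frozenset(("CDV", "MV")): 53,
    frozenset(("CDV", "RPV")): 52,
    frozenset(("MV", "RPV")): 64,
}
```

For the protein data, `first_join(PROTEIN_IDENTITY)` joins MV and RPV and places CDV at 0.64 from each. For the nucleotide data, the distances are 0.47 (MV) and 0.48 (RPV).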
These differences in glycosylation sites, together with the overall sequence divergence, may well result in significant changes in the tertiary structure of the CDV H protein with respect to MV and RPV, and therefore in its antigenic determinants.

Although immunological and sequence data have clearly shown that the H protein is the most variable of the morbillivirus proteins, the level of variability observed with respect to MV and RPV is considerably higher than anticipated, especially from the immunological studies carried out to date (Norrby et al., 1985; Sheshberadaran et al., 1986). A more comprehensive comparison between all the morbillivirus H proteins will throw some light on the significance of this high degree of variability between CDV and MV/RPV. This may reflect the wide host range of CDV.

In summary, determination of the H gene sequence of CDV has indicated that there is a low level of identity between CDV and the other two morbilliviruses. The data on both the H and F proteins indicate that CDV is equidistant from MV and RPV; they therefore do not provide evidence for or against the suggestion that RPV is the archetypal virus in this group (Norrby et al., 1985). It will be interesting to compare the sequence of the H glycoprotein of PDV with the one reported here, and work is in progress to complete the sequence of PDV. The availability of the primary sequence of the H protein of CDV may now allow studies to determine its antigenic determinants.

We thank the Department of Education for Northern Ireland for studentship support to D.K.C., and the U.K. Medical Research Council for support under grant number 8604630CA.

References

ALKHATIB, G. & BRIEDIS, D. J. (1986). The predicted structure of the measles virus hemagglutinin. Virology 150, 479-490.
BARRETT, T., SHRIMPTON, S. B. & RUSSELL, S. E. H. (1985). Nucleotide sequence of the entire protein coding region of canine distemper virus polymerase-associated (P) protein mRNA. Virus Research 3, 367-372.
BARRETT, T., CLARKE, D. K., EVANS, S. A. & RIMA, B. K. (1987). The nucleotide sequence of the gene encoding the F protein of canine distemper virus: a comparison of the deduced amino acid sequence with other paramyxoviruses. Virus Research 8, 373-386.
BARRETT, T., SUBBARAO, S. M., BELSHAM, G. J. & MAHY, B. W. J. (1991). The molecular biology of the morbilliviruses. In The Paramyxoviruses, pp. 83-102. Edited by D. W. Kingsbury. New York & London: Plenum Press.
BELLINI, W. J., ENGLUND, G., RICHARDSON, C. D., ROZENBLATT, S. & LAZZARINI, R. A. (1986). Matrix genes of measles virus and canine distemper virus: cloning, nucleotide sequences and deduced amino acid sequences. Journal of Virology 58, 408-416.
CATTANEO, R., REBMANN, G., SCHMID, A., BACZKO, K., TER MEULEN, V. & BILLETER, M. A. (1987). Altered transcription of a defective measles virus genome derived from a diseased human brain. EMBO Journal 6, 681-688.
CATTANEO, R., SCHMID, A., SPIELHOFER, P., KAELIN, K., BACZKO, K., TER MEULEN, V., PARDOWITZ, J., FLANAGAN, S., RIMA, B. K., UDEM, S. A. & BILLETER, M. A. (1989). Mutated and hypermutated genes of persistent measles viruses which caused lethal human brain diseases. Virology 173, 415-425.
GERALD, C., BUCKLAND, R., BARKER, R., FREEMAN, G. & WILD, T. F. (1986). Measles virus haemagglutinin gene: cloning, complete nucleotide sequence analysis and expression in COS cells. Journal of General Virology 67, 2695-2703.
GUPTA, K. C. & KINGSBURY, D. W. (1984). Complete sequences of the intergenic and mRNA start signals in the Sendai virus genome: homologies with the genome of VSV. Nucleic Acids Research 12, 3829-3841.
HORISBERGER, M. A., DESTRITZ, C. & CONTENT, J. (1980). Intracellular glycosylation of influenza hemagglutinin: the effect of glucosamine. Archives of Virology 64, 9-16.
KEIL, W., KLENK, H. D. & SCHWARZ, R. T. (1979). Carbohydrates of influenza virus. III. Nature of oligosaccharide-protein linkage in viral glycoproteins. Journal of Virology 31, 253-256.
KOZAK, M. (1986). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.
MORRISON, T. G. (1988). Structure, function and intracellular processing of paramyxovirus membrane proteins. Virus Research 10, 113-136.
NORRBY, E., SHESHBERADARAN, H., MCCULLOUGH, K. C., CARPENTER, W. C. & ÖRVELL, C. (1985). Is rinderpest virus the archevirus of the morbillivirus genus? Intervirology 23, 228-232.
RIMA, B. K. (1983). The proteins of morbilliviruses. Journal of General Virology 64, 1205-1219.
RIMA, B. K., BACZKO, K., CLARKE, D. K., CURRAN, M. D., MARTIN, S. J., BILLETER, M. A. & TER MEULEN, V. (1986). Characterization of clones for the sixth (L) gene and a transcriptional map for morbilliviruses. Journal of General Virology 67, 1971-1978.
ROZENBLATT, S., EIZENBERG, O., BEN-LEVY, R., LAVIE, V. & BELLINI, W. J. (1985). Sequence homology within morbilliviruses. Journal of Virology 53, 684-690.
RUSSELL, S. E. H., CLARKE, D. K., HOEY, E. M., RIMA, B. K. & MARTIN, S. J. (1985). cDNA cloning of the messenger RNAs of five genes of canine distemper virus. Journal of General Virology 66, 433-441.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SHESHBERADARAN, H., NORRBY, E., MCCULLOUGH, K. C., CARPENTER, W. C. & ÖRVELL, C. (1986). The antigenic relationship between measles, canine distemper and rinderpest viruses studied with monoclonal antibodies. Journal of General Virology 67, 1381-1392.
SPRIGGS, M. K. & COLLINS, P. L. (1986). Human parainfluenza virus type 3: mRNAs, polypeptide coding assignments, intergenic sequences and genetic map. Journal of Virology 59, 649-654.
TSUKIYAMA, K., SUGIYAMA, M., YOSHIKAWA, Y. & YAMANOUCHI, K. (1987). Molecular cloning and sequence analysis of the rinderpest virus mRNA encoding the hemagglutinin protein. Virology 160, 48-54.
YAMANAKA, M., HSU, D., CRISP, T., DALE, B., GRUBMAN, M. & YILMA, T. (1988). Cloning and sequence analysis of the hemagglutinin gene of the virulent strain of rinderpest virus. Virology 166, 251-253.

(Received 3 August 1990; Accepted 6 November 1990)