Gene Duplication Is Infrequent in the Recent Evolutionary History of RNA Viruses

Etienne Simon-Loriere1,2 and Edward C. Holmes*,3,4
1 Institut Pasteur, Unité de Génétique Fonctionnelle des Maladies Infectieuses, Paris, France
2 Centre National de la Recherche Scientifique, URA CNRS3012, Paris, France
3 Sydney Emerging Infections and Biosecurity Institute, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
4 Fogarty International Center, National Institutes of Health, Bethesda, Maryland
*Corresponding author: E-mail: [email protected].
Associate editor: James McInerney

Mol. Biol. Evol. 30(6):1263–1269 doi:10.1093/molbev/mst044 Advance Access publication March 13, 2013
© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Abstract
Gene duplication generates genetic novelty and redundancy and is a major mechanism of evolutionary change in bacteria and eukaryotes. To date, however, gene duplication has been reported only rarely in RNA viruses. Using a conservative BLAST approach we systematically screened for the presence of duplicated (i.e., paralogous) proteins in all RNA viruses for which full genome sequences are publicly available. Strikingly, we found only nine significantly supported cases of gene duplication, two of which are newly described here: in the 25 and 26 kDa proteins of Beet necrotic yellow vein virus (genus Benyvirus) and in the U1 and U2 proteins of Wongabel virus (family Rhabdoviridae). Hence, gene duplication has occurred at a far lower frequency in the recent evolutionary history of RNA viruses than in other organisms. Although the rapidity of RNA virus evolution means that older gene duplication events will be difficult to detect through sequence-based analyses alone, it is likely that specific features of RNA virus biology, and particularly intrinsic constraints on genome size, reduce the likelihood of the fixation and maintenance of duplicated genes.

Key words: RNA virus, gene duplication, genome size, genetic redundancy.

Introduction
Gene duplication is central to the development of organismal complexity. It provides important evolutionary opportunities through the creation of new genetic material (Ohta 1989; Zhang 2003; Hurles 2004; Innan and Kondrashov 2010) and has been linked to many aspects of genome evolution (Wagner et al. 2007) and species diversification (Zhang et al. 2002; Zhang 2003). As gene duplication is a potent way to create new biological function, it is not surprising that it occurs frequently in many organisms and sometimes as duplications of complete genomes (Meyer and Schartl 1999; Soltis and Soltis 1999). In many species, particularly large eukaryotes, gene duplication also leads to genetic redundancy, such that many paralogous gene copies have no apparent function. Surveys of gene duplication in representative genomes from different domains of life indicate that paralogous genes (which may form multigene families) are a common occurrence, representing as much as 40–65% of the total number of genes (Zhang 2003). Several evolutionary models have been developed to explain how duplicated genes can be fixed and maintained in genomes (Innan and Kondrashov 2010). Mechanistically, gene duplication can result from a variety of processes, including unequal crossing over and retroposition, all of which involve some form of recombination.

Despite the evolutionary importance of gene duplication, far less is known about this process in viruses, particularly those with RNA genomes. To date, gene duplication has been described relatively frequently in large DNA viruses, in which multigene families are a relatively common occurrence (Shackelton and Holmes 2004).
Among the many examples are the numerous multigene families in African swine fever virus (de la Vega et al. 1990), the multiple cases of duplication in the E4 region of mastadenoviruses (Davison et al. 2003), and the terminal inverted repeats in myxoma viruses (Labudovic et al. 2004). Similarly, gene duplication has been reported in a number of small DNA viruses, such as the Papillomaviridae (Cole and Danos 1987) and Parvoviridae (Hoelzer et al. 2008).

In contrast, gene duplication has to date been documented only rarely in RNA viruses, as reflected in their marked lack of multigene families (and genetic redundancy) compared with other organisms (Holmes 2009). In particular, there are few reported cases in which gene duplication has resulted in two complete open reading frames within a viral genome, which may be tandemly repeated (Forss and Schaller 1982; Tristem et al. 1990; Boyko et al. 1992; Walker et al. 1992; Wang and Walker 1993; Karasev et al. 1995; LaPierre et al. 1999; Peng et al. 2001; Valli et al. 2007; Walker et al. 2011; and see Results). Other duplication events in RNA viruses involve short sequence duplications in untranslated regions (Panavas et al. 2003; Gritsun and Gould 2006) and short intragenic regions (Nagai et al. 2003; Zlateva et al. 2007; Cao et al. 2008). Such a low frequency of gene duplication is especially striking given that endogenous retroviruses have been associated with gene duplication events in their hosts (Hughes and Coffin 2001), suggesting that gene duplication is mechanistically possible in RNA viruses.

Although it is likely that some gene duplication events in RNA viruses will be difficult, if not impossible, to recover through gene sequence analysis because of their high levels of divergence, it may also be that these organisms experience intrinsic evolutionary constraints against gene duplication (Holmes 2009).
In particular, it has been suggested that there is a cap on the maximum size that can be attained by an RNA virus genome, which is set by their extremely high mutation rates (approximately one mutation per genome replication). Accordingly, both gene duplication and lateral gene transfer are expected to be rare in RNA viruses, as any increase in genome size is likely to increase the burden of deleterious mutations and hence reduce fitness (Holmes 2003, 2009). An increase in viral genome size would also result in longer replication times, which could be selectively disadvantageous, and constraints associated with unwinding long regions of dsRNA may similarly limit genome size (Reanney 1982), as could those imposed by limits to capsid size and shape. Finally, it may be that RNA viruses are better able to create evolutionary novelty through a combination of frequent mutation and large population sizes. Indeed, a similar rationale has been invoked to explain the low rates of recombination observed in many RNA viruses (Simon-Loriere and Holmes 2011).

To better understand the causes and consequences of gene duplication, as well as the determinants of this process, it is essential to assess the frequency with which it occurs in RNA viruses. To this end we performed a comprehensive survey of the occurrence of gene duplication in all publicly available families of RNA viruses.

Results
We employed a BLAST approach to analyze gene duplication events in 1198 virus species, comprising 774 single-strand (ss), positive-sense RNA viruses, 155 single-strand, negative-sense RNA viruses, 119 reverse-transcribing viruses, and 150 double-strand (ds) RNA viruses. Despite the size of the data set analyzed, we detected only nine statistically supported cases (i.e., at a protein BLAST e-value of <10⁻⁵) of gene duplication, although a number of other viral genes exhibited nearly significant matches.
In addition, all but one of these duplicate genes are located adjacent to each other in the viral genome. Hence, it is clear that gene duplication is a rare and highly sporadic event in recent RNA virus evolution. Table 1 summarizes the species, proteins, and BLAST e-values obtained. The rarity of this process precluded any meaningful comparison of the relative frequency of gene duplication by taxonomic group, although some viral families such as the Rhabdoviridae (Walker et al. 2011) contain multiple occurrences, while no duplicate genes were observed in the dsRNA viruses analyzed. We now describe each case of gene duplication in turn.

Single-Strand, Positive-Sense RNA Viruses
We describe, for the first time, a potential gene duplication event in Beet necrotic yellow vein virus (BNYVV; genus Benyvirus). Notably, some BNYVV isolates contain five instead of four RNA segments, and our analysis indicates that the 26 kDa protein encoded by the fifth RNA segment exhibits strong sequence similarity to the 25 kDa protein encoded by the third segment (e-value: 4 × 10⁻¹⁰, 22% sequence identity, and a 43% positive match in a 217 amino acid region) (fig. 1). Because these are multicomponent viruses it is likely that this particular case of gene duplication occurred through a form of segmental reassortment, rather than intrasegment recombination. Indeed, we suggest that the transmission of an additional copy of segment 3, from the same or a homologous virus, and the subsequent functional differentiation from the original p25 protein (or vice versa) produced this particular genomic organization.

We also found several cases of duplication of the coat protein (CP) in the Closteroviridae, a family of plant RNA viruses that possess genomes up to 20 kb in length, and which form flexuous, filamentous virions. Specifically, all members of the Closteroviridae possess a minor coat protein (CPm) that is located adjacent to the CP in a 5′ or 3′ location.
This duplication event was previously described in two members of the genus Closterovirus: Beet yellow virus (BYV) and Citrus tristeza virus (Boyko et al. 1992). The signal for gene duplication (i.e., gene paralogy) was found to be statistically significant in 9 of the 33 species of Closteroviridae studied here, scattered among the three genera (Ampelovirus, Closterovirus, Crinivirus), with e-values ranging from 1 × 10⁻⁶ to 6 × 10⁻²⁴ (Table 1). In addition, another homolog of CP (CPh) is present in all criniviruses and, based on the identification of two conserved Arg and Asp residues, the C-terminal domain of a 64 kDa protein expressed by the closteroviruses has been shown to be homologous to CP in BYV (Napuli et al. 2003). The corresponding protein in the Ampelovirus genome (55 kDa protein) exhibits some sequence similarity to the 64 kDa protein of the closteroviruses and hence is also likely to be a distant homolog of CP. However, this putative gene duplication event was (marginally) not significant in our analysis, with the highest value found for the Closterovirus mint virus 1 (e-value: 9 × 10⁻⁴, 27% identity, and a 42% positive match in an 81 amino acid region). Finally, and uniquely among the Closteroviridae, Grapevine leafroll-associated virus 1 possesses two copies of the CPm protein (Fazeli and Rezaian 2000), which also exhibit strong sequence similarity (e-value: 7 × 10⁻²², 38% identity, and a 58% positive match in a 125 amino acid region). The presence of multiple homologs of the CP among all species of the Closteroviridae is suggestive of an ancient duplication event, or series of events, that occurred prior to the diversification into the three current viral genera. Interestingly, this viral family possesses an unusual range of different genome lengths and organizations, including mono-, bi-, and tripartite genomes (Dolja et al. 2006), again indicative of a history of major genomic events.

Table 1.
Duplicated Genes in RNA Virus Genomes.

| Genome Organization | Family | Genus | Virus | Gene Duplicated | Position | BLASTP e-value | Publication |
|---|---|---|---|---|---|---|---|
| (+)ssRNA | Closteroviridae | Closterovirus | CTV | CP → CPm | Adjacent | 2 × 10⁻¹³ | Boyko et al. (1992) |
| (+)ssRNA | Closteroviridae | Closterovirus | BYV | CP → CPm | Adjacent | 3 × 10⁻¹⁰ | Boyko et al. (1992) |
| (+)ssRNA | Closteroviridae | Closterovirus | SCF-AV | CP → CPm | Adjacent | 1 × 10⁻⁶ | Tzanetakis et al. (2007) |
| (+)ssRNA | Closteroviridae | Closterovirus | Mint virus 1 | CP → CPm | Adjacent | 2 × 10⁻⁹ | Tzanetakis et al. (2005) |
| (+)ssRNA | Closteroviridae | Crinivirus | SPCSV | CP → CPm | Adjacent | 4 × 10⁻⁶ | Kreuze et al. (2002) |
| (+)ssRNA | Closteroviridae | Ampelovirus | LCV 2 | CP → CPm | Adjacent | 6 × 10⁻²⁴ | This study |
| (+)ssRNA | Closteroviridae | Ampelovirus | GLRaV 3 | CP → CPm | Adjacent | 1 × 10⁻¹⁰ | This study |
| (+)ssRNA | Closteroviridae | Ampelovirus | GLRaV 12 | CP → CPm | Adjacent | 5 × 10⁻¹⁰ | This study |
| (+)ssRNA | Closteroviridae | Ampelovirus | GLRaV 1 | CP → CPm | Adjacent | 7 × 10⁻²² | Fazeli et al. (2000) |
| (+)ssRNA | Closteroviridae | Ampelovirus | GLRaV 1 | CPm1 → CPm2 | Adjacent | 1 × 10⁻²³ | Fazeli et al. (2000) |
| (+)ssRNA | Picornaviridae | Aphthovirus | FMDV | VPg → VPg | Adjacent | 2 × 10⁻⁶ | Forss and Schaller (1982) |
| (+)ssRNA | NA | Benyvirus | BNYVV | p25 → p26 | On different segments | 4 × 10⁻¹⁰ | This study |
| (−)ssRNA | Rhabdoviridae | Ephemerovirus | BEFV | G → GNS | Adjacent | 1 × 10⁻⁷ | Walker et al. (1992) |
| (−)ssRNA | Rhabdoviridae | Ephemerovirus | Kotonkan virus | G → GNS | Adjacent | 4 × 10⁻⁹ | Blasdell et al. (2012) |
| (−)ssRNA | Rhabdoviridae | Unassigned | Ngaingan virus | G → GNS | Adjacent | 9 × 10⁻²⁵ | Gubala et al. (2010) |
| (−)ssRNA | Rhabdoviridae | Unassigned | Wongabel virus | U1 → U2 | Adjacent | 8 × 10⁻⁹ | This study |
| Reverse-transcribing viruses | Retroviridae | Epsilonretrovirus | WEHV-2 | orf A → orf B | Adjacent | 8 × 10⁻⁹ | LaPierre et al. (1999) |
| Reverse-transcribing viruses | Retroviridae | Unclassified | Xen1 | orf 1 → orf 2 | Adjacent | 2 × 10⁻²² | Kambol et al. (2003) |
| Reverse-transcribing viruses | Retroviridae | Lentivirus | HIV-2 | vpr → vpx | Adjacent | 9 × 10⁻⁶ | Tristem et al. (1990) |
| Reverse-transcribing viruses | Retroviridae | Lentivirus | SIV-MND-2 | vpr → vpx | Adjacent | 2 × 10⁻⁹ | Tristem et al. (1990) |

Note: (+)ssRNA, single-strand, positive-sense RNA viruses; (−)ssRNA, single-strand, negative-sense RNA viruses; NA, not assigned; CTV, Citrus tristeza virus; BYV, Beet yellow virus; SCF-AV, Strawberry chlorotic fleck-associated virus; SPCSV, Sweet potato chlorotic stunt virus; LCV 2, Little cherry virus 2; GLRaV 3, Grapevine leafroll-associated virus 3; GLRaV 12, Grapevine leafroll-associated virus 12; GLRaV 1, Grapevine leafroll-associated virus 1; FMDV, Foot-and-mouth disease virus; BNYVV, Beet necrotic yellow vein virus; BEFV, Bovine ephemeral fever virus; WEHV-2, Walleye epidermal hyperplasia virus type 2; Xen1, Xenopus laevis endogenous retrovirus Xen1; HIV-2, Human immunodeficiency virus 2; SIV-MND-2, Simian immunodeficiency virus - mnd 2.

[Figure 1 not reproduced: panel A shows the five BNYVV genome segments (RNA-1, 6746 nt; RNA-2, 4612 nt; RNA-3, 1774 nt; RNA-4, 1431 nt; RNA-5, 1320 nt), with the 25 kDa protein (219 amino acids) on RNA-3 and the 26 kDa protein (232 amino acids) on RNA-5; panel B shows the alignment of residues 7–217 of the 25 kDa protein with residues 13–227 of the 26 kDa protein.]

FIG. 1. (A) Genome organization of Beet necrotic yellow vein virus (BNYVV). The segments involved in the potential gene duplication event are highlighted in black. (B) Amino acid alignment of the region of homology between the proteins encoded by segments 3 and 5. Identical amino acids are indicated between the sequences. + indicates similar residues.
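The e-values reported in Table 1 and throughout the Results depend on alignment length as well as on similarity, which is why short matches can fail the cutoff even at high identity. As a rough illustration only, the standard relation between a BLAST bit score S′ and the expected number of chance hits is E = m · n · 2^(−S′), where m and n are the query and database lengths. The sketch below applies that relation; the per-residue score and the two lengths are hypothetical values chosen for illustration, not quantities taken from this study, and real BLAST uses effective (edge-corrected) lengths.

```python
def expected_chance_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits, E = m * n * 2**(-S'), for bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical illustration values (assumptions, not data from the study).
BITS_PER_ALIGNED_RESIDUE = 0.25  # assumed average score contributed per residue
QUERY_LEN = 200                  # assumed query protein length (m)
DB_LEN = 3000                    # assumed total length of the other proteins (n)

for aligned_len in (18, 90, 217):
    bits = BITS_PER_ALIGNED_RESIDUE * aligned_len
    e_value = expected_chance_hits(bits, QUERY_LEN, DB_LEN)
    print(f"{aligned_len:>3} aligned residues -> E = {e_value:.3g}")
```

Under these assumptions the e-value falls exponentially as the aligned region lengthens, so a short region needs a much higher per-residue score to reach the same significance.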
The Picornaviridae encode 3BVPg, a protein covalently attached to the 5′ end of all virion RNAs. Uniquely among viruses of this family, Foot-and-mouth disease virus (FMDV) encodes three sequential paralogous genes for 3BVPg (King et al. 1980; Forss and Schaller 1982). Our analysis detected the second 3BVPg copy as a duplicate (e-value: 2 × 10⁻⁶, 75% identity, and a 95% positive match in a 20 amino acid region), but failed to detect the third copy at a significant value (e-value: 0.02, 56% identity, and 78% positive match in an 18 amino acid region), likely due to its very short length and greater divergence. As there is no known specialized function for the supplementary copies of VPg, it has been suggested that having multiple copies of VPg is advantageous because it results in increased protein synthesis (Forss and Schaller 1982; Falk et al. 1992). While other gene duplication events have been proposed to have occurred during the evolution of the Picornaviridae, these were not detected as significant in our analysis. For example, the general correspondence in protein structures between the two proteases of enteroviruses, 2Apro and 3Cpro, has led to the idea that they are duplicate copies (Palmenberg et al. 2010; see Discussion).

Single-Strand, Negative-Sense RNA Viruses
Among members of the genus Ephemerovirus (family Rhabdoviridae), we detected a signal of gene duplication in the genomes of both Bovine ephemeral fever virus (BEFV) (e-value: 1 × 10⁻⁷, 23% identity, and a 38% positive match in a 389 amino acid region) and Kotonkan virus (e-value: 4 × 10⁻⁹, 23% identity, and a 39% positive match in a 324 amino acid region), with the presence of two consecutive and related glycoproteins, G and GNS (Walker et al. 1992; Blasdell et al. 2012). The related Adelaide river and Obodhiang viruses also possess a second glycoprotein, likewise inserted between G and L (Wang and Walker 1993; Blasdell et al. 2012), which suggests that the duplication event could have occurred in the common ancestor of these viruses. While our analysis failed to detect a significant sequence similarity in these viruses, reflecting a greater divergence between their glycoproteins, we found very strong sequence similarity between G and GNS of the (unclassified) rhabdovirus Ngaingan virus (e-value: 9 × 10⁻²⁵, 21% identity, and a 39% positive match in a 396 amino acid region).

Finally, also in the Rhabdoviridae, we describe a gene duplication event in Wongabel virus (WONV; unassigned, although a member of the Hart Park group) (Gubala et al. 2010). Specifically, there is a significant signal for paralogy (e-value: 8 × 10⁻⁹, 26% identity, and a 45% positive match in a 145 amino acid region) between the U1 and U2 proteins, both of unknown function (fig. 2). This observation supports previous suggestions of gene duplication events in this region (Walker et al. 2011). The presence of additional genes between the P and M genes in the WONV genome is a feature of several plant-infecting members of the rhabdovirus genera Cytorhabdovirus and Nucleorhabdovirus (Tanno et al. 2000; Revill et al. 2005; Dietzgen et al. 2006). These genus-specific sets of additional genes, as well as insertion events, suggest that there have been major genomic rearrangements during the evolutionary history of the Rhabdoviridae. In addition, the fact that other members of the viral order Mononegavirales similarly contain additional genes in different, genus-specific positions suggests that these rearrangements may have occurred commonly in these viruses.

Reverse-Transcribing Viruses
Our analysis detected several duplication events in the Retroviridae, all of which have been described previously.
The oncogenic Walleye epidermal hyperplasia virus (WEHV) contains two tandemly linked accessory genes, orfA and orfB, which share some sequence similarity with each other and with human cyclin D1 (LaPierre et al. 1999). This led to the suggestion that these two genes arose by gene duplication following capture of a cellular cyclin (LaPierre et al. 1999). Our analysis marginally failed to validate a significant sequence similarity for WEHV-1 (e-value: 7 × 10⁻⁴, 26% identity, and a 45% positive match in a 90 amino acid region), likely due to the short length of the region involved. However, further analysis of the WEHV-2 genome revealed a significant signal for a potential gene duplication event (e-value: 8 × 10⁻⁹, 25% identity, and a 43% positive match in a 197 amino acid region) (Table 1). Similarly, we found a very strong signal of sequence similarity (e-value: 2 × 10⁻²², 42% identity, and a 50% positive match in a 113 amino acid region) for tandemly repeated proteins of Xenopus laevis endogenous retrovirus Xen1 as described previously (Kambol et al. 2003) (Table 1). Interestingly, the mechanisms proposed for the acquisition of cellular genetic material rely, as for gene duplication, on RNA recombination.

[Figure 2 not reproduced: panel A shows the 13196 nt WONV genome, comprising the N, P, M, G, and L genes and the additional U1 to U5 genes; panel B shows the alignment of U1 residues 41–175 with U2 residues 47–181.]

FIG. 2. (A) Genome organization of Wongabel virus (WONV). The putative duplicated genes are highlighted in black. (B) Amino acid alignment of the region of significant sequence similarity between the proteins U1 (179 amino acids) and U2 (192 amino acids).
Identical amino acids are indicated between the sequences. + represents similar amino acid residues.

Another well-documented case of gene duplication, which we also observed here, was in a subset of the primate lentiviruses, notably Human immunodeficiency virus type 2 (HIV-2) and the related simian immunodeficiency viruses (SIV). All these viruses possess a viral protein R (vpr) in addition to the viral protein X (vpx) present in all lentiviruses (Tristem et al. 1992), which were detected as duplicate copies in both HIV-2 (e-value: 3 × 10⁻⁵⁸, 73% identity, and an 81% positive match in a 90 amino acid region) and some SIVs (e-value: 9 × 10⁻⁶, 29% identity, and a 44% positive match in a 91 amino acid matching region). These small accessory proteins accumulate in the nucleus of infected cells and appear to share similar functions (Fujita et al. 2010). Hence, vpr and vpx might have arisen by gene duplication, although this could also represent a horizontal gene transfer of vpr from an SIV group (Sharp et al. 1996).

Discussion
Although RNA viruses exhibit a diverse array of genomic organizations and replication strategies and infect a huge array of hosts, the most striking result from our study is that gene duplication is extremely rare in their recent evolutionary history, with only sporadic cases in a survey of 1198 virus species and no cases detected in dsRNA viruses. Hence, gene duplication appears to occur far less frequently in RNA viruses than it does in all other domains of life, including DNA viruses. This is an intriguing observation, as those cases of gene duplication documented in RNA viruses all seem to involve the action of some form of either homologous or non-homologous recombination, a process that can occur in any RNA virus and which is relatively frequent in some (Simon-Loriere and Holmes 2011). Hence, the very low rate of gene duplication in RNA viruses likely reflects strong selective constraints against increasing genome size (i.e., because larger genomes carry a greater mutational burden) rather than an absence of appropriate molecular mechanisms.

Mechanistically, gene duplication in RNA viruses could occur as the consequence of an upstream relocation, mid-replication, of the polymerase on a genomic template, in accord with the widely accepted "copy choice" model of RNA recombination (Lai 1992). However, this model posits that the reassociation of the polymerase on a template is guided by sequence homology with the nascent strand (Zhang and Temin 1994), which makes it highly unlikely that such an upstream relocation takes place. This is further supported by the markedly lower frequency of non-homologous than homologous recombination in RNA viruses (Lai 1992). However, the presence of homologous regions at both ends of a gene could favor such an event. This idea has been advanced to support the glycoprotein duplication in BEFV, where the flanking regions of both genes exhibit strong sequence similarity (McWilliam et al. 1997). While homologous recombination is a relatively rare event in negative-sense RNA viruses, likely due to the coating of the nucleic acids by a nucleoprotein that prevents homology guiding of the polymerase during a template switching event, the frequent generation of defective interfering particles demonstrates the propensity of the polymerases of this group of viruses to dissociate from their template (Simon-Loriere and Holmes 2011).

Also of importance is the possibility that duplicate genes are generated by recombination with genetic material from a related organism, in a process similar to lateral gene transfer, rather than gene duplication. Indeed, this idea is compatible with the observation that the extent of sequence similarity is sometimes greater between a duplicated gene and a homologous copy in a related species than between the duplicated copies. An illustrative example is provided by the picornaviruses Ljungan virus (LV) and Duck hepatitis virus (DHV) (Johansson et al. 2002; Tseng et al. 2007). LV and DHV harbor two and three tandemly repeated copies of the 2A gene, respectively, with the extra copies being more closely related to different viral relatives, all of which harbor only one copy of 2A. In particular, LV-2A1 and DHV-2A1 are more similar to the 2A proteins of cardio-, erbo-, tescho-, and aphthoviruses, while LV-2A2 and DHV-2A3 appear more closely related to the 2A protein of parechoviruses, kobuviruses, and Avian encephalomyelitis virus. However, it is equally likely that the viruses in question descended from a common ancestor where multiple copies of the capsid existed, which were lost later in the evolutionary history of other picornaviruses.

While we observed very few cases of detectable gene duplication, it is highly likely that this process played a more important role in the early diversification of viral genomes, such that protein sequence similarity has been sufficiently eroded to return nonsignificant e-values in protein BLAST analyses. Indeed, ancient gene duplication is likely to explain at least some of the variation in genome size and structure observed among RNA viruses. For example, the VP1, VP2 and VP3 proteins of picorna-like viruses share a remarkably similar three-dimensional structure, strongly suggesting that they descended from a common ancestral protein, even though there is no longer a significant signature for relatedness at the level of amino acid sequence (Rossmann and Reuckert 1987; Liljas et al. 2002). Accordingly, analyses of protein structure are likely to be the only viable way to determine the occurrence of ancient gene duplication events in RNA viruses.

Materials and Methods
The sequences of all complete viral reference genomes (as of March 2012) were retrieved from the National Center for Biotechnology Information website (http://www.ncbi.nlm.nih.gov/) (i.e., GenBank). This resulted in a data set of 1198 viral species, which are listed in the Supplementary Material online. For each viral species, the amino acid sequence of each individual protein was extracted and the sequence similarity to all other proteins of the same viral genome assessed using BLASTP (Altschul et al. 1997). Proteins were considered as homologous, and hence indicative of a duplication event, when the BLASTP search returned an e-value below an arbitrary cutoff of 10⁻⁵. Because a cutoff e-value of 10⁻⁵ is relatively stringent, such that we can safely exclude false-positive results, our focus is necessarily on those gene duplication events that have occurred in the relatively recent past and where there is still a phylogenetic signal for relatedness. Indeed, any BLAST-based analysis is necessarily a compromise between eliminating false positives and missing divergent, but true, matches. Although this approach necessarily means that we are not able to detect gene duplications that occurred early in the evolutionary history of viruses, for which no phylogenetic signal will remain, it still allows us to compare rates of gene duplication relative to those of processes like nucleotide substitution in the recent past.
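The screening step described in this section can be sketched as a simple filter over within-genome BLASTP hits. This is a minimal illustration and not the pipeline used in the study: it assumes the hits have already been exported in the standard 12-column BLAST tabular format (e-value in the eleventh column), and the function names and sample rows (protA, protB, protC) are hypothetical placeholders.

```python
E_VALUE_CUTOFF = 1e-5  # significance cutoff stated in the text


def parse_tabular(lines):
    """Parse 12-column BLAST tabular rows into (query, subject, e-value) tuples.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        hits.append((fields[0], fields[1], float(fields[10])))
    return hits


def candidate_paralogs(hits, cutoff=E_VALUE_CUTOFF):
    """Return {(protein_a, protein_b): best e-value} for pairs below the cutoff.

    Self-hits are ignored, and reciprocal hits (A vs B, B vs A) are merged
    into one unordered pair that keeps the smaller e-value.
    """
    best = {}
    for query, subject, evalue in hits:
        if query == subject or evalue >= cutoff:
            continue
        pair = tuple(sorted((query, subject)))
        if pair not in best or evalue < best[pair]:
            best[pair] = evalue
    return best


# Hypothetical rows for one genome; all values are made up for illustration.
sample_rows = [
    "protA\tprotA\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-100\t400",
    "protA\tprotB\t25.0\t150\t100\t5\t1\t150\t3\t152\t2e-09\t55.0",
    "protB\tprotA\t25.0\t150\t100\t5\t3\t152\t1\t150\t3e-09\t54.0",
    "protA\tprotC\t30.0\t40\t28\t0\t1\t40\t1\t40\t0.02\t20.0",
]
pairs = candidate_paralogs(parse_tabular(sample_rows))
```

With these placeholder rows only the protA/protB pair passes the filter; the protA/protC hit is discarded as nonsignificant, much as the third VPg copy of FMDV was in the Results.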
In addition, this methodology did not allow us to obtain information on potential intra-protein domain duplications, nor on those occurring in non-coding genomic regions.

Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments
E.C.H. acknowledges support from an NHMRC Australia Fellowship.

References
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. Blasdell KR, Voysey R, Bulach D, Joubert DA, Tesh RB, Boyle DB, Walker PJ. 2012. Kotonkan and Obodhiang viruses: African ephemeroviruses with large and complex genomes. Virology 425:143–153. Boyko VP, Karasev AV, Agranovsky AA, Koonin EV, Dolja VV. 1992. Coat protein gene duplication in a filamentous RNA virus of plants. Proc Natl Acad Sci U S A. 89:9156–9160. Cao D, Barro M, Hoshino Y. 2008. Porcine rotavirus bearing an aberrant gene stemming from an intergenic recombination of the NSP2 and NSP5 genes is defective and interfering. J Virol. 82:6073–6077. Cole ST, Danos O. 1987. Nucleotide sequence and comparative analysis of the human papillomavirus type 18 genome. Phylogeny of papillomaviruses and repeated structure of the E6 and E7 gene products. J Mol Biol. 193:599–608. Davison AJ, Benko M, Harrach B. 2003. Genetic content and evolution of adenoviruses. J Gen Virol. 84:2895–2908. de la Vega I, Viñuela E, Blasco E. 1990. Genetic variation and multigene families in African swine fever virus. Virology 179:234–246. Dietzgen RG, Callaghan B, Wetzel T, Dale JL. 2006. Completion of the genome sequence of Lettuce necrotic yellows virus, type species of the genus Cytorhabdovirus. Virus Res. 118:16–22. Dolja VV, Kreuze JF, Valkonen JP. 2006. Comparative and functional genomics of closteroviruses. Virus Res. 117:38–51. Falk MM, Sobrino F, Beck E. 1992.
VPg gene amplification correlates with infective particle formation in foot-and-mouth disease virus. J Virol. 66:2251–2260. Fazeli CF, Rezaian MA. 2000. Nucleotide sequence and organization of ten open reading frames in the genome of grapevine leafroll-associated virus 1 and identification of three subgenomic RNAs. J Gen Virol. 81:605–615. Forss S, Schaller H. 1982. A tandem repeat gene in a picornavirus. Nucleic Acids Res. 10:6441–6450. Fujita M, Otsuka M, Nomaguchi M, Adachi A. 2010. Multifaceted activity of HIV Vpr/Vpx proteins: the current view of their virological functions. Rev Med Virol. 20:68–76. Gritsun TS, Gould EA. 2006. The 3’ untranslated region of tick-borne flaviviruses originated by the duplication of long repeat sequences within the open reading frame. Virology 354:217–223. Gubala A, Davis S, Weir R, Melville L, Cowled C, Walker P, Boyle D. 2010. Ngaingan virus, a macropod-associated rhabdovirus, contains a second glycoprotein gene and seven novel open reading frames. Virology 399:98–108. Hoelzer K, Shackelton LA, Holmes EC, Parrish CR. 2008. Within-host genetic diversity of endemic and emerging parvoviruses of dogs and cats. J Virol. 82:11096–11105. Holmes EC. 2003. Error thresholds and the constraints to RNA virus evolution. Trends Microbiol. 11:543–546. Holmes EC. 2009. The evolution and emergence of RNA viruses. Oxford: Oxford University Press. Hughes JF, Coffin JM. 2001. Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution. Nat Genet. 29:487–489. Hurles M. 2004. Gene duplication: the genomic trade in spare parts. PLoS Biol. 2: E206. Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 11: 97–108. Johansson S, Niklasson B, Maizel J, Gorbalenya AE, Lindberg AM. 2002.
Molecular analysis of three Ljungan virus isolates reveals a new, close-to-root lineage of the Picornaviridae with a cluster of two unrelated 2A proteins. J Virol. 76:8920–8930. Kambol R, Kabat P, Tristem M. 2003. Complete nucleotide sequence of an endogenous retrovirus from the amphibian, Xenopus laevis. Virology 311:1–6. Karasev AV, Boyko VP, Gowda S, et al. 1995. Complete sequence of the citrus tristeza virus RNA genome. Virology 208:511–520. King AM, Sangar DV, Harris TJ, Brown F. 1980. Heterogeneity of the genome-linked protein of foot-and-mouth disease virus. J Virol. 34: 627–634. Kreuze JF, Savenkov EI, Valkonen JPT. 2002. Complete genome sequence and analyses of the subgenomic RNAs of Sweet potato chlorotic stunt virus reveal several new features for the genus Crinivirus. J Virol. 76:9260–9270. Labudovic A, Perkins H, van Leeuwen B, Kerr P. 2004. Sequence mapping of the Californian MSW strain of Myxoma virus. Arch Virol. 149: 553–570. Lai MM. 1992. RNA recombination in animal and plant viruses. Microbiol Rev. 56:61–79. LaPierre LA, Holzschu DL, Bowser PR, Casey JW. 1999. Sequence and transcriptional analyses of the fish retroviruses walleye epidermal hyperplasia virus types 1 and 2: evidence for a gene duplication. J Virol. 73:9393–9403. Liljas L, Tate J, Lin T, Christian P, Johnson JE. 2002. Evolutionary and taxonomic implications of conserved structural motifs between picornaviruses and insect picorna-like viruses. Arch Virol. 147:59–84. McWilliam SM, Kongsuwan K, Cowley JA, Byrne KA, Walker PJ. 1997. Genome organization and transcription strategy in the complex GNS-L intergenic region of bovine ephemeral fever rhabdovirus. J Gen Virol. 78:1309–1317. Meyer A, Schartl M. 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol. 11:699–704. Nagai M, Sakoda Y, Mori M, Hayashi M, Kida H, Akashi H. 2003. 
Insertion of cellular sequence and RNA recombination in the structural protein coding region of cytopathogenic bovine viral diarrhoea virus. J Gen Virol. 84:447–452. Napuli AJ, Alzhanova DV, Doneanu CE, Barofsky DF, Koonin EV, Dolja VV. 2003. The 64-kilodalton capsid protein homolog of Beet yellows virus is required for assembly of virion tails. J Virol. 77: 2377–2384. Ohta T. 1989. Role of gene duplication in evolution. Genome 31: 304–310. Palmenberg A, Neubauer D, Skern T. 2010. Genome organization and encoded proteins. In: Ehrenfeld E, Domingo E, Roos R, editors. The picornaviruses. Washington, DC: ASM Press. Panavas T, Panaviene Z, Pogany J, Nagy PD. 2003. Enhancement of RNA synthesis by promoter duplication in tombusviruses. Virology 310: 118–129. Gene Duplication in RNA Viruses . doi:10.1093/molbev/mst044 MBE Peng CW, Peremyslov VV, Mushegian AR, Dawson WO, Dolja VV. 2001. Functional specialization and evolution of leader proteinases in the family Closteroviridae. J Virol. 75:12153–12160. Reanney DC. 1982. The evolution of RNA viruses. Annu Rev Microbiol. 36:47–73. Revill P, Trinh X, Dale J, Harding R. 2005. Taro vein chlorosis virus: characterization and variability of a new nucleorhabdovirus. J Gen Virol. 86:491–499. Rossmann MG, Reuckert RR. 1987. What does the molecular structure of viruses tell us about viral functions? Microbiol Sci. 4: 206–214. Shackelton LA, Holmes EC. 2004. The evolution of large DNA viruses: combining genomic information of viruses and their hosts. Trends Microbiol. 12:458–465. Sharp PM, Bailes E, Stevenson M, Emerman M, Hahn BH. 1996. Gene acquisition by non-homologous recombination in HIV/SIV. Nature 383:586–587. Simon-Loriere E, Holmes EC. 2011. Why do RNA viruses recombine? Nat Rev Microbiol. 9:617–626. Soltis DE, Soltis PS. 1999. Polyploidy: recurrent formation and genome evolution. Trends Ecol Evol. 14:348–352. Tanno F, Nakatsu A, Toriyama S, Kojima M. 2000. 
Complete nucleotide sequence of Northern cereal mosaic virus and its genome organization. Arch Virol. 145:1373–1384. Tristem M, Marshall C, Karpas A, Hill F. 1992. Evolution of the primate lentiviruses: evidence from vpx and vpr. EMBO J. 11:3405–3412. Tristem M, Marshall C, Karpas A, Petrik J, Hill F. 1990. Origin of vpx in lentiviruses. Nature 347:341–342. Tseng CH, Knowles NJ, Tsai HJ. 2007. Molecular analysis of duck hepatitis virus type 1 indicates that it should be assigned to a new genus. Virus Res. 123:190–203. Tzanetakis IE, Martin RR. 2007. Strawberry chlorotic fleck: identification and characterization of a novel Closterovirus associated with the disease. Virus Res. 124:88–94. Tzanetakis IE, Postman JD, Martin RR. 2005. Characterization of a novel member of the family Closteroviridae from Mentha spp. Phytopathology 95:1043–1048. Valli A, Lopez-Moya JJ, Garcia JA. 2007. Recombination and gene duplication in the evolutionary diversification of P1 proteins in the family Potyviridae. J Gen Virol. 88:1016–1028. Wagner GP, Pavlicev M, Cheverud JM. 2007. The road to modularity. Nat Rev Genet. 8:921–931. Walker PJ, Byrne KA, Riding GA, Cowley JA, Wang Y, McWilliam S. 1992. The genome of bovine ephemeral fever rhabdovirus contains two related glycoprotein genes. Virology 191:49–61. Walker PJ, Dietzgen RG, Joubert DA, Blasdell KR. 2011. Rhabdovirus accessory genes. Virus Res. 162:110–125. Wang Y, Walker PJ. 1993. Adelaide river rhabdovirus expresses consecutive glycoprotein genes as polycistronic mRNAs: new evidence of gene duplication as an evolutionary process. Virology 195:719–731. Zhang J. 2003. Evolution by gene duplication: an update. Trends Ecol Evol. 18:292–298. Zhang J, Temin HM. 1994. Retrovirus recombination depends on the length of sequence identity and is not error prone. J Virol. 68: 2409–2414. Zhang J, Zhang YP, Rosenberg HF. 2002. Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey. Nat Genet. 30:411–415. 
Zlateva KT, Vijgen L, Dekeersmaeker N, Naranjo C, Van Ranst M. 2007. Subgroup prevalence and genotype circulation patterns of human respiratory syncytial virus in Belgium during ten successive epidemic seasons. J Clin Microbiol. 45:3022–3030. 1269