Gene Duplication Is Infrequent in the Recent Evolutionary
History of RNA Viruses
Etienne Simon-Loriere (1,2) and Edward C. Holmes* (3,4)
(1) Institut Pasteur, Unité de Génétique Fonctionnelle des Maladies Infectieuses, Paris, France
(2) Centre National de la Recherche Scientifique, URA CNRS3012, Paris, France
(3) Sydney Emerging Infections and Biosecurity Institute, School of Biological Sciences and Sydney Medical School, The University of
Sydney, Sydney, NSW, Australia
(4) Fogarty International Center, National Institutes of Health, Bethesda, Maryland
*Corresponding author: E-mail: [email protected].
Associate editor: James McInerney
Abstract
Gene duplication generates genetic novelty and redundancy and is a major mechanism of evolutionary change in bacteria
and eukaryotes. To date, however, gene duplication has been reported only rarely in RNA viruses. Using a conservative
BLAST approach we systematically screened for the presence of duplicated (i.e., paralogous) proteins in all RNA viruses
for which full genome sequences are publicly available. Strikingly, we found only nine significantly supported cases of
gene duplication, two of which are newly described here—in the 25 and 26 kDa proteins of Beet necrotic yellow vein virus
(genus Benyvirus) and in the U1 and U2 proteins of Wongabel virus (family Rhabdoviridae). Hence, gene duplication has
occurred at a far lower frequency in the recent evolutionary history of RNA viruses than in other organisms. Although the
rapidity of RNA virus evolution means that older gene duplication events will be difficult to detect through sequence-based analyses alone, it is likely that specific features of RNA virus biology, and particularly intrinsic constraints on
genome size, reduce the likelihood of the fixation and maintenance of duplicated genes.
Key words: RNA virus, gene duplication, genome size, genetic redundancy.
Introduction
Mol. Biol. Evol. 30(6):1263–1269 doi:10.1093/molbev/mst044 Advance Access publication March 13, 2013
Fast Track
ß The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Gene duplication is central to the development of organismal
complexity. Gene duplication provides important evolutionary opportunities through the creation of new genetic material (Ohta 1989; Zhang 2003; Hurles 2004; Innan and
Kondrashov 2010) and has been linked to many aspects of
genome evolution (Wagner et al. 2007) and species diversification (Zhang et al. 2002; Zhang 2003). As gene duplication is
a potent way to create new biological function, it is not surprising that it occurs frequently in many organisms and sometimes as duplications of complete genomes (Meyer and
Schartl 1999; Soltis and Soltis 1999). In many species, particularly large eukaryotes, gene duplication also leads to genetic
redundancy, such that many paralogous gene copies have no
apparent function. Surveys of gene duplication in representative genomes from different domains of life indicate that
paralogous genes (which may form multigene families)
are a common occurrence, representing as much as 40–65%
of the total number of genes (Zhang 2003). Although several
evolutionary models have been developed to explain how
duplicated genes can be fixed and maintained in genomes
(Innan and Kondrashov 2010), mechanistically gene duplication can result from a variety of processes, including unequal
crossing over and retroposition, although always related to a
form of recombination.
Despite the evolutionary importance of gene duplication,
far less is known about this process in viruses, particularly
those with RNA genomes. To date, gene duplication has
been described relatively frequently in large DNA viruses, in
which multigene families are a relatively common occurrence
(Shackelton and Holmes 2004). Among the many examples
are the numerous multigene families in African swine fever
virus (de la Vega et al. 1990), the multiple cases of duplication
in the E4 region of mastadenoviruses (Davison et al. 2003),
and the terminal inverted repeats in myxoma viruses
(Labudovic et al. 2004). Similarly, gene duplication has been
reported in a number of small DNA viruses, such as the
Papillomaviridae (Cole and Danos 1987) and Parvoviridae
(Hoelzer et al. 2008). In contrast, gene duplication has to
date been documented only rarely in RNA viruses, which is
reflected in the marked lack of multigene families (and
genetic redundancy) compared with other organisms
(Holmes 2009). In particular, there are few reported cases in
which gene duplication has resulted in two complete open
reading frames within a viral genome, and which may be
tandemly repeated (Forss and Schaller 1982; Tristem et al.
1990; Boyko et al. 1992; Walker et al. 1992; Wang and
Walker 1993; Karasev et al. 1995; LaPierre et al. 1999; Peng
et al. 2001; Valli et al. 2007; Walker et al. 2011; and see Results).
Other duplication events in RNA viruses involve short sequence duplications in untranslated regions (Panavas et al.
2003; Gritsun and Gould 2006) and short intragenic regions
(Nagai et al. 2003; Zlateva et al. 2007; Cao et al. 2008). Such a
low frequency of gene duplication is especially striking given
that endogenous retroviruses have been associated with gene
duplication events in their hosts (Hughes and Coffin 2001),
suggesting that gene duplication is mechanistically possible in
RNA viruses.
Although it is likely that some gene duplication events
in RNA viruses will be difficult, if not impossible, to recover
through gene sequence analysis because of their high levels
of divergence, it may also be that these organisms experience intrinsic evolutionary constraints against gene duplication (Holmes 2009). In particular, it has been suggested
that there is a cap on the maximum size that can be
attained by an RNA virus genome, which is set by their
extremely high mutation rates (approximately one mutation per genome replication). Accordingly, both gene duplication and lateral gene transfer are expected to be rare
in RNA viruses, as any increase in genome size is likely to
increase the burden of deleterious mutations and hence
reduce fitness (Holmes 2003, 2009). An increase in viral
genome size would also result in longer replication times,
which could be selectively disadvantageous, and constraints
associated with unwinding long regions of dsRNA may
similarly limit genome size (Reanney 1982), as could
those imposed by limits to capsid size and shape. Finally,
it may be that RNA viruses are better able to create evolutionary novelty through a combination of frequent mutation and large population sizes. Indeed, a similar rationale
has been invoked to explain the low rates of recombination observed in many RNA viruses (Simon-Loriere and
Holmes 2011).
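The genome-size argument above can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the authors' analysis: it assumes that new mutations arise as a Poisson process with a fixed per-site rate, so that the expected number of mutations per replication grows linearly with genome length. The rate used (1e-4 per site) is a hypothetical value chosen so that a 10 kb genome averages one mutation per replication, in line with the approximate figure quoted in the text.

```python
import math


def mutation_free_fraction(rate_per_site, genome_length):
    """Poisson probability that one genome replication introduces no new mutation.

    Assumes independent mutations at a constant per-site rate (an
    illustrative simplification, not a claim from the source).
    """
    expected_mutations = rate_per_site * genome_length
    return math.exp(-expected_mutations)


# Hypothetical per-site rate: about one mutation per replication of a 10 kb genome.
RATE_PER_SITE = 1e-4

if __name__ == "__main__":
    # Compare a 10 kb genome with the same genome after gaining 1 kb or 10 kb.
    for length in (10_000, 11_000, 20_000):
        fraction = mutation_free_fraction(RATE_PER_SITE, length)
        print(f"{length} nt: mutation-free fraction = {fraction:.3f}")
```

Under these assumptions, every added kilobase lowers the share of progeny genomes that escape mutation, which is the burden a duplicated gene would have to offset.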
To better understand the causes and consequences of
gene duplication, as well as the determinants of this process,
it is essential to assess the frequency with which it occurs in
RNA viruses. To this end we performed a comprehensive
survey of the occurrence of gene duplication in all publicly
available families of RNA viruses.
Results
We employed a BLAST approach to analyze gene duplication
events in 1198 virus species, comprising 774 single-strand (ss),
positive-sense RNA viruses, 155 single-strand, negative-sense
RNA viruses, 119 reverse-transcribing viruses, and 150 double-strand (ds) viruses. Despite the size of the data set analyzed,
we detected only nine statistically supported cases (i.e., at a
protein BLAST e-value of <10^-5) of gene duplication, although a number of other viral genes exhibited nearly significant matches. In addition, all but one of these duplicate
genes are located adjacent to each other in the viral
genome. Hence, it is clear that gene duplication is a rare
and highly sporadic event in recent RNA virus evolution.
Table 1 summarizes the species, proteins, and BLAST
e-values obtained. The rarity of this process precluded any
meaningful comparison of the relative frequency of gene duplication by taxonomic group, although some viral families
such as the Rhabdoviridae (Walker et al. 2011) contain multiple occurrences, while no duplicate genes were observed in
the dsRNA viruses analyzed. We now describe each case of
gene duplication in turn.
Single-Strand, Positive-Sense RNA Viruses
We describe, for the first time, a potential gene duplication
event in Beet necrotic yellow vein virus (BNYVV; genus
Benyvirus). Notably, some BNYVV isolates contain five instead
of four RNA segments, and our analysis indicates that the
26 kDa protein encoded by the fifth RNA segment exhibits
strong sequence similarities to the 25 kDa protein encoded by
the third segment (e-value: 4 × 10^-10, 22% sequence identity,
and a 43% positive match in a 217 amino acid region) (fig. 1).
Because these are multicomponent viruses it is likely that this
particular case of gene duplication occurred through a form
of segmental reassortment, rather than intrasegment recombination. Indeed, we suggest that the transmission of an additional copy of segment 3, from the same or a homologous
virus, and the subsequent functional differentiation from the
original p25 protein (or vice versa) produced this particular
genomic organization.
We also found several cases of duplication of the coat
protein (CP) in the Closteroviridae, a family of plant RNA
viruses that possess genomes up to 20 kb in length, and
which form flexuous, filamentous virions. Specifically, all
members of the Closteroviridae possess a minor coat protein
(CPm) that is located adjacent to the CP in a 5′ or 3′ location.
This duplication event was previously described in two members of the genus Closterovirus, Beet yellows virus (BYV) and
Citrus tristeza virus (Boyko et al. 1992). The signal for gene
duplication (i.e., gene paralogy) was found to be statistically
significant in 9 of the 33 species of Closteroviridae studied here,
scattered among the three genera (Ampelovirus,
Closterovirus, Crinivirus), with e-values ranging from 1 × 10^-6
to 6 × 10^-24 (Table 1).
In addition, another homolog of CP (CPh) is present in
all criniviruses and, based on the identification of two conserved Arg and Asp residues, the C-terminal domain of a
64 kDa protein expressed by the closteroviruses has been
shown to be homologous to CP in BYV (Napuli et al.
2003). The corresponding protein in the Ampelovirus
genome (55 kDa protein) exhibits some sequence similarity
to the 64 kDa protein of the closteroviruses and hence is
also likely to be a distant homolog of CP. However, this
putative gene duplication event was (marginally) not significant in our analysis, with the highest value found for
the Closterovirus Mint virus 1 (e-value: 9 × 10^-4, 27% identity, and a 42% positive match in an 81 amino acid region).
Finally, and uniquely among the Closteroviridae, Grapevine
leafroll-associated virus 1 possesses two copies of the CPm
protein (Fazeli and Rezaian 2000), which also exhibit strong
sequence similarity (e-value: 7 × 10^-22, 38% identity, and a
58% positive match in a 125 amino acid region). The presence of multiple homologs of the CP among all species of
the Closteroviridae is suggestive of an ancient duplication
event, or series of events, that occurred prior to the diversification into the three current viral genera. Interestingly,
this viral family possesses an unusual range of different
genome lengths and organizations, including mono-, bi-,
and tripartite genomes (Dolja et al. 2006), again indicative
of a history of major genomic events.
Table 1. Duplicated Genes in RNA Virus Genomes.

Genome organization  | Family          | Genus             | Virus          | Gene duplicated | Position              | BLASTP e-value | Publication
(+)ssRNA             | Closteroviridae | Closterovirus     | CTV            | CP → CPm        | Adjacent              | 2 × 10^-13     | Boyko et al. (1992)
(+)ssRNA             | Closteroviridae | Closterovirus     | BYV            | CP → CPm        | Adjacent              | 3 × 10^-10     | Boyko et al. (1992)
(+)ssRNA             | Closteroviridae | Closterovirus     | SCF-AV         | CP → CPm        | Adjacent              | 1 × 10^-6      | Tzanetakis et al. (2007)
(+)ssRNA             | Closteroviridae | Closterovirus     | Mint virus 1   | CP → CPm        | Adjacent              | 2 × 10^-9      | Tzanetakis et al. (2005)
(+)ssRNA             | Closteroviridae | Crinivirus        | SPCSV          | CP → CPm        | Adjacent              | 4 × 10^-6      | Kreuze et al. (2002)
(+)ssRNA             | Closteroviridae | Ampelovirus       | LCV 2          | CP → CPm        | Adjacent              | 6 × 10^-24     | This study
(+)ssRNA             | Closteroviridae | Ampelovirus       | GLRaV 3        | CP → CPm        | Adjacent              | 1 × 10^-10     | This study
(+)ssRNA             | Closteroviridae | Ampelovirus       | GLRaV 12       | CP → CPm        | Adjacent              | 5 × 10^-10     | This study
(+)ssRNA             | Closteroviridae | Ampelovirus       | GLRaV 1        | CP → CPm        | Adjacent              | 7 × 10^-22     | Fazeli et al. (2000)
(+)ssRNA             | Closteroviridae | Ampelovirus       | GLRaV 1        | CPm1 → CPm2     | Adjacent              | 1 × 10^-23     | Fazeli et al. (2000)
(+)ssRNA             | Picornaviridae  | Aphthovirus       | FMDV           | Vpg → Vpg       | Adjacent              | 2 × 10^-6      | Forss and Schaller (1982)
(+)ssRNA             | NA              | Benyvirus         | BNYVV          | p25 → p26       | On different segments | 4 × 10^-10     | This study
(−)ssRNA             | Rhabdoviridae   | Ephemerovirus     | BEFV           | G → Gns         | Adjacent              | 1 × 10^-7      | Walker et al. (1992)
(−)ssRNA             | Rhabdoviridae   | Ephemerovirus     | Kotonkan virus | G → Gns         | Adjacent              | 4 × 10^-9      | Blasdell et al. (2012)
(−)ssRNA             | Rhabdoviridae   | Unassigned        | Ngaingan virus | G → Gns         | Adjacent              | 9 × 10^-25     | Gubala et al. (2010)
(−)ssRNA             | Rhabdoviridae   | Unassigned        | Wongabel virus | U1 → U2         | Adjacent              | 8 × 10^-9      | This study
Reverse-transcribing | Retroviridae    | Epsilonretrovirus | WEHV-2         | orf A → orf B   | Adjacent              | 8 × 10^-9      | LaPierre et al. (1999)
Reverse-transcribing | Retroviridae    | Unclassified      | Xen1           | orf 1 → orf 2   | Adjacent              | 2 × 10^-22     | Kambol et al. (2003)
Reverse-transcribing | Retroviridae    | Lentivirus        | HIV-2          | vpr → vpx       | Adjacent              | 9 × 10^-6      | Tristem et al. (1990)
Reverse-transcribing | Retroviridae    | Lentivirus        | SIV-MND-2      | vpr → vpx       | Adjacent              | 2 × 10^-9      | Tristem et al. (1990)

Note.—(+)ssRNA, single-strand, positive-sense RNA viruses; (−)ssRNA, single-strand, negative-sense RNA viruses; NA, not assigned; CTV, Citrus tristeza virus; BYV, Beet yellows virus; SCF-AV,
Strawberry chlorotic fleck-associated virus; SPCSV, Sweet potato chlorotic stunt virus; LCV 2, Little cherry virus 2; GLRaV 3, Grapevine leafroll-associated virus
3; GLRaV 12, Grapevine leafroll-associated virus 12; GLRaV 1, Grapevine leafroll-associated virus 1; FMDV, Foot-and-mouth disease virus; BNYVV, Beet necrotic yellow vein virus; BEFV,
Bovine ephemeral fever virus; WEHV-2, Walleye epidermal hyperplasia virus type 2; Xen1, Xenopus laevis endogenous retrovirus Xen1; HIV-2, Human immunodeficiency virus 2;
SIV-MND-2, Simian immunodeficiency virus - mnd 2.
[Figure 1, panel A. Genome segments: RNA-1, 6746 nt; RNA-2, 4612 nt; RNA-3, 1774 nt (25 kDa protein, residues 1–219); RNA-4, 1431 nt; RNA-5, 1320 nt (26 kDa protein, residues 1–232).]
[Figure 1, panel B. Aligned sequences with start and end residue numbers; the match line between the sequences is not reproduced here.]
25 kDa protein    7 AVYDLGHRPYLARRTVYEDRLTLSTHGNICRAINLLTHDNRT--SLVYHNNTKRIRFRGLLCSYHGPYCGFRALCRVMLCSLPRLCDIPINGSRDFVADPTRLDSSVNE 113
26 kDa protein   13 AYSDDNHLPYYIQRSTHHVVRDVDYTGFICYPLQVDLNDNVEVGADIYHMKIKTMRFNVDIYN-NDVATKFPGWVRFIVFCTPPVSSWVNDGCSSLFSPFVGVNSFIDP 121

25 kDa protein  114 LLVS---NGLVIHYDRVHNVPIHTDGFEVVDFTTVFRGPGNFLLPNATNFPRSTTTDQVYMVCLVNTV-NCVLRLESELTVWVHSGLYAGDVLDVDNNVIQAPDGVDD 217
26 kDa protein  122 KLLKRDGHGITVLHDGIYCL-CHQEHF-TRSFEFNFRGPGNYTLTSDVCWSPATNVDSIYVACVASWCGDSAFMLQSDSVSWVHKRFWQRPVLEFGQCLDDLPDHDND 227
FIG. 1. (A) Genome organization of Beet necrotic yellow vein virus (BNYVV). The segments involved in the potential gene duplication event are
highlighted in black. (B) Amino acid alignment of the region of homology between the proteins encoded by segments 3 and 5. Identical amino acids are
indicated between the sequences. + indicates similar residues.
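The identity percentages quoted for each alignment in the text count identical residues as a share of all alignment columns, gaps included, which is how BLAST reports identities. A minimal sketch of that calculation follows; the function name and the toy sequences are hypothetical and are not taken from the viral proteins.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy alignment (hypothetical): 4 identical columns out of 6.
    print(round(percent_identity("ACD-FG", "ACEHFG"), 1))
```

The "positive match" percentage is computed the same way, except that a column also counts when the substitution matrix gives the residue pair a positive score; that requires the matrix and is not shown here.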
The Picornaviridae encode 3B(VPg), a protein covalently attached to the 5′ end of all virion RNAs. Uniquely among
viruses of this family, Foot-and-mouth disease virus (FMDV)
encodes three sequential paralogous genes for 3B(VPg) (King
et al. 1980; Forss and Schaller 1982). Our analysis detected
the second 3B(VPg) copy as a duplicate (e-value: 2 × 10^-6, 75%
identity, and a 95% positive match in a 20 amino acid region),
but failed to detect the third copy at a significant value
(e-value: 0.02, 56% identity, and a 78% positive match in an 18
amino acid region), likely due to its very small length and
greater divergence. As there is no known specialized function
for the supplementary copies of VPg, it has been suggested
that having multiple copies of VPg is advantageous because it
results in increased protein synthesis (Forss and Schaller 1982;
Falk et al. 1992). While other gene duplication events have
been proposed to have occurred during the evolution of the
Picornaviridae, these were not detected as significant in our
analysis. For example, the general correspondence in protein
structures between the two proteases of enteroviruses—
2Apro and 3Cpro—has led to the idea that they are duplicate
copies (Palmenberg et al. 2010; see Discussion).
Single-Strand, Negative-Sense RNA Viruses
Among members of the genus Ephemerovirus (family
Rhabdoviridae), we detected a signal of gene duplication in
the genomes of both Bovine ephemeral fever virus (BEFV) (e-value: 1 × 10^-7, 23% identity, and a 38% positive match in a
389 amino acid region) and Kotonkan virus (e-value: 4 × 10^-9,
23% identity, and a 39% positive match in a 324 amino acid
region), with the presence of two consecutive and related
glycoproteins, G and GNS (Walker et al. 1992; Blasdell et al.
2012). The related Adelaide River and Obodhiang viruses also
possess a second glycoprotein, likewise inserted between G
and L (Wang and Walker 1993; Blasdell et al. 2012), which
suggests that the duplication event could have occurred in the
common ancestor of these viruses. While our analysis failed to
detect a significant sequence similarity in these viruses, reflecting a greater divergence between their glycoproteins, we
found very strong sequence similarity between G and GNS of
the (unclassified) rhabdovirus Ngaingan virus (e-value:
9 × 10^-25, 21% identity, and a 39% positive match in a 396
amino acid region).
Finally, also in the Rhabdoviridae, we describe a gene duplication event in Wongabel virus (WONV; unassigned, although a member of the Hart Park group) (Gubala et al.
2010). Specifically, there is a significant signal for paralogy
(e-value: 8 × 10^-9, 26% identity, and a 45% positive match
in a 145 amino acid region) between the U1 and U2 proteins,
both of unknown function (fig. 2). This observation supports
previous suggestions of gene duplication events in this region
(Walker et al. 2011). The presence of additional genes between the P and M genes in the WONV genome is a feature
shared with several plant-infecting members of the rhabdovirus genera
Cytorhabdovirus and Nucleorhabdovirus (Tanno et al. 2000;
Revill et al. 2005; Dietzgen et al. 2006). These genus-specific
sets of additional genes, as well as insertion events, suggest
that there have been major genomic rearrangements during
the evolutionary history of the Rhabdoviridae. In addition,
the fact that other members of the viral order Mononegavirales similarly contain additional genes in different positions at a genus-specific scale suggests that these rearrangements may
have occurred commonly in these viruses.
Reverse-Transcribing Viruses
Our analysis detected several duplication events in the
Retroviridae, all of which have been described previously.
The oncogenic Walleye epidermal hyperplasia virus (WEHV)
contains two tandemly linked accessory genes, orfA and
orfB, which share some sequence similarity with each
other and with human cyclin D1 (LaPierre et al. 1999). This
led to the suggestion that these two genes arose by gene
duplication following capture of a cellular cyclin (LaPierre
et al. 1999). Our analysis marginally failed to validate a significant sequence similarity for WEHV-1 (e-value: 7 × 10^-4, 26%
identity, and a 45% positive match in a 90 amino acid region),
likely due to the small length of the region involved. However,
further analysis of the WEHV-2 genome revealed a significant
signal for a potential gene duplication event (e-value:
8 × 10^-9, 25% identity, and a 43% positive match in a 197
amino acid region) (Table 1). Similarly, we found a very strong
signal of sequence similarity (e-value: 2 × 10^-22, 42% identity,
and a 50% positive match in a 113 amino acid region) for
tandemly repeated proteins of Xenopus laevis endogenous retrovirus Xen1 as described previously (Kambol et al. 2003)
(Table 1). Interestingly, the mechanisms proposed for the
acquisition of cellular genetic material rely, as for gene duplication, on RNA recombination.
[Figure 2, panel A. Genome of 13196 nt; gene labels shown: N, P, U1 (residues 1–179), U2 (residues 1–192), U3, M, G, U4, U5, L.]
[Figure 2, panel B. Aligned sequences with start and end residue numbers; the match line between the sequences is not reproduced here.]
U1 protein   41 ERDLLLMLKEEISKFPNYQKYSSIYKIGVGILLSKSKYDFVWPDKSYLISGITDIINFPNIQRCPWDPQEDRI 113
U2 protein   47 EVELLMHIRQEMKKNKEWTKSGSFMGLCAGIALSHS---MLVPTEGLRKRLVGDFMGVLNIPLVP-DQGTDYI 115

U1 protein  114 KIDTCGIWQGKRYNLSLNL----------YFSQADPRLGRPIWESWYSSFNSRPPFMRFEIETVSDYLGFGE 175
U2 protein  116 ILNTTS------YNLDLNMWSEIKLSYTFFVCRGNGNVTKRIDTTWYANQPDRPEYLTFDLLTVSVLYGFDD 181
FIG. 2. (A) Genome organization of Wongabel virus (WONV). The putative duplicated genes are highlighted in black. (B) Amino acid alignment of the
region of significant sequence similarity between the proteins U1 (179 amino acids) and U2 (192 amino acids). Identical amino acids are indicated
between the sequences. + represents similar amino acid residues.
Another well-documented case of gene duplication, which
we also observed here, was in a subset of the primate lentiviruses, notably Human immunodeficiency virus type 2 (HIV-2)
and the related simian immunodeficiency viruses (SIV). All
these viruses possess a viral protein R (vpr) in addition to the
viral protein X (vpx) present in all lentiviruses (Tristem et al.
1992), which were detected as duplicate copies in both HIV-2
(e-value: 3 × 10^-58, 73% identity, and an 81% positive match in
a 90 amino acid region) and some SIVs (e-value: 9 × 10^-6,
29% identity, and a 44% positive match in a 91 amino acid
matching region). These small accessory proteins accumulate
in the nucleus of infected cells and appear to share similar
functions (Fujita et al. 2010). Hence, vpr and vpx might have
arisen by gene duplication, although this could also represent
a horizontal gene transfer of vpr from an SIV group (Sharp
et al. 1996).
Discussion
Although RNA viruses exhibit a diverse array of genomic organizations and replication strategies and infect a huge array of hosts, the
most striking result from our study is that gene duplication is
extremely rare in their recent evolutionary history, with only sporadic cases in a survey of 1198 virus species and no cases detected in dsRNA viruses. Hence, gene
duplication appears to occur far less frequently in RNA viruses
than it does in all other domains of life, including DNA viruses.
This is an intriguing observation, as those cases of gene duplication documented in RNA viruses all seem to involve the
action of some form of either homologous or non-homologous recombination, a process that can occur in any RNA
virus and which is relatively frequent in some (Simon-Loriere
and Holmes 2011). Hence, the very low rate of gene duplication in RNA viruses likely reflects the strong selective constraints against increasing genome size (i.e., against an increased
mutational burden) rather than an absence of appropriate
molecular mechanisms.
Mechanistically, gene duplication in RNA viruses could
occur as the consequence of an upstream relocation, midreplication, of the polymerase on a genomic template, in
accord with the widely accepted “copy choice” model of
RNA recombination (Lai 1992). However, this model posits
that the reassociation of the polymerase on a template is
guided by sequence homology with the nascent strand
(Zhang and Temin 1994), which makes it highly unlikely
that such an upstream relocation takes place. This is further
supported by the markedly lower frequency of non-homologous than homologous recombination in RNA viruses (Lai
1992). However, the presence of homologous regions at
both ends of a gene could favor such an event. This idea
has been advanced to support the glycoprotein duplication
in BEFV, where the flanking regions of both genes exhibit
strong sequence similarity (McWilliam et al. 1997). While homologous recombination is a relatively rare event in negative-sense RNA viruses, likely due to the coating of the nucleic
acids by a nucleoprotein that prevents homology guiding of
the polymerase during a template switching event, the frequent generation of defective interfering particles demonstrates the propensity of the polymerases of this group of
viruses to dissociate from their template (Simon-Loriere and
Holmes 2011).
Also of importance is the possibility that duplicate genes
are generated by recombination with genetic material from a
related organism, in a process similar to lateral gene transfer,
rather than gene duplication. Indeed, this idea is compatible
with the observation that the extent of sequence similarity is
sometimes greater between a duplicated gene and a homologous copy in a related species than between the duplicated
copies. An illustrative example is provided by the picornaviruses Ljungan virus (LV) and Duck hepatitis virus (DHV)
(Johansson et al. 2002; Tseng et al. 2007). LV and DHV
harbor two and three tandemly repeated copies of the 2A
gene, respectively, with the extra copies being more closely
related to different viral relatives, all of which harbor only one
copy of 2A. In particular, LV-2A1 and DHV-2A1 are more
similar to the 2A proteins of cardio-, erbo-, tescho-, and
aphthoviruses, while LV-2A2 and DHV-2A3 appear more closely related to the 2A proteins of parechoviruses, kobuviruses, and
Avian encephalomyelitis virus. However, it is equally likely that
the viruses in question descended from a common ancestor
in which multiple copies of the capsid existed, which were lost
later in the evolutionary history of other picornaviruses.
While we observed very few cases of detectable gene duplication, it is highly likely that this process played a more
important role in the early diversification of viral genomes,
such that protein sequence similarity has been sufficiently
eroded to return nonsignificant e-values in protein BLAST
analyses. Indeed, ancient gene duplication is likely to explain
at least some of the variation in genome size and structure
observed among RNA viruses. For example, the VP1, VP2 and
VP3 proteins of picorna-like viruses share a remarkably similar
three-dimensional structure, strongly suggesting that they descended from a common ancestral protein, even though
there is no longer a significant signature for relatedness at
the level of amino acid sequence (Rossmann and Rueckert
1987; Liljas et al. 2002). Accordingly, analyses of protein structure are likely to be the only viable way to determine the
occurrence of ancient gene duplication events in RNA viruses.
Materials and Methods
The sequences of all complete viral reference genomes (as of
March 2012) were retrieved from the National Center for
Biotechnology Information website (http://www.ncbi.nlm.nih.gov/) (i.e., GenBank). This resulted in a data set of 1198
viral species, which are listed in the supplementary material online. For each viral species,
the amino acid sequence of each individual protein was extracted and the sequence similarity to all other proteins of the
same viral genome assessed using BLASTP (Altschul et al.
1997). Proteins were considered homologous, and
hence indicative of a duplication event, when the BLASTP
search returned an e-value below an arbitrary cutoff of
10^-5. Because a cutoff e-value of 10^-5 is relatively stringent,
and so allows us to safely exclude false-positive results, our
focus is necessarily on those gene duplication events that
have occurred in the relatively recent past and where there
is still a phylogenetic signal for relatedness. Indeed, any
BLAST-based analysis is necessarily a compromise between
eliminating false positives and missing divergent, but true,
matches. Although this approach necessarily means that we
are not able to detect gene duplications that occurred early in
the evolutionary history of viruses, for which no phylogenetic
signal will remain, it still allows us to compare rates of gene
duplication relative to those of processes like nucleotide substitution in the recent past. In addition, this methodology
necessarily did not allow us to obtain information on potential intra-protein domain duplications, nor those occurring in
non-coding genomic regions.
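The screening step described above can be sketched as a filter over BLASTP results. This is a hedged illustration rather than the authors' pipeline: it assumes the within-genome searches were saved in BLAST's standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score), and that all proteins in one input come from the same viral genome. The function name and the example records are hypothetical; only the p25/p26 e-value, identity, and alignment length are taken from the text.

```python
E_VALUE_CUTOFF = 1e-5  # cutoff stated in the Materials and Methods


def candidate_paralogs(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {(protein_a, protein_b): best e-value} for hits below the cutoff.

    Input lines are tab-separated BLAST tabular records from searching the
    proteins of one genome against each other. Self-hits are ignored, and
    the two search directions of a pair are merged, keeping the lower e-value.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 holds the e-value
        if query == subject:
            continue  # a protein always matches itself
        if evalue >= cutoff:
            continue  # not significant at the chosen threshold
        pair = tuple(sorted((query, subject)))
        if pair not in best or evalue < best[pair]:
            best[pair] = evalue
    return best


if __name__ == "__main__":
    # Hypothetical records; only the p25/p26 e-value, identity and length follow the text.
    example = [
        "p25\tp25\t100.0\t219\t0\t0\t1\t219\t1\t219\t1e-150\t400",
        "p25\tp26\t22.0\t217\t150\t4\t7\t217\t13\t227\t4e-10\t55.1",
        "p25\tp31\t30.0\t40\t28\t0\t1\t40\t1\t40\t0.02\t25.0",
    ]
    print(candidate_paralogs(example))
```

Merging the two search directions of each pair is a design choice made here for readability; the source does not say how reciprocal hits were handled.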
Supplementary Material
Supplementary material is available at Molecular Biology and
Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
E.C.H. acknowledges support from an NHMRC Australia
Fellowship.
References
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs. Nucleic Acids Res. 25:
3389–3402.
Blasdell KR, Voysey R, Bulach D, Joubert DA, Tesh RB, Boyle DB, Walker
PJ. 2012. Kotonkan and Obodhiang viruses: African ephemeroviruses
with large and complex genomes. Virology 425:143–153.
Boyko VP, Karasev AV, Agranovsky AA, Koonin EV, Dolja VV. 1992. Coat
protein gene duplication in a filamentous RNA virus of plants. Proc
Natl Acad Sci U S A. 89:9156–9160.
Cao D, Barro M, Hoshino Y. 2008. Porcine rotavirus bearing an aberrant
gene stemming from an intergenic recombination of the NSP2 and
NSP5 genes is defective and interfering. J Virol. 82:6073–6077.
Cole ST, Danos O. 1987. Nucleotide sequence and comparative analysis
of the human papillomavirus type 18 genome. Phylogeny of papillomaviruses and repeated structure of the E6 and E7 gene products.
J Mol Biol. 193:599–608.
Davison AJ, Benko M, Harrach B. 2003. Genetic content and evolution of
adenoviruses. J Gen Virol. 84:2895–2908.
de la Vega I, Viñuela E, Blasco E. 1990. Genetic variation and multigene
families in African swine fever virus. Virology 179:234–246.
Dietzgen RG, Callaghan B, Wetzel T, Dale JL. 2006. Completion of the
genome sequence of Lettuce necrotic yellows virus, type species of
the genus Cytorhabdovirus. Virus Res. 118:16–22.
Dolja VV, Kreuze JF, Valkonen JP. 2006. Comparative and functional
genomics of closteroviruses. Virus Res. 117:38–51.
Falk MM, Sobrino F, Beck E. 1992. VPg gene amplification correlates with
infective particle formation in foot-and-mouth disease virus. J Virol.
66:2251–2260.
Fazeli CF, Rezaian MA. 2000. Nucleotide sequence and organization of
ten open reading frames in the genome of grapevine leafroll-associated virus 1 and identification of three subgenomic RNAs. J Gen
Virol. 81:605–615.
Forss S, Schaller H. 1982. A tandem repeat gene in a picornavirus. Nucleic
Acids Res. 10:6441–6450.
Fujita M, Otsuka M, Nomaguchi M, Adachi A. 2010. Multifaceted activity of HIV Vpr/Vpx proteins: the current view of their virological
functions. Rev Med Virol. 20:68–76.
Gritsun TS, Gould EA. 2006. The 3’ untranslated region of tick-borne
flaviviruses originated by the duplication of long repeat sequences
within the open reading frame. Virology 354:217–223.
Gubala A, Davis S, Weir R, Melville L, Cowled C, Walker P, Boyle D. 2010.
Ngaingan virus, a macropod-associated rhabdovirus, contains a
second glycoprotein gene and seven novel open reading frames.
Virology 399:98–108.
Hoelzer K, Shackelton LA, Holmes EC, Parrish CR. 2008. Within-host
genetic diversity of endemic and emerging parvoviruses of dogs
and cats. J Virol. 82:11096–11105.
Holmes EC. 2003. Error thresholds and the constraints to RNA virus
evolution. Trends Microbiol. 11:543–546.
Holmes EC. 2009. The evolution and emergence of RNA viruses. Oxford:
Oxford University Press.
Hughes JF, Coffin JM. 2001. Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution. Nat Genet. 29:487–489.
Hurles M. 2004. Gene duplication: the genomic trade in spare parts. PLoS
Biol. 2:E206.
Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 11:
97–108.
Johansson S, Niklasson B, Maizel J, Gorbalenya AE, Lindberg AM. 2002.
Molecular analysis of three Ljungan virus isolates reveals a new,
close-to-root lineage of the Picornaviridae with a cluster of two
unrelated 2A proteins. J Virol. 76:8920–8930.
Kambol R, Kabat P, Tristem M. 2003. Complete nucleotide sequence of
an endogenous retrovirus from the amphibian, Xenopus laevis.
Virology 311:1–6.
Karasev AV, Boyko VP, Gowda S, et al. 1995. Complete sequence of the
citrus tristeza virus RNA genome. Virology 208:511–520.
King AM, Sangar DV, Harris TJ, Brown F. 1980. Heterogeneity of the
genome-linked protein of foot-and-mouth disease virus. J Virol. 34:
627–634.
Kreuze JF, Savenkov EI, Valkonen JPT. 2002. Complete genome sequence
and analyses of the subgenomic RNAs of Sweet potato chlorotic
stunt virus reveal several new features for the genus Crinivirus.
J Virol. 76:9260–9270.
Labudovic A, Perkins H, van Leeuwen B, Kerr P. 2004. Sequence mapping
of the Californian MSW strain of Myxoma virus. Arch Virol. 149:
553–570.
Lai MM. 1992. RNA recombination in animal and plant viruses.
Microbiol Rev. 56:61–79.
LaPierre LA, Holzschu DL, Bowser PR, Casey JW. 1999. Sequence and
transcriptional analyses of the fish retroviruses walleye epidermal
hyperplasia virus types 1 and 2: evidence for a gene duplication.
J Virol. 73:9393–9403.
Liljas L, Tate J, Lin T, Christian P, Johnson JE. 2002. Evolutionary
and taxonomic implications of conserved structural motifs between picornaviruses and insect picorna-like viruses. Arch Virol.
147:59–84.
McWilliam SM, Kongsuwan K, Cowley JA, Byrne KA, Walker PJ. 1997.
Genome organization and transcription strategy in the complex
GNS-L intergenic region of bovine ephemeral fever rhabdovirus.
J Gen Virol. 78:1309–1317.
Meyer A, Schartl M. 1999. Gene and genome duplications in vertebrates:
the one-to-four (-to-eight in fish) rule and the evolution of novel
gene functions. Curr Opin Cell Biol. 11:699–704.
Nagai M, Sakoda Y, Mori M, Hayashi M, Kida H, Akashi H. 2003.
Insertion of cellular sequence and RNA recombination in the structural protein coding region of cytopathogenic bovine viral diarrhoea
virus. J Gen Virol. 84:447–452.
Napuli AJ, Alzhanova DV, Doneanu CE, Barofsky DF, Koonin EV, Dolja
VV. 2003. The 64-kilodalton capsid protein homolog of Beet yellows virus is required for assembly of virion tails. J Virol. 77:
2377–2384.
Ohta T. 1989. Role of gene duplication in evolution. Genome 31:
304–310.
Palmenberg A, Neubauer D, Skern T. 2010. Genome organization and
encoded proteins. In: Ehrenfeld E, Domingo E, Roos R, editors. The
picornaviruses. Washington, DC: ASM Press.
Panavas T, Panaviene Z, Pogany J, Nagy PD. 2003. Enhancement of RNA
synthesis by promoter duplication in tombusviruses. Virology 310:
118–129.
Peng CW, Peremyslov VV, Mushegian AR, Dawson WO, Dolja VV. 2001.
Functional specialization and evolution of leader proteinases in the
family Closteroviridae. J Virol. 75:12153–12160.
Reanney DC. 1982. The evolution of RNA viruses. Annu Rev Microbiol.
36:47–73.
Revill P, Trinh X, Dale J, Harding R. 2005. Taro vein chlorosis virus:
characterization and variability of a new nucleorhabdovirus. J Gen
Virol. 86:491–499.
Rossmann MG, Rueckert RR. 1987. What does the molecular
structure of viruses tell us about viral functions? Microbiol Sci. 4:
206–214.
Shackelton LA, Holmes EC. 2004. The evolution of large DNA viruses:
combining genomic information of viruses and their hosts. Trends
Microbiol. 12:458–465.
Sharp PM, Bailes E, Stevenson M, Emerman M, Hahn BH. 1996. Gene
acquisition by non-homologous recombination in HIV/SIV. Nature
383:586–587.
Simon-Loriere E, Holmes EC. 2011. Why do RNA viruses recombine? Nat
Rev Microbiol. 9:617–626.
Soltis DE, Soltis PS. 1999. Polyploidy: recurrent formation and genome
evolution. Trends Ecol Evol. 14:348–352.
Tanno F, Nakatsu A, Toriyama S, Kojima M. 2000. Complete nucleotide
sequence of Northern cereal mosaic virus and its genome organization. Arch Virol. 145:1373–1384.
Tristem M, Marshall C, Karpas A, Hill F. 1992. Evolution of the primate
lentiviruses: evidence from vpx and vpr. EMBO J. 11:3405–3412.
Tristem M, Marshall C, Karpas A, Petrik J, Hill F. 1990. Origin of vpx in
lentiviruses. Nature 347:341–342.
Tseng CH, Knowles NJ, Tsai HJ. 2007. Molecular analysis of duck hepatitis
virus type 1 indicates that it should be assigned to a new genus.
Virus Res. 123:190–203.
Tzanetakis IE, Martin RR. 2007. Strawberry chlorotic fleck: identification
and characterization of a novel Closterovirus associated with the
disease. Virus Res. 124:88–94.
Tzanetakis IE, Postman JD, Martin RR. 2005. Characterization of a novel
member of the family Closteroviridae from Mentha spp.
Phytopathology 95:1043–1048.
Valli A, Lopez-Moya JJ, Garcia JA. 2007. Recombination and gene duplication in the evolutionary diversification of P1 proteins in the family
Potyviridae. J Gen Virol. 88:1016–1028.
Wagner GP, Pavlicev M, Cheverud JM. 2007. The road to modularity. Nat
Rev Genet. 8:921–931.
Walker PJ, Byrne KA, Riding GA, Cowley JA, Wang Y, McWilliam S. 1992.
The genome of bovine ephemeral fever rhabdovirus contains two
related glycoprotein genes. Virology 191:49–61.
Walker PJ, Dietzgen RG, Joubert DA, Blasdell KR. 2011. Rhabdovirus
accessory genes. Virus Res. 162:110–125.
Wang Y, Walker PJ. 1993. Adelaide river rhabdovirus expresses consecutive glycoprotein genes as polycistronic mRNAs: new evidence of
gene duplication as an evolutionary process. Virology 195:719–731.
Zhang J. 2003. Evolution by gene duplication: an update. Trends Ecol
Evol. 18:292–298.
Zhang J, Temin HM. 1994. Retrovirus recombination depends on the
length of sequence identity and is not error prone. J Virol. 68:
2409–2414.
Zhang J, Zhang YP, Rosenberg HF. 2002. Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey. Nat
Genet. 30:411–415.
Zlateva KT, Vijgen L, Dekeersmaeker N, Naranjo C, Van Ranst M. 2007.
Subgroup prevalence and genotype circulation patterns of human
respiratory syncytial virus in Belgium during ten successive epidemic
seasons. J Clin Microbiol. 45:3022–3030.