Download Separating derived from ancestral features of mouse and human

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Quantitative trait locus wikipedia , lookup

Adaptive evolution in the human genome wikipedia , lookup

Essential gene wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Gene therapy wikipedia , lookup

Genetic engineering wikipedia , lookup

Gene nomenclature wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Whole genome sequencing wikipedia , lookup

Oncogenomics wikipedia , lookup

Human genetic variation wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Transposable element wikipedia , lookup

Gene expression programming wikipedia , lookup

Gene desert wikipedia , lookup

Non-coding DNA wikipedia , lookup

Copy-number variation wikipedia , lookup

Genomic library wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Metagenomics wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Ridge (biology) wikipedia , lookup

Segmental Duplication on the Human Y Chromosome wikipedia , lookup

Genomic imprinting wikipedia , lookup

RNA-Seq wikipedia , lookup

Public health genomics wikipedia , lookup

Gene wikipedia , lookup

History of genetic engineering wikipedia , lookup

Genomics wikipedia , lookup

Gene expression profiling wikipedia , lookup

Helitron (biology) wikipedia , lookup

Human genome wikipedia , lookup

Human Genome Project wikipedia , lookup

Pathogenomics wikipedia , lookup

Genome (book) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Genome editing wikipedia , lookup

Microevolution wikipedia , lookup

Minimal genome wikipedia , lookup

Designer baby wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Genome evolution wikipedia , lookup

Transcript
734
Biochemical Society Transactions (2009) Volume 37, part 4
Separating derived from ancestral features
of mouse and human genomes
Chris P. Ponting1 and Leo Goodstadt
MRC Functional Genomics Unit, University of Oxford, Department of Physiology, Anatomy and Genetics, South Parks Road, Oxford OX1 3QX, U.K.
Abstract
To take full advantage of the mouse as a model organism, it is essential to distinguish lineage-specific
biology from what is shared between human and mouse. Investigations into shared genetic elements
common to both have been well served by the draft human and mouse genome sequences. More recently,
the virtually complete euchromatic sequences of the two reference genomes have been finished. These
reveal a high (∼5%) level of sequence duplications that had previously been recalcitrant to sequencing and
assembly. Within these duplications lie large numbers of rodent- or primate-specific genes. In the present
paper, we review the sequence properties of the two genomes, dwelling most on the duplications, deletions
and insertions that separate each of them from their most recent common ancestor, approx. 90 million years
ago. We consider the differences in gene numbers and repertoires between the two species, and speculate
on their contributions to lineage-specific biology. Loss of ancient single-copy genes are rare, as are gains
of new functional genes through retrotransposition. Instead, most changes to the gene repertoire have
occurred in large multicopy families. It has been proposed that numbers of such ‘environmental genes’ rise
and fall, and their sequences change, as adaptive responses to infection and other environmental pressures,
including conspecific competition. Nevertheless, many such genes may be under little or no selection.
Introduction
During the Cretaceous Period, approx. 90 million years ago,
a line of small mammals, probably similar in appearance to
modern tree shrews, split into two lineages. One of these led
to the primates, including Homo sapiens, the other to rodents
such as the laboratory mouse, Mus musculus. Despite the
obvious morphological differences between humans and
mice, close observation over 100 years of mouse genetics has
demonstrated the many close anatomical and physiological
affinities between humans and mice [1,2].
Yet how are these species similar or different in their genes
and in genomes? How much has changed in these genomes in
the last 90 million years, and how much has resisted change
to preserve essential ancestral functions? If Darwinian
adaptation has been responsible for the divergence between
rodents and primates, would this have left a corresponding
signature in their genomes? To answer these and other
questions the sequencing of human and mouse genomes
was required, followed by detailed comparisons pinpointing
those nucleotides or exons or genes or chromosomal
segments that have remained intact in both lineages since
their last common ancestor.
The draft genome sequences of humans in 2001 [3] and
mice in 2002 [4] were great boons for research in genomics,
genetics and evolution. Yet, these draft sequences were never
intended to be comprehensive. It was always realized that
Key words: comparative genomics, copy number variant, functional innovation, gene duplication,
lineage-specific biology, mammalian evolution.
Abbreviation used: ORF, open reading frame.
1
To whom correspondence should be addressed (email [email protected]).
C The
C 2009 Biochemical Society
Authors Journal compilation sequencing of the extremely repetitive and transposon-rich
heterochromatin would be too technically challenging.
However, because heterochromatin is believed to have low
gene density and is largely silenced, its absence has been met
with little complaint.
On the other hand, it was of great concern that a large
amount of euchromatic sequence (>10% of human and
>6.5% of mouse) was also found to be absent from the initial
drafts. The process by which this additional sequence has
been painstakingly added to genome assemblies is known
as ‘finishing’, although even ‘finished’ euchromatin is not
perfect and will contain a low level of missing and inaccurate
sequence for some time to come [5].
The ‘finished’ human genome sequence was reported in
2004 [5] and a similar sequence for mice will be reported
elsewhere soon [6]. The purpose of the present review is to
provide a brief overview of the similarities and differences
between these two genome sequences in the light of newfound sequence previously absent from draft assemblies.
Mind the gaps
The human genome assembly extends to 3.09 Gb (Table 1),
containing 99% of the euchromatin and 94% of the entire
genome [5]. This is a slightly larger genome assembly than
for the mouse, which comprises approx. 2.66 Gb. Similarities
between chromosome numbers and genome sizes belie
the substantial rearrangement of the ancestral genome,
particularly in the rodent lineage, with over 300 large
(>300 kb) blocks of genes being reordered, and many more
smaller scale rearrangements observed [4]. Across the whole
Biochem. Soc. Trans. (2009) 37, 734–739; doi:10.1042/BST0370734
Protein Evolution: Sequences, Structures and Systems
Table 1 Properties of finished human and mouse reference
Figure 1 Mouse and human gene duplication as a function of age
genome assemblies
NCBI Builds 36.1 and 36 respectively.
These are extant genes from the two reference genome assemblies.
Their age of duplication has been estimated phylogenetically from the
Property
Human
Mouse
Assembled genome size (Gb)
3.091
2.661
Segmentally duplicated
sequence (Mb)
Interspersed repeats (Gb)
159.2 (5.52%)
126.0 (4.94%)
1.406
1.091
Number of gaps
Sequence in gaps (Mb)
Number of gene models
118
9.343
19042
1218
6.088
20210
Coding sequence (%)
1.07
1.27
of the human genome, only 40% of sequence can be aligned
to the mouse genome, with much of the remainder
representing remnants of transposable elements that have
been frequently inserted and deleted in each lineage, much
of which has decayed beyond recognition.
Finishing the genome assemblies revealed that the draft
assemblies were particularly deficient in segmental duplications, defined as >1 kb fragments of genomic sequence
with high sequence identity (>90%) that map to multiple
locations [7]. The repetitive nature of this sequence explains
its recalcitrance to assembly, especially via the whole genome
shotgun approach. Segmental duplications are now known
to cover approx. 5% of both human and mouse euchromatin
[8]. These newly discovered duplicated sequences would
perhaps be only of passing interest, except that they contain a
high density of duplicated protein-coding genes (see below).
These same regions also tend to be highly variable in
structure, including copy number, among unrelated human
or mouse individuals [9–11]. The mouse sequence is from a
single highly inbred individual female from the laboratory
black 6 (C57BL/6) strain. However, the human reference
genome represents the agglomeration of contributions
from an anonymous panel of outbred individuals [3],
although more than half the assembly stems from a single
contributor. Each individual carries ∼0.5% copy number
variant sequence, often in a heterozygous state [12].
The human genome sequence is thus not a consensus
representing the most frequent variants. Instead, because of
selection for size in choosing insert clones for sequencing,
parts of the human reference genome may contain a
systematic bias towards the incorporation of variable regions,
especially those with high copy numbers. In any case, even for
the mouse where copy number variants are seen in abundance
between different inbred mouse strains [8], structural
variation between individuals highlights the limitations of a
single reference genome in representing an entire population.
Gene repertoires
Interest in relating gene numbers to organismal complexity
has waned considerably since the discovery that the genome
ratio of the duplicates’ branch length to the distance to the rodent
primate speciation node, and by assuming that the split occurred 90
million years ago.
of the simple nematode Caenorhabditis elegans contains
almost as many protein-coding genes as humans [13–15].
Nevertheless, the reduction of the human gene count from
an initial 32 000 in the draft human genome publication [3] to
its current level of approx. 19 000 shows the many inherent
difficulties in gene predictions as well as the great progress
that has been made since.
Initial gene counts were greatly inflated by the inclusion
of fragmentary gene models, many of which represent
the debris of non-functional pseudogenes. Spurious ORFs
(open reading frames) present by chance in RNA transcripts
were also misidentified even in the absence of either
protein-coding potential or evolutionary conservation [15].
Disruptions to putative ORFs such as small insertions,
deletions and frameshifts were difficult to distinguish from
sequence errors that are often found among even bona fide
gene predictions in a draft genome sequence. Finally, it has
never been straightforward to distinguish one gene from its
genomic neighbour, particularly as (non-coding or chimaeric)
transcripts can often be found to span the two [16,17].
The current low estimates of gene number in the human
genome [13–15] result from the exclusion of erroneous
gene models that fail criteria for protein-coding potential
and conservation. These rely upon three widely held
observations: that intron positions and especially phase are
very well conserved within coding sequence, that confirmed
cases of mammalian protein-coding genes recruited de novo
from non-coding sequence are very rare, and that retrotransposition generally results in non-functional sequence with
immediate loss of constraint. This last assumption appears to
be the most fallible, since promoter elements are sometimes
fortuitously present at the 5 -end of the integration site of
retrotransposons. However, empirically, because only about
one functional retrogene arises per million years, this results
C The
C 2009 Biochemical Society
Authors Journal compilation 735
736
Biochemical Society Transactions (2009) Volume 37, part 4
in only a small underestimation of ∼100 genes since the
human mouse divergence [18,19].
The great majority of genes are thus gained in each
lineage not by de novo creation or retrotransposition, but
instead by whole gene duplication, often in tandem copies;
simultaneously, functional genes are lost by deletion or
after disruptive mutations (‘pseudogenization’). The fates
of genes on each of the human and mouse lineages are thus
best described by gene births and deaths in the time since
their last common ancestor approx. 90 million years ago.
Figure 1 shows the rate of accumulation of extant genes in
both Mus musculus and Homo sapiens lineages. The large
numbers of the most recently duplicated genes largely reflect
copy number variable genes: many of these are unlikely to
be fixed in future populations (see below).
Correspondence between mouse and
human genes
To a Well-Connected Mouse
(Upon reading of the genetic closeness of mice and men.)
Wee, sleekit, cow’rin, tim’rous beastie,
Braw science says that at the leastie
We share full ninety-nine per cent
O’genes, where’er the odd ane went.
c 2003 John Updike. Originally published in The
New Yorker. All rights reserved.
This reworking of the 18th Century Robert Burns
poem (“To a Mouse”) was sparked by the Nature editorial
commentary that 99% of mouse genes have “direct
counterparts in humans” [20]. This statement unfortunately
gives the misleading impression that human and mouse genes
mostly correspond, and that divergence in their genomes is
thus largely either within the sequence of each corresponding
gene, or in regulatory regions.
In fact, the mouse and human gene repertoires have been
dramatically remodelled since the divergence of the lineages
∼90 million years ago: 20% (3852 of 19042) of human genes
and 24% (5020 of 20210) of mouse genes are duplicate
copies which have arisen since their last common ancestor
(Figure 2). As for “where’er the odd ane went”, the Nature
commentary reflects that in fewer than 1% of genes has every
homologue become extinct in the other lineage. These are
genes that had long persisted before the common ancestor
and yet have latterly become dispensable. This may no
doubt reflect profound changes to the functional repertoire.
An example of a loss of an erstwhile essential gene is that
of EYS whose disruption in humans results in a form of
adolescent onset blindness (retinitis pigmentosa), but whose
loss approx. 80 million years ago seems to have little affected
the rodents [21]. A more recent example in evolution involves
the human-specific loss of the myosin heavy chain MYH16
which has been argued in a ‘less is more’ hypothesis to confer
a selective benefit in increasing the cranial capacity [22].
Approx. 15 187 genes have remained unduplicated in both
human and mouse lineages since their last common ancestor
C The
C 2009 Biochemical Society
Authors Journal compilation Figure 2 Mouse genes have a higher synonymous nucleotide
substitution rate (dS ) and have accumulated more lineage-specific
duplicates than human genes
(A) Mouse and human phylogeny drawn to the dS scale. (B) The number
of 1:1 mouse and human orthologues (black) and the number of gene
duplicates unique to each species (grey).
Table 2 Properties of human and mouse simple 1:1 orthologues
Properties are median values. d N , non-synonymous substituion; d S ,
synonymous substitution.
Property
Value
Counts of 1:1 orthologues
15 187
dN
dS
d N /d S ratio
0.057
0.58
0.095
Amino acid sequence identity (%)
Coding sequence identity (%)
Aligned sequence length (codons)
88.2
85.3
434
(Table 2). These ‘simple orthologues’ have also substantially
conserved their sequences: their median amino acid and
nucleotide identities are 88 and 85% respectively. These
are the genes expected to convey much of the functional
repertoire that is conserved among mammals and, more
broadly, among other animals. It is thus appropriate that
these simple orthologues are being specifically targeted for
disruption in large-scale phenotypic screens in the mouse to
illuminate conserved mammalian biology.
The estimated gene count is substantially higher in mice
(20 210) than in humans (19 042) because of the larger number
of rodent-specific gene duplicates. This is almost entirely due
to genes with roles in chemosensation, such as olfactory and
vomeronasal receptors and pheromone genes. The cull of
primate-specific chemosensation genes relative to murid rodents and other mammals seems to have accelerated in the oldworld primates in the last ∼25 million years [23,24]. The contrast in gene repertoires perhaps reflects the divergent sensory
requirements for mammals that differ in diet and behaviour.
Gene duplications in mice are predominantly found in
tandem copies of genomic sequence [8], and most probably
arise via unequal crossover or non-allelic homologous
recombination. In contrast, human duplicates are more often
dispersed to different chromosomes in recombinations that
were mediated by the primate-specific Alu-SINE (short
interspersed element) retrotransposons [25,26].
Protein Evolution: Sequences, Structures and Systems
Large genomic duplications are less frequent than smaller
ones. Thus compact genes are more likely to be duplicated
with a complete and intact ORF, as well as the accompanying
promoter and other necessary non-coding regulatory
sequence. Median sizes of human- or mouse-lineage-specific
genes are thus approx. 3- and 6-fold smaller than those
of simple orthologues (6.6 or 3.3 kb compared with 26.6
or 21.2 kb respectively); they have fewer exons (medians
of five and three compared with nine for both mice and
humans); and they are 5- and 12-fold (in humans or mice)
more likely to contain only a single coding exon.
Unequal crossing-over can also generate novel functional
genes, either via the formation of chimaeric genes or as partial
duplicates (e.g. [27]). For example, a duplication of a fragment
of the rodent Dlg5 gene spawned an extended family of
testis-specific genes that have become extremely widespread
in the mouse and rat genomes [28]. Partial duplications have
also occasionally been observed within duplicated genes.
The primate-specific LPA gene, for example, was formed
initially by a complete duplication of PLG with a subsequent
succession of further internal duplications of individual
domains [29].
Innovative functions of lineage-specific
genes
Duplicated genes form a far from random selection of all
genes. Aside from the bias in gene sizes mentioned above,
several functional categories are overrepresented among
genes specific to either primate or rodent lineages. These
include (i) chemosensation genes which are more numerous
in rodents, and genes associated with (ii) immunity or host
defence (e.g. T-cell receptor genes), (iii) with detoxification
(e.g. cytochrome P450), and (iv) with reproduction (e.g.
pheromone or cancer-testis antigen genes [30]).
Many of these genes are of considerable biological
interest and are found in regions of the genome rich in
segmental duplications and interspersed repeats, and are
thus only present in finished genome assemblies. The
extreme repetitive nature of the Y chromosome has required
particularly focused efforts to complete its sequence. Indeed,
further copies of rapidly evolving gene families can be
expected to be discovered as the remaining gaps in the human
and mouse genomes are closed.
These functional biases might imply that gene duplication
events are largely adaptive: expansions in the gene repertoire
may be a response to environmental challenges from
pathogens and parasites, or confer reproductive advantages
over other individuals from the same species. Evidence
of positive selection at specific codons [dN /dS ratio (ratio of
non-synonymous to synonymous substitutions) significantly
greater than 1] has often been cited for duplicated gene
families (e.g. [31,32]).
However, with the small mammalian (especially primate)
effective population sizes, chance gene duplications may
remain in the population for some time or proceed to
fixation when they are not advantageous or even mildly
deleterious. This would explain the recent provenance and
hence short lifespans of gene duplicates (Figure 1), some of
which are unfixed and copy-number-variable within human
or mouse populations. The observed overrepresentation of
functional categories may simply be because duplications
of genes in other classes tend to be deleterious (in part due
to stoichiometric constraints) and thus are preferentially
purged from the population [33,34].
Even where evidence for positive selection is cited, this
may instead be the result of loss of constraint in redundant
gene copies after duplication. Other cases of apparent
adaptive change can be due to biased gene conversion [35].
This is a bias in the recombination-associated repair process
following double-strand breaks and can lead to the erroneous
inference of positive selection [36,37]. It is suggested
that biased gene conversion among adjacent tandemly
duplicated genes may drive their accelerated evolution
[38].
On the other hand, the short lifespan of many gene
duplicates may point to the short period of advantage that
novelty confers in a constantly changing adaptive landscape.
A high turnover in the gene repertoire with many births and
deaths would be seen in an evolutionary ‘arms race’ with
pathogens [39]. There can also be positive selection among
genes expressed in germline or stem cells [40–42] which
confers no advantage on either the individual or population.
Instead, gain-of-function mutations that, for example, favour
clonal expansion of mutant spermatogonia [43], may underlie
much strong diversifying selection.
Almost 40 years after Ohno [44] first proposed that gene
duplications are the principal forces for the generation of
novelty in evolution, the relative contributions of adaptation
and genetic drift in maintaining our gene repertoires thus
remain to be determined. The availability of further primate
and rodent assembled genomes of high quality on one
hand, and extensive allele frequency data for copy number
variants in natural primate and rodent populations on
the other, may be required to decide between these two
evolutionary scenarios [45].
Concluding remarks
Strenuous efforts have now provided near-complete
euchromatic genome sequence assemblies of humans and
mice. These have revealed large numbers of lineage-specific
genes lying within segmentally duplicated genomic regions
that were previously recalcitrant to sequencing and assembly.
Many of these novel genes are studied only rarely, partly
because of their relatively late arrival into genome sequence
assemblies, and partly due precisely to the absence of single
orthologous genes in the other species: inferences from
model organisms about lineage-specific human genes have
to be treated with the greatest caution, while a case has to be
made for the relevance of each mouse-specific gene to human
physiology and disease. Nevertheless, if we are to fully
appreciate the functional repertoire of our own species, and
C The
C 2009 Biochemical Society
Authors Journal compilation 737
738
Biochemical Society Transactions (2009) Volume 37, part 4
that of the mouse, our most important model organism, then
these genes demand much further scrutiny in the future.
Funding
We gratefully acknowledge support from the Medical Research
Council.
References
1 Paigen, K. (2003) One hundred years of mouse genetics: an intellectual
history. I. The classical period (1902–1980). Genetics 163, 1–7
2 Paigen, K. (2003) One hundred years of mouse genetics: an intellectual
history. II. The molecular revolution (1981–2002). Genetics 163,
1227–1235
3 Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J.,
Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial
sequencing and analysis of the human genome. Nature 409, 860–921
4 Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F.,
Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P. et al.
(2002) Initial sequencing and comparative analysis of the mouse
genome. Nature 420, 520–562
5 International Human Genome Sequencing Consortium (2004) Finishing
the euchromatic sequence of the human genome. Nature 431,
931–945
6 Church, D.M., Goodstadt, L., Hillier, L.W., Zody, M.C., Goldstein, S., She, X.,
Bult, C.J., Agarwala, R., Cherry, J.L., DiCuccio, M. et al. (2009)
Lineage-specific biology revealed by a finished genome assembly of the
mouse. PLoS Biol. 7, e1000112
7 Eichler, E.E., Clark, R.A. and She, X. (2004) An assessment of the
sequence gaps: unfinished business in a finished human genome.
Nat. Rev. Genet. 5, 345–354
8 She, X., Cheng, Z., Zollner, S., Church, D.M. and Eichler, E.E. (2008) Mouse
segmental duplication and copy number variation. Nat. Genet. 40,
909–914
9 Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P.,
Maner, S., Massa, H., Walker, M., Chi, M. et al. (2004) Large-scale copy
number polymorphism in the human genome. Science 305, 525–528
10 Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y.,
Scherer, S.W. and Lee, C. (2004) Detection of large-scale variation in the
human genome. Nat. Genet. 36, 949–951
11 Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M.,
Haugen, E., Hayden, H., Albertson, D., Pinkel, D. et al. (2005)
Fine-scale structural variation of the human genome. Nat. Genet. 37,
727–732
12 McCarroll, S.A., Kuruvilla, F.G., Korn, J.M., Cawley, S., Nemesh, J.,
Wysoker, A., Shapero, M.H., de Bakker, P.I., Maller, J.B., Kirby, A. et al.
(2008) Integrated detection and population-genetic analysis of SNPs and
copy number variation. Nat. Genet. 40, 1166–1174
13 Goodstadt, L. and Ponting, C.P. (2006) Phylogenetic reconstruction of
orthology, paralogy, and conserved synteny for dog and human. PLoS
Comput. Biol. 2, e133
14 Goodstadt, L., Heger, A., Webber, C. and Ponting, C.P. (2007) An analysis
of the gene complement of a marsupial, Monodelphis domestica:
evolution of lineage-specific genes and giant chromosomes. Genome
Res. 17, 969–981
15 Clamp, M., Fry, B., Kamal, M., Xie, X., Cuff, J., Lin, M.F., Kellis, M.,
Lindblad-Toh, K. and Lander, E.S. (2007) Distinguishing protein-coding
and noncoding genes in the human genome. Proc. Natl. Acad. Sci. U.S.A.
104, 19428–19433
16 Gerstein, M.B., Bruce, C., Rozowsky, J.S., Zheng, D., Du, J., Korbel, J.O.,
Emanuelsson, O., Zhang, Z.D., Weissman, S. and Snyder, M. (2007) What
is a gene, post-ENCODE? History and updated definition. Genome Res.
17, 669–681
17 Denoeud, F., Kapranov, P., Ucla, C., Frankish, A., Castelo, R.,
Drenkow, J., Lagarde, J., Alioto, T., Manzano, C., Chrast, J. et al. (2007)
Prominent use of distal 5 transcription start sites and discovery of a
large number of additional exons in ENCODE regions. Genome Res. 17,
746–759
C The
C 2009 Biochemical Society
Authors Journal compilation 18 Marques, A.C., Dupanloup, I., Vinckenbosch, N., Reymond, A. and
Kaessmann, H. (2005) Emergence of young human genes after a burst
of retroposition in primates. PLoS Biol. 3, e357
19 Kaessmann, H., Vinckenbosch, N. and Long, M. (2009) RNA-based gene
duplication: mechanistic and evolutionary insights. Nat. Rev. Genet. 10,
19–31
20 Gunter, C. and Dhand, R. (2002) Human biology by proxy. Nature 420,
509
21 Abd El-Aziz, M.M., Barragan, I., O’Driscoll, C.A., Goodstadt, L.,
Prigmore, E., Borrego, S., Mena, M., Pieras, J.I., El-Ashry, M.F., Safieh, L.A.
et al. (2008) EYS, encoding an ortholog of Drosophila spacemaker, is
mutated in autosomal recessive retinitis pigmentosa. Nat. Genet. 40,
1285–1287
22 Stedman, H.H., Kozyak, B.W., Nelson, A., Thesier, D.M., Su, L.T., Low,
D.W., Bridges, C.R., Shrager, J.B., Minugh-Purvis, N. and Mitchell, M.A.
(2004) Myosin gene mutation correlates with anatomical changes in the
human lineage. Nature 428, 415–418
23 Shi, P. and Zhang, J. (2007) Comparative genomic analysis identifies an
evolutionary shift of vomeronasal receptor gene repertoires in the
vertebrate transition from water to land. Genome Res. 17, 166–174
24 Liman, E.R. (2006) Use it or lose it: molecular evolution of sensory
signaling in primates. Pflugers Arch. 453, 125–131
25 Bailey, J.A., Liu, G. and Eichler, E.E. (2003) An Alu transposition model for
the origin and expansion of human segmental duplications. Am. J. Hum.
Genet. 73, 823–834
26 Kim, P.M., Lam, H.Y., Urban, A.E., Korbel, J.O., Affourtit, J., Grubert, F.,
Chen, X., Weissman, S., Snyder, M. and Gerstein, M.B. (2008) Analysis of
copy number variants and segmental duplications in the human
genome: evidence for a change in the process of formation in recent
evolutionary history. Genome Res. 18, 1865–1874
27 Wang, X. and Zhang, J. (2006) Remarkable expansions of an X-linked
reproductive homeobox gene cluster in rodent evolution. Genomics 88,
34–43
28 Spiess, A.N., Walther, N., Muller, N., Balvers, M., Hansis, C. and Ivell, R.
(2003) SPEER: a new family of testis-specific genes from the mouse.
Biol. Reprod. 68, 2044–2054
29 Lawn, R.M., Schwartz, K. and Patthy, L. (1997) Convergent evolution of
apolipoprotein(a) in primates and hedgehog. Proc. Natl. Acad. Sci. U.S.A.
94, 11992–11997
30 Emes, R.D., Goodstadt, L., Winter, E.E. and Ponting, C.P. (2003)
Comparison of the genomes of human and mouse lays the foundation of
genome zoology. Hum. Mol. Genet. 12, 701–709
31 Karn, R.C., Clark, N.L., Nguyen, E.D. and Swanson, W.J. (2008) Adaptive
evolution in rodent seminal vesicle secretion proteins. Mol. Biol. Evol.
25, 2301–2310
32 Jackson, M., Watt, A.J., Gautier, P., Gilchrist, D., Driehaus, J., Graham, G.J.,
Keebler, J., Prugnolle, F., Awadalla, P. and Forrester, L.M. (2006) A
murine specific expansion of the Rhox cluster involved in embryonic
stem cell biology is under natural selection. BMC Genomics 7, 212
33 Dopman, E.B. and Hartl, D.L. (2007) A portrait of copy-number
polymorphism in Drosophila melanogaster. Proc. Natl. Acad. Sci. U.S.A.
104, 19920–19925
34 Nguyen, D.Q., Webber, C. and Ponting, C.P. (2006) Bias of selection on
human copy-number variants. PLoS Genet. 2, e20
35 Marais, G. (2003) Biased gene conversion: implications for genome and
sex evolution. Trends Genet. 19, 330–338
36 Berglund, J., Pollard, K.S. and Webster, M.T. (2009) Hotspots of biased
nucleotide substitutions in human genes. PLoS Biol. 7, e26
37 Galtier, N., Duret, L., Glemin, S. and Ranwez, V. (2009) GC-biased gene
conversion promotes the fixation of deleterious amino acid changes in
primates. Trends Genet. 25, 1–5
38 Hurst, L.D. (2009) Evolutionary genomics: a positive becomes a negative.
Nature 457, 543–544
39 Dawkins, R. and Krebs, J.R. (1979) Arms races between and within
species. Proc. R. Soc. London Ser. B 205, 489–511
40 Laukaitis, C.M., Heger, A., Blakley, T.D., Munclinger, P., Ponting, C.P. and
Karn, R.C. (2008) Rapid bursts of androgen-binding protein (Abp) gene
duplication occurred independently in diverse mammals. BMC Evol. Biol.
8, 46
41 Birtle, Z., Goodstadt, L. and Ponting, C. (2005) Duplication and positive
selection among hominin-specific PRAME genes. BMC Genomics 6,
120
Protein Evolution: Sequences, Structures and Systems
42 Johnson, M.E., Viggiano, L., Bailey, J.A., Abdul-Rauf, M., Goodwin, G.,
Rocchi, M. and Eichler, E.E. (2001) Positive selection of a gene family
during the emergence of humans and African apes. Nature 413,
514–519
43 Goriely, A., McVean, G.A., Rojmyr, M., Ingemarsson, B. and Wilkie, A.O.
(2003) Evidence for selective advantage of pathogenic FGFR2 mutations
in the male germ line. Science 301, 643–646
44 Ohno, S. (1970) Evolution by gene duplication, Springer-Verlag,
Heidelberg
45 Perry, G.H., Yang, F., Marques-Bonet, T., Murphy, C., Fitzgerald, T., Lee,
A.S., Hyland, C., Stone, A.C., Hurles, M.E., Tyler-Smith, C. et al. (2008)
Copy number variation and evolution in humans and chimpanzees.
Genome Res. 18, 1698–1710
Received 10 February 2009
doi:10.1042/BST0370734
C The
C 2009 Biochemical Society
Authors Journal compilation 739