Review
TRENDS in Ecology and Evolution
Vol.21 No.6 June 2006
Genomics of the evolutionary process
Andrew G. Clark
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
Comparative analysis of genome sequences has become
the primary means by which functional elements are
first identified, often preceding even the identification of
their function. Although this approach capitalizes on
the conservation of homologous functions, it has also
been successful in identifying evolutionary novelties,
including new genes and pathways. As I discuss here,
the analysis of multiple alignments of sequences from
species on a known phylogeny has provided rich detail
about the heterogeneities in the process of genome
changes. Inferences of positive selection acting on
protein-encoding genes have provided clues about the
role of adaptive evolution in the past. These methods
also identify negatively selected genes, providing some
clue to genes that are most likely to be mutable to a
disease-causing state.
Comparative genomics: discovery and hypothesis-driven science
One of the most wonderful things about comparative
genomics is that it has turned a whole generation of
molecular biologists into evolutionists, full of excitement
about the way that evolution has sculpted exquisite
modifications to organismal genomes and eager to tell
stories about it. At the same time, one of its worst disasters
is that it has created a horde of genomics investigators
who think that evolutionary biology is just fun, speculative storytelling. Sadly, much of the scientific publication
industry seems to respond to the herd as much as it does to
scientific rigor, and so we have a bit of a mess on our
hands. Fortunately, this is all a temporary aberration and,
eventually, the noise will be separated from the signal, and
progress will march on in understanding what genome
sequence divergence really means. The genomics enterprise has fallen upon us so suddenly that it is hardly
surprising that a thoughtful and contemplative field such
as evolutionary biology is struggling to keep up. It is good
for the field to have this upheaval, as it has brought in a
wealth of new ideas and approaches. In the meantime, it
has become clear that complete genome sequencing of
multiple species is providing a deep and inspiring set of
new problems and solutions for evolutionary biologists,
including those working on species that they thought were
phylogenetically remote from any model organism.
Comparative evolutionary genomic analysis has proceeded along two parallel paths. The first takes the old
hypothesis- and model-driven approach in population
biology, where a problem is developed abstractly, even to
the point of a mathematical model, and the data are then
Corresponding author: Clark, A.G. ([email protected]).
Available online 5 May 2006
considered for their ability to assess the merits of the
model. This approach has been empowered by genomic
data, because their sheer volume makes the power of the
tests high. However, the models are generally simplified
cartoons of reality and, by the time one has whole-genome
data, the many ways that the real data can depart from
the clean, simple model become all too apparent. The
second approach is more akin to the voyage of the HMS
Beagle, setting sail to who knows where, amassing
genome sequence data on our hard drives and pawing
through it to discover things we have not seen before.
Although it was easy to make fun of this ‘discovery science’
approach when it was first espoused, we now know that it
has done exactly what was promised: all kinds of exciting,
unexpected things happen to genomes as they diverge,
and a thorough description of these observations has
considerable merit. In many ways, this descriptive,
discovery aspect of genomic science is a recapitulation of
the grand era of natural history during the 1800s, when
the detailed cataloging of nature gave way to the hunger to
understand why there is such diversity and structure to
the natural world.
Descriptive genomics
Sound genomic analysis must always begin with a
thorough description. We do not yet know the function of
all the elements of any genome, but at least we can attempt
to provide a useful description of the data at hand. With just
the slightest scratching beneath the surface, these
descriptive studies reveal a great deal to stimulate
thinking and questioning about genome evolution. In
many ways, the natural historical description of a genome
is more pregnant with implied evolutionary process than is
a description of the natural history of an organism. This is
because each genome carries clear echoes of its past.
Consider the inference of evolutionary process that
comes from the analysis of genomic duplications reported
by Evan Eichler’s group [1]. By simple computational
algorithms, these researchers have found that nearly one-third of the duplications in the human genome are not
present in chimpanzee and that, overall, the process of
duplication has altered more nucleotides than have single-nucleotide substitutions since our common ancestry with
chimpanzees. Despite all this fluidity that has been
introduced by gene duplication and rearrangement,
there is variation in the degree of maintenance of synteny
(chromosomal co-occurrence of groups of genes) across
disparate species. Why should this be so? Is it simply
chance, or is there a level of genomic organization that
has functional consequences? This second question is
www.sciencedirect.com 0169-5347/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.tree.2006.04.004
a challenge because it demands comparison across
multiple disparate species and their genomes.
There has been some recent progress on making
evolutionary inferences about the determinants of genome
size by the analysis of transposons [2], or through the
analysis of gene loss in parasitic plants [3] and
endosymbiotic microbes [4]. It seems to be a common
theme in evolution that when genes are no longer needed,
especially after adopting a parasitic life style, then they
are simply lost. Parasitic plants, for instance, can dispense
with the maintenance of costly genes involved in
photosynthesis. Gene transfers, both horizontal (i.e.
movement of segments of genomes from across distantly
related species) and otherwise, remain challenging partly
because they break the rules of phylogenetic inference
[5,6]. For genetic sequence divergence that remains
concordant with a species phylogeny, sophisticated analysis remains possible, but once genes undergo horizontal
transfer, their phylogeny departs from the species
phylogeny, and the best that one can do is estimate the
rate of transfer.
Many other attributes of genome structure still require
evolutionary explanations. Why are genomes structured
to have regions with such wildly fluctuating gene density
(‘deserts and jungles’)? Is this adaptive in any sense, or
does it arise as a result of local constraints on sequence
evolution? What is the role of heterochromatin in genome
evolution, and why is there such variation among
organisms in how much heterochromatin their genomes
bear? Centromeres have a central role in chromosome
disjunction, but their structure gives no clue as to how
they function or how their functions evolve, other than the
likelihood that selfish elements drive them [7]. Although
there is already a library full of papers on the evolution of
transposable elements, it is clear that the analysis of
multiple complete genome sequences will enable an
unprecedented assessment of their intragenomic demography. Horizontal transfers can occur among highly
unpredictable pairs of species. There is even movement
of genetic material from the mitochondrion out to the
nuclear genome, and many higher organisms experience
this at a sufficient rate that the complete mitochondrial
genome occurs in pieces within the nuclear genome [8].
The initial genome sequences of the ascidian Ciona
intestinalis and of the mosquito Anopheles gambiae were
only partially successful owing to problems in assembling
some regions of the genome that appeared to be
maintaining two widely disparate haplotypes. Were
these the result of ancient introgression events? What
maintains these regions in the genome? Across different
individuals are the same regions maintaining this
heterogeneity? How common is this kind of genome
structure?
Finding functional elements
One can know little about the field of evolutionary biology
and still make use of the idea that conserved elements of
genome sequences are more likely to be functional than
are regions that are substituted beyond recognition. This
argument can be carried too far (Figure 1), and it is best to
avoid the conclusion that just because something is
conserved it must have a vital function. There might be
another molecular process that gives a portion an
abnormally low mutation rate, for example by virtue of
palindromic repeats whose head-to-head redundancy
enables gene conversion across the repeats. A widely
[Figure 1 plot: percent-identity tracks for the APOE region in Gibbon, Baboon, Marmoset, Mouse, Dog, Cow, Chicken and Fugu; x-axis: position on chr 19 (Mb), 50.100 to 50.106. Source: TRENDS in Ecology & Evolution.]
Figure 1. Identifying conserved DNA sequence elements. Alignment of genome sequences and tools for visualizing those alignments have proven remarkably useful in
identifying how evolution has resulted in the conservation of genetic elements and the erasure of nonessential sequence. The figure shows a multi-species alignment of a
homologous region to chr 19 in humans spanning the gene encoding apolipoprotein E. Each band of the figure has a Y axis spanning from 50% to 100% sequence identity.
The three exons, indicated in blue, show strong conservation in all mammals, whereas the gene is missing altogether from chicken and fugu. The phylogenetic tree shows
only the rough topology of the phylogeny and not the temporal scale. For more information on related visualization tools, see http://genome.lbl.gov/vista/index.shtml (tools
from this website were used to generate the figure).
cited conclusion drawn from the sequencing of the mouse
genome is that, because of the pattern of conservation
with human genome sequence, 5% of mammalian
genomes are undergoing purifying natural selection, a
form of natural selection that results in the conservation
of sequence across species [9]. This estimate has not faced
any serious challenge, but the initial calculation is based
on a simple model that cannot accommodate the
intricacies of variation in the process of mutation itself.
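
The kind of conservation profile shown in Figure 1 can be approximated by scoring percent identity in sliding windows along a pairwise alignment. The following is a minimal sketch, not the method used by the VISTA tools: the function names, the window and step sizes and the 70% threshold are all illustrative assumptions, and gapped columns are simply counted as mismatches.

```python
def window_identity(seq_a, seq_b, window=10, step=5):
    """Percent identity of two aligned sequences in sliding windows.

    Columns where either sequence has a gap ('-') count as mismatches.
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # A column matches only if both bases are identical and not a gap.
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(seq_a, seq_b, window=10, step=5, threshold=70.0):
    """Start positions of windows whose identity meets the threshold.

    The threshold is an arbitrary illustrative cut-off, not a calibrated one.
    """
    return [start
            for start, pid in window_identity(seq_a, seq_b, window, step)
            if pid >= threshold]
```

As the text cautions, a window that passes such a threshold is only a candidate functional element: a locally low mutation rate would produce the same signal.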
Recently, a series of papers reported on exceptionally
conserved non-protein-encoding elements in the genome
[10–13]. These regions are polymorphic within species and
have a spectrum of nucleotide frequencies that, in a
manner consistent with purifying selection, departs from
that expected under neutrality [14]. The existence of
polymorphism rules out the hypothesis that the exceptionally slow divergence might arise from a simple lack of
mutation. It seems inescapable that these regions must
encode a conserved function. The conserved elements are
probably involved in a variety of disparate functions. One
such function is encoded by noncoding RNAs or microRNAs, which have a key role in regulating gene
expression, and whose double-stranded structure presents
a signal that is amenable to genome-wide scanning [15].
Recently, it was shown that a conserved enhancer element
is related to an ancient retroelement, owing to its
redundancy and within-genome similarity (Gill Bejerano,
pers. commun.).
In addition to seeking regions with exceptionally low
rates of DNA base substitution, attempts have been made
to identify functional regions by their unusually low rates
of insertion and deletion in the sequence [16]. This
approach also finds regions that are remarkably devoid
of change in the human versus mouse comparison, and a
model fit suggests that ~3% of the genome appears to be
undergoing purifying natural selection (assuming a
reasonable distribution of insertion rate variation).
One of the most important first tasks for investigators
studying genomic sequence is to identify the protein-encoding genes, a task that remains surprisingly challenging. The reason that such genes are so hard to find is that
they are so diverse in their structures, the signatures of
different components of genes can be subtle and genes can
overlap in the most bizarre ways. The panoply of nested
genes, overlapping genes, genes with shared exons and
exons that are read in more than one reading frame is
sufficiently bizarre to make one wonder whether accurate
gene finding is something that a computer will ever be able
to do. A development that will only continue to increase in
power and utility is to make use of pairwise alignments
and multi-alignments to find genes [17]. These methods
use both the among-species conservation and the conservation of putative functional attributes of genes (start
codon, splice signals and polyadenylation signals) to
identify probable features of genes. Possession of a
complete gene list provides a powerful tool for comparison
between species, because the inference of the absence of a
gene implies that the respective function is either lost or
replaced by another gene. Such exhaustive lists of
character states (gene presence or absence) are also
remarkably informative for phylogeny reconstruction.
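
To illustrate how gene presence or absence can serve as character data, the sketch below computes a pairwise distance between species from their gene lists. It is an assumption-laden toy: the Jaccard distance is chosen here for simplicity and is not a method named in the text, and the function name and the input format (a mapping from species name to a set of gene identifiers) are hypothetical.

```python
from itertools import combinations


def presence_absence_distances(gene_sets):
    """Pairwise Jaccard distances between species' gene sets.

    gene_sets maps a species name to the set of genes it carries.
    Distance = 1 - |shared| / |union|; 0 means identical gene content.
    """
    distances = {}
    for sp1, sp2 in combinations(sorted(gene_sets), 2):
        union = gene_sets[sp1] | gene_sets[sp2]
        shared = gene_sets[sp1] & gene_sets[sp2]
        # Two empty gene lists are treated as identical (distance 0).
        distances[(sp1, sp2)] = 1.0 - len(shared) / len(union) if union else 0.0
    return distances
```

A distance matrix of this kind could be passed to any standard distance-based tree-building method; a full analysis would instead model gene gain and loss along the branches.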
As well as finding genes, the identification of regulatory
regions of genes is another problem that relies on
evolutionary conservation. It seemed as though the entire
genomics community was expecting the analysis of the
mouse genome sequence to identify all the conserved
regulatory elements between human and mouse, and to
use this conservation to annotate the regulatory features of
the human genome. Unfortunately, many of the regulatory
elements are not conserved, and many of the small
upstream gene regions that are conserved appear not to
have a regulatory function [18]. By comparing sequences of
many closely related primates, Boffelli et al. [19] found that
regulatory regions have a deficit of nucleotide differences
compared with flanking regions, and so they leave a
shadow of their presence. This ‘phylogenetic shadowing’
works even better than was initially proposed, and
extensions of this approach have been used successfully
to reveal regulatory features of genomes. The goal of
finding regulatory features is so important that another
major initiative of the National Human Genome Research
Institute (http://www.nhlbi.nih.gov) was launched to
identify exhaustively the functional features of 1% of the
human genome, selected to span the range of gene density,
recombination rate and several other attributes. This
‘ENCODE’ project (http://www.genome.gov/10005107) is
proving to be a goldmine for the detailed analysis of a
small portion of the genome. The project also underscores
the nature and depth of our ignorance of the remaining
99% of the genome by enabling us to compare the
analysis of the low- and high-resolution data for these
regions. This was especially well used in the Haplotype
Map project (http://www.hapmap.org), where the inference
of linkage disequilibrium with unobserved variation could
be tested in the denser ENCODE regions of the
genome [20].
Evolutionary biologists all too often find themselves in
the maddening situation of having to rely on sequences of
reasonably good quality, but whose annotation of
functional attributes seems to be little more than a
guess. One supposes that the reason so little funding
goes into genome annotation compared with sequence
generation is that annotation is somehow less glamorous,
and that perhaps in the future automatic annotation will
get better. In the meantime, a bigger limit to progress in
evolutionary genomic analysis is poor annotation rather
than lack of sequence.
Non-uniformity of the substitution process
One of the first models of molecular evolution was to
suppose that the four bases of DNA undergo substitution
at equal rates, μ. This Jukes-Cantor model [21] provided a
quantitative relation between the substitution rate and
the level of sequence divergence as a function of time.
However, it was woefully inadequate because empirical
data showed that transitions occur at a much greater rate
than transversions, giving rise to a model with two
parameters, one for each class of mutation. Now, there is
a large diversity of such substitution models and a suite of
tools that enable one to fit these models (by estimating the
parameters that quantify the various rates) to sequence
alignments. However, as one scans along a genome, these
substitution rate estimates vary widely. Even worse, the
best-fitting model for the substitution process for one
region of the genome has a different structure from the
best-fitting model for other genomic regions. This is
especially evident in mammals, whose genomes have
great within-species heterogeneity in base composition,
with regions of high GC content having a different
substitution rate matrix compared with regions of low
GC content. Remarkably, translocations of genetic
material from a region of low GC to high GC content
result in a shift in the substitution process so that the
newly arrived segment tends, over time, to match the GC
content of the surrounding sequence. As proud as we are of
these modeling efforts, we also know that they fail to
capture all the details of how substitutions occur. For
example, it is clear that even noncoding sequences have
strong biases in dinucleotide content [22]. This implies
that a better-fitting model would accommodate these
neighboring base effects, and could not be represented by
a simple matrix of transitions from one single base to the
next. One particular class of neighboring-base effect is the
direct empirical observation that CpG dinucleotides in
mammals have about ten times the mutation rate of
other dinucleotides.
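
The relation between observed differences and evolutionary distance under the one-parameter and two-parameter models described above can be sketched as follows. This is a minimal illustration using the standard closed-form corrections, with p the proportion of differing sites, P the proportion of transitions and Q the proportion of transversions; the function names are hypothetical, and the sketch ignores the rate heterogeneity and neighboring-base effects that the text identifies as the real difficulty.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (sites, transitions, transversions) over ungapped A/C/G/T columns."""
    sites = ts = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        # A transition stays within purines or within pyrimidines.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return sites, ts, tv


def jukes_cantor_distance(seq_a, seq_b):
    """One-parameter correction: d = -(3/4) ln(1 - 4p/3)."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p = (ts + tv) / sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(seq_a, seq_b):
    """Two-parameter correction: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))
```

Both corrections exceed the raw proportion of differences because they allow for multiple substitutions at the same site; they diverge from each other when transitions and transversions occur at unequal rates.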
Transposable elements provide a useful tool for
comparing the substitution process across different parts
of the genome, because they started from multiple
identical sequences and the elements observed at present
can be placed on a phylogeny with directional substitutions. Singh et al. [23] applied this approach to the 1500
nonfunctional DNAREP1-DM elements in the genomes of
Drosophila melanogaster and found enormous heterogeneity in the process of substitution. One of these biases
is a recombination-associated GC bias, which can explain
the positive correlation between GC content and local
recombination rates.
Inference of adaptive evolution of protein-encoding
genes
One of the problems in evolutionary analysis that predated genomics and yet provides important insight into
genomic analysis is the inference of past adaptive
evolution based on DNA sequence. Positive natural
selection has been inferred by a variety of approaches,
including the relative rates of substitution at synonymous
(amino-acid preserving) and nonsynonymous (amino-acid
changing) sites [24,25], comparisons between levels of
polymorphism and divergence [26,27] and analysis of
geographical patterns of variation [28]. All these methods
can be applied at a genome-wide level once there is
genome-wide alignment of protein-encoding genes, and,
more rarely, when there is genome-wide sequence data on
multiple individuals to provide polymorphism data (e.g.
[29]). One is tempted to think that this means that any
microsatellite or single-nucleotide polymorphism (SNP)
data, such as the data from the human international
Haplotype Map project, could apply, but, unfortunately,
most of the methods perform best with full sequence data.
Even with complete sequence data, the past demographic
changes in the population can result in departures of the
patterns of polymorphism (the site frequency spectrum in
particular) from the neutral case. Thus, the challenge in
doing genome-wide analysis is to make use of the contrasts
of many genes embedded in the same population
demography, and to make tests of selection that are robust
to demographic effects [30,31].
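
One way to make the comparison between polymorphism and divergence concrete is a 2×2 table of synonymous and nonsynonymous counts for a single gene, tested for independence. The sketch below is an assumption for illustration only: the text does not say which test to use, the function names are hypothetical, the counts in the usage example are invented, and nothing here corrects for the demographic effects discussed above.

```python
from math import comb


def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds).

    Pn, Ps: nonsynonymous and synonymous polymorphisms within a species.
    Dn, Ds: nonsynonymous and synonymous fixed differences between species.
    NI below 1 indicates an excess of nonsynonymous divergence.
    """
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Invented counts for a single hypothetical gene.
ni = neutrality_index(pn=5, ps=20, dn=15, ds=10)
p_value = fisher_exact_two_sided(15, 10, 5, 20)
```

Applied gene by gene across a genome, the same table lets many genes that share one demographic history be contrasted with each other, which is the design the paragraph above argues for.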
Maximum likelihood methods for estimating rates of
divergence and for tests of the neutral null hypothesis
have been scaled up to comparisons of protein sequences of
human, chimpanzee and mouse [32,33]. This choice of
species was not optimal for the test because human and
chimpanzee are too closely related, and mouse is too
distantly related from primates. By choosing sets of
species with evolutionary divergence appropriate for
these methods, such as the six or so species most closely
related to Drosophila melanogaster, or the plant species in
the family Solanaceae [34], these methods will be
particularly useful for identifying lineage-specific changes
in selective pressures. The methods continue to be refined,
especially as we learn about the roles of variation in the
nucleotide substitution process, and of the way that
changes in demographic history can impact patterns of
genetic variation. One exceptional application of patterns
of polymorphism and divergence is to identify particular
residues in proteins that are likely to have deleterious
consequences by virtue of being evolutionarily invariant
across multiple species, but showing a radical change in
the mutant in question [35,36]. When one compares the
level of conservation across mammals at amino acid
residues, those positions that harbor major mendelian
disorders in humans tend to be more conserved than are
residues that are polymorphic in humans without medical
consequence. Similarly, there is some promise for methods
that predict alleles of human genes that could be
associated with disease, again based only on polymorphism and divergence data [29].
Looking forward: major growth areas in comparative
genomics
The first and most obvious trend in comparative genomics
that is easy to predict is that the data will continue to pour
in at a mind-numbing rate. I wrote this review before the
raft of analysis of the dozen Drosophila genome sequences
were open for free analysis (http://rana.lbl.gov/drosophila).
These genome data will enable any analysis to be done
that makes use of a phylogeny on a genome-wide scale.
We have seen a beginning to such phylogeny-based
analysis in mammals (e.g. [37]), and the power of these
approaches is much improved with five and more
species [38].
The second most obvious trend is that the tools by
which comparative genomics is being done now will see
considerable improvement. Methods for whole-genome
alignment are currently a crude shadow of what should
be possible. Alignment tools should combine the best
inference of functional elements, especially protein-encoding genes, and the alignments should be done
applying the best evolutionary models for the conservation of those functional elements. Genome browsers for
representing multiple species genome-wide alignments
are a nice start, but they are cumbersome, error prone
and also need radical improvement. We thus need easier
ways to visualize and extract comparative sequence
information [39].
As the tools are improved, and as more intraspecific
genome variation becomes available, the power to make
inference about microevolutionary processes will increase,
because the most powerful tests of function make use of
patterns of polymorphism within species as well as
divergence between species. The best way to connect
sequence differences with functional differences is
through experimental manipulation of model organisms.
I would argue that the best path to understanding how
species diverge at a genomic level is to better understand
those processes within populations that generate and
maintain genomic variation. A massive experiment is
being done with humans in the medical genetics community, where studies are assessing associations between
variation at phenotypic and genotypic levels by directly
testing for association in large samples that are measured for
medically relevant phenotypes and 500 000 or more SNPs.
Such studies are being carried out by the Wellcome Trust
Case-Control Consortium (http://www.wtccc.org.uk)
and one of many such efforts sponsored by the US
National Institutes of Health is to perform genotyping
of ~500 000 SNPs in the Framingham Heart Study
(http://www.framingham.com/heart/) for purposes of
identifying genetic variation associated with elevated
risk for heart disease. This kind of genome-wide assessment of polymorphism provides unprecedented power
with which to quantify variation and make inferences
about past demographic and selective forces. The efforts in
medical applications have driven the cost of genotyping
down far enough that it seems inevitable that this kind of
data will become available for a variety of organisms.
There is a huge challenge in designing not only the
analytical procedures for making sensible statistical
inference from such data, but also in optimizing the
experimental design in the first place.
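
The statistical challenge of testing roughly 500 000 SNPs can be sketched with a single-SNP allelic test and a multiple-testing threshold. This is a simplified illustration under stated assumptions, not the analysis used by any of the studies mentioned: the function names are hypothetical, the test is a plain Pearson chi-square on allele counts, and the Bonferroni correction treats all SNPs as independent, which linkage disequilibrium violates.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    Returns (chi2, p_value) for cases versus controls.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    rows = [sum(r) for r in table]
    cols = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Invented allele counts for one hypothetical SNP.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
genome_wide = bonferroni_threshold(0.05, 500_000)
significant = p < genome_wide
```

With these invented counts the single-SNP p-value falls below 0.05 but is far above the genome-wide threshold, so the SNP would not be declared significant; this is why sample size and experimental design matter as much as the analysis.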
An understanding of the major revolutions in evolution
demands methods of analysis that can deal with gross
changes, such as radical reworking of body plans [40]. The
best way to understand how development changes
through evolution is to first acquire a better understanding of the gene regulatory networks within a model
species. For example, the extraordinary engineering of the
endo16 regulatory network in sea urchins took years of
work for Eric Davidson’s group to tease apart [41], and
now a comparative analysis of genetic regulation of early
development in the sea star is revealing many significant
parallels [42]. The future for evolutionary developmental
genetics is certainly bright.
Twenty years spanning the origin of genomics
The idea of sequencing the full genome of multiple
organisms could have been on the minds of some science
fiction writers in 1986, but most of the readers of the first
issue of TREE would have been stunned to know that, in
just 20 years, genomics would have progressed as far as it
has. Although there were a few complete genome
sequences known in 1986, including the complete human
mitochondrial genome sequence [43], the richness of the
questions that can be pursued with whole-genome data
continues to inspire cohorts of students and researchers.
Much of this is driven by a combination of computational
rigor and novel statistical modeling. The first methods to
estimate rates of substitution at synonymous and nonsynonymous nucleotide positions [44] were published just
one year prior to the inaugural issue of TREE. This has
grown to a massive enterprise that is now applied
routinely to whole genomes' worth of protein-encoding sequences
to identify genes that appear to exhibit accelerated
protein evolution.
Many questions that were prominent in evolutionary
genetics before 1986 remain so, and have been enriched by
genomic data. The idea of understanding speciation by
experimentally determining the genetic basis for interspecific hybrid sterility and inviability was an active field
and is now one that flourishes anew through tools that
enable the determination of the individual genes that have
large effects on hybrid performance [45–47]. The depth of
our understanding of hybrid gene function has been
greatly expanded by being able to query the transcript
abundance of almost the entire genome [48], and
dysregulation of gene networks in hybrids provides a
unique view of how those networks evolve in the
first place.
As well as the great intellectual satisfaction that is
coming from discoveries in comparative genomics, it is a
branch of evolutionary biology that also has enormous
practical benefit. Perhaps the most obvious is that the
most rapid way to understand which aspects of a genome
have functional roles is through patterns of sequence
conservation, and our ability to annotate functional
elements of our own genome relies almost entirely on
this approach [20]. But the same could be said not only of
the genomes of model organisms, but about the genomic
basis for phenotypes of crucial pathogenic importance,
such as the transmission of Plasmodium by Anopheles
mosquitoes [49,50]. Dobzhansky’s claim is overused, but
one thing that genomic science has made patently obvious
is the essential role of evolutionary thinking in making
sense of biology.
References
1 Cheng, Z. et al. (2005) A genome-wide comparison of recent
chimpanzee and human segmental duplications. Nature 437, 88–93
2 Petrov, D.A. et al. (2000) Evidence for DNA loss as a determinant of
genome size. Science 287, 1060–1062
3 Wolfe, K.H. et al. (1992) Function and evolution of a minimal plastid
genome from a nonphotosynthetic parasitic plant. Proc. Natl. Acad.
Sci. U. S. A. 89, 10648–10652
4 Moran, N.A. (2003) Tracing the evolution of gene loss in obligate
bacterial symbionts. Curr. Opin. Microbiol. 6, 512–518
5 Mower, J.P. et al. (2004) Plant genetics: gene transfer from parasitic to
host plants. Nature 432, 165–166
6 Lolle, S.J. et al. (2005) Genome-wide non-mendelian inheritance of
extra-genomic information in Arabidopsis. Nature 434, 505–509
7 Malik, H.S. et al. (2002) Recurrent evolution of DNA-binding motifs in
the Drosophila centromeric histone. Proc. Natl. Acad. Sci. U. S. A. 99,
1449–1454
8 Richly, E. and Leister, D. (2004) NUMTs in sequenced eukaryotic
genomes. Mol. Biol. Evol. 21, 1081–1084
9 Waterston, R.H. et al. (2002) Mouse Genome Sequencing Consortium.
Initial sequencing and comparative analysis of the mouse genome.
Nature 420, 520–562
10 Dermitzakis, E.T. et al. (2002) Numerous potentially functional but
non-genic conserved sequences on human chromosome 21. Nature
420, 578–582
11 Dermitzakis, E.T. et al. (2003) Evolutionary discrimination of
mammalian conserved non-genic sequences (CNGs). Science 302,
1033–1035
12 Bejerano, G. et al. (2004) Ultraconserved elements in the human
genome. Science 304, 1321–1325
13 Siepel, A. et al. (2005) Evolutionarily conserved elements in
vertebrate, insect, worm, and yeast genomes. Genome Res. 15,
1034–1050
14 Drake, J.A. et al. (2006) Conserved noncoding sequences are
selectively constrained and not mutation cold spots. Nat. Genet. 38,
223–227
15 Lall, S. et al. (2006) A genome-wide map of conserved microRNA
targets in C. elegans. Curr. Biol. 16, 460–471
16 Lunter, G. et al. (2006) Genome-wide identification of human
functional DNA using a neutral indel model. PLoS Comput. Biol. 2, e5
17 Korf, I. et al. (2001) Integrating genomic homology into gene structure
prediction. Bioinformatics 17(Suppl. 1), S140–S148
18 Wray, G.A. et al. (2003) The evolution of transcriptional regulation in
eukaryotes. Mol. Biol. Evol. 20, 1377–1419
19 Boffelli, D. et al. (2003) Phylogenetic shadowing of primate sequences
to find functional regions of the human genome. Science 299,
1391–1394
20 ENCODE Project Consortium. (2004) The ENCODE (ENCyclopedia
Of DNA Elements) Project. Science 306, 636–640
21 Jukes, T.H. and Cantor, C.R. (1969) Evolution of protein molecules. In
Mammalian Protein Metabolism III (Munro, H., ed.), pp. 21–132,
Academic Press
22 Karlin, S. and Mrazek, J. (1997) Compositional differences within and
between eukaryotic genomes. Proc. Natl. Acad. Sci. U. S. A. 94,
10227–10232
23 Singh, N.D. et al. (2005) Genomic heterogeneity of background
substitutional patterns in Drosophila melanogaster. Genetics 169,
709–722
24 Hughes, A.L. and Nei, M. (1988) Pattern of nucleotide substitution at
major histocompatibility complex class I loci reveals overdominant
selection. Nature 335, 167–170
25 Nielsen, R. and Yang, Z. (1998) Likelihood models for detecting
positively selected amino acid sites and applications to the HIV-1
envelope gene. Genetics 148, 929–936
26 Hudson, R.R. et al. (1987) A test of neutral molecular evolution based
on nucleotide data. Genetics 116, 153–159
27 McDonald, J.H. and Kreitman, M. (1991) Adaptive protein evolution
at the Adh locus in Drosophila. Nature 351, 652–654
28 Akey, J.M. et al. (2004) Population history and natural selection shape
patterns of genetic variation in 132 genes. PLoS Biol. 2, e286
29 Bustamante, C.D. et al. (2005) Natural selection on protein-coding
genes in the human genome. Nature 437, 1153–1157
30 Jensen, J.D. et al. (2005) Distinguishing between selective sweeps and
demography using DNA polymorphism data. Genetics 170, 1401–1410
31 Wright, S.I. et al. (2005) The effects of artificial selection on the maize
genome. Science 308, 1310–1314
32 Clark, A.G. et al. (2005) Ascertainment bias in studies of human
genome-wide polymorphism. Genome Res. 15, 1496–1502
33 Nielsen, R. et al. (2005) A scan for positively selected genes in the
genomes of humans and chimpanzees. PLoS Biol. 3, 976–985
34 Mueller, L.A. et al. (2005) The SOL Genomics Network: a comparative
resource for Solanaceae biology and beyond. Plant Physiol. 138,
1310–1317
35 Sunyaev, S. et al. (2001) Prediction of deleterious human alleles. Hum.
Mol. Genet. 10, 591–597
36 Ng, P.C. and Henikoff, S. (2003) SIFT: Predicting amino acid changes
that affect protein function. Nucleic Acids Res. 31, 3812–3814
37 Clark, A.G. et al. (2003) Inferring nonneutral evolution from human–
chimp–mouse orthologous gene trios. Science 302, 1960–1963
38 Wong, W.S. et al. (2004) Accuracy and power of statistical methods for
detecting adaptive evolution in protein coding sequences and for
identifying positively selected sites. Genetics 168, 1041–1051
39 Miller, W. et al. (2004) Comparative genomics. Annu. Rev. Genomics
Hum. Genet. 5, 15–56
40 Angelini, D.R. and Kaufman, T.C. (2005) Comparative developmental
genetics and the evolution of arthropod body plans. Annu. Rev. Genet.
39, 95–119
41 Yuh, C.H. and Davidson, E.H. (1996) Modular cis-regulatory
organization of Endo16, a gut-specific gene of the sea urchin embryo.
Development 122, 1069–1082
42 Otim, O. et al. (2005) Expression of AmHNF6, a sea star orthologue of
a transcription factor with multiple distinct roles in sea urchin
development. Gene Expr. Patt. 5, 381–386
43 Anderson, S. et al. (1981) Sequence and organization of the human
mitochondrial genome. Nature 290, 457–465
44 Li, W-H. et al. (1985) A new method for estimating synonymous and
nonsynonymous rates of nucleotide substitution considering the
relative likelihood of nucleotide and codon changes. Mol. Biol. Evol.
2, 150–174
45 Presgraves, D.C. et al. (2003) Adaptive evolution drives divergence of
a hybrid inviability gene between two species of Drosophila. Nature
423, 715–719
46 Barbash, D.A. et al. (2003) A rapidly evolving MYB-related protein
causes species isolation in Drosophila. Proc. Natl. Acad. Sci. U. S. A.
100, 5302–5307
47 Sun, S. et al. (2004) The normal function of a speciation gene,
Odysseus, and its hybrid sterility effect. Science 305, 81–83
48 Ranz, J.M. et al. (2004) Anomalies in the expression profile of
interspecific hybrids of Drosophila melanogaster and Drosophila
simulans. Genome Res. 14, 373–379
49 Richards, S. et al. (2005) Comparative genome sequencing of
Drosophila pseudoobscura: chromosomal, gene, and cis-element
evolution. Genome Res. 15, 1–18
50 Christophides, G.K. et al. (2004) Comparative and functional
genomics of the innate immune system in the malaria vector
Anopheles gambiae. Immunol. Rev. 198, 127–148