Download Gene Networks Have a Predictive Long-Term Fitness

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene expression programming wikipedia , lookup

Genetic algorithm wikipedia , lookup

Transcript
Gene Networks Have a Predictive Long-Term Fitness
Michael E. Palmer
Department of Biology
Stanford University
Stanford, CA 94305, USA
+1-415-867-3653
[email protected]
ABSTRACT
new experimental replicate, that genotype A will, on average,
increase by factor wA, and genotype B will increase by factor wB,
over one generation. In the field of evolutionary computation
(EC), the one-generation is predictive by fiat: we can
intentionally program the simulation such that the genotypes
will increase in number by their fitness factors, on average over
one generation. But what about the predictability of evolutionary
success over the long-term? In EC, we do not explicitly program
the long-term dynamics: they are implicitly produced by the
many interacting parts of our model. This is comparable to the
situation in biology, where the short-term rules of population
genetics are well understood, but long-term evolution is often
considered to be unpredictable, nondeterministic, and even
creative, like a “blind watchmaker” [2].
Using a model of evolved gene regulatory networks, we
illustrate several quantitative metrics relating to the long-term
evolution of lineages. The k-generation fitness and k-generation
survivability measure the evolutionary success of lineages. An
entropy measure is used to quantify the predictability of lineage
evolution. The metrics are readily applied to any system in
which lineage membership can be periodically counted, and
provide a quantitative characterization of the genetic landscape,
genotype-phenotype map, and fitness landscape. Evolution is
shown to be surprisingly predictable in gene networks: only a
small number of the possible outcomes are ever observed in
multiple replicate experiments. We emphasize the view that the
lineage (not the individual, or the genotype) is the evolving
entity over the long term. Notably, the long-term fitness is
distinct from the short-term fitness. Since evolution is repeatable
over the long-term, this implies long-term selection on lineages
is possible; the evolutionary process need not be “short-sighted”.
If we wish to evolve very complex artifacts, it will be expedient
to promote the long-term evolution of the genetic architecture by
tailoring our models to emphasize long-term selection.
Neither biologists, nor practitioners of EC, typically think of
selection as acting directly on lineages, rather than on
individuals or genotypes. However, in models in which longterm evolutionary outcomes are 1) predictable, and 2) not fully
predicted by the short-term fitness, it is often appropriate to
frame the evolutionary process as acting on lineages. Lineage
evolution will in this case describe the long-term dynamics both
simply and with sufficient accuracy. We might slightly increase
our descriptive accuracy by including the voluminous details of
organismal-level selection; but this would hide higher-level
consistencies best captured concisely as lineage-level selection.
Categories and Subject Descriptors
I.2.2 [Artificial Intelligence]: Automatic programming. D.1.2
[Software Engineering]: Automatic programming. J.3. [Life
and Medical Sciences]: Biology and genetics.
In this article, we employ the “gene networks” model of Wagner
[10] and Siegal and Bergman [8] to study the repeatability of
long-term evolutionary outcomes. We illustrate use of several
metrics of the long-term evolutionary success of lineages,
including the k-generation fitness and the k-generation
survivability. Using an entropy measure, we quantify the
predictability of long-term evolution in gene networks.
Previously, an computational model of lattice proteins has been
shown to have a predictive long-term fitness [7]; here, we will
show the same for the gene networks model.
General Terms
Measurement, Experimentation, Theory.
Keywords
gene networks, lineage selection, long-term evolution,
survivability, predictability, repeatability, entropy, complexity.
1. INTRODUCTION
2. METHODS
2.1 Gene networks model
As employed in biology, the standard one-generation fitness is
useful because it is often predictive. For example, a
microbiologist might estimate the fitnesses of two competing
strains of bacteria, in multiple replicate experiments. These
fitness estimates will enable the microbiologist to predict, in a
In the gene networks model, the phenotype of an individual is an
M-dimensional vector representing the expression rates of M
gene products, as shown on the right side of Figure 1. In the
variant of the model [5] used here, the elements of the
phenotype vector range from 0 to 1 (representing no expression,
to maximal expression).
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
GECCO’13, July 6–10, 2013, Amsterdam, The Netherlands.
Copyright © 2013 ACM 978-1-4503-1963-8/13/07…$15.00.
The heritable genotype is an M by M matrix, shown on the left
of Figure 1. Element (i, j) of the matrix represents the regulatory
action of gene product j upon the expression of gene product i,
and may be negative (inhibiting expression), zero (no effect), or
positive (promoting expression).
727
The parameter σ (set to 0.1 here) permits control of the severity
of selection (lower σ makes relative selection stronger).
Initially, a random fraction c of the matrix elements are set to
values drawn from the standard normal distribution, and the rest
are set to zero; this permits control of the amount of epistasis.
This epistatic interaction among the genes makes the adaptive
landscape of the model multimodal and complex.
Mutations of the genotype matrix are effected by selecting a new
value from the standard normal distribution for one of the nonzero matrix elements; zeroes never mutate.
At each generation, we repeat the following operations in order:
replication proportional to fitness; sampling to the carrying
capacity N; mutation; migration (if there are multiple demes).
2.2 Measuring the fitness of lineages
We define an asexual lineage as a founding individual organism
and zero or more generations of its descendants. A lineage is
therefore monophyletic. We assign each founder a unique tag
number. This tag is inherited without error, permitting us to
count the number of living descendants of a founder, or the
“membership” of a lineage, at any time. (Here, all reproduction
is asexual, but similar tracking of sexual lineages is possible if
we restrict the founders of lineages to be all the members of a
new species at the moment of speciation. With this restriction,
sexual lineages, like asexual ones, may never fuse together [6].)
Figure 1. During “development” the gene expression
vector P is repeatedly multiplied by the regulatory
matrix G, and then normalized.
We conduct multiple replicate runs of a given experimental
scenario. All replicates of a scenario are identical in their
founding individuals, their initial and target vectors (uinit and
utarg), and in their experimental parameters (e.g., the mutation
rate µ, the parameters a, c, and σ). However, the replicates differ
stochastically: a different random seed is used so that stochastic
differences arise in 1) which mutations occur, 2) how the
population is sampled down to the carrying capacity during
reproduction, and 3) random migration (in those experiments
below that include multiple demes). Thus the evolutionary
outcomes in a set of replicates of a given scenario provide a
statistical picture of what may be expected to occur in that
scenario when the initial conditions and other parameters are
fixed, but random mutation, sampling, and migration occur.
At the beginning of the developmental process, each
individual’s M-dimensional phenotype
vector is set to a fixed
.
initial vector uinit (with M elements selected uniformly from the
interval [0, 1]). As illustrated in Figure 1, each iteration of the
developmental process consists of two parts: first, the genotype
matrix and the current phenotype vector are multiplied together;
and second, the entries of the resulting vector are normalized
with the following sigmoid normalization function, which maps
each element x of the matrix product back to the interval [0, 1]:
The parameter a controls the steepness of the sigmoid function;
here, we use an a value of 1. The vector of normalized elements
becomes the phenotype vector for the next iteration of the
developmental process.
The standard one-generation fitness from population genetics
measures the expected factor of increase in the count of a
particular genotype over a time span of one generation. The
fitness is useful because it is (often) predictive: if we measure
the fitness in multiple replicates of a given scenario, we can
(often) expect that a new replicate will yield a similar factor of
increase in number for a given genotype. (The degree of
predictability will depend on the scenario, as we discuss below.)
If the individual’s phenotype converges to any vector in less
than 100 developmental iterations (rather than entering a cycle,
or not converging), then that individual is declared to be viable;
otherwise, it “dies” and is removed from the population.
Convergence is determined as follows. Let d be the distance
metric between two phenotype vectors ua and ub defined by:
Extending the standard one-generation fitness, Palmer and
Feldman [6] previously defined a long-term fitness of lineages,
as follows:
π "# N ( t ) = n $% is defined as the probability that a lineage has n
members at generation t. This can be estimated with a sufficient
number of longitudinal experimental replicates in which lineage
membership is periodically counted.
At the end of each developmental iteration, an average
phenotypic vector uavg over the last τ=10 iterations is computed.
If the average d between the last τ phenotype vectors and uavg is
below the threshold value ε=10-4, then convergence has
occurred, and the individual is declared viable.
N ( t ) is the expected number of members of the lineage at
generation t, namely,
An M-dimensional phenotypic “target vector”, utarg, the M
elements of which are floating point numbers chosen uniformly
from the interval [0, 1], defines the best possible phenotype. For
all viable offspring, we compute the fitness, f, using the distance
between the final, converged phenotype, uphen, and the target
vector:
N ( t ) = ∑ nπ "# N ( t ) = n $% .
n
Wk ( t ) is the “k-fitness” at generation t, namely, the ratio of the
expected number of members of the lineage at generation t+k, to
the expected number at generation t:
Wk (t ) =
728
N (t + k )
,
N (t )
(t ≥ 0,
k ≥ 1) .
Note that W1 ( t ) is equivalent to the standard one-generation
fitness. We will refer to k=1 as the immediate term; k>1 as the
short term; and k>>1 as the long term. Note that the standard
(k=1) fitness can only depend on the standing variation in a
lineage. However, different lineages have different tendencies to
generate new variation over time. On time spans of k>1, the
variation generated by a lineage begins to affect its k-fitness; for
k>>1 this generated variation becomes increasingly important.
H joint (k) estimates the average uncertainty in the outcome of a
new experimental replicate at generation k. Note that
0 ≤ H joint (k) ≤ L (in units of bits).
If H joint (k) = 0 , then the same state always occurs at generation
, then
k: the outcome is perfectly predictable. If
the outcomes
unpredictable.
The term evolvability has come to mean the tendency to produce
adaptive variation [12]. However, it has been shown [6, 7] that
not only adaptive variation matters: the tendency of lineage to
avoid deleterious variation, to disperse in space, and to diversify
in phenotype can also matter (the latter two are sometimes
described as “bet hedging”) to the long-term lineage fitness, and
to its long-term survival.
are
equally
likely, and
thus
maximally
We also compute H i (k) , the entropy of the survival state of
each distinct lineage i. (Below, we will show that these lineage
entropies, H i (k) , are useful for interpreting the variation in the
joint entropy, H joint (k) , over time.) The k-survivability of
lineage i, Ski ( 0 ) , is the probability of its survival to generation k
Palmer and Feldman [6] also defined the “k-generation
survivability”, Sk ( t ) , which measures the probability that a
(given that it was present at generation 0), so H i (k) turns out to
:
be a simple function of
lineage will survive to generation t+k, given that it has living
members at generation t:
.
Sk ( t ) = π ⎡⎣ N ( t + k ) ≠ 0∣N ( t ) ≠ 0 ⎤⎦ .
Because a lineage has only two possible survival states,
Selection on lineages is ultimately determined by extinction,
because once a lineage proceeds to zero members, it cannot
return. Thus Sk ( t ) is the more fundamental metric of lineage
0 ≤ H i (k) ≤ 1 (in units of bits).
The joint entropy and the lineage entropies obey the following
inequalities:
success; however, in the shorter term, extinctions may be rare,
so Wk ( t ) can be a useful proxy.
H joint (k) ≥ max i "# H i ( k ) $% , and
The question we address here is: how repeatable are
evolutionary outcomes in gene networks over the long term?
Does observing the outcome of multiple replicates give us
power to predict the long-term outcome of a new replicate?
H joint (k) ≤ ∑ H i ( k ) .
i
All of these metrics can be readily estimated in any situation in
which we can: 1) periodically count the membership of each
lineage, and 2) conduct a sufficient number of experimental
replicates. They provide a quantitative description of the
predictability of a particular experimental scenario, and
predictions of the evolutionary success of each lineage.
There are two ways that the k-fitness and k-survivability could
fail to be useful: 1) if long-term outcomes are simply not very
predictable; in that case, estimating the long-term metrics will
not be informative about what will happen in a new replicate; or
2) if the short-term outcomes (e.g., the 1-fitness) completely, or
very strongly, determine long-term outcomes.
Experimental setup
We will consider the long-term evolutionary outcome of a
replicate to be set of lineages, out of those initially present, that
survive in the long term. For example, if one lineage always
outcompetes the others, driving them extinct, in every replicate,
then the outcome is certain predictable.
We present two sets of experiments. In the single deme
experiments, we examine a single deme (D=1) of N=8192
individuals. The population is founded by L=128 founder
individuals that were screened for viability; N/L clones of each
is made for a total of N individuals. The mutation rate is µ=0.01
per individual per generation, σ=0.1, a=1, and c=0.25. We
simulated 32 scenarios; each scenario corresponds to a new set
of founders and new uinit and utarg vectors. We ran R=512
replicates of each scenario. In each replicate, all of the above
conditions are the same, except we set a different random seed
just before evolution begins; this produces different stochastic
mutations and stochastic sampling (down to the population size
N at each generation) in each replicate of a scenario. Thus the
replicates as a group paint a statistical picture of the scenario.
Each replicate runs for Gmax=2,000 generations.
We use the following quantitative metric of evolutionary
predictability [6]. Call s(i, j, k) the “survival state” of lineage i in
replicate experiment j at generation k. s(i, j, k) = 1 if lineage i
has any living members at generation k of replicate j, and 0
otherwise. The “joint survival state” M(j, k) is a binary vector
composed of elements s(i, j, k), ordered by i; it indicates the
realized survival state of all lineages at generation k of replicate
j. If there are L lineages, there are 2L possible joint survival
states at any time, in one replicate. By averaging over multiple
replicates, we can estimate pm(k), the probability that each joint
survival state m will occur at generation k.
In the multiple deme experiments, we examine a metapopulation
of D=16 demes of N=512 individuals (for a total population of
ND=8192), connected by random migration among all demes.
There were again L=128 founders in total. All ND/L=64 clones
of each founder are initially assigned to the same deme. The
initial vector, uinit, is the same in all demes, within a scenario.
However, each of the 16 demes is assigned a distinct target
vector, utarg, related in the following way: We generate a first
We compute the entropy of the joint survival state at time k,
,
as a measure of the unpredictability of evolutionary outcomes in
a particular evolutionary system (i.e., a particular scenario).
729
eventually fixes. Its long-term fitness at generation k=2,000 is
=128; thus it has increased by a factor of 128, which
vector (by drawing M elements uniformly from the range 0 to 1
inclusive) and assign it to deme 1. We make a copy of this
vector, subject each of its M elements to a chance p=0.1 of being
replaced with a new random draw, and assign this vector to
deme 2. By this process, we sequentially generate a vector for
each of the 16 demes, with a chance of the vector being
modified each time; each subsequent vector may thus become
increasingly different from the first one. This is to create some
continuity among the demes, to give new migrants a greater
chance to invade. We conducted R=512 replicates of 32 such
scenarios. Again, each scenario has different founders and target
expression vectors; and each replicate of a scenario differs only
in the stochastic events (mutation, sampling, migration) that
occur. Here, the number of generations is Gmax=100,000
(because it takes longer to reach equilibrium in a metapopulation
[9]), µ=0.01 per individual per generation, σ=0.1, a=1, and
c=0.25; the migration rate is mig=0.01.
is the maximum possible. (Since each lineage is founded with
N/L clones and the maximum count is N, the maximum possible
at any k is L.). In the center plot in the top row of Figure
2, the plot of
indicates that lineage 0018 never goes
extinct: it has a 100% chance of survival to generation 2,000; all
other lineages go extinct between approximately generation 5
and 200. This happens in all 512 replicates (otherwise they
would show a small probability of surviving); thus the
evolutionary outcome – survival of lineage 0018 only – is
extremely repeatable for this scenario (first row). The top right
panel of Figure 2 quantifies this predictability at each generation
k: H joint (k) , in black, starts at zero because, initially, all 128
lineages are certain to be alive. It increases rapidly to a value of
9. Since we performed 512 replicates, the maximum observable
joint entropy is log2(R=512)=9; this is observed when a different
joint survival state M(j, k) occurs in every replicate j. Thus our
observations of H joint (k) are “clipped” at a maximum of 9, as
3. RESULTS
3.1 Single deme
In Figure 2 we show two scenarios selected from the 32 single
deme (D=1) scenarios. Each row plots data from one scenario.
The columns show
,
, and
, respectively,
indicated by the horizontal dotted line in the entropy plots (right
column). (We would have to run more replicates in order to raise
this clipping line.) H joint (k) is high between approximately
with colored lines for each of the L=128 lineages. (The error
show the standard error of the mean.) In
bars about
generations 5 to 200 because of the uncertainty in the exact
, top
extinction order of many lineages (note in plot of
addition, the plots in the right column also show H joint (k) , in
black. All L=128 lineages are plotted; however, in each row,
only the five lineages with the highest “weight” (sum of N ( t )
center plot, that many lineages are tending to extinction).
over all t) [3] are explicitly labeled in the figure legends. (For
example, in the first scenario (row), all 128 lineages are plotted
but only the highest-weight lineages, 0018, 0115, 0003, 0103,
0039, are explicitly labeled.)
this point, we can be quite certain that a single joint survival
state will be observed: that in which all lineages except lineage
0018 are extinct. We can predict with high certainty that, in a
new replicate of this scenario, lineage 0018 would again survive
alone after generation 200; in contrast, we will be unable to
predict with certainty which lineages would be surviving, or
extinct, around generation 10 of a new replicate.
In the top left of Figure 2, the plot of
Since all lineages except one (i.e., lineage 0018) go extinct by
generation ~200, H joint (k) decreases to zero by this point. After
shows that lineage
0018, in red, immediately dominates the population, and
Figure 2. Two selected single-deme scenarios (rows). Columns show k-fitness, k-survivability, and entropy. First row: red
lineage dominates. Second row: green is initially fitter, but red wins in the long term; red has a higher “long-term” fitness.
730
In the second row of Figure 2, we show data from another
selected scenario for D=1 (out of the 32 single deme scenarios
conducted). In this case, something interesting happens: see the
bottom left panel of Figure 2. Although lineage 0030 (red)
dominates in the long term, it is lineage 0107 (green) that has
the higher
in the short term. Thus, the short-term fitness
and the long-term fitness are distinct. The initial standing
variation in a lineage (in our experiments, a lineage initially
consists of clones of the founding individual) contributes to the
immediate (k=1) and short term (k>1) fitness; however, the new
variation that a lineage produces contributes to its longer-term
(k>>1) fitness. The green lineage is initially more fit, but the red
lineage adapts more effectively. Thus, the green lineage has a
higher 30-fitness; we expect it to increase by about 95x by
generation 30 (i.e., its
= 95). The red lineage increases
only about 25x by this time. However, the red lineage has a
higher 2,000-fitness: we expect it to increase 105x by generation
= 105), while the green lineage is
2,000 (i.e., its
Figure 3. Joint entropy averaged over 32 single-deme
scenarios. Long-term uncertainty is less than one bit!
expected to increase only about 20x by k=2,000.
scenarios is plotted (error bars show the standard error of the
mean). We see that the two selected scenarios shown in the
previous figure were not atypical:
typically increases
In the center panel in the bottom row, the green lineage is only
~18% likely to survive in the long-term (k=2,000); the red
lineage has a much higher
of ~82%. In the long term,
rapidly to the maximum observable value (log2(R=512)=9), and
decreases to just 0.8 bits in the long term. A typical single-deme
scenario (with parameters as above) is more predictable than a
fair coin flip! Despite the fact that we initially begin with L=128
lineages, one can say, roughly speaking, that there are typically
only about two lineages in contention to dominate in the long
term, in any given scenario. In this sense, we say that evolution
is very repeatable in gene networks.
the marathoner, rather than the sprinter, tends to win this race.
In the right panel of the second row, notice that the long term
, in black, is not zero (unlike in the first
entropy,
row). This quantifies how uncertain the long-term outcome is in
this scenario: about 0.75 bits. We could predict the outcome of a
new replicate with less uncertainty than predicting a fair coin
flip (1 bit). The colored lines in the bottom right panel plot
from generation
H i (k) ; they make it clear that the
In a single deme, by generation 2,000, one lineage will almost
always fix. But if we split the population up into a structured
population of multiple demes, then different lineages can
survive in different demes for very long periods [9]. Will
evolution still be so repeatable in the multi-deme case?
~100 onward is mostly due to uncertainty in the extinction of the
red and green lineages; others are likely to be already extinct.
In Figure 3, the average of
over all 32 single deme
Figure 4. Two selected multi-deme scenarios (rows). The short-term and long-term fitnesses are again distinct, and outcomes
are quite repeatable (low entropy), despite the fact that multiple lineages can survive in the long term.
731
3.2 Multiple demes
initially the most fit, but loses out to lineage 0030 (red) in the
long term. In Figure 6, we can see why.
In Figure 4, we present two scenarios selected from 32 multideme scenarios (D=16 demes of N=512 individuals each, for a
total population of ND=8192). The runs are of Gmax=100,000
generations; other parameters are as specified in Experimental
setup. Again, all lineages are plotted in each panel, but only the
highest “weight” lineages are explicitly named in the panel
legends. In the first scenario (row) of Figure 4, a single lineage,
0055 (red) dominates from the beginning to the end of the
experiment, as can be seen in the top left panel. Note in the
center panel in the top row, that the red lineage always survives
in the long term, but multiple other lineages also have a chance
of surviving to generation 100,000. The sum of
over
all lineages is more than 2.0, indicating that about two lineages
survive, on average. (In contrast, in a single deme, a single
lineage will almost always fix within 2,000 generations, so the
long-term k-survivabilities of the lineages sum to ~1.0.) In the
top right panel, the joint entropy in the long term is ~2.5 bits.
In the second row of Figure 4, another multi-deme scenario is
shown. We selected a scenario in which lineage 0047 (blue) has
a high short-term fitness, but does not adapt as well as lineage
0105 (red) or lineage 0026 (green) over the long term. Thus, the
short-term and long-term fitnesses are distinct. In the middle
panel of the bottom row, multiple lineages have at least a 10%
chance of surviving in the long term (in one or more demes). In
the bottom right panel, the long-term joint entropy is ~7 bits.
Figure 6. Plots of expected counts for each unique genotype
belonging to the two highest-weight lineages (0030, red; and
0107, green) from the second scenario of Figure 2 support
the view of the lineage as the selected entity.
All genotypes that attained at N ( t ) of at least 10 are plotted1
and the black line indicates the total N ( t ) for the entire lineage.
The 20 highest-weight genotypes are explicitly named in each
panel. The names are ordered (top to bottom) by the time that
the genotype attains its peak value of N ( t ) . (Later-peaking
genotypes are fitter than earlier-peaking genotypes, in general.)
For example, the founding genotype of the red lineage has the
name, JU.XANHV.HCU.SX.X.K.D.LTUZ.FRDT.KUC. This string
is divided into ten parts separated by “.” characters; each part
corresponds to one row of the genotype’s M by M matrix.
Within one row, zero values are ignored. (Recall that, in the
model, zero values may never mutate into nonzeros.) Nonzero
values (which are normally distributed when initially generated)
are mapped through the cumulative distribution function (CDF)
of the standard normal to produce a number between zero and
one; this number is mapped uniformly onto the capital letters
ABC...XYZ. Thus an “A” and “Z” represent extreme negative
and positive matrix values, respectively. The letters within each
row are concatenated, and the rows linked with “.” characters.
The name is thus a discretized representation of the matrix, and
can be used to roughly follow inheritance and mutation by eye.
For example, the second-highest-peaking red genotype arises
Figure 5. Joint entropy averaged over 32 multi-deme
scenarios. Long-term uncertainty is still low,
considering that there are 2128 possible outcomes.
In Figure 5, the joint entropy averaged over 32 multi-deme
scenarios is shown: the uncertainty is about 5.3 bits in the long
term for the multi-deme scenarios (as defined). Starting with L
lineages, with 2L possible outcomes (joint survival states), only a
handful of the possible outcomes are observed with significant
frequency. Thus, we may again say that, even for multi-deme
scenarios, evolution in gene networks is very predictable. (See
[7] for a metric that quantifies how surprising a particular
degree of predictability is, based on the negentropy.)
3.3 The lineage as the selected entity
In Figure 6, we plot the expected counts, N ( t ) , of individual
genotypes. In the two panels of the figure, we plot genotypes
from the two highest-weight lineages from the second row of
Figure 2; this is the scenario in which lineage 0107 (green) is
1
N ( t ) is averaged over 256 replicates; at any generation in a
replicate, genotypes with a count of 1 are ignored.
732
The gene networks model used here was originally derived from
a model of artificial neural networks [11]; it is not unlikely that
commonly-used neural network models also possess a predictive
long-term fitness.
around generation 15 and peaks around generation 100; it is
named JU.XANHV.HCU.SA.X.K.D.LTUZ.FRDT.KUC. It has a
single mutation relative to the lineage founder: the row SX
changes to SA (where X represents a very positive matrix value,
and A a very negative value). This mutation and others at the
same locus (e.g., to SD, SC, SE) permit the red lineage, which is
initially less numerous than the green lineage (see bottom left
panel of Figure 2), to overtake it. Subsequently, a number of
genotypes (lower eight labels in upper (red) panel of Figure 6)
descend from these and other contemporaneous competitors in a
third wave of increase. The green lineage, founded by
GO.EOVHO.UFQS.DSG.OXU.EZ.EOHR.ZFQ.MEX.JEI,
does
diversify into many other genotypes, but fails – repeatedly in
many replicates – to promptly find a genotype to answer the
challenge of the red lineage’s SX to SA mutant (and similar
variants). This figure reinforces the view that the lineage is the
entity that possesses a long-term fitness, rather than a particular
individual or genotype. The long-term success of the red lineage
is not due only to its founding genotype, or solely to any
particular genotype. The failure of the green lineage is due to the
repeated inability of the lineage as a whole to attain superior
points in genotype space. The immediate fitness of the founding
genotype only influences, but does not fully determine, the longterm success of the lineage.
Natural selection has been described as being “short-sighted”
[1]. We have shown here that this need not be the case in gene
networks (for example, in the bottom rows of Figure 2 and
Figure 4, respectively). The long-term fitness is only partly
predicted by the short-term fitness, and is therefore distinct from
it. Long-term selection on lineages can repeatedly trump shortterm selection, under some conditions. (For example, see [6] for
the effects of population size on the balance between short-term
and long-term selection.)
In asexuals, each individual organism is the founder of a new
lineage. Also, each organism is a member of many lineages:
those founded by each of its ancestors. (We will fully treat the
extension to sexual lineages in a future paper. Here, suffice it to
say that sexual lineages are similar if we restrict the founding of
sexual lineages to speciation events: the members of each new
species comprise the founders of a lineage. This restriction
implies that sexual lineages, like asexual lineages, cannot fuse.)
When an organism dies, it affects the k-fitness and ksurvivability of many different lineages: those founded by each
of its ancestors. Likewise, the reproductive success of one
organism affects the success of its many ancestor lineages. In
the simulations discussed above, we selected a set of founding
individuals at generation zero and tracked their
and
4. DISCUSSION
A particular evolutionary process might be chaotic: a tiny
difference in the state at one moment in time will lead to
unbounded future changes in the dynamics. However, we have
shown that this is not the case in the gene network model, as
defined. Instead, long-term evolution in gene networks is very
repeatable, and hence a new replicate experiment is predictable.
This implies the existence of basins of attraction in the multidimensional space formed by the genetic landscape (i.e., the
mutational neighborhood in this, asexual, case), the genotypephenotype map (the gene network “developmental process”
illustrated in Figure 1), the fitness landscape (due to the fitness
function and target expression vector(s) in a particular scenario),
and other details of the population genetic model. The entire
metapopulation as a whole can be considered to follow these
basins of attraction, from the initial state, to the final long-term
state (as described by the joint survival state of the lineages at
generation k). Although different stochastic events (mutation,
sampling, migration) occur between replicates, nonetheless the
basins of attraction, or “evolutionary pathways”, are constant
within an experimental scenario, and we have shown that they
converge to a relatively small number of outcomes with
significant probability. The metrics illustrated here give a
quantitative summary of this high-dimensional space of
pathways. Despite this complexity, the metrics are easily
measured in any model in which one can: 1) periodically count
the membership of each lineage, and 2) conduct a sufficient
number of experimental replicates of a given scenario.
; but we could have also selected any individual at some
generation t of any replicate and “forked” the replicate at that
and
of the newlypoint, in order to compute the
founded lineage from that point. Thus, selection on lineages is
layered and recursive: many layers of overlapping lineage
selection are proceeding simultaneously at all times.
The “viewpoint” of the lineage as the selected entity may seem
alien. To a lineage, survival is more fundamentally important
than increase to high numbers of individual members [6].
Individuals are expendable, until their numbers become small.
While lineages do have to be immediately fit to survive in the
short term, their ability to adapt and diversify is also important
to their survival over the long term. Individual organisms do not
have such long-term concerns.
If we wish to employ the power of artificial evolution on a large
scale to produce artifacts of great complexity – artifacts that are
more complex than we can design or code manually – then we
must understand how to harness long-term selection. Without a
degree of long-term repeatability, the evolutionary process is
doomed to “short-sightedness”, and reduced to brute-force
search, or worse: we may apply datacenter-scale computing, but
still remain trapped in short-term evolutionary dead ends. Using
the metrics illustrated here, we can measure long-term
predictability in any given model. One can then tweak and
adjust the model – and measure again – in an attempt to
emphasize long-term over short-term selection. This suggests a
modus operandi to harness long-term evolution to efficiently
search very high-dimensional landscapes. It is over the long
term that the “magic” of evolution occurs: the gradual accretion
of genetic and phenotypic complexity. Emphasizing longer-term
selection will enable evolutionary methods to construct a path of
multiple stepping-stones – ones that we cannot ourselves foresee
– to meet the complex goals that we define for it.
Repeatability of evolution has previously been shown for an in
silico model of lattice proteins [7]. If the present result in gene
networks, and the result in lattice proteins, carry over to
biological regulatory networks and biological proteins, then
much of biological evolution might be conceived of as
proceeding via similar high-dimensional basins of attraction.
Indeed, evolutionary repeatability has been observed in vivo: in
a model of protein evolution [4, 13] and in bacteria [14].
733
5. ACKNOWLEDGMENTS
[7] Palmer, M.E. et al. 2013. Long-term evolution is
surprisingly predictable in lattice proteins. Journal of the
Royal Society Interface. 10, 82 (Feb. 2013), 20130026.
This work was supported in part by NIH grant GM28016.
6. REFERENCES
[8] Siegal, M.L. and Bergman, A. 2002. Waddington's
canalization revisited: developmental stability and
evolution. Proceedings of the National Academy of
Sciences. 99, 16 (Aug. 2002), 10528–10532.
[1] Clune, J. et al. 2008. Natural Selection Fails to Optimize
Mutation Rates for Long-Term Adaptation on Rugged
Fitness Landscapes. PLoS Computational Biology. 4, 9
(Sep. 2008), e1000187.
[2] Dawkins, R. 1986. The Blind Watchmaker. W. W. Norton
& Company.
[9] Slatkin, M. 1981. Fixation probabilities and fixation times
in a subdivided population. Evolution; international
journal of organic evolution. 35, 3 (1981), 477–488.
[3] Desai, M.M. and Fisher, D.S. 2007. Beneficial Mutation
Selection Balance and the Effect of Linkage on Positive
Selection. Genetics. 176, 3 (Jun. 2007), 1759–1798.
[10] Wagner, A. 1996. Does evolutionary plasticity evolve?
Evolution; international journal of organic evolution.
(1996), 1008–1023.
[4] Lozovsky, E.R. et al. 2009. Stepwise acquisition of
pyrimethamine resistance in the malaria parasite.
Proceedings of the National Academy of Sciences. 106, 29
(2009), 12025–12030.
[11] Wagner, A. 1994. Evolution of gene networks by gene
duplications: a mathematical model and its implications on
genome organization. Proceedings of the National
Academy of Sciences. 91, 10 (May. 1994), 4387–4391.
[5] Palmer, M.E. and Feldman, M.W. 2009. Dynamics of
hybrid incompatibility in gene networks in a constant
environment. Evolution; international journal of organic
evolution. 63, 2 (Mar. 2009), 418–431.
[12] Wagner, G.P. and Altenberg, L. 1996. Perspective:
Complex adaptations and the evolution of evolvability.
Evolution; international journal of organic evolution. 50, 3
(1996), 967–976.
[6] Palmer, M.E. and Feldman, M.W. 2012. Survivability Is
More Fundamental Than Evolvability. PloS one. 7, 6 (Jul.
2012), e38025.
[13] Weinreich, D.M. et al. 2006. Darwinian Evolution Can
Follow Only Very Few Mutational Paths to Fitter Proteins.
Science. 312, 5770 (May. 2006), 111–114.
[14] Woods, R.J. et al. 2011. Second-Order Selection for
Evolvability in a Large Escherichia coli Population.
Science. 331, 6023 (2011), 1433–1436.
734