RA Fisher
1890 - 1962
“Natural selection is a mechanism for generating an exceedingly high degree of improbability”
Testing for the Extreme Value Domain of Attraction of Beneficial Fitness Effects
Craig J. Beisel
Bioinformatics and Computational Biology
Department of Mathematics
www.beisel.net
Concepts
Natural Selection: The differential survival and reproduction of individuals within a population based on hereditary characteristics.
Adaptation: The adjustment of an organism or population to a new or altered environment through genetic changes brought about by natural selection.
Phenotype: The overall attributes of an organism arising from the interaction of its genotype with the environment.
Genotype: The specific genetic makeup of an individual.
Fitness: Describes the ability of a genotype to reproduce. More formally, it is defined as the ratio of the count of a genotype after one generation to its count before.
Fitness Landscape: A function mapping genotype to fitness.
Fitness Distribution: The distribution of fitness over every possible genotype in a fixed environment.
[Figure: fitness distribution with regions labeled Lethal, Moderate, High]
Mutational Landscape Model
John Maynard Smith (1920 – 2004)
First remarked that adaptation does not take place in phenotypic space, but in sequence space…
Mutational Landscape Model
Gillespie (1983)
Given a sequence of nucleotides of length L, there are 4^L possible sequences.
Each sequence has 3L neighboring sequences which are exactly one point mutation away.
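The counts above can be checked directly for a short sequence. This is a minimal sketch; the function name `one_step_neighbors` and the example sequence are illustrative choices, not part of the original talk.

```python
from itertools import product

ALPHABET = "ACGT"

def one_step_neighbors(seq):
    """Return every sequence exactly one point mutation away from seq."""
    neighbors = []
    for pos, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                neighbors.append(seq[:pos] + alt + seq[pos + 1:])
    return neighbors

L = 3
all_sequences = ["".join(p) for p in product(ALPHABET, repeat=L)]
neighbors = one_step_neighbors("ACG")

# 4^L sequences in total, 3L one-step neighbors per sequence
print(len(all_sequences), len(neighbors))  # prints: 64 9
```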
Mutational Landscape Model
Additionally, if we assume Strong Selection and
Weak Mutation (SSWM) then we can ignore the
possibility of clonal interference.
Formally, 2Ns ≫ 1 and Nμ < 1.
Therefore new mutants will fix (or not) in the
population before the next mutant arises.
Also, double mutants and neutral/deleterious
mutations can be ignored.
Mutational Landscape Model
Consider a sequence in an environment where it is currently the fittest.
A small change occurs in the environment which shifts it to be the i-th fittest sequence among its one-step mutant neighbors, where i is small.
Mutational Landscape Model
There are then i-1 more fit sequences which the
population could move to.
Notice that the fitnesses of these sequences are in
the tail of the fitness distribution.
Mutational Landscape Model
We would like to find the probability of the population
fixing mutant j when starting with sequence i.
Since we are dealing with only the tail of the fitness
distribution we can apply EVT.
Orr’s One Step Model
Assumptions
The fitness distribution is in the Gumbel domain of
attraction and therefore the fitnesses of the i-1
more fit one-step mutants can be considered to be
drawn from an ‘exponential’ distribution by GPD.
This will allow a result which is independent of the
underlying fitness distribution.
Orr’s One Step Model
Lemma
Let X1,…, Xn be iid observations where Xi ~ Exp, and let X(1) ≥ … ≥ X(n) be their order statistics, ranked from largest to smallest.
Then the spacings ΔXi = X(i) – X(i+1) are independent and exponentially distributed, with
E(ΔXi) = E(ΔX1) / i
Sukhatme (1937)
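The lemma can be checked numerically. The sketch below assumes the spacings are taken between consecutive order statistics ranked from largest to smallest, so that the first spacing is the gap between the two largest values. The function name and the sample sizes are illustrative.

```python
import random

def mean_spacings(n, reps, rate=1.0, seed=0):
    """Average spacing between the i-th and (i+1)-th largest of n exponentials."""
    rng = random.Random(seed)
    totals = [0.0] * (n - 1)
    for _ in range(reps):
        xs = sorted((rng.expovariate(rate) for _ in range(n)), reverse=True)
        for i in range(n - 1):
            totals[i] += xs[i] - xs[i + 1]
    return [t / reps for t in totals]

spacings = mean_spacings(n=6, reps=20000)
for i, s in enumerate(spacings, start=1):
    # Second column: simulated mean spacing; third: first spacing divided by i
    print(i, round(s, 3), round(spacings[0] / i, 3))
```

The two printed columns should agree up to simulation noise, and neither depends on n.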
Orr’s One Step Model
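Since the fixation probability of beneficial mutant j is Πj ≈ 2sj (Haldane 1927), the probability of moving from sequence i to sequence j is
Pij = Πj / (Π1 + … + Πi-1) = sj / (s1 + … + si-1)
where sj is the selection coefficient of mutant j relative to sequence i.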
Orr’s One Step Model
Taking the expected value…
Orr’s One Step Model
Notice, we have an expression for the expected
transition probability which is independent of the
fitness of the individual sequences and depends
only on i and j.
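The rank-only dependence can be illustrated by simulation. The sketch below assumes the closed form reported by Orr (2002), E[Pij] = (1/(i-1)) Σ 1/k for k = j, …, i-1, which does not survive in this transcript. It also assumes the fitness advantages of the i-1 fitter mutants over sequence i behave as iid exponential draws, which follows from the memoryless property under the Gumbel assumption. Function names are illustrative.

```python
import random

def expected_transition(i, j):
    """Closed-form expected probability of moving from rank i to rank j < i."""
    return sum(1.0 / k for k in range(j, i)) / (i - 1)

def simulated_transition(i, reps=20000, seed=1):
    """Monte Carlo estimate of E[s_j / sum_k s_k] for j = 1, ..., i-1."""
    rng = random.Random(seed)
    totals = [0.0] * (i - 1)
    for _ in range(reps):
        # Selection coefficients of the i-1 fitter mutants; index 0 is rank 1
        s = sorted((rng.expovariate(1.0) for _ in range(i - 1)), reverse=True)
        total = sum(s)
        for j in range(i - 1):
            totals[j] += s[j] / total
    return [t / reps for t in totals]

i = 5
sim = simulated_transition(i)
for j in range(1, i):
    print(j, round(expected_transition(i, j), 4), round(sim[j - 1], 4))
```

Changing the rate of the exponential leaves both columns unchanged, since each Pij is a ratio of selection coefficients.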
Orr’s One Step Model
Can this model be validated empirically?
Orr’s One Step Model
Experimental
Evolution
Natural Isolate ID11
Differs from G4 by ~3%
Microviridae
Host: E. coli
5577 bp
Orr’s One Step Model
20 one-step walks
9 observed mutations
Rokyta et al (2005)
Orr’s One Step Model
Concluded that Orr’s transition probabilities did not explain the data as well as the Wahl model, even after correcting the model for mutation bias.
Orr’s One Step Model
Where did Orr go wrong?
Perhaps, the tail of the fitness distribution is not in
the Gumbel domain of attraction and therefore not
exponentially distributed?
Extreme Value Theory
Field of statistical theory which attempts to describe the distribution of extreme values (maxima and minima) of a sample from a given probability distribution.
Extreme Value Theory
Notice that extreme values of a sample
generally fall in the tail of the underlying
probability distribution. For example the
maximum of a sample of size 10 from a
standard normal distribution…
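A quick simulation makes the point concrete. This is a minimal sketch with arbitrary replicate counts and seed: it draws many samples of size 10 from a standard normal and records each sample maximum.

```python
import random
import statistics

def sample_maxima(n=10, reps=10000, seed=2):
    """Maximum of n standard normal draws, repeated reps times."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(reps)]

maxima = sample_maxima()
mean_max = statistics.mean(maxima)
# Fraction of maxima lying above the center (0) of the underlying distribution
frac_above_zero = sum(m > 0 for m in maxima) / len(maxima)

print(round(mean_max, 2), round(frac_above_zero, 3))
```

Nearly every maximum lands in the upper half of the normal distribution, and the average maximum sits well out in the right tail.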
Extreme Value Theory
Since the tail is all that must be considered,
many results of extreme value theory are
independent of the underlying probability
distribution.
In fact, EVT shows almost all probability
distributions can be classified into three
groups by their tail behavior.
Extreme Value Theory
These three types are…
Gumbel: most common distributions (exponential, normal, gamma, etc.)
Fréchet: heavy-tailed distributions (e.g. Cauchy)
Weibull: distributions with finite tails
Extreme Value Theory
EVT allows all three types of tail behavior to
be described by the Generalized Pareto
Distribution (GPD)
τ (tau): scale
κ (kappa): shape
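The slide's GPD formula did not survive in this transcript. The sketch below assumes a common parametrization, F(x) = 1 - (1 + κx/τ)^(-1/κ) for κ ≠ 0 and F(x) = 1 - exp(-x/τ) for κ = 0. Under it, κ = 0 is the exponential (Gumbel) case, κ > 0 gives heavy Fréchet-type tails, and κ < 0 gives Weibull-type tails with a finite endpoint at -τ/κ. Function names are illustrative.

```python
import math
import random

def gpd_cdf(x, tau, kappa):
    """P(X <= x) for a GPD with scale tau > 0 and shape kappa."""
    if x <= 0:
        return 0.0
    if kappa == 0:                      # exponential (Gumbel domain)
        return 1.0 - math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    if z <= 0:                          # past the finite endpoint (kappa < 0)
        return 1.0
    return 1.0 - z ** (-1.0 / kappa)

def gpd_quantile(u, tau, kappa):
    """Inverse CDF for 0 <= u < 1, used for inverse-transform sampling."""
    if kappa == 0:
        return -tau * math.log(1.0 - u)
    return tau / kappa * ((1.0 - u) ** (-kappa) - 1.0)

def gpd_sample(n, tau, kappa, seed=0):
    rng = random.Random(seed)
    return [gpd_quantile(rng.random(), tau, kappa) for _ in range(n)]

for name, kappa in [("Weibull", -0.5), ("Gumbel", 0.0), ("Frechet", 0.5)]:
    draws = gpd_sample(10000, tau=1.0, kappa=kappa)
    # Largest draw: bounded by -tau/kappa = 2 for kappa = -0.5, unbounded otherwise
    print(name, round(max(draws), 2))
```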
Extreme Value Theory
The GPD not only provides the natural alternative distribution for testing against the exponential in this context; the null model κ = 0 is also nested within it, which allows the application of Maximum Likelihood and Likelihood Ratio Testing.
Maximum Likelihood and LRT
Log-Likelihood for the GPD is given…
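The expression itself is missing from this transcript. Assuming the parametrization with density f(x) = (1/τ)(1 + κx/τ)^(-1/κ - 1) for x ≥ 0, the log-likelihood of n observed excesses x_1, …, x_n would read as follows, valid only where 1 + κx_i/τ > 0 for every i:

```latex
\ell(\tau,\kappa \mid x_1,\dots,x_n) =
\begin{cases}
  -n\log\tau - \left(\dfrac{1}{\kappa}+1\right)\displaystyle\sum_{i=1}^{n}
      \log\!\left(1+\dfrac{\kappa x_i}{\tau}\right), & \kappa \neq 0,\\[2ex]
  -n\log\tau - \dfrac{1}{\tau}\displaystyle\sum_{i=1}^{n} x_i, & \kappa = 0.
\end{cases}
```

The κ = 0 line is the exponential log-likelihood and is the limit of the first line as κ → 0, which is what makes the null model nested.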
Maximum Likelihood and LRT
Distribution of the LRT test statistic?
Although a common approximation is to
assume Chi-squared with one degree of
freedom, this does not appear to be the
case here.
Distribution of the test statistic was
calculated using parametric bootstrap.
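A parametric bootstrap of the LRT statistic can be sketched as below. Several choices here are assumptions made for illustration and are not taken from the talk: the GPD density is taken as (1/τ)(1 + κx/τ)^(-1/κ - 1), the alternative is maximized by a coarse grid search rather than a proper optimizer, and κ is restricted to [-0.9, 2] because the GPD likelihood is unbounded for κ < -1. All function names are hypothetical.

```python
import math
import random

KAPPAS = [k / 10 for k in range(-9, 21)]      # shape grid: -0.9 ... 2.0, includes 0
LOG_TAUS = [-1 + t / 20 for t in range(41)]   # scale grid: mean/10 ... mean*10

def gpd_loglik(xs, tau, kappa):
    """GPD log-likelihood; -inf when a point falls outside the support."""
    if kappa == 0:
        return -len(xs) * math.log(tau) - sum(xs) / tau
    total = 0.0
    for x in xs:
        z = 1.0 + kappa * x / tau
        if z <= 0:
            return -math.inf
        total += math.log(z)
    return -len(xs) * math.log(tau) - (1.0 / kappa + 1.0) * total

def lrt_statistic(xs):
    """2 * (max log-lik under GPD - max log-lik under exponential)."""
    mean = sum(xs) / len(xs)
    null = gpd_loglik(xs, mean, 0.0)          # exponential MLE is tau = mean
    best = max(gpd_loglik(xs, mean * 10 ** e, k) for k in KAPPAS for e in LOG_TAUS)
    return 2.0 * (best - null)                # >= 0: the grid contains the null fit

def bootstrap_pvalue(xs, n_boot=200, seed=0):
    """Compare the observed statistic with statistics from simulated null data."""
    rng = random.Random(seed)
    observed = lrt_statistic(xs)
    tau0 = sum(xs) / len(xs)
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.expovariate(1.0 / tau0) for _ in xs]
        if lrt_statistic(sim) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)

rng = random.Random(1)
exp_data = [rng.expovariate(1.0) for _ in range(20)]
# Deterministic heavy-tailed example: quantiles of a GPD with tau = 1, kappa = 1
heavy_data = [u / (1 - u) for u in ((k + 0.5) / 20 for k in range(20))]

exp_stat, exp_p = bootstrap_pvalue(exp_data, n_boot=100)
heavy_stat, heavy_p = bootstrap_pvalue(heavy_data, n_boot=100)
print(round(exp_stat, 2), round(exp_p, 3))
print(round(heavy_stat, 2), round(heavy_p, 3))
```

Because the same grid-based statistic is computed on the observed and the simulated data, the bootstrap calibrates the test even though the grid search is crude.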
Maximum Likelihood and LRT
Power
Probability of rejecting the null when the
alternative is true.
1-P(Type II error)
Can we hope to reject the null with a given
data set?
Maximum Likelihood and LRT
Sensitivity Analysis
Determine the inflation of the Type I error
rate under violations of the null.
If null is rejected, what is the chance that
rejection was due to inflation of alpha due
to violations in the assumptions of the null
hypothesis?
Maximum Likelihood and LRT
Violations of the Null Assumptions
1. Small effect mutations have low
probability of fixation and therefore may
not be observed.
2. Observations include measurement error
which may be normal or log-normal.
Maximum Likelihood and LRT
The GPD is stable under shifts of the threshold, so analyze the data relative to the smallest observed value!
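Threshold stability can be verified numerically. The sketch below assumes the survival function S(x) = (1 + κx/τ)^(-1/κ), with S(x) = exp(-x/τ) at κ = 0. Under that form, the excess over a raised threshold u is again GPD with the same shape κ and scale τ + κu. The helper `shift_to_smallest` is a hypothetical illustration of re-expressing data relative to the smallest observation.

```python
import math

def gpd_survival(x, tau, kappa):
    """P(X > x) for a GPD with scale tau and shape kappa (x >= 0)."""
    if kappa == 0:
        return math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    return z ** (-1.0 / kappa) if z > 0 else 0.0

def excess_survival(y, u, tau, kappa):
    """P(X - u > y | X > u): survival of the excess over a raised threshold u."""
    return gpd_survival(u + y, tau, kappa) / gpd_survival(u, tau, kappa)

def shift_to_smallest(values):
    """Express data as excesses over the smallest observation, which is dropped."""
    ordered = sorted(values)
    return [v - ordered[0] for v in ordered[1:]]

tau, u = 1.0, 0.4
for kappa in (-0.5, 0.0, 0.5):
    direct = excess_survival(0.3, u, tau, kappa)
    shifted = gpd_survival(0.3, tau + kappa * u, kappa)   # same shape, new scale
    print(kappa, round(direct, 6), round(shifted, 6))

print(shift_to_smallest([0.12, 0.05, 0.31]))
```

The two survival columns agree for every shape, so shifting the threshold changes only the scale and leaves the test on κ unaffected.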
Maximum Likelihood and LRT
If measurement error is not considered and our test rejects, it is likely that we are safe in our conclusion, assuming the error is small.
If we fail to reject, it is likely due to the loss of power encountered when operating under a false null hypothesis.
In this case, we must reanalyze our data incorporating measurement error.
Maximum Likelihood and LRT
The likelihood equation of normal or
lognormal measurement error conditional
on the GPD has no closed form ;(
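Without a closed form, the density of an observation can still be evaluated numerically as a convolution of the GPD density with the error density. The sketch below assumes additive normal error with standard deviation sigma, truncates the integral at eight standard deviations, and uses a plain midpoint rule. These are illustrative choices, and the rule becomes inaccurate near the upper endpoint when κ < -1, where the GPD density is unbounded. The lognormal case is not shown, and all names are hypothetical.

```python
import math

def gpd_pdf(x, tau, kappa):
    """GPD density with scale tau and shape kappa; zero outside the support."""
    if x < 0:
        return 0.0
    if kappa == 0:
        return math.exp(-x / tau) / tau
    z = 1.0 + kappa * x / tau
    if z <= 0:
        return 0.0
    return z ** (-1.0 / kappa - 1.0) / tau

def normal_pdf(e, sigma):
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def noisy_density(y, tau, kappa, sigma, n_grid=2000):
    """Density of y = x + error, integrating the true value x out numerically."""
    lo = max(0.0, y - 8.0 * sigma)
    hi = y + 8.0 * sigma
    if kappa < 0:
        hi = min(hi, -tau / kappa)            # finite upper endpoint
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        x = lo + (k + 0.5) * h                # midpoint of the k-th cell
        total += gpd_pdf(x, tau, kappa) * normal_pdf(y - x, sigma)
    return total * h

def noisy_loglik(ys, tau, kappa, sigma):
    """Log-likelihood of noisy observations ys; -inf if any density is zero."""
    total = 0.0
    for y in ys:
        d = noisy_density(y, tau, kappa, sigma)
        if d <= 0:
            return -math.inf
        total += math.log(d)
    return total

ys = [0.05, 0.2, 0.4, 0.9]
print(round(noisy_loglik(ys, tau=1.0, kappa=-0.5, sigma=0.1), 4))
```

Each likelihood evaluation now costs one numerical integral per observation, which is part of why fitting this model is harder than fitting the plain GPD.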
Maximum Likelihood and LRT
Standard optimization procedures fail to
converge…
Metropolis-Hastings and Bayesian Methods
MH Algorithm
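Given X(t):
1. Generate Y(t) ~ g(y - X(t))
2. Take X(t+1) =
Y(t) with probability min(1, f(Y(t))/f(X(t)))
X(t) otherwise
If g(z) is symmetric (e.g. normal), this acceptance probability needs no proposal correction, and the chain converges to the posterior under mild regularity conditions.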
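The two steps above can be sketched as a random-walk sampler. The target used here is a stand-in chosen because its answer is known exactly: the posterior of an exponential scale τ under a flat prior, given n = 12 observations summing to 20, which has mean 20/(12 - 2) = 2. It is not the GPD-with-measurement-error posterior from the talk. The step size, chain length, and function names are illustrative assumptions.

```python
import math
import random

def metropolis_hastings(log_f, x0, steps, step_sd, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(steps):
        y = x + rng.gauss(0.0, step_sd)               # step 1: propose Y ~ g(y - x)
        log_ratio = log_f(y) - log_f(x)
        # step 2: accept with probability min(1, f(y)/f(x)), computed on the log scale
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
        chain.append(x)
    return chain

DATA_N, DATA_SUM = 12, 20.0

def log_posterior(tau):
    """Unnormalized log posterior of an exponential scale under a flat prior."""
    if tau <= 0:
        return -math.inf
    return -DATA_N * math.log(tau) - DATA_SUM / tau

chain = metropolis_hastings(log_posterior, x0=2.0, steps=20000, step_sd=0.7)
kept = sorted(chain[2000:])                           # drop burn-in, sort for quantiles
post_mean = sum(kept) / len(kept)
ci = (kept[int(0.025 * len(kept))], kept[int(0.975 * len(kept))])
print(round(post_mean, 2), tuple(round(c, 2) for c in ci))
```

Posterior means and 95% intervals like those reported on the following slides are read off the retained chain in the same way.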
Metropolis-Hastings and Bayesian Methods
tau=1, kappa=-2, sigma=.1
mean=-1.64
95%CI=(-2.70,-.826)
Metropolis-Hastings and Bayesian Methods
tau=1, kappa=-2, sigma=.1
mean=.893
95%CI=(.509,1.41)
Metropolis-Hastings and Bayesian Methods
tau=1, kappa=-2, sigma=.1
mean=-1.818
CI=(-2.23,-1.47)
Metropolis-Hastings and Bayesian Methods
tau=1, kappa=-2, sigma=.1
mean=.083
95%CI=(.034,.160)
Thanks to…
Darin Rokyta
Paul Joyce
Holly Wichman
Jim Bull
IBEST
NIH
E. coli
References
Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution 38:1116–1129.
Gillespie, J. H. 1991. The causes of molecular evolution. Oxford Univ. Press, New York.
Gumbel, E. J. 1958. Statistics of Extremes. Columbia Univ. Press, New York.
Orr, H. A. 2002. The population genetics of adaptation: The adaptation of DNA sequences. Evolution 56:1317–1330.
Orr, H. A. 2003a. The distribution of fitness effects among beneficial mutations. Genetics 163:1519–1526.
Rokyta, D. R., Joyce, P., Caudle, S. B., and Wichman, H. A. 2005. An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nat. Gen. 37:441–444.
Rokyta, D. R., Beisel, C. J., and Joyce, P. 2006. Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. Journal of Theoretical Biology 243(1):114–120.
Beisel, C. J., Rokyta, D. R., Wichman, H. A., and Joyce, P. Testing the extreme value domain of attraction for beneficial fitness effects. (Submitted to Genetics)