Maximum Likelihood Estimation
for Allele Frequencies
Biostatistics 666
Lecture 7
Last Three Lectures:
Introduction to Coalescent Models
• Computationally efficient framework
  • Alternative to forward simulations
• Predictions about sequence variation
  • Number of polymorphisms
  • Frequency of polymorphisms
Coalescent Models: Key Ideas
• Proceed backwards in time
• Genealogies shaped by
  • Population size
  • Population structure
  • Recombination rates
• Given a particular genealogy ...
  • Mutation rate predicts variation
Next Series of Lectures
• Estimating allele and haplotype frequencies from genotype data
  • Maximum likelihood approach
  • Application of an E-M algorithm
• Challenges
  • Using information from related individuals
  • Allowing for non-codominant genotypes
  • Allowing for ambiguity in haplotype assignments
Objective: Parameter Estimation
• Learn about population characteristics
  • E.g. allele frequencies, population size
• Using a specific sample
  • E.g. a set of sequences, unrelated individuals, or even families
Maximum Likelihood
• A general framework for estimating model parameters
• Find the set of parameter values that maximize the
  probability of the observed data
• Applicable to many different problems
Example: Allele Frequencies
• Consider…
  • A sample of n chromosomes
  • X of these are of type “a”
  • Parameter of interest is allele frequency…

  L(p | n, X) = (n choose X) p^X (1 - p)^(n - X)
Evaluate for various parameters

  p     1 - p   L
  0.0   1.0     0.000
  0.2   0.8     0.088
  0.4   0.6     0.251
  0.6   0.4     0.111
  0.8   0.2     0.006
  1.0   0.0     0.000
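The table above can be reproduced in a few lines. This is an illustrative sketch (not lecture code) that evaluates the binomial likelihood on a grid of allele frequencies:

```python
from math import comb

def binomial_likelihood(p, n, X):
    """L(p | n, X) = (n choose X) * p^X * (1 - p)^(n - X)."""
    return comb(n, X) * p ** X * (1 - p) ** (n - X)

# Reproduce the table for n = 10 chromosomes, X = 4 copies of allele "a"
n, X = 10, 4
for p in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"p = {p:.1f}   1 - p = {1 - p:.1f}   L = {binomial_likelihood(p, n, X):.3f}")
```

The likelihood is largest at p = 0.4, matching the table.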
Likelihood Plot
[Figure: likelihood vs. allele frequency for n = 10 and X = 4;
the curve peaks at p = 0.4 with a maximum likelihood of about 0.25]
In this case
• The likelihood tells us the data are most probable if p = 0.4
• The likelihood curve allows us to evaluate alternatives…
  • Is p = 0.8 a possibility?
  • Is p = 0.2 a possibility?
Example: Estimating 4Nµ
• Consider S polymorphisms in a sample of n sequences…

  L(θ | n, S) = Pn(S | θ)

• Where Pn is calculated using the Qn and P2 functions
  defined previously
Likelihood Plot
[Figure: likelihood vs. 4Nµ with n = 5, S = 10; the MLE is at the peak]
Maximum Likelihood Estimation
• Two basic steps…
  a) Write down the likelihood function
     L(θ | x) ∝ f(x | θ)
  b) Find the value θ̂ that maximizes L(θ | x)
• In principle, applicable to any problem where a
  likelihood function exists
MLEs
• Parameter values that maximize the likelihood
  • θ where the observations have maximum probability
• Finding MLEs is an optimization problem
• How do MLEs compare to other estimators?
Comparing Estimators
• How do MLEs rate in terms of …
  • Unbiasedness
  • Consistency
  • Efficiency
• For a review, see Garthwaite, Jolliffe, Jones (1995)
  Statistical Inference, Prentice Hall
Analytical Solutions
• Write out the log-likelihood …

  l(θ | data) = ln L(θ | data)

• Calculate the derivative of the log-likelihood

  dl(θ | data) / dθ

• Find the zeros of the derivative function
Information
• The second derivative is also extremely useful

  Iθ = -E[ d²l(θ | data) / dθ² ]

  V(θ̂) = 1 / Iθ

• The speed at which the log-likelihood decreases
• Provides an asymptotic variance for estimates
Allele Frequency Estimation …
• When individual chromosomes are observed, this does not seem tricky…
• What about with genotypes?
• What about with parent-offspring pairs?
Coming up …
• We will walk through allele frequency estimation in
  three distinct settings:
  • Samples of single chromosomes …
  • Samples of unrelated individuals …
  • Samples of parents and offspring …
I. Single Alleles Observed
• Consider…
  • A sample of n chromosomes
  • X of these are of type “a”
  • Parameter of interest is allele frequency…

  L(p | n, X) = (n choose X) p^X (1 - p)^(n - X)
Some Notes
• The following two likelihoods are just as good:

  L(p; X, n) = (n choose X) p^X (1 - p)^(n - X)

  L(p; x1, x2, ..., xn) = ∏(i=1..n) p^xi (1 - p)^(1 - xi)

• For ML estimation, constant factors in the likelihood
  don’t matter
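A small numeric check makes the point that the two likelihoods differ only by the constant factor (n choose X), so they share the same maximum. This is a sketch; the 0/1 data vector is made up:

```python
from math import comb, prod

# Hypothetical 0/1 outcomes for n = 10 chromosomes (1 = allele "a")
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
n, X = len(x), sum(x)
p = 0.3

L_binomial = comb(n, X) * p ** X * (1 - p) ** (n - X)
L_product = prod(p ** xi * (1 - p) ** (1 - xi) for xi in x)

# The ratio of the two likelihoods is the constant (n choose X)
assert abs(L_binomial - comb(n, X) * L_product) < 1e-12
```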
Analytic Solution
• The log-likelihood

  ln L(p | n, X) = ln(n choose X) + X ln p + (n - X) ln(1 - p)

• The derivative

  d ln L(p | X) / dp = X/p - (n - X)/(1 - p)

• Find the zero …
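Setting the derivative to zero gives p̂ = X/n. A quick numeric sanity check (a sketch, not lecture code):

```python
n, X = 10, 4

def score(p):
    """Derivative of the log-likelihood: X/p - (n - X)/(1 - p)."""
    return X / p - (n - X) / (1 - p)

p_hat = X / n  # the zero of the score: X/p = (n - X)/(1 - p)  =>  p = X/n
assert abs(score(p_hat)) < 1e-9                       # derivative vanishes at p = X/n
assert score(p_hat - 0.05) > 0 > score(p_hat + 0.05)  # sign change: a maximum
```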
Samples of Individual Chromosomes
• The natural estimator (where we count the proportion of
  sequences of a particular type) and the MLE give
  identical solutions
• Maximum likelihood provides a justification for using
  the “natural” estimator
II. Genotypes Observed

  Genotypes
  Genotype    A1A1   A1A2   A2A2   Total
  Observed    n11    n12    n22    n = n11 + n12 + n22
  Frequency   p11    p12    p22    1.0

  Alleles
  Allele      A1                A2                Total
  Observed    n1 = 2n11 + n12   n2 = 2n22 + n12   2n = n1 + n2
  Frequency   p1 = n1 / 2n      p2 = n2 / 2n      1.0
Consider a Set of Genotypes…
• Use the notation nij to denote the number of individuals
  with genotype i/j
• Sample of n individuals

  Genotype    A1A1   A1A2   A2A2   Total
  Observed    n11    n12    n22    n = n11 + n12 + n22
  Frequency   p11    p12    p22    1.0
Allele Frequencies by Counting…
• A natural estimate for allele frequencies is to calculate
  the proportion of individuals carrying each allele

  Allele      A1                A2                Total
  Observed    n1 = 2n11 + n12   n2 = 2n22 + n12   2n = n1 + n2
  Frequency   p1 = n1 / 2n      p2 = n2 / 2n      1.0
MLE using genotype data…
• Consider a sample such as ...

  Genotype    A1A1   A1A2   A2A2   Total
  Observed    n11    n12    n22    n = n11 + n12 + n22

• The likelihood as a function of the allele frequencies
  (with q = 1 - p) is …

  L(p; n) = [n! / (n11! n12! n22!)] (p²)^n11 (2pq)^n12 (q²)^n22
Which gives…
• Log-likelihood and its derivative

  l = ln L = (2n11 + n12) ln p1 + (2n22 + n12) ln(1 - p1) + C

  dl/dp1 = (2n11 + n12)/p1 - (2n22 + n12)/(1 - p1)

• Giving the MLE as …

  p̂1 = (2n11 + n12) / [2(n11 + n12 + n22)]
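A minimal sketch of this estimate, using hypothetical genotype counts (not from the lecture), confirms that the MLE coincides with the allele-counting estimate and that the score vanishes there:

```python
# Hypothetical genotype counts
n11, n12, n22 = 30, 50, 20
n = n11 + n12 + n22

# MLE for the frequency of allele A1
p1_hat = (2 * n11 + n12) / (2 * n)

# The score is zero at the MLE: (2n11+n12)/p1 - (2n22+n12)/(1-p1) = 0
score = (2 * n11 + n12) / p1_hat - (2 * n22 + n12) / (1 - p1_hat)
assert abs(score) < 1e-9
assert p1_hat == (2 * 30 + 50) / 200  # identical to counting alleles
```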
Samples of Unrelated Individuals
• Again, the natural estimator (where we count the
  proportion of alleles of a particular type) and the MLE
  give identical solutions
• Maximum likelihood provides a justification for using
  the “natural” estimator
III. Parent-Offspring Pairs

                       Child
  Parent   A1A1      A1A2           A2A2      Total
  A1A1     a1        a2             0         a1 + a2
  A1A2     a3        a4             a5        a3 + a4 + a5
  A2A2     0         a6             a7        a6 + a7
  Total    a1 + a3   a2 + a4 + a6   a5 + a7   N pairs
Probability for Each Observation

                     Child
  Parent   A1A1     A1A2     A2A2     Total
  A1A1     p1³      p1²p2    0        p1²
  A1A2     p1²p2    p1p2     p1p2²    2p1p2
  A2A2     0        p1p2²    p2³      p2²
  Total    p1²      2p1p2    p2²      1.0
Which gives…

  ln L = a1 ln(p1³) + (a2 + a3) ln(p1²p2) + a4 ln(p1p2)
         + (a5 + a6) ln(p1p2²) + a7 ln(p2³) + constant
       = B ln p1 + C ln(1 - p1)

  where p2 = 1 - p1
        B = 3a1 + 2(a2 + a3) + a4 + (a5 + a6)
        C = (a2 + a3) + a4 + 2(a5 + a6) + 3a7

  p̂1 = B / (B + C)
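The estimate is easy to compute from the table of pair counts. This sketch uses hypothetical counts a1..a7 (not from the lecture) and also checks the accounting behind B + C:

```python
# Hypothetical parent-offspring pair counts a1..a7
a1, a2, a3, a4, a5, a6, a7 = 12, 8, 9, 20, 6, 7, 3
N = a1 + a2 + a3 + a4 + a5 + a6 + a7

B = 3 * a1 + 2 * (a2 + a3) + a4 + (a5 + a6)   # copies of A1 in the likelihood
C = (a2 + a3) + a4 + 2 * (a5 + a6) + 3 * a7   # copies of A2

p1_hat = B / (B + C)

# Each pair contributes 3 informative alleles, except the a4
# heterozygous-parent / heterozygous-child pairs, which contribute only 2
assert B + C == 3 * N - a4
```

The identity B + C = 3N - a4 also explains the variance formula that appears later for this design.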
Samples of Parent-Offspring Pairs
• The natural estimator (where we count the proportion of
  alleles of a particular type) and the MLE no longer give
  identical solutions
• In this case, we expect the MLE to be more accurate
Comparing Sampling Strategies
• We can compare sampling strategies by calculating the
  information for each one

  Iθ = -E[ d²l(θ | data) / dθ² ]

  V(θ̂) = 1 / Iθ

• Which one do you expect to be most informative?
How informative is each setting?
• Single chromosomes:        Var(p̂) = pq / N
• Unrelated individuals:     Var(p̂) = pq / 2N
• Parent-offspring pairs:    Var(p̂) = pq / (3N - a4)
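The three variances can be compared directly. This is an illustrative sketch; the values of p, N, and a4 are assumed, not from the lecture:

```python
# Compare asymptotic variances for N sampling units at p = 0.3 (illustrative)
p, N = 0.3, 100
q = 1 - p

var_single_chromosomes = p * q / N
var_unrelated = p * q / (2 * N)
a4 = 20  # assumed count of heterozygous-parent / heterozygous-child pairs
var_pairs = p * q / (3 * N - a4)

# Each genotyped individual carries two alleles, and each pair nearly three,
# so for the same N the variances are ordered:
assert var_pairs < var_unrelated < var_single_chromosomes
```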
Other Likelihoods
• Allele frequencies when individuals are…
  • Diagnosed for a Mendelian disorder
  • Genotyped at two neighboring loci
  • Phenotyped for the ABO blood groups
• Many other interesting problems…
• … but some have no analytical solution
Today’s Summary
• Examples of Maximum Likelihood
• Allele Frequency Estimation
  • Allele counts
  • Genotype counts
  • Pairs of individuals
Take home reading
• Excoffier and Slatkin (1996)
• Introduces the E-M algorithm
• Widely used for maximizing likelihoods in genetic problems
Properties of Estimators
For Review
Unbiasedness
• An estimator is unbiased if

  E(θ̂) = θ

  bias(θ̂) = E(θ̂) - θ

• Multiple unbiased estimators may exist
• Other properties may be desirable
Consistency
• An estimator is consistent if

  P(|θ̂ - θ| > ε) → 0 as n → ∞, for any ε

• The estimate converges to the true value in probability
  with increasing sample size
Mean Squared Error
• MSE is defined as

  MSE(θ̂) = E[(θ̂ - θ)²]
          = var(θ̂) + bias(θ̂)²

• If MSE → 0 as n → ∞ then the estimator must be consistent
  • The reverse is not true
Efficiency
• The relative efficiency of two estimators is the ratio
  of their variances

  if var(θ̂2) / var(θ̂1) > 1 then θ̂1 is more efficient

• The comparison is only meaningful for estimators with
  equal biases
Sufficiency
• Consider…
  • Observations X1, X2, … Xn
  • A statistic T(X1, X2, … Xn)
• T is a sufficient statistic if it includes all information
  about θ in the sample
  • The distribution of the Xi conditional on T is independent of θ
  • The posterior distribution of θ conditional on T is
    independent of the Xi
Minimal Sufficient Statistic
• There can be many alternative sufficient statistics.
• A statistic is a minimal sufficient statistic if it can be
  expressed as a function of every other sufficient statistic.
Typical Properties of MLEs
• Bias
  • Can be biased or unbiased
• Consistency
  • Subject to regularity conditions, MLEs are consistent
• Efficiency
  • Typically, MLEs are asymptotically efficient estimators
• Sufficiency
  • Often, but not always

Cox and Hinkley, 1974
Strategies for Likelihood
Optimization
For Review
Generic Approaches
• Suitable for when analytical solutions are impractical
• Bracketing
• Simplex Method
• Newton-Raphson
Bracketing
• Find 3 points θa < θb < θc such that
  • L(θb) > L(θa) and L(θb) > L(θc)
• Search for the maximum by
  • Selecting a trial point in the interval
  • Keeping the maximum and its flanking points
Bracketing
[Figure: successive trial points 1–6 narrowing the bracket around the maximum]
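The bracketing search can be sketched in a few lines. This toy implementation (my own sketch, assuming midpoint trial points and using the binomial likelihood from the earlier example as the target) is not the lecture's code:

```python
from math import comb

def L(p, n=10, X=4):
    """Binomial likelihood from the earlier example."""
    return comb(n, X) * p ** X * (1 - p) ** (n - X)

def bracket_maximize(f, a, c, tol=1e-9):
    """Maintain a bracket a < b < c with f(b) > f(a) and f(b) > f(c),
    shrinking it by placing a trial point in the larger sub-interval."""
    b = (a + c) / 2
    while c - a > tol:
        t = (a + b) / 2 if b - a > c - b else (b + c) / 2
        if f(t) > f(b):
            a, b, c = (a, t, b) if t < b else (b, t, c)  # new maximum + flanks
        else:
            a, b, c = (t, b, c) if t < b else (a, b, t)  # keep old maximum
    return b

p_hat = bracket_maximize(L, 0.0, 1.0)
assert abs(p_hat - 0.4) < 1e-6
```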
The Simplex Method
• Calculate likelihoods at the simplex vertices
  • A geometric shape with k + 1 corners
  • E.g. a triangle in k = 2 dimensions
• At each step, move the high vertex in the direction of
  the lower points
The Simplex Method II
[Figure: starting from the original simplex, the high vertex may move by
reflection, reflection and expansion, contraction, or multiple contraction
toward the low vertex]
One parameter maximization
• A simple but inefficient approach
• Consider
  • Parameters θ = (θ1, θ2, …, θk)
  • Likelihood function L(θ; x)
• Maximize L with respect to each θi in turn
  • Cycle through the parameters
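The cycling scheme can be sketched on a toy concave surface. Everything here is hypothetical (the surface and its closed-form one-dimensional maximizers are my own example, not from the lecture):

```python
# Cycle through parameters, maximizing over one at a time.
# Toy concave surface: l(t1, t2) = -(t1 - 1)^2 - (t2 - 2)^2 - t1*t2,
# whose joint maximum is at (t1, t2) = (0, 2).
t1, t2 = 0.0, 0.0
for _ in range(100):
    t1 = 1 - t2 / 2   # argmax over t1 with t2 held fixed
    t2 = 2 - t1 / 2   # argmax over t2 with t1 held fixed

# The cycle converges, zig-zagging, to the joint maximum
assert abs(t1 - 0.0) < 1e-9 and abs(t2 - 2.0) < 1e-9
```

Because the two parameters interact (the t1*t2 term), each cycle only moves part of the way, which is exactly the inefficiency the next slide illustrates.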
The Inefficiency…
[Figure: the zig-zag path taken in the (θ1, θ2) plane]
Steepest Descent
• Consider
  • Parameters θ = (θ1, θ2, …, θk)
  • Likelihood function L(θ; x)
• Score vector

  S = d ln(L)/dθ = ( d ln(L)/dθ1, ..., d ln(L)/dθk )

• Find the maximum along θ + δS
• Still inefficient…
  • Consecutive steps are perpendicular!
Local Approximations to
Log-Likelihood Function
In the neighborhood of θi

  l(θ) ≈ l(θi) + S(θ - θi) - ½ (θ - θi)ᵀ Iθ (θ - θi)

where
  l(θ) = ln L(θ)   is the log-likelihood function
  S = dl(θi)       is the score vector
  Iθ = -d²l(θi)    is the observed information matrix
Newton’s Method
Maximize the approximation

  l(θ) ≈ l(θi) + S(θ - θi) - ½ (θ - θi)ᵀ I (θ - θi)

by setting its derivative to zero...

  S - I(θ - θi) = 0

and get a new trial point

  θi+1 = θi + I⁻¹S
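The update θi+1 = θi + I⁻¹S is easy to try on the one-parameter binomial example from earlier. A sketch (not lecture code):

```python
# Newton-Raphson for the binomial allele-frequency example
n, X = 10, 4

def newton_mle(p, iterations=20):
    for _ in range(iterations):
        score = X / p - (n - X) / (1 - p)           # S = dl/dp
        info = X / p ** 2 + (n - X) / (1 - p) ** 2  # I = -d²l/dp² (observed)
        p = p + score / info                        # θ_{i+1} = θ_i + I⁻¹ S
    return p

p_hat = newton_mle(0.5)
assert abs(p_hat - X / n) < 1e-9
```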
Fisher Scoring
• Use the expected information matrix instead of the
  observed information:

  E[ -d²l(θ) / dθ² ]   instead of   -d²l(θ | data) / dθ²

• Compared to Newton-Raphson:
  • Converges faster when estimates are poor.
  • Converges slower when close to the MLE.
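For allele counts, the expected information has the simple form 2n / (p(1 - p)). This sketch (hypothetical counts, not from the lecture) runs the scoring iteration from a deliberately poor start:

```python
# Fisher scoring for an allele frequency from allele counts
n1, n2 = 110, 90            # hypothetical counts of A1 and A2 among 2n chromosomes
two_n = n1 + n2

p = 0.9                     # deliberately poor starting value
for _ in range(50):
    score = n1 / p - n2 / (1 - p)              # dl/dp
    expected_info = two_n / (p * (1 - p))      # E[-d²l/dp²] = 2n / (p(1-p))
    p = p + score / expected_info

assert abs(p - n1 / two_n) < 1e-9
```

For this particular likelihood the update simplifies algebraically to p ← n1/2n, so scoring lands on the MLE in a single step; in general it iterates.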