Lecture 5: Estimation
Goals
• Basic concepts of estimation
• Statistical approaches for estimating parameters
• Parametric interval estimation
• Nonparametric interval estimation (bootstrap)
“Central Dogma” of Statistics
[Figure: probability takes us from the population to a sample; descriptive statistics summarize the sample; inferential statistics take us from the sample back to the population]
Estimation
• Estimator: statistic whose calculated value is used to estimate a population parameter, $\theta$
• Estimate: a particular realization of an estimator, $\hat{\theta}$
• Types of estimators:
- Point estimate: a single number that can be regarded as the most plausible value of $\theta$
- Interval estimate: a range of numbers, called a confidence interval, that can be regarded as likely to contain the true value of $\theta$
Properties of Good Estimators
• In the frequentist world view, parameters are fixed while statistics are random variables that vary from sample to sample (i.e., they have an associated sampling distribution)
• In theory, there are many potential estimators for a population parameter
• What are the characteristics of good estimators?
Statistical Jargon for Good Estimators
Good estimators are:
• Consistent: as the sample size increases, $\hat{\theta}$ gets closer to $\theta$:
$$\lim_{n \to \infty} P\left(\left|\hat{\theta} - \theta\right| > \varepsilon\right) = 0$$
• Unbiased: $E[\hat{\theta}] = \theta$
• Precise: the sampling distribution of $\hat{\theta}$ should have a small standard error
Bias Versus Precision
[Figure: 2 × 2 grid of targets illustrating the combinations precise/imprecise and biased/unbiased]
Methods of Point Estimation
1. Method of Moments
2. Maximum Likelihood
3. Bayesian
Method of Moments
• Advantage: simplest approach for constructing an estimator
• Disadvantage: usually not the “best” estimators possible
• Principle: equate the kth population moment $E[X^k]$ with the kth sample moment
$$\frac{1}{n}\sum_{i=1}^{n} X_i^{k}$$
and solve for the unknown parameter
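As a concrete illustration of this principle (not from the lecture), here is a minimal Python sketch for an exponential distribution, where the first population moment $E[X] = 1/\lambda$ is set equal to the sample mean and solved for $\lambda$; the data and the true rate are hypothetical.

```python
import numpy as np

# Hypothetical data: X ~ Exponential with rate lam, so the first moment is E[X] = 1/lam.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=1000)   # simulated with a hypothetical true rate of 2.0

# Method of moments: equate E[X] = 1/lam to the sample mean and solve for lam.
lam_mom = 1.0 / np.mean(x)
print(f"method-of-moments estimate of the rate: {lam_mom:.3f}")   # close to 2.0
```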
Method of Moments Example
• How can I estimate the scaled population mutation rate $\theta = 4N_e\mu$?
• Brief (very brief) exposé of coalescent theory:
[Figure: genealogy of a sample traced back in time, with coalescent intervals $T_2$, $T_3$, $T_4$, ...]
Coalescent times follow a geometric distribution:
$$E[T_i] = \frac{4N}{i(i-1)}$$
$$T_c = \sum_{i=2}^{n} i\,T_i$$
Method of Moments Example
$$E[T_c] = \sum_{i=2}^{n} i\,E[T_i] = \sum_{i=2}^{n} \frac{4Ni}{i(i-1)} = 4N \sum_{i=2}^{n} \frac{1}{i-1}$$
$$E[S_n] = \mu\,E[T_c] = \mu \cdot 4N \sum_{i=2}^{n} \frac{1}{i-1} = \theta \sum_{i=2}^{n} \frac{1}{i-1}$$
Equating the expected number of segregating sites with the observed $S_n$ and solving for $\theta$ gives the method-of-moments estimator:
$$\hat{\theta}_{mom} = \frac{S_n}{\sum_{i=2}^{n} \frac{1}{i-1}}$$
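The estimator above is straightforward to compute; the following minimal sketch (not part of the slides) evaluates $\hat{\theta}_{mom}$ for hypothetical values of the number of segregating sites $S_n$ and the sample size $n$.

```python
def theta_mom(s_n, n):
    """Method-of-moments estimate of theta = 4*Ne*mu: S_n divided by sum_{i=2}^{n} 1/(i-1)."""
    denom = sum(1.0 / (i - 1) for i in range(2, n + 1))
    return s_n / denom

# Hypothetical example: 40 segregating sites observed in a sample of n = 10 sequences.
print(theta_mom(s_n=40, n=10))   # roughly 14.1
```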
Methods of Point Estimation
1. Method of Moments
2. Maximum Likelihood
3. Bayesian
Introduction to Likelihood
• Before an experiment is performed the outcome is unknown. Probability allows us to predict unknown outcomes based on known parameters:
$$P(\text{Data} \mid \theta)$$
• For example:
$$P(x \mid n, p) = \binom{n}{x} p^{x} (1-p)^{n-x}$$
Introduction to Likelihood
• After an experiment is performed the outcome is known. Now we talk about the likelihood that a parameter would generate the observed data:
$$L(\theta \mid \text{Data}) = P(\text{Data} \mid \theta)$$
• For example:
$$L(p \mid n, x) = \binom{n}{x} p^{x} (1-p)^{n-x}$$
• Estimation proceeds by finding the value of $\theta$ that makes the observed data most likely
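To see the probability/likelihood distinction in code, here is a minimal sketch (with hypothetical data of x = 7 successes in n = 10 trials): the same binomial formula is evaluated once as a function of the outcome with p fixed, and once as a function of p with the outcome fixed.

```python
import numpy as np
from scipy.stats import binom

n, x = 10, 7                        # hypothetical data: 7 successes in 10 trials

# Probability: parameter p is fixed, outcomes vary.
p = 0.5
print(binom.pmf(x, n, p))           # P(x | n, p)

# Likelihood: observed x is fixed, parameter p varies.
p_grid = np.linspace(0.01, 0.99, 99)
lik = binom.pmf(x, n, p_grid)       # L(p | n, x): same formula, read as a function of p
print(p_grid[np.argmax(lik)])       # value of p that makes the data most likely (~0.7)
```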
Let’s Play T/F
• True or False: The maximum likelihood estimate (mle) of $\theta$ gives us the probability of $\hat{\theta}$
• False - why?
• True or False: The mle of $\theta$ is the most likely value of $\hat{\theta}$
• False - why?
• True or False: Maximum likelihood is cool
Formal Statement of ML
• Let $x_1, x_2, \ldots, x_n$ be a sequence of $n$ observed variables
• Joint probability (assuming independent observations):
$$P(x_1, x_2, \ldots, x_n \mid \theta) = P(X_1 = x_1)\,P(X_2 = x_2)\cdots P(X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i)$$
• Likelihood is then:
$$L(\theta \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(X_i = x_i)$$
$$\log L(\theta \mid x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \log\left[P(X_i = x_i)\right]$$
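A minimal sketch of this statement, assuming hypothetical i.i.d. normal observations: the log-likelihood of a candidate mean is just the sum of the log densities of the individual observations.

```python
import numpy as np
from scipy.stats import norm

x = np.array([4.9, 5.3, 4.7, 5.1, 5.0])   # hypothetical i.i.d. sample

def log_lik(mu, sigma=1.0):
    # log L(mu | x1..xn) = sum_i log P(X_i = x_i) under a Normal(mu, sigma) model
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

print(log_lik(5.0), log_lik(6.0))   # the candidate nearer the sample mean has the higher log-likelihood
```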
MLE Example
• I want to estimate the recombination fraction between locus
A and B from 5 heterozygous (AaBb) parents. I examine 30
gametes for each and observe 4, 3, 5, 6, and 7 recombinant
gametes in the five parents. What is the mle of the
recombination fraction?
Probability of observing X = r recombinant gametes for a single parent is binomial:
$$P(X = r) = \binom{n}{r}\,\theta^{r}(1-\theta)^{n-r}$$
MLE Example: Specifying Likelihood
Probability:
$$P(r_1, r_2, \ldots, r_5 \mid \theta, n) = P(R_1 = r_1)\,P(R_2 = r_2)\cdots P(R_5 = r_5)$$
$$P(r_1, r_2, \ldots, r_5 \mid \theta, n) = \binom{n}{r_1}\theta^{r_1}(1-\theta)^{n-r_1} \cdot \binom{n}{r_2}\theta^{r_2}(1-\theta)^{n-r_2} \cdot \ldots \cdot \binom{n}{r_5}\theta^{r_5}(1-\theta)^{n-r_5}$$
Likelihood:
$$L(\theta \mid r_1, r_2, \ldots, r_5, n) = \prod_{i=1}^{5} \binom{n}{r_i}\theta^{r_i}(1-\theta)^{n-r_i}$$
$$\log L = \sum_{i=1}^{5} \left[\log\binom{n}{r_i} + r_i \log\theta + (n - r_i)\log(1-\theta)\right]$$
MLE Example: Maximizing the Likelihood
• Want to find the value of $\theta$ such that $\log L$ is maximized
$$\log L = \sum_{i=1}^{5} \left[\log\binom{n}{r_i} + r_i \log\theta + (n - r_i)\log(1-\theta)\right]$$
• How?
1. Graphically
2. Calculus
3. Numerically (see the sketch below)
MLE Example: Finding the mle of $\theta$
[Figure: $\log L$ plotted as a function of $\theta$, with the maximum marking the mle]
Methods of Point Estimation
1. Method of Moments
2. Maximum Likelihood
3. Bayesian
World View According to Bayesians
• The classic (frequentist) philosophy assumes parameters are fixed quantities that we want to estimate as precisely as possible
• The Bayesian perspective is different: parameters are random variables, with probabilities assigned to particular values of a parameter to reflect the degree of evidence for that value
Revisiting Bayes Theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Discrete:
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$$
Continuous:
$$P(B) = \int P(B \mid A)\,P(A)\, dA$$
Bayesian Estimation
• In order to make probability statements about $\theta$ given some observed data, $D$, we make use of Bayes theorem:
$$f(\theta \mid D) = \frac{f(\theta)\, f(D \mid \theta)}{f(D)} = \frac{f(\theta)\, L(\theta \mid D)}{\int_{\theta} f(\theta)\, f(D \mid \theta)\, d\theta}$$
Posterior ∝ Likelihood × Prior
The prior is the probability of the parameter and represents what was
thought before seeing the data.
The likelihood is the probability of the data given the parameter and
represents the data now available.
The posterior represents what is thought given both prior information and
the data just seen.
Bayesian Estimation: “Simple” Example
• I want to estimate the recombination fraction between locus
A and B from 5 heterozygous (AaBb) parents. I examine 30
gametes for each and observe 4, 3, 5, 6, and 7 recombinant
gametes in the five parents. What is the mle of the
recombination fraction?
• The full Bayesian analysis is tedious to show. Let’s simplify and ask what the recombination fraction is for parent three, who had 5 observed recombinant gametes.
Specifying The Posterior Density
$$f(\theta \mid n=30, r=5) = \frac{f(\theta)\, f(r=5 \mid \theta, n=30)}{\int_{0}^{0.5} f(r=5 \mid \theta, n=30)\, f(\theta)\, d\theta}$$
prior: $f(\theta) = \text{uniform}[0, 0.5] = 0.5$
likelihood: $P(r=5 \mid \theta, n=30) = \binom{30}{5}\theta^{5}(1-\theta)^{25}$
normalizing constant:
$$\int_{0}^{0.5} P(r=5 \mid \theta, n=30)\, f(\theta)\, d\theta = \int_{0}^{0.5} 0.5\,\binom{30}{5}\theta^{5}(1-\theta)^{25}\, d\theta \approx \frac{1}{6531}$$
Specifying The Posterior Density
$$f(\theta \mid n=30, r=5) = \frac{f(\theta)\, f(r=5 \mid \theta, n=30)}{\int_{0}^{0.5} f(r=5 \mid \theta, n=30)\, f(\theta)\, d\theta} = \frac{0.5\,\binom{30}{5}\theta^{5}(1-\theta)^{25}}{1/6531}$$
Ta da…
[Figure: the posterior density $f(\theta \mid n=30, r=5)$ plotted against $\theta$]
Interval Estimation
• In addition to a point estimate, we also want to understand how much uncertainty is associated with it
• One option is to report the standard error
• Alternatively, we might report a confidence interval
• Confidence interval: an interval of plausible values for the parameter being estimated, where the degree of plausibility is specified by a “confidence level”
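As a small illustration (with hypothetical data), here is one common way to compute a 95% confidence interval for a mean, using the t distribution; the specific numbers are made up for the sketch.

```python
import numpy as np
from scipy import stats

x = np.array([23.1, 27.4, 31.0, 25.2, 29.8, 26.5, 30.3, 24.9])   # hypothetical sample

mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(x) - 1)    # two-sided 95% critical value

print(mean - t_crit * se, mean + t_crit * se)  # the 95% confidence interval
```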
Interpreting a 95% CI
• We calculate a 95% CI for a hypothetical sample mean to be between 20.6 and 35.4. Does this mean there is a 95% probability the true population mean is between 20.6 and 35.4?
• NO! The correct interpretation relies on the long-run frequency interpretation of probability
[Figure: confidence intervals from many repeated samples plotted around the fixed true mean $\mu$; roughly 95% of them cover $\mu$]
• Why is this so?
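One way to see why is a small simulation sketch (with hypothetical population values): across many repeated samples, roughly 95% of the intervals constructed this way cover the fixed true mean, but any single interval either covers it or does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 28.0, 10.0, 25, 10_000   # hypothetical population and sampling design

t_crit = stats.t.ppf(0.975, df=n - 1)
covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    se = x.std(ddof=1) / np.sqrt(n)
    lo, hi = x.mean() - t_crit * se, x.mean() + t_crit * se
    covered += (lo <= mu <= hi)               # did this interval cover the true mean?

print(covered / reps)   # long-run coverage, close to 0.95
```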