BIOINF 2118
N09 and N10: Estimation Parts 3 and 4
Classical Statistics – basic principles
• All unknown parameters are fixed.
• “Probability” = “long-run frequency” (i.i.d. repetitions)
• The likelihood function is often used (maximum likelihood, likelihood
ratio test, score test, information matrix, Wald test), but these uses
are justified only by large-sample approximations to frequentist
calculations!!
Classical estimation
1) Moment estimators
2) Maximum likelihood estimators
3) Estimate something else, then “plug-in”.
Examples:
Estimate $E(f(X))$ by $\frac{1}{n}\sum_{i=1}^{n} f(X_i)$.
Estimate $\mathrm{s.d.}(X)$ by $\hat{\sigma} = \sqrt{\widehat{\mathrm{var}}(X)}$.
(If $\widehat{\mathrm{var}}(X)$ is the MLE, so is $\hat{\sigma}$, because MLEs are invariant under
transformations. If $\widehat{\mathrm{var}}(X)$ is the moment estimator, $\hat{\sigma}$ is NOT a moment estimator.)
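A minimal R sketch of the plug-in idea (the data vector x and the choice f = exp here are made up for illustration):
x <- c(2.1, 3.4, 1.8, 2.9, 3.7, 2.5)    # hypothetical sample
mean(exp(x))                            # plug-in estimate of E(exp(X))
varMLE <- mean((x - mean(x))^2)         # MLE of the variance (divides by n)
sqrt(varMLE)                            # MLE of s.d.(X), by invariance of MLEs
sd(x)                                   # sqrt of the (n-1)-denominator variance;
                                        # NOT itself a moment estimator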
Some notation
$z_p = \Phi^{-1}(p)$: the quantile function (inverse c.d.f.) of the standard normal.
> qnorm(1-0.025)
[1] 1.959964
> pnorm(1.96)
[1] 0.9750021, which is close to 1 – (0.05 × ½).
A 95% confidence interval for the normal mean, given an i.i.d. sample with known variance $\sigma^2$, is
$\bar{X}_n \pm 1.96\,\sigma/\sqrt{n}$.
Note that $\Pr\big(\bar{X}_n - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X}_n + 1.96\,\sigma/\sqrt{n}\big) = 0.95$.
What if the X’s are NOT normal?
If $\mathrm{var}(X)$ is finite and known and equals $\sigma^2$,
then by the central limit theorem (see www.mathsisfun.com/data/quincunx.html)
$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}$ is approximately $\mathrm{Norm}(0, 1)$.
So $\bar{X}_n \pm 1.96\,\sigma/\sqrt{n}$ is an approximate 95% confidence interval.
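A quick simulation sketch of this claim; the Exponential(1) model (mean 1, s.d. 1) and n = 30 are arbitrary choices for illustration:
set.seed(1)
n <- 30; mu <- 1; sigma <- 1            # Exponential(1): known mean and s.d. both 1
covers <- replicate(10000, {
  x <- rexp(n, rate = 1)                # decidedly non-normal data
  ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)
  ci[1] < mu & mu < ci[2]
})
mean(covers)                            # roughly 0.95, thanks to the CLT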
What is a “95% confidence interval”, really?
A function (algorithm) $CI$, $\text{data} \mapsto \big(CI_1(\text{data}),\, CI_2(\text{data})\big)$, such that
$\Pr\big(CI_1(\text{data}) \le \mu \le CI_2(\text{data})\big) = 0.95$, whatever the true parameter values.
The “coverage probability” is 1 – 0.05. No particular realization
$(CI_1(\text{data}), CI_2(\text{data}))$
can be said to have the coverage probability.
I.e. it’s a property of a “recipe”, not of a particular interval.
(Section 7.5 shows strange examples that highlight this peculiar interpretation.)
Approximate confidence interval for the binomial mean
Let $Y \sim \mathrm{binom}(n, p)$, so that $Y$ is a sum of i.i.d. Bernoullis $X_i$, and
$\hat{p} = \bar{X}_n = Y/n$, with $E(\hat{p}) = p$ and $\mathrm{var}(\hat{p}) = p(1-p)/n$.
Then an estimated standard error of the mean is:
$\mathrm{sd}(x)/\sqrt{n} = \sqrt{\hat{p}(1-\hat{p})/(n-1)}$
(here $\mathrm{sd}$ is the sample s.d. with the $n-1$ denominator, as in R).
Example: n=10, Y=8. Then the s.e.m. (standard error of the mean) is
$\sqrt{0.8 \times 0.2 / 9}$
= 0.133.
From R, a confidence interval based on the normal approximation is
0.8+c(-1,1)*sqrt(.8*.2/9)*qnorm(0.975)
which gives
( 0.5386715, 1.0613285).
The normal approx. can give confidence intervals extending beyond [0,1]!!
Now that seems pretty silly.
Do we really have more “confidence” in p=1.06 than in p=0.52?
Later, we’ll see why an “exact interval” is
(0.4439040, 0.9747889),
obtained by thinking of confidence intervals as values not rejected by a test.
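For reference, base R’s binom.test computes the Clopper-Pearson interval by exactly this test-inversion logic:
binom.test(8, 10)$conf.int              # exact 95% interval; agrees (up to rounding)
                                        # with the interval quoted above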
The Student’s t confidence interval; pivotal quantities.
Here we will take into account the fact that in some cases we really don’t know the variance.
It has to be estimated.
Example:
Data about cheese:
X=c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58).
(measurements of acid concentration in cheese).
Goal: 90% confidence interval.
Notation: in the book,
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, $\qquad {\sigma'}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$.
Then for the cheese data,
$\bar{x} = 1.379$,
$\sigma' = \mathrm{sd}(X) = 0.3277$,
$n = 10$.
The approximate 90% interval we got before is
$\bar{x} \pm z_{0.95}\,\sigma'/\sqrt{n}$
= mean(x)+c(-1,1)*qnorm(0.95)*sd(x)/sqrt(n)
= (1.208565, 1.549435).
Suppose that in reality
$X_1, \dots, X_n \sim \mathrm{Norm}(\mu, \sigma^2)$ i.i.d.,
and we use the same “recipe” to construct confidence intervals.
What is the real coverage probability? See meanCoverageOfConfidenceIntervals.R.
Answer: around 86%, lower than the 90% claimed.
That doesn’t sound bad, but the NON-coverage is 14% instead of 10%, which is 40% worse.
The reason it’s off: our “recipe” assumes we know the true sd(x), but we don’t.
So let’s back up & try again. First, a detour.
The Chi-Square distribution
If
$Z_1, \dots, Z_n \sim \mathrm{Norm}(0, 1)$
i.i.d., then the sum of squares is a chi-square random variable:
$\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$, “chi-square on n degrees of freedom”.
This is the same distribution as Gamma(n/2, 1/2), i.e. with shape parameter n/2 and rate parameter
½ (or scale parameter 2). For example, for n = 2, the squared distance from the origin of a
point formed by i.i.d. standard normals is Exponential with scale parameter 2.
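A quick sanity check of these identities in R:
q <- seq(0.5, 10, by = 0.5)
all.equal(pchisq(q, df = 5), pgamma(q, shape = 5/2, rate = 1/2))  # TRUE
all.equal(pchisq(q, df = 2), pexp(q, rate = 1/2))                 # TRUE: Exponential, scale 2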
The joint distribution of the sample mean and sample variance: i.i.d. NORMAL.
The key fact: for the i.i.d. normal case, $X_1, \dots, X_n \sim \mathrm{Norm}(\mu, \sigma^2)$,
a) $W = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}{\sigma^2} = \dfrac{(n-1)\,{\sigma'}^2}{\sigma^2} \sim \chi^2_{n-1}$
(the degrees of freedom are reduced by 1 because the mean is estimated).
b) $\bar{X}_n$ and ${\sigma'}^2$ are STATISTICALLY INDEPENDENT.
The quantity $W$ is unknown because the true variance is unknown.
However, we can say for sure that
$\Pr(W \le K) = $ pchisq(K, n – 1).
We can turn that into a probability statement focusing on $\sigma^2$ (though $\sigma^2$ is NOT the r.v.):
$\Pr\!\left(\sigma^2 \ge \dfrac{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}{K}\right) = $ pchisq(K, n – 1).
Therefore, if we want a 90% confidence interval for $\sigma^2$, for example, we set the probability (right-hand
side of the equation) to 0.05 and then to 0.95, and remember that pchisq and qchisq are inverse
functions, so that K becomes (respectively)
qchisq(0.05, n – 1) and qchisq(0.95, n – 1).
Example: cheese data.
For n=10, qchisq(0.05,n – 1) = 3.325 and qchisq(0.95,n – 1)=16.92.
(Notice the lost degree of freedom!)
In our data, the sample standard deviation is $\sigma' = 0.3277$, so
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1)\,{\sigma'}^2 = 9 \times 0.3277^2 = 0.9663$.
So the endpoints of the 90% interval for $\sigma^2$ are
$0.9663 / 16.92 = 0.0571$ and $0.9663 / 3.325 = 0.2906$.
The UPPER point 0.2906 corresponds to the LOWER probability (0.05).
(And the LOWER point 0.0571 corresponds to the UPPER probability (0.95).)
Why? Because the upper point is the value at which the data are at the LOW end of possible
outcomes.
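A short R snippet reproducing these endpoints:
X <- c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58)
n <- length(X)
SS <- sum((X - mean(X))^2)              # = (n-1)*var(X), about 0.9663
SS / qchisq(c(0.95, 0.05), n - 1)       # 90% endpoints for sigma^2: about
                                        # (0.0571, 0.2906); note the UPPER
                                        # quantile gives the LOWER endpoint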
The significance test interpretation of confidence intervals
The confidence interval endpoints are the parameter values at which the observed data sit just at the
boundary of plausibility.
[Figure: at the lower CI point the observed value falls in the upper tail of the sampling distribution; at the upper CI point it falls in the lower tail.]
The t distribution
The t will help us provide an “honest” CI for the normal mean.
If
$T = \dfrac{Z}{\sqrt{Y/\nu}}$,
where Z is standard normal, Y is chi-square on $\nu$ degrees of freedom, and Z and Y are independent,
then T is distributed as a $t_\nu$ random variable.
Notation: $T \sim t_\nu$.
Properties:
As $\nu \to \infty$, this approaches the standard normal.
For small $\nu$, it has fat tails.
For $\nu = 1$, they are so fat that the mean and variance don’t exist (Cauchy distribution).
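A small R illustration of both the $Z/\sqrt{Y/\nu}$ construction and the fat tails (the df values here are arbitrary):
set.seed(1)
df <- 3
Tsim <- rnorm(1e5) / sqrt(rchisq(1e5, df = df) / df)  # the defining construction
quantile(Tsim, 0.975)                   # simulated 97.5% point...
qt(0.975, df)                           # ...roughly matches the exact one (3.18)
qt(0.975, c(1, 2, 5, 30, Inf))          # 12.71 (Cauchy) shrinking to qnorm's 1.96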
Because of the key facts (a) and (b) above, define
$Z = \dfrac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}$ and $Y = \dfrac{(n-1)\,{\sigma'}^2}{\sigma^2}$.
Then
$T = \dfrac{Z}{\sqrt{Y/(n-1)}} = \dfrac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma'} \sim t_{n-1}$.
But this involves only $\mu$: the unknown $\sigma$ cancels out, making T a pivotal quantity (its
distribution is free of unknown parameters). So we can now construct an EXACT confidence interval
for $\mu$ that does NOT require the “plug-in principle”, pretending that we know $\sigma$.
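As a cross-check (the cheese example below does the same arithmetic by hand), base R’s t.test packages this exact t interval:
X <- c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58)
t.test(X, conf.level = 0.90)$conf.int   # exact 90% t interval: (1.189058, 1.568942)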
Cheese example; confidence intervals (“meanCoverageOfConfidenceIntervals.R”):
### Based on the normal approximation
> mean(x)+c(-1,1)*qnorm(0.95)*sd(x)/sqrt(n)
[1] 1.208565 1.549435
### Based on the t distribution
> mean(x)+c(-1,1)*qt(0.95,9)*sd(x)/sqrt(n)
[1] 1.189058 1.568942
## from "11-estimation Part 4.doc"
cheeseX = c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58)

simulateConfidenceInterval = function(X=cheeseX, CImethod=c("normal", "t"),
                                      simMethod=c("normal", "Bootstrap"),
                                      nsims=10000, nreps=2) {
  CImethod = CImethod[1]; simMethod = simMethod[1]
  mu = mean(X); sig = sd(X); n = length(X)
  simFunction = function(repNumber) {
    ## Draw a simulated data set, either from a normal model fitted to X
    ## or by resampling X itself (the bootstrap).
    x = switch(simMethod,
               normal    = rnorm(n, mu, sig),
               Bootstrap = sample(X, n, replace=TRUE))  # was hard-coded 10; use n
    ## Construct a 90% interval by the chosen recipe.
    interval = switch(CImethod,
                      normal = mean(x) + c(-1,1)*qnorm(0.95)*sd(x)/sqrt(n),
                      t      = mean(x) + c(-1,1)*qt(0.95, n-1)*sd(x)/sqrt(n))
    ## Does the interval cover the "true" mean mu?
    (mu > interval[1]) & (mu < interval[2])   ## Boolean
  }
  system.time({   ## unix.time is defunct in current R; system.time replaces it
    for (rep in 1:nreps) {
      coverResult = sapply(1:nsims, simFunction)
      cat(mean(coverResult), " CImethod=", CImethod, " simMethod=", simMethod, "\n")
    }
  })
}
simulateConfidenceInterval()
simulateConfidenceInterval(CImethod="t")
simulateConfidenceInterval(simMethod="Bootstrap")
simulateConfidenceInterval(CImethod="t", simMethod="Bootstrap")
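Running these four calls should reproduce the story above: with simMethod="normal", the qnorm recipe covers only about 86% instead of the nominal 90%, while the t recipe covers 90% (it is exact for normal data); the Bootstrap rows show how the same comparison plays out when resampling the cheese data itself.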