BIOINF 2118 N09 and N10: Estimation Parts 3 and 4

Classical Statistics – basic principles
• All unknown parameters are fixed.
• “Probability” = “long-run frequency” (i.i.d. repetitions).
• Although the likelihood function is often used (maximum likelihood, likelihood ratio test, score test, information matrix, Wald test), these uses are justified only by large-sample approximations to frequentist calculations!!

Classical estimation
1) Moment estimators
2) Maximum likelihood estimators
3) Estimate something else, then “plug-in”.
Examples: Estimate E(f(X)) by [formula missing in source]. Estimate s.d.(X) by $\sqrt{\widehat{\mathrm{var}}(X)}$. (If $\widehat{\mathrm{var}}(X)$ is the MLE, so is $\sqrt{\widehat{\mathrm{var}}(X)}$. If $\widehat{\mathrm{var}}(X)$ is a moment estimator, $\sqrt{\widehat{\mathrm{var}}(X)}$ is NOT.)

Some notation
$\Phi$ is the standard normal c.d.f. (pnorm in R) and $\Phi^{-1}$ is its quantile function, the inverse c.d.f. (qnorm in R).
    > qnorm(1-0.025)
    [1] 1.959964
    > pnorm(1.96)
    [1] 0.9750021
The second value is close to 1 – (0.05 × ½) = 0.975.

A 95% confidence interval for the normal mean, given an i.i.d. sample with known standard deviation $\sigma$, is $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$. Note that [remainder of note missing in source]

What if the X’s are NOT normal?
If variance(X) is finite and known and equals $\sigma^2$, then by the central limit theorem (www.mathsisfun.com/data/quincunx.html) $\sqrt{n}\,(\bar{X} - \mu)/\sigma$ is approximately $N(0,1)$. So $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ is an approximate 95% confidence interval.

What is a “95% confidence interval”, really?
A function (algorithm) CI, $CI(\text{data}) = (CI_1(\text{data}), CI_2(\text{data}))$, such that “$\Pr(CI_1(\text{data}) < \mu < CI_2(\text{data})) = 0.95$”.
The “coverage probability” is 1 – 0.05. No particular realization (CI1(data), CI2(data)) can be said to have the coverage probability. I.e. it is a property of a “recipe”, not of a particular interval. (Section 7.5 shows strange examples that highlight this peculiar interpretation.)

Approximate confidence interval for the binomial mean
Let Y ~ binom(n,p), so that Y is a sum of i.i.d. Bernoullis $X_i$, with $E(X_i) = p$ and $\mathrm{var}(X_i) = p(1-p)$. Write $\hat{p} = Y/n$. Then an estimated standard error of the mean is $\sqrt{\hat{p}(1-\hat{p})/(n-1)}$.

Example: n=10, Y=8. Then the s.e.m. (standard error of the mean) is $\sqrt{0.8 \times 0.2/9} = 0.133$. From R, a confidence interval based on the normal approximation is
    0.8+c(-1,1)*sqrt(.8*.2/9)*qnorm(0.975)
which gives (0.5386715, 1.0613285). The normal approx.
can give confidence intervals extending beyond [0,1]!! Now that seems pretty silly. Do we really have more “confidence” in p=1.06 than in p=0.52? Later, we’ll see why an “exact interval” is (0.4439040, 0.9747889), obtained by thinking of confidence intervals as values not rejected by a test.

The Student’s t confidence interval; pivotal quantities
Here we take into account the fact that in some cases we really don’t know the variance. It has to be estimated.

Example: Data about cheese (measurements of acid concentration in cheese):
    X=c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58)
Goal: 90% confidence interval.
Notation: sample mean $\bar{X} = \frac{1}{n}\sum_i X_i$ and sample variance $S^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$. Then $n = 10$, $\bar{x} = 1.379$, $s = 0.3277$.
The approximate interval we got before is $\bar{x} \pm \Phi^{-1}(0.95)\, s/\sqrt{n}$
    = mean(x)+c(-1,1)*qnorm(0.95)*sd(x)/sqrt(n)
    = (1.208565, 1.549435).

Suppose that in reality the data are i.i.d. normal with mean and standard deviation equal to these sample values, and we use the same “recipe” to construct confidence intervals. What is the real coverage probability? See meanCoverageOfConfidenceIntervals.R.
Answer: around 86%, lower than the 90% claimed. That doesn’t sound bad, but the NON-coverage is 14% instead of 10%, which is 40% worse.
The reason it’s off: our “recipe” assumes we know the true sd(x), but we don’t. So let’s back up and try again. First, a detour.

The Chi-Square distribution
If Z1,...,Zn ~ Norm(0,1) i.i.d., then the sum of squares is a chi-square random variable: $\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$, “chi-square on n degrees of freedom”.
This is the same distribution as Gamma(n/2, 1/2), i.e. with shape parameter n/2 and rate parameter ½ (or scale parameter 2). For example, for n = 2, the squared distance from the origin of a point formed by i.i.d. standard normals is Exponential with scale parameter 2.

The joint distribution of the sample mean and sample variance: i.i.d. NORMAL
The key fact: for the i.i.d. normal case, $X_1,\dots,X_n \sim N(\mu, \sigma^2)$,
a) $(n-1)S^2/\sigma^2 = \sum_i (X_i - \bar{X})^2/\sigma^2 \sim \chi^2_{n-1}$ (the degrees of freedom are reduced by 1 because the mean is estimated).
b) The sample mean $\bar{X}$ and the sample variance $S^2$ are STATISTICALLY INDEPENDENT.

The quantity $(n-1)S^2/\sigma^2$ is unknown because the true variance $\sigma^2$ is unknown. However, we can say for sure that
$\Pr\big((n-1)S^2/\sigma^2 \le K\big) = $ pchisq(K, n – 1).
We can turn that into a probability statement focusing on $\sigma^2$ (though $\sigma^2$ is NOT the r.v.):
$\Pr\big(\sigma^2 \ge (n-1)S^2/K\big) = $ pchisq(K, n – 1).
Therefore, if we want a 90% confidence interval for $\sigma^2$, for example, we set the probability (right-hand side of the equation) to 0.05 and then to 0.95, and remember that pchisq and qchisq are inverse functions, so that K becomes (respectively) qchisq(0.05, n – 1) and qchisq(0.95, n – 1).

Example: cheese data. For n=10, qchisq(0.05, n – 1) = 3.325 and qchisq(0.95, n – 1) = 16.92. (Notice the lost degree of freedom!)
In our data, the sample standard deviation is $s = 0.3277$, so $(n-1)s^2 = 0.9663$. So the endpoints are $0.9663/16.92 = 0.0571$ and $0.9663/3.325 = 0.2906$.
The UPPER point 0.2906 corresponds to the LOWER probability (0.05). (And the LOWER point 0.0571 corresponds to the UPPER probability (0.95).) Why? Because the upper point is the value of $\sigma^2$ at which the data are at the LOW end of possible outcomes.

The significance test interpretation of confidence intervals
The confidence interval endpoints are the places where the data are at the boundary of plausible probability.
[Figure: at the lower CI point the observed value lies in the upper tail of the sampling distribution; at the upper CI point it lies in the lower tail.]

The t distribution
The t will help us provide an “honest” CI for the normal mean.
If $T = Z/\sqrt{Y/\nu}$, where Z is standard normal, Y is chi-square on $\nu$ degrees of freedom, and Z and Y are independent, then T is distributed as a t random variable on $\nu$ degrees of freedom. Notation: $T \sim t_\nu$.
Properties: As $\nu \to \infty$, this approaches the standard normal. For small $\nu$, it has fat tails. For $\nu = 1$, they are so fat that the mean and variance don’t exist (Cauchy distribution).

Because of the key facts (a) and (b) above, define $Z = \sqrt{n}\,(\bar{X} - \mu)/\sigma$ and $Y = (n-1)S^2/\sigma^2$. Then
$T = Z/\sqrt{Y/(n-1)} = \sqrt{n}\,(\bar{X} - \mu)/S \sim t_{n-1}$.
But this involves only $\mu$ and the data; $\sigma$ has cancelled. So we can now construct an EXACT confidence interval for $\mu$ that does NOT require the “plug-in principle”, pretending that we know $\sigma$.

Cheese example; confidence intervals (“meanCoverageOfConfidenceIntervals.R”), with x the cheese data and n = 10:
    ### Based on the normal approximation
    > mean(x)+c(-1,1)*qnorm(0.95)*sd(x)/sqrt(n)
    [1] 1.208565 1.549435
    ### Based on the t distribution
    > mean(x)+c(-1,1)*qt(0.95,9)*sd(x)/sqrt(n)
    [1] 1.189058 1.568942

    ## from "11-estimation Part 4.doc"
    cheeseX = c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58)

    simulateConfidenceInterval = function(X = cheeseX,
                                          CImethod = c("normal", "t"),
                                          simMethod = c("normal", "Bootstrap"),
                                          nsims = 10000, nreps = 2) {
      CImethod = CImethod[1]; simMethod = simMethod[1]
      mu = mean(X); sig = sd(X); n = length(X)
      ## One simulated data set: does the 90% interval cover the "true" mean mu?
      simFunction = function(repNumber) {
        x = switch(simMethod,
                   normal = rnorm(n, mu, sig),
                   Bootstrap = sample(X, n, replace = TRUE))  ## n, not a hard-coded 10
        interval = switch(CImethod,
                          normal = mean(x) + c(-1, 1) * qnorm(0.95) * sd(x) / sqrt(n),
                          t = mean(x) + c(-1, 1) * qt(0.95, n - 1) * sd(x) / sqrt(n))
        covers = (mu > interval[1]) && (mu < interval[2])
        return(covers)  ## Boolean
      }
      ## system.time() is the current name for the deprecated unix.time()
      system.time({
        for (repIndex in 1:nreps) {
          coverResult = sapply(1:nsims, simFunction)
          cat(mean(coverResult), " CImethod=", CImethod,
              " simMethod=", simMethod, "\n")
        }
      })
    }

    simulateConfidenceInterval()
    simulateConfidenceInterval(CImethod="t")
    simulateConfidenceInterval(simMethod="Bootstrap")
    simulateConfidenceInterval(CImethod="t", simMethod="Bootstrap")
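
For normally distributed data, the coverage that the simulation estimates can also be computed directly, because an interval of the form mean(x) ± q·sd(x)/sqrt(n) covers the true mean exactly when a t random variable on n – 1 degrees of freedom falls within ±q. The sketch below is a cross-check under that assumption, not part of the original script; the helper names exactCoverage and varianceCI are hypothetical, introduced here only for illustration. It also recomputes the chi-square interval for the variance of the cheese data.

```r
# Cheese data from the notes
cheeseX <- c(0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58)

# Hypothetical helper: exact coverage of mean(x) +/- q * sd(x)/sqrt(n)
# when the data are i.i.d. normal. The interval covers mu exactly when
# |T| < q, where T = sqrt(n) * (mean(x) - mu) / sd(x) is t on n - 1 d.f.
exactCoverage <- function(q, n) {
  2 * pt(q, df = n - 1) - 1
}

# Hypothetical helper: equal-tailed confidence interval for sigma^2,
# based on the pivot (n - 1) * S^2 / sigma^2 ~ chi-square on n - 1 d.f.
varianceCI <- function(x, level = 0.90) {
  n <- length(x)
  alpha <- 1 - level
  ss <- (n - 1) * var(x)  # sum of squared deviations from the mean
  # The UPPER chi-square quantile gives the LOWER endpoint, and vice versa
  c(lower = ss / qchisq(1 - alpha / 2, df = n - 1),
    upper = ss / qchisq(alpha / 2, df = n - 1))
}

n <- length(cheeseX)
coverNormal <- exactCoverage(qnorm(0.95), n)   # normal-quantile recipe
coverT <- exactCoverage(qt(0.95, n - 1), n)    # t-quantile recipe
sigma2CI <- varianceCI(cheeseX)

print(round(c(normal = coverNormal, t = coverT), 4))
print(round(sigma2CI, 4))
```

The normal-quantile recipe should come out between 0.86 and 0.87, in line with the simulated coverage of around 86%, while the t recipe returns 0.90 by construction. This check applies only to simMethod="normal"; the bootstrap runs resample the observed data, so their coverage has no such closed form here.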