Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ESTIMATION AND CONFIDENCE INTERVALS Up to now we assumed that we knew the parameters of the population. Example. Binomial experiment Sampling from a normal population σ. knew probability of success p. knew mean µ and st. dev. In practice parameters are unknown! ENTER STATISTICAL INFERENCE Use sample data Need to estimate parameters and/or Test hypotheses/make decisions about parameter values Estimation of the population mean: σ is known We will assume here that: We have normal population OR a large sample size ( n > 30). Case 1. Sample from a normal population. Point estimators for the normal parameters Point estimator of µ: Point estimator of variance σ2: Both estimators are very good. They are unbiased and have the smallest spread among all unbiased estimators. CONFIDENCE INTERVALS Often we seek a range of values – an interval – for the unknown parameter. We want to have some confidence that the interval covers the parameter – CONFIDENCE INTERVAL (CI) We construct CIs based on sample data and theoretical knowledge about distributions, like the Central Limit Theorem or Normal approximation to Binomial. CONFIDENCE INTERVAL FOR THE MEAN, NORMAL POPULATION: σ known. Let sample from N(µ, σ), µ unknown, σ known. Definition: The interval ( - zα/2 σ/√n, + zα/2 σ/√n) is called a 95% Confidence Interval for the mean. Example: A 95% CI for the mean would be: ( -1.96σ/√n, +1.96σ/√n) because for the 95% confidence level, zα/2 is 1.96. Example Suppose X, scores on a test, have a normal distribution with unknown mean and σ = 60. The sample values X1, X2, …, X900 give sample mean Find a 95% as well as 98% confidence intervals for the true mean test score. Solution. Point estimate of µ is n=900, σ = 60. For 95% CI, zα/2=1.96, so 95% CI is: =(272 - 1.96(60/√900, 272 + 1.96(60/√900) = ( 268.08, 275.92). For 98% CI, zα/2=2.33, so 98% CI is: =(272 – 2.33(60/√900, 272 + 2.33(60/√900) = ( 267.35, 276.65). Note that the 98% CI is longer than the 95% CI. We gained confidence, but we lost accuracy. What if the population is not normal? Case 2. If the population is not normal, but the sample size is large, then by the Central Limit Theorem has approximately standard normal distribution no matter what the population is. So, provided that the sample size is large, we can use the same confidence intervals for data sets from any population. This is one of many practical applications of the CLT! CONFIDENCE INTERVAL FOR MEAN WITH KNOWN σ – choice of the sample size for a given margin of error. Let sample from N(µ, σ), µ unknown, σ known. The margin of error m= half-length of the CI for µ, m = on: the confidence level via (as conf. level the variability in the population σ (as σ the sample size (as n ,m ,m ,m depends ); ); ). To decrease the error, but keep the confidence level unchanged, we need to increase the sample size. For a (1-α) CI for µ (σ known) to have margin of error m, we need sample size Example Suppose are heights from a normal distribution with σ=10’’. In a sample of size 16 we obtained . a) Find a 95% confidence interval for µ. What is its margin of error? b) Find the sample size needed for the margin of error to be 3 inches. Solution. σ =10’’, n=16, m=3. a) C = 0.95, so α = 0.05, so zα/2=1.96. 95% CI for the mean (sampling from normal distribution) is: =(60 +/-1.96(10)/√16) = (55.1, 64.9). Margin of error=(64.9 - 55.1)/2 = 4.9. b) Finally, n = 43. We need a larger sample size to get smaller error, with the same confidence level. Example: Assume that we want to estimate the mean IQ score for the population of statistics professors. How many statistics professors must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 2 IQ points of the population mean? Assume that σ = 15, as is found in the general population. Solution: z α/ 2 = 1.96, m = 2, σ = 15 With a simple random sample of only 217 statistics professors, we will be 95% confident that the sample mean will be within 2 IQ points of the true population mean µ. Example Randomly selected students from a large university participated in an experiment to test their ability to determine when 1 min (60 seconds) has passed. Forty students yielded a sample mean of 58.3 sec. Assuming that σ=9.5 sec, construct a 95% confidence interval for the population mean time for all students in that university. Solution: z α/ 2 = 1.96, = 58.3sec. , σ =9.5sec. 95% CI for the true mean perception time of 1 min: The margin of error for this estimate is: 2.9 Example re time perception contd. How many students should be sampled, so that the margin of error is less than 1 sec., with the same confidence level? What if we lower the confidence level to 80%? z α/ 2 = z 0.1 = 1.28 NOTE: Lowering the confidence level allows for a smaller sample size. CONFIDENCE INTERVAL FOR PROPORTION Binomial experiment, X = number of successes, n=number of trials, p=probability of success is unknown. Goal: Estimate the true/population proportion of successes p in a Binomial experiment using an interval estimate. Point estimate: NOTE: For large sample size n, Thus, is normally distributed with has a standard normal distribution. LARGE SAMPLE CONFIDENCE INTERVAL FOR THE PROPORTION Binomial experiment, X = number of successes, n=number of trials, p=probability of success. A C=(1-α) confidence interval for the population proportion p is given by if the sample size is large. C= confidence level is the probability/proportion of times 1- α that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times. The confidence level is also called degree of confidence, or the confidence coefficient. NOTES: Most common choices are 90%, 95%, or 99%. C is often expressed as percentage. EXAMPLE 400 people were asked if they are in favor of death penalty. 250 favored death penalty. Find a point estimate and a 95% confidence interval for the true proportion of people who are in favor of death penalty. Solution. X=number of people in the sample who favor death penalty, X=250, n=400, p= true proportion of people in the population who favor death penalty. Point estimate of p: 95% CI: C=0.95, so zα/2=1.96, and 95% CI for p is: The interval (0.578, 0.672) covers the true proportion of people in favor of death penalty with probability 0.95. SAMPLE SIZE AND MARGIN OF ERROR Question: What is the sample size required to keep the margin of error on the level m for a specified confidence level C=(1-α)? Margin of error = half length of the confidence interval, m = Solving for n, we get: Since p is unknown, we have to substitute our best guess for p, say p0. We can use info from previous surveys for p0. If no previous info available, use conservative approach and plug 0.5 for p, to get: EXAMPLE. Death penalty contd. 1. Find the sample size required for the margin of error to be 0.02 for a 90% confidence interval for the proportion of people in favor of death penalty. Solution. We have prior info from the survey n=1586. p0=0.625. Use it! Round UP! 2. Find the sample size required for the margin of error to be 0.02 for a 90% CI for p assuming no prior information about p. Solution. We have no prior info about p. Round UP! n=1692. NOTE: Without prior information, we need larger sample size to have the same margin of error. CONFIDENCE INTERVAL FOR MEAN WITH UNKNOWN σ If the population variance σ is unknown, estimate it using the sample variance S: Then, replace σ with S in the Z-statistic used for the confidence interval: Get t-statistic: The t-statistic does not have standard normal distribution when n is small. It has a t distribution with (n-1) degrees of freedom. The number of degrees of freedom is a parameter of the t distribution. NOTE. T-distribution is also called “Student’s t distribution”. T-distribution T-distribution has similarities and differences with standard Normal distribution: symmetric around zero, but it has fatter (heavier) tails. As degrees of freedom increase, t distribution becomes indistinguishable from standard normal. See last row of t- table CONFIDENCE INTERVAL FOR MEAN WITH UNKNOWN σ Following a procedure similar to the one for constructing CI for µ when σ was known, when data was from a normal population, we can find a CI for µ when σ is not known. We replace σ with S and standard normal percentiles with percentiles from t distribution. Let be observations from a normal distribution with mean µ and standard deviation σ, both µ and σ unknown. A C=(1-α) confidence interval for µ is given by when the data is from a normal population with σ unknown. Here tα/2 satisfies P(t(n-1)>tα/2 )=α/2, i.e. tα/2 is a percentile from t(n-1) distribution. We use t-table or a statistical computer package (e.g. MINTAB) to find percentiles of t-distribution: tα/2. EXAMPLE A biologist studying brain weights of tigers took a random sample of 16 animals and measured their brain weights in ounces. This data gave a sample mean of 10 and standard deviation of 3.2 ounces. Assuming that these weights follow a Normal distribution, find a 95% confidence interval for the true mean tiger brain weight µ. Solution. =10, s=3.2, n=16, need 95%CI for µ. C=0.95, so α=0.05, so α/2=0.025. Since n=16, then n-1=15. We need t(15)0.025. From the table t(15)0.025=2.131, so the 95% CI for µ is: 10+/- 2.131 (3.2)/√16 = (8.295, 11.705) oz. We are 95% confident that the true mean weight of a tiger brain is between 8.295 and 11.705 oz. Example: A study found the body temperatures of 106 healthy adults. The sample mean was 98.2 degrees and the sample standard deviation was 0.62 degrees. Find the margin of error E and the 95% confidence interval for µ. Solution: n = 106, = 98.20o, s = 0.62o, α = 0.05, α /2 = 0.025, degrees of freedom=105, t α/ 2 = 1.984 , (Table A-3 with df=100) Margin of error= 1.984 (0.62)/√106 =0.1195 95% CI for the mean temperature: 98.2+/- 1.984 (0.62)/√106 = (98.08, 98.32)