Estimating with Confidence
"It is commonly believed that anyone who tabulates numbers is a statistician. This is like believing that anyone who owns a scalpel is a surgeon." (Hooke)

Chapter 10, Sec 10.1

TWO METHODS FOR INFERENCE:

CONFIDENCE INTERVALS are used to estimate the value of a population parameter. The interval gives boundaries between which we can have a specified level of confidence that our parameter of interest lies. Different confidence levels are possible. We will use the most common ones and also learn to calculate an interval at any desired level.

TESTS OF SIGNIFICANCE assess the evidence that gathered data provide for a claim about the population. These tests determine whether the results can be explained by chance alone, or whether they differ enough from what chance would produce to be statistically significant.

Both procedures are based on sampling distributions (of sample proportions or sample means), and both report probabilities that state what would happen if we used the inference method many times. They therefore require the statistic to show regular long-run behavior over repeated sampling. Inference is most reliable when the data come from RANDOMIZED samples.

We must rely on previously learned concepts, especially normal distributions and the Central Limit Theorem, as we move forward with our logic. We will also rely on the fact that the standard deviation of the sampling distribution of the sample mean equals the population standard deviation divided by the square root of the sample size n: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

When we make a claim about a population parameter, we can say that the parameter is "somewhere around" our sample statistic. "Somewhere around" is not precise enough. A better question is: "How would the sample statistic vary if we took many samples of the same size from the same population?"

Estimating with Confidence: Suppose I want to know how often teenagers go to the movies. Specifically, I want to know how many times per month a typical teenager (ages 13 through 17) goes to the movies.
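The question of how a sample statistic varies across many samples of equal size can be explored by simulation. The minimal sketch below assumes a hypothetical skewed population (exponential with mean 2, so $\sigma = 2$), draws many samples of size $n = 100$, and compares the spread of the sample means with $\sigma/\sqrt{n}$. The helper name `sd_of_sample_means` is illustrative, not part of any course material.

```python
import random
from statistics import fmean, pstdev

def sd_of_sample_means(draw, n, reps=2000, seed=0):
    """Take `reps` samples of size n using `draw(rng)` and return the SD of their means."""
    rng = random.Random(seed)
    means = [fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return pstdev(means)

# Hypothetical skewed population: exponential with mean 2, so sigma = 2 as well.
sigma, n = 2.0, 100
simulated = sd_of_sample_means(lambda rng: rng.expovariate(1 / 2.0), n)
print(f"simulated SD of xbar: {simulated:.3f}")
print(f"sigma / sqrt(n):      {sigma / n ** 0.5:.3f}")  # 0.200
```

The simulated value should land close to 0.200 even though the population itself is far from normal, which is the Central Limit Theorem at work.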
Suppose I take an SRS of 100 teenagers and calculate the sample mean to be $\bar{x} = 2.1$. The sample mean is an unbiased estimator of the unknown population mean $\mu$, so I would estimate the population mean to be approximately 2.1. However, a different sample would have given a different sample mean, so I must consider the amount of variation in the sampling model for $\bar{x}$.
▪ The sampling model for $\bar{x}$ is approximately normal (CLT).
▪ The mean of the sampling model is $\mu$.
▪ The standard deviation of the sampling model is $\sigma/\sqrt{n}$, assuming the population size is at least $10n$.

[Figure: normal curve of the sampling model for $\bar{x}$, centered at $\mu$, with $\mu - 2\sigma/\sqrt{n}$ and $\mu + 2\sigma/\sqrt{n}$ marked.]

Suppose we know that the population standard deviation is $\sigma = 0.5$. Then the standard deviation of the sampling model is $\sigma/\sqrt{n} = 0.5/\sqrt{100} = 0.05$. By the 68-95-99.7 rule, about 95% of our samples will produce a statistic $\bar{x}$ within two standard deviations of $\mu$, that is, between $\mu - 0.10$ and $\mu + 0.10$. Therefore, for about 95% of our samples, the interval from $\bar{x} - 0.10$ to $\bar{x} + 0.10$ will contain the parameter $\mu$ (the true population mean). The margin of error is 0.10.

For our sample of 100 teenagers, $\bar{x} = 2.1$. Because the margin of error is 0.10, we are 95% confident that the true population mean lies somewhere in the interval $2.1 \pm 0.10$, or [2.0, 2.2]. The interval [2.0, 2.2] is a 95% confidence interval because we are 95% confident that the unknown $\mu$ lies between 2.0 and 2.2.

How do we construct confidence intervals? Start with sample data. Compute an interval using a method that has probability C of producing an interval that contains the true value of the parameter. This is called a level C confidence interval. Since the sampling model of the sample mean $\bar{x}$ is approximately normal, we can use normal calculations to construct confidence intervals. For a 95% confidence interval, we want the interval corresponding to the middle 95% of the normal curve. For a 90% confidence interval, we want the middle 90% of the normal curve. And so on. If we are using the standard normal curve, we find the interval using z-values.
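The teen example and the "95% of our samples" claim can both be checked with a short Python sketch. It computes $\bar{x} \pm 2\sigma/\sqrt{n}$ for the data in the text, then repeats the sampling many times and counts how often the interval captures $\mu$. The true mean $\mu = 2.0$ and the normal population are hypothetical assumptions made only for the simulation; the function names are illustrative.

```python
import random
from statistics import fmean

def two_sd_interval(xbar, sigma, n):
    """xbar ± 2 * sigma / sqrt(n), the 68-95-99.7 rule version of a 95% interval."""
    margin = 2 * sigma / n ** 0.5
    return xbar - margin, xbar + margin

# Teen movie example from the text: xbar = 2.1, sigma = 0.5, n = 100.
low, high = two_sd_interval(2.1, 0.5, 100)
print(f"[{low:.2f}, {high:.2f}]")  # [2.00, 2.20]

def coverage_rate(mu, sigma, n, reps=2000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Assumption: a normal population with the given mu and sigma.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo, hi = two_sd_interval(xbar, sigma, n)
        hits += lo <= mu <= hi
    return hits / reps

# Hypothetical true mean mu = 2.0; the rate should land near 0.95.
rate = coverage_rate(mu=2.0, sigma=0.5, n=100)
print(f"coverage: {rate:.3f}")
```

With a multiplier of exactly 2 the long-run capture rate is about 95.4%. Using the critical value 1.96 instead gives exactly 95%.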
Suppose we want to find a 90% confidence interval for a standard normal curve. If the middle 90% lies within our interval, then the remaining 10% lies outside it. Because the curve is symmetric, 5% lies below the interval and 5% lies above it. Find the z-values with area 5% below and 5% above. These z-values are denoted $\pm z^*$. Because they come from the standard normal curve, they are centered at mean 0. $z^*$ is called the upper p critical value, with probability p lying to its right under the standard normal curve. For a 90% confidence interval, p = 0.05 (5%). For a 95% confidence interval, we want the upper p critical value with p = 0.025 (2.5%). For a 99% confidence interval, we want the upper p critical value with p = 0.005 (0.5%).

Remember that z-values tell us how many standard deviations we are above or below the mean. To construct a 95% confidence interval, we want the values 1.96 standard deviations below the mean and 1.96 standard deviations above the mean of the sampling model, or $\mu \pm 1.96\,\sigma/\sqrt{n}$. Centering on our sample data instead, this is $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$, assuming the population is at least $10n$. In general, to construct a level C confidence interval from our sample data, we compute $\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}$. The margin of error is $z^*\frac{\sigma}{\sqrt{n}}$. Note that the margin of error is a positive number. It is not an interval.

We would like high confidence and a small margin of error. A higher confidence level means that a higher percentage of all samples produce an interval that captures the true value of the parameter, so we want a high level of confidence. A smaller margin of error pins down the parameter more precisely, so we want a small margin of error. So how do we reduce the margin of error?
▪ Lower the confidence level (which decreases the value of $z^*$).
▪ Lower the standard deviation $\sigma$.
▪ Increase the sample size n. Because n appears under a square root, cutting the margin of error in half requires multiplying the sample size by four.
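In place of a z table, the critical values and the general level C interval $\bar{x} \pm z^*\sigma/\sqrt{n}$ can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` returns the z-value with a given area to its left. The function names below are illustrative, and the teen data ($\bar{x} = 2.1$, $\sigma = 0.5$, $n = 100$) are reused from the text.

```python
from math import isclose, sqrt
from statistics import NormalDist

def z_star(confidence):
    """Upper critical value z*: area (1 - C) / 2 lies to its right."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

def margin_of_error(confidence, sigma, n):
    """z* * sigma / sqrt(n): a positive number, not an interval."""
    return z_star(confidence) * sigma / sqrt(n)

def z_interval(xbar, sigma, n, confidence):
    """Level C one-sample z interval: xbar ± z* * sigma / sqrt(n)."""
    m = margin_of_error(confidence, sigma, n)
    return xbar - m, xbar + m

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")

low, high = z_interval(2.1, 0.5, 100, 0.95)
print(f"95% interval for the teen example: [{low:.3f}, {high:.3f}]")

# Quadrupling n halves the margin of error, because n sits under a square root.
print(isclose(margin_of_error(0.95, 0.5, 400), margin_of_error(0.95, 0.5, 100) / 2))
```

The loop prints 1.645, 1.960, and 2.576. With $z^* = 1.96$ the teen interval is about [2.002, 2.198], slightly narrower than the two-standard-deviation version.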
** You can have high confidence and a small margin of error if you choose the right sample size. **

To determine the sample size n that will yield a confidence interval for a population mean with a specified margin of error m, set the expression for the margin of error to be less than or equal to m and solve for n:

$z^*\frac{\sigma}{\sqrt{n}} \le m \quad \text{or, equivalently,} \quad n \ge \left(\frac{z^*\sigma}{m}\right)^2$

Because n must be a whole number, round the result up.

CAUTION!! These methods apply only in certain situations. To construct a level C confidence interval using the formula $\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}$:
1) the data must be an SRS;
2) we must know the population standard deviation $\sigma$;
3) we want to eliminate (if possible) any outliers, because $\bar{x}$ is strongly affected by them.
The margin of error covers only random sampling error. Problems such as undercoverage, nonresponse, and poor sampling designs can cause additional errors that the margin of error does not account for.

STEPS TO CONSTRUCT A CONFIDENCE INTERVAL:
1. Identify the population of interest and the parameter you want to draw conclusions about ($\mu$ = the true mean….).
2. Verify that the conditions/assumptions are met (SRS, approximately normal, population at least $10n$).
3. Name the procedure (1-sample Z interval for a mean), write the formula, and do the calculations.
4. Interpret the results in the context of the problem ("Based on this sample, I am ___% confident that the true mean is between ____ and ____.").

Using a TI-83, press STAT, choose TESTS, then 7:ZInterval; adjust your settings and choose Calculate.
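The sample-size rule $n \ge (z^*\sigma/m)^2$ can be sketched as a small Python function. The target values below (95% confidence, $\sigma = 0.5$, $m = 0.05$) are hypothetical, chosen only to illustrate the calculation, and `required_sample_size` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence, sigma, m):
    """Smallest whole n with z* * sigma / sqrt(n) <= m, i.e. n >= (z* * sigma / m) ** 2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: rounding down would leave the margin of error slightly above m.
    return ceil((z * sigma / m) ** 2)

# Hypothetical target: 95% confidence, sigma = 0.5, margin of error m = 0.05.
n = required_sample_size(0.95, 0.5, 0.05)
print(n)  # 385
```

Here $(1.96 \times 0.5 / 0.05)^2 \approx 384.1$, which rounds up to 385. Raising the confidence level to 99% with the same $\sigma$ and m raises the requirement to 664.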