CHAPTER 9 SAMPLING DISTRIBUTIONS AND CONFIDENCE INTERVALS

9.1 CHAPTER OBJECTIVES
– Motivation for Point Estimators
– Common Point Estimators
– Desirable Properties of Point Estimators
– Distribution of the Sample Mean: Large Sample or Known $\sigma$
– The Central Limit Theorem: A More Detailed Look
– Drawing Inferences by Using the Central Limit Theorem
– Large-Sample Confidence Intervals for the Mean
– Distribution of the Sample Mean: Small Sample and Unknown $\sigma$
– Small-Sample Confidence Intervals for the Mean
– Confidence Intervals for Qualitative Data
– Sample Size Calculations

9.2 MOTIVATIONS FOR POINT ESTIMATORS
A point estimate is a single number calculated from sample data. It is used to estimate a parameter of the population.
A point estimator is the formula or rule that is used to calculate the point estimate for a particular set of data.

9.3 COMMON POINT ESTIMATORS
9.3.1 Point Estimators for Quantitative Variables
If you are studying a single quantitative variable, then you typically wish to know the value of the population mean and the value of the population standard deviation. That is, you wish to know the center and the variability in the population. We will use the sample mean, $\bar{X}$, to estimate $\mu$, and we will use the sample standard deviation, s, to estimate $\sigma$.

Often you will want to compare two populations. For example, you may wish to compare:
– the weight of diapers produced by two different machines
– the sales of a product at two different locations
– the time it takes to get your burger at McDonald's and Burger King
– the salaries for men and women in the same occupation

For example, if you wanted to compare your salary to your friend's salary, what would you do with the two salary figures? If you said subtract one number from the other, then you are correct. By convention we always subtract the second mean from the first, and thus we need to estimate the true difference between $\mu_1$ and $\mu_2$, or $\mu_1 - \mu_2$.
It makes sense to estimate this true difference by using the actual difference in the two sample means, or $\bar{X}_1 - \bar{X}_2$.

Another way to compare two numbers is to find the ratio of one to the other. If the numbers are the same, then the ratio will be 1. We will use ratios to compare the amount of variation in the first population to the amount of variation in the second population.

9.3.2 Point Estimators for Qualitative Variables
If the variable you are studying is a qualitative variable, then you typically wish to know what proportion or percentage of the population has a particular characteristic. For example, you may wish to know the percentage of defective items in the population. The true unknown population proportion is labeled $\pi$. In this case we will use the sample proportion, p, to estimate $\pi$.

We often wish to compare two qualitative variables. For example, you may wish to compare:
– the proportion of men who would buy a new product with the proportion of women who would buy it
– the proportion of defectives produced by the second shift with the proportion of defectives produced by the third shift
– the proportion of young people who like the new packaging with the proportion of older people who like it

We would like to compare the true population proportions $\pi_1$ and $\pi_2$, and we accomplish the comparison with a subtraction. So we wish to estimate the true difference in the population proportions, $\pi_1 - \pi_2$. It makes sense to estimate this true difference by using the difference in the two sample proportions, or $p_1 - p_2$.

9.4 DESIRABLE PROPERTIES OF POINT ESTIMATORS
In order to develop the properties of point estimators, we focus on estimators for the population mean, $\mu$. Based on what we have seen so far, it makes sense to consider using one or more of these statistics as our point estimator of the unknown value $\mu$.
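
As a concrete recap of Section 9.3, the common point estimators can be computed directly from sample data. The sketch below is illustrative only: the function names and the diaper-weight and defect data are hypothetical and do not come from the chapter. It uses the sample mean and sample standard deviation for one quantitative variable, a difference in sample means for two, and sample proportions for qualitative data.

```python
from statistics import mean, stdev


def point_estimates(sample):
    """Return (sample mean, sample standard deviation) for one quantitative sample."""
    return mean(sample), stdev(sample)


def difference_in_means(sample_1, sample_2):
    """Estimate mu_1 - mu_2: subtract the second sample mean from the first."""
    return mean(sample_1) - mean(sample_2)


def sample_proportion(outcomes):
    """Share of observations with the characteristic (outcomes is a list of booleans)."""
    return sum(outcomes) / len(outcomes)


def difference_in_proportions(outcomes_1, outcomes_2):
    """Estimate the difference in population proportions with p_1 - p_2."""
    return sample_proportion(outcomes_1) - sample_proportion(outcomes_2)


# Hypothetical diaper weights from two machines (made-up numbers).
machine_1 = [55.2, 54.8, 55.5, 55.1, 54.9]
machine_2 = [54.6, 55.0, 54.7, 54.9, 54.8]

# Hypothetical inspection results: True means the item was defective.
second_shift = [True, False, False, True]
third_shift = [False, False, True, False]

x_bar, s = point_estimates(machine_1)
mean_gap = difference_in_means(machine_1, machine_2)
proportion_gap = difference_in_proportions(second_shift, third_shift)
```

Each function returns a single number (or a pair of numbers) from the data, which is exactly what makes it a point estimator rather than an interval estimate.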
In summary, what we really want is a point estimator with the following two properties:
– The point estimator should yield a number close to the unknown population parameter.
– The point estimator should not have a great deal of variability.

These two properties are more precisely stated as follows:
– The point estimator should be unbiased.
– The point estimator should have a small standard deviation.

An unbiased estimator yields an estimate that is fair. It neither systematically overestimates the parameter nor systematically underestimates the parameter.

9.5 DISTRIBUTION OF THE SAMPLE MEAN, $\bar{X}$
9.5.1 Putting Z-scores and the Empirical Rule to Use
To calculate the Z-score we need the following formula:

$$Z = \frac{\text{Distance between the data value and the average}}{\text{Standard deviation}}$$

$$Z = \frac{X - \mu_x}{\sigma_x}$$

9.5.2 The Central Limit Theorem
The standard error is the standard deviation of a point estimator. It measures how much the point estimator or sample statistic varies from sample to sample.
The probability distribution of a point estimator or a sample statistic is called a sampling distribution.

The Central Limit Theorem applies when you have a large enough sample size. A "large enough" sample size depends on how much the population distribution deviates from a normal distribution. Typically, if the sample size is larger than 30, then it is considered large enough. The larger the sample size, the better the normal approximation will be.

9.6 THE CENTRAL LIMIT THEOREM
9.6.1 The Shape of the Sampling Distribution of $\bar{X}$
The diaper company is taking samples of size n = 5 every hour. Because each individual sample size is small (n = 5), to apply the CLT we will need to assume that the underlying distribution of diaper weights is normal or close to a normal shape.
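
A small simulation can make the idea of a sampling distribution concrete. The sketch below draws many samples of size n = 5 from a normal population and records each sample mean. The population values (mean 55, standard deviation 0.5), the number of samples, and the function name are assumptions chosen for illustration; they are not taken from the diaper example. It relies on the standard result that the standard error of $\bar{X}$ is $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(mean(sample))
    return sample_means


# Assumed population parameters (hypothetical, for illustration only).
MU, SIGMA, N = 55.0, 0.5, 5

x_bars = simulate_sample_means(MU, SIGMA, N, num_samples=10_000)

center_of_x_bars = mean(x_bars)        # should sit close to MU
spread_of_x_bars = stdev(x_bars)       # should sit close to SIGMA / sqrt(N)
theoretical_standard_error = SIGMA / sqrt(N)
```

Comparing `center_of_x_bars` with `MU` and `spread_of_x_bars` with `theoretical_standard_error` shows how the sample means cluster around the population mean with less variability than individual observations.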
We can get a sense of the shape of the population distribution by examining a histogram of sample observations.

9.6.2 The Mean of the Sampling Distribution of $\bar{X}$
The second point of the Central Limit Theorem is that the mean of the sampling distribution of $\bar{X}$ equals the mean of the population you are sampling from. This means the center of the histogram of the $\bar{X}$'s should be $\mu$.

9.6.3 The Standard Error of the Sampling Distribution of $\bar{X}$
The third point of the theorem says that the standard deviation of the $\bar{X}$'s (also called the standard error) depends on two things:
– the amount of variability you start with in the population, $\sigma$, and
– the sample size, n.

9.6.4 Summary of Central Limit Theorem
Combining all three of the points of the Central Limit Theorem, we get Figure 9.6A, which displays the sampling distribution of $\bar{X}$ when n is sufficiently large.

We know from our work on the normal distribution that 68% of values will fall within one standard deviation of the mean, 95% will fall within two standard deviations of the mean, and 99.7% will fall within three standard deviations of the mean.

9.7 DRAWING INFERENCES BY USING THE CENTRAL LIMIT THEOREM
9.7.1 Using the Central Limit Theorem
We used s, the sample standard deviation, as an estimate of $\sigma$ in finding the Z-score, since $\sigma$ was not given. This is not precisely the correct procedure, but it is close enough given the large sample size.

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}} = \frac{\bar{X} - \mu_{\bar{X}}}{s/\sqrt{n}}$$

9.8 LARGE-SAMPLE CONFIDENCE INTERVALS FOR THE MEAN
9.8.1 The Basics of Confidence Intervals
Let's examine the components of the confidence interval. First of all, it has a lower bound for $\mu$, called L, and an upper bound for $\mu$, called U. Finally, it has a probability value, which is called the confidence level and is labeled $1 - \alpha$.
For any individual interval, $\mu$ is either in the interval or it is not.

In general, a confidence interval for the population mean has the following form:

$$P(L \le \mu \le U) = 1 - \alpha$$

A confidence interval or an interval estimate is a range of values with an associated probability or confidence level, $1 - \alpha$. The probability quantifies the chance that the interval contains the true population parameter.

9.8.2 Confidence Interval for $\mu$: Normally Distributed Population and Known Standard Deviation
Remember that $\bar{X}$ is an unbiased estimate of $\mu$, and so it makes sense to put $\bar{X}$ right in the middle of the interval. To find the lower bound we take $\bar{X}$ and subtract e, and to find the upper bound we take $\bar{X}$ and add the value of e.

Suppose we decide we want to construct a 95% confidence interval ($\alpha = 0.05$) for the population mean, $\mu$. The sampling distribution of $\bar{X}$ is normal, and about 95% of the values of a normal distribution fall within two standard deviations of the mean. Finally, the standard deviation of the sample mean, also known as the standard error, is equal to $\sigma/\sqrt{n}$. Bringing these three pieces of information together tells us that e should equal $2\sigma/\sqrt{n}$.

To get the correct value for Z, find the Z value whose tail area probability is equal to $\alpha/2$. Label this value as $Z_{\alpha/2}$. For a 95% confidence interval we divide $\alpha = 0.05$ by 2 and find that the area in one of the tails is 0.025.
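
The construction above can be sketched in a few lines of Python. This is a minimal illustration that assumes a normally distributed population with known $\sigma$. The function name and the numbers in the usage lines are hypothetical. It uses `statistics.NormalDist().inv_cdf` from the standard library to find $Z_{\alpha/2}$, which for a 95% interval is about 1.96 rather than the rounded value of 2 used in the approximation above.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (L, U) for the population mean when sigma is known.

    x_bar: sample mean, sigma: known population standard deviation,
    n: sample size, confidence: the confidence level 1 - alpha.
    """
    alpha = 1 - confidence
    # Z value that leaves a tail area of alpha / 2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    e = z * standard_error  # half-width of the interval
    return x_bar - e, x_bar + e


# Hypothetical inputs: sample mean 55.0, known sigma 0.5, sample size 25.
lower, upper = z_confidence_interval(55.0, 0.5, 25, confidence=0.95)
```

Raising the confidence level widens the interval, because a smaller tail area $\alpha/2$ requires a larger $Z_{\alpha/2}$; increasing n narrows it, because the standard error $\sigma/\sqrt{n}$ shrinks.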