Final Exam Review Key

Directions: Answer the following questions using the formulas on page seven as a reference.

The 2002 GSS provides the following statistics for the average years of education and associated standard deviations for lower, working, middle, and upper class respondents. Use these data to answer questions 1 to 4, assuming that years of education are normally distributed.

Table 1. Mean and Std Dev for Education by Class

                  Mean     Standard Deviation     N
Lower Class       10.34    3.18                   89
Working Class     12.64    2.50                   675
Middle Class      14.19    2.39                   675
Upper Class       15.00    3.27                   49

Source: GSS 2002

1. What proportion of working class respondents have 12 to 16 years of education?

Always start with a picture; since 12 is below the mean and 16 is above the mean, we are looking for the area under the normal curve between 12 and 16.

To calculate the Z score, we use the formula:

\[ Z = \frac{Y - \bar{Y}}{S_Y} \]

The Z score for a value of 12 is:

\[ Z = \frac{12.00 - 12.64}{2.50} = \frac{-0.64}{2.50} = -0.26 \]

The Z score for a value of 16 is:

\[ Z = \frac{16.00 - 12.64}{2.50} = \frac{3.36}{2.50} = 1.34 \]

The area between a Z of -0.26 and the mean is 0.1026. The area between a Z of 1.34 and the mean is 0.4099, so the total area between the scores is:

\[ \text{Area} = 0.1026 + 0.4099 = 0.5125 \]

So, the proportion of working class respondents with 12-16 years of education is 0.5125.

2. What proportion of upper class respondents have 12 to 16 years of education?

Again, since 12 is below the mean and 16 is above the mean, we are looking for the area under the normal curve between 12 and 16.

The Z score for a value of 12 is:

\[ Z = \frac{12.00 - 15.00}{3.27} = \frac{-3.00}{3.27} = -0.92 \]

The Z score for a value of 16 is:

\[ Z = \frac{16.00 - 15.00}{3.27} = \frac{1.00}{3.27} = 0.31 \]

The area between a Z of -0.92 and the mean is 0.3212. The area between a Z of 0.31 and the mean is 0.1217, so the total area between the scores is:

\[ \text{Area} = 0.3212 + 0.1217 = 0.4429 \]

So, the proportion of upper class respondents with 12-16 years of education is 0.4429.

3. What is the probability that a working class respondent, drawn at random from the population, will have more than 16 years of education?
To answer this question, we have to standardize the raw score and find the area beyond the Z score. We should start with a picture of the area above 16 in the upper tail.

The Z score for a value of 16 is:

\[ Z = \frac{16.00 - 12.64}{2.50} = \frac{3.36}{2.50} = 1.34 \]

The area between a Z of 1.34 and the tail of the distribution (Column C) is 0.0901. So, the probability of a working class respondent having more than 16 years of education is 0.0901.

4. What is the probability that a middle class respondent, drawn at random from the population, will have less than 12 years of education?

To answer this question, we have to standardize the raw score and find the area beyond the Z score. We should start with a picture of the area below 12 in the lower tail.

The Z score for a value of 12 is:

\[ Z = \frac{12.00 - 14.19}{2.39} = \frac{-2.19}{2.39} = -0.92 \]

The area between a Z of 0.92 and the tail of the distribution (Column C) is 0.1788. So, the probability of a middle class respondent having less than 12 years of education is 0.1788.

The mean family income of a large southern city is $34,000, with a standard deviation (for the population) of $5,000. Imagine that you took a sample of 200 city residents and you calculated the mean income for that sample. Answer questions 5 and 6 based on this scenario.

5. What is the probability that your sample mean is between $33,000 and $34,000?

Since the mean is $34,000, we want to find the area under the curve between $33,000 and $34,000; it helps to start with a diagram of that area.

For this question, we want to use the sampling distribution of the sample mean. We can standardize our sample mean by using the formula:

\[ Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{N}} \]

Notice that the standard error is in the denominator.
We can calculate the standard error by using the formula:

\[ \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} \]

The standard error of the mean for a sample of 200 is:

\[ \sigma_{\bar{Y}} = \frac{\$5{,}000}{\sqrt{200}} = \$353.55 \]

A sample mean of $33,000 corresponds to a Z score of:

\[ Z = \frac{33{,}000 - 34{,}000}{353.55} = \frac{-1{,}000}{353.55} = -2.83 \]

The area between the score and the mean (which is $34,000) is about 0.4977, which is also the probability of a sample mean between $33,000 and $34,000.

6. What is the probability that the sample mean exceeds $37,000?

For this question, we want to find the area beyond the Z score. We start with a picture of the area above $37,000 in the upper tail.

\[ Z = \frac{37{,}000 - 34{,}000}{353.55} = \frac{3{,}000}{353.55} = 8.49 \]

This value is so large that it is not represented in Appendix B, which means that the probability is, for all practical purposes, zero.

Use the data from Table 1 (on page 1) to complete questions 7 to 9.

7. Construct the 95 percent confidence interval for the mean number of years of education for lower class and middle class respondents. Interpret the results.

The general formula for a confidence interval is:

\[ CI = \bar{Y} \pm Z(\sigma_{\bar{Y}}) \]

The formula tells us to take the sample mean and both subtract from it and add to it the product of a Z score and the standard error. Since we are not given the population standard deviation, we must estimate it with the sample standard deviation. For a 95 percent confidence interval, we choose a Z score of 1.96.

For lower class respondents, the standard error is equal to:

\[ S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{3.18}{\sqrt{89}} = 0.337 \]

The confidence interval is equal to:

\[ CI = 10.34 \pm 1.96(0.337) = 10.34 \pm 0.661 = 9.68 \text{ to } 11.00 \]

We can interpret this by saying that we are 95 percent confident that the true mean is no less than 9.68 and no greater than 11.00.

For middle class respondents:

\[ S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{2.39}{\sqrt{675}} = 0.092 \]

\[ CI = 14.19 \pm 1.96(0.092) = 14.19 \pm 0.18 = 14.01 \text{ to } 14.37 \]

We can interpret this by saying that we are 95 percent confident that the true mean is no less than 14.01 and no greater than 14.37.

8.
Construct the 99 percent confidence interval for the mean number of years of education for lower class and middle class respondents. Interpret the results.

For a 99 percent confidence interval, we choose a Z score of 2.58.

For lower class respondents:

\[ S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{3.18}{\sqrt{89}} = 0.337 \]

\[ CI = 10.34 \pm 2.58(0.337) = 10.34 \pm 0.869 = 9.47 \text{ to } 11.21 \]

We can interpret this by saying that we are 99 percent confident that the true mean is no less than 9.47 and no greater than 11.21.

For middle class respondents:

\[ S_{\bar{Y}} = \frac{S_Y}{\sqrt{N}} = \frac{2.39}{\sqrt{675}} = 0.092 \]

\[ CI = 14.19 \pm 2.58(0.092) = 14.19 \pm 0.237 = 13.95 \text{ to } 14.43 \]

We can interpret this by saying that we are 99 percent confident that the true mean is no less than 13.95 and no greater than 14.43.

9. As the confidence level increases, what happens to the size of the confidence interval? How does the confidence interval affect the precision of the estimate?

As the confidence level rises, so does the width of the confidence interval. As the width of the confidence interval increases, the precision of the estimate decreases.

10. It is known that, nationally, doctors working for Health Maintenance Organizations (HMOs) average 13.5 years of experience in their specialties, with a standard deviation of 7.6 years. The executive director of an HMO in a western state is interested in determining whether or not its doctors have less experience than the national average. A random sample of 150 doctors from the HMO shows a mean of only 10.9 years of experience. Test the hypothesis that doctors in this HMO have less experience than the national average. Use an alpha level of .01. Make certain to follow the five steps in hypothesis testing.

This question involves a test comparing a single sample mean with a known population mean. It is a one-sided test.

1) The first step in hypothesis testing is to state assumptions.

We assume:
1. A random sample was used.
2. The variable years of experience is measured on an interval-ratio level.
3.
Because N > 50, the assumption of a normal population is not required.

2) Second, we state the research and null hypotheses and the selected alpha level.

We want to test the hypothesis that doctors in this HMO have less experience than the national average, so this is a one-sided test:

H1: \( \mu_Y < 13.5 \) years
H0: \( \mu_Y = 13.5 \) years

We choose an alpha of 0.01.

3) Third, we select the sampling distribution and specify the test statistic.

We are given the population standard deviation, so we do not need to estimate it with the sample standard deviation. Hence we can use the Z distribution. The formula for the Z statistic is:

\[ Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{N}} \]

4) Now we compute the Z statistic.

We plug the numbers we are given into the formula:

\[ Z = \frac{10.9 - 13.5}{7.6 / \sqrt{150}} = \frac{-2.6}{7.6 / 12.25} = \frac{-2.6}{0.62} = -4.19 \]

5) Now we make a decision and interpret the results.

Drawing a picture will make it easier to make a decision. The Z value obtained is -4.19. The p value for a Z of -4.19 is less than .001 for a one-tailed test, which is less than the alpha level of .01. We reject the null hypothesis. We have evidence in favor of the research hypothesis, and we conclude that the doctors at this HMO do have less experience than the population of doctors at all HMOs.

11. The 2000 International Social Survey Programme (ISSP) collected data on the educational attainment of males and females. Based on a random sample of 618 cases, males were found to have an average of 11.85 years of education with a standard deviation of 3.98 years. A random sample of 732 females found an average of 11.34 years of education with a standard deviation of 3.74 years. Using a .05 alpha level, test whether there is a significant difference in educational attainment between men and women.

This question involves a test of the difference between two sample means. It is a two-sided test.

1) Assumptions:
1. Independent random samples are used.
2.
The variable years of education is measured at an interval-ratio level of measurement.
3. Because N1 > 50 and N2 > 50, the assumption of a normal population is not required.
4. The population variances are assumed equal.

2) Research and null hypotheses and alpha level

H1: \( \mu_1 \neq \mu_2 \)
H0: \( \mu_1 = \mu_2 \)
\( \alpha = .05 \)

3) Sampling distribution and test statistic

Because we use the sample standard deviations to estimate the standard error of the sampling distribution, we use the t distribution:

\[ t = \frac{\bar{Y}_1 - \bar{Y}_2}{S_{\bar{Y}_1 - \bar{Y}_2}} \]

\[ S_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{(N_1 - 1) S_{Y_1}^2 + (N_2 - 1) S_{Y_2}^2}{(N_1 + N_2) - 2}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}} \]

\[ df = (N_1 + N_2) - 2 \]

4) Computing the test statistic

\[ df = 618 + 732 - 2 = 1348 \]

\[ S_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{(618 - 1)(3.98)^2 + (732 - 1)(3.74)^2}{(618 + 732) - 2}} \sqrt{\frac{618 + 732}{618(732)}} \]

\[ = \sqrt{\frac{(617)(15.8404) + (731)(13.9876)}{1{,}350 - 2}} \sqrt{\frac{1{,}350}{452{,}376}} \]

\[ = \sqrt{\frac{9{,}773.527 + 10{,}224.94}{1{,}348}} \sqrt{0.002984} \]

\[ = \sqrt{\frac{19{,}998.46}{1{,}348}} \, (0.054628) = \sqrt{14.83565} \, (0.054628) = 3.851708 \times 0.054628 = 0.21 \]

\[ t = \frac{11.85 - 11.34}{0.21} = \frac{0.51}{0.21} = 2.43 \]

5) Make a decision and interpret the results

We start with a diagram. Based on the obtained t of 2.43, we can reject the null hypothesis. The p value for a t of 2.43 lies between .01 and .02 for a two-tailed test, which is less than the alpha of .05. (We do not need to divide the alpha by two because the p value for the two-sided test is given in the table.) Based on the ISSP dataset, we conclude that there is a relationship between gender and educational attainment. Men have an average of .51 more years of education than women.

Useful Formulas

\[ Z = \frac{Y - \bar{Y}}{S_Y} \]

\[ \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} \]

\[ CI = \bar{Y} \pm Z(\sigma_{\bar{Y}}) \]

\[ Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{N}} \]

\[ t = \frac{\bar{Y} - \mu_Y}{S_Y / \sqrt{N}}, \qquad df = N - 1 \]

\[ t = \frac{\bar{Y}_1 - \bar{Y}_2}{S_{\bar{Y}_1 - \bar{Y}_2}}, \qquad df = (N_1 + N_2) - 2 \]

\[ S_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{(N_1 - 1) S_{Y_1}^2 + (N_2 - 1) S_{Y_2}^2}{(N_1 + N_2) - 2}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}} \]
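
The hand calculations in this key can be checked numerically. The following is a minimal sketch in Python, using only the standard library, and is not part of the original key. The function names (`area_between`, `confidence_interval`, `one_sample_z`, `pooled_t`) are hypothetical helpers introduced here for illustration. Because the code does not round Z scores to two decimals before looking up areas, its results can differ from the table-based answers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def area_between(low, high, mean, sd):
    """Proportion of a normal(mean, sd) population between low and high."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


def confidence_interval(mean, sd, n, z):
    """Interval mean +/- z * (sd / sqrt(n)), as in questions 7 and 8."""
    se = sd / sqrt(n)
    return mean - z * se, mean + z * se


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for one sample mean with a known population SD."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom (question 11)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var) * sqrt((n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se, df


if __name__ == "__main__":
    # Question 1: working class respondents with 12 to 16 years of education
    print(round(area_between(12, 16, 12.64, 2.50), 4))
    # Question 7: 95 percent interval for lower class respondents
    print(confidence_interval(10.34, 3.18, 89, 1.96))
    # Question 10: doctors at the HMO versus the national average
    print(round(one_sample_z(10.9, 13.5, 7.6, 150), 2))
    # Question 11: difference in education between men and women
    print(pooled_t(11.85, 3.98, 618, 11.34, 3.74, 732))
```

The t statistic computed this way comes out slightly below the 2.43 shown in question 11, because the key rounds the standard error to .21 before dividing. The decision to reject the null hypothesis is unchanged.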