Download Statistics Notes 10 - Home Page of Vance A. Hughey

Lecture Notes for Principles of Statistics II (Economics 262) Western Nevada College Copyright © 2008 By Vance A. Hughey Carson City, Nevada The following material is copyrighted. The text of this publication, or any part thereof, may not be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, storage in an information retrieval system, or otherwise, without the prior written permission of Vance A. Hughey. Statistical Inference About Means and Proportions with Two Populations Page 1 10. Statistical Inference About Means and Proportions with Two Populations—Expands the discussion of interval estimation and hypothesis testing to cases where two populations are involved. Specifically, this chapter discusses the selection of random samples and performs statistical analyses that will enable the drawing of conclusions about the difference between means and/or proportions of two populations. Estimation of the Difference Between the Means of Two Populations—Independent Samples Sometimes we are faced with situations where we have two separate populations where the difference between the means of the two populations is important. Examples of the kinds of questions that can be answered using this procedure include: • Is the average return on investment in common stocks different than the average return on municipal bonds? • Are women more likely to drink white wine than men? • Is one hospital more likely to perform cesarean sections than another hospital? • Is it reasonable to expect do-it-yourselfers to be younger than non-do-it-yourselfers? This procedure is also useful in conducting efficacy studies. These are studies to determine the capacity of a product or procedure to produce a desired effect. To begin with, we will take two random samples, one each from different populations. From this sample we will be able to compute the sample mean for each sample: x1 = sample mean for the sample taken from population 1 x 2 = sample mean for the sample taken from population 2 We can then use the sample means as point estimates of the population means: µ1 = mean of population 1 µ2 = mean of population 2 And since the difference between the mean of population 1 and the mean of population 2 is µ1 − µ 2 , we can say that x1 − x 2 is a point estimate of the difference between the means of two populations. This is the point estimate of interest in answering the kinds of questions I mentioned a few minutes ago. And this point estimate, like other point estimators, has a sampling distribution of its own with a mean = µ1 − µ2 and a standard deviation, Statistical Inference About Means and Proportions with Two Populations Page 2 σ12 σ 22 + . If we use a large sample, n ≥ 30 , then the sampling distribution can n1 n2 be approximated by a normal probability distribution. σ x1 −x 2 = The sampling distribution of x1 − x2 is ! x1 " x2 ! 12 ! 22 = + n1 n2 x1 " x2 µ1 " µ2 The interval estimate of the difference between the means of two populations follows the same procedure that we have used for interval estimation in Chapter 8: x1 − x2 ± zα / 2σ x1 −x 2 Since s may not be known, we can again substitute s for σ giving us: sx1 −x 2 = s12 s22 + n1 n2 Hypothesis Tests About the Difference Between the Means of Two Populations— σ 1 and σ 2 Known Now let’s conduct a hypothesis test. As usual, we have to consider two different cases. The first is the case where both σ 1 and σ 2 are assumed to be known. The second is the case where both σ 1 and σ 2 are assumed to be unknown. σ 1 and σ 2 Known—Let’s use the Greystone Department Store example beginning on page 395 to illustrate how to conduct this hypothesis test: Statistical Inference About Means and Proportions with Two Populations Page 3 The Greystone Department Store study provided the following data on the ages of customers from independent random samples taken at two store locations: Inner-City Store Suburban Store n1 = 36 n 2 = 49 x1 = 40 years x 2 = 35 years σ 1 = 9 years σ 2 = 10 years The questions we need to answer are: a. State the hypotheses that could be used to detect a difference between the population mean ages at the two stores. b. What is the value of the test statistic? c. What is the p-value? d. At α = .05 , what is your conclusion? For α = .05 , test the hypothesis H 0 : µ1 − µ2 = 0 against the alternative hypothesis H a : µ1 − µ2 ≠ 0 . Since we are using α = .05 , and since we are going to conduct a twotailed test, we need to find zα / 2 . Since 1− α = .95 and .95 ÷ 2 = 0.4750 , zα / 2 = 1.96 . If we calculate a test statistic z-value less than –1.96 or greater than 1.96, we will reject the null hypothesis and conclude that the two means are not equal. 2 "x "! x–!1 #xx2 = 1 2 !/2 = .025 !/2 = .025 -1.96 0 z 1.96 reject H0 The test statistic is z = 2 " " " 1" 22 2 + !+! n1 n1 n2 n2 2 1 reject H0 (x1 − x2 ) − D0 σ 12 σ 22 + n1 n2 , where D0 = µ1 − µ2 . Statistical Inference About Means and Proportions with Two Populations Page 4 Using the sample standard deviation in place of the population standard deviation, we get z = 2.41. Since 2.41 > 1.96, we will conclude that we must reject the null hypothesis; customers at the two stores differ in terms of their ages. σ 1 and σ 2 Unknown—Now we can see how hypothesis testing differs when σ 1 and σ 2 are not known. When we compare two means, the population standard deviations are rarely known. This creates technical problems that require modification of the methods we just employed. In particular, we will employ the t-distribution much as we did in the case of inferences about a single population mean when σ was not known. In computing the degrees of freedom, we use the formula: df = n1 + n2 − 2 . We could use the formula on page 403, but it is rather complicated and doesn’t yield a value that is significantly difference than the one using the simpler formula. df = ⎛ s12 s22 ⎞ ⎜⎝ n + n ⎟⎠ 1 2 2 2 1 ⎛ s12 ⎞ 1 ⎛ s22 ⎞ + n1 − 1 ⎜⎝ n1 ⎟⎠ n2 − 1 ⎜⎝ n2 ⎟⎠ 2 Let’s do an exercise to solve a problem where σ 1 and σ 2 are unknown. Salary surveys of marketing and management majors show the following starting annual salary data: Marketing Majors Management Majors n1 = 14 n 2 = 16 x1 = $19, 800 x 2 = $19, 300 s1 = $1, 000 s 2 = $1, 400 The null hypothesis is that the mean annual salaries are the same for both majors. H 0 : µ1 − µ2 = 0 H a : µ1 − µ2 ≠ 0 Degrees of freedom = n1 + n2 − 2 = 28 . Reject H0 : if t < −2.048 or t > 2.048 (we find this value in the t-distribution table where df = 28 and area in upper tail is .025). Statistical Inference About Means and Proportions with Two Populations Page 5 t= t= ( x1 − x2 ) − D0 s12 s22 + n1 n2 (19, 800 − 19, 300 ) − ( 0 ) = 1.14 1, 000 2 1, 400 2 + 14 16 Do not reject H 0 ; there is no significant difference between the mean annual salaries of the two majors. Inferences About the Difference Between the Means of Two Populations—Matched Samples Sometimes it is either necessary or desirable to eliminate “noise” in the sampling process to assure a better test. “Noise reduction” in experimental design involves choosing samples in order to reduce the inherent variation and thereby sharpen the ability of a test to detect differences between populations. Using the method of matched samples, researchers are able to block out variations in sample data that are attributable to factors other than the process being examined. Matched sample design typically involves wanting to test two different processes or procedures. In this case, we could have two different groups of workers perform their tasks using one or the other of these processes or procedures. But, that would introduce the possibility of sampling error even if there were no difference in the procedures. To limit this noise, we can design a test where one group is used to test both procedures. Then the difference between the results for each element of the group will be averaged. If there were absolutely no difference between the two procedures, we would expect the average difference to be zero. Since it is unlikely for the sample data to result in a difference of zero, we need to be able to assess the probability that a sample could result in a particular non-zero value even though the population value was, in fact, zero. To use this matched-sample methodology, we still need to establish a null hypothesis and alternative hypothesis. But now we will let µ d = the mean of the difference values for the population and establish the hypotheses as follows: H0 : µd = 0 H a : µd ≠ 0 Statistical Inference About Means and Proportions with Two Populations Page 6 n Now instead of using x we will use d = n the difference data is sd = ∑ (d i=1 they are x and s , respectively. i −d) n −1 ∑d i=1 n i . And the sample standard deviation for 2 . Note: Calculators will treat d and sd as if We will then apply the standard techniques of statistical inference. The example on page 410 of the text uses a small sample with n = 6 , which dictates the use of the t-distribution: t= d − µd sd / n We also assume the population of differences has a normal population. Let’s once again refer to an exercise in the text to see if we can solve a matched-sample problem—question 21 on page 414. 21. A market research firm used a sample of individuals to rate the potential to purchase for a particular product before and after the individuals saw a new television commercial about the product. The potential to purchase ratings were based on a 0 to 10 scale, with higher values indicating hire potential to purchase. The null hypothesis stated that the mean rating “after” would be less than or equal to the mean rating “before.” Rejection of this hypothesis would provide the conclusion that the commercial improved the mean potential to purchase rating. Use α =.05 and the following data to test the hypothesis and comment on the value of the commercial: Individual 1 2 3 4 5 6 7 8 Purchase Rating After Seeing Commercial 6 6 7 4 3 9 7 6 Purchase Rating Before Seeing Commercial 5 4 7 3 5 8 5 6 Statistical Inference About Means and Proportions with Two Populations Difference 1 2 0 1 –2 1 2 0 Page 7 Difference = rating after – rating before. H0 : µd ≤ 0 H a : µd > 0 With n − 1 = 7 degrees of freedom, reject H 0 if t > 1.895 d = 0.63 sd = 1.3025 t= d − µd sd / n = 0.63 − 0 1.3025/ 8 = 1.37 Do not reject H 0 ; we cannot conclude that seeing the commercial improves the potential to purchase. Inferences About the Difference Between the Proportions of Two Populations When considering the difference between two population proportions, we will use the following properties: Mean : E( p1 − p2 ) = p1 − p2 Standard Deviation: σ p1 − p2 = p1 (1− p1 ) n1 + p2 (1− p2 ) n2 The sampling distribution of p1 − p2 can be approximated by a normal probability distribution if the sample sizes are large. We have large samples if: n1 p1 ; n1 (1 − p1 ) ; n 2 p2 ; and n 2 (1 − p2 ) are all greater than or equal to 5. To compute an interval estimate of the difference between the proportions of two populations (assuming a large sample case) use the following formula: p1 − p2 ± zα 2σ p1 − p2 Statistical Inference About Means and Proportions with Two Populations Page 8 where 1− α is the confidence coefficient. If σ p1 − p2 is unavailable, use the estimate of σ p1 − p2 : p1 (1− p1 ) s p1 − p2 = n1 + p2 (1− p2 ) n2 When conducting hypothesis tests using proportions, the test statistic is written as: z= ( p1 − p2 ) − ( p1 − p2 ) ⎛1 1⎞ p (1 − p ) ⎜ + ⎟ ⎝ n1 n2 ⎠ When the null hypothesis to be tested is H 0 : p1 − p2 = 0 , we are assuming that p1 = p2 , so we do not need to use the individual values for p1 − p2 in the formula for s p1 − p2 . Instead, we only need one value of p since it can be used as an estimate for both p1 and p2 . Again, we will need to pool the two sample results to get one estimate of the population proportion: p= n1 p1 + n2 p2 n1 + n2 The formula for the standard deviation of the sampling distribution then becomes: s p1 − p2 = ⎛1 1⎞ p (1 − p ) ⎜ + ⎟ ⎝ n1 n2 ⎠ Statistical Inference About Means and Proportions with Two Populations Page 9

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Statistics Notes 10 - Home Page of Vance A. Hughey