Chapter 11 Comparing Two Populations or Treatments

Section 11.1 Inferences Concerning the Difference between Two Population or Treatment Means Using Independent Samples

Suppose we have a population of adult men with a mean height of 71 inches and a standard deviation of 2.5 inches. We also have a population of adult women with a mean height of 65 inches and a standard deviation of 2.3 inches. Assume heights are normally distributed.

Suppose we take a random sample of 30 men and a random sample of 25 women from their respective populations and calculate the difference in their mean heights (men's mean height minus women's mean height). If we did this many times, what would the distribution of differences be like?

Male heights: $\mu_M = 71$, $\sigma_M = 2.5$. Suppose we took repeated samples of size n = 30 from the population of male heights and calculated the sample means. We would have the sampling distribution of $\bar{x}_M$, which is centered at 71 with standard deviation $\sigma_{\bar{x}_M} = 2.5/\sqrt{30}$.

Female heights: $\mu_F = 65$, $\sigma_F = 2.3$. Suppose we took repeated samples of size n = 25 from the population of female heights and calculated the sample means. We would have the sampling distribution of $\bar{x}_F$, which is centered at 65 with standard deviation $\sigma_{\bar{x}_F} = 2.3/\sqrt{25}$.

Randomly take one of the sample means for the males and one of the sample means for the females and find the difference in mean heights. Doing this repeatedly, we will create the sampling distribution of $(\bar{x}_M - \bar{x}_F)$, which is centered at 71 - 65 = 6 with standard deviation

$$\sigma_{\bar{x}_M - \bar{x}_F} = \sqrt{\frac{2.5^2}{30} + \frac{2.3^2}{25}}$$

Heights Continued . . .

Describe the sampling distribution of the difference in mean heights between men and women.

The sampling distribution is normally distributed with

$$\mu_{\bar{x}_M - \bar{x}_F} = 71 - 65 = 6 \qquad \sigma_{\bar{x}_M - \bar{x}_F} = \sqrt{\frac{2.5^2}{30} + \frac{2.3^2}{25}} \approx 0.648$$

What is the probability that the difference in mean heights of a random sample of 30 men and a random sample of 25 women is less than 5 inches?
$$P\left((\bar{x}_M - \bar{x}_F) < 5\right) = .0618$$

Notation - Comparing Two Means

Population or Treatment | Mean Value | Variance | Standard Deviation
1 | $\mu_1$ | $\sigma_1^2$ | $\sigma_1$
2 | $\mu_2$ | $\sigma_2^2$ | $\sigma_2$

More Notation - Comparing Two Means

Sample from Population or Treatment | Sample Size | Mean | Variance | Standard Deviation
1 | $n_1$ | $\bar{x}_1$ | $s_1^2$ | $s_1$
2 | $n_2$ | $\bar{x}_2$ | $s_2^2$ | $s_2$

Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

If the random samples on which $\bar{x}_1$ and $\bar{x}_2$ are based are selected independently of one another, then

1. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$. The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is always centered at the value of $\mu_1 - \mu_2$, so $\bar{x}_1 - \bar{x}_2$ is an unbiased statistic for estimating $\mu_1 - \mu_2$.

2. $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ and $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. The variance of the difference is the sum of the variances.

3. If $n_1$ and $n_2$ are both large or the population distributions are (at least approximately) normal, $\bar{x}_1$ and $\bar{x}_2$ each have (at least approximately) normal distributions. This implies that the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is also (approximately) normal.

The properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ imply that $\bar{x}_1 - \bar{x}_2$ can be standardized to obtain a variable with a sampling distribution that is approximately the standard normal (z) distribution.

When two random samples are independently selected and $n_1$ and $n_2$ are both large or the population distributions are (at least approximately) normal, the distribution of

$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

is described (at least approximately) by the standard normal (z) distribution.

We must know $\sigma_1$ and $\sigma_2$ in order to use this procedure. If $\sigma_1$ and $\sigma_2$ are unknown, we must use t distributions.
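The height example from the start of this section can be checked numerically. The following is a minimal Python sketch, assuming the population values stated in that example (means 71 and 65, standard deviations 2.5 and 2.3, sample sizes 30 and 25). The function name `diff_in_means_distribution` is illustrative only and does not come from the slides; the normal distribution object is the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Sampling distribution of (x̄1 - x̄2) for two independent samples.

    Assumes normal populations (or large samples), so the difference
    in sample means is (approximately) normal.
    """
    mean = mu1 - mu2                              # property 1: centered at mu1 - mu2
    sd = sqrt(sigma1**2 / n1 + sigma2**2 / n2)    # property 2: variances add
    return NormalDist(mu=mean, sigma=sd)


# Values taken from the height example (men first, women second).
heights_diff = diff_in_means_distribution(71, 2.5, 30, 65, 2.3, 25)
p_less_than_5 = heights_diff.cdf(5)

print(f"mean = {heights_diff.mean}, sd = {heights_diff.stdev:.3f}")
print(f"P(difference < 5) = {p_less_than_5:.4f}")
```

The computed probability is about 0.061. It differs slightly from the .0618 shown in the example because a table lookup rounds z to -1.54 first.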
Two-Sample t Test for Comparing Two Populations

Null Hypothesis: $H_0$: $\mu_1 - \mu_2$ = hypothesized value

The hypothesized value is often 0, but there are times when we are interested in testing for a difference that is not 0.

Test Statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

The appropriate df for the two-sample t test is

$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$

The computed number of df should be truncated to an integer. A conservative estimate of the P-value can be found by using the t curve with the number of degrees of freedom equal to the smaller of $(n_1 - 1)$ or $(n_2 - 1)$.

Two-Sample t Test for Comparing Two Populations Continued . . .

Null Hypothesis: $H_0$: $\mu_1 - \mu_2$ = hypothesized value

Alternative Hypothesis and P-value:

$H_a$: $\mu_1 - \mu_2$ > hypothesized value. P-value: area under the appropriate t curve to the right of the computed t.

$H_a$: $\mu_1 - \mu_2$ < hypothesized value. P-value: area under the appropriate t curve to the left of the computed t.

$H_a$: $\mu_1 - \mu_2 \neq$ hypothesized value. P-value: 2(area to the right of the computed t) if t is positive, or 2(area to the left of the computed t) if t is negative.

Another Way to Write Hypothesis Statements:

When the hypothesized value is 0, we can rewrite these hypothesis statements:

$H_0$: $\mu_1 = \mu_2$ (in place of $\mu_1 - \mu_2 = 0$)
$H_a$: $\mu_1 < \mu_2$ (in place of $\mu_1 - \mu_2 < 0$)
$H_a$: $\mu_1 > \mu_2$ (in place of $\mu_1 - \mu_2 > 0$)
$H_a$: $\mu_1 \neq \mu_2$ (in place of $\mu_1 - \mu_2 \neq 0$)

Be sure to define BOTH $\mu_1$ and $\mu_2$!

Two-Sample t Test for Comparing Two Populations Continued . . .

Assumptions:
1) The two samples are independently selected random samples from the populations of interest.
2) The sample sizes are large (generally 30 or larger) or the population distributions are (at least approximately) normal.

When comparing two treatment groups, use the following assumptions:
1) Individuals or objects are randomly assigned to treatments (or vice versa).
2) The sample sizes are large (generally 30 or larger) or the treatment response distributions are approximately normal.

Are women still paid less than men for comparable work?
A study was carried out in which salary data were collected from a random sample of men and from a random sample of women who worked as purchasing managers and who were subscribers to Purchasing magazine. Annual salaries (in thousands of dollars) appear below (the actual sample sizes were much larger). Use $\alpha = .05$ to determine if there is convincing evidence that the mean annual salary for male purchasing managers is greater than the mean annual salary for female purchasing managers.

Men: 81, 69, 81, 76, 76, 74, 69, 76, 79, 65
Women: 78, 60, 67, 61, 62, 73, 71, 58, 68, 48

State the hypotheses:

$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 > 0$

where $\mu_1$ = mean annual salary for male purchasing managers and $\mu_2$ = mean annual salary for female purchasing managers.

If we had defined $\mu_1$ as the mean salary for female purchasing managers and $\mu_2$ as the mean salary for male purchasing managers, then the correct alternative hypothesis would be that the difference in the means is less than 0.

Salary War Continued . . .

Verify the assumptions:

1) We are given two independently selected random samples of male and female purchasing managers. Even though the samples are from subscribers of Purchasing magazine, the authors of the study believed it was reasonable to view the samples as representative of the populations of interest.

2) Since the sample sizes are small, we must determine if it is plausible that the salary distributions for each of the two populations are approximately normal. Since the boxplots of the two samples are reasonably symmetrical with no outliers, it is plausible that the population distributions are approximately normal.

[Figure: side-by-side boxplots of the men's and women's salaries]

Salary War Continued . . .
Compute the test statistic and P-value. The samples give $\bar{x}_1 = 74.6$ and $s_1 = 5.4$ for the men and $\bar{x}_2 = 64.6$ and $s_2 = 8.62$ for the women, with $n_1 = n_2 = 10$.

Test Statistic:

$$t = \frac{74.6 - 64.6 - 0}{\sqrt{\frac{5.4^2}{10} + \frac{8.62^2}{10}}} = 3.11$$

To find the P-value, first find the appropriate df. With $V_1 = 5.4^2/10 = 2.916$ and $V_2 = 8.62^2/10 = 7.430$,

$$df = \frac{(2.916 + 7.430)^2}{\frac{2.916^2}{9} + \frac{7.430^2}{9}} \approx 15.12$$

Truncate (round down) this value, so df = 15.

Now find the area to the right of t = 3.11 in the t curve with df = 15:

P-value = .004, $\alpha = .05$

Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence that the mean salary for male purchasing managers is higher than the mean salary for female purchasing managers.

What potential type of error could we have made with this conclusion? Type I.

Mean Fill Example

We would like to compare the mean fill of 32 ounce cans of beer from two adjacent filling machines. Past experience has shown that the population standard deviations of fills for the two machines are $\sigma_1 = 0.043$ and $\sigma_2 = 0.052$, respectively. A sample of 35 cans from machine 1 gave a mean of 16.031 and a sample of 31 cans from machine 2 gave a mean of 16.009. State, perform, and interpret an appropriate hypothesis test using the 0.05 level of significance.

Mean Fill Example Continued

$\mu_1$ = mean fill from machine 1
$\mu_2$ = mean fill from machine 2

$H_0$: $\mu_1 - \mu_2 = 0$
$H_a$: $\mu_1 - \mu_2 \neq 0$

Significance level: $\alpha = 0.05$

Test statistic:

$$z = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Mean Fill Example Continued

Since $n_1$ and $n_2$ are both large (> 30), we do not have to make any assumptions about the nature of the distributions of the fills.

This example is a bit of a stretch, since knowing the population standard deviations (without knowing the population means) is very unusual. Accept this example for what it is, just a sample of the calculation.
Generally this statistic is used when dealing with "what if" types of scenarios, and we will move on to another technique that is somewhat more commonly used when $\sigma_1$ and $\sigma_2$ are not known.

Mean Fill Example Continued

Calculation:

$$z = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{16.031 - 16.009}{\sqrt{\frac{0.043^2}{35} + \frac{0.052^2}{31}}} = 1.86$$

P-value:

P-value = 2P(z > 1.86) = 2P(z < -1.86) = 2(0.0314) = 0.0628

Mean Fill Example Continued

Since the P-value > $\alpha$, we fail to reject $H_0$. There is not convincing evidence that the two machines produce cans with different mean fills.

Cold Medicine Example

In an attempt to determine if two competing brands of cold medicine contain, on the average, the same amount of acetaminophen, twelve different tablets from each of the two competing brands were randomly selected and tested for the amount of acetaminophen each contains. The results (in milligrams) follow. Use a significance level of 0.01.

Brand A: 517, 495, 503, 491, 503, 493, 505, 495, 498, 481, 499, 494
Brand B: 493, 508, 513, 521, 541, 533, 500, 515, 536, 498, 515, 515

State and perform an appropriate hypothesis test.

Cold Medicine Example Continued

$\mu_1$ = the mean amount of acetaminophen in cold tablet brand A
$\mu_2$ = the mean amount of acetaminophen in cold tablet brand B

$H_0$: $\mu_1 = \mu_2$ ($\mu_1 - \mu_2 = 0$)
$H_a$: $\mu_1 \neq \mu_2$ ($\mu_1 - \mu_2 \neq 0$)

Significance level: $\alpha = 0.01$

Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Cold Medicine Example Continued

Assumptions: The samples were selected independently and randomly. Since the samples are not large, we need to be able to assume that the populations (of amounts of acetaminophen) are both normally distributed.

Cold Medicine Example Continued

Assumptions (continued): As we can see from the normality plots and the boxplots, the assumption that the underlying distributions are normally distributed appears to be quite reasonable.
Cold Medicine Example Continued

Calculation:

$n_1 = 12$, $\bar{x}_1 = 497.83$, $s_1 = 8.830$
$n_2 = 12$, $\bar{x}_2 = 515.67$, $s_2 = 15.144$

$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{497.83 - 515.67 - 0}{\sqrt{\frac{8.830^2}{12} + \frac{15.144^2}{12}}} = -3.52$$

Cold Medicine Example Continued

Calculation:

$$V_1 = \frac{s_1^2}{n_1} = \frac{8.830^2}{12} = 6.4974 \qquad V_2 = \frac{s_2^2}{n_2} = \frac{15.144^2}{12} = 19.112$$

$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} = \frac{(6.4974 + 19.112)^2}{\frac{6.4974^2}{11} + \frac{19.112^2}{11}} = 17.7$$

We truncate the degrees of freedom to give df = 17.

Cold Medicine Example Continued

P-value: From the table of tail areas for t curves (Table IV), we look up a t value of 3.5 with df = 17 to get 0.001. Since this is a two-tailed alternative hypothesis, P-value = 2(0.001) = 0.002.

Conclusion: Since P-value = 0.002 < 0.01 = $\alpha$, $H_0$ is rejected. The data provide strong evidence that the mean amount of acetaminophen is not the same for both brands. Specifically, there is strong evidence that the average amount per tablet for brand A is less than that for brand B.

The Two-Sample t Confidence Interval for the Difference Between Two Population or Treatment Means

The general formula for a confidence interval for $\mu_1 - \mu_2$ when
1) the two samples are independently selected random samples from the populations of interest, and
2) the sample sizes are large (generally 30 or larger) or the population distributions are (at least approximately) normal

is

$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The t critical value is based on

$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$

df should be truncated to an integer.

For a comparison of two treatments, use the following assumptions:
1) Individuals or objects are randomly assigned to treatments (or vice versa).
2) The sample sizes are large (generally 30 or larger) or the treatment response distributions are approximately normal.

In a study on food intake after sleep deprivation, men were randomly assigned to one of two treatment groups.
The experimental group was required to sleep only 4 hours on each of two nights, while the control group was required to sleep 8 hours on each of two nights. The amount of food intake (Kcal) on the day following the two nights of sleep was measured. Compute a 95% confidence interval for the true difference in the mean food intake for the two sleeping conditions.

4-hour sleep: 3585, 4470, 3068, 5338, 2221, 4791, 4435, 3187, 3901, 3868, 3869, 4878, 3632, 4518, 3099
8-hour sleep: 4965, 3918, 1987, 4993, 5220, 3653, 3510, 4100, 5792, 4547, 3319, 3336, 4304, 4057, 3338

Find the mean and standard deviation for each treatment:

$\bar{x}_4 = 3924$, $s_4 = 829.67$
$\bar{x}_8 = 4069.27$, $s_8 = 952.90$

Food Intake Study Continued . . .

Verify the assumptions:
1) Men were randomly assigned to two treatment groups.
2) The assumption of normal response distributions is plausible because both boxplots are approximately symmetrical with no outliers.

[Figure: boxplots of food intake for the 4-hour and 8-hour sleep groups]

Food Intake Study Continued . . .

Calculate the interval (the t critical value 2.05 is based on df = 27):

$$(3924 - 4069.27) \pm 2.05 \sqrt{\frac{829.67^2}{15} + \frac{952.90^2}{15}} = (-814.0, 523.5)$$

Interpret the interval in context: We are 95% confident that the true difference in the mean food intake for the two sleeping conditions is between -814.0 Kcal and 523.5 Kcal.

Based upon this interval, is there a significant difference in the mean food intake for the two sleeping conditions? No. Since 0 is in the confidence interval, there is not convincing evidence that the mean food intake differs between the two sleep conditions.

Thread Example

Two kinds of thread are being compared for strength.
Fifty pieces of each type of thread are tested under similar conditions. The sample data are given in the following table. Construct a 98% confidence interval for the difference of the population means.

Thread | Sample Size | Sample Mean | Sample Standard Deviation
A | 50 | 78.3 | 5.62
B | 50 | 87.2 | 6.31

Thread Example Continued

$$V_1 = \frac{s_1^2}{n_1} = \frac{5.62^2}{50} = 0.632 \qquad V_2 = \frac{s_2^2}{n_2} = \frac{6.31^2}{50} = 0.796$$

$$df = \frac{(0.632 + 0.796)^2}{\frac{0.632^2}{49} + \frac{0.796^2}{49}} = 96.7$$

Truncating, we have df = 96.

Thread Example Continued

Looking at the table of t critical values (Table III) under the 98% confidence level for df = 96 (we take the closest value for df, specifically df = 120), we have t critical value = 2.36.

$$(78.3 - 87.2) \pm 2.36 \sqrt{\frac{5.62^2}{50} + \frac{6.31^2}{50}} = -8.9 \pm 2.82$$

The 98% confidence interval estimate for the difference of the means of the tensile strengths is (-11.72, -6.08).

Octane Example

A student recorded the mileage he obtained while commuting to school in his car. He kept track of the mileage for twelve different tankfuls of fuel, involving gasoline of two different octane ratings. Compute the 95% confidence interval for the difference of mean mileages. His data follow:

87 Octane: 26.4, 27.6, 29.7, 28.9, 29.3, 28.8
90 Octane: 30.5, 30.9, 29.2, 31.7, 32.8, 29.3

Octane Example Continued

Let the 87 octane fuel be the first group and the 90 octane fuel the second group, so we have $n_1 = n_2 = 6$ and

$\bar{x}_1 = 28.45$, $s_1 = 1.228$, $\bar{x}_2 = 30.73$, $s_2 = 1.392$

$$V_1 = \frac{s_1^2}{n_1} = \frac{1.228^2}{6} = 0.2512 \qquad V_2 = \frac{s_2^2}{n_2} = \frac{1.392^2}{6} = 0.3231$$

$$df = \frac{(0.2512 + 0.3231)^2}{\frac{0.2512^2}{5} + \frac{0.3231^2}{5}} = 9.8$$

Truncating, we have df = 9.

Octane Example Continued

Looking at the table under 95% with 9 degrees of freedom, the critical value of t is 2.26.

$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (28.45 - 30.73) \pm 2.26 \sqrt{\frac{1.228^2}{6} + \frac{1.392^2}{6}} = -2.28 \pm 1.71$$

The 95% confidence interval for the true difference of the mean mileages is (-3.99, -0.57).
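The interval calculations above follow the same steps each time: compute $V_1$ and $V_2$, find and truncate df, look up the t critical value, and form the interval. The following Python sketch walks through those steps for the octane data. It is illustrative only: the function names `welch_df` and `two_sample_t_interval` are made up for this sketch, and the t critical value is passed in from a table (2.26 for df = 9, as in the example) because the Python standard library has no t distribution.

```python
from math import floor, sqrt
from statistics import mean, stdev


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t procedures (before truncation)."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def two_sample_t_interval(sample1, sample2, t_critical):
    """Two-sample t confidence interval for mu1 - mu2.

    t_critical must be looked up by the caller for the truncated df
    and the desired confidence level.
    """
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)   # sample standard deviations
    diff = mean(sample1) - mean(sample2)
    margin = t_critical * sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin


octane_87 = [26.4, 27.6, 29.7, 28.9, 29.3, 28.8]
octane_90 = [30.5, 30.9, 29.2, 31.7, 32.8, 29.3]

df = floor(welch_df(stdev(octane_87), 6, stdev(octane_90), 6))
lower, upper = two_sample_t_interval(octane_87, octane_90, t_critical=2.26)

print(f"df = {df}")
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
```

Because the sketch carries unrounded means and standard deviations through the calculation, the endpoints can differ from the hand-computed (-3.99, -0.57) in the last digit.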
Octane Example Continued

Comments: We had to assume that the samples were independent and random and that the underlying populations were normally distributed, since the sample sizes were small. If we randomized the order of the tankfuls of the two different types of gasoline, we can reasonably assume that the samples were random and independent. By using all of the observations from one car, we are simply controlling the effects of other variables such as year, model, weight, etc.

Octane Example Continued

By looking at the following normality plots, we see that the assumption of normality for each of the two populations of mileages appears reasonable. Given the small sample sizes, the assumption of normality is very important, so one should be a bit careful in using this result.

[Figure: normal probability plots of the 87 octane and 90 octane mileages]

Pooled t Test

• Used when the variances of the two populations are equal ($\sigma_1 = \sigma_2$)
• Combines information from both samples to create a "pooled" estimate of the common variance, which is used in place of the two sample standard deviations
• Is not widely used due to its sensitivity to any departure from the equal variance assumption

P-values computed using the pooled t procedure can be far from the actual P-value if the population variances are not equal. When the population variances are equal, the pooled t procedure is better at detecting deviations from $H_0$ than the two-sample t test.

Section 11.2 Difference in Means with Paired Samples

Suppose that an investigator wants to determine if regular aerobic exercise improves blood pressure. A random sample of people who jog regularly and a second random sample of people who do not exercise regularly are selected independently of one another. Can we conclude that the difference in mean blood pressure is attributable to jogging? What about other factors like weight?

One way to avoid these difficulties would be to pair subjects by weight, then assign one member of each pair to jogging and the other to no exercise.
Summary of the Paired t Test for Comparing Two Population or Treatment Means

Null Hypothesis: $H_0$: $\mu_d$ = hypothesized value

where $\mu_d$ is the mean of the differences in the paired observations. The hypothesized value is usually 0, meaning that there is no difference.

Test Statistic:

$$t = \frac{\bar{x}_d - \text{hypothesized value}}{\frac{s_d}{\sqrt{n}}}$$

where n is the number of sample differences and $\bar{x}_d$ and $s_d$ are the mean and standard deviation of the sample differences. This test is based on df = n - 1.

Alternative Hypothesis and P-value:

$H_a$: $\mu_d$ > hypothesized value. P-value: area to the right of the calculated t.

$H_a$: $\mu_d$ < hypothesized value. P-value: area to the left of the calculated t.

$H_a$: $\mu_d \neq$ hypothesized value. P-value: 2(area to the right of t) if t is positive, or 2(area to the left of t) if t is negative.

Summary of the Paired t Test for Comparing Two Population or Treatment Means Continued

Assumptions:
1. The samples are paired.
2. The n sample differences can be viewed as a random sample from a population of differences.
3. The number of sample differences is large (generally 30 or more) or the population distribution of differences is (at least approximately) normal.

Is this an example of paired samples?

An engineering association wants to see if there is a difference in the mean annual salary for electrical engineers and chemical engineers. A random sample of electrical engineers is surveyed about their annual income. Another random sample of chemical engineers is surveyed about their annual income.

No, there is no pairing of individuals; you have two independent samples.

Is this an example of paired samples?

A pharmaceutical company wants to test its new weight-loss drug. Before giving the drug to volunteers, company researchers weigh each person. After a month of using the drug, each person's weight is measured again.

Yes, you have two observations on each individual, resulting in paired data.

Can playing chess improve your memory?
In a study, students who had not previously played chess participated in a program in which they took chess lessons and played chess daily for 9 months. Each student took a memory test before starting the chess program and again at the end of the 9-month period. Test the claim at the 0.05 level of significance.

First, find the differences: pre-test minus post-test.

Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Pre-test | 510 | 610 | 640 | 675 | 600 | 550 | 610 | 625 | 450 | 720 | 575 | 675
Post-test | 850 | 790 | 850 | 775 | 700 | 775 | 700 | 850 | 690 | 775 | 540 | 680
Difference | -340 | -180 | -210 | -100 | -100 | -225 | -90 | -225 | -240 | -55 | 35 | -5

State the hypotheses:

$H_0$: $\mu_d = 0$
$H_a$: $\mu_d < 0$

where $\mu_d$ is the mean memory score difference between students with no chess training and students who have completed chess training.

If we had subtracted post-test minus pre-test, then the alternative hypothesis would be that the mean difference is greater than 0.

Playing Chess Continued . . .

Verify the assumptions:

1) Although the sample of students is not a random sample, the investigator believed that it was reasonable to view the 12 sample differences as representative of all such differences.

2) A boxplot of the differences is approximately symmetrical with no outliers, so the assumption of normality is plausible.

Playing Chess Continued . . .
Compute the test statistic and P-value.

Test Statistic:

$$t = \frac{-144.6 - 0}{\frac{109.74}{\sqrt{12}}} = -4.56$$

P-value ≈ 0, df = 11, $\alpha = .05$

State the conclusion in context: Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence to suggest that the mean memory score after chess training is higher than the mean memory score before training.

Paired t Confidence Interval for $\mu_d$

When
1. The samples are paired.
2. The n sample differences can be viewed as a random sample from a population of differences.
3. The number of sample differences is large (generally 30 or more) or the population distribution of differences is (at least approximately) normal.

the paired t interval for $\mu_d$ is

$$\bar{x}_d \pm (t \text{ critical value}) \frac{s_d}{\sqrt{n}}$$

where df = n - 1.

Playing Chess Revisited . . .

Compute a 90% confidence interval for the mean difference in memory scores before chess training and the memory scores after chess training.

$$-144.6 \pm 1.796 \left( \frac{109.74}{\sqrt{12}} \right) = (-201.5, -87.69)$$

We are 90% confident that the true mean difference in memory scores before chess training and the memory scores after chess training is between -201.5 and -87.69.

Weight Loss Example

A weight reduction center advertises that participants in its program lose an average of at least 5 pounds during the first week of participation. Because of numerous complaints, the state's consumer protection agency doubts this claim.
To test the claim at the 0.05 level of significance, 12 participants were randomly selected. Their initial weights and their weights after 1 week in the program appear below. Set up and perform an appropriate hypothesis test.

Weight Loss Example Continued

Member | Initial Weight | One Week Weight | Difference (Initial - One Week)
1 | 195 | 195 | 0
2 | 153 | 151 | 2
3 | 174 | 170 | 4
4 | 125 | 123 | 2
5 | 149 | 144 | 5
6 | 152 | 149 | 3
7 | 135 | 131 | 4
8 | 143 | 147 | -4
9 | 139 | 138 | 1
10 | 198 | 192 | 6
11 | 215 | 211 | 4
12 | 153 | 152 | 1

Weight Loss Example Continued

$\mu_d$ = mean of the individual weight changes (initial weight minus weight after one week)

This is equivalent to the difference of means: $\mu_d = \mu_1 - \mu_2 = \mu_{\text{initial weight}} - \mu_{\text{1 week weight}}$

$H_0$: $\mu_d = 5$
$H_a$: $\mu_d < 5$

Significance level: $\alpha = 0.05$

Test statistic:

$$t = \frac{\bar{x}_d - \text{hypothesized value}}{\frac{s_d}{\sqrt{n}}} = \frac{\bar{x}_d - 5}{\frac{s_d}{\sqrt{n}}}$$

Weight Loss Example Continued

Assumptions: According to the statement of the example, we can assume that the sampling is random. The sample size (12) is small. The boxplot shows one outlier, but the distribution is nevertheless reasonably symmetric, and the normal plot confirms that it is reasonable to assume that the population of differences (weight losses) is normally distributed.

Weight Loss Example Continued

Calculations:

$n = 12$, $\bar{x}_d = 2.333$, $s_d = 2.674$

$$t = \frac{\bar{x}_d - 5}{\frac{s_d}{\sqrt{n}}} = \frac{2.333 - 5}{\frac{2.674}{\sqrt{12}}} = -3.45$$

P-value: This is a lower tail test, so looking up the t value of 3.5 under df = 11 in the table of tail areas for t curves (Table IV), we find that the P-value = 0.002.

Weight Loss Example Continued

Conclusions: Since P-value = 0.002 < 0.05 = $\alpha$, we reject $H_0$. We draw the following conclusion.
There is strong evidence that the mean weight loss for those who took the program for one week is less than 5 pounds.

Weight Loss Example Continued

Minitab returns the following when asked to perform this test. This is substantially the same result.

Paired T-Test and CI: Initial, One week

Paired T for Initial - One week

 | N | Mean | StDev | SE Mean
Initial | 12 | 160.92 | 28.19 | 8.14
One week | 12 | 158.58 | 27.49 | 7.93
Difference | 12 | 2.333 | 2.674 | 0.772

95% upper bound for mean difference: 3.720
T-Test of mean difference = 5 (vs < 5): T-Value = -3.45 P-Value = 0.003

Section 11.3 Large-Sample Inferences Concerning the Difference Between Two Population or Treatment Proportions

Some people seem to think that duct tape can fix anything . . . even remove warts!

Investigators at Madigan Army Medical Center tested using duct tape to remove warts versus the more traditional freezing treatment. Suppose that the duct tape treatment will successfully remove 50% of warts and that the traditional freezing treatment will successfully remove 60% of warts. Let's investigate the sampling distribution of $\hat{p}_{freeze} - \hat{p}_{tape}$.

$p_{freeze}$ = the true proportion of warts that are successfully removed by freezing
$p_{tape}$ = the true proportion of warts that are successfully removed by using duct tape

Freezing: $p_{freeze} = .6$. Suppose we repeatedly treated 100 warts using the traditional freezing treatment and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{freeze}$, which is centered at .6 with standard deviation $\sigma_{\hat{p}_{freeze}} = \sqrt{\frac{.6(.4)}{100}}$.

Duct tape: $p_{tape} = .5$. Suppose we repeatedly treated 100 warts using the duct tape method and calculated the sample proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{tape}$, which is centered at .5 with standard deviation $\sigma_{\hat{p}_{tape}} = \sqrt{\frac{.5(.5)}{100}}$.

Randomly take one of the sample proportions for the freezing treatment and one of the proportions for the duct tape treatment and find the difference.
Doing this repeatedly, we will create the sampling distribution of $(\hat{p}_{freeze} - \hat{p}_{tape})$, which is centered at .6 - .5 = .1 with standard deviation

$$\sigma_{\hat{p}_{freeze} - \hat{p}_{tape}} = \sqrt{\frac{.6(.4)}{100} + \frac{.5(.5)}{100}}$$

Properties of the Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

If two random samples are selected independently of one another, the following properties hold:

1. $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$. This says that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is centered at $p_1 - p_2$, so $\hat{p}_1 - \hat{p}_2$ is an unbiased statistic for estimating $p_1 - p_2$.

2. $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$ and $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

3. If both $n_1$ and $n_2$ are large (that is, if $n_1 p_1 > 10$, $n_1(1 - p_1) > 10$, $n_2 p_2 > 10$, and $n_2(1 - p_2) > 10$), then $\hat{p}_1$ and $\hat{p}_2$ each have a sampling distribution that is approximately normal, and their difference $\hat{p}_1 - \hat{p}_2$ also has a sampling distribution that is approximately normal.

When performing a hypothesis test, we will use the null hypothesis that $p_1$ and $p_2$ are equal. We will not know the common value for $p_1$ and $p_2$. Since the values of $p_1$ and $p_2$ are unknown, we will combine $\hat{p}_1$ and $\hat{p}_2$ to estimate the common value of $p_1$ and $p_2$. Use:

$$\hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

Summary of Large-Sample z Test for $p_1 - p_2 = 0$

Null Hypothesis: $H_0$: $p_1 - p_2 = 0$

Test Statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} \quad \text{where } \hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} \text{ and } p_1 - p_2 = 0 \text{ under } H_0$$

Alternative Hypothesis and P-value:

$H_a$: $p_1 - p_2 > 0$. P-value: area to the right of the calculated z.

$H_a$: $p_1 - p_2 < 0$. P-value: area to the left of the calculated z.

$H_a$: $p_1 - p_2 \neq 0$. P-value: 2(area to the right of z) if z is positive, or 2(area to the left of z) if z is negative.

Another Way to Write Hypothesis Statements:

$H_0$: $p_1 = p_2$ (in place of $p_1 - p_2 = 0$)
$H_a$: $p_1 > p_2$ (in place of $p_1 - p_2 > 0$)
$H_a$: $p_1 < p_2$ (in place of $p_1 - p_2 < 0$)
$H_a$: $p_1 \neq p_2$ (in place of $p_1 - p_2 \neq 0$)

Be sure to define both $p_1$ and $p_2$!

Summary of Large-Sample z Test for $p_1 - p_2 = 0$ Continued

Assumptions (since $p_1$ and $p_2$ are unknown, we must use $\hat{p}_1$ and $\hat{p}_2$ to verify that the samples are large enough):

1) The samples are independently chosen random samples or treatments were assigned at random to individuals or objects.
2) Both sample sizes are large: $n_1 \hat{p}_1 > 10$, $n_1(1 - \hat{p}_1) > 10$, $n_2 \hat{p}_2 > 10$, $n_2(1 - \hat{p}_2) > 10$

Investigators at Madigan Army Medical Center tested using duct tape to remove warts. Patients with warts were randomly assigned to either the duct tape treatment or to the more traditional freezing treatment. Those in the duct tape group wore duct tape over the wart for 6 days, then removed the tape, soaked the area in water, and used an emery board to scrape the area. This process was repeated for a maximum of 2 months or until the wart was gone. The data follow:

Treatment | n | Number with wart successfully removed
Liquid nitrogen freezing | 100 | 60
Duct tape | 104 | 88

Do these data suggest that freezing is less successful than duct tape in removing warts?

Duct Tape Continued . . .

$H_0$: $p_1 - p_2 = 0$
$H_a$: $p_1 - p_2 < 0$

where $p_1$ is the true proportion of warts that would be successfully removed by freezing and $p_2$ is the true proportion of warts that would be successfully removed by duct tape.

Assumptions:
1) Subjects were randomly assigned to the two treatments.
2) The sample sizes are large enough because, with $\hat{p}_1 = 60/100 = .6$ and $\hat{p}_2 = 88/104 \approx .85$:
$n_1 \hat{p}_1 = 100(.6) = 60 > 10$
$n_1(1 - \hat{p}_1) = 100(.4) = 40 > 10$
$n_2 \hat{p}_2 = 104(88/104) = 88 > 10$
$n_2(1 - \hat{p}_2) = 104(16/104) = 16 > 10$

Duct Tape Continued . . .

$$\hat{p}_c = \frac{60 + 88}{100 + 104} = 0.73$$

$$z = \frac{.6 - .85 - 0}{\sqrt{\frac{.73(.27)}{100} + \frac{.73(.27)}{104}}} = -4.02$$

P-value ≈ 0, $\alpha = .01$

Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence to suggest the proportion of warts successfully removed is lower for freezing than for the duct tape treatment.

Student Retention Example

A group of college students were asked what they thought was the "issue of the day". Without a pause, the class almost to a person said "student retention".
The class then went out and obtained a random sample (questionable) and asked the question, "Do you plan on returning next year?" The responses, along with the gender of the person responding, are summarized in the following table.

Gender | Yes | No | Maybe
Male | 211 | 45 | 19
Female | 141 | 32 | 9

Test to see if the proportion of students planning on returning is the same for both genders at the 0.05 level of significance.

Student Retention Example Continued

$p_1$ = true proportion of males who plan on returning
$p_2$ = true proportion of females who plan on returning
$n_1$ = number of males surveyed
$n_2$ = number of females surveyed
$\hat{p}_1$ = sample proportion of males who plan on returning
$\hat{p}_2$ = sample proportion of females who plan on returning

Null hypothesis: $H_0$: $p_1 - p_2 = 0$
Alternative hypothesis: $H_a$: $p_1 - p_2 \neq 0$

Student Retention Example Continued

Significance level: $\alpha = 0.05$

Test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$$

Assumptions: The two samples are independently chosen random samples. Furthermore, the sample sizes are large enough since
$n_1 \hat{p}_1 = 211 > 10$, $n_1(1 - \hat{p}_1) = 64 > 10$
$n_2 \hat{p}_2 = 141 > 10$, $n_2(1 - \hat{p}_2) = 41 > 10$

Student Retention Example Continued

Calculations:

$$\hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{211 + 141}{275 + 182} = \frac{352}{457} = 0.7702$$

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{275} + \frac{\hat{p}_c(1 - \hat{p}_c)}{182}}} = \frac{0.76727 - 0.77473}{\sqrt{\frac{0.77024(1 - 0.77024)}{275} + \frac{0.77024(1 - 0.77024)}{182}}} = \frac{-0.0074525}{0.040198} = -0.19$$

Student Retention Example Continued

P-value: The P-value for this test is 2 times the area under the z curve to the left of the computed z = -0.19.

P-value = 2(0.4247) = 0.8494

Conclusion: Since P-value = 0.849 > 0.05 = $\alpha$, the hypothesis $H_0$ is not rejected at significance level 0.05. There is no convincing evidence that the return rate is different for males and females.

Washing Machine Example

A consumer agency spokesman stated that he thought that the proportion of households having a washing machine was higher for suburban households than for urban households.
To test whether that statement was correct at the 0.05 level of significance, a reporter randomly selected a number of households in both suburban and urban environments and obtained the following data.

            Number      Number having       Proportion having
            surveyed    washing machines    washing machines
Suburban    300         243                 0.810
Urban       250         181                 0.724

Washing Machine Example Continued

p1 = proportion of suburban households having washing machines
p2 = proportion of urban households having washing machines
p1 - p2 is the difference between the proportions of suburban households and urban households that have washing machines.

H0: p1 - p2 = 0
Ha: p1 - p2 > 0

Washing Machine Example Continued

Significance level: α = 0.05

Test statistic:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}$$

Assumptions: The two samples are independently chosen random samples. Furthermore, the sample sizes are large enough since
n1p̂1 = 243 ≥ 10, n1(1 - p̂1) = 57 ≥ 10
n2p̂2 = 181 ≥ 10, n2(1 - p̂2) = 69 ≥ 10

Washing Machine Example Continued

Calculations:

$$\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{243 + 181}{300 + 250} = \frac{424}{550} = 0.7709$$

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}} = \frac{0.810 - 0.724}{\sqrt{0.7709(1 - 0.7709)\left(\frac{1}{300} + \frac{1}{250}\right)}} = 2.390$$

Washing Machine Example Continued

P-value: The P-value for this test is the area under the z curve to the right of the computed z = 2.39.
P-value = 1 - 0.9916 = 0.0084

Conclusion: Since P-value = 0.0084 < 0.05 = α, the hypothesis H0 is rejected at significance level 0.05. There is sufficient evidence at the 0.05 level of significance that the proportion of suburban households that have washers is greater than the proportion of urban households that have washers.
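The pooled z statistic and P-value used in the hypothesis tests above can be checked numerically. The following is a minimal Python sketch and is not part of the original slides; the names `normal_cdf` and `two_proportion_z_test` and the `alternative` argument are illustrative choices made here. It assumes the large-sample conditions have already been verified by hand.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z test of H0: p1 - p2 = 0 using the pooled proportion.

    x1, x2 are success counts; n1, n2 are sample sizes.
    alternative is "less", "greater", or "two-sided" (hypothetical labels).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # combined (pooled) estimate
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = (p1_hat - p2_hat) / se
    if alternative == "less":
        p_value = normal_cdf(z)          # area to the left of z
    elif alternative == "greater":
        p_value = 1.0 - normal_cdf(z)    # area to the right of z
    else:
        p_value = 2.0 * normal_cdf(-abs(z))  # both tails
    return z, p_value


# Washing machine example: Ha: p1 - p2 > 0
z_wash, p_wash = two_proportion_z_test(243, 300, 181, 250, alternative="greater")

# Student retention example: Ha: p1 - p2 != 0
z_ret, p_ret = two_proportion_z_test(211, 275, 141, 182, alternative="two-sided")

print(round(z_wash, 2), round(p_wash, 4))
print(round(z_ret, 2), round(p_ret, 2))
```

The washing machine result agrees with the hand calculation (z of about 2.39, P-value of about 0.0084). For the retention example the computed P-value is close to 0.85; it differs slightly from the table-based 0.8494 because the hand calculation rounds z to -0.19 before looking up the area.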
A Large-Sample Confidence Interval for p1 - p2

When
1) The samples are independently chosen random samples or treatments were assigned at random to individuals or objects
2) Both sample sizes are large: n1p̂1 > 10, n1(1 - p̂1) > 10, n2p̂2 > 10, n2(1 - p̂2) > 10

a large-sample confidence interval for p1 - p2 is calculated by:

$$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value})\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

The article “Freedom of What?” (Associated Press, February 1, 2005) described a study in which high school students and high school teachers were asked whether they agreed with the following statement: “Students should be allowed to report controversial issues in their student newspapers without the approval of school authorities.” It was reported that 58% of students surveyed and 39% of teachers surveyed agreed with the statement. The two samples (10,000 high school students and 8000 high school teachers) were selected from schools across the country. Compute a 90% confidence interval for the difference between the proportion of students who agreed with the statement and the proportion of teachers who agreed with the statement.

Newspaper Problem Continued . . .

p̂1 = .58
p̂2 = .39

1) Assume that it is reasonable to regard these two samples as being independently selected and representative of the populations of interest.
2) Both sample sizes are large enough:
n1p̂1 = 10000(.58) > 10, n1(1 - p̂1) = 10000(.42) > 10,
n2p̂2 = 8000(.39) > 10, n2(1 - p̂2) = 8000(.61) > 10

$$(.58 - .39) \pm 1.645\sqrt{\frac{.58(.42)}{10000} + \frac{.39(.61)}{8000}} \quad\Rightarrow\quad (.178, .202)$$

We are 90% confident that the difference between the proportion of students who agreed with the statement and the proportion of teachers who agreed with the statement is between .178 and .202.

Based on this confidence interval, does there appear to be a significant difference between the proportion of students who agreed with the statement and the proportion of teachers who agreed with the statement? Explain.
Survey Example

A student assignment called for the students to survey both male and female students (independently and randomly chosen) to compare the proportions that approve of the College’s new drug and alcohol policy. A student randomly selected 200 male students and 100 female students and obtained the data summarized below.

          Number      Number that    Proportion
          surveyed    approve        that approve
Female    100         43             0.430
Male      200         61             0.305

Use these data to obtain a 90% confidence interval estimate for the difference between the proportions of female and male students that approve of the new policy.

Survey Example Continued

For a 90% confidence interval, the z value to use is 1.645. This value is obtained from the bottom row of the table of t critical values (Table III). We take p̂1 to be the females’ sample approval proportion and p̂2 to be the males’ sample approval proportion.

$$(0.430 - 0.305) \pm 1.645\sqrt{\frac{0.430(1 - 0.430)}{100} + \frac{0.305(1 - 0.305)}{200}} = 0.125 \pm 0.097 \quad\text{or}\quad (0.028, 0.222)$$

We are 90% confident that the proportion of females that approve of the policy exceeds the proportion of males that approve of the policy by somewhere between 0.028 and 0.222.
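The two confidence intervals above (the newspaper problem and the survey example) can be reproduced with a few lines of Python. This is a minimal sketch and is not part of the original slides; the function name `two_proportion_ci` and its parameters are illustrative assumptions. Unlike the hypothesis test, the interval uses the two sample proportions separately (no pooled estimate), and the critical value 1.645 for 90% confidence is passed in directly.

```python
from math import sqrt


def two_proportion_ci(p1_hat, n1, p2_hat, n2, z_crit=1.645):
    """Large-sample confidence interval for p1 - p2.

    p1_hat, p2_hat are sample proportions; n1, n2 are sample sizes.
    z_crit is the z critical value (1.645 corresponds to 90% confidence).
    """
    diff = p1_hat - p2_hat
    # Unpooled standard error: each sample contributes its own variance term
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_crit * se
    return diff - margin, diff + margin


# Newspaper problem: students (n = 10000, 58%) vs. teachers (n = 8000, 39%)
news_ci = two_proportion_ci(0.58, 10000, 0.39, 8000)

# Survey example: females (n = 100, 0.430) vs. males (n = 200, 0.305)
survey_ci = two_proportion_ci(0.430, 100, 0.305, 200)

print(tuple(round(b, 3) for b in news_ci))
print(tuple(round(b, 3) for b in survey_ci))
```

Rounded to three decimals, the printed intervals match the hand calculations: (.178, .202) for the newspaper problem and (0.028, 0.222) for the survey example.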