Dr. Neal, WKU
MATH 183 Estimate for Difference in Means

Consider a measurement \(X\) on two distinct populations: \(\Omega_1\), where \(X\) has mean \(\mu_1\) and variance \(\sigma_1^2\), and \(\Omega_2\), where \(X\) has mean \(\mu_2\) and variance \(\sigma_2^2\). Our goal is to estimate the difference of averages \(\mu_1 - \mu_2\). To do so, we obtain a random sample of size \(n_1\) from \(\Omega_1\) and let \(\bar{x}_1\) be its sample mean and \(S_1\) be its sample deviation. Then we obtain an independent random sample of size \(n_2\) from population \(\Omega_2\) and let \(\bar{x}_2\) be its sample mean and \(S_2\) be its sample deviation.

Ultimately we want to conclude, with a certain level of confidence, whether one average is significantly higher than the other average, or whether there is no significant difference in the averages.

With "large" populations and non-trivial measurements, it is almost certainly the case that the averages \(\mu_1\) and \(\mu_2\) will be different. However, they may be close enough to each other so that a confidence interval for the difference \(\mu_1 - \mu_2\) contains the number 0 and has a small margin of error. In this case, we might say that there is "no statistical difference" in the means.

Average and Variance of a Difference

Let \(X\) and \(Y\) denote random measurements on a population and let \(W = X - Y\) denote all random differences in these measurements. The important facts that we need are:

I. The "average of the difference" equals the "difference of the averages." That is,
\[ \mu_W = \mu_X - \mu_Y . \]

II. When measurements \(X\) and \(Y\) are obtained independently, then the variance of the difference is the sum of the variances. That is,
\[ \sigma_W^2 = \sigma_X^2 + \sigma_Y^2 . \]

III. The previous results apply to all possible differences in sample means \(\bar{x}_1 - \bar{x}_2\) from independent samples on \(\Omega_1\) and \(\Omega_2\). We know that all possible values of \(\bar{x}_1\) have mean \(\mu_1\) and standard deviation \(\sigma_1/\sqrt{n_1}\); so their variance is \(\sigma_1^2/n_1\). Likewise, all possible values of \(\bar{x}_2\) have mean \(\mu_2\) and variance \(\sigma_2^2/n_2\). Now let \(W = \bar{x}_1 - \bar{x}_2\) be all possible differences in sample means. Then
\[ \mu_W = \mu_1 - \mu_2, \qquad \sigma_W^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \qquad \sigma_W = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} . \]
Moreover, for large sample sizes \(n_1\) and \(n_2\), \(W\) is approximately normally distributed.

IV. Because \(W \approx N(\mu_W, \sigma_W)\), with probability \(r\), the values of \(W = \bar{x}_1 - \bar{x}_2\) will be within \(\mu_W \pm z_{\alpha/2}\,\sigma_W\), where \(z_{\alpha/2}\) is the appropriate \(z\)-score. That is, with probability \(r = 1 - \alpha\) the values of \(\bar{x}_1 - \bar{x}_2\) will be within
\[ (\mu_1 - \mu_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} . \]
But then with probability \(r = 1 - \alpha\), the values of \(\mu_1 - \mu_2\) will be within
\[ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} . \]

Confidence Interval for Difference in Means

From the previous result, we obtain the confidence interval formula for the difference in means:
\[ \mu_1 - \mu_2 \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad \text{(2-SampZInt on Calc.)} \]
Arbitrary Measurements: Need large samples.
Normal Measurements: Any sample sizes work.

With large samples, we may replace \(\sigma_1\) and \(\sigma_2\) with estimates \(S_1, S_2\) or upper bounds \(U_1, U_2\):
\[ \mu_1 - \mu_2 \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \quad\text{or}\quad \mu_1 - \mu_2 \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{U_1^2}{n_1} + \frac{U_2^2}{n_2}} . \]

• If we disregard the \(\Omega_2\) statistics, then this formula reduces to the confidence interval for a single mean \(\mu_1\). It becomes
\[ \mu_1 \approx \bar{x}_1 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}} = \bar{x}_1 \pm \frac{z_{\alpha/2}\,\sigma_1}{\sqrt{n_1}} . \]

• To account for smaller populations of sizes \(N_1\) and \(N_2\), we could obtain a smaller margin of error by using
\[ \mu_1 - \mu_2 \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)} . \]

This formula has the same problems as the general confidence interval for the mean. Namely,
(i) For arbitrary measurements, we need large sample sizes to obtain accuracy and small margins of error.
(ii) The use of the \(z\)-score comes from an approximate standard normal distribution applied to all possible differences in sample means \(W = \bar{x}_1 - \bar{x}_2\).
(iii) The true standard deviations are often unknown.

As with the confidence interval for the mean, we can improve the accuracy and overcome these problems when sampling from normally distributed measurements.

Example 1.
The tensile strength (in pounds per square inch) is being measured on two different manufactures of a synthetic fiber. For a sample of 35 fibers having 15% cotton, the sample statistics were \(\bar{x}_1 = 9.8\) with \(S_1 = 3.5\). For an independent sample of 30 fibers having 35% cotton, the statistics were \(\bar{x}_2 = 10.1\) with \(S_2 = 3.4\). Find a 95% confidence interval for the difference in average tensile strength among all such manufactures of 15% cotton fibers and 35% cotton fibers. Explain the interval in words.

Solution. (By hand)
\[ \mu_1 - \mu_2 \approx (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = (9.8 - 10.1) \pm 1.96\sqrt{\frac{3.5^2}{35} + \frac{3.4^2}{30}} = -0.3 \pm 1.68 ; \]
or \(-1.98 \le \mu_1 - \mu_2 \le 1.38\).

That is, the average tensile strength of the 15% cotton fibers is from 1.98 psi lower to 1.38 psi higher than that of the 35% cotton fibers. With this interval, \(\mu_1 - \mu_2\) could equal 0; so the average tensile strengths could be equal. From this data alone, there is not a statistically significant difference in the average tensile strength, as evidenced by the closeness of \(\bar{x}_1 = 9.8\) and \(\bar{x}_2 = 10.1\).

This type of confidence interval also can be computed with the 2–SampZInt feature (item 9) from the STAT TESTS menu. Set the Inpt to Stats, then enter the variables (using the sample deviations for \(\sigma_1\) and \(\sigma_2\)) and calculate.

Example 2.
We wish to see if there is any apparent difference in high school grade point average between girls and boys in the state who choose to go to college. The following data is a random collection of high school GPAs from a group of Kentucky high school graduates in their first year of college. Find a 90% confidence interval for the difference between average female and average male grade point average. Explain the interval in words.
Random Collection of Female High School GPAs
3.25  3.32  3.05  3.08  4.00  3.68  3.05  3.26  3.46  3.70
3.76  3.00  3.24  3.52  3.80  3.06  2.84  4.00  3.15  3.75
2.78  3.56  3.44  3.78  3.25  3.33  2.74  3.62  3.28  3.14

Random Collection of Male High School GPAs
3.75  2.32  2.90  3.00  4.00  2.12  3.50  2.11  3.72  3.04
4.00  3.42  4.00  2.38  2.49  2.92  2.48  2.50  3.76  2.18
3.76  3.68  4.00  2.78  2.48  2.72  2.34  3.04  3.78  4.00

Solution. Because the data sets are of the same size, we can compute the statistics of both samples at once with the 2–Var Stats command. Enter the data into lists L1 (female) and L2 (male) in the STAT Edit screen. Then enter the command 2–Var Stats L1, L2.

After computing the statistics, we see that the average female GPA is \(\bar{x}_1 = 3.363\), with a sample deviation of \(S_1 \approx 0.3474\). The average male GPA is \(\bar{x}_2 = 3.105667\) with \(S_2 \approx 0.67254\). (In each case there are 30 measurements.)

Note: If the data sets are of different sizes, then the 2–Var Stats command will not work. In this case, simply execute the 1–Var Stats command on each list and note the values of \(n\), \(\bar{x}\), and \(S\) for each sample.

Now to find a 90% confidence interval for the true difference in average GPA:
\[ \mu_1 - \mu_2 \approx (3.363 - 3.105667) \pm 1.645\sqrt{\frac{0.3474^2}{30} + \frac{0.67254^2}{30}} \approx 0.257333 \pm 0.227343 , \]
or \(0.03 \le \mu_1 - \mu_2 \le 0.485\).

Thus we can assert that among all college bound students in the state, the average female GPA is from 0.03 grade points higher to 0.485 grade points higher than the average male GPA.

Difference in Proportions

We also can apply the formula to the special case of a difference in proportions \(p_1 - p_2\). Recall that for a proportion \(p\), the population variance is \(\sigma^2 = p(1 - p)\), which is estimated by \(S^2 = \bar{p}(1 - \bar{p})\). So for large populations, a confidence interval for the difference in proportions is
\[ p_1 - p_2 \approx (\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}} \qquad \text{(2–PropZInt on Calc.)} \]
or use the upper bound 0.25 on each variance:
\[ p_1 - p_2 \approx (\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{0.25}{n_1} + \frac{0.25}{n_2}} . \]

• When one or more population sizes is small, then we could include its small population correction factor to decrease the margin of error. For two small populations, we have
\[ p_1 - p_2 \approx (\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)} \]
or
\[ p_1 - p_2 \approx (\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{0.25}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{0.25}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)} . \]

Example 3.
A poll commissioned by the Center on Addiction and Substance Abuse at Columbia University found that 1340 of 2000 adults and 304 out of 400 youths interviewed believe that popular culture encourages drug use. Find a 95% confidence interval for the true difference in proportions between adults and youths who have this belief. Explain the interval in words.

Solution. By hand: Using the confidence interval for two "large" populations, we have \(\bar{p}_1 = \frac{1340}{2000} = 0.67\) (for adults), \(\bar{p}_2 = \frac{304}{400} = 0.76\) (for youths), and then
\[ p_1 - p_2 \approx (0.67 - 0.76) \pm 1.96\sqrt{\frac{(0.67)(0.33)}{2000} + \frac{(0.76)(0.24)}{400}} = -0.09 \pm 0.04665 . \]
Or,
\[ p_1 - p_2 \approx (0.67 - 0.76) \pm 1.96\sqrt{\frac{0.25}{2000} + \frac{0.25}{400}} \approx -0.09 \pm 0.0537 . \]

Using the first interval \(-0.09 \pm 0.04665\), we have \(-0.1366 \le p_1 - p_2 \le -0.04335\), or equivalently \(0.04335 \le p_2 - p_1 \le 0.1366\). So the proportion of all youths who believe that popular culture encourages drug use is from about 4.3 percentage points higher to about 13.7 percentage points higher than that of adults.

Note: Do not say that the proportion is from 4.3% higher to 13.7% higher. The difference in proportions always should be expressed in terms of percentage points.

The first type of confidence interval for a difference in (large population) proportions also can be found with the 2–PropZInt feature (item B) from the STAT TESTS screen.

Practice Exercises

1. A bank is offering two different credit card plans. Sample groups are being monitored to see if there is a significant difference in the average amounts the groups charge per quarter.
The summary statistics for one quarter are below.

Group     n     \(\bar{x}\)     S
Plan A    152   $1987.54        $392.68
Plan B    148   $2056.89        $413.12

Find a 95% confidence interval for the difference in the average amounts charged this quarter among all people in the two groups. Explain the interval in words.

2. A nationwide survey found that 260 out of 1040 people under age 30 regularly download music using peer-to-peer services. But only 78 out of 975 people at least 30 years old regularly do so. Find a 98% confidence interval for the difference in the true proportions of those who download music in these age groups. Explain the interval in words.

Answers:

1. The 2–SampZInt gives \(-160.6 \le \mu_1 - \mu_2 \le 21.901\). The average amount charged per quarter by Plan A customers is from $160.60 lower to $21.90 higher than the average amount charged by Plan B customers.

2. The 2–PropZInt gives \(0.13279 \le p_1 - p_2 \le 0.20721\). The proportion of people under 30 who download music using peer-to-peer is from 13.28 pct. points higher to 20.72 pct. points higher than the proportion of people 30 or older who do so.
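
For readers without a TI calculator, the two large-sample interval formulas in these notes can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The function names (`z_score`, `two_sample_z_interval`, `two_prop_z_interval`) are hypothetical, the sample deviations stand in for the unknown population deviations, and the small population correction factor is not included.

```python
from math import sqrt
from statistics import NormalDist


def z_score(conf):
    """Two-sided critical value z_{alpha/2} for confidence level conf = 1 - alpha."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def two_sample_z_interval(x1, s1, n1, x2, s2, n2, conf=0.95):
    """Large-sample interval for mu1 - mu2, with s1, s2 used in place of sigma1, sigma2."""
    margin = z_score(conf) * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


def two_prop_z_interval(k1, n1, k2, n2, conf=0.95):
    """Large-population interval for p1 - p2, given k1 of n1 and k2 of n2 'yes' answers."""
    p1, p2 = k1 / n1, k2 / n2
    margin = z_score(conf) * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin


# Example 1 (tensile strength) and Example 3 (drug-use poll) from the notes.
print(two_sample_z_interval(9.8, 3.5, 35, 10.1, 3.4, 30, conf=0.95))
print(two_prop_z_interval(1340, 2000, 304, 400, conf=0.95))
```

The printed endpoints should agree with the hand computations in Examples 1 and 3 up to rounding; small differences arise because the sketch uses an unrounded z-score rather than 1.96. The same two functions can be applied to the summary statistics in the Practice Exercises.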