
Topic 8 - Comparing two samples
• Confidence intervals/hypothesis tests for two means
• Hypothesis test for two variances

Comparing two populations
• Sometimes we want to compare two populations rather than making decisions about a single population.
• For example, we might want to compare two population means or two population proportions to see if they are equal.
– Is the expected drying time for one type of paint lower than that of another type of paint?
– Is a new drug more effective? Either an increased or decreased mean versus the “established” drug, or an increased or decreased percentage vs. control.
– Does the new method actually result in increased crop yields or percentages, or a decrease in tons lost to insects, etc.?

Behind the scenes. What do the distributions look like?

Comparing two population means
• Suppose we have two independent samples, $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, from two separate populations.
• A natural statistic for comparing the two population means, $\mu_X$ and $\mu_Y$, is $\bar{X} - \bar{Y}$.
• $E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y$ (from chapter 5)
• $\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \sigma_X^2/m + \sigma_Y^2/n$
• The distribution of $\bar{X} - \bar{Y}$ is also approximately Normal for m and n both large.

Large samples test for comparing population means
To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$Z = \dfrac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}}$

HA                              Reject H0 if
$\mu_X - \mu_Y < \Delta_0$      $Z < -z_{\alpha}$
$\mu_X - \mu_Y > \Delta_0$      $Z > z_{\alpha}$
$\mu_X - \mu_Y \neq \Delta_0$   $|Z| > z_{\alpha/2}$

Home sales data
A realtor in Albuquerque wants to argue that houses in the Northeast are more expensive on average than those in the rest of town. NE = 0 indicates a home was not in the Northeast. Test the appropriate hypotheses with $\alpha = 0.01$.

This is what the StatCrunch data looks like.
Summary statistics for PRICE, grouped by NE:

NE   n    Mean      Variance        Std. Dev.   Std. Err.   Median
0    39   97,282    1,026,531,010   32,040      5,130       94,000
1    78   110,769   1,612,360,830   40,154      4,547       98,500

Here’s the output in StatCrunch
Hypothesis test results:
μ1 : mean of PRICE where NE=1 (Std. Dev.
not specified)
μ2 : mean of PRICE where NE=0 (Std. Dev. not specified)
μ1 - μ2 : mean difference
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0

Difference   n1   n2   Sample Mean   Std. Err.   Z-Stat     P-value
μ1 - μ2      78   39   13487.18      6855.115    1.967462   0.0246

What does it look like?
$Z_{test} = \dfrac{110769 - 97282}{\sqrt{\dfrac{32040^2}{39} + \dfrac{40154^2}{78}}} = \dfrac{13487}{6855.1147} = 1.967$

Large samples confidence interval for the difference between two population means
• A large sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{s_X^2/m + s_Y^2/n}$
• For the home sales data, what is a 99% confidence interval for the difference between sale prices in the Northeast and the rest of town?

Equal population variances
• Suppose we assume that the two populations have a common variance $\sigma^2$.
• $\mathrm{Var}(\bar{X} - \bar{Y}) = \dfrac{\sigma^2}{m} + \dfrac{\sigma^2}{n} = \sigma^2\left(\dfrac{1}{m} + \dfrac{1}{n}\right)$
• We can then estimate this common variance using the pooled sample variance:
$s_p^2 = \dfrac{(m-1)s_X^2 + (n-1)s_Y^2}{n+m-2}$

Small samples test for comparing population means from Normal distributions with equal variances
To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$T = \dfrac{\bar{X} - \bar{Y} - \Delta_0}{s_p\sqrt{1/m + 1/n}}$

HA                              Reject H0 if
$\mu_X - \mu_Y < \Delta_0$      $T < -t_{\alpha,n+m-2}$
$\mu_X - \mu_Y > \Delta_0$      $T > t_{\alpha,n+m-2}$
$\mu_X - \mu_Y \neq \Delta_0$   $|T| > t_{\alpha/2,n+m-2}$

THC example with equal variances
The active component in marijuana is THC. An experiment was conducted to compare two slightly different configurations of this substance. The THC data set contains the time until the effect was perceived for 6 subjects exposed to each configuration. Is there any evidence that the mean time to perception is different between the two configurations using $\alpha = 0.01$?

Here’s what the calculations look like.
Summary statistics:

Column   n   Mean        Variance    Std. Dev.   Std. Err.
THC1     6   18.786667   34.908108   5.908309    2.412057
THC2     6   18.011667   19.519497   4.418088    1.803677

Pooled standard deviation
$s_p^2 = \dfrac{(6-1)34.9081 + (6-1)19.5195}{6+6-2} = 27.2138$, so $s_p = 5.216685$
$s_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}} = 5.216685\sqrt{\dfrac{1}{6} + \dfrac{1}{6}} = 3.01185$

What does it look like?
$T_{test} = \dfrac{18.78667 - 18.01167}{3.01185} = 0.2573$
$p\text{-value} = 2 \times (\text{one-tail value}) = 2 \times 0.4011 = 0.8022$
Twice the one-tail value, because the alternative is two-sided.

Small samples confidence interval for the difference between two population means
• Assuming equal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$\bar{X} - \bar{Y} \pm t_{\alpha/2,n+m-2}\, s_p\sqrt{1/m + 1/n}$
• For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?

Unequal population variances
• The pooled procedures we have discussed previously are fairly robust to violations of the equal-variance assumption.
• In other words, if the two population variances are relatively close, the procedures perform well:
– The level of significance for the hypothesis test is close to what it should be
– The coverage probability for the confidence interval is close to what it should be
• If the variances are quite different, then we need a different procedure.

Small samples test for comparing population means from Normal distributions with unequal variances
To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$T = \dfrac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}}$
with degrees of freedom
$\nu = \dfrac{(s_X^2/m + s_Y^2/n)^2}{\dfrac{(s_X^2/m)^2}{m-1} + \dfrac{(s_Y^2/n)^2}{n-1}}$

HA                              Reject H0 if
$\mu_X - \mu_Y < \Delta_0$      $T < -t_{\alpha,\nu}$
$\mu_X - \mu_Y > \Delta_0$      $T > t_{\alpha,\nu}$
$\mu_X - \mu_Y \neq \Delta_0$   $|T| > t_{\alpha/2,\nu}$

Small samples confidence interval for the difference between two population means... with unequal variances.
• Assuming unequal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$\bar{X} - \bar{Y} \pm t_{\alpha/2,\nu}\sqrt{s_X^2/m + s_Y^2/n}$
• For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?

Comparing two population variances
• Suppose two chemical companies can supply a raw material, but we suspect the variability in concentration may differ between the two.
• The standard deviation of concentration in a random sample of 15 batches from company 1 was found to be 4.7 g/l (variance 22.09).
A sample of 21 batches from company 2 yielded a standard deviation of 5.8 g/l (variance 33.64).
• Is there sufficient evidence to conclude that the variability in concentration differs for the two companies?

Test for comparing population variances from Normal distributions
To test $H_0: \sigma_X^2 = \sigma_Y^2$, use the test statistic
$F = \dfrac{s_X^2}{s_Y^2}$

HA                               Reject H0 if
$\sigma_X^2 > \sigma_Y^2$        $F > F_{\alpha,m-1,n-1}$
$\sigma_X^2 < \sigma_Y^2$        $F < F_{1-\alpha,m-1,n-1}$
$\sigma_X^2 \neq \sigma_Y^2$     $F > F_{\alpha/2,m-1,n-1}$ or $F < F_{1-\alpha/2,m-1,n-1}$

Chemical example
• Is there sufficient evidence to conclude that the variability in concentration differs for the two companies with $\alpha = 0.05$?
• Demonstrate the F calculator.

Confidence interval for the ratio of two Normal population variances
• A $(1-\alpha)100\%$ confidence interval for $\sigma_X^2/\sigma_Y^2$ is
$\left( \dfrac{s_X^2/s_Y^2}{F_{\alpha/2,m-1,n-1}},\ \dfrac{s_X^2/s_Y^2}{F_{1-\alpha/2,m-1,n-1}} \right)$
• For the chemical example, what is a 95% confidence interval for the ratio of concentration variances?

The additional file for Topic 8 contains examples of large- and small-sample tests on the differences in population means and proportions.

Paired data
• Sometimes we have a third variable that connects elements from the X and Y samples.
• In this case, the assumption of independence between the two samples may be violated.
• Is there any evidence that the first twin and the second twin have different average weights among boy-boy twins?
• In this case, the twins are clearly connected by the mother.
• It might be better to base our test on the n pairwise differences, $D_i = X_i - Y_i$.

Paired test for comparing population means
To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$T = \dfrac{\bar{D} - \Delta_0}{s_D/\sqrt{n}}$

HA                              Reject H0 if
$\mu_X - \mu_Y < \Delta_0$      $T < -t_{\alpha,n-1}$
$\mu_X - \mu_Y > \Delta_0$      $T > t_{\alpha,n-1}$
$\mu_X - \mu_Y \neq \Delta_0$   $|T| > t_{\alpha/2,n-1}$

Twins example
• Load the Twins data from StatCrunch sample data sets.
• Is there any evidence that Twin A and Twin B have different average weights among boy-boy twins with $\alpha = 0.1$?

Additional pooled vs.
paired
• Example: The article “Sex and Race Discrimination in the New Car Showroom: A Fact or Myth” (J. Consumer Affairs, 1977, pp 107-113) reports the results of an experiment in which individuals of different races and sexes visited 9 car dealerships to request the best possible deal on a certain car. The actual car prices obtained are shown below:

Summary data:
$\bar{x} = 4476.778$, $s_x^2 = 40118.69$, $s_x = 200.2965$
$\bar{y} = 4388.444$, $s_y^2 = 18405.28$, $s_y = 135.6661$
Is there sufficient evidence at α = 0.05 to conclude that the dealerships are quoting different prices for the black woman and the white man? The standard deviations are relatively close, so we could consider this as a pooled test of differences, with the following results:

Two ways to look at the situation
Why did we get such poor results from our test? The assumption in a pooled test is that the two samples are independent. In other words, any values from the woman’s distribution of prices are independent of values from the man’s distribution. A valid comparison in that situation looks like this...

However, we know that’s not the case. Prices from dealership 1 can be compared to each other (M to W), likewise for dealership 2, etc. There’s a relationship between the prices, a “pairing variable”. They are not independent, and when viewed correctly, the data shows something completely different...

Paired confidence interval for the difference between two population means
• A small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$\bar{D} \pm t_{\alpha/2,n-1}\, s_D/\sqrt{n}$
• For the car price example, what is a 90% confidence interval for the mean difference between the prices quoted to the black woman vs. the white man?
• CarData

Comparing two population proportions
• A natural statistic for comparing the two population
proportions, $p_X$ and $p_Y$, is $\hat{p}_X - \hat{p}_Y$.
• $E(\hat{p}_X - \hat{p}_Y) = E(\hat{p}_X) - E(\hat{p}_Y) = p_X - p_Y$
• $\mathrm{Var}(\hat{p}_X - \hat{p}_Y) = \dfrac{p_X(1-p_X)}{m} + \dfrac{p_Y(1-p_Y)}{n} = p(1-p)\left(\dfrac{1}{m} + \dfrac{1}{n}\right)$, with common p
• The distribution of $\hat{p}_X - \hat{p}_Y$ is also approximately Normal for m and n both large.

Large samples test for comparing population proportions
To test $H_0: p_X - p_Y = 0$, use the test statistic
$Z = \dfrac{\hat{p}_X - \hat{p}_Y - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$

HA                    Reject H0 if
$p_X - p_Y < 0$       $Z < -z_{\alpha}$
$p_X - p_Y > 0$       $Z > z_{\alpha}$
$p_X - p_Y \neq 0$    $|Z| > z_{\alpha/2}$

Please note that the common $\hat{p}$ used above is calculated as the total number of successes in the study divided by the total number of observations.

Polio example
• The following table summarizes a study of the efficacy of the Salk vaccine. (Please note that I changed the actual percentages who got polio in this example to make the numbers MUCH more workable... don’t panic.)

Treatment   Total Patients   Polio
Vaccine     2,000            30
Placebo     2,000            100

• Was the vaccine effective? Test at $\alpha = 0.05$.

Large samples confidence interval for the difference between two population proportions
• A large sample $(1-\alpha)100\%$ confidence interval for $p_X - p_Y$ is
$\hat{p}_X - \hat{p}_Y \pm z_{\alpha/2}\sqrt{\hat{p}_X(1-\hat{p}_X)/m + \hat{p}_Y(1-\hat{p}_Y)/n}$
• For the Polio data, what is a 95% confidence interval for the difference between the proportion who contract the disease under each treatment?
$(0.015 - 0.05) \pm 1.96\sqrt{[0.015(0.985)]/2000 + [0.05(0.95)]/2000} = -0.035 \pm 0.01094$, giving $(-0.0459,\ -0.0241)$
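
The Polio example above can be checked numerically. The following is only a minimal sketch, not part of the original slides: the function names are hypothetical, the vaccine group is taken as X and the placebo group as Y, and a lower-tail alternative ($p_X - p_Y < 0$, i.e. fewer cases under the vaccine) is assumed as the reading of "Was the vaccine effective?". It uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_events, m, y_events, n):
    """Pooled large-sample Z statistic for H0: pX - pY = 0 (hypothetical helper)."""
    p_x = x_events / m
    p_y = y_events / n
    # Common p-hat: total successes divided by total observations.
    p_pooled = (x_events + y_events) / (m + n)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / m + 1 / n))
    return (p_x - p_y) / std_err


def two_proportion_ci(x_events, m, y_events, n, level=0.95):
    """Large-sample confidence interval for pX - pY (unpooled standard error)."""
    p_x = x_events / m
    p_y = y_events / n
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * sqrt(p_x * (1 - p_x) / m + p_y * (1 - p_y) / n)
    diff = p_x - p_y
    return diff - half_width, diff + half_width


# Assumed labelling: X = vaccine (30 of 2,000), Y = placebo (100 of 2,000).
z_stat = two_proportion_z(30, 2000, 100, 2000)
p_value = NormalDist().cdf(z_stat)  # lower-tail p-value for HA: pX - pY < 0
ci_low, ci_high = two_proportion_ci(30, 2000, 100, 2000)

print(f"Z = {z_stat:.3f}, one-sided p-value = {p_value:.2e}")
print(f"95% CI for pX - pY: ({ci_low:.4f}, {ci_high:.4f})")
```

Under these assumptions the Z statistic is strongly negative, so H0 would be rejected at α = 0.05, and the interval comes out close to (-0.0459, -0.0241), in line with the hand calculation above. Note that the test uses the pooled standard error while the interval uses the unpooled one, as on the slides.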
