Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES
Part 2: Hypothesis tests on µ1 − µ2 when data is paired

Section 10-4 Paired t-test

• Sometimes when we want to compare the means of two groups, the data has been collected in a paired scenario, so the two samples are not independent.

• Some examples of the paired t-test:

  – Comparison of mean lifetime of brakes from Midas and Brembo (brake companies)
    ∗ n=20 cars are chosen. On each car, the front left and front right brakes are one from each company (randomly assigned). From each car, we have a measurement from each group.
    {If we had 40 cars, and we put 1 brake in each car (20 got Midas and 20 got Brembo), we would have independent groups and would perform a 2-sample t-test; the pairing is gone.}

  – Comparison of mean corn yield by Dekalb and Pioneer
    ∗ n=25 fields are chosen on the east side of Iowa. In each field, half is planted with Dekalb and half is planted with Pioneer. Yield is recorded for each brand in each field. From each field, we have a measurement from each group.
    {If we had 50 fields, and we randomly assigned 25 fields to Dekalb and 25 fields to Pioneer, we would have independent groups and would perform a 2-sample t-test; the pairing is gone.}

  – Comparison of mean IQ scores for children in low-income families to children in high-income families
    ∗ n=18 sets of adopted twins, where the two twins in each set were raised in the two different environments. From each set of twins, we have a measurement from each group (low-income and high-income).
    {If we had 18 low-income kids and 18 high-income kids (no relation) and compared their means, we would have independent groups and would perform a 2-sample t-test; the pairing is not present.}

• IT’S ABOUT HOW THE DATA WAS COLLECTED.
Before data collection, the questions of interest above (i.e. the comparisons of means) could have been approached with a 2-sample t-test (independent groups) or a paired t-test (groups that are not independent), but once the data are collected, only one of these is appropriate. You’ll need to recognize which analysis is appropriate for the given data collection scenario.
One element that must be true in a paired t-test is that we have an equal number of observations from each group, because they’re paired.

• Other common examples of when the paired t-test arises (repeated measures):
  ∗ When we have two measurements on each of many individuals.

  – Comparison of before-diet weight and after-diet weight. We’re checking the efficacy of the diet. µ1 is the mean weight before the diet, and µ2 is the mean weight after the diet.
    ∗ For each of the n=30 individuals, we have a before and an after weight. From each person, we have 2 measurements, one from each group (before and after weight), so the data is paired.

  – Two nurses were arguing about which of them did a better job of drawing blood (in terms of comfort to the patient). For n = 10 patients, blood was drawn once by each nurse (in random order and on different days), and after each draw the patient was asked about their level of discomfort on a scale from 1 to 5. µ1 is the mean level of discomfort from nurse 1, and µ2 is the mean level of discomfort from nurse 2.
    ∗ For each of the n=10 individuals, we have 2 measurements, one from each group (nurse 1 and nurse 2), so the data is paired.

• From a statistical viewpoint, paired experiments tend to be desirable because we can compare treatments within a single individual (essentially reducing the noise around the signal). There is often lots of variability from one individual to the next, which makes signal detection more difficult in a 2-sample t-test set-up than in a paired set-up.

• In a paired t-test, we analyze the DIFFERENCES, not the individual measurements.

  – Example: Schizophrenia (New England Journal of Medicine)
Claim: A small left hippocampus in the brain is associated with schizophrenia.
Data: The size of the left hippocampus in n = 5 sets of twins, one with schizophrenia and one without.

  Set   Normal twin   Schiz. twin   Difference xD = xnorm − xschiz
   1       1.94          1.27            0.67
   2       1.78          1.28            0.50
   3       1.25          1.02            0.23
   4       1.44          1.63           -0.19
   5       2.06          1.93            0.13

In 4 out of 5 sets of twins, the left hippocampus was larger in the normal twin.

DIFFERENCES: We will let µD = µnorm − µschiz. If there is no difference in size, then µD = 0.

There is probably a large variability in the size of the left hippocampus in the general population, so by getting twins for this study, who would be expected to have fairly similar sizes (they’re genetically similar), we have controlled for some of that variability, making it easier to detect a subtle difference (due to the disease) if it exists.

NOTE: Because this is a paired design, we will analyze the differences, not the original data.

1. State Hypotheses
H0 : µD = 0
H1 : µD > 0 {because µD = µnorm − µschiz}

2. Test statistic
Inference on µD is based on the sample mean of differences x̄D, where

$$\bar{x}_D = \frac{x_{D1} + x_{D2} + \cdots + x_{Dn}}{n}$$

and on the sample standard deviation of differences SD, where

$$S_D^2 = \frac{\sum_{i=1}^{n} \left(x_{Di} - \bar{x}_D\right)^2}{n-1}$$

Under H0 true, the test statistic

$$T_0 = \frac{\bar{X}_D - \mu_{D0}}{S_D/\sqrt{n}}$$

is distributed as

$$T_0 \sim t_{n-1}$$

where µD0 is the value of µD stated in H0 (here, 0).

For this schizophrenia data, x̄D = 0.268 and sD = 0.334, so

$$t_0 = \frac{0.268 - 0}{0.334/\sqrt{5}} = 1.79$$

and T0 follows a t distribution with 4 degrees of freedom (because we had 5 differences).

3. P-value
P(T0 > 1.79) = 0.0740 {one-sided test}

4. Decision
Letting α = 0.05, the p-value is not less than α, so we fail to reject H0.

5. Checking assumptions
Our analysis was on the differences xDi and we performed a t-test. We should check that the differences are nearly normally distributed. With such a small data set, there’s not much info to go on, but we will assume we have normality.

There is not sufficient statistical evidence at the α = 0.05 level to conclude that the left hippocampus is larger, on average, in the normal twin than in the twin with schizophrenia.
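The five steps above can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that uses only the standard library. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and the p-value is obtained by a Simpson's-rule integration of the t density rather than from a t table, so treat it as an approximation.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t0, df, steps=2000):
    """Approximate P(T > t0) as 0.5 minus the Simpson's-rule integral
    of the density from 0 to t0 (steps must be even)."""
    h = t0 / steps
    total = t_pdf(0.0, df) + t_pdf(t0, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Left hippocampus sizes from the table above, one pair per set of twins.
normal = [1.94, 1.78, 1.25, 1.44, 2.06]
schiz = [1.27, 1.28, 1.02, 1.63, 1.93]

# The paired analysis works on the within-pair differences only.
diffs = [a - b for a, b in zip(normal, schiz)]
n = len(diffs)
xbar_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)              # sample SD, divides by n - 1
t0 = (xbar_d - 0) / (s_d / math.sqrt(n))   # H0: mu_D = 0
p_value = t_upper_tail(t0, n - 1)          # one-sided, H1: mu_D > 0

print(f"xbar_D = {xbar_d:.3f}, s_D = {s_d:.3f}")
print(f"t0 = {t0:.3f}, one-sided p-value = {p_value:.4f}")
```

Carrying full precision gives t0 of roughly 1.795 and a one-sided p-value of about 0.074, in line with the rounded slide values and leading to the same decision at α = 0.05.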
• Example: Car emissions on highway and in-town
Claim: Mean level of emissions is less for highway driving than for stop-and-go in-town driving.
Data: Each car is driven both on the highway and in-town (in random order).

  Car   Stop-and-go emission   Highway emission   Difference xD = xSG − xhighway
   1          1500                   941                 559
   2           870                   456                 414
   3          1120                   893                 227
   4          1250                  1060                 190
   5          3460                  3107                 353
   6          1110                  1339                -229
   7          1120                  1346                -226
   8           880                   644                 236

Sample data: x̄D = 190.5 and sD = 284.1

6 out of 8 cars show a larger emission for stop-and-go.

DIFFERENCES: We will let µD = µSG − µhighway. If there is no difference in emissions, then µD = 0. If stop-and-go has higher emissions, µD > 0.

There is a large variability in emissions from one car to the next, so by considering both environments for a single car, we have controlled for that car-to-car variability, and we can compare the environments within a single car, making it easier to detect a difference in emissions due to environment if it exists.

Perform the hypothesis test on the claim.

1. State Hypotheses
H0 : µD = 0
H1 : µD > 0 {as µD = µSG − µhighway}

2. Test statistic

$$t_0 = \frac{190.5 - 0}{284.1/\sqrt{8}} = 1.897$$

and under H0 true, T0 follows a t distribution with 7 degrees of freedom.

3. P-value
P(T0 > 1.897) = 0.0498 {one-sided test}

4. Decision
Letting α = 0.05, the p-value is very close to the 0.05 threshold. Since 0.0498 is less than α, we reject H0.

5. Checking assumptions
We will assume we have approximate normality of the differences.

There IS sufficient statistical evidence at the α = 0.05 level to conclude that the mean level of emissions is less for highway driving than for stop-and-go driving.

Some comments on paired t-tests:

• If n is large, we don’t need to check the normality of the differences (xDi values) because the central limit theorem will give us approximate normality of the sample mean.

• We often set up the difference as the hypothesized larger mean minus the hypothesized smaller mean (in order to work with a positive test statistic). But you ALWAYS need to state which difference you’re taking.
• In general, paired designs are more powerful than independent two-sample designs. This is because there’s often a lot of variability from one experimental unit to the next (cars, people, etc.). On the other hand, if there isn’t much variability from one experimental unit to the next, then we don’t really gain from doing a paired design (compared to doing a 2-sample t-test).

• If the data is paired and presented as n1 = a and n2 = b, we know that n1 = n2 (because it’s paired). And even though we have n1 + n2 measurements, we REALLY only have n = n1 = n2 differences, and this n is what matters for our t distribution, which has n − 1 degrees of freedom.

100(1−α)% Confidence interval for µD:

• The point estimate for µD is x̄D.

• We can form a 100(1−α)% confidence interval for the population mean difference µD the same way as before:
If x̄D and sD are the sample mean and standard deviation of the differences of n random pairs of normally distributed measurements, a 100(1 − α)% confidence interval for µD is

$$\bar{x}_D \pm t_{\alpha/2,\,n-1} \cdot \frac{s_D}{\sqrt{n}}$$

----------------------------------------
See worksheet: “Matched pair or Two-sample t-test?”
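The confidence-interval formula above, and the earlier comment that pairing removes unit-to-unit variability, can both be illustrated with the car emissions data. The sketch below is an illustration under stated assumptions, not part of the original slides: the critical value t(0.025, 7) = 2.365 is read from a standard t table, and the "unpaired" standard error is computed only to show what would happen if the pairing were (incorrectly) ignored for these data.

```python
import math
import statistics

# Car emissions data from the example; position i is the same car in both lists.
stop_and_go = [1500, 870, 1120, 1250, 3460, 1110, 1120, 880]
highway = [941, 456, 893, 1060, 3107, 1339, 1346, 644]

# Paired analysis: work with the n within-car differences.
diffs = [sg - hw for sg, hw in zip(stop_and_go, highway)]
n = len(diffs)
xbar_d = statistics.mean(diffs)
s_d = statistics.stdev(diffs)        # sample SD, divides by n - 1
se_paired = s_d / math.sqrt(n)

# Assumption: t_{alpha/2, n-1} for alpha = 0.05 and n - 1 = 7, from a t table.
T_CRIT_95_DF7 = 2.365
ci_low = xbar_d - T_CRIT_95_DF7 * se_paired
ci_high = xbar_d + T_CRIT_95_DF7 * se_paired

# Illustration only: standard error of the difference in means if the two
# columns were treated as independent samples (not appropriate here).
se_unpaired = math.sqrt(
    statistics.variance(stop_and_go) / n + statistics.variance(highway) / n
)

t_paired = xbar_d / se_paired
t_unpaired = xbar_d / se_unpaired

print(f"95% CI for mu_D: ({ci_low:.1f}, {ci_high:.1f})")
print(f"SE paired = {se_paired:.1f}, SE ignoring pairing = {se_unpaired:.1f}")
print(f"t paired = {t_paired:.3f}, t ignoring pairing = {t_unpaired:.3f}")
```

The same mean difference of 190.5 is divided by a much smaller standard error in the paired analysis, because the large car-to-car variability cancels within each pair. The two-sided 95% interval here contains 0, which does not contradict the one-sided test above: a two-sided 95% interval corresponds to a two-sided test at α = 0.05.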