Chapter 10: STATISTICAL INFERENCE
FOR TWO SAMPLES
Part 2: Hypothesis tests on µ1 − µ2
when data is paired
Section 10-4 Paired t-test
• Sometimes when we want to compare the
means of two groups, the data has been collected in pairs, not from independent sample groups.
• Some examples of the paired t-test:
– Comparison of mean lifetime of brakes from
Midas and Brembo (brake companies)
∗ n=20 cars are chosen. The front left and
right brakes include one brake from each
company (randomly assigned). From each
car, we have a measurement from each
group.
{If we had 40 cars, and we put 1 brake in
each car (20 got Midas and 20 got Brembo),
we would have independent groups and would
perform a 2-sample t-test; the pairing is gone.}
– Comparison of mean corn yield by Dekalb
and Pioneer
∗ n=25 fields are chosen on the east side
of Iowa. In each field, half is planted
with Dekalb, half is planted with Pioneer. Yield is recorded for each brand
in each field. From each field, we have
a measurement from each group.
{If we had 50 fields, and we randomly assigned 25 fields to Dekalb and 25 fields to Pioneer, we would have independent groups and
would perform a 2-sample t-test; the pairing
is gone.}
– Comparison of mean IQ scores for children in low-income families to children in high-income
families
∗ n=18 sets of adopted twins, where the two twins in each set were raised in the two different environments. From each set of twins, we have
a measurement from each group (low-income and high-income).
{If we had 18 low-income kids and 18 high-income kids (no relation) and compared their
means, we would have independent groups
and would perform a 2-sample t-test; the pairing is not present.}
• IT’S ABOUT HOW THE DATA WAS
COLLECTED.
Before data collection, the questions of interest above (i.e. the comparisons of means)
could’ve been approached with a 2-sample
t-test (i.e. independence between groups)
or a paired t-test (i.e. not independence
between groups), but once the data are
collected, only one of these is appropriate.
You’ll need to recognize which analysis is
appropriate for the given data collection
scenario.
One element that must be true in a paired
t-test is that we must have an equal number of observations from each group, because they’re paired.
• Other common examples of when the paired
t-test arises (repeated measures):
∗ When we have two measurements on each
of many individuals.
– Comparison of before diet weight and after
diet weight. We’re checking the efficacy of
the diet. µ1 is the mean weight before diet,
and µ2 is the mean weight after diet.
∗ For each of the n=30 individuals, we
have a before and after weight. From
each person, we have 2 measurements,
one from each group (before and after
weight), so the data is paired.
– Two nurses were arguing about which of them did
a better job of drawing blood (in terms
of comfort to the patient). For n = 10
patients, blood was drawn once by each
nurse (in random order and on different
days), and after each draw the patient was asked
about their level of discomfort on a scale
from 1 to 5. µ1 is the mean level of discomfort from nurse 1, and µ2 is the mean
level of discomfort from nurse 2.
∗ For each of the n=10 individuals, we
have 2 measurements, one from each group
(nurse 1 and nurse 2), so the data is paired.
• From a statistical viewpoint, paired experiments tend to be desirable because we can
compare treatments within a single individual (essentially reducing the noise around the
signal).
There is often a lot of variability from one individual to the next, which makes signal detection more difficult in a 2-sample t-test set-up than in a paired set-up.
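One way to make the noise-reduction argument concrete is to compare the variance of the estimated mean difference under the two designs. This is only a sketch: it assumes n units per group, and the symbols σ1, σ2 (standard deviations of the two measurements) and ρ (correlation between the two measurements taken on the same unit) are notation introduced here, not taken from the slides.

```latex
\operatorname{Var}(\bar{X}_D)
  = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}
  \quad \text{(paired)},
\qquad
\operatorname{Var}(\bar{X}_1 - \bar{X}_2)
  = \frac{\sigma_1^2 + \sigma_2^2}{n}
  \quad \text{(independent groups)}
```

When the two measurements on the same unit are positively correlated (ρ > 0), the paired variance is smaller, so the same true difference stands out more clearly.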
• In a paired t-test, we analyze the
DIFFERENCES,
not the individual measurements.
– Example: Schizophrenia
(New England Journal of Medicine)
Claim: A small left hippocampus in the
brain is associated with schizophrenia.
Data: The size of left hippocampus in
n = 5 sets of twins, one with schizophrenia
and one without.
set   Normal twin   Schiz. twin   Difference xD = xnorm − xschiz
1     1.94          1.27           0.67
2     1.78          1.28           0.50
3     1.25          1.02           0.23
4     1.44          1.63          -0.19
5     2.06          1.93           0.13
In 4 out of 5 of the twin pairs, the left hippocampus was larger in the normal twin.
DIFFERENCES:
We will let µD = µnorm − µschiz .
If there is no difference in size, then
µD = 0.
There is probably a large variability in
size of the left hippocampus in the general population, so by getting twins for
this study who would be expected to have
fairly similar sizes, we have controlled for
some of that variability (they’re genetically similar) making it easier to detect a
subtle difference (due to the disease) if it
exists.
NOTE: Because this is a paired design, we will
analyze the differences, not the original data.
1. State Hypotheses
H0 : µD = 0
H1 : µD > 0 {because µD = µnorm − µschiz }
2. Test statistic
Inference on µD is based on the sample mean of differences
\[ \bar{x}_D = \frac{x_{D1} + x_{D2} + \cdots + x_{Dn}}{n} \]
and the sample standard deviation of differences $S_D$, where
\[ S_D^2 = \frac{\sum_{i=1}^{n} \left( x_{Di} - \bar{x}_D \right)^2}{n-1} \]
The test statistic...
Under H0 true, the T0 test statistic
\[ T_0 = \frac{\bar{X}_D - \mu_{D0}}{S_D / \sqrt{n}} \]
is distributed as $T_0 \sim t_{n-1}$.
For this schizophrenia data, we have
\[ t_0 = \frac{0.268 - 0}{0.334 / \sqrt{5}} = 1.79 \]
and $T_0 \sim t_4$ (because we had 5 differences).
3. P-value
P(T0 > 1.79) = 0.0740 {one-sided test}
4. Decision
Letting α = 0.05, the p-value is not less
than α, so we fail to reject H0.
5. Checking assumptions
Our analysis was on the differences xDi
and we performed a t-test. We should
check that the differences are nearly
normally distributed. With such a small
data set, there’s not much info to go on,
but we will assume we have normality.
There is not sufficient statistical evidence
at the α = 0.05 level to conclude that the left hippocampus is larger, on average, in normal brains than in
schizophrenic brains.
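The five steps above can be retraced numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `upper_tail` are illustrative, and the p-value comes from a simple Simpson's-rule integration of the t density, which is an assumption of this sketch rather than the method used in the slides.

```python
import math
import statistics

# Left hippocampus sizes for the 5 sets of twins (from the data table).
normal = [1.94, 1.78, 1.25, 1.44, 2.06]
schiz = [1.27, 1.28, 1.02, 1.63, 1.93]

# Paired design: analyze the within-pair differences, normal minus schiz.
diffs = [a - b for a, b in zip(normal, schiz)]

n = len(diffs)
xbar_d = statistics.mean(diffs)    # sample mean of the differences
s_d = statistics.stdev(diffs)      # sample standard deviation (n - 1 divisor)
t0 = (xbar_d - 0.0) / (s_d / math.sqrt(n))   # mu_D0 = 0 under H0
df = n - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: Simpson's rule on [0, t], then symmetry about 0."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = upper_tail(t0, df)   # one-sided test, H1: mu_D > 0
alpha = 0.05
print(f"xbar_D = {xbar_d:.3f}, s_D = {s_d:.3f}, t0 = {t0:.3f}, df = {df}")
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Carrying full precision gives t0 of about 1.795 and a p-value of about 0.0735, slightly different from the 1.79 and 0.0740 quoted above, which appear to come from rounded intermediate values. The decision (fail to reject H0 at α = 0.05) is the same either way.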
• Example: Car emissions on highway and
in-town
Claim: Mean level of emissions is less for
highway driving than for stop-and-go in-town
driving.
Data: Each car is driven both on the highway
and in-town (in random order).
car   Stop-and-Go Emission   Highway Emission   Difference xD = xSG − xhighway
1     1500                    941                 559
2      870                    456                 414
3     1120                    893                 227
4     1250                   1060                 190
5     3460                   3107                 353
6     1110                   1339                -229
7     1120                   1346                -226
8      880                    644                 236
Sample data: x̄D = 190.5 and sD = 284.1
6 out of 8 show a larger emission for stop-and-go.
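The summary values quoted here can be reproduced from the table. A short sketch using the standard library; the variable names are illustrative only.

```python
import statistics

# Emissions for the 8 cars, in the order given in the table.
stop_and_go = [1500, 870, 1120, 1250, 3460, 1110, 1120, 880]
highway = [941, 456, 893, 1060, 3107, 1339, 1346, 644]

# Difference for each car: stop-and-go minus highway.
diffs = [sg - hw for sg, hw in zip(stop_and_go, highway)]

xbar_d = statistics.mean(diffs)          # sample mean of the differences
s_d = statistics.stdev(diffs)            # sample standard deviation (n - 1 divisor)
n_positive = sum(d > 0 for d in diffs)   # cars with higher stop-and-go emissions

print(diffs)
print(f"xbar_D = {xbar_d:.1f}, s_D = {s_d:.1f}, "
      f"positive differences: {n_positive} of {len(diffs)}")
```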
DIFFERENCES:
We will let µD = µSG − µhighway .
If no difference in emissions, then µD = 0.
If stop-and-go has higher emissions, µD > 0.
There is a large variability in emissions from
one car to the next, so by considering both
environments for a single car, we have controlled for that car-to-car variability, and we
can compare the environments within a single car, making it easier to detect a difference
in emissions due to environment if it exists.
Perform the hypothesis test on the claim.
1. State Hypotheses
H0 : µD = 0
H1 : µD > 0 {as µD = µSG − µhighway }
2. Test statistic
\[ t_0 = \frac{190.5 - 0}{284.1 / \sqrt{8}} = 1.897 \]
and under H0 true, $T_0 \sim t_7$.
3. P-value
P (T0 > 1.897) = 0.0498 {one-sided test}
4. Decision
Letting α = 0.05, the p-value is very close
to the 0.05 threshold. Since 0.0498 is less
than α, we reject H0.
5. Checking assumptions
We will assume we have approximate normality of the differences.
There IS sufficient statistical evidence at the
α = 0.05 level to conclude that the mean
level of emissions is less for highway driving
than for stop-and-go driving.
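Because the p-value is so close to α here, it may help to see the same decision reached by the equivalent critical-value rule: reject H0 when t0 exceeds the upper 5% point of the t distribution with 7 degrees of freedom. This is a sketch of that alternative, not the approach taken in the slides; the critical value 1.895 is taken from a standard t table and is not quoted in the original.

```python
import math

# Summary values quoted in the example.
xbar_d, s_d, n = 190.5, 284.1, 8

# Test statistic with mu_D0 = 0 under H0.
t0 = (xbar_d - 0) / (s_d / math.sqrt(n))

# Upper 5% point of t with n - 1 = 7 degrees of freedom
# (assumption: value read from a standard t table).
t_crit = 1.895

reject_h0 = t0 > t_crit
print(f"t0 = {t0:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

The statistic only just clears the critical value, which matches the p-value of 0.0498 sitting just under 0.05.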
Some comments on paired t-tests:
• If n is large, we don’t need to check the normality of the differences (xDi values) because
the central limit theorem gives approximate normality of the sample mean x̄D.
• We often set up the difference as the hypothesized larger mean minus the hypothesized smaller
mean (in order to work with a positive test
statistic). But you ALWAYS need to state
which difference you’re taking.
• In general, paired designs are more powerful
than independent two-sample designs analyzed with a 2-sample t-test. This
is because there’s often a lot of variability from
one experimental unit to the next (cars, people, etc.).
But on the other hand, if there isn’t much variability from one experimental unit to the next, then we don’t
really gain from doing a paired design (compared to doing a 2-sample t-test).
• If the data is paired and presented as n1 = a
and n2 = b, we know that n1 = n2 (because
it’s paired). And even though we have n1 + n2
measurements, we REALLY only have n = n1 = n2 differences, and this n is what matters for our
t distribution, which is tn−1.
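The comment about power can be illustrated with the car emissions data from the earlier example. The sketch below computes the paired statistic and, purely for comparison, the two-sample statistic that would result if the pairing were ignored. The two-sample calculation is hypothetical: it is not a valid analysis of this paired data and is not part of the slides.

```python
import math
import statistics

# Emissions for the 8 cars (same car measured in both environments).
stop_and_go = [1500, 870, 1120, 1250, 3460, 1110, 1120, 880]
highway = [941, 456, 893, 1060, 3107, 1339, 1346, 644]
n = len(stop_and_go)

# Paired analysis: one difference per car, n - 1 = 7 degrees of freedom.
diffs = [sg - hw for sg, hw in zip(stop_and_go, highway)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical two-sample analysis that ignores the pairing:
# car-to-car variability stays in the standard error.
se_two_sample = math.sqrt(
    statistics.variance(stop_and_go) / n + statistics.variance(highway) / n
)
t_two_sample = (
    statistics.mean(stop_and_go) - statistics.mean(highway)
) / se_two_sample

print(f"paired t0 = {t_paired:.3f}, two-sample t0 = {t_two_sample:.3f}")
```

Both statistics have the same numerator (190.5), but the paired standard error is about 100 while the unpaired one is about 418, so the paired statistic is roughly 1.90 against roughly 0.46. The large car-to-car variability is what the pairing removes.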
100(1-α)% Confidence interval for µD :
• The point estimate for µD is x̄D
• We can form a 100(1-α)% confidence interval for the mean population difference µD
the same as before:
If x̄D and sD are the sample mean and
standard deviation of the differences of n
random pairs of normally distributed measurements, a 100(1 − α)% confidence interval for µD is
\[ \bar{x}_D \pm t_{\alpha/2,\,n-1} \cdot \frac{s_D}{\sqrt{n}} \]
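As a worked illustration of the interval formula, here is a sketch applying it to the car emissions summary values (x̄D = 190.5, sD = 284.1, n = 8) at the 95% level. The quantile 2.365 for the t distribution with 7 degrees of freedom is taken from a standard t table and is an assumption of this sketch; the slides do not compute this interval.

```python
import math

# Summary values from the car emissions example.
xbar_d, s_d, n = 190.5, 284.1, 8

# t_{alpha/2, n-1} with alpha = 0.05 and n - 1 = 7 degrees of freedom
# (assumption: value read from a standard t table).
t_quantile = 2.365

margin = t_quantile * s_d / math.sqrt(n)
lower, upper = xbar_d - margin, xbar_d + margin
print(f"95% CI for mu_D: ({lower:.1f}, {upper:.1f})")
```

The interval runs from roughly -47 to 428 and so contains 0, even though the one-sided test at α = 0.05 rejected H0. There is no contradiction: a two-sided 95% interval corresponds to a two-sided test at α = 0.05, which puts only 0.025 in the upper tail.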
———————————————————
See worksheet:
“Matched pair or Two-sample t-test?”