Comparing Means in Two
Populations
Overview
• The previous section discussed hypothesis testing when
sampling from a single population (either a single mean
or two means from the same population).
• Now we’ll consider how to compare sample means from
two populations.
• Towards the end of the course, we’ll discuss comparing
means from more than two populations.
• When we’re comparing the means from two independent
samples we usually ask: “How does one sample mean
compare with the other?”
• However, focusing just on comparing the means can be
premature.
• It’s safer to first consider the variability of each sample,
the pattern of any outliers, the shape of the distributions.
• Then it may be safe to assume a normal distribution;
but not always.
• So, we’ll also discuss approaches to answering these
questions when we’re not comfortable with the
assumption of normality, or when this assumption is just
not defendable.
Two Sample Means
• Cities and counties: Returning to the Alabama
SOL pass-rates, is there a difference between
cities and counties?
• Recall last time we looked at a difference across
years in the same population.
• Here, we want to look at the difference in one
year between two populations: city high schools
and county high schools.
Phase 1: State the Question
• 1. Evaluate and describe the data
• Begin by looking at the data. Where did the data come
from? What are the observed statistics?
• The source of these data is the Alabama Department of
Education.
• The first step in any data analysis is evaluating and
describing the data.
• These first steps are also called preliminary analysis, to
distinguish them from the definitive (or outcome)
analysis.
Preliminary Analyses
• The goal of a preliminary analysis is to
describe and inform.
• To give a description of the data.
• Keep in mind that, the goal of a definitive
analysis is decision making or hypothesis
testing.
Preliminary Analysis
• What are the observed statistics?
• Use the Fit Y by X platform to look at a
graphical and tabular summary of the
data, as in the next figure.
• Note: Previously, in Step 1, we used the
Distribution of Y reports to identify and fix errors
and to further understand the data.
• You could use the Distribution of Y report, but you would
have to run it twice – here you can see everything in
one place.
• There are three components to this figure: The
dot plots, the box plots, and the quantiles table.
Let’s look at each.
The Dot plot
• A dot plot shows the continuous Y-variable’s (Algebra I
2000 pass rate) values along the vertical axis and the
nominal X-variable’s (City Yes or No) values along the
horizontal axis.
• So we see the two groups along the horizontal axis; City
= “No” and City = “Yes”. The width of the groups is
proportional to the sample size of each group; there are
more “No” (non-cities) values so it is drawn wider. This
follows the “your eye goes to ink” rule.
• Groups with larger samples are more informative than
groups with smaller samples so the larger n group is
drawn bigger.
Dot plots
• Along the vertical axis, we see the 10th grade, year 2000
Algebra I SOL pass-rate, with one dot for each high school.
Values range from 0% passing to 100% passing.
• The horizontal spreading of the values is done so that
you can see each school’s scores better (called “Jittered
Points”). The amount of horizontal spread is random; so,
don’t try to interpret the scores for points farther to the
right or left than scores closer to the center (horizontally).
• Of course, the vertical values are interpretable; that is, a
school at the top has a higher pass-rate than a school at
the bottom.
Box Plots
• These side-by-side box plots describe the shape
of the distributions within each group.
• These plots do not assume normality so we use
them to begin to answer the question, “Is each
group normally distributed?”
• In the box plots we can easily see whether the
values are symmetric about the median.
• Look for these “warning flags” that the data is
not normal:
Box Plots
• Is the distance between the median and the 75%tile
different than the distance between the median and the
25%tile?
• Is the upper whisker-bar (actually, the 90%tile) more
distant from the median than the lower whisker-bar (the
10%tile)?
• Are the high-extreme tail-values more distant from the
median than the low-extreme tail-values?
• These informal, graphic assessments don’t raise any
warning flags for these data.
• The dotted horizontal line represents the mean value for
all schools (not considering the group).
Quantiles Report
• If a more detailed comparison of values is
needed, the numerical values plotted in the box
plots are shown in the Quantiles Report.
Quantiles Report
• For instance, in the City = “No” group, the
distance from the median to the 75%tile (~49 vs
65, 16 points) is about the same as the distance
between the median and the 25%tile (~49 vs 34,
about 15 points).
• But, as in the Distribution platform, the preferred
way to answer the question, “Is each group
normally distributed?” is with a normal quantile
plot.
Normal Quantile Plots
• Actually, the more proper phrasing of the
question is: “Within each group, is each
group normally distributed?”
• That is, it may be that if we were to lump
both groups together, the data would
appear non-normal.
• We must take group membership into
account when making this assessment.
Interpretation
• Follow the same interpretation of the
normal quantile plot as we discussed with
a single mean.
– Does each group of black dots fall along a
straight line?
– In the SOL data, these two sets of points
follow the lines fairly well, with some
departure in the tails.
Normality
• So, we now have enough information to answer
the question, “Within each group, is each group
normally distributed?”
– If the answer is “Yes” or “Probably” then we can
proceed with parametric tests to compare the means.
The Central Limit Theorem can apply if the sample
size is “large.” The rule of thumb is if the total n is at
least 30 (n1 + n2 ≥ 30).
– If the answer to the normality question is “No” or “I
doubt it” then we’ll use nonparametric methods to
answer our question.
Preliminary analysis, showing means
• If the data is normally distributed then means
and SDs make sense.
• If these distributional assumptions are
unwarranted, then we should consider
nonparametric methods.
• Thus, the next thing to do in our preliminary
analysis may be to get rid of the box plots and
quantile plot and to show the means and
standard deviations calculated within each
group.
Means
• This figure shows the means, here connected with a line,
and a short dashed bar that is one standard-error “error
bars.”
• The long dashed lines above and below the means are
one standard deviation away from their respective mean.
• The means and standard errors and deviations can be
shown by selecting Means and Std Dev from the main
Options menu; from the Display Options sub-menu,
select the desired options.
Means and SDs Report
• We can use it to describe the following:
– the number of observations in each group,
– the means of each group,
– the standard deviation within each group.
• Recall that the SE is not a “descriptive statistic”
for the data, it is used for inference about the
mean.
• JMP includes the SE here because it is used to
form confidence intervals about the mean.
• Note: You can change the number of decimal places
displayed in any JMP report:
– Double-click a number in the report. A dialog will appear.
– Change the number of decimal places.
Summary: Preliminary Analysis
• So far, what have we learned about the data?
• We have not found any errors in the data.
• We’re comfortable with the assumption of
normality within each group.
• We’ve obtained descriptive statistics for each of
the groups we’re comparing.
Preliminary Results
• Also, at this point, we can look at the two means
and make a guess, “is there a difference
between cities and counties?”
• City schools seem to be about 12 points below
non-city schools, and with SEs < 3, this seems
like a “big” difference.
• Recall that the t-statistic is the ratio of the
difference to a standard error.
• The ratio of 12 to 3 is bigger than 2.
2. Review assumptions
• As always there are three questions to consider:
• Is the process used in this study likely to yield data
that is representative of each of the two populations?
– Yes, it is the entire population.
• Is each observation in the two samples independent of
the others?
– Yes.
• Is the sample size sufficient?
– Yes, both groups are “large” and we’re comfortable with
normality for both groups.
Bottom line
• We have to be comfortable that the first
two assumptions are met before we can
proceed at all.
• If we’re comfortable with the normality
assumption, then we proceed, as below.
• Later, we’ll discuss what to do when
normality can not be safely assumed.
3. State the question—in the
form of hypotheses
• Let’s refer to the two groups as “1” and “2” for
notational purposes. Using these as subscripts,
there are three possible null hypotheses:
1. The null hypothesis is µ1 ≤ µ2,
2. The null hypothesis is µ1 ≥ µ2, or
3. The null hypothesis is µ1 = µ2.
• And the alternative hypothesis is the opposite of
the null.
Phase 2: Decide How to
Answer the Question
• 4. Decide on a summary statistic that reflects
the question
• Recall the general test statistic:

test statistic = (summary statistic − hypothesized parameter) / (standard error of the summary statistic)
Difference Score
• In this situation (as in comparing paired means),
we are going to use the difference score as our
summary statistic:
• The relevant statistic is ȳ1 − ȳ2, the observed difference of the two means.
• The hypothesized parameter is easy: µ1 − µ2 = 0, since under any of the three
null hypotheses a difference of 0 would result in “failing to reject” the null
hypothesis.
Standard Error
• What about the standard error? There are two possibilities for the standard
error of ȳ1 − ȳ2. The two possibilities depend upon the two standard deviations
within each group.
o Are they the same?
o Or do the two groups have different standard deviations?
Same SD
• If the standard deviations (or variances) within
the two populations are equal, then the standard
error of the difference is easy.
• We just “average” the two estimated standard
deviations and obtain a pooled estimate.
• The variance of the mean difference is the sum
of the variances (squared standard errors) of
each mean:

Var(ȳ1 − ȳ2) = σ1²/n1 + σ2²/n2
The t-statistic
We’ll use the t-test to compare the two sample means and, using a pooled estimate
for the variance called sp², we calculate:

t = (ȳ1 − ȳ2) / √(sp²/n1 + sp²/n2)
Estimating σ
• The pooled variance estimate is a weighted
average of the two individual-group variances:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

• Under the equal variance assumption, we
calculate the p-value using df = n1 + n2 − 2.
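The pooled calculation can be sketched in a few lines of Python (an illustration of the formulas above, not JMP's implementation), using the group summaries reported later for these data (city: mean 35.9, SD 19.90, n = 92; county: mean 48.8, SD 22.05, n = 306):

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t-statistic and its degrees of freedom."""
    # Pooled variance: a weighted average of the two group variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return (mean1 - mean2) / se, n1 + n2 - 2

# County vs. city summaries from the SOL report:
t, df = pooled_t(48.8, 22.05, 306, 35.9, 19.90, 92)
# t is about 5.0 with df = 396, matching the reported equal-variance result
```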
Unequal SD
• If the variances are not equal, the calculation is
more complicated:

t′ = (ȳ1 − ȳ2) / √(s1²/n1 + s2²/n2)
t-prime
• Note that the separate variance estimates are
used in this “t prime” statistic, not the pooled
estimate of the variance.
• Further, the df is not a simple function of just n1
and n2.
• The details of these calculations are not
important.
• What we need to know is how to proceed using
JMP.
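For the curious, the t′ statistic and its approximate df can be sketched in Python (a minimal illustration with made-up numbers, chosen so that, with equal n's and equal SDs, the result matches the pooled test):

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t-statistic with its approximate df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # separate variance contributions
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite df -- generally not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# With equal n's and equal SDs, t' equals the pooled t and df = n1 + n2 - 2:
t, df = welch_t(10.0, 2.0, 50, 8.0, 2.0, 50)
# t = 5.0, df = 98.0
```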
Deciding on the correct t-test
• Which test should we use?
• We may not need to choose; if the two sample
sizes are equal (n1 = n2) the two methods give
identical results.
• It’s even pretty close if the n’s are slightly
different.
• If one n is more than 1.5 times the other (in the
SOL case, n1 = 306 and n2 = 92, so one is over 3
times as large), you’ll have to decide which t-test
to use.
Decision
• Decide whether the standard deviations
are different.
– Use the equal variance t-test if they are the
same, or
– Use the unequal variance t-test if they are
different.
• Or, you could decide not to decide; use
the unequal variance t-test. It’s more
conservative.
Determining Equal SDs
• There are three ways to make this
decision.
1. Inspect the two standard deviation
estimates.
2. Use the normal quantile plot.
3. Test for equal standard deviations.
Inspect the SDs
• Refer to the Means and Std Deviations report.
• Look at the two standard deviations, in this case
22.1 and 19.9.
• Form the ratio of the larger to the smaller.
• If the ratio is larger than about 3, then the two
SDs may be unequal (in our case, the ratio is
only about 1.1, so no warning flag is raised).
• For a better answer to the question, see the
normal quantile plot.
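This rule of thumb is a one-liner (a sketch of the informal screen, not a formal test):

```python
def sd_ratio(sd_a, sd_b):
    """Rule-of-thumb screen for unequal SDs: ratio of larger to smaller.

    Values above about 3 suggest the SDs may be unequal.
    """
    return max(sd_a, sd_b) / min(sd_a, sd_b)

# SOL data: 22.1 vs 19.9 gives a ratio of about 1.11 -- nowhere near 3.
ratio = sd_ratio(22.1, 19.9)
```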
Normal Quantile Plot, SDs
• If the two standard deviations are equal then the
slopes for the two lines in the normal quantile
plot will be the same (the lines will be parallel).
• In our case the lines have roughly the same
slope.
• So, for the SOL data the assumption of equal
variability seems safe.
Questionable Parallel?
• If the slopes of the two lines are in that “gray area”
between clearly parallel and clearly not parallel, what do
we do?
• There are four possibilities:
1. Ignore the problem and be risky: use the equal variance
t-test.
2. Ignore the problem and be conservative: use the
unequal variance t-test.
3. Make a formal test of unequal variability in the two
groups.
4. Compare the means using nonparametric methods.
What if not Parallel?
• Here, the data appear to be reasonably normal (this
isn’t the question) but the lines are not parallel – they
start out close and end far apart.
• Here, not only do the variances appear to be unequal,
but normality is also in question.
• We’ll look at this case later.
Test for equal SDs
• In the case of the first figure, where we’ll
be using a t-test but we’re not sure which
one, JMP provides a way to test for equal
variances in the main options menu.
Choosing between the tests of
equal variance
• Of the five tests – O’Brien’s, the Brown-Forsythe test,
Levene’s test, Bartlett’s test, and the F-test –
the last three are out of date and are not recommended.
• There’s not much difference between O’Brien’s and
Brown-Forsythe.
• Brown-Forsythe is more robust (resistant to outlying
observations), so we’ll use this result.
• What are we testing?
Variance test
• The null hypothesis for these tests is,
“the variances are equal.”
• So, if the Prob>F value for the Brown-
Forsythe test is < 0.05, then you will reject
the null hypothesis (universal decision
rule) and conclude that the groups have
unequal variances.
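Outside of JMP, the Brown-Forsythe statistic can be computed by hand: it is a one-way ANOVA F statistic applied to the absolute deviations from each group's median. A stdlib-only Python sketch (the p-value would come from the F distribution, omitted here):

```python
from statistics import mean, median

def brown_forsythe_F(*groups):
    """Brown-Forsythe equal-variance test statistic: a one-way ANOVA F
    computed on absolute deviations from each group's median."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(z)                              # number of groups
    n = sum(len(g) for g in z)              # total number of observations
    grand = mean([v for g in z for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Two groups with identical spread give F near 0; unequal spread inflates F.
f_equal = brown_forsythe_F([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
f_unequal = brown_forsythe_F([1, 2, 3, 4, 5], [0, 10, 20, 30, 40])
```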
Reject Equal SDs?
• The report also shows the result for the t-test to
compare the two means, allowing the standard
deviations to be unequal.
• This is the “unequal variance t-test.” Here is a
written summary of the results using this
method:
• “The two groups were compared using an
unequal variance t-test and found to be
significantly different (t = 7.1, df = 217.4, p-value
< 0.0001). School districts in cities had lower
scores ….”
Unequal test df?
• Notice the degrees of freedom – it isn’t a whole
number.
• That is because it is based on a “weighted”
contribution of each sample (with unequal ns) to
the standard error estimate.
• You can round the number off but ONLY to one
decimal place, do not round to a whole number.
Step 5: Random Variation
• Recall the rough interpretation that t’s
larger than 2 are likely not due to chance.
• This holds for either type of t.
6. State a decision rule
• The universal decision rule
• Reject H0 if the p-value < α.
Phase 3: Answer the
Question
• 7. Calculate the statistic
• There are three possible statistics that
may be appropriate:
1. an equal variance t-test,
2. an unequal variance t-test, or
3. the nonparametric Wilcoxon rank-sum test.
Equal variance
• If the equal variance assumption is reasonable, then the
standard t-test is appropriate.
• Note: When reporting a t-test it’s assumed that, unless
you specify otherwise, it’s the equal-variance t-test.
• The next figure shows the means diamonds in the dot
plot, the t-test report, and the means for a oneway
ANOVA (Analysis Of VAriance) report.
• We’ll cover oneway ANOVA later in the course. When
there are only two groups, the t-test and ANOVA give
identical results.
In JMP
• To compare the two means using an equal variance t-test in JMP:
– Choose Means/Anova/Pooled t from the main options menu.
• This adds the Oneway ANOVA report and means
diamonds.
• The t-value, df, and p-value are shown in the t-Test
report.
• However, only the two-tailed p-value is reported (under
Prob>|t|).
• If the null-hypothesis specified that we were testing for
equality, then this is the p-value we want.
One-tail p-values
• First, which group did JMP use for y1 and which
for y2? JMP uses the order of the X-variable:
• If the X-variable is character, JMP alphabetically
sorts the values and whichever comes first is y1.
• If the X-variable is numeric, JMP uses the
smallest value of the X-variable as y1.
• If your alternative was HA: µ1 > µ2 and the t-test value is positive, then the
one-tailed p-value is half the two-tail Prob>|t| in the report.
• If the t-test value was negative, then you’ve observed a difference in the
opposite direction from that expected. The p-value is one minus half the
Prob>|t| in the report.
• If your alternative was HA: µ1 < µ2 and the t-test value is negative, then the
one-tailed p-value is half the two-tail Prob>|t| in the report.
• If the t-test value was positive, then you’ve observed a difference in the
opposite direction from that expected. The p-value is one minus half the
Prob>|t| in the report.
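These four rules collapse into one small helper (a hypothetical convenience function; JMP itself only reports the two-tailed value):

```python
def one_tailed_p(prob_gt_abs_t, t_value, alternative_mu1_greater=True):
    """Convert a two-tailed Prob>|t| into a one-tailed p-value.

    alternative_mu1_greater=True means HA: mu1 > mu2, so a positive t
    is the expected direction; set it to False for HA: mu1 < mu2.
    """
    expected_sign = 1 if alternative_mu1_greater else -1
    if t_value * expected_sign > 0:   # difference in the expected direction
        return prob_gt_abs_t / 2
    return 1 - prob_gt_abs_t / 2      # difference in the opposite direction

# HA: mu1 > mu2, observed t = +2.3 with Prob>|t| = 0.04 -> p = 0.02
# HA: mu1 > mu2, observed t = -2.3 with Prob>|t| = 0.04 -> p = 0.98
```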
Unequal variance
• If the variances are not equal or if we just want a
more conservative test then see the bottom
portion of the Tests that the Variances are Equal
report.
• The unequal-variances t-test is listed as the
Welch Anova.
• Report the t-value, df and p-value, as in the
equal variance case.
Nonparametric comparison of
the medians
• If normality isn’t reasonable, then you can use a
nonparametric test.
• You will use a test that compares the medians
between the two groups.
• The nonparametric test is based solely on the
ranks of the values of the Y-variable.
• In JMP, choose
– Nonparametric > Wilcoxon test.
Wilcoxon
• The Wilcoxon rank-sum test (also called the
Mann-Whitney test) ranks all the Y-values (in
both groups) and then compares the sum of the
ranks in each group (the groups are specified by
the X-variable).
• If the median of the first group is, in fact, equal to
the median of the second group, then the rank
sums of the two groups should be equal (for
equal sample sizes).
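The mechanics can be sketched in stdlib Python using the large-sample normal approximation (a simplification: no continuity or tie-variance corrections):

```python
import math

def rank_sum_test(group1, group2):
    """Wilcoxon rank-sum test: rank all values together, sum group1's
    ranks, and get a two-sided p from the normal approximation."""
    pooled = sorted(group1 + group2)
    # Assign each distinct value its average rank (handles ties).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w = sum(rank_of[y] for y in group1)       # rank sum for group 1
    n1, n2 = len(group1), len(group2)
    mu = n1 * (n1 + n2 + 1) / 2               # mean of W under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w, p
```

For example, `rank_sum_test([1, 2, 3], [4, 5, 6])` returns a rank sum of 6 (the smallest possible for n1 = 3) and a p-value just under 0.05.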
Reporting Wilcoxon
• When reporting the results of a nonparametric test, it’s
usual to only report the p-value.
• In the above report, there are two p-values, one for the
z-test and one using a chi-square value.
• The p-values will rarely be different.
• For large samples report the p-value from the normal
approximation.
• For smaller samples, use the chi-square approximation.
• For really small samples you should probably consult a
statistician to help you obtain exact p-values.
Steps 8 & 9
• 8. Make a statistical decision
– Using all three tests, the two groups are
different. All p-values are < 0.05.
• 9. State the substantive conclusion
– The schools in cities have significantly lower
mean pass rates (35.9% vs 48.8%) and
significantly different median pass rates
(36.0% vs 49.2%).
Phase 4: Communicate the
Answer to the Question
• 10. Document our understanding with text, tables, or
figures
• The year 2000 Alabama SOL pass-rates in 10th grade
Algebra I were divided into two groups according to
whether the school was a city or county high school.
There were n = 92 schools within city school-districts
and n = 306 in county school districts. The observed
average pass rate within city schools was 35.9% (SD =
19.9) and the pass rate outside of cities was 48.8% (SD =
22.1). Using a two-tailed t-test, we conclude that the
observed means are significantly different (t = 5.0, df =
396, p-value < 0.0001). From this we conclude that city
schools have a significantly lower pass rate compared to
county schools. The 95% confidence interval about the
mean difference is between 7.8% and 17.9%.
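Given the reported difference (12.9 points) and its standard error (2.57), the interval can be reproduced approximately with a normal critical value (for df = 396 the t quantile is essentially the same, so the endpoints shift only slightly):

```python
def mean_diff_ci(diff, se, crit=1.96):
    """Approximate 95% CI for a mean difference; crit=1.96 uses the
    normal approximation, which is fine for large df."""
    return diff - crit * se, diff + crit * se

lo, hi = mean_diff_ci(12.9, 2.57)
# roughly (7.9, 17.9), close to the reported 7.8% to 17.9%
```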
Less text, replaced by
information in a table
• Alternatively, it may be more
straightforward to include many of the
numbers in a table and state your results
in text.
• So instead of the above paragraph, you
could do this:
• The summary results for the year 2000 Alabama SOL
pass rate percentages in 10th grade algebra I are shown
in Table 1. Schools were divided into cities if their
district name contained “City” and were otherwise
classified as a county school district. From these results
we conclude that schools in cities had a pass-rate that
was significantly lower than that of county schools.
county schools. The 95% confidence interval about the
mean difference is between 7.8% and 17.9%.
Table 1. 2000 Alabama SOL Pass-Rates in 10th Grade Algebra I
For Schools in Cities and in Counties

Location     Number of Schools   Pass Rate (SD)   SE     95% CI
City          92                 35.9 (19.90)     2.23   31.5 – 40.4
County       306                 48.8 (22.05)     1.62   46.4 – 51.2
Difference                       12.9*            2.57    7.8 – 17.9

* t = 5.0, df = 396, p-value < 0.0001
Less text, replaced by
information in a figure
• As another alternative, it may be more
informative to describe the results in a
figure.
• Instead of the above, you could do this:
• The summary results for the year 2000 Alabama
SOL pass rate percentages in 10th grade
algebra I are shown in Figure 12. Schools were
divided into city and county high schools. From
these results we conclude that schools in cities
had a pass-rate that was significantly lower than
that of county schools (t =
5.0, df = 396, p-value < 0.0001). The 95%
confidence interval about the mean difference is
between 7.8% and 17.9%.
[Figure: side-by-side dot plots of pass rates (0–100%) for
County High Schools (mean = 48.8, SD = 22.05) and
City High Schools (mean = 35.9, SD = 19.90).]
Figure 12. 2000 Alabama SOL Pass-Rates in 10th Grade Algebra I
For Schools in Cities and in Counties
Summary: Two Independent
Means
• Briefly, here is how to proceed when comparing the
means obtained from two independent samples.
• Describe the two groups and the values in each group.
What summary statistics are appropriate? Are there
missing values? (why?)
• Assess the normality assumption. If normality is not
warranted, then do a nonparametric test to compare the
medians.
• If normality is warranted, then assess the equal variance
assumption.
Summary (cont)
• Report confidence intervals on each of the
means if normality is reasonable.
• Perform the appropriate statistical test: equal
variance t-test, unequal variance t-test, or the
Wilcoxon rank-sum test. Determine the p-value
that corresponds to your hypothesis.
• Reject or fail to reject? State your substantive
conclusion.
Summary (cont)
• Additional note: Say you conclude that the
groups have different means. How do you
describe what the different means are?
• If you’ve followed the above recipe, you are in
one of three situations:
1. Normality is reasonable and the variances are equal
2. Normality is reasonable and the variances are
unequal
3. Normality is not warranted
Summary (cont)
• Normality is reasonable and the variances are
equal, use the equal variance t-test.
• The write up reads “… the means are
significantly different (t = x.xx, df =xxx, p-value =
0.xxxx).”
• Also give a table of means, SEs and 95%CIs
just like the “Means for Oneway Anova”.
Summary (cont)
• The variances are unequal, use the unequal-variance t-test.
• The write up reads “… the means are
significantly different (unequal variance t = x.xx,
df =xxx, p-value = 0.xxxx).”
• Also give a table of means, SEs and 95%CIs
just like the “Means and Std. Deviations” report.
• Note: the means are the same.
• The SEs and CIs are different.
Summary (cont)
• Normality is unreasonable, use Wilcoxon’s test.
• The write up reads “… the medians are
significantly different (by Wilcoxon’s rank-sum
test, p-value = 0.xxxx).”
• Also give a table of medians and IQRs.
• There is a way to put 95%CIs on these
estimates but not using any easily available
software.
Always
• Report a measure of the center and
spread.
• For all three tests, report the p-value and
make a decision based upon your
hypothesis.
• Your final statement should summarize the
results in terms of the experiment (no
statistics).