Nonparametric Tests
PBS Chapter 16
© 2009 W.H. Freeman and Company
Objectives (PBS Chapter 16)
Nonparametric Tests

The Wilcoxon rank sum test

The Normal approximation for W

What hypotheses does Wilcoxon test?

The Wilcoxon signed rank test

The Normal approximation for W+

Dealing with ties

The Kruskal-Wallis test
Assumptions for inference
For the inference methods for means we have already studied, we
assumed that the variables had Normal distributions in the
population(s) from which we drew our data.
Robustness: some skew was acceptable, especially if the sample size
was large.
What happens if plots suggest the data is clearly not Normal,
especially if the sample size is small?
Options for non-Normal data and small n
1. Is lack of Normality due to outliers? If an outlier appears to be “real data,”
   you have to leave it in, but if you have reason to think there is an error in
   that data, you may be able to remove it.
2. Try transforming the data. For example, use a logarithm for right-skewed
   data.
3. Try another standard distribution. Other procedures can replace the t
   procedures, if data (especially right-skewed data) fits another distribution.
4. Use modern bootstrap methods and permutation tests. Heavy computing
   avoids requiring Normality or any other specific form of sampling
   distribution.
5. Use other nonparametric methods. Discussed in this chapter.
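As a rough illustration of option 4, the sketch below runs a one-sided permutation test on the difference in sample means. The function name `permutation_p_value`, the number of resamples, the seed, and the data values are all assumptions made for illustration; none of them come from the slides.

```python
import random

def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """One-sided permutation P-value for Ha: mean(x) > mean(y).

    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly relabel which values belong to which group
        new_x, new_y = pooled[:len(x)], pooled[len(x):]
        diff = sum(new_x) / len(new_x) - sum(new_y) / len(new_y)
        if diff >= observed - 1e-12:  # small tolerance for floating point
            at_least_as_large += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_large + 1) / (n_resamples + 1)

# Made-up samples, the first one right-skewed (NOT data from the slides).
group_a = [12.1, 15.3, 14.8, 30.2, 13.9]
group_b = [11.0, 12.4, 10.9, 13.1, 11.7]
print(permutation_p_value(group_a, group_b))
```

No Normality condition is used anywhere: the reference distribution is built by reshuffling the observed values themselves.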
Ranks
Hypotheses for rank tests just replace the mean with the median.
For strongly skewed data, we prefer the median to the mean for
describing the center of the data.
To rank observations, first arrange them in order from smallest to
largest. The rank of each observation is its position in this ordered list,
starting with rank 1 for the smallest observation.
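The ranking rule can be sketched in a few lines of Python. The helper name `ranks` is an assumption, and the sketch assumes there are no tied values. The data are the seven smallest earnings from the bank example that follows, deliberately scrambled.

```python
def ranks(values):
    """Return the rank of each value (1 = smallest), assuming no ties."""
    # Indices of the values, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# Seven of the bank earnings, in scrambled order.
earnings = [16555, 14714, 16890, 15953, 16853, 16015, 16576]
print(ranks(earnings))  # [4, 1, 7, 2, 6, 3, 5]
```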
Example: Earnings of hourly bank workers
A large bank has been
accused of discrimination in
paying its hourly workers. The
table gives the annual
earnings of two random
samples of National Bank
workers.
Back-to-back stemplot
Back-to-back stemplot of
the earnings of black male
and female bank workers.
The stems are thousands
of dollars and the leaves
are hundreds of dollars.
The female distribution
appears reasonably
Normal. The male
distribution is strongly
skewed.
Earnings of hourly bank workers
Earnings   14,714   15,953   16,015   16,555   16,576   16,853   16,890   …
Rank            1        2        3        4        5        6        7   … and so on

First rank all 27 observations together.

Arrange them in order from smallest to largest.

The boldface numbers are the earnings of the males.

Note that the 4 lowest earnings are for women, and the 4 highest are for men.

The idea of rank tests is to look just at position.

Working with ranks allows us to dispense with the numerical values of
the data and the specific conditions on the shape of the distribution,
such as Normality.
Earnings of hourly bank workers
Compare the sums of the ranks from the two groups.
Group      Sum of Ranks
Females    183
Males      195
Because there are more women than men, we would expect the
sum of female ranks to be greater if there were no systematic
gender differences. But how much greater?
Wilcoxon Rank Sum Test
Draw an SRS of size n1 from one population and draw an independent SRS of
size n2 from a second population. There are N observations in all, where N =
n1 + n2. Rank all N observations. The sum W of the ranks for the first sample
is the Wilcoxon rank sum statistic. If the two populations have the same
continuous distribution, then W has mean
$$\mu_W = \frac{n_1(N+1)}{2}$$
and standard deviation
$$\sigma_W = \sqrt{\frac{n_1 n_2 (N+1)}{12}}$$
The Wilcoxon rank sum test rejects the hypothesis that the two populations
have identical distributions when the rank sum W is far from its mean.
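A minimal sketch of the rank sum statistic and its mean and standard deviation under the null hypothesis. The function names and the two small samples are assumptions made for illustration, and the code assumes no tied values. The last line applies the formulas to the sample sizes of the bank example (12 men, 15 women).

```python
import math

def rank_sum(sample1, sample2):
    """Wilcoxon rank sum W for sample1, assuming no tied values."""
    pooled = sorted(list(sample1) + list(sample2))
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    return sum(rank_of[v] for v in sample1)

def rank_sum_moments(n1, n2):
    """Mean and standard deviation of W when both populations are identical."""
    N = n1 + n2
    mean = n1 * (N + 1) / 2
    sd = math.sqrt(n1 * n2 * (N + 1) / 12)
    return mean, sd

# Made-up samples (NOT data from the slides).
first = [3.1, 5.4, 6.0]
second = [1.2, 2.8, 4.7, 5.0]
print(rank_sum(first, second))                  # ranks 3 + 6 + 7 = 16
print(rank_sum_moments(len(first), len(second)))

# Sample sizes from the bank example: mean 168, standard deviation about 20.494.
print(rank_sum_moments(12, 15))
```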
Earnings of hourly bank workers
Group      Sum of Ranks
Females    183
Males      195

In this study, we want to test the hypotheses:
H0: No difference in distribution of earnings of black females and males.
Ha: Male earnings are systematically higher.

The test statistic is the rank sum W = 195 for the 12 men.
Example: Earnings of hourly bank workers
N = 27, n1 (men) = 12, and n2 (women) = 15.
The sum of ranks for the 12 men has mean and standard deviation:
$$\mu_W = \frac{n_1(N+1)}{2} = \frac{12(28)}{2} = 168$$
$$\sigma_W = \sqrt{\frac{n_1 n_2 (N+1)}{12}} = \sqrt{\frac{(12)(15)(28)}{12}} = \sqrt{420} = 20.494$$
The observed rank sum W = 195 is only 1.3 standard deviations above the mean.
Software tells us that the P-value P(W ≥ 195) is 0.0998.
We cannot reject the null hypothesis.
We do not have enough evidence to say that male earnings are higher in the
entire population of National Bank hourly workers.
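One way software can obtain an exact P-value like this is to count, among all equally likely choices of n1 ranks out of 1..N, how many give a rank sum at least as large as the one observed. The sketch below does that counting with dynamic programming. The function name `exact_upper_tail` is an assumption, the method assumes no ties, and it is not necessarily the algorithm the software behind the slides used. For the bank example the result should land close to the 0.0998 quoted above.

```python
from math import comb

def exact_upper_tail(w_observed, n1, n2):
    """Exact P(W >= w_observed) under H0, assuming no ties."""
    N = n1 + n2
    max_sum = N * (N + 1) // 2
    # ways[k][s] = number of ways to choose k distinct ranks with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for rank in range(1, N + 1):
        # Loop k downward so each rank is used at most once per subset.
        for k in range(min(rank, n1), 0, -1):
            for s in range(max_sum, rank - 1, -1):
                ways[k][s] += ways[k - 1][s - rank]
    favorable = sum(ways[n1][w_observed:])
    return favorable / comb(N, n1)

# Bank example: W = 195 for the 12 men, with 15 women.
print(exact_upper_tail(195, 12, 15))
```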
The Normal Approximation for W
To calculate the P-value for the rank sum Wilcoxon test, we need to
know the sampling distribution of W when the null hypothesis is true.
With or without software, P-values for the Wilcoxon test are often based
on the fact that the rank sum statistic W becomes approximately
Normal as the two sample sizes increase.
Test statistic:
$$z = \frac{W - \mu_W}{\sigma_W} = \frac{W - n_1(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}$$
Example: Earnings of hourly bank workers
Here W = 195 with mean 168 and standard deviation 20.494, so we get
$$z = \frac{W - \mu_W}{\sigma_W} = \frac{195 - 168}{20.494} = 1.32$$
$$\text{P-value} = P(Z > 1.32) = 0.0934$$
We can improve this approximation by using the continuity correction. You
use this for a variable that takes only whole-number values, like W. Act as if
each whole number occupies the entire interval from 0.5 below the number to
0.5 above it. Then we use W = 194.5 and get:
$$z = \frac{W - \mu_W}{\sigma_W} = \frac{194.5 - 168}{20.494} = 1.29$$
$$\text{P-value} = P(Z > 1.29) = 0.0985$$
Software tells us that the exact P-value P(W ≥ 195) is 0.0998.
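The two Normal calculations for the bank example can be reproduced with the Python standard library. The function name `normal_approx_upper_tail` is an assumption. Because the code does not round z to two decimals before looking up the probability, its P-values can differ from the table-based figures in the third decimal place.

```python
import math
from statistics import NormalDist

def normal_approx_upper_tail(w, n1, n2, continuity=True):
    """Return (z, approximate P(W >= w)) from the Normal approximation."""
    N = n1 + n2
    mean = n1 * (N + 1) / 2
    sd = math.sqrt(n1 * n2 * (N + 1) / 12)
    if continuity:
        w = w - 0.5  # the whole number w occupies the interval from w - 0.5 to w + 0.5
    z = (w - mean) / sd
    return z, 1 - NormalDist().cdf(z)

print(normal_approx_upper_tail(195, 12, 15, continuity=False))  # z near 1.32
print(normal_approx_upper_tail(195, 12, 15, continuity=True))   # z near 1.29
```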
What hypotheses does Wilcoxon test?
If we assume that both populations are Normally distributed, we can use
the two-sample t test for means.
H0: μ1 = μ2
Ha: μ1 > μ2
When the distribution may not be Normal, we might restate the
hypotheses in terms of population medians rather than means.
H0: median1 = median2
Ha: median1 > median2
The Wilcoxon rank sum test will test the hypotheses above only if an
additional condition is met: both populations must have
distributions of the same shape.
What hypotheses does Wilcoxon test?
The same shape condition is too strict to be reasonable in practice.
A more useful statement of the hypotheses compares two
continuous distributions, whether or not they have the same
shape.
H0: the two distributions are the same
Ha: one has values that are systematically larger
These hypotheses are considered “nonparametric” because they do
not include a parameter. They are just stated in words.
Dealing with ties in rank tests
Up until now, our data has had no two values exactly the same.
However, we often find observations tied at the same value.
The usual practice is to assign all tied values to the average of the
ranks they occupy.
In practice, software is required to use rank tests when the data
contains tied values.
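The average-rank rule is short to code. In this sketch the helper name `midranks` and the data values are assumptions made for illustration.

```python
def midranks(values):
    """Rank values (1 = smallest); tied values get the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for position in range(start, end + 1):
            result[order[position]] = average
        start = end + 1
    return result

# Made-up values; the two 162s occupy ranks 3 and 4, so each gets 3.5.
print(midranks([153, 156, 162, 162, 165]))  # [1.0, 2.0, 3.5, 3.5, 5.0]
```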
Matched pairs: the Wilcoxon signed rank test
Example: A study of early childhood education asked kindergarten
students to tell fairy tales that had been read to them earlier in the
week. Each child told two stories. The first had been read to them and
the second had been read but also illustrated with pictures. An expert
listened to a recording of the children and assigned a score for certain
uses of language. Here are the data for five low-progress readers in a
pilot study:
Compare absolute values of the differences between the before and
after results.
Matched pairs: the Wilcoxon signed rank test
The test statistic is the sum of the ranks of the positive differences
(highlighted in blue).
This is the Wilcoxon signed rank statistic.
Its value here is W+ = 4 + 5 = 9.
Matched pairs: the Wilcoxon signed rank test
Draw an SRS of size n from a population for a matched pairs study,
and take the differences in responses within pairs. Rank the absolute
values of these differences. The sum W+ of the ranks for the positive
differences is the Wilcoxon signed rank statistic. If the distribution of
the responses is not affected by the different treatments within pairs,
then W+ has mean and standard deviation:
$$\mu_{W^+} = \frac{n(n+1)}{4}$$
$$\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
The Wilcoxon signed rank test rejects the hypothesis that there are
no systematic differences within pairs when the rank sum W+ is far from
its mean.
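A minimal sketch of the signed rank computation. The function name `signed_rank` is an assumption, and the five differences are invented so that the positive ones receive ranks 4 and 5, as in the storytelling example; they are not the study's actual data, which this transcript does not reproduce. The code assumes no zero differences and no tied absolute values.

```python
import math

def signed_rank(differences):
    """Return (W+, mean, sd), assuming no zeros and no tied absolute values."""
    n = len(differences)
    by_size = sorted(differences, key=abs)  # smallest absolute difference first
    # Sum the ranks that belong to positive differences.
    w_plus = sum(rank for rank, d in enumerate(by_size, start=1) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, mean, sd

# Invented differences (NOT the slide's data table).
diffs = [0.4, -0.25, 0.7, -0.1, -0.15]
print(signed_rank(diffs))  # W+ = 4 + 5 = 9, mean 7.5, sd about 3.708
```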
Matched pairs: the Wilcoxon signed rank test
For the storytelling example, W+ = 9 and n = 5, so the mean and
standard deviation are:
$$\mu_{W^+} = \frac{n(n+1)}{4} = \frac{5(5+1)}{4} = 7.5$$
$$\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{5(6)(11)}{24}} = 3.708$$
The observed value W+ = 9 is only slightly larger than the mean (about 0.4
standard deviations), so we expect that the result is not statistically
significant. The data show a small effect but not a significant one. A larger
sample might provide enough evidence to detect it.
The P-value from software is 0.4062, which agrees with this conclusion.
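With n = 5 the exact P-value can be checked by brute force: under the null hypothesis each of the ranks 1 to 5 is equally likely to belong to a positive or a negative difference, so there are 2^5 = 32 equally likely sign patterns. The function name below is an assumption, and the sketch assumes no ties and no zero differences.

```python
from itertools import product

def signed_rank_exact_upper_tail(w_plus, n):
    """Exact P(W+ >= w_plus) under H0 by enumerating all 2**n sign patterns."""
    count = 0
    for signs in product([0, 1], repeat=n):
        # Sum of the ranks marked as positive in this pattern.
        total = sum(rank for rank, s in enumerate(signs, start=1) if s)
        if total >= w_plus:
            count += 1
    return count / 2 ** n

print(signed_rank_exact_upper_tail(9, 5))  # 0.40625, i.e. 13 of 32 patterns
```

Rounded to four decimals this is the 0.4062 reported by software. Enumeration is only practical for small n, because the number of patterns doubles with each added pair.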
The Normal approximation for W+
The distribution of the signed rank statistic when the null
hypothesis (no difference) is true becomes approximately
Normal as the sample size becomes large.
We can then use Normal probability calculations (with the
continuity correction) to obtain approximate P-values for W+.
For the storytelling example (although n = 5 is not a large
sample), our P-value is really P(W+ ≥ 9), but with the
continuity correction we change it to:
$$P(W^+ \ge 8.5) = P\left(Z \ge \frac{8.5 - 7.5}{3.708}\right) = P(Z \ge 0.27) = 0.394$$
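The same calculation as a short standard-library sketch. The function name `signed_rank_normal_upper_tail` is an assumption. Because z is not rounded to 0.27 before the lookup, the result can differ slightly from 0.394 in the later decimals.

```python
import math
from statistics import NormalDist

def signed_rank_normal_upper_tail(w_plus, n):
    """Return (z, approximate P(W+ >= w_plus)) with the continuity correction."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - 0.5 - mean) / sd  # use w_plus - 0.5 in place of w_plus
    return z, 1 - NormalDist().cdf(z)

print(signed_rank_normal_upper_tail(9, 5))  # z near 0.27
```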
Dealing with ties in the signed rank test
Ties among absolute differences:
• Handle just like the rank sum test: assign average ranks.
• Makes finding the P-value more complicated.
• The usual exact distribution of W+ no longer applies.
• The standard deviation needs to be adjusted before the Normal
approximation can be used.
Ties within a pair:
• Create a difference of 0 (before = after).
• Because these differences are neither positive nor negative, we
drop these pairs from our sample.
• This only reduces the number of observations, n.
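Both kinds of ties can be handled in one sketch: zero differences are dropped, tied absolute differences get average ranks, and the variance is reduced by the standard tie correction, the sum of (t^3 - t)/48 over groups of t tied values. That correction formula is not given on the slides; it is the usual textbook adjustment. The function name and the data are assumptions made for illustration.

```python
import math

def signed_rank_with_ties(differences):
    """Return (W+, mean, tie-adjusted sd); zero differences are dropped."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    sizes = sorted(abs(d) for d in nonzero)
    # Positions (1-based) occupied by each distinct absolute difference.
    positions = {}
    for position, size in enumerate(sizes, start=1):
        positions.setdefault(size, []).append(position)
    midrank = {size: sum(p) / len(p) for size, p in positions.items()}
    w_plus = sum(midrank[abs(d)] for d in nonzero if d > 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract sum(t^3 - t) / 48 over tied groups of size t.
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values()) / 48
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)
    return w_plus, mean, sd

# Made-up differences: one zero (dropped) and a tie between |2| and |-2|.
print(signed_rank_with_ties([3, -1, 0, 2, -2, 5]))
```

In this made-up data set the tie shrinks the variance from 13.75 to 13.625, so the adjusted standard deviation is slightly smaller than the untied value for n = 5.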
Comparing several samples: the Kruskal-Wallis test
ANOVA hypotheses:
Data should come from independent random samples, all Normally distributed
with the same standard deviation
Kruskal-Wallis hypotheses:
1. Data should come from independent random samples. The response has a
continuous (but not necessarily Normal) distribution.
2. Data should come from independent random samples. The response has a
continuous (but not necessarily Normal) distribution, and the samples come
from population distributions of the same shape (not necessarily Normal).
H0: M0 = M1 = M3 = M9
Ha: not all four medians are equal
Example: Weeds among the corn
Lamb’s-quarter is a common weed in corn fields. A researcher planted corn at
the same rate in 16 small plots of ground, then randomly assigned the plots to 4
groups. He weeded the corn rows by hand to allow a fixed number of
lamb’s-quarter plants to grow in each meter of corn row. These numbers were 0, 1, 3,
and 9 in the four groups of plots. No other weeds were allowed to grow, and all
plots received identical treatment except for the weeds.
Here are the yields of corn (bushels per acre) in each of the plots.
Example: Weeds among the corn
Here are the summary statistics for the corn yield.
Can we safely use ANOVA? The standard deviations don’t pass the
largest s < 2 (smallest s) test, and there were outliers in the original
data that cannot be removed.
Can we use the median Kruskal-Wallis test? The different standard
deviations suggest that the distributions do not all have the same
shape.
Example: Weeds among the corn
Rank all 16 observations in order from smallest to largest.
Note the tied
observations
Kruskal-Wallis test statistic
Example: Weeds among the corn
Kruskal-Wallis test statistic:
Using Table E with df = 3, the P-value is 0.10 < P < 0.15.
We do not reject the null hypothesis.
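The slide's formula did not survive in this transcript. The usual form of the statistic is H = 12 / (N(N+1)) × Σ (R_i² / n_i) − 3(N+1), where R_i is the sum of the ranks in group i, n_i is the group size, and N is the total number of observations. The sketch below computes H with average ranks for ties and omits the tie correction that some software applies. The function names and the yields are assumptions made for illustration; the yields are not the study's data. The P-value helper uses a closed form that holds only for 3 degrees of freedom, which matches the four-group case.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H using average ranks for ties (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    N = len(pooled)
    # Positions (1-based) occupied by each distinct value in the pooled ordering.
    positions = {}
    for position, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(position)
    midrank = {v: sum(p) / len(p) for v, p in positions.items()}
    total = 0.0
    for g in groups:
        rank_sum = sum(midrank[v] for v in g)
        total += rank_sum ** 2 / len(g)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

def chi_square_upper_tail_df3(x):
    """P(chi-square with 3 degrees of freedom >= x); closed form valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical yields for four groups of four plots (NOT the slide's data).
yields = [
    [170.1, 168.4, 172.9, 165.2],
    [166.0, 160.3, 167.5, 161.8],
    [158.2, 171.0, 154.6, 157.9],
    [162.5, 149.7, 163.3, 160.9],
]
h = kruskal_wallis_h(yields)
print(h, chi_square_upper_tail_df3(h))
```

For 3 degrees of freedom the chi-square critical values for upper-tail probabilities 0.15 and 0.10 are about 5.32 and 6.25, so a P-value between 0.10 and 0.15 corresponds to an H between those two values.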