Slide 1
Copyright © 2004 Pearson Education, Inc.
Chapter 12
Nonparametric Statistics
12-1 Overview
12-2 Sign Test
12-3 Wilcoxon Signed-Ranks Test for Matched Pairs
12-4 Wilcoxon Rank-Sum Test for Two Independent Samples
12-5 Kruskal-Wallis Test
12-6 Rank Correlation
12-7 Runs Test for Randomness
Slide 2
Slide 3
Section 12-1 & 12-2
Overview and Sign Test
Created by Erin Hodgess, Houston, Texas
Overview
Slide 4
Definitions

Parametric tests
Parametric tests require assumptions about the
nature or shape of the populations involved.

Nonparametric tests
Nonparametric tests do not require such
assumptions. Consequently, these tests are
called distribution-free tests.
Advantages of Nonparametric
Methods
Slide 5
1. Nonparametric methods can be applied to a wide variety
of situations because they do not have the more rigid
requirements of the corresponding parametric methods.
In particular, nonparametric methods do not require
normally distributed populations.
2. Unlike parametric methods, nonparametric methods
can often be applied to nonnumerical data, such as the
genders of survey respondents.
3. Nonparametric methods usually involve simpler
computations than the corresponding parametric
methods and are therefore easier to understand and
apply.
Disadvantages of
Nonparametric Methods
Slide 6
1. Nonparametric methods tend to waste
information because exact numerical data are
often reduced to a qualitative form.
2. Nonparametric tests are not as efficient as
parametric tests, so with a nonparametric test
we generally need stronger evidence (such as a
larger sample or greater differences) before we
reject a null hypothesis.
Efficiency of
Nonparametric Methods
Slide 7
Definition
Slide 8
Data are sorted when they are arranged
according to some criterion, such as smallest
to largest or best to worst. A rank is a
number assigned to an individual sample item
according to its order in the sorted list. The
first item is assigned the rank of 1, the
second is assigned the rank of 2, and so on.
Example
Slide 9
Original scores: 5, 3, 40, 10, 12
Scores arranged in order: 3, 5, 10, 12, 40
Ranks: 1, 2, 3, 4, 5
Handling Ties in Ranks
Slide 10
Find the mean of the ranks involved and assign
this mean rank to each of the tied items.
Original scores: 3, 5, 5, 10, 12
Ranks: 1, 2.5, 2.5, 4, 5
Ranks 2 and 3 are tied, so each of the two scores of 5 receives the mean rank 2.5.
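The ranking rule, including the mean-rank treatment of ties, can be sketched in a few lines of Python. This is only an illustration: the function name `mean_ranks` is made up for this sketch and is not part of the slides.

```python
def mean_ranks(values):
    """Return the rank of each value (1 = smallest), using mean ranks for ties."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end hold ranks pos+1..end+1; assign their mean.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


print(mean_ranks([5, 3, 40, 10, 12]))  # [2.0, 1.0, 5.0, 3.0, 4.0]
print(mean_ranks([3, 5, 5, 10, 12]))   # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The ranks are returned in the order of the original scores, so the first call gives the rank of 5, then of 3, and so on.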
Sign Test
Slide 11
Definition
The sign test is a nonparametric (distribution-free)
test that uses plus and minus signs to
test different claims, including:
1) Claims involving matched pairs of
sample data;
2) Claims involving nominal data;
3) Claims about the median of a
single population.
Figure 12-1
Sign Test Procedure
Slides 12-14
Assumptions
Slide 15
1. The sample data have been randomly selected.
2. There is no requirement that the sample data come from a population with a
particular distribution, such as a normal distribution.
Notation for Sign Test
Slide 16
x = the number of times the less frequent sign occurs
n = the total number of positive and negative signs combined
Test Statistic for
the Sign Test
Slide 17
For n ≤ 25: x (the number of times the less frequent sign occurs)
For n > 25:
$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$
Critical values:
For n ≤ 25, critical x values are in Table A-7
For n > 25, critical z values are in Table A-2
Claims Involving
Matched Pairs
Slide 18
Convert the raw data to plus and minus signs as follows:
1. Subtract each value of the second variable from the corresponding value
of the first variable.
2. Record only the sign of the difference found in step 1.
Exclude ties: that is, any matched pairs in
which both values are equal.
Key Principle
of Sign Test
Slide 19
If the two sets of data have equal
medians, the number of positive
signs should be approximately equal
to the number of negative signs.
Example:
Intelligence in Children
Slide 20
Use the data in Table 12-2 with a 0.05 significance level to
test the claim that there is no difference between the times
of the first and second trials.
Example:
Intelligence in Children
Slide 21
Example:
Intelligence in Children
Slide 22
Use the data in Table 12-2 with a 0.05 significance level to
test the claim that there is no difference between the times
of the first and second trials.
H0: The median of the difference is equal to 0.
H1: The median of the difference is not equal to 0.
α = 0.05
x = minimum(12, 2) = 2
Critical value = 2
Example:
Intelligence in Children
Slide 23
Use the data in Table 12-2 with a 0.05 significance level to
test the claim that there is no difference between the times
of the first and second trials.
We reject the null hypothesis because the test statistic x = 2 is less than
or equal to the critical value of 2.
There is sufficient evidence to warrant rejection of the
claim of no difference between the times; that is, the claim that the
median of the differences is equal to 0.
Example:
Gender Discrimination
Slide 24
Hatters Restaurant Chain hired 30 men and 70 women. Use
the sign test and a 0.05 significance level to test the null
hypothesis that men and women are hired equally by this
company.
H0: p = 0.5
H1: p ≠ 0.5
x = minimum(30, 70) = 30
Example:
Gender Discrimination
Slide 25
Hatters Restaurant Chain hired 30 men and 70 women. Use
the sign test and a 0.05 significance level to test the null
hypothesis that men and women are hired equally by this
company.
$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \frac{(30 + 0.5) - \frac{100}{2}}{\frac{\sqrt{100}}{2}} = -3.90$$
Example:
Gender Discrimination
Slide 26
Hatters Restaurant Chain hired 30 men and 70 women. Use
the sign test and a 0.05 significance level to test the null
hypothesis that men and women are hired equally by this
company.
With α = 0.05, the critical values are z = ±1.96. The test statistic
z = –3.90 falls in the critical region.
We reject the null hypothesis.
There is sufficient evidence to warrant rejection of the
claim that hiring practices are fair.
Example:
Body Temperature
Slide 27
Use the 106 temperatures in Data Set 4 on Day 2 with the
sign test to test the claim that the median is less than
98.6°F. There are 68 subjects with temperatures less
than 98.6°F, 23 subjects with temperatures greater than 98.6°F,
and 15 subjects with temperatures equal to 98.6°F.
H0: Median is equal to 98.6°F.
H1: Median is less than 98.6°F.
Example:
Body Temperature
Slide 28
Use the 106 temperatures in Data Set 4 on Day 2 with the
sign test to test the claim that the median is less than
98.6°F. There are 68 subjects with temperatures less
than 98.6°F, 23 subjects with temperatures greater than 98.6°F,
and 15 subjects with temperatures equal to 98.6°F.
The 15 ties are excluded, so n = 68 + 23 = 91 and x = 23.
$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \frac{(23 + 0.5) - \frac{91}{2}}{\frac{\sqrt{91}}{2}} = -4.61$$
Example:
Body Temperature
Slide 29
Use the 106 temperatures in Data Set 4 on Day 2 with the
sign test to test the claim that the median is less than
98.6°F. There are 68 subjects with temperatures less
than 98.6°F, 23 subjects with temperatures greater than 98.6°F,
and 15 subjects with temperatures equal to 98.6°F.
We use Table A-2 to get the critical z value of –1.645. We
can see that the test statistic of z = –4.61 falls into the
critical region. We therefore reject the null hypothesis.
We support the claim that the median body temperature
of healthy adults is less than 98.6°F.
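The sign test computations in this section can be reproduced with a short Python sketch. It assumes the counts of the two signs are already known and ties have been discarded; the function name `sign_test_statistic` is illustrative only. Critical values still come from Table A-7 or Table A-2.

```python
import math


def sign_test_statistic(count_a, count_b):
    """Return ("x", x) when n <= 25, otherwise ("z", z).

    count_a and count_b are the numbers of the two signs (ties excluded).
    """
    x = min(count_a, count_b)   # the less frequent sign
    n = count_a + count_b       # total number of signs
    if n <= 25:
        return ("x", x)
    # Normal approximation with the 0.5 continuity correction.
    z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
    return ("z", z)


print(sign_test_statistic(12, 2))                # ('x', 2)
print(round(sign_test_statistic(30, 70)[1], 2))  # -3.9
print(round(sign_test_statistic(68, 23)[1], 2))  # -4.61
```

The three calls correspond to the matched-pairs, hiring, and body temperature examples.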
Slide 30
Section 12-3
Wilcoxon Signed-Ranks
Test for Matched Pairs
Definition
Slide 31
The Wilcoxon signed-ranks test is a
nonparametric test that uses ranks of
sample data consisting of matched pairs.
It is used to test for differences in the
population distributions.
Wilcoxon
Signed-Ranks Tests
Slide 32
H0: The two samples come from populations
with the same distribution.
H1: The two samples come from populations
with different distributions.
Procedure for Finding the
Value of the Test Statistic
Slide 33
Step 1: For each pair of data, find the difference d by subtracting the
second score from the first. Keep the signs, but discard any pairs
for which d = 0.
Step 2: Ignore the signs of the differences, then sort the differences
from lowest to highest and replace the differences by the
corresponding rank value. When differences have the same
numerical value, assign to them the mean of the ranks involved
in the tie.
Step 3: Attach to each rank the sign of the difference from which it came.
That is, insert those signs that were ignored in step 2.
Step 4: Find the sum of the absolute values of the negative ranks.
Also find the sum of the positive ranks.
(continued)
Procedure for Finding the
Value of the Test Statistic
Slide 34
Step 5: Let T be the smaller of the two sums found in step 4. Either
sum could be used, but for a simplified procedure we
arbitrarily select the smaller of the two sums.
Step 6: Let n be the number of pairs of data for which the
difference d is not 0.
Step 7: Determine the test statistic and critical values based on the
sample size, as shown below.
Step 8: When forming the conclusion, reject the null hypothesis if
the sample data lead to a test statistic that is in the critical
region; that is, the test statistic is less than or equal
to the critical value(s). Otherwise, fail to reject the null
hypothesis.
Wilcoxon
Signed-Ranks Tests
Assumptions
Slide 35
1. The sample data have been randomly selected.
2. The population of differences (found from the pairs of data) has a
distribution that is approximately symmetric, meaning that the left half
of its histogram is roughly a mirror image of its right half. (There is no
requirement that the data have a normal distribution.)
Notation
Slide 36
T = the smaller of the following two
sums:
1. The sum of the absolute values
of the negative ranks
2. The sum of the positive ranks
Test Statistic for the
Wilcoxon Signed-Ranks Test
for Matched Pairs
Slide 37
For n ≤ 30: T
For n > 30:
$$z = \frac{T - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}$$
Critical values:
For n ≤ 30, critical T values are in Table A-8
For n > 30, critical z values are in Table A-2
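Steps 1 through 6 of the procedure, together with the large-sample formula, might be sketched in Python as follows. The function names and the small data set are hypothetical and chosen only for illustration; they are not the values in Table 12-3.

```python
import math


def signed_rank_T(first, second):
    """Return (T, n) following Steps 1-6 of the signed-ranks procedure."""
    # Step 1: differences (first minus second), discarding pairs with d = 0.
    d = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank the absolute differences; ties share the mean rank.
    abs_sorted = sorted(abs(v) for v in d)

    def mean_rank(v):
        lowest = abs_sorted.index(v) + 1            # first rank held by v
        highest = lowest + abs_sorted.count(v) - 1  # last rank held by v
        return (lowest + highest) / 2

    # Steps 3-4: sum the ranks separately according to the sign of d.
    positive_sum = sum(mean_rank(abs(v)) for v in d if v > 0)
    negative_sum = sum(mean_rank(abs(v)) for v in d if v < 0)
    # Steps 5-6: T is the smaller sum; n counts the nonzero differences.
    return min(positive_sum, negative_sum), len(d)


def signed_rank_z(T, n):
    """Large-sample (n > 30) test statistic."""
    return (T - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Hypothetical matched pairs (not from Table 12-3).
first = [10, 12, 9, 15, 11, 8]
second = [8, 12, 10, 11, 8, 9]
T, n = signed_rank_T(first, second)
print(T, n)  # 3.0 5
```

Here the differences are 2, 0, –1, 4, 3, –1; the zero is dropped, the two differences of size 1 share the rank 1.5, and the negative ranks sum to 3.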
Example:
Intelligence in Children
Slide 38
Use the data in Table 12-3 with the Wilcoxon signed-ranks
test and 0.05 significance level to test the claim that there is
no difference between the times of the first and second
trials.
Slide 39
H0: There is no difference between the times of the first and second trials.
H1: There is a difference between the times of the
first and second trials.
The differences in row three of the table are
found by computing the first time – second
time.
The ranks of differences in row four of the table
are found by ranking the absolute differences,
handling ties by assigning the mean of the
ranks.
The signed ranks in row five of the table are
found by attaching the sign of the differences to
the ranks.
Slide 40
H0: There is no difference between the times of the first and second trials.
H1: There is a difference between the times of the
first and second trials.
Find the sum of the absolute values of the
negative ranks: 5.5
Find the sum of the values of the positive
ranks: 99.5
T = 5.5 (the smaller of the two sums)
Let n be the number of pairs where d ≠ 0, so n = 14.
Since n ≤ 30, T = 5.5 will be the test statistic.
Using Table A-8, the critical value will be 21.
Slide 41
H0: There is no difference between the times of the first and second trials.
H1: There is a difference between the times of the
first and second trials.
Since the test statistic (T = 5.5) is less
than the critical value of 21, we reject the
null hypothesis (Step 8 of procedures).
It appears that there is a difference
between the times of the first and second
trials.
Slide 42
Section 12-4
Wilcoxon Rank-Sum Test
for Two Independent
Samples
Wilcoxon Rank-Sum Test
for Two Independent Samples
Slide 43
Definition
The Wilcoxon rank-sum test is a
nonparametric test that uses ranks of sample
data from two independent populations. It is
used to test the null hypothesis that the two
independent samples come from populations
with the same distribution. (That is, the two
populations are identical.)
Key Idea
Slide 44
If two samples are drawn from
identical populations and the
individual values are all ranked as
one combined collection of values,
then the high and low ranks should
fall evenly between the two samples.
Assumptions
Slide 45
1. There are two independent samples that were
randomly selected.
2. Each of the two samples has more than 10
values.
3. There is no requirement that the two
populations have a normal distribution or any
other particular distribution.
Procedure for Finding the
Value of the Test Statistic
Slide 46
1. Temporarily combine the two samples into one
big sample, then replace each sample value
with its rank.
2. Find the sum of the ranks for either one of the
two samples.
3. Calculate the value of the z test statistic
as shown next, where either sample can be used
as ‘sample 1’.
Notation for the
Wilcoxon Rank-Sum Test
Slide 47
n1 = size of sample 1
n2 = size of sample 2
R1 = sum of ranks for sample 1
R2 = sum of ranks for sample 2
R = same as R1 (sum of ranks for sample 1)
μR = mean of the sample R values that is expected
when the two populations are identical
σR = standard deviation of the sample R values that is
expected when the two populations are identical
Test Statistic for the
Wilcoxon Rank-Sum Test for
Two Independent Samples
Slide 48
$$z = \frac{R - \mu_R}{\sigma_R}$$
where
$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2}$$
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
n1 = size of the sample from which the rank sum R is found
n2 = size of the other sample
R = sum of ranks of the sample with size n1
Test Statistic for the
Wilcoxon Rank-Sum Test for
Two Independent Samples
Slide 49
Critical Values
Can be found in Table A-2 (because
the test statistic is based on the
normal distribution)
Example:
Rowling and Tolstoy
Slide 50
Use the data in Table 12-4 with the Wilcoxon rank-sum test
and a 0.05 significance level to test the claim that reading
scores for pages from the two books have the same
distribution.
Example:
Rowling and Tolstoy
Slide 51
Use the data in Table 12-4 with the Wilcoxon rank-sum test
and a 0.05 significance level to test the claim that reading
scores for pages from the two books have the same
distribution.
H0: The Rowling and Tolstoy books have Flesch Reading
Ease scores with the same distribution.
H1: The Rowling and Tolstoy books have distributions of
Flesch Reading Ease scores that are different in some
way.
Example:
Rowling and Tolstoy
Slide 52
Use the data in Table 12-4 with the Wilcoxon rank-sum test
and a 0.05 significance level to test the claim that reading
scores for pages from the two books have the same
distribution.
R = 24 + 22 + 18 + ... + 9.5 = 236.5
$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2} = \frac{13 (13 + 12 + 1)}{2} = 169$$
Example:
Rowling and Tolstoy
Slide 53
Use the data in Table 12-4 with the Wilcoxon rank-sum test
and a 0.05 significance level to test the claim that reading
scores for pages from the two books have the same
distribution.
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(13)(12)(13 + 12 + 1)}{12}} = 18.385$$
Example:
Rowling and Tolstoy
Slide 54
Use the data in Table 12-4 with the Wilcoxon rank-sum test
and a 0.05 significance level to test the claim that reading
scores for pages from the two books have the same
distribution.
$$z = \frac{R - \mu_R}{\sigma_R} = \frac{236.5 - 169}{18.385} = 3.67$$
Example:
Rowling and Tolstoy
Slide 55
Use the data in Table 12-4 with the Wilcoxon rank-sum test
and a 0.05 significance level to test the claim that reading
scores for pages from the two books have the same
distribution.
We have a two-tailed test with α = 0.05, so the critical
values are 1.96 and –1.96. The test statistic of 3.67 falls
in the critical region, so we reject the null hypothesis
that the Rowling and Tolstoy books have Flesch Reading Ease
scores with the same distribution.
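The Rowling and Tolstoy arithmetic can be checked with a small Python sketch. It assumes the rank sum R has already been found from the combined ranking; the function name `rank_sum_z` is illustrative only.

```python
import math


def rank_sum_z(R, n1, n2):
    """Return (mu_R, sigma_R, z) for the Wilcoxon rank-sum test.

    R is the rank sum of the sample of size n1; n2 is the other sample size.
    """
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (R - mu_R) / sigma_R
    return mu_R, sigma_R, z


mu_R, sigma_R, z = rank_sum_z(236.5, 13, 12)
print(mu_R, round(sigma_R, 3), round(z, 2))  # 169.0 18.385 3.67
```

Because |z| = 3.67 exceeds 1.96, the sketch agrees with the decision to reject the null hypothesis at the 0.05 level.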
Example: Wednesday
and Saturday Rain
Slide 56
Use the data from the Chapter Problem (shown in the
Minitab printout) with the Wilcoxon rank-sum test to test
the claim that the rainfall amounts for Wednesdays and
Saturdays have the same distribution.
Example: Wednesday
and Saturday Rain
Slide 57
Use the data from the Chapter Problem (shown in the
Minitab printout) with the Wilcoxon rank-sum test to test
the claim that the rainfall amounts for Wednesdays and
Saturdays have the same distribution.
H0: The Wednesday and Saturday rainfall amounts come
from populations with the same distribution.
H1: The two distributions are different in some way.
Example: Wednesday
and Saturday Rain
Slide 58
Use the data from the Chapter Problem (shown in the
Minitab printout) with the Wilcoxon rank-sum test to test
the claim that the rainfall amounts for Wednesdays and
Saturdays have the same distribution.
The rank sum is W = 2639.0, the P-value = 0.2773 (or
0.1992 after adjustment for ties). We cannot reject the null
hypothesis. The differences between Wednesday and
Saturday are not significant.
Slide 59
Section 12-5
Kruskal-Wallis Test
Kruskal-Wallis Test
Slide 60
(also called the H test)
Definition
• The Kruskal-Wallis test is a
nonparametric test that uses ranks of
sample data from three or more
independent populations.
• It is used to test the null hypothesis that
the independent samples come from
populations with the same distribution.
Kruskal-Wallis Test
Slide 61
(also called the H test)
Hypotheses
• H0: The samples come from populations
with the same distribution.
• H1: The samples come from populations
with different distributions.
Kruskal-Wallis Test
Slide 62
We compute the test statistic H, which has a
distribution that can be approximated by the
chi-square (χ²) distribution as long as each sample
has at least 5 observations.
Procedure for Finding the
Value of the Test Statistic
Slide 63
1. Temporarily combine all samples into one big
sample and assign a rank to each sample value.
(Sort from lowest to highest, and in cases of ties,
assign each observation the mean of the ranks
involved.)
2. For each sample, find the sum of the ranks
and find the sample size.
3. Calculate H by using the results of Step 2 and the
formula for the test statistic H.
Assumptions
Slide 64
1. We have at least three independent samples, all
of which are randomly selected.
2. Each sample has at least 5 observations.
3. There is no requirement that the populations
have a normal distribution or any other particular
distribution.
Notation for the
Kruskal-Wallis Test
Slide 65
• N = total number of observations combined
• k = number of samples
• R1 = sum of ranks for sample 1
• n1 = number of observations in sample 1
• For sample 2, the sum of ranks is R2 and the
number of observations is n2, and similar
notation is used for the other samples.
Test Statistic for the
Kruskal-Wallis Test
Slide 66
$$H = \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k} \right) - 3(N + 1)$$
• where degrees of freedom = k – 1
Test Statistic for the
Kruskal-Wallis Test
Slide 67
Critical Values
1. Test is right-tailed.
2. Use Table A-4 (because the H
test statistic can be approximated by
the χ² distribution).
3. Degrees of freedom = k – 1
Example: Clancy,
Rowling and Tolstoy
Slide 68
Use the data in Table 12-5 with the Kruskal-Wallis test to
test the claim that reading scores for pages from the three
samples have the same distribution.
Example: Clancy,
Rowling and Tolstoy
Slide 69
Use the data in Table 12-5 with the Kruskal-Wallis test to
test the claim that reading scores for pages from the three
samples have the same distribution.
H0: The populations of the readability scores for pages
from the three books are identical.
H1: The three populations are not identical.
Example: Clancy,
Rowling and Tolstoy
Slide 70
Use the data in Table 12-5 with the Kruskal-Wallis test to
test the claim that reading scores for pages from the three
samples have the same distribution.
n1 = 12
n2 = 12
n3 = 12
N = 36
R1 = 201.5
R2 = 337
R3 = 127.5
Example: Clancy,
Rowling and Tolstoy
Slide 71
Use the data in Table 12-5 with the Kruskal-Wallis test to
test the claim that reading scores for pages from the three
samples have the same distribution.
$$H = \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k} \right) - 3(N + 1)$$
$$H = \frac{12}{36(36 + 1)} \left( \frac{201.5^2}{12} + \frac{337^2}{12} + \frac{127.5^2}{12} \right) - 3(36 + 1)$$
H = 16.949
Example: Clancy,
Rowling and Tolstoy
Slide 72
Use the data in Table 12-5 with the Kruskal-Wallis test to
test the claim that reading scores for pages from the three
samples have the same distribution.
The critical value is χ² = 5.991, which corresponds to 2
degrees of freedom and a 0.05 level of significance. Because
H = 16.949 exceeds 5.991, we reject the null hypothesis that the
three populations are identical.
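The H computation for this example can be verified with a brief Python sketch. It takes the rank sums and sample sizes as given and does not apply any adjustment for ties; the function name `kruskal_wallis_h` is illustrative only.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Return the Kruskal-Wallis H statistic (no correction for ties)."""
    N = sum(sizes)  # total number of observations combined
    # Sum of R_i^2 / n_i over the k samples.
    total = sum(R ** 2 / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)


rank_sums = [201.5, 337, 127.5]
sizes = [12, 12, 12]
H = kruskal_wallis_h(rank_sums, sizes)
degrees_of_freedom = len(sizes) - 1
print(round(H, 3), degrees_of_freedom)  # 16.949 2
```

A quick sanity check on the inputs: the rank sums add to 666, which equals N(N + 1)/2 for N = 36.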
Example: Rains
More on Weekends?
Slide 73
Use the Data Set 11 in Appendix B to test the claim that the
seven weekdays have distributions that are not all the
same.
Example: Rains
More on Weekends?
Slide 74
Use the Data Set 11 in Appendix B to test the claim that the
seven weekdays have distributions that are not all the
same.
H0: The populations of the weekday rainfall data are
identical.
H1: The populations of the weekday rainfall data are not
identical.
Example: Rains
More on Weekends?
Slide 75
Use the Data Set 11 in Appendix B to test the claim that the
seven weekdays have distributions that are not all the
same.
The test statistic is H = 3.85 (adjusted for ties), and the
P-value is 0.697. We fail to reject the null hypothesis.
There is not enough evidence to support a claim that the
rainfall amounts on the seven weekdays have
distributions that are not all the same.
Slide 76
Section 12-6
Rank Correlation
Rank Correlation
Slide 77
Definition
• Rank correlation uses the ranks of sample data
consisting of matched pairs.
• The rank correlation test is used to test for an
association between two variables.
• H0: ρs = 0 (There is no correlation between the two variables.)
• H1: ρs ≠ 0 (There is a correlation between the two variables.)
Advantages
Slide 78
1. The nonparametric method of rank
correlation can be used in a wider variety of
circumstances than the parametric method of
linear correlation. With rank correlation, we can
analyze paired data that are ranks or can be
converted to ranks.
2. Rank correlation can be used to detect some
(not all) relationships that are not linear.
3. The computations for rank correlation are
much simpler than the computations for
linear correlation, as can be readily seen by
comparing the formulas used to compute these
statistics.
Disadvantages
A disadvantage of rank correlation is its efficiency
rating of 0.91, as described in Section 12-1. This
efficiency rating shows that with all other
circumstances being equal, the nonparametric
approach of rank correlation requires 100 pairs of
sample data to achieve the same results as only 91
pairs of sample observations analyzed through
parametric methods.
Slide 79
Assumptions
Slide 80
1. The sample data have been
randomly selected.
2. Unlike the parametric methods
of Section 9-2, there is no
requirement that the sample pairs of
data have a bivariate normal
distribution. There is no requirement
of a normal distribution for any
population.
Notation
Slide 81
rs = rank correlation coefficient for sample paired data
(rs is a sample statistic)
ρs = rank correlation coefficient for all the population
data (ρs is a population parameter)
n = number of pairs of data
d = difference between ranks for the two values within a pair
rs is often called
Spearman’s rank correlation coefficient
Test Statistic for the Rank
Correlation Coefficient
Slide 82
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where each value of d is a difference between
the ranks for a pair of sample data
Critical values:
• If n ≤ 30, refer to Table A-9
• If n > 30, use Formula 12-1
Formula 12-1
Slide 83
$$r_s = \frac{\pm z}{\sqrt{n - 1}}$$
(critical values when n > 30)
where the value of z corresponds to the
significance level
Figure 12-4
Rank Correlation for Testing H0: ρs = 0
Slide 84
Start
Are the n pairs of data in the form of ranks?
No: Convert the data of the first sample to ranks from 1 to n
and then do the same for the second sample.
Yes: Continue with the next step.
Calculate the difference d for each pair of ranks by subtracting
the lower rank from the higher rank.
Square each difference d and then find the sum of those
squares to get Σd².
Complete the computation of
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
to get the sample statistic.
Figure 12-4
Rank Correlation for Testing H0: ρs = 0
Slide 85
Complete the computation of
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
to get the sample statistic.
Is n ≤ 30?
No: Calculate the critical values
$$r_s = \frac{\pm z}{\sqrt{n - 1}}$$
where z corresponds to the significance level.
Yes: Find the critical values of rs in Table A-9.
If the sample statistic rs is positive and exceeds the positive critical
value, there is a correlation. If the sample statistic rs is negative and is
less than the negative critical value, there is a correlation. If the
sample statistic rs is between the positive and negative critical values,
there is no correlation.
Example:
Perceptions of Beauty
Slide 86
Use the data in Table 12-6 to determine if there is a
correlation between the rankings of men and women in
terms of what they find attractive. Use a significance level
of α = 0.05.
Example:
Perceptions of Beauty
Slide 87
Use the data in Table 12-6 to determine if there is a
correlation between the rankings of men and women in
terms of what they find attractive. Use a significance level
of α = 0.05.
H0: ρs = 0
H1: ρs ≠ 0
n = 10
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(74)}{10(10^2 - 1)} = 0.552$$
Example:
Perceptions of Beauty
Slide 88
Use the data in Table 12-6 to determine if there is a
correlation between the rankings of men and women in
terms of what they find attractive. Use a significance level
of α = 0.05.
We refer to Table A-9 to determine that the critical values
are ±0.648. Because the test statistic of rs = 0.552 does
not exceed the critical value of 0.648, we fail to reject the
null hypothesis. There is not sufficient evidence to
support a claim of a correlation between the rankings of
men and women.
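The rank correlation statistic in this example can be recomputed with a short Python sketch. The function names are illustrative only, and the formula is used exactly as stated in the slides.

```python
def spearman_from_sum_d2(sum_d2, n):
    """r_s from the sum of squared rank differences and the number of pairs."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_rs(ranks_x, ranks_y):
    """r_s computed directly from two lists of ranks for the same n pairs."""
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return spearman_from_sum_d2(sum_d2, len(ranks_x))


# Values from the example: n = 10 pairs and a sum of squared differences of 74.
print(round(spearman_from_sum_d2(74, 10), 3))  # 0.552
```

Identical rankings give rs = 1 and completely reversed rankings give rs = –1, which is a convenient check on the formula.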
Example: Perceptions of
Beauty with Large Samples
Slide 89
Assume that the preceding example is expanded by
including a total of 40 women and that the test statistic rs is
found to be 0.291. If the significance level is α = 0.05, what
do you conclude about the correlation?
Example: Perceptions of
Beauty with Large Samples
Slide 90
$$r_s = \frac{\pm z}{\sqrt{n - 1}} = \frac{\pm 1.96}{\sqrt{40 - 1}} = \pm 0.314$$
These are the critical values.
Example: Perceptions of
Beauty with Large Samples
Slide 91
The test statistic of rs = 0.291 does not exceed the critical
value of 0.314, so we fail to reject the null hypothesis.
There is not sufficient evidence to support the claim of a
correlation between the rankings of men and women.
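The large-sample critical value from Formula 12-1 can be sketched in Python as follows. The z value of 1.96 is the two-tailed value for a 0.05 significance level, and the function name is illustrative only.

```python
import math


def spearman_critical_value(z, n):
    """Positive critical value of r_s for n > 30; the negative one is its opposite."""
    return z / math.sqrt(n - 1)


critical = spearman_critical_value(1.96, 40)
print(round(critical, 3))        # 0.314
print(abs(0.291) > critical)     # False, so we fail to reject H0
```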
Example: Detecting
a Nonlinear Pattern
Slide 92
The data in Table 12-7 are the numbers of games played
and the last scores (in millions) of a Raiders of the Lost Ark
pinball game. We expect that there should be an
association between the number of games played and the
pinball score. Is there sufficient evidence to support the
claim that there is such an association?
Example: Detecting
a Nonlinear Pattern
Slide 93
Example: Detecting
a Nonlinear Pattern
Slide 94
H0: ρs = 0
H1: ρs ≠ 0
n = 9
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{9(9^2 - 1)} = 0.950$$
Example: Detecting
a Nonlinear Pattern
Slide 95
We use Table A-9 to get the critical values of ±0.683. The
sample statistic of 0.950 exceeds the critical value of 0.683,
so we conclude that there is significant correlation. Higher
numbers of games played appear to be associated with
higher scores.
Slide 96
Section 12-7
Runs Test for
Randomness
Runs Test for
Randomness
Slide 97
Definitions
Run
A run is a sequence of data having the same
characteristic; the sequence is preceded
and followed by data with a different
characteristic or no data at all.
Runs Test
The runs test uses the number of runs in a
sequence of sample data to test for
randomness in the order of the data.
Fundamental Principle
of the Runs Test
Slide 98
Reject randomness if the number of
runs is very low or very high.
Examples
Slide 99
DDDDRRDDDR
4 runs
1st run: DDDD
2nd run: RR
3rd run: DDD
4th run: R
Examples
Slide 100
DDDDDRRRRR
only 2 runs
If the number of runs is very low, randomness is lacking.
DRDRDRDRDR
10 runs
If the number of runs is very high, randomness is lacking.
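The run counts in these examples can be reproduced with a minimal sketch. The helper name `count_runs` is hypothetical. It relies on `itertools.groupby`, which groups consecutive equal items, so each group is one run.

```python
from itertools import groupby


def count_runs(sequence):
    """Count runs: maximal blocks of consecutive identical symbols."""
    return sum(1 for _ in groupby(sequence))


# The three sequences shown on the example slides.
for seq in ("DDDDRRDDDR", "DDDDDRRRRR", "DRDRDRDRDR"):
    print(seq, count_runs(seq))
```

The three sequences give 4, 2, and 10 runs, matching the counts stated on the slides.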
Assumptions
Slide 101
1. The sample data are arranged according to some ordering
scheme, such as the order in which the sample values were
obtained.
2. Each data value can be categorized into one of two
separate categories.
3. The runs test for randomness is based on the order in
which the data occur; it is not based on the frequency of
the data.
Notation
Slide 102
n1 = number of elements in the sequence that
have one particular characteristic (The
characteristic chosen for n1 is arbitrary.)
n2 = number of elements in the sequence that
have the other characteristic
G = number of runs
Small Sample Cases
Slide 103
Table A-10 applies when:
1. We are using 5% as the
cutoff for sequences that have
too few or too many runs
2. n1 ≤ 20, and
3. n2 ≤ 20
Large Sample Cases
Formula 12-2
$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1$
Formula 12-3
$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
where $\mu_G$ = mean of the number of runs G
$\sigma_G$ = standard deviation of the number of runs G
and the distribution of the number of runs G is
approximately normal
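As a sanity check on the two formulas, the sketch below enumerates every arrangement of a small sequence and compares the mean and standard deviation of the run counts with the formula values. The function names and the sizes n1 = 3, n2 = 4 are illustrative assumptions, not taken from the slides. The check assumes that, under randomness, every arrangement of the two kinds of elements is equally likely.

```python
from itertools import combinations, groupby
from math import sqrt


def runs_mean(n1, n2):
    """Mean number of runs: 2*n1*n2 / (n1 + n2) + 1."""
    return 2 * n1 * n2 / (n1 + n2) + 1


def runs_std(n1, n2):
    """Standard deviation of the number of runs."""
    n = n1 + n2
    return sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1)))


def all_run_counts(n1, n2):
    """Run counts for every distinct arrangement of n1 A's and n2 B's."""
    n = n1 + n2
    counts = []
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["A" if i in chosen else "B" for i in range(n)]
        counts.append(sum(1 for _ in groupby(seq)))
    return counts


# Hypothetical small case, chosen only to keep the enumeration short.
n1, n2 = 3, 4
counts = all_run_counts(n1, n2)
exact_mean = sum(counts) / len(counts)
exact_std = sqrt(sum((g - exact_mean) ** 2 for g in counts) / len(counts))

print(exact_mean, runs_mean(n1, n2))
print(exact_std, runs_std(n1, n2))
```

The enumerated values agree with the formulas up to floating-point rounding. Only the normal shape of the distribution is an approximation, which is why it is reserved for larger samples.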
Slide 104
Test Statistic for the
Runs Test for Randomness
Slide 105
If α = 0.05 and n1 ≤ 20 and n2 ≤ 20, the test statistic is G.
If α ≠ 0.05 or n1 > 20 or n2 > 20, the test statistic is
$z = \frac{G - \mu_G}{\sigma_G}$
Critical values:
• If the test statistic is G, critical values are found in Table
A-10.
• If the test statistic is z, critical values are found in Table
A-2 by using the same procedures introduced in
Chapter 6.
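The choice between the two test statistics can be written as a small decision function. This is a sketch under one assumption: every case outside the Table A-10 conditions is treated as a z case, which is how the Boston rainfall example (n1 = 33, n2 = 19) is handled later in this section. The function name `runs_test_statistic_kind` is hypothetical.

```python
from math import isclose


def runs_test_statistic_kind(n1, n2, alpha):
    """Return "G" when Table A-10 applies, otherwise "z".

    Table A-10 is assumed to apply only when alpha is 0.05 and
    both sample counts are at most 20.
    """
    if isclose(alpha, 0.05) and n1 <= 20 and n2 <= 20:
        return "G"
    return "z"


# Sample sizes taken from the two worked examples in this section.
print(runs_test_statistic_kind(8, 4, 0.05))    # basketball foul shots
print(runs_test_statistic_kind(33, 19, 0.05))  # Boston rainfall
```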
Figure 12-5
Runs Test for Randomness
Slides 106-108
Example:
Basketball Foul Shots
Slide 109
In the course of a game, WNBA player Cynthia Cooper
shoots 12 free throws. Denoting shots made by “H” and
shots missed by “M”, her results are as follows: H, H, H, M,
H, H, H, H, M, M, M, H. Use a 0.05 significance level to test
for randomness in the sequence of hits and misses.
Example:
Basketball Foul Shots
Slide 110
There are 8 hits, 4 misses, and 5 runs, so we have n1 = 8,
n2 = 4, and G = 5. Because α = 0.05 and neither n1 nor n2
exceeds 20, the test statistic is G = 5, and we refer to
Table A-10 to find the critical values of 3 and 10. Because
G = 5 lies between the critical values, we do not reject
randomness. There is not sufficient evidence to
warrant rejection of the claim that the hits and misses
occur randomly.
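The counts in this example can be recomputed from the sequence itself, as in the sketch below. The critical values 3 and 10 are copied from the slide, since Table A-10 is not reproduced here. The decision rule in the code (reject when G is at or beyond either critical value) is an assumption about how the table is read, not something stated on the slide.

```python
from itertools import groupby

# Sequence of hits (H) and misses (M) from the example.
shots = "H H H M H H H H M M M H".split()

n1 = shots.count("H")               # number of hits
n2 = shots.count("M")               # number of misses
G = sum(1 for _ in groupby(shots))  # number of runs

# Critical values quoted on the slide (Table A-10, alpha = 0.05).
LOWER, UPPER = 3, 10

# Assumed rule: reject randomness if G <= LOWER or G >= UPPER.
reject_randomness = G <= LOWER or G >= UPPER

print(n1, n2, G, reject_randomness)
```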
Example: Boston
Rainfall on Mondays
Slide 111
Refer to the rainfall amounts for Boston as listed in Data
Set 11 in Appendix B. Is there sufficient evidence to
support the claim that rain on Mondays is not random?
Example: Boston
Rainfall on Mondays
H0: The sequence is random.
H1: The sequence is not random.
n1 = 33
n2 = 19
G = 30
Slide 112
Example: Boston
Rainfall on Mondays
$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1$
$\mu_G = \frac{2(33)(19)}{33 + 19} + 1 = 25.115$
Slide 113
Example: Boston
Rainfall on Mondays
$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
$\sigma_G = \sqrt{\frac{2(33)(19)[2(33)(19) - 33 - 19]}{(33 + 19)^2 (33 + 19 - 1)}} = 3.306$
Slide 114
Example: Boston
Rainfall on Mondays
$z = \frac{G - \mu_G}{\sigma_G}$
$z = \frac{30 - 25.115}{3.306} = 1.48$
Slide 115
Example: Boston
Rainfall on Mondays
Slide 116
The critical values are ±1.96, since α = 0.05 and the test
is two-tailed. The test statistic of 1.48 does not fall
within the critical region, so we fail to reject the null
hypothesis of randomness. The given sequence does
appear to be random.
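The whole large-sample calculation for this example can be retraced in a few lines. This is a sketch, not the textbook's procedure: it uses Python's `statistics.NormalDist` in place of Table A-2, and the variable names are illustrative. The inputs n1 = 33, n2 = 19, and G = 30 are the values given in the example.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the example.
n1, n2, G = 33, 19, 30
alpha = 0.05

n = n1 + n2
mu_G = 2 * n1 * n2 / n + 1
sigma_G = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1)))
z = (G - mu_G) / sigma_G

# Two-tailed critical value from the standard normal distribution,
# used here instead of looking it up in Table A-2.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_randomness = abs(z) >= z_crit

print(round(mu_G, 3), round(sigma_G, 3), round(z, 2))
print(round(z_crit, 2), reject_randomness)
```

The rounded values match the slides (25.115, 3.306, and 1.48), and the comparison against 1.96 leads to the same decision not to reject randomness.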