CHAPTER 15
NONPARAMETRIC STATISTICS
Learning Objectives
• Determine situations where nonparametric
procedures are better alternatives to the parametric
tests
• Understand the assumptions of nonparametric tests
• Use one- and two-sample nonparametric tests
• Use nonparametric alternatives to the single-factor
ANOVA
Nonparametric vs. Parametric
• Previous chapters assumed that we were working with
random samples from normal populations
• Such procedures are called parametric methods,
because they are based on a particular parametric
family of distributions (here, the normal)
• This chapter describes procedures called
nonparametric methods
• They make no assumptions about the population
distribution other than that it is continuous
Why Nonparametric Procedures
• Useful when distributions are not close to normal
• Data need not be quantitative; they can be
categorical (such as yes or no, defective or
nondefective) or rank data
• Usually very quick and easy to perform
• Can provide considerable improvement over the
normal-theory parametric methods when the
population is not normal
• Drawback: they do not use all the information provided
by the sample, so they are less efficient than parametric
methods when the population is normal
• Drawback: they require a larger sample size to achieve
the same power
Which One?
• Which one should we choose?
• If both methods are applicable to a particular
problem, use the more efficient parametric procedure
• Otherwise, use the nonparametric procedure
SIGN TEST
• Used to test hypotheses about the median of a
continuous distribution
• Because the mean of a normal distribution equals its
median, the sign test can also be used to test
hypotheses about the mean of a normal distribution, a
problem we addressed with the t-test in Chapter 9
• The sign test is appropriate for samples from any
continuous distribution
• It is the nonparametric counterpart of the one-sample t-test
Description of the Test
• To test $H_0: \tilde{\mu} = \tilde{\mu}_0$, use the differences
$X_i - \tilde{\mu}_0, \quad i = 1, 2, \ldots, n$
• $X_i$ is the ith sample observation and $\tilde{\mu}_0$ is the
specified median value
• If $H_0$ is true, the number of plus signs, $R^+$, is a binomial
random variable with parameter $p = 1/2$
• Reject $H_0$ if the proportion of plus signs is
significantly different from 1/2
Using P-value
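• For the two-sided alternative $H_1: \tilde{\mu} \neq \tilde{\mu}_0$, compute the
P-value from the observed number of plus signs $r^+$:
– If $r^+ < n/2$: $P = 2P\left(R^+ \le r^+ \text{ when } p = \tfrac{1}{2}\right)$
– If $r^+ > n/2$: $P = 2P\left(R^+ \ge r^+ \text{ when } p = \tfrac{1}{2}\right)$
• If the P-value is less than the significance level
α, we reject H0 and conclude that H1 is true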
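The two-sided P-value rule above can be sketched with the Python standard library. This is a minimal illustration, assuming zero differences have already been set aside; the function name `sign_test_p_value` is hypothetical, and returning 1 when $r^+ = n/2$ (and capping at 1) is our own convention, since the slide does not cover that case.

```python
from math import comb


def sign_test_p_value(r_plus: int, n: int) -> float:
    """Exact two-sided sign-test P-value; under H0, R+ ~ Binomial(n, 1/2)."""

    def prob(r: int) -> float:
        # P(R+ = r) when p = 1/2
        return comb(n, r) / 2 ** n

    if r_plus < n / 2:
        tail = sum(prob(r) for r in range(0, r_plus + 1))  # P(R+ <= r+)
    elif r_plus > n / 2:
        tail = sum(prob(r) for r in range(r_plus, n + 1))  # P(R+ >= r+)
    else:
        return 1.0  # r+ = n/2: assumed convention, no evidence against H0
    return min(1.0, 2 * tail)


print(round(sign_test_p_value(8, 10), 3))  # 0.109
```

By symmetry of Binomial(n, 1/2), $r^+ = 2$ and $r^+ = 8$ with n = 10 give the same P-value.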
The Normal Approximation
• When p = 0.5, the binomial distribution is well approximated
by a normal distribution when n > 10
• Mean = np = 0.5n and variance = np(1 - p) = 0.25n
• Test statistic
$Z_0 = \dfrac{R^+ - 0.5n}{0.5\sqrt{n}}$
• The critical region can be chosen from the table of the
standard normal distribution
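A short sketch of the normal approximation, assuming a two-sided alternative. The helper names are hypothetical; `statistics.NormalDist` (Python 3.8+) stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(r_plus: int, n: int) -> float:
    """Normal-approximation statistic Z0 = (R+ - 0.5n) / (0.5 sqrt(n))."""
    return (r_plus - 0.5 * n) / (0.5 * sqrt(n))


def two_sided_normal_p(z0: float) -> float:
    """Two-sided P-value 2[1 - Phi(|z0|)] from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z0)))


# Hypothetical data (not from the text): 15 plus signs among n = 20 nonzero differences
z0 = sign_test_z(15, 20)
print(round(z0, 3), round(two_sided_normal_p(z0), 4))
```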
Sign Test for Paired Samples
• Applied to paired observations drawn from two
continuous populations
• Define the paired difference as
$D_j = X_{1j} - X_{2j}, \quad j = 1, 2, \ldots, n$
• Test the hypothesis that the two populations have a
common median
• Equivalent to testing $H_0: \tilde{\mu}_D = 0$
• Done by applying the sign test to the n observed
differences
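The paired version reduces to the one-sample sign test on the differences $D_j$. A minimal sketch, assuming zero differences are set aside before counting (a common convention, not stated on the slide); `paired_sign_test` and the data are hypothetical.

```python
from math import comb


def paired_sign_test(x1, x2):
    """Sign test on D_j = X1j - X2j for H0: median of D = 0.

    Zero differences are dropped (assumed convention).
    Returns (r_plus, n, two_sided_p).
    """
    d = [a - b for a, b in zip(x1, x2)]
    nonzero = [v for v in d if v != 0]
    n = len(nonzero)
    r_plus = sum(1 for v in nonzero if v > 0)
    # Exact binomial tail with p = 1/2, on the side of the observed count
    if r_plus < n / 2:
        tail = sum(comb(n, r) for r in range(0, r_plus + 1)) / 2 ** n
    elif r_plus > n / 2:
        tail = sum(comb(n, r) for r in range(r_plus, n + 1)) / 2 ** n
    else:
        tail = 0.5  # r+ = n/2 gives P = 1
    return r_plus, n, min(1.0, 2 * tail)


# Hypothetical paired measurements (not from the text)
before = [12, 15, 11, 14, 13, 16, 12, 15]
after = [10, 15, 12, 11, 10, 13, 11, 12]
print(paired_sign_test(before, after))  # (6, 7, 0.125)
```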
Example
• Ten samples were taken from a plating bath used
in an electronics manufacturing process, and the
bath pH was determined.
• The sample pH values are 7.91, 7.85, 6.82, 8.01,
7.46, 6.95, 7.05, 7.35, 7.25, 7.42
• Manufacturing engineering believes that pH has a
median value of 7.0. Do the sample data indicate
that this statement is correct? Use the sign test
with =0.05 to investigate this hypothesis. Find the
P-value for this test
Calculate the differences
Use the general procedure covered in Chapter 8:
1. The parameter of interest is the median of the
distribution of pH
2. $H_0: \tilde{\mu} = 7.0$
3. $H_1: \tilde{\mu} \neq 7.0$
4. α = 0.05
Solution - Cont
• Data and the observed plus signs
 i    x_i    x_i - 7   Sign
 1    7.91   +0.91     +
 2    7.85   +0.85     +
 3    6.82   -0.18     -
 4    8.01   +1.01     +
 5    7.46   +0.46     +
 6    6.95   -0.05     -
 7    7.05   +0.05     +
 8    7.35   +0.35     +
 9    7.25   +0.25     +
10    7.42   +0.42     +
5. The test statistic is the observed number of plus differences, $r^+ = 8$
6. Reject H0 if the P-value corresponding to $r^+ = 8$ is less than or
equal to α = 0.05
Solution-Cont.
7. Since $r^+ = 8 > n/2 = 5$, we calculate the P-value from
the binomial distribution with n = 10 and p = 0.5:
$P = 2P(R^+ \ge 8 \mid p = 0.5) = 2\sum_{r=8}^{10} \binom{10}{r} (0.5)^r (0.5)^{10-r} \approx 0.109$
• Since P = 0.109 is not less than α = 0.05, we
cannot reject the null hypothesis
8. Conclusion: the observed number of plus signs, $r^+ = 8$, was not
large enough to indicate that the median pH is
different from 7.0
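The arithmetic in steps 5 to 7 can be checked with a few lines of Python, using the pH data from the example and only the standard library. The branch for $r^+ > n/2$ is hard-coded because that is the case here.

```python
from math import comb

# pH data from the example; hypothesized median 7.0
ph = [7.91, 7.85, 6.82, 8.01, 7.46, 6.95, 7.05, 7.35, 7.25, 7.42]
median_0 = 7.0

n = len(ph)
r_plus = sum(x - median_0 > 0 for x in ph)  # count plus signs (no difference is exactly 0 here)

# r+ = 8 > n/2, so P = 2 * P(R+ >= 8) with R+ ~ Binomial(10, 0.5)
p_value = 2 * sum(comb(n, r) * 0.5**r * 0.5**(n - r) for r in range(r_plus, n + 1))

print(r_plus, round(p_value, 3))  # 8 0.109
```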
Using Table
• Appendix Table VII gives critical values $r^*_\alpha$ of the sign test
for two-sided and one-sided alternative hypotheses
• Let $R = \min(R^+, R^-)$
• Reject $H_0$:
– If $r^- \le r^*_\alpha$, when $H_1: \tilde{\mu} > \tilde{\mu}_0$
– If $r^+ \le r^*_\alpha$, when $H_1: \tilde{\mu} < \tilde{\mu}_0$
– If $r \le r^*_\alpha$, when $H_1: \tilde{\mu} \neq \tilde{\mu}_0$
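Critical values of this kind can be computed from exact binomial tails. The sketch below takes $r^*_\alpha$ as the largest r whose lower tail does not exceed α (or α/2 for a two-sided test); we assume, rather than know, that Appendix Table VII is built this way, so a printed table may differ in places. The function name is hypothetical.

```python
from math import comb


def sign_test_critical_value(n: int, alpha: float, two_sided: bool = True):
    """Largest r with P(R <= r) <= alpha (alpha/2 if two-sided), R ~ Binomial(n, 1/2).

    Returns None when no such r exists (sample too small for this alpha).
    """
    level = alpha / 2 if two_sided else alpha
    cumulative = 0.0
    critical = None
    for r in range(n + 1):
        cumulative += comb(n, r) / 2 ** n  # running P(R <= r)
        if cumulative > level:
            break
        critical = r
    return critical


# pH example: r = min(r+, r-) = min(8, 2) = 2
r_star = sign_test_critical_value(10, 0.05)
print(r_star, 2 <= r_star)  # 1 False
```

With r = 2 > 1, H0 is not rejected, which agrees with the P-value approach.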
Wilcoxon Signed-rank Test
• The sign test uses only the plus and minus signs of the
differences
• It does not take into consideration the size or magnitude
of these differences
• The Wilcoxon signed-rank test uses both the direction (sign) and the magnitude
• It applies to symmetric, continuous distributions, for which the mean equals the median
• Tests $H_0: \mu = \mu_0$
Description of the Test
• Compute the differences
$X_i - \mu_0, \quad i = 1, 2, \ldots, n$
• $X_i$ is the ith sample observation and $\mu_0$ is the specified
median or mean value
• Rank the absolute differences $|X_i - \mu_0|$ in ascending order
• Give the ranks the signs of their corresponding differences
• Let $W^+$ be the sum of the positive ranks, $W^-$ the absolute value of
the sum of the negative ranks, and $W = \min(W^+, W^-)$
• Table VIII contains critical values $w^*_\alpha$ of W
• Reject $H_0$:
– If $w^- \le w^*_\alpha$, when $H_1: \mu > \mu_0$
– If $w^+ \le w^*_\alpha$, when $H_1: \mu < \mu_0$
– If $w \le w^*_\alpha$, when $H_1: \mu \neq \mu_0$
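The ranking steps above can be sketched as follows. This is an illustration under two assumptions of ours: differences are rounded so floating-point noise cannot break ties, and zero differences are dropped. `average_ranks` and `signed_rank_sums` are hypothetical names; ties receive average ranks, as in the later example.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    # A value first appears at position ordered.index(v) and occupies count(v) ranks
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def signed_rank_sums(x, mu_0, decimals=10):
    """Return (w_plus, w_minus, w) for the Wilcoxon signed-rank test of H0: mu = mu_0."""
    diffs = [round(xi - mu_0, decimals) for xi in x]  # rounding is an assumption
    diffs = [d for d in diffs if d != 0]  # drop zero differences (assumption)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)  # absolute value of negative ranks
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical sample (not from the text), tested against mu_0 = 4
print(signed_rank_sums([3, 5, 8, 1], 4))  # (5.5, 4.5, 4.5)
```

As a check, $w^+ + w^- = n(n+1)/2$ (here 10 for n = 4).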
Large-Sample Approximation
• For a large sample size (n > 20)
• $W^+$ (or $W^-$) has approximately a normal distribution
• Mean and variance
$\mu_{W^+} = \dfrac{n(n+1)}{4}, \qquad \sigma^2_{W^+} = \dfrac{n(n+1)(2n+1)}{24}$
• Test statistic
$Z_0 = \dfrac{W^+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$
• The appropriate critical region can be chosen from a table
of the standard normal distribution
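A sketch of the large-sample statistic, assuming a two-sided alternative. The function name and the input values are hypothetical; `statistics.NormalDist` replaces the normal table.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_z(w_plus: float, n: int) -> float:
    """Z0 = (W+ - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24), intended for n > 20."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - mean) / sqrt(variance)


# Hypothetical (not from the text): n = 25 nonzero differences with W+ = 100
z0 = signed_rank_z(100, 25)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z0)))
print(round(z0, 3), round(p_two_sided, 3))
```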
Paired Observations
• Applied to paired observations drawn from two
continuous and symmetric populations
• Define the paired difference as
$D_j = X_{1j} - X_{2j}, \quad j = 1, 2, \ldots, n$
• Test the hypothesis that the two populations have a
common mean
• Equivalent to testing that the mean of the differences
is zero, $H_0: \mu_D = 0$
Description of the Test
• The differences are first ranked in ascending order of
their absolute values
• The ranks are given the signs of the differences
• Ties are assigned average ranks
• Let $W^+$ be the sum of the positive ranks, $W^-$ the absolute
value of the sum of the negative ranks, and $W = \min(W^+, W^-)$
• Table VIII contains critical values $w^*_\alpha$ of W
• Reject $H_0$:
– If $w^- \le w^*_\alpha$, when $H_1: \mu_D > 0$
– If $w^+ \le w^*_\alpha$, when $H_1: \mu_D < 0$
– If $w \le w^*_\alpha$, when $H_1: \mu_D \neq 0$
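For paired data, the same ranking is applied to the differences $D_j$. A minimal sketch with hypothetical integer data; dropping zero differences is our assumption, and for decimal data the differences should be rounded first so float noise does not split ties.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def paired_signed_rank(x1, x2):
    """Return (w_plus, w_minus, w) on D_j = X1j - X2j for H0: mu_D = 0."""
    d = [a - b for a, b in zip(x1, x2)]
    d = [v for v in d if v != 0]  # drop zero differences (assumed convention)
    ranks = average_ranks([abs(v) for v in d])  # rank |D_j|, ties get average ranks
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical paired data (not from the text)
print(paired_signed_rank([10, 12, 9, 14, 11], [8, 13, 9, 10, 8]))  # (9.0, 1.0, 1.0)
```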
Example
• Consider the data in the previous example and
assume that the distribution of pH is symmetric and
continuous.
• Use the Wilcoxon signed-rank test with =0.05 to
test the following hypothesis H0: µ=7 vs. H1: µ≠7
Solution
1. The parameter of interest is the mean of the pH distribution
2. $H_0: \mu = 7$
3. $H_1: \mu \neq 7$
4. α = 0.05
5. Test statistic
$w = \min(w^+, w^-)$
6. Reject $H_0$ if $w \le w^*_{0.05} = 8$ (Table VIII)
Solution – Cont.
7. Signed ranks (observations ordered by |x_i - 7|)
 i    x_i    x_i - 7   Signed rank
 1    7.05   +0.05     +1.5
 2    6.95   -0.05     -1.5
 3    6.82   -0.18     -3
 4    7.25   +0.25     +4
 5    7.35   +0.35     +5
 6    7.42   +0.42     +6
 7    7.46   +0.46     +7
 8    7.85   +0.85     +8
 9    7.91   +0.91     +9
10    8.01   +1.01     +10
• Compute both rank sums and take the smaller:
• $w^+ = 1.5 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 50.5$
• $w^- = 1.5 + 3 = 4.5$
• The test statistic is $w = \min(50.5, 4.5) = 4.5$
Solution-Cont.
8. Since w = 4.5 is less than the critical value
$w^*_{0.05} = 8$
• Reject the null hypothesis; the data indicate that the mean pH differs from 7
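The rank sums in this example can be reproduced in a few lines of Python. Rounding $x_i - 7$ is our own safeguard so the two differences of magnitude 0.05 stay tied in floating point; the critical value 8 is the Table VIII value quoted in step 6.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


ph = [7.91, 7.85, 6.82, 8.01, 7.46, 6.95, 7.05, 7.35, 7.25, 7.42]
# Round x_i - 7 so that floating-point noise cannot split the tie at |0.05|
diffs = [round(x - 7.0, 10) for x in ph]
ranks = average_ranks([abs(d) for d in diffs])

w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
w = min(w_plus, w_minus)
print(w_plus, w_minus, w, w <= 8)  # 50.5 4.5 4.5 True
```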
WILCOXON RANK-SUM TEST
• Statistical inference for two samples
• The Wilcoxon rank-sum test is a nonparametric
alternative to the two-sample t-test
• Two independent continuous populations $X_1$ and
$X_2$ with means $\mu_1$ and $\mu_2$
• We wish to test the hypotheses
$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$
• $n_1$ and $n_2$ are the sample sizes, with $n_1 \le n_2$
Description of the Test
• Arrange all $n_1 + n_2$ observations in ascending order of magnitude and
assign ranks to them
• Ties are assigned the average rank
• Let $W_1$ be the sum of the ranks in the smaller sample (1), and define $W_2$
to be the sum of the ranks in the other sample
• $W_2$ can also be found from
$W_2 = \dfrac{(n_1 + n_2)(n_1 + n_2 + 1)}{2} - W_1$
• Table IX contains the critical values $w^*_\alpha$ of the rank sums for two
significance levels
• Reject $H_0$:
– If $w_2 \le w^*_\alpha$, when $H_1: \mu_1 > \mu_2$
– If $w_1 \le w^*_\alpha$, when $H_1: \mu_1 < \mu_2$
– If either $w_1$ or $w_2 \le w^*_\alpha$, when $H_1: \mu_1 \neq \mu_2$
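The rank sums can be sketched as below, assuming sample 1 is passed first and is the smaller sample. The function names and data are hypothetical; $W_2$ is obtained from the identity on the slide rather than summed directly.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def rank_sums(sample1, sample2):
    """Return (w1, w2) for the Wilcoxon rank-sum test; sample1 is the smaller (n1 <= n2)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank the pooled n1 + n2 values
    w1 = sum(ranks[:n1])  # first n1 ranks belong to sample 1
    w2 = (n1 + n2) * (n1 + n2 + 1) / 2 - w1  # W2 from the identity above
    return w1, w2


# Hypothetical samples (not from the text)
print(rank_sums([31, 45, 22], [50, 39, 61, 48]))  # (7.0, 21.0)
```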
Large-Sample Approximation
• When both $n_1$ and $n_2$ are moderately large
• The distribution of $W_1$ can be well approximated by
the normal distribution with the following mean
and variance
$\mu_{W_1} = \dfrac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma^2_{W_1} = \dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
• Test statistic
$Z_0 = \dfrac{W_1 - \mu_{W_1}}{\sigma_{W_1}}$
• The appropriate critical region can be chosen from
the table of the standard normal distribution
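A sketch of the large-sample statistic for the rank-sum test; `rank_sum_z` and the input values are hypothetical.

```python
from math import sqrt


def rank_sum_z(w1: float, n1: int, n2: int) -> float:
    """Z0 = (W1 - mu_W1) / sigma_W1 with the normal-approximation mean and variance."""
    mean = n1 * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (w1 - mean) / sqrt(variance)


# Hypothetical (not from the text): n1 = 10, n2 = 12, observed w1 = 90
z0 = rank_sum_z(90, 10, 12)
print(round(z0, 3))
```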
Kruskal-Wallis Test
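• Recall the single-factor analysis of variance
model
$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n_i$
• The error terms $\epsilon_{ij}$ were assumed to be normally and independently
distributed with mean zero and variance $\sigma^2$
• The Kruskal-Wallis test is a nonparametric alternative
• The error terms $\epsilon_{ij}$ are only assumed to come from the same
continuous distribution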
Description of the Test
• Compute the total number of observations
$N = \sum_{i=1}^{a} n_i$
• Rank all N observations from smallest to largest
• Assign the smallest observation rank 1, the next
smallest rank 2, . . . , and the largest observation
rank N
• Let $R_{ij}$ be the rank of observation $Y_{ij}$
• Let $R_{i.}$ denote the total and $\bar{R}_{i.}$ the average of the $n_i$
ranks in the ith treatment
Test Statistic
• Calculate
$H = \dfrac{12}{N(N+1)} \sum_{i=1}^{a} n_i \left( \bar{R}_{i.} - \dfrac{N+1}{2} \right)^2$
• When $H_0$ is true, H has approximately a chi-square distribution
with $a - 1$ degrees of freedom
• Reject $H_0$ if the observed value h is greater than
the critical value, that is, if
$h > \chi^2_{\alpha, a-1}$
• The critical value is taken from the chi-square distribution table;
large values of h indicate differences among treatments, so this is an
upper-tail test
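The untied statistic can be sketched directly from the formula above. This assumes no tied observations (the rank lookup would break with duplicates); the function name and data are hypothetical. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, a - 1)` gives the critical value $\chi^2_{\alpha, a-1}$.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of treatment samples, assuming no tied observations.

    H = 12 / (N(N+1)) * sum_i n_i * (Rbar_i - (N+1)/2)**2
    """
    pooled = sorted(y for g in groups for y in g)
    N = len(pooled)
    # With no ties, each value maps to a unique rank 1..N
    rank_of = {y: r for r, y in enumerate(pooled, start=1)}
    total = 0.0
    for g in groups:
        n_i = len(g)
        r_bar = sum(rank_of[y] for y in g) / n_i  # average rank in treatment i
        total += n_i * (r_bar - (N + 1) / 2) ** 2
    return 12 / (N * (N + 1)) * total


# Hypothetical data for a = 3 treatments (not from the text)
groups = [[1.2, 2.4, 3.1], [4.0, 5.5, 6.3], [7.7, 8.1, 9.9]]
print(round(kruskal_wallis_h(groups), 3))  # 7.2
```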
Ties in the Kruskal-Wallis Test
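• If observations are tied, assign them the average rank
• Use the following test statistic
$H = \dfrac{1}{S^2} \left[ \sum_{i=1}^{a} \dfrac{R_{i.}^2}{n_i} - \dfrac{N(N+1)^2}{4} \right], \qquad S^2 = \dfrac{1}{N-1} \left[ \sum_{i=1}^{a} \sum_{j=1}^{n_i} R_{ij}^2 - \dfrac{N(N+1)^2}{4} \right]$
• $n_i$ is the number of observations in the ith treatment
• N is the total number of observations
• $S^2$ is just the variance of the ranks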
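The tie-adjusted statistic can be sketched as follows, with tied observations given average ranks. Function names and data are hypothetical. When there are no ties, $S^2 = N(N+1)/12$ and this expression reduces to the untied H.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def kruskal_wallis_h_ties(groups):
    """Tie-adjusted Kruskal-Wallis H = [sum R_i.^2 / n_i - N(N+1)^2/4] / S^2."""
    pooled = [y for g in groups for y in g]
    N = len(pooled)
    ranks = average_ranks(pooled)
    offset = N * (N + 1) ** 2 / 4
    s2 = (sum(r * r for r in ranks) - offset) / (N - 1)  # variance of the ranks
    between = 0.0
    start = 0
    for g in groups:
        n_i = len(g)
        r_total = sum(ranks[start:start + n_i])  # R_i., rank total for treatment i
        between += r_total ** 2 / n_i
        start += n_i
    return (between - offset) / s2


# Hypothetical tied data for a = 2 treatments (not from the text)
print(round(kruskal_wallis_h_ties([[1, 1, 2], [2, 3, 3]]), 4))  # 3.3333
```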
Example 15-7
• Montgomery (2001) presented data from an
experiment in which five different levels of cotton
content in a synthetic fiber were tested to
determine whether cotton content has any effect
on fiber tensile strength. The sample data and
ranks from this experiment are shown in the following
table.
• Does cotton percentage affect breaking strength?
Use α = 0.01.
Solution
• Rank all observations from smallest to largest (ranks before adjusting for ties)
  Strength  Rank    Strength  Rank    Strength  Rank    Strength  Rank    Strength  Rank
      7       1        11       6        14      11        18      16        19      21
      7       2        11       7        15      12        18      17        19      22
      7       3        11       8        15      13        18      18        22      23
      9       4        12       9        17      14        19      19        23      24
     10       5        12      10        18      15        19      20        25      25
• Assign tied observations the average of their ranks; for example, the three
observations equal to 7 each receive rank (1 + 2 + 3)/3 = 2
• Perform the same calculations for the other tied observations
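The tie-averaging step can be checked on the 25 sorted observations listed in the ranking table. This is a small sketch; `average_ranks` is a hypothetical helper.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


# The 25 observations from the ranking table, already sorted
strength = [7, 7, 7, 9, 10, 11, 11, 11, 12, 12, 14, 15, 15, 17, 18,
            18, 18, 18, 19, 19, 19, 19, 22, 23, 25]
ranks = average_ranks(strength)
shared = dict(zip(strength, ranks))  # one averaged rank per distinct value
print(shared[7], shared[11], shared[12], shared[15], shared[18], shared[19])
# 2.0 7.0 9.5 12.5 16.5 20.5
```

Averaging preserves the total of the ranks, N(N + 1)/2 = 325.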
Solution-Cont.
• Data and Ranks for the Tensile Testing
Experiment
• There is a fairly large number of ties
• Use the equation that was defined for the tied
observations
Solution-Cont.
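• Thus, $S^2$ is computed from the ranks
• The test statistic h then follows from the equation for tied observations
• Since $h > \chi^2_{0.01,4} = 13.28$, we reject the null hypothesis
• Conclude that the treatments differ
• The usual analysis of variance gives the same conclusion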
Next Agenda
• Introduces statistical quality control
• Fundamentals of statistical process control