Chapter 15
Nonparametric Methods
15.1 The Sign Test: A Hypothesis Test about the Median
15.2 The Wilcoxon Rank Sum Test
1.1 Nonparametric Tests
A. One-Sample Mean Test


Many tests are concerned with testing a parameter under a specified distribution.
Test $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$
under a normal population $N(\mu, \sigma^2)$.
If $\sigma^2$ is known, the Z-test
$$Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}$$
is recommended, where $\bar X$ is the sample mean and $n$ is the sample size.
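As a minimal sketch, assuming $\sigma$ is known, the Z statistic above can be computed with Python's standard library alone. The function names `normal_cdf` and `one_sample_z_test` and the input numbers are hypothetical, chosen only for illustration; $\Phi$ is built from `math.erf`, and the p-value is the two-sided one that matches $H_1: \mu \neq \mu_0$.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sample_z_test(xbar: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Return Z = (xbar - mu0) / (sigma / sqrt(n)) and its two-sided p-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_two_sided


# Hypothetical inputs: mean 10.5 from n = 64 observations, known sigma = 2, mu0 = 10
z, p = one_sample_z_test(xbar=10.5, mu0=10.0, sigma=2.0, n=64)
print(f"Z = {z:.2f}, two-sided p-value = {p:.4f}")
```

With these made-up inputs Z = 2.00 and the two-sided p-value is about 0.0455.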
B. Two-Sample Mean Tests

Test $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$
under two respective normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$.
If $\sigma_1 = \sigma_2$ but the common value is unknown, a t-test is suggested.
Comparing Means of Two Populations
$\sigma_1 = \sigma_2$, both unknown
In most cases the variances are unknown.
$$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t_{n_1 + n_2 - 2}$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \text{pooled variance}$$
$\bar X_1$ = mean of the sample taken from population 1
$s_1^2$ = variance of the sample taken from population 1
$n_1$ = size of the sample taken from population 1
$\bar X_2$ = mean of the sample taken from population 2
$s_2^2$ = variance of the sample taken from population 2
$n_2$ = size of the sample taken from population 2
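A sketch of the pooled t statistic, assuming equal population variances and $H_0: \mu_1 = \mu_2$, so the term $\mu_1 - \mu_2$ is zero. The function name `pooled_t` is hypothetical; `statistics.variance` uses the $n - 1$ divisor, matching $s_1^2$ and $s_2^2$ above. Purely for illustration it is applied to the Example 1.1 delivery times listed later in this chapter, even though the normality assumption is questionable for those data.

```python
import math
import statistics


def pooled_t(x1: list[float], x2: list[float]) -> tuple[float, int]:
    """Pooled two-sample t statistic under H0: mu1 = mu2, with its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)  # sample variance, divisor n1 - 1
    s2_sq = statistics.variance(x2)  # sample variance, divisor n2 - 1
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / ((n1 - 1) + (n2 - 1))
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / se  # mu1 - mu2 = 0 under H0
    return t, n1 + n2 - 2


# Delivery times in minutes (Example 1.1)
local = [16.8, 18.1, 11.7, 14.1, 15.6, 21.8, 16.7, 13.9, 17.5, 20.8]
chain = [22.0, 19.5, 15.2, 17.0, 18.7, 19.5, 15.6, 16.5, 20.8, 24.0]
t, df = pooled_t(local, chain)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

For these data the statistic is about −1.634 on 18 degrees of freedom; whether that value can be trusted is exactly the normality question raised next.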



If the data are not normally distributed, the t-statistic no longer follows a t distribution; its distribution depends on the distributions of the populations, which are usually unknown.
There is a huge variety of possible underlying distributions.
Can we have tests that are distribution-free?
Nonparametric tests are one such kind of test.
Example 1.1 Delivery times


A local pizza restaurant located close to a college campus advertises that its delivery time to a college dormitory is shorter than that of a local branch of a national pizza chain.
To determine whether this advertisement is valid, you and some friends have decided to order 10 pizzas from the local pizza restaurant and 10 pizzas from the national chain, all at different times. The delivery times in minutes (PIZZATIME) are shown.
Testing for the difference in the mean delivery times (minutes)
Local:  16.8  18.1  11.7  14.1  15.6  21.8  16.7  13.9  17.5  20.8
Chain:  22.0  19.5  15.2  17.0  18.7  19.5  15.6  16.5  20.8  24.0


We could use the t-test for this comparison if the delivery times were normally distributed.
Since the delivery times are not normally distributed, the t-test may not be appropriate.

Alternatively, we can compare the two restaurants delivery by delivery, recording “+” when the local restaurant's time is shorter and “-” when it is longer:
Local:   16.8  11.7  15.6  16.7  17.5  18.1  14.1  21.8  13.9  20.8
Chain:   22.0  15.2  18.7  15.6  20.8  19.5  17.0  19.5  16.5  24.0
Result:  +     +     +     -     +     +     +     -     +     +
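The sign bookkeeping above can be sketched in a few lines of Python. The lists copy the paired delivery times in the order of the table; "+" means the local restaurant was faster, as in the table, and dropping ties ("0") before counting is the usual sign-test practice (there are none here). Variable names are our own.

```python
# Paired delivery times (minutes) from Example 1.1, in the order of the table above
local = [16.8, 11.7, 15.6, 16.7, 17.5, 18.1, 14.1, 21.8, 13.9, 20.8]
chain = [22.0, 15.2, 18.7, 15.6, 20.8, 19.5, 17.0, 19.5, 16.5, 24.0]

# "+" if the local pizza arrived sooner, "-" if later, "0" for a tie
signs = ["+" if a < b else "-" if a > b else "0" for a, b in zip(local, chain)]

T = signs.count("+")               # number of "+" signs
n = len(signs) - signs.count("0")  # ties are dropped before testing
print(" ".join(signs))
print(f"T = {T}, n = {n}")
```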
1.2 Sign Test




If the two restaurants have the same distribution of delivery times, each comparison is equally likely to give “+” or “-”.
The number of “+” signs, denoted by T, then follows the binomial distribution B(10;0.5), with n = 10 comparisons and p = 0.5.
The number of “-” signs also follows the binomial distribution B(10;0.5).
T = 8 in this example.
Review: Binomial Distribution


A. Bernoulli trials
A trial with only two outcomes (yes or no, success or failure, boy or girl, win or loss, 1 or 0), with probabilities p and 1 − p respectively, is called a Bernoulli trial.
B. Several Bernoulli trials
Let X be the number of successes in n independent, identical Bernoulli trials. The random variable X is said to follow a binomial distribution B(n;p).

C. Binomial probability distribution
The probability of X = k is given by
$$P(X = k) = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$$
where
P(X = k) = probability of k successes given n and p
n = number of observations
p = probability of success
1 − p = probability of failure
k = number of successes in the sample (k = 0, 1, ..., n)
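A direct transcription of the probability formula, as a sketch: `math.comb(n, k)` (Python 3.8+) computes $n!/(k!(n-k)!)$, and the helper name `binom_pmf` is our own. The check that the probabilities sum to 1 is only a sanity test.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n; p), i.e. n!/(k!(n-k)!) * p**k * (1-p)**(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# P(X = 8) for n = 10, p = 0.5, as used in Example 1.1
print(binom_pmf(8, 10, 0.5))  # 45/1024 = 0.0439453125

# Sanity check: the probabilities for k = 0, 1, ..., n add up to 1
print(sum(binom_pmf(k, 10, 0.5) for k in range(11)))
```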
1.2 Sign Test: Example 1.1
One-tailed test
$H_0: p = 0.5$ vs $H_1: p > 0.5$
$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.055$
$p\text{-value} = P(X \ge 8) = 0.055$
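SPSS result:
Binomial Test
                   Category   N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
result   Group 1   1.00       8    .80              .50          .109
         Group 2   .00        2    .20
         Total                10   1.00
SPSS reports the two-tailed significance; because B(10;0.5) is symmetric, the one-tailed p-value is half of it, .109/2 ≈ .055.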
1.2 Sign Test: Example 1.1
Two-tailed test
$H_0: p = 0.5$ vs $H_1: p \neq 0.5$
$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.055$
$p\text{-value} = 2P(X \ge 8) = 0.11$
SPSS result: the Binomial Test output is identical to the one shown for the one-tailed test; its Exact Sig. (2-tailed) of .109 is this two-tailed p-value.
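Both p-values for Example 1.1 can be reproduced exactly, as a sketch, by summing binomial probabilities under $H_0: p = 0.5$. The function `sign_test_exact` and its `alternative` labels are hypothetical names; for the two-tailed case it doubles the smaller tail, which is valid because B(n;0.5) is symmetric.

```python
import math


def sign_test_exact(t: int, n: int, alternative: str = "two-sided") -> float:
    """Exact sign-test p-value, with T ~ B(n; 0.5) under H0.

    alternative: "greater" (H1: p > 0.5), "less" (H1: p < 0.5) or "two-sided".
    """
    pmf = [math.comb(n, k) / 2**n for k in range(n + 1)]
    upper = sum(pmf[t:])       # P(T >= t)
    lower = sum(pmf[: t + 1])  # P(T <= t)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    # Two-sided: double the smaller tail (B(n; 0.5) is symmetric), capped at 1
    return min(1.0, 2 * min(upper, lower))


# Example 1.1: T = 8 "+" signs out of n = 10
print(round(sign_test_exact(8, 10, "greater"), 4))    # 0.0547
print(round(sign_test_exact(8, 10, "two-sided"), 4))  # 0.1094
```

These agree with the hand value 0.055 and with the SPSS exact significance .109.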
Example 1.2: Product Preference

An Italian restaurant close to a college campus was considering a new recipe for the sauce used on its pizza. A random sample of eight students was chosen, and each was asked to rate, on a scale from 1 to 10, the taste of the original sauce and of the proposed new one.
The scores from the taste comparison are:


We cannot use the t-test for these data because the scores are not normally distributed.
The statistic T, the number of “+” signs, follows B(7;0.5), because the score difference for case “G” is zero, so that case is dropped. This sample gives T = 2.
1.2 Sign Test: Example 1.2
One-tailed test
$H_0: p = 0.5$: there is no overall tendency to prefer one product to the other
$H_1: p < 0.5$: a majority prefer the new product (or fewer than 50% prefer the old product)
$p\text{-value} = P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.2266$
SPSS result:
Binomial Test
                     Category   N   Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
VAR00005   Group 1   .00        5   .71              .50          .453
           Group 2   1.00       2   .29
           Total                7   1.00
The one-tailed p-value is half of the reported two-tailed value: .453/2 ≈ .227.
1.2 Sign Test: Example 1.2
Two-tailed test
$H_0: p = 0.5$ vs $H_1: p \neq 0.5$
$p\text{-value} = 2P(X \le 2) = 2 \times 0.2266 = 0.4532$
Also, note that $p\text{-value} = P(X \le 2) + P(X \ge 5) = 0.4532$, by the symmetry of B(7;0.5).
SPSS result: the Binomial Test output is identical to the one shown for the one-tailed test; its Exact Sig. (2-tailed) of .453 is this two-tailed p-value.
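A short check of the two equivalent forms of the two-tailed p-value for Example 1.2, as a sketch using only the standard library (variable names are our own). Computed exactly, both forms give 58/128 = 0.453125; the 0.4532 above comes from doubling the rounded value 0.2266.

```python
import math

n, t = 7, 2  # Example 1.2: 7 students after dropping the tie, T = 2 "+" signs
pmf = [math.comb(n, k) / 2**n for k in range(n + 1)]  # B(7; 0.5) probabilities

p_lower = sum(pmf[: t + 1])                          # P(X <= 2), one-tailed
p_doubled = 2 * p_lower                              # 2 P(X <= 2)
p_two_tails = sum(pmf[: t + 1]) + sum(pmf[n - t :])  # P(X <= 2) + P(X >= 5)

print(round(p_lower, 4), round(p_doubled, 4), round(p_two_tails, 4))
```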
Review: Binomial Distribution

C. Properties of the binomial distribution
The expectation of B(n;p) is $np$.
The variance of B(n;p) is $np(1-p)$.
The standard deviation of B(n;p) is $\sqrt{np(1-p)}$.
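The three properties can be checked numerically by summing over the probability function, as a sketch; n = 10 and p = 0.5 are arbitrary illustrative values, not taken from the text.

```python
import math

n, p = 10, 0.5  # illustrative values
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))               # E(X) = sum of k P(X = k)
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # sum of (k - E X)^2 P(X = k)

print(mean, n * p)                                 # both equal np
print(var, n * p * (1 - p))                        # both equal np(1 - p)
print(math.sqrt(var), math.sqrt(n * p * (1 - p)))  # standard deviation
```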
D. Normal Approximation (Section 6.4 of the book)
If $X \sim B(n; p)$, then
$$P(a \le X \le b) = P\left(\frac{a - np}{\sqrt{np(1-p)}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{b - np}{\sqrt{np(1-p)}}\right)$$
$$\approx P\left(\frac{a - np}{\sqrt{np(1-p)}} \le Z \le \frac{b - np}{\sqrt{np(1-p)}}\right) = \Phi\left(\frac{b - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a - np}{\sqrt{np(1-p)}}\right)$$
where $\Phi(\cdot)$ is the distribution function of N(0,1).
Example 1.3 Customer Sales
(Example 6.8, p. 213)
A saleswoman makes initial telephone contact with potential customers in an effort to assess whether a follow-up visit to their homes is likely to be worthwhile. Her experience suggests that 40% of the initial contacts lead to follow-up visits. If she contacts 100 people by telephone, what is the probability that between 45 and 50 home visits will result?
Solution to Example 1.3: Customer Sales
Solution: Let X be the number of follow-up visits. Then X has a binomial distribution with n = 100 and p = 0.40. Approximating the required probability gives
$$P(45 \le X \le 50) \approx P\left(\frac{45 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \le Z \le \frac{50 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}}\right)$$
$$= P(1.02 \le Z \le 2.04) = \Phi(2.04) - \Phi(1.02) = 0.9793 - 0.8461 = 0.1332$$
This probability is shown as an area under the standard normal curve below.
[Figure: the probability as an area under the standard normal curve; horizontal axis: Number of Successes]
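The uncorrected approximation can be sketched as a small function. `normal_cdf` builds $\Phi$ from `math.erf`, and `binom_normal_approx` is a hypothetical name; because it does not round z to two decimals as a table lookup does, its result can differ from the hand calculation in the fourth decimal place.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_normal_approx(a: int, b: int, n: int, p: float) -> float:
    """Approximate P(a <= X <= b) for X ~ B(n; p), without continuity correction."""
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((b - mu) / sd) - normal_cdf((a - mu) / sd)


# Example 1.3: n = 100 calls, p = 0.40, between 45 and 50 home visits
approx = binom_normal_approx(45, 50, 100, 0.40)
print(f"{approx:.4f}")
```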
The continuity correction
Since the binomial distribution is discrete and the normal distribution is continuous, it is common practice to use a continuity correction in the approximation:
$$P(a \le X \le b) \approx \Phi\left(\frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}}\right)$$


Return to Example 1.3
$$P(45 \le X \le 50) \approx \Phi\left(\frac{50 + 0.5 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}}\right) - \Phi\left(\frac{45 - 0.5 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}}\right)$$
$$= \Phi(2.14) - \Phi(0.92) = 0.9838 - 0.8212 = 0.1626$$
1.2 Sign test: normal approximation
Under $H_0$, $\mu = np = 0.5n$ and $\sigma = \sqrt{np(1-p)} = 0.5\sqrt{n}$.
The approximate test statistic is
$$Z = \frac{T^* - \mu}{\sigma} = \frac{T^* - 0.5n}{0.5\sqrt{n}}$$
where $T^*$ is $T$ corrected for continuity, defined as follows:
a. For a two-tail test: $T^* = T + 0.5$ if $T < \mu$, and $T^* = T - 0.5$ if $T > \mu$
b. For an upper-tail test: $T^* = T - 0.5$
c. For a lower-tail test: $T^* = T + 0.5$
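The three continuity-correction rules can be collected into one function, as a sketch; `sign_test_normal` and its `alternative` labels are hypothetical names, and $\Phi$ again comes from `math.erf`. Applied to Example 1.4 (T = 40, n = 96, two-tailed) it gives Z ≈ −1.53 and a p-value of about 0.126.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sign_test_normal(t: int, n: int, alternative: str = "two-sided") -> tuple[float, float]:
    """Normal approximation to the sign test, using the continuity-corrected T*."""
    mu, sigma = 0.5 * n, 0.5 * math.sqrt(n)
    if alternative == "greater":  # upper-tail test
        t_star = t - 0.5
    elif alternative == "less":   # lower-tail test
        t_star = t + 0.5
    elif t < mu:                  # two-tailed test, T below the mean
        t_star = t + 0.5
    elif t > mu:                  # two-tailed test, T above the mean
        t_star = t - 0.5
    else:
        t_star = t
    z = (t_star - mu) / sigma
    if alternative == "greater":
        p = 1.0 - normal_cdf(z)
    elif alternative == "less":
        p = normal_cdf(z)
    else:
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p


z, p = sign_test_normal(40, 96)  # Example 1.4
print(f"Z = {z:.2f}, p-value = {p:.3f}")
```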
Example 1.4 Ice Cream
Solution:
Use the normal approximation equations:
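Here n = 96 and T = 40, so $\mu = 0.5 \times 96 = 48$ and $\sigma = 0.5\sqrt{96} = 4.899$. Since $40 < 48$, $T^* = T + 0.5 = 40.5$ and
$$Z = \frac{T^* - \mu}{\sigma} = \frac{40.5 - 48}{4.899} = -1.53$$
$p\text{-value} = 2 \times 0.0630 = 0.126$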
The SPSS output:
Binomial Test
                     Category   N    Observed Prop.   Test Prop.   Asymp. Sig. (2-tailed)
VAR00002   Group 1   56.00      56   .58              .50          .125(a)
           Group 2   40.00      40   .42
           Total                96   1.00
a. Based on Z Approximation.
1.3 Sign test for single population median
Example 1.5
The dean of the School of Business Administration at a
particular university would like information about the starting
incomes of recent college graduates. A random sample of 23
recent graduates indicated the following starting salaries:
29250 29900 28070 31400 31100 29000 33000 50000 28500 31000
34800 42100 33200 36000 65800 34000 29900 32000 31500 29900
32890 36000 35000
Do the data indicate that the median starting income differs
from $35000?
Solution: H0: Median = $35,000 vs H1: Median ≠ $35,000
Solution to Example 1.5

Since the distribution of incomes is often skewed, the sign test is recommended. If H0 is true, each income has a one-half chance of being greater than $35,000.
Let T be the number of incomes below $35,000 (counting the incomes above it instead, T = 5, gives the same |Z|).
One observation equals $35,000 and is dropped, so n = 23 − 1 = 22, and T = 17.
μ = np = 0.5n = 0.5 × 22 = 11
σ = 0.5√22 = 2.345
Since T > μ in this two-tailed test, T* = T − 0.5, so
Z = (T − 0.5 − 11) / 2.345 = 2.35
p-value = 2 × 0.0094 = 0.0188
SPSS output to Example 1.5
Binomial Test
                     Category    N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
VAR00001   Group 1   <= 35000    17   .77              .50          .017
           Group 2   > 35000     5    .23
           Total                 22   1.00
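As a sketch, Example 1.5 can be run end to end from the raw salaries: drop the value equal to $35,000, count the incomes below it, then compute both the continuity-corrected normal approximation and the exact two-tailed binomial p-value. The exact value (about 0.0169) is consistent with the SPSS figure .017, and the approximation (about 0.019) with the hand value 0.0188. Variable names are our own.

```python
import math

salaries = [29250, 29900, 28070, 31400, 31100, 29000, 33000, 50000, 28500, 31000,
            34800, 42100, 33200, 36000, 65800, 34000, 29900, 32000, 31500, 29900,
            32890, 36000, 35000]
m0 = 35000  # hypothesized median

below = sum(s < m0 for s in salaries)
above = sum(s > m0 for s in salaries)
n = below + above  # observations equal to m0 are dropped
T = below

# Normal approximation, two-tailed, with the continuity correction
mu, sigma = 0.5 * n, 0.5 * math.sqrt(n)
t_star = T - 0.5 if T > mu else T + 0.5 if T < mu else T
z = (t_star - mu) / sigma
p_approx = 1.0 - math.erf(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

# Exact two-tailed p-value: double the tail at or beyond the larger count under B(n; 0.5)
k_max = max(below, above)
p_exact = min(1.0, 2 * sum(math.comb(n, k) for k in range(k_max, n + 1)) / 2**n)

print(f"n = {n}, T = {T}, Z = {z:.2f}, approx p = {p_approx:.4f}, exact p = {p_exact:.4f}")
```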
1.4 Wilcoxon Rank Sum Test

Test of whether two populations are identical
Take a sample of size $n_1$ from the first population, with distribution function $F_1(x)$, and a sample of size $n_2$ from the second population, with distribution function $F_2(x)$.
We want to test
$H_0: F_1 = F_2$ vs $H_1: F_1 \neq F_2$


The sign test does not use all the information in the data set.
In Example 1.1 the sign test records only which delivery was faster and ignores by how much. The Wilcoxon rank sum test provides a way to incorporate information about the magnitude of the differences between the two populations.


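The two samples are pooled and sorted in ascending order, and each observation receives its rank in the pooled sample; tied observations share the average of the ranks they occupy.
Let T denote the sum of the ranks of the observations from the first population.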
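The ranking step can be sketched as follows; `midranks` and `rank_sum` are hypothetical helper names. Tied values receive the average of the positions they occupy, which matches the ranks 5.5 and 16.5 used for the tied Local times in the example.

```python
def midranks(values: list[float]) -> list[float]:
    """Rank values from 1 to N in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum(x: list[float], y: list[float]) -> float:
    """Wilcoxon rank-sum statistic T: total pooled rank of the observations in x."""
    ranks = midranks(x + y)
    return sum(ranks[: len(x)])


# Delivery times (minutes) from Example 1.1
local = [16.8, 18.1, 11.7, 14.1, 15.6, 21.8, 16.7, 13.9, 17.5, 20.8]
chain = [22.0, 19.5, 15.2, 17.0, 18.7, 19.5, 15.6, 16.5, 20.8, 24.0]
print(rank_sum(local, chain))  # 86.0
```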
Wilcoxon Rank Sum Test: Example 1.1
Sort the Local data
11.7, 13.9, 14.1, 15.6, 16.7, 16.8, 17.5, 18.1, 20.8, 21.8
Sort the Chain data
15.2, 15.6, 16.5, 17.0, 18.7, 19.5, 19.5, 20.8, 22.0, 24.0
Sort the mixed data
Rank   1     2     3     4     5     6     7     8     9     10
Local  11.7  13.9  14.1        15.6              16.7  16.8
Chain                    15.2        15.6  16.5              17.0

Rank   11    12    13    14    15    16    17    18    19    20
Local  17.5  18.1                    20.8        21.8
Chain              18.7  19.5  19.5        20.8        22.0  24.0
Tied values share the average of their positions: 15.6 receives rank 5.5, 19.5 receives 14.5, and 20.8 receives 16.5.

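Sum of the ranks
$T_{local} = 1 + 2 + 3 + 5.5 + 8 + 9 + 11 + 12 + 16.5 + 18 = 86$
Test statistic
$T = T_{local} = 86$
Normal approximation
$$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{10(10 + 10 + 1)}{2} = 105$$
$$Var(T) = \frac{n_1 n_2(n_1 + n_2 + 1)}{12} = \frac{10 \times 10(10 + 10 + 1)}{12} = 175$$
$$Z = \frac{86 - 105}{\sqrt{175}} = -1.4363$$
$p\text{-value} = 2 \times 0.0755 = 0.151$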
SPSS output to Example 1.1
Ranks
        group   N    Mean Rank   Sum of Ranks
time    local   10   8.60        86.00
        chain   10   12.40       124.00
        Total   20
Test Statistics(b)
                                   time
Mann-Whitney U                     31.000
Wilcoxon W                         86.000
Z                                  -1.438
Asymp. Sig. (2-tailed)             .150
Exact Sig. [2*(1-tailed Sig.)]     .165(a)
a. Not corrected for ties.
b. Grouping Variable: group
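A sketch of the normal approximation for the rank-sum statistic, reproducing Z = −1.4363 and p ≈ 0.151. It also computes the commonly used tie-adjusted variance, $Var(T) = \frac{n_1 n_2}{12}\left[(N + 1) - \frac{\sum (t^3 - t)}{N(N - 1)}\right]$ with $N = n_1 + n_2$ and $t$ the size of each group of tied values; with the three tied pairs (15.6, 19.5, 20.8) this gives Z ≈ −1.438, which appears to be the adjustment behind the SPSS value. The tie sizes are typed in by hand here rather than detected from the data, and the variable names are our own.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n1, n2, T = 10, 10, 86.0  # Example 1.1: Local size, Chain size, Local rank sum
N = n1 + n2

mean_T = n1 * (N + 1) / 2       # E(T) = 105
var_T = n1 * n2 * (N + 1) / 12  # Var(T) = 175
z = (T - mean_T) / math.sqrt(var_T)
p = 2.0 * normal_cdf(-abs(z))

# Tie-adjusted variance: three tied pairs (15.6, 19.5, 20.8), each of size 2
tie_sizes = [2, 2, 2]
tie_term = sum(t**3 - t for t in tie_sizes) / (N * (N - 1))
var_T_ties = n1 * n2 / 12 * ((N + 1) - tie_term)
z_ties = (T - mean_T) / math.sqrt(var_T_ties)

print(f"Z = {z:.4f}, p-value = {p:.3f}, tie-adjusted Z = {z_ties:.3f}")
```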
Example 1.6
Solution:
$$E(T) = \frac{80(80 + 80 + 1)}{2} = 6440$$
$$Var(T) = \frac{80 \times 80(80 + 80 + 1)}{12} = 85867$$
$$Z = \frac{7287 - 6440}{\sqrt{85867}} = 2.89$$
$p\text{-value} = 2 \times 0.0019 = 0.0038$
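The same arithmetic for Example 1.6, as a sketch with a hypothetical helper `rank_sum_z`; only $n_1 = n_2 = 80$ and $T = 7287$ are taken from the solution, since the data themselves are not reproduced here, and no tie adjustment is applied.

```python
import math


def rank_sum_z(T: float, n1: int, n2: int) -> tuple[float, float, float]:
    """Return (Z, E(T), Var(T)) for the rank-sum normal approximation, without tie adjustment."""
    N = n1 + n2
    mean_T = n1 * (N + 1) / 2
    var_T = n1 * n2 * (N + 1) / 12
    return (T - mean_T) / math.sqrt(var_T), mean_T, var_T


z, mean_T, var_T = rank_sum_z(7287, 80, 80)
p = 1.0 - math.erf(abs(z) / math.sqrt(2.0))  # two-tailed p-value, 2 * (1 - Phi(|z|))
print(f"E(T) = {mean_T:.0f}, Var(T) = {var_T:.0f}, Z = {z:.2f}, p-value = {p:.4f}")
```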