GG 313 Lecture 9
Nonparametric Tests
9/22/05
If we cannot assume that our data are at least
approximately normally distributed - because there are a
small number of elements in the sample, the distribution
is unknown, or the data are ordinal (can only be ranked),
then we must use non-parametric tests to evaluate
hypotheses. These tests do not use the standard
statistics of mean and standard deviation.
SIGN TEST for one-sample mean or median:
Robust replacement for the 1-sample t-test.
Requires that the sampled distribution is continuous and
symmetrical. If the population is not symmetrical, the test
applies to the median rather than the mean.
Sign test: We test whether our mean (or median) is
statistically indistinguishable from a hypothetical value. The
properties of the binomial distribution are invoked. The
question asked is: what is the probability of finding x
values out of n less than the mean (or median)? This follows
directly from eqn. 1.117.
EXAMPLE: Random salinity measurements (p. 44). We
hypothesize that the salinity is less than 98.5, so our null
hypothesis is that the salinity = 98.5. We COUNT the
number of values that are greater than 98.5, ignore all
values that equal 98.5, and find that there are two values
greater and one equal out of 15. So we want to know the
probability that 2 or fewer values out of 14 will be greater
than 98.5.
From equation 1.117:

P(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad \text{where} \quad \binom{n}{x} = \frac{n!}{x!\,(n-x)!}
In our case, p, the probability that any one value will fall
above (or below) 98.5, is 1/2, and we want to sum the
probabilities for 0, 1, and 2 occurrences of a value
greater than 98.5 (eqn. 2.31):
14  1 0
1
P   ( ) (1 )140
2
0  2
14 1 1
1 141 14 1 2
1
+  ( ) (1 )   ( ) (1 )142
2
2
1  2
2  2
0
14
1
13
14
14! 1   1 
14! 1 1 
14! 1 
P
  1  
   
   0.000061 0.0009  0.0056  .00656
0!14! 2   2  1!13! 2 2  2!12! 2 
This says that, if the true median salinity were 98.5, the
probability of observing so few values above 98.5 is only about
0.66%. Since this is less than 1%, we can reject H0 and conclude
that the salinity is less than 98.5.
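A quick way to check this sum (not part of the original lecture) is to
evaluate the binomial terms directly. The sketch below uses Python's
standard library, with n = 14 and x = 2 taken from the salinity example:

```python
from math import comb

# Sign test for the salinity example: n = 14 usable values (ties dropped),
# x = 2 values greater than the hypothesized median of 98.5, p = 1/2 under H0.
n, x, p = 14, 2, 0.5

# P(X <= x) for X ~ Binomial(n, p): sum the first x + 1 terms of eqn 1.117.
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

print(f"P(X <= {x}) = {p_value:.5f}")  # ~0.0065, below the 1% level, so reject H0
```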

If np and n(1-p) are both > 5, we can use the normal
approximation to the binomial distribution, and use
the z statistic:
z = \frac{x - np}{\sqrt{np(1-p)}} = \frac{2x - n}{\sqrt{n}} \quad (\text{the second form holds for } p = \tfrac{1}{2})

Eqn: 2.32
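As a sketch only (reusing n = 14 and x = 2 from the salinity example;
here np = n(1-p) = 7, so the rule of thumb is satisfied), eqn 2.32 can
be evaluated as follows. The approximate probability will differ
somewhat from the exact binomial value:

```python
from math import sqrt
from statistics import NormalDist

# Normal approximation to the binomial sign test (eqn 2.32), with p = 1/2.
n, x = 14, 2
z = (2 * x - n) / sqrt(n)      # same as (x - n*p) / sqrt(n*p*(1-p)) when p = 1/2

# One-sided probability of a count this far below the expected value n/2.
p_approx = NormalDist().cdf(z)
print(f"z = {z:.2f}, approximate P = {p_approx:.4f}")
```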
Mann-Whitney Test
This is a non-parametric alternative to the 2-sample t-test.
Matlab and others know it as the WILCOXON test.
The data from two samples are tested to see if they
come from the same population. The two samples are
combined, sorted, and then ranked from 1 to n1+n2.
If two or more values are the same, they each get the
average rank of that group. For example, if the 8th- and
9th-ranked samples have the same value, they both get
rank 8.5.
The expectation is that the values from each sample will
be scattered more or less uniformly in the rankings if
they come from the same population.
After ranking, we split the samples apart again and get
the rank sums, W1 and W2 for each sample. The sum of
the rank sums is:
W_1 + W_2 = \frac{1}{2}(n_1 + n_2)(n_1 + n_2 + 1)

Eqn: 2.33

This is the sum of the integers from 1 to n1 + n2.
We define the U statistic as:

U_1 = n_1 n_2 + \frac{1}{2} n_1 (n_1 + 1) - W_1

U_2 = n_1 n_2 + \frac{1}{2} n_2 (n_2 + 1) - W_2

Eqn: 2.34, 2.35
U is defined as the smaller of U1 and U2. U varies from 0 to
n1*n2, and it is symmetrical about n1*n2/2.
Our test is to compare U with the critical U, obtained from
a table. (I have not been able to find the equivalent in
Matlab). The table is presented on the next page:
Example: Grain size of lunar sands. Two samples were
taken at different parts of the moon. Do they come from
different populations?
Move to EXCEL demonstration.
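Since the lunar grain-size numbers are not reproduced in this
transcript, the sketch below uses made-up placeholder samples purely
to show the mechanics: pooled ranking with ties averaged, the rank
sums W1 and W2, and eqns 2.33-2.35. SciPy's mannwhitneyu is added
only as a cross-check; it is not the Excel/table procedure used in
class, and its reported U convention may differ from the one above.

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

# Hypothetical grain-size samples (placeholders, not the lecture's lunar data).
sample1 = np.array([0.12, 0.15, 0.11, 0.19, 0.16, 0.14])
sample2 = np.array([0.21, 0.18, 0.25, 0.22, 0.19, 0.24, 0.20])
n1, n2 = len(sample1), len(sample2)

# Pool and rank from 1 to n1+n2; rankdata averages the ranks of tied values.
ranks = rankdata(np.concatenate([sample1, sample2]))
W1, W2 = ranks[:n1].sum(), ranks[n1:].sum()
assert W1 + W2 == 0.5 * (n1 + n2) * (n1 + n2 + 1)   # eqn 2.33 as a sanity check

# U statistics (eqns 2.34 and 2.35); the test statistic is the smaller of the two.
U1 = n1 * n2 + 0.5 * n1 * (n1 + 1) - W1
U2 = n1 * n2 + 0.5 * n2 * (n2 + 1) - W2
U = min(U1, U2)
print(f"W1 = {W1}, W2 = {W2}, U = {U}")

# Cross-check: the p-value is comparable even if the reported U convention differs.
print(mannwhitneyu(sample1, sample2, alternative="two-sided"))
```

U would then be compared against the tabled critical value for
n1 = 6 and n2 = 7 at the chosen alpha.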
Kolmogorov-Smirnov Test
This test is used to test goodness of fit or shape and can
be used instead of the chi^2 test. With this test, you do
not have to bin the data. It simply needs the maximum
difference between two cumulative distribution curves.
Steps:
• Sort the data from smallest to largest.
• Convert the data distribution to a cumulative distribution
S(x). S(x) gives the fraction of the data points that lie to
the left of x; it is 0 below the smallest x and 1 at the
largest x.
• Plot the cumulative distribution along with the
comparison distribution.
• Find the maximum absolute difference.
Using Matlab function “kstest” or a table, we find the K-S
value for our alpha and n, and compare with the
maximum difference we observed.
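A sketch of these steps in Python (SciPy's kstest rather than the
Matlab function named above; the data and the comparison normal
distribution are assumed here just for illustration):

```python
import numpy as np
from scipy.stats import kstest, norm

# Placeholder data (not from the lecture); compare against an assumed normal.
rng = np.random.default_rng(0)
data = rng.normal(loc=35.0, scale=2.0, size=25)

# Manual version of the steps: sort, build the cumulative fraction S(x), and find
# the maximum absolute difference from the comparison CDF. S(x) jumps from
# (i-1)/n to i/n at each sorted point, so both sides of each jump are checked.
x = np.sort(data)
n = len(x)
F = norm.cdf(x, loc=35.0, scale=2.0)
i = np.arange(1, n + 1)
D_manual = max(np.max(i / n - F), np.max(F - (i - 1) / n))

# SciPy computes the same D statistic and a p-value in one call.
D, p = kstest(data, norm(loc=35.0, scale=2.0).cdf)
print(f"manual D = {D_manual:.4f}, scipy D = {D:.4f}, p = {p:.4f}")
```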
Non-parametric Correlation
Also called rank correlation or Spearman’s rank
correlation, rs.
• Rank the x and y values separately.
• Find the difference in rank, di, between the xi and yi pairs.
• Evaluate rs:

r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}

Eqn: 2.41
If the null hypothesis (no correlation) is true, the distribution
of rs has zero mean and a standard deviation of (n-1)^-0.5.
Since this is a normal distribution, we can use z-statistics:

z = \frac{r_s - 0}{\sigma_{r_s}} = \frac{r_s}{1/\sqrt{n - 1}} = r_s \sqrt{n - 1}

Eqn: 2.41
Example:
Using the dice throws from earlier (page 48):
