One-way nonparametric ANOVA
with trigonometric scores
by Kravchuk, O.Y.
School of Land and Food Sciences,
University of Queensland
Inspired by the simplicity of the Kruskal-Wallis k-sample procedure, we introduce a new rank test of the χ2 type that allows one to work with data that violate the normality assumption, being unimodal and symmetric but heavier-tailed than the normal. This type of non-normality is common in biometrical applications and also describes the distribution of the log-transformed Cauchy data. The distribution of the test statistic corresponds to the distribution of the first component of the well-known Cramér-von Mises test statistic.
The test is asymptotically most efficient for the hyperbolic secant distribution, which is compared to the normal and logistic distributions in the diagram below.
f_{HSD}(y) = \frac{1}{2}\,\operatorname{sech}\!\left(\frac{\pi y}{2}\right),
\qquad
f_{L}(y) = \frac{1}{4}\,\operatorname{sech}^{2}\!\left(\frac{y}{2}\right),
\qquad
f_{N}(y) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{y^{2}}{2}\right)

Fig. 1: Standardised normal, hyperbolic secant and logistic densities
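The three densities can also be compared numerically; the following is a minimal plotting sketch of the formulas above (numpy and matplotlib assumed available), not the code behind the original figure.

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.linspace(-4, 4, 401)
densities = {
    "normal": np.exp(-y**2 / 2) / np.sqrt(2 * np.pi),       # f_N(y)
    "hyperbolic secant": 0.5 / np.cosh(np.pi * y / 2),       # f_HSD(y), sech = 1/cosh
    "logistic": 0.25 / np.cosh(y / 2) ** 2,                  # f_L(y)
}
for name, f in densities.items():
    plt.plot(y, f, label=name)    # heavier tails are visible away from the centre
plt.legend()
plt.title("Normal, hyperbolic secant and logistic densities")
plt.show()
```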
The test is a one-way rank-based ANOVA, where we assume that within the k treatments the populations are continuous, belong to the same location family and may differ in the location parameter only. There are N experimental units, where the jth treatment accumulates n_j units.
The test statistic is built on k “bridges” corresponding to k linear contrasts of the type T_1 = (A_1 vs. A_2, A_3). The asymptotic distribution of the test statistic is the χ2 with k-1 degrees of freedom. Computationally, the exact distribution is easy to construct on the basis of k-1 orthogonal contrasts (for example, for k = 3, T_{1,2∪3} and T_{2,3}).
The test statistic and its components are

Q = \sum_{j=1}^{k} S_j^{2}\left(1-\frac{n_j}{N}\right),
\qquad
S_j = \frac{\sqrt{2}\,\pi}{N}\,\frac{\sin\!\left(\pi/(2N)\right)}{\pi/(2N)}
      \sum_{i=1}^{N}\sin\!\left(\frac{\pi i}{N}\right) D_{j,i},
\qquad
D_{j,i} = \sum_{m=1}^{i} c_m,

where

c_i =
\begin{cases}
\dfrac{N-n_j}{\sqrt{N\,n_j(N-n_j)}}, & \text{if the observation of pooled rank } i \text{ belongs to the } j\text{th treatment},\\[1.5ex]
-\dfrac{n_j}{\sqrt{N\,n_j(N-n_j)}}, & \text{otherwise},
\end{cases}
\qquad
Q \xrightarrow{\;d\;} \chi^{2}_{k-1}.
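A minimal computational sketch of the statistic as reconstructed above, assuming continuous (untied) observations; the function name trig_anova_Q and all helper names are illustrative and are not taken from the original.

```python
import numpy as np

def trig_anova_Q(samples):
    """Return (Q, S): S[j] is the first Fourier sine coefficient of the j-th
    rank bridge, and Q = sum_j S[j]**2 * (1 - n_j/N)."""
    pooled = np.concatenate(samples)
    N = pooled.size
    labels = np.concatenate([np.full(s.size, j) for j, s in enumerate(samples)])
    sorted_labels = labels[np.argsort(pooled)]      # treatment of the i-th smallest obs.
    i = np.arange(1, N + 1)
    weights = np.sin(np.pi * i / N)                 # first sine harmonic
    corr = np.sin(np.pi / (2 * N)) / (np.pi / (2 * N))  # finite-sample correction factor
    S = np.empty(len(samples))
    Q = 0.0
    for j, s in enumerate(samples):
        nj = s.size
        # bridge increments: positive when rank i belongs to treatment j, negative otherwise
        c = np.where(sorted_labels == j, N - nj, -nj) / np.sqrt(N * nj * (N - nj))
        D = np.cumsum(c)                            # the j-th "bridge"; D[-1] = 0
        S[j] = np.sqrt(2.0) * np.pi / N * corr * np.sum(weights * D)
        Q += S[j] ** 2 * (1.0 - nj / N)
    return Q, S
```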
For small samples (max(n_j) < 6), the chi-square approximation is more conservative than the exact null distribution. The table below gives the exact distribution for n_1 = 3, n_2 = n_3 = 2.
Q = q    P(Q ≤ q)
3.172    0.810
3.511    0.848
3.525    0.867
4.039    0.886
4.311    0.905
4.545    0.924
4.799    0.943
5.238    0.962
5.647    0.981
7.359    1.000
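For such small designs the exact null distribution can be obtained by enumerating all equally likely assignments of the pooled ranks to the treatments. The brute-force sketch below (names illustrative) does this for n = (3, 2, 2); since the normalisation above is a reconstruction, its output may differ slightly from the table.

```python
from itertools import permutations
import numpy as np

def Q_from_ranks(sorted_labels, n_sizes):
    """Q computed from the treatment label of each pooled rank i = 1..N."""
    N = len(sorted_labels)
    i = np.arange(1, N + 1)
    weights = np.sin(np.pi * i / N)
    corr = np.sin(np.pi / (2 * N)) / (np.pi / (2 * N))
    Q = 0.0
    for j, nj in enumerate(n_sizes):
        c = np.where(sorted_labels == j, N - nj, -nj) / np.sqrt(N * nj * (N - nj))
        D = np.cumsum(c)
        S = np.sqrt(2.0) * np.pi / N * corr * np.sum(weights * D)
        Q += S ** 2 * (1.0 - nj / N)
    return Q

def exact_null(n_sizes=(3, 2, 2)):
    labels = np.repeat(np.arange(len(n_sizes)), n_sizes).tolist()
    # each distinct arrangement of labels over the ranks is equally likely under H0
    qs = np.sort([Q_from_ranks(np.array(p), n_sizes)
                  for p in set(permutations(labels))])
    values, counts = np.unique(np.round(qs, 3), return_counts=True)
    return list(zip(values, np.cumsum(counts) / counts.sum()))  # pairs (q, P(Q <= q))
```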
We illustrate the method by an artificial example of
three normal populations different in location only.
The populations are, correspondingly, N(0,1),
N(-1,1), N(2,1). Random samples of size 8 are taken
from these populations.
[Figure: the three samples from N(0,1), N(-1,1) and N(2,1)]
One-way ANOVA: F = 18.07, p = 0.000
Kruskal-Wallis: KW = 14.11, p = 0.001
Trigonometric ANOVA: Q = 14.32, p = 0.001
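The comparison can be reproduced in outline (with regenerated data of the same design, not the poster's exact samples) using SciPy for the classical tests and the trig_anova_Q sketch from above; all names here are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = [rng.normal(loc, 1.0, size=8) for loc in (0.0, -1.0, 2.0)]

F, p_F = stats.f_oneway(*samples)                   # classical one-way ANOVA
H, p_H = stats.kruskal(*samples)                    # Kruskal-Wallis
Q, S = trig_anova_Q(samples)                        # sketch defined earlier
p_Q = stats.chi2.sf(Q, df=len(samples) - 1)         # asymptotic chi-square p-value

print(f"one-way ANOVA:       F = {F:.2f}, p = {p_F:.3f}")
print(f"Kruskal-Wallis:      H = {H:.2f}, p = {p_H:.3f}")
print(f"trigonometric ANOVA: Q = {Q:.2f}, p = {p_Q:.3f}")
```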
Illustrating the procedure… When there is a certain linear trend among the treatments, the corresponding bridge tends to have a U-shape. We measure the strength of such a tendency by the first coefficient of the Fourier sine-decomposition of the bridge. In this example, S_1 = 1.17, S_2 = 2.58, S_3 = -3.75.
The larger the sample sizes, the smoother the
bridge. The actual shape of the bridge depends on
the difference in location as well as on the
distributions of the underlying populations. If the
difference is large, the shape is strictly triangular
regardless of the underlying distribution and the
median k-sample test works well.
If the difference in location is small, for symmetric,
unimodal distributions, the shape of the bridges is
determined by the tails of the distributions.
Distribution    Normal    Logistic    Hyperbolic secant
Efficiency      0.905     0.986       1.000
The difference in scale among several Cauchy distributions may be analysed by means of the current test. To illustrate such an application, we perform the following ANOVA on the log-transformed Cauchy populations: Cauchy(0,1), Cauchy(0,5) and Cauchy(0,2). The log-transformation of the absolute values of the data makes them more normal-like. However, the analysis of the residuals of one-way ANOVA shows a departure from normality. The test allows us to perform the formal analysis and detect the difference in scale. The Kruskal-Wallis test gives a similar conclusion.
The trigonometric ANOVA on log-transformed
Cauchy… Random samples of size 8 were taken
from the parent populations.
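Because log|sX| = log s + log|X|, a difference in Cauchy scale becomes a difference in location after the transformation, which is what the rank test detects. A sketch of this application under the same design (scales 1, 5 and 2, samples of size 8), again with illustrative names and regenerated rather than the poster's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
scales = (1.0, 5.0, 2.0)                             # Cauchy(0,1), Cauchy(0,5), Cauchy(0,2)
samples = [np.log(np.abs(s * rng.standard_cauchy(8))) for s in scales]

F, p_F = stats.f_oneway(*samples)
H, p_H = stats.kruskal(*samples)
Q, _ = trig_anova_Q(samples)                         # sketch defined earlier
p_Q = stats.chi2.sf(Q, df=len(samples) - 1)

print(f"one-way ANOVA:       F = {F:.2f}, p = {p_F:.3f}")
print(f"Kruskal-Wallis:      H = {H:.2f}, p = {p_H:.3f}")
print(f"trigonometric ANOVA: Q = {Q:.2f}, p = {p_Q:.3f}")
```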
[Figure: dotplots of Log(C(0,1)), Log(C(0,5)), Log(C(0,2)) and a normal probability plot of the one-way ANOVA residuals (RESI1); Average: -0.0000000, StDev: 1.34377, N: 24; Anderson-Darling normality test: A-squared = 0.631, p-value = 0.088]
One-way ANOVA: F = 5.78, p = 0.01
Trigonometric ANOVA: Q = 11.33, p = 0.003
Kruskal-Wallis: KW = 11.26, p = 0.004
The multiple comparisons and contrasts are to be
further developed for this test.
The two-way test with trigonometric scores is to be
investigated.
The performance of the test is to be compared to the k-sample Cramér-von Mises test.
Olena Kravchuk, LAFS, UQ
[email protected]
(07) 33652171