Choosing the
Appropriate Statistical Test
with your Computer
Volume One
Parametric/Non-Parametric Tests
P.Y Cheng
Preface
Volumes 1, 2 and 3 of 'Choosing the Appropriate Statistical Test with your Computer' are the
newest books I have written so far, and the characteristics of these new books are:
1. They try to help readers choose the appropriate statistical test for their data, which is a
very important starting point for their success!
2. As in previous books, we skip difficult theories and demonstrate the use of different
tests with clear and typical examples! We hope readers can follow the examples in this
book to choose the appropriate test for their data right away, even if they do not want to
learn too much about the underlying theories immediately!
3. For ordinal and interval data, we compare the parametric tests with their corresponding
non-parametric ones one by one, and show when we should use which of them.
4. For categorical data, we try to clarify the very similar and rather confusing types of testing.
For example, we remind readers not to confuse the number of criteria with the number of
samples. We also remind users that the calculators do not differentiate between a test for
independence and a test for homogeneity; only we humans know which one we are running!
5. As in other books published previously, we introduce free web calculators corresponding
to large statistical packages such as SPSS, so readers can still get the same results even
without the expensive packages!
Acknowledgments
First of all, I would like to thank Prof. C.M. Wong and Dr L.M. Ho, who gave me the
courage to start writing. They have been consultants for many staff of the University of Hong
Kong and are always helpful to any HKU staff who approach them with statistical problems!
I would also like to thank everybody who has contributed to the publishing of this book!
This includes the publishing company (not yet known) and the authors of the reference books
and internet material that helped me a great deal during the writing and checking process!
I would like to thank my friends and relatives who have been encouraging me during the
publishing of my books, especially my son (Andy) and my wife (Betty), who have shown their
patience and understanding while I concentrated on the production of this new book!
Lastly, I would like to thank, in advance, any future audience of this book and hope they can
find some useful material for helping them solve statistical problems. I also hope they can enjoy
reading this book in full color, with hundreds of brilliant pictures and many 'cookbook' examples!
Medical Faculty, The University of Hong Kong
Cheng Ping Yuen (Senior Technician)
Bachelor of Life Science (BSc), Napier University, UK
Master of Public Health (MPH), Hong Kong University
Certificate of Hong Kong Statistics Society (HKSS)
Fellow of the Royal Statistical Society (RSS)
Microsoft Certified Professional (MCP)
Hong Kong Registered Medical Technologist (Class I)
Phone: (852) 9800 7023 / 3917 9417
Email: [email protected]

Content

1.1 Some basic concepts
1.1.a Definition of Parametric and Non-Parametric Tests……………………...P. 1
1.1.b Type of measurements (Data Type)………………………………………P. 4
1.1.c Independence of samples………………………………………………….P. 5
1.1.d Numbers of samples……………………………………………………....P. 5
1.1.e Summary of Parametric & Non-Parametric Tests…………………………P. 6
1.2 The most important distributions for using Parametric Tests
1.2.a The Normal Distribution……………………….………………………...P. 7
1.2.a.i Distribution of Sample Means…………………………………..P. 8
1.2.a.ii The standard normal distribution from sample means………….P. 9
1.2.a.iii The t Distribution from sample means…………………………P.10
1.2.a.iv Testing of Hypothesis (Significance)…………………………..P.11
1.2.b The Binomial Distribution………………………………………………..P.15
1.2.c The Poisson Distribution………………………………………..……….P. 17
1.2.d Before giving up parametric tests………………………………………..P. 19
1.2.d.i The Central Limit Theorem…………………………………….P. 19
1.2.d.ii The Normal approximation to other distributions……………..P. 20
1.2.d.iii Robustness to deviation from distribution assumptions………P. 23
1.3 Running Parametric Tests vs
corresponding Non-Parametric Tests
1.3.a.i One Sample T-Test………………………………………………...…P. 24
1.3.a.ii Wilcoxon signed rank test
(corresponding to one sample t test)……………………………P. 32
1.3.b.i Two Samples T Test…………………………………………………..P. 38
1.3.b.ii Mann-Whitney Test for two independent samples
(Corresponding to parametric two independent samples t test)……..P. 49
1.3.c.i Paired T-Test………………………………………………………….P. 55
1.3.c.ii Wilcoxon matched-pairs signed-rank test for paired samples
(Corresponding to parametric paired t test)…………………………...P. 63
1.3.d.i One-way Anova: Completely Randomised Design
– Equal Sample Size ………………….P. 72
1.3.d.ii Non-parametric independent k samples
- Kruskal-Wallis one-way analysis of variance…………..P. 86
1.3.e.i Two Factors Anova (a X b factorial)…………………………………..P. 93
1.3.e.ii Non-parametric paired k samples - Friedman two-way
analysis of variance…………………………………..….P.103
1.4 Table Form Non-parametric Tests for Categorical Data
1.4.1 One sample Chi-square Test
1.4.1.a A Goodness of fit test with one sample…………………….P. 112
1.4.1.b A Test of Independence with one sample………….……….P. 116
1.4.1.b.i 2 x 2 contingency table for one sample…………….P. 116
1.4.1.b.ii Independence test for
k x k contingency table for one sample…………….P. 120
1.4.2 Two sample Chi-square Test
1.4.2.a Independent Samples
1.4.2.a.i Chi-square test for two samples……………………..P. 123
1.4.2.a.ii Chi-square test for k > 2 samples …………………..P. 127
1.4.2.b Dependent Samples
1.4.2.b.i McNemar Test for two dependent
(paired) samples…………..P. 135
1.4.2.b.ii Cochran Q Test for k dependent samples…………...P. 136
Appendix – Installation of free statistical software
- Tables
1.1 Some basic concepts
1.1.a Definition of Parametric vs Non-Parametric Tests
Parametric Tests
When applying a parametric test in statistics, we assume that the samples come from a
population with a well-known distribution, such as the Normal, Binomial or Poisson
Distribution, and especially the Normal Distribution. These assumptions support the theories
behind the parametric tests. If the population is too far away from the assumptions
(e.g. normality, equal variance), the test results might not be valid anymore!
They are called parametric tests because they are used for testing population parameters
such as the mean, proportion or variance, with the null hypothesis involving them!
The most famous parametric tests include the t test, the Anova test and linear regression.
Non-Parametric Tests
On the other hand, non-parametric tests do not need the population to have a well-known
distribution. They might only require the population to, for example, have a continuous
distribution, have a median, or be symmetric.
Unlike parametric tests, they are usually not used to test parameters of populations.
However, they can be used to test, for example, whether 2 samples come from the same
population, or whether a sample agrees with some theoretical frequencies.
Famous non-parametric tests include table tests such as the Chi-square test and the
contingency test, and rank-comparing tests such as the Wilcoxon rank sum test and the
Mann-Whitney test.
** Remark: If the conditions for running parametric tests are fulfilled (e.g. normality, equal
variance, data type…), please always prefer parametric tests to non-parametric
tests! Parametric tests utilize more information from the data, so the results are more
reliable and convincing, i.e. they are more powerful than the non-parametric ones!
1.1.b Type of measurement (Data Type)
Which type of distribution the population could follow is, in turn, often determined by
what type of measurement is being taken, and this affects the choice between running a
parametric or a non-parametric test!
Categorical:
Nominal, e.g. color: Red/Blue/Green
Ordinal, e.g. tastefulness rating (1 to 10)
Continuous:
Interval, e.g. height (cm)
1. For nominal data, e.g. color, gender etc., NO parametric test is available! Table-form
non-parametric tests such as the Chi-square test and the contingency test would be used instead!
2. For ordinal data, or for interval data for which well-known distributions cannot be assumed
(especially the normal distribution), rank-comparing tests such as the Wilcoxon signed rank test,
the Mann-Whitney test and the Kruskal-Wallis test would be used to test, for example, whether
the samples come from the same population!
3. For interval data fulfilling the population assumptions, e.g. normality, equal
variance etc., the more powerful parametric tests such as the t test, the Anova test and linear
regression should be used.
1.1.c Independence of samples
For BOTH parametric and non-parametric tests, the independence of the samples determines
which test should be used for hypothesis testing.
Independent samples – Observations relate to independent groups of individuals,
such as weight from boys and girls
Dependent samples - Each set of observations is made on the same groups of
individuals e.g. blood pressure before and after a certain
treatment. The observations are made on the same individuals
and usually represent the change over time.
1.1.d Numbers of samples
The number of samples and variables also determines which test should be used,
BOTH in the parametric and in the non-parametric group of tests!
This is because, for more than 2 samples, we cannot just repeat the 2-sample tests;
otherwise the risk of committing a type I error would increase rapidly!
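This inflation can be sketched numerically. Assuming m independent tests each run at level α, the family-wise error rate is 1 - (1 - α)^m; the group counts below are illustrative only:

```python
# Sketch: why repeating two-sample tests inflates the type I error.
# Assuming m independent tests each at level alpha, the family-wise error
# rate (chance of at least one false positive) is 1 - (1 - alpha)^m.
def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests."""
    return 1 - (1 - alpha) ** m

# With 4 groups there are 6 pairwise comparisons:
print(round(familywise_error(0.05, 1), 3))  # 0.05 for a single test
print(round(familywise_error(0.05, 6), 3))  # ~0.265 across 6 tests
```

This is why a single k-sample test (Anova, Kruskal-Wallis) is preferred over many repeated pairwise tests.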
1.1.e Summary of Parametric and Non-Parametric Tests

Tests for independent samples:

Parametric Tests (Interval) | Corresponding Non-parametric Tests (Ordinal/Interval) | Nominal (parametric tests n/a)
One Sample t test | One sample Wilcoxon signed rank test | One sample Chi-square test
Two Samples t test | Two samples Wilcoxon rank sum test / Mann-Whitney test | Two sample Chi-square test, relative risk, odds ratio
k samples: One way Anova | k samples: Kruskal-Wallis one way Anova | k sample Chi-square test
Linear Regression | Non-parametric Regression | n/a

Tests for dependent (paired) samples:

Parametric Tests (Interval) | Corresponding Non-parametric Tests (Ordinal/Interval) | Nominal (parametric tests n/a)
Two sample Paired t test | Two sample Wilcoxon matched-pairs signed-rank test (many people just call it the Wilcoxon signed-rank test, careful!) | Two samples McNemar Test
k samples: Two Way Anova | k samples: Friedman two-way Anova | k samples Cochran Q Test
1.2 The most important distributions for using parametric tests
1.2.a The Normal Distribution
The normal distribution is the most important distribution in statistics, not only because so many
natural phenomena (e.g. weight, height, class marks, IQ scores…) follow this distribution, but also
because it can be used for solving problems involving many other statistical distributions!
The probability density function of a normal distribution is:

f(x) = (1 / (σ√(2π))) e^(-(x - µ)² / (2σ²))

The curve is, thus, determined by two parameters: 1. the population mean µ,
and 2. the population standard deviation σ (or σ², the variance).
1.2.a.i Distribution of Sample Means
If we could always measure each individual of a population, e.g. the height of all children born in 1995
in the UK, then we might not need to run statistical tests to draw conclusions about them! However, it is
usually impossible, or too costly, to make such measurements! We usually take a sample from the
population, run a statistical test with the sample data, and rely on distribution and probability
theories to conclude whether to accept a hypothesis or not! This is also called making
inferences about the population using a sample.
The following is a sample of heights from the population (e.g. children born in the UK in 1995):
Imagine we could measure an infinite number of sample means (although we usually would not,
practically, do so) and plot the frequency histogram; we would get a distribution of sample means:
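This sampling experiment can be simulated. The population parameters below (mean 120 cm, SD 8 cm, samples of 25) are hypothetical stand-ins for the height example, not values from the text:

```python
import numpy as np

# Simulate the sampling distribution of the mean. Hypothetical population:
# mean 120 cm, SD 8 cm; repeatedly draw samples of n = 25.
rng = np.random.default_rng(0)
mu, sigma, n = 120.0, 8.0, 25
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

# The means cluster around mu with standard error sigma / sqrt(n) = 1.6.
print(round(sample_means.mean(), 1))  # ~120.0
print(round(sample_means.std(), 2))   # ~1.6
```

The spread of the sample means (the standard error) is much smaller than the spread of the raw population, which is exactly what the next two subsections exploit.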
1.2.a.ii The standard normal distribution from sample means
If the population is normal and the variance is known, then the random variable

z = (x̄ - µ) / (σ/√n)

is exactly standard normal (mean = 0, S.D. = 1), no matter how small the sample size is.
All four distributions above are normal distributions, but only the GREEN one is a
standard normal one, with µ = 0 and σ² = 1 (σ = 1)!
Where: x̄ is the mean of the sample
µ is the mean of the population
σ is the known standard deviation of the population
n is the sample size
s is the standard deviation calculated from the sample
1.2.a.iii The t Distribution from sample means
If the population is normal and the variance is unknown, the random variable

t = (x̄ - µ) / (s/√n)

has exactly a t-distribution (mean = 0, S.D. approaches 1 as n increases) with n - 1 degrees
of freedom, no matter how small the sample size is. Here s, the standard deviation calculated
from the sample, is used instead of the population standard deviation!
Please notice that when the underlying population is normal, we can apply the t-distribution for
statistical tests no matter how small the sample size (degrees of freedom) is!! A t-distribution is similar to
the z-distribution in that both are symmetric and bell-shaped, but its central peak is
lower and its two tails are higher! As df (n - 1) increases, it becomes more and more like a
z-distribution! At df = 120, we might say that there is almost no difference at all.
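This convergence can be checked directly with SciPy; the df values below are illustrative:

```python
from scipy import stats

# Two-tailed 5% critical values: t exceeds z for small df and approaches z
# as df grows (illustrative df values).
z_crit = stats.norm.ppf(0.975)
t_crit = {df: stats.t.ppf(0.975, df) for df in (5, 12, 24, 120)}
for df, c in t_crit.items():
    print(df, round(c, 3))   # 5: 2.571, 12: 2.179, 24: 2.064, 120: 1.98
print(round(z_crit, 3))      # 1.96
```

At df = 120 the t critical value (1.98) is already almost indistinguishable from the z value of 1.96.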
1.2.a.iv Testing of Hypothesis (Significance)
With the z distribution (standard normal distribution) and the t distribution, the areas under
which are well known (either from tables or by using a computer), we can then carry out
hypothesis testing, using samples to make inferences about the underlying population!
A probability of 0.05 is usually used as the critical probability for hypothesis testing!!
Z Distribution (z test):
(Figure: standard normal curve with a 2.5% rejection area in each tail.)
If the mean and the variance of a population are known, then we can run a normal test
with a sample using the z distribution (standard normal distribution).
For example, an education department wants to know whether the average mark of
students in Mathematics this year is the same as in past years (mean = 80 and S.D. = 5).
A random sample of 25 students is taken, and their marks are measured, with mean = 83.
Null hypothesis H0: Mean of this year = Mean of past years
Alternative hypothesis Ha: Mean of this year ≠ Mean of past years

z = (83 - 80) / (5/√25) = 3/1 = 3

z = 3 >> 1.96
The probability of getting z > 1.96 or z < -1.96 by chance alone (sampling error) = 0.05.
Thus the probability of getting such a high z value of 3 by chance alone is << 0.05!!
The sample mean is significantly different from the population mean used for the z test!
So we reject the null hypothesis that the average mark is the same as in past years!
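The same z test can be reproduced in a few lines, as a sketch using SciPy rather than a table:

```python
import math
from scipy import stats

# The z test above: mu0 = 80, known sigma = 5, n = 25, sample mean 83.
xbar, mu0, sigma, n = 83, 80, 5, 25
z = (xbar - mu0) / (sigma / math.sqrt(n))
p_two_sided = 2 * stats.norm.sf(abs(z))   # area in both tails beyond |z|
print(z)                      # 3.0
print(round(p_two_sided, 4))  # 0.0027 << 0.05, reject H0
```

Working with the exact p-value (0.0027) rather than the 1.96 cut-off gives the same decision but shows how strongly the data contradict H0.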
T Distribution (t test):
(Figure: t-distribution curves for df = ∞, df = 24 and df = 12, with two-tailed 5% critical
values ±2.064 for df = 24 and ±2.179 for df = 12.)
Suppose the S.D. of the underlying population of student marks in the above section
is unknown, and the sample standard deviation calculated from the sample data is 6 instead
of 5. Then:

t = (83 - 80) / (6/√25) = 3/1.2 = 2.5

t = 2.5 >> 2.064 (from table: df = 24, 5% probability of the same population mean)
The probability of getting t > 2.064 or t < -2.064 by chance (sampling error) = 0.05.
Thus the probability of getting such a high t value of 2.5 by chance alone is << 0.05!!
The sample mean is significantly different from the population mean used for the t test!
So we reject the null hypothesis that the average mark is the same as in past years!
(Please remember that, to make use of the t-distribution for the calculation of
probability above, we assume that the underlying population is normal, and the
t-distribution curves are valid no matter how small the sample size is!)
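The same calculation with the t-distribution, again as a SciPy sketch:

```python
import math
from scipy import stats

# Same example with sigma unknown: sample SD s = 6, so use t with df = n - 1 = 24.
xbar, mu0, s, n = 83, 80, 6, 25
t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(t_stat)                 # 2.5
print(round(p_two_sided, 3))  # ~0.02 < 0.05, reject H0
```

Note that the t-based p-value is larger than the z-based one for the same sized effect: the heavier tails of the t-distribution make it a little harder to reach significance.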
One-Tailed Test and Two-Tailed Test
(Figure: one-tailed vs two-tailed critical values for a 5% significance level, with the
significant and non-significant regions marked.)
In (a), we test whether the sample mean is greater than or smaller than the population mean,
such that the total probability of getting such a z value is 0.05, i.e. a critical value of 1.96 on
each side! The probability on each side is 0.025 only!
In (b), we just test whether the sample mean is greater than the population mean, such that the
probability of getting such a z value is 0.05, i.e. a critical value of 1.645 at the right-hand
extreme only!
This also implies that rejecting the null hypothesis (that there is no real difference in means)
is easier to achieve, with double the chance!! One-tailed significance = two-tailed significance
divided by 2!
As for the case in the graph above, the test for the difference between the sample mean and the
population mean is not significant in a two-tailed test (z = 1.8 < 1.96), but it is significant in
the one-tailed test (z = 1.8 > 1.645)!!
Type I Error and Type II Error
We might say that, by default, the type I error is the error we try to avoid first!
This is the error of saying that there is a difference between two groups while there is, in fact, none!
(Suppose having a difference is a crime; a type I error is the error of sentencing a person for having
committed the crime while he, in fact, has not!)
If we accept the null hypothesis that there is no difference, then we would not run the risk of
committing a type I error. However, we would then immediately be under the risk of
committing a type II error, i.e. saying that there is no difference while there is, in fact,
one! (Saying that a person has not committed a crime while he, in fact, has.)
(Figure: the critical t-value for rejecting the null hypothesis that there is no real difference
between the two groups (only one curve), the range of t-values that would make us commit a
type II error, and the power 1 - β.)
In the graph above, using α/2 as the critical point, we would not reject the null hypothesis that
there is no real difference between the 2 populations while t is less than 2! We accept the null
hypothesis since we do not want to commit a type I error (sentencing a person for a crime
while he is innocent)! We think there is only ONE curve (red) existing!!
However, if there is, in fact, a real difference between the two populations (two curves existing),
then we would have committed a type II error (letting the accused person go while he has
committed the crime)!! The range of t values that would make us commit such a mistake
is shown in the graph above! The probability that we would commit such an error is the area
represented by β! This probability depends on 'how different' the populations must be before
you would say there is a real difference. β is important for the calculation of 'power' (1 - β)
and the sample size N.
We will talk about the calculation of sample size and power in later sections!
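As a preview, β and power can be sketched for a simple two-sided z test; the values below reuse the marks example (σ = 5, n = 25) with an illustrative true shift of 3, which is an assumption, not a figure from the text:

```python
import math
from scipy import stats

# Sketch: type II error (beta) and power of a two-sided one-sample z test.
# Illustrative values: alpha = 0.05, sigma = 5, n = 25, true mean shift = 3.
alpha, sigma, n, shift = 0.05, 5, 25, 3
se = sigma / math.sqrt(n)
z_crit = stats.norm.ppf(1 - alpha / 2)

# beta = P(test statistic falls inside the acceptance region | true shift)
beta = stats.norm.cdf(z_crit - shift / se) - stats.norm.cdf(-z_crit - shift / se)
power = 1 - beta
print(round(beta, 3), round(power, 3))  # beta ~ 0.149, power ~ 0.851
```

A larger true shift, a larger n, or a smaller σ all shrink β and raise the power.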
1.2.b The Binomial Distribution
The binomial distribution describes the behavior of a count variable X if the following
conditions apply:
1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes ("success" or "failure").
4: The probability of "success" p is the same for each outcome.
If these conditions are met, then X has a binomial distribution with parameters n and p,
abbreviated B(n,p).
Example
Suppose individuals with a certain gene have a 0.70 probability of eventually contracting a certain
disease. If 100 individuals with the gene participate in a lifetime study, then the distribution of the
random variable describing the number of individuals who will contract the disease is distributed
B(100,0.7).
Note: The sampling distribution of a count variable is only well described by the binomial
distribution in cases where the population size is significantly larger than the sample size. As a
general rule, the binomial distribution should not be applied to observations from a simple
random sample (SRS) unless the population size is at least 10 times larger than the sample size.
To find probabilities from a binomial distribution, one may either calculate them directly, use a
binomial table, or use a computer. The number of sixes rolled by a single die in 20 rolls has a
B(20, 1/6) distribution. The probability of rolling more than 2 sixes in 20 rolls, P(X > 2), is equal to
1 - P(X ≤ 2) = 1 - (P(X=0) + P(X=1) + P(X=2)). Using the MINITAB command "cdf" with
subcommand "binomial n=20 p=0.166667" gives the cumulative distribution function as follows:
Binomial with n = 20 and p = 0.166667
x    P(X <= x)
0    0.0261
1    0.1304
2    0.3287
3    0.5665
4    0.7687
5    0.8982
6    0.9629
7    0.9887
8    0.9972
9    0.9994
The corresponding graphs for the probability density function and cumulative distribution function
for the B(20, 1/6) distribution are shown below:
Since the probability of 2 or fewer sixes is equal to 0.3287, the probability of rolling more than 2
sixes = 1 - 0.3287 = 0.6713.
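The MINITAB result can be checked with SciPy:

```python
from scipy import stats

# Check the MINITAB table: cumulative probability P(X <= 2) for B(20, 1/6).
p_le_2 = stats.binom.cdf(2, n=20, p=1/6)
print(round(p_le_2, 4))      # 0.3287
print(round(1 - p_le_2, 4))  # 0.6713 = P(more than 2 sixes)
```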
Mean and Variance of the Binomial Distribution
The binomial distribution for a random variable X with parameters n and p represents the sum of
n independent variables Z which may assume the values 0 or 1. If the probability that each Z
variable assumes the value 1 is equal to p, then the mean of each variable is equal to
1·p + 0·(1 - p) = p, and the variance is equal to p(1 - p). By the addition properties for
independent random variables, the mean and variance of the binomial distribution are equal to
the sum of the means and variances of the n independent Z variables, so

µ = np and σ² = np(1 - p)

These definitions are intuitively logical. Imagine, for example, 8 flips of a coin. If the coin is fair,
then p = 0.5. One would expect the mean number of heads to be half the flips, or np = 8*0.5 = 4.
The variance is equal to np(1-p) = 8*0.5*0.5 = 2.
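The coin example can be verified directly with SciPy's moment helper:

```python
from scipy import stats

# Verify the coin example: B(8, 0.5) has mean np = 4 and variance np(1-p) = 2.
mean, var = stats.binom.stats(n=8, p=0.5, moments='mv')
print(float(mean), float(var))  # 4.0 2.0
```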
1.2.c The Poisson Distribution
The Poisson distribution arises when you count a number of events across time or over an
area. You should think about the Poisson distribution for any situation that involves counting
events. Some examples are:
 the number of Emergency Department visits by an infant during the first year of life,
 the number of pollen spores that impact on a slide in a pollen counting machine,
 the number of incidents of apnea and bradycardia in a pre-term infant,
 the number of white blood cells found in a cubic centimeter of blood.
Sometimes, you will see the count represented as a rate, such as the number of deaths per year
due to horse kicks, or the number of defects per square yard.
The Poisson distribution depends on a single parameter λ. The probability that the Poisson random
variable equals k is

P(X = k) = e^(-λ) λ^k / k!

for any value of k from 0 all the way up to infinity. Although there is no theoretical upper bound
for the Poisson distribution, in practice these probabilities get small enough to be negligible when
k is very large. Exactly how large k needs to be before the probabilities become negligible depends
entirely on the value of λ.
Here are some tables of probabilities for small values of λ.
λ = 0.1:
k        0      1      2      3
P(X=k)   0.905  0.090  0.005  0.000

λ = 0.5:
k        0      1      2      3      4      5
P(X=k)   0.607  0.303  0.076  0.013  0.002  0.000

λ = 1.5:
k        0      1      2      3      4      5      6      7      8
P(X=k)   0.223  0.335  0.251  0.126  0.047  0.014  0.004  0.001  0.000
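The λ = 0.5 row can be reproduced with SciPy:

```python
from scipy import stats

# Reproduce the lambda = 0.5 row of the table: P(X = k) for k = 0..5.
probs = [round(stats.poisson.pmf(k, mu=0.5), 3) for k in range(6)]
print(probs)  # [0.607, 0.303, 0.076, 0.013, 0.002, 0.0]
```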
For larger values of λ it is easier to display the probabilities in a graph.
The plot shown above illustrates Poisson probabilities for λ = 2.5.
The above plot illustrates Poisson probabilities for λ = 7.5,
and this plot illustrates Poisson probabilities for λ = 15.
The mean of the Poisson distribution is λ. For the Poisson distribution, the variance, λ, is the
same as the mean, so the standard deviation is √λ.
Binomial Distribution vs Poisson Distribution
A Poisson distribution can often be regarded as a binomial distribution with n very large
and p very small! In fact, we can sometimes approximate such a binomial distribution with a
Poisson one to save much computational labor!
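A quick numerical sketch of this approximation; n = 1000 and p = 0.003 (so λ = np = 3) are illustrative values, not from the text:

```python
from scipy import stats

# With n large and p small, B(n, p) is close to Poisson(lambda = n*p).
# Illustrative values: n = 1000, p = 0.003, so lambda = 3.
n, p = 1000, 0.003
lam = n * p
for k in (0, 2, 5):
    print(k,
          round(stats.binom.pmf(k, n, p), 4),
          round(stats.poisson.pmf(k, lam), 4))
```

The two probability columns agree to about three decimal places, which is why the Poisson shortcut is safe in this regime.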
1.2.d Before giving up parametric tests
Although the choice between parametric and non-parametric tests depends mainly on whether the
population fulfills the assumptions of the underlying distributions, especially the normal or t
distribution, we should try our best to stick to parametric tests even when the population
deviates somewhat from the assumptions! This is because they are more powerful and their
results are more reliable and convincing.
1.2.d.i The Central Limit Theorem
When the population is not normal and the variance may or may not be known, the random variable

z = (x̄ - µ) / (σ/√n)    or the random variable    t = (x̄ - µ) / (s/√n)

(which one is used depends on whether the variance is known or unknown) is approximately
standard normal if the sample size is sufficiently large (at least thirty).
This is also called the Central Limit Theorem!
Please notice that the sample size n must be equal to or greater than 30 for applying this Central
Limit Theorem; with that, we can apply the z-distribution approximation for running
statistical tests!
Where: x̄ is the mean of the sample
µ is the mean of the population
σ is the known standard deviation of the population
n is the sample size
s is the standard deviation calculated from the sample
1.2.d.ii The Normal approximation to other distributions
(Strictly speaking, this is just an outcome of the Central Limit Theorem.)
Approximation of a Binomial distribution by a Normal distribution
As stated previously, the normal distribution is so important not only because so many natural
phenomena follow it, but also because it can be used for solving many other problems by
superimposing it on other distributions!
Let us look at the following binomial distributions. Before any medicine is available for a disease,
and patients just recover by bed rest, the chance of recovery is 0.4 (and of failing to recover, 0.6):
As you can see, for these binomial distributions, as N increases, the histogram looks more and
more like a normal distribution! In fact, the closer p is to 0.5, the smaller N needs to be for a
normal distribution to superimpose well on the histogram of the binomial distribution!
Generally, we can carry out the approximation when Np and Nq are both greater than 5! Put
another way, N must be at least 5 divided by the smaller of p and q, rounded up! For
example, for p = 0.4 and q = 0.6, we have 5/0.4 = 12.5, rounded up to N = 13, for carrying out
the approximation!
If the superimposition of the normal distribution on the binomial one is adequate, the latter can
be treated as a normal distribution with mean = Np (= 20*0.4 = 8 in this case) and
Std. Dev. = √(Npq) (= √(20*0.4*0.6) = 2.19 in this case)!
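The quality of the approximation can be checked numerically. The probe probability P(X ≤ 10) below is an illustrative choice, and the 0.5 added to it is a continuity correction, a refinement not discussed above:

```python
import math
from scipy import stats

# Normal approximation to B(20, 0.4): mean Np = 8, SD sqrt(Npq) ~ 2.19.
n, p = 20, 0.4
mu, sd = n * p, math.sqrt(n * p * (1 - p))
exact = stats.binom.cdf(10, n, p)                # exact binomial P(X <= 10)
approx = stats.norm.cdf(10.5, loc=mu, scale=sd)  # normal, continuity-corrected
print(round(sd, 2))                              # 2.19
print(round(exact, 3), round(approx, 3))         # the two values agree closely
```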
21 Approximation of a Poisson distribution by a Normal distribution
A Poisson() has a mean and standard deviation given by:
From Central Limit Theorem, as  gets large:
Equation 2 for the Poisson distribution method can then be rewritten:
(1)
Figure 1: Example of Equation 1 estimate of  where  = 40, t = 4
Figure 2: Example of Equation 3 estimate of  where  = 2, t = 4
Equation 1 works nicely in Figure 1 for large  (40):  is the measure of the amount of data, whereas t is
just a scaling factor. Figure 2 shows the normal distribution approximation is less useful for smaller  (2):
now Equation 1 is completely inaccurate, assigning considerable confidence to negative values, and fails
to reflect the asymmetric nature of the uncertainty distribution
1.2.d.iii Robustness to deviation from distribution assumptions
T test
Overall, the two-sample t-test is reasonably power-robust to symmetric non-normality (the true
type I error rate is affected somewhat by kurtosis, and the power is impacted mostly by it).
When the two samples are mildly skewed in the same direction, the one-tailed t-test is no longer
unbiased. The t-statistic is skewed oppositely to the distribution, and the test has much more
power in one direction than in the other. If the samples are skewed in opposite directions, the
type I error rate can be heavily affected.
Heavy skewness can have bigger impacts, but generally speaking, moderate skewness with a
two-tailed test isn't too bad if you don't mind your test, in essence, allocating more of its power
to one direction than the other.
In short, the two-tailed, two-sample t-test is reasonably robust to those kinds of things if you can
tolerate some impact on the significance level and some mild bias.
Anova and linear regression
Anova is robust to non-normality, but it is probably more sensitive to inequality of variance!
However, if the sample sizes of the different groups are equal, or nearly equal, you can choose
Tukey's Post Hoc Test for finding between-group differences, since it is robust to deviations
from equal variance when the sample sizes are equal!
Please notice that Anova is just a special case of linear regression, and a t test is just an Anova
with 2 groups! As with the t test, it will probably be fine in most cases if your data are somewhat
symmetric and skewness does not occur in opposite directions in your study groups.
1.3 Running Parametric Tests vs corresponding Non-Parametric Tests
1.3.a.i One Sample T-Test
The vendor of a new medicine claimed that it can yield a depression score below 70
after being given to patients for 2 weeks! A sample of 25 patients was chosen
to take the new medicine, and the depression score was taken after two weeks. (The
underlying population is an unlimited collection of samples of 25 patients!)
1) The resulting scores, in SPSS, are:
2) Analysis, Compare Means, One Sample T Test…
3) In the window that appears, move Dep_Score to Test Variable(s) and input 70 as the Test Value
4) Click ‘OK’
Results for the One Sample T Test in SPSS:
4a) One-Sample Statistics results: sample size N = 25, sample mean x̄ = 66.36, sample
Std. Dev. s = 4.748, and the standard error of the mean σm = s/√N = 4.748/√25 ≈ 0.95.
4b) One-Sample Test results: t = -3.833, calculated with degrees of freedom N - 1 = 24.
The probability of t > 3.833 or t < -3.833 by chance is < 0.05, so we reject H0 that the
means are equal! The difference between the sample mean (66.36) and the hypothesized
mean (70) is -3.64, and there is 95% confidence for the mean to fall between the two limits
70 - 5.60 = 64.4 and 70 - 1.68 = 68.3.
Conclusion:
The 25 patients taking the new medicine have a depression score with mean 66.36 and SD 4.748. A
t value of -3.833 is obtained, which is significant even for a 2-tailed test! We can reject the null
hypothesis H0 that the sample mean is the same as the comparison value of 70! The vendor might be
right that their new medicine can bring patients to a depression score different from 70!
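If SPSS is not at hand, the same t test can be reproduced from the reported summary statistics (n = 25, mean 66.36, SD 4.748) with SciPy; this is a sketch of the underlying arithmetic, not the SPSS procedure itself:

```python
import math
from scipy import stats

# One-sample t test from the reported summary statistics:
# n = 25, sample mean 66.36, sample SD 4.748, test value 70.
n, xbar, s, mu0 = 25, 66.36, 4.748, 70
t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(round(t_stat, 3))    # -3.833, matching the SPSS output
print(p_two_sided < 0.05)  # True: reject H0
```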
One-Tailed Test for the example above:
The computer output above is a 2-tailed test output! For a 2-tailed test:
H0: The population mean = 70
Ha: The population mean ≠ 70
In a 1-tailed test:
H0: The population mean >= 70
Ha: The population mean < 70
We just want to decide whether the population mean of the depression score is less than 70,
without considering whether it could be greater than 70 on average:
NOTHING NEEDS TO BE CHANGED IN THE RUNNING OF THE 2-TAILED TEST
ABOVE! What you need to do is know how to interpret the same computer output.
For rejecting the null hypothesis:
Step 1: t must be negative in this case, where the negative tail is being tested, and must be
positive if the positive tail is being tested!
Step 2: divide the significance by 2, as the 1-tailed test produces a probability 2 times
smaller than a 2-tailed test for the same critical t value!
One Sample T-Test using Excel (with the PHStat4 Add-In)
(For installation of PHStat4 Add-In, please refer to Appendix: Installation of Free Software)
1) PHStat, One-Sample Tests, t-test for the Mean, sigma unknown…
2) Input information for running a 2-tailed test:
3) The results are nearly the same as when using SPSS above; differences might be due to
rounding (t = 3.833 when using SPSS above; the 2-tailed p-value is 2 times the 1-tailed p-value):
4) Input information for running a 1-tailed test:
5) Results nearly same as when using SPSS above,
differences might due to rounding off issue:
t = 3.833 using when using SPSS above! shifting of critical value towards central axis! 0.5 times p‐Value in 2 tailed test 31 1.3.a.ii Wilcoxon signed rank test (corresponding to one sample t test)
The only assumptions for running Wilcoxon signed rank sum test are:
1) The population is continuous
2) The population has a median
3) The population is symmetric
Running the Wilcoxon signed rank sum test in SPSS:
For example, we have got a data set as following:
9, 11, 18, 16, 17, 21, 12, 10, 11, 11, 19, 16, 12, 13, 20, 14, 15, 13
We want to test the hypothesis:
Ho: Median = 16
Ha: Median =\= 16
1) Data in SPSS:
32 2) Analysis, Nonparametric Tests, One Sample…
3) Click ‘Assign Manually’:
33 4) Move Data to ‘Continuous’:
5) Click ‘OK’
34 6) You would go back to the previous window, select the test again:
7) Select ‘Automatically compare ….’, Click ‘Settings’
8) Select 'Choose Tests', 'Customize tests', 'Compare median … (Wilcoxon signed-rank
test)', and input 16:
9) Choose ‘Test Options’, input ‘Significance level’ and ‘Confidence interval’,
use default is no need to change:
10) Results: P = 0.066 > 0.05, so we can't reject the hypothesis that the population median is equal to 16. But 0.066/2 = 0.033 < 0.05, which is significant for a one-tailed test!
Calculation by hand if SPSS is not available:
i) Subtracting 16 from each observation, we get -7, -5, 2, 0, 1, 5, -4, -6, -5, -5, 3, 0, -4, -3,
4, -2, -1, -3
ii) Discarding the zeros and ranking the others in order of increasing absolute magnitude,
we have 1, -1, 2, -2, 3, -3, -3, -4, -4, 4, -5, 5, -5, -5, -6, -7
iii) The ‘1’s occupy ranks 1 and 2; the mean (average) of these ranks is 1.5; and each ‘1’
is given a rank of 1.5
iv) The ‘2’s occupy ranks 3 and 4; the means of these ranks is 3.5; each 2 is given a rank
of 3.5.
v) In a similar manner, each ‘3’ receives a rank of 6; each ‘4’ a rank of 9; each ‘5’ a
rank of 12.5; the ‘-6’ is assigned a rank of 15; and the ‘-7’ a rank of 16.
vi) The sequence of the ranks is now 1.5, -1.5, -3.5, 3.5, 6, -6, -6, -9, -9, 9, -12.5, 12.5,
-12.5, -12.5, -15, -16. ( ‘-‘ indicate negative ranks).
vii) The positive rank sum = 32.5
The negative rank sum = 103.5
The smaller rank sum is taken as T = 32.5
viii) In the table for Wilcoxon signed-rank test, find, in the column headed by the value
α = 0.05, n = number of ranks = 16 (18-2), critical value = 29. 32.5 is not less
than or equal 29, so we have to accept the null hypothesis that the median is 16!
For a one-tailed test at α = 0.05, the critical value = 35; since 32.5 ≤ 35, we can reject the null hypothesis!
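The hand calculation above can be cross-checked in Python with scipy. This is a sketch: with the zeros discarded and ties present, scipy falls back on the same normal approximation that SPSS uses, so the p-value is close to the 0.066 above rather than exact:

```python
from scipy import stats

data = [9, 11, 18, 16, 17, 21, 12, 10, 11, 11, 19, 16, 12, 13, 20, 14, 15, 13]
diffs = [x - 16 for x in data]  # differences from the hypothesised median of 16

# zero_method='wilcox' discards the zero differences, as in step ii) above
res = stats.wilcoxon(diffs, zero_method="wilcox")

print(res.statistic, res.pvalue)  # rank-sum statistic (32.5, as in step vii) and p
```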
1.3.b.i Two Samples T Test
This is also called the 'independent t test', meaning that the two samples do not affect each
other in the measurement of the data values. Two t-distributed populations are tested by using a
sample from each of them.
The basic question is: how different must the two means be such that the chance of obtaining
the resulting t value would be less than 5% (one-tailed) or 2.5% in each tail (two-tailed)?
For example, 25 patients are chosen for taking a traditional medicine for treating depression
(Group 1, control), and another 25 patients are chosen for taking the new medicine (Group
2)! The depression score is taken after 2 weeks and input to SPSS as:
38 1) In SPSS, Analysis, Compare Means, Independent Samples T Test…
2) Move ‘Dep_Score’ into Test Variable(s)’ and Group into ‘Grouping Variables’:
39 3) Click ‘Define Groups’
4) Input the values 1 and 2 for the definition of the groups
5) Click ‘OK’
40 6) SPSS output:
7a) Groups Statistics
Group 2 has a lower mean and Std. Dev. than Group 1.
7b) T-Test results
Part A – Test for the assumption of 'equal variance', and the t value obtained:

                                 Levene's Test             t-test for Equality of Means
                                 F         Sig.            t         df
Dep_Score   Equal variances      7.441     .009            .674      48
            assumed
            Equal variances                                .674      38.928
            not assumed

An analysis-of-variance test (Levene's) has been run to test whether the variances of the two groups are equal. The larger the value of F, the higher the chance that the variances are different. Sig. = 0.009 < 0.05, meaning that the variances of the two groups are significantly different! This implies that equal variance cannot be assumed, and we should use the 'equal variances not assumed' figures (t, df, …) instead. The t value and df in that row are calculated under the assumption that the variances of the two groups differ: separate variances, instead of a pooled variance, are used.

Part B – Significance, Mean Difference, and Std. Error Difference:
                                 Sig. (2-tailed)    Mean Difference    Std. Error Difference
Dep_Score   Equal variances      .503               1.260              1.868
            assumed
            Equal variances      .504               1.260              1.868
            not assumed

The two-tailed probability = 0.504 > 0.05, so we cannot reject the null hypothesis that the two population means are equal! Mean Difference = sample mean of Group 1 − sample mean of Group 2 = 67.62 − 66.36 = 1.26.
Part C – 95% Confidence Interval of the Difference:

                                 Lower      Upper
Dep_Score   Equal variances      -2.497     5.017
            assumed
            Equal variances      -2.520     5.039
            not assumed

We have 95% confidence that the difference between the two groups (mean of Group 1 − mean of Group 2) falls between −2.520 and 5.039!

One-Tailed Test:
As stated previously, there is no need to make any changes for running the test!
Just make sure which tail, positive or negative, you are testing, and
see whether you can get a significant result once the p-value is halved. For example, if you just want to test whether the new medicine produces
a lower depression score in Group 2, this also means testing whether Group 1
produces a higher score than Group 2! Then we test the positive tail of (mean of
Group 1 − mean of Group 2), and follow Step 1 and Step 2 below:
(The output is the same as shown above.)
Step 1: the t value (.674) is positive, as required when testing the positive tail.
Step 2: 0.504/2 = 0.252, still > 0.05; the one-tailed test still cannot find a significant difference between the two groups!
Running the two-samples test with Excel
(For installing the 'Data Analysis ToolPak' Add-In of Excel, please refer to Appendix: Installation
of Free Software)
1) DATA, Data Analysis
45 2) Test for Assumption of equal variance
3) Input required information
4) Results show that the equal-variance assumption does not hold:
0.006 << 0.05, so the variances of the two groups are significantly different!
6) Input required information: The ‘Hypothesis Mean Difference’ is a very useful item!
If we just want to test whether there is any difference in mean of two group, just leave
it blank (meaning zero)! If we want to test whether the 2 groups have a particular value
of difference, just fill in the value:
Leave blank means ‘being zero’! If a particular value of difference is to be tested, please input here!! 47 7) Results similar to results from SPSS, the small difference might due to
rounding off issue:
Sample as in SPSS (two tailed) Two tailed and One tailed Test Results from PH4Stat4 Excel Add-In:
Two-Tail Test:
Population 1 Sample: Sample Size 25, Sample Mean 67.6, Sample Standard Deviation 8.0052
Population 2 Sample: Sample Size 25, Sample Mean 66.36, Sample Standard Deviation 4.7511
Intermediate Calculations: Numerator of Degrees of Freedom 12.0150, Denominator of Degrees of Freedom 0.3077, Total Degrees of Freedom 39.0416, Degrees of Freedom 39
Standard Error 1.8618, Difference in Sample Means 1.2400, Separate-Variance t Test Statistic 0.6660
Lower Critical Value −2.0227, Upper Critical Value 2.0227, p-Value 0.5093 → Do not reject the null hypothesis (same as in SPSS, two-tailed)

Upper-Tail Test (same sample and intermediate figures as above):
Upper Critical Value 1.6849, p-Value 0.2547 → Do not reject the null hypothesis
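The separate-variance (Welch) figures above can be reproduced directly from the summary statistics, without the raw data. A sketch in Python with scipy:

```python
from scipy import stats

# summary statistics from the PHStat output above
res = stats.ttest_ind_from_stats(
    mean1=67.6, std1=8.0052, nobs1=25,
    mean2=66.36, std2=4.7511, nobs2=25,
    equal_var=False,  # separate-variance (Welch) t test
)

print(round(res.statistic, 4), round(res.pvalue, 4))  # t = 0.666, p = 0.5093
```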
1.3.b.ii Mann-Whitney Test for two independent samples
(Corresponding to parametric two independent samples t test)
(*Should get identical results to the Wilcoxon rank sum test for 2 samples)
The assumptions for running the Mann-Whitney test are:
1) The populations are continuous
2) The populations have a median
3) The populations must have the same form
For example, we have 2 samples with scores that are heavily skewed, and we don't want to
use a t test for comparing them! We prefer to use a non-parametric test, the Mann-Whitney Test.
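Before walking through SPSS, the same test can be sketched in Python with scipy. The two samples below are hypothetical skewed scores for illustration, not the data in the SPSS screenshots that follow:

```python
from scipy import stats

# hypothetical, heavily skewed samples (one large outlier each)
group1 = [1, 3, 5, 7, 90]
group2 = [2, 4, 6, 8, 100]

res = stats.mannwhitneyu(group1, group2, alternative="two-sided")
print(res.statistic, res.pvalue)  # U statistic and two-tailed p-value
```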
Solved by SPSS:
1) Input Data
2) Analyze, Nonparametric Tests, Independent Samples…
3) Choose default
50 4) Choose Tests, check ‘Mann-Whitney U (2 samples)
5) Test Options, Choose default if not change, click ‘Run’
51 6) Results:
0.349 > 0.05, so we can't reject the hypothesis that the two samples come from two populations with the same median, i.e. they are from two identical populations!

Wilcoxon Rank Sum Test for two independent samples with PHStat4
(Expected to have the same results as the Mann-Whitney Test above)
It is interesting to know that the Mann-Whitney test above and the Wilcoxon Rank
Sum Test for two independent samples give the same p-value, so it makes no
difference which of the two tests you run!
So, if you don’t have SPSS, you can use the free Excel Add-In PHStat4 to run a Wilcoxon
test to get the same conclusion!
1) Run PHStat by clicking the icon on Desktop, Enable Macro
52 2) Input Data
3) ADD-IN, PHStat, Two –Sample Tests…, Wilcoxon Rank Sum Test…
53 4) Input required information, click ‘OK’
5) Results:
Very close to the p-value of 0.349 found in SPSS with the Mann-Whitney test! The difference might be due to rounding.

1.3.c.i Paired T-Test
The paired t-test is used when, for example, the sample subjects are measured at two time points,
or pairs of twins are studied in experiments, etc. The key point is that we assume there is
a particular relationship such that the measured data values are not independent
of each other!
Simply speaking, it is an analysis of the differences from each pair of data:

t = (d̄ − 0) / (s_d / √n_d)

where d̄ is the mean of the paired differences, s_d their standard deviation, and n_d the number of pairs.

For example, a company running two shops wants to know whether there is a real difference
in income between them. Solved by using SPSS:
1) Analyze, Compare Means, Paired-Samples T Test…
56 2) Put Shop_1 under Variable1 and Shop_2 under Variable2
3) Click ‘OK’
4) Output:
Enlarged pictures:

Paired Samples Test – Paired Differences:
Mean (mean of Shop 1 − mean of Shop 2): −190.900
Std. Deviation of the differences: 265.334
Std. Error Mean of the differences: 83.906
95% Confidence Interval of the Difference: Lower −380.709, Upper −1.091
(We have 95% confidence that the difference falls into this range.)
Paired Samples Test:
t = −2.275, df = 9, Sig. (2-tailed) = .049

Sig. < 0.05, so the result is significant! Reject the hypothesis that the difference between the incomes of the two shops = 0! Shop 2 has an income different from Shop 1. (For df = 9, the two-tailed critical values at 0.025 in each tail are ±2.262, and t = −2.275 falls beyond them.)

One-Tailed Test
Paired Samples Test:
t = −2.275, df = 9, Sig. (2-tailed) = .049

Step 1: Make sure the +/− sign agrees with the direction you want to test, i.e. Shop_1 > Shop_2 or Shop_2 > Shop_1! If it is opposite, there is no need to test any more. If it agrees, go to Step 2!
Step 2: Divide this probability value by two and see whether it is < 0.05 for rejecting the null hypothesis: 0.049/2 = 0.0245 < 0.05, so reject the null hypothesis and accept the alternative hypothesis that Shop 1 has a lower income than Shop 2. (The one-tailed critical value for df = 9 at 0.05 is −1.833.)
Solved by Excel
1) Data, Data Analysis
2) Choose ‘t-test: Paired Two Samples for Means
60 3) Input required fields:
Besides zero, we can test for hypothesis that variable 1 is different from variable 2 by a certain value!
4) Results are almost the same as in SPSS:
Output of Excel Add-In PHStat4:
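The paired-test figures can also be reproduced from the summary of the differences alone, using the formula t = d̄ / (s_d / √n_d) given earlier. A sketch in Python, taking the mean difference (−190.9), its SD (265.334) and n = 10 from the SPSS output:

```python
from math import sqrt
from scipy import stats

n, d_bar, s_d = 10, -190.9, 265.334

# t = (mean difference - 0) / (s_d / sqrt(n))
t = d_bar / (s_d / sqrt(n))

# two-tailed p-value from the t distribution with n - 1 df
p_two = 2 * stats.t.sf(abs(t), df=n - 1)

print(round(t, 3), round(p_two, 3))  # -2.275 and about 0.049, as in SPSS
```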
1.3.c.ii Wilcoxon matched-pairs signed-rank test for paired samples
(Corresponding to parametric paired t test)
Running Wilcoxon matched-pairs signed-rank test with SPSS
Suppose ten students have taken a mathematics training course! A test is carried out before and
after the course and the scores are recorded. The marks were previously found to be heavily skewed
and are not suitable for testing with a t-test!
1) Input Data
2) Analyze, Nonparametric Tests, Related Samples
63 3) Choose Assign Manually:
4) Seeing:
5) Click OK:
64 6) Go back to:
7) Click Field:
65 8) Seeing:
9) Click ‘Settings’:
66 10) Choose Tests, check ‘Wilcoxon matched-pair…’
11) Test Options, use default if not change:
67 12) Results:
0.404 > 0.05: the result is not significant, even for this two-tailed test. Accept the null hypothesis that the medians, and thus the two populations, are identical!

IF YOU DON'T HAVE SPSS
If you don’t have SPSS, please try to find a free calculator in the internet for this Wilcoxon
matched-pairs signed-rank test (by using google search etc.)!
e.g.: http://www.socscistatistics.com/tests/signedranks/
1) Check that the test is for Wilcoxon Matched paired…, click ‘Take me to the calculator’:
68 2) Seeing, input data:
69 3) After input Data:
70 4) Result:
Ignore 'Result 1 – Z-value', as N is too small (9). Refer to 'Result 2 – W-value': the W-value is 15.5, and the result is NOT significant at P <= 0.05 for a two-tailed test!

1.3.d.i One-way Anova: Completely Randomised Design – Equal Sample Size
• This is one of the most common and basic experimental designs.
• The equal sample sizes make the Anova test very robust to 'not
too serious' violations of the assumption of NORMALITY.
*Moreover, Tukey Post Hoc test is appropriate for this equal size case, since it would be
robust for violation of the assumption of EQUAL VARIANCE.(Post Hoc means ‘unplanned
before the experiment’.)
‐ 5 cages each with 4 rats were used for a ‘Completely Randomized Design’
Experiment.
‐ The 20 rats had been assigned to the treatments A (control), B, C and D
totally randomly by e.g. blinded researcher drawing animal numbers,
without concerning any cages boundaries!
‐ The response was a ‘score’ after the 4 ‘treatments’ e.g. a growth in body
weight after a certain period of time.
‐ Please find any Significant Differences in this score caused by the four
treatments :-
Solved by using SPSS ver. 20.0
1. Input or ‘cut and paste’ data
2. Click ‘Variable View’.
74 3. Edit variable names and decimal places etc.
4. ‘Analyze’, ‘Compare Means’, ‘One-way Anova’.
75 5. Move Score to ‘Dependent List’ and ‘Treatment’ to ‘Factor’.
6. Enter ‘Options’ and ‘Post Hoc’ for choosing related options and tests.
‘Contrasts’ would be left alone for discussion in Book 2.
7. In ‘Options’, choose e.g. ‘Descriptive’ and ‘Homogeneity of variance
test’. (‘Welch’ is useful when the assumption of equal variances does not
hold, or when the population distribution is not normal or is unknown.)
8. In ‘Post Hoc’, choose e.g. LSD, Tukey and Scheffe, in an order of
‘toughness’ for detection of significant differences among groups!
77 9. Click ‘OK’ to run.
10. SPSS result output:
A - The command lines for all tests are shown first.
B - The ‘Descriptives’ function calculate the basic parameters such as
Mean and Std Deviation of the samples.
C – The ‘Sig.’ of 0.969 > 0.05 in the Test of Homogeneity of Variance
indicates that the assumption of equal variance is OK.
(*However, with so few subjects, the chance of failing this test is
very low. We had better either trust that the populations are
normal, or use a test that does not assume equal variances, e.g. Welch!)
D - For the overall Anova, a ‘Sig.’ of 0.000 < 0.05 implies that the
null hypothesis of equal means would be rejected. At least one
Group Difference exists among the groups – Multiple Comparisons
should be used for finding out where it is!
F (df = 3, 16) = 13.594

F value = MS(between groups) / MS(within groups) = 13.594

Critical F value for df (3, 16) from the table = 3.239 (at 0.05); F > critical F, so
at least two groups are different in their group means!
(Although the computer can do everything for you, you should also
learn about this important step used in various Anova analyses!)
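The table look-up for the critical F value can also be done in Python with scipy, as a quick sketch:

```python
from scipy import stats

# critical F for alpha = 0.05 with df = (3, 16)
f_crit = stats.f.ppf(0.95, dfn=3, dfd=16)

# p-value ("Sig.") for the observed F of 13.594
p = stats.f.sf(13.594, dfn=3, dfd=16)

print(round(f_crit, 3), p)  # critical F is about 3.239; p is far below 0.05
```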
11. See the results of the Tukey test as an example, particularly in this equal
sample size case! A 'Sig.' < 0.05 indicates a group difference. The
results show that Groups 1 and 2 differ from Groups 3 and 4!
12. Imagined picture:
Group 1 Group 2 Group 3
Group 4
Solved by using Excel (CRD – Equal Sample Size), a) Tukey's Test
1. Click ‘Data’, ‘Data Analysis’.
2. Choose ‘Anova: Single Factor’ = One-way Anova
82 3. Input required fields and data area (dotted line), click ‘OK’.
4. Overall Anova results:-
p-value = 0.00015 < 0.05 means we can reject the null hypothesis. Go
ahead with finding inter-group differences.
5. Find the value of 'q': k = 4, df = 16, q = 4.05 (0.05).
6. Find the 'Critical Difference' and the significant differences.
7. Counter-checking with the SPSS figures – although the same groups of significant
differences are found, this is still not enough to say that the results are
exactly the same. However, we can counter-check against the 95% Confidence
Intervals from SPSS!
8. Comparison of Tukey’s Test results in Excel and in SPSS:
1.3.d.ii Non-parametric independent k samples - Kruskal-Wallis one-way
analysis of variance
Suppose the government wants to compare the number of traffic accidents
in 3 districts over 10 Sundays. Experience shows that the data can be quite skewed,
not normally distributed at all!
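Before walking through SPSS, here is a sketch of the same test in Python with scipy. The accident counts below are hypothetical stand-ins (three values per district to keep the illustration short), not the screenshot data:

```python
from scipy import stats

# hypothetical accident counts for the 3 districts
district_a = [1, 4, 7]
district_b = [2, 5, 8]
district_c = [3, 6, 9]

# H is compared against the chi-square distribution with k - 1 df
h, p = stats.kruskal(district_a, district_b, district_c)
print(round(h, 3), round(p, 3))
```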
1) Enter Data
86 2) Analyze, Nonparametric Tests, Independent Samples…
3) Seeing, click Assign Manually…
4) Move Traff_Acc to Continuous, District to Nominal
87 5) Choose Default : Automatically compare…..
6) Fields, move Traff_Acc to Test Fields, District to Groups
7) In Setting, Choose Tests, choose 'Kruskal-Wallis 1-way ANOVA (k samples)'
8) In Test Options. Choose significance and confidence interval
89 9) Results:
Sig. = 0.047 < 0.05. Reject the null hypothesis that the frequency of traffic accidents in the 3 districts has been the same over the 10 Sundays!

Solved by PHStat4 Excel Add-In
1) Click icon on desktop etc.
90 2) Click ‘Enable Macro’:
3) Add-In, PHStat, Multiple Sample Tests, Kruskal-Wallis Rank Test…
4) Seeing:
5) Input information required:
92 6) Results:
Sig. is the same as in SPSS above!

1.3.e.i Two Factors Anova (a X b factorial)
- Each subject is randomly chosen into each ‘combination’ of the two factors under
investigation, both factors are of interest.
- Calculation is the same as in the previous block design. No Post Hoc test would be
run, and all interaction is assumed to be due to chance only.
- 5 cages each with 4 rats have been used for a 'Completely Randomized Two-Factors (a X b factorial) Without Replication Design' Experiment.
- Each of the 20 rats had been assigned randomly to be subjects for the
‘combinations’ of factor one and two (Diet A,B,C,D X Lighting 1,2,3,4,5 =
20). The response is a ‘score’ after the twenty ‘treatments’ e.g. the growth in
weight within a certain period of time.
- Please find any Significant Differences caused by the two factors.
(The data table has columns Diet A (control), Diet B, Diet C and Diet D, and rows Lighting 1 (control) onwards.)

Solved by SPSS
1. Data input to SPSS.
2. Click 'Variable View'.
3. Give variable names, decimal places etc.
95 4. ‘Analyze’, ‘General Linear Model’, ‘Univariate’.
5. Put ‘Weight_Increase’ under ‘Dependent Variables’, ‘Diet’
under ‘Fixed Factor(s)’ and ‘Light’ under ‘Random Factor(s)’.
Post Hoc tests would not run even chosen.
6. The ‘Sig.’ for Diet is 0.006 < 0.05, indicating that Diet is a factor
causing a significant difference between at least two of the treatment
groups. The ‘Sig.’ for Light is 0.369 > 0.05, so we cannot reject the
null hypothesis that the group means under this factor are equal.
Post Hoc tests won't run, and there is no way to test whether the
interaction is significant or not!
7. You might ask why not put 'Light' into 'Fixed Factor(s)', since
we are interested in it also?
8. Although 'Light' seems to be a fixed factor whose effect the researcher would be
interested in, assigning it as a 'fixed factor' in SPSS would not give meaningful values of F! This is
because, in a situation without any replication, where the interaction is assumed to be due to chance
only, 'Light' should be treated more like a blocking factor in a Blocked Design rather
than a 'fixed factor' in an a×b Factorial Design with replication. Please refer to Part 4 for more
details about 'fixed and random factors'.
99 Solved by Excel
1. Click ‘Data’, ‘Data Analysis’.
2. Choose ‘Anova: Two-Factor Without Replication’.
100 3. Input requested fields and select data area (dotted line).
4. Anova Results: although laid out the other way up, the overall Anova results are the same as from SPSS,
indicating that 'Diet' is a significant factor while 'Lighting' is not. At least
two groups differ in their means. Again, there is no way to run Post Hoc tests,
due to there being no replication.
(The 'Lighting × Diet' row corresponds to the interaction in SPSS; the 'Error' and 'df' under the interaction are 0 and are not shown. Post Hoc tests won't run.)

1.3.e.ii Non-parametric paired k samples - Friedman two-way
analysis of variance
Suppose six patients have been handled by three different doctors over the past 9 years,
each doctor for about 3 years, and the average frequency of admission to the hospital
has been recorded. Please compare the frequencies of admission and see whether they
are the same for all 3 doctors!
1) Input data
2) Analyze, Nonparametric Tests, Related Samples…
103 3) Click ‘Assign Manually…’
4) Moving as below:
5) Again, Nonparametric Tests, Related Samples:
6) Use Default:
7) Moving as below:
8) Under Choose Test, choose ‘Friedman’s 2-way ANOVA by ranks (k
samples)
106 9) Under Test Options, choose significance and confidence interval
10) Results:
Sig. = 0.016 < 0.05. Reject the null hypothesis that the numbers of admissions to hospital under the 3 doctors are the same!

Solved by XLStat
1) Run XLStat2015
2) Proceed with the trial version
3) Input Data:
108 4) Choose as following:
5) Seeing:
109 6) Input required information
7) In ‘Options’, input Significance level etc
110 Seeing, Continue
9) Results (Sample as using SPSS above):
111 1.4 Table Form Non-parametric Tests for Categorical Data
For categorical data, there are no parametric tests or rank comparing
non-parametric tests to be used! There is mostly no choice but to use table
form non-parametric tests for making statistical testing on the data obtained.
The most important table form test is the Chi-square (ӽ2) test that use
the statistic ӽ2 and the Chi-square Distribution for the interpretation of the
categorical data. Commonly used Chi-square test including the Goodness of
fit test that test whether the sample follows some probability theories, testing
of homogeneity and testing of independence etc.
The number of samples taken and whether the data are paired or not
are important for the choosing of which version of test to be used!
1.4.1 One sample Chi-square Test
1.4.1.a A Goodness of fit test with one sample
A salesperson contacts 5 potential customers every day, and over 100 days she keeps a record of
the sales made:

Number of sales:   0    1    2    3    4    5
Frequency:        15   21   40   14    6    4

Her boss feels that the chance of making a sale with a call is about 35%, and a binomial
distribution b(y; 5, 0.35) would have the following probabilities:

y               0        1        2        3        4        5
p(y)         0.1160   0.3124   0.3364   0.1812   0.0487   0.0053
e = 100p(y)  11.60    31.24    33.64    18.12     4.87     0.53
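Before turning to SPSS, the goodness-of-fit computation can be sketched in Python with scipy. The χ² of 10.41 quoted below is reproduced if, as is standard practice, the last two categories (whose expected counts are below 5) are pooled before the test:

```python
from scipy import stats

observed = [15, 21, 40, 14, 6, 4]
expected = [11.60, 31.24, 33.64, 18.12, 4.87, 0.53]  # 100 * b(y; 5, 0.35)

# pool the last two categories, whose expected counts are below 5
obs = observed[:4] + [observed[4] + observed[5]]
exp = expected[:4] + [expected[4] + expected[5]]

stat, p = stats.chisquare(obs, f_exp=exp)  # df = 5 - 1 = 4
print(round(stat, 4), round(p, 4))
```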
Solved by SPSS
1) Data input (15 cases of '0', 21 cases of '1', 40 cases of '2', and so on)
2) Analyze, Nonparametric Tests, Legacy Dialogs, Chi-square
3) Input values of Expected Values, if not ‘All categories equal’
4) Input Expected Values until finished:
5) Run the test and get the following results:
χ² = 10.4108, as in the hand calculation above; p < 0.05, so the null hypothesis that the sales follow the binomial distribution b(y; 5, 0.35) is rejected!

1.4.1.b A Test of Independence with one sample
1.4.1.b.i 2 x 2 contingency table for one sample
This might be one of the most commonly used tests for categorical data in daily life! The
expected values are determined from the marginal totals, under the assumption that the two criteria
are independent of each other.
(Please don't mix up the 'number of samples' with the 'number of criteria' in a sample! One
sample can have more than one criterion, thus forming a 2 x 2, 2 x k or k x k contingency table. For more
than one sample, we take different samples from different populations or groups, which can also be
handled with one contingency table!)
A football coach wants to know whether the winning or losing of a game is independent of
whether the game is played at home or away, so he runs a test of independence with:
H0: Winning is independent of where the game is played
Ha: Winning is dependent on where the game is played
Checking the results of the past 30 years:
Observed values:

         Home    Away    Total
Won       97      69      166
Lost      42      83      125
Total    139     152      291

Expected values (from the marginal totals):

         Home    Away
Won      79.3    86.7
Lost     59.7    65.3

Since the calculated χ² exceeds χ²(0.05, 1) = 3.841, the null hypothesis is rejected and there is evidence that winning or
losing is dependent on where the game is played!
Solved by Free Web Tools
There are many free Chi-square calculators on the internet that can
calculate the answer above, e.g.
http://www.socscistatistics.com/tests
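The same 2 x 2 test can also be sketched in Python with scipy (Yates' correction turned off to match the hand calculation above):

```python
from scipy import stats

observed = [[97, 69],   # Won:  Home, Away
            [42, 83]]   # Lost: Home, Away

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 2), p, dof)
print(expected)  # matches the 79.3 / 86.7 / 59.7 / 65.3 table above
```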
1.4.1.b.ii Independence test for a k x k contingency table for one sample
For example, we have a sample of people with different hair and eye colors, and we want to test
whether the two criteria are independent or not:
Observed and expected (bracketed) values:

                          Hair Color
Eye Color      Red            Golden          Black            Total
Blue           12 (8.58)      18 (13.70)      36 (43.73)        66
Brown          24 (27.82)     39 (44.41)      151 (141.78)     214
Black          16 (15.6)      26 (24.9)       78 (79.5)        120
Total          52             83              265              400

H0: Eye color and hair color are independent of each other
Ha: Eye color and hair color are dependent on each other
http://www.socscistatistics.com/tests

Please notice that, in a test for independence, the marginal totals are known only after taking the single sample! In a test for homogeneity, the sample sizes are fixed by design before taking the sampleS!!
We cannot reject the null hypothesis that eye color and hair color are independent
of each other!
1.4.2 Two sample Chi-square Test
1.4.2.a Independent Samples
1.4.2.a.i Chi-square test for two samples
Sometimes we want to know whether 2 or more samples come from the same population, and we
might not even know the distribution of the population!
For example, a doctor wants to know whether the male/female proportion in disease A and
disease B is the same or not:

          Disease A    Disease B    Total
Male         32           28          60
Female       18           22          40
Total        50           50         100
123 Solved by Free Web Tools
** Please notice that the web calculator cannot differentiate whether you are
calculating a one-sample 2 x 2 table for independence or a two-samples test for
homogeneity; only we humans know about this!
http://www.socscistatistics.com/tests
Fisher exact probability test
The Fisher exact test is more accurate than the Chi-square test when the sample size is small,
especially when one of the expected values is less than 5.
We don't want to discuss the underlying theory of this test here, which involves
the hypergeometric distribution; we will just show you how to simply get the correct
probability (i.e. of the events happening just by chance) by using the following free web tool:
http://www.danielsoper.com/statcalc3/calc.aspx?id=29
Suppose we have the following 2 X 2 table (with row and column totals shown):

  2    5   |  7
  3    1   |  4
  5    6   | 11
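A sketch of the same computation in Python with scipy, which handles the hypergeometric arithmetic for us:

```python
from scipy import stats

table = [[2, 5],
         [3, 1]]

# two-sided p sums the probabilities of all tables as extreme as the observed one
odds_ratio, p = stats.fisher_exact(table, alternative="two-sided")
print(odds_ratio, round(p, 4))
```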
1.4.2.a.ii Chi-square test for k > 2 samples
** Please, again, notice that the web calculator cannot differentiate whether you are
calculating for a one sample k x k table for independence or a k samples test for
homogeneity, only we human know about this!
For example, a wine company wants to learn how a new brand of wine is favoured
by drinkers of different countries. 100 Chinese, 100 Americans and 100 Europeans are invited for
a tasting study. The ranking of tastefulness is 1 to 4, with 1 being most favoured:

Tastefulness     1      2      3      4     Total
Chinese         42     26     19     13      100
American        55     21     14     10      100
Europeans       38     30     22     10      100
Total          135     77     55     33      300
Solved by Free Web Tools
Using:
http://www.socscistatistics.com/tests
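A sketch of the same k-samples test in Python with scipy, using the wine table above:

```python
from scipy import stats

observed = [[42, 26, 19, 13],   # Chinese
            [55, 21, 14, 10],   # American
            [38, 30, 22, 10]]   # Europeans

# expected counts come from the marginal totals; df = (3 - 1) * (4 - 1) = 6
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(round(chi2, 2), round(p, 3), dof)
```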
If you are confused about why the ordinal data above are tested by a chi-square test
instead of a rank-comparing test, e.g. Kruskal-Wallis, we can try the following (for
convenience we have reduced the data set):

Tastefulness     1      2      3      4     Total
Chinese          4      3      3     10      20
American         3      5      8      4      20
Europeans        5      4      5      6      20
Total           12     12     16     20      60
If we run Kruskal-Wallis one-way ANOVA in SPSS:
A similar, but lower, significance compared with the 0.404 above is found. This is reasonable, since the requirement for attaining significance in this test is stricter compared with the chi-square test for homogeneity above!
1.4.2.b.i McNemar Test for two dependent (paired) samples
Solved by Free Web Tools
For example, in a study a test is performed before treatment and after treatment in 20 patients. The
results of the test are coded 0 and 1. Is there a significant change in the test result before and after
treatment?
135 Using:
http://vassarstats.net/propcorr.html
1.4.2.b.ii Cochran Q Test for k dependent samples
Cochran's Q test is an extension to the McNemar test for related samples that provides a method
for testing for differences between three or more matched sets of frequencies or proportions.
Example: 12 subjects are asked to perform 3 tasks. The outcome of each task is a dichotomous
value, success or failure.
The results are coded 0 for failure and 1 for success. In the example, subject 1 was successful in
task 2 but failed tasks 1 and 3.
136 Please run a Cochran Q test for testing whether the success or fail rate for the 3 tasks is the
same or not!
Solved by SPSS:
1) Input Data
137 2) Checking, data type need to be Numeric!
3) Analyze, Nonparametric Tests, Related Samples
138 4) Seeing, Click ‘Field
’
5) Choose ‘Fields’
139 6) In Setting, choose ‘Cochran’s Q (k samples)’ etc, click ‘Run’
7) Results:
Significance is 0.013 < 0.05, we can reject the null hypothesis that the 3 tasks are the same in
success or fail proportion!
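If SPSS is not available, Cochran's Q is easy to compute directly. A sketch in Python using the standard formula Q = (k−1)·[k·ΣG_j² − T²] / (k·T − ΣL_i²), where G_j are the task (column) totals, L_i the subject (row) totals and T the grand total of successes; the 0/1 outcomes below are hypothetical (six subjects, three tasks), not the 12-subject table above:

```python
from scipy import stats

# hypothetical success (1) / failure (0) outcomes: rows = subjects, columns = tasks
data = [[1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [1, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]

k = len(data[0])                                        # number of tasks
col = [sum(row[j] for row in data) for j in range(k)]   # G_j: successes per task
subj = [sum(row) for row in data]                       # L_i: successes per subject
total = sum(subj)                                       # T: total successes

q = (k - 1) * (k * sum(g * g for g in col) - total ** 2) / (
    k * total - sum(s * s for s in subj))
p = stats.chi2.sf(q, df=k - 1)  # Q is referred to chi-square with k - 1 df

print(q, round(p, 4))
```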
Appendix
- Installation of free software
- Tables
Activation of Excel 'Analysis ToolPak' Add-In (Office 2013)
1. Click 'File'
2. Click 'Options'
3. Click 'Add-Ins'
4. Click 'Analysis ToolPak', 'Go'
5. Check 'Analysis ToolPak', click 'OK'
6. 'Data', 'Data Analysis'
7. Analysis Tools ready
8. Try any one of the tests; seems fine!

Installation of PHStat4 Excel Add-In
1) Go to: http://users.business.uconn.edu/rjantzen/phstatinstall.htm
2) Save as PHStat_4.0 on e.g. Desktop
3) Saved folder:
4) Enter to see:
5) Enter to see:
6) Create a PHStat shortcut on Desktop by copying its icon:
7) Double-click its icon on Desktop, click 'Enable Macros':
8) Click Add-In, PHStat; installation of PHStat4 successful!

Installation of G Power
1) Download G Power from: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
2) Seeing:
3) Click 'GPowerSetup'
4) Click 'Next'
5) Click 'Next'
6) Click 'Next' after finishing
7) Click 'Shortcut' on Desktop:

Installation of XLStat2015
1) Go to 'http://www.xlstat.com/en' to download the software
2) Select Windows, Mac etc.
3) Save the installation file
4) Select where to save, e.g. Desktop
5) Installation file downloaded
6) Click 'Run'
7) Select language
8) Agree
9) Select 'Complete', 'Next'…
10) Installing files:
11) Finish
12) The XLStat2015 icon appears
13) Click the icon:
14) Choose 'Trial version'
15) Program ready