PSYCHOLOGICAL STATISTICS
IV Semester
COMPLEMENTARY COURSE
B Sc COUNSELLING PSYCHOLOGY
(2011 Admission)
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut University P.O. Malappuram, Kerala, India 673 635
School of Distance Education
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
STUDY MATERIAL
Complementary Course
B Sc Counselling Psychology
IV Semester
PSYCHOLOGICAL STATISTICS
Prepared by:
Dr.Vijayakumari. K,
Associate Professor,
Farook Teacher Training College,
Farook College. P.O. Feroke
Scrutinized by:
Prof.C. Jayan,
Department of Psychology,
University of Calicut.
Layout:
Computer Section, SDE
© Reserved
Psychological Statistics – IV Semester
Page 2
School of Distance Education
CONTENTS
MODULE 1 - HYPOTHESIS TESTING
MODULE 2 - NORMAL DISTRIBUTION
MODULE 3 - ANALYSIS OF VARIANCE
MODULE 1
Objectives: 1. To know about various techniques of hypothesis testing
2. To develop understanding about various hypothesis testing techniques
INTRODUCTION
It is often impossible or impractical to study all the elements of a population to arrive at
conclusions. Usually a researcher selects an appropriate sample from the population and studies the
sample. From the sample values, the population values are inferred using inferential statistics.
Inferential statistics is the branch of statistics that helps in inferring population values. It
uses the concept of probability to deal with uncertainty in decision making. Inferential statistics has
two functions.
1. Estimation, that is, estimating the parameter (population value) from the statistic
(sample value).
2. Testing of hypotheses, that is, testing some hypothesis about the population from which
the sample is drawn.
Statistical inference makes it possible to form an idea about population values (which are unknown)
from sample values (which are known).
UNIT 1
HYPOTHESIS TESTING
HYPOTHESIS
Hypotheses are assumptions about population values. They may be of different types. If we
want to know whether a coin is unbiased, the hypothesis stated may be ‘the coin is unbiased’. The
coin may be tossed a number of times, say 200. Suppose one gets 80 heads and 120 tails.
Using this information (the sample values), statistics helps to test the hypothesis ‘the coin is
unbiased’ and to arrive at a conclusion either to accept the hypothesis or to reject it.
A hypothesis test is a statistical method that uses sample values to evaluate a hypothesis
about a parameter. Hypotheses are stated in different forms. In statistics there are two types of
hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis (denoted as
$H_0$) states that in the general population there is no change, no difference or no relationship.
In an experiment with treatment A for the control group and treatment B for the
experimental group, the investigator may want to know which treatment has the greater effect
on the dependent variable ‘Y’. Then the hypothesis may be stated as ‘there is no significant
difference in the mean Y scores of the control and experimental groups after the treatment’. This
hypothesis is stated in null form because it says there is no difference between the groups.
If the study is to find out whether there is a gender difference in mechanical aptitude, the null
hypothesis will be ‘there is no significant gender difference in the mean scores of mechanical
aptitude’. In the case of finding whether the variables X and Y are related or not, the null
hypothesis will be ‘the two variables X and Y are not related’ or ‘there is no significant
relationship between the variables X and Y’.
If the population mean score of a variable X is taken to be 60, one can test whether the obtained
sample mean indicates a different value. Here the null hypothesis is ‘the population mean is equal
to 60’.
The hypothesis that is simply the opposite of the null hypothesis is known as the alternative
hypothesis (denoted as $H_1$). This hypothesis states that there is a change, a difference or a
relationship in the general population.
In the first case, the alternative hypothesis will be ‘there is a significant difference in the mean
Y scores of the control and experimental groups after the treatment’. The second one will be ‘there is a
significant gender difference in the mean scores of mechanical aptitude’. The third hypothesis can
be stated as ‘the two variables X and Y are related’.
In the fourth example the alternative hypothesis can be written as ‘the population mean is
not equal to 60’. All these alternative hypotheses state that there will be some type of change,
but they do not indicate the direction of the change or relation. The experimental and control
groups differ significantly, but which group is higher or lower is not stated in the hypothesis. Such
hypotheses, which give no indication of the direction of change or relation, are called non-directional
hypotheses. Tests that are used to test such hypotheses are non-directional tests or two-tailed
tests.
If there is an indication of the direction of change or the nature of the relation, the hypothesis is a
directional one. In the above examples, if the hypothesis is ‘the experimental group has a higher
mean Y score than the control group after the treatment’, it is directional, as there is a clear
indication that the experimental group is better than the control group in the variable Y.
Similarly, if the hypothesis ‘the two variables X and Y are related’ is stated as ‘there is a
significant positive relationship between the variables X and Y’, it is directional, as it indicates
the nature of the relationship (positive or negative).
If $\mu_1$ and $\mu_2$ are two population means and we have to test whether these means are equal
or not, the null hypothesis can be stated as $H_0: \mu_1 = \mu_2$ against the alternative hypothesis
$H_1: \mu_1 \neq \mu_2$. Here the hypothesis formed is non-directional and the test used will be a
two-tailed test.
If the researcher wishes to know whether $\mu_1$ is greater than $\mu_2$, the hypotheses will be
$H_0: \mu_1 \le \mu_2$ against $H_1: \mu_1 > \mu_2$.
Here the hypotheses are directional and hence the test will be a one-tailed test. Statistical tests are
designed to test the null hypothesis: when $H_0$ is rejected the alternative hypothesis is accepted, and
when $H_0$ is accepted the alternative hypothesis is rejected.
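For a large-sample (z) test, the practical difference between a two-tailed and a one-tailed test shows up in the critical value that the test statistic must exceed. The sketch below assumes a z test and uses Python's standard-library `statistics.NormalDist`; the function name `z_critical` is illustrative only. For a non-directional hypothesis it splits α between both tails, and for a directional one it puts all of α in the predicted tail.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int = 2) -> float:
    """Critical value of z for a test at significance level alpha.

    tails=2: non-directional (two-tailed) test, alpha split between both tails.
    tails=1: directional (one-tailed) test, all of alpha in the predicted tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: two-tailed z = {z_critical(alpha, 2):.2f}, "
          f"one-tailed z = {z_critical(alpha, 1):.2f}")
```

With α = 0.05 this gives about 1.96 for a two-tailed test but about 1.64 for a one-tailed test, so a directional hypothesis needs a smaller z in the predicted direction before $H_0$ is rejected.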
HYPOTHESIS TESTING
Hypothesis testing deals with inferring population values from sample values. That
is, here we take decisions about parameters based on the sample values. Whenever we decide
whether to accept or reject $H_0$, four alternatives are possible, and our decision will be one
of them. The four alternatives are
1. Accept $H_0$ when $H_0$ is not true
2. Accept $H_0$ when $H_0$ is true
3. Reject $H_0$ when $H_0$ is not true
4. Reject $H_0$ when $H_0$ is true
The population values are unknown and hence we do not know whether $H_0$ is true or false.
But based on the evidence from the sample we decide whether to accept or reject $H_0$.
This is similar to a court of law. A person who is accused may be innocent or guilty. Based on the
evidence before the court, the judge declares the person innocent or guilty, and this verdict
may be correct or wrong.
Similarly, based on the evidence from the sample, the researcher decides to accept or reject $H_0$.
Among the four alternatives, the second and third are correct decisions and the other two are
incorrect.
The decision taken by the researcher may thus be a correct one or a wrong one; the chance of error
in taking decisions cannot be avoided completely. The four decisions can be represented in a 2x2
(two by two) table as below.
Decision        $H_0$ true                           $H_0$ false
Reject $H_0$    Incorrect decision (Type I error)    Correct decision
Accept $H_0$    Correct decision                     Incorrect decision (Type II error)
The two errors possible in hypothesis testing are rejecting $H_0$ when $H_0$ is true and
accepting $H_0$ when $H_0$ is false.
The first type of error is known as a type one (type I) error and the second one as a
type two (type II) error. That is, a type I error is the error committed by rejecting $H_0$
when it is true. The probability of a type I error is known as the level of significance of
the test, denoted as α.
A type II error is the error committed by accepting $H_0$ when $H_0$ is not true. The probability
of a type II error is denoted as β, and 1 − β is known as the power of the test.
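The quantities α, β and power can be made concrete for a simple case. The following sketch assumes a two-tailed one-sample z-test with known σ, and a hypothetical true mean lying `delta` points away from the value stated in $H_0$; the function name and all numbers are illustrative. It computes β as the probability that the test statistic still falls inside the acceptance region.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def beta_two_tailed_z(alpha, delta, sigma, n):
    """Probability of a Type II error (beta) for a two-tailed one-sample z-test of
    H0: mean = mu0, when the true mean is actually mu0 + delta and sigma is known."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma      # true difference expressed in standard-error units
    # H0 is (wrongly) accepted whenever z lands inside (-z_crit, +z_crit).
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


# Hypothetical scenario: true mean 3 points above mu0, sigma = 10, n = 50.
for alpha in (0.10, 0.05, 0.01):
    beta = beta_two_tailed_z(alpha, delta=3, sigma=10, n=50)
    print(f"alpha = {alpha:.2f}   beta = {beta:.3f}   power = {1 - beta:.3f}")
```

Running it shows β growing as α is made smaller, which is the trade-off between the two kinds of error.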
It should be noted that if we decrease the probability of a type I error, the chance of rejecting
$H_0$ decreases, that is, the chance of accepting $H_0$ increases, which leads to an increase in the
probability of a type II error. The reverse also holds. That is, as α decreases, β
increases, and as α increases, β decreases. To avoid confusion, in research α is fixed in advance at
conventional levels. In the behavioural sciences, α is usually taken as 0.01 or 0.05. α = 0.01 means that the
probability of rejecting $H_0$ when it is actually true is 0.01, or more clearly, the probability of
getting a significant mean difference when no such difference exists is 0.01. Put another way, if no
real difference exists and the study were conducted 100 times, the researcher would wrongly find a
significant difference in only about one of them and would correctly find no difference in about 99.
If α = 0.05, the probability of getting a significant difference when actually there is no difference
between the group means is 0.05, or the probability of accepting a true null hypothesis is 95 percent.
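The "100 studies" interpretation of α can be checked by simulation. This is a sketch under stated assumptions: a hypothetical normal population (mean 50, SD 10) in which $H_0$ is actually true, samples of size 30, and a two-tailed z-test with σ known; the function name and parameters are illustrative. It counts how often the test declares a difference that does not exist.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_type1_rate(alpha, mu=50.0, sigma=10.0, n=30, reps=10_000, seed=1):
    """Proportion of samples in which a two-tailed z-test rejects H0: mean = mu
    even though every sample really comes from a population with mean mu."""
    rng = random.Random(seed)                 # fixed seed: repeatable results
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)                      # standard error of the mean
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / se
        if abs(z) > z_crit:                   # a "significant" difference that does not exist
            rejections += 1
    return rejections / reps


rate_05 = simulated_type1_rate(0.05)
rate_01 = simulated_type1_rate(0.01)
print(f"alpha = 0.05: H0 wrongly rejected in {rate_05:.1%} of 10,000 studies")
print(f"alpha = 0.01: H0 wrongly rejected in {rate_01:.1%} of 10,000 studies")
```

The two rejection rates should come out close to 5% and 1%, matching the chosen levels of significance.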
SAMPLING DISTRIBUTION AND STANDARD ERROR
Sampling distributions are the distributions formed by the values of a statistic computed from
repeated samples. For example, if we want to study the EQ of adolescents in Kerala, we will take a
sample from the population (say of size 1000), measure their EQ and calculate the mean and other
descriptive statistics. If the same procedure is repeated with different samples of the same size,
we will get a set of arithmetic means of EQ. Each mean score will be an estimate of the population
mean. But as we know, we are not measuring the EQ of each and every member of the population, and
hence the mean score obtained from a sample may differ from the population mean. The distribution
formed by the sample mean values is known as the sampling distribution of the mean.
If the set is formed by calculating the correlation between two variables in each sample, the
distribution will be a sampling distribution of the correlation coefficient. Generally, a sampling
distribution is formed from a population distribution that is known or assumed. A number of sampling
distributions (each for a specific sample size) are possible from a population, and sampling
distributions of two or more statistics are possible from the same population.
Sampling distributions help the researcher to calculate the errors due to chance involved in
making generalizations about a population on the basis of samples.
The standard deviation of a sampling distribution is called the standard error (SE). The SE of the
sampling distribution of the mean is $SE_{\bar{X}} = \sigma / \sqrt{n}$, where σ is the standard deviation
of the population distribution (or its estimate) and n is the sample size. The standard error gives
an idea of the unreliability of a sample statistic as an estimate of the parameter.
Moreover, confidence limits within which the parameter values are expected to lie can be formed
with the help of the SE.
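The link between a sampling distribution and the standard error can be seen by drawing many samples and looking at the spread of their means. This sketch assumes a hypothetical population of 10,000 EQ scores (roughly normal, mean 100, SD 15) and samples of size 25 drawn with replacement; `sample_means` is an illustrative helper, not something defined in the text.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def sample_means(population, n, reps, seed=0):
    """Means of `reps` random samples of size n, drawn with replacement from `population`."""
    rng = random.Random(seed)
    return [fmean(rng.choices(population, k=n)) for _ in range(reps)]


# Hypothetical population: 10,000 EQ scores, roughly normal with mean 100 and SD 15.
gen = random.Random(42)
population = [gen.gauss(100, 15) for _ in range(10_000)]
sigma = pstdev(population)          # population SD

n = 25
means = sample_means(population, n, reps=5_000)
print(f"mean of the sample means:         {fmean(means):.2f}  (population mean {fmean(population):.2f})")
print(f"SD of the sample means:           {pstdev(means):.2f}")
print(f"sigma / sqrt(n) (standard error): {sigma / sqrt(n):.2f}")
```

The standard deviation of the 5,000 sample means should come out close to σ/√n (about 3 here), which is what the standard error formula predicts.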
TESTS OF SIGNIFICANCE FOR LARGE SAMPLES
Though there is no clear-cut line of demarcation between large and small samples, if the
size of a sample exceeds 30, it can be considered statistically a large sample. Tests of
significance used for large samples are different from those for small samples. This is because,
for small samples, the sampling distribution of the statistic cannot be assumed to be approximately normal.
Two-tailed test for the difference between the means of two samples (large independent samples)
If two independent (different, e.g. boys and girls, or students of class A and class B)
random samples of size $n_1$ and $n_2$ respectively ($n_1, n_2 > 30$) are drawn from populations with
standard deviations $\sigma_1$ and $\sigma_2$, then to test the null hypothesis that there is no difference
between the two population means,
$H_0: \mu_1 = \mu_2$ against the alternative hypothesis $H_1: \mu_1 \neq \mu_2$,
the critical ratio is
$CR = \frac{\bar{X}_1 - \bar{X}_2}{SE_D}$, where $SE_D = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
Therefore
$CR = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
(In practice the sample standard deviations are used as estimates of $\sigma_1$ and $\sigma_2$.)
If the absolute value of CR is greater than 1.96, the null hypothesis is rejected at the 0.05 level.
(If it is greater than 2.58, the null hypothesis is rejected at the 0.01 level.)
Illustration
The emotional intelligence of two groups A and B was measured, and the mean, standard
deviation and sample size of each group are given below. Test whether there is a significant difference
in the mean emotional intelligence scores of the two groups.
Group      Mean   SD   N
Group A    75     15   150
Group B    70     20   250
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$
(The two groups do not differ significantly in their mean emotional intelligence scores,
against: there is a significant difference between the groups.)
$CR = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{75 - 70}{\sqrt{\frac{15^2}{150} + \frac{20^2}{250}}} = \frac{5}{\sqrt{3.1}} = 2.84$
Since the calculated value is greater than the tabled value 2.58 for significance at the 0.01 level,
$H_0$ is rejected. That is, group A and group B differ significantly in their mean emotional
intelligence (p < 0.01).
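The critical ratio computation in the illustration can be written as a small function. This is a sketch assuming, as the illustration does, that the sample standard deviations stand in for the population values; the function name `critical_ratio` is illustrative, and the cut-offs 1.96 and 2.58 are the two-tailed values given in the text.

```python
from math import sqrt


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """CR = (M1 - M2) / SE_D for two large independent samples,
    with SE_D = sqrt(sd1**2/n1 + sd2**2/n2) (sample SDs used as estimates of sigma)."""
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


cr = critical_ratio(75, 15, 150, 70, 20, 250)   # Group A vs Group B from the illustration
print(round(cr, 2))                              # 2.84

if abs(cr) > 2.58:
    print("Reject H0 at the 0.01 level")
elif abs(cr) > 1.96:
    print("Reject H0 at the 0.05 level")
else:
    print("Retain H0 at the 0.05 level")
```

Taking the absolute value means the same rule works whichever group is listed first.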
TESTS OF SIGNIFICANCE FOR SMALL SAMPLES
When the size of the sample is less than 30, one cannot assume that the sampling
distribution of the statistic is approximately normal or that the values given by the sample data are
sufficiently close to the population values. (That is, the sample value need not be a good estimate of
the population value.)
While dealing with small samples, one is more interested in testing a given hypothesis than in
estimating the population value. For example, if a correlation of .38 is reported from a sample of 10
individuals, instead of finding out the population value, one will be interested in finding out
whether this value could have arisen from an uncorrelated population.
STUDENT’S t DISTRIBUTION
W.S. Gosset's theoretical work on the t distribution was published in 1908 under the pen
name ‘Student’. Hence the t distribution is known as ‘Student’s t distribution’ or the ‘Student
distribution’. The t distribution is used when the sample size is less than or equal to 30 and the
population standard deviation is unknown.
The t statistic is calculated using the formula
$t = \frac{\bar{X} - \mu}{s} \times \sqrt{n}$, where $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$.
PROPERTIES OF t DISTRIBUTION
1. ‘t’ distribution ranges from minus infinity to plus infinity
2. The shape of the ‘t’ distribution varies as ‘n’ (and hence the degrees of freedom) varies
3. It is symmetrical with respect to the ordinate at the mean
4. The variance of the distribution is greater than one and approaches one as the sample size
becomes large. That is, as the sample size increases, the t distribution approaches a normal
distribution.
The following figure brings out the features of the ‘t’ distribution: as the degrees of freedom
increase, that is, as the sample size increases, the ‘t’ distribution approaches the normal curve.
Student’s t distribution is used to test the significance of various results obtained from small
samples. It can be used to test the significance of the mean of a random sample in the following way.
To test whether the mean of a sample drawn from a normal population deviates significantly
from a stated value (a specified value or a population mean) when the population standard
deviation is unknown, the t variate can be calculated using the formula
$t = \frac{\bar{X} - \mu}{s} \times \sqrt{n}$
where
$\bar{X}$ -- mean of the sample
µ -- the population mean or the specified value
n -- sample size
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ -- standard deviation of the sample
If the calculated value of t exceeds $t_{.05}$ (the tabled value of t at n − 1 degrees of freedom for
significance at the 0.05 level), the difference between $\bar{X}$ and µ is significant at the 0.05 level;
if it is less than $t_{.05}$, the difference is not significant at the 0.05 level.
If the calculated value is greater than $t_{.01}$, the difference is significant at the 0.01 level. For
example, if the mean lifetime of ten bulbs is found to be 4400 hrs with a standard deviation of 589 hrs
(0.589 thousand hrs), test the hypothesis that the average lifetime of the bulbs is 4000 hrs.
$t = \frac{(4400 - 4000)\sqrt{10}}{589} = \frac{(4.4 - 4)\sqrt{10}}{0.589} = 2.148$
Degrees of freedom = n − 1 = 9.
Therefore $t_{.05}$ for 9 df = 2.262 (from table).
Since the calculated value is less than $t_{.05}$, the difference between $\bar{X}$ and µ is not
significant at the 0.05 level. That is, the average life of the bulbs can be taken as 4000 hrs.
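The one-sample t test can be sketched in a few lines. The sketch reads the bulbs' standard deviation as 589 hrs (0.589 in units of thousands of hours), which is the reading that reproduces t = 2.148; the function names are illustrative, and 2.262 is the tabled value quoted in the text. The second helper starts from raw scores and relies on `statistics.stdev`, which uses the n − 1 denominator as the formula for s requires.

```python
from math import sqrt
from statistics import fmean, stdev


def one_sample_t(mean, sd, n, mu):
    """t = (mean - mu) / sd * sqrt(n); sd is the sample SD with n - 1 in its denominator.
    Returns (t, degrees of freedom)."""
    return (mean - mu) / sd * sqrt(n), n - 1


def one_sample_t_from_scores(scores, mu):
    """Same test starting from raw scores; statistics.stdev uses the n - 1 denominator."""
    return one_sample_t(fmean(scores), stdev(scores), len(scores), mu)


t, df = one_sample_t(mean=4400, sd=589, n=10, mu=4000)
T_05_DF9 = 2.262     # tabled two-tailed 0.05 value for 9 df, as quoted in the text
print(round(t, 3), df)                      # 2.148 9
verdict = "significant" if abs(t) > T_05_DF9 else "not significant"
print(f"difference is {verdict} at the 0.05 level")
```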
Test of Significance of difference between two means (Small independent samples)
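If $\bar{X}_1$ and $\bar{X}_2$ are the means of two small, independent samples of sizes $n_1$ and $n_2$
with standard deviations $s_1$ and $s_2$ respectively, then to test whether the two means differ
significantly, ‘t’ can be calculated using the formula
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
This formula can be rewritten as:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where $s_1$ and $s_2$ are computed with n − 1 in the denominator.
If this calculated value is greater than the tabled value of ‘t’ for $n_1 + n_2 - 2$ degrees of
freedom ($t_{.05}$) for significance at the 0.05 level, the difference is significant at the 0.05 level.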
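Illustration: The effect of two types of drugs on reducing weight was studied in two samples of
patients, sample A (size 5) and sample B (size 7), and the loss in weight was measured. Sample A
(using drug 1) had a mean loss of weight of 12 kg with a standard deviation of 1.12 kg, and sample B
(using drug 2) had a mean loss of weight of 11 kg with a standard deviation of 2.31 kg. Find whether
there is a significant difference in the efficacy of the two drugs.
$\bar{X}_1 = 12$, $s_1 = 1.12$, $\bar{X}_2 = 11$, $s_2 = 2.31$, $n_1 = 5$, $n_2 = 7$
For practical purposes the pooled standard deviation can be calculated using the formula
$S = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
instead of $S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}}$.
Thus
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \times \frac{n_1 + n_2}{n_1 n_2}}} = \frac{12 - 11}{\sqrt{\frac{4 \times 1.12^2 + 6 \times 2.31^2}{10} \times \frac{5 + 7}{5 \times 7}}} = 0.89$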
Since the calculated value is less than the tabled value of ‘t’ for 10 degrees of freedom at the
0.05 level of significance ($t_{.05}$ = 2.228), the difference is not significant at the 0.05 level.
That is, there is no significant difference in the efficacy of the drugs used.
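The pooled-variance t test for two small independent samples can be sketched as below. It assumes, as the worked example does, that the reported standard deviations were computed with n − 1 in the denominator, which is the reading that reproduces t = 0.89; `pooled_t` is an illustrative name.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t with a pooled SD; sd1, sd2 are sample SDs (n - 1 denominators).
    Returns (t, df) with df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


t, df = pooled_t(12, 1.12, 5, 11, 2.31, 7)    # drug 1 vs drug 2
print(round(t, 2), df)                         # 0.89 10
```

The result, 0.89 on 10 df, is then compared with the tabled $t_{.05}$ of 2.228 as in the text.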
Test of Significance of difference between two means (Small dependent samples).
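If the samples are dependent, that is, the observations are paired, the difference between the means
can be tested using the formula:
$t = \frac{\bar{d}\sqrt{n}}{s}$
where $\bar{d}$ is the mean of the differences, n is the number of pairs and
$s = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$ or $s = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n - 1}}$
Here ‘t’ is based on (n − 1) degrees of freedom.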
Illustration: In an experimental study, a researcher obtained the pre-test and post-test scores of
10 participants as below:
Individual No.   1    2    3    4    5    6    7    8    9    10
Pre-test        44   40   61   52   32   44   70   41   67   72
Post-test       53   38   69   57   46   39   73   48   73   74
Test whether the pre-test and post-test mean scores differ significantly.
d (post − pre)   9   -2    8    5   14   -5    3    7    6    2
d²              81    4   64   25  196   25    9   49   36    4
$\bar{d} = 4.7$, $\sum d^2 = 493$
$s = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n - 1}} = \sqrt{\frac{493 - 10 \times 4.7^2}{10 - 1}} = \sqrt{\frac{272.1}{9}} = 5.50$
$t = \frac{\bar{d}\sqrt{n}}{s} = \frac{4.7 \times \sqrt{10}}{5.50} = 2.70$
$t_{.05}$ for 9 d.f. = 2.26 (from table). The calculated ‘t’ value exceeds the tabled value; hence the
difference is significant at the 0.05 level. That is, the pre-test and post-test mean scores differ
significantly at the 0.05 level.
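The paired-samples t test can be checked directly from the raw scores. This sketch takes the differences as post-test minus pre-test and uses `statistics.stdev` (n − 1 denominator) for the SD of the differences; computed this way, t comes out at about 2.70 on 9 df, and the conclusion at the 0.05 level (tabled value 2.26) is that the difference is significant. `paired_t` is an illustrative name.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(before, after):
    """Paired-samples t on the differences d = after - before. Returns (t, df = n - 1)."""
    if len(before) != len(after):
        raise ValueError("need one 'after' score for every 'before' score")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar = fmean(d)
    s = stdev(d)                 # SD of the differences, n - 1 in the denominator
    return d_bar * sqrt(n) / s, n - 1


pre = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72]
post = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74]
t, df = paired_t(pre, post)
print(round(t, 2), df)          # 2.7 9
```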
Test of Significance of Observed Correlation Coefficient.
Suppose a random sample is taken from a bivariate normal population (that is, two variables
are involved in the population). To test the hypothesis that the correlation coefficient of the
population is zero, i.e., that the two variables in the population are uncorrelated, t can be
calculated using the formula
$t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2}$ with (n − 2) degrees of freedom.
If the calculated ‘t’ value exceeds the tabled value of ‘t’ at the 0.05 level for n − 2 degrees of
freedom, the value of r is significant at the 0.05 level. If ‘t’ is less than the tabled value, the data
are consistent with the hypothesis of an uncorrelated population.
Illustration
The correlation coefficient obtained between two variables is 0.42 for a sample of 27 pairs of
observations. Test whether the correlation obtained is a significant one.
$t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2} = \frac{0.42}{\sqrt{1 - 0.42^2}} \times \sqrt{27 - 2} = 2.31$
The tabled value of t at the 0.05 level (two-tailed) for 25 degrees of freedom is 2.06. Since the
calculated value is greater than the tabled value, the correlation obtained is significant at the 0.05 level.
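The t test for an observed correlation coefficient translates directly into code. This is a sketch of the formula above for testing $H_0$: ρ = 0; `t_for_r` is an illustrative name, and the resulting t is compared with a tabled value for n − 2 degrees of freedom.

```python
from math import sqrt


def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2) for testing H0: rho = 0. Returns (t, df = n - 2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2


t, df = t_for_r(0.42, 27)
print(round(t, 2), df)    # 2.31 25
```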
MODULE 2
NORMAL DISTRIBUTION
Objectives:
1. To know about the characteristics of Normal distribution
2. To familiarize with the concepts of skewness and kurtosis of frequency curves.
3. To acquaint learners with the various measures of skewness and kurtosis.
Normal distribution
Normal distribution was originally investigated by DeMoivre (1667–1754) to describe the
results of games of chance (gambling). The distribution was defined precisely by Pierre-Simon
Laplace (1749–1827) and put in its more usual form by Carl Friedrich Gauss (1777–1855). It was
Francis Galton (1822–1911) who gave the normal distribution a central role in psychological
theory, especially in the theory of mental abilities.
Mathematically the normal distribution is defined as
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
where e and π are constants, and µ and σ are the mean and standard deviation of the
set of scores (π = 3.1416 and e = 2.7183).
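The defining equation of the normal curve can be evaluated directly to see its shape. This sketch uses a hypothetical mean of 50 and SD of 10, and cross-checks the hand-written formula against the standard library's `statistics.NormalDist.pdf`; `normal_pdf` is an illustrative name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Ordinate (height) of the normal curve at x for mean mu and SD sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


mu, sigma = 50, 10      # hypothetical mean and SD of a set of scores
for x in (20, 40, 50, 60, 80):
    print(x, round(normal_pdf(x, mu, sigma), 5))

# The hand-written formula agrees with the standard library's implementation.
assert abs(normal_pdf(63, mu, sigma) - NormalDist(mu, sigma).pdf(63)) < 1e-12
```

The printed heights are largest at the mean and equal at points equally far above and below it, which is the symmetry described below.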
Normal distribution has a significant role in statistical analysis because
1. Many of the dependent variables with which we deal are commonly assumed to be normally
distributed in the population.
2. Many of the statistical techniques used to make inferences about values of a variable assume
the normality of the distribution of the variable.
3. The theoretical distribution of the hypothetical set of sample means obtained by drawing
an infinite number of samples from a specified population can be shown to be approximately
normal under a wide variety of conditions. The concepts of sampling distribution and sampling
error are highly connected to the concept of normal distribution.
The general characteristics of a normal distribution are
1. It is symmetrical with respect to the ordinate at mean. The left side of the normal curve is a
mirror image of the right side.
2. Mean, Median and Mode coincide.
3. Fifty percent of the scores are below the mean, and fifty percent above it. Most of the
scores pile up around the mean and extreme scores are relatively rare.
4. The height of the vertical line (Ordinate) is the maximum at the mean.
5. The curve has no boundaries in either direction (The curve is asymptotic to the X-axis and
extends from –∞ to + ∞ )
6. The percentages of the area around the mean are
a. Mean to Mean + 1σ is 34.13% (likewise, Mean to Mean − 1σ is 34.13%).
b. Mean + 1σ to Mean + 2σ is 13.59% (Mean − 1σ to Mean − 2σ is 13.59%).
c. Mean + 2σ to Mean + 3σ is 2.15% (Mean − 2σ to Mean − 3σ is 2.15%).
That is, 68.26% of the total area of the curve lies between the limits Mean − 1σ and Mean + 1σ;
95.44% of the total area falls between Mean − 2σ and Mean + 2σ; and 99.73% of
the total area lies between Mean − 3σ and Mean + 3σ.
Hence, for practical purposes, it is assumed that the normal curve extends from Mean − 3σ to
Mean + 3σ.
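The areas listed above can be recomputed from the standard normal distribution. This sketch uses `statistics.NormalDist().cdf`; `pct_between` is an illustrative name. Computed directly, the ±1σ and ±2σ areas come out at about 68.27% and 95.45%, and the band from +2σ to +3σ at about 2.14%; the figures in the text come from adding rounded table entries, so small differences in the second decimal are expected.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1


def pct_between(a, b):
    """Percentage of the area under the normal curve between a and b (in SD units)."""
    return 100 * (z.cdf(b) - z.cdf(a))


print(f"Mean to +1 sigma:   {pct_between(0, 1):.2f}%")    # 34.13%
print(f"+1 to +2 sigma:     {pct_between(1, 2):.2f}%")    # 13.59%
print(f"+2 to +3 sigma:     {pct_between(2, 3):.2f}%")    # 2.14%
print(f"within +/-1 sigma:  {pct_between(-1, 1):.2f}%")   # 68.27%
print(f"within +/-2 sigma:  {pct_between(-2, 2):.2f}%")   # 95.45%
print(f"within +/-3 sigma:  {pct_between(-3, 3):.2f}%")   # 99.73%
```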
Application
The normal distribution is a good model for many naturally occurring distributions, so it is
very useful in drawing inferences about populations.
Major applications of the normal curve can be listed as below.
1. When a variable is normally or nearly normally distributed, the normal probability table can be
used to calculate the percentage of cases that fall between the mean and a given σ distance from the
mean, that is, the percentage of the total area included between the mean and a given σ distance
from the mean.
2. The normal curve is used to convert a raw score into a standard score, $z = \frac{X - \mu}{\sigma}$.
3. The normal curve is useful in calculating the percentile ranks of scores.
4. The normal curve is used for normalizing a given frequency distribution.
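Applications 2 and 3 can be sketched together: convert a raw score to z, then read off the percentage of cases below it from the normal curve. The test mean of 60 and SD of 8 are hypothetical, the function names are illustrative, and the percentile rank is only meaningful if the scores really are (nearly) normal.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standard score z = (X - mean) / SD."""
    return (x - mean) / sd


def percentile_rank_normal(x, mean, sd):
    """Percentage of cases below x, assuming the scores are normally distributed."""
    return 100 * NormalDist(mean, sd).cdf(x)


# Hypothetical test with mean 60 and SD 8.
print(z_score(72, 60, 8))                              # 1.5
print(round(percentile_rank_normal(72, 60, 8), 1))     # 93.3
```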
SKEWNESS AND KURTOSIS
Two distributions may have the same mean and standard deviation but may differ widely in
their overall appearance. Equal means and standard deviations do not guarantee the equality of two
distributions. Measures of skewness and kurtosis give a clearer picture of the overall appearance of a
distribution.
Skewness
The term ‘skewness’ means lack of symmetry. When a distribution is not symmetrical
(asymmetrical) it is called a skewed distribution.
A measure of skewness gives the direction and the extent of skewness. In a symmetrical distribution,
the mean, median and mode are identical. The further the mean moves away from the mode, the
larger the asymmetry or skewness.
In a symmetrical distribution, the values of the mean, median and mode coincide, and the spread of
the frequencies is the same on both sides of the centre of the curve. A skewed distribution can
be either positively skewed or negatively skewed. In a positively skewed distribution, the mean has
the largest value and the mode the smallest, with the median lying between them. In a negatively
skewed distribution, the mode has the largest value and the mean the smallest, with the median again
lying between the two. In a positively skewed distribution the frequencies are spread out over a wide
range of values on the high-value (right-hand) side of the distribution (an extended tail on the
right-hand side), while in a negatively skewed distribution the tail is more extended on the left-hand
side of the curve.
[Figure: negatively skewed distribution, normal distribution and positively skewed distribution]
Measurement of Skewness
The extent of skewness is calculated using the formula
$Sk = \frac{\text{Mean} - \text{Mode}}{\sigma}$ (Karl Pearson’s coefficient of skewness)
or
$Sk = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$ (Bowley’s coefficient of skewness)
In Karl Pearson’s method there is no fixed limit to the value of skewness, but in Bowley’s method the
value of skewness ranges from −1 to +1. A value of zero in both cases indicates that the curve is
symmetrical (not skewed).
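Both coefficients of skewness can be computed from raw scores. This sketch assumes the (Mean − Mode)/σ form of Pearson's coefficient with the population SD, and quartiles from `statistics.quantiles` with its default 'exclusive' method (other quartile conventions give slightly different values). The scores are hypothetical and have a long right tail, so both coefficients should come out positive.

```python
from statistics import fmean, mode, pstdev, quantiles


def pearson_skewness(scores):
    """Karl Pearson's coefficient: (Mean - Mode) / SD (population SD)."""
    return (fmean(scores) - mode(scores)) / pstdev(scores)


def bowley_skewness(scores):
    """Bowley's coefficient: (Q3 + Q1 - 2*Median) / (Q3 - Q1); lies between -1 and +1."""
    q1, median, q3 = quantiles(scores, n=4)
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical scores with a long tail on the right (positively skewed).
scores = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]
print(round(pearson_skewness(scores), 2))   # 0.7
print(round(bowley_skewness(scores), 2))    # 0.43
```

`statistics.mode` returns the most common value, which is only a sensible "mode" for data with a clear single peak.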
Kurtosis
The word kurtosis means ‘bulginess’. Kurtosis is the degree of flatness or peakedness in the
region of the mode of a frequency curve. Thus kurtosis gives an idea of how far a distribution is
peaked or flat compared to a normal distribution.
A normal curve is said to be mesokurtic; if the curve is more peaked than the normal curve, it is
leptokurtic; and if the curve is flatter than the normal curve, it is known as platykurtic.
A leptokurtic curve has a narrower central portion and higher tails than the normal curve. A
platykurtic curve has a broader central portion and lower tails.
[Figure: mesokurtic (normal curve), platykurtic and leptokurtic curves]
Measurement of Kurtosis
A formula for calculating kurtosis in terms of percentiles is
$Ku = \frac{Q}{P_{90} - P_{10}}$
where Q is the quartile deviation, $Q = (Q_3 - Q_1)/2$.
In the case of a normal distribution the value of kurtosis calculated using this formula is 0.263. If
the value is less than 0.263 the distribution will be leptokurtic, and if it is greater than 0.263 it
will be platykurtic.
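The percentile coefficient of kurtosis can be sketched as follows, assuming percentiles from `statistics.quantiles(..., n=100)` (default 'exclusive' method). For an exactly normal curve the sketch uses the theoretical percentiles, which reproduces the value 0.263; the evenly spread scores 1 to 1000 are a hypothetical flat distribution and should give a value above 0.263. `percentile_kurtosis` is an illustrative name.

```python
from statistics import NormalDist, quantiles


def percentile_kurtosis(scores):
    """Ku = Q / (P90 - P10), with Q = (Q3 - Q1) / 2 (the quartile deviation)."""
    pct = quantiles(scores, n=100)      # 99 cut points: pct[k - 1] is the k-th percentile
    q1, q3 = pct[24], pct[74]
    p10, p90 = pct[9], pct[89]
    return ((q3 - q1) / 2) / (p90 - p10)


# For an exactly normal curve, use the theoretical percentiles.
nd = NormalDist()
q = (nd.inv_cdf(0.75) - nd.inv_cdf(0.25)) / 2
ku_normal = q / (nd.inv_cdf(0.90) - nd.inv_cdf(0.10))
print(round(ku_normal, 3))    # 0.263

flat = list(range(1, 1001))   # evenly spread scores: a flat (platykurtic) distribution
print(round(percentile_kurtosis(flat), 3))
```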
MODULE 3
ANALYSIS OF VARIANCE
Objectives: 1. To know about ANOVA
2. To know about assumptions of ANOVA
3. To know about one-way and two-way ANOVA
ANALYSIS OF VARIANCE (ANOVA)
When we have to test the significance of the difference between the means of two random samples,
we use the test of significance of the difference between means, or t-test. But if there are more
than two groups, using t-tests to find out whether any two of the group means differ significantly is
laborious. For example, if there are five groups to be compared, ten t-values (one for each pair of
groups) have to be calculated to know whether any of the groups differ in their means. Analysis of
variance helps one to find out whether any of these groups differ significantly in their means.
Instead of a large number of t-tests, ANOVA uses a single test, the F-test, in which variances are
compared (one-way ANOVA).
Though ANOVA is used for testing the significance of differences between means, it is known as
analysis of variance because it analyses two types of variance: between-groups variance and
within-groups variance. Between-groups variance is the variance of the group means, and
within-groups variance is the mean of the variances of the scores within each sample or group.
The F-value is calculated using the formula
$F = \frac{\text{Between-groups variance}}{\text{Within-groups variance}}$
From the table of F-values one can determine whether the groups differ significantly in their
means. If the calculated value is greater than the tabled value of F for (k − 1), (N − k) degrees of
freedom (k = number of groups, N = total number of observations), the mean difference between at
least two groups in the set is significant; if the calculated value is less than the tabled value, none
of the mean differences between the groups is significant at the level of significance considered.
Basic Assumptions of ANOVA
ANOVA is a parametric test and has to satisfy certain assumptions in order to use it for statistical
inferences.
1. The population distribution of the dependent variable should be normal. (Assumption
of Normality)
2. The groups should be randomly selected from the subpopulations defined by the grouping
criteria. (Assumption of Randomness)
3. The subgroups under study should have the same variability. (Assumption of Homogeneity
of Variance)
Computation for Analysis of Variance (One-Way)
Step 1 Correction term
$C = \frac{(\sum X)^2}{N} = \frac{(\sum X_1 + \sum X_2 + \sum X_3 + \dots)^2}{N}$
where $X_1, X_2, \dots$ are the individual measures in groups 1, 2, … and N is the total number of
observations in all the groups.
Step 2 Total sum of squares (SSt)
$SS_t = \sum X_1^2 + \sum X_2^2 + \dots - C$
Step 3 Sum of squares between means (SSb)
$SS_b = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots - C$
Step 4 Sum of squares within (SSw)
$SS_w = SS_t - SS_b$
Step 5 Calculation of F
$F = \frac{MS_b}{MS_w} = \frac{SS_b / df_b}{SS_w / df_w}$
where $df_b = k - 1$ is the degrees of freedom for the between-groups sum of squares (k = number of
groups) and $df_w = N - k$ is the degrees of freedom for the within-groups sum of squares.
Step 6 Arriving at a conclusion
If the calculated value is greater than the tabled value, reject $H_0$; otherwise accept $H_0$.
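Steps 1 to 5 translate line by line into code. This is a minimal sketch using hypothetical scores for three groups; `one_way_anova` is an illustrative name, and the F it returns would still be compared with the tabled F for (k − 1, N − k) degrees of freedom as in Step 6.

```python
def one_way_anova(*groups):
    """One-way ANOVA following Steps 1-5: correction term, SSt, SSb, SSw, then F."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    N = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)
    C = grand_sum ** 2 / N                                    # Step 1: correction term
    ss_t = sum(x ** 2 for g in groups for x in g) - C         # Step 2: total SS
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - C      # Step 3: between-groups SS
    ss_w = ss_t - ss_b                                        # Step 4: within-groups SS
    df_b, df_w = k - 1, N - k
    F = (ss_b / df_b) / (ss_w / df_w)                         # Step 5: F = MSb / MSw
    return {"SSt": ss_t, "SSb": ss_b, "SSw": ss_w, "df": (df_b, df_w), "F": F}


result = one_way_anova([4, 5, 6], [6, 7, 8], [8, 9, 10])      # hypothetical scores
print(result)   # {'SSt': 30.0, 'SSb': 24.0, 'SSw': 6.0, 'df': (2, 6), 'F': 12.0}
```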
Analysis of Variance for factorial design.
If there is more than one independent variable, one-way ANOVA cannot be used, and we have to use
ANOVA for factorial design. In ANOVA, independent variables are usually called factors, and the
different categories of these variables are called levels. For example, if we consider the
dependent variable as hostility among students and the independent variables as sex and home
environment, sex and home environment are known as factors; male and female are the levels of the
factor sex, and the levels of home environment will be the categories formed on the basis of home
environment. In this case we can find out the main effects of the two factors and the interaction
effect of the two factors. That is, we can test the significance of the difference in hostility
between male and female students, test the significance of the differences in hostility of students
from various home environments, and test whether the influence of home environment on hostility
differs across the levels of the factor sex. In this case the total variance is divided into four
parts: the main effect of sex, the main effect of home environment, the interaction effect of the two
factors, and the residual or within-group variance. If two factors are involved the ANOVA is known as
two-way ANOVA; if there are three factors it is three-way ANOVA; and if more than three factors are
included it is referred to simply as a factorial design.
When using ANOVA it is conventional to state the number of levels of each factor. In the above
example with sex and home environment as factors, the ANOVA used is a two-way ANOVA with design
2X3 (sex has two levels, male and female; home environment is here assumed to have three levels).
It is read as ‘two by three’ even though it is written in the form of a multiplication. Thus an
ANOVA with design 3X2X2 means there are three factors involved in the analysis, the first with 3
levels and the second and third with two levels each. Here the ANOVA used is a three-way ANOVA.
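The division of total variance into four parts in a two-way ANOVA can be sketched for the 2X3 example. This sketch handles only a balanced design (the same number of scores in every cell); that restriction, the function name and the hostility scores are all assumptions for illustration. It computes the sums of squares for the two main effects, the interaction and the within-group residual, and checks that they add up to the total.

```python
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way ANOVA. cells[i][j] holds the n scores for level i of
    factor A and level j of factor B (every cell must have the same n)."""
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(row) != b or any(len(c) != n for c in row) for row in cells):
        raise ValueError("this sketch needs a balanced design")
    all_scores = [x for row in cells for c in row for x in c]
    grand = fmean(all_scores)
    mean_a = [fmean([x for c in row for x in c]) for row in cells]
    mean_b = [fmean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[fmean(c) for c in row] for row in cells]

    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)            # main effect of A
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)            # main effect of B
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b                                  # interaction A x B
    ss_w = sum((x - cell_means[i][j]) ** 2                          # within groups (residual)
               for i, row in enumerate(cells) for j, c in enumerate(row) for x in c)

    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "AxB": ss_ab, "within": ss_w}
    ms_w = ss_w / df["within"]
    F = {k: (ss[k] / df[k]) / ms_w for k in ("A", "B", "AxB")}
    return ss, df, F, ss_total


# Hypothetical hostility scores: a 2 x 3 design (sex x home environment), n = 3 per cell.
cells = [
    [[10, 12, 11], [14, 15, 13], [9, 8, 10]],   # male: environments 1, 2, 3
    [[9, 10, 8], [11, 12, 13], [10, 9, 11]],    # female: environments 1, 2, 3
]
ss, df, F, ss_total = two_way_anova(cells)
print(df)
print({k: round(v, 2) for k, v in F.items()})
```

Each F would then be compared with the tabled F for its own degrees of freedom and the within-group degrees of freedom.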
*******