Null hypothesis - KV Institute of Management and Information Studies

HYPOTHESIS - Meaning
A hypothesis is an assumption about a population parameter that may or may not be true.
A hypothesis is a tentative conclusion, logically drawn, about a population parameter.
Statistical hypotheses are statements about the probability distributions of populations. In other words, statistical hypotheses are assumptions or guesses about the population involved.
Example: The average weight of fish reared in a research station is 2.6.
Types of hypothesis
There are two types of hypothesis, namely the null hypothesis and the alternative hypothesis.
Null hypothesis
• A statistical hypothesis which is stated for the purpose of possible acceptance, i.e., one that is retained unless the sample provides sufficient evidence against it, is called the null hypothesis. It is denoted by H0.
Alternative hypothesis
• A hypothesis contradictory to the null hypothesis is called an alternative hypothesis. It is denoted by H1.
Steps in Testing a Hypothesis:
1. Setting up of Hypothesis:
The first step in testing a hypothesis is setting it up. The conventional approach is to set up two hypotheses instead of one. These hypotheses are:
(i) Null Hypothesis
(ii) Alternative Hypothesis
Following the sampling theory approach, we accept or reject a hypothesis on the basis of sample information alone. Any sample we draw will differ to some extent from the population, so we must judge whether the observed differences are statistically significant or insignificant.
2. Choosing a Statistical Tool:
The second step in testing a hypothesis is choosing a statistical technique. Many statistical tests are frequently used in hypothesis testing, such as the Z-test, the t-test and the chi-square test, and the researcher should be able to choose the appropriate one. When the hypothesis pertains to a large sample (30 or more), the Z-test is used. When the sample size is less than 30, the t-test can be used.
3. Selection of Desired Level of Significance:
The third step is the selection of the desired level of significance. The confidence with which an experimenter rejects or retains a hypothesis depends upon the level of significance, which is expressed as a percentage such as 5% or 1%. When the researcher adopts the 5% level, he runs the risk of rejecting a true null hypothesis on about 5 out of every 100 occasions.
4. Computation of the Test Statistic:
The fourth step is performing the computations necessary for the test. These include the test statistic and its standard error.
5. Drawing the Statistical Decision:
The final step is to draw the statistical decision, i.e., to accept or reject the hypothesis. This depends on whether the computed value of the test statistic falls in the region of acceptance or in the region of rejection at the given level of significance.
One-Tailed Tests of Significance
A test is one-tailed when the alternative hypothesis, H1, states a direction, such as:
H0: The mean income of females is less than or equal to the mean income of males.
H1: The mean income of females is greater than the mean income of males.
[Figure: Sampling distribution of the statistic z for a one-tailed test, .05 level of significance]
Two-Tailed Tests of Significance
A test is two-tailed when no direction is specified in the alternative hypothesis H1, such as:
– H0: The mean income of females is equal to the mean income of males.
– H1: The mean income of females is not equal to the mean income of males.
Two-Tailed Test
A statistical test in which the critical area of the distribution is two-sided; it tests whether a sample statistic is either greater than or less than a certain range of values.
Example:
A candy plant wants to make sure that the number of candies per bag is around 50. The factory is willing to accept between 45 and 55 candies per bag. It would be too costly to have someone check every bag, so the factory selects random samples of bags and tests whether the average number of candies exceeds 55 or is less than 45, at whatever level of significance it chooses.
Z-test and t-test
• The Z-test is the test of significance used for large samples, i.e., when n ≥ 30.
• The Z-test compares the mean of a research sample with the mean of a population. The population parameters (μ, σ) must be known.
• The t-test is the test of significance for small samples, i.e., when n < 30.
• The t-test compares the means of two research samples, or a sample mean with a hypothetical value. It is used when the population standard deviation σ is unknown.
Table Values (critical values of Z)
Level of significance | 1% Level | 5% Level | 10% Level
Two Tail Test | 2.58 | 1.96 | 1.645
One Tail Test | 2.33 | 1.645 | 1.28
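The table values come from the standard normal distribution and can be reproduced to more decimal places. A minimal sketch, assuming Python 3.8+ for the standard-library `statistics.NormalDist`; the helper name `critical_z` is ours, not from the notes:

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Critical value of the standard normal distribution at significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.01, 0.05, 0.10):
    print(f"{alpha:.0%} level: two-tail {critical_z(alpha):.3f}, "
          f"one-tail {critical_z(alpha, two_tailed=False):.3f}")
```

Rounded to two decimals these give 2.58, 1.96 and 1.645 (two-tail) and 2.33, 1.645 and 1.28 (one-tail), matching the table.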
Symbols for Population and Samples
Measure | Population Symbol | Sample Symbol
Size | N | n
Mean | μ | X̄
Standard deviation | σ | s
Proportion | p | P
T-Test: Test of Significance of Small Samples
The t-test is probably the most commonly used statistical data analysis procedure for hypothesis testing. There are several kinds of t-tests; a common one is the "two-sample t-test", also known as the "independent samples t-test". The method was introduced by William Sealy Gosset, who published under the pen name "Student" (hence "Student's t-test").
When the sample size is 30 or less and the population standard deviation is unknown, we can use the t-distribution.
Applications of the t-Distribution:
1) Test of hypothesis about the population mean.
2) Test of hypothesis about the difference between two means.
3) Test of hypothesis about the difference between two means with dependent samples (paired t-test).
4) Test of hypothesis about the coefficient of correlation.
Test of hypothesis about a single population mean
To determine whether the mean of a sample drawn from a normal population deviates significantly from a hypothetical value when the population variance is unknown, we use the t-distribution.
Formula:
$$t = \frac{(\bar{X}-\mu)\sqrt{n}}{S}, \qquad S = \sqrt{\frac{\sum (X-\bar{X})^2}{n-1}}$$
Where: X̄ = mean of the sample
μ = actual / hypothetical value of the population mean
n = sample size
S = standard deviation of the sample
The 95% fiducial (confidence) limits of the population mean μ are:
$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.05}$$
and the 99% limits are:
$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.01}$$
where t0.05 and t0.01 are the (two-tailed) table values of t for n − 1 degrees of freedom.
Test of hypothesis about the population mean - Model 1
The following results are obtained from a sample of 10 boxes of biscuits:
Mean weight of contents = 490 gms
Standard deviation of the weight = 9 gms
Could the sample come from a population having a mean of 500 gms?
Solution:
Let us take the hypothesis that μ = 500 gms.
X̄ = 490; μ = 500; n = 10; S = 9
$$t = \frac{(\bar{X}-\mu)\sqrt{n}}{S} = \frac{(490-500)\sqrt{10}}{9} = \frac{-10 \times 3.16}{9} = -3.51$$
d.f. = n − 1 = 10 − 1 = 9
Table value t0.01 = 3.25
Calculated value |t| = 3.51
Conclusion: 3.51 > 3.25, so our hypothesis is rejected.
95% confidence interval of the population mean (t0.05 for 9 d.f. = 2.262):
$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.05} = 490 \pm \frac{9}{\sqrt{10}} \times 2.262 = 490 \pm 6.44$$
i.e., 483.56 to 496.44 gms.
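The two formulas used in this example can be checked with a few lines of code. A minimal sketch with hypothetical helper names; because Python's standard library has no t-distribution, the table value of t is passed in by hand (an assumption: it is read from a printed t-table for n − 1 d.f.):

```python
from math import sqrt


def one_sample_t(sample_mean, mu, s, n):
    """t = (X̄ - μ)·√n / S, where S is the sample SD computed with divisor n - 1."""
    return (sample_mean - mu) * sqrt(n) / s


def t_confidence_limits(sample_mean, s, n, t_table):
    """X̄ ± (S / √n)·t_table, with t_table taken from a t-table for n - 1 d.f."""
    margin = s / sqrt(n) * t_table
    return sample_mean - margin, sample_mean + margin


# Biscuit example: X̄ = 490, μ = 500, S = 9, n = 10
t = one_sample_t(490, 500, 9, 10)
low, high = t_confidence_limits(490, 9, 10, t_table=2.262)  # t0.05 for 9 d.f.
print(f"t = {t:.2f}; 95% limits: {low:.2f} to {high:.2f}")
```

The same `one_sample_t` call with (990, 1000, 20, 26) reproduces the t of about −2.55 in Illustration 2.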
Illustration 2
A sample of 26 bulbs gives a mean life of 990 hours with a standard deviation of 20 hours. The manufacturer claims that the mean life of the bulbs is 1000 hours. Is the sample not up to the standard?
Null hypothesis: the population mean μ = 1000.
X̄ = 990; μ = 1000; n = 26; S = 20
$$t = \frac{(\bar{X}-\mu)\sqrt{n}}{S} = \frac{(990-1000)\sqrt{26}}{20} = \frac{-10 \times 5.10}{20} = -2.55$$
d.f. = n − 1 = 26 − 1 = 25
Table value t0.05 (one-tailed) = 1.708
Calculated value |t| = 2.55
Conclusion: 2.55 > 1.708, so our hypothesis is rejected; the sample is not up to the standard.
Illustration 3 - University question (2009)
A random sample of size 16 has 53 as mean. The sum of squares of deviations taken from the mean is 135. Can this sample be regarded as taken from a population having 56 as mean? Obtain 95% and 99% confidence limits of the mean of the population.
Null hypothesis: the population mean μ = 56.
X̄ = 53; μ = 56; n = 16; Σ(X − X̄)² = 135
$$S = \sqrt{\frac{\sum (X-\bar{X})^2}{n-1}} = \sqrt{\frac{135}{15}} = \sqrt{9} = 3$$
$$t = \frac{(\bar{X}-\mu)\sqrt{n}}{S} = \frac{(53-56)\sqrt{16}}{3} = \frac{-3 \times 4}{3} = -4$$
d.f. = n − 1 = 16 − 1 = 15
Table value t0.05 = 2.131
Calculated value |t| = 4
Conclusion: 4 > 2.131, so our hypothesis is rejected; the sample cannot be regarded as drawn from a population with mean 56.
95% confidence limits of the population mean:
$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.05} = 53 \pm \frac{3}{4} \times 2.131 = 53 \pm 1.60$$
i.e., 51.40 to 54.60.
99% confidence limits (t0.01 for 15 d.f. = 2.947):
$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.01} = 53 \pm \frac{3}{4} \times 2.947 = 53 \pm 2.21$$
i.e., 50.79 to 55.21.
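When only the sum of squared deviations is given, S must first be recovered with divisor n − 1. A minimal sketch under that assumption; the helper name is hypothetical and the table values 2.131 and 2.947 (15 d.f., two-tailed) are typed in from a t-table:

```python
from math import sqrt


def t_from_sum_of_squares(sample_mean, mu, ss, n):
    """One-sample t when only the sum of squared deviations Σ(X - X̄)² is given."""
    s = sqrt(ss / (n - 1))  # sample SD with divisor n - 1
    return s, (sample_mean - mu) * sqrt(n) / s


s, t = t_from_sum_of_squares(53, 56, ss=135, n=16)
print(f"S = {s:.2f}, t = {t:.2f}")

# Two-tailed table values of t for 15 d.f., read from a printed t-table
limits = {}
for label, t_table in (("95%", 2.131), ("99%", 2.947)):
    margin = s / sqrt(16) * t_table
    limits[label] = (53 - margin, 53 + margin)
    print(f"{label} limits: {53 - margin:.2f} to {53 + margin:.2f}")
```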
Test of hypothesis about a single population mean - Model 1: Additional problems
Problem 1) A random sample of 10 children had a mean weight of 14.3 kg and a variance of 2.1. Test at the 5% level of significance whether the mean weight of the children's population is 15 kg.
Problem 2) A new machine attachment would be introduced if it receives a mean rating of at least 7 on a ten-point scale. A sample of 20 purchase engineers is shown the attachment and asked to evaluate it. The results indicate a mean rating of 7.9 with an S.D. of 1.6. A significance level of α = 0.05 (table value 2.09) is selected. Should the attachment be introduced? (Anna University question)
Hints: X̄ = 7.9; μ = 7; n = 20; S = 1.6
Problem 3) The mean weekly sales of soap bars in departmental stores was 146.3 bars per store. After an advertising campaign, the mean weekly sales in 22 stores for a typical week increased to 153.7, with an S.D. of 17.2. Was the advertising campaign successful? A significance level of α = 0.05 (table value 1.72) is selected.
Hints: X̄ = 153.7; μ = 146.3; n = 22; S = 17.2
Test of hypothesis about a single population mean - Model 2
• The lifetimes of electronic bulbs for a random sample of 10 from a large consignment gave the following data. Can you accept the hypothesis that the average lifetime of the bulbs is 4000 hours?
Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Life ('000 hours) | 4.2 | 4.6 | 3.9 | 4.1 | 5.2 | 3.8 | 3.9 | 4.3 | 4.4 | 5.6
Solution
Let us take the hypothesis that there is no significant difference between the sample mean and the hypothetical population mean. Applying the t-test:
X | X − X̄ (X̄ = 4.4) | (X − X̄)²
4.2 | −0.2 | 0.04
4.6 | 0.2 | 0.04
3.9 | −0.5 | 0.25
4.1 | −0.3 | 0.09
5.2 | 0.8 | 0.64
3.8 | −0.6 | 0.36
3.9 | −0.5 | 0.25
4.3 | −0.1 | 0.01
4.4 | 0 | 0
5.6 | 1.2 | 1.44
ΣX = 44 | | Σ(X − X̄)² = 3.12
$$S = \sqrt{\frac{\sum (X-\bar{X})^2}{n-1}} = \sqrt{\frac{3.12}{10-1}} = 0.589$$
$$t = \frac{(\bar{X}-\mu)\sqrt{n}}{S} = \frac{(4.4-4)\sqrt{10}}{0.589} = \frac{0.4 \times 3.162}{0.589} = 2.148$$
Degrees of freedom v = n − 1 = 10 − 1 = 9.
Table value t0.05 = 2.262
Calculated value = 2.148
Conclusion: The calculated value of t (2.148) is less than the table value (2.262), so the hypothesis is accepted. The average lifetime of the bulbs could be 4000 hours.
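With raw observations, the mean and S can be taken straight from the standard-library `statistics` module, whose `stdev` uses the n − 1 divisor that this formula needs. A minimal sketch; `one_sample_t_raw` is a hypothetical name, and lifetimes are kept in '000 hours so μ = 4:

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_raw(data, mu):
    """One-sample t from raw observations; statistics.stdev uses divisor n - 1."""
    n = len(data)
    return (mean(data) - mu) * sqrt(n) / stdev(data)


lives = [4.2, 4.6, 3.9, 4.1, 5.2, 3.8, 3.9, 4.3, 4.4, 5.6]  # '000 hours
t = one_sample_t_raw(lives, mu=4.0)
print(f"mean = {mean(lives):.2f}, S = {stdev(lives):.3f}, t = {t:.3f}")
```

The same function can be applied to the share-price data in the additional problems (with μ = 155).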
Illustration
• Example: Past records show that the mean marks of students taking statistics are 60, with a standard deviation of 15 marks. A new method of teaching is adopted and a random sample of 64 students is chosen. After using the new method, the sample gives mean marks of 65. Is the new method better?
• Solution: Here we are interested in knowing whether the marks increased on using the new teaching method. Therefore, we use a one-tailed test.
• We have X̄ = 65, μ = 60, σ = 15 and n = 64. Since σ is known and the sample is large, we use the Z-test:
$$Z = \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} = \frac{(65-60)\sqrt{64}}{15} = \frac{5 \times 8}{15} = 2.67$$
The null hypothesis is: H0: μ = 60.
The alternative hypothesis is: Ha: μ > 60.
Suppose the researcher had predetermined the level of significance as 0.01, or 1%. Then 2.67 > 2.33 (the z-score for the 0.01 level in the upper tail of the distribution). Therefore, the observed value is highly significant and H0 is rejected. This means the new teaching method is better.
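For a one-tailed alternative such as Ha: μ > 60, the upper-tail p-value can be computed alongside Z. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `z_one_sample` is a hypothetical helper, and σ = 15 is taken from the problem statement:

```python
from math import sqrt
from statistics import NormalDist


def z_one_sample(sample_mean, mu, sigma, n):
    """Z = (X̄ - μ)·√n / σ, for a known σ or a large sample."""
    return (sample_mean - mu) * sqrt(n) / sigma


z = z_one_sample(65, 60, 15, 64)
p_upper = 1 - NormalDist().cdf(z)          # one-tailed p-value for Ha: μ > 60
critical = NormalDist().inv_cdf(1 - 0.01)  # upper-tail 1% point, about 2.33
print(f"Z = {z:.2f}, one-tailed p = {p_upper:.4f}, reject H0: {z > critical}")
```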
Test of hypothesis about a single population mean - Model 2: Additional problems
1) Prices of shares of X & Co. Ltd. on different days in a month were found to be: 155, 154, 158, 159, 158, 160, 159, 152, 153 and 155. Can we accept the hypothesis that the average price of the shares is Rs. 155? (v = 9, t0.05 = 2.262)
2) A random sample of 10 cricket matches has 39 runs as mean. The sum of the squares of deviations taken from the mean score is 10,404. Can this sample be regarded as taken from cricket matches having 45 as the average score? Also obtain the 95% confidence limits of the mean of the population. (For v = 9, t0.05 = 2.262.)
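Test of hypothesis about the difference between two means
Given two independent random samples of sizes n1 and n2, with means X̄1 and X̄2 and standard deviations S1 and S2, the value of t is calculated by the following formula:
$$t = \frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1+n_2}}$$
where the pooled standard deviation is
$$S = \sqrt{\frac{\sum (X_1-\bar{X}_1)^2 + \sum (X_2-\bar{X}_2)^2}{n_1+n_2-2}} \qquad \text{or} \qquad S = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1+n_2-2}}$$
The second form applies when S1 and S2 are computed with divisors n1 and n2; if they are computed with divisor (n − 1), use (n1 − 1)S1² + (n2 − 1)S2² in the numerator instead. The degrees of freedom are n1 + n2 − 2.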
Test of hypothesis about the difference between two means - Model 1 Problem
Illustration 1. The heights (in inches) of six randomly chosen soldiers are 76, 70, 68, 69, 69 and 68. Those of six randomly chosen sailors are 68, 64, 65, 69, 72 and 64. Discuss the light that these data throw on the suggestion that soldiers are, on average, taller than sailors. Use the t-test.
Solution:
Let us take the hypothesis that there is no difference in height between soldiers and sailors. Applying the t-test:
Height X1 | X1 − X̄1 | (X1 − X̄1)² | Height X2 | X2 − X̄2 | (X2 − X̄2)²
76 | 6 | 36 | 68 | 1 | 1
70 | 0 | 0 | 64 | −3 | 9
68 | −2 | 4 | 65 | −2 | 4
69 | −1 | 1 | 69 | 2 | 4
69 | −1 | 1 | 72 | 5 | 25
68 | −2 | 4 | 64 | −3 | 9
ΣX1 = 420 | | Σ(X1 − X̄1)² = 46 | ΣX2 = 402 | | Σ(X2 − X̄2)² = 52
X̄1 = 420/6 = 70; X̄2 = 402/6 = 67
$$S = \sqrt{\frac{\sum (X_1-\bar{X}_1)^2 + \sum (X_2-\bar{X}_2)^2}{n_1+n_2-2}} = \sqrt{\frac{46+52}{6+6-2}} = \sqrt{\frac{98}{10}} = 3.13$$
$$t = \frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1+n_2}} = \frac{70-67}{3.13}\sqrt{\frac{6 \times 6}{6+6}} = \frac{3}{3.13} \times 1.732 = 1.66$$
Degrees of freedom = (n1 − 1) + (n2 − 1) = 10
The calculated value of t (1.66) is less than the table value (t0.05 = 2.23), so the hypothesis is accepted. Hence the data do not show that soldiers are, on average, taller than sailors.
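The pooled-variance t from raw data follows the first form of S directly. A minimal sketch that assumes equal population variances, as the pooled formula does; `pooled_t` is a hypothetical helper name:

```python
from math import sqrt
from statistics import mean


def pooled_t(x1, x2):
    """Independent-samples t with pooled S (equal population variances assumed).

    Returns (S, t, degrees of freedom).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)  # Σ(X1 - X̄1)²
    ss2 = sum((x - m2) ** 2 for x in x2)  # Σ(X2 - X̄2)²
    s = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * sqrt(n1 * n2 / (n1 + n2))
    return s, t, n1 + n2 - 2


soldiers = [76, 70, 68, 69, 69, 68]
sailors = [68, 64, 65, 69, 72, 64]
s, t, df = pooled_t(soldiers, sailors)
print(f"S = {s:.2f}, t = {t:.2f}, d.f. = {df}")
```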
Test of hypothesis about the difference between two means - Model 2 Problem
Illustration 1
The mean weekly wage of a sample of n1 = 30 employees in a large firm is X̄1 = Rs. 280, with a sample standard deviation of S1 = Rs. 14. In another firm, a sample of n2 = 40 employees has a mean wage X̄2 = Rs. 270 with an S.D. of S2 = Rs. 10. The S.D.s of the populations are not assumed to be equal. Test the hypothesis that there is no difference between the mean weekly wages of the two firms at the 5% significance level.
Solution:
X̄1 = 280; X̄2 = 270; n1 = 30; n2 = 40; S1 = 14; S2 = 10.
$$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1+n_2-2} = \frac{30(14)^2 + 40(10)^2}{30+40-2} = \frac{5880+4000}{68} = 145.29$$
S = √145.29 = 12.05
$$t = \frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1+n_2}} = \frac{280-270}{12.05}\sqrt{\frac{30 \times 40}{30+40}} = \frac{10}{12.05} \times 4.14 = 3.44$$
Degrees of freedom = (n1 − 1) + (n2 − 1) = 68
The calculated value of t (3.44) is greater than the table value (t0.05 = 1.96), so the hypothesis is rejected. Hence there is a significant difference between the mean weekly wages of the two firms at the 5% significance level.
Solution
Let us take the hypothesis that there is no significant difference in the mean life of the two makes of bulbs, I and II. Applying the t-test for the difference of means:
$$S = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}} = \sqrt{\frac{7(36)^2 + 6(40)^2}{8+7-2}} = 37.898$$
X̄1 = 1234; X̄2 = 1136; S = 37.898; n1 = 8; n2 = 7; v = 13
$$t = \frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1+n_2}} = \frac{1234-1136}{37.898}\sqrt{\frac{8 \times 7}{8+7}} = \frac{98 \times 1.932}{37.898} = 4.996$$
Table value t0.05 = 2.16
The calculated value of t (4.996) is more than the table value (2.16). The hypothesis is rejected; hence the difference in the means is significant.
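The wage example pools the variances as n·S², while the bulb solution uses (n − 1)·S²; the right choice depends on which divisor was used for S1 and S2. A minimal sketch with a hypothetical flag `sd_divisor_n` to select between the two forms:

```python
from math import sqrt


def pooled_t_summary(m1, m2, s1, s2, n1, n2, sd_divisor_n=True):
    """Pooled two-sample t from summary statistics.

    sd_divisor_n=True : S1, S2 were computed with divisor n   -> weights n1, n2.
    sd_divisor_n=False: S1, S2 were computed with divisor n-1 -> weights n1-1, n2-1.
    Returns (pooled S, t).
    """
    w1, w2 = (n1, n2) if sd_divisor_n else (n1 - 1, n2 - 1)
    s = sqrt((w1 * s1 ** 2 + w2 * s2 ** 2) / (n1 + n2 - 2))
    return s, (m1 - m2) / s * sqrt(n1 * n2 / (n1 + n2))


wages = pooled_t_summary(280, 270, 14, 10, 30, 40, sd_divisor_n=True)
bulbs = pooled_t_summary(1234, 1136, 36, 40, 8, 7, sd_divisor_n=False)
print("wages: S = %.2f, t = %.2f" % wages)
print("bulbs: S = %.3f, t = %.3f" % bulbs)
```

Without rounding the intermediate values, the wage t comes out as about 3.43 rather than 3.44. With the n·S² form and a second mean of 1036, the same function gives S ≈ 40.73 and t ≈ 9.39, the answer quoted for the additional bulb problem.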
Test of hypothesis about the difference between two means - Model 2
Additional Problem
Samples of two different types of bulbs were tested for length of life, and the following data were obtained. Is the difference in the means significant? (Given that the significant value of t at the 5% level of significance for 13 d.f. is 2.16.) (Ans: S = 40.73; calculated t-value = 9.39)
Particulars | Type I | Type II
Sample size | n1 = 8 | n2 = 7
Sample mean | X̄1 = 1234 hours | X̄2 = 1036 hours
Sample SD | S1 = 36 hours | S2 = 40 hours
Paired t-Test - Illustration
The memory capacity of students was tested before and after a month-long course of meditation. State whether the course was effective or not from the following data:
Before training: 10, 15, 9, 3, 7, 12, 16, 17, 4
After training: 12, 17, 8, 5, 6, 11, 18, 20, 3
Solution:
Step 1: Setting up the null hypothesis: The course was not effective, i.e., there is no significant difference between the scores before and after the course.
Step 2: Find the differences and their squares:
Before (x) | After (y) | d = y − x | d²
10 | 12 | 2 | 4
15 | 17 | 2 | 4
9 | 8 | −1 | 1
3 | 5 | 2 | 4
7 | 6 | −1 | 1
12 | 11 | −1 | 1
16 | 18 | 2 | 4
17 | 20 | 3 | 9
4 | 3 | −1 | 1
 | | Σd = 7 | Σd² = 29
Step 3: Test statistic:
$$\bar{d} = \frac{\sum d}{n} = \frac{7}{9} = 0.78$$
$$S = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{29}{9} - \left(\frac{7}{9}\right)^2} = \sqrt{3.22 - 0.60} = 1.618$$
$$t = \frac{\bar{d}}{S/\sqrt{n-1}} = \frac{0.78}{1.618/\sqrt{8}} = 1.36$$
Step 4: Level of significance:
Degrees of freedom: n − 1 = 9 − 1 = 8; table value (5% level): 2.31
Step 5: Conclusion:
The calculated value (1.36) is less than the table value (2.31), so we accept the null hypothesis. Hence we conclude that the course was not effective.
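The paired test reduces to a one-sample t on the differences d = after − before. A minimal sketch using the standard-library `statistics` module; `paired_t` is a hypothetical helper. Because `stdev` uses divisor n − 1, the form d̄ / (s_d / √n) used here is algebraically the same as the notes' d̄ / (S / √(n − 1)) with S computed using divisor n:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t-test on d = after - before: t = d̄ / (s_d / √n), s_d with divisor n - 1.

    Returns (t, degrees of freedom).
    """
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]
t, df = paired_t(before, after)
print(f"t = {t:.2f} with {df} d.f.")
```

The table value still has to be read from a t-table for the printed degrees of freedom.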
Paired t-Test - Additional Problem
Poor students were given intensive coaching. Using the paired t-test, test whether there is any improvement by comparing their scores before and after coaching.
Before coaching: 50, 42, 51, 26, 35, 42, 60, 41, 70, 55, 62, 38
After coaching: 62, 40, 61, 35, 30, 52, 68, 51, 84, 63, 72, 50
Hints: Degrees of freedom: 11
Table value at 5% level of significance: 2.20
Calculated value: 4.89
Test of Significance of Large Samples
It is difficult to draw a sharp line between large and small samples. If the sample size is greater than 30, i.e., n > 30, the sample may be regarded as a large sample.
Assumptions:
1) The random sampling distribution of the statistic is approximately normal.
2) Sample values are sufficiently close to the population values and can be used to calculate the standard error estimate.
Standard error of the mean: It measures the sampling error involved in estimating a population parameter from a sample.
1) When the standard deviation of the population is given:
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma_p}{\sqrt{n}}$$
where σp = standard deviation of the population and n = number of observations in the sample.
2) When the standard deviation of the population is not given:
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}}$$
where σs = standard deviation of the sample.
In general, we use the standard deviation of the sample if the standard deviation of the population is not given.
Test of Significance of Large Samples - Problem
Problem: A company manufacturing electric light bulbs claims that the average life of its bulbs is 1600 hours. The average life and standard deviation of a random sample of 100 such bulbs were 1570 hours and 120 hours respectively. Should we accept the claim of the company?
Solution: Step 1: Setting up the null hypothesis: The average life of the company's bulbs is 1600 hours.
Step 2: Test statistic: standard deviation (σ) = 120; actual (sample) mean = 1570; expected mean = 1600; n = 100.
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{120}{\sqrt{100}} = 12$$
$$Z = \frac{\text{Difference}}{\mathrm{S.E.}_{\bar{X}}} = \frac{1600-1570}{12} = \frac{30}{12} = 2.5$$
Step 3: Level of significance: table value at 5% level = 1.96; calculated value = 2.5
Step 4: Conclusion: The calculated value 2.5 > table value 1.96 at the 5% level of significance, so the hypothesis cannot be accepted. We cannot accept the claim of the company.
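The large-sample steps (standard error, Z, comparison with 1.96) can be bundled together with a two-tailed p-value. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `large_sample_z` is a hypothetical helper, applied here to this problem and to the student-height illustration that follows:

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z(sample_mean, mu, sd, n):
    """Return (S.E., Z, two-tailed p-value) for a large-sample test of a single mean."""
    se = sd / sqrt(n)
    z = (sample_mean - mu) / se
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_two


for name, args in {"bulbs": (1570, 1600, 120, 100), "heights": (64, 66, 6, 100)}.items():
    se, z, p = large_sample_z(*args)
    print(f"{name}: S.E. = {se:.2f}, |Z| = {abs(z):.2f}, p = {p:.4f}, reject at 5%: {p < 0.05}")
```

Rejecting when p < 0.05 is equivalent to rejecting when |Z| > 1.96.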
Illustration
A sample of 100 students is taken from a college. The mean height is 64 inches and the standard deviation is 6 inches. Can it reasonably be regarded that the mean height of the students is 66 inches? Also set up the 99% limits within which the average height of the students is expected to lie.
Solution: Step 1: Setting up the null hypothesis:
The (population) students' average height can be 66 inches.
Step 2: Test statistic:
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{100}} = 0.6$$
$$Z = \frac{\text{Difference}}{\mathrm{S.E.}_{\bar{X}}} = \frac{66-64}{0.6} = \frac{2}{0.6} = 3.33$$
Step 3: Level of significance: table value at 5% level = 1.96; calculated value = 3.33
Step 4: Conclusion: The calculated value 3.33 > table value 1.96 at the 5% level of significance, so the hypothesis cannot be accepted. Thus we can conclude that the (population) students' average height cannot be 66 inches.
99% confidence limits = X̄ ± 2.58 S.E.
$$64 \pm 2.58 \times 0.6 = 64 \pm 1.548 = 62.45 \text{ to } 65.55$$
Hence the mean height of the students is expected to lie between 62.45 and 65.55 inches.
University Question
A random sample of 121 checking accounts at a bank showed an average daily balance of $280. The standard deviation of the population is known to be $66.
i) Find the standard error of the mean.
ii) Construct a 95% confidence interval for the mean.
iii) Construct a 99% confidence interval estimate for the mean.
Solution:
n = 121; X̄ = 280; σ = 66
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{66}{\sqrt{121}} = \frac{66}{11} = 6$$
95% confidence interval for the mean: X̄ ± table value × S.E.
Upper limit = 280 + (1.96 × 6) = 291.76
Lower limit = 280 − (1.96 × 6) = 268.24
99% confidence interval for the mean: X̄ ± table value × S.E.
Upper limit = 280 + (2.58 × 6) = 295.48
Lower limit = 280 − (2.58 × 6) = 264.52
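For the bank checking-account question, the standard error and both intervals can be computed directly. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `z_interval` is a hypothetical helper that uses the exact normal quantile rather than the rounded table value:

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Return (S.E., lower, upper) for X̄ ± z·σ/√n with the exact normal quantile."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return se, sample_mean - z * se, sample_mean + z * se


for conf in (0.95, 0.99):
    se, low, high = z_interval(280, 66, 121, conf)
    print(f"{conf:.0%}: S.E. = {se:.0f}, interval {low:.2f} to {high:.2f}")
```

The 95% interval agrees with 268.24 to 291.76. The 99% interval comes out as about 264.55 to 295.45, slightly narrower than a hand calculation with 2.58, because the exact quantile is 2.5758.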
Illustration
An auto company decided to introduce a new six-cylinder car whose mean petrol consumption is claimed to be lower than that of the existing auto engine. It was found that the mean petrol consumption for 50 cars was 10 km per litre, with a standard deviation of 3.5 km per litre. Test, at the 5% level of significance, whether the company's claim that the average petrol consumption of the new car is 9.5 km per litre is acceptable.
Solution: Step 1: Setting up the null hypothesis: Let us take the hypothesis that there is no significant difference between the sample average and the company's claim.
Step 2: Test statistic:
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3.5}{\sqrt{50}} = \frac{3.5}{7.07} = 0.495$$
$$Z = \frac{\text{Difference}}{\mathrm{S.E.}_{\bar{X}}} = \frac{10-9.5}{0.495} = 1.01$$
Step 3: Level of significance: table value at 5% level = 1.96; calculated value = 1.01
Step 4: Conclusion: The calculated value 1.01 < table value 1.96 at the 5% level of significance, so the hypothesis can be accepted. Hence the company's claim that the petrol consumption of the new car is 9.5 km per litre is acceptable.
Additional Problems:
Problem 1. The mean life of a sample of 400 fluorescent tube lights produced by a company is found to be 1570 hours, with a standard deviation of 150 hours. Test the hypothesis that the mean lifetime of the tube lights produced by the company is 1600 hours against the alternative hypothesis that it is greater than 1600 hours, at the 1% level of significance.
Answer: Table value: 2.58; calculated value = 4
Test of hypothesis about the difference between two means (large samples)
a) When two independent random samples are drawn from the same population:
$$\mathrm{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
b) When two random samples are drawn from different populations:
$$\mathrm{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Form (b), for two random samples drawn from different populations, is the one mostly used.
Illustration 1) An intelligence test on two groups, of boys and girls, gave the following results. Is there a significant difference in the mean scores obtained by boys and girls?
Group | Mean | S.D. | N
Girls | 75 | 15 | 150
Boys | 70 | 20 | 250
Solution:
Step 1: Setting up the null hypothesis: Let us take the hypothesis that there is no significant difference between the mean scores obtained by boys and girls.
Step 2: Find the standard error:
$$\mathrm{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \qquad \sigma_1 = 15,\ \sigma_2 = 20,\ n_1 = 150,\ n_2 = 250$$
Substituting values:
$$\mathrm{S.E.} = \sqrt{\frac{(15)^2}{150} + \frac{(20)^2}{250}} = \sqrt{1.5 + 1.6} = 1.76$$
Step 3: Z test statistic:
$$Z = \frac{\text{Difference}}{\mathrm{S.E.}} = \frac{75-70}{1.76} = 2.84$$
Step 4: Level of significance: table value at 1% level of significance = 2.58.
Step 5: Conclusion:
Since the difference is more than 2.58 S.E. (1% level of significance), the hypothesis is rejected. There seems to be a significant difference between the mean scores obtained by boys and girls.
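Form (b) of the standard error drives all of the two-sample large-sample problems in this section. A minimal sketch with a hypothetical helper name, applied to the boys/girls data, the day/evening classes problem and the two bulb manufacturers in Problem 2:

```python
from math import sqrt


def two_sample_z(m1, m2, sd1, sd2, n1, n2):
    """Return (S.E., Z) for the difference of two means from different populations."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return se, (m1 - m2) / se


examples = {
    "girls vs boys": (75, 70, 15, 20, 150, 250),
    "day vs evening": (72.4, 73.9, 14.8, 17.9, 100, 200),
    "company A vs B": (1300, 1248, 82, 93, 100, 100),
}
for name, args in examples.items():
    se, z = two_sample_z(*args)
    print(f"{name}: S.E. = {se:.3f}, Z = {z:.2f}")
```

The sign of Z only shows which mean is larger; |Z| is what gets compared with the table value.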
Test of hypothesis about the difference between two means
Problem 2. A college conducted both day and evening classes intended to be identical. A sample of 100 day students yields the following examination results:
X̄1 = 72.4; σ1 = 14.8
A sample of 200 evening students yields the following examination results:
X̄2 = 73.9; σ2 = 17.9
Are the two means statistically equal at the 1% level?
Solution: Step 1: Setting up the null hypothesis: The two sample means of the college students are statistically equal.
Step 2: Find the standard error:
$$\mathrm{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(14.8)^2}{100} + \frac{(17.9)^2}{200}} = \sqrt{2.1904 + 1.602} = 1.947$$
Step 3: Z test statistic:
$$Z = \frac{\text{Difference}}{\mathrm{S.E.}} = \frac{72.4-73.9}{1.947} = -0.77$$
Step 4: Level of significance: table value at 1% level of significance = 2.58
Step 5: Conclusion: The calculated value |Z| = 0.77 < table value 2.58 at the 1% level of significance, so the hypothesis can be accepted. Hence we conclude that the two sample means of the college students are statistically equal.
Test of hypothesis about the difference between two means - Additional Problem 1
The number of accidents per day was studied for 144 days in town A and 100 days in town B, and the following information was obtained. Is the difference between the mean accidents of the two towns statistically significant?
Particulars | Town A | Town B
Mean no. of accidents | 4.5 | 5.4
Standard deviation | 1.2 | 1.5
Problem 2
You are working as a purchasing manager for a company. The following information has been supplied to you by two manufacturers of electric bulbs:
Particulars | Company A | Company B
Mean life (in hours) | 1300 | 1248
Standard deviation | 82 | 93
Sample size | 100 | 100
Which brand of bulbs are you going to purchase if you desire to take a risk of 5%?
Answer: Z value = 4.19; table value: 1.96.
Small Sample F-Test / The Variance-Ratio Test
The F-distribution, also known as Snedecor's F-distribution or the Fisher-Snedecor distribution (after R.A. Fisher and George W. Snedecor), is the distribution of the ratio of two independent estimators of the population variances. Suppose we have two samples with n1 and n2 observations; when the two population variances are equal, the ratio F = s1²/s2², where s1² and s2² are the sample variances, is distributed according to an F-distribution with v1 = n1 − 1 numerator degrees of freedom and v2 = n2 − 1 denominator degrees of freedom.
$$S_1^2 = \frac{\sum (X_1-\bar{X}_1)^2}{n_1-1}, \qquad S_2^2 = \frac{\sum (X_2-\bar{X}_2)^2}{n_2-1}$$
$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
F-Test Problems - Model 1
The main use of the F-distribution is to test whether two independent samples have been drawn from normal populations with the same variance, or whether two independent estimates of the population variance are homogeneous, since it is often desirable to compare two variances rather than two averages.
Problem 1. Two random samples are drawn from two normal populations. Test whether the two populations have the same variance at the 5% level of significance.
Sample I: 55, 54, 52, 53, 56, 58, 52, 50, 51, 49
Sample II: 108, 107, 105, 105, 106, 107, 104, 103, 104, 101
Solution:
Step 1: Set up the null hypothesis: Let us take the hypothesis that the two populations have the same variance.
Step 2: Calculation of the mean squares:
X1 | X1 − X̄1 (X̄1 = 53) | (X1 − X̄1)² | X2 | X2 − X̄2 (X̄2 = 105) | (X2 − X̄2)²
55 | 2 | 4 | 108 | 3 | 9
54 | 1 | 1 | 107 | 2 | 4
52 | −1 | 1 | 105 | 0 | 0
53 | 0 | 0 | 105 | 0 | 0
56 | 3 | 9 | 106 | 1 | 1
58 | 5 | 25 | 107 | 2 | 4
52 | −1 | 1 | 104 | −1 | 1
50 | −3 | 9 | 103 | −2 | 4
51 | −2 | 4 | 104 | −1 | 1
49 | −4 | 16 | 101 | −4 | 16
ΣX1 = 530 | Σ(X1 − X̄1) = 0 | Σ(X1 − X̄1)² = 70 | ΣX2 = 1050 | Σ(X2 − X̄2) = 0 | Σ(X2 − X̄2)² = 40
Step 3: Test statistic:
X̄1 = ΣX1/n1 = 530/10 = 53; X̄2 = ΣX2/n2 = 1050/10 = 105
$$S_1^2 = \frac{\sum (X_1-\bar{X}_1)^2}{n_1-1} = \frac{70}{9} = 7.78, \qquad S_2^2 = \frac{\sum (X_2-\bar{X}_2)^2}{n_2-1} = \frac{40}{9} = 4.44$$
$$F = \frac{S_1^2}{S_2^2} = \frac{7.78}{4.44} = 1.75$$
Step 4: Level of significance:
Degrees of freedom v1 = 9, v2 = 9
Table value at 5% level = 3.18
Calculated value = 1.75
Step 5: Conclusion:
The calculated value is less than the table value. Hence we accept the hypothesis and conclude that the samples have been drawn from populations with the same variance.
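The variance-ratio test puts the larger estimate of variance in the numerator, so the degrees of freedom must follow whichever sample that is. A minimal sketch using `statistics.variance` (divisor n − 1); `variance_ratio` is a hypothetical helper, and the table value of F is still read from an F-table:

```python
from statistics import variance


def variance_ratio(x1, x2):
    """F = larger sample variance / smaller sample variance (divisor n - 1).

    Returns (F, numerator d.f., denominator d.f.).
    """
    v1, v2 = variance(x1), variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1


sample_1 = [55, 54, 52, 53, 56, 58, 52, 50, 51, 49]
sample_2 = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]
f, df1, df2 = variance_ratio(sample_1, sample_2)
print(f"F = {f:.2f} with ({df1}, {df2}) d.f.")
```

For the data printed in the Additional Problem, the same function gives F ≈ 2.17 with (11, 9) d.f.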
Additional Problem
Two random samples drawn from two normal populations are:
Sample 1: 20, 16, 26, 27, 23, 22, 18, 24, 25, 19
Sample 2: 27, 32, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37
Obtain estimates of the variances of the populations and test whether the populations have the same variance.
Hints: s1² = 13.33; s2² = 28.99; F = 2.17; table value at 5% level of significance = 3.11
Model 2
Two random samples gave the following results:
n1 = 10, Σ(X − X̄)² = 90
n2 = 12, Σ(Y − Ȳ)² = 108
Test whether the samples came from populations with the same variance.
Solution:
Step 1: Setting up the null hypothesis: The samples are drawn from populations with equal variance.
Step 2: Test statistic:
$$S_1^2 = \frac{\sum (X-\bar{X})^2}{n_1-1} = \frac{90}{9} = 10, \qquad S_2^2 = \frac{\sum (Y-\bar{Y})^2}{n_2-1} = \frac{108}{11} = 9.82$$
$$F = \frac{S_1^2}{S_2^2} = \frac{10}{9.82} = 1.02$$
Step 3: Level of significance:
Degrees of freedom v1 = 9, v2 = 11
Table value at 5% level = 2.90
Calculated value = 1.02
Step 4: Conclusion:
The calculated value is less than the table value. Hence we accept the hypothesis and conclude that the two samples have been drawn from populations with equal variances.
F-Distribution: Model 2 - Additional
From the following data, test whether the difference between the variances is significant at the 5% level of significance.
Sample | A | B
Sum of squares of deviations from mean | 84.4 | 102.6
Size (n) | 8 | 10
Hints: s1² = 12.06; s2² = 11.40; F = 1.06; table value at 5% level of significance = 3.29
Analysis of Variance
The analysis of variance is one of the most powerful statistical techniques. It is a statistical test for heterogeneity of means by analysis of group variances. The analysis of variance technique, developed by R.A. Fisher in the 1920s, can be fruitfully applied to a diversity of practical problems.
* Many studies involve comparisons between more than two groups of subjects.
* If the outcome is categorical data, a chi-square test for a table larger than 2 × 2 can be used to compare proportions between groups.
* Analysis of variance splits the total variation in the data into components attributable to different sources, such as variation between groups and variation within groups.
* If the outcome is numerical, ANOVA can be used to compare the means between groups.
* ANOVA is the abbreviation for the full name of the method, ANalysis Of VAriance.
Assumptions for F-test
The following are the assumptions for applying the F-test:
· The samples are simple random samples.
· The samples are independent of each other.
· The parent populations from which they are drawn are normally distributed.
Procedures for Performing an Analysis of Variance:
•
•
•
•
•
•
STAT GRAPHICS Centurion provides several procedures for performing an analysis
of variance:
1. One-Way ANOVA - used when there is only a single categorical factor. This is
equivalent to comparing multiple groups of data.
2. Multifactor / Two way ANOVA - used when there is more than one categorical
factor, arranged in a crossed pattern. When factors are crossed, the levels of one
factor appear at more than one level of the other factors.
3. Variance Components Analysis - used when there are multiple factors, arranged
in a hierarchical manner. In such a design, each factor is nested in the factor above
it.
4. General Linear Models - used whenever there are both crossed and nested
factors, when some factors are fixed and some are random, and when both
categorical and quantitative factors are present.
Here we discuss only one-way ANOVA and two-way ANOVA.
Steps in One-Way ANOVA/Classification:
In one-way classification, the data are classified according to only one criterion.
1) Find the total of all the items of the various samples, i.e. T:
T = ΣX1 + ΣX2 + ΣX3 + ΣX4 + …
2) Correction factor = T²/N (N = total number of items)
3) Total sum of squares = ΣX1² + ΣX2² + ΣX3² + ΣX4² + … − T²/N
4) Sum of squares between samples = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + (ΣX4)²/n4 + … − T²/N
(n1, n2, … = number of items in each sample)
5) Mean square between samples = Sum of squares between samples / degrees of freedom
(degrees of freedom = number of samples − 1 = c − 1)
6) Sum of squares within samples = Total sum of squares − Sum of squares between samples
7) Mean square within samples = Sum of squares within samples / degrees of freedom
(degrees of freedom = total number of items − number of samples = N − c)
8) Prepare the ANOVA table.
9) Calculate the F-ratio = Mean square between samples (column variance) / Mean square within
samples (within-column variance)
10) Compare the calculated value of F with the table value of F for (c − 1, N − c) degrees of freedom
at the chosen level of significance.
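The steps above translate directly into code. The sketch below is an illustrative Python version, and the function name `one_way_anova` is hypothetical. It follows the correction-factor route and allows samples of different sizes by dividing each squared sample total by that sample's own size. The example data are the wheat yields of Illustration 1, which follows.

```python
def one_way_anova(samples):
    """One-way ANOVA by the correction-factor (T^2/N) method.

    samples: list of lists, one list of observations per sample.
    Returns sums of squares, degrees of freedom, mean squares and the F-ratio.
    """
    items = [x for s in samples for x in s]
    n, c = len(items), len(samples)           # N items, c samples
    t = sum(items)                            # Step 1: grand total T
    cf = t * t / n                            # Step 2: correction factor T^2/N
    tss = sum(x * x for x in items) - cf      # Step 3: total sum of squares
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf  # Step 4
    ssw = tss - ssb                           # Step 6: within samples
    df_b, df_w = c - 1, n - c
    msb, msw = ssb / df_b, ssw / df_w         # Steps 5 and 7
    return {"TSS": tss, "SSB": ssb, "SSW": ssw, "df": (df_b, df_w),
            "MSB": msb, "MSW": msw, "F": msb / msw}  # Step 9


# Varieties A1, A2, A3 from Illustration 1 (one list per column).
yields = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
result = one_way_anova(yields)
print(result["SSB"], result["SSW"], result["df"], round(result["F"], 2))
# 8.0 24.0 (2, 9) 1.5
```

Step 10, comparing the returned F with the table value, is still done with the F table.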
ADVANTAGES OF TWO-WAY ANOVA
An important advantage of this design is that it is more efficient than its one-way
counterpart. There are two assignable sources of variation (age and gender in our
example), and this helps to reduce the error variation, thereby making the design
more efficient.
Unlike one-way ANOVA, it enables us to test the effects of two factors at the same
time.
One can also test for interaction between the factors, provided there is more than
one observation in each cell. The only restriction is that the number of
observations in each cell has to be equal (there is no such restriction in the case of
one-way ANOVA).
Illustration : 1
Set up the ANOVA table for the following per-hectare yields of three varieties of wheat.
Variety of yield
A1   A2   A3
6    5    5
7    5    4
3    3    3
8    7    4
Also work out the F-ratio and test whether there is a significant difference among the mean yields of
the three varieties.
Solution:
Variety of yield
A1   A2   A3
6    5    5
7    5    4
3    3    3
8    7    4
Total   24   20   16
Step 1: Setting up the null hypothesis: There is no significant difference between the means of the
samples, H0: μ1 = μ2 = μ3.
Step 2: Correction factor T²/N:
T = ΣA1 + ΣA2 + ΣA3 = 24 + 20 + 16 = 60
T²/N = (60)²/12 = 3600/12 = 300
Step 3: Total sum of squares (TSS) = ΣX² − T²/N
= 6² + 5² + 5² + 7² + 5² + 4² + 3² + 3² + 3² + 8² + 7² + 4² − 300
= 36 + 25 + 25 + 49 + 25 + 16 + 9 + 9 + 9 + 64 + 49 + 16 − 300 = 332 − 300 = 32
Step 4: Column sum of squares (CSS) = [(ΣA1)² + (ΣA2)² + (ΣA3)²] / number of rows − T²/N
= (24² + 20² + 16²)/4 − 300 = (576 + 400 + 256)/4 − 300 = 1232/4 − 300 = 308 − 300 = 8
Step 5: Error sum of squares (ESS) = TSS − CSS = 32 − 8 = 24
Step 6: Level of significance and ANOVA table: table value of F at the 5% level = 4.26.
Step 7: Conclusion: Calculated value (1.50) < table value (4.26), so accept the null hypothesis. We
conclude that there is no significant difference between the means of the samples (μ1 = μ2 = μ3).
Source of variation   Degrees of freedom   Sum of squares   Mean sum of squares   F-ratio
CSS                   (c − 1) = 2          8                8/2 = 4               4/2.67 = 1.50
ESS                   (n − c) = 9          24               24/9 = 2.67
TSS                   (n − 1) = 11         32
F-table value: F(2, 9) = 4.26
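The table value F(2, 9) = 4.26 used above is the upper 5% point of the F distribution. As a cross-check, it can be computed from the F distribution's upper tail, which equals a regularized incomplete beta function. The sketch below is a hypothetical standard-library implementation. It evaluates the incomplete beta by continued fraction (the modified Lentz method, as described in Numerical Recipes) and then finds the critical value by bisection. In practice a statistics library is the usual choice, for example `scipy.stats.f.ppf(0.95, 2, 9)` in SciPy.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even-numbered term, then odd-numbered term of the fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F >= f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Table value: the f whose upper-tail probability equals alpha (bisection)."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:   # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical(0.05, 2, 9), 2))     # 4.26, the table value above
print(round(f_upper_tail(1.50, 2, 9), 3))   # 0.274, p-value of the calculated F
```

The p-value of 0.274 is well above 0.05. This agrees with the decision reached by comparing 1.50 with 4.26.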
Problems and Solutions in One-Way Classification:
1) A certain manure was used on four plots of land, A, B, C and D. Four beds were prepared in each plot
and the manure was applied. The output of the crop in the beds of plots A, B, C and D is given below.
Find out whether the differences in the mean production of crops of the plots are significant
or not.
Beds of plots
A    B    C    D
8    9    15   6
12   3    10   8
1    7    4    10
3    1    7    8
Solution:
Beds of plots
A    B    C    D
8    9    15   6
12   3    10   8
1    7    4    10
3    1    7    8
Total   24   20   36   32
Step 1: Setting up the null hypothesis: There is no significant difference in the mean production of
crops of the plots, H0: μ1 = μ2 = μ3 = μ4.
Step 2: Correction factor T²/N:
T = ΣA + ΣB + ΣC + ΣD = 24 + 20 + 36 + 32 = 112
T²/N = (112)²/16 = 12544/16 = 784
Step 3: Total sum of squares (TSS) = ΣX² − T²/N
= 8² + 9² + 15² + 6² + 12² + 3² + 10² + 8² + 1² + 7² + 4² + 10² + 3² + 1² + 7² + 8² − 784
= 64 + 81 + 225 + 36 + 144 + 9 + 100 + 64 + 1 + 49 + 16 + 100 + 9 + 1 + 49 + 64 − 784 = 1012 − 784 = 228
Step 4: Column sum of squares (CSS) = [(ΣA)² + (ΣB)² + (ΣC)² + (ΣD)²] / number of rows − T²/N
= (24² + 20² + 36² + 32²)/4 − 784 = (576 + 400 + 1296 + 1024)/4 − 784 = 3296/4 − 784 = 824 − 784 = 40
Step 5: Error sum of squares (ESS) = TSS − CSS = 228 − 40 = 188
Step 6: ANOVA table:
Source of variation   Degrees of freedom   Sum of squares   Mean sum of squares   F-ratio
CSS                   (c − 1) = 3          40               40/3 = 13.33          13.33/15.67 = 0.85
ESS                   (n − c) = 12         188              188/12 = 15.67
TSS                   (n − 1) = 15         228
F-table value: F(3, 12) = 3.49
Step 7: Level of significance: 5%; table value = 3.49.
Step 8: Conclusion: Calculated value (0.85) < table value (3.49), so accept the null hypothesis. Hence we
conclude that there is no significant difference in the mean production of crops of the plots.
Problem 2. The following table shows sample psychological health ratings of
corporate executives in the fields of Banking, Manufacturing and Retailing. Can you
consider the psychological health ratings of corporate executives in the three
fields to be equal at the 5% level of significance?
Banking: 14, 16, 18
Manufacturing: 14, 13, 15, 22
Retailing: 18, 16, 19, 19, 20
Solution
Step 1: Setting up the null hypothesis: Let us take the hypothesis that there is no significant difference
in the mean psychological health ratings of corporate executives in the three fields.
Step 2: Correction factor T²/N:
T = ΣA + ΣB + ΣC = 48 + 64 + 92 = 204
T²/N = (204)²/12 = 41616/12 = 3468
Step 3: Total sum of squares (TSS) = ΣX² − T²/N
= 14² + 16² + 18² + 14² + 13² + 15² + 22² + 18² + 16² + 19² + 19² + 20² − 3468
= 196 + 256 + 324 + 196 + 169 + 225 + 484 + 324 + 256 + 361 + 361 + 400 − 3468 = 3552 − 3468 = 84
Step 4: Column sum of squares (CSS) = (ΣA)²/nA + (ΣB)²/nB + (ΣC)²/nC − T²/N (the group sizes differ)
= (48)²/3 + (64)²/4 + (92)²/5 − 3468 = 768 + 1024 + 1692.8 − 3468 = 16.8 ≈ 17
Step 5: Error sum of squares (ESS) = TSS − CSS = 84 − 17 = 67
Step 6: ANOVA table:
Source of variation   Degrees of freedom   Sum of squares   Mean sum of squares   F-ratio
CSS                   (c − 1) = 2          17               17/2 = 8.5            8.5/7.44 = 1.14
ESS                   (n − c) = 9          67               67/9 = 7.44
TSS                   (n − 1) = 11         84
F-table value: F(2, 9) = 4.26
Step 7: Level of significance: Table value of F at the 5% level of significance = 4.26.
Step 8: Conclusion: Since the calculated value of F (1.14) is less than the table value (4.26), we
accept the null hypothesis. Hence we conclude that the psychological health ratings of corporate
executives in the three fields do not differ significantly.
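The worked solution rounds CSS to 17 and ESS to 67 before it forms the mean squares. A minimal sketch using Python's `fractions.Fraction` repeats the same correction-factor steps in exact rational arithmetic, so no intermediate value is rounded. The helper name `one_way_f_exact` is illustrative.

```python
from fractions import Fraction


def one_way_f_exact(samples):
    """One-way ANOVA in exact rational arithmetic (no intermediate rounding).

    Same correction-factor route as the worked solution; samples may differ in
    size, so each squared sample total is divided by that sample's own size.
    Returns (CSS, ESS, F) as Fractions.
    """
    items = [Fraction(x) for s in samples for x in s]
    n, c = len(items), len(samples)
    cf = sum(items) ** 2 / n                                 # T^2/N
    tss = sum(x * x for x in items) - cf                     # total SS
    css = sum(Fraction(sum(s)) ** 2 / len(s) for s in samples) - cf
    ess = tss - css
    return css, ess, (css / (c - 1)) / (ess / (n - c))


banking = [14, 16, 18]
manufacturing = [14, 13, 15, 22]
retailing = [18, 16, 19, 19, 20]
css, ess, f = one_way_f_exact([banking, manufacturing, retailing])
print(css, ess, f, float(f))  # 84/5 336/5 9/8 1.125
```

Without rounding, CSS = 16.8, ESS = 67.2 and F = 9/8 = 1.125. The conclusion is unchanged, because the F-ratio is well below the table value 4.26 either way.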
One way ANOVA Classification- Additional Problems
Problem 1) A researcher is concerned about the level of knowledge possessed by university students
regarding United States history. Students completed a high school senior-level standardized U.S.
history exam, and each student's major was also recorded. Data in terms of percent correct are
recorded below for 32 students. Compute the appropriate test for the data provided below.
Education   Management   Social Science   Fine Arts
62          72           42               80
81          49           52               57
75          63           31               87
58          68           80               64
67          39           22               28
48          79           71               29
26          40           68               62
36          15           76               45
ANSWER
Source    SS          df   MS        F
Between   63.25       3    21.083    0.048
Within    12298.25    28   439.223
Total     12361.5     31
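The same between/within split can also be computed from deviations around the means instead of the T²/N shortcut, and both routes give identical sums of squares. The sketch below reads the 32 scores above row by row across the four majors. This is an assumption about the flattened table, and it reproduces the SS values in the answer table. The function name `anova_by_deviations` is illustrative.

```python
def anova_by_deviations(groups):
    """Between/within sums of squares from deviations around the means.

    SS_between = sum over groups of n_j * (mean_j - grand_mean)^2
    SS_within  = sum over groups of sum of (x - mean_j)^2
    Returns (SS_between, SS_within, df_between, df_within, F).
    """
    items = [x for g in groups for x in g]
    grand_mean = sum(items) / len(items)
    ss_between = ss_within = 0.0
    for g in groups:
        mean_j = sum(g) / len(g)
        ss_between += len(g) * (mean_j - grand_mean) ** 2
        ss_within += sum((x - mean_j) ** 2 for x in g)
    df_b, df_w = len(groups) - 1, len(items) - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, df_b, df_w, f


education = [62, 81, 75, 58, 67, 48, 26, 36]
management = [72, 49, 63, 68, 39, 79, 40, 15]
social_science = [42, 52, 31, 80, 22, 71, 68, 76]
fine_arts = [80, 57, 87, 64, 28, 29, 62, 45]
sb, sw, db, dw, f = anova_by_deviations(
    [education, management, social_science, fine_arts])
print(round(sb, 2), round(sw, 2), db, dw, round(f, 3))  # 63.25 12298.25 3 28 0.048
```

The deviation form makes the meaning of each term visible. The shortcut form is easier to do by hand.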
Problem.2. A research study was conducted to examine the clinical efficacy of a new
antidepressant. Depressed patients were randomly assigned to one of three groups: a placebo
group, a group that received a low dose of the drug, and a group that received a moderate dose
of the drug. After four weeks of treatment, the patients completed the Beck Depression
Inventory. The higher the score, the more depressed the patient. The data are presented below.
Compute the appropriate test.
Placebo   Low Dose   Moderate Dose
38        22         14
47        19         26
39        8          11
25        23         18
42        31         5
ANSWER
Source    SS          df   MS        F
Between   1484.933    2    742.467   11.27
Within    790.8       12   65.9
Total     2275.733    14
Problem 3: An official from the Central Government is concerned about the monthly expenses of three
different boards, namely the Civil Supplies Board, the Electricity Board and the Higher Education Board. He
wants to find out whether the boards spend equal amounts on personnel and equipment. He
applies the technique of analysis of variance to test his assumption at the 0.05 level of significance.
He collects the monthly expenses of the three boards for the previous few months and
summarizes them in tabular form as shown in table 7.3. Calculate the number of degrees of
freedom for the test at the given level of significance.
Civil Supply Board: 14, 8, 12, 9, 18
Electricity Board: 15, 9, 8, 10, 13
Higher Education Board: 8, 16, 12, 6, 13
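For a one-way layout, the degrees of freedom depend only on the number of groups k and the total number of observations N. There are k − 1 between groups, N − k within groups and N − 1 in total. Below is a small illustrative helper, with a hypothetical name, applied to the three boards.

```python
def anova_degrees_of_freedom(group_sizes):
    """Return (between, within, total) degrees of freedom for one-way ANOVA."""
    k = len(group_sizes)      # number of groups (here, boards)
    n = sum(group_sizes)      # total number of observations
    return k - 1, n - k, n - 1


boards = {
    "Civil Supply Board": [14, 8, 12, 9, 18],
    "Electricity Board": [15, 9, 8, 10, 13],
    "Higher Education Board": [8, 16, 12, 6, 13],
}
print(anova_degrees_of_freedom([len(v) for v in boards.values()]))  # (2, 12, 14)
```

The calculated F would therefore be compared with the table value of F for (2, 12) degrees of freedom at the 0.05 level.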
Problem 4. There are three main brands of a certain powder. A set of 120 sample values is
examined and found to be allotted among four groups, A, B, C and D, and the three brands as
shown below.
Brand   Group A   Group B   Group C   Group D
1       0         4         8         16
2       5         8         13        6
3       18        19        11        13
Is there any significant difference in brand preference at the 5% level of significance?
Two way ANOVA Classification
Meaning : In a two-way classification the data are classified according to two different criteria or
factors.
• Assumptions
• The populations from which the samples were obtained must be normally or approximately
normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.
• The groups must have the same sample size.
• EXAMPLE: An agricultural scientist is interested in the corn yield when three different fertilizers
are available and corn is planted in four different soil types. The questions he is interested in
answering are:
• Does fertilizer type have an effect on crop yield?
• Does soil type have an effect on crop yield?
• Do the two treatment factors interact? For instance, there may be no difference between fertilizer
#1 and fertilizer #2 in soil type 1, but fertilizer #1 may produce a greater corn yield than fertilizer
#2 in soil type 2. This is an example of interaction.
Steps for calculation of Two way ANOVA Classification
Step 1) Convert the data into coded data (subtract a convenient constant from every item).
Step 2) Find the correction factor = T²/N.
Step 3) Find the sum of squares between columns = Σ(column total)² / (number of items in each column) − T²/N.
Step 4) Calculate the sum of squares between rows = Σ(row total)² / (number of items in each row) − T²/N.
Step 5) Compute the total sum of squares = Sum of squares of all items − Correction factor.
Step 6) Find the residual sum of squares
= Total sum of squares − (Sum of squares between columns + Sum of squares between rows).
Step 7) Calculate the degrees of freedom: columns c − 1, rows r − 1, residual (c − 1)(r − 1).
Step 8) Prepare the ANOVA table.
Step 9) Find the F-ratio for columns, Fc = Mean square between columns / Residual mean square.
Step 10) Find the F-ratio for rows, Fr = Mean square between rows / Residual mean square.
Step 11) Give the conclusion.
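The two-way steps above, with one observation per cell, can be sketched as follows. The function name `two_way_anova` is hypothetical. Rows and columns follow whatever orientation the table is entered in, and the optional `code` argument subtracts a constant from every item as in Step 1. Subtracting a constant shifts every item and every total's mean by the same amount, so the sums of squares do not change; coding only keeps the hand arithmetic small. The example uses the refrigerator sales of Problem 1 below.

```python
def two_way_anova(table, code=0):
    """Two-way ANOVA with one observation per cell (no replication).

    table: list of rows, each row holding one value per column.
    code:  constant subtracted from every item first (Step 1); the sums of
           squares are unchanged, only the hand arithmetic gets smaller.
    """
    data = [[x - code for x in row] for row in table]
    r, c = len(data), len(data[0])
    t = sum(sum(row) for row in data)
    cf = t * t / (r * c)                                   # Step 2: T^2/N
    col_totals = [sum(row[j] for row in data) for j in range(c)]
    row_totals = [sum(row) for row in data]
    css = sum(v * v for v in col_totals) / r - cf          # Step 3: r items per column
    rss = sum(v * v for v in row_totals) / c - cf          # Step 4: c items per row
    tss = sum(x * x for row in data for x in row) - cf     # Step 5
    ess = tss - (css + rss)                                # Step 6: residual
    df_c, df_r, df_e = c - 1, r - 1, (c - 1) * (r - 1)     # Step 7
    mse = ess / df_e
    return {"CSS": css, "RSS": rss, "ESS": ess, "TSS": tss,
            "df": (df_c, df_r, df_e),
            "Fc": (css / df_c) / mse,                      # Step 9
            "Fr": (rss / df_r) / mse}                      # Step 10


# Problem 1 below: rows Deepavali, Ramzan, Christmas; columns salesmen A to D.
sales = [[50, 48, 52, 46],
         [32, 31, 34, 39],
         [39, 36, 33, 32]]
res = two_way_anova(sales, code=35)
print(round(res["CSS"], 2), round(res["RSS"], 2), round(res["ESS"], 2), res["df"])
# 6.67 562.67 81.33 (3, 2, 6)
```

Calling `two_way_anova(sales)` without coding returns the same sums of squares.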
Problems and Solutions of Two way ANOVA Classification
Problem 1. To study the performance of four salesmen during the festivals of Deepavali, Ramzan and
Christmas, the numbers of refrigerators sold are given below. Use analysis of variance
to answer the following.
i) Do the salesmen differ significantly in performance?
ii) Is there a significant difference in sales between the festivals?
Festival           Salesman A   Salesman B   Salesman C   Salesman D   Festival total
Deepavali          50           48           52           46           196
Ramzan             32           31           34           39           136
Christmas          39           36           33           32           140
Salesmen's total   121          115          119          117          472
Solutions of Two way ANOVA Classification
Step 1: Coded data: This is a problem of two-way analysis of variance. To simplify the
calculations, we code the data by subtracting 35 from each figure.
Coded data:
Festival       A (X1)    B (X2)    C (X3)    D (X4)    Row total
Deepavali      15        13        17        11        56
Ramzan         −3        −4        −1        4         −4
Christmas      4         1         −2        −3        0
Column total   ΣX1=16    ΣX2=10    ΣX3=14    ΣX4=12    T=52
Step 2: Setting up the hypothesis: Let us take the hypothesis that there is no significant difference in sales
among the salesmen, and none among the festivals.
Step 3: Correction factor:
Correction factor = T²/N = (52)²/12 = 2704/12 = 225.3
Step 4: (TSS) Total sum of squares = Sum of squares of all items − Correction factor
= (15)² + (−3)² + (4)² + (13)² + (−4)² + (1)² + (17)² + (−1)² + (−2)² + (11)² + (4)² + (−3)² − T²/N
= 225 + 9 + 16 + 169 + 16 + 1 + 289 + 1 + 4 + 121 + 16 + 9 − 225.3 = 876 − 225.3 = 650.7
v (degrees of freedom) = 12 − 1 = 11
Step 5: (CSS) Sum of squares between salesmen (columns):
= [(ΣX1)² + (ΣX2)² + (ΣX3)² + (ΣX4)²] / number of items in each column − T²/N
= [(16)² + (10)² + (14)² + (12)²]/3 − 225.3
= (256 + 100 + 196 + 144)/3 − 225.3
= 696/3 − 225.3 = 232 − 225.3 = 6.7
v (degrees of freedom) = 4 − 1 = 3
Step 6: (RSS) Sum of squares between festivals (rows):
= [(56)² + (−4)² + (0)²] / number of items in each row − T²/N
= (3136 + 16 + 0)/4 − 225.3 = 3152/4 − 225.3 = 788 − 225.3 = 562.7
v (degrees of freedom) = 3 − 1 = 2
Step 7: (ESS) Residual sum of squares = TSS − (CSS + RSS)
= 650.7 − (6.7 + 562.7) = 81.3
v = (3 − 1)(4 − 1) = 6
Step 8) ANOVA table:
Source of variation   Sum of squares   Degrees of freedom   Mean square        F
CSS                   6.7              3                    6.7/3 = 2.2        Fc = 2.2/13.6 = 0.162
RSS                   562.7            2                    562.7/2 = 281.4    Fr = 281.4/13.6 = 20.69
ESS                   81.3             6                    81.3/6 = 13.6
TSS                   650.7            11
Step 9: Conclusion:
i) First, compare the salesmen variance estimate with the residual variance estimate.
F = Mean square of columns / Residual mean square
F = 2.2/13.6 = 0.162
The table value of F for v1 = 3 and v2 = 6 at the 5% level of significance is 4.76. The
calculated value is less than the table value, so we conclude that the sales of the different
salesmen do not differ significantly.
ii) Next, compare the festivals variance estimate with the residual variance estimate.
F = Mean square of rows / Residual mean square
F = 281.4/13.6 = 20.69
The table value of F for v1 = 2 and v2 = 6 at the 5% level of significance is 5.14. The
calculated value is more than the table value, so we conclude that the sales during the
different festivals differ significantly.
Illustration-2
Problem 2) A manufacturer of bags who has so far been making leather bags wants to introduce
three additional types of bags: plastic, waterproof and canvas bags. The manufacturer
test-marketed all four types of bags in five different stores for a month to decide which type of
bag to concentrate on so that sales are maximised. Find whether there is a significant
difference in the sales of the different types of bags at the 5% level of significance.
Types of bags
Store   Leather bags   Plastic bags   Waterproof bags   Canvas bags
1       46             40             49                38
2       48             42             54                45
3       36             38             46                34
4       35             40             48                35
5       40             44             51                41
Solutions of Two way ANOVA Classification
Step 1: Coded data: This is a problem of two-way analysis of variance. Let us code the data by
subtracting 40 from each figure.
Coded data (sales):
Store          Leather (X1)   Plastic (X2)   Waterproof (X3)   Canvas (X4)   Row total
1              6              0              9                 −2            13
2              8              2              14                5             29
3              −4             −2             6                 −6            −6
4              −5             0              8                 −5            −2
5              0              4              11                1             16
Column total   ΣX1=5          ΣX2=4          ΣX3=48            ΣX4=−7        T=50
Step 2) Correction factor:
Correction factor = T²/N = (50)²/20 = 2500/20 = 125
Step 3) (TSS) Total sum of squares = Sum of squares of all items − Correction factor
= (6)² + (8)² + (−4)² + (−5)² + (0)² + (0)² + (2)² + (−2)² + (0)² + (4)² + (9)² + (14)² + (6)² + (8)² + (11)²
+ (−2)² + (5)² + (−6)² + (−5)² + (1)² − T²/N
= 36 + 64 + 16 + 25 + 0 + 0 + 4 + 4 + 0 + 16 + 81 + 196 + 36 + 64 + 121 + 4 + 25 + 36 + 25 + 1 − 125
= 754 − 125 = 629
v (degrees of freedom) = 20 − 1 = 19
Step 4) (CSS) Sum of squares between bags (columns):
= [(ΣX1)² + (ΣX2)² + (ΣX3)² + (ΣX4)²] / number of items in each column − T²/N
= [(5)² + (4)² + (48)² + (−7)²]/5 − 125
= 5 + 3.2 + 460.8 + 9.8 − 125 = 353.8
v (degrees of freedom) = 4 − 1 = 3
Step 5) (RSS) Sum of squares between stores (rows):
= [(13)² + (29)² + (−6)² + (−2)² + (16)²] / number of items in each row − T²/N
= (169 + 841 + 36 + 4 + 256)/4 − 125 = 1306/4 − 125 = 326.5 − 125 = 201.5
v (degrees of freedom) = 5 − 1 = 4
Step 6) (ESS) Residual sum of squares = TSS − (CSS + RSS) = 629 − (353.8 + 201.5) = 73.7
v = (5 − 1)(4 − 1) = 12
Step 7) ANOVA table:
Source of variation     Sum of squares   Degrees of freedom   Mean square   F
(CSS) (Types of bags)   353.8            3                    117.9         Fc = 117.9/6.14 = 19.2
(RSS) (Stores)          201.5            4                    50.4          Fr = 50.4/6.14 = 8.2
(ESS)                   73.7             12                   6.14
Total                   629              19
Step 8) Setting up the hypothesis: Let us take the hypothesis that there is no significant difference in sales
among the types of bags, and none among the stores.
Step 9) Conclusion:
i) First, compare the types-of-bags variance estimate with the residual variance estimate.
F = Mean square of columns / Residual mean square
Fc = 117.9/6.14 = 19.2
The table value of F for v1 = 3 and v2 = 12 at the 5% level of significance is 3.49. The
calculated value is more than the table value, so we conclude that the sales of the different
types of bags differ significantly.
ii) Next, compare the stores variance estimate with the residual variance estimate.
F = Mean square of rows / Residual mean square
Fr = 50.4/6.14 = 8.2
The table value of F for v1 = 4 and v2 = 12 at the 5% level of significance is 3.26. The
calculated value is greater than the table value, so we conclude that mean sales differ
significantly across the stores.
Two way ANOVA Classification – Additional Problem
Problem 3: A performance study was conducted by the Sales Manager of NML Manufacturing
Company on three salesmen during three seasons, and the data are presented in the following
table. He wants to know whether there is a significant difference in the salesmen's
performance and between the seasons, using a level of significance of 0.05.
Salesman       Summer   Rainy   Winter
Salesman I     32       20      24
Salesman II    40       50      68
Salesman III   54       46      58
Answer: ANOVA table:
Source of variation          Sum of squares   Degrees of freedom   Mean square   F
Between columns (Salesman)   304.22           2                    152.110       Fc = 152.11/15.445 = 9.85
Between rows (Seasons)       73.55            2                    36.775        Fr = 36.775/15.445 = 2.38
Residual                     61.79            4                    15.445
Total                        439.56           8
Problem.4) A study examining differences in life satisfaction between young adult, middle adult, and
older adult men and women was conducted. Each individual who participated in the study
completed a life satisfaction questionnaire. A high score on the test indicates a higher level of life
satisfaction. Test scores are recorded below.
Group          Male               Female
Young Adult    4, 2, 3, 4, 2      7, 4, 3, 6, 5
Middle Adult   7, 5, 7, 5, 6      8, 10, 7, 7, 8
Older Adult    10, 7, 9, 8, 11    10, 9, 12, 11, 13
Answer:
Age group      Male                           Female
Young Adult    4, 2, 3, 4, 2 (mean = 3.0)     7, 4, 3, 6, 5 (mean = 5.0)
Middle Adult   7, 5, 7, 5, 6 (mean = 6.0)     8, 10, 7, 7, 8 (mean = 8.0)
Older Adult    10, 7, 9, 8, 11 (mean = 9.0)   10, 9, 12, 11, 13 (mean = 11.0)
Conclusion: There are significant main effects for age (F = 49.09, df (2, 24), p < .01) and gender
(F = 16.36, df (1, 24), p < .01). There is no interaction effect (F = 0.00, df (2, 24), not significant).
Interpretation: It appears from the data that older adults have the highest life satisfaction and younger
adults have the lowest life satisfaction. Women also have significantly higher life satisfaction than men.
Answer: ANOVA table:
Source of variation           Sum of squares   Degrees of freedom   Mean square   F
Between columns (Age group)   180              2                    90.00         Fc = 90/1.83 = 49.09
Between rows (Gender)         30               1                    30.00         Fr = 30.00/1.83 = 16.36
Interaction (Age × Gender)    0                2                    0.00          0.00
Residual                      44               24                   1.83
Total                         254              29
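Problem 4 has five observations in every cell, so an interaction effect can be separated from the residual; the conclusion reports F = 0.00 for it. The following is a sketch of two-way ANOVA with equal replication, computed from cell, row and column means. The function name and the layout `cells[row][column]` are illustrative assumptions. In this layout, rows are gender and columns are age group.

```python
def two_way_anova_replicated(cells):
    """Two-way ANOVA with the same number of observations in every cell.

    cells[i][j] is the list of observations for row level i, column level j.
    Returns sums of squares, degrees of freedom and F-ratios for rows,
    columns, their interaction, and the within-cell (residual) variation.
    """
    a, b, m = len(cells), len(cells[0]), len(cells[0][0])  # m = replicates per cell
    cell_mean = [[sum(cell) / m for cell in row] for row in cells]
    row_mean = [sum(cm) / b for cm in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a
    ss = {
        "rows": b * m * sum((rm - grand) ** 2 for rm in row_mean),
        "cols": a * m * sum((cm - grand) ** 2 for cm in col_mean),
        "inter": m * sum((cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                         for i in range(a) for j in range(b)),
        "within": sum((x - cell_mean[i][j]) ** 2
                      for i in range(a) for j in range(b) for x in cells[i][j]),
    }
    df = {"rows": a - 1, "cols": b - 1, "inter": (a - 1) * (b - 1),
          "within": a * b * (m - 1)}
    ms_within = ss["within"] / df["within"]
    f = {k: ss[k] / df[k] / ms_within for k in ("rows", "cols", "inter")}
    return {"SS": ss, "df": df, "F": f}


# Problem 4: rows = Male, Female; columns = Young, Middle, Older adult.
cells = [[[4, 2, 3, 4, 2], [7, 5, 7, 5, 6], [10, 7, 9, 8, 11]],
         [[7, 4, 3, 6, 5], [8, 10, 7, 7, 8], [10, 9, 12, 11, 13]]]
res = two_way_anova_replicated(cells)
print(res["SS"])  # {'rows': 30.0, 'cols': 180.0, 'inter': 0.0, 'within': 44.0}
```

The interaction sum of squares is zero here because the female cell means sit exactly 2 points above the male cell means in every age group.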
Problem 5) To study the performance of three detergents at three water temperatures, the
following whiteness readings were obtained with specially designed equipment. Perform a
two-way analysis of variance using the 5% level of significance.
Water temperature   Arasan   Wheel   Rin
Cold water          57       55      67
Warm water          49       52      68
Hot water           54       46      58
Problem 6. To study the performance of four salesmen during the Summer, Winter and
Monsoon seasons, the numbers of fans sold are given below. Use analysis of variance to
answer the following.
i) Do the salesmen differ significantly in performance?
ii) Is there a significant difference in sales between the seasons?
Sales of fans
Season        Usha   Crompton   Bajaj   Anchor   Season total
Summer        39     40         40      25       144
Winter        36     32         33      35       136
Monsoon       33     30         32      33       128
Total sales   108    102        105     93       408
Answer: ANOVA table:
Source of variation          Sum of squares   Degrees of freedom   Mean square   F
Between columns (Salesman)   42               3                    14            Fc = 14/22.67 = 0.618
Between rows (Seasons)       32               2                    16            Fr = 16/22.67 = 0.706
Residual                     136              6                    22.67
Total                        210              11