HYPOTHESIS: Meaning

A hypothesis is an assumption about a population parameter that may or may not be true. It is a tentative conclusion, logically drawn, about a population parameter. Statistical hypotheses are statements about the probability distributions of populations; in other words, they are assumptions or guesses about the populations involved. Example: "The average weight of fish reared in a research station is 2.6."

Types of Hypothesis

There are two types of hypothesis: the null hypothesis and the alternative hypothesis.
• Null hypothesis: a statistical hypothesis stated for the purpose of possible acceptance is called the null hypothesis. It is denoted by $H_0$.
• Alternative hypothesis: a hypothesis contradictory to the null hypothesis is called the alternative hypothesis. It is denoted by $H_1$.

Steps / Procedure for Testing a Hypothesis

1. Setting up the hypotheses: The first step in testing a hypothesis is setting up the hypotheses. The conventional approach is to set up two hypotheses instead of one: (1) the null hypothesis and (2) the alternative hypothesis. Following the sampling theory approach, we accept or reject a hypothesis on the basis of sample information alone. Any sample we draw will differ from the population, so we must judge whether these differences are statistically significant or insignificant.

2. Choosing a statistical tool: The second step is choosing a statistical technique. Many tests are frequently used in hypothesis testing, among them the Z-test, the t-test and the chi-square test, and the researcher should be able to choose the appropriate one. When the hypothesis pertains to a large sample (30 or more), the Z-test is used; when the sample is smaller than 30, the t-test can be used.

3. Selection of the desired level of significance: The third step is the selection of the desired level of significance.
The confidence with which an experimenter rejects or retains a hypothesis depends on the level of significance, which is usually expressed as a percentage such as 5% or 1%. At the 5% level, a true null hypothesis will be wrongly rejected about 5 times in every 100 occasions; this is the risk the researcher accepts by choosing that level.

4. Computation of the test statistic: The fourth step is performing the computations necessary for the test, which include the test statistic and its standard error.

5. Drawing the statistical decision: The final step is to draw a statistical decision, i.e., to accept or reject the hypothesis. This depends on whether the computed value of the test statistic falls in the region of acceptance or in the region of rejection at the given level of significance.

One-Tailed Tests of Significance

A test is one-tailed when the alternative hypothesis, $H_1$, states a direction, for example:
• $H_0$: The mean income of females is less than or equal to the mean income of males.
• $H_1$: The mean income of females is greater than the mean income of males.

[Figure: Sampling distribution of the statistic z for a one-tailed test, .05 level of significance]

Two-Tailed Tests of Significance

A test is two-tailed when no direction is specified in the alternative hypothesis $H_1$, for example:
• $H_0$: The mean income of females is equal to the mean income of males.
• $H_1$: The mean income of females is not equal to the mean income of males.

In a two-tailed test the critical region lies in both tails of the distribution, so the test checks whether the sample statistic is either greater than or less than a certain range of values.

Example: A candy plant wants to make sure that the number of candies per bag is around 50. The factory is willing to accept between 45 and 55 candies per bag.
It would be too costly to have someone check every bag, so the factory selects random samples of bags and tests whether the average number of candies exceeds 55 or falls below 45, at whatever level of significance it chooses.

Z-test and t-test
• The Z-test is the test of significance used for large samples, i.e., when $n \ge 30$.
• The Z-test compares the mean of a research sample with the mean of a population; the population details ($\mu$, $\sigma$) must be known.
• The t-test is the test of significance for small samples, i.e., when $n < 30$.
• The t-test is used when the population details ($\mu$, $\sigma$) are unknown; its most common form compares the means of two research samples.

Critical (Table) Values of Z

Level of significance    1%      5%      10%
Two-tailed test          2.58    1.96    1.645
One-tailed test          2.33    1.645   1.28

Symbols for Population and Sample

Measure               Population    Sample
Size                  $N$           $n$
Mean                  $\mu$         $\bar{X}$
Standard deviation    $\sigma$      $s$
Proportion            $p$           $P$

t-Test: Test of Significance of Small Samples

The t-test is probably the most commonly used statistical procedure for hypothesis testing. There are several kinds of t-test, but the most common is the two-sample t-test, also known as Student's t-test or the independent-samples t-test. The method was developed by William Sealy Gosset, who published under the pseudonym "Student". When the sample is small (conventionally less than 30) and the population standard deviation is unknown, we use the t-distribution.

Applications of the t-distribution:
1) Test of hypothesis about the population mean.
2) Test of hypothesis about the difference between two means.
3) Test of hypothesis about the difference between two means with dependent samples.
4) Test of hypothesis about the coefficient of correlation.

Test of Hypothesis about a Single Population Mean

To determine whether the mean of a sample drawn from a normal population deviates significantly from a hypothetical value when the population variance is unknown, we use the t-distribution:

$$t = \frac{(\bar{X} - \mu)\sqrt{n}}{S}, \qquad S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where $\bar{X}$ = mean of the sample, $\mu$ = actual or hypothetical mean of the population, $n$ = sample size and $S$ = standard deviation of the sample. The test has $n - 1$ degrees of freedom.

The 95% fiducial (confidence) limits of the population mean $\mu$ are
$$\bar{X} \pm \frac{S}{\sqrt{n}}\, t_{0.05}$$
and the 99% limits are
$$\bar{X} \pm \frac{S}{\sqrt{n}}\, t_{0.01}$$
where $t_{0.05}$ and $t_{0.01}$ are the two-tailed table values for $n - 1$ degrees of freedom.

Test of Hypothesis about the Single Population Mean – Model 1

Illustration 1: The following results are obtained from a sample of 10 boxes of biscuits: mean weight of contents = 490 g; standard deviation of the weight = 9 g. Could the sample have come from a population with a mean of 500 g?

Solution: Let us take the hypothesis that $\mu = 500$ g. With $\bar{X} = 490$, $\mu = 500$, $n = 10$, $S = 9$:
$$t = \frac{(490 - 500)\sqrt{10}}{9} = \frac{-10 \times 3.162}{9} = -3.51$$
Degrees of freedom $= n - 1 = 10 - 1 = 9$; table value $t_{0.01} = 3.25$; calculated value $|t| = 3.51$.
Conclusion: Since 3.51 > 3.25, the hypothesis is rejected at the 1% level.

95% confidence limits of the population mean ($t_{0.05} = 2.262$ for 9 d.f.):
$$490 \pm \frac{9}{\sqrt{10}} \times 2.262 = 490 \pm 6.44$$
i.e., 483.56 to 496.44. The hypothetical mean of 500 g lies outside these limits, which agrees with rejecting the hypothesis.

Illustration 2: A sample of 26 bulbs gives a mean life of 990 hours with a standard deviation of 20 hours. The manufacturer claims that the mean life of the bulbs is 1000 hours. Is the sample not up to the standard?

Null hypothesis: the population mean $\mu = 1000$. With $\bar{X} = 990$, $\mu = 1000$, $n = 26$, $S = 20$:
$$t = \frac{(990 - 1000)\sqrt{26}}{20} = \frac{-10 \times 5.10}{20} = -2.55$$
Degrees of freedom $= 26 - 1 = 25$; one-tailed table value $t_{0.05} = 1.708$; calculated value $|t| = 2.55$.
Conclusion: Since 2.55 > 1.708, the hypothesis is rejected; the sample is not up to the standard.
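The single-mean t-test and its fiducial limits reduce to a few lines of arithmetic, sketched below in Python using only the standard library. The function names are hypothetical; the critical values (3.25, 2.262, 1.708) are read from a t-table rather than computed, because the standard library has no t-distribution. The `decide` helper simply compares $|t|$ with the table value, which for a one-tailed test assumes $t$ falls in the direction stated by $H_1$.

```python
import math


def one_sample_t(sample_mean, mu, s, n):
    """t = (x_bar - mu) * sqrt(n) / s, where s is the sample SD (divisor n - 1)."""
    return (sample_mean - mu) * math.sqrt(n) / s


def confidence_limits(sample_mean, s, n, t_table):
    """Fiducial limits x_bar +/- t_table * s / sqrt(n); t_table is for n - 1 d.f."""
    half_width = t_table * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def decide(t_calc, t_table):
    """Reject H0 when |t| exceeds the table value.

    For a one-tailed test this assumes t lies in the direction stated by H1.
    """
    return "reject H0" if abs(t_calc) > t_table else "accept H0"


if __name__ == "__main__":
    # Illustration 1 (biscuit boxes): t_0.01 for 9 d.f. = 3.25
    t1 = one_sample_t(490, 500, 9, 10)
    print(round(t1, 2), decide(t1, 3.25))       # -3.51 reject H0
    # 95% limits with t_0.05 = 2.262 for 9 d.f.
    low, high = confidence_limits(490, 9, 10, 2.262)
    print(round(low, 2), round(high, 2))        # 483.56 496.44
    # Illustration 2 (bulbs): one-tailed t_0.05 for 25 d.f. = 1.708
    t2 = one_sample_t(990, 1000, 20, 26)
    print(round(t2, 2), decide(t2, 1.708))      # -2.55 reject H0
```

Swapping in another problem only means changing the four summary figures and the table value for $n - 1$ degrees of freedom.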
Illustration 3 (University question, 2009): A random sample of size 16 has 53 as mean. The sum of squares of deviations taken from the mean is 135. Can this sample be regarded as taken from a population having 56 as mean? Obtain 95% and 99% confidence limits of the mean of the population.

Null hypothesis: the population mean $\mu = 56$. With $\bar{X} = 53$, $\mu = 56$, $n = 16$ and $\sum (X - \bar{X})^2 = 135$:
$$S = \sqrt{\frac{135}{16 - 1}} = \sqrt{9} = 3, \qquad t = \frac{(53 - 56)\sqrt{16}}{3} = \frac{-3 \times 4}{3} = -4$$
Degrees of freedom $= 16 - 1 = 15$; table value $t_{0.05} = 2.131$; calculated value $|t| = 4$.
Conclusion: Since 4 > 2.131, the hypothesis is rejected; the sample cannot be regarded as drawn from a population with mean 56.
95% confidence limits: $53 \pm \frac{3}{\sqrt{16}} \times 2.131 = 53 \pm 1.60$, i.e., 51.40 to 54.60.
99% confidence limits ($t_{0.01} = 2.947$ for 15 d.f.): $53 \pm 0.75 \times 2.947 = 53 \pm 2.21$, i.e., 50.79 to 55.21.

Test of Hypothesis about the Single Population Mean – Model 1: Additional Problems
1) A random sample of 10 children had a mean weight of 14.3 kg and a variance of 2.1. Test at the 5% level of significance whether the mean weight of the children's population is 15 kg.
2) A new machine attachment will be introduced if it receives a mean rating of at least 7 on a ten-point scale. A sample of 20 purchase engineers is shown the attachment and asked to evaluate it. The results indicate a mean rating of 7.9 with an S.D. of 1.6. A significance level of $\alpha = 0.05$ (table value 2.09) is selected. Should the attachment be introduced? (Anna University question)
Hints: $\bar{X} = 7.9$; $\mu = 7$; $n = 20$; $S = 1.6$.
3) The mean weekly sales of soap bars in departmental stores was 146.3 bars per store. After an advertising campaign, the mean weekly sales in 22 stores for a typical week increased to 153.7, with an S.D. of 17.2. Was the advertising campaign successful? A significance level of $\alpha = 0.05$ (table value 1.72) is selected.
Hints: $\bar{X} = 153.7$; $\mu = 146.3$; $n = 22$; $S = 17.2$.

Test of Hypothesis about the Single Population Mean – Model 2
The lifetimes of electric bulbs for a random sample of 10 from a large consignment gave the following data.
Can you accept the hypothesis that the average lifetime of the bulbs is 4000 hours?

Item                 1    2    3    4    5    6    7    8    9    10
Life ('000 hours)    4.2  4.6  3.9  4.1  5.2  3.8  3.9  4.3  4.4  5.6

Solution: Let us take the hypothesis that there is no significant difference between the sample mean and the hypothetical population mean. Applying the t-test, with $\bar{X} = \sum X / n = 44/10 = 4.4$:

X       X − X̄    (X − X̄)²
4.2     −0.2     0.04
4.6      0.2     0.04
3.9     −0.5     0.25
4.1     −0.3     0.09
5.2      0.8     0.64
3.8     −0.6     0.36
3.9     −0.5     0.25
4.3     −0.1     0.01
4.4      0.0     0.00
5.6      1.2     1.44
ΣX = 44          Σ(X − X̄)² = 3.12

$$S = \sqrt{\frac{3.12}{10 - 1}} = 0.589, \qquad t = \frac{(4.4 - 4)\sqrt{10}}{0.589} = \frac{0.4 \times 3.162}{0.589} = 2.148$$

Degrees of freedom $v = n - 1 = 9$; table value $t_{0.05} = 2.262$; calculated value = 2.148.
Conclusion: The calculated value of t (2.148) is less than the table value (2.262), so the hypothesis is accepted. The average lifetime of the bulbs could be 4000 hours.

Illustration: Past records show that the mean marks of students taking statistics are 60, with a standard deviation of 15 marks. A new method of teaching is adopted and a random sample of 64 students is chosen. After using the new method, the sample gives a mean mark of 65. Is the new method better?

Solution: Here we are interested in whether the marks increased under the new teaching method, so we use a one-tailed test. The population standard deviation is known and the sample is large, so the statistic is z.
The null hypothesis is $H_0: \mu = 60$; the alternative hypothesis is $H_a: \mu > 60$.
With $\bar{X} = 65$, $\mu = 60$, $\sigma = 15$, $n = 64$:
$$z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} = \frac{(65 - 60) \times 8}{15} = \frac{40}{15} = 2.67$$
Suppose the researcher has predetermined a level of significance of 0.01 (1%). Then 2.67 > 2.33 (2.33 is the z-score for the 0.01 level in the upper tail of the distribution).
Therefore, the observed value is highly significant and $H_0$ is rejected. This means the new teaching method is better.

Test of Hypothesis about the Single Population Mean – Model 2: Additional Problems
1) Prices of shares of X & Co. Ltd. on different days in a month were found to be 155, 154, 158, 159, 158, 160, 159, 152, 153 and 155. Can we accept the hypothesis that the average price of the shares is Rs. 155? ($v = 9$, $t_{0.05} = 2.262$.)
2) A random sample of 10 cricket matches has 39 runs as mean. The sum of the squares of deviations taken from the mean score is 10,404. Can this sample be regarded as taken from cricket matches having 45 as the average score? Also obtain 95% confidence limits of the mean of the population (for $v = 9$, $t_{0.05} = 2.262$).

Test of Hypothesis about the Difference between Two Means

Given two independent random samples of sizes $n_1$ and $n_2$ with means $\bar{X}_1$ and $\bar{X}_2$ and standard deviations $S_1$ and $S_2$, the value of t (with $n_1 + n_2 - 2$ degrees of freedom) is calculated as
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
where the pooled standard deviation is
$$S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} \quad \text{or} \quad S = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}}$$
The second form applies when $S_1$ and $S_2$ are computed with divisors $n_1$ and $n_2$.

Test of Hypothesis about the Difference between Two Means – Model 1: Problem

Illustration 1: The heights (in inches) of six randomly chosen soldiers are 76, 70, 68, 69, 69 and 68. Those of six randomly chosen sailors are 68, 64, 65, 69, 72 and 64. Discuss the light these data throw on the suggestion that soldiers are, on average, taller than sailors. Use the t-test.

Solution: Let us take the hypothesis that there is no difference between the heights of soldiers and sailors. Applying the t-test:
Soldiers                         Sailors
X1    X1 − X̄1   (X1 − X̄1)²      X2    X2 − X̄2   (X2 − X̄2)²
76     6         36              68     1          1
70     0          0              64    −3          9
68    −2          4              65    −2          4
69    −1          1              69     2          4
69    −1          1              72     5         25
68    −2          4              64    −3          9
ΣX1 = 420        Σ = 46          ΣX2 = 402        Σ = 52

$\bar{X}_1 = 420/6 = 70$; $\bar{X}_2 = 402/6 = 67$.
$$S = \sqrt{\frac{46 + 52}{6 + 6 - 2}} = \sqrt{\frac{98}{10}} = 3.13$$
$$t = \frac{70 - 67}{3.13}\sqrt{\frac{6 \times 6}{6 + 6}} = \frac{3}{3.13} \times 1.732 = 1.66$$
Degrees of freedom $= (n_1 - 1) + (n_2 - 1) = 10$.
The calculated value of t (1.66) is less than the table value ($t_{0.05} = 2.23$), so the hypothesis is accepted. Hence the soldiers are not, on average, taller than the sailors.

Test of Hypothesis about the Difference between Two Means – Model 2: Problem

Illustration 1: The mean weekly wage of a sample of $n_1 = 30$ employees in a large firm is $\bar{X}_1$ = Rs. 280, with a sample standard deviation $S_1$ = Rs. 14. In another firm, a sample of $n_2 = 40$ employees has a mean wage $\bar{X}_2$ = Rs. 270, with an S.D. $S_2$ = Rs. 10. The S.D.s of the populations are not assumed to be equal. Test the hypothesis that there is no difference between the mean weekly wages of the two firms at the 5% level of significance.

Solution: $\bar{X}_1 = 280$; $\bar{X}_2 = 270$; $n_1 = 30$; $n_2 = 40$; $S_1 = 14$; $S_2 = 10$.
$$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2} = \frac{30(14)^2 + 40(10)^2}{30 + 40 - 2} = \frac{5880 + 4000}{68} = 145.29$$
$S = \sqrt{145.29} = 12.05$
$$t = \frac{280 - 270}{12.05}\sqrt{\frac{30 \times 40}{30 + 40}} = \frac{10}{12.05} \times 4.14 = 3.44$$
Degrees of freedom $= (n_1 - 1) + (n_2 - 1) = 68$.
The calculated value of t (3.44) is greater than the table value ($t_{0.05} = 1.96$), so the hypothesis is rejected. Hence there is a significant difference between the mean weekly wages of the two firms at the 5% level of significance.
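The pooled two-sample t-test above can be sketched in Python with the standard library, once from raw data (soldiers and sailors) and once from summary figures (weekly wages). The helper names are hypothetical, and `pooled_t_from_summary` assumes $S_1$ and $S_2$ were computed with divisor $n$, as the wages solution implicitly does. Without intermediate rounding the wages statistic comes out at about 3.43, a hair below the hand-rounded 3.44; the table values still have to be looked up.

```python
import math


def pooled_t_from_data(x1, x2):
    """Two-sample t with pooled S from raw data; returns (t, degrees of freedom)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)  # sum of squared deviations, sample 1
    ss2 = sum((v - m2) ** 2 for v in x2)  # sum of squared deviations, sample 2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Same test from summary figures; assumes s1, s2 use divisor n (n * s**2 = SS)."""
    s = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


if __name__ == "__main__":
    soldiers = [76, 70, 68, 69, 69, 68]
    sailors = [68, 64, 65, 69, 72, 64]
    t, df = pooled_t_from_data(soldiers, sailors)
    print(round(t, 2), df)  # 1.66 10
    t, df = pooled_t_from_summary(280, 14, 30, 270, 10, 40)
    print(round(t, 2), df)  # 3.43 68
```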
Solution: Let us take the hypothesis that there is no significant difference in the mean life of the two makes of bulbs, I and II. Applying the t-test for the difference of means:
$$S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{7(36)^2 + 6(40)^2}{8 + 7 - 2}} = 37.898$$
$\bar{X}_1 = 1234$; $\bar{X}_2 = 1136$; $S = 37.898$; $n_1 = 8$; $n_2 = 7$; $v = 13$.
$$t = \frac{1234 - 1136}{37.898}\sqrt{\frac{8 \times 7}{8 + 7}} = \frac{98 \times 1.932}{37.898} = 4.996$$
Table value: $t_{0.05} = 2.16$ for 13 d.f. The calculated value of t (4.996) is more than the table value (2.16), so the hypothesis is rejected. Hence the difference in the means is significant.

Test of Hypothesis about the Difference between Two Means – Model 2: Additional Problem
Samples of two different types of bulbs were tested for length of life, and the following data were obtained. Is the difference in the means significant? (The value of t at the 5% level of significance for 13 d.f. is 2.16.)

Particulars      Type I               Type II
Sample size      $n_1 = 8$            $n_2 = 7$
Sample mean      $\bar{X}_1$ = 1234 hours   $\bar{X}_2$ = 1036 hours
Sample S.D.      $S_1$ = 36 hours     $S_2$ = 40 hours

(Ans: S = 40.73; calculated t-value = 9.39.)

Paired t-Test: Illustration
The memory capacity of students was tested before and after a month-long course of meditation. State whether the course was effective or not from the following data.
Before training: 10 15 9 3 7 12 16 17 4
After training:  12 17 8 5 6 11 18 20 3

Solution:
Step 1: Setting up the null hypothesis: The course was not effective, i.e., there is no significant difference between memory capacity before and after the course.
Step 2: Find the differences and their squares:

Before (x)   After (y)   d = y − x   d²
10           12           2          4
15           17           2          4
9            8           −1          1
3            5            2          4
7            6           −1          1
12           11          −1          1
16           18           2          4
17           20           3          9
4            3           −1          1
                         Σd = 7     Σd² = 29

Step 3: Test statistic:
$$\bar{d} = \frac{\sum d}{n} = \frac{7}{9} = 0.78$$
$$S = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{29}{9} - \left(\frac{7}{9}\right)^2} = \sqrt{3.22 - 0.60} = 1.62$$
$$t = \frac{\bar{d}}{S/\sqrt{n - 1}} = \frac{0.78}{1.62/\sqrt{8}} = 1.36$$
Step 4: Level of significance: degrees of freedom $= n - 1 = 9 - 1 = 8$; table value at 5% = 2.31.
Step 5: Conclusion: The calculated value (1.36) is less than the table value (2.31), so we accept the null hypothesis and conclude that the course was not effective.

Paired t-Test: Additional Problem
Poor students were given intensive coaching, and a test was given before and after the coaching. Use the paired t-test to test whether the coaching class produced any improvement.
Before coaching: 50 42 51 26 35 42 60 41 70 55 62 38
After coaching:  62 40 61 35 30 52 68 51 84 63 72 50
Hints: degrees of freedom $= n - 1 = 11$; table value at 5% level of significance = 2.20; calculated value ≈ 4.89.

Test of Significance of Large Samples
It is difficult to draw a sharp line between large and small samples; conventionally, if the sample size is greater than 30 ($n > 30$), the sample may be regarded as large.
Assumptions:
1) The random sampling distribution of the statistic is approximately normal.
2) Sample values are sufficiently close to the population values to be used in calculating the standard error estimate.
Standard error of the mean: It measures the sampling error involved in estimating a population parameter from a sample.
1) When the standard deviation of the population is given:
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma_P}{\sqrt{n}}$$
where $\sigma_P$ = standard deviation of the population and $n$ = number of observations in the sample.
2) When the standard deviation of the population is not given:
$$\mathrm{S.E.}_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}}$$
where $\sigma_s$ = standard deviation of the sample. In general, we use the standard deviation of the sample when that of the population is not given.

Test of Significance of Large Samples: Problem
Problem: A company manufacturing electric light bulbs claims that the average life of its bulbs is 1600 hours. The average life and standard deviation of a random sample of 100 such bulbs were 1570 hours and 120 hours respectively. Should we accept the claim of the company?
Solution:
Step 1: Setting up the null hypothesis: The average life of the company's bulbs is 1600 hours.
Step 2: Test statistic: $\sigma = 120$; sample mean = 1570; hypothetical mean = 1600; $n = 100$.
$$\mathrm{S.E.}_{\bar{X}} = \frac{120}{\sqrt{100}} = 12, \qquad Z = \frac{\text{Difference}}{\mathrm{S.E.}_{\bar{X}}} = \frac{1600 - 1570}{12} = \frac{30}{12} = 2.5$$
Step 3: Level of significance: table value at 5% level = 1.96; calculated value = 2.5.
Step 4: Conclusion: Since the calculated value 2.5 > table value 1.96 at the 5% level of significance, the hypothesis cannot be accepted. We cannot accept the claim of the company.

Illustration: A sample of 100 students is taken from a college. The mean height is 64 inches and the standard deviation is 6 inches. Can it reasonably be regarded that the mean height of the students is 66 inches? Also set up 99% limits within which the average height of the students is expected to lie.
Solution:
Step 1: Setting up the null hypothesis: The (population) students' average height is 66 inches.
Step 2: Test statistic:
$$\mathrm{S.E.}_{\bar{X}} = \frac{6}{\sqrt{100}} = 0.6, \qquad Z = \frac{\text{Difference}}{\mathrm{S.E.}_{\bar{X}}} = \frac{66 - 64}{0.6} = 3.33$$
Step 3: Level of significance: table value at 5% level = 1.96; calculated value = 3.33.
Step 4: Conclusion: Since the calculated value 3.33 > table value 1.96 at the 5% level of significance, the hypothesis cannot be accepted. Thus we conclude that the (population) students' average height is not 66 inches.
99% confidence limits: $\bar{X} \pm 2.58\,\mathrm{S.E.} = 64 \pm 2.58 \times 0.6 = 64 \pm 1.548$, i.e., 62.45 to 65.55. Hence the mean height of the students is expected to lie between 62.45 and 65.55 inches.

University Question
A random sample of 121 checking accounts at a bank showed an average daily balance of $280. The standard deviation of the population is known to be $66. (i) Find the standard error of the mean. (ii) Construct a 95% confidence interval for the mean. (iii) Construct a 99% confidence interval for the mean.
Solution: $n = 121$; $\bar{X} = 280$; $\sigma = 66$.
(i) $\mathrm{S.E.}_{\bar{X}} = \frac{66}{\sqrt{121}} = \frac{66}{11} = 6$
(ii) 95% confidence interval for the mean, $\bar{X} \pm$ table value $\times$ S.E.: upper limit $= 280 + (1.96 \times 6) = 291.76$; lower limit $= 280 - (1.96 \times 6) = 268.24$.
(iii) 99% confidence interval for the mean: upper limit $= 280 + (2.58 \times 6) = 295.48$; lower limit $= 280 - (2.58 \times 6) = 264.52$.

Illustration: An auto company decided to introduce a new six-cylinder car whose mean petrol consumption is claimed to be lower than that of the existing auto engine. It was found that the mean petrol consumption for 50 cars was 10 km per litre, with a standard deviation of 3.5 km per litre. Test, at the 5% level of significance, whether the company's claim that the average petrol consumption of the new car is 9.5 km per litre is acceptable.
Solution:
Step 1: Setting up the null hypothesis: Let us take the hypothesis that there is no significant difference between the sample average and the company's claim.
Step 2: Test statistic:
$$\mathrm{S.E.}_{\bar{X}} = \frac{3.5}{\sqrt{50}} = \frac{3.5}{7.07} = 0.495, \qquad Z = \frac{10 - 9.5}{0.495} = 1.01$$
Step 3: Level of significance: table value at 5% level = 1.96; calculated value = 1.01.
Step 4: Conclusion: Since the calculated value 1.01 < table value 1.96 at the 5% level of significance, the hypothesis can be accepted. Hence the company's claim that the new car's petrol consumption is 9.5 km per litre is acceptable.

Additional Problems
Problem 1: The mean life of a sample of 400 fluorescent tube lights produced by a company is found to be 1570 hours, with a standard deviation of 150 hours. Test the hypothesis that the mean lifetime of the tube lights produced by the company is 1600 hours against the alternative hypothesis that it is greater than 1600 hours, at the 1% level of significance.
Answer: table value = 2.58; calculated value = 4.

Test of Hypothesis about the Difference between Two Means (Large Samples)
a) When two independent random samples are drawn from the same population:
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
b) When two random samples are drawn from different populations:
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
The second form, for samples drawn from different populations, is the one most commonly used.

Illustration 1: Intelligence tests on two groups, boys and girls, gave the following results. Is there a significant difference in the mean scores obtained by boys and girls?

Group    Mean    S.D.    N
Girls    75      15      150
Boys     70      20      250

Solution:
Step 1: Setting up the null hypothesis: Let us take the hypothesis that there is no significant difference between the mean scores obtained by boys and girls.
Step 2: Find the standard error, with $\sigma_1 = 15$, $\sigma_2 = 20$, $n_1 = 150$, $n_2 = 250$:
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{15^2}{150} + \frac{20^2}{250}} = \sqrt{1.5 + 1.6} = 1.76$$
Step 3: Z test statistic:
$$Z = \frac{\text{Difference}}{\mathrm{S.E.}} = \frac{75 - 70}{1.76} = 2.84$$
Step 4: Level of significance: table value at 1% level of significance = 2.58.
Step 5: Conclusion: Since the difference is more than 2.58 S.E. (1% level of significance), the hypothesis is rejected. There seems to be a significant difference in the mean scores obtained by boys and girls.

Test of Hypothesis about the Difference between Two Means
Problem 2: A college conducted both day and evening classes intended to be identical. A sample of 100 day students yields examination results of $\bar{X}_1 = 72.4$, $\sigma_1 = 14.8$; a sample of 200 evening students yields $\bar{X}_2 = 73.9$, $\sigma_2 = 17.9$. Are the two means statistically equal at the 1% level?
Solution:
Step 1: Setting up the null hypothesis: The mean results of day and evening students are statistically equal.
Step 2: Find the standard error:
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{14.8^2}{100} + \frac{17.9^2}{200}} = \sqrt{2.1904 + 1.6020} = 1.947$$
Step 3: Z test statistic:
$$Z = \frac{72.4 - 73.9}{1.947} = -0.77$$
Step 4: Level of significance: table value at 1% level of significance = 2.58.
Step 5: Conclusion: Since $|Z| = 0.77$ < table value 2.58 at the 1% level of significance, the hypothesis can be accepted. Hence we conclude that the two means are statistically equal.

Test of Hypothesis about the Difference between Two Means: Additional Problem 1
The number of accidents per day was studied for 144 days in town A and 100 days in town B, and the following information was obtained.
Is the difference between the mean numbers of accidents in the two towns statistically significant?

Particulars                 Town A    Town B
Mean no. of accidents       4.5       5.4
Standard deviation          1.2       1.5

Problem 2: You are working as a purchasing manager for a company. The following information has been supplied to you by two manufacturers of electric bulbs:

Particulars             Company A    Company B
Mean life (hours)       1300         1248
Standard deviation      82           93
Sample size             100          100

Which brand of bulbs will you purchase if you are willing to take a risk of 5%?
Answer: Z value = 4.19; table value = 1.96.

Small Samples: F-Test / The Variance-Ratio Test
The F-distribution, also known as Snedecor's F-distribution or the Fisher–Snedecor distribution (after R.A. Fisher and George W. Snedecor), is the distribution of the ratio of two independent estimates of the population variance. Suppose we have two samples with $n_1$ and $n_2$ observations; when the two normal populations have equal variances, the ratio $F = s_1^2 / s_2^2$, where $s_1^2$ and $s_2^2$ are the sample variances, follows an F-distribution with $v_1 = n_1 - 1$ numerator degrees of freedom and $v_2 = n_2 - 1$ denominator degrees of freedom.
$$s_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1}, \qquad F = \frac{\text{larger estimate of variance}}{\text{smaller estimate of variance}}$$

F-Test Problems – Model 1
The main use of the F-distribution is to test whether two independent samples have been drawn from normal populations with the same variance, i.e., whether two independent estimates of the population variance are homogeneous, since it is often desirable to compare two variances rather than two averages.

Problem 1: Two random samples are drawn from two normal populations. Test whether the two populations have the same variance at the 5% level of significance.
Sample I:  55 54 52 53 56 58 52 50 51 49
Sample II: 108 107 105 105 106 107 104 103 104 101
Solution:
Step 1: Setting up the null hypothesis: Let us take the hypothesis that the two populations have the same variance.
Step 2: Calculation of the sums of squares ($\bar{X}_1 = 53$, $\bar{X}_2 = 105$):

X1    X1 − X̄1   (X1 − X̄1)²     X2     X2 − X̄2   (X2 − X̄2)²
55     2          4            108     3          9
54     1          1            107     2          4
52    −1          1            105     0          0
53     0          0            105     0          0
56     3          9            106     1          1
58     5         25            107     2          4
52    −1          1            104    −1          1
50    −3          9            103    −2          4
51    −2          4            104    −1          1
49    −4         16            101    −4         16
ΣX1 = 530   Σ = 0   Σ = 70     ΣX2 = 1050   Σ = 0   Σ = 40

Step 3: Test statistic: $\bar{X}_1 = \sum X_1 / n_1 = 530/10 = 53$; $\bar{X}_2 = \sum X_2 / n_2 = 1050/10 = 105$.
$$s_1^2 = \frac{70}{9} = 7.78, \qquad s_2^2 = \frac{40}{9} = 4.44, \qquad F = \frac{s_1^2}{s_2^2} = \frac{7.78}{4.44} = 1.75$$
Step 4: Level of significance: degrees of freedom $v_1 = 9$, $v_2 = 9$; table value at 5% level = 3.18; calculated value = 1.75.
Step 5: Conclusion: The calculated value is less than the table value. Hence we accept the hypothesis and conclude that the samples have been drawn from populations with the same variance.

Additional Problem
Two random samples drawn from two normal populations are:
Sample 1: 20 16 26 27 23 22 18 24 25 19
Sample 2: 27 32 42 35 32 34 38 28 41 43 30 37
Obtain estimates of the variances of the populations and test whether the populations have the same variance.
Hints: $s_1^2 = 13.33$; $s_2^2 = 28.99$; $F = s_2^2 / s_1^2 = 2.17$; table value at 5% level of significance = 3.11.

Model 2
Two random samples gave the following results: $n_1 = 10$, $\sum (X - \bar{X})^2 = 90$; $n_2 = 12$, $\sum (Y - \bar{Y})^2 = 108$. Test whether the samples came from populations with the same variance.
Solution:
Step 1: Setting up the null hypothesis: The samples are drawn from populations with equal variance.
Step 2: Test statistic:
$$s_1^2 = \frac{\sum (X - \bar{X})^2}{n_1 - 1} = \frac{90}{9} = 10, \qquad s_2^2 = \frac{\sum (Y - \bar{Y})^2}{n_2 - 1} = \frac{108}{11} = 9.82, \qquad F = \frac{10}{9.82} = 1.02$$
Step 3: Level of significance: degrees of freedom $v_1 = 9$, $v_2 = 11$; table value at 5% level = 2.90; calculated value = 1.02.
Step 4: Conclusion: The calculated value is less than the table value. Hence we accept the hypothesis and conclude that the two samples have been drawn from populations with equal variances.

F-Distribution – Model 2: Additional Problem
From the following data, test whether the difference between the variances is significant at the 5% level of significance.

Sample                                      A       B
Sum of squares of deviations from mean      84.4    102.6
Size (n)                                    8       10

Hints: $s_1^2 = 12.06$; $s_2^2 = 11.40$; $F = 1.06$; table value at 5% level of significance = 3.29.

Analysis of Variance
The analysis of variance is one of the most powerful statistical techniques. It is a statistical test for heterogeneity of means by analysis of group variances. The technique, developed by R.A. Fisher in the 1920s, can be fruitfully applied to a wide variety of practical problems.
* Many studies involve comparisons between more than two groups of subjects.
* If the outcome is categorical, a chi-square test for a table larger than 2 × 2 can be used to compare proportions between groups.
* The analysis of variance splits the total variation in the data into components due to different sources, in order to test differences among group means.
* If the outcome is numerical, ANOVA can be used to compare the means of the groups.
* ANOVA is the abbreviation of the full name of the method, ANalysis Of VAriance.

Assumptions for the F-test
The following are the assumptions for applying the F-test:
· The samples are simple random samples.
· The samples are independent of each other.
· The parent populations from which they are drawn are normally distributed.

Procedures for Performing an Analysis of Variance
STATGRAPHICS Centurion provides several procedures for performing an analysis of variance:
1. One-Way ANOVA: used when there is only a single categorical factor. This is equivalent to comparing multiple groups of data.
2. Multifactor / Two-Way ANOVA: used when there is more than one categorical factor, arranged in a crossed pattern. When factors are crossed, the levels of one factor appear at more than one level of the other factors.
3. Variance Components Analysis: used when there are multiple factors arranged hierarchically. In such a design, each factor is nested in the factor above it.
4. General Linear Models: used whenever there are both crossed and nested factors, when some factors are fixed and some are random, or when both categorical and quantitative factors are present.
Here we discuss only one-way and two-way ANOVA.

Steps in One-Way ANOVA / Classification
In one-way classification, the data are classified according to only one criterion.
1) Total of all the items of the various samples: $T = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 + \cdots$
2) Correction factor: $\mathrm{CF} = T^2/N$, where N is the total number of items.
3) Total sum of squares: $\mathrm{TSS} = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 + \dots - T^2/N$, i.e. the sum of the squares of all items minus the correction factor.
4) Sum of squares between samples: $\mathrm{SSB} = (\sum X_1)^2/n_1 + (\sum X_2)^2/n_2 + (\sum X_3)^2/n_3 + (\sum X_4)^2/n_4 + \dots - T^2/N$, where $n_i$ is the number of items in sample i. This is the column sum of squares (CSS) in the illustrations below.
5) Mean square between samples: $\mathrm{MSB} = \mathrm{SSB}/(k - 1)$, where k is the number of samples.
6) Sum of squares within samples: $\mathrm{SSW} = \mathrm{TSS} - \mathrm{SSB}$. This is the error sum of squares (ESS) in the illustrations.
7) Mean square within samples: $\mathrm{MSW} = \mathrm{SSW}/(N - k)$.
8) Prepare the ANOVA table.
9) Calculate the F-ratio: $F = \mathrm{MSB}/\mathrm{MSW}$, i.e. the mean square between columns divided by the mean square within columns.
10) Compare the calculated value of F with the table value of F for $(k - 1, N - k)$ degrees of freedom at the chosen level of significance. If the calculated value is less than the table value, accept the null hypothesis.

Advantages of Two-Way ANOVA
• An important advantage of the two-way design is that it is more efficient than its one-way counterpart. There are two assignable sources of variation (for example, age and gender). This helps to reduce the error variation, which makes the design more efficient.
• Unlike one-way ANOVA, it enables us to test the effects of two factors at the same time.
• One can also test for independence of the factors (interaction), provided there is more than one observation in each cell. The only restriction is that the number of observations in each cell has to be equal. One-way ANOVA has no such restriction.

Illustration 1
Set up an ANOVA table for the following per-hectare yields of three wheat varieties (A1, A2, A3). Also work out the F-ratio and test whether there is a significant difference among the mean yields of the varieties.

| A1 | A2 | A3 |
|---|---|---|
| 6 | 5 | 5 |
| 7 | 5 | 4 |
| 3 | 3 | 3 |
| 8 | 7 | 4 |

Solution: Column totals: A1 = 24, A2 = 20, A3 = 16.
Step 1: Setting up the null hypothesis: There is no significant difference between the means of the samples.
$H_0: \mu_1 = \mu_2 = \mu_3$
Step 2: Correction factor: $T = \sum A_1 + \sum A_2 + \sum A_3 = 24 + 20 + 16 = 60$; $\mathrm{CF} = T^2/N = 60^2/12 = 3600/12 = 300$.
Step 3: Total sum of squares (TSS) = sum of squares of all items - CF = (36 + 25 + 25 + 49 + 25 + 16 + 9 + 9 + 9 + 64 + 49 + 16) - 300 = 332 - 300 = 32.
Step 4: Column sum of squares (CSS): $[(\sum A_1)^2 + (\sum A_2)^2 + (\sum A_3)^2]/r - T^2/N$, with r = 4 items per column: (576 + 400 + 256)/4 - 300 = 1232/4 - 300 = 308 - 300 = 8.
Step 5: Error sum of squares (ESS) = TSS - CSS = 32 - 8 = 24.
Step 6: ANOVA table:

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F-ratio |
|---|---|---|---|---|
| Between columns (CSS) | C - 1 = 2 | 8 | 8/2 = 4 | 4/2.67 = 1.50 |
| Within columns (ESS) | N - C = 9 | 24 | 24/9 = 2.67 | |
| Total (TSS) | N - 1 = 11 | 32 | | |

Table value at the 5% level: F(2, 9) = 4.26.
Step 7: Conclusion: The calculated value (1.50) is less than the table value (4.26), so we accept the null hypothesis. We conclude that there is no significant difference between the means of the samples: $\mu_1 = \mu_2 = \mu_3$.

Problems and Solutions in One-Way Classification
Problem 1) A certain manure was used on four plots of land, A, B, C and D. Four beds were prepared in each plot and the manure was applied. The output of the crop in the beds of plots A, B, C and D is given below. Find out whether the differences in the mean production of crops of the plots are significant.

| Bed | A | B | C | D |
|---|---|---|---|---|
| 1 | 8 | 9 | 15 | 6 |
| 2 | 12 | 3 | 10 | 8 |
| 3 | 1 | 7 | 4 | 10 |
| 4 | 3 | 1 | 7 | 8 |
| Total | 24 | 20 | 36 | 32 |

Solution:
Step 1: Setting up the null hypothesis: There is no significant difference in the mean production of crops of the plots.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Step 2: Correction factor: $T = \sum A + \sum B + \sum C + \sum D = 24 + 20 + 36 + 32 = 112$; $\mathrm{CF} = T^2/N = 112^2/16 = 12544/16 = 784$.
Step 3: Total sum of squares (TSS) = sum of squares of all items - CF = (64 + 81 + 225 + 36 + 144 + 9 + 100 + 64 + 1 + 49 + 16 + 100 + 9 + 1 + 49 + 64) - 784 = 1012 - 784 = 228.
Step 4: Column sum of squares (CSS), with 4 items per column: $(24^2 + 20^2 + 36^2 + 32^2)/4 - 784$ = (576 + 400 + 1296 + 1024)/4 - 784 = 3296/4 - 784 = 824 - 784 = 40.
Step 5: Error sum of squares (ESS) = TSS - CSS = 228 - 40 = 188.
Step 6: ANOVA table:

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F-ratio |
|---|---|---|---|---|
| Between columns (CSS) | C - 1 = 3 | 40 | 40/3 = 13.33 | 13.33/15.67 = 0.85 |
| Within columns (ESS) | N - C = 12 | 188 | 188/12 = 15.67 | |
| Total (TSS) | N - 1 = 15 | 228 | | |

Step 7: Level of significance: 5%; table value F(3, 12) = 3.49.
Step 8: Conclusion: The calculated value (0.85) is less than the table value (3.49), so we accept the null hypothesis. Hence we conclude that there is no significant difference in the mean production of crops of the plots.

Problem 2) The following table gives sample psychological health ratings of corporate executives in banking, manufacturing and retailing. Can the psychological health ratings of corporate executives in the three fields be considered equal at the 5% level of significance?

| Field | Ratings | Total |
|---|---|---|
| Banking | 14, 16, 18 | 48 |
| Manufacturing | 14, 13, 15, 22 | 64 |
| Retailing | 18, 16, 19, 19, 20 | 92 |

Solution
Step 1: Setting up the null hypothesis: Let us take the hypothesis that there is no significant difference in the psychological health ratings of corporate executives in the three fields.
Step 2: Correction factor: $T = 48 + 64 + 92 = 204$; $\mathrm{CF} = T^2/N = 204^2/12 = 41616/12 = 3468$.
Step 3: Total sum of squares (TSS) = sum of squares of all items - CF = (196 + 256 + 324 + 196 + 169 + 225 + 484 + 324 + 256 + 361 + 361 + 400) - 3468 = 3552 - 3468 = 84.
Step 4: Column sum of squares (CSS): the samples have unequal sizes, so each squared total is divided by its own sample size: $48^2/3 + 64^2/4 + 92^2/5 - 3468$ = 768 + 1024 + 1692.8 - 3468 = 16.8.
Step 5: Error sum of squares (ESS) = TSS - CSS = 84 - 16.8 = 67.2.
Step 6: ANOVA table:

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F-ratio |
|---|---|---|---|---|
| Between columns (CSS) | C - 1 = 2 | 16.8 | 16.8/2 = 8.4 | 8.4/7.467 = 1.125 |
| Within columns (ESS) | N - C = 9 | 67.2 | 67.2/9 = 7.467 | |
| Total (TSS) | N - 1 = 11 | 84 | | |

Step 7: Level of significance: table value of F(2, 9) at the 5% level = 4.26.
Step 8: Conclusion: The calculated value of F (1.125) is less than the table value (4.26), so we accept the null hypothesis. Hence we conclude that the psychological health ratings of corporate executives in the three fields do not differ significantly.

One-Way ANOVA Classification: Additional Problems
Problem 1) A researcher is concerned about how much university students know about United States history. The students completed a standardized high-school senior-level U.S. history exam, and each student's major was recorded. The data below give the percentage of correct answers for 32 students. Compute the appropriate test for the data.

| Education | Management | Social Science | Fine Arts |
|---|---|---|---|
| 62 | 72 | 42 | 80 |
| 81 | 49 | 52 | 57 |
| 75 | 63 | 31 | 87 |
| 58 | 68 | 80 | 64 |
| 67 | 39 | 22 | 28 |
| 48 | 79 | 71 | 29 |
| 26 | 40 | 68 | 62 |
| 36 | 15 | 76 | 45 |

Answer:

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 63.25 | 3 | 21.083 | 0.048 |
| Within | 12298.25 | 28 | 439.223 | |
| Total | 12361.5 | 31 | | |

Problem 2)
A research study was conducted to examine the clinical efficacy of a new antidepressant. Depressed patients were randomly assigned to one of three groups: a placebo group, a group that received a low dose of the drug, and a group that received a moderate dose. After four weeks of treatment, the patients completed the Beck Depression Inventory. A higher score indicates a more depressed patient. The data are presented below. Compute the appropriate test.

| Placebo | Low dose | Moderate dose |
|---|---|---|
| 38 | 22 | 14 |
| 47 | 19 | 26 |
| 39 | 8 | 11 |
| 25 | 23 | 18 |
| 42 | 31 | 5 |

Answer:

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 1484.933 | 2 | 742.467 | 11.27 |
| Within | 790.8 | 12 | 65.9 | |
| Total | 2275.733 | 14 | | |

Problem 3) An official from the Central Government is concerned about the monthly expenses of three boards: the Civil Supplies Board, the Electricity Board and the Higher Education Board. He wants to find out whether the boards spend equal amounts on personnel and equipment. He applies analysis of variance to test this assumption at the 0.05 level of significance. He collects the monthly expenses of the three boards for the previous few months and summarizes them in tabular form, as shown in Table 7.3. Calculate the number of degrees of freedom needed for the test at the given level of significance.

| Board | Monthly expenses |
|---|---|
| Civil Supplies Board | 14, 8, 12, 9, 18 |
| Electricity Board | 15, 9, 8, 10, 13 |
| Higher Education Board | 8, 16, 12, 6, 13 |

Problem 4) There are three main brands of a certain powder. A set of 120 sample values is examined and found to be allotted among four groups (A, B, C and D) and three brands, as shown below. Is there any significant difference in brand preference at the 5% level of significance?

| Brand | A | B | C | D |
|---|---|---|---|---|
| 1 | 0 | 4 | 8 | 16 |
| 2 | 5 | 8 | 13 | 6 |
| 3 | 18 | 19 | 11 | 13 |

Two-Way ANOVA Classification
Meaning: In a two-way classification, the data are classified according to two different criteria or factors.
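Before the two-way steps, the one-way arithmetic above can be checked with a short script. It covers the correction factor, the between- and within-sample sums of squares and the F-ratio, as well as the variance-ratio F-test. This is a minimal sketch that uses only the Python standard library. The names `variance_ratio` and `one_way_anova` are illustrative and are not part of any library. The script does not look up critical values, which still come from an F table. If SciPy is available, `scipy.stats.f.ppf(0.95, dfn, dfd)` gives the same 5% points.

```python
from typing import Sequence


def variance_ratio(ss1: float, n1: int, ss2: float, n2: int) -> tuple[float, float, float]:
    """F-test for two variances from sums of squared deviations.

    Each sample variance is s^2 = (sum of squared deviations) / (n - 1).
    The larger variance goes in the numerator so F can be compared with
    an upper-tail F table.
    """
    s1_sq = ss1 / (n1 - 1)
    s2_sq = ss2 / (n2 - 1)
    return s1_sq, s2_sq, max(s1_sq, s2_sq) / min(s1_sq, s2_sq)


def one_way_anova(groups: Sequence[Sequence[float]]) -> dict[str, float]:
    """One-way ANOVA by the correction-factor method.

    Each inner sequence is one sample (column); sample sizes may differ.
    """
    k = len(groups)                          # number of samples
    n = sum(len(g) for g in groups)          # total number of items, N
    t = sum(sum(g) for g in groups)          # grand total, T
    cf = t * t / n                           # correction factor, T^2 / N
    tss = sum(x * x for g in groups for x in g) - cf
    # Each squared sample total is divided by that sample's own size.
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ssw = tss - ssb
    msb = ssb / (k - 1)                      # mean square between, k - 1 df
    msw = ssw / (n - k)                      # mean square within, N - k df
    return {"CF": cf, "TSS": tss, "SSB": ssb, "SSW": ssw,
            "df_between": k - 1, "df_within": n - k,
            "MSB": msb, "MSW": msw, "F": msb / msw}


if __name__ == "__main__":
    # Illustration 1: yields of varieties A1, A2, A3 (one list per column)
    wheat = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
    print({key: round(val, 2) for key, val in one_way_anova(wheat).items()})
    # Variance-ratio test: sums of squares 90 and 108 with n1 = 10, n2 = 12
    print(variance_ratio(90, 10, 108, 12))
```

For the wheat data, the script reproduces CF = 300, TSS = 32, SSB (CSS) = 8, SSW (ESS) = 24 and F = 1.5. The variance-ratio call gives s1² = 10, s2² ≈ 9.82 and F ≈ 1.02. Each squared total is divided by its own sample size, so the same function also handles the unequal samples of the psychological-health problem.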
Assumptions
• The populations from which the samples are obtained must be normally or approximately normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.
• The groups must have the same sample size.

Example: An agricultural scientist is interested in corn yield when three different fertilizers are available and corn is planted in four different soil types. The questions he wants to answer are:
• Does fertilizer type have an effect on crop yield?
• Does soil type have an effect on crop yield?
• Do the two treatment factors interact? For instance, there may be no difference between fertilizer #1 and fertilizer #2 in soil type 1, while fertilizer #1 produces a greater corn yield than fertilizer #2 in soil type 2. This is an example of interaction.

Steps for Calculation of Two-Way ANOVA Classification
Step 1) Convert the data into coded data, for example by subtracting a constant from every item. Coding leaves every sum of squares unchanged.
Step 2) Find the correction factor, $\mathrm{CF} = T^2/N$.
Step 3) Find the sum of squares between columns.
Step 4) Calculate the sum of squares between rows.
Step 5) Compute the total sum of squares = sum of squares of all items - correction factor.
Step 6) Find the residual sum of squares = total sum of squares - (sum of squares between columns + sum of squares between rows).
Step 7) Calculate the degrees of freedom (columns, rows, residual).
Step 8) Prepare the ANOVA table.
Step 9) Find the column F-ratio: Fc = mean square between columns / residual mean square.
Step 10) Find the row F-ratio: Fr = mean square between rows / residual mean square.
Step 11) Give the conclusion.

Problems and Solutions of Two-Way ANOVA Classification
Problem 1) To study the performance of four salesmen during the festivals of Deepavali, Ramzan and Christmas, the numbers of refrigerators sold are given below. Use analysis of variance to answer the following:
i) Do the salesmen differ significantly in performance?
ii) Is there a significant difference in sales between the festivals?
| Festival | Salesman A | Salesman B | Salesman C | Salesman D | Festival total |
|---|---|---|---|---|---|
| Deepavali | 50 | 48 | 52 | 46 | 196 |
| Ramzan | 32 | 31 | 34 | 39 | 136 |
| Christmas | 39 | 36 | 33 | 32 | 140 |
| Salesman's total | 121 | 115 | 119 | 117 | 472 |

Solution
Step 1: Coded data: This is a problem of two-way analysis of variance. To simplify the calculations, we code the data by subtracting 35 from each figure.

| Festival | A (X1) | B (X2) | C (X3) | D (X4) | Row total |
|---|---|---|---|---|---|
| Deepavali | 15 | 13 | 17 | 11 | 56 |
| Ramzan | -3 | -4 | -1 | 4 | -4 |
| Christmas | 4 | 1 | -2 | -3 | 0 |
| Column total | ΣX1 = 16 | ΣX2 = 10 | ΣX3 = 14 | ΣX4 = 12 | T = 52 |

Step 2: Setting up the hypothesis: Let us take the hypothesis that there is no difference in sales between the salesmen or between the festivals.
Step 3: Correction factor: $\mathrm{CF} = T^2/N = 52^2/12 = 2704/12 = 225.33$.
Step 4: Total sum of squares (TSS) = sum of squares of all items - CF = (225 + 9 + 16 + 169 + 16 + 1 + 289 + 1 + 4 + 121 + 16 + 9) - 225.33 = 876 - 225.33 = 650.67; degrees of freedom v = 12 - 1 = 11.
Step 5: Sum of squares between salesmen (columns, CSS). Each squared column total is divided by the number of items in the column (3): $(16^2 + 10^2 + 14^2 + 12^2)/3 - 225.33$ = (256 + 100 + 196 + 144)/3 - 225.33 = 696/3 - 225.33 = 232 - 225.33 = 6.67; v = 4 - 1 = 3.
Step 6: Sum of squares between festivals (rows, RSS). Each squared row total is divided by the number of items in the row (4): $(56^2 + (-4)^2 + 0^2)/4 - 225.33$ = (3136 + 16 + 0)/4 - 225.33 = 788 - 225.33 = 562.67; v = 3 - 1 = 2.
Step 7: Residual sum of squares (ESS) = TSS - (CSS + RSS) = 650.67 - (6.67 + 562.67) = 81.33; v = (3 - 1)(4 - 1) = 6.
Step 8: ANOVA table:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between salesmen (CSS) | 6.67 | 3 | 6.67/3 = 2.22 | Fc = 2.22/13.56 = 0.16 |
| Between festivals (RSS) | 562.67 | 2 | 562.67/2 = 281.33 | Fr = 281.33/13.56 = 20.75 |
| Residual (ESS) | 81.33 | 6 | 81.33/6 = 13.56 | |
| Total (TSS) | 650.67 | 11 | | |

Step 9: Conclusion:
i) First, compare the salesmen variance estimate with the residual variance estimate: F = mean square of columns /
residual mean square = 2.22/13.56 = 0.16. The table value of F for v1 = 3 and v2 = 6 at the 5% level of significance is 4.76. The calculated value is less than the table value, so we conclude that the sales of the different salesmen do not differ significantly.
ii) Next, compare the festivals variance estimate with the residual variance estimate: F = mean square of rows / residual mean square = 281.33/13.56 = 20.75. The table value of F for v1 = 2 and v2 = 6 at the 5% level of significance is 5.14. The calculated value is greater than the table value, so we conclude that sales during the different festivals differ significantly.

Illustration 2
Problem 2) A manufacturer of bags has so far made only leather bags and wants to introduce three additional types: plastic, waterproof and canvas bags. He test-marketed all four types of bags in five different stores for a month, to decide which type to concentrate on so as to maximize his sales. Find whether there is a significant difference in the sales of the different types of bags at the 5% level of significance.

| Type of bag | Store 1 | Store 2 | Store 3 | Store 4 | Store 5 |
|---|---|---|---|---|---|
| Leather bags | 46 | 48 | 36 | 35 | 40 |
| Plastic bags | 40 | 42 | 38 | 40 | 44 |
| Waterproof bags | 49 | 54 | 46 | 48 | 51 |
| Canvas bags | 38 | 45 | 34 | 35 | 41 |

Solution
Step 1: Coded data: This is a problem of two-way analysis of variance. Let us code the data by subtracting 40 from each figure.

| Store | Leather bags (X1) | Plastic bags (X2) | Waterproof bags (X3) | Canvas bags (X4) | Row total |
|---|---|---|---|---|---|
| 1 | 6 | 0 | 9 | -2 | 13 |
| 2 | 8 | 2 | 14 | 5 | 29 |
| 3 | -4 | -2 | 6 | -6 | -6 |
| 4 | -5 | 0 | 8 | -5 | -2 |
| 5 | 0 | 4 | 11 | 1 | 16 |
| Column total | ΣX1 = 5 | ΣX2 = 4 | ΣX3 = 48 | ΣX4 = -7 | T = 50 |

Step 2: Correction factor: $\mathrm{CF} = T^2/N = 50^2/20 = 2500/20 = 125$.
Step 3:
Total sum of squares (TSS) = sum of squares of all items - CF = (36 + 64 + 16 + 25 + 0) + (0 + 4 + 4 + 0 + 16) + (81 + 196 + 36 + 64 + 121) + (4 + 25 + 36 + 25 + 1) - 125 = 754 - 125 = 629; v = 20 - 1 = 19.
Step 4: Sum of squares between bag types (columns, CSS), with 5 items per column: $(5^2 + 4^2 + 48^2 + (-7)^2)/5 - 125$ = 5 + 3.2 + 460.8 + 9.8 - 125 = 353.8; v = 4 - 1 = 3.
Step 5: Sum of squares between stores (rows, RSS), with 4 items per row: $(13^2 + 29^2 + (-6)^2 + (-2)^2 + 16^2)/4 - 125$ = (169 + 841 + 36 + 4 + 256)/4 - 125 = 1306/4 - 125 = 326.5 - 125 = 201.5; v = 5 - 1 = 4.
Step 6: Residual (error) sum of squares (ESS) = TSS - (CSS + RSS) = 629 - (353.8 + 201.5) = 73.7; v = (5 - 1)(4 - 1) = 12.
Step 7: ANOVA table:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between bag types (CSS) | 353.8 | 3 | 117.93 | Fc = 117.93/6.142 = 19.20 |
| Between stores (RSS) | 201.5 | 4 | 50.375 | Fr = 50.375/6.142 = 8.20 |
| Residual (ESS) | 73.7 | 12 | 6.142 | |
| Total | 629 | 19 | | |

Step 8: Setting up the hypothesis: Let us take the hypothesis that there is no difference in sales between the types of bags or between the stores.
Step 9: Conclusion:
i) First, compare the bag-types variance estimate with the residual variance estimate: Fc = mean square of columns / residual mean square = 117.93/6.142 = 19.20. The table value of F for v1 = 3 and v2 = 12 at the 5% level of significance is 3.49. The calculated value is greater than the table value, so we conclude that the sales of the different types of bags differ significantly.
ii) Next, compare the stores variance estimate with the residual variance estimate: Fr = mean square of rows / residual mean square = 50.375/6.142 = 8.20. The table value of F for v1 = 4 and v2 = 12 at the 5% level of significance is 3.26.
The calculated value is greater than the table value, so we conclude that sales in the different stores differ significantly.

Two-Way ANOVA Classification: Additional Problems
Problem 3) The Sales Manager of NML Manufacturing Company conducted a performance study of three salesmen during three seasons; the data are presented in the following table. He wants to know whether there is a significant difference between the salesmen's performances and between the seasons, using a 0.05 level of significance.

| Salesman | Summer | Rainy | Winter |
|---|---|---|---|
| Salesman I | 32 | 20 | 24 |
| Salesman II | 40 | 50 | 68 |
| Salesman III | 54 | 46 | 58 |

Answer: ANOVA table:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between salesmen | 1494.22 | 2 | 747.11 | 747.11/87.11 = 8.58 |
| Between seasons | 203.56 | 2 | 101.78 | 101.78/87.11 = 1.17 |
| Residual | 348.44 | 4 | 87.11 | |
| Total | 2046.22 | 8 | | |

Problem 4) A study was conducted to examine differences in life satisfaction among young adult, middle adult and older adult men and women. Each participant completed a life satisfaction questionnaire. A higher score indicates a higher level of life satisfaction. Test scores are recorded below.
| Age group | Male | Female |
|---|---|---|
| Young adult | 4, 2, 3, 4, 2 (mean = 3.0) | 7, 4, 3, 6, 5 (mean = 5.0) |
| Middle adult | 7, 5, 7, 5, 6 (mean = 6.0) | 8, 10, 7, 7, 8 (mean = 8.0) |
| Older adult | 10, 7, 9, 8, 11 (mean = 9.0) | 10, 9, 12, 11, 13 (mean = 11.0) |

Answer: ANOVA table:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between age groups | 180 | 2 | 90.00 | 90.00/1.833 = 49.09 |
| Between genders | 30 | 1 | 30.00 | 30.00/1.833 = 16.36 |
| Interaction (age x gender) | 0 | 2 | 0.00 | 0.00 |
| Residual (within cells) | 44 | 24 | 1.833 | |
| Total | 254 | 29 | | |

Conclusion: There are significant main effects for age (F(2, 24) = 49.09, p < .01) and for gender (F(1, 24) = 16.36, p < .01). There is no interaction effect (F(2, 24) = 0.00, not significant).
Interpretation: It appears from the data that older adults have the highest life satisfaction and young adults the lowest. Women also have significantly higher life satisfaction than men.

Problem 5) To study the performance of three detergents at three water temperatures, the following whiteness readings were obtained with specially designed equipment. Perform a two-way analysis of variance using the 5% level of significance.

| Detergent powder | Cold water | Warm water | Hot water |
|---|---|---|---|
| Arasan | 57 | 55 | 67 |
| Wheel | 49 | 52 | 68 |
| Rin | 54 | 46 | 58 |

Problem 6) To study the performance of four salesmen during the Summer, Winter and Monsoon seasons, the numbers of fans sold are given below. Use analysis of variance to answer the following:
i) Do the salesmen differ significantly in performance?
ii) Is there a significant difference in sales between the seasons?

| Season | Usha | Crompton | Bajaj | Anchor | Season total |
|---|---|---|---|---|---|
| Summer | 39 | 40 | 40 | 25 | 144 |
| Winter | 36 | 32 | 33 | 35 | 136 |
| Monsoon | 33 | 30 | 32 | 33 | 128 |
| Total sales | 108 | 102 | 105 | 93 | 408 |

Answer: ANOVA table:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Between salesmen (columns) | 42 | 3 | 14 | Fc = 14/22.67 = 0.618 |
| Between seasons (rows) | 32 | 2 | 16 | Fr = 16/22.67 = 0.706 |
| Residual | 136 | 6 | 22.67 | |
| Total | 210 | 11 | | |
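The two-way steps can be sketched in the same way: coding, the correction factor, the column, row and residual sums of squares, and Fc and Fr. This is a minimal standard-library Python sketch. It assumes a complete table with one observation per cell. The names `two_way_anova` and `code_data` are illustrative and are not a library API. The script also codes the data by subtracting 35, as in Problem 1. This shows that coding changes the correction factor but leaves every sum of squares unchanged.

```python
from typing import Sequence


def two_way_anova(table: Sequence[Sequence[float]]) -> dict[str, float]:
    """Two-way ANOVA with one observation per cell (no replication).

    table[i][j] is the item in row i, column j; all rows have equal length.
    """
    r = len(table)                       # number of rows
    c = len(table[0])                    # number of columns
    n = r * c
    t = sum(sum(row) for row in table)   # grand total, T
    cf = t * t / n                       # correction factor, T^2 / N
    tss = sum(x * x for row in table for x in row) - cf
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    # Each column total covers r items; each row total covers c items.
    css = sum(ct * ct for ct in col_totals) / r - cf
    rss = sum(sum(row) ** 2 for row in table) / c - cf
    ess = tss - (css + rss)              # residual sum of squares
    df_c, df_r, df_e = c - 1, r - 1, (c - 1) * (r - 1)
    ms_e = ess / df_e                    # residual mean square
    return {"CF": cf, "TSS": tss, "CSS": css, "RSS": rss, "ESS": ess,
            "df_columns": df_c, "df_rows": df_r, "df_residual": df_e,
            "Fc": (css / df_c) / ms_e, "Fr": (rss / df_r) / ms_e}


def code_data(table: Sequence[Sequence[float]], origin: float) -> list[list[float]]:
    """The 'coded data' step: subtract a constant from every item."""
    return [[x - origin for x in row] for row in table]


if __name__ == "__main__":
    # Problem 1: rows = festivals, columns = salesmen A-D
    fridges = [[50, 48, 52, 46], [32, 31, 34, 39], [39, 36, 33, 32]]
    raw = two_way_anova(fridges)
    coded = two_way_anova(code_data(fridges, 35))
    for key in ("CF", "CSS", "RSS", "ESS", "Fc", "Fr"):
        print(key, round(raw[key], 2), round(coded[key], 2))
```

With the refrigerator data, the raw and coded results agree for CSS, RSS, ESS, Fc and Fr (for example, RSS ≈ 562.67 and Fr ≈ 20.75). Only CF changes, from about 18565.33 to 225.33. Swapping rows and columns only swaps CSS and RSS. The sketch assumes one observation per cell. It therefore does not cover the replicated life-satisfaction design (Problem 4), which also needs an interaction sum of squares.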