One-way Analysis of Variance (ANOVA)

Comparing k Populations

The F test – for comparing k means

Situation
• We have k normal populations.
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i, for i = 1, 2, 3, …, k.
• Note: we assume that the standard deviation is the same for each population: $\sigma_1 = \sigma_2 = \dots = \sigma_k = \sigma$.

We want to test
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \]
against
\[ H_A: \mu_i \ne \mu_j \text{ for at least one pair } i, j. \]

The data
• Assume we have collected data from each of the k populations.
• Let $x_{i1}, x_{i2}, x_{i3}, \dots, x_{in_i}$ denote the $n_i$ observations from population i, for i = 1, 2, 3, …, k.

Computing Formulae

Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ (total for sample i)
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ (grand total)
3) $N = \sum_{i=1}^{k} n_i$ (total sample size)
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$

Then
1) \[ SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} \]
2) \[ SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} \]
3) \[ F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} \]

ANOVA Table

Source    d.f.    Sum of Squares   Mean Square   F-ratio
Between   k - 1   SS_Between       MS_Between    MS_B / MS_W
Within    N - k   SS_Within        MS_Within
Total     N - 1   SS_Total

Here each mean square is MS = SS / df.

Example

In the following example we are comparing weight gains resulting from the following six diets:
1. Diet 1 - High protein, Beef
2. Diet 2 - High protein, Cereal
3. Diet 3 - High protein, Pork
4. Diet 4 - Low protein, Beef
5. Diet 5 - Low protein, Cereal
6. Diet 6 - Low protein, Pork

Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Diet  Observations                                   Mean    Std. Dev.  Σx     Σx²
1      73  102  118  104   81  107  100   87  117  111   100.0   15.14      1000   102062
2      98   74   56  111   95   88   82   77   86   92    85.9   15.02       859    75819
3      94   79   96   98  102  102  108   91  120  105    99.5   10.92       995   100075
4      90   76   90   64   86   51   72   90   95   78    79.2   13.89       792    64462
5     107   95   97   80   98   74   74   67   89   58    83.9   15.71       839    72613
6      49   82   73   86   81   97  106   70   61   82    78.7   16.55       787    64401

Thus, with G = 5272 and N = 60,
\[ SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933 \]
\[ SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 479432 - 467846 = 11586 \]
\[ F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} = \frac{4612.933/5}{11586/54} = \frac{922.6}{214.56} = 4.3 \]

$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$.

Thus, since F > 2.386, we reject $H_0$.

ANOVA Table

Source    d.f.   Sum of Squares   Mean Square   F-ratio
Between    5       4612.933         922.587     4.3** (p = 0.0023)
Within    54      11586.000         214.556
Total     59      16198.933

*  - Significant at 0.05 (not 0.01)
** - Significant at 0.01

Equivalence of the F-test and the t-test when k = 2

The t-test (samples of sizes n and m with means $\bar{x}$ and $\bar{y}$):
\[ t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}} \]

The F-test (here $\bar{x}$ denotes the overall mean of all observations):
\[ F = \frac{s^2_{Between}}{s^2_{Pooled}} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \big/ (k-1)}{\sum_{i=1}^{k} (n_i - 1)s_i^2 \big/ \left(\sum_{i=1}^{k} n_i - k\right)} \]
For k = 2 this becomes
\[ F = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{\left[(n_1-1)s_1^2 + (n_2-1)s_2^2\right] \big/ (n_1+n_2-2)} \]
The denominator is $s^2_{Pooled}$. For the numerator, substitute $\bar{x} = (n_1\bar{x}_1 + n_2\bar{x}_2)/(n_1+n_2)$:
\[ n_1(\bar{x}_1 - \bar{x})^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1+n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1+n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 \]
\[ n_2(\bar{x}_2 - \bar{x})^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1+n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1+n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 \]
so that
\[ n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1+n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{n_1 n_2}{n_1+n_2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \]
Hence
\[ F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s^2_{Pooled}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = t^2 \]

Factorial Experiments

Analysis of Variance
• Dependent variable Y
• k categorical independent variables A, B, C, … (the factors)
• Let
  – a = the number of categories of A
  – b = the number of categories of B
  – c = the number of categories of C
  – etc.

The Completely Randomized Design
• We form the set of all treatment combinations, that is, the set of all combinations of the k factors.
• Total number of treatment combinations: t = abc….
• In the completely randomized design, n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination.
  – Total number of experimental units: N = nt = nabc….
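Looking back at the one-way analysis, the computing formulae and the k = 2 equivalence can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides: the function names `one_way_anova` and `pooled_t` and the dictionary `DIETS` are illustrative choices, and the data are the six-diet weight gains from the example.

```python
from math import sqrt

# Weight gains (grams) from the six-diet example, 10 rats per diet.
DIETS = {
    1: [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],  # High protein, Beef
    2: [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],        # High protein, Cereal
    3: [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],    # High protein, Pork
    4: [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],         # Low protein, Beef
    5: [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],        # Low protein, Cereal
    6: [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],        # Low protein, Pork
}


def one_way_anova(samples):
    """One-way ANOVA via the computing formulae (hypothetical helper)."""
    k = len(samples)
    totals = [sum(s) for s in samples]                # T_i
    sizes = [len(s) for s in samples]                 # n_i
    N = sum(sizes)                                    # total sample size
    G = sum(totals)                                   # grand total
    raw_ss = sum(x * x for s in samples for x in s)   # sum of x_ij^2
    group_term = sum(t * t / n for t, n in zip(totals, sizes))  # sum of T_i^2 / n_i
    ss_between = group_term - G * G / N
    ss_within = raw_ss - group_term
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return {
        "ss_between": ss_between, "df_between": k - 1, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": N - k, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


def pooled_t(x, y):
    """Two-sample t statistic with a pooled standard deviation."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    ss_x = sum((v - xbar) ** 2 for v in x)   # equals (n - 1) * s_x^2
    ss_y = sum((v - ybar) ** 2 for v in y)   # equals (m - 1) * s_y^2
    s_pooled = sqrt((ss_x + ss_y) / (n + m - 2))
    return (xbar - ybar) / (s_pooled * sqrt(1 / n + 1 / m))


# All six diets: should reproduce the ANOVA table of the example.
six_diets = one_way_anova(list(DIETS.values()))
print(f"SS_Between = {six_diets['ss_between']:.3f}")
print(f"SS_Within  = {six_diets['ss_within']:.3f}")
print(f"F          = {six_diets['F']:.2f}")

# Two diets only (k = 2): F should equal t squared.
two_diets = one_way_anova([DIETS[1], DIETS[4]])
t_stat = pooled_t(DIETS[1], DIETS[4])
print(f"F = {two_diets['F']:.6f}, t^2 = {t_stat ** 2:.6f}")
```

Run on all six diets, the sketch gives the sums of squares 4612.933 and 11586 and F ≈ 4.30 from the example; run on Diets 1 and 4 alone, F and t² agree up to floating-point rounding. The choice of Diets 1 and 4 for the two-sample comparison is arbitrary.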
The treatment combinations can be thought of as arranged in a k-dimensional rectangular block.

[Figure: rectangular block of treatment combinations, with axes for the levels 1, 2, …, a of A, the levels 1, 2, …, b of B, and the levels of C.]

• With the same number n of observations for every treatment combination, as above, the completely randomized design is called balanced.
• If the number of observations per treatment combination is unequal, the design is called unbalanced (resulting in a mathematically more complex analysis and computations).
• If for some of the treatment combinations there are no observations, the design is called incomplete. (In this case it may happen that some of the parameters, main effects and interactions, cannot be estimated.)

Example

In this example we are examining the effect of the level of protein A (High or Low) and the source of protein B (Beef, Cereal, or Pork) on weight gains (grams) in rats.

We have n = 10 test animals randomly assigned to each of k = 6 diets.

The k = 6 diets are the 6 = 3×2 Level-Source combinations:
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork

Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Level of Protein    High    High    High    Low     Low     Low
Source of Protein   Beef    Cereal  Pork    Beef    Cereal  Pork
Diet                1       2       3       4       5       6
                     73      98      94      90     107      49
                    102      74      79      76      95      82
                    118      56      96      90      97      73
                    104     111      98      64      80      86
                     81      95     102      86      98      81
                    107      88     102      51      74      97
                    100      82     108      72      74     106
                     87      77      91      90      67      70
                    117      86     120      95      89      61
                    111      92     105      78      58      82
Mean                100.0    85.9    99.5    79.2    83.9    78.7
Std. Dev.            15.14   15.02   10.92   13.89   15.71   16.55

Treatment combinations

                    Source of Protein
Level of Protein    Beef     Cereal   Pork
High                Diet 1   Diet 2   Diet 3
Low                 Diet 4   Diet 5   Diet 6

Summary Table of Means

                    Source of Protein
Level of Protein    Beef     Cereal   Pork     Overall
High                100.00   85.90    99.50    95.13
Low                  79.20   83.90    78.70    80.60
Overall              89.60   84.90    89.10    87.87

Profiles of the response relative to a factor

A profile is a graphical representation of the effect of a factor on a response variable (dependent variable).

[Figure: Profile of Y for A. Vertical axis: Y; horizontal axis: levels of A (1, 2, 3, …, a).]

• The response Y plotted could be for an individual case or averaged over a group of cases.
• The profile could be for a specific level of another factor or averaged over the levels of another factor.

[Figure: Profiles of Weight Gain for Source and Level of Protein. Vertical axis: weight gain (70 to 110); horizontal axis: Beef, Cereal, Pork; one profile each for High Protein, Low Protein, and Overall.]

[Figure: Profiles of Weight Gain for Source and Level of Protein. Vertical axis: weight gain (70 to 110); horizontal axis: High Protein, Low Protein; one profile each for Beef, Cereal, Pork, and Overall.]

Example – Four factor experiment

Four factors are studied for their effect on Y (luster of paint film).
The four factors are:
1) Film thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (20, 30, 40 or 60 minutes), and
4) Temperature of wash (92 °C or 100 °C)

Two observations of film luster (Y) are taken for each treatment combination.

The data are tabulated below:

                       Regular Dry                   Special Dry
Thickness   Minutes    92 °C        100 °C           92 °C        100 °C
1 mil       20         3.4   3.4    19.6   14.5      2.1   4.0    17.2   13.5
            30         4.1   4.1    17.5   17.0      5.1   8.3    16.0   17.5
            40         4.9   4.2    17.6   15.2      3.8   4.6    13.4   14.3
            60         5.0   4.9    20.9   17.1      3.3   4.3    17.8   13.9
2 mil       20         5.5   3.7    26.6   29.5      4.5   5.9    25.6   29.2
            30         5.7   6.1    31.6   30.2      5.5   8.0    32.6   33.5
            40         5.5   5.6    30.5   30.2      4.5   5.9    22.5   29.8
            60         7.2   6.0    31.4   29.6      5.8   9.9    27.4   29.5

Definition:
A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors: there is no change in the response when you change the levels of the factor, and this is true for all combinations of levels of the other factors.
Otherwise the factor is said to affect the response.

[Figure: Profile of Y for A, one profile per level of B, where A affects the response. Horizontal axis: levels of A (1, 2, 3, …, a).]

[Figure: Profile of Y for A, one profile per level of B, where A has no effect on the response. Horizontal axis: levels of A (1, 2, 3, …, a).]

Definition:
• Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s).
• In that case the profiles of the factor for different levels of the other factor(s) are not parallel.
• Otherwise the factors are said to be additive.
• In that case the profiles of the factor for different levels of the other factor(s) are parallel.

[Figure: Interacting factors A and B. Profiles of Y against the levels of A (1, 2, 3, …, a), one per level of B; the profiles are not parallel.]

[Figure: Additive factors A and B. Profiles of Y against the levels of A (1, 2, 3, …, a), one per level of B; the profiles are parallel.]

• If two (or more) factors interact, each factor affects the response.
• If two (or more) factors are additive, it still remains to be determined whether the factors affect the response.
• In factorial experiments we are interested in determining
  – which factors affect the response, and
  – which groups of factors interact.

The testing in factorial experiments
1. Test the higher order interactions first.
2. If an interaction is present, there is no need to test lower order interactions or main effects involving those factors. All factors in the interaction affect the response, and they interact.
3. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.

Models for Factorial Experiments

The Single Factor Experiment

Situation
• We have t = a treatment combinations.
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of observations from treatment i, for i = 1, 2, 3, …, a.
• Note: we assume that the standard deviation is the same for each population: $\sigma_1 = \sigma_2 = \dots = \sigma_a = \sigma$.

The data
• Assume we have collected data for each of the a treatments.
• Let $y_{i1}, y_{i2}, y_{i3}, \dots, y_{in}$ denote the n observations for treatment i, for i = 1, 2, 3, …, a.

The model

Note:
\[ y_{ij} = \mu_i + (y_{ij} - \mu_i) = \mu_i + \varepsilon_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]
where
• $\varepsilon_{ij} = y_{ij} - \mu_i$ has a $N(0, \sigma^2)$ distribution,
• $\mu = \dfrac{1}{a}\sum_{i=1}^{a} \mu_i$ (overall mean effect),
• $\alpha_i = \mu_i - \mu$ (effect of Factor A).

Note: $\sum_{i=1}^{a} \alpha_i = 0$, by their definition.

Model 1: $y_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean $\mu_i$ and variance $\sigma^2$.

Model 2: $y_{ij} = \mu_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$.

Model 3: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$, and $\sum_{i=1}^{a} \alpha_i = 0$.

The Two Factor Experiment

Situation
• We have t = ab treatment combinations.
• Let $\mu_{ij}$ and $\sigma$ denote the mean and standard deviation of observations from the treatment combination when A = i and B = j, for i = 1, 2, 3, …, a and j = 1, 2, 3, …, b.
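To make the reparametrization from Model 2 to Model 3 of the single factor experiment concrete, here is a small sketch. It is an illustration only: the helper name `effects_from_means` is hypothetical, and the six diet sample means are used as stand-ins for the unknown treatment means $\mu_i$, which is an assumption made purely for the arithmetic.

```python
def effects_from_means(means):
    """Split treatment means mu_i into an overall mean mu and effects alpha_i."""
    a = len(means)
    mu = sum(means) / a                  # mu = (1/a) * sum of mu_i
    alphas = [m - mu for m in means]     # alpha_i = mu_i - mu
    return mu, alphas


# Sample means of the six diets, used here in place of the unknown mu_i.
diet_means = [100.0, 85.9, 99.5, 79.2, 83.9, 78.7]
mu, alphas = effects_from_means(diet_means)

print(f"overall mean mu = {mu:.2f}")
for diet, alpha in enumerate(alphas, start=1):
    print(f"diet {diet}: alpha = {alpha:+.2f}")
print(f"sum of effects = {sum(alphas):.10f}")
```

The effects sum to zero (up to rounding) by construction, and adding each effect back to the overall mean recovers the original treatment mean, which is why Models 2 and 3 describe the same set of distributions.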
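The data
• Assume we have collected data (n observations) for each of the t = ab treatment combinations.
• Let $y_{ij1}, y_{ij2}, y_{ij3}, \dots, y_{ijn}$ denote the n observations for the treatment combination A = i, B = j, for i = 1, 2, 3, …, a and j = 1, 2, 3, …, b.

The model

Note:
\[ y_{ijk} = \mu_{ij} + (y_{ijk} - \mu_{ij}) = \mu_{ij} + \varepsilon_{ijk} \]
\[ = \mu + (\mu_{i\cdot} - \mu) + (\mu_{\cdot j} - \mu) + (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu) + \varepsilon_{ijk} \]
\[ = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \]
where
• $\varepsilon_{ijk} = y_{ijk} - \mu_{ij}$ has a $N(0, \sigma^2)$ distribution,
• $\mu = \dfrac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} \mu_{ij}$, $\quad \mu_{i\cdot} = \dfrac{1}{b}\sum_{j=1}^{b} \mu_{ij}$, $\quad \mu_{\cdot j} = \dfrac{1}{a}\sum_{i=1}^{a} \mu_{ij}$,
• $\alpha_i = \mu_{i\cdot} - \mu$, $\quad \beta_j = \mu_{\cdot j} - \mu$, and $(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu$.

Note: by their definition,
\[ \sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0 \quad \text{and} \quad \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0. \]

Model:
\[ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \]
The terms are, in order: the mean effect $\mu$, the main effects $\alpha_i$ and $\beta_j$, the interaction $(\alpha\beta)_{ij}$, and the error $\varepsilon_{ijk}$. Here $\varepsilon_{ijk}$ (i = 1, …, a; j = 1, …, b; k = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$, and
\[ \sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0. \]

Maximum Likelihood Estimates

For the model $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$ above, the maximum likelihood estimates are
\[ \hat{\mu} = \bar{y}_{\cdot\cdot\cdot} = \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk} \]
\[ \hat{\alpha}_i = \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot} = \frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{\cdot\cdot\cdot} \]
\[ \hat{\beta}_j = \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot} = \frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{\cdot\cdot\cdot} \]
\[ \widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot} = \frac{1}{n}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot} \]
\[ \hat{\sigma}^2 = \frac{1}{nab}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 = \frac{1}{nab}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \left[\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}\right]\right)^2 \]

This is not an unbiased estimator of $\sigma^2$ (as is usually the case when estimating a variance by maximum likelihood). The unbiased estimator results when we divide by ab(n - 1) instead of abn.

The unbiased estimator of $\sigma^2$ is
\[ s^2 = \frac{1}{ab(n-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 = \frac{1}{ab(n-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \left[\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}\right]\right)^2 \]
\[ = \frac{1}{ab(n-1)} SS_{Error} = MS_{Error} \]
where
\[ SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 \]

Testing for Interaction:

We want to test
$H_0$: $(\alpha\beta)_{ij} = 0$ for all i and j,
against
$H_A$: $(\alpha\beta)_{ij} \ne 0$ for at least one i and j.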
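The test statistic is
\[ F = \frac{MS_{AB}}{MS_{Error}} = \frac{\dfrac{1}{(a-1)(b-1)} SS_{AB}}{MS_{Error}} \]
where
\[ SS_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij}^{\,2} = n\sum_{i=1}^{a}\sum_{j=1}^{b} \left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}\right)^2 \]
We reject $H_0$: $(\alpha\beta)_{ij} = 0$ for all i and j, if
\[ F = \frac{MS_{AB}}{MS_{Error}} > F_{\alpha}\big((a-1)(b-1),\ ab(n-1)\big) \]

Testing for the Main Effect of A:

We want to test
$H_0$: $\alpha_i = 0$ for all i,
against
$H_A$: $\alpha_i \ne 0$ for at least one i.

The test statistic is
\[ F = \frac{MS_A}{MS_{Error}} = \frac{\dfrac{1}{a-1} SS_A}{MS_{Error}} \]
where
\[ SS_A = nb\sum_{i=1}^{a} \hat{\alpha}_i^2 = nb\sum_{i=1}^{a} \left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2 \]
We reject $H_0$: $\alpha_i = 0$ for all i, if
\[ F = \frac{MS_A}{MS_{Error}} > F_{\alpha}\big(a-1,\ ab(n-1)\big) \]

Testing for the Main Effect of B:

We want to test
$H_0$: $\beta_j = 0$ for all j,
against
$H_A$: $\beta_j \ne 0$ for at least one j.

The test statistic is
\[ F = \frac{MS_B}{MS_{Error}} = \frac{\dfrac{1}{b-1} SS_B}{MS_{Error}} \]
where
\[ SS_B = na\sum_{j=1}^{b} \hat{\beta}_j^2 = na\sum_{j=1}^{b} \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2 \]
We reject $H_0$: $\beta_j = 0$ for all j, if
\[ F = \frac{MS_B}{MS_{Error}} > F_{\alpha}\big(b-1,\ ab(n-1)\big) \]

The ANOVA Table

Source   S.S.       d.f.             MS = SS/df   F
A        SS_A       a - 1            MS_A         MS_A / MS_Error
B        SS_B       b - 1            MS_B         MS_B / MS_Error
AB       SS_AB      (a - 1)(b - 1)   MS_AB        MS_AB / MS_Error
Error    SS_Error   ab(n - 1)        MS_Error
Total    SS_Total   abn - 1

Computing Formulae

Let
\[ T_{\cdot\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \quad T_{i\cdot\cdot} = \sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \quad T_{\cdot j\cdot} = \sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}, \quad T_{ij\cdot} = \sum_{k=1}^{n} y_{ijk} \]
Then
\[ SS_{Total} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{T_{\cdot\cdot\cdot}^2}{nab} \]
\[ SS_A = \sum_{i=1}^{a} \frac{T_{i\cdot\cdot}^2}{nb} - \frac{T_{\cdot\cdot\cdot}^2}{nab}, \qquad SS_B = \sum_{j=1}^{b} \frac{T_{\cdot j\cdot}^2}{na} - \frac{T_{\cdot\cdot\cdot}^2}{nab} \]
\[ SS_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{T_{ij\cdot}^2}{n} - \sum_{i=1}^{a} \frac{T_{i\cdot\cdot}^2}{nb} - \sum_{j=1}^{b} \frac{T_{\cdot j\cdot}^2}{na} + \frac{T_{\cdot\cdot\cdot}^2}{nab} \]
and
\[ SS_{Error} = SS_{Total} - SS_A - SS_B - SS_{AB} \]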
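The two-factor computing formulae can be tried out on the rat weight-gain data, taking A = level of protein (a = 2), B = source of protein (b = 3) and n = 10 observations per cell. What follows is a minimal sketch in plain Python under the assumption of a balanced design; the names `CELLS` and `two_way_anova` are hypothetical, and the sketch stops at the F-ratios (critical values or p-values would need an F distribution table or a statistics library).

```python
# CELLS[i][j] holds the n = 10 weight gains for level i of A and level j of B.
CELLS = [
    [   # A = High protein
        [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],  # Beef
        [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],        # Cereal
        [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],    # Pork
    ],
    [   # A = Low protein
        [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],         # Beef
        [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],        # Cereal
        [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],        # Pork
    ],
]


def two_way_anova(cells):
    """Balanced two-factor ANOVA table from the totals-based formulae."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    t_ij = [[sum(cell) for cell in row] for row in cells]          # cell totals
    t_i = [sum(row) for row in t_ij]                               # totals for A
    t_j = [sum(t_ij[i][j] for i in range(a)) for j in range(b)]    # totals for B
    t_all = sum(t_i)                                               # grand total
    correction = t_all ** 2 / (n * a * b)
    raw_ss = sum(y * y for row in cells for cell in row for y in cell)
    a_term = sum(t * t for t in t_i) / (n * b)
    b_term = sum(t * t for t in t_j) / (n * a)
    cell_term = sum(t * t for row in t_ij for t in row) / n

    ss_total = raw_ss - correction
    ss = {
        "A": a_term - correction,
        "B": b_term - correction,
        "AB": cell_term - a_term - b_term + correction,
    }
    ss["Error"] = ss_total - ss["A"] - ss["B"] - ss["AB"]
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}

    table = {src: {"ss": ss[src], "df": df[src], "ms": ss[src] / df[src]} for src in ss}
    for src in ("A", "B", "AB"):
        table[src]["F"] = table[src]["ms"] / table["Error"]["ms"]
    table["Total"] = {"ss": ss_total, "df": a * b * n - 1}
    return table


table = two_way_anova(CELLS)
for source, row in table.items():
    f_text = f"{row['F']:.3f}" if "F" in row else ""
    print(f"{source:<6} SS = {row['ss']:10.3f}  df = {row['df']:2d}  F = {f_text}")
```

As a consistency check, SS_A + SS_B + SS_AB from this sketch adds up to the one-way SS_Between of 4612.933 for the six diets, and SS_Error equals the one-way SS_Within of 11586, since the six diets are exactly the six Level-Source cells.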