Statistics 111 - Lecture 19
One-Way Analysis of Variance

ANOVA
• A statistical method for comparing several population means.
• It generalizes the two-sample t-test to more than two groups.
• Data: we obtain a simple random sample (SRS) from each of the k populations.
• Null hypothesis: all the population means are the same.

Example: Workplace safety
• Workers were asked to rate various elements of safety.
• A composite score called the Safety Climate Index was calculated. Its values range from 0 to 100.
• The workers were classified according to their job category as unskilled, skilled, or supervisor.

Job                  n     Mean    SD
Unskilled workers    448   70.42   18.27
Skilled workers      91    71.21   18.83
Supervisors          51    80.51   14.58

• The purpose of ANOVA is to assess whether the observed differences among the sample means are statistically significant.
• Is the variation among the means due to chance, or is it good evidence for a difference among the population means?
• Just looking at the means is not enough!
• We need to look at the standard error, which depends on the standard deviation of each group and on the group sizes.
• Within-group variation: how much the observations vary around their own group mean.
• Between-group variation: how much the group means vary around the overall mean.
• If the between-group variation is large relative to the within-group variation, this is evidence that the population means differ.

The ANOVA model
The one-way ANOVA model assumptions
1. The observations in group i are generated from a normal distribution with mean $\mu_i$.
2. The population standard deviations of the groups are equal.

Estimation
1. Estimating the population means:
$$\hat{\mu}_i = \bar{x}_i$$
2. Estimating the common standard deviation by the pooled estimator:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{(n_1 - 1) + (n_2 - 1) + \dots + (n_k - 1)}$$

The ANOVA model for the worker safety
• Is it reasonable to assume normality in our case?
• Is it reasonable to assume equal standard deviations?
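The pooled estimator can be evaluated directly from the summary table for the three job categories. The following is a minimal sketch using only the n and SD columns given above; the names `groups` and `pooled_sd` are illustrative and not part of the lecture. It also prints the ratio of the largest to the smallest sample standard deviation, a simple informal look at the equal-spread question.

```python
import math

# Summary statistics from the workplace safety table: (n, mean, SD).
groups = {
    "Unskilled workers": (448, 70.42, 18.27),
    "Skilled workers": (91, 71.21, 18.83),
    "Supervisors": (51, 80.51, 14.58),
}


def pooled_sd(stats):
    """Pooled SD: square root of the (n_i - 1)-weighted mean of the group variances."""
    stats = list(stats)
    numerator = sum((n - 1) * sd ** 2 for n, _, sd in stats)
    denominator = sum(n - 1 for n, _, _ in stats)
    return math.sqrt(numerator / denominator)


s_p = pooled_sd(groups.values())

# Ratio of the largest to the smallest sample SD across the groups.
sds = [sd for _, _, sd in groups.values()]
sd_ratio = max(sds) / min(sds)

print(f"pooled SD s_p = {s_p:.2f}")
print(f"largest SD / smallest SD = {sd_ratio:.2f}")
```

Because the unskilled group is by far the largest, the pooled value sits close to that group's own standard deviation.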
Rule for examining standard deviations in ANOVA
• If the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct.

Testing hypotheses in one-way ANOVA
$$H_0: \mu_1 = \mu_2 = \dots = \mu_k$$
$$H_a: \text{not all of the } \mu_i \text{ are equal}$$
• Notice that rejecting the null hypothesis DOES NOT imply that ALL of the means are different from each other. It could be that only two differ and the rest are the same!

The ANOVA table
The information for testing the null hypothesis is organized in an ANOVA table.

Source            Sum of squares   df      Mean square         F
Between groups    SSG              K - 1   MSG = SSG/(K - 1)   MSG/MSE
Within groups     SSE              N - K   MSE = SSE/(N - K)
Total             SST              N - 1

• N is the total number of observations in the data set.
• K is the number of categories (groups).
• SSG is the estimated total variation between the group means.
• SSE is the estimated total variation within the groups.
• SST = SSG + SSE.
• MSG is the estimated average variation between the group means.
• MSE is the estimated average variation within the groups; it equals the pooled variance $s_p^2$.

The ANOVA model for the worker safety
For the worker safety data, the ANOVA table is:

Source            Sum of squares   df    Mean square   F
Between groups    4662.2           2     2331.116      7.137
Within groups     191729.2         587   326.626
Total             196391.4         589

• Under the null hypothesis, the F statistic from the ANOVA follows a distribution called the F distribution.
• The F distribution has two parameters:
1. Numerator degrees of freedom (here K - 1 = 2)
2. Denominator degrees of freedom (here N - K = 587)
• The P-value turns out to be 0.001.
• Conclusion: we reject the null hypothesis of equal means. This implies that at least some of the group means differ from each other.

ANOVA in JMP
JMP!
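
As a software-independent cross-check, the ANOVA table for the worker safety data can be approximately reproduced from the summary statistics alone. The sketch below assumes the standard summary-statistic formulas SSG = Σ nᵢ(x̄ᵢ - x̄)² and SSE = Σ (nᵢ - 1)sᵢ², which the slides do not write out. The helper `f_sf_numerator_df2` is a hypothetical name; it uses a closed form for the upper tail of the F distribution that is valid only when the numerator degrees of freedom equal 2 (that is, K = 3). For other K, a library routine such as `scipy.stats.f.sf` would be the usual choice.

```python
# Summary statistics from the workplace safety table: (n, mean, SD).
groups = [(448, 70.42, 18.27), (91, 71.21, 18.83), (51, 80.51, 14.58)]

N = sum(n for n, _, _ in groups)  # total number of observations
K = len(groups)                   # number of groups
grand_mean = sum(n * mean for n, mean, _ in groups) / N

# Between-group and within-group sums of squares from summary statistics.
ssg = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
sse = sum((n - 1) * sd ** 2 for n, _, sd in groups)

msg = ssg / (K - 1)  # mean square between groups
mse = sse / (N - K)  # mean square within groups (pooled variance)
f_stat = msg / mse


def f_sf_numerator_df2(f, dfd):
    """P(F > f) for an F distribution with numerator df = 2 ONLY.

    Closed form: (1 + 2f/dfd) ** (-dfd/2). Not valid for other numerator df.
    """
    return (1 + 2 * f / dfd) ** (-dfd / 2)


assert K - 1 == 2, "closed-form tail probability requires exactly 3 groups"
p_value = f_sf_numerator_df2(f_stat, N - K)

print(f"SSG = {ssg:.1f}, SSE = {sse:.1f}")
print(f"MSG = {msg:.3f}, MSE = {mse:.3f}")
print(f"F = {f_stat:.3f} on ({K - 1}, {N - K}) df, p-value = {p_value:.4f}")
```

The results differ slightly from the table entries (for example, F comes out near 7.14 rather than 7.137) because the published means and standard deviations are rounded to two decimals. The p-value still rounds to 0.001.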