Chapter 16 Analysis of Variance
Copyright ©2011 Brooks/Cole, Cengage Learning

9.1 Parameters, Statistics, and Statistical Inference
A statistic is a numerical value computed from a sample. Its value may differ for different samples.
e.g. sample mean $\bar{x}$, sample standard deviation $s$, and sample proportion $\hat{p}$.
A parameter is a numerical value associated with a population. It is considered fixed and unchanging.
e.g. population mean $\mu$, population standard deviation $\sigma$, and population proportion $p$.

ANOVA
Analysis of variance: a tool for analyzing how the mean value of a quantitative response variable is affected by one or more categorical explanatory factors.
If one categorical variable: one-way ANOVA
If two categorical variables: two-way ANOVA

16.1 Comparing Means with an ANOVA F-Test
H0: $\mu_1 = \mu_2 = \dots = \mu_k$
Ha: The population means are not all equal.
F-statistic:
$F = \frac{\text{Variation among sample means}}{\text{Natural variation within groups}}$
Variation among sample means is 0 if all k sample means are equal and gets larger the more spread out they are.
If F is large enough, there is evidence that at least one population mean is different from the others, so reject the null hypothesis.
The p-value is found using an F-distribution (more later).

Example 16.1 Seat Location and GPA
Q: Do the best students sit in the front of a classroom?
Data on seat location and GPA for n = 384 students; 88 sit in front, 218 in middle, 78 in back.
Students sitting in the front generally have slightly higher GPAs than others.
H0: $\mu_1 = \mu_2 = \mu_3$
Ha: The three population means are not all equal.
The F-statistic is 6.69 and the p-value is about 0.001.
The p-value is so small that we reject H0 and conclude there are differences among the population means.

Example 16.1 Seat Location and GPA: 95% Confidence Intervals for the 3 Population Means
The interval for “front” does not overlap with the other two intervals, which indicates a significant difference between the mean GPA for front-row sitters and the mean GPA for other students.

Notation for Summary Statistics
k = number of groups
$\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and sample size for the ith sample group
N = total sample size = $n_1 + n_2 + \dots + n_k$

Example 16.2 Seat Location and GPA
Three seat locations: k = 3
$n_1 = 88$, $n_2 = 218$, $n_3 = 78$; N = 88 + 218 + 78 = 384
$\bar{x}_1 = 3.2029$, $\bar{x}_2 = 2.9853$, $\bar{x}_3 = 2.9194$
$s_1 = 0.5491$, $s_2 = 0.5577$, $s_3 = 0.5105$

Assumptions for the F-Test
• Samples are independent random samples.
• Distribution of the response variable is a normal curve within each population.
• Different populations may have different means.
• All populations have the same standard deviation, $\sigma$.
How k = 3 populations might look …

Conditions for Using the F-Test
• The F-statistic can be used if data are not extremely skewed, there are no extreme outliers, and group standard deviations are not markedly different.
• Tests based on the F-statistic are valid for data with skewness or outliers if sample sizes are large.
• A rough criterion for standard deviations is that the largest of the sample standard deviations should not be more than twice as large as the smallest of the sample standard deviations.

Example 16.3 Seat Location and GPA
• The boxplot showed two outliers in the group of students who typically sit in the middle of a classroom, but there are 218 students in that group so these outliers don’t have much influence on the results.
• The standard deviations for the three groups are nearly the same.
• Data do not appear to be skewed.
Necessary conditions for the F-test seem satisfied.
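
The rough criterion for standard deviations can be checked mechanically. The sketch below applies it to the three sample standard deviations from Example 16.2; the function name `sd_ratio_check` and the `max_ratio` parameter are illustrative choices, not part of the slides.

```python
def sd_ratio_check(sample_sds, max_ratio=2.0):
    """Return (ratio, ok), where ratio = largest SD / smallest SD and
    ok is True when the ratio does not exceed max_ratio (rule of thumb: 2)."""
    sds = list(sample_sds)
    ratio = max(sds) / min(sds)
    return ratio, ratio <= max_ratio


# Sample standard deviations from Example 16.2 (front, middle, back).
seat_sds = {"front": 0.5491, "middle": 0.5577, "back": 0.5105}

ratio, ok = sd_ratio_check(seat_sds.values())
print(f"largest / smallest SD = {ratio:.3f}; within rule of thumb: {ok}")
```

The ratio is far below 2 here, which agrees with the statement in Example 16.3 that the three standard deviations are nearly the same. The check says nothing about skewness or outliers, which still need a plot.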

The Family of F-Distributions
• Skewed distributions with a minimum value of 0.
• A specific F-distribution is indicated by two parameters called degrees of freedom: numerator degrees of freedom and denominator degrees of freedom.
• In one-way ANOVA, numerator df = k – 1 and denominator df = N – k.

Determining the p-Value
Statistical software reports the p-value in output.
Table A.4 provides critical values for 1% and 5% significance levels.
• If the F-statistic is greater than the 5% critical value, the p-value < 0.05.
• If the F-statistic is greater than the 1% critical value, the p-value < 0.01.
• If the F-statistic is between the 1% and 5% critical values, the p-value is between 0.01 and 0.05.

16.2 Details of One-Way Analysis of Variance
Fundamental concept: the variation among the data values in the overall sample can be separated into:
(1) differences between group means
(2) natural variation among observations within a group
Total variation = Variation between groups + Variation within groups
The ANOVA table displays this information.
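
As a small illustration of the p-value step, the sketch below computes the degrees of freedom and the upper-tail F probability for the seat-location data (k = 3, N = 384, F = 6.69). It relies on a special case: when the numerator df equals 2, the upper-tail area of the F-distribution has the closed form $(1 + 2F/d_2)^{-d_2/2}$, where $d_2$ is the denominator df. The function name is hypothetical, and for any other numerator df a table or statistical software is still needed.

```python
def f_upper_tail_num_df2(f_stat, denom_df):
    """P(F > f_stat) for an F-distribution with numerator df = 2.

    Valid only for numerator df = 2, where the tail area reduces to
    (1 + 2*F/d2) ** (-d2/2). This is an assumption of the sketch,
    not a general F-distribution routine.
    """
    return (1.0 + 2.0 * f_stat / denom_df) ** (-denom_df / 2.0)


# Seat location and GPA: three groups, 384 students, F = 6.69.
k, N = 3, 384
f_stat = 6.69

num_df = k - 1   # numerator df in one-way ANOVA
den_df = N - k   # denominator df in one-way ANOVA
if num_df != 2:
    raise ValueError("closed form used here requires numerator df = 2")

p_value = f_upper_tail_num_df2(f_stat, den_df)
print(f"df = ({num_df}, {den_df}), p-value = {p_value:.4f}")
```

The result falls below 0.01, so by the table rules above the F-statistic would exceed the 1% critical value for these degrees of freedom.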

Measuring Variation Between Groups
Sum of squares for groups = SS Groups
$\text{SS Groups} = \sum_{\text{groups}} n_i (\bar{x}_i - \bar{x})^2$
Numerator of F-statistic = mean square for groups
$\text{MS Groups} = \frac{\text{SS Groups}}{k - 1}$

Measuring Variation within Groups
Sum of squared errors = SS Error
$\text{SS Error} = \sum_{\text{groups}} (n_i - 1) s_i^2$
Denominator of F-statistic = mean square error
$\text{MSE} = \frac{\text{SS Error}}{N - k}$
Pooled standard deviation: $s_p = \sqrt{\text{MSE}}$

Measuring Total Variation
Total sum of squares = SS Total = SSTO
$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2$
SS Total = SS Groups + SS Error

[Table: General Format of a One-Way ANOVA Table]

Example 16.7 Analysis of Variation among Weight Losses
$\bar{x}_1 = 7$, $\bar{x}_2 = 9$, $\bar{x}_3 = 15$
Program 3 appears to have the highest weight loss overall.

Example 16.8 Analysis of Variation among Weight Losses
$\bar{x}_1 = 7$, $\bar{x}_2 = 9$, $\bar{x}_3 = 15$ and $\bar{x} = 10$
$n_1 = 4$, $n_2 = 3$, $n_3 = 3$ and $N = 10$
$\text{SS Groups} = \sum_{\text{groups}} n_i (\bar{x}_i - \bar{x})^2 = 4(7 - 10)^2 + 3(9 - 10)^2 + 3(15 - 10)^2 = 114$
$\text{MS Groups} = \frac{\text{SS Groups}}{k - 1} = \frac{114}{3 - 1} = 57$
$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2 = (7 - 10)^2 + (9 - 10)^2 + (5 - 10)^2 + (7 - 10)^2 + (9 - 10)^2 + (11 - 10)^2 + (7 - 10)^2 + (15 - 10)^2 + (12 - 10)^2 + (18 - 10)^2 = 148$
$\text{SS Error} = \text{SS Total} - \text{SS Groups} = 148 - 114 = 34$
$\text{MSE} = \frac{\text{SS Error}}{N - k} = \frac{34}{10 - 3} = 4.857$
$F = \frac{\text{MS Groups}}{\text{MSE}} = \frac{57}{4.857} = 11.74$ with 2 and 7 df
“Factor” is used instead of Groups, as the groups (weight-loss programs) form an explanatory factor for the response.
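
The hand computation in Example 16.8 can be reproduced in a few lines. The sketch below is a minimal one-way ANOVA decomposition written directly from the sums-of-squares definitions in this section; the function name `one_way_anova` and the dictionary keys are illustrative, and the three lists are the ten weight losses that appear in the SS Total calculation.

```python
import math


def one_way_anova(groups):
    """Decompose total variation for a list of groups (each a list of numbers).

    Returns a dict with SS Groups, SS Error, SS Total, MS Groups, MSE,
    the F-statistic, its degrees of freedom, and the pooled SD.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)          # N
    k = len(groups)                    # number of groups
    grand_mean = sum(all_values) / n_total

    # SS Groups = sum over groups of n_i * (group mean - grand mean)^2
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SS Total = sum over all values of (x_ij - grand mean)^2
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # SS Error follows from SS Total = SS Groups + SS Error
    ss_error = ss_total - ss_groups

    ms_groups = ss_groups / (k - 1)
    mse = ss_error / (n_total - k)
    return {
        "ss_groups": ss_groups,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "ms_groups": ms_groups,
        "mse": mse,
        "f": ms_groups / mse,
        "df": (k - 1, n_total - k),
        "pooled_sd": math.sqrt(mse),
    }


# Weight losses for the three programs in Example 16.8.
weight_loss = [[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]]
result = one_way_anova(weight_loss)
for name, value in result.items():
    print(name, value)
```

Run on these data, the sketch should reproduce the slide values: SS Groups = 114, SS Total = 148, SS Error = 34, MSE of about 4.857, and F of about 11.74 with 2 and 7 df. Computing SS Error by subtraction mirrors the example; summing $(n_i - 1) s_i^2$ over groups would give the same number.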
Note: Pooled StDev is $s_p = \sqrt{\text{MSE}} = \sqrt{4.86} = 2.204$

Example 16.9 Top Speeds of Supercars
Data: top speeds for six runs on each of five supercars. Kitchens (1998, p. 783)
• F = 25.15 and the p-value is 0.000, so reject the null hypothesis that population mean speeds are the same for all five cars.
• Conditions are satisfied. Data are not skewed and there are no extreme outliers. The largest sample std dev (5.02, Viper) is not more than twice as large as the smallest std dev (2.92, Acura).
• MS Error = 14.5 is an estimate of the variance of top speed for the hypothetical distribution of all possible runs with one car. The estimated standard deviation for each car is 3.81.
• Based on sample means and CIs: Porsche and Ferrari seem to be significantly faster than the other cars.

Computation of 95% Confidence Intervals for the Population Means
In one-way analysis of variance, a confidence interval for a population mean $\mu_i$ is
$\bar{x}_i \pm t^* \frac{s_p}{\sqrt{n_i}}$
where $s_p = \sqrt{\text{MSE}}$ and $t^*$ is such that the confidence level is the probability between $-t^*$ and $t^*$ in a t-distribution with df = N – k.

16.3 Other Methods
When data are skewed or extreme outliers are present, it is better to analyze the median instead of the mean.
H0: Population medians are equal.
Ha: Population medians are not all equal.
Two such tests are:
1. Kruskal-Wallis Test
2. Mood’s Median Test
These are also called nonparametric tests.

Example 16.12 Drinks and Seat Location
Data: seat location and number of alcoholic drinks per week.
Data appear skewed, and sample standard deviations differ.
Students sitting in the back report drinking more.

Example 16.12 Drinks and Seat Location
P = 0.000, which is strong evidence that the population median numbers of drinks per week are not all equal.

Example 16.13 Drinks and Seat Location
P = 0.000, so the null hypothesis of equal population medians can be rejected.
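
Section 16.3 names the Kruskal-Wallis test without showing how it is computed. The sketch below is one way to compute its statistic, using the commonly taught rank-sum form $H = \frac{12}{N(N+1)} \sum R_i^2 / n_i - 3(N+1)$ with a correction for ties; this formula is an assumption brought in for illustration and does not come from the slides. The drinks data are not given, so the ten weight losses from Example 16.8 are reused purely as sample input. The p-value line uses the chi-square approximation with k – 1 df, which for 2 df is exactly $e^{-H/2}$ and is only rough for samples this small.

```python
import math


def average_ranks(values):
    """Rank values 1..N in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # positions i..j are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, df): tie-corrected Kruskal-Wallis statistic and k - 1."""
    values = [x for g in groups for x in g]
    n_total = len(values)
    ranks = average_ranks(values)

    # Sum of R_i^2 / n_i, where R_i is the rank sum of group i.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied runs.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    h /= 1.0 - tie_term / (n_total ** 3 - n_total)
    return h, len(groups) - 1


# Illustration only: weight losses from Example 16.8, not the drinks data.
weight_loss = [[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]]
h_stat, df = kruskal_wallis_h(weight_loss)

# Chi-square upper tail with 2 df is exp(-H/2); other df need a table or software.
p_approx = math.exp(-h_stat / 2.0) if df == 2 else None
print(f"H = {h_stat:.3f}, df = {df}, approximate p-value = {p_approx}")
```

Because the test works on ranks rather than raw values, a single extreme observation changes the statistic far less than it would change an F-statistic, which is the reason the slides recommend it for skewed data or data with outliers.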