ANALYSIS OF VARIANCE (ANOVA)
Chapter 12

Chapter Problem

A study involved children who lived within 7 km of a large ore smelter that emitted lead pollution. The goal is to investigate the possible effect of lead exposure on performance IQ scores.

Informal and subjective comparisons show that the low group has a mean that is somewhat higher than the means of the medium and high groups.

Formal methods: The method of Section 9-3 compares means from samples collected from two different populations. Here we need to compare means from samples collected from three different populations, so we need the method of analysis of variance.

Analysis of Variance

Analysis of variance (ANOVA) is a method for testing the hypothesis that three or more population means are equal. It uses the information in three or more samples to check whether the populations from which they were drawn have equal means.

Hypotheses:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: At least one of the means is different from the others.

ANOVA Requires the F Distribution

1. The F distribution is not symmetric; it is skewed to the right.
2. The values of F can be 0 or positive, but they cannot be negative.
3. There is a different F distribution for each pair of degrees of freedom for the numerator and denominator.

Critical values of F are given in Table A-5.

Content

1. One-way ANOVA
2. Two-way ANOVA

1. One-way ANOVA

One-Way ANOVA

One-way ANOVA is a method of testing the equality of three or more population means by analyzing sample variances. It is used with data categorized with one factor (or treatment), which is a characteristic that allows us to distinguish the different populations from one another.

Requirements

1. The populations have approximately normal distributions.
2. The populations have the same variance $\sigma^2$ (or standard deviation $\sigma$).
3. The samples are simple random samples of quantitative data.
4. The samples are independent of each other.
5. The different samples are from populations that are categorized in only one way.

Example: Lead and Performance IQ Scores

Use the performance IQ scores listed in Table 12-1 and a significance level of α = 0.05 to test the claim that the three samples come from populations with means that are all equal.

Here are summary statistics from the collected data:

[Table: summary statistics for the three samples]

Example: Requirement Check

1. The three samples appear to come from populations that are approximately normal (normal quantile plots).
2. The three samples have standard deviations that are not dramatically different.
3. We can treat the samples as simple random samples.
4. The samples are independent of each other and the IQ scores are not matched in any way.
5. The three samples are categorized according to a single factor: blood lead level (low, medium, high).

Example: Results

The hypotheses are:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one of the means is different from the others.

Results:

[Figure: technology displays of the ANOVA results]

Example: Lead and Performance IQ Scores

The displays all show that the P-value is 0.01951. Because the P-value is less than the significance level of α = 0.05, we can reject the null hypothesis. There is sufficient evidence that the three samples come from populations with means that are not all equal. We cannot conclude formally that any particular mean is different from the others, but it appears that greater blood lead levels are associated with lower performance IQ scores.

Class Talk

Test the claim that the three samples come from populations with means that are all equal.

Relationship Between the P-value and the F Test Statistic

Larger values of the test statistic result in smaller P-values, so the ANOVA test is right-tailed.

F Test Statistic

Assuming that the populations have the same variance $\sigma^2$ (as required for the test), the F test statistic is the ratio of these two estimates of $\sigma^2$:

1. variation between samples (based on variation among sample means)
2. variation within samples (based on the sample variances).

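As a rough illustration of this ratio, the Python sketch below computes F from summary statistics for samples of equal size and then the right-tailed P-value. The numbers (three samples of size n = 4 with means 5.5, 6.0, 6.0 and variances 3.0, 2.0, 2.0) are the ones used in the sample calculation in this chapter. The function names are made up for this sketch. The P-value formula is a closed form that holds only when the numerator degrees of freedom equal 2 (that is, k = 3 samples); for other values of k, Table A-5 or a statistics package would be needed.

```python
from statistics import mean, variance


def one_way_f_equal_n(sample_means, sample_variances, n):
    """F = n * s_xbar^2 / s_p^2 for k samples that all have size n."""
    # Variance between samples: n times the (sample) variance of the sample means.
    between = n * variance(sample_means)
    # Variance within samples: pooled variance, i.e. the mean of the sample variances.
    within = mean(sample_variances)
    return between / within


def right_tail_p_numerator_df_2(f_stat, denom_df):
    """P(F > f_stat) for an F distribution with numerator df = 2.

    Assumption: this closed form is valid ONLY for numerator df = 2 (k = 3 samples).
    """
    return (denom_df / (denom_df + 2.0 * f_stat)) ** (denom_df / 2.0)


# Summary statistics from the chapter's sample calculation.
means = [5.5, 6.0, 6.0]
variances = [3.0, 2.0, 2.0]
n = 4
k = len(means)

f_stat = one_way_f_equal_n(means, variances, n)
denom_df = k * (n - 1)  # denominator df = k(n - 1) = 9
p_value = right_tail_p_numerator_df_2(f_stat, denom_df)

print(f"F = {f_stat:.4f}")        # F = 0.1429
print(f"P-value = {p_value:.4f}")  # P-value = 0.8688
```

Because the test is right-tailed, feeding a larger F into `right_tail_p_numerator_df_2` with the same denominator degrees of freedom always returns a smaller P-value.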
F Test Statistic: Calculations with Equal Sample Sizes n

$$F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{n s_{\bar{x}}^2}{s_p^2}$$

Variance between samples = $n s_{\bar{x}}^2$, where $s_{\bar{x}}^2$ = variance of the sample means.
Variance within samples = $s_p^2$, where $s_p^2$ = pooled variance (or the mean of the sample variances).

Calculations with Equal Sample Sizes n

Variance between samples: the variance of the sample means. It measures how widely the sample means vary around the grand mean.

Variance within samples: the mean of the sample variances. It reflects the residual variation that arises by chance within each sample; the variances measured within the individual samples together represent the residual.

Sample Calculations

$$s_{\bar{x}}^2 = \frac{\sum (\bar{x} - \bar{\bar{x}})^2}{k - 1} = \frac{(5.5 - 5.83)^2 + (6.0 - 5.83)^2 + (6.0 - 5.83)^2}{3 - 1} = 0.0833$$

$$n s_{\bar{x}}^2 = 4(0.0833) = 0.3332$$

$$s_p^2 = \frac{3.0 + 2.0 + 2.0}{3} = 2.333$$

$$F = \frac{n s_{\bar{x}}^2}{s_p^2} = \frac{0.3332}{2.3333} = 0.1428$$

Critical Value of F

The test is right-tailed. With k samples of the same size n, the degrees of freedom are:

numerator df = k - 1
denominator df = k(n - 1)

where
k = number of samples
n = sample size

ANOVA: Single Factor

Summary
Factor level    Count    Sum    Mean    Variance
Sample 1        4        22     5.5     3
Sample 2        4        24     6       2
Sample 3        4        24     6       2

ANOVA
Source of variation    SS         df    MS          F          P-value     F crit
Treatment (Between)    0.66667    2     0.333333    0.14286    0.868805    4.26
Residual (Within)      21         9     2.333333

FYI: Calculations with Unequal Sample Sizes

$$F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{\left[\sum n_i (\bar{x}_i - \bar{\bar{x}})^2\right] / (k - 1)}{\left[\sum (n_i - 1) s_i^2\right] / \left[\sum (n_i - 1)\right]}$$

where
$\bar{\bar{x}}$ = mean of all sample scores combined
$k$ = number of population means being compared
$n_i$ = number of values in the ith sample
$\bar{x}_i$ = mean of the values in the ith sample
$s_i^2$ = variance of the values in the ith sample

2. Two-way ANOVA

Key Concept

We introduce the method of two-way analysis of variance, which is used with data partitioned into categories according to two factors. The methods of this section require that we begin by testing for an interaction between the two factors. Then we test whether the row or column factors have effects.

Example

The data in the table are categorized with two factors:

1. Sex: Male or Female
2. Blood Lead Level: Low, Medium, or High

The subcategories are called cells, and the response variable is IQ score.

Example

There is an interaction between two factors if the effect of one of the factors changes for different categories of the other factor.

How many cells? 2 × 3 = 6

Example

Let's explore the IQ data in the table by calculating the mean for each cell and constructing an interaction graph.

Example

An interaction effect is suggested if the line segments are far from being parallel. No interaction effect is suggested if the line segments are approximately parallel. For the IQ scores (the figure), it appears there is an interaction effect: females with high lead exposure appear to have lower IQ scores, while males with high lead exposure appear to have high IQ scores.

Requirements

1. For each cell, the sample values come from a population with a distribution that is approximately normal.
2. The populations have the same variance $\sigma^2$.
3. The samples are simple random samples.
4. The samples are independent of each other.
5. The sample values are categorized two ways.
6. All of the cells have the same number of sample values (a balanced design; this section does not include methods for a design that is not balanced).

Procedure for Two-way ANOVA

Step 1: Interaction Effect. Test the null hypothesis that there is no interaction between the two factors.

Step 2: Row/Column Effects. If we conclude there is no interaction effect, proceed with these two hypothesis tests:

Row Factor: $H_0$: There are no effects from the row factor.
Column Factor: $H_0$: There are no effects from the column factor.

All tests use the F distribution.

Result, Step 1: Interaction Effect

$H_0$: There is no interaction between the two factors.

$$F = \frac{MS(\text{interaction})}{MS(\text{error})} = \frac{105.7333}{245.2667} = 0.4311$$

Example: Continued

Step 1: Test that there is no interaction between the two factors. The test statistic is F = 0.43 and the P-value is 0.655, so we fail to reject the null hypothesis. It does not appear that the performance IQ scores are affected by an interaction between sex and blood lead level.

There does not appear to be an interaction effect, so we proceed to test for row and column effects.

Result, Step 2: Row/Column Effects

Row Factor:

$$F = \frac{MS(\text{sex})}{MS(\text{error})} = \frac{17.6333}{245.2667} = 0.0719$$

Column Factor:

$$F = \frac{MS(\text{lead level})}{MS(\text{error})} = \frac{24.4}{245.2667} = 0.0995$$

Example: Continued

Step 2: Hypothesis tests

$H_0$: There are no effects from the row factor (sex).
$H_0$: There are no effects from the column factor (blood lead level).

For the row factor, F = 0.0719 and the P-value is 0.791. We fail to reject the null hypothesis; there is no evidence that IQ scores are affected by the sex of the subject.

For the column factor, F = 0.0995 and the P-value is 0.906. We fail to reject the null hypothesis; there is no evidence that IQ scores are affected by the level of lead exposure.

Example: Continued

Interpretation: Based on the sample data, we conclude that IQ scores do not appear to be affected by sex or blood lead level.

Class Talk
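
The two-step procedure can be sketched in a few lines of Python. This is only an illustration: it takes the mean squares and P-values reported in the example as given, recomputes each F ratio as MS(effect) / MS(error), and applies the decision rule. The significance level of 0.05 is an assumption carried over from the one-way example, and the names `f_ratio`, `reported`, and `results` are invented for this sketch.

```python
def f_ratio(ms_effect, ms_error):
    """Two-way ANOVA test statistic: F = MS(effect) / MS(error)."""
    return ms_effect / ms_error


MS_ERROR = 245.2667  # MS(error) reported in the example
ALPHA = 0.05         # assumption: same significance level as the one-way example

# Reported mean square and P-value for each test, taken from the example.
reported = {
    "interaction": (105.7333, 0.655),
    "sex (row factor)": (17.6333, 0.791),
    "blood lead level (column factor)": (24.4, 0.906),
}

results = {}
for effect, (ms_effect, p_value) in reported.items():
    f_stat = f_ratio(ms_effect, MS_ERROR)
    decision = "reject H0" if p_value <= ALPHA else "fail to reject H0"
    results[effect] = (f_stat, decision)
    print(f"{effect}: F = {f_stat:.4f}, P-value = {p_value}, {decision}")

# Step 2 (row and column tests) is interpreted only when Step 1 finds no interaction.
no_interaction = results["interaction"][1] == "fail to reject H0"
print("Proceed to row/column tests:", no_interaction)
```

All three F ratios are far below 1, which matches the large P-values: the variation attributed to each effect is small compared with the error variation.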