Six Sigma Greenbelt Training – Hypothesis Testing
Dave Merritt, 12/6/16

Hypothesis Testing Flowchart

Variable Data
- Variance Tests
  - Chi Square Test: compares a distribution to a specification or target
  - Test for Equal Variances: compares the variances of 2 or more distributions
  - F Test: compares the variances of 2 distributions
- Mean Tests
  - 1 Sample t Test: compares one group mean to a target
  - 2 Sample t Test: compares the means of two distributions
  - One-way ANOVA: compares the means of two or more distributions
  - Paired t Test: compares the paired differences of two distributions

Attribute Data
- Contingency Table: evaluates if methods or classifications are independent

Contingency Tables

This tool is used to test the relationship between two sources of variation. In statistics, the relationship can go two ways:
- Independent: there is no relationship at all (two different populations).
- Dependent: there is a common relationship between them (the same population).
The tool tells us which of these relationships is statistically valid. It is important to note that it will not tell us whether the data are good or bad, only whether there is a difference.
(Flowchart branch: Attribute Data – Contingency Table.)

Contingency Table

The statistic to use is chi-square:
  Chi-Sq = Σ (Oij – Eij)² / Eij,   degrees of freedom = (r – 1)(c – 1)
where:
  O = the observed value
  E = the expected value (Minitab will calculate this for you)
  r = number of rows
  c = number of columns

[Figure: chi-square distribution curves for 2, 4, and 8 degrees of freedom. The shape of the chi-square distribution changes with the number of degrees of freedom, and the tail probability decreases as chi-square increases.]

Contingency Tables – Example

To illustrate the use and analysis of contingency tables, let's use an evaluation of vehicle color preference vs. vehicle type. Is the color preference dependent on the type?

              Red   Green
  Sports Car  201     45
  Sedan       183     58
  Truck       178     64

Solution:
Step 1 – Null hypothesis Ho: preference for vehicle color is independent of vehicle type.
Step 2 – Alternate hypothesis Ha: color preference is not independent of vehicle type.
Step 3 – To determine the critical value of the chi-square, we need to know df, the number of degrees of freedom involved.
Step 4 – We will use Minitab to find both df and the chi-square statistic, and to calculate the expected values.

Contingency Tables – Minitab Set-up

Let's go to Minitab (HYPTEST.mpj) to set up the chi-square test / contingency table for this color vs. vehicle type example.
1) Set up the following table in Minitab:
     C1 Color   C2 Sports Car   C3 Sedan   C4 Truck
     Red             201           183        178
     Green            45            58         64
2) Minitab: Stat – Tables – Chi-Square Test for Association
3) Select columns C2, C3, C4
4) Select OK

Contingency Tables – Minitab Results

Chi-Square Test (expected counts are printed below observed counts):

              Sports Car   Sedan    Truck    Total
  1 (Red)        201        183      178      562
                189.65     185.79   186.56
  2 (Green)       45         58       64      167
                 56.35      55.21    55.44
  Total          246        241      242      729

  Chi-Sq = 0.680 + 0.042 + 0.393 + 2.288 + 0.141 + 1.322 = 4.866
  DF = 2, P-Value = 0.088
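Before walking through where each of these numbers comes from, here is a minimal sketch of the same test outside Minitab, assuming Python with SciPy is available. The observed counts are the ones from the example; the critical-value line at the end mirrors the inverse cumulative probability step shown later.

```python
# A minimal sketch of the contingency-table (chi-square) test using SciPy.
# Assumption: SciPy is available; the course itself performs this in Minitab.
from scipy.stats import chi2_contingency, chi2

# Observed counts from the example: rows = color (Red, Green),
# columns = Sports Car, Sedan, Truck
observed = [[201, 183, 178],
            [45,  58,  64]]

stat, p_value, dof, expected = chi2_contingency(observed)
print("Expected counts:\n", expected)                 # ~189.65, 185.79, 186.56 / 56.35, 55.21, 55.44
print("Chi-Sq =", round(stat, 3))                     # ~4.866
print("DF =", dof, " P-Value =", round(p_value, 3))   # 2, ~0.088

# Critical value for alpha = 0.05 (inverse cumulative probability at 0.95)
critical = chi2.ppf(0.95, dof)                        # ~5.9915
print("Reject H0 (variables dependent)?", stat > critical)
```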
Contingency Tables – Detail

Reading the Minitab output above:
- Expected count: the product of the ith row total and the jth column total, divided by the total of all cells. For the first cell, (562 × 246) / 729 = 189.65.
- Column total (observed data): 201 + 45 = 246.
- Row total (observed data): 201 + 183 + 178 = 562.
- Grand total: 562 + 167 = 729.
- Contribution to Chi-Sq: [(Obs – Exp)²] / Exp for each cell; Total Chi-Sq = 0.680 + 0.042 + 0.393 + 2.288 + 0.141 + 1.322 = 4.866.
- Degrees of freedom: (rows – 1) × (columns – 1) = (2 – 1) × (3 – 1) = 2.
- P-Value: the probability that you would have obtained the observed counts if the variables were independent of each other. If this value is less than or equal to your alpha level, you can say the variables are dependent.

Contingency Table – Critical Value and Conclusion

5) Determine the degrees of freedom: (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2.
6) Determine the critical value using Minitab: Calc – Probability Distributions – Chi Square – Inverse Cumulative Probability; degrees of freedom = 2; input constant = 0.95 (1 – alpha = 1.0 – 0.05 = 0.95). Or, use the tables in the appendix.
7) OK. Inverse Cumulative Distribution Function, Chi-Square with 2 DF: P(X <= x) = 0.9500 at x = 5.9915.
8) Conclusion: the chi-square value of 4.866 is less than the critical value of 5.9915, and the p-value of 0.088 is greater than 0.05. Accept the null hypothesis – color preference and vehicle type are independent.

[Figure: chi-square distribution with df = 2 and alpha = 0.05. The test statistic Chi-Sq = 4.866 falls below the critical value of 5.99, so the null hypothesis is accepted.]

(Flowchart branch: Variable Data – Variance Tests: Test for Equal Variances, F Test.)

Test for Equal Variances

When comparing two distributions using variable data, we must first decide whether there is a statistical difference in the variances. This is important since it affects the formula used to perform the test on the means. We also need to know whether the distributions are normally distributed, since this affects the type of homogeneity of variance test used. Our first step will be to plot the data using the normal probability plot in Minitab (Stat – Basic Statistics – Normality Test). The results of this step determine how you proceed.

If the normal probability plot indicates we are dealing with normally distributed data, we can use one of two tests:
- F test: only for use with two distributions. (Minitab performs this test under Test for Equal Variances.)
- Bartlett's test: can be used for two or more distributions. (Minitab performs this test under Test for Equal Variances.)
If the data are not normally distributed, there is only one option:
- Levene's test: may be used on two or more distributions. (Minitab performs this test under Test for Equal Variances.)

When performing the homogeneity of variance test we will use Minitab:
- Stat – ANOVA – Test for Equal Variances
- Enter data in the stacked format
- Click OK
The test indicates a significant difference if the calculated p-value is less than the specified alpha value. (95% confidence has an alpha of 0.05, or 5%; therefore a calculated p-value of less than 0.05 indicates a significant difference between the two distributions.) It is important to note that the test only indicates a significant difference; it cannot determine goodness or badness. Your knowledge of the process must be used to evaluate this condition.

Test for Equal Variances – F Value vs. Critical Value

The calculated F value should be compared to the critical F value. If the calculated value is larger than the critical value, there is a significant difference between the two distributions. If it is the same or smaller, then statistically there is no difference between the two distributions; they represent the same population. There are two important issues to note:
1. This comparison indicates a difference when the number is larger, because it uses the F value itself. The F, Bartlett's, and Levene's results reported by Minitab's Test for Equal Variances are probability statistics (p-values), so there a smaller number indicates significance (< 0.05).
2. These tests indicate only a difference, not goodness or badness.
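The Minitab output in the examples that follow reports p-values (Levene's test and the multiple-comparisons method) rather than the raw F ratio. For the two-distribution case, the classical F comparison can also be computed directly; the sketch below does this in Python with SciPy, using two made-up placeholder samples rather than data from the course files.

```python
# A hedged sketch of the classical F test for two variances. SciPy has no
# single-call two-sample variance F test, so the ratio is computed by hand.
# The two samples below are hypothetical placeholder data.
import numpy as np
from scipy.stats import f

sample_a = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.3])   # hypothetical
sample_b = np.array([10.0, 10.9, 9.2, 10.6, 9.4, 10.8])   # hypothetical

F = sample_a.var(ddof=1) / sample_b.var(ddof=1)            # ratio of sample variances
df1, df2 = len(sample_a) - 1, len(sample_b) - 1

# Two-sided p-value: take the tail probability for the observed ratio and double it.
p_one_sided = f.sf(F, df1, df2) if F > 1 else f.cdf(F, df1, df2)
p_value = min(1.0, 2 * p_one_sided)

critical = f.ppf(0.975, df1, df2)   # upper critical value for a two-sided 0.05 test
print(f"F = {F:.3f}, p = {p_value:.3f}, upper critical value = {critical:.3f}")
# F larger than the critical value (equivalently p < 0.05) indicates a
# significant difference between the two variances.
```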
Test for Equal Variances – Example 1

Example using data in file HYPTEST.MTW. This data is for two different reactors:

  Reactor1   Reactor2
    89.7       84.7
    81.4       86.1
    84.5       83.2
    84.8       91.9
    87.3       86.3
    79.7       79.3
    85.1       82.6
    81.7       89.1
    83.7       83.7
    84.5       88.5

This is unstacked data and must be stacked to use Minitab's Test for Equal Variances.

Stack command: Minitab – Data – Stack – Columns; stack columns C1 and C2; store the stacked data in Yield; store the subscripts in Reactor; OK. The yields now sit in one column (C3, Yield) with the reactor number beside them (C4, Reactor): the ten Reactor 1 yields are labelled 1 and the ten Reactor 2 yields are labelled 2. Minitab automatically assigns sequential subscript values to each column when it is stacked, so we now have our output variable in C3 and our input variable in C4.

Next, test the data for normality: Minitab – Stat – Basic Statistics – Normality Test, Variable: Yield. The p-value is > 0.05, so the data are normal.

Now perform the Test for Equal Variances: Stat – ANOVA – Test for Equal Variances; Response: Yield, Factor: Reactor, 95% confidence.

Test for Equal Variances: Yield versus Reactor
  Method
    Null hypothesis         All variances are equal
    Alternative hypothesis  At least one variance is different
    Significance level      α = 0.05
  95% Bonferroni Confidence Intervals for Standard Deviations
    Reactor    N    StDev     CI
    Reactor1  10   2.90180   (1.65970, 6.53916)
    Reactor2  10   3.65033   (2.18733, 7.85175)
    Individual confidence level = 97.5%
  Tests
    Method                 Test Statistic   P-Value
    Multiple comparisons        0.48         0.487
    Levene                      0.78         0.390

The p-value is > 0.05, so accept the null hypothesis: there is no difference in the variances.

[Figure: multiple comparison intervals for the standard deviations, α = 0.05. The Reactor 1 and Reactor 2 intervals overlap, so the corresponding standard deviations are not significantly different (multiple comparisons p-value 0.487, Levene's test p-value 0.390).]
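For reference, here is a sketch of Example 1 outside Minitab, assuming SciPy is available. The reactor yields are the values listed above; Shapiro-Wilk is used as a stand-in for Minitab's Anderson-Darling normality test because it returns a p-value directly.

```python
# A sketch of Example 1 using SciPy (assumption: SciPy is available; the
# course performs these steps in Minitab).
from scipy.stats import shapiro, bartlett, levene

reactor1 = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
reactor2 = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

# Normality check (p > 0.05 -> treat as normal); Shapiro-Wilk stands in for
# Minitab's Anderson-Darling test.
print("Normality p-values:", shapiro(reactor1).pvalue, shapiro(reactor2).pvalue)

# The data look normal, so Bartlett's test applies; Levene's (median-centered,
# the statistic Minitab reports) is shown for comparison and should land
# close to the 0.390 reported above.
print("Bartlett p =", bartlett(reactor1, reactor2).pvalue)
print("Levene   p =", levene(reactor1, reactor2, center='median').pvalue)
# p > 0.05 in both cases -> no evidence that the variances differ
```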
Test for Equal Variances – Example 2

Example 2 uses data in file HYPTEST.MTW. This data is for two different trimmers: the example compares the variances between the output of two different trim machines, and the response is the trimmed OD dimension.

1) Stack the data: Minitab – Manip – Stack/Unstack; stack columns C7 and C8; store the stacked data in OD; store the subscripts in Trimmer; OK.
2) Test the data for normality: Minitab – Stat – Basic Statistics – Normality Test, Variable: OD; OK. The p-value is < 0.05, so the data are not normal.

Now perform the homogeneity of variance test: Stat – ANOVA – Test for Equal Variances; Response: OD, Factor: Trimmer, 95% confidence.

Test for Equal Variances: OD versus Trimmer
  Method
    Null hypothesis         All variances are equal
    Alternative hypothesis  At least one variance is different
    Significance level      α = 0.05
  95% Bonferroni Confidence Intervals for Standard Deviations
    Trimmer     N    StDev       CI
    Trimmer 1  20   0.0099852   (0.0074127, 0.015148)
    Trimmer 2  20   0.0842441   (0.0654102, 0.122195)
    Individual confidence level = 97.5%
  Tests
    Method                 Test Statistic   P-Value
    Multiple comparisons       56.31         0.000
    Levene                     22.88         0.000

The p-value is < 0.05, so reject the null hypothesis: there is a difference in the variances.

[Figure: multiple comparison intervals for the standard deviations, α = 0.05. The Trimmer 1 and Trimmer 2 intervals do not overlap, so the corresponding standard deviations are significantly different (multiple comparisons p-value 0.000, Levene's test p-value 0.000).]

(Flowchart branch: Variable Data – Mean Tests: 1 Sample t Test, 2 Sample t Test, One-way ANOVA, Paired t Test.)

T-tests

- Single mean compared to a target value
- Comparison of two independent group means
- Comparison of paired data from two groups

Single Mean Compared to Target – Example 1

Example using file Bhh73.mtw. The example includes 10 measures of specific gravity from an alloy. The question is: is the mean of the sample representative of a target value of 84.12?

The hypotheses: Ho: μ = 84.12; Ha: μ ≠ 84.12. Ho can be rejected if p < 0.05.

Minitab – Stat – Basic Statistics – 1-Sample t; Variables: C2 (the sample); Hypothesized mean: 84.12; Alternative: not equal; OK.

Test of μ = 84.12 vs ≠ 84.12
  Variable    N   Mean   StDev   SE Mean   95% CI            T      P
  Reactor2   10  85.54    3.65     1.15    (82.93, 88.15)   1.23  0.250

Notice that 84.12 is included in this interval. The p-value is > 0.05, so accept the null hypothesis: the sample mean is representative of the target.
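Here is a sketch of the same one-sample test in Python with SciPy. The Minitab output above is labelled Reactor2 and matches the Reactor 2 yields from the earlier equal-variance example, so those values are reused here as the sample; the hypothesized target is 84.12.

```python
# A sketch of the one-sample t test against a target (assumption: SciPy is
# available; the sample values are the Reactor 2 yields used above).
from scipy.stats import ttest_1samp

sample = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]
result = ttest_1samp(sample, popmean=84.12)           # two-sided by default
print(f"T = {result.statistic:.2f}, P = {result.pvalue:.3f}")   # ~1.23, ~0.250
# p > 0.05 -> accept H0: the sample mean is consistent with the 84.12 target
```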
Comparison of Two Independent Sample Means – Example 1

Now we will make a comparison between two group means. This is really our first experimental design: one attribute factor (input) and one quantitative output. We'll use data file Bhh77.mtw and change the scenario to comparing Reactor 1 to Reactor 2 on chemical yield. There are two ways to enter the data:
- Enter the Reactor 1 yields in C1 and the Reactor 2 yields in C2. This is the "unstacked approach".
- Enter all yields in C1 and the reactor number in C2. Minitab calls C2 a subscript variable; this is the "stacked approach".
The second method is preferred; we always want one column for each input variable and one column for each output variable. Let's start with the unstacked data and then use the stacked data.

Two Sample t Test, Unstacked – Example 1

Minitab – Stat – Basic Statistics – 2-Sample t; samples in different columns; First: Reactor1; Second: Reactor2; Alternative: not equal; OK.

Two Sample T-Test and Confidence Interval
  Two sample T for Reactor1 vs Reactor2
              N   Mean   StDev   SE Mean
  Reactor1   10  84.24    2.90     0.92
  Reactor2   10  85.54    3.65     1.2
  95% CI for mu Reactor1 - mu Reactor2: (-4.40, 1.8)
  T-Test mu Reactor1 = mu Reactor2 (vs not =): T = -0.88, P = 0.39, DF = 18
  Both use Pooled StDev = 3.30

The p-value is > 0.05, so accept the null hypothesis: the reactors appear to have the same yield.

Two Sample t Test, Stacked – Example 1

Minitab – Stat – Basic Statistics – 2-Sample t; samples in one column; Samples: Yield; Subscripts: Reactor; Alternative: not equal; OK.

Two Sample T-Test and Confidence Interval
  Two sample T for Yield
  Reactor    N   Mean   StDev   SE Mean
  1         10  84.24    2.90     0.92
  2         10  85.54    3.65     1.2
  95% CI for mu (1) - mu (2): (-4.40, 1.8)
  T-Test mu (1) = mu (2) (vs not =): T = -0.88, P = 0.39, DF = 18
  Both use Pooled StDev = 3.30

Two Sample t Test – Example 2

We'll use data file Bhh77.mtw. Let's change the scenario to comparing the means of a molding process before and after a process change to eliminate nonfills. The response is % nonfill scrap per heat. Use the stacked and unstacked methods.

Two Sample t Test, Unstacked – Example 2

Minitab – Stat – Basic Statistics – 2-Sample t; samples in different columns; First: Before; Second: After; Alternative: not equal; OK.

Two Sample T-Test and Confidence Interval
  Two sample T for Before vs After
            N     Mean       StDev      SE Mean
  Before   100  0.05011    0.00109     0.00011
  After    100  0.004983   0.000103    0.000010
  95% CI for mu Before - mu After: (0.04491, 0.045347)
  T-Test mu Before = mu After (vs not =): T = 410.87, P = 0.0000, DF = 198
  Both use Pooled StDev = 0.000777

The p-value is < 0.05, so reject the null hypothesis: the process change appears to have reduced the nonfills.

Two Sample t Test, Stacked – Example 2

Minitab – Stat – Basic Statistics – 2-Sample t; samples in one column; Samples: %Scrap; Subscripts: B/F; Alternative: not equal; OK.

Two Sample T-Test and Confidence Interval
  Two sample T for %Scrap
  B/F    N     Mean       StDev      SE Mean
  1     100  0.05011    0.00109     0.00011
  2     100  0.004983   0.000103    0.000010
  95% CI for mu (1) - mu (2): (0.04491, 0.045347)
  T-Test mu (1) = mu (2) (vs not =): T = 410.87, P = 0.0000, DF = 198
  Both use Pooled StDev = 0.000777
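Both two-sample examples can be cross-checked the same way. Here is a sketch for Example 1 (the reactor yields), assuming SciPy is available; with equal variances assumed, it is the same pooled t test that Minitab reports above.

```python
# A sketch of the two-sample (pooled) t test on the reactor yields
# (assumption: SciPy is available; the course performs this in Minitab).
from scipy.stats import ttest_ind

reactor1 = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
reactor2 = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

result = ttest_ind(reactor1, reactor2, equal_var=True)  # pooled-variance t test
print(f"T = {result.statistic:.2f}, P = {result.pvalue:.2f}")   # ~-0.88, ~0.39
# p > 0.05 -> accept H0: the reactors appear to have the same mean yield
```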
Paired Comparisons

This is a case where we can pair observations. A good example is comparing measurements made by an on-line system to measurements made in a lab on the same samples. It can also be used in measurement system studies to see whether operators get the same mean value across the same set of samples. Let's look at the example in file Paircomp.mtw: we are testing shoe material with a sample of 10 boys, and each boy wears one shoe made from each material.

Paired Comparisons – Example 1 (Paircomp.mtw)

Material A: 13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3
Material B: 14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6

  Boy   Mat A   Mat B   Delta (d)
   1    13.2    14.0     -0.8
   2     8.2     8.8     -0.6
   3    10.9    11.2     -0.3
   4    14.3    14.2      0.1
   5    10.7    11.8     -1.1
   6     6.6     6.4      0.2
   7     9.5     9.8     -0.3
   8    10.8    11.3     -0.5
   9     8.8     9.3     -0.5
  10    13.3    13.6     -0.3

Our new output variable is Delta (d), where d = x(Material A) – x(Material B).

Hypotheses: Ho: the mean delta d = 0; Ha: d ≠ 0.

Minitab – Stat – Basic Statistics – Paired t; Sample 1: Material A; Sample 2: Material B; OK.

Paired T for Material A - Material B
               N    Mean    StDev   SE Mean
  Material A  10   10.630   2.451    0.775
  Material B  10   11.040   2.518    0.796
  Difference  10   -0.410   0.387    0.122
  95% CI for mean difference: (-0.687, -0.133)
  T-Test of mean difference = 0 (vs ≠ 0): T-Value = -3.35, P-Value = 0.009

The p-value is < 0.05, so reject the null hypothesis: the delta d does not equal 0.

The same result can be obtained with a 1-sample t on the deltas. Minitab – Stat – Basic Statistics – 1-Sample t; Variable: Delta; Test mean: 0.0; OK.

T-Test of the Mean
  Test of mu = 0.000 vs mu not = 0.000
  Variable   N    Mean    StDev   SE Mean     T      P
  Delta     10   -0.410   0.387    0.122    -3.35  0.009

The p-value is < 0.05, so reject the null hypothesis: the delta d does not equal 0.

Doing the "Wrong" Analysis

We'll use the same data and analyze it with the two independent sample comparison. Minitab – Stat – Basic Statistics – 2-Sample t; samples in different columns; First: Material A; Second: Material B; Alternative: not equal; Assume Equal Variances; OK.

Two Sample T-Test and Confidence Interval
  Two sample T for Material A vs Material B
               N    Mean   StDev   SE Mean
  Material A  10   10.63    2.45    0.78
  Material B  10   11.04    2.52    0.80
  95% CI for mu Material A - mu Material B: (-2.74, 1.92)
  T-Test mu Material A = mu Material B (vs not =): T = -0.37, P = 0.72, DF = 18
  Both use Pooled StDev = 2.49

Why is one analysis significant and the other not? Performing the 2-sample t test compares the two distributions without regard to pairing. The paired test is designed to compare the wear of the materials under equal conditions; that is why each boy wears one shoe of each material. However, each boy causes a different amount of wear on his pair of shoes. For example, Boy #1 has worn his shoes twice as much as Boy #6. The 2-sample t test did not detect a significant difference (p-value = 0.72 > 0.05).

Hypothesis Testing Flowchart (summary): Variable Data – Variance Tests (Chi Square Test, Test for Equal Variances / Homogeneity of Variance, F Test) and Mean Tests (1 Sample t Test, 2 Sample t Test, One-way ANOVA, Paired t Test); Attribute Data – Contingency Table.
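As a wrap-up of the paired-versus-unpaired discussion, here is a sketch, assuming SciPy is available, that runs both analyses on the shoe-material data; only the paired test flags the difference.

```python
# A closing sketch contrasting the paired and (wrong) unpaired analyses of
# the shoe-material data (assumption: SciPy is available).
from scipy.stats import ttest_rel, ttest_ind

mat_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
mat_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]

paired = ttest_rel(mat_a, mat_b)                  # works on the per-boy deltas
pooled = ttest_ind(mat_a, mat_b, equal_var=True)  # ignores the pairing

print(f"Paired t test:   T = {paired.statistic:.2f}, P = {paired.pvalue:.3f}")  # ~-3.35, ~0.009
print(f"2-sample t test: T = {pooled.statistic:.2f}, P = {pooled.pvalue:.2f}")  # ~-0.37, ~0.72
# Only the paired analysis detects the material difference, because the
# boy-to-boy variation is removed by differencing.
```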