Stat 112: Lecture 21 Notes
• Model Building (Brief Discussion)
• Chapter 9.1: One-way Analysis of Variance.
• Homework 6 is due Friday, Dec. 1st.
• I will be e-mailing you some comments on your project ideas tonight or tomorrow.
• I will have the quizzes graded by tomorrow’s office hours (Wed. 1:30-2:30); otherwise, I will return them to you next Tuesday.

Model Building
1. Among the potential explanatory variables, think about which explanatory variables address the question of interest.
2. For each explanatory variable, investigate whether a transformation is needed for it, either because of curvature or crunching.
3. Consider adding polynomial terms for each variable if there is remaining curvature for the variable (use the procedure of adding higher orders as long as the highest order term has p-value < 0.05).
4. Consider interactions between the explanatory variables, adding the interaction if the p-value on the interaction term is < 0.05.

Analysis of Variance
• The goal of analysis of variance is to compare the means of several (many) groups.
• Analysis of variance is regression with only categorical explanatory variables.
• One-way analysis of variance: Groups are defined by one categorical variable.
• Two-way analysis of variance: Groups are defined by two categorical variables.

Milgram’s Obedience Experiments
• Subjects were recruited to take part in an experiment on “memory and learning.”
• The subject is the teacher. The subject conducted a paired-associate learning task with the student. The subject was instructed by the experimenter to administer a shock to the student each time he gave a wrong response. Moreover, the subject was instructed to “move one level higher on the shock generator each time the learner gives a wrong answer.” The subject was also instructed to announce the voltage level before administering a shock.

Four Experimental Conditions
1. Remote-Feedback condition: Student is placed in a room where he cannot be seen by the subject nor can his voice be heard; his answers flash silently on a signal box. However, at 300 volts the laboratory walls resound as he pounds in protest. After 315 volts, no further answers appear, and the pounding ceases.
2. Voice-Feedback condition: Same as the remote-feedback condition except that vocal protests were introduced that could be heard clearly through the walls of the laboratory.
3. Proximity: Same as the voice-feedback condition except that the student was placed in the same room as the subject, a few feet from the subject. Thus, he was visible as well as audible.
4. Touch-Proximity: Same as the proximity condition except that the student received a shock only when his hand rested on a shock plate. At the 150-volt level, the student demanded to be let free and refused to place his hand on the shock plate. The experimenter ordered the subject to force the victim’s hand onto the plate.

Two Key Questions
1. Is there any difference among the mean voltage levels of the four conditions?
2. If there are differences, what conditions specifically are different?

[Figure: Oneway Analysis of Voltage Level By Condition. Vertical axis: Voltage Level (100 to 450). Horizontal axis: Condition (Proximity, Remote, Touch-Proximity, Voice-Feedback).]

Means and Std Deviations
Level             Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
Proximity         40       312.000   129.979   20.552         270.43      353.57
Remote            40       405.000   63.640    10.062         384.65      425.35
Touch-Proximity   40       268.125   131.874   20.851         225.95      310.30
Voice-Feedback    40       367.875   119.518   18.897         329.65      406.10

Multiple Regression Model for Analysis of Variance
• To answer these questions, we can fit a multiple regression model with voltage level as the response and one categorical explanatory variable (condition).
• We obtain a sample from each level of the categorical variable (group) and are interested in estimating the population means of the groups based on these samples.
• Assumptions of the multiple regression model for one-way analysis of variance:
  – Linearity: automatically satisfied.
  – Constant variance: Check if the spread within each group is the same.
  – Normality: Check if the distribution within each group is normal.
  – Independence: Sample consists of independent observations.

Comparing the Groups

Expanded Estimates (nominal factors expanded to all levels)
Term                         Estimate   Std Error   t Ratio   Prob>|t|
Intercept                    338.25     9.067431    37.30     <.0001
Condition[Proximity]         -26.25     15.70525    -1.67     0.0966
Condition[Remote]            66.75      15.70525    4.25      <.0001
Condition[Touch-Proximity]   -70.125    15.70525    -4.47     <.0001
Condition[Voice-Feedback]    29.625     15.70525    1.89      0.0611

$\hat{E}(Y \mid \text{Condition} = \text{Remote-Feedback}) = 338.25 + 66.75 = 405$
$\hat{E}(Y \mid \text{Condition} = \text{Voice-Feedback}) = 338.25 + 29.625 = 367.875$
$\hat{E}(Y \mid \text{Condition} = \text{Proximity}) = 338.25 - 26.25 = 312$
$\hat{E}(Y \mid \text{Condition} = \text{Touch-Proximity}) = 338.25 - 70.125 = 268.125$

• The coefficient on Condition[Proximity] = -26.25 means that proximity is estimated to have a mean that is 26.25 less than the mean of the means of all the conditions.
• $\hat{E}(Y \mid \text{Condition} = \text{Proximity})$ = sample mean of the proximity group.
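The relationship between the group sample means and the expanded estimates can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `effect_coding` is hypothetical, and it assumes (as the notes state) that the intercept is the mean of the group means and each coefficient is a group's mean minus that intercept. It does not reproduce JMP's standard errors or p-values.

```python
# Group sample means taken from the "Means and Std Deviations" table.
group_means = {
    "Proximity": 312.000,
    "Remote": 405.000,
    "Touch-Proximity": 268.125,
    "Voice-Feedback": 367.875,
}


def effect_coding(means):
    """Return (intercept, effects) for a one-way layout.

    intercept: mean of the group means.
    effects:   each group's mean minus the intercept.
    (Hypothetical helper for illustration, not a JMP function.)
    """
    intercept = sum(means.values()) / len(means)
    effects = {level: mean - intercept for level, mean in means.items()}
    return intercept, effects


intercept, effects = effect_coding(group_means)
print(f"Intercept: {intercept}")
for level, effect in effects.items():
    # Adding the effect back to the intercept recovers the group mean.
    print(f"Condition[{level}]: {effect:+.3f} -> fitted mean {intercept + effect:.3f}")
```

Because each effect is a deviation from the mean of the group means, the four effects sum to zero, which is why JMP needs only three parameters (Nparm = 3) for the four conditions.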
Response Voltage Level

Effect Tests
Source      Nparm   DF   Sum of Squares   F Ratio   Prob > F
Condition   3       3    437591.25        11.0881   <.0001

• The Effect Test tests the null hypothesis that the mean in all four conditions is the same versus the alternative hypothesis that at least two of the conditions have different means.
• p-value of Effect Test < 0.0001. Strong evidence that the population means are not the same for all four conditions.

JMP for One-way ANOVA
• One-way ANOVA can be carried out in JMP either using Fit Model with a categorical explanatory variable or Fit Y by X with the categorical variable as the explanatory variable.
• After using the Fit Y by X command, click the red triangle next to Oneway Analysis and then Display Options, Boxplots to see side-by-side boxplots, and click Means/ANOVA to see the means of the different groups and the test of whether all groups have the same mean. The p-value of this test is Prob>F in the ANOVA table.

[Figure: Oneway Analysis of Voltage Level By Condition. Vertical axis: Voltage Level (100 to 450). Horizontal axis: Condition.]

Oneway Anova

Summary of Fit
Rsquare                      0.175756
Adj Rsquare                  0.159906
Root Mean Square Error       114.6949
Mean of Response             338.25
Observations (or Sum Wgts)   160

Analysis of Variance
Source      DF    Sum of Squares   Mean Square   F Ratio   Prob > F
Condition   3     437591.3         145864        11.0881   <.0001
Error       156   2052168.8        13155
C. Total    159   2489760.0

Means for Oneway Anova
Level             Number   Mean      Std Error   Lower 95%   Upper 95%
Proximity         40       312.000   18.135      276.18      347.82
Remote            40       405.000   18.135      369.18      440.82
Touch-Proximity   40       268.125   18.135      232.30      303.95
Voice-Feedback    40       367.875   18.135      332.05      403.70

Prob>F = p-value for the test that all groups have the same mean. It is the same as the p-value for the Effect Test in the Fit Model output.

Two Key Questions
1. Is there any difference among the mean voltage levels of the four conditions? Yes, there is strong evidence of a difference. p-value of Effect Test < 0.0001.
2. If there are differences, what conditions specifically are different?

Testing whether each of the groups is different
• Naïve approach to deciding which groups have a mean that is different from the average of the means of all groups: Do a t-test for each group and look for groups that have p-value < 0.05.
• Problem: Multiple comparisons.

Finding pairs that are significantly different
Naïve approach: Compare each pair of groups using a t-test and reject the null hypothesis that the means of both groups in the pair are the same if the p-value of the two-sided t-test is less than 0.05. This can be done automatically by using Fit Y by X and then clicking Compare Means and clicking Each Pair, Student’s t.

Oneway Analysis of Voltage Level By Condition
Means Comparisons: comparisons for each pair using Student's t

Level                   Mean
Remote            A     405.00000
Voice-Feedback    A     367.87500
Proximity           B   312.00000
Touch-Proximity     B   268.12500
Levels not connected by the same letter are significantly different.

Level            - Level           Difference   Lower CL   Upper CL   p-Value
Remote           Touch-Proximity   136.8750     86.2157    187.5343   3.2771e-7
Voice-Feedback   Touch-Proximity   99.7500      49.0907    150.4093   0.0001484
Remote           Proximity         93.0000      42.3407    143.6593   0.0003890
Voice-Feedback   Proximity         55.8750      5.2157     106.5343   0.0308583
Proximity        Touch-Proximity   43.8750      -6.7843    94.5343    0.0891141
Remote           Voice-Feedback    37.1250      -13.5343   87.7843    0.1497462

Significantly different pairs: Remote and Proximity, Remote and Touch-Proximity, Voice-Feedback and Proximity, Voice-Feedback and Touch-Proximity.

Errors in Hypothesis Testing
                          State of World
Decision Based on Data    Null Hypothesis True   Alternative Hypothesis True
Accept Null Hypothesis    Correct Decision       Type II error
Reject Null Hypothesis    Type I error           Correct Decision

When we do one hypothesis test and reject the null hypothesis if the p-value is < 0.05, the probability of making a Type I error when the null hypothesis is true is 0.05. We protect against falsely rejecting a null hypothesis by making the probability of a Type I error small.

Multiple Comparisons Problem
• Compound uncertainty: When doing more than one test, there is an increased chance of making a mistake.
• If we do multiple hypothesis tests and use the rule of rejecting the null hypothesis in each test if the p-value is < 0.05, then if all the null hypotheses are true, the probability of falsely rejecting at least one null hypothesis is > 0.05.

Multiple Comparisons Simulation
• In multiplecomp.JMP, 20 groups are compared with sample sizes of ten for each group.
• The observations for each group are simulated from a standard normal distribution. Thus, in fact, $\mu_1 = \mu_2 = \cdots = \mu_{20} = 0$.
• Number of pairs found to have significantly different means using the t-test at level 0.05:

Iteration    1   2   3   4   5
# of Pairs

• Number of groups found to have means different from the average using the t-test and rejecting if p-value < 0.05:

Iteration     1   2   3   4   5
# of Groups

Individual vs. Familywise Error Rate
• When several tests are considered simultaneously, they constitute a family of tests.
• Individual Type I error rate: Probability for a single test that the null hypothesis will be rejected, assuming that the null hypothesis is true.
• Familywise Type I error rate: Probability for a family of tests that at least one null hypothesis will be rejected, assuming that all of the null hypotheses are true.
• When we consider a family of tests, we want to make the familywise error rate small, say 0.05, to protect against falsely rejecting a null hypothesis.

Bonferroni Method
• General method for doing multiple comparisons for any family of k tests.
• Denote the familywise Type I error rate we want by $p^*$, say $p^* = 0.05$.
• Compute p-values for each individual test: $p_1, \ldots, p_k$.
• Reject the null hypothesis for the ith test if $p_i < p^*/k$.
• Guarantees that the familywise Type I error rate is at most $p^*$.
• Why Bonferroni works: If we do k tests and all null hypotheses are true, then using Bonferroni with $p^* = 0.05$, we have probability $0.05/k$ of making a Type I error on each test and expect to make $k \times (0.05/k) = 0.05$ errors in total. Because the probability of making at least one error cannot exceed the expected number of errors, the familywise Type I error rate is at most 0.05.

Tukey’s HSD
• Tukey’s HSD is a method that is specifically designed to control the familywise Type I error rate (at 0.05) for analysis of variance.
• After Fit Model, click the red triangle next to the X variable and click LSMeans Tukey HSD.
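The Bonferroni rule described earlier in this section can be sketched in a few lines. This is an illustration rather than part of the lecture: the function name `bonferroni_reject` is hypothetical, and applying the rule to the six pairwise Student's t p-values from the Means Comparisons output is an assumption made here to have concrete numbers to work with.

```python
def bonferroni_reject(p_values, familywise_rate=0.05):
    """Reject the i-th null hypothesis when p_i < familywise_rate / k,
    where k is the number of tests in the family.
    (Hypothetical helper for illustration.)"""
    k = len(p_values)
    threshold = familywise_rate / k
    return {name: p < threshold for name, p in p_values.items()}


# Two-sided p-values for each pair, copied from the Student's t comparisons.
pairwise_p = {
    ("Remote", "Touch-Proximity"): 3.2771e-7,
    ("Voice-Feedback", "Touch-Proximity"): 0.0001484,
    ("Remote", "Proximity"): 0.0003890,
    ("Voice-Feedback", "Proximity"): 0.0308583,
    ("Proximity", "Touch-Proximity"): 0.0891141,
    ("Remote", "Voice-Feedback"): 0.1497462,
}

decisions = bonferroni_reject(pairwise_p)
for pair, rejected in decisions.items():
    verdict = "different" if rejected else "no evidence of a difference"
    print(f"{pair[0]} vs {pair[1]}: {verdict}")
```

With k = 6 the per-test cutoff is 0.05/6, about 0.0083, so the Voice-Feedback and Proximity pair (p = 0.031), which the unadjusted t-test flagged, is no longer declared different.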
LSMeans Differences Tukey HSD
Alpha = 0.050, Q = 2.59695

LSMean[i] By LSMean[j]
Level i          Level j           Mean[i]-Mean[j]   Std Err Dif   Lower CL Dif   Upper CL Dif
Remote           Touch-Proximity   136.875           25.6466       70.2722        203.478
Voice-Feedback   Touch-Proximity   99.75             25.6466       33.1472        166.353
Remote           Proximity         93                25.6466       26.3972        159.603
Voice-Feedback   Proximity         55.875            25.6466       -10.728        122.478
Proximity        Touch-Proximity   43.875            25.6466       -22.728        110.478
Remote           Voice-Feedback    37.125            25.6466       -29.478        103.728
The JMP matrix also lists each comparison in the reverse direction, with the signs of the difference and confidence limits flipped.

Level                     Least Sq Mean
Remote            A       405.00000
Voice-Feedback    A B     367.87500
Proximity           B C   312.00000
Touch-Proximity       C   268.12500
Levels not connected by the same letter are significantly different.

In the JMP output, comparisons shown in red are those for which the null hypothesis that the group means are the same is rejected using the Tukey HSD procedure, which controls the familywise Type I error rate at 0.05. Here these are the pairs whose confidence intervals exclude 0: Remote and Touch-Proximity, Voice-Feedback and Touch-Proximity, and Remote and Proximity. A confidence interval for the difference in group means that adjusts for multiple comparisons is given by Lower CL Dif and Upper CL Dif (the third and fourth lines of each cell in the JMP matrix).

Assumptions in one-way ANOVA
• Assumptions needed for validity of one-way analysis of variance p-values and CIs:
  – Linearity: automatically satisfied.
  – Constant variance: Spread within each group is the same.
  – Normality: Distribution within each group is normal.
  – Independence: Sample consists of independent observations.

Rule of thumb for checking constant variance
• Constant variance: Look at the standard deviations of the different groups by using Fit Y by X and clicking Means and Std Dev.

Means and Std Deviations
Level             Number   Mean      Std Dev   Std Err Mean
Proximity         40       312.000   129.979   20.552
Remote            40       405.000   63.640    10.062
Touch-Proximity   40       268.125   131.874   20.851
Voice-Feedback    40       367.875   119.518   18.897

• Rule of Thumb: Check whether (highest group standard deviation/lowest group standard deviation) is greater than 2. If greater than 2, then constant variance is not reasonable and a transformation should be considered. If less than 2, then constant variance is reasonable.
• (Highest group standard deviation/lowest group standard deviation) = (131.874/63.640) = 2.07. Thus, constant variance is not reasonable for Milgram’s data.

Transformations to correct for nonconstant variance
• If the standard deviation is highest for groups with high means, try transforming Y to $\log Y$ or $\sqrt{Y}$. If the standard deviation is highest for groups with low means, try transforming Y to $Y^2$.
• In Milgram’s data, the SD is particularly low for the group with the highest mean (Remote). Try transforming to $Y^2$.
• To make the transformation, right click in a new column, click New Column, then right click again in the created column, click Formula, and enter the appropriate formula for the transformation.

Transformation of Milgram’s data to Squared Voltage Level

Means and Std Deviations
Level             Number   Mean     Std Dev   Std Err Mean
Proximity         40       113816   78920.2   12478
Remote            40       167974   48541.4   7675
Touch-Proximity   40       88847    79291.3   12537
Voice-Feedback    40       149259   74053.6   11709

• Check of constant variance for transformed data: (Highest group standard deviation/lowest group standard deviation) = (79291.3/48541.4) = 1.63. The constant variance assumption is reasonable for voltage squared.
• Analysis of variance tests are approximately valid for the voltage squared data, so we reanalyze the data using voltage squared.

Analysis using Voltage Squared

Response Voltage Squared
Effect Tests
Source      Nparm   DF   Sum of Squares   F Ratio   Prob > F
Condition   3       3    1.50737e11       9.8735    <.0001

• The Effect Test gives strong evidence that the group mean voltage squared levels are not all the same.
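The rule of thumb for constant variance used above is simple enough to script. The sketch below is illustrative: the helper names `sd_ratio` and `constant_variance_reasonable` are hypothetical, and treating a ratio of exactly 2 as acceptable is an assumption, since the notes only cover the "greater than 2" and "less than 2" cases.

```python
def sd_ratio(std_devs):
    """Highest group standard deviation divided by the lowest."""
    return max(std_devs.values()) / min(std_devs.values())


def constant_variance_reasonable(std_devs, cutoff=2.0):
    """Rule of thumb: constant variance is not reasonable when the ratio
    exceeds the cutoff. A ratio exactly equal to the cutoff is treated as
    acceptable here, which is an assumption of this sketch."""
    return sd_ratio(std_devs) <= cutoff


# Group standard deviations from the two "Means and Std Deviations" tables.
voltage_sd = {
    "Proximity": 129.979,
    "Remote": 63.640,
    "Touch-Proximity": 131.874,
    "Voice-Feedback": 119.518,
}
voltage_squared_sd = {
    "Proximity": 78920.2,
    "Remote": 48541.4,
    "Touch-Proximity": 79291.3,
    "Voice-Feedback": 74053.6,
}

for label, sds in [("Voltage", voltage_sd), ("Voltage squared", voltage_squared_sd)]:
    ok = constant_variance_reasonable(sds)
    print(f"{label}: ratio = {sd_ratio(sds):.2f}, constant variance reasonable: {ok}")
```

The same check can be rerun after any candidate transformation (for example log Y) once the group standard deviations of the transformed response are available.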
Oneway Analysis of Voltage Squared By Condition
Comparisons for all pairs using Tukey-Kramer HSD

Level                     Mean
Remote            A       167973.75
Voice-Feedback    A B     149259.38
Proximity           B C   113816.25
Touch-Proximity       C   88846.88
Levels not connected by the same letter are significantly different.

Level            - Level           Difference   Lower CL   Upper CL
Remote           Touch-Proximity   79126.88     37701.9    120551.8
Voice-Feedback   Touch-Proximity   60412.50     18987.6    101837.4
Remote           Proximity         54157.50     12732.6    95582.4
Voice-Feedback   Proximity         35443.13     -5981.8    76868.1
Proximity        Touch-Proximity   24969.38     -16455.6   66394.3
Remote           Voice-Feedback    18714.38     -22710.6   60139.3

Strong evidence that remote has a higher mean voltage squared level than proximity and touch-proximity, and that voice-feedback has a higher mean voltage squared level than touch-proximity, taking into account the multiple comparisons.

Rule of Thumb for Checking Normality in ANOVA
• The normality assumption for ANOVA is that the distribution in each group is normal. This can be checked by looking at the boxplot, histogram and normal quantile plot for each group.
• If there are more than 30 observations in each group, then the normality assumption is not important; ANOVA p-values and CIs will still be approximately valid even for nonnormal data.
• If there are fewer than 30 observations per group, then we can check normality by clicking Analyze, Distribution and then putting the Y variable in the Y, Columns box and the categorical variable denoting the group in the By box. We can then create normal quantile plots for each group and check that, for each group, the points in the normal quantile plot are within the confidence bands. If there is nonnormality, we can try a transformation such as log Y and see if the transformed data are approximately normally distributed in each group.

One-way Analysis of Variance: Steps in Analysis
1. Check assumptions (constant variance, normality, independence). If constant variance is violated, try transformations.
2. Use the effect test (commonly called the F-test) to test whether all group means are the same.
3. If the effect test indicates that at least two group means differ, use Tukey’s HSD procedure to investigate which groups are different, taking into account the fact that multiple comparisons are being done.
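Step 2 can be reproduced by hand from the group sizes, means and standard deviations alone. The sketch below is illustrative and is not how JMP is used in these notes: the function name `one_way_anova_from_summary` is hypothetical, and it stops at the F ratio because the Python standard library has no F distribution for the p-value. Small differences from the JMP table are expected, since the standard deviations used here are rounded.

```python
def one_way_anova_from_summary(ns, means, sds):
    """Compute the one-way ANOVA table entries from per-group summaries.
    (Hypothetical helper for illustration.)"""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group (error) sum of squares, rebuilt from the group variances.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Order: Proximity, Remote, Touch-Proximity, Voice-Feedback (voltage level data).
ns = [40, 40, 40, 40]
means = [312.000, 405.000, 268.125, 367.875]
sds = [129.979, 63.640, 131.874, 119.518]

anova = one_way_anova_from_summary(ns, means, sds)
print(f"SS(Condition) = {anova['ss_between']:.2f} on {anova['df_between']} df")
print(f"SS(Error)     = {anova['ss_within']:.1f} on {anova['df_within']} df")
print(f"F ratio       = {anova['f_ratio']:.4f}")
```

The between-group sum of squares and the degrees of freedom match the Analysis of Variance table for voltage level, and the F ratio agrees with the reported 11.0881 up to rounding.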