Center for Biofilm Engineering
Statistical design & analysis for assessing the efficacy of instructional modules
Marty Hamilton
Professor Emeritus of Statistics
Montana State University
CS 580
April 24, 2006

Why Statistics?
  Provide convincing results
  Improve communication
  “...I do not mean to suggest that computers eliminate stupidity---they may in fact encourage it.”
  Robert P. Abelson, in Statistics as Principled Argument (cited on Rocky Ross’s CS 580 home page)

What is Statistics?
  Data
  Design
  Uncertainty assessment

Statistical thinking
  Data
  Design
  Uncertainty assessment

Data: Choosing the quantity to measure
  Reliable test of knowledge
  Quantitative response

After-treatment score
  A student used the modules, then scored 80% on the test
  Conclusion: modules have high efficacy

Data: Choosing the quantity to measure
  Reliable tests of knowledge: before-treatment test, after-treatment test
  Quantitative response: difference in test scores, after-treatment minus before-treatment

After-treatment score
  [Figure: test score (Low to High) at the After time point]

Before- and after-treatment scores
  [Figure: test score (Low to High) at the Before and After time points; the Response is the difference between them]

Difference between before- and after-treatment scores
  A student used the modules, then scored 50 points higher on the after-treatment test than on the before-treatment test (Response = 50).
  Conclusion: modules have high efficacy

Anticipating criticism: “natural” improvement
  [Figure: test score (Low to High) at the Before and After time points, showing the improvement expected without the treatment alongside the Response]

Anticipating criticism
  Before/after observations for just the “treated” student may not accurately represent the treatment effect
  May need treated and untreated students (i.e., a control)

Control or comparison
  The control can be either a negative control (placebo) or positive control (best conventional)
  A student taking a conventional classroom lecture/recitation course would provide a positive control or comparison

Difference scores for each of 12 students, 6 per group
  [Figure: Difference (after – before), axis from 0 to 100, for the Control group and the Treated group. Annotation: Of practical importance?]

Study design
  Before and after test scores for each student in both the treated and control groups

Good study design
  Control or comparison
  Replication
  Randomization
  Anticipate criticism

Data: 20 students per group (randomly assigned?)

  Treatment   Response     Treatment   Response
  C           -28.5096     T            53.4115
  C            34.7186     T            75.9697
  C            -3.3184     T             8.3348
  C           -13.9297     T            33.3584
  C            -5.7949     T            42.5355
  C            29.0260     T            58.2345
  C            15.4682     T            47.9143
  C            29.1025     T            58.6826
  C           -10.8522     T            48.3604
  C           -18.7876     T            68.2412
  C            -3.1457     T            91.1052
  C             5.4531     T            42.8328
  C            -9.3185     T            48.9096
  C             1.2575     T            67.1174
  C           -11.5470     T            39.2733
  C           -17.6932     T            68.9961
  C             5.5314     T            52.2039
  C             6.7628     T            39.2210
  C           -10.8001     T            31.1658
  C            18.3930     T            36.4764

Analysis via Minitab 14
  Minitab: FirstStudy_CS580.MTW
  Show data layout ... matrix
  Stat > Basic Statistics > Display Descriptive Statistics ... Ask for individual value plot
  Stat > Basic Statistics > 2 Sample t ...
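For readers without Minitab, the pooled two-sample t analysis named on the slide above can be sketched by hand in Python using only the standard library. The data are the 40 responses listed on the data slide. The function name `pooled_two_sample_t` and the constant `T_CRIT_38` are illustrative choices, not part of the original slides. The critical value 2.0244 is the tabulated two-sided 95% point of the t distribution with 38 degrees of freedom, entered by hand because the standard library has no t quantile function.

```python
import math
import statistics

# Responses (after-treatment minus before-treatment score) from the data slide.
control = [-28.5096, 34.7186, -3.3184, -13.9297, -5.7949, 29.0260, 15.4682,
           29.1025, -10.8522, -18.7876, -3.1457, 5.4531, -9.3185, 1.2575,
           -11.5470, -17.6932, 5.5314, 6.7628, -10.8001, 18.3930]
treated = [53.4115, 75.9697, 8.3348, 33.3584, 42.5355, 58.2345, 47.9143,
           58.6826, 48.3604, 68.2412, 91.1052, 42.8328, 48.9096, 67.1174,
           39.2733, 68.9961, 52.2039, 39.2210, 31.1658, 36.4764]

# Assumption: tabulated two-sided 95% critical value of t with 38 df.
T_CRIT_38 = 2.0244


def pooled_two_sample_t(x, y):
    """Pooled-variance two-sample t statistic for mean(x) - mean(y)."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_x - 1) * statistics.variance(x)
                  + (n_y - 1) * statistics.variance(y)) / df
    pooled_sd = math.sqrt(pooled_var)
    std_err = pooled_sd * math.sqrt(1 / n_x + 1 / n_y)
    diff = statistics.mean(x) - statistics.mean(y)
    return diff, pooled_sd, std_err, diff / std_err, df


diff, pooled_sd, std_err, t_value, df = pooled_two_sample_t(control, treated)
ci_lower = diff - T_CRIT_38 * std_err
ci_upper = diff + T_CRIT_38 * std_err

print(f"Estimate for difference (C - T): {diff:.4f}")
print(f"Pooled StDev: {pooled_sd:.4f}, DF: {df}")
print(f"T-Value: {t_value:.2f}")
print(f"95% CI for difference: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The sign convention follows Minitab's (control minus treated), so a negative difference means the treated group scored higher. The p-value is left out because computing it requires the t distribution's cumulative distribution function, which is not in the standard library.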
Minitab output

  Two-Sample T-Test and CI: Response, Treatment

  Two-sample T for Response

  Treatment   N   Mean  StDev  SE Mean
  C          20    0.6   17.4      3.9
  T          20   50.6   18.4      4.1

  Difference = mu (C) - mu (T)
  Estimate for difference: -50.0164
  95% CI for difference: (-61.4656, -38.5672)
  T-Test of difference = 0 (vs not =): T-Value = -8.84  P-Value = 0.000  DF = 38
  Both use Pooled StDev = 17.8846

  Null hypothesis: true mean response for Treatment = true mean response for Control

  Conclusions:
  1. Reject the null hypothesis because it is discredited by the data (p-value < 0.001)
  2. 95% confident that the treatment mean response is between 38.6 and 61.5 larger than the true control mean response
  3. Is this efficacy repeatable?

Analysis via Minitab 14 (more)
  [Figure: FirstStudy_CS580, individual value plot of Response (axis from -40 to 100) by Treatment (C, T)]

Analysis via Minitab 14 (more)
  Minitab: SixStudies_CS580.MTW
  Show data layout ... matrix
  Stat > Tables > Descriptive Statistics

Minitab output

  Tabulated statistics: Replicate, Treatment

  Rows: Replicate   Columns: Treatment

             C       T     All
  1       0.60   50.62   25.61
            20      20      40
  2       0.07   62.94   31.50
            20      20      40
  3       5.09   51.46   28.27
            20      20      40
  4      13.29   58.99   36.14
            20      20      40
  5       6.85   41.45   24.15
            20      20      40
  6      16.05   51.59   33.82
            20      20      40
  All     6.99   52.84   29.92
           120     120     240

  Cell Contents: Response: Mean
                           Count

Analysis via Minitab 14 (more)
  [Figure: SixStudies_CS580, Response (axis from -50 to 100) by Treatment (C, T), one panel per study, panels 1 to 6. Panel variable: Replicate]

Analysis via Minitab 14 (more)
  Stat > ANOVA > General Linear Model ...
Minitab output

  General Linear Model: Response versus Treatment, Replicate

  Factor                Type    Levels  Values
  Treatment             fixed        2  C, T
  Replicate(Treatment)  random      12  1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6

  Analysis of Variance for Response, using Adjusted SS for Tests

  Source                 DF  Seq SS  Adj SS  Adj MS       F      P
  Treatment               1  126120  126120  126120  128.16  0.000
  Replicate(Treatment)   10    9840    9840     984    2.53  0.007
  Error                 228   88786   88786     389
  Total                 239  224746

  S = 19.7335

  Variance Components, using Adjusted SS

  Source                Estimated Value
  Replicate(Treatment)            29.73   Variance among replicate studies
  Error                          389.41   Variance among students in same study and treatment
  --------- added by Marty ---------
  Total variance                 419.14

  Repeatability Standard Deviation = 20.5 (single student)
  Repeatability Standard Deviation = 9.9 (mean of 20 treated students minus mean of 20 control students)

  Stat > Basic Statistics > Normality Test... of residuals provides an evaluation of the key statistical assumption underlying the ANOVA

Analysis via Minitab 14 (more)
  Data copied from Tables output and pasted into the worksheet:

  Rep  CntrlMean  TrtMean  Mean (Treatment minus Control)
  1         0.60    50.62  50.02
  2         0.07    62.94  62.87
  3         5.09    51.46  46.37
  4        13.29    58.99  45.70
  5         6.85    41.45  34.60
  6        16.05    51.59  35.54

  Stat > Basic Statistics > 1 sample t ... analysis of 6 Means

  Conclusions:
  1. Reject the null hypothesis because it is discredited by the data (p-value < 0.001)
  2. Estimated difference in mean responses = 45.9
  3. 95% confident that the treatment mean response is between 36.9 and 54.9 larger than the true control mean response
  4. 95% confident that the treatment mean response is at least 38.6 larger than the true control mean response
  5. The efficacy measure is repeatable

  Note: this straightforward analysis of the six means, one mean for each of the 6 repeated studies, using a 1-sample t-test provides nearly the same results as does the ANOVA variance component analysis approach.

Trade-offs: What is the main source of variability?
  It is often more important to repeat the study than to expend time and materials finding a precise efficacy estimate for a single study.

Fin
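The trade-off can be made concrete with the two variance components reported in the Minitab output (29.73 among replicate studies, 389.41 among students). The Python sketch below is a reconstruction, not the author's own calculation: the slides do not state a formula, so it assumes the standard deviation of a treated-minus-control mean difference is sqrt(2 × (replicate variance + error variance / students per group)). That assumed formula reproduces the two repeatability standard deviations on the slide (20.5 and 9.9). The `studies` parameter is a further assumption that goes beyond the slides: it averages the difference over several independent replicate studies.

```python
import math

# Variance components taken from the Minitab output on the slides.
VAR_REPLICATE = 29.73   # variance among replicate studies
VAR_ERROR = 389.41      # variance among students in same study and treatment


def sd_single_student():
    """Repeatability SD of one student's response (total variance)."""
    return math.sqrt(VAR_REPLICATE + VAR_ERROR)


def sd_mean_difference(students_per_group, studies=1):
    """SD of (treated mean - control mean), averaged over `studies` studies.

    Assumed model: each group mean has variance
    VAR_REPLICATE + VAR_ERROR / students_per_group, the two groups are
    independent, and replicate studies are independent of one another.
    """
    var_group_mean = VAR_REPLICATE + VAR_ERROR / students_per_group
    return math.sqrt(2 * var_group_mean / studies)


# Reproduce the two repeatability SDs quoted on the slide.
print(f"Single student:            {sd_single_student():.1f}")
print(f"One study, 20 per group:   {sd_mean_difference(20):.1f}")

# Same total of 120 students per treatment, allocated two different ways.
one_big_study = sd_mean_difference(120, studies=1)
six_small_studies = sd_mean_difference(20, studies=6)
print(f"One study, 120 per group:  {one_big_study:.2f}")
print(f"Six studies, 20 per group: {six_small_studies:.2f}")
```

Under these assumptions, adding students to a single study shrinks only the student-to-student term, while adding replicate studies shrinks both terms, so the six smaller studies give the more precise estimate for the same number of students.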