Slide 1 Confounding and the Language of Experimentation, Part III – Statistical Significance
Slide 2 This video is designed to accompany pages 13-18 of the workbook “Making Sense of Uncertainty: Activities for Teaching Statistical Reasoning,” a publication of the Van-Griner Publishing Company.
Slide 3 Experimental data are among the purest data that can be collected because of the control that is designed into a well-thought-out experiment. But even if confounding is not an issue affecting inferences from experimental data, there is still a hurdle to overcome. Let’s look at an example.
Slide 4 Flibanserin was originally created as a depression drug. While it proved unsuccessful for the treatment of depression, researchers noticed that it seemed to have a positive effect on female sex drive. Subsequently, studies with Flibanserin were conducted involving pre-menopausal women with generalized acquired Hypoactive Sexual Desire Disorder.
Slide 5 The studies were mostly well-designed experiments. One key study involved 1,378 women who were randomized to one of two treatments: they received either Flibanserin or a placebo. All the women were required to keep a record of whether they had sex and, if they did, whether it was satisfying in their view. The participants were screened for depression and other medical problems, eliminating a series of possible confounding variables.
Slide 6 Here’s what happened. Women in the Flibanserin group and the placebo group had very similar baseline (pre-treatment) values for the average number of sexually satisfying events. At the end of the study period, the women in the Flibanserin group reported an average of 4.5 sexually satisfying events, while women in the placebo group reported an average of about 3.7. The FDA wanted to know about this difference, but they also wanted to know something much more technical. They wanted to know how likely it was that the difference between 4.5 and 3.7 had happened by chance alone.
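To make the question "how likely is this difference by chance alone?" concrete, here is a minimal sketch of one common approach, a permutation test. The video does not say which method the study's analysts actually used, so this is an illustration only. The per-woman counts below are synthetic numbers invented for the sketch (not the study's data), and the function name `permutation_p_value` is hypothetical.

```python
import random


def permutation_p_value(treated, control, n_shuffles=5000, seed=0):
    """Return (observed mean difference, estimated one-sided p-value).

    The p-value is the share of random relabelings of the pooled data whose
    mean difference is at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)

    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(pooled) - n_treated

    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Pretend the treatment labels were handed out purely at random.
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_treated]) / n_treated
                - sum(pooled[n_treated:]) / n_control)
        if diff >= observed:
            at_least_as_large += 1

    return observed, at_least_as_large / n_shuffles


# Synthetic, made-up event counts for two groups of 200 (NOT the study's data).
data_rng = random.Random(1)
treated_counts = [data_rng.randint(0, 9) for _ in range(200)]
control_counts = [data_rng.randint(0, 7) for _ in range(200)]

observed_diff, p_value = permutation_p_value(treated_counts, control_counts)
print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Estimated chance of a difference this large by luck alone: {p_value:.4f}")
```

A small estimated probability suggests that the observed difference would be unusual if the treatment labels made no difference, which is the idea behind calling a result statistically significant.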
Slide 7 The FDA needed to know if the results of the experiment were statistically significant. A useful, nontechnical definition of statistical significance is shown here: “Statistical significance is a difference between treatments that is large enough to have been unlikely to have happened by chance.” This is an unapologetically probabilistic statement. In many respects, it is the phrase that consumers of statistical inference encounter most often. What about the Flibanserin study? Indeed, the difference between 4.5 and 3.7 was judged to be statistically significant. That is, the experiment showed a “statistically significant” difference between the Flibanserin-treated group and the placebo group. This was a very important hurdle for the drug company to get over. This story ended in an unusual fashion, however. The practical effect of Flibanserin was small, and there were possible side effects. So on June 18, 2010, a federal advisory panel to the FDA unanimously voted against recommending approval in spite of the statistical significance.
Slide 8 This concludes our video on the idea of statistical significance and experimentation. Remember, credible inferences from experimental data ultimately have to be held to a mathematically formal standard of statistical significance in order to assess whether the results were likely to have occurred by chance.