Download Transcript

yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Choice modelling wikipedia, lookup

Time series wikipedia, lookup

Slide 1
Confounding and the Language of Experimentation, Part III – Statistical Significance
Slide 2
This video is designed to accompany pages 13-18 of the workbook “Making Sense of Uncertainty:
Activities for Teaching Statistical Reasoning,” a publication of the Van-Griner Publishing Company
Slide 3
Experimental data are among the purest data that can be collected because of the control that is
designed into a well-thought out experiment. But even if confounding is not an issue affecting
inferences from experimental data, there is still a hurdle to overcome. Let’s look at an example.
Slide 4
Flibanserin was originally created as a depression drug. While it proved unsuccessful for treatment of
depression, researchers noticed it seemed to have a positive effect on female sex drive. Subsequently,
studies with Flibanserin were conducted involving pre-menopausal women with generalized acquired
Hypoactive Sexual Desire Disorder.
Slide 5
The studies were mostly well-designed experiments. One key study involved 1,378 women who were
randomized to one of two treatments – either they received Flibanserin or a placebo. All the women
were required to keep a record of whether they had sex, and if they did, whether it was satisfying in
their view. The participants were screened for depression and other medical problems, eliminating a
series of possible confounding variables.
Slide 6
Here’s what happened. Women in the Flibanserin group and the placebo group had very similar
baseline (pre-treatment) values for the average number of sexually satisfying events. At the end of the
study period the Flibanserin women reported an average of 4.5 sexually satisfying events, while women
in the placebo group reported an average of about 3.7 sexually satisfying events.
The FDA wanted to know about this difference, but they also wanted to know something much more
technical. They wanted to know how likely that difference between 4.5 and 3.7 was to have happened
by chance alone.
Slide 7
The FDA needed to know if the results of the experiment were statistically significant. A useful, nontechnical definition of statistical significance is shown here. “Statistical Significance” is a difference
between treatments that is large enough to have been unlikely to have happened by chance.”
This is an unapologetically probabilistic statement. In many respects it is the phrase that consumers of
statistical inference encounter most often.
What about the Flibanserin study? Indeed, the difference between 4.5 and 3.7 was judged to be
statistically significant. That is, the experiment showed a “statistically significant” difference between
the Flibanserin-treated patient group, compared to the placebo group. This was a very important
hurdle to get over for the drug company.
This story ended in an unusual fashion, however. The practical effect of Flibanserin was small, and there
were possible side effects. So on June 18, 2010, a federal advisory panel to the FDA unanimously voted
against recommending approval in spite of the statistical significance.
Slide 8
This concludes our video on the idea of statistical significance and experimentation. Remember,
credible inferences from experimental data have to ultimately be held to a mathematically formal
standard of statistical significance in order to assess whether the results were likely to have occurred by