Unit 10: Statistics for experimental design
10.2 How statistical decisions are made using hypothesis testing

This topic guide will look at how statistical decisions are made using hypothesis testing. A statistical hypothesis test is used to make decisions using data from a scientific study. A result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.

You will gain an understanding of how hypothesis testing is used in experimental design, including the null hypothesis, significance level, type I and type II errors, one-tailed tests, two-tailed tests, the power of a test and estimation of sample size. You will then look at the differences between parametric and non-parametric models of analysis.

On successful completion of this topic you will:
• understand how statistical decisions are made using hypothesis testing (LO2).

To achieve a Pass in this unit you need to show that you can:
• assess the use of hypothesis testing in experimental design (2.1)
• illustrate the differences between parametric and non-parametric models of analysis (2.2).

1 Hypothesis testing in experimental design

Key terms
Null hypothesis (H0): A statement that the parameters involved have no effect on the outcome.
Significance level: The threshold probability used in a statistical test to decide whether to reject the null hypothesis. It is the probability of rejecting the null hypothesis when it is actually true.

The significance level is denoted by the Greek symbol α (alpha). In principle the significance level can take any value, but typically 10% (0.1), 5% (0.05) or 1% (0.01) are used. Statistical hypothesis tests of significance are used to determine which outcomes of a study would lead to rejection of the null hypothesis at a pre-specified significance level.
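As an illustration of how a pre-specified significance level fixes which outcomes lead to rejection, the sketch below works out the rejection region for a simple case: 100 tosses of a coin, with H0 that the coin is fair (probability of heads 0.5) and a two-sided alternative. It is a minimal sketch assuming an exact binomial test with equal tails; the function names are hypothetical and only the Python standard library is used.

```python
from math import comb


def binomial_pmf(n: int, p: float) -> list[float]:
    """Probability of exactly k successes in n trials, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def rejection_region(n: int, alpha: float, p0: float = 0.5) -> tuple[int, int]:
    """Hypothetical helper: equal-tailed rejection region for an exact binomial test.

    Returns (lower, upper): reject H0 if the number of heads is <= lower or >= upper.
    Each tail is made as large as possible while its probability under H0
    stays at or below alpha / 2.
    """
    pmf = binomial_pmf(n, p0)
    lower, lower_tail = -1, 0.0
    while lower + 1 <= n and lower_tail + pmf[lower + 1] <= alpha / 2:
        lower += 1
        lower_tail += pmf[lower]
    upper, upper_tail = n + 1, 0.0
    while upper - 1 >= 0 and upper_tail + pmf[upper - 1] <= alpha / 2:
        upper -= 1
        upper_tail += pmf[upper]
    return lower, upper


def actual_size(n: int, lower: int, upper: int, p0: float = 0.5) -> float:
    """Type I error rate actually achieved by the region (probability of rejecting a true H0)."""
    pmf = binomial_pmf(n, p0)
    return sum(pmf[: lower + 1]) + sum(pmf[upper:])


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        lower, upper = rejection_region(100, alpha)
        size = actual_size(100, lower, upper)
        print(f"alpha={alpha:.2f}: reject H0 if heads <= {lower} or >= {upper} "
              f"(actual type I error rate {size:.4f})")
```

Because the binomial distribution is discrete, the achieved type I error rate is usually a little below α, and a smaller α widens the band of outcomes for which H0 is not rejected.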
In statistics, the word significant does not mean large or important; with a sufficiently large sample size, a statistically significant effect may be small in magnitude. The p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. If a test of significance gives a p-value lower than or equal to the significance level α, the null hypothesis is rejected, and the results are said to be statistically significant. In this type of study the null hypothesis is that any observed effect arose from random variation alone.

Test requirements
Conditions that must be met in tests of significance for deciding whether or not to reject the null hypothesis are as follows.
• Hypotheses that are true should be rejected only very occasionally, and the probability of rejection can be chosen by the experimenter.
• Hypotheses that are false should be rejected as often as possible.

The failures of a test to meet these conditions are known as type I and type II errors:
• a type I error is a false positive: rejecting the null hypothesis when it is, in fact, true
• a type II error is a false negative: failing to reject the null hypothesis when it is false.

The probability of a type I error equals the significance level α; the probability of a type II error is usually denoted β. The power of a statistical test is the probability that the test will not commit a type II error, that is, the probability (1 − β) of correctly rejecting a false null hypothesis. Power analysis can be used to calculate the minimum sample size required to detect an effect of a given size, or to calculate the minimum effect that can be detected using a given sample size. In a complex problem, such as the response of the human body to a new drug, sample sizes in the thousands are typically required for an effect to be detected as statistically significant. In a simple system, such as testing whether a die roll or coin toss is fair (each outcome equally likely), a sample size of around 100 may be enough to detect a large bias with a high level of confidence, although smaller biases need larger samples.
Statistical power is also used to compare different statistical testing procedures: for example, a parametric and a non-parametric test of the same hypothesis. Statistical power depends on many factors. These almost always include the following three:
• the statistical significance criterion used in the test
• the magnitude of the effect of interest in the population
• the sample size used to detect the effect.

Key terms
Alternative hypothesis (H1): An alternative statement to the null hypothesis, that the parameters involved have a measurable effect on the outcome.
Parametric statistics: Analysis that assumes that the data have come from a specific type of probability distribution and makes inferences about the parameters of that distribution.
Non-parametric statistics: Analysis that makes no assumptions about the specific type of probability distribution of the sample population.

One- and two-tailed tests
To determine whether there is a statistically significant difference between the means of two samples, we use the t-test. The t-statistic and its reference t-distribution are calculated on the assumption that the null hypothesis (H0) is true. The way in which the t-test is applied also depends on the form of the alternative hypothesis (H1).

If the alternative hypothesis is of the form:
H1: A < B
then only the possibility that the mean of B is greater than the mean of A is of interest, and a one-tailed test is needed. For a test of significance at 5%, the whole 5% rejection region lies in one tail: if H0 is true, there is a 5% chance that random variation alone makes B appear significantly greater than A. The possibility of B being less than A is not considered. Similar reasoning applies for:
H1: A > B
where the direction of interest is now that the mean of B is less than that of A.

However, if the alternative hypothesis is of the form:
H1: A ≠ B
then the mean value of B could be either higher or lower than the mean of A, and a two-tailed test is needed.
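The difference between one- and two-tailed tests can be seen by computing both p-values for the same test statistic. The sketch below is a simplification under stated assumptions: to stay within the Python standard library it uses the standard normal distribution (a large-sample z approximation) in place of the t-distribution, which should be used for small samples. The function names and the alternative labels "A<B", "A>B" and "A!=B" are hypothetical.

```python
from statistics import NormalDist, mean, variance

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_statistic(a: list[float], b: list[float]) -> float:
    """Large-sample statistic for mean(b) - mean(a); approximates the two-sample t-statistic."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(b) - mean(a)) / standard_error


def p_value(z: float, alternative: str) -> float:
    """p-value for the alternative hypothesis H1, using the normal approximation."""
    if alternative == "A<B":   # one-tailed: only large positive z counts against H0
        return 1 - STANDARD_NORMAL.cdf(z)
    if alternative == "A>B":   # one-tailed: only large negative z counts against H0
        return STANDARD_NORMAL.cdf(z)
    if alternative == "A!=B":  # two-tailed: extreme values in either direction count
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


def critical_value(alpha: float, two_tailed: bool) -> float:
    """|z| beyond which H0 is rejected: all of alpha in one tail, or alpha / 2 in each tail."""
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)


if __name__ == "__main__":
    z = 1.8  # hypothetical value of the test statistic
    print(f"one-tailed p = {p_value(z, 'A<B'):.4f}, critical z = {critical_value(0.05, False):.3f}")
    print(f"two-tailed p = {p_value(z, 'A!=B'):.4f}, critical |z| = {critical_value(0.05, True):.3f}")
```

With z = 1.8 the one-tailed p-value is below 0.05 but the two-tailed p-value is not, because the two-tailed test places only 2.5% of the rejection region in each tail.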
For a two-tailed test of significance at 5%, the rejection region is split between the two tails: if H0 is true, there is a 2.5% chance that random variation alone makes B appear significantly less than A, plus a 2.5% chance that it makes B appear significantly greater than A.

2 Parametric and non-parametric methods
In parametric statistics it is assumed that the data have come from a particular type of probability distribution, and inferences are made about the parameters of that distribution. In general, parametric methods make more assumptions than non-parametric methods. If those extra assumptions are correct, parametric methods can produce more accurate and precise estimates; for this reason they are described as having more statistical power. However, if the assumptions made in a parametric analysis are incorrect, these methods can be very misleading. Robustness describes how little a method's results are affected when its assumptions are violated, and parametric methods are less robust than their non-parametric alternatives. Choosing a method therefore involves a trade-off between simplicity and power on one side and robustness on the other. Which is more appropriate depends on the specifics of the phenomenon being studied.

Non-parametric statistical techniques do not rely on data belonging to any particular distribution. They are sometimes called distribution-free methods, because they do not rely on the assumption that the data are drawn from a given probability distribution. In some complex systems, individual variables are assumed to follow parametric distributions but the relationships between variables are not. Examples include non-parametric regression and non-parametric hierarchical Bayesian models.

Checklist
At the end of this topic guide, you should be familiar with the following ideas:
• hypothesis testing is used in experimental design, including the null hypothesis, significance level, type I and type II errors, one-tailed tests, two-tailed tests, the power of a test and estimation of sample size
• the differences between parametric and non-parametric models of analysis.

You should:
• understand how statistical decisions are made using hypothesis testing
• be able to assess the use of hypothesis testing in experimental design (2.1)
• be able to describe the differences between parametric and non-parametric models of analysis (2.2).

Acknowledgements
The publisher would like to thank the following for their kind permission to reproduce their photographs: Shutterstock.com: Sofiaworld
Every effort has been made to trace the copyright holders and we apologise in advance for any unintentional omissions. We would be pleased to insert the appropriate acknowledgement in any subsequent edition of this publication.