Unit 10: Statistics for experimental design
10.2: How statistical decisions are made using hypothesis testing
This topic guide will look at how statistical decisions are made using
hypothesis testing. A statistical hypothesis test is used to make decisions
using data from a scientific study. A result is called statistically significant if it
is unlikely to have occurred by chance alone, according to a pre-determined
threshold probability, the significance level. You will gain an understanding
of how hypothesis testing is used in experimental design including the null
hypothesis, significance level, type I and type II errors, one-tailed tests, two-tailed tests, the power of a test and estimation of sample size.
You will then look at the differences between parametric and non-parametric
models of analysis.
On successful completion of this topic you will:
•• understand how statistical decisions are made using hypothesis testing
(LO2).
To achieve a Pass in this unit you need to show that you can:
•• assess the use of hypothesis testing in experimental design (2.1)
•• illustrate the differences between parametric and non-parametric models
of analysis (2.2).
1 Hypothesis testing in experimental design
Key terms
Null hypothesis (H0): A statement
that the parameters involved have no
effect on the outcome.
Significance level: Threshold
probability for a statistical test,
chosen before the data are analysed.
It is the probability of rejecting the
null hypothesis when it is in fact true.
The significance level is denoted
by the Greek symbol α (alpha).
Statistical hypothesis tests of significance are used in determining what outcomes
of a study would lead to a rejection of the null hypothesis for a pre-specified level
of significance. In statistics, the word significant does not mean large or important;
with a sufficiently large sample size, a statistically significant effect may be small
in magnitude.
The significance level is denoted by the Greek symbol α (alpha). In principle
the significance level can take any value between 0 and 1, but typically 10% (0.1),
5% (0.05) or 1% (0.01) are used. The p-value of a test is the probability, assuming
the null hypothesis is true, of obtaining a result at least as extreme as the one
observed. If a test of significance gives a p-value lower than or equal to the
significance level α, the null hypothesis is rejected. In this case the results are said
to be statistically significant. In this type of study the null hypothesis is that the
results occurred by random variation.
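The decision rule above can be sketched in Python for a simple case. This is a minimal illustration, assuming a coin-toss experiment in which the null hypothesis is that the coin is fair. The function names and the figure of 61 heads in 100 tosses are hypothetical, chosen only for illustration. The p-value is computed exactly from the binomial distribution by adding up the probability of every outcome that is no more likely under H0 than the one observed.

```python
from math import comb

def binomial_two_sided_p(heads: int, tosses: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value: total probability, under H0, of every
    outcome that is no more likely than the one observed."""
    probs = [comb(tosses, k) * p_null**k * (1 - p_null)**(tosses - k)
             for k in range(tosses + 1)]
    observed = probs[heads]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when the p-value is <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical data: 61 heads in 100 tosses, tested at the 5% level.
p = binomial_two_sided_p(heads=61, tosses=100)
print(f"p-value = {p:.4f}: {decide(p, alpha=0.05)}")
```

Changing alpha to 0.01 in the last line shows how the same data can be significant at one level and not at another.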
Test requirements
Conditions that must be met in tests of significance for deciding whether or not to
reject the null hypothesis are as follows.
•• Hypotheses that are true shall be rejected only very occasionally, and the
probability of rejection can be chosen by the experimenter.
•• Hypotheses that are false shall be rejected as often as possible.
The failures of a test to fulfil these conditions are known as type I and type II errors:
•• a type I error is a false positive, or the rejection of the null hypothesis when it
is, in fact, true
•• a type II error is a false negative, or the failure to reject the null hypothesis
when it is false.
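Both error rates can be calculated exactly for a simple test. The sketch below assumes a hypothetical coin-toss test: the coin is tossed 100 times and the null hypothesis of a fair coin is rejected if there are 39 or fewer, or 61 or more, heads. The rejection limits, the function names and the true probabilities of heads are assumptions made for illustration only.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_probability(n: int, lower: int, upper: int, p_true: float) -> float:
    """Probability that the number of heads falls in the rejection region
    (heads <= lower or heads >= upper) when the true P(heads) is p_true."""
    return sum(binomial_pmf(k, n, p_true)
               for k in range(n + 1) if k <= lower or k >= upper)

# Hypothetical test: reject "the coin is fair" if heads <= 39 or heads >= 61.
N, LOWER, UPPER = 100, 39, 61

# Type I error rate: H0 is true (p = 0.5) but the test rejects it.
type_i_rate = rejection_probability(N, LOWER, UPPER, p_true=0.5)

# Type II error rate: H0 is false (here p = 0.6) but the test does not reject it.
type_ii_rate = 1 - rejection_probability(N, LOWER, UPPER, p_true=0.6)

print(f"type I error rate:  {type_i_rate:.3f}")
print(f"type II error rate: {type_ii_rate:.3f}")
```

The type I error rate is fixed by the choice of rejection region, whereas the type II error rate also depends on how far the true situation is from the null hypothesis.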
The power of a statistical test is the probability that the test will correctly
reject a false null hypothesis, that is, the probability of not committing a type II
error. If the probability of a type II error is β, the power is 1 − β. Power analysis
can be used to calculate the minimum sample size required to detect an effect of a
given size, or to calculate the minimum effect that can be detected using a given
sample size. In a complex problem, such as the response of the human body to a new
drug, effects can be small relative to the variation between individuals, and sample
sizes in the thousands are typically required to detect them. In a simple system,
such as testing if a die roll or coin toss is fair (each outcome equally likely), a
sample size of 100 may be enough to detect a substantial bias. Statistical power is
also used to compare different statistical testing procedures: for example, between
a parametric and a non-parametric test of the same hypothesis.
Statistical power depends on many factors. These almost always include the
following three:
•• the statistical significance criterion used in the test
•• the magnitude of the effect of interest in the population
•• the sample size used to detect the effect.
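The way these three factors interact can be sketched with a standard textbook approximation for comparing the means of two groups. This is an illustration under stated assumptions rather than a general method: it assumes a two-tailed test, equal group sizes, a normal approximation, and an effect size expressed as the difference in means divided by the common standard deviation. The function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate number of observations needed in each of two groups.

    Normal-approximation formula for a two-tailed comparison of two means:
        n = 2 * ((z_(1 - alpha/2) + z_(power)) / effect_size) ** 2
    where effect_size is the difference in means in standard-deviation units.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the significance level
    z_power = z.inv_cdf(power)            # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Smaller effects, stricter significance levels and higher power all need more data.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {sample_size_per_group(d)} per group")
```

Because the approximation uses the normal distribution rather than the t-distribution, it slightly underestimates the sample size needed when the groups are small.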
Key terms
Alternative hypothesis (H1): An
alternative statement to the null
hypothesis, that the parameters
involved have a measurable effect on
the outcome.
Parametric statistics: Analysis
that assumes that the data has come
from a specific type of probability
distribution and makes inferences
about the parameters of the
distribution.
Non-parametric statistics:
Analysis that makes no assumptions
about the specific type of probability
distribution of the sample
population.
One- and two-tailed tests
To determine if there is a statistically significant difference between the
means of two samples we can use the t-test. The t-statistic is calculated on the
assumption that the null hypothesis (H0) is true and is then compared with the
t-distribution. The manner in which the t-test is applied also
depends on the nature of the alternative hypothesis (H1).
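As an illustration, the t-statistic for two independent samples can be calculated as follows. This sketch assumes Welch's form of the t-test, which does not require the two samples to have equal variances. That choice, the function name and the data are assumptions made here, not something specified in this guide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's two-sample t-test.

    t is positive when the mean of sample_b exceeds the mean of sample_a.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t = (mean(sample_b) - mean(sample_a)) / sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical measurements from two samples, A and B.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
t, df = welch_t(a, b)
print(f"t = {t:.3f} with {df:.1f} degrees of freedom")
```

The resulting t value is then compared with the t-distribution for that number of degrees of freedom, using statistical tables or a statistics package, to obtain the p-value.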
If the alternative hypothesis is of the form:
H1: A < B
then only the possibility that the mean of B is greater than the mean of A is of
interest and a one-tailed test is needed. For a test of significance at 5%, if the
null hypothesis is true there is a 5% chance that random variation alone will make
B appear significantly greater than A. The possibility of B being less than A is not
considered. Similar reasoning can be applied for:
H1: A > B
where only the possibility that the mean of B is less than that of A is considered.
However, if the alternative hypothesis is of the form:
H1: A ≠ B
then the mean value of B could be either higher or lower than the mean of A and
a two-tailed test is needed. For a test of significance at 5%, if the null hypothesis
is true there is a 2.5% chance that random variation alone will make B appear
significantly less than A plus a 2.5% chance that it will make B appear
significantly greater than A.
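The effect of choosing a one-tailed or a two-tailed test can be seen by calculating both p-values from the same test statistic. The sketch below is a simplified illustration that assumes the samples are large enough for the standard normal distribution to stand in for the t-distribution. The function name and the test statistic of 1.8 are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # normal approximation to the t-distribution

def tail_p_values(z: float) -> dict:
    """p-values for each form of H1, where z = (mean of B - mean of A) / standard error."""
    upper = 1 - STD_NORMAL.cdf(z)   # H1: A < B (B greater than A)
    lower = STD_NORMAL.cdf(z)       # H1: A > B (B less than A)
    return {
        "A < B": upper,
        "A > B": lower,
        "A != B": 2 * min(upper, lower),   # both tails together
    }

# Critical values at the 5% level: all 5% in one tail, or 2.5% in each tail.
one_tailed_critical = STD_NORMAL.inv_cdf(0.95)
two_tailed_critical = STD_NORMAL.inv_cdf(0.975)
print(f"critical values: {one_tailed_critical:.3f} (one-tailed), "
      f"{two_tailed_critical:.3f} (two-tailed)")

# Hypothetical test statistic lying between the two critical values.
for h1, p in tail_p_values(1.8).items():
    print(f"H1: {h1}  p = {p:.4f}")
```

A statistic between the two critical values is significant at 5% under the one-tailed alternative A < B but not under the two-tailed alternative, which is why the form of H1 should be decided before the data are examined.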
2 Parametric and non-parametric methods
In parametric statistics it is assumed that the data has come from a specific type
of probability distribution, and inferences are made about the parameters of that
distribution. In general, parametric methods make more assumptions than
non-parametric methods. If those extra assumptions are correct, parametric methods
can produce more accurate and precise estimates. For this reason they are
described as having more statistical power. However, if assumptions made in the
parametric analysis are incorrect then these methods can be very misleading.
The concept of robustness refers to how well a method continues to give reliable
results when its assumptions are not met, and parametric methods are less robust
than non-parametric alternatives. In selection of method there is a trade-off to be
made between simplicity and power on the one hand and robustness on the other.
Which is more appropriate depends on the specifics of the phenomenon being studied.
Non-parametric statistical techniques do not rely on the data belonging to any
particular probability distribution. For this reason they are sometimes called
distribution-free methods.
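One simple example of a distribution-free method is the sign test for paired observations, which uses only the direction of each change and not its size. The sketch below is illustrative. The choice of the sign test, the function name and the before and after measurements are all assumptions made here.

```python
from math import comb

def sign_test_p_value(before, after) -> float:
    """Exact two-sided sign test for paired data.

    H0: an increase and a decrease are equally likely for each pair.
    Pairs with no change are discarded, as is conventional for this test.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = max(positives, n - positives)        # size of the larger group
    # Probability of a split at least this uneven in one direction under H0.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)                # doubled for the two-sided test

# Hypothetical paired measurements on ten subjects.
before = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
after = [14, 17, 12, 16, 15, 18, 11, 17, 16, 15]
print(f"sign test p-value = {sign_test_p_value(before, after):.4f}")
```

Because the test ignores the size of each difference it makes no assumption about the shape of the distribution, but for the same reason it usually has less power than a paired t-test when the t-test's assumptions hold.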
Sometimes in a complex system, individual variables are assumed to follow
parametric distributions but no such assumption is made about the connection
between variables. Examples here include non-parametric regression and
non-parametric hierarchical Bayesian models.
Further reading
Boslaugh, S. (2012) Statistics in a Nutshell, O’Reilly Media
Ellison, S. et al. (2009) Practical Statistics for the Analytical Scientist, RSC
Larsen, R. and Fox Stroup, D. (1976) Statistics in the Real World, Macmillan
Miller, J. and Miller, J. (2010) Statistics and Chemometrics for Analytical Chemistry, Prentice Hall
Samuels, M. et al. (2010) Statistics for the Life Sciences, Pearson
Swartz, M. and Krull, I. (2012) Handbook of Analytical Validation, CRC Press
Statistical calculators online:
http://www.danielsoper.com/statcalc3/
http://www.measuringusability.com/calc.php.
Checklist
At the end of this topic guide, you should be familiar with the following ideas:
•• how hypothesis testing is used in experimental design, including the null
hypothesis, significance level, type I and type II errors, one-tailed tests,
two-tailed tests, the power of a test and estimation of sample size
•• the differences between parametric and non-parametric models of analysis.
You should:
•• understand how statistical decisions are made using hypothesis testing
•• be able to assess the use of hypothesis testing in experimental design (2.1)
•• be able to describe the differences between parametric and non-parametric
models of analysis (2.2).
Acknowledgements
The publisher would like to thank the following for their kind permission to reproduce their
photographs:
Shutterstock.com: Sofiaworld
Every effort has been made to trace the copyright holders and we apologise in advance for any
unintentional omissions. We would be pleased to insert the appropriate acknowledgement in any
subsequent edition of this publication.