IQL Chapter 9 – Hypothesis Testing
Statistical Reasoning for Everyday Life, Bennett, Briggs, Triola, 3rd Edition
9.1 Fundamentals of Hypothesis Testing
LEARNING GOAL
Understand the goal of hypothesis testing and the basic structure of a hypothesis test, including how to
set up the null and alternative hypotheses, how to determine the possible outcomes of a hypothesis
test, and how to decide between these possible outcomes.
In statistics, questions are answered through hypothesis testing.
FORMULATING THE HYPOTHESIS
Null and Alternative Hypotheses
The null hypothesis, or H0, is the starting assumption for a hypothesis test. For the types of hypothesis tests in
this chapter, the null hypothesis always claims a specific value for a population parameter and therefore takes
the form of an equality:
H0 (null hypothesis): population parameter = claimed value
The alternative hypothesis, or Ha, is a claim that the population parameter has a value that differs from the
value claimed in the null hypothesis. It may take one of the following forms:
(left-tailed) Ha: population parameter < claimed value
(right-tailed) Ha: population parameter > claimed value
(two-tailed) Ha: population parameter ≠ claimed value
IQL – CHAPTER 9: HYPOTHESIS TESTING
Page 1
POSSIBLE OUTCOMES OF A HYPOTHESIS TEST
A hypothesis test always begins with the assumption that the null hypothesis is true. Then tests are
conducted to determine whether the data gives any reason to think otherwise.
Two Possible Outcomes of a Hypothesis Test
There are two possible outcomes to a hypothesis test:
1. Reject the null hypothesis, H0, in which case we have evidence in support of the alternative
hypothesis.
2. Not reject the null hypothesis, H0, in which case we do not have enough evidence to support the
alternative hypothesis.
DRAWING A CONCLUSION FROM A HYPOTHESIS TEST
STATISTICAL SIGNIFICANCE
The idea of statistical significance was introduced in Section 6.1, where it was defined as a measure
of the likelihood that a result is meaningful. A statistically significant result is a result in a study
that is unlikely to have occurred by chance. The most commonly quoted levels of statistical
significance are the 0.05 level (the probability of the result's having occurred by chance is 5% or less,
or no more than 1 in 20) and the 0.01 level (the probability of the result's having occurred by chance
is 1% or less, or no more than 1 in 100).
Hypothesis Test Decisions Based on Levels of Statistical Significance
We decide the outcome of a hypothesis test by comparing the actual sample result (mean or
proportion) to the result expected if the null hypothesis is true. We must choose a significance level
for the decision.
• If the chance of a sample result at least as extreme as the observed result is less than 1 in
100 (or 0.01), then the test is statistically significant at the 0.01 level and offers strong
evidence for rejecting the null hypothesis.
• If the chance of a sample result at least as extreme as the observed result is less than 1 in 20
(or 0.05), then the test is statistically significant at the 0.05 level and offers moderate
evidence for rejecting the null hypothesis.
• If the chance of a sample result at least as extreme as the observed result is greater than the
chosen level of significance (0.05 or 0.01), then we do not reject the null hypothesis.
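The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `evidence_strength`, its return labels, and the sample probabilities are assumptions made for this example and do not come from the textbook.

```python
def evidence_strength(chance: float) -> str:
    """Classify a result by the chance of a sample at least as extreme.

    `chance` is the probability (between 0 and 1) of a sample result at
    least as extreme as the observed one, assuming H0 is true.
    """
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be a probability between 0 and 1")
    if chance < 0.01:
        return "significant at 0.01: strong evidence against H0"
    if chance < 0.05:
        return "significant at 0.05: moderate evidence against H0"
    return "not significant at 0.05: do not reject H0"


for chance in (0.004, 0.03, 0.20):  # hypothetical values
    print(chance, "->", evidence_strength(chance))
```

A result that is significant at the 0.01 level is also significant at the 0.05 level; the function simply reports the stricter level that applies.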
P-Values
A P-value in a hypothesis test is the probability of selecting a sample at least as extreme as the
observed sample, assuming that the null hypothesis is true.
Hypothesis Test Decisions Based on P-Values
The P-value (probability value) for a hypothesis test of a claim about a population parameter is the
probability of selecting a sample at least as extreme as the observed sample, assuming that the null
hypothesis is true:
• A small P-value (such as less than or equal to 0.05) indicates that the sample result is unlikely,
and therefore provides reason to reject the null hypothesis.
• A large P-value (such as greater than 0.05) indicates that the sample result could easily occur
by chance, so we cannot reject the null hypothesis.
PUTTING IT ALL TOGETHER
At this point you have the basic premise of hypothesis testing, with the exception of the calculations.
The Hypothesis Test Process
Step 1. Formulate the null and alternative hypotheses, each of which must make a claim about a
population parameter, such as a population mean (μ) or a population proportion (p); be sure
this is done before drawing a sample or collecting data. Based on the form of the alternative
hypothesis, decide whether you will need a left-, right-, or two-tailed hypothesis test.
Step 2. Draw a sample from the population and measure the sample statistics, including the sample
size (n) and the relevant sample statistic, such as the sample mean (x̄) or sample proportion
(p̂).
Step 3. Determine the likelihood of observing a sample statistic (mean or proportion) at least as
extreme as the one you found under the assumption that the null hypothesis is true. The
precise probability of such an observation is the P-value (probability value) for your sample
result.
Step 4. Decide whether to reject or not reject the null hypothesis, based on your chosen level of
significance (usually 0.05 or 0.01, but other significance levels are sometimes used).
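As a rough illustration of the four steps, the sketch below tests whether a coin favors heads. The data (60 heads in 100 tosses) are hypothetical, and the P-value is computed exactly from the binomial distribution with `math.comb` rather than with the normal-distribution methods used later in the chapter. That substitution is an assumption of this example, made only to keep it short and self-contained.

```python
from math import comb

# Step 1: H0: p = 0.5, Ha: p > 0.5 (right-tailed), chosen before collecting data.
claimed_p = 0.5
alpha = 0.05  # chosen significance level

# Step 2: sample data (hypothetical values for illustration).
n = 100      # number of tosses
heads = 60   # observed number of heads
p_hat = heads / n

# Step 3: P-value = probability of a result at least as extreme as the one
# observed (60 or more heads), assuming H0 is true.
p_value = sum(
    comb(n, k) * claimed_p**k * (1 - claimed_p) ** (n - k)
    for k in range(heads, n + 1)
)

# Step 4: compare the P-value with the significance level.
decision = "reject H0" if p_value <= alpha else "do not reject H0"

print(f"sample proportion = {p_hat:.2f}")
print(f"P-value = {p_value:.4f}")
print(decision)
```

With these made-up numbers the P-value falls between 0.01 and 0.05, so the result would be significant at the 0.05 level but not at the 0.01 level.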
9.2 Hypothesis Tests for Population Means
LEARNING GOAL
Understand and interpret one- and two-tailed hypothesis tests for claims made about population
means, and learn to recognize and avoid common errors (type I and type II errors) in hypothesis tests.
ONE – TAILED HYPOTHESIS TESTS
Computing the Standard Score for the Sample Mean in a Hypothesis Test
When we draw a random sample for a hypothesis test, we can consider it to be one of many possible
samples in the sampling distribution. Given the sample size (n), the sample mean (x̄), the population
standard deviation (σ), and the claimed population mean (μ), we make the following computations:
$$ \text{standard deviation for the distribution of sample means} = \frac{\sigma}{\sqrt{n}} $$
$$ \text{standard score for the sample mean, } z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
Note: In reality, it is rare that we know the population standard deviation σ; see Section 10.1 about
how to deal with such cases.
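The two computations above translate directly into code. The following minimal sketch uses `statistics.NormalDist` from the Python standard library to turn the standard score into a one-tailed P-value. The function name `z_for_sample_mean` and all of the numbers are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Standard score of a sample mean under H0: population mean = mu."""
    standard_error = sigma / sqrt(n)  # std. dev. of the distribution of sample means
    return (x_bar - mu) / standard_error


# Hypothetical values: sigma / sqrt(n) = 8 / 8 = 1, so z = 52 - 50 = 2.
z = z_for_sample_mean(x_bar=52.0, mu=50.0, sigma=8.0, n=64)

# One-tailed P-values from the standard normal distribution.
p_right = 1 - NormalDist().cdf(z)  # for Ha: mu > claimed value
p_left = NormalDist().cdf(z)       # for Ha: mu < claimed value

print(f"z = {z:.2f}")
print(f"right-tailed P-value = {p_right:.4f}")
```

Which of the two P-values applies depends on the form of the alternative hypothesis chosen in Step 1, not on the data.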
Critical Values of Statistical Significance
A hypothesis test is significant at the 0.05 level if the probability of finding a result as extreme as the one
actually observed is 0.05 or less (assuming the null hypothesis is true).
Decisions Based on Statistical Significance for One-Tailed Hypothesis Tests
We decide whether to reject or not reject the null hypothesis by comparing the standard score (z) for a
sample mean to critical values for significance at a given level. Table 9.1 (next slide) summarizes the
decisions for one-tailed hypothesis tests at the 0.05 and 0.01 levels of significance.
Critical Values: The critical value(s) for a hypothesis test is a threshold to which the value of the test
statistic in a sample is compared to determine whether or not the null hypothesis is rejected.
The critical value for any hypothesis test depends on the significance level at which the test is carried
out, and whether the test is one-sided or two-sided.
http://www.stats.gla.ac.uk/steps/glossary/hypothesis_testing.html#critval
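Critical values can also be computed from the standard normal distribution instead of being read from a table. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper names `critical_values` and `reject_h0` and the `tail` labels are made up for this illustration.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple:
    """Return the critical z value(s) for a test at significance level alpha.

    tail must be "left", "right", or "two".
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def reject_h0(z: float, alpha: float, tail: str) -> bool:
    """Compare a standard score with the critical value(s)."""
    crit = critical_values(alpha, tail)
    if tail == "left":
        return z <= crit[0]
    if tail == "right":
        return z >= crit[0]
    return z <= crit[0] or z >= crit[1]


for alpha in (0.05, 0.01):
    print(alpha, [round(c, 3) for c in critical_values(alpha, "right")])
```

For example, a hypothetical standard score of z = 2.0 in a right-tailed test lies beyond the 0.05 critical value but not beyond the 0.01 critical value, so `reject_h0(2.0, 0.05, "right")` is true while `reject_h0(2.0, 0.01, "right")` is false.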
TWO – TAILED TESTS
The same basic ideas apply to two-tailed hypothesis tests, in which the alternative hypothesis has the
“not equal to” form of Ha: μ ≠ claimed value.
For two-tailed tests a value “as extreme as the one actually found” can lie either on the left or on the
right of the sampling distribution (Figure 9.5 in the next slide). A probability of 0.05, or 5%, therefore
corresponds to standard scores either in the first 2.5% of the sampling distribution on the left or in the
last 2.5% on the right.
From Appendix A, the 2.5th percentile corresponds to a standard score of -1.96 and the 97.5th percentile
corresponds to a standard score of 1.96. These standard scores become the critical values for two-tailed
tests to be significant at the 0.05 level.
Two-Tailed Test (Ha: μ ≠ claimed value)
Statistical significance: A two-tailed test is significant at the 0.05 level if the standard score of the sample
mean is at or below a critical value of -1.96 or at or above a critical value of 1.96. For significance at the
0.01 level, the critical values are -2.575 and 2.575.
P-values: To find the P-value for a sample mean in a two-tailed test, first use the standard score of the
sample mean to find the P-value assuming the test is one-tailed; then double this value to find the P–value
for the two-tailed test.
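The doubling rule can be checked numerically. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist`; the function name `two_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """P-value for a two-tailed test: double the one-tailed P-value."""
    one_tailed = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return 2 * one_tailed


# A standard score of 1.96 (or -1.96) sits at the 0.05 threshold.
print(round(two_tailed_p_value(1.96), 3))
print(round(two_tailed_p_value(-1.96), 3))
```

Taking the absolute value of z means the same function works whether the sample mean falls below or above the claimed value.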
COMMON ERRORS IN HYPOTHESIS TESTING
Type I and Type II errors
• An error in which H0 is wrongly rejected is called a type I error.
• An error in which we wrongly fail to reject H0 is called a type II error.
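A small simulation can make the idea of a type I error concrete. In the sketch below, H0 is true by construction, so every rejection is a type I error; at the 0.05 level we would expect this to happen in roughly 5% of samples. The population values, sample size, seed, and number of trials are all hypothetical choices for this illustration, and the simulation itself is not taken from the textbook.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(12345)    # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 8.0, 25  # hypothetical population in which H0 is true
critical = NormalDist().inv_cdf(0.975)  # two-tailed critical value, 0.05 level

trials = 4000
type_i_errors = 0
for _ in range(trials):
    # Draw a sample from a population whose mean really is mu.
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    z = (fmean(sample) - mu) / (sigma / sqrt(n))
    if abs(z) >= critical:    # H0 rejected even though it is true
        type_i_errors += 1

rate = type_i_errors / trials
print(f"type I error rate = {rate:.3f}")
```

A type II error would require the opposite setup: drawing samples from a population whose mean differs from the claimed value and counting how often the test fails to reject H0.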
9.3 Hypothesis Tests for Population Proportions
LEARNING GOAL
Understand and interpret hypothesis tests for claims made about population proportions.
CALCULATIONS FOR HYPOTHESIS TESTS WITH PROPORTIONS
How do we determine whether there is enough evidence in the sample to reject the null hypothesis?
We will use the four steps of the hypothesis test process on page 376.
Step 1 is to formulate the hypotheses
H0: p = 0.5 (50% of voters favor the candidate)
Ha: p > 0.5 (more than 50% of voters favor the candidate)
Step 2 is to collect the sample data: the sample size (n) and the sample proportion (p̂).
Step 3 is to determine the likelihood that the sample result could have arisen by chance if the null
hypothesis is true.
Under the starting assumption that the null hypothesis is true (that the proportion of people in the
population who favor the candidate is 0.5), the peak of the distribution of sample proportions will be
the population proportion claimed by the null hypothesis, p = 0.5.
The standard deviation of the distribution of sample proportions is
$$ \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.5(1 - 0.5)}{400}} = 0.025 $$
Standard Score for the Sample Proportion in a Hypothesis Test
Given the sample size (n), the sample proportion (p̂), and the claimed population proportion (p),
the standard score for the sample proportion is
$$ z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} $$
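The calculation for the voter example can be sketched in Python as follows. The claimed proportion (0.5) and sample size (400) come from the example; the sample proportion of 0.55 is a hypothetical value supplied only so the code can run, since the observed value is not given here. The normal probabilities come from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

p = 0.5       # population proportion claimed by H0
n = 400       # sample size from the example
p_hat = 0.55  # hypothetical sample proportion (not taken from the text)

# Standard deviation of the distribution of sample proportions under H0.
standard_deviation = sqrt(p * (1 - p) / n)
z = (p_hat - p) / standard_deviation

# Right-tailed test (Ha: p > 0.5): area to the right of z.
p_value = 1 - NormalDist().cdf(z)

print(f"standard deviation = {standard_deviation:.3f}")
print(f"z = {z:.2f}")
print(f"P-value = {p_value:.4f}")
```

Note that the standard deviation uses the claimed proportion p, not the sample proportion, because the calculation starts from the assumption that the null hypothesis is true.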
SIGNIFICANCE LEVELS AND P – VALUES
Statistical significance is a statistical assessment of whether observations reflect a pattern rather than
just chance, the fundamental challenge being that any partial picture is subject to observational error. In
statistical testing, a result is deemed statistically significant if it is unlikely to have occurred by chance,
and hence provides enough evidence to reject the hypothesis of 'no effect'. As used in statistics,
significant does not mean important or meaningful, as it does in everyday speech.
The amount of evidence required to accept that an event is unlikely to have arisen by chance is known
as the significance level or critical p-value: in traditional Fisherian statistical hypothesis testing, the
p-value is the probability of observing data at least as extreme as that observed, given that the null
hypothesis is true. If the obtained p-value is small then it can be said that either the null hypothesis is
false or an unusual event has occurred. p-values do not have any repeat sampling interpretation.
Research analysts who focus solely on significant results may miss important response patterns which
individually fall under the threshold set for tests of significance. Many researchers urge that
tests of significance should always be accompanied by effect-size statistics, which approximate the size
and thus the practical importance of the difference.
An alternative (but nevertheless related) statistical hypothesis testing framework is the Neyman–
Pearson frequentist school which requires both a null and an alternative hypothesis to be defined and
investigates the repeat sampling properties of the procedure, i.e. the probability that a decision to reject
the null hypothesis will be made when it is in fact true and should not have been rejected (this is called a
"false positive" or Type I error) and the probability that a decision will be made to accept the null
hypothesis when it is in fact false (Type II error). Fisherian p-values are philosophically different from
Neyman–Pearson Type I errors. This confusion is unfortunately propagated by many
statistics textbooks.[1]
http://en.wikipedia.org/wiki/Statistical_significance
SUMMARY OF HYPOTHESIS TESTS WITH PROPORTIONS
• Because we are dealing with population proportions, the null hypothesis has the form p =
claimed value. To decide whether to reject or not reject the null hypothesis, we must determine
whether a sample as extreme as the one found in the hypothesis test is likely or unlikely to
occur if the null hypothesis is true.
• We determine this likelihood from the standard score (z) of the sample proportion, which we
compute from the formula
$$ z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} $$
where n is the sample size, p̂ is the sample proportion, and p is the population proportion claimed by the
null hypothesis.