How to Conduct a Hypothesis Test
The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We
must ask, is the event due to chance alone, or is there some cause that we should be looking for? We need to
have a way to differentiate between events that easily occur by chance and those that are highly unlikely to
occur randomly. Such a method should be streamlined and well defined so that others can replicate our
statistical experiments.
There are a few different methods used to conduct hypothesis tests. One of these methods is known as the
traditional method, and another involves what is known as a p-value. The steps of these two most common
methods are identical up to a point, then diverge slightly. Both the traditional method for hypothesis testing and
the p-value method are outlined below.
The Traditional Method
The traditional method is as follows:
1. Begin by stating the claim or hypothesis that is being tested. Also form a statement for the case that the
hypothesis is false.
2. Express both of the statements from the first step in mathematical symbols. These statements will use
symbols such as inequalities and equals signs.
3. Identify which of the two symbolic statements does not have equality in it. This could simply be a "not
equals" sign (≠), but could also be an "is less than" sign (<) or an "is greater than" sign (>). The statement containing the
inequality is called the alternative hypothesis, and is denoted H1 or Ha.
4. The statement from the first step that makes the statement that a parameter equals a particular value is called
the null hypothesis, denoted H0.
5. Choose which significance level we want. A significance level is typically denoted by the Greek letter
alpha. Here we should consider Type I errors. A Type I error occurs when we reject a null hypothesis that is
actually true. If we are very concerned about this possibility occurring, then our value for alpha should be
small. There is a bit of a trade-off here: the smaller the alpha, the more costly the experiment. The values
0.05 and 0.01 are common values used for alpha, but any positive number between 0 and 0.50 could be used
for a significance level.
6. Determine which statistic and distribution we should use. The type of distribution is dictated by features of
the data. Common choices include the standard normal (z), Student's t, and chi-square distributions.
7. Find the test statistic and critical value for this statistic. Here we will have to consider whether we are conducting a
two-tailed test (typically when the alternative hypothesis contains an "is not equal to" symbol) or a one-tailed
test (typically used when a strict inequality is involved in the statement of the alternative hypothesis).
8. From the type of distribution, confidence level, critical value and test statistic we sketch a graph.
9. If the test statistic is in our critical region, then we must reject the null hypothesis. The alternative
hypothesis stands. If the test statistic is not in our critical region, then we fail to reject the null hypothesis.
This does not prove that the null hypothesis is true; it only means that we do not have strong enough evidence against it.
10. We now state the results of the hypothesis test in such a way that the original claim is addressed.
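The numeric steps above can be sketched in code. The following is a minimal Python sketch with made-up sample numbers, using a one-tailed z-test for a mean with known population standard deviation; the critical value 1.645 is the standard normal table value for alpha = 0.05.

```python
import math

# A sketch of the traditional method with made-up numbers: a one-tailed
# z-test of a population mean with known population standard deviation.
mu0 = 100.0      # value in the null hypothesis H0: mu = 100
sigma = 15.0     # known population standard deviation
n = 36           # sample size
x_bar = 104.5    # observed sample mean

# Step 7: compute the test statistic.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Critical value for a one-tailed test at alpha = 0.05,
# read from a standard normal table.
z_critical = 1.645

# Step 9: reject H0 when the statistic falls in the critical region.
reject_null = z > z_critical
print(z, reject_null)   # 1.8 True
```

Here z = 1.8 exceeds 1.645, so the null hypothesis is rejected at the 0.05 level.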
The p-Value Method
The p-value method is nearly identical to the traditional method. The first six steps are the same. For step seven
we find the test statistic and the p-value. We then reject the null hypothesis if the p-value is less than or equal to alpha.
We fail to reject the null hypothesis if the p-value is greater than alpha. We then wrap up the test as before, by
clearly stating the results.
An Example of a Hypothesis Test
Mathematics and statistics are not for spectators. To truly
understand what is going on, we should read through and work
through several examples. If we know the ideas behind
hypothesis testing and have seen an overview of the method, then the
next step is to see an example. The following shows an example of
both the traditional method of a hypothesis test and the p-value
method.
A Statement of the Problem
Suppose that a doctor claims that 17 year olds have an average body temperature that is higher than the
commonly accepted average human temperature of 98.6 degrees Fahrenheit. A simple random statistical sample
of 25 people, each of age 17, is selected. The average temperature of the 17 year olds is found to be 98.9
degrees, with standard deviation of 0.6 degrees.
The Null and Alternative Hypotheses
The claim being investigated is that the average body temperature of 17 year olds is greater than 98.6 degrees.
This corresponds to the statement μ > 98.6.
The negation of this is that the population average is not greater than 98.6 degrees. In other words, the average
temperature is less than or equal to 98.6 degrees. In symbols this is μ ≤ 98.6.
One of these statements must become the null hypothesis, and the other should be the alternative hypothesis.
The null hypothesis contains equality. So for the above, the null hypothesis is H0 : μ = 98.6. It is common practice
to state the null hypothesis only in terms of an equals sign, not a greater-than-or-equal-to or less-than-or-equal-to sign.
The statement that does not contain equality is the alternative hypothesis, or H1 : μ > 98.6.
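The text states the hypotheses but does not carry the example through. The remaining steps can be sketched in a minimal Python example, using a one-tailed t-test with n - 1 = 24 degrees of freedom; the critical value 1.711 is read from a t table for an assumed significance level of alpha = 0.05.

```python
import math

# Data from the body-temperature example.
n = 25          # sample size
x_bar = 98.9    # sample mean temperature
s = 0.6         # sample standard deviation
mu0 = 98.6      # value in the null hypothesis H0: mu = 98.6

# The t statistic for a test of the mean when the population
# standard deviation is unknown; degrees of freedom = n - 1 = 24.
t = (x_bar - mu0) / (s / math.sqrt(n))

# One-tailed critical value for alpha = 0.05 and 24 degrees of
# freedom, read from a t table.
t_critical = 1.711

reject_null = t > t_critical
print(round(t, 4), reject_null)   # 2.5 True
```

Since t = 2.5 exceeds the critical value, we reject the null hypothesis and conclude there is evidence that the average temperature of 17 year olds exceeds 98.6 degrees.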
What is the Difference Between Alpha and P-Values?
In conducting a test of significance or hypothesis test there are two numbers that are easy to get confused. One
number is called the p-value of the test statistic. The other number of interest is the level of significance, or
alpha. These numbers are easily confused because they are both numbers between zero and one, and are in fact
probabilities.
Alpha – The Level of Significance
The number alpha is the threshold value against which we measure p-values. It tells us how extreme observed
results must be in order to reject the null hypothesis of a significance test.
The value of alpha is associated to the confidence level of our test. The following lists some levels of
confidence with their related values of alpha:
 For results with a 90% level of confidence, the value of alpha is 1 - 0.90 = 0.10.
 For results with a 95% level of confidence, the value of alpha is 1 - 0.95 = 0.05.
 For results with a 99% level of confidence, the value of alpha is 1 - 0.99 = 0.01.
And in general, for results with a C% level of confidence, the value of alpha is 1 – C/100.
Although in theory and practice many numbers can be used for alpha, the most commonly used is 0.05. The
reason for this is partly that consensus shows this level is appropriate, and partly that it has historically been accepted
as the standard.
The alpha value gives us the probability of a type I error. Type I errors occur when we reject a null hypothesis
that is actually true. Thus, in the long run, for a test with a level of significance of 0.05 = 1/20, a true null
hypothesis will be rejected, on average, one out of every 20 times.
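This long-run claim can be checked by simulation. The following is a minimal Python sketch with made-up population numbers: we draw many samples from a population where the null hypothesis is true and count how often a two-tailed z-test at alpha = 0.05 rejects it.

```python
import math
import random

random.seed(0)   # fixed seed so the run is reproducible

mu0, sigma, n = 50.0, 10.0, 30   # population where H0: mu = 50 is TRUE
z_critical = 1.96                # two-tailed critical value for alpha = 0.05
trials = 2000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_critical:
        rejections += 1   # a Type I error: rejecting a true H0

print(rejections / trials)   # close to 0.05, i.e. about 1 in 20
```

The observed rejection rate hovers near 0.05, matching the interpretation of alpha as the Type I error probability.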
P-Values
The other number that is part of a test of significance is a p-value. A p-value is also a probability, but it comes
from a different source than alpha. Every test statistic has a corresponding probability or p-value. This value is
the probability of obtaining a statistic at least as extreme as the one observed, assuming that the null hypothesis is true.
Since there are a number of different test statistics, there are a number of different ways to find a p-value.
For some cases we need to know the probability distribution of the population.
The p-value of the test statistic is a way of saying how extreme that statistic is for our sample data. The smaller
the p-value, the more unlikely the observed sample.
Statistical Significance
To determine if an observed outcome is statistically significant, we compare the values of alpha and the p-value. There are two possibilities that emerge:
 The p-value is less than or equal to alpha. In this case we reject the null hypothesis. When this happens we
say that the result is statistically significant. In other words, we are reasonably sure that there is something
besides chance alone that gave us an observed sample.
 The p-value is greater than alpha. In this case we fail to reject the null hypothesis. When this happens we
say that the result is not statistically significant. In other words, our observed
data can plausibly be explained by chance alone.
The implication of the above is that the smaller the value of alpha is, the more difficult it is to claim that a result
is statistically significant. On the other hand, the larger the value of alpha is the easier is it to claim that a result
is statistically significant. Coupled with this, however, is the higher probability that what we observed can be
attributed to chance.
What Level of Alpha Determines Statistical Significance?
Not all results of hypothesis tests are equal. A hypothesis test or test of statistical significance typically has a
level of significance attached to it. This level of significance is a number that is typically denoted with the
Greek letter alpha. One question that comes up in statistics class is, “What value of alpha should be used for our
hypothesis tests?”
The answer to this question, as with many other questions in statistics, is, "It depends on the situation." We will
explore what we mean by this. Many journals throughout different disciplines define statistically significant
results as those for which the p-value falls below alpha = 0.05, or 5%. But the main point to note is that there is not a universal
value of alpha that should be used for all statistical tests.
Commonly Used Levels of Significance
The number represented by alpha is a probability, so it can take any value between zero and one. Although in
theory any number in this range can be used for alpha, when it comes to statistical
practice this is not the case.
Of all levels of significance the values of 0.10, 0.05 and 0.01 are the ones most commonly used for alpha. As
we will see, there could be reasons for using values of alpha other than the most commonly used numbers.
Level of Significance and Type I Errors
One consideration against a “one size fits all” value for alpha has to do with what this number is the probability
of. The level of significance of a hypothesis test is exactly equal to the probability of a Type I error. A Type I
error consists of incorrectly rejecting the null hypothesis when the null hypothesis is actually true. The smaller
the value of alpha, the less likely it is that we reject a true null hypothesis.
There are different instances where it is more acceptable to have a Type I error. A larger value of alpha, even
one greater than 0.10 may be appropriate when a smaller value of alpha results in a less desirable outcome.
In medical screening for a disease, consider the possibilities of a test that falsely tests positive for a disease with
one that falsely tests negative for a disease. A false positive will result in anxiety for our patient, but will lead to
other tests that will determine that the verdict of our test was indeed incorrect.
A false negative will give our patient the incorrect assumption that he does not have a disease when he in fact
does. The result is that the disease will not be treated. Given the choice we would rather have conditions that
result in a false positive than a false negative.
In this situation we would gladly accept a greater value for alpha if it resulted in a tradeoff of a lower likelihood
of a false negative.
Level of Significance and P-Values
A level of significance is a value that we set to determine statistical significance. This ends up being the
standard by which we measure the calculated p-value of our test statistic.
To say that a result is statistically significant at the level alpha just means that the p-value is less than alpha. For
instance, for a value of alpha = 0.05, if the p-value is greater than 0.05, then we fail to reject the null hypothesis.
There are some instances in which we would need a very small p-value to reject a null hypothesis. If our null
hypothesis concerns something that is widely accepted as true, then there must be a high degree of evidence in
favor of rejecting the null hypothesis. This is provided by a p-value that is much smaller than the commonly
used values for alpha.
Conclusion
There is not one value of alpha that determines statistical significance. Although numbers such as 0.10, 0.05 and
0.01 are values commonly used for alpha, there is no overriding mathematical theorem that says these are the
only levels of significance that we can use. As with many things in statistics we must think before we calculate
and above all use common sense.
What is a P-Value?
Hypothesis tests or tests of significance involve the calculation of a number known as a p-value. This number is
very important to the conclusion of our test. P-values are related to the test statistic and give us a measurement
of evidence against the null hypothesis.
Null and Alternative Hypotheses
Tests of statistical significance all begin with a null and an alternative hypothesis. The null hypothesis is the
statement of no effect or a statement of the commonly accepted state of affairs. The alternative hypothesis is what
we are attempting to prove. The working assumption in a hypothesis test is that the null hypothesis is true.
Test Statistic
We will assume that the conditions are met for the particular test that we are working with. A simple random
sample gives us sample data. From this data we can calculate a test statistic. Test statistics vary greatly
depending upon what parameters our hypothesis test concerns. Some common test statistics include:
• z-statistic, for hypothesis tests concerning the population mean, when we know the population standard
deviation.
• t-statistic, for hypothesis tests concerning the population mean, when we do not know the population
standard deviation.
• t-statistic, for hypothesis tests concerning the difference of two independent population means, when we do
not know the standard deviation of either of the two populations.
• z-statistic, for hypothesis tests concerning a population proportion.
• Chi-square statistic, for hypothesis tests concerning the difference between an expected and actual count
for categorical data.
Calculation of P-Values
Test statistics are helpful, but it can be more helpful to assign a p-value to these statistics. A p-value is the
probability that, if the null hypothesis were true, we would observe a statistic at least as extreme as the one
observed. To calculate a p-value we use the appropriate software or statistical table that corresponds with our
test statistic.
For example, we would use a standard normal distribution when calculating a z test statistic.
Values of z with large absolute values (such as those over 2.5) are not very common and would give a small p-value. Values of z that are closer to zero are more common, and would give much larger p-values.
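For a z statistic this calculation can be done directly, without a table, using the complementary error function from Python's standard math module. A minimal sketch:

```python
import math

def two_sided_p_from_z(z):
    # P(|Z| >= |z|) for a standard normal Z, using the identity
    # 1 - Phi(x) = 0.5 * erfc(x / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

print(two_sided_p_from_z(2.5))   # roughly 0.012, a small p-value
print(two_sided_p_from_z(0.5))   # roughly 0.62, a large p-value
```

As described above, the extreme value z = 2.5 yields a small p-value, while z = 0.5 yields a large one.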
Interpretation of the P-Value
As we have noted, a p-value is a probability. This means that it is a real number between 0 and 1. While a test
statistic is one way to measure how extreme a statistic is for a particular sample, p-values are another way of
measuring this.
When we obtain a given statistical sample, the question that we should always ask is, "Is this sample the way it is
by chance alone, with a true null hypothesis, or is the null hypothesis false?" If our p-value is small, then this
could mean one of two things:
• The null hypothesis is true, but we were just very lucky in obtaining our observed sample.
• Our sample is the way it is due to the fact that the null hypothesis is false.
In general, the smaller the p-value, the more evidence that we have against our null hypothesis.
How Small Is Small Enough?
How small of a p-value do we need in order to reject the null hypothesis? The answer to this is, “It depends.” A
common rule of thumb is that the p-value must be less than or equal to 0.05, but there is nothing universal about
this value.
Typically, before we conduct a hypothesis test, we choose a threshold value. If we have any p-value that is less
than or equal to this threshold, then we reject the null hypothesis. Otherwise we fail to reject the null hypothesis.
This threshold is called the level of significance of our hypothesis test, and is denoted by the Greek letter alpha.
There is no value of alpha that always defines statistical significance.
How to Construct a Confidence Interval for the Population Variance
One of the goals of inferential statistics is to
estimate an unknown population parameter from a
statistical sample. The estimate that we obtain is
an interval of potential values, and is called a
confidence interval. Attached to the interval is a
level of confidence, indicating the reliability of
our estimate. One parameter that we may want to
estimate is the variance. The variance is a
measurement of variability, or in other words, how spread out a data set is. We will see the steps and the theory
behind the construction of a confidence interval for a population variance.
Assumptions
It is always a good idea to clearly state what assumptions we need to make in order to move forward. We assume
that we are working with a simple random sample of size n from a normally distributed population. Or we assume that our
sample size is large enough that we can invoke the central limit theorem.
Chi-Square Random Variable
Since a variance is always nonnegative, the sample variance cannot have a normal distribution. Instead, using some theory from
mathematical statistics, given our assumptions the following quantity is a chi-square random variable with n - 1 degrees
of freedom:
(n - 1)s² / σ²
Here s² is the sample variance and σ² is the population variance.
Confidence Interval
For a two-sided 1 - α confidence interval, we locate the row of a chi-square table that corresponds with our number of degrees of
freedom. Next we read two numbers from this row. The first, denoted by A, is the table value with probability
α/2 to the left. The second table value, denoted by B, is the table value with probability α/2 to the right.
This means that a proportion 1 - α of our chi-square distribution lies between these two numbers. This gives us:
A < (n - 1)s² / σ² < B
Since we want an interval for σ², we divide through by (n - 1)s² to rearrange our inequality:
A / [(n - 1)s²] < 1 / σ² < B / [(n - 1)s²]
Taking reciprocals, which reverses the inequalities, gives us the following confidence interval:
[(n - 1)s²] / B < σ² < [(n - 1)s²] / A.
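Putting the interval together is then just arithmetic. The following is a minimal Python sketch with a made-up sample variance; the values A and B are the standard chi-square table values for 9 degrees of freedom at a 95% confidence level.

```python
n = 10      # sample size, so n - 1 = 9 degrees of freedom
s2 = 4.0    # sample variance from made-up data

# Chi-square table values for 9 degrees of freedom and a 95%
# confidence level (alpha = 0.05): A has probability 0.025 to its
# left, B has probability 0.025 to its right.
A = 2.700
B = 19.023

lower = (n - 1) * s2 / B
upper = (n - 1) * s2 / A
print(lower, upper)   # roughly 1.89 to 13.33
```

Note the asymmetry: the sample variance 4.0 does not sit at the center of the resulting interval.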
Note on Symmetry
Many other confidence intervals are of the form estimate +/- margin of error.
These confidence intervals, such as those for a population mean, are symmetric about the estimate that is used.
Confidence intervals for the variance do not have this property. Variances are always nonnegative, and a chi-square random variable is too. Furthermore, a chi-square distribution is not symmetric.