Chapter 08
Introduction
Hypothesis testing may best be summarized as a decision making process in which one attempts to arrive
at a particular conclusion based upon "statistical" evidence. A typical hypothesis test contains two
contradicting statements about the value of a population parameter of interest. These statements are called
the null hypothesis, denoted by H0, and the alternative hypothesis (a.k.a. the research hypothesis), denoted
by H1. Since these two hypotheses contradict each other, at least one of them must be false. Hypothesis testing
is the statistical process used to decide which statement (or hypothesis) appears to be true and which
appears to be false. The evidence we use to determine which hypothesis is correct arrives in the form of
randomly sampled data from the population (or populations) of interest.
The first step of any hypothesis test is to establish the null and alternative hypotheses, which in turn will
help us to determine exactly what we are testing. In practice, the researcher is responsible for setting up the
null and alternative hypotheses based on the type of research conducted. For us, setting up the null and
alternative hypotheses will stem from the careful interpretation of a "claim" found in the description of
a given research statement or question. Typically, a claim is associated with the alternative hypothesis, but is occasionally associated with the null
hypothesis. If the wording of the claim suggests equality of any kind then it is associated with the null
hypothesis. For example, if the claim states a parameter is "equal to", "greater than or equal to" (at least),
or "less than or equal to" (at most) a given value, then it is associated with the null hypothesis. Alternately,
if the claim specifically lacks equality, that is, states a parameter is less than, greater than, or unequal to a
given value, then the claim is associated with the alternative hypothesis (i.e., H1 will only contain <, >, or ≠).
For instance, if we claim "the average surface temperature of the water in the North Atlantic in September
is greater than 38°F" (i.e. µ > 38), then this claim addresses the alternative hypothesis because "more than"
does not imply equality. For this claim, the alternative hypothesis will look like H1: µ > 38.
When the claim is associated with the alternative hypothesis, a null hypothesis needs to be devised so that
it contradicts the alternative hypothesis. An easy way to accomplish this is to simply set the parameter
equal to the value already specified in the alternative hypothesis. For example, a null hypothesis for the
current alternative hypothesis could simply be stated as H0: µ = 38.
Another option for the null hypothesis would be to use H0: µ ≤ 38, which would also contradict the
alternative hypothesis and still implies equality. However, in an attempt to simplify this process, we will
use the equal sign (=) instead of the less than or equal to sign (≤) or the greater than or equal to sign (≥).
Let's look at another example. Suppose a researcher makes the following claim: "The mean age of a person
diagnosed with type II diabetes is less than 29 years of age." In this instance, the claim made by the
researcher is again the alternative hypothesis because "less than" does not imply equality. Here,
the alternative hypothesis will look like H1: µ < 29.
Additionally, if a researcher makes the claim: "The mean age of a person diagnosed with type II diabetes is
greater than 29 years of age," then the alternative hypothesis becomes H1: µ > 29.
If a claim indicates a parameter (or parameters) is not equal to some value, as in the statement "The mean
age of a person diagnosed with type II diabetes is not 29 years of age.", the alternative hypothesis would be
listed as H1: µ ≠ 29.
We now need to fit a null hypothesis to each of these three alternative hypotheses. Fortunately, by
utilizing straight equality in the null hypothesis, we can create a single null hypothesis that can be used
with any of the three previous alternative hypotheses, that being H0: µ = 29.
Regardless of which of the previous three alternative hypotheses we want to test, this one null hypothesis
contradicts all of them.
The reason we can get away with a single null hypothesis for any of the three previously mentioned
alternative hypotheses is because the remaining steps involved in a hypothesis test are determined by the
type of inequality used in the alternative hypothesis, and not those used in the null hypothesis. This is
precisely the reason why we are able to use the equals sign (=) exclusively in the null hypothesis, and let
the alternative hypothesis contain either <, >, or ≠.
Occasionally, the null hypothesis is specified in the "claim" and the alternative hypothesis has to be created
to fit the research scenario. For example, consider the claim "the mean age of men getting married for the
first time is at least 25 years old". The phrase "at least 25" implies "greater than or equal to 25". Thus, the
alternative hypothesis needs to contradict this statement and therefore should be written such that the
parameter is "less than 25". Again, it is the alternative hypothesis that plays a roll in the completion of the
hypothesis test, so whether the null hypothesis utilizes the "greater than or equal to" sign, or just the "equal
to" sign, the test will remain the same. As a result, either of the following sets of hypotheses would be
http://webcom2.grtxle.com/introtostats/index.cfm?pageid=10288[10/12/2009 3:15:08 PM]
Chapter 08
correct for this claim. However, as mentioned before, for the sake of simplicity, we will utilize the null
hypothesis than contains only the equal sign.
Additional examples of setting up the null and alternative hypotheses through interpretation of different
claims about various parameters are given here.
Claim: The mean body temperature of a healthy adult is not 98.6 degrees F.
The claim "is not 98.6" implies this statement is associated with the alternative hypothesis, which
will contain the not equal to symbol. Thus, as a consequence, the null hypothesis will contain the
"equal to" symbol. The only correct set of hypotheses is H0: µ = 98.6 versus H1: µ ≠ 98.6.
Claim: The mean monthly student loan payment of graduates from the University of Oklahoma is
thought to be more than $340.
Since the claim states "more than," with no mention of equality, it must be the alternative hypothesis. Thus, appropriate null and alternative hypotheses are H0: µ = 340 and H1: µ > 340.
Claim: The population proportion of Democrats who will vote against their own party in the
upcoming election is less than 0.10.
Since the statement contains the phrase "less than", with no mention of equality, it is referencing the
alternative hypothesis. Also notice the claim is about a proportion, not a mean. Appropriate null and
alternative hypotheses for this scenario are H0: P = 0.10 and H1: P < 0.10.
Claim: The mean temperature in Nashville during the month of July is at least 84 degrees.
In this case the claim that the temperature is "at least" 84 degrees contains equality because "at least"
means "greater than or equal to". Therefore, this claim addresses the null hypothesis, and thus, the
alternative hypothesis must consist of "less than 84". Accordingly, the null and alternative hypotheses
can be written as H0: µ = 84 and H1: µ < 84.
Claim: The standard deviation associated with the number of text messages sent by teenagers per day
in the US equals 16.
This claim states that the standard deviation associated with US teenagers and their texting habits
equals 16, suggesting the statement is associated with the null hypothesis. Note that there was no
mention of "greater than", "less than", "at most", or "at least" anywhere in the statement. This means
the only way the alternative hypothesis can contradict the claim is if it contains a "not equal to"
sign. Consequently, the null and alternative hypotheses are H0: σ = 16 and H1: σ ≠ 16.
We can also make claims about two (or more) population parameters. An example might be
"in September, the mean surface temperatures for the North Atlantic will not equal the mean surface
temperature of the North Pacific", which represents the alternative hypothesis and is written as H1: µ1 ≠ µ2. Note that the null hypothesis that contradicts our alternative hypothesis is H0: µ1 = µ2. Hypothesis tests like these, involving two or more parameters, will be the focus of
future chapters.
To recap, for each of the hypothesis tests conducted in this chapter (and in the following chapters), three
simple rules can be referenced to help us with our construction of the null and alternative hypotheses.
1. The null hypothesis is always associated with an equals sign.
2. The alternative hypothesis never contains an equal sign, but contains either <, >, or ≠.
3. The null and alternative hypotheses always contradict each other.
Chapter 8.2
The Origin of Hypothesis Testing
Regardless of whether the claim coincides with the null or alternative hypothesis, when conducting a
hypothesis test, we always assume the null hypothesis is true and test the reliability of the null hypothesis
with sample data. The reason we test the null hypothesis is because, by assuming the null hypothesis is
true, we are able to utilize the pre-established properties of sampling distributions. The basic idea of
testing the null hypothesis involves using sample data to calculate a statistic that estimates the parameter of
interest. Then, based on the proximity of the statistic in relation to the parameter, we decide whether or not
there is sufficient evidence to conclude the null hypothesis is false. These underlying ideas regarding
hypothesis testing might best be encapsulated by considering an example.
The University President Example: Suppose the president of a university hypothesizes that the average age
of students attending her university is 20.5 years. The president's claim (or hypothesis) implies equality, so
the null and alternative hypotheses are H0: µ = 20.5 and H1: µ ≠ 20.5, respectively. Note
that because no specific indication of testing for less than, greater than, at most, or at least was given, the
alternative hypothesis must be µ ≠ 20.5.
One way to investigate our null hypothesis is to take a random sample of students from the university and
calculate their average age. Recall, due to sampling error, values of the sample mean, x̄, vary from
sample to sample, and serve only as a "good guess" of the value of the population mean. We do not expect
the sample mean, x̄, to equal the population mean, µ, but, if the null hypothesis is true, we do expect a vast
majority of x̄'s to be reasonably close to µ = 20.5. Therefore, if the value of the sample mean age is close
to 20.5, we will have little evidence to suggest that µ = 20.5 is not a viable statement. However, if the
value of x̄ begins to deviate substantially from 20.5, then we start to question the legitimacy of the null
hypothesis.
This raises the question: just how far does x̄ need to deviate from the value of µ stated in the null
hypothesis before we begin to suspect the null hypothesis is incorrect? The answer to this question
depends greatly upon the spread of the distribution of x̄'s. Fortunately, by making use of results provided
by the central limit theorem, the spread of the distribution of x̄'s can be estimated. Recall, if n is at least
30, the sampling distribution of the x̄'s will be approximately normally distributed with a mean
of µ and a standard deviation of σ/√n.
For instance, suppose a random sample of 30 university students was selected and their average age was
found to be 20.8. Additionally, assume it was known that the value of σ is 2.4 years. Thus, the standard
deviation of the sampling distribution is 2.4/√30, or 0.438 years. More importantly, if the president's claim
that the average age of the students at her university is 20.5 years is true, we would expect about 95% of
the sample means to be within 1.96(0.438) of 20.5, or between 19.64 and 21.36 years. This is shown in
Figure 8.1, with 95% of the sample means falling between the red vertical lines.
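For readers who want to verify these numbers, here is a minimal Python sketch of the same computation (Python and SciPy are assumptions of this sketch; the text itself relies on figures and interactive tools):

```python
# A minimal sketch of the university-president example, assuming the
# population standard deviation (2.4 years) is known and n = 30.
from math import sqrt

from scipy.stats import norm

mu0 = 20.5        # hypothesized mean age under H0
sigma = 2.4       # assumed known population standard deviation
n = 30            # sample size

se = sigma / sqrt(n)                  # standard deviation of the sampling distribution, about 0.438
z = norm.ppf(0.975)                   # about 1.96
lower, upper = mu0 - z * se, mu0 + z * se
print(f"standard error = {se:.3f}")
print(f"middle 95% of sample means: ({lower:.2f}, {upper:.2f})")   # about (19.64, 21.36)

for xbar in (20.8, 21.5):
    inside = lower <= xbar <= upper
    print(f"sample mean {xbar}: {'fail to reject' if inside else 'reject'} the null hypothesis")
```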
If the value of our sample mean, x̄, falls within this central 95% of the sampling distribution, we fail
to reject the null hypothesis because we assume the difference between x̄ and µ is the result of sampling
error (recall, sampling error is the reason different samples provide different sample means). That is,
when the value of our sample mean is between the values which define the central 95% of the sampling
distribution, we lack sufficient evidence to suggest the true mean is not 20.5. This is equivalent to saying,
based on the evidence provided by our sample mean, we do not have enough evidence to reject the null
hypothesis.
In this instance, our sample mean of 20.8 clearly falls among the commonly expected values of x̄, as
indicated by the green line in Figure 8.2. Therefore, a sample mean of 20.8 provides us with an
insubstantial amount of evidence against the null hypothesis, meaning there is insufficient evidence to
suggest the average age of students at the university is not 20.5 years.
Notice the last statement said nothing about evidence in support of the null hypothesis; we just don't have
enough evidence to say that it is false. That is, the difference between the sample mean and the
hypothesized population mean was not large enough to "really" convince us otherwise.
On the other hand, if the value of our sample mean, x̄, was much further away from 20.5, like 21.5, we
should be less inclined to believe that the null hypothesis is true. Namely, the veracity of the null
hypothesis would be in question because the value of the sample mean (21.5) is so extreme compared to
the stated value of 20.5 that there appears to be more than just sampling error present. This would indicate
that the true average age is not 20.5, but probably some value larger than 20.5.
In fact, we can see from Figure 8.3 that the sample mean of 21.5 is not located within the middle 95% of the
distribution. Instead, it deviates greatly from the hypothesized center of 20.5. Although it is possible that
sampling error is the only cause of this sample mean being so extreme, the probability is very small.
Therefore, instead of assuming the large distance between the sample mean and hypothesized population
mean is the rare case of extreme sampling error, we instead adopt the more believable idea that the
population mean is really larger than 20.5 and there is just a little sampling error. When the value of a
statistic (our sample mean in this case), is beyond what would be expected due to sampling error alone, we
say that our result is statistically significant.
Chapter 8.3
Setting up a Hypothesis Test
Once the null and alternative hypotheses are established, the next step is to determine whether we are
conducting a one-tailed test or a two-tailed test, that is, whether there are one or two rejection
regions. The number of tails in our hypothesis test is determined by the alternative hypothesis because it
reveals where the test statistic needs to fall in order to reject the null hypothesis. For instance, say we
hypothesize that the average age of persons diagnosed with type II diabetes is not 29 years old, giving us
null and alternative hypotheses of H0: µ = 29 and H1: µ ≠ 29.
In this case, there are two different scenarios which allow us to reject the null hypothesis. We could reject
the idea that µ = 29 if the sample mean is very small compared to the hypothesized population mean or if
the sample mean is very large compared to the hypothesized population mean. Either way, we expect more
than just sampling error to be causing the large difference between the hypothesized and sample means (i.e.
a statistically significant difference). As a result we need to keep both tails of the sampling distribution
labeled as potential "rejection regions," where a rejection region is defined as any area of the distribution
typically not attributed to sampling error alone (see Figure 8.4). This is fittingly called a two-tailed test
because both tails are potential "rejection regions."
However, what if the claim was along the lines of "the mean age of a person who is diagnosed with type II
diabetes is less than 29 years"? In keeping with the claim, the appropriate set of null and alternative
hypotheses is H0: µ = 29 and H1: µ < 29.
From inspection of the alternative hypothesis, in order to really convince anyone that the alternative
hypothesis is true, we will need a sample mean that is much smaller than 29 such that its value is beyond
the range of sample means accounted for by sampling error. This type of situation would enable us to
suspect that the cause of the difference between the sample and hypothesized mean is due to more than just
sampling error. Recognize that sample means greater than 29 will surely not convince anyone that the
alternative hypothesis is true. We call this a one-tailed hypothesis test (or more formally, a left-tailed
hypothesis test), as the only rejection region falls in the left tail. Therefore, we only reject the null
hypothesis if the value of our sample mean finds itself in the left tail of the distribution as shown in Figure
8.5.
In a similar fashion, if the null and alternative hypotheses were stated as H0: µ = 29 and H1: µ > 29,
then the rejection region would fall to the right of the distribution because the only way to reject the null
hypothesis is to obtain a sample mean larger than 29 such that the value of the sample mean is beyond what
would be considered sampling error (see Figure 8.6).
Since the rejection region is found in the right tail, this too is a one-tailed test, but more specifically, we
call it a right-tailed hypothesis test.
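As a rough illustration of where the cutoffs for these rejection regions sit, the following sketch (assuming a standard normal sampling distribution and α = 0.05, both illustrative choices rather than values from the text) computes the critical values for left-, right-, and two-tailed tests:

```python
# Critical values that bound the rejection region(s) for a z-based test.
from scipy.stats import norm

alpha = 0.05

left_cut = norm.ppf(alpha)            # left-tailed test: reject if z <= about -1.645
right_cut = norm.ppf(1 - alpha)       # right-tailed test: reject if z >= about +1.645
two_lo, two_hi = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)   # two-tailed: beyond about -1.96 or +1.96

print(f"left-tailed cutoff:  z <= {left_cut:.3f}")
print(f"right-tailed cutoff: z >= {right_cut:.3f}")
print(f"two-tailed cutoffs:  z <= {two_lo:.3f} or z >= {two_hi:.3f}")
```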
Chapter 8.4
Making an Error
When conducting a hypothesis test, we must remember that we can never be absolutely certain which
hypothesis is the correct one. When we complete a hypothesis test, we select what we think is
the correct hypothesis, but we may be wrong. The reason we can never be absolutely certain which
hypothesis is the correct one is because we are using a sample to make an inference about an entire
population. For instance, in the example regarding the mean age of students at a university, we rejected the
null hypothesis when the sample mean (x̄ = 21.5) fell outside of the middle 95% of the distribution.
However, even when the null hypothesis is true, there is still a chance, although small (2.5% for each tail
for a total of 5%), of an estimate (our sample mean in this case) falling outside the central portion of the
distribution (the area due to sampling error). When one of these rare yet possible estimates occurs, we
reject the null hypothesis when in fact it should be retained. If we reject the null hypothesis but the null
hypothesis is correct, we have made an error called a type I error. The probability of a type I error is
denoted by α (the Greek letter alpha), where α is also called the "level of significance" or the "type I error
rate". The value of α is easily obtained, as it is the researcher (you) who gets to decide what this value
is. The value selected by the researcher always corresponds to the area in the rejection region(s). Thus, if
we decide to use the middle 95% of the sampling distribution to account for sampling error, then that
leaves 5% in the tails, or α = 0.05. Just as confidence intervals should never have a level of confidence
lower than 90% and only rarely one greater than 99%, the value of α should never rise above 0.10 and only rarely fall
below 0.01.
Regardless of the value we select for α, we need to determine this value of α before collecting our data
and conducting our hypothesis test. If we let the results of the hypothesis test influence which alpha level
we choose, what will keep us from selecting an alpha level that supports the result "we" desire, instead of
the results given by our test? Thus, the level of alpha is always chosen a priori, "before the fact," and never
ex post facto, or "after the fact."
A second type of error arises whenever we fail to reject the null hypothesis, but the null hypothesis is
actually false. This type of error is called a type II error. The probability of a type II error is denoted β (the
Greek letter beta). For instance, in reference to the mean age of a college student example, a type II error
can occur when the sample mean falls within the middle 95% of the sampling distribution, such as x̄ = 20.8,
but the population mean is not 20.5, as specified in the null hypothesis. Because the deviation of this
sample mean from the hypothesized population mean could be attributed entirely to sampling error, we
would have no substantial reason to reject the null hypothesis. Therefore, even if the true population mean
is a value other than 20.5, the null hypothesis will not be rejected and a type II error will be committed.
The table below summarizes the errors (and the non-errors) one can make when conducting a hypothesis
test.

                        H0 is true              H0 is false
Reject H0               Type I error (α)        Correct decision (power)
Fail to reject H0       Correct decision        Type II error (β)
Unfortunately, we can never be sure of the exact value of β because calculating it requires us to know the
true value of µ when the null hypothesis is wrong. If we knew the real value of µ, we would not
bother conducting a hypothesis test. Although we will not be able to directly calculate the probability of a
type II error, we will discuss ways in which the chance of committing a type II error can be reduced.
Chapter 8.5
Power
The complement of the probability of a type II error is called power. Power is the probability of rejecting
the null hypothesis when indeed the null hypothesis is false. Thus, power represents the probability of
making a good decision. Although power will not be discussed in detail in this text, the concept of power
is important. If our hypothesis test has high power, then we will be more likely to make the correct
decision of rejecting a false null hypothesis. Also, when making comparisons between different statistical
tests designed to accomplish the same goal, the one with the highest power is generally preferred.
Mathematically, power is denoted as 1 − β, where β is the probability of a type II error.
Chapter 8.6
Relationships Between Type I Error, Type II Error, and Power
In this section we will discuss how the probability of type I error, the probability of type II error, and
power are related to one another. To do so, turn your attention to Figure 8.7, where the blue curve (on the
left) represents the distribution with respect to the null hypothesis, which in this case is centered at zero
(i.e., µ = 0). Likewise, the red curve (on the right) represents the distribution specified in the alternative
hypothesis, H1: µ ≠ 0 (or more specifically, µ = 2). In reality we would never know the specific value
of the parameter under the alternative hypothesis (µ = 2 in this case), but it makes it easier to discuss the
relationship between power, the probability of a type I error, and the probability of a type II error when this
value is known.
To better understand the interconnections between type I error, type II error, and power, we need to adhere
to the following rules. When referencing type I error, we are assuming the null hypothesis is the correct
hypothesis. Therefore, when discussing type I error, we will consider only the blue curve in Figure 8.7.
When referencing type II error or power, we are assuming the alternative hypothesis is the correct
hypothesis; thus, we will consider only the red curve in Figure 8.7.
When the value of a sample mean, x̄, falls between the vertical green lines, we will not reject the null
hypothesis. In this case, if the null hypothesis is true, then no error has been made. However, if the sample
mean falls to the left of the lower green line or the right of the upper green line and the null hypothesis is
true, we will incorrectly reject the null hypothesis and a type I error will be committed. The probability of a
type I error (denoted α) is represented by the area under the blue curve outside the green lines. In Figure
8.7 this area is labeled "Type I Error" and also constitutes the rejection regions.
On the other hand, if the true value of the population mean is two (i.e., the alternative hypothesis is correct)
and the sample mean falls between the green lines, then we fail to reject the null hypothesis and a type II
error is committed. The type II error rate, β (i.e., the probability of committing a type II error), is
represented by the area under the red curve that falls between the green lines. In Figure 8.7 this area is
labeled "Type II Error". Finally, if the position of the sample mean falls to the left of the lower green line,
or the right of the upper green line, then no error has been committed. In fact, if this happens, it is desirable
as we are rejecting a false null hypothesis in support of a true alternative hypothesis. As stated earlier, the
probability of correctly rejecting the null hypothesis is power, and is represented by the area under the red
curve to the left of the lower green line and the right of the upper green line. We can see from Figure 8.7 that almost all of the power falls to the right of the upper green line, which makes sense, as the value of the
mean under the alternative hypothesis is greater than the mean under null hypothesis (two is greater than
zero). Consequently, if the value of the mean under the alternative hypothesis was smaller than the value of
the mean given by the null hypothesis, then most of the power would be found under the red curve and to
the left of the lower green line.
In our continued quest to see how type I error, type II error, and power are all related, consider Figure 8.8,
which is similar to Figure 8.7, except the type I error rate has been reduced (i.e. the vertical green
lines have been moved further apart).
It is a common misconception to think that reducing the probability of a type I error is always beneficial. How
could it be a bad thing to lower your chance of committing an error? The problem with reducing α, the
type I error rate, is that you simultaneously increase the probability of a type II error. As displayed in
Figure 8.8, when the type I error rate decreases, the area under the blue curve and between the green lines
increases. When this happens, the area between the green lines and under the red curve also
increases, consequently increasing β. In addition, when β increases, power decreases. This diminishes our
ability to correctly reject the null hypothesis when it is false. It is also worth noting how much the
probability of a type II error increased from a very small decrease in the probability of a type I error. By
comparing Figure 8.7 and Figure 8.8, notice it was not an equal exchange. For this example, when α was
decreased only a little, β increased substantially.
Additionally, as displayed in Figure 8.9, if we reduce the type II error rate to increase power, we in turn
increase the type I error rate.
This is yet another example of "you can't get something for nothing." If you reduce the type I error rate,
your type II error rate increases and power decreases. Similarly, if your type II error rate is decreased and
power increased, you do so at the cost of increasing the type I error rate. This is why it is common to use a
type I error rate that strikes a "happy middle ground", like say 0.05. A type I error rate of 0.05 is small
enough to minimize the chance of rejecting the null hypothesis incorrectly, but large enough to ensure a
relatively manageable type II error rate along with hopefully providing a decent amount of power. In
general, it is recommended that we select a type I error rate (a.k.a. level of significance) between 0.1 and
0.01.
When a hypothesis test is a one-tailed test instead of a two-tailed test, the areas representing the type I
error rate, the type II error rate, and power are all on one side of the distribution. Recall, a one-tailed test
places the entire type I error rate (or rejection region) into either the left or right tail of the distribution. For
example, the location of the type I error rate, type II error rate, and power for a right-tailed test are
displayed in Figure 8.10.
To see firsthand the relationship between type I error, type II error, and power, activate the Alpha-Beta
interactive tool below.
Click here to use the Alpha-Beta Tool.
To utilize this interactive tool, you can control not only the type I error rate but also the sample size, the
standard deviation of the distributions, and the distance between the means hypothesized in the null and
alternative hypotheses. Once all of these values are determined, the interactive tool displays the probability
of a type II error and the power. This tool can be used to investigate how altering the values of these
variables influences the probability of a type II error and power. However, when using this tool, keep in
mind that the only variables the researcher would have control over in the "real world" are the sample size,
the level of α, and possibly whether a one- or two-tailed test is conducted. The standard deviation
would only be estimated after the sample is taken while the distance between the means, the probability of
a type II error, and power are never known. In the interactive tool, the variables that a researcher would
have control over are coded in blue while the variables that would generally be unknown to the researcher
are coded in red. Upon activating the interactive tool, it may be helpful to increase your understanding of
the relationship between type I error, type II error, and power by answering the following questions.
For a set standard deviation and distance between the means:
1. Which has higher power, a one-tailed or a two-tailed test?
2. What happens to the probability of type II error as the probability of a type I error is decreased?
3. What happens to the power as the probability of a type I error is decreased?
4. What happens to the probability of a type II error as the sample size is increased?
5. What happens to the power as the sample size increases?
For a set sample size and level of alpha:
1. What happens to the probability of a type II error as the standard deviation increases?
2. What happens to the power as the standard deviation increases?
3. What happens to the probability of a type II error as the distance between the means increases?
4. What happens to the power as the distance between the means increases?
Answers:
1. The one-tailed test has higher power.
2. The probability of a type II error increases.
3. The power decreases.
4. The probability of a type II error decreases.
5. The power increases.
___________________________________
1. The probability of a type II error increases.
2. The power decreases.
3. The probability of a type II error decreases.
4. The power increases.
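If the interactive tool is unavailable, the relationship described in the answers above can be reproduced with a short sketch. It uses the two-tailed setting of Figure 8.7 (null mean 0, true mean 2); the σ = 5 and n = 25 values below are illustrative assumptions, not numbers given in the text:

```python
# A rough stand-in for the Alpha-Beta tool: a two-tailed z test of H0: mu = 0
# when the true mean is 2, showing how shrinking alpha inflates beta.
from math import sqrt

from scipy.stats import norm

mu0, mu_true = 0.0, 2.0   # hypothesized mean and (for illustration) the true mean
sigma, n = 5.0, 25        # assumed population standard deviation and sample size
se = sigma / sqrt(n)      # standard error of the sample mean

for alpha in (0.10, 0.05, 0.01):
    # rejection-region cutoffs under the null hypothesis (the green lines)
    cut = norm.ppf(1 - alpha / 2) * se
    lower, upper = mu0 - cut, mu0 + cut
    # type II error: the sample mean lands between the cutoffs even though mu = mu_true
    beta = norm.cdf(upper, loc=mu_true, scale=se) - norm.cdf(lower, loc=mu_true, scale=se)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Running the sketch shows beta rising (and power falling) as alpha is reduced, which is exactly the trade-off illustrated in Figures 8.7 through 8.9.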
Chapter 8.7
Hypothesis Tests about a Population Mean, the Right Way!
In section 8.2, we discussed one method of determining which hypothesis appears to be true. Recall, in
section 8.2 the mean age of students at a university was thought to be 20.5 years. To test this theory, a
random sample of 30 students was selected and their mean age was determined. If the sample mean age fell
in the middle 95% of the sampling distribution, we failed to reject the null hypothesis. Whereas, if the
sample mean fell in either tail (the rejection regions), the null hypothesis was rejected and the alternative
hypothesis was thought to be correct. While this method accomplishes the goal, it does not reflect
the process used by researchers. First, in section 8.2, we assumed we knew the population standard
deviation, σ. In more authentic situations, σ will almost never be known and must be estimated with the
sample standard deviation, s. Similar to constructing confidence intervals (see section 7.14), when s is used
to estimate σ, we base our calculations on the t-distribution instead of a z-distribution. In addition, we do
not usually utilize the sampling distribution alone to determine if our sample mean is "extreme enough" to
reject the null hypothesis, as was done in section 8.2. Although this method suffices, it is missing a key
aspect of hypothesis testing: a value that measures the "strength of the evidence against the null
hypothesis".
For instance, in the example from section 8.2 the null hypothesis, H0: µ = 20.5, was rejected when the
sample mean was 21.5 because it fell into one of the rejection regions. But, this process never really
sample mean was 21.5 because it fell into one of the rejection regions. But, this process never really
indicates just how "far out" the sample mean was compared to the hypothesized population mean, or more
importantly, how uncommon it would be to get a sample mean as extreme or even more extreme than 21.5
if the null hypothesis was true. Obviously the evidence was strong enough to reject the null hypothesis, but
think about how much stronger the evidence would have been if the sample mean turned out to be say 25.8
(if 21.5 is extreme, then 25.8 is really extreme).
Introducing the P-Value
In order to mathematically state just how strong, or weak, the evidence is against the null hypothesis, we
calculate the probability of getting a sample mean (or any other estimate for that matter) that is at least as
extreme as the one obtained assuming the null hypothesis is true. This probability is called the p-value,
and is used to determine the degree to which the null hypothesis is either rejected or retained.
Graphically, the p-value is the area in the tail(s) beyond the sample mean. Regardless of where the sample
mean falls in the sampling distribution, if the test is a left-tailed test, then the p-value is associated with the
area to the left of the sample mean as shown in Figure 8.11.
Similarly, if the test is a right-tailed test, the p-value can be graphically represented by the area to the right
of the sample mean, as shown in Figure 8.12.
Two-tailed tests are approached differently because the definition of the p-value states "…at least as
extreme as the one obtained, in the direction of the alternative hypothesis...". Thus, when the alternative
hypothesis indicates the rejection region is in both tails (≠), the p-value is related to the area that extends
outward towards both tails, beyond not just the sample mean, but beyond the location of its complement
(mirror image for a symmetrical distribution) called the "pseudo" sample mean. An example of a two-
tailed test is illustrated in Figure 8.13, where the solid green line represents the actual sample mean and the
dashed green line represents the corresponding pseudo sample mean. The p-value is then found by
combining the area to the left of the pseudo mean and to the right of the actual mean. Of course, if the
actual mean is positioned in the left tail instead of the right, then the p-value is found by combining the
area to the left of the sample mean with the area to the right of the pseudo sample mean.
Note that if the sample mean is not in the rejection region, then the area beyond the sample mean and the
pseudo sample mean will be larger than the area defined by the rejection region (which is equal to the level
of significance, or α). Therefore, if the p-value is larger than the level of significance, we will fail to reject
the null hypothesis. However, if the sample mean falls in a rejection region, then the area beyond the
sample mean and pseudo sample mean will be less than the level of significance, causing us to reject the
null hypothesis. Note that due to the inclusion of the pseudo sample mean, the p-value for a two-tailed test
will always be twice as large as the p-value for an equivalent one-tailed test.
In general, smaller p-values (smaller probabilities) indicate stronger evidence against the null hypothesis
while larger p-values indicate weaker evidence against the null hypothesis. Thus, if your p-value is very
small (smaller than say 0.001) then you have very strong evidence in support of rejecting the null
hypothesis. Keep in mind however, that regardless of the determined p-value, we always reject the null
hypothesis if the p-value is smaller than the stated level of significance. The p-value simply gives us an
"idea" of how strong our evidence is against the null hypothesis. Determining the P-Value
To determine the p-value, (i.e., the strength of the evidence against the null hypothesis), we must first
convert the sample mean into a "test statistic" by standardizing it. This standardized value is called a test
statistic because it is the value that is used to test the null hypothesis. A sample mean can be standardized
(turned into a test statistic) by using Equation 8.1:
t = (x̄ − µ) / (s/√n)     (Equation 8.1)
Once the test statistic is found, we can then find the p-value using an appropriate interactive tool (or table).
This process is similar to the method used in chapter
6, where we standardized x-scores by transforming them into z-scores and then found the probabilities
associated with our z-scores.
An example problem (or two) will hopefully make this process clearer, but first, let's consider the situations
in which equation 8.1 provides us with reliable p-values.
The assumptions for conducting a hypothesis test on a population mean (i.e., what we need to assume if we
are to use Equation 8.1 appropriately) are:
1. The data was collected via a simple random sample.
2. The sample size must be large enough to ensure an approximately normal sampling distribution.
According to the central limit theorem, we need a sample size adequate enough to make the sampling
distribution approximately normal. Generally a sample size of 30 will suffice.
Oxygen Intake Example: A research scientist claims that the mean oxygen intake per breath for smokers is
less than 40.6 ml/kg. Based on a sample of 35 smokers the mean oxygen intake was found to be 39.2 ml/kg
with a sample standard deviation of 3 ml/kg. Obviously the sample mean, 39.2 ml/kg is less than the
stated 40.6 ml/kg. But, is this difference statistically significant, i.e. large enough to statistically convince
us that the mean oxygen intake is less than 40.6 ml/kg? Stated differently, is there enough evidence to
support the claim based on a level of significance of α = 0.05?
Step 1: Determine the Null and Alternative Hypotheses
Because the researcher believes the oxygen intake is less than 40.6 ml/kg, but not equal to it, the
claim is the alternative hypothesis. Therefore the null and alternative hypotheses can be written
as H0: µ = 40.6 and H1: µ < 40.6, respectively.
The alternative hypothesis indicates a left-tailed test is to be conducted, meaning the entire
rejection region falls on the left side of the distribution. Since the level of significance is set at
0.05, the area in the rejection region will be 0.05. If the p-value turns out to be less than or
equal to 0.05, the null hypothesis should be rejected, otherwise the null hypothesis should be
retained.
Step 2: Calculate the Test Statistic
Using Equation 8.1, determine the test statistic (the t-score) by standardizing the sample mean of
39.2 ml/kg with a sample standard deviation of 3 ml/kg:
t = (39.2 − 40.6) / (3/√35) ≈ −2.76
Step 3: Find the p-value
Next, determine the correct degrees of freedom for which the t-distribution is applicable, and
then find the area to the left of the test statistic. This is equivalent to finding the probability of
obtaining a sample mean as extreme or more extreme than 39.2 when the true mean is assumed
to be 40.6 ml/kg.
The correct degrees of freedom are n - 1 = 34. The probability of being to the left of 39.2 (or
being to the left of a t-score of -2.76 based on 34 degrees of freedom) can be found using the
interactive p-value calculator for a t-distribution. This application (along with others)
is available in the floating menu at the right of the screen.
To utilize the interactive p-value calculator, we first set the degrees of freedom to the correct
value (34 for the current example). Once the degrees of freedom are set, we determine whether
we are conducting a left-tailed test, a right-tailed test, or a two-tailed test. Since we are
currently conducting a left-tailed test, we will utilize the box marked "Left Tailed". From here,
to find the p-value associated with our test statistic, we adjust the slider (corresponding to the
p-value) until the value of the test statistic (-2.76) matches the value in the box labeled
"Left Tailed" (if we cannot get exactly -2.76, then get as close as possible). Once the value in
the box labeled "Left Tailed" contains the value of the test statistic, the corresponding pvalue can be found from inspection of the "Corresponding p-value" box. The p-value for this
example turns out to be about 0.005.
Step 4: Decide which Hypothesis Appears to be True
Since the p-value is 0.005, the probability of getting a sample mean as extreme as 39.2
assuming the population mean is 40.6 is about 0.005 (not very likely). Since this probability is
so small, we are led to believe that the null hypothesis is false. Therefore we reject the null
hypothesis and favor the alternative hypothesis instead. In summary, since the p-value is
smaller than the stated alpha level of 0.05, we reject the null hypothesis in favor of the
alternative hypothesis.
Step 5: Write a Statement(s) Explaining Your Conclusion
An example of a good concluding statement would be: "Based on a sample of 35 smokers, there
is sufficient evidence (p-value = 0.005) to reject the notion that the mean oxygen intake for a
smoker is at least 40.6 ml/kg (the null hypothesis), and therefore, we conclude that the mean
oxygen intake for smokers is lower than 40.6 ml/kg (the alternative hypothesis)."
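The entire Oxygen Intake Example can also be checked in a few lines of code; the sketch below (Python with SciPy, used here in place of the interactive p-value calculator) reproduces Steps 2 through 4:

```python
# Left-tailed one-sample t test for the Oxygen Intake Example.
from math import sqrt

from scipy.stats import t

mu0 = 40.6      # hypothesized mean oxygen intake (ml/kg) under H0
xbar = 39.2     # sample mean
s = 3.0         # sample standard deviation
n = 35          # sample size
alpha = 0.05

t_stat = (xbar - mu0) / (s / sqrt(n))     # Equation 8.1, about -2.76
p_value = t.cdf(t_stat, df=n - 1)         # left-tailed test: area to the left of the test statistic

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")   # about 0.005
print("reject H0" if p_value <= alpha else "fail to reject H0")
```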
Professor Salary Example: A study conducted by the American Chemical Society claims that the average
annual salary of tenured chemistry professors is $70,000. To test this claim, the Association of American
Chemistry Professors randomly selected 52 tenured chemistry professors and found their mean salary to be
$70,150 along with a standard deviation of $900. Investigate the claim using an alpha level of 0.05 (the
level of significance).
Step 1: Determine the Null and Alternative Hypotheses
Since there was no indication of less than or greater than, the alternative hypothesis must contain the
not-equal sign. Therefore, the null and alternative hypotheses are H0: µ = 70,000
and H1: µ ≠ 70,000, respectively. As a result, this is a two-tailed test.
Step 2: Calculate the Test Statistic
The sample mean was $70,150. Using the stated standard deviation of $900 and the sample size of
52, we calculate the test statistic using Equation 8.1:
t = (70,150 − 70,000) / (900/√52) ≈ 1.20
Step 3: Find the P-Value
After determining the degrees of freedom, we can find the p-value using the interactive p-value
calculator for a t-distribution. Since we are conducting a two-tailed test, and the test statistic is
positive, we find the p-value based on the "Upper two-tailed" critical value (if the test statistic was
negative we would utilize the "Lower two-tailed" critical value). Thus, based on 51 degrees of
freedom, the p-value that corresponds to a test statistic of t = 1.2 is about 0.24.
Step 4: Decide which Hypothesis Appears to be True
Because the p-value, 0.24, is larger than the stated alpha level of 0.05 we fail to reject the null
hypothesis. The data does not provide us with sufficient evidence to suggest the null hypothesis is
false as the difference between $70,150 and the hypothesized $70,000 was not statistically
significant. The probability of getting a sample mean as extreme as $70,150 when the population
mean was thought to be $70,000 is 0.24, meaning such a result will happen in about one out of every
four samples, which is quite common.
Step 5: Write a Statement(s) Explaining Your Conclusion
An example of a good concluding statement would be: "Based on a sample of 52 tenured chemistry
professors and a 0.05 level of significance, we do not have enough evidence (p-value = 0.24) to
reject the claim that the mean salary of tenured chemistry professors is $70,000".
The same results can be obtained by using a statistical software package such as Minitab. For instance, the
Minitab output that corresponds to our Professor Salary Example provides us with, among other things, the
value of the test statistic and the corresponding p-value. Regardless of whether we use Minitab or conduct the test "by hand," our decision
remains the same, i.e., since the p-value is greater than 0.05 we fail to reject the null hypothesis.
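As a rough stand-in for such software output, the following sketch computes the same test statistic and two-tailed p-value from the summary data (SciPy is assumed here; the original example refers to Minitab):

```python
# Two-tailed one-sample t test for the Professor Salary Example, from summary statistics.
from math import sqrt

from scipy.stats import t

mu0, xbar, s, n = 70_000, 70_150, 900, 52

t_stat = (xbar - mu0) / (s / sqrt(n))        # about 1.20
p_value = 2 * t.sf(abs(t_stat), df=n - 1)    # two-tailed: double the single-tail area

print(f"t = {t_stat:.2f}, p-value = {p_value:.2f}")   # about t = 1.20, p = 0.24
```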
Chapter 8.8
Hypothesis Testing about a Population Proportion
Not only can we test values of hypothesized population means, but we can also test the values of
hypothesized population proportions as well. Recall, proportions specify the part or percent of a population
that has a specific characteristic or trait. For instance, we might hypothesize about the population
proportion of registered voters who will participate in an upcoming election, or, hypothesize about the
proportion of grizzly bears in Denali National Park that are male.
There are very few differences between conducting a hypothesis test for a population proportion in
comparison to conducting a hypothesis test for a population mean (see section 8.7). This is because the
basic steps in conducting a hypothesis test on a population proportion are essentially the same as for a
population mean. Additionally, all previously discussed terms retain their exact meaning, such as type I
error, p-value, power, etc. The only difference is, we are concentrating on proportions instead of means, so
consequently, the null and alternative hypotheses will be statements about a population proportion instead
of a population mean. It is also the case that our test statistic will be calculated based
on proportions, and assumes the shape of a z-distribution (similar to when confidence intervals about a
population proportion were created). Thus, our test statistic will take the form of a z-score, where the
equation for the test statistic is given in Equation 8.2:
z = (p̂ − P) / √(P(1 − P)/n)     (Equation 8.2)
In Equation 8.2, P is the value of the population proportion (our hypothesized value), and p̂ is the sample
proportion, which is defined as the number of observations with the trait/characteristic of interest, x, out of
the number in our sample, n, i.e., p̂ = x/n. However, before working on some examples, we need to
first consider the requirements necessary for Equation 8.2 to provide reliable results.
The assumptions for conducting a hypothesis test on a population proportion are:
1. The data was collected via a simple random sample.
2. The sample size must be large enough to ensure an approximately normal distribution. An easy rule of
thumb is if both nP > 15 and n(1-P)>15 are true, then the sample size is adequate. Since we are testing the
null hypothesis, we must use the proportion stated in the null hypothesis and not the sample proportion
when checking this assumption (other textbooks claim that 15 can be replaced by 10 or as little as 5,
although research hints otherwise).
Voter Example: A political analyst states that fewer than 30% of the voting population will vote in an
upcoming city election. To support his claim, he randomly samples 400 registered voters in the city and
determines that 98 plan to vote in the upcoming election. Conduct an appropriate hypothesis test for the
political analyst using a level of significance of 0.10.
Step 1: Determine the null and alternative hypotheses
Because the political analyst stated that less than 30% will vote (no equality implied), the claim must
be the alternative hypothesis. Therefore the null and alternative hypotheses can be written as
H0: P = 0.30 and H1: P < 0.30, respectively. Note, it is the political analyst's goal to disprove the null
hypothesis, thus verifying his statement.
Step 2: Check the assumptions
Since P = 0.30, then (400)(0.3) = 120 and (400)(1-0.3) = 280. Since both are greater than 15, the
sample size is sufficient to use the z-distribution.
Step 3: Calculate the test statistic and p-value
The value of the sample proportion is p̂ = 98/400 = 0.245, and the test statistic is
z = (0.245 − 0.30) / √((0.30)(0.70)/400) ≈ −2.40.
Because the alternative hypothesis states "less than," this is a left-tailed test and the rejection
region is contained in the left tail only. Thus, to calculate the p-value that corresponds to the
test statistic, we must first activate the p-value calculator for the z-distribution interactive tool
(found in the floating menu to the right of the screen). Once activated, the interactive tool
should look similar to Figure 8.15.
This interactive tool works similar to the one based on the t-distribution. Since we are
conducting a left-tailed test, we will utilize the "Left Tailed Test Statistic Value" and adjust
the slider next to the "Corresponding p-value" until the "Test Statistic Value" matches our
calculated test statistic of -2.40 as shown in Figure 8.15. In this example, the p-value that
corresponds to the test statistic of approximately -2.40 is about 0.008.
Step 4: Make a decision regarding which contradicting hypothesis appears correct
Because the p-value, 0.008, is smaller than the level of significance, 0.10, we reject the null
hypothesis. Thus, the data provided sufficient evidence to discredit the null hypothesis and support
the claim of the political analyst.
Step 5: Write a statement(s) explaining your conclusion
A final statement explaining the findings of this hypothesis could be: "There was sufficient evidence
(p-value = 0.008) to statistically support the political analyst's statement, based on a random sample
of 400 voters. Therefore, we have statistical evidence that the proportion of voters who will vote in
the next election will be less than 0.30."
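The Voter Example can likewise be checked in code; the sketch below (SciPy assumed, standing in for the interactive z-distribution calculator) reproduces Steps 3 and 4:

```python
# Left-tailed test of a population proportion for the Voter Example.
from math import sqrt

from scipy.stats import norm

P0 = 0.30          # hypothesized population proportion under H0
x, n = 98, 400     # 98 of 400 sampled voters plan to vote
alpha = 0.10

p_hat = x / n                                   # 0.245
z = (p_hat - P0) / sqrt(P0 * (1 - P0) / n)      # Equation 8.2, about -2.40
p_value = norm.cdf(z)                           # left-tailed: area to the left of z

print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.3f}")   # about 0.008
print("reject H0" if p_value <= alpha else "fail to reject H0")
```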
For some additional practice, consider the next example.
College Algebra Example: At a specific university, 35% of students do not pass algebra on the first try.
The mathematics department recently selected a new algebra text and is wondering if the new text will lead
to a change in the percent of students passing the course on their first try. Eighty students using the new
textbook were randomly selected and it was later determined that 33 of them did not pass. Test if the
different textbook appears to be affecting the percentage of students who do not pass on their first attempt.
Use a level of significance of 0.05.
Step 1: Determine the null and alternative hypotheses
Because the percent of non-passing students was stated as being 35% prior to switching textbooks,
this "is" the null hypothesis. Next, since the problem clearly states the mathematics department is
looking for a change, but does not indicate that they anticipating a higher or lower percent of nonpassing students, we must assume they are looking for either an increase or a decrease in the
proportion of students who do not pass the course on their first attempt. Therefore, the null and
alternative hypotheses are
and
respectively. Step 2: Check the assumptions
Since P = 0.35, then (80)(0.35) = 28 and (80)(1-0.35) = 52. The sample size is sufficient.
Step 3: Calculate the test statistic and p-value
The value of the sample proportion is p̂ = 33/80 = 0.4125, and the test statistic is
z = (0.4125 − 0.35) / √((0.35)(0.65)/80) ≈ 1.17.
Because the alternative hypothesis states "not equal to 0.35," this is a two-tailed test as both tails contain
rejection regions. By using the p-value calculator for the z-distribution, we find the corresponding p-value
to be about 0.242 (making sure you use the "Two-Tailed Test" option).
Step 4: Make a decision regarding which hypothesis appears correct
As a result of the p-value, 0.242, being larger than the level of significance, 0.05, we will fail to
reject the null hypothesis. While the sample proportion, p̂ = 0.4125, was not very close to the stated
population proportion of P = 0.35, statistically, it was not extreme enough to convince us that the
proportion of non-passing students has changed.
Step 5: Write a statement(s) explaining your conclusion
A final statement explaining the findings of this hypothesis test could be: "With a p-value of 0.24, the
evidence provided by the sample of 80 students was not strong enough to convince us that the
proportion of non-passing students has changed from the previously stated 0.35."
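A short sketch of the College Algebra Example (again with SciPy as an assumed tool) shows how the two-tailed p-value is obtained by doubling the single-tail area:

```python
# Two-tailed test of a population proportion for the College Algebra Example.
from math import sqrt

from scipy.stats import norm

P0, x, n = 0.35, 33, 80

p_hat = x / n                                   # 0.4125
z = (p_hat - P0) / sqrt(P0 * (1 - P0) / n)      # about 1.17
p_value = 2 * norm.sf(abs(z))                   # two-tailed p-value, about 0.24

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```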
Chapter 8.9
Hypothesis Testing about a Population Variance
In the previous chapter, we learned how to create confidence intervals about a population mean, proportion,
and variance. Thus far, we have been exposed to hypothesis testing for a population mean and proportion,
and therefore, it is fitting to end this chapter with hypothesis testing about a population variance (or
standard deviation). Although testing the population variance is not utilized nearly as often as the
previously discussed types of hypothesis testing, there are occasions where it is very useful. For example, if
a machine is supposed to produce ¼ inch bolts, it would be important to make sure the mean diameter of
these bolts is very close to ¼ inch. However, if the variance is not also checked, it would be possible to
create several bolts that are much too big and several that are much too small and still have a mean around
¼ inch. If the variance is controlled, then the diameter of the bolt will remain consistent.
Fortunately, the terminology and basic steps used to conduct a hypothesis test on a variance are the same as
conducting a hypothesis test for a population mean or proportion. The only real difference is that the test
statistic and the corresponding p-value are based on the Chi-squared distribution instead of the t- or z-distributions. For a review of the Chi-squared distribution, refer back to section 7.21. The test statistic for
conducting a hypothesis test on a population variance is obtained by using Equation 8.3:
χ² = (n − 1)s² / σ0²     (Equation 8.3)
where σ0² is the variance stated in the null hypothesis.
Additionally, there are three assumptions that must hold to be assured the results given by Equation 8.3 are
valid. These assumptions for a hypothesis test on a population variance are:
1. The data was collected via a simple random sample.
2. The observations in the population are independent.
3. The population from which the sample is taken is normally distributed.
Below is an example which includes the various steps necessary for conducting a hypothesis test about a
population variance.
Gas Mileage Example: A major automobile company claims the variance of the gas mileage of their
competitor's latest sedan is greater than the previously established variance of 11.2. Based on a
random sample of 73 competitor automobiles (meaning 72 degrees of freedom), they found a variance of
17.4. Assuming the sample was taken from a population that is normally distributed, investigate the
company's claim using an alpha level of 0.01.
Step 1: Determine the Null and Alternative Hypotheses
Since the claim is that the variance of the competitor's gas mileage for a particular sedan is greater
than the established variance of 11.2, the claim is also the alternative hypothesis. Thus the null and
alternative hypotheses can be written as H₀: σ² = 11.2 and H₁: σ² > 11.2.
Step 2: Calculate the Test Statistic and P-Value
The value of the test statistic is χ² = (n − 1)s²/σ₀² = (72)(17.4)/11.2 ≈ 111.86.
To calculate the p-value that corresponds to a test statistic of 111.86, we activate the interactive
p-value calculator for a Chi-squared distribution found in the floating menu at the right of the
screen. Once activated, the interactive tool should appear similar to Figure 8.16.
Because we are focusing on the upper tail (the right tail), we set the degrees of freedom to 72
and adjust the slider so that the value of the test statistic in the appropriate box is equal to
111.86 (or as close as we can get). In this case, the value given by the interactive tool happens
to be 111.412, which has a corresponding p-value of about 0.002. Note, however, if we were
conducting a two-tailed test then we must adhere to the following rule. If the value of the test
statistic is greater than the degrees of freedom, then the upper test statistic value for a two-tailed
test is used when finding the p-value. Similarly, if the value of the test statistic is less than the
degrees of freedom, we use the lower test statistic for a two-tailed test to find the p-value.
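If software is handy, the test statistic and upper-tail p-value for this example can also be reproduced outside the interactive tool; the sketch below uses Python with scipy, which is assumed here and is not part of this text.

```python
from scipy.stats import chi2

n = 73                   # sample size (72 degrees of freedom)
s2 = 17.4                # sample variance
sigma2_0 = 11.2          # population variance stated in the null hypothesis

test_stat = (n - 1) * s2 / sigma2_0      # about 111.86
p_value = chi2.sf(test_stat, df=n - 1)   # upper-tail (right-tail) p-value, about 0.002

print(round(test_stat, 2), round(p_value, 3))
```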
Step 3: Make a Decision Regarding which Hypothesis Appears Correct
Based on a p-value of 0.002, we have enough evidence to reject the null hypothesis as 0.002 < 0.01
and therefore support the claim that the variance of the gas mileage of their competitor's latest sedan
is greater than that of the previously established variance of 11.2.
Step 4: Write a Statement(s) Explaining your Conclusion
A final statement explaining the findings of this hypothesis test could be: "With a p-value of 0.002, the
evidence given by the sample of 73 automobiles is strong enough to reject the null hypothesis and
support the claim that the variance of the gas mileage of their competitor's latest sedan is greater than
the previously established variance of 11.2."
Chapter 8.10
Hypothesis Tests and Equivalent Confidence Intervals
You may have noticed similarities between calculations involved with setting up confidence intervals and
those used for conducting hypothesis tests. In fact, the results yielded by both essentially lead us to the
same conclusion; they just go about reaching the conclusion in a different manner. Hypothesis tests use
sample data in an attempt to determine if a parameter is likely to be equal to a set value, as stated by the
null hypothesis. Confidence intervals on the other hand, use sample data to produce a range of plausible
values for a given parameter. How then, do these two different methods allow us to arrive at the same
conclusion? We will answer this question soon, but first we must ensure that our confidence interval is
“equivalent” to the hypothesis test being conducted.
Simply put, if the hypothesis test is a one-tailed test, then an equivalent confidence interval is defined as having a level of confidence given by (1 − 2α)×100% (Equation 8.4), where α is the level of significance used in the hypothesis test. Likewise, if the hypothesis test is a two-tailed test, then the confidence level of an equivalent confidence interval is (1 − α)×100% (Equation 8.5).
For instance, in the voter example (see section 8.8), the level of significance (alpha level) was set at 0.10
and a one-tailed test was conducted (it was actually left-tailed, but which tail is irrelevant). Therefore, an
equivalent confidence interval can be found using Equation 8.4 and will possess a confidence level of
(1 − 2(0.10))×100%, or 80%. Similarly, recall the hypothesis test conducted on the salaries of chemistry
professors (see section 8.7). In this example, a two-tailed test was conducted and a level of significance of
0.05 was used. Because we performed a two-tailed test, the level of confidence for an equivalent
confidence interval is found using Equation 8.5 and is (1 − 0.05)×100%, or 95%.
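Equations 8.4 and 8.5 are simple enough to wrap in a few lines of code. The helper below is purely illustrative (the function name is ours, not the text's), and it reproduces the two confidence levels just computed.

```python
def equivalent_confidence_level(alpha, two_tailed):
    """Confidence level (in percent) of the interval equivalent to a
    hypothesis test conducted at significance level alpha."""
    if two_tailed:
        return (1 - alpha) * 100        # Equation 8.5
    return (1 - 2 * alpha) * 100        # Equation 8.4

print(equivalent_confidence_level(0.10, two_tailed=False))  # voter example: 80.0
print(equivalent_confidence_level(0.05, two_tailed=True))   # chemistry salaries: 95.0
```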
The reason Equation 8.4 provides us with the correct confidence level for an equivalent confidence interval
is that it isolates the same area in a tail of the confidence interval as in the rejection region of the
hypothesis test, as defined by the level of significance. For example, if a one-tailed hypothesis test has a
level of significance of 0.01, then 1% of the area under the curve is in the rejection region (note it does not
matter if the rejection region is in the upper or lower tail). In order for a confidence interval to also have
1% of the area in one of the tails, it must be a 98% confidence interval, leaving the remaining 1% in the
other tail.1 Recall that if a two-tailed hypothesis test is conducted, the area in each of the rejection regions
is half of the level of significance. That is, if the level of significance is 0.01, then the rejection regions
(tails) will each contain 0.5% of the area under the curve. Furthermore, if 0.5% of the area is in each
tail, 99% of the area under the curve will fall between the two rejection regions. This is tantamount to
having a 99% confidence interval, where the same result is obtained by using Equation 8.5, i.e. (1 − 0.01)×100% = 99%.
For convenience, Table 8.1 displays, for both one- and two-tailed hypothesis tests, some of the more popular
levels of significance along with their respective levels of confidence for an equivalent confidence interval.
Table 8.1
When a confidence interval is equivalent to a hypothesis test, it is possible to determine the outcome of the
hypothesis test by examining the range of the confidence interval. Conversely, it is also possible to get an
idea of the relative positioning of the confidence interval by knowing the results of the hypothesis test. To
illustrate this point, we return to our voter example in which the claim that fewer than 30% of the voting
population will participate in an upcoming city election was investigated. As a reminder, the null and
alternative hypotheses were H₀: P = 0.30 and H₁: P < 0.30.
Recall from section 8.8 that, in testing the null hypothesis, the p-value was 0.008, causing us to reject the
null hypothesis and support the claim that the proportion of registered voters participating in the upcoming
city election is less than 0.30. If the same sample data was used to calculate an equivalent confidence
interval, the resulting range would be 0.217 to 0.273. Notice that each value contained within the
confidence interval consists of proportions less than 0.30. Hence, the results of our hypothesis test and
confidence interval agree in that both conclude that the population proportion of voters participating in the
upcoming election is less than 0.30. On the other hand, what if we failed to reject the null hypothesis? If
this were the case, our confidence interval would also confirm this outcome. That is, the value 0.30 would
lie within the range provided by the confidence interval, and both methods would be in agreement that 0.30
is a plausible value for the population proportion of participating voters.
[1] This is assuming we are dealing with two-sided confidence intervals such as the ones covered in chapter 7. One-sided confidence intervals (not covered in this text) have either just an upper limit or a lower limit, not both.
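As a rough check of the interval quoted above, the sketch below computes an 80% confidence interval for the voter example. The sample counts used here (98 participating voters out of n = 400, so p̂ = 0.245) are not given in this chapter; they are hypothetical values chosen only because they reproduce the reported p-value of 0.008 and the interval of roughly 0.217 to 0.273.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical sample figures consistent with the results reported from section 8.8
n, x = 400, 98
p_hat = x / n                                  # 0.245

conf_level = 0.80                              # equivalent to a one-tailed test with alpha = 0.10
z_star = norm.ppf(1 - (1 - conf_level) / 2)    # about 1.28
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 3), round(upper, 3))        # roughly 0.217 and 0.273

# The entire interval lies below 0.30, agreeing with the decision to reject H0: P = 0.30
```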
Chapter 8.11
Summary
Hypothesis tests are very useful for investigating inferences about a variety of population parameters. It is
important to understand how they work and what it means when we reject or fail to reject the null
hypothesis. Although sample data may make it seem like a null hypothesis is false (or true), the outcome of
a hypothesis test is determined by just how strong the evidence is against (or for) the null hypothesis. The
strength of the evidence is represented by the p-value, which can then be compared to the desired level of
significance. The smaller the p-value, the more significant the evidence against the null hypothesis. If a p-value is larger than the level of significance, then the difference between the estimate provided by the
sample and the parameter indicated by the null hypothesis is probably the result of sampling error.
However, if a p-value is less than or equal to the level of significance, then the "extreme" difference
between the estimate and the parameter indicated by the null hypothesis is probably due to more than just
sampling error, and consequently, we conclude that there is a statistically significant difference between the
two values, which implies that the hypothesized value of the population parameter is probably incorrect.
In the next chapter, we will examine a variety of additional hypothesis tests that involve selecting
two samples and then, by examining the results, we will make inferences about the population(s). The nice
thing is that many of the concepts and terminology we were exposed to in this chapter are easily
transferred and applied to the next.