Download Chapter 9 - The WA Franke College of Business

Chapter 9
Hypothesis Testing for Population Means and Proportions
Hypothesis testing is another method of statistical inference. In constructing confidence intervals we
wanted to get a good estimate of the location of a population mean. We did not have any idea
about what value that mean should have; we just wanted to know what it was. Often, though, we have an
idea of what value we want the mean to have or an idea of what value it should have. Go back to
Chapter 1 and read the tire manufacturer example. In that case the manufacturer has a definite idea
about what the mean lifetime of the tires should be. The managers of the tire company don't
give these instructions to the engineers: “Go ahead and develop a new tire and then we'll see what
it's like”. No, the instructions will be like this: “Develop a tire with an average lifetime of 60,000
miles”. They know what mean they want. The question to be examined is whether the tire
developed is the one they want. The company will want to market the tire if they feel it is a good
one (i.e. $\mu \ge 60000$) and redesign it if they feel it is a bad one (i.e. $\mu < 60000$). Note that the
decision to be made is whether the tire is a good one or a bad one.
9.1 A Slightly More Precise Statement
We have to select between believing one of two things: (a) the tire is good; or (b) the tire is bad.
One of these beliefs will be called the null hypothesis and designated as H 0 . The other will be called
the alternative hypothesis and designated as H A . We could represent the problem as
H 0 : The tire is good
H A : The tire is bad
We could also represent it as
H 0 : The tire is bad
H A : The tire is good
Both these representations can be put into a mathematical form. Recall that the value of
$\mu$ that we are interested in is $\mu_0 = 60000$, where $\mu_0$ indicates the hypothesized value of the mean. So
we could write the problem as
$H_0: \mu = \mu_0 = 60000$ (The tire is good)
$H_A: \mu < \mu_0$ (The tire is bad)
or as
$H_0: \mu < \mu_0$ (The tire is bad)
$H_A: \mu = \mu_0 = 60000$ (The tire is good)
Does it matter which of the two representations we use? Yes it does. There is nothing that causes
beginning students more grief than determining which should be the null and which should be the
alternative hypothesis. The author suspects that much of this grief has been caused by the arbitrary
lifting of the hypothesis testing method baldly from the scientific community. Hypothesis testing
there involves verification of scientific theories. We are not doing anything so elegant; we just want
to know what to do about the tire.
Unfortunately a lot of the nomenclature used in applying hypothesis testing came from the scientific
community along with the technique. For this course we will adopt a simple rule: Choose as the null
hypothesis the decision with a specific value. So for the tire problem we would write the decisions as
$H_0: \mu = \mu_0 = 60000$ (The tire is good)
$H_A: \mu < \mu_0$ (The tire is bad)
or, more generally, as
$H_0: \mu = \mu_0 = \text{some value}$ (decision 1)
$H_A: \mu < \mu_0$ (decision 2), or
$H_A: \mu > \mu_0$ (decision 2), or
$H_A: \mu \ne \mu_0$ (decision 2)
Note that the “=” is always in the null and that decision 2 can take one of three different forms.
9.2 Type I and Type II Errors
In the hypothesis testing decision process two different errors are possible. The first kind of error is
to reject the null hypothesis when it is true. In this case the tire is a good one ($H_0$ is true), but for
some reason we decide that it is false. If this happens we must redesign the tire, lose the sales we
could have had from marketing a good tire, and repeat the costly test procedure. The second kind of
mistake possible is to believe the null hypothesis to be true when it actually is false. In this case we
would decide to market a bad tire ($H_0$ is false). Later on the company is cursed by consumer
activists, the president gets to testify before House and Senate committees, class action lawsuits are
filed, and customers no longer believe the company's claims about the quality of its tires. Both of these
errors have formal statements:
Type I error = rejecting $H_0$ if it is true
Type II error = accepting $H_0$ if it is false
It will be convenient for what follows to write these as
Type I error = rejecting $H_0$ | $H_0$ true
Type II error = accepting $H_0$ | $H_0$ false
where the symbol | can be read as “if” (it is the notation used for conditional probability).
9.3 Decision Rules
A decision rule is a criterion used to reject or accept a hypothesis. Such a rule might
look like this: Take a sample of n=100 tires, calculate the sample mean, $\bar{X}$,
and reject $H_0$ if $\bar{X} < 57500$. Notice that the decision rule says to reject the null hypothesis if
$\bar{X} < 57500$; it does not say to reject the null only if it is false. It says reject the null whether it is true
or false. You cannot make a decision rule that says reject only if the null is false. To do that you
would have to know if it was true, and if you knew it was true it would be pretty silly to test to see if
it is true. Again the decision rule says reject the null if $\bar{X} < 57500$ and accept it if $\bar{X} \ge 57500$. If
the null is true and we just happen to get $\bar{X} < 57500$ we will reject the null hypothesis and commit a
Type I error.
We have seen what a decision rule looks like; next we want to set some criteria for determining if a
decision rule is a good one or a bad one. The above decision rule might be a good one, but on the
other hand it could be perfectly rotten. To evaluate the decision rule we need to introduce a couple of
new concepts. Let
$\alpha = P(\text{rejecting } H_0 \mid H_0 \text{ true})$
and
$\beta = P(\text{accepting } H_0 \mid H_0 \text{ false})$
For the tire problem this becomes
$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000)$
and
$\beta = P(\bar{X} \ge 57500 \mid \mu < \mu_0)$
where $\alpha$ is the lower case Greek letter “alpha” and $\beta$ is the lower case Greek letter “beta”.
We can give the following interpretation for $\alpha$: $\alpha$ is the probability that the decision rule will lead
to the rejection of $H_0$ if it is true. We can calculate the value of $\alpha$ using material already learned.
Suppose n=100 so that there is reason to think the Central Limit Theorem holds and that we know
the population standard deviation for tire lifetimes, $\sigma = 10000$, so that $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1000$. Later we
will consider the more realistic case where the population standard deviation is not known. Then
$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000)$
and we have
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{57500 - 60000}{1000} = -2.50$
so that
$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000) = P(Z < -2.50) = 0.0062$
So the probability that this decision rule will lead to the rejection of the null hypothesis when it is
true is 0.0062.
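The calculation of $\alpha$ above can be checked with a few lines of Python. This is only a sketch using the standard library's `NormalDist` in place of a Z table; the variable names are chosen here for illustration and are not part of the text.

```python
from statistics import NormalDist

# Values from the tire example; the variable names are illustrative.
mu0 = 60000        # hypothesized mean lifetime (miles)
sigma = 10000      # population standard deviation, assumed known
n = 100            # sample size
cutoff = 57500     # decision rule: reject H0 if the sample mean is below this

std_error = sigma / n ** 0.5                 # sigma / sqrt(n) = 1000
z = (cutoff - mu0) / std_error               # -2.5
alpha = NormalDist().cdf(z)                  # P(Z < -2.5)
print(f"Z = {z:.2f}, alpha = {alpha:.4f}")   # Z = -2.50, alpha = 0.0062
```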
The region in which the hypothesis is rejected (here $\bar{X} < 57500$) is often called the rejection or
critical region. Another way of looking at the above results is the following: Suppose that the null
hypothesis is true and that we performed the test a large number of times. Then we would end up
with a result in the critical region about 0.62% of the time for this problem if we use this decision
rule. We have not yet considered the calculation of $\beta$, which will be a more complicated proposition
than that of calculating $\alpha$.
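The "performed the test a large number of times" reading can be imitated by simulation. The sketch below assumes the null hypothesis is true and draws each sample mean directly from its sampling distribution (normal, mean 60000, standard error 1000); the number of trials and the random seed are arbitrary choices made here.

```python
import random

random.seed(0)                     # fixed seed so the run is repeatable
mu0, std_error, cutoff = 60000, 1000, 57500
trials = 200_000

# Draw each sample mean directly from its sampling distribution under H0
# and count how often the decision rule (wrongly) rejects H0.
rejections = sum(random.gauss(mu0, std_error) < cutoff for _ in range(trials))
rejection_rate = rejections / trials
print(f"rejected H0 in {rejection_rate:.2%} of trials")
```

The rejection rate should land close to the 0.62% figure, with some run-to-run variation if the seed is changed.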
Consider again these terms for the tire problem:
$\alpha = P(\bar{X} < 57500 \mid \mu = \mu_0 = 60000)$
$\beta = P(\bar{X} \ge 57500 \mid \mu < \mu_0)$
Note that we can calculate $\alpha$ because we know what value of $\mu$ to use, that is we use $\mu = 60000$ in
the calculation of $\alpha$. But what value of $\mu$ should be used in the calculation of $\beta$? There exists a
value of $\beta$ for every value of $\mu$ less than 60000. In other words there are an infinite number of
values of $\beta$, one for each value of $\mu < 60000$. What we must do is calculate $\beta$ for several
alternative values of $\mu$, a “what if” sort of exercise. So we would calculate $\beta$ for $\mu$=59000, for
$\mu$=58000, for $\mu$=57000, etc. Suppose $H_A: \mu = 59000$, then
$\beta = P(\text{accepting } H_0 \mid H_0 \text{ false})$
$\beta = P(\bar{X} \ge 57500 \mid \mu = 59000)$
Again we use the Z-score to find this probability
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{57500 - 59000}{1000} = -1.50$
and
$\beta = P(Z \ge -1.5) = 1 - 0.0668 = 0.9332$
Now suppose the alternative is $H_A: \mu = 57000$, then
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{57500 - 57000}{1000} = 0.50$
and
$\beta = P(\text{accepting } H_0 \mid H_0 \text{ false})$
$\beta = P(\bar{X} \ge 57500 \mid \mu = 57000) = P(Z \ge 0.5) = 1 - P(Z < 0.5) = 0.3085$
In the first case we can say that the probability that the decision rule will lead us to falsely accept the
null hypothesis if $\mu$=59000 is 0.9332. But if $\mu$ actually has a value of 57000 there is a probability
of 0.3085 that the decision rule will lead us to accept the null hypothesis, making a Type II error.
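The “what if” exercise for $\beta$ lends itself to a small function. The following is a sketch under the same assumptions as the text (standard error 1000, accept $H_0$ when the sample mean is at least 57500); the function name `beta` is made up here.

```python
from statistics import NormalDist

std_error = 1000      # sigma / sqrt(n) with sigma = 10000, n = 100
cutoff = 57500        # accept H0 when the sample mean is at or above this

def beta(true_mean, cutoff=cutoff, std_error=std_error):
    """P(accepting H0 | mu = true_mean) = P(Xbar >= cutoff)."""
    z = (cutoff - true_mean) / std_error
    return 1 - NormalDist().cdf(z)

for mu in (59000, 58000, 57000):
    print(mu, round(beta(mu), 4))
# 59000 0.9332
# 58000 0.6915
# 57000 0.3085
```

The closer the true mean is to 60000, the harder it is for this rule to detect that the null is false, so $\beta$ grows.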
Two things should be apparent from the above discussion. First of all, the calculation of $\beta$ can get
complicated. The second thing is that $\alpha$ and $\beta$ do not sum to one. So far we have
Decision rule: Reject $H_0$ if $\bar{X} < 57500$
$\alpha = 0.0062$
$H_A: \mu = 57000$
$\beta = 0.3085$
Nothing says we can't change the decision rule to see if we might get results that we like better.
After all, the value of $\beta$ is much larger than that for $\alpha$. Let's try a different decision rule: reject
$H_0$ if $\bar{X} < 58000$.
$\alpha = P(\bar{X} < 58000 \mid \mu = \mu_0 = 60000)$
$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{58000 - 60000}{1000} = \frac{-2000}{1000} = -2$
$\alpha = P(Z < -2) = 0.0228$
$\beta = P(\bar{X} \ge 58000 \mid \mu = 57000)$
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{58000 - 57000}{1000} = \frac{1000}{1000} = 1$
$\beta = P(Z \ge 1) = 0.1587$
Putting the information for both decision rules in one table:
Decision rule                           $\alpha$    $H_A$            $\beta$
Reject $H_0$ if $\bar{X} < 57500$       0.0062      $\mu = 57000$    0.3085
Reject $H_0$ if $\bar{X} < 58000$       0.0228      $\mu = 57000$    0.1587
The new decision rule reduces the size of $\beta$ at the cost of increasing $\alpha$. Is there any way that we
can make both of them smaller at the same time? Yes, increase the sample size.
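The trade-off, and the effect of a larger sample, can be seen by computing both error probabilities for several rules. In the sketch below the third case, n = 400, is an assumption added here purely to illustrate the last remark; it is not a value from the text.

```python
from statistics import NormalDist

Z = NormalDist()
mu0, mu_alt, sigma = 60000, 57000, 10000

def error_rates(cutoff, n):
    """Return (alpha, beta) for the rule 'reject H0 if Xbar < cutoff'."""
    std_error = sigma / n ** 0.5
    alpha = Z.cdf((cutoff - mu0) / std_error)          # reject when H0 is true
    beta = 1 - Z.cdf((cutoff - mu_alt) / std_error)    # accept when mu = mu_alt
    return alpha, beta

for cutoff, n in [(57500, 100), (58000, 100), (58000, 400)]:
    a, b = error_rates(cutoff, n)
    print(f"cutoff={cutoff}, n={n}: alpha={a:.4f}, beta={b:.4f}")
```

With n = 400 the standard error drops to 500, and both $\alpha$ and $\beta$ for the 58000 rule fall below their n = 100 values.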
9.4 Selecting a Good Decision Rule
This section will discuss some of the considerations which should be used to determine a good
critical region. As seen in the discussion and problems above, changing the critical region changes
both $\alpha$ and $\beta$. These two probabilities do not sum to one, but generally reducing $\alpha$ will tend to
increase $\beta$ and vice versa. In other words decreasing the probability of a Type I error tends to
increase the probability of a Type II error.
The costs of these errors should enter into the consideration of the decision rule. If the cost of a
Type I error is large relative to that of a Type II error this suggests that $\alpha$ should be small relative to
$\beta$. On the other hand, if the situation is reversed this indicates that $\alpha$ should be large relative to $\beta$.
After all, if the cost of a Type I error is zero, why would you care if you made one? So determining
the costs of these errors is a very important part of the decision making process. How this is done is
beyond the scope of this course and will not be pursued further, except in the most general terms. It
should always be kept in mind however. Determining these costs is management's responsibility. If
management does not know these costs then it should not engage in this sort of testing.
The level of significance is a term often given to $\alpha$. So far we have determined the decision rule and
then calculated the level of significance. In practice this process is reversed. The level of
significance is first determined (after careful consideration of the costs of Type I and Type II errors,
of course). The level of significance is then used to determine the critical region.
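Reversing the process means starting from $\alpha$ and solving for the cutoff. The sketch below does this for the tire problem; the choice $\alpha = 0.05$ is an assumption made here for illustration, and `NormalDist().inv_cdf` plays the role of reading a Z table backwards.

```python
from statistics import NormalDist

mu0, sigma, n = 60000, 10000, 100
alpha = 0.05                           # chosen for illustration only

std_error = sigma / n ** 0.5
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value, about -1.645
cutoff = mu0 + z_crit * std_error      # reject H0 if the sample mean is below this
print(f"reject H0 if Z < {z_crit:.3f}, i.e. Xbar < {cutoff:.0f}")
```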
To give you a feel for these considerations we will consider two examples without using numbers.
What we want is to gain a feeling for the sort of issues involved in the selection of $\alpha$ and $\beta$.
Suppose we know that a drug is effective against a certain deadly disease, say AIDS. We don't
know if the drug is safe, and this is what we want to test. The hypotheses might be
$H_0$: (The drug is safe)
$H_A$: (The drug is not safe)
What sort of considerations should go into selecting the level of significance? What are the costs of a
Type I error? If we make a Type I error here we deny the public a safe and effective drug against a
vicious disease. The company that has manufactured the drug will lose substantial profits. This
would argue for a very low value of $\alpha$; we want a very small probability of rejecting the null
hypothesis if it is true.
The cost of a Type II error is also very high. If the drug turns out to be unsafe a number of people
may be harmed. The company will likely face expensive lawsuits. This would argue for a very
small value of $\beta$. Extensive sampling may be required to get both $\alpha$ and $\beta$ small. To make the
picture even more murky, people may be denied a safe and effective drug during a lengthy testing
process. There are often no clear and easy answers in this business.
Suppose, however, we know the drug to be safe. What we don't know is whether it is effective.
$H_0$: (The drug is effective)
$H_A$: (The drug is not effective)
The same sort of arguments will hold for a Type I error here as in the preceding case. Again it seems that
$\alpha$ should be made small. But what about the Type II error and $\beta$? How severe are the costs of a
Type II error? Would patients really be harmed if we believe the drug is effective and it is not?
Perhaps not, if there is no alternative treatment. If there is one, this would change the cost calculation.
But in any case, should we insist on as small a value for $\beta$ as we did in the previous case? My
guess is that the costs here are not as high, and we might use a larger value for $\beta$. I am not a
medical expert however.
The choice of $\alpha$ is a management responsibility. Look again at some of the inputs needed in using
hypothesis testing for decision making in the drug examples. Inputs will be needed from the legal
department, the medical department, the marketing department, and the statistical department. Other
opinions might be needed as well. It is management's responsibility to gather these inputs and
organize the decision making process.
To understand what comes next we will recapitulate some of the material presented in earlier
chapters. If the null hypothesis is true, the sampling distribution of $\bar{X}$ will be normal
with mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$. Also recall Z-scores, $Z = (\bar{X} - \mu)/\sigma_{\bar{X}}$.
Now let's see what this means for the tire problem. If the null hypothesis is true then $\mu$=60000 and
if we took a sample of n=100 tires we would expect to get a sample mean near 60000. Let's suppose
that $\sigma = 10000$ and that when we sample we get $\bar{X}$ = 59995; then this gives
$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{59995 - 60000}{1000} = -0.005$
But suppose the null hypothesis is false and $\mu$ = 56000. Again we expect to get an $\bar{X}$ near $\mu$;
suppose we get $\bar{X}$ = 56040. Now the Z-score is
$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{56040 - 60000}{1000} = -3.96$
Note that in both cases we subtract $\mu_0$ from $\bar{X}$ rather than subtracting $\mu$. This is because we know
what $\mu_0$ is; we know because we have stated it as the hypothesized value. We can't subtract $\mu$
because we don't know what $\mu$ is.
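A very important observation can be made from this example. Small values of Z (values near zero) can be
taken as evidence that the null hypothesis is true. Large positive or negative values of Z are evidence
that it is false. This is summarized below.
                  If $H_0$ true                                          If $H_0$ false
Mean              $\mu = \mu_0$                                          $\mu \ne \mu_0$
Sample mean       $\bar{X} \approx \mu_0$                                $\bar{X} \approx \mu$, NOT $\mu_0$
Test statistic    $Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}} \approx 0$     $Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}} \ll 0$ or $Z \gg 0$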
9.5 A Hypothesis Testing Framework
In this section we will state a formal framework for testing hypotheses about the value of $\mu$. Again
recall that small values of Z tend to favor the null while large values tend to favor the alternative.
1. State the null and alternative hypotheses
$H_0: \mu = \mu_0 = \text{some value}$
$H_A: \mu < \mu_0$, or
$H_A: \mu > \mu_0$, or
$H_A: \mu \ne \mu_0$
2. Choose a level of significance $\alpha$.
3. Using this value of $\alpha$ determine the rejection region.
4. Calculate the test statistic
$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$
5. If Z is in the rejection region, reject the null hypothesis; if Z is not in the rejection
region, do not reject the null hypothesis.
[Figure 9.1: Two tailed hypothesis test, with a rejection region of area $\alpha/2$ in each tail]
The location of the rejection region depends on the form of the alternative hypothesis, that is, on which
sign of $\bar{X} - \mu_0$ counts as evidence against the null. Suppose that
$H_A: \mu \ne \mu_0$
If $\bar{X} \ll \mu_0$ (which makes Z << 0) we would tend to doubt that the null hypothesis is true. Likewise
if $\bar{X} \gg \mu_0$ (which makes Z >> 0) we would also tend to doubt the truth of the null hypothesis. This
means that large positive or negative values would make us tend to reject the null. So the rejection
region is split into two parts, one to the left of Z=0 and the other to the right of Z=0. In other words
we split $\alpha$ into two parts and put one part in each tail. Because the rejection region is located in
each tail such tests are called two tailed tests. The rejection region for a two tailed test is shown in
Figure 9.1.
[Figure 9.2: The rejection region for $H_A: \mu < \mu_0$ (a one tailed test), with all of $\alpha$ in the left tail]
Suppose that the alternative is $H_A: \mu < \mu_0$. In this case values of $\bar{X} < \mu_0$ will make us tend to
favor the alternative over the null. But values of $\bar{X} < \mu_0$ lead to values of $Z < 0$. So in this case
it is large negative values of Z that will make us consider rejecting the null in favor of the alternative.
In this case all of the $\alpha$ is located in one of the tails and the test is called a one tailed test. The
rejection region for this one tailed test is shown in Figure 9.2.
[Figure 9.3: The rejection region for $H_A: \mu > \mu_0$ (a one tailed test), with all of $\alpha$ in the right tail]
Suppose that the alternative is $H_A: \mu > \mu_0$. In this case values of $\bar{X} > \mu_0$ will make us tend to favor
the alternative over the null. But values of $\bar{X} > \mu_0$ lead to values of $Z > 0$. So in this case it is
large positive values of Z that will make us consider rejecting the null in favor of the alternative.
This is also a one tailed test. The rejection region is shown in Figure 9.3.
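The five-step framework, with its three possible rejection regions, can be collected into one function. This is a sketch rather than a definitive implementation: the function name `z_test` and the labels "less", "greater" and "two-sided" are choices made here, and the critical values come from `NormalDist().inv_cdf` instead of a Z table.

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Z test for a mean with known sigma.

    alternative is one of "less", "greater", "two-sided" (labels chosen here).
    Returns (z, reject), where reject is True if z is in the rejection region.
    """
    z = (xbar - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()
    if alternative == "less":            # H_A: mu < mu0, all of alpha in left tail
        reject = z < std_normal.inv_cdf(alpha)
    elif alternative == "greater":       # H_A: mu > mu0, all of alpha in right tail
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":     # H_A: mu != mu0, alpha/2 in each tail
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject
```

For instance, `z_test(56040, 60000, 10000, 100, 0.05, "less")` uses the tire sample mean from Section 9.4 and gives Z = -3.96, which falls in the left-tail rejection region; the 5% level in that call is an illustrative choice.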
9.6 Hints on Solving Problems
The first suggestion is that you follow the testing framework presented above. In particular you
must state the null and alternative hypotheses. If you don't do that no one else will know or care
about what you are doing. If you don't state the hypotheses you aren't doing anything. Statement of
the hypotheses is absolutely vital, and it is also what gives very many students a
whole lot of trouble. Here I will give you some hints.
1. First of all follow the advice that the null hypothesis will look like this:
$H_0: \mu = \mu_0 = \text{some value}$
2. State what you would do if you believed the null hypothesis to be true.
3. State what you would do if you believed the null hypothesis to be false.
4. Use step three to decide whether the alternative should be $H_A: \mu < \mu_0$, $H_A: \mu > \mu_0$, or
$H_A: \mu \ne \mu_0$.
Following this hint you will set up the hypotheses as follows:
$H_0$: Step 1    (Step 2)
$H_A$: Step 4    (Step 3)
This outline may seem a little backwards in that you might expect Step 3 to precede Step 4. Actually
Step 3 is decided before Step 4, but we will write Step 4 before Step 3.
Example 9.1 Let's consider the tire problem again. We will only look at how the null and
alternative hypotheses should be stated. We will look at numerical values later. Suppose the tire
manufacturer has stated that the company's tires will have an average lifetime of at least 60,000
miles. In Step 1 of the hint we said that an “=” sign would be needed. The value we see is 60,000
miles. So we would make Step 1 be the following: $H_0: \mu = \mu_0$. In Step 2 we have to state what we
would do if we believe the null to be true. We could write this as “the claim is true” or as “the tire is
good”. What you need is an expression in English of what you mean by the null being true.
Here comes the tricky but useful part. In Step 3 we state in English what we mean by the null being
false. For this example this must mean “the claim is false” or “the tire is bad”.
In Step 4 we translate the English into mathematics. If “the tire is bad” this must mean $\mu < 60000$.
All right, let's summarize this.
Step 1: $H_0: \mu = \mu_0 = 60000$
Step 2: The tire is good
Step 3: The tire is bad
Step 4: $H_A: \mu < \mu_0$
We will write this as:
$H_0: \mu = \mu_0 = 60000$ (the tire is good)
$H_A: \mu < \mu_0$ (the tire is bad)
Now let's try a slight change to the manufacturer's claim. Suppose the statement is “Our tires have an
average lifetime that exceeds 60,000 miles.” Again we need a specific value for the null, and that is
$H_0: \mu = \mu_0 = 60000$. That takes care of Step 1. But what does that mean for Step 2? Is the
manufacturer's claim true or not? The claim is that the average lifetime is greater than 60,000 miles.
So if the null is true the claim must be FALSE. So we could write for Step 2 “The claim is false”.
Step 3 then means “The claim is true”. This means the alternative (Step 4) should be written as
$H_A: \mu > \mu_0$. So we would set up the hypotheses as:
$H_0: \mu = \mu_0 = 60000$ (the claim is false)
$H_A: \mu > \mu_0$ (the claim is true)
Example 9.2 A bolt manufacturer claims that the pitch of the threads of a certain bolt is 1mm. If the
pitch is too large or too small the threads of the bolt will strip, causing damage to the part using the
bolt. A sample of n=100 bolts is measured and the mean pitch of the sample bolts is 0.98mm.
Suppose it is known that the population standard deviation is $\sigma = 0.01$ mm. Test the manufacturer's
claim with a level of significance of 5%.
The first thing to do is find the hypotheses. Step 1 is $\mu = \mu_0 = 1$ mm. Step 2 is found as follows: if
we think $\mu$ really is equal to 1mm then the claim is true. Step 3 must be the claim is false. In Step 4
we must determine whether we are dealing with a one or two tailed test. If the claim had been that the
pitch was greater than some value or less than some value we would have a one tailed test. But the
claim is that the pitch is equal to some value. If the claim is false the pitch is not equal to this value, so
the alternative hypothesis must be $\mu \ne \mu_0$. Because we would reject the null for very large
positive or negative values of Z (see the table) this is a two tailed test and we split $\alpha$ into two parts,
one part in each tail. The rejection region for this hypothesis test is shown in Figure 9.4. The
problem can be stated more formally as follows.
[Figure 9.4: The rejection region for Example 9.2, with area $\alpha/2 = 0.025$ in each tail; reject for Z < -1.96 or Z > 1.96]
$H_0: \mu = \mu_0 = 1$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.05$
Reject $H_0$ if Z < -1.96 or Z > 1.96
Data: $\bar{X} = 0.98$, $\sigma = 0.01$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.001$
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{0.98 - 1}{0.001} = -20$
Conclusion: Because Z = -20 < -1.96 is in the rejection region we will reject the null hypothesis.
These data give sufficient evidence to deny the manufacturer's claim.
Example 9.3 Work the same problem supposing that the standard deviation is 0.15 mm.
$H_0: \mu = \mu_0 = 1$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.05$
Reject $H_0$ if Z < -1.96 or Z > 1.96
Data: $\bar{X} = 0.98$, $\sigma = 0.15$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.015$
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{0.98 - 1}{0.015} = -1.33$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis. These data
do not give sufficient evidence to deny the manufacturer's claim.
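The two bolt examples differ only in the population standard deviation, so they can be run side by side. The sketch below uses the values stated in the examples; the critical value is computed with `NormalDist().inv_cdf` and comes out at about 1.96, and the `results` dictionary is just a convenience introduced here.

```python
from statistics import NormalDist

mu0, xbar, n, alpha = 1.0, 0.98, 100, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96

results = {}
for sigma in (0.01, 0.15):
    z = (xbar - mu0) / (sigma / n ** 0.5)
    results[sigma] = (z, abs(z) > z_crit)        # True means reject H0
    print(f"sigma={sigma}: Z={z:.2f}, reject H0: {abs(z) > z_crit}")
```

The same sample mean leads to rejection when the standard deviation is 0.01 mm but not when it is 0.15 mm, because the larger standard deviation makes a sample mean of 0.98 unremarkable.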
Example 9.4 A light bulb manufacturer claims that the average lifetime of its bulbs exceeds 750
hours. A sample of 36 bulbs is taken which gives a sample mean of 755 hours. The population
standard deviation is known to be 20 hours. Test the manufacturer's claim with a level of
significance of 5%.
In Step 1 set $\mu = \mu_0 = 750$. In Step 2 we decide whether the claim is true if $\mu = 750$. The claim is
that the average exceeds 750 hours, so the null hypothesis must be that the claim is false. If the
claim had been that the bulbs last at least 750 hours the null would be that the claim is true. Step 3: the
claim is true. Step 4: if the claim is true, $\mu > 750$.
The rejection region is shown in Figure 9.5.
[Figure 9.5: The rejection region for Example 9.4, with $\alpha = 0.05$ in the right tail; reject for Z > 1.65]
$H_0: \mu = \mu_0 = 750$ (the claim is false)
$H_A: \mu > \mu_0$ (the claim is true)
$\alpha = 0.05$
Reject $H_0$ if Z > 1.65
Data: $\bar{X} = 755$, $\sigma = 20$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.33$
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{755 - 750}{3.33} = 1.50$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis. These data
do not give sufficient evidence to support the manufacturer's claim.
Example 9.5 Use the above example, but let the manufacturer's claim be that the bulbs' average
lifetime is at least 750 hours.
[Figure 9.6: The rejection region for Example 9.5, with $\alpha = 0.05$ in the left tail; reject for Z < -1.65]
$H_0: \mu = \mu_0 = 750$ (the claim is true)
$H_A: \mu < \mu_0$ (the claim is false)
$\alpha = 0.05$
Reject $H_0$ if Z < -1.65
Data: $\bar{X} = 755$, $\sigma = 20$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 3.33$
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{755 - 750}{3.33} = 1.50$
Conclusion: Because Z is not in the rejection region we cannot reject the null hypothesis. These data
do not give sufficient evidence to deny the manufacturer's claim. Note that in these two examples
we reach different conclusions about the manufacturer's claims, but the claims were different.
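The two light bulb examples share one test statistic and differ only in where the rejection region sits. The sketch below makes that explicit; the variable names are illustrative, and the critical value is computed rather than read from a table, so it is about 1.645 instead of the rounded 1.65.

```python
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 755, 750, 20, 36, 0.05
z = (xbar - mu0) / (sigma / n ** 0.5)            # 5 / 3.33 = 1.5
upper_crit = NormalDist().inv_cdf(1 - alpha)     # about 1.645

# Claim "exceeds 750": H_A: mu > mu0, reject H0 only for large positive Z.
reject_exceeds = z > upper_crit
# Claim "at least 750": H_A: mu < mu0, reject H0 only for large negative Z.
reject_at_least = z < -upper_crit
print(f"Z = {z:.2f}; reject (exceeds): {reject_exceeds}; reject (at least): {reject_at_least}")
```

Neither test rejects the null, but the null means "the claim is false" in the first case and "the claim is true" in the second, which is why the conclusions about the claims differ.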
9.8 Hypothesis tests for Small Samples
The same considerations that applied to confidence intervals for small n also hold for hypothesis tests. If $\sigma$ is not
known it must be replaced with s and the t distribution must be used instead of the Z distribution. In
this case the test statistic will be
$t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}$
Example 9.6 The Flagstaff Chamber of Commerce claims that the average wage for contract
construction workers in Coconino county is $11.25 per hour. To test this claim the wages of 15
construction workers were sampled. The sample results are that the sample mean is $12.75 and the
standard deviation is $2.45. Do these data present sufficient evidence to reject the Chamber's claim?
Test using a level of significance of 1%.
$H_0: \mu = \mu_0 = 11.25$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.01$
Reject $H_0$ if t < -2.977 or t > 2.977, df = 14
Data: $\bar{X} = 12.75$, $s = 2.45$, $s_{\bar{X}} = s/\sqrt{n} = 0.633$
Test statistic: $t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{12.75 - 11.25}{0.633} = 2.37$
Conclusion: Because t is not in the rejection region we cannot reject the null hypothesis. These data
do not give sufficient evidence to deny the Chamber's claim.
Example 9.7 The cost of producing a central processor chip for a computer chip manufacturer has
been $98.00 per chip. A new process has been developed which promises to reduce chip costs. To
test this a sample of 20 chips is made and the average cost per chip in the sample is $94.00 while the
sample standard deviation is $5.00. Do these data provide sufficient evidence to conclude the new
process has reduced chip costs? Test using a level of significance of 5%.
$H_0: \mu = \mu_0 = 98$ (no change in chip cost)
$H_A: \mu < \mu_0$ (the chip cost has been reduced)
$\alpha = 0.05$
Reject $H_0$ if t < -1.729
Data: $\bar{X} = 94$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 1.12$, df = 19
Test statistic: $t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{94 - 98}{1.12} = -3.57$
Conclusion: Because t is in the rejection region we can reject the null hypothesis. These data give
sufficient evidence to conclude the new process has reduced chip costs.
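The t statistics in Examples 9.6 and 9.7 can be reproduced as follows. Python's standard library has no t table, so this sketch simply takes the critical values 2.977 and 1.729 from the examples as given; the function name `t_statistic` is made up here.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))

# Example 9.6: two tailed, critical values +/-2.977 (t table, df = 14, alpha = 0.01)
t_wage = t_statistic(12.75, 11.25, 2.45, 15)
reject_wage = abs(t_wage) > 2.977

# Example 9.7: one tailed (left), critical value -1.729 (t table, df = 19, alpha = 0.05)
t_chip = t_statistic(94.00, 98.00, 5.00, 20)
reject_chip = t_chip < -1.729

print(f"wages: t = {t_wage:.2f}, reject H0: {reject_wage}")
print(f"chips: t = {t_chip:.2f}, reject H0: {reject_chip}")
```

Because the code does not round the standard error to 1.12 first, the chip statistic comes out near -3.58 rather than -3.57; the conclusion is unaffected.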
9.9 p-values
An easier way of computing hypothesis testing results is to use p-values. For a one tailed test, the
p-value is the area in the tail to the right or left of the actual value of the test statistic t. In
the case of a two tailed test the p-value will be the area in both tails.
[Figure 9.7: The p-value for a two tailed test. The test statistic will give a value of t or -t but the
area is put in both tails, p/2 in each. Here t lies beyond the critical values, so we could reject the null hypothesis.]
A plot of a p-value for a two tailed test is shown in Figure 9.7. Here is an example where $p < \alpha$.
This can only be true if the value of t is in the rejection region.
[Figure 9.8: The p-value for a two tailed test where t lies between the critical values, so the null hypothesis would not be rejected.]
Figure 9.8 shows the situation where $p > \alpha$. This can only occur if the value of t is not in the
rejection region.
If $p > \alpha$ then we cannot reject the null hypothesis. If $p < \alpha$ we can reject the null
hypothesis. Caution: some computer packages produce p-values for two tailed tests and
some produce p-values for one tailed tests. Make sure you find out which is which.
Excel gives you a choice of one or two tailed p-values. The Excel worksheet function to use is
TDIST(t value, degrees of freedom, tails),
where tails = 1 for a one tailed test and tails = 2 for a two tailed test. Also note that you must use a
positive value for the t value (if it is negative use the absolute value).
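For readers without Excel, the tail area that TDIST reports can be approximated in plain Python. The sketch below is not how Excel computes it; it integrates the t density numerically with Simpson's rule, and the function names `t_pdf` and `t_p_value` and the number of steps are assumptions made here. The t values used are ones that appear in this chapter's examples.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_p_value(t, df, tails, steps=2000):
    """Tail area beyond |t|, doubled when tails == 2 (mirrors TDIST's arguments)."""
    t = abs(t)
    h = t / steps
    # Simpson's rule for the area under the density between 0 and |t|.
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return tails * (0.5 - area)

p_one_tail = t_p_value(1.80914, 4, 1)   # compare with TDIST(1.80914,4,1)
p_two_tail = t_p_value(2.37, 14, 2)     # compare with TDIST(2.37,14,2)
print(round(p_one_tail, 4), round(p_two_tail, 4))
```

As with TDIST, the sign of t is ignored, so the direction of a one tailed test still has to be checked separately.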
Example 9.6 (again, but this time use p-values)
$H_0: \mu = \mu_0 = 11.25$ (the claim is true)
$H_A: \mu \ne \mu_0$ (the claim is false)
$\alpha = 0.01$
Data: $\bar{X} = 12.75$, $s = 2.45$, $s_{\bar{X}} = s/\sqrt{n} = 0.633$
Test statistic: $t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{12.75 - 11.25}{0.633} = 2.37$
Note this is a two tailed test so tails = 2 in Excel
p-value = 0.032692 =TDIST(2.37,14,2)
So $p = 0.0327 > \alpha = 0.01$ and we cannot reject the null hypothesis.
Example 9.7 (again, but this time use p-values)
$H_0: \mu = \mu_0 = 98$ (no change in chip cost)
$H_A: \mu < \mu_0$ (the chip cost has been reduced)
$\alpha = 0.05$
Data: $\bar{X} = 94$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 1.12$, df = 19
Test statistic: $t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{94 - 98}{1.12} = -3.57$
Note this is a one tailed test so tails = 1 and
p-value = 0.001022 =TDIST(3.57,19,1)
Here $p = 0.0010 < \alpha = 0.05$ so we can reject the null hypothesis.
Example 9.8 A new process has been developed to produce synthetic diamonds. The process is
quite costly and will only be profitable if it can produce diamonds with an average weight of at least
1 carat. Because of the cost, a sample of only n = 5 diamonds is produced, with weights of
X={0.99,1.01,0.97,0.98,0.99} carats. Use Excel and p-values to determine if the process is
profitable. Use a 1% level of significance.
We have
$H_0: \mu = \mu_0 = 1$ (the process is profitable)
$H_A: \mu < \mu_0$ (the process is not profitable)
$\alpha = 0.01$
x bar      0.988           =AVERAGE(0.99,1.01,0.97,0.98,0.99)
S          0.014832397     =STDEV(0.99,1.01,0.97,0.98,0.99)
S xbar     0.006633072     =0.014832/SQRT(5)
t          -1.809136137    =(0.988-1)/0.006633
p-value    0.072345948     =TDIST(1.80914,4,1)
Because $p = 0.0723 > \alpha = 0.01$, these data do not provide sufficient evidence to reject the null
hypothesis at this level of significance. Suppose, however, the data had been
X={0.99,0.98,0.97,0.98,0.99} carats. In that case
x bar      0.982           =AVERAGE(0.99,0.98,0.97,0.98,0.99)
S          0.0083666       =STDEV(0.99,0.98,0.97,0.98,0.99)
S xbar     0.003741657     =0.0083666/SQRT(5)
t          -4.810702852    =(0.982-1)/0.003741657
p-value    0.004290467     =TDIST(4.8107,4,1)
In this case $p = 0.00429 < \alpha = 0.01$ and the data do provide sufficient evidence to reject the null
hypothesis.
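The spreadsheet rows for Example 9.8 can be reproduced from the raw weights. In this sketch `statistics.mean` and `statistics.stdev` stand in for AVERAGE and STDEV (both use the n - 1 divisor for the standard deviation); the helper name `t_from_sample` is made up here.

```python
from math import sqrt
from statistics import mean, stdev

def t_from_sample(weights, mu0):
    """Return (xbar, s, standard error, t) for H0: mu = mu0."""
    xbar = mean(weights)
    s = stdev(weights)                  # sample standard deviation, like STDEV
    std_error = s / sqrt(len(weights))
    return xbar, s, std_error, (xbar - mu0) / std_error

first = t_from_sample([0.99, 1.01, 0.97, 0.98, 0.99], 1.0)
second = t_from_sample([0.99, 0.98, 0.97, 0.98, 0.99], 1.0)
for xbar, s, se, t in (first, second):
    print(f"xbar={xbar:.3f}, s={s:.6f}, se={se:.6f}, t={t:.4f}")
```

The t values agree with the worksheet to about four decimal places; the small differences arise because the worksheet rounds intermediate results before dividing.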
9.10 Tests of Proportions
We can also use this framework to perform hypothesis tests on the binomial parameter p. We do
again need to make some changes in notation. The null hypothesis will have the form
$H_0: p = p_0 = \text{some value}$. The test statistic will be
Test statistic: $Z = \frac{\hat{p} - p_0}{\sigma_{p_0}}$
where
$\sigma_{p_0} = \sqrt{\frac{p_0 q_0}{n}}$
and $q_0 = 1 - p_0$. Note that we are using Z as the test statistic, which implies that we are assuming that
the CLT holds. Here our check to see if the CLT holds is
CLT check: $n p_0 \ge 5$ and $n q_0 \ge 5$.
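The test statistic and the CLT check fit naturally into one function. This is a sketch only: the name `proportion_z` and the decision to raise an error when the check fails are choices made here, and the numbers in the usage lines are hypothetical rather than taken from the text.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(x, n, p0):
    """Return (p_hat, standard error under H0, Z) for H0: p = p0."""
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("CLT check failed: need n*p0 >= 5 and n*q0 >= 5")
    p_hat = x / n
    std_error = sqrt(p0 * q0 / n)
    return p_hat, std_error, (p_hat - p0) / std_error

# Hypothetical numbers, not from the text: 45 successes in 100 trials, p0 = 0.5
p_hat, se, z = proportion_z(45, 100, 0.5)
p_lower = NormalDist().cdf(z)        # p-value if H_A: p < p0
print(f"p_hat={p_hat}, se={se}, Z={z:.2f}, lower-tail p={p_lower:.4f}")
```

Note that the standard error uses $p_0$ and $q_0$ from the null hypothesis, not the sample proportion.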
Example 9.9 Radio station KFUD claims that at least 35% of the listening audience in its coverage
area listens to KFUD's 2:00 PM rock and roll music program. To test this claim, 500 radio listeners
are asked if they listen to KFUD at the stated time. Of those sampled, 160 say they do listen to
KFUD at that time. Do these data provide sufficient evidence to reject KFUD's claim? Test using a
5% level of significance.
$H_0: p = p_0 = 0.35$ (KFUD's claim is true)
$H_A: p < p_0$ (KFUD's claim is exaggerated)
$\alpha = 0.05$
Reject $H_0$ if Z < -1.65
Check CLT: $n p_0 = 500(0.35) = 175 \ge 5$
$n q_0 = 500(0.65) = 325 \ge 5$
Data: X=160, n=500, $\hat{p} = \frac{X}{n} = \frac{160}{500} = 0.32$
$\sigma_{p_0} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{(0.35)(0.65)}{500}} = 0.021$
Test statistic: $Z = \frac{\hat{p} - p_0}{\sigma_{p_0}} = \frac{0.32 - 0.35}{0.021} = -1.43$
Conclusion: Because Z is not in the rejection region, we cannot reject the null hypothesis. These data
do not give sufficient evidence to conclude that KFUD's claim is false. This does not mean that
KFUD's claim is true, just that the data do not provide sufficient evidence to reject the claim
using a 5% level of significance. Note the role of $\alpha$ here. With a small value such as
$\alpha = 0.05$ we are being very careful to avoid rejecting KFUD's claim unless we are fairly sure
the claim is untrue. We could have chosen $\alpha = 0.50$ and rejected the claim any time we got a
Z-value $Z < 0$. However, if KFUD is telling the truth, we would then have a 50% probability of
falsely rejecting the claim.
We could also use p-values here. We need to compute the area to the left of $Z = -1.43$ and
compare that area to $\alpha$.
p value   0.076359   =NORMDIST(-1.43,0,1,TRUE)
Since $p = 0.0764 > \alpha = 0.05$, we cannot conclude that KFUD's claim is untrue at this level
of significance. Had we set $\alpha = 0.10$, we would have rejected the null hypothesis. (Note that we
put all of $\alpha$ and all of the p-value in one tail.)
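To see how the choice of $\alpha$ moves the edge of the rejection region in a left-tailed test, the cut-off can be read from the inverse of the standard normal distribution rather than from a Z table. A small sketch using Python's `statistics.NormalDist`; the list of $\alpha$ values is chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

cutoffs = {}
for alpha in (0.01, 0.05, 0.10, 0.50):
    # Left-tailed test: reject H0 when Z falls below this value.
    cutoffs[alpha] = std_normal.inv_cdf(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if Z < {cutoffs[alpha]:.3f}")
```

At $\alpha = 0.05$ the cut-off is about -1.645, which the text rounds to -1.65, and at $\alpha = 0.50$ it is 0, matching the remark that any negative Z would then lead to rejection.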
Example 9.10 Suppose that the sample results had been X = 150 listeners of KFUD's program. In
that case $\hat{p} = 150/500 = 0.30$ and the test statistic would be

$$Z = \frac{\hat{p} - p_0}{\sigma_{p_0}} = \frac{0.30 - 0.35}{0.021} = -2.38$$

at which point we would say that we have sufficient evidence ($Z = -2.38 < -1.65$, which is in the
rejection region) to reject the null hypothesis and say we do not believe KFUD's claim. We do not
say that KFUD's claim is untrue, just that we don't believe it to be true given this evidence.
The p-value is
p value   0.008656   =NORMDIST(-2.38,0,1,TRUE)
Now $p = 0.0087 < \alpha = 0.05$, and we would reject the null hypothesis using the p-value as well. We
would have rejected the null in this case even if $\alpha$ had been as small as 0.01.
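One way to read a p-value is as the smallest level of significance at which the null hypothesis would be rejected. The sketch below takes the Z value from Example 9.10 and checks the decision at several levels; the particular levels are chosen only for illustration.

```python
from statistics import NormalDist

z = -2.38                      # test statistic from Example 9.10
p_value = NormalDist().cdf(z)  # left-tail area, as in NORMDIST(-2.38,0,1,TRUE)

# Reject H0 whenever the p-value is smaller than alpha.
decisions = {alpha: p_value < alpha for alpha in (0.10, 0.05, 0.01, 0.005)}
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'do not reject'} H0")
```

The decision flips between 0.01 and 0.005, on either side of the p-value.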
Example 9.11
Suppose that a certain process used to make computer chips is not perfect (what is?). The process is
considered to be working properly if the proportion of defectives is 5% or less. If the proportion of
defectives is greater than 5% the process is shut down for repairs. A shut down is very costly
(expense of repair, loss of output, etc.). Because of this, management feels that they should set
$\alpha$ at a very low value to avoid this cost if the process is really working properly. A sample
of 200 chips is taken and 19 are found to be defective. Using a significance level of
$\alpha = 0.02$, should the process be halted for repairs?
$H_0: p = p_0 = 0.05$ (the process is working OK)
$H_A: p > p_0$ (the process should be shut down)
$\alpha = 0.02$
Reject $H_0$ if $Z > 2.05$
Check CLT: $n p_0 = 200(0.05) = 10 \ge 5$ and $n q_0 = 200(0.95) = 190 \ge 5$
Data: $X = 19$, $n = 200$, $\hat{p} = X/n = 19/200 = 0.095$
$\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{(0.05)(0.95)/200} = 0.0154$
Test statistic: $Z = (\hat{p} - p_0)/\sigma_{p_0} = (0.095 - 0.05)/0.0154 = 2.92$
Conclusion: Because Z is in the rejection region, we reject the null hypothesis. This means we
shut down the process for repairs (possibly we have made an error here, but it is not very likely).
The p-value we get from this is
p value   0.00175   =1-NORMDIST(2.92,0,1,TRUE)
and, since $p = 0.00175 < \alpha = 0.02$, we would reject the null using the p-value as well. Note
that we had to use the rightmost tail of the distribution.
Example 9.12 Suppose that we had found X = 15 defective items in the sample for the previous
problem. What would the conclusion have been then?
$H_0: p = p_0 = 0.05$ (the process is working OK)
$H_A: p > p_0$ (the process should be shut down)
$\alpha = 0.02$
Reject $H_0$ if $Z > 2.05$
Check CLT: $n p_0 = 200(0.05) = 10 \ge 5$ and $n q_0 = 200(0.95) = 190 \ge 5$
Data: $X = 15$, $n = 200$, $\hat{p} = X/n = 15/200 = 0.075$
$\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{(0.05)(0.95)/200} = 0.0154$
Test statistic: $Z = (\hat{p} - p_0)/\sigma_{p_0} = (0.075 - 0.05)/0.0154 = 1.62$
The value of the test statistic is not in the rejection region, so this set of data does not present
sufficient evidence to reject the null. The p-value in this case is
p value   0.052616   =1-NORMDIST(1.62,0,1,TRUE)
and $p = 0.0526 > \alpha = 0.02$, so these data do not give sufficient evidence to reject the null
hypothesis at this level of significance.
You must be careful when using p-values here to determine whether you have a one- or two-tailed test.
Excel's TDIST function lets you select a one- or two-tailed test through its last argument. NORMDIST
does not, so for Z tests you have to choose the correct tail (or double the tail area) on your own.
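The tail bookkeeping can be made explicit in a small helper. This is a sketch only; the function name `z_p_value` and its `tail` argument are illustrative and not part of Excel or of the text.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a Z statistic; tail is 'left', 'right', or 'two'."""
    cdf = NormalDist().cdf
    if tail == "left":       # H_A: p < p0, area to the left of z
        return cdf(z)
    if tail == "right":      # H_A: p > p0, area to the right of z
        return 1 - cdf(z)
    if tail == "two":        # H_A: p != p0, both tails beyond |z|
        return 2 * cdf(-abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(z_p_value(-1.43, "left"), 4))   # KFUD, Example 9.9
print(round(z_p_value(2.92, "right"), 5))   # chips, Example 9.11
print(round(z_p_value(1.62, "right"), 4))   # chips, Example 9.12
```

Choosing the tail from the direction of the alternative hypothesis, rather than from the sign of Z, keeps the p-value consistent with the rejection region.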
Example 9.13 A gambling house in Las Vegas is concerned that a certain pair of dice is not honest.
If the dice are honest, the number 7 should occur on 1/6 of the rolls. To test this they roll the dice
500 times and count the number of times a 7 shows. This occurred 100 times out of the 500 rolls.
Test using a 1% level of significance.
$H_0: p = p_0 = 0.167$ (the dice are honest)
$H_A: p \ne p_0$ (something funny is going on)
$\alpha = 0.01$
Reject $H_0$ if $Z < -2.58$ or $Z > 2.58$
Check CLT: $n p_0 = 500(0.167) = 83.5 \ge 5$ and $n q_0 = 500(0.833) = 416.5 \ge 5$
Data: $X = 100$, $n = 500$, $\hat{p} = X/n = 100/500 = 0.2$
$\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{(0.167)(0.833)/500} = 0.0167$
Test statistic: $Z = (\hat{p} - p_0)/\sigma_{p_0} = (0.2 - 0.167)/0.0167 = 1.97$
Using the Z table, we would conclude that there is not sufficient evidence to say the dice are dishonest.
Using the p-value, we must recall that this is a two-tailed test. So we must calculate the area to the
left of -1.97 and the area to the right of +1.97:
p/2         0.024419   =NORMDIST(-1.97,0,1,TRUE)
p/2         0.024419   =1-NORMDIST(1.97,0,1,TRUE)
p=p/2+p/2   0.048838   =0.024419+0.024419
or
p=2*(p/2)   0.048838   =2*0.024419
Since $p = 0.0488 > \alpha = 0.01$, the p-value also indicates that we cannot reject the null hypothesis.
Note that you can either add the areas in the two tails together or save a little time by multiplying the
area in one of the tails by two. Suppose that the number 7 had shown up X = 120 times, so that

$$\hat{p} = \frac{X}{n} = \frac{120}{500} = 0.24 \quad \text{and} \quad Z = \frac{\hat{p} - p_0}{\sigma_{p_0}} = \frac{0.24 - 0.167}{0.0167} = 4.37$$

The Z value would lead to the rejection of the null hypothesis in this case. Let's see what the
p-value suggests:
p   1.24344E-05   =2*NORMDIST(-4.37,0,1,TRUE)
Here $p = 1.24 \times 10^{-5} < \alpha = 0.01$, which would also lead to the rejection of the null hypothesis.
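The dice example rounds 1/6 to 0.167 before computing Z. As a hedged check on how much that rounding matters, the sketch below runs the two-tailed test for X = 100 with both values; the function name `two_tailed_test` is illustrative.

```python
import math
from statistics import NormalDist

def two_tailed_test(x, n, p0):
    """Return (z, two-tailed p-value) for H0: p = p0."""
    sigma = math.sqrt(p0 * (1 - p0) / n)
    z = (x / n - p0) / sigma
    return z, 2 * NormalDist().cdf(-abs(z))   # double the area beyond |z|

z_rounded, p_rounded = two_tailed_test(100, 500, 0.167)   # p0 as used in the text
z_exact, p_exact = two_tailed_test(100, 500, 1 / 6)       # p0 without rounding
print(round(z_rounded, 3), round(p_rounded, 4))
print(round(z_exact, 3), round(p_exact, 4))
```

Both versions give a p-value well above 0.01, so the conclusion for X = 100 does not depend on the rounding, although the Z values differ slightly in the second decimal place.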
Problems
9.1 Calculate $\alpha$ for the tire problem with $H_0: \mu = \mu_0 = 60000$, where $\sigma = 10000$, if the
decision rule is to take a sample of 100 tires and reject the null hypothesis if the resulting sample
mean is less than 58500 miles.
9.2 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 100 tires and reject
the null hypothesis if the resulting sample mean is less than 59000 miles.
9.3 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 1000 tires and
reject the null hypothesis if the resulting sample mean is less than 59000 miles.
9.4 Calculate $\beta$ for problem 9.1 for $H_A: \mu = 58000$ miles.
9.5 Calculate $\beta$ for problem 9.2 if $H_A: \mu = 58000$ miles. Note what happens to the values of
$\alpha$ and $\beta$ when the decision rule changes.
9.6 A cattle herd has been producing 2 gallons of milk per day per cow. The manager of the dairy
operation believes that the standard deviation $\sigma$ of the milk production is 0.8 gallon (per cow per
day). The manager decides to try a new kind of feed in the hopes of boosting milk production. He
uses the new feed and collects 36 days' worth of data, which gives a sample mean of $\bar{X} = 2.2$ gallons
of milk per day per cow. Does this data give sufficient evidence to conclude that the new feed has
increased milk production? Test with a 5% level of significance.
9.7 A particular airline has observed an average of 198 passengers on a certain flight. The airline
believes that it can increase the number of passengers by offering increased frequent flyer miles. On
its first 16 flights after initiation of the program, the airline observed an average of 205 passengers
per flight with a standard deviation of 8 passengers. Do these data present sufficient evidence to indicate
that the frequent flyer program has been effective in increasing the airline's passenger usage between
these cities? Test using a 1% level of significance. Assume the distribution of passengers to be
normal.
9.8 Suppose that the local electric company announces that it wishes to raise its rates to
30 cents per kilowatt-hour. The company claims that this is the same as the average rate for all
electric companies in the country. A public interest group disputes this and conducts a survey of
rates for 200 different companies. The result of the survey is that the average rate for the surveyed
companies is 25 cents per kilowatt-hour with a standard deviation of 5 cents per kilowatt-hour. Do
these data present sufficient evidence to conclude that the company's claim is valid and that they do
not charge more than similar companies? Test using a 10% level of significance.
9.9 An automobile manufacturer claims its cars have an average gasoline mileage in excess of 30
mpg. A sample of n=100 cars gives a sample mean usage of 30.5 mpg. Assume the population
standard deviation is 2 mpg. Do these data indicate that the company's claim is true? Evaluate using
a level of significance of 10%.
9.10 A vendor of soap powder wants to know if the machines filling the boxes with its product are
operating properly. If too much soap is being put in the boxes the company is losing money; if too
little soap is being put in the boxes the customers are being cheated. Suppose a machine is picked
which is supposed to fill the boxes with one pound of powder. A sample of 25 boxes is taken
resulting in a sample mean of 0.981 pound and a standard deviation of 0.037 pounds. Does the data
give sufficient evidence to conclude that the machine is not operating properly? Use a 1% level of
significance. Use Excel.
9.11 An automobile manufacturer claims that the average pollution generated by its automobiles
does not exceed 100 ppm (parts per million) for each mile driven. To test this claim 40 automobiles
are selected, driven, and the pollution produced measured. The mean of the sample is 100.451 ppm
and the standard deviation is 1.224 ppm. Test the claim using a 10% level of significance.
9.12 The supporters of Proposition 222 on an election ballot claim that a majority of registered
voters favor the proposition. A local newspaper questions 1200 registered voters and 602 of them
claim they support the proposition. Test the claim using a 10% level of significance.
9.13 A college administrator claims that at least 60% of the students at his institution graduate
within 4 years. A statistics professor at that college examines the records of 300 students and finds
that 160 of them graduated within 4 years. Test the administrator's claim using a 20% level of
significance.
Answers
9.1 Calculate $\alpha$ for the tire problem with $H_0: \mu = \mu_0 = 60000$, where $\sigma = 10000$, if the
decision rule is to take a sample of 100 tires and reject the null hypothesis if the resulting sample
mean is less than 58500 miles.
9.2 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 100 tires and reject
the null hypothesis if the resulting sample mean is less than 59000 miles.
9.3 Calculate $\alpha$ for the tire problem if the decision rule is to take a sample of 1000 tires and
reject the null hypothesis if the resulting sample mean is less than 59000 miles.
9.4 Calculate $\beta$ for problem 9.1 for $H_A: \mu = 58000$ miles.
9.5 Calculate $\beta$ for problem 9.2 if $H_A: \mu = 58000$ miles. Note what happens to the values of
$\alpha$ and $\beta$ when the decision rule changes.
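No worked solutions are given above for problems 9.1 to 9.5. The following is only a sketch of how such a calculation could be set up, assuming that the sample mean is normally distributed (CLT), that $\sigma = 10000$ is known, that $\alpha$ is the probability of rejecting when $\mu = \mu_0$, and that $\beta$ is the probability of not rejecting when $\mu$ equals the alternative value of 58000. The function name `alpha_beta` is illustrative.

```python
import math
from statistics import NormalDist

def alpha_beta(mu0, mu_alt, sigma, n, cutoff):
    """Decision rule: reject H0 when the sample mean falls below `cutoff`."""
    se = sigma / math.sqrt(n)                       # standard error of the sample mean
    alpha = NormalDist(mu0, se).cdf(cutoff)         # P(reject | mu = mu0)
    beta = 1 - NormalDist(mu_alt, se).cdf(cutoff)   # P(do not reject | mu = mu_alt)
    return alpha, beta

# Setup of problems 9.1 and 9.4: n = 100, reject if the sample mean is below 58500.
alpha, beta = alpha_beta(60000, 58000, 10000, 100, 58500)
print(round(alpha, 4), round(beta, 4))
```

Changing `cutoff` or `n` in the call shows how the two error probabilities trade off against each other under these assumptions.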
9.6 A cattle herd has been producing 2 gallons of milk per day per cow. The manager of
the dairy operation believes that the standard deviation $\sigma$ of the milk production is
0.8 gallon (per cow per day). The manager decides to try a new kind of feed in the
hopes of boosting milk production. He uses the new feed and collects 36 days' worth
of data, which gives a sample mean of $\bar{X} = 2.2$ gallons of milk per day per cow. Do
these data give sufficient evidence to conclude that the new feed has increased milk
production? Test with a 5% level of significance. Also test using the p-value.
$H_0: \mu = \mu_0 = 2$ (no change in milk production)
$H_A: \mu > \mu_0$ (increase in milk production)
$\alpha = 0.05$
Reject $H_0$ if $Z > 1.65$
Data: $\bar{X} = 2.2$, $\sigma = 0.8$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.133$
Test statistic: $Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}} = (2.2 - 2.0)/0.133 = 1.504$
For the p-value, note that all the area is in the right tail. So, using Excel
p value   0.06629071   = 1 - NORMDIST(1.504,0,1,TRUE)
The Z-test for the problem gives $Z = 1.504 < 1.65$, so this does not provide sufficient
evidence to reject the null hypothesis at a 5% level of significance. The p-value is
$p = 0.066 > \alpha = 0.05$, and this also indicates that there is not sufficient evidence to
reject the null.
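The same right-tailed Z test for a mean with known $\sigma$ can be sketched in Python. The function name `z_test_mean_right` is illustrative; because the standard error is not rounded to 0.133 here, Z comes out as 1.5 rather than 1.504.

```python
import math
from statistics import NormalDist

def z_test_mean_right(xbar, mu0, sigma, n):
    """Right-tailed Z test for a mean when sigma is known; returns (z, p-value)."""
    se = sigma / math.sqrt(n)              # sigma / sqrt(n)
    z = (xbar - mu0) / se
    return z, 1 - NormalDist().cdf(z)      # area to the right of z

z, p = z_test_mean_right(2.2, 2.0, 0.8, 36)
print(round(z, 3), round(p, 4))
print("reject H0" if p < 0.05 else "do not reject H0")
```

The small difference in Z does not change the conclusion at the 5% level.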
9.7 A particular airline has observed an average of 198 passengers on a certain flight.
The airline believes that it can increase the number of passengers by offering increased
frequent flyer miles. On its first 16 flights after initiation of the program, the airline
observed an average of 205 passengers per flight with a standard deviation of 8 passengers.
Do these data present sufficient evidence to indicate that the frequent flyer program has
been effective in increasing the airline's passenger usage between these cities? Test
using a 1% level of significance. Assume the distribution of passengers to be normal.
$H_0: \mu = \mu_0 = 198$ (no change in the number of passengers)
$H_A: \mu > \mu_0$ (increase in number of passengers)
$\alpha = 0.01$
Reject $H_0$ if $t > 2.602$ (15 df)
Data: $\bar{X} = 205$, $s = 8$, $s_{\bar{X}} = s/\sqrt{n} = 2$
Test statistic: $t = (\bar{X} - \mu_0)/s_{\bar{X}} = (205 - 198)/2 = 3.50$
p value   0.001612   = TDIST(3.5,15,1)
The p-value $p = 0.0016 < \alpha = 0.01$ shows we have sufficient evidence to reject the null
hypothesis. Likewise, $t = 3.50 > 2.602$ is in the rejection region, which is sufficient evidence to
reject the null. We would conclude that the program has increased the number of passengers.
9.8 Suppose that the local electric company announces that it wishes to raise its rates to
30 cents per kilowatt-hour. The company claims that this is the same as the average rate for all
electric companies in the country. A public interest group disputes this and conducts a survey of
rates for 200 different companies. The result of the survey is that the average rate for the surveyed
companies is 25 cents per kilowatt-hour with a standard deviation of 5 cents per kilowatt-hour. Do
these data present sufficient evidence to conclude that the company's claim is valid and that they do
not charge more than similar companies? Test using a 10% level of significance.
$H_0: \mu = \mu_0 = 30$ (the claim is true)
$H_A: \mu < \mu_0$ (the claim is false)
$\alpha = 0.10$
Reject $H_0$ if $t < -1.298$ (199 df)
Data: $\bar{X} = 25$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 0.354$
Test statistic: $t = (\bar{X} - \mu_0)/s_{\bar{X}} = (25 - 30)/0.354 = -14.12$
p value   4.03648E-32   = TDIST(14.12,199,1)
There is plenty of evidence to reject the null hypothesis here. The t statistic is huge in magnitude.
The p-value is extremely small, almost equal to zero.
Suppose that the sample mean had been 29.5 cents per kilowatt-hour rather than 25. How would that
have changed the analysis?
$H_0: \mu = \mu_0 = 30$ (the claim is true)
$H_A: \mu < \mu_0$ (the claim is false)
$\alpha = 0.10$
Reject $H_0$ if $t < -1.298$ (199 df)
Data: $\bar{X} = 29.5$, $s = 5$, $s_{\bar{X}} = s/\sqrt{n} = 0.354$
Test statistic: $t = (\bar{X} - \mu_0)/s_{\bar{X}} = (29.5 - 30)/0.354 = -1.412$
p value   0.0798   = TDIST(1.412,199,1)
We would reject the null here with this value of t and the p-value. We would reject at a 10% level
of significance, with $p = 0.0798 < \alpha = 0.10$. We would not reject if the level of significance were
5% ($p = 0.0798 > \alpha = 0.05$).
9.9 An automobile manufacturer claims its cars have an average gasoline mileage in
excess of 30 mpg. A sample of n=100 cars gives a sample mean usage of 30.5
mpg. Assume the population standard deviation is 2 mpg. Do these data indicate
that the company's claim is true? Evaluate using a level of significance of 10%.
$H_0: \mu = \mu_0 = 30$ (the claim is false)
$H_A: \mu > \mu_0$ (the claim is true)
$\alpha = 0.10$
Reject $H_0$ if $t > 1.296$ (99 df)
Data: $\bar{X} = 30.5$, $s = 2$, $s_{\bar{X}} = s/\sqrt{n} = 0.2$
Test statistic: $t = (\bar{X} - \mu_0)/s_{\bar{X}} = (30.5 - 30)/0.2 = 2.5$
p value   0.007031   = TDIST(2.5,99,1)
We would reject the null hypothesis here using the test statistic because $t = 2.5 > 1.296$.
We would also reject the null hypothesis using the p-value because $p = 0.007 < \alpha = 0.10$.
9.10 A vendor of soap powder wants to know if the machines filling the boxes with its product are
operating properly. If too much soap is being put in the boxes the company is losing money; if too
little soap is being put in the boxes the customers are being cheated. Suppose a machine is picked
which is supposed to fill the boxes with one pound of powder. A sample of 25 boxes is taken
resulting in a sample mean of 0.981 pound and a standard deviation of 0.037 pounds. Does the data
give sufficient evidence to conclude that the machine is not operating properly? Use a 1% level of
significance. Use Excel.
$H_0: \mu = \mu_0 = 1$ (the machine is operating properly)
$H_A: \mu \ne \mu_0$ (it is not operating properly)
$\alpha = 0.01$

x bar     0.981
s         0.037
n         25
s xbar    0.0074     = 0.037 / SQRT(25)
t         -2.56757   = (0.981 - 1) / 0.0074
p value   0.016896   = TDIST(2.56757 ,24 ,2)

Because $p = 0.016896 > \alpha = 0.01$, we do not have sufficient evidence to reject the null hypothesis at
a 1% level of significance. Note that we would have rejected the null at a 2% level of significance.
9.11 An automobile manufacturer claims that the average pollution generated by its automobiles
does not exceed 100 ppm (parts per million) for each mile driven. To test this claim 40 automobiles
are selected, driven, and the pollution produced measured. The mean of the sample is 100.451 ppm
and the standard deviation is 1.224 ppm. Test the claim using a 10% level of significance.
$H_0: \mu = \mu_0 = 100$ (claim is true)
$H_A: \mu > \mu_0$ (claim is false)
$\alpha = 0.10$

x bar     100.451
s         1.224
s xbar    0.193531393   = 1.224 / SQRT(40)
t         2.330376012   = (100.451 - 100) / 0.193531
p value   0.012525585   = TDIST(2.330388, 39, 1)

Conclusion: Because $p = 0.012526 < \alpha = 0.10$, these data give sufficient evidence to reject the null
hypothesis. We would have come to a different conclusion if the level of significance had been 1%.
9.12 The supporters of Proposition 222 on an election ballot claim that a majority of registered
voters favor the proposition. A local newspaper questions 1200 registered voters and 602 of them
claim they support the proposition. Test the claim using a 10% level of significance.
$H_0: p = p_0 = 0.50$ (not a majority)
$H_A: p > p_0$ (a majority)
$\alpha = 0.10$
Reject $H_0$ if $Z > 1.28$
CLT: $n p_0 = 1200(0.5) = 600$ and $n q_0 = 1200(0.5) = 600$, so CLT holds
Data: $X = 602$, $\hat{p} = X/n = 602/1200 = 0.502$
$\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{(0.5)(0.5)/1200} = 0.0144$
Test statistic: $Z = (\hat{p} - p_0)/\sigma_{p_0} = (0.502 - 0.500)/0.0144 = 0.14$
So there is not sufficient evidence to reject the null hypothesis at the 10% level of significance.
Looking at Excel

po         0.5
qo         0.5
sigma po   0.014433757   = SQRT( (0.5)*(0.5)/1200)
p hat      0.501666667
Z          0.115491201   = (0.501667 - 0.5) / 0.014434
p value    0.454028325   = 1 - NORMDIST(0.11549, 0, 1, TRUE)

Excel gives Z = 0.115 rather than 0.14 because it does not round $\hat{p}$ to 0.502. The p-value
$p = 0.454 > \alpha = 0.10$, and this indicates insufficient evidence to reject the null.
9.13. A college administrator claims that at least 60% of the students at his institution graduate
within 4 years. A statistics professor at that college examines the records of 300 students and finds
that 160 of them graduated within 4 years. Test the administrator's claim using a 20% level of
significance.
$H_0: p = p_0 = 0.60$ (claim true)
$H_A: p < p_0$ (claim false)
$\alpha = 0.20$
Reject $H_0$ if $Z < -0.84$
CLT: $n p_0 = 300(0.6) = 180$ and $n q_0 = 300(0.4) = 120$, so CLT holds
Data: $X = 160$, $\hat{p} = X/n = 160/300 = 0.53$
$\sigma_{p_0} = \sqrt{p_0 q_0 / n} = \sqrt{(0.6)(0.4)/300} = 0.0283$
Test statistic: $Z = (\hat{p} - p_0)/\sigma_{p_0} = (0.53 - 0.6)/0.0283 \approx -2.5$
Using Excel

n         300
po        0.6
qo        0.4
sigma     0.028284   = SQRT( 0.6 * 0.4 / 300)
p hat     0.533333   = 160 / 300
Z         -2.4735    (unrounded: -2.473498233)
p value   0.00669    = NORMDIST( -2.4735 ,0 ,1, TRUE)

Does CLT hold?
n*po=     180        = 300 * 0.6
n*qo=     120        = 300 * 0.4

Conclusion: $p = 0.00669 < \alpha = 0.20$, so we can reject the null hypothesis at a 20% level of
significance.