8. Hypothesis Testing
8.1 Tests of Population Mean
Consider an automatic machine which bottles cola into 2-liter (2000 cc) bottles. Because of
changes in working conditions, wear and tear, and variations in the process, the exact amount put into
bottles will vary. So the machine needs to be checked periodically to ensure that it puts 2000 cc on
average into each bottle. Three different cases are possible here. We shall see them one by one.
The first concern is consumer protection, which requires the average amount to be at least 2000 cc so that
consumers get their money’s worth. In this case the hypotheses would be formulated as
H0: μ ≥ 2000
Ha: μ < 2000
To test the null hypothesis, let us say a random sample of 49 bottles is taken, and the bottles are tested for
the exact amounts of their contents. From sampling theory, we know that since the sample is large the
sample mean, X̄, will be approximately normally distributed with mean μ and variance σ²/n. If the computed sample
mean, x̄, is greater than 2000 there is nothing to complain about and the null hypothesis (H0) would not be
rejected. But if x̄ is less than 2000 there is reason to doubt H0, and the lower it gets the more doubtful we
become. At some point we might consider x̄ too low to accept H0, and reject it. Because the rejection of H0
occurs when x̄ is low, which corresponds to the left tail of the (normal) distribution of X̄, we call this case
a 1-tailed test with rejection on the left.
In rejecting the null hypothesis, we might be committing a Type I error. The probability of this
occurring, though, is not as clear cut as in the case of Acceptance Sampling (with n = 1) that we saw in the
previous section. What should be the p-value here? The convention is to declare the area to the left of x̄
on the normal distribution for X̄, computed with μ at its hypothesized value of 2000, as the p-value. The
p-value of a piece of evidence should therefore be interpreted as the probability, when H0 holds, of the
evidence being as unfavorable to H0 as, or more unfavorable to H0 than, it currently is. It is also customary
to calculate the p-value on the standard normal distribution after converting x̄ into its z-score using the formula
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}.$$
Specifically, the p-value will be the area under the standard normal curve to the left of the computed z
value. Since the statistic z decides the outcome of the test, it is called the test statistic. The maximum
allowable p-value for rejection is the significance level α, and its complement, (1 − α), is called the
confidence level. Whenever the p-value is less than α, H0 is rejected.
Let us carry out a sample calculation for the cola example. We shall assume that from past
experience σ is known to be 7. Suppose x̄ is 1998, then the test statistic z = (1998 − 2000)/(7/√49) = −2.
The p-value is then calculated on the spreadsheet using the formula =NORMSDIST(-2) which yields
0.0228 or 2.28%. Thus H0 will be rejected if α is 5% or 10%, but not if it is 1%.
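For readers who want to check the spreadsheet result outside Excel, the same calculation can be sketched in Python. This is only an illustration, not part of the template: the function name z_test_left is made up here, and NormalDist().cdf from the standard library plays the role of =NORMSDIST.

```python
from math import sqrt
from statistics import NormalDist


def z_test_left(x_bar, mu0, sigma, n):
    """Return (z, p_value) for H0: mu >= mu0 vs Ha: mu < mu0, sigma known.

    Hypothetical helper written for this illustration only.
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = NormalDist().cdf(z)  # area to the left of z, like =NORMSDIST(z)
    return z, p_value


# Cola example from the text: x-bar = 1998, sigma = 7, n = 49.
z, p = z_test_left(x_bar=1998, mu0=2000, sigma=7, n=49)
print(f"z = {z:.2f}, p-value = {p:.4f}")

# Compare the p-value with a few significance levels.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The loop reproduces the conclusion in the text: rejection at 10% and 5%, but not at 1%.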
Figure 8.1.1 shows the sheet named “Tests of mu” in Hypothesis Testing.xls. When the input data
are entered into the shaded cells of this template, the test statistic z and the p-values appear in the designated
places. Note that you, as the template user, should know whether it is a 2-tailed or 1-tailed left/right test.
Figure 8.1.1. Testing μ
[Workbook: Hypothesis Testing.xls; Sheet: Testing mu]
The data entered currently in the template correspond to the example calculation above. The p-value for
the 1-tailed test with rejection on the left, 0.0228, appears in cell F9.
When the population is finite, a finite population correction can be applied. The population size N
should be entered in cell J5. The corrected z and p-values appear in their places (not visible in Figure
8.1.1). Note that the p-value has decreased. Thus a hypothesis that is not rejected when the correction was
not applied might be rejected when the correction is applied.
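The effect of the correction can be sketched as follows. The text does not state the formula used by the template, so this sketch assumes the usual finite population correction, in which the standard error σ/√n is multiplied by √((N − n)/(N − 1)). The population size N = 500 and the function name are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_left_fpc(x_bar, mu0, sigma, n, N=None):
    """Left-tailed z-test; applies the finite population correction if N is given.

    Assumes the standard correction factor sqrt((N - n) / (N - 1)),
    which may or may not be exactly what the template implements.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # correction shrinks the standard error
    z = (x_bar - mu0) / se
    return z, NormalDist().cdf(z)


# Cola example without correction, then with a hypothetical N = 500.
z_plain, p_plain = z_test_left_fpc(1998, 2000, 7, 49)
z_fpc, p_fpc = z_test_left_fpc(1998, 2000, 7, 49, N=500)

print(f"no correction:   z = {z_plain:.4f}, p = {p_plain:.4f}")
print(f"with correction: z = {z_fpc:.4f}, p = {p_fpc:.4f}")
```

Because the correction factor is less than 1, the standard error shrinks, z moves further into the tail, and the p-value decreases, as noted in the text.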
Often, the population standard deviation σ is unknown. In this case, we can conduct a t-test if the
population is normal. In a t-test, we substitute the sample standard deviation s in place of σ. The statistic
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
follows Student’s t-distribution with (n − 1) degrees of freedom. A t-distribution with degrees of freedom =
df has mean zero, variance df/(df − 2) for df > 2, and is symmetric. As df increases it approaches the normal
distribution. The bottom half of the template seen in Figure 8.1.1 contains the t-test. Note that the input
area contains s instead of σ.
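A rough sketch of the t-test p-value is given below. The Python standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; that is an assumption of the illustration, not the method used by the template. The sample standard deviation s = 7 is a hypothetical value chosen so that the result can be compared with the z-test.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, x], using symmetry about zero."""
    h = x / steps  # negative when x is negative, giving a negative integral
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


# Hypothetical data: x-bar = 1998, s = 7, n = 49, hypothesized mean 2000.
x_bar, mu0, s, n = 1998, 2000, 7, 49
t = (x_bar - mu0) / (s / sqrt(n))
p_t = t_cdf(t, df=n - 1)          # left-tail p-value from the t-distribution
p_z = NormalDist().cdf(t)         # same statistic on the normal, for comparison

print(f"t = {t:.2f}, df = {n - 1}")
print(f"left-tail p-value (t):      {p_t:.4f}")
print(f"left-tail p-value (normal): {p_z:.4f}")
```

The t-based p-value is slightly larger than the normal one because the t-distribution has heavier tails; with 48 degrees of freedom the two are already close.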
The second concern with the automatic bottling machine is the profit motive. Putting more than
2000 cc on the average into each bottle would waste cola and reduce profits. A special action, of stopping
and resetting the machine, is necessary when the average is greater than 2000. In this case the hypotheses
will be formulated as
H0: μ ≤ 2000
Ha: μ > 2000
Here the rejection will occur when the sample mean is too far above 2000, that is, when the test statistic z
or t goes too far into the right tail. This case is therefore a 1-tailed test with rejection on the right. On the
template shown in Figure 8.1.1, the p-value for this case must be read off from cell G9 or G18.
The third concern with the bottling machine is that of process control. To be in control, the
average amount bottled should be neither too much nor too little. It should be as near 2000 as possible.
The special action of stopping and resetting the machine is necessary on both tails of the test statistic. Here
the hypotheses will be formulated as
H0: μ = 2000
Ha: μ ≠ 2000
Since the rejection of H0 occurs on both tails, this test is called a 2-tailed test. On the template, the p-value
for this case appears in cell E9 or E18.
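The three cases differ only in which tail area is reported. A minimal sketch, using a hypothetical helper named p_values and the standard normal distribution from the Python standard library:

```python
from statistics import NormalDist


def p_values(z):
    """Return the left-tailed, right-tailed and 2-tailed p-values for a z statistic.

    Hypothetical helper for illustration; the 2-tailed value doubles
    the smaller of the two tail areas.
    """
    left = NormalDist().cdf(z)
    right = 1.0 - left
    return {"left": left, "right": right, "two_tailed": 2.0 * min(left, right)}


# z = -2 is the value computed for the cola example.
result = p_values(-2.0)
for name, value in result.items():
    print(f"{name:>10}: {value:.4f}")
```

With z = −2 the evidence is strong against μ ≥ 2000, gives no support for rejecting μ ≤ 2000, and the 2-tailed p-value is twice the left-tail area.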
8.2 Tests of Population Proportion
8.2.1 The Test
A z-test can be used to test hypotheses about population proportion p. The test statistic z is
calculated using the formula
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$$
where p̂ is the sample proportion and p0 is the hypothesized value for p. The test may be 1-tailed or
2-tailed. The template is shown in Figure 8.2.1.
Figure 8.2.1. Testing p
[Workbook: Hypothesis Testing.xls; Sheet: Testing p]
When the input data are entered in the shaded cells, z and p-values appear in their designated places. The
bottom portion is for applying finite population correction.
The data currently entered are from the Cheese Spread case of Bowerman/O'Connell. The test is
2-tailed and no finite population correction is necessary. The p-value is 0.0000, or almost zero. Hence the
null hypothesis is rejected. If finite population correction is needed, the value for population size, N, should
be entered in cell B13 and the p-value should be read off from the range E15:G15.
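The z statistic for a proportion can be sketched in the same way as the test for the mean. The numbers below (sample proportion 0.56 from n = 400, hypothesized value 0.50) are hypothetical and are not the Cheese Spread data; the function name is also made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat, p0, n):
    """Return z and the three p-values for a test of a population proportion.

    Uses the normal approximation with standard error sqrt(p0 (1 - p0) / n).
    """
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    left = NormalDist().cdf(z)
    right = 1.0 - left
    return z, {"left": left, "right": right, "two_tailed": 2.0 * min(left, right)}


# Hypothetical data: 56% of a sample of 400, tested against p0 = 0.50.
z, p = proportion_z_test(p_hat=0.56, p0=0.50, n=400)
print(f"z = {z:.2f}")
print(f"2-tailed p-value = {p['two_tailed']:.4f}")
```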
8.3 Tests of Population Variance
Figure 8.3.1. Testing σ²
[Workbook: Hypothesis Testing.xls; Sheet: Testing Variance]
When random samples are drawn from a normally distributed population, the sample statistic
(n − 1)s²/σ² follows a χ² distribution with (n − 1) degrees of freedom. Thus a χ² test can be done for
testing hypotheses regarding σ². Figure 8.3.1 shows the template to be used for this test. When the input
data are entered into the shaded cells, the χ² value and the p-values appear in their designated places.
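As a sketch of what such a calculation involves, the code below computes the χ² statistic and its tail areas. The standard library has no χ² distribution, so the CDF is evaluated here with a series expansion of the regularized incomplete gamma function; this is a choice made for the illustration, not necessarily how the template works. The data (n = 25, s² = 60, hypothesized σ² = 49) are hypothetical.

```python
from math import exp, lgamma, log


def chi2_cdf(x, df):
    """CDF of the chi-square distribution via a series for the incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for k in range(1, 10000):
        term *= y / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * exp(-y + a * log(y) - lgamma(a))


def variance_test(s2, sigma0_sq, n):
    """Return the chi-square statistic and its left- and right-tail p-values."""
    stat = (n - 1) * s2 / sigma0_sq
    left = chi2_cdf(stat, n - 1)
    return stat, left, 1.0 - left


# Hypothetical data: sample variance 60 from n = 25, hypothesized variance 49.
stat, p_left, p_right = variance_test(s2=60.0, sigma0_sq=49.0, n=25)
print(f"chi-square = {stat:.3f} with {25 - 1} degrees of freedom")
print(f"right-tail p-value = {p_right:.4f}")
```

The right-tail p-value is the relevant one when the alternative hypothesis is that the variance exceeds the hypothesized value.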
8.4 The Power of a Test
8.4.1 The Power of a μ Test
Figure 8.4.1 shows the template that can be used for calculating and plotting the Power of a μ test.
On this template, the type of test is selected in the drop down box. The template assumes that the critical
value(s) for the sample mean is (are) decided according to the α value in cell B6. For particular values of
actual μ, the probability of Type II error and the Power are calculated in the range E4:I5. As seen in the
range E3:G5, when μ = 2.985, the power of the test is almost 1, when μ = 2.99, it is 0.9913 and when μ =
2.995, it is 0.6433. As μ approaches μ0, the null hypothesis becomes "less and less false" and more and
more difficult to detect as being false. Hence the power decreases. At worst, the power equals the α value
used for the hypothesis test.
Figure 8.4.1. Power of a μ Test
[Workbook: Hypothesis Testing.xls; Sheet: Power of a mu test]
In the same template, an accompanying plot of Power versus μ (known as the power curve) shows
how power varies with μ. To create this plot, enter a meaningful starting value for μ in cell L2. Note how
the power starts from almost 1 and approaches 0.05 (the value of α used for the hypothesis test) as μ
approaches μ0. [When μ = μ0, the null hypothesis is true and power is meaningless. But the template has
been programmed to return a value of 1 for power and zero for probability of Type II error.]
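The behavior of the power curve can be sketched for a left-tailed test. The figures used here are the cola example values (μ0 = 2000, σ = 7, n = 49) with α = 0.05, not the data shown in the template, and the function name is hypothetical. Unlike the template, this sketch simply returns α when μ = μ0.

```python
from math import sqrt
from statistics import NormalDist


def power_left_tailed(mu_actual, mu0, sigma, n, alpha):
    """Power of a left-tailed z-test of H0: mu >= mu0 when the true mean is mu_actual."""
    se = sigma / sqrt(n)
    # Critical value for the sample mean: reject H0 when x-bar falls below it.
    x_crit = mu0 + NormalDist().inv_cdf(alpha) * se
    # Probability of landing in the rejection region under the actual mean.
    return NormalDist(mu_actual, se).cdf(x_crit)


mu0, sigma, n, alpha = 2000, 7, 49, 0.05
powers = {}
for mu in (1996, 1997, 1998, 1999, 2000):
    powers[mu] = power_left_tailed(mu, mu0, sigma, n, alpha)
    print(f"mu = {mu}: power = {powers[mu]:.4f}, "
          f"P(Type II error) = {1 - powers[mu]:.4f}")
```

The printed values fall steadily toward α as the actual mean approaches the hypothesized mean, which is the shape of the power curve described above.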
8.4.2 The Power of a p Test
Figure 8.4.2 shows the template. Its use is similar to that of the previous template. When the input
data are entered in the shaded cells and the type of test is selected from the drop down box, the results
appear in the range E4:I5. The power curve is also plotted in the same template (not shown in Figure).
Enter a meaningful starting value for p in cell L2 to create this curve.
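A comparable sketch for a right-tailed test of a proportion follows. It relies on the normal approximation to the binomial, which is an assumption of this illustration; the text does not say how the template computes the power. The values p0 = 0.50, n = 100 and α = 0.05 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_p(p_actual, p0, n, alpha):
    """Power of a right-tailed test of H0: p <= p0 when the true proportion is p_actual.

    Normal approximation: the critical sample proportion is set under p0,
    and the rejection probability is evaluated under p_actual.
    """
    p_crit = p0 + NormalDist().inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    se_actual = sqrt(p_actual * (1 - p_actual) / n)
    return 1.0 - NormalDist(p_actual, se_actual).cdf(p_crit)


p0, n, alpha = 0.50, 100, 0.05
curve = {}
for p_actual in (0.50, 0.55, 0.60, 0.65, 0.70):
    curve[p_actual] = power_right_tailed_p(p_actual, p0, n, alpha)
    print(f"p = {p_actual:.2f}: power = {curve[p_actual]:.4f}")
```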
Figure 8.4.2. Power of a p Test
[Workbook: Hypothesis Testing.xls; Sheet: Power of a p Test]
8.5 Sample Size Determination
An important practical decision in hypothesis testing is sample size determination. The objective
of hypothesis testing is to limit the chances of Type I and Type II errors to specified maximums and at the
same time minimize the sample size. Sample size determination is thus an optimization problem with Type
I and Type II error constraints. These constraints will be specified as, “under such and such condition the
chance of Type I/II error should not exceed such and such %.”

8.5.1 Optimal Sample Size for Testing μ
Figure 8.5.1. Determining n to Test μ
[Workbook: Hypothesis Testing.xls; Sheet: n for testing mu]
The template is shown in Figure 8.5.1. This template uses the Solver for optimizing the sample
size. Instructions for using the Solver are in the template.
Some points to note while using this template are:
1. If there is only one constraint each for Type I and Type II Errors, then the suggested value for n in the
cell B21 itself would be optimal. The Solver can then be used to find the optimal C. The formula for
optimal n under these circumstances is
$$\text{Minimum } n = \left\lceil \left( \frac{(|z_0| + |z_1|)\,\sigma}{\mu_0 - \mu_1} \right)^2 \right\rceil$$
where z0 is the critical z implied by the Type I Error constraint when μ = μ0, and z1 and μ1 similarly
correspond to the Type II Error constraint. The symbol ⌈ ⌉ means rounding up to the nearest integer.
Because there is only one constraint each for Type I and Type II Errors, the suggested value of 94 is itself
optimal for n.
2. The objective in the current setup is to minimize n. A problem may have a total cost function based on
Type I and Type II Error costs which may have to be minimized.
3. If the Solver is unable to find an optimal solution, check all the data input, especially the Type I and Type
II Error constraints. Make sure n and c have been set to the suggested values. After any correction, re-run
the Solver.
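For the single-constraint case in point 1, the minimum n formula can be evaluated directly. The sketch below assumes a 1-tailed test and uses hypothetical inputs (μ0 = 2000, μ1 = 1997, σ = 7, at most 5% chance of Type I error at μ0 and at most 10% chance of Type II error at μ1); these are not the values behind the template's suggested n, and the function name is made up.

```python
from math import ceil
from statistics import NormalDist


def min_n_for_mu(mu0, mu1, sigma, alpha, beta):
    """Minimum n for a 1-tailed test with one Type I and one Type II constraint.

    alpha: maximum chance of Type I error when mu = mu0.
    beta:  maximum chance of Type II error when mu = mu1.
    """
    z0 = NormalDist().inv_cdf(1 - alpha)  # critical z for the Type I constraint
    z1 = NormalDist().inv_cdf(1 - beta)   # critical z for the Type II constraint
    return ceil(((abs(z0) + abs(z1)) * sigma / (mu0 - mu1)) ** 2)


# Hypothetical constraints for the cola example.
n_min = min_n_for_mu(mu0=2000, mu1=1997, sigma=7, alpha=0.05, beta=0.10)
print("minimum n =", n_min)

# A larger gap between mu0 and mu1 is easier to detect and needs fewer bottles.
n_wide = min_n_for_mu(mu0=2000, mu1=1995, sigma=7, alpha=0.05, beta=0.10)
print("minimum n with mu1 = 1995:", n_wide)
```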
8.5.2 Optimal Sample Size for Testing p
The case of finding the optimal sample size for testing p is very similar to the case of μ seen in the
previous section. Figure 8.5.2 shows the template. After entering the necessary data in the shaded cells,
one may use the Solver to find the solution.
Figure 8.5.2. Optimal n for Testing p
[Workbook: Hypothesis Testing.xls; Sheet: n for testing p]
While using this template, the following points may be noted.
1. If there is only one constraint each for Type I and Type II Errors, then the suggested value for n in the
cell B18 itself would be optimal. The Solver can then be used to find the optimal value for x-critical. The
formula for optimal n under these circumstances is
$$\text{Minimum } n = \left\lceil \left( \frac{|z_0| \sqrt{p_0 (1 - p_0)} + |z_1| \sqrt{p_1 (1 - p_1)}}{p_0 - p_1} \right)^2 \right\rceil$$
where z0 is the critical z implied by the Type I Error constraint when p = p0, and z1 and p1 similarly
correspond to the Type II Error constraint. The symbol ⌈ ⌉ means rounding up to the nearest integer.
2. The objective in the current setup is to minimize n. A problem may have a total cost function based on
Type I and Type II Error costs which may have to be minimized.
3. If the Solver is unable to find an optimal solution, check all the data input, especially the Type I and Type
II Error constraints. Make sure n and x-critical have been set to the suggested values. After any correction,
re-run the Solver.
4. Because the binomial distribution is approximated by the normal distribution, a necessary assumption here is that
the sample is large. At times this may not be satisfied. Especially in Acceptance Sampling problems, the
sample size is very likely to be small. One should then use the binomial template discussed in the next
section.
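The minimum n formula in point 1 can be evaluated directly when there is one constraint of each type. The inputs below (p0 = 0.50, p1 = 0.60, at most 5% chance of Type I error at p0 and at most 10% chance of Type II error at p1) are hypothetical, and the function name is made up for this sketch. As point 4 warns, the result rests on the normal approximation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_n_for_p(p0, p1, alpha, beta):
    """Minimum n for a 1-tailed test of p with one Type I and one Type II constraint.

    alpha: maximum chance of Type I error when p = p0.
    beta:  maximum chance of Type II error when p = p1.
    """
    z0 = NormalDist().inv_cdf(1 - alpha)
    z1 = NormalDist().inv_cdf(1 - beta)
    numerator = abs(z0) * sqrt(p0 * (1 - p0)) + abs(z1) * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p0 - p1)) ** 2)


# Hypothetical constraints.
n_min = min_n_for_p(p0=0.50, p1=0.60, alpha=0.05, beta=0.10)
print("minimum n =", n_min)
```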
8.6 Exercises
1. Do exercises 8-18 to 8-20 in the textbook.
2. Do exercises 8-55, 8-57 in the textbook.
3. Do exercises 8-76, 8-77 in the textbook.
4. Do exercises 8-81, 8-82 in the textbook.
8.7 Projects
1. A producer and a consumer of pins want to design a sampling plan that would be acceptable to both of
them. The producer wants a lot that contains 1% defectives to have at least 99.5% probability of
acceptance, and a lot that contains 2% defectives to have at least 98% probability of acceptance. The
consumer wants a lot that contains 8% defectives to have at most 10% probability of acceptance, a lot that
contains 10% defectives to have at most 2% probability of acceptance and a lot that contains 12%
defectives to have not more than 0.5% chance of acceptance. Find the optimal value for sample size n and
acceptance number c (x-critical).
2. A sampling plan has sample size n = 100 and acceptance number c = 3. The probability distribution of
the % defective (p) of incoming lots is given by the table below.
p      1%    2%    3%    4%
Prob   0.3   0.4   0.2   0.1
i. When a random lot is received, what is its probability of acceptance under the given sampling plan?
[Hint: Construct a joint probability table.]
ii. Given that a lot is accepted, what is the probability that it has 1% defectives? 2% defectives?
iii. What is the expected % defective in a lot accepted under this sampling plan? [This quantity is known as
the Average Outgoing Quality (AOQ).]