testprop.nb
Tests of Statistical Hypotheses about Proportions
Johan Belinfante
2005 April 18−20
In[1]:= << Statistics`DiscreteDistributions`
In[2]:= << Statistics`ContinuousDistributions`
summary
This notebook contains computations related to the first example in section 8.1 of our text, "Probability and Statistical
Inference," by Hogg and Tanis. Using Mathematica makes it unnecessary to discuss the approximation of binomial distributions with Poisson distributions, but the student should read about that anyway because it is an important idea in its own
right.
preliminaries about binomial distributions
This section just introduces Mathematica’s notation for binomial distributions.
In[3]:= ?? BinomialDistribution
BinomialDistribution[n, p] represents the binomial distribution for n trials with probability
p. A BinomialDistribution[n, p] random variable describes the number of successes
occurring in n trials, where the probability of success in each trial is p. More…
Attributes[BinomialDistribution] = {Protected, ReadProtected}
This warmup exercise may be useful:
In[4]:= Table[PDF[BinomialDistribution[2, 1/2], x], {x, 0, 2}]
Out[4]= {1/4, 1/2, 1/4}
In[5]:= Table[CDF[BinomialDistribution[2, 1/2], x], {x, 0, 2}]
Out[5]= {1/4, 3/4, 1}
That is, if a fair coin is tossed twice, there is a probability of 3/4 that the number of heads is less than or equal to 1.
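The warmup values can also be reproduced outside Mathematica. The following is a minimal Python sketch using only the standard library; the helper names binom_pmf and binom_cdf are illustrative and are not part of the notebook. Exact fractions are used so the results can be compared directly with Out[4] and Out[5].

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(Y <= k), obtained by summing the pmf from 0 through k."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

half = Fraction(1, 2)  # fair coin, kept as an exact fraction
pmf_values = [binom_pmf(x, 2, half) for x in range(3)]
cdf_values = [binom_cdf(x, 2, half) for x in range(3)]
print(pmf_values)  # compare with Out[4]
print(cdf_values)  # compare with Out[5]
```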
the first example in section 8.1
The example in the book is about a manufacturing process with a 6% defect rate. If a sample of size 200 is examined, the
expected number of defects is 12, but there is still roughly a 10% chance that the number of defects is at most
7 or 8. The Quantile command yields this result:
In[6]:= Quantile[BinomialDistribution[200, .06], .10]
Out[6]= 8
Since the distribution is discrete it is best to turn it around and get values of the CDF for 7 and 8.
In[7]:= CDF[BinomialDistribution[200, .06], 7]
Out[7]= 0.0828846
In[8]:= CDF[BinomialDistribution[200, .06], 8]
Out[8]= 0.146991
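Why the quantile comes out as 8 while the CDF at 7 is already close to 10% can be seen from a small Python sketch (standard library only). The function binom_quantile is an illustrative helper that assumes the usual convention for a discrete distribution, namely the smallest k whose CDF value reaches the requested level.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def binom_quantile(q, n, p):
    """Smallest k with P(Y <= k) >= q (assumed convention for a discrete quantile)."""
    total = 0.0
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p)**(n - k)
        if total >= q:
            return k
    return n

n, p0 = 200, 0.06
print(binom_quantile(0.10, n, p0))   # compare with Out[6]
print(binom_cdf(7, n, p0))           # compare with Out[7]
print(binom_cdf(8, n, p0))           # compare with Out[8]
```

Because the CDF jumps from below 10% at 7 to above 10% at 8, no integer gives exactly 10%, which is why the notebook looks at both values.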
It is claimed that a certain new manufacturing process would reduce this defect rate by a factor of 2. To evaluate this claim,
it is proposed to use a sample of size 200 to test whether the new process is a significant improvement, say one that reduces
the defect rate to 3%. The value p0 = .06 is called the null hypothesis, and the hoped-for defect rate p1 = .03 is the
alternative hypothesis. In designing the experiment to evaluate the new process, one needs to choose a cutoff on
the number of defects observed in the test to decide which hypothesis is more likely to be true. Let Y be a random variable
for the number of defects observed in the proposed experiment. The model assumes that the experiment consists of
n = 200 independent Bernoulli trials, so that Y is binomial. If one were to adopt the test Y > 7 for accepting the null hypothesis, then there is less than
a 10% chance of making a type I error, that is, observing a value of Y less than or equal to 7 and erroneously attributing
this to an improvement when in fact there is none. Unfortunately such a test has a fairly large probability of a type II error, that is,
observing a value of Y > 7 using the new process, and therefore erroneously accepting the null hypothesis even
when the new process does have the claimed defect rate of 3%.
In[9]:= 1 - CDF[BinomialDistribution[200, .03], 7]
Out[9]= 0.253897
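The two error probabilities for the rule "accept the null hypothesis when Y > 7" can be computed side by side. This is a Python sketch using only the standard library; error_probabilities is an illustrative name, not something defined in the notebook.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def error_probabilities(n, cutoff, p0, p1):
    """Error rates of the rule: reject the null hypothesis when Y <= cutoff."""
    alpha = binom_cdf(cutoff, n, p0)        # type I: Y <= cutoff although p = p0
    beta = 1.0 - binom_cdf(cutoff, n, p1)   # type II: Y > cutoff although p = p1
    return alpha, beta

alpha, beta = error_probabilities(200, 7, 0.06, 0.03)
print(alpha)  # compare with Out[7]
print(beta)   # compare with Out[9]
```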
Conclusion: A sample of size 200 is too small to distinguish between defect rates of 3% and 6% if one wants the probabilities of both type I and type II errors to be less than 10%. (Read the book to see how to get these numbers from the tables in the
back of the book. Since the binomial distribution tables only go up to the case of 25 trials, one needs to approximate with a
Poisson distribution to get the numbers.) With a sample size of 200 and a cutoff at 7, one can be fairly sure that if the
observed number of defects in the test is 7 or fewer, then the new process is better than the present one, but if the observed
number of defects comes out to be more than 7, one may well miss out on an opportunity to decrease the defect rate simply
because insufficient data were gathered and not because the new process fails to be a significant improvement.
designing a better experiment
To cope with this, one could increase the sample size for the proposed experiment. The question is, how much should the
sample size be increased? Of course this depends on how confident one wants to be about the results, and how much of an
improvement in the defect rate one wants to be able to detect, and how much it would cost to gather more data versus the
cost of making a wrong decision based on the outcome of the test. It should be noted that the technique of approximating a
binomial distribution with a Poisson distribution and using the table on page 654 of the book will not work in dealing with
sample sizes over 267 because, with p = .06, the parameter lambda = n p will exceed 16, which is the largest value in the range of the
tables for the Poisson approximation to the binomial distribution. Instead one can approximate the binomial distribution with
a normal distribution. One can compare the binomial with the normal distribution:
In[10]:= ?? NormalDistribution
NormalDistribution[mu, sigma] represents the normal (Gaussian)
distribution with mean mu and standard deviation sigma. More…
Attributes[NormalDistribution] = {Protected, ReadProtected}
One of the graphs needs to be shifted by a half unit to get close agreement; this is due to approximating a discrete distribution by a continuous one.
In[11]:= ?? PDF
PDF[distribution, x] gives the probability density
function of the specified statistical distribution evaluated at x.
In[12]:= ? ListPlot
ListPlot[{y1, y2, ... }] plots a list of values. The x coordinates
for each point are taken to be 1, 2, ... . ListPlot[{{x1, y1}, {x2,
y2}, ... }] plots a list of values with specified x and y coordinates. More…
In[13]:= Show[Plot[PDF[NormalDistribution[200 * .06, Sqrt[200 * .06 * .94]], y - .5], {y, 0., 20.}],
ListPlot[Table[PDF[BinomialDistribution[200, 0.06], y], {y, 0, 20}],
PlotJoined -> True]]
[Plot: normal density with mean 200 * .06 and standard deviation Sqrt[200 * .06 * .94], shifted by .5, overlaid on the joined binomial probabilities for n = 200, p = .06; horizontal axis 0 to 20, vertical axis 0 to 0.12]
Out[13]= … Graphics …
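A related numerical check of the half-unit adjustment: when a normal CDF is used to approximate a binomial CDF, evaluating the normal CDF at k + 0.5 instead of k (the standard continuity correction) usually lands closer to the exact value. The Python sketch below uses only the standard library and compares the two at k = 7; it is an illustration of the idea, not a reproduction of the plot.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p = 200, 0.06
# Normal distribution with the same mean and standard deviation as the binomial
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binom_cdf(7, n, p)      # exact P(Y <= 7)
plain = approx.cdf(7)           # normal CDF at 7, no correction
corrected = approx.cdf(7.5)     # normal CDF at 7.5, continuity correction

print(exact, plain, corrected)
```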
Note that on the standard normal curve the tails Z < −1.28 and Z > 1.28 each have probability 10%.
In[14]:= Quantile[NormalDistribution[0, 1], .10]
Out[14]= -1.28155
In[15]:= Quantile[NormalDistribution[0, 1], .90]
Out[15]= 1.28155
Since the values .03 and .06 are small, one can use the approximation Sqrt[n p] instead of the better expression Sqrt[n p q]
for the standard deviation. Doing so, the condition on the cutoff c is approximately
In[16]:= n p1 + 1.28155 Sqrt[n p1] < c < n p0 - 1.28155 Sqrt[n p0]
Out[16]= n p1 + 1.28155 Sqrt[n p1] < c < n p0 - 1.28155 Sqrt[n p0]
This can be solved for the sample size n, yielding
In[17]:= n > (1.28155 / (Sqrt[p0] - Sqrt[p1]))^2 /. {p0 -> .06, p1 -> .03}
Out[17]= n > 319.081
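The algebra behind this bound is short and may be worth writing out. Writing z for 1.28155 (a symbol introduced here only for brevity), a cutoff c can exist only if the left end of the interval in In[16] lies below the right end:

```latex
\begin{align*}
n p_1 + z\sqrt{n p_1} &< n p_0 - z\sqrt{n p_0}, \qquad z = 1.28155 \\
z\sqrt{n}\,\bigl(\sqrt{p_0} + \sqrt{p_1}\bigr) &< n\,(p_0 - p_1)
  = n\,\bigl(\sqrt{p_0} - \sqrt{p_1}\bigr)\bigl(\sqrt{p_0} + \sqrt{p_1}\bigr) \\
\sqrt{n} &> \frac{z}{\sqrt{p_0} - \sqrt{p_1}} \\
n &> \left(\frac{z}{\sqrt{p_0} - \sqrt{p_1}}\right)^{2} \approx 319.08
  \quad\text{for } p_0 = .06,\ p_1 = .03
\end{align*}
```

The division in the third line is by the positive quantity sqrt(n) (sqrt(p0) + sqrt(p1)), so the direction of the inequality is preserved; the step relies on p0 > p1.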
The cutoff is
In[18]:= c == n p0 - 1.28155 Sqrt[n p0] /. {n -> 319, p0 -> .06}
Out[18]= c == 13.5333
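The sample size bound and the cutoff can be recomputed in a few lines of Python (standard library only). This sketch follows the same approximation as the notebook, with Sqrt[n p] standing in for the standard deviation; the function name approx_design is illustrative. The normal quantile is taken from statistics.NormalDist rather than typed in as 1.28155, so the last digits may differ slightly from the notebook's output.

```python
from math import sqrt
from statistics import NormalDist

def approx_design(p0, p1, error_level):
    """Return (z, lower bound on n) under the Sqrt[n p] normal approximation."""
    z = NormalDist().inv_cdf(1 - error_level)   # upper-tail quantile, about 1.28155 for 10%
    n_min = (z / (sqrt(p0) - sqrt(p1))) ** 2
    return z, n_min

z, n_min = approx_design(0.06, 0.03, 0.10)
n = 319                                          # sample size used in the notebook
cutoff = n * 0.06 - z * sqrt(n * 0.06)           # right end of the interval for c

print(n_min)    # compare with Out[17]
print(cutoff)   # compare with Out[18]
```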
In[19]:= Show[ListPlot[Table[PDF[BinomialDistribution[319, 0.06], y], {y, 0, 30}],
PlotJoined -> True], ListPlot[
Table[PDF[BinomialDistribution[319, 0.03], y], {y, 0, 30}], PlotJoined -> True]]
[Plot: joined binomial probabilities for n = 319 with p = .06 and with p = .03; horizontal axis 0 to 30, vertical axis up to about 0.12]
Out[19]= … Graphics …
The probability for a type I error is:
In[20]:= CDF[BinomialDistribution[319, 0.06], 13]
Out[20]= 0.0864675
The probability for a type II error is:
In[21]:= 1 - CDF[BinomialDistribution[319, 0.03], 13]
Out[21]= 0.102929
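To see how the choice of cutoff trades one error against the other at n = 319, the exact binomial error probabilities can be tabulated for a few neighbouring cutoffs. This is a Python sketch using only the standard library; the range 11 to 15 is an arbitrary choice made for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p0, p1 = 319, 0.06, 0.03
rows = []
for cutoff in range(11, 16):
    alpha = binom_cdf(cutoff, n, p0)         # type I: Y <= cutoff although p = p0
    beta = 1.0 - binom_cdf(cutoff, n, p1)    # type II: Y > cutoff although p = p1
    rows.append((cutoff, alpha, beta))
    print(f"cutoff {cutoff}: type I {alpha:.4f}, type II {beta:.4f}")
```

Raising the cutoff increases the type I error probability and lowers the type II error probability; the row for cutoff 13 corresponds to Out[20] and Out[21].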
One could use this same test with sample size 319 and cutoff 13 to distinguish between the null hypothesis p0 = 0.06 and
any alternative hypothesis that p1 is less than .03. But if one wanted to be able to reliably distinguish between the null
hypothesis p0 = 0.06 and an alternative hypothesis p1 = .04, say, then more data would be needed, assuming that one still
wants the probabilities of type I and II errors to both be less than 10%. Of course one would also need to collect more data
if one wanted the probabilities of type I and II errors to be less than 5%, for instance. Whether to use 5% or
10% depends on the costs associated with making the wrong decision, and how much it costs to gather more data.
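How much more data is needed can be estimated with the same approximate bound n > (z / (Sqrt[p0] - Sqrt[p1]))^2 used earlier, where z is the upper-tail normal quantile for the chosen error level. The Python sketch below (standard library only, illustrative function name) applies it to the two variations mentioned above. These are rough estimates from the normal approximation; as with n = 319, the exact binomial error probabilities should still be checked for whatever sample size and cutoff are finally chosen.

```python
from math import ceil, sqrt
from statistics import NormalDist

def approx_sample_size(p0, p1, error_level):
    """Approximate lower bound on n using Sqrt[n p] for the standard deviation."""
    z = NormalDist().inv_cdf(1 - error_level)   # upper-tail normal quantile
    return (z / (sqrt(p0) - sqrt(p1))) ** 2

# (alternative p1, desired bound on both error probabilities)
scenarios = [(0.03, 0.10), (0.04, 0.10), (0.03, 0.05)]
for p1, level in scenarios:
    n_min = approx_sample_size(0.06, p1, level)
    print(f"p1 = {p1}, error level = {level}: n of roughly {ceil(n_min)} or more")
```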