testprop.nb
Tests of Statistical Hypotheses about Proportions
Johan Belinfante
2005 April 18-20

In[1]:= << Statistics`DiscreteDistributions`
In[2]:= << Statistics`ContinuousDistributions`

summary

This notebook contains computations related to the first example in section 8.1 of our text, "Probability and Statistical Inference," by Hogg and Tanis. Using Mathematica makes it unnecessary to discuss the approximation of binomial distributions with Poisson distributions, but the student should read about that anyway because it is an important idea in its own right.

preliminaries about binomial distributions

This section just introduces Mathematica's notation for binomial distributions.

In[3]:= ??BinomialDistribution
BinomialDistribution[n, p] represents the binomial distribution for n trials with probability p. A BinomialDistribution[n, p] random variable describes the number of successes occurring in n trials, where the probability of success in each trial is p.
Attributes[BinomialDistribution] = {Protected, ReadProtected}

This warmup exercise may be useful:

In[4]:= Table[PDF[BinomialDistribution[2, 1/2], x], {x, 0, 2}]
Out[4]= {1/4, 1/2, 1/4}

In[5]:= Table[CDF[BinomialDistribution[2, 1/2], x], {x, 0, 2}]
Out[5]= {1/4, 3/4, 1}

That is, if a fair coin is tossed twice, there is a probability of 3/4 that the number of heads is less than or equal to 1.

the first example in section 8.1

The example in the book is about a manufacturing process with a 6% defect rate. If a sample of size 200 is examined, the expected number of defects is 12, but there is still roughly a 10% chance that the number of defects is less than something around 7 or 8. The Quantile command yields this result:

In[6]:= Quantile[BinomialDistribution[200, .06], .10]
Out[6]= 8

Since the distribution is discrete, it is best to turn it around and get values of the CDF for 7 and 8.
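The Quantile result can be checked outside Mathematica. The sketch below uses only Python's standard library; the helper names binom_cdf and binom_quantile are ours, not from the notebook, and the quantile convention (smallest k whose CDF reaches q) is the one Quantile appears to use for discrete distributions.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_quantile(q, n, p):
    """Smallest k with P(Y <= k) >= q."""
    k = 0
    while binom_cdf(k, n, p) < q:
        k += 1
    return k

print(binom_quantile(0.10, 200, 0.06))  # 8, matching Out[6]
```

Since the CDF at 7 is still below 0.10 while the CDF at 8 exceeds it, the quantile lands at 8, which is why the next step looks at both CDF values directly.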
In[7]:= CDF[BinomialDistribution[200, .06], 7]
Out[7]= 0.0828846

In[8]:= CDF[BinomialDistribution[200, .06], 8]
Out[8]= 0.146991

It is claimed that a certain new manufacturing process would reduce this defect rate by a factor of 2. To evaluate this claim, it is proposed to use a sample of size 200 to test whether the new process is a significant improvement, say one that reduces the defect rate to 3%. The value p0 = .06 is called the null hypothesis, and the hoped-for defect rate p1 = .03 is the alternative hypothesis. In designing the experiment to evaluate the new process, one needs to come up with some cutoff on the number of defects observed in the test to decide which hypothesis is more likely to be true. Let Y be a random variable for the number of defects observed in the proposed experiment. The model is that the experiment is a Bernoulli experiment with n = 200 trials. If one were to adopt the test Y > 7 for accepting the null hypothesis, then there is less than a 10% chance of making a type I error, that is, observing a value of Y less than or equal to 7 and erroneously attributing this to an improvement when in fact there is none. Unfortunately such a test yields a fairly large type II error probability, that is, observing a value of Y > 7 using the new process, and therefore erroneously accepting the null hypothesis even in the situation that the new process does happen to have the claimed defect rate of 3%.

In[9]:= 1 - CDF[BinomialDistribution[200, .03], 7]
Out[9]= 0.253897

Conclusion: A sample of size 200 is too small to distinguish between a defect rate of 3% and 6% if one wants the probabilities of both type I and type II errors to be less than 10%. (Read the book to see how to get these numbers from the tables in the back of the book. Since the binomial distribution tables only go up to the case of 25 trials, one needs to approximate with a Poisson distribution to get the numbers.)
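The two error probabilities behind this conclusion can be reproduced with a short Python check; binom_cdf is our own helper, written with the standard library's math.comb.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Type I error: P(Y <= 7) under the null p0 = 0.06 with n = 200.
alpha = binom_cdf(7, 200, 0.06)     # ~0.0829, matching Out[7]

# Type II error: P(Y > 7) under the alternative p1 = 0.03.
beta = 1 - binom_cdf(7, 200, 0.03)  # ~0.2539, matching Out[9]

print(alpha, beta)
```

With n = 200 and cutoff 7, the type I error is under 10% but the type II error is about 25%, which is the point of the conclusion above.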
With a sample size of 200 and a cutoff at 7, one can be fairly sure that if the observed number of defects in the test is less than or equal to 7, then the new process is better than the present one; but if the observed number of defects comes out to be more than 7, one may well miss out on an opportunity to decrease the defect rate, simply because insufficient data were gathered and not because the new process fails to be a significant improvement.

designing a better experiment

To cope with this, one could increase the sample size for the proposed experiment. The question is, how much should the sample size be increased? Of course this depends on how confident one wants to be about the results, how much of an improvement in the defect rate one wants to be able to detect, and how much it would cost to gather more data versus the cost of making a wrong decision based on the outcome of the test. It should be noted that the technique of approximating a binomial distribution with a Poisson distribution and using the table on page 654 of the book will not work for sample sizes over 267, because the parameter lambda = n p will exceed 16, which is the largest value in the range of the tables for the Poisson approximation to the binomial distribution. Instead one can approximate the binomial distribution with a normal distribution. One can compare the binomial with the normal distribution:

In[10]:= ??NormalDistribution
NormalDistribution[mu, sigma] represents the normal (Gaussian) distribution with mean mu and standard deviation sigma.
Attributes[NormalDistribution] = {Protected, ReadProtected}

One of the graphs needs to be shifted by a half unit to get close agreement; this is due to approximating a discrete distribution by a continuous one.

In[11]:= ??PDF
PDF[distribution, x] gives the probability density function of the specified statistical distribution evaluated at x.

In[12]:= ?ListPlot
ListPlot[{y1, y2, ...}] plots a list of values. The x coordinates for each point are taken to be 1, 2, ... . ListPlot[{{x1, y1}, {x2, y2}, ...}] plots a list of values with specified x and y coordinates.

In[13]:= Show[Plot[PDF[NormalDistribution[200*.06, Sqrt[200*.06*.94]], y - .5], {y, 0., 20.}],
  ListPlot[Table[PDF[BinomialDistribution[200, 0.06], y], {y, 0, 20}], PlotJoined -> True]]
[plot: the half-unit-shifted normal density and the joined binomial PDF for n = 200, p = 0.06 nearly coincide over 0 <= y <= 20]
Out[13]= Graphics

Note that on the standard normal curve the tails Z < -1.28 and Z > 1.28 each have probability 10%.

In[14]:= Quantile[NormalDistribution[0, 1], .10]
Out[14]= -1.28155

In[15]:= Quantile[NormalDistribution[0, 1], .90]
Out[15]= 1.28155

Since the values .03 and .06 are small, one can use the approximation Sqrt[n p] instead of the better expression Sqrt[n p q] for the standard deviation. Doing so, the condition on the cutoff c is approximately

In[16]:= n p1 + 1.28155 Sqrt[n p1] < c < n p0 - 1.28155 Sqrt[n p0]
Out[16]= n p1 + 1.28155 Sqrt[n p1] < c < n p0 - 1.28155 Sqrt[n p0]

This can be solved for the sample size n, yielding

In[17]:= n > (1.28155/(Sqrt[p0] - Sqrt[p1]))^2 /. {p0 -> .06, p1 -> .03}
Out[17]= n > 319.081

The cutoff is

In[18]:= c == n p0 - 1.28155 Sqrt[n p0] /. {n -> 319, p0 -> .06}
Out[18]= c == 13.5333

In[19]:= Show[ListPlot[Table[PDF[BinomialDistribution[319, 0.06], y], {y, 0, 30}], PlotJoined -> True],
  ListPlot[Table[PDF[BinomialDistribution[319, 0.03], y], {y, 0, 30}], PlotJoined -> True]]
[plot: the two joined binomial PDFs for n = 319 with p = 0.06 and p = 0.03 over 0 <= y <= 30; the cutoff 13 falls between their peaks]
Out[19]= Graphics

The probability of a type I error is:

In[20]:= CDF[BinomialDistribution[319, 0.06], 13]
Out[20]= 0.0864675

The probability of a type II error is:

In[21]:= 1 - CDF[BinomialDistribution[319, 0.03], 13]
Out[21]= 0.102929

One could use this same test with sample size 319 and cutoff 13 to distinguish between the null hypothesis p0 = 0.06 and any alternative hypothesis with p1 less than .03. But if one wanted to be able to reliably distinguish between the null hypothesis p0 = 0.06 and an alternative hypothesis p1 = .04, say, then more data would be needed, assuming that one still wants the probabilities of type I and type II errors to both be less than 10%. Of course one would also need to collect more data if one wanted the probabilities of type I and type II errors to be less than 5%, for instance. Whether to use 5% or 10% depends on the costs associated with making the wrong decision, and how much it costs to gather more data.
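The sample-size calculation and the final check can be combined into one Python sketch. The standard library's statistics.NormalDist supplies the 10% normal quantile, binom_cdf is again our own direct summation, and the existence condition on the cutoff interval is what yields the bound on n.

```python
from math import comb, sqrt, floor
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p0, p1 = 0.06, 0.03
z = -NormalDist().inv_cdf(0.10)         # ~1.28155, matching Out[14]/Out[15]

# For n p1 + z Sqrt[n p1] < c < n p0 - z Sqrt[n p0] to admit a cutoff c,
# one needs n > (z / (Sqrt[p0] - Sqrt[p1]))^2.
n_min = (z / (sqrt(p0) - sqrt(p1)))**2  # ~319.08, matching Out[17]

n = 319
c = floor(n * p0 - z * sqrt(n * p0))    # integer cutoff 13, from c == 13.5333

alpha = binom_cdf(c, n, p0)             # ~0.0865, matching Out[20]
beta = 1 - binom_cdf(c, n, p1)          # ~0.1029, matching Out[21]

print(n_min, c, alpha, beta)
```

Note that beta comes out just above 10%, since n = 319 sits slightly below the bound 319.081 and the cutoff must be an integer.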