In statistics, it is important to know which distribution to use, and when.

Example 1. Suppose we want the probability for the number of successful surgeries out of 35. The outcomes are assumed to be independent of each other, and the probability of success on each surgery is .98. We would use the binomial distribution, with the formula

$P(r) = C_{n,r}\, p^{r} q^{n-r}$

where r is the number of successes, n = 35, p = .98 and q = .02. The table below gives the values for r from 26 to 35. Notice that when r is 29 or less, the probabilities are practically 0.

Successes (r)   Probability P(r)
26              2.13795E-08
27              3.49199E-07
28              4.88879E-06
29              5.78226E-05
30              0.000566661
31              0.004478452
32              0.027430521
33              0.122190504
34              0.352196158
35              0.493074621

The corresponding histogram is as follows.

[Figure: Distribution for n=35 and p=.98; probability versus number of successes, r = 26 to 35]

Could you use the normal distribution to find a probability such as P(r ≥ 33) or P(r ≤ 33)? Absolutely not! Look at the shape of the distribution: it is nothing like bell-shaped. Moreover, the rule of thumb for using the normal distribution to approximate the binomial distribution requires np > 5 and nq > 5. In this example, nq = 35(.02) = 0.70, which is not greater than 5. Notice that here n > 30. However, the rule of thumb that allows the normal distribution when n ≥ 30 applies to sample means. It does not apply to binomial problems or sample proportions.

In contrast, consider the following example, where np > 5 and nq > 5.

Example 2. Suppose a coin is weighted so that it comes up heads 65% of the time, and it is tossed 35 times. Find the probability distribution for the number of heads. The table below gives the probabilities of r successes for r = 11 to 33. The other probabilities are almost 0.
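Tables like this one, and the Example 1 table above, come straight from the binomial formula. The following is a minimal sketch in Python using only the standard library. It reproduces both tables, applies the np > 5 and nq > 5 rule of thumb, and compares the exact P(20 ≤ r ≤ 24) for Example 2 with its continuity-corrected normal approximation. The helper names `binomial_pmf`, `normal_approx_ok` and `normal_approx_range` are hypothetical and do not come from the original.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(n: int, p: float, r: int) -> float:
    """P(r) = C(n, r) * p**r * q**(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb from the text: np > 5 and nq > 5."""
    return n * p > 5 and n * (1 - p) > 5


def normal_approx_range(n: int, p: float, a: int, b: int) -> float:
    """Approximate P(a <= r <= b) with a normal curve and the continuity correction."""
    mu = n * p                      # mean np
    sigma = sqrt(n * p * (1 - p))   # standard deviation sqrt(npq)
    curve = NormalDist(mu, sigma)
    return curve.cdf(b + 0.5) - curve.cdf(a - 0.5)


# Example 1: n = 35, p = .98 (nq = 0.7, so the rule of thumb fails)
print("Example 1")
for r in range(26, 36):
    print(r, f"{binomial_pmf(35, 0.98, r):.6g}")
print("normal approximation OK?", normal_approx_ok(35, 0.98))

# Example 2: n = 35, p = .65 (np = 22.75 and nq = 12.25, so the rule holds)
print("Example 2")
for r in range(11, 34):
    print(r, f"{binomial_pmf(35, 0.65, r):.6g}")
print("normal approximation OK?", normal_approx_ok(35, 0.65))

exact = sum(binomial_pmf(35, 0.65, r) for r in range(20, 25))  # r = 20..24
approx = normal_approx_range(35, 0.65, 20, 24)
print(f"exact P(20 <= r <= 24) = {exact:.5f}")
print(f"normal approximation   = {approx:.4f}")
```

Here `statistics.NormalDist` evaluates the normal curve directly instead of using a printed z-table. Its result may therefore differ slightly in the last digit from a hand calculation that rounds z to two decimals.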
Successes (r)   Probability P(r)
11              4.16919E-05
12              0.000154855
13              0.000508811
14              0.001484897
15              0.003860732
16              0.008962414
17              0.018602657
18              0.034547791
19              0.05740648
20              0.085289628
21              0.113139302
22              0.133710084
23              0.140354064
24              0.130328773
25              0.106497226
26              0.076069447
27              0.04709061
28              0.024986854
29              0.011201004
30              0.004160373
31              0.001246195
32              0.000289295
33              4.8842E-05

Notice that the distribution looks approximately normal.

[Figure: Binomial Distribution with n=35, p=.65; probability versus number of successes]

For example, to find the probability of 20 to 24 successes (inclusive), we could add the values from the table above:

P(20) + P(21) + P(22) + P(23) + P(24) = .60282

However, when n is large these sums become tedious to compute, so it is often desirable to use the normal approximation when it is applicable. Here it is applicable, since np = 35(.65) = 22.75 > 5 and nq = 35(.35) = 12.25 > 5. So we approximate by a normal distribution with

$\mu = np = 22.75$ and $\sigma = \sqrt{npq} = \sqrt{35(.65)(.35)} \approx 2.82179$

Using the continuity correction, P(20 ≤ r ≤ 24) = P(19.5 < x < 24.5), so we compute

$P(19.5 < x < 24.5) = P(-1.15 < z < .62) = .7324 - .1251 = .6073$

This is a reasonably good approximation of the correct answer, .60282, because the distribution is approximately normal.

Example 3. We would not use the normal distribution to determine the probability of getting an even number on the toss of a fair die. The distribution of the outcome of a fair die is uniform, as shown below.

[Figure: Distribution for tossing a fair die; probability versus outcome of toss, 1 to 6]

Thus the probability that a single toss is even is

P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2

This uniform distribution has

Mean: $\mu = (1 + 2 + 3 + 4 + 5 + 6)(1/6) = 3.5$
Variance: $\sigma^2 = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)(1/6) - 3.5^2 \approx 2.91667$
Standard deviation: $\sigma = \sqrt{2.91667} \approx 1.70783$

Example 4.
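This example reuses the mean and standard deviation of a single toss from Example 3. The short Python sketch below is a quick check. It recomputes those values with exact fractions, along with P(even). It also computes the standard deviation σ/√n of the mean of n = 50 tosses, which the central limit theorem argument relies on. The variable names are illustrative and not from the original.

```python
from fractions import Fraction
from math import sqrt

# A fair die: outcomes 1..6, each with probability exactly 1/6
outcomes = range(1, 7)
prob = Fraction(1, 6)

p_even = sum(prob for x in outcomes if x % 2 == 0)         # P(2) + P(4) + P(6)
mean = sum(x * prob for x in outcomes)                     # exactly 7/2 = 3.5
variance = sum(x * x * prob for x in outcomes) - mean**2   # exactly 35/12
sd = sqrt(float(variance))                                 # about 1.70783

n = 50
standard_error = sd / sqrt(n)   # sigma / sqrt(n) for the mean of n tosses

print(p_even, mean, variance, round(sd, 5), round(standard_error, 4))
```

Using `Fraction` keeps the mean and variance exact (7/2 and 35/12), so rounding only enters at the square root.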
Now consider the average of 50 tosses of a fair die. The central limit theorem says the sampling distribution of the means of 50 tosses is approximately normal (since n = 50 ≥ 30), even though the original distribution is not normal. The sampling distribution has mean and standard deviation

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

where μ and σ were computed in Example 3. To convert to the standard normal, we would therefore use the formula

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

So the sampling distribution has mean 3.5 and standard deviation 1.70783/√50 ≈ .2415. The probability that the average of 50 tosses is more than 3.7 is approximately

$P\left(z > \dfrac{3.7 - 3.5}{.2415}\right) = P(z > .83) = 1 - .7967 = .2033$

A simulation of 400 groups of 50 tosses produced the following relative frequency histogram for the sample means.

[Figure: Relative Frequency Histogram; relative frequency versus average of 50 tosses, from about 2.8 to 4.2]

Notice that the histogram has an approximately normal shape, even though the original distribution did not look at all normal. Of those 400 groups of tosses, 85 had an average of more than 3.7. In our simulation, then, 21.25% of the means were greater than 3.7, which is quite close to the predicted 20.33% found above.
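The CLT calculation and the simulation can both be sketched in a few lines of Python. This is an illustration, not the procedure behind the handout's histogram. `clt_tail` and `simulate` are hypothetical helper names, and the random seed is arbitrary. The normal tail is evaluated with `statistics.NormalDist` without rounding z to .83, so it comes out slightly above .2033.

```python
import random
from statistics import NormalDist

# Inputs taken from Examples 3 and 4: a fair die and groups of 50 tosses
MU = 3.5                   # mean of a single toss
SIGMA = (35 / 12) ** 0.5   # standard deviation of a single toss, about 1.70783
N = 50                     # tosses per group


def clt_tail(threshold: float) -> float:
    """CLT approximation of P(mean of N tosses > threshold), no continuity correction."""
    sampling = NormalDist(MU, SIGMA / N ** 0.5)  # sigma / sqrt(n), about .2415
    return 1 - sampling.cdf(threshold)


def simulate(groups: int, threshold: float = 3.7, seed: int = 0) -> float:
    """Fraction of simulated groups of N fair-die tosses whose mean exceeds threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(groups):
        total = sum(rng.randint(1, 6) for _ in range(N))
        if total / N > threshold:
            hits += 1
    return hits / groups


print(f"CLT approximation:        {clt_tail(3.7):.4f}")
print(f"simulation, 400 groups:   {simulate(400):.4f}")
print(f"simulation, 20000 groups: {simulate(20_000):.4f}")
```

The 400-group proportion depends on the seed, so it will not reproduce 21.25% exactly. With many more groups, the simulated proportion is likely to settle a little below the CLT figure, at roughly .19. The average of 50 tosses takes only discrete values, and the calculation above applies no continuity correction.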