Homework 8, Due March 25 2010
Chapter 6 (pages 299–301): 6.36, 6.38, 6.42, 6.44

$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}.$$

This formula, which gives the probability of exactly $x$ successes in $n$ trials, is known as the binomial distribution. The essential conditions for a binomial distribution are:

1. Each experiment has exactly two possible outcomes, which we label “success” and “failure”, though they could have many different interpretations (alive or dead, rain or no rain, flight arrives on time or not, etc.). Data having this form are called binary data.
2. The experiment is repeated a number of times ($n$) and the different experiments are independent.
3. The probability of “success” is some number $p$ (between 0 and 1) and is the same for all the experiments.

Here is another example. A tennis player gets her first serve in 70% of the time. Assume each serve is independent of all the others. She serves the ball six times. What is the probability that she gets
(a) All 6 serves in?
(b) Exactly 4 serves in?
(c) At least 4 serves in?
(d) No more than 4 serves in?

Solution:
(a) $(0.7)^6 = .118$.
(b) $\frac{6!}{4!\,2!} \times (0.7)^4 \times (0.3)^2 = \frac{6\times5\times4\times3\times2\times1}{4\times3\times2\times1\times2\times1} \times (0.7)^4 \times (0.3)^2 = 15 \times .2401 \times .09 = .324$.
(c) The probability of exactly 5 is $\frac{6!}{5!\,1!} \times (0.7)^5 \times 0.3 = 6 \times 0.16807 \times 0.3 = .303$. So the probability of at least 4 is $.324 + .303 + .118 = .745$ (this adds the rounded values; the unrounded total is .744).
(d) The probability of at least 5 is $.303 + .118 = .421$, so the probability of no more than 4 is $1 - .421 = .579$ (.580 if unrounded values are used).

The Normal approximation

Although the binomial formula is faster to calculate than trying to count all possibilities, it would still be hard to use by hand for large samples, say $n = 100$. In this case, we use an alternative approach based on approximating the binomial distribution by a normal distribution.

Suppose we have a binomial distribution with $n$ trials and probability of success $p$, and $n$ is some large number (say, 100). In this situation, we are not usually interested in the exact number of successes, but in the probability that the number will be more or less than some given number.
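The binomial formula and the tennis-serve answers above can be checked with a short Python sketch. The function name `binomial_pmf` and the variable names are our own labels, not from the text; `math.comb` from the standard library supplies the factor $n!/(x!(n-x)!)$.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p (the binomial formula)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Tennis example: six serves, each in with probability 0.7.
n, p = 6, 0.7

all_six = binomial_pmf(6, n, p)                                       # part (a)
exactly_four = binomial_pmf(4, n, p)                                  # part (b)
at_least_four = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))   # part (c)
at_most_four = sum(binomial_pmf(x, n, p) for x in range(0, 5))        # part (d)

for label, value in [("(a)", all_six), ("(b)", exactly_four),
                     ("(c)", at_least_four), ("(d)", at_most_four)]:
    print(label, round(value, 3))
```

Because the code adds unrounded probabilities, parts (c) and (d) can differ from the hand calculation by one unit in the third decimal place.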
Example 1 (from text): In a certain week in 1997, the police at a certain location in Philadelphia made 262 car stops. Of these, 207 drivers were African American. Among the whole population of Philadelphia, 42.2% are African American. Does this prove the police were guilty of “racial profiling”, i.e. deliberately stopping drivers because they were African American?

Assuming the traffic stops are independent and the proportion of African Americans driving at this particular location is the same as the proportion in the whole city, this corresponds to the random variable $X$ (number of African Americans among those stopped) having a binomial distribution with $n = 262$, $p = 0.422$. The question is, what is the probability that $X \ge 207$ if the binomial distribution is correct?

Note: In this case it wouldn’t make sense to try to calculate the probability that $X$ is exactly 207. What we’re really concerned about is that the number is so large, so a natural question is “what is the probability that the number would have been as large as this by chance?” That leads us to consider $X \ge 207$ rather than $X = 207$.

A Key Formula (page 297)

The binomial distribution for $n$ trials with probability $p$ of success on each trial has mean $\mu$ and standard deviation $\sigma$ given by

$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}.$$

The solution proceeds by several steps:

Step 1: Calculate the mean of $X$. This is given by the formula $\mu = np = 262 \times 0.422 = 110.6$.
Step 2: Calculate the standard deviation of $X$. This is given by the formula $\sigma = \sqrt{np(1-p)} = \sqrt{262 \times 0.422 \times 0.578} = 7.99$.
Step 3: Convert the given $x$ value (207) to $z$. So $z = \frac{x - \mu}{\sigma} = \frac{207 - 110.6}{7.99} = 12.07$.
Step 4: Calculate the probability associated with this $z$ value.

The only problem with step 4 is: the number’s off the chart! The regular table only goes up to 3.49.
In fact, if you look at the little table in the bottom corner of page A2, you can see some further numbers:

z             3.5        4.0         4.5         5.0
Probability   .999767    .9999683    .9999966    .999999713

Even at $z = 5$, the probability to the left of $z$ (i.e. less than 5) is more than .999999, which means that the probability to the right of $z$ is less than .000001. Replace $z = 5$ by $z = 12$, and the probability is much smaller again.

Conclusion. The probability that we could have got this result (207 African Americans out of 262) by chance is so small that it is effectively 0. This seems to be completely convincing evidence that the police were engaging in the practice of racial profiling. However, there are other possible explanations: for example, perhaps the proportion of African Americans driving past this particular checkpoint was much greater than 42.2%.

Further Discussion. It is possible to compute the exact probability that $X \ge 207$ in this example: the answer is $4.9 \times 10^{-34}$. To give an idea of how small a probability that is, it is roughly equivalent to the probability that your favorite baseball team wins the World Series 23 times in succession! [On the assumption that there are 30 Major League Baseball teams, that any one of them is equally likely to win in a given year, and that results from year to year are independent: $\left(\frac{1}{30}\right)^{23} = 1.1 \times 10^{-34}$.]

Example 2: Consider our earlier example about the tennis player who gets in 70% of her serves. In a whole match she serves 80 times. What is the probability she makes at least 65 of these?

Solution: First calculate $\mu$ and $\sigma$:

$$\mu = 80 \times 0.7 = 56, \qquad \sigma = \sqrt{80 \times 0.7 \times 0.3} = 4.1.$$

Then for $x = 65$, we have $z = \frac{65 - 56}{4.1} = 2.20$. Look up 2.20 in the normal table: the corresponding left-hand probability is .9861. So the answer is $1 - 0.9861 = 0.0139$. In other words, it would be very unusual if she actually achieved this in a match, though it would be nothing like as “surprising” as our racial profiling example!
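The four steps used in both examples can be sketched in Python alongside a direct sum of binomial probabilities. This is only an illustration: the function names are our own, the normal tail is taken from `math.erfc` rather than a printed table, and no continuity correction is applied, to match the hand calculation.

```python
from math import comb, erfc, sqrt

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

def approx_at_least(x, n, p):
    """Normal approximation to P(X >= x): mean, standard deviation, z, tail."""
    mu = n * p                       # Step 1: mean
    sigma = sqrt(n * p * (1 - p))    # Step 2: standard deviation
    z = (x - mu) / sigma             # Step 3: convert x to z
    return z, normal_upper_tail(z)   # Step 4: probability to the right of z

def exact_at_least(x, n, p):
    """Exact P(X >= x), summing the binomial formula over x, x+1, ..., n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Example 1: 207 or more out of 262 stops, p = 0.422.
z_stops, approx_stops = approx_at_least(207, 262, 0.422)
exact_stops = exact_at_least(207, 262, 0.422)

# Example 2: 65 or more out of 80 serves, p = 0.7.
z_serves, approx_serves = approx_at_least(65, 80, 0.7)
exact_serves = exact_at_least(65, 80, 0.7)

print("stops:  z =", round(z_stops, 2), "approx =", approx_stops, "exact =", exact_stops)
print("serves: z =", round(z_serves, 2), "approx =", round(approx_serves, 4),
      "exact =", round(exact_serves, 4))
```

The code keeps $\mu$ and $\sigma$ unrounded, so the z-scores and tail probabilities can differ in the last digit from the hand calculation, which rounds at each step.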
[Again it is possible to use a computer to calculate the exact probability. In this case it comes to .0161, compared with the above approximate answer of .0139. This gives an idea how accurate the normal approximation is. It’s not perfect, but it’s good enough for most practical calculations.]

Guidelines for the normal approximation to the binomial distribution (see sidebar p. 299): the binomial distribution can be approximated well by a normal distribution when the expected number of successes, $np$, and the expected number of failures, $n(1-p)$, are both at least 15.

In the racial profiling example, $n = 262$, $p = 0.422$ so $np = 110.6$, $n(1-p) = 151.4$.
In the tennis example, $n = 80$, $p = 0.7$ so $np = 56$, $n(1-p) = 24$.
In both cases, both numbers are greater than 15, so the condition is satisfied.

Sampling Distributions

Example: An ABC News/Washington Post opinion poll published February 23 2009 stated that President Obama had an approval rating of 68% (among all voters; the ratings were sharply different among Democrats, Independents and Republicans). This is based on a sample of 1001 voters. The margin of error is described as plus or minus 3%. What exactly does this mean?

Let’s focus on the proportion of people who supported the President. In this case 68% is a statistic: the number calculated from the sample. The true proportion in the population is an unknown value $p$. Collecting a sample is essentially a binomial experiment, with $n = 1001$. However, most opinion polls are reported as the percentage or proportion of people who vote a certain way, rather than the total number. Therefore, our interest is in the sample proportion. If $X$ is the number of people who support Obama in the poll, then the sample proportion is $X/n$ (so in this example, $X$ was about 681, which would lead to $X/n = 681/1001 = 0.68$ to two decimal places).

For a sample proportion we have (see sidebar, page 313):

$$\text{Mean} = p, \qquad \text{Standard Deviation} = \sqrt{\frac{p(1-p)}{n}}.$$

So assuming $p = 0.68$, in this case we get a mean of 0.68 and a standard deviation of

$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.68 \times 0.32}{1001}} = 0.0147.$$

The normal approximation also applies (because again $np > 15$ and $n(1-p) > 15$), so we can assume the sampling distribution is approximately normal.

Conclusion: If the true value of $p$ is 0.68, then in repeated samples of size 1001, the sampling distribution will be approximately normal with a mean of 0.68 and a standard deviation of 0.0147. In particular, approximately 95% of all polls will result in a sample proportion within 0.0294 (2 standard deviations, or about 3 percentage points) of the mean, and approximately 99.7% of all polls will result in a sample proportion within 0.0441 (3 standard deviations, or about 4.4 percentage points) of the mean. The 2-standard-deviation range is what the reported margin of error of plus or minus 3% refers to.

Because the quantity $\sqrt{\frac{p(1-p)}{n}}$ plays a special role in this calculation, it is given a special name: the standard error.
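The standard error calculation, and the claim that about 95% of polls land within 2 standard errors of $p$, can be illustrated with a small Python sketch. The function name `standard_error`, the simulated-poll helper, the number of simulated polls (1000) and the random seed are all our own choices for illustration; the simulation assumes the true proportion really is 0.68.

```python
import random
from math import sqrt

def standard_error(p, n):
    """Standard deviation of the sample proportion X/n."""
    return sqrt(p * (1 - p) / n)

p, n = 0.68, 1001
se = standard_error(p, n)
margin_95 = 2 * se     # roughly 95% of polls fall within this distance of p
margin_997 = 3 * se    # roughly 99.7% of polls fall within this distance of p

def simulate_poll(rng, p, n):
    """Sample proportion from one simulated poll of n independent voters."""
    supporters = sum(1 for _ in range(n) if rng.random() < p)
    return supporters / n

# Repeat the poll many times and count how often the sample proportion
# lands within 2 standard errors of the assumed true proportion.
rng = random.Random(0)   # fixed seed so the run is repeatable
num_polls = 1000
hits = sum(1 for _ in range(num_polls)
           if abs(simulate_poll(rng, p, n) - p) <= margin_95)
fraction_within_2se = hits / num_polls

print("standard error:", round(se, 4))
print("2 SE:", round(margin_95, 4), " 3 SE:", round(margin_997, 4))
print("fraction of simulated polls within 2 SE:", fraction_within_2se)
```

The 2 SE and 3 SE values computed here use the unrounded standard error, so they can differ slightly in the fourth decimal place from 0.0294 and 0.0441, which were obtained from the rounded value 0.0147.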