Homework 5 (due October 27, 2009)

Problem 1.
Each night different meteorologists give us the probability that it will rain the next day. To judge how well meteorologists predict, we will score each of them as follows: if a meteorologist says that it will rain with probability p, then he or she receives a score of 1 − (1 − p)^2 if it does rain, and 1 − p^2 if it does not rain. We then keep track of scores over a certain time span and conclude that the meteorologist with the highest average score is the best predictor of weather. Suppose now that a given meteorologist is aware of this and wants to maximize his or her expected score. If this person truly believes that it will rain tomorrow with probability p*, what value of p should he or she assert so as to maximize the expected score?

Solution
Let p* be the probability that I truly BELIEVE it will rain tomorrow, and let p be the probability that I SAY it will rain tomorrow. The question is what p should be to maximize my expected score. Let X be the random variable equal to the score the meteorologist receives when saying it will rain with probability p. There are only two values that X can take: 1 − (1 − p)^2 or 1 − p^2. Thus, when computing the expected value of X, we sum those two values multiplied by the probabilities that X takes them:
\[
E[X] = \bigl(1 - (1-p)^2\bigr) P\{X = 1 - (1-p)^2\} + (1 - p^2)\, P\{X = 1 - p^2\}.
\]
Now P{X = 1 − (1 − p)^2} is the probability that my score is 1 − (1 − p)^2, but I get this score exactly when it rains, so this is the probability that it rains. Similarly, P{X = 1 − p^2} = P{it does not rain}. But I believe that it will rain with probability p* and that it won't rain with probability 1 − p*. Thus
\[
E[X] = \bigl(1 - (1-p)^2\bigr) p^* + (1 - p^2)(1 - p^*) = 2pp^* + 1 - p^* - p^2.
\]
I now seek to maximize this expected value with respect to p. It is a quadratic function of p with a negative coefficient on p^2, so its graph is a downward-opening parabola with a single maximum where the derivative vanishes. Setting the derivative with respect to p equal to 0 gives 2p^* − 2p = 0, so the critical point occurs at p = p*. Since the second derivative is −2 < 0, the choice p = p* indeed maximizes the expected score.

Problem 2.
For a nonnegative integer-valued random variable X, show that
\[
E[X] = \sum_{n=0}^{\infty} P(X > n).
\]
Hint: Note that
\[
\sum_{n=0}^{\infty} P(X > n) = \sum_{n=0}^{\infty} \sum_{m=n+1}^{\infty} P(X = m)
\]
and change the order of the sums to show that this can be written as E[X].

Solution
Starting from the hint,
\[
\sum_{n=0}^{\infty} P(X > n) = \sum_{n=0}^{\infty} \sum_{m=n+1}^{\infty} P(X = m) = \sum_{m=1}^{\infty} \sum_{n=0}^{m-1} P(X = m).
\]
The last equality comes from the fact that the pairs (n, m) being summed over are exactly those with m ≥ 1 and 0 ≤ n ≤ m − 1, so if I fix m then n ranges from 0 to m − 1. The inner sum is over n, and P(X = m) does not depend on n, so it can be pulled out of the inner sum:
\[
= \sum_{m=1}^{\infty} P(X = m) \sum_{n=0}^{m-1} 1.
\]
The inner sum just adds 1 a total of m times. Thus we have
\[
= \sum_{m=1}^{\infty} m\, P(X = m) = \sum_{m=0}^{\infty} m\, P(X = m) = E[X].
\]

Problem 3.
Let X be a random variable having expected value µ and variance σ^2. Find the expected value and the variance of the random variable
\[
Y = \frac{X - \mu}{\sigma}.
\]

Solution
\[
E[Y] = E\!\left[\frac{X - \mu}{\sigma}\right] = \frac{E[X]}{\sigma} - \frac{\mu}{\sigma} = 0.
\]
The second-to-last equality uses the fact, proved in class as well as in the book, that E[aX + b] = aE[X] + b; we then use µ = E[X]. To compute the variance of Y, note that E[Y] = 0, so Var(Y) = E[Y^2]:
\[
E[Y^2] = E\!\left[\frac{(X - \mu)^2}{\sigma^2}\right] = \frac{1}{\sigma^2} E[(X - \mu)^2] = 1.
\]
The last equality comes from recognizing that E[(X − µ)^2] = Var(X) = σ^2.
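As a quick numerical sanity check of Problem 1, the following Python sketch evaluates the expected score E[X] = (1 − (1 − p)^2)p* + (1 − p^2)(1 − p*) over a grid of reported probabilities p and confirms that it peaks at p = p*. The helper name, the illustrative value of p*, and the grid resolution are my own choices, not part of the assignment.

```python
# Numerical check for Problem 1: the expected score is maximized at p = p*.
# The value of p_star and the grid resolution are illustrative choices.

def expected_score(p, p_star):
    """E[X] when the meteorologist reports p but believes p_star."""
    return (1 - (1 - p) ** 2) * p_star + (1 - p ** 2) * (1 - p_star)

p_star = 0.3                                  # assumed true belief (example value)
grid = [k / 1000 for k in range(1001)]        # candidate reported probabilities
best_p = max(grid, key=lambda p: expected_score(p, p_star))
print(best_p)                                 # prints 0.3, i.e. p = p*
```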
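The tail-sum identity in Problem 2 can also be checked numerically on a concrete distribution. The sketch below uses a Binomial(10, 0.3) variable as an example (my choice; any nonnegative integer-valued distribution would do), for which the infinite sums truncate automatically since X ≤ 10.

```python
# Numerical check for Problem 2: E[X] = sum_{n>=0} P(X > n) for a
# nonnegative integer-valued X. Example distribution: Binomial(10, 0.3).
from math import comb

n_trials, p = 10, 0.3
pmf = [comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k) for k in range(n_trials + 1)]

mean_direct = sum(k * pmf[k] for k in range(n_trials + 1))        # E[X] = sum k P(X = k)
tail_sum = sum(sum(pmf[m] for m in range(n + 1, n_trials + 1))    # sum_n P(X > n)
               for n in range(n_trials + 1))
print(mean_direct, tail_sum)   # both equal np = 3.0 up to rounding
```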
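Problem 3 can likewise be illustrated by simulation: standardizing samples of any random variable with finite mean and variance should give a sample mean near 0 and a sample variance near 1. The exponential distribution and sample size below are arbitrary illustrative choices.

```python
# Simulation check for Problem 3: Y = (X - mu)/sigma has mean 0 and variance 1.
# Example: X ~ Exponential(rate 1), so mu = 1 and sigma = 1; choices are illustrative.
import random

random.seed(0)
mu, sigma = 1.0, 1.0
xs = [random.expovariate(1.0) for _ in range(100_000)]
ys = [(x - mu) / sigma for x in xs]

mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
print(round(mean_y, 3), round(var_y, 3))   # approximately 0.0 and 1.0
```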
Problem 4.
Airlines find that each passenger who reserves a seat fails to turn up with probability 1/10, independently of the other customers. So Teeny Weeny Airlines (TWA) always sell 10 tickets for their 9-seat airplane, while Blockbuster Airways (BA) always sell 20 tickets for their 18-seat airplane. Which is more often over-booked?

Solution
Consider TWA first and let the random variable X be the number of people who show up. This is a binomial random variable with parameters (n = 10, p = 9/10). The probability that TWA is overbooked is
\[
P(X > 9) = P(X = 10) = \left(\frac{9}{10}\right)^{10} \approx 0.349.
\]
Doing the same for BA, let Y be the number of people who show up; it is a binomial random variable with parameters (n = 20, p = 9/10). Here the probability of being overbooked is
\[
P(Y > 18) = P(Y = 19) + P(Y = 20) = 20\left(\frac{9}{10}\right)^{19}\frac{1}{10} + \left(\frac{9}{10}\right)^{20} \approx 0.3917.
\]
Thus the second airline (BA) is more likely to be overbooked.

Problem 5.
A man claims to have extrasensory perception. As a test, a fair coin is flipped 8 times, and the man is asked to predict the outcome in advance. He gets 6 out of 8 correct. What is the probability that he would have done at least this well if he had no extrasensory perception?

Solution
In this problem we think of flipping a coin 8 times and let X be the number of times we guess correctly. Since each flip is guessed correctly with probability 1/2, X is a binomial random variable with parameters (n = 8, p = 0.5). The probability of guessing at least 6 correctly is
\[
P(X \ge 6) = \sum_{k=6}^{8} P(X = k) = \left[\binom{8}{6} + \binom{8}{7} + \binom{8}{8}\right]\left(\frac{1}{2}\right)^{8} = \frac{28 + 8 + 1}{2^8} = \frac{37}{256} \approx 0.1445.
\]
Consequently, it is rather unlikely that someone guessing at random would do at least as well as he did.
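As a sanity check of the overbooking probabilities in Problem 4, a short sketch can recompute the two binomial tail probabilities directly from the pmf; the helper name binom_pmf is my own, and only the standard library is used.

```python
# Check for Problem 4: probability that each airline is overbooked.
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_show = 0.9                                                   # each passenger shows up w.p. 9/10
p_twa = binom_pmf(10, 10, p_show)                              # all 10 of 10 show up
p_ba = binom_pmf(19, 20, p_show) + binom_pmf(20, 20, p_show)   # more than 18 of 20 show up
print(round(p_twa, 4), round(p_ba, 4))                         # about 0.3487 and 0.3917
```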
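The same kind of check applies to Problem 5: the tail probability P(X ≥ 6) for X ~ Binomial(8, 1/2) can be recomputed in one line, again using only the standard library.

```python
# Check for Problem 5: P(at least 6 correct guesses out of 8 fair-coin flips).
from math import comb

p_tail = sum(comb(8, k) for k in range(6, 9)) / 2**8
print(p_tail, 37 / 256)   # both print 0.14453125
```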
Problem 6.
To determine whether or not they have a certain disease, a large number of people are to have their blood tested. However, rather than testing each individual separately, it has been decided first to group people in groups of 10. The blood samples of the 10 people in each group will be pooled and analyzed together. If the test is negative, one test will suffice for the 10 people; whereas, if the test is positive, each of the 10 people will also be individually tested and, in all, 11 tests will be made on this group. Assume that the probability that a person has the disease is p for all people, independently of each other, and that the pooled test will be positive if at least one person in the pool has the disease. Let T be a random variable equal to the number of tests needed for a group of 10 people. Find the expected number E[T] of tests necessary for each group. Calculate the numerical values of E[T] for the cases p = 0.001, p = 0.01, p = 0.05, p = 0.1, and p = 0.5. Discuss briefly the practical applicability of this method (from a purely probabilistic point of view).

Solution
If T is the random variable defined above, then it can take on only two possible values: 1 or 11. Thus the expectation of T is
\[
E[T] = 1 \cdot P(T = 1) + 11 \cdot P(T = 11) = (1 - p)^{10} + 11\bigl(1 - (1 - p)^{10}\bigr) = 11 - 10(1 - p)^{10}.
\]
The second-to-last equality comes from the fact that P(T = 1) is the probability that no one in the group has the disease, which is (1 − p)^{10}. Similarly, P(T = 11) is the probability that someone has the disease, which is the complement of no one having it, namely 1 − (1 − p)^{10}.

Computing this for the values indicated we get:
• p = 0.001 gives E[T] ≈ 1.1
• p = 0.01 gives E[T] ≈ 1.956
• p = 0.05 gives E[T] ≈ 5.01
• p = 0.1 gives E[T] ≈ 7.51
• p = 0.5 gives E[T] ≈ 10.99

What this tells us is that if the probability of having the disease is too high (in particular, already for p = 0.5 we get E[T] ≈ 10.99), then this method of testing results in doing more tests on average than simply testing all 10 people individually, since the expected number of tests exceeds 10. When p is small, however, the pooled scheme is very efficient: for p = 0.001 we expect only about 1.1 tests per group of 10.

Problem 7.
We toss n coins, and each one shows heads with probability p, independently of the others. Each coin which shows heads is tossed again. Let Y be the number of heads resulting from the second round of tosses. Find the probability mass function, fY(k), the expectation, E[Y], and the variance, Var(Y), of the random variable Y.

Solution
This problem is not too complicated once you recognize the following: Y counts the number of successes, where a success means that a coin comes up heads on the first toss and then heads again on the second toss, which happens with probability p^2. So Y is a binomial random variable with parameters (n, p^2). Thus we have everything this problem asks for:
\[
f_Y(k) = \binom{n}{k} p^{2k} (1 - p^2)^{n - k}, \qquad k = 0, 1, \ldots, n,
\]
\[
E[Y] = np^2, \qquad Var(Y) = np^2(1 - p^2).
\]
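To double-check the formula E[T] = 11 − 10(1 − p)^{10} and the numbers listed in Problem 6, here is a small sketch that evaluates it for the requested values of p; the helper name and the group_size parameter are my own conveniences.

```python
# Check for Problem 6: expected number of tests per pooled group of 10.
def expected_tests(p, group_size=10):
    """E[T] = (group_size + 1) - group_size * (1 - p)**group_size."""
    return (group_size + 1) - group_size * (1 - p) ** group_size

for p in (0.001, 0.01, 0.05, 0.1, 0.5):
    print(p, round(expected_tests(p), 2))
# prints roughly 1.1, 1.96, 5.01, 7.51, 10.99
```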
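Finally, the claim in Problem 7 that Y is Binomial(n, p^2) can be supported by a quick simulation of the two-round tossing scheme; the values n = 10, p = 0.6, and the number of trials below are illustrative choices.

```python
# Simulation check for Problem 7: heads remaining after a second round of tossing.
# Y should behave like a Binomial(n, p^2) variable, so E[Y] ~ n*p^2 and
# Var(Y) ~ n*p^2*(1 - p^2). The parameters below are illustrative.
import random

random.seed(1)
n, p, trials = 10, 0.6, 200_000

def second_round_heads():
    heads_first = sum(random.random() < p for _ in range(n))      # coins showing heads
    return sum(random.random() < p for _ in range(heads_first))   # toss those again

ys = [second_round_heads() for _ in range(trials)]
mean_y = sum(ys) / trials
var_y = sum((y - mean_y) ** 2 for y in ys) / trials
print(round(mean_y, 3), round(n * p**2, 3))               # both approximately 3.6
print(round(var_y, 3), round(n * p**2 * (1 - p**2), 3))   # both approximately 2.304
```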