STA111 - Lecture 4
Random Variables, Bernoulli, Binomial, Hypergeometric

1 Introduction to Random Variables

Random variables are functions that map elements in the sample space to numbers (technically, random variables have to satisfy some mathy conditions, but we won't worry about that here; if you're really interested, you can take STA711 in a few years). Today we will work with random variables that take on values in countable subsets of R. Later in the course (probably next time) we will work with "continuous" random variables that take on values in (dense) subsets of R. As always, we'll try to digest the new concept with some examples.

Examples:

• Suppose we're flipping a coin twice. An example of a random variable would be "number of tails", which we will denote X1. This is a table summarizing the values that X1 takes on:

      outcome        X1
      heads, heads   0
      heads, tails   1
      tails, heads   1
      tails, tails   2

  For example, we can compute probabilities of the type P(X1 = 1) = P({heads, tails} ∪ {tails, heads}) = 1/2, or P(X1 > 0) = 1 − P(X1 = 0) = 3/4. As you can see, working with random variables is intuitive and natural, and it's pretty convenient notation.

• Suppose we're rolling a die twice. An example of a random variable here would be adding up the outcomes. If we call it X2, a table with some values is:

      outcome   X2
      1,1       2
      1,2       3
      2,1       3
      2,2       4
      ...       ...

  And we can find probabilities like P(X2 = 3) = 2/36 or P(2 < X2 ≤ 4) = P(X2 = 3) + P(X2 = 4) = 2/36 + 3/36.

Exercise 1. Come up with 3 examples of random variables and give an example of a probability using random variable notation for each of them (i.e. of the type P(X = k), P(X ≤ k), P(X ≠ k), etc.).

Now we're going to spend some time introducing different "types" of random variables that can be used for modeling random phenomena.

1.1 Bernoulli

Suppose X is a random variable that can only take on the values 1 or 0, with probabilities P(X = 1) = p and P(X = 0) = 1 − p. Then X is said to have a Bernoulli distribution with probability of "success" p, denoted X ∼ Bernoulli(p).

Examples:

• We're flipping a coin once and our random variable is X1 = 1 if the outcome is heads and X1 = 0 if the outcome is tails. Then X1 ∼ Bernoulli(1/2).

• We're rolling a die and our random variable takes on the value X2 = 1 if the outcome is strictly greater than 4 and X2 = 0 otherwise. Then X2 ∼ Bernoulli(1/3).
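If you want to see the second example in action, here is a minimal Python sketch (the code and names are ours, for illustration only, not part of the lecture) that simulates the die-based Bernoulli variable and checks that the empirical frequency of successes is close to 1/3:

```python
import random

def die_indicator():
    """One Bernoulli(1/3) trial: 1 if a fair die shows a value strictly greater than 4."""
    return 1 if random.randint(1, 6) > 4 else 0

# Estimate P(X2 = 1) with many independent trials; the frequency should be near 1/3.
trials = 100_000
successes = sum(die_indicator() for _ in range(trials))
print(successes / trials)  # roughly 0.333
```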
1.2 Binomial

Suppose we repeat a "Bernoulli experiment" n times independently and we add up the outcomes. That is, suppose that our random variable is Y = X1 + X2 + · · · + Xn, where Xi ∼ Bernoulli(p) and the Xi are independent. Then Y is said to have a Binomial distribution with sample size n and probability of success p, denoted Y ∼ Binomial(n, p).

Examples:

• We're flipping a fair coin 4 times and we want to count the total number of tails. The coin flips (X1, X2, X3, and X4) are Bernoulli(1/2) random variables and they are independent by assumption, so the total number of tails is Y = X1 + X2 + X3 + X4 ∼ Binomial(4, 1/2).

• You're taking a multiple choice test with 10 questions and 3 answers per question. For each question, there's only one correct answer. You haven't studied for the test and you decide to choose the answers "at random", so you have a 1/3 chance of getting each question right. Let Xi = 1 if your answer to the i-th question is right, so Xi ∼ Bernoulli(1/3). The total number of right answers on your test is Y = X1 + X2 + · · · + X10 ∼ Binomial(10, 1/3).

• Suppose that we flip a fair coin. The random variable X1 equals 1 if it comes up heads and X1 = 0 if it comes up tails (so X1 ∼ Bernoulli(1/2)). If X1 = 1, we will use a loaded coin with a probability of coming up heads equal to 2/3 for our next flip (X2). If X1 = 0, we will use a fair coin for X2. The random variable X2 is also Bernoulli, since it can only take on the values 0 or 1. The probability of success is

      P(X2 = 1) = P(X1 = 0) P(X2 = 1 | X1 = 0) + P(X1 = 1) P(X2 = 1 | X1 = 1)
                = (1/2)(1/2) + (1/2)(2/3) = 7/12 ≈ 0.583.

  So X1 ∼ Bernoulli(1/2) and X2 ∼ Bernoulli(7/12). Is Y = X1 + X2 a Binomial? The answer is no, because 1) the probabilities of success for X1 and X2 are different, and 2) X1 and X2 are not independent! The coin we flip in X2 depends on the outcome of X1, so X1 and X2 are clearly dependent.

Let Y ∼ Binomial(n, p). What is P(Y = k), for k ∈ {0, 1, 2, ..., n}? Let's start with a simple example where n = 4. Note that we can identify the outcomes of the underlying Bernoulli experiments X1, X2, X3, X4 with the strings 0000, 0010, 0100, 1110, etc. (for example, 0010 means that all the Xi except X3 are zero). Then,

• P(Y = 0) = P(0000) = (1 − p)^4.
• P(Y = 1) = P(1000) + P(0100) + P(0010) + P(0001) = 4p(1 − p)^3.
• P(Y = 2) = P(0011) + P(1100) + P(1010) + P(0101) + P(1001) + P(0110) = 6p^2 (1 − p)^2.
• P(Y = 3) = P(0111) + P(1011) + P(1101) + P(1110) = 4p^3 (1 − p).
• P(Y = 4) = P(1111) = p^4.

We can definitely see a pattern here. In general, if Y ∼ Binomial(n, p), then P(Y = k) = constant × p^k (1 − p)^(n−k). Again, we can identify the event Y = k with strings of k ones and n − k zeros. They're all mutually exclusive (disjoint) events, so the probability that Y = k happens is the sum of the probabilities that each of the favorable strings happens. Given our independence assumption, all the favorable strings are equally likely, with probability p^k (1 − p)^(n−k). The constant that we need is the number of favorable strings, which is the number of strings that contain k ones and n − k zeros. If we pick the positions of the ones, we're done (the rest must be zeros). For example, in the case where n = 4 and k = 2 (see above), the favorable strings can be identified with the sets {3, 4}, {1, 2}, {1, 3}, {2, 4}, {1, 4}, {2, 3}. Therefore, the constant is the number of subsets of k elements out of a set with n elements in total, and order doesn't matter because we're just picking where the ones are. Thus, we have that if Y ∼ Binomial(n, p),

      P(Y = k) = \binom{n}{k} p^k (1 − p)^{n−k}.

From now on you can just use this formula. I don't expect you to rederive it, but I would like you to understand where the expression comes from.

Examples:

• Suppose we're flipping a fair coin 4 times and we want to count the total number of tails, which we denote Y. What is the probability that we get 2 tails? We have that Y ∼ Binomial(4, 1/2), so

      P(Y = 2) = \binom{4}{2} (1/2)^2 (1/2)^2 = 0.375.

• You're taking a multiple choice test with 10 questions and 3 different answers per question. For each question, there's only one correct answer. You haven't studied for the test and you decide to choose the answers "at random", so you have a 1/3 chance of getting each question right. What is the probability that you get at least half of them right? Let Y be the total number of right answers on your test. Then Y ∼ Binomial(10, 1/3), and we're interested in finding

      P(Y ≥ 5) = P(Y = 5) + P(Y = 6) + · · · + P(Y = 10)
               = \binom{10}{5} (1/3)^5 (2/3)^5 + \binom{10}{6} (1/3)^6 (2/3)^4 + · · · + (1/3)^{10} ≈ 0.213.
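To double-check these numbers, here is a minimal Python sketch (ours, not part of the lecture; the function name binomial_pmf is just for illustration) that computes the formula above directly:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p), computed straight from the formula above."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two tails in four fair-coin flips: Y ~ Binomial(4, 1/2).
print(binomial_pmf(2, 4, 1/2))  # 0.375

# At least half of 10 questions right when guessing: Y ~ Binomial(10, 1/3).
print(sum(binomial_pmf(k, 10, 1/3) for k in range(5, 11)))  # about 0.213
```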
Exercise 2. All students enrolled in STA111 (16 students) have to take a medical test which has probability 0.1 of giving a false positive. Suppose that you're all healthy. What is the probability that there is at least one false positive?

Exercise 3. Suppose you roll a fair die 6 times. What is the probability that you get a number strictly greater than 4 at least twice? Our best friend Bobby is willing to bet $10 that it won't happen. Would you bet against him?

Exercise 4. Give 3 examples of Binomial random variables, and compute one probability for each of them.

1.3 Hypergeometric

Suppose we have a population of N elements where M elements have a certain characteristic and N − M don't. Suppose that we select n elements of the population without replacement. If X is the number of elements in the sample that have the characteristic, then

      P(X = k) = \frac{\binom{M}{k} \binom{N−M}{n−k}}{\binom{N}{n}},

and X is said to have a Hypergeometric distribution, denoted X ∼ Hypergeometric(N, M, n). We've seen this type of random variable before in examples and homeworks!

Example:

• Remember Exercise 2 in HW2? A bag has 3 green jelly beans and 7 red jelly beans. If you extract 2 jelly beans, what is the probability that the 2 of them are red? Now suppose that you draw 5 jelly beans out of the bag. What is the probability that 3 are red and 2 are green? This is an example of a Hypergeometric random variable. The characteristic is "being red". The population is the jelly beans in the bag, so N = 10. There are 7 red jelly beans, so M = 7 and N − M = 3. For part 1 we sample 2 elements of the population, so n = 2, X1 ∼ Hypergeometric(10, 7, 2), and we want to compute P(X1 = 2). For part 2 of the question we have n = 5, X2 ∼ Hypergeometric(10, 7, 5), and we're interested in P(X2 = 3).

Exercise 5. Suppose that you have 20 really good friends, 10 of whom like broccoli. You want to host a dinner party, but your apartment is too small and can only fit 5 friends. You're a nice person, so you decide that the right thing to do is to select 5 of them at random. What is the probability that all of your randomly selected guests like broccoli? What is the probability that at least one of them doesn't?

Now you might be a little bit confused... What is the difference between Binomial and Hypergeometric? The key is "without replacement". If you're in a Hypergeometric scenario, your draws are not independent. In the jelly beans example, the probability that the second jelly bean is red depends on the outcome of the first draw, so the draws are not independent (and recall that independence is an assumption of the Binomial!).

Exercise 6. Give two examples of random variables with a Hypergeometric distribution.

If the size of the population N is big relative to the sample size n, the Binomial and the Hypergeometric give similar answers. Can you see why?
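As a numerical nudge toward the answer, here is a short Python sketch (again ours, for illustration; the function names are made up) that computes the hypergeometric formula above and compares it with a Binomial when N is much larger than n:

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k) for X ~ Hypergeometric(N, M, n): n draws without replacement."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p): n independent draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Jelly beans: N = 10, M = 7 red, draw n = 5, want exactly 3 red.
print(hypergeom_pmf(3, 10, 7, 5))  # about 0.417

# When N is big relative to n, removing a few elements barely changes the
# proportion of successes, so drawing without replacement behaves almost
# like independent draws and the two answers nearly agree.
print(hypergeom_pmf(3, 10_000, 7_000, 5))  # about 0.3087
print(binomial_pmf(3, 5, 7_000 / 10_000))  # 0.3087
```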