Section 4: Basic Probability (Major Concept Review)

random = not predictable once, but patterns emerge in the long run

The sample space S of a random phenomenon is the set of all possible outcomes (the individual things that can happen when you do the random phenomenon).

Example 1: Flip a coin:
{head, tail}
{head, tail, land on edge}
{head, tail, land on edge, disappear in mid-air}

By definition, exactly one outcome (no more, no less) will occur when you do the random phenomenon. Set up your rules in advance accordingly, but stick to them.

Example 2: Roll a die: {1,2,3,4,5,6}

Example 3: Flip a coin twice: {hh,ht,th,tt}

Example 4: Roll two dice: {2,3,…,11,12} may suffice, if you only care about what they add up to. If you’re going to ask questions referring to the first roll or the second roll, you may want a different sample space:

{(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
 (2,1) …
 (3,1) …
 (4,1) …
 (5,1) …
 (6,1) (6,2) … (6,6)}

But whatever rules you set for yourself, stick with them. By definition, exactly one outcome in the sample space occurs, no more and no less.

The probability of something happening is the proportion of times it would occur in a very long series of repetitions (long-term relative frequency).

There are always three perspectives on probability. For instance, the following all mean the same thing:
1. The probability of a coin landing heads is ½ = 0.5.
2. If we flip the coin many, many times, we will get about 50% heads.
3. The proportion of all possible flips which would land heads is ½.

Notice that (1) is stated in terms of probability, (2) is the vaguest, and (3) is the most technically accurate, but hard to wrap your mind around.

Consider the statement: If we flip the coin many times, we will probably get about ½ heads.
What do we mean by “many”? How many?
What do we mean by “probably”? How probably?
What do we mean by “about”? How about?
The middle part of the course (sections 7-9) will address these questions.
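The long-run idea above can be tried out with a small simulation. This is only a sketch: it assumes a simulated fair coin from Python's standard random module, and the function name proportion_heads and the seed value are illustrative choices, not part of the course material. It prints the proportion of heads for longer and longer runs of flips.

```python
import random


def proportion_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips


# Short runs can stray far from 0.5; long runs tend to settle near it.
for n in (10, 100, 10_000, 100_000):
    print(n, "flips:", proportion_heads(n))
```

Changing the seed changes the individual results, which is the point: a single short run is not predictable, while the long runs should all land close to ½.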
The bottom line is that with probability, there are no guarantees.

An event is a set of outcomes (a subset of the sample space). Notice that this concept only makes sense if you’re very clear on what your sample space is.

Example 2: Roll a die: {1,2,3,4,5,6}
A = “the die lands odd” = {1, 3, 5}
B = “the die lands even” = {2, 4, 6}
C = “the die lands on a prime number” = {2, 3, 5}
D = “the die lands 2 or 4” = {2, 4}

We say that the event “happens” if the outcome which occurs is in its set. For instance, the die lands odd if the number that comes up is “1” or “3” or “5”. If a “2” comes up, A does not happen.

Every event A has a probability, the proportion of times it would occur in the long run.

1. 0 ≤ 𝑃(𝐴) ≤ 1
   a. An event can’t happen more than all of the time or less than none of the time.

2. 𝑃(𝑆) = 1 "something always happens"; 𝑃(∅) = 0 "nothing never happens"
   a. By definition, the outcome that occurs is in the sample space. S is certain.
   b. By definition, the outcome that occurs cannot be in the empty set. ∅ is impossible.
   c. But having probability 0 does not necessarily make an event impossible, and having probability 1 does not necessarily make an event certain. For instance, the probability of our “infinitely sharp spinner” landing at exactly 2.00000… was 0, but it wasn’t impossible. The probability of its landing anywhere else was 1, but it wasn’t certain.
   d. If the sample space contains an infinite number of outcomes (e.g., every real number between 0 and 4) then such things are possible.
   𝑃(certain) = 1 and 𝑃(impossible) = 0, but not vice versa.

3. 𝑃(not 𝐴) = 1 − 𝑃(𝐴)
   a. For instance, “the coin doesn’t land heads” happens the rest of the time.

4. If A, B have no outcomes in common (disjoint sets, mutually exclusive events), then 𝑃(𝐴 or 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
   a. In Example 2, the probability that the die lands “odd or 2 or 4” is 5/6. (We can see that we have 5 equally likely possibilities out of 6.)
      i. 𝑃(𝐴) = 3/6
      ii. 𝑃(𝐷) = 2/6
      iii. 𝑃(𝐴 or 𝐷) = 𝑃({1,3,5,2,4}) = 5/6 = 3/6 + 2/6 = 𝑃(𝐴) + 𝑃(𝐷)
      iv. A and D have no outcomes in common, so the principle works.
   b. In Example 2, the probability that the die lands “even or prime”:
      i. 𝑃(𝐵) = 3/6
      ii. 𝑃(𝐶) = 3/6
      iii. 𝑃(𝐵 or 𝐶) = 𝑃({2,4,6,3,5}) = 5/6 ≠ 3/6 + 3/6 = 𝑃(𝐵) + 𝑃(𝐶)
      iv. B and C have outcomes in common. “2” will be double-counted. But “2” is either in the set or not. Sets have no concept of order or repetition.
      v. We saw something similar in our spinner example. “Less than 3 or greater than 1” will double-count all the area between 1 and 3, because the rectangles overlap. We have whole regions rather than individual possibilities double-counted, but the principle is the same.

5. If A, B are independent, then
   a. 𝑃(𝐴 and 𝐵) = 𝑃(𝐴) ∗ 𝑃(𝐵)
   b. independent events: knowing that A has occurred does not change the probability of B, and vice versa

This is to say, we need to know the context of P(B). Remember that probability is subjective; it depends on what you know and when. Consider the similarities and differences in the following examples:

Example 5a: Flip a fair coin twice. What is the probability of getting a head followed by a head?
A = “the first flip is a head” P(A) = 1/2.
B = “the second flip is a head” P(B) = 1/2. This is true whether the first flip was a head, or a tail, or we didn’t notice one way or the other.
𝑃(𝐴 and 𝐵) = 𝑃(𝐴) ∗ 𝑃(𝐵) = 1/2 ∗ 1/2 = 1/4

Example 5b: A standard deck of cards, 26 red cards and 26 black cards, shuffled. What is the probability of getting a red card followed by a red card?
A = “the first card is red” P(A) = 26/52 = 1/2.
B = “the second card is red” P(B) = ? What does the deck look like at this point?
The probability of B depends on whether A happened or not; if A happened, there are 25 out of 51 red cards remaining. If A didn’t happen, there are 26 out of 51 red cards remaining.
25/51 ≈ 0.49
26/51 ≈ 0.51
These are different numbers.
To achieve our objective, we must travel the top route, so the answer would be 1/2 ∗ 25/51. A and B are not independent, so the situation is complicated.

Example 5c: 100 decks of cards shuffled together: 2600 red cards and 2600 black cards. What is the probability of getting a red card followed by a red card?
A = “the first card is red” P(A) = 2600/5200 = 1/2.
B = “the second card is red” P(B) = ? What does the deck look like at this point?
2599/5199 ≈ 0.499904 (if A happened)
2600/5199 ≈ 0.500096 (if A didn’t happen)
These are practically the same number, one-half.

While it is true that A and B are not independent, it is also true that they are close enough to being independent. They are practically independent. Taking the first card away does not change the content of the deck to any noticeable degree. The rules are effectively the same the second time as they are the first time, no matter what happens the first time. In practice, it is more like Example 5a.

This is why we don’t have to worry too much about independence in practice. In statistics, the bunch we take away to look at is (and must be) much, much smaller than what we’re picking from, which is why we’re looking at only a bunch in the first place. If you could ask any noticeable portion (say, 10%) of the people in New York State whether they’re left-handed, it wouldn’t be that much harder to ask everybody. But you can’t.

Let’s say that 10% of people in NYS are left-handed. Then even if you take away 999 left-handed people from tens of millions, the chance of getting a left-hander the 1000th time is still basically 10%.

This is what makes statistics doable, as we’ll see in sections 7-9. We are effectively repeating so many times with the same rules operating each time, and it doesn’t matter whether we replace and shuffle, or not.
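The card arithmetic in Examples 5b and 5c can be checked with exact fractions. The sketch below is an illustration only: the function name red_then_red and its variable names are made up for this purpose, and it assumes two cards drawn without replacement from a well-shuffled deck. For each deck size it reports the chance that the second card is red in each case (first card red, first card not red) and the chance of red followed by red.

```python
from fractions import Fraction


def red_then_red(red, black):
    """Two cards drawn without replacement from a deck of red and black cards.

    Returns P(A), P(B if A happened), P(B if A did not happen), and P(A and B),
    where A = "first card is red" and B = "second card is red".
    """
    total = red + black
    p_a = Fraction(red, total)
    p_b_if_a = Fraction(red - 1, total - 1)   # one red card is already gone
    p_b_if_not_a = Fraction(red, total - 1)   # one black card is already gone
    p_both = p_a * p_b_if_a                   # first card red AND second card red
    return p_a, p_b_if_a, p_b_if_not_a, p_both


for label, red, black in (("1 deck", 26, 26), ("100 decks", 2600, 2600)):
    p_a, p_b_if_a, p_b_if_not_a, p_both = red_then_red(red, black)
    print(label)
    print("  second red if first was red:    ", p_b_if_a, "=", float(p_b_if_a))
    print("  second red if first was not red:", p_b_if_not_a, "=", float(p_b_if_not_a))
    print("  red followed by red:            ", p_both, "=", float(p_both))
```

With 100 decks, the two cases differ by only 1/5199, and the chance of red followed by red comes out very close to the 1/4 found for the two coin flips in Example 5a.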