Chapter 2: Probability

2005 Christopher R. Bilder

2.1: Sample space

Experiment - an activity for which an outcome is uncertain

Example: Flip a coin – whether a head or a tail occurs is unknown until it is observed.
Example: Roll a pair of dice – the numbers rolled are unknown until they are observed.
Example: Kick a field goal – the success or failure is unknown until it is observed.
Example: Clinical trial examining a new drug
   Whether people are cured or not is unknown until they are observed.
   An outcome of the experiment measured could be the number of platelets in their blood.
Example: HIV test – whether or not a person has HIV is unknown until the test outcome is observed.

Sample space – the set of all possible outcomes of a statistical experiment; denoted by S

Example: Flip a coin – S = {H, T}
Example: Roll a pair of dice
   Suppose the total on the dice is of interest. Then S = {2, 3, …, 12}.
   Suppose the actual value of each die is of interest. Then S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}.
   Suppose the product of the die values is of interest. Then S = {1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 30, 36}. Values such as 7 or 11 cannot occur as a product of two die values.
Example: Kick a field goal – S = {success, failure}
Example: Clinical trial examining a new drug – S = {cured, not cured}
Example: HIV test – S = {positive, negative}

2.2: Events

Event – a subset of the sample space

Example: Flip a coin
   Let A denote the event of observing a head. A is a subset of S.
Example: Roll a pair of dice
   Suppose the total on the dice is of interest so that S = {2, 3, …, 12}. Let A denote the event of observing a total of 2. Also, we could let A denote the event of observing 6 or less. In both cases, A is a subset of S.
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Let A denote the event of observing a total of 4. Then the outcomes within A are (1,3), (2,2), (3,1).

Question: Why do we want to define experiments, sample spaces, and events?
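
The dice examples above can be reproduced by enumeration. The following is a minimal Python sketch, not part of the original notes; the variable names (S_pairs, S_totals, S_products, A) are chosen here for illustration only.

```python
from itertools import product

# Sample space when the actual value of each die is of interest.
S_pairs = set(product(range(1, 7), repeat=2))

# Sample spaces when only the total, or only the product, is of interest.
S_totals = {d1 + d2 for d1, d2 in S_pairs}
S_products = {d1 * d2 for d1, d2 in S_pairs}

# Event A: the total is 4. An event is a subset of the sample space.
A = {(d1, d2) for d1, d2 in S_pairs if d1 + d2 == 4}

print(len(S_pairs))          # number of ordered pairs
print(sorted(S_totals))      # possible totals
print(sorted(S_products))    # possible products
print(sorted(A), A <= S_pairs)
```

Using Python sets mirrors the definitions directly: the subset test `A <= S_pairs` is the statement "A is a subset of S".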
Complement of an event A – subset of all elements in S that are not in A; denoted by A′ (or Ā or A^c)

Example: Flip a coin
   Let A denote the event of observing a head. A′ is the event of observing a tail.
Example: Roll a pair of dice
   Suppose the total on the dice is of interest so that S = {2, 3, …, 12}. Let A denote the event of observing 6 or less. A′ is the event of observing 7 or more.

Intersection of two events A and B – event containing all elements that are common to A and B; denoted by A∩B (or “A and B”)

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Let A denote the event of observing a total of 4. Let B denote the event of observing a 2 on at least one of the rolls. Then B has the outcomes (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2), and (6,2).
   A∩B contains only (2,2).

Venn diagrams are useful to see the last result above. Events are represented by regions. Below is an example corresponding to A∩B = {(2,2)}:

[Venn diagram: region A contains (1,3), (3,1), and (2,2); region B contains (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2), and (6,2); the overlap of A and B contains only (2,2); the remaining 23 outcomes lie in S outside both regions.]

Mutually exclusive events – if A∩B = ∅ then A and B are mutually exclusive or disjoint
   Notice A∩A′ = ∅

Union of events – event containing all elements of A only or B only or both A and B; denoted by A∪B (or “A or B”)

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Let A denote the event of observing a total of 4, and let B denote the event of observing a 2 on at least one of the rolls.
   A∪B = {(1,3), (3,1), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2), (6,2)}
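
The complement, intersection, and union in the dice example map directly onto set operations. The sketch below is an illustration in Python rather than part of the original notes; the names A_complement, A_and_B, and A_or_B are hypothetical labels chosen here.

```python
from itertools import product

# Sample space: the actual value of each die.
S = set(product(range(1, 7), repeat=2))

A = {o for o in S if sum(o) == 4}   # total of 4
B = {o for o in S if 2 in o}        # a 2 on at least one of the rolls

A_complement = S - A   # complement of A: everything in S not in A
A_and_B = A & B        # intersection of A and B
A_or_B = A | B         # union of A and B

print(sorted(A_and_B))
print(len(B), len(A_or_B))
# A and its complement are mutually exclusive: their intersection is empty.
print(A & A_complement == set())
```

Counting the elements of each resulting set is a quick way to check a hand-written list of outcomes for omissions or duplicates.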
Some final results:
   A∩A′ = ∅
   A∪A′ = S
   A∩∅ = ∅
   A∪∅ = A
   (A′)′ = A
   De Morgan’s Laws: (A∩B)′ = A′∪B′ and (A∪B)′ = A′∩B′

To see the first of De Morgan’s laws, Venn diagrams are useful:

[Venn diagram: overlapping regions A and B inside S, with the overlap shaded orange.]

The orange region is the intersection of A and B. This graphically represents A∩B. Everything outside of the orange region is (A∩B)′. Now consider A′ and B′:

[Venn diagrams: one panel with A′ shaded, one panel with B′ shaded.]

When you combine everything in A′ and B′, A′∪B′, one can see it includes everything excluding the intersection A∩B (the orange area).

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Let A denote the event of observing a total of 4, and let B denote the event of observing a 2 on at least one of the rolls.
   (A∩B)′ = A′∪B′ = all elements in S except for (2,2)
   (A∪B)′ = A′∩B′ = all outcomes that are only in the blue area of the Venn diagram (outside both A and B) = (1,1) (1,4) (1,5) (1,6) (3,3) (3,4) (3,5) (3,6) (4,1) (4,3) (4,4) (4,5) (4,6) (5,1) (5,3) (5,4) (5,5) (5,6) (6,1) (6,3) (6,4) (6,5) (6,6)

Question: Why do we want to examine intersections and unions?

2.3: Counting sample points

Often we want to count the number of possible outcomes of an experiment or the number of items (or points) in the sample space, S. This can be done a number of different ways depending on the problem. The counting is important so that we can eventually assign probabilities to events.

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Listed a different way, all possible outcomes in S are:

   Die #1   Die #2
     1      1  2  3  4  5  6
     2      1  2  3  4  5  6
     3      1  2  3  4  5  6
     4      1  2  3  4  5  6
     5      1  2  3  4  5  6
     6      1  2  3  4  5  6

   There are a total of 36 different outcomes.
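
Both De Morgan’s laws and the count of 36 outcomes can be checked by brute-force enumeration. The following Python sketch is an illustration added under the assumption that A and B are the same events as in the dice example above; the helper name `complement` is hypothetical.

```python
from itertools import product

# Sample space: the actual value of each die.
S = set(product(range(1, 7), repeat=2))

A = {o for o in S if sum(o) == 4}   # total of 4
B = {o for o in S if 2 in o}        # a 2 on at least one of the rolls


def complement(event):
    """Return the outcomes of S that are not in the event."""
    return S - event


# De Morgan's laws, checked for these particular A and B.
first_law = complement(A & B) == complement(A) | complement(B)
second_law = complement(A | B) == complement(A) & complement(B)

print(len(S))
print(first_law, second_law)
print(len(complement(A & B)), len(complement(A | B)))
```

A check on one pair of events is not a proof, but it is a useful sanity test of the listed outcomes.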
What is a simpler way to determine this than listing out all possible outcomes?

Generalized multiplication rule – If an “operation” can be performed in n1 ways, a second operation can be performed in n2 ways, …, and a kth operation can be performed in nk ways, then the sequence of k operations can be performed in n1 × n2 × … × nk ways.

This assumes that each operation does not have an effect on the outcome of the other operations.

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. n1 = 6 and n2 = 6 so that the total number of outcomes in S is n1 × n2 = 6 × 6 = 36.
   Suppose the total on the dice is of interest so that S = {2, 3, …, 12}. Notice that the generalized multiplication rule cannot be used directly here.

The last statement in the generalized multiplication rule is important. For example, suppose the actual value of each die is of interest again. Suppose each die is rolled separately and the type of die for the second roll depends on what happens on the first. The second die could be a die with a number of sides equal to the outcome of the first die. For example, if a 3 is rolled on the first die, a 3-sided die is rolled for the second die. This is an example where the multiplication rule could not be used directly.

The rest of Section 2.3 discusses permutations and combinations. You are not responsible for the permutations material. We will discuss combinations in Section 5.3.

2.4: Probability of an event

Notation: P(A) is read as “the probability that event A happens”.

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}.
   Listed a different way, all possible outcomes in S are:

   Die #1   Die #2
     1      1  2  3  4  5  6
     2      1  2  3  4  5  6
     3      1  2  3  4  5  6
     4      1  2  3  4  5  6
     5      1  2  3  4  5  6
     6      1  2  3  4  5  6

   Suppose each outcome is EQUALLY likely. Let A be the event that the sum of the two dice is 2. Then P(A) = 1/36 since there is only one way, (1,1), the sum can be 2 and there are 36 different possible outcomes of rolling two dice. Less formally, this can be written as P(2) = 1/36.

Example: What is the probability that you will win in the Pick 5 game of the Nebraska lottery if you choose only one combination of numbers? Note that 5 numbers are chosen from 1 to 38 and a number can only be chosen once.

   Combination    #1   #2   #3   #4   #5
   1               1    2    3    4    5
   2               1    2    3    4    6
   ⋮
   501,942        34   35   36   37   38

   (Section 2.3 discusses how to use a “combination” to find that there are 501,942 different possibilities.)

   Each outcome is EQUALLY likely. P(win) = 1/501,942 = 1.99 × 10^-6

Probability Rules

0 ≤ P(A) ≤ 1 for any event A
   Example: The probability it rains today cannot be 110% or -10%

Let A1, …, Ak be all possible events for an experiment and suppose they are MUTUALLY EXCLUSIVE. Then
   P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + P(A2) + … + P(Ak) = Σ P(Ai) = 1, where the sum runs over i = 1, …, k
   Example: NFL regular season football
   P(win) + P(lose) + P(tie) = 1

Theorem 2.9: If an experiment can only result in one of N different equally likely outcomes AND if exactly n of these correspond to an event A, then P(A) = n/N

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Let A denote the event of observing a total of 4, and let B denote the event of observing a 2 on at least one of the rolls.
   P(A) = P(total is 4) = 3/36 since A has the outcomes (1,3), (3,1), and (2,2)
   P(B) = P(at least one die is a 2) = 11/36 since B has the outcomes (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2), and (6,2).

2.5: Additive rules

Below are some important rules regarding probabilities.

Theorem 2.10: If A and B are any two events, then P(A∪B) = P(A) + P(B) – P(A∩B).

Why?

[Venn diagram: overlapping regions A and B inside S, with the overlap A∩B shaded orange.]

Notice the orange area, A∩B, is added in twice with A and B. Thus, it needs to be subtracted out once.

This could also be reexpressed as P(A∩B) = P(A) + P(B) – P(A∪B).

Corollary: If A and B are mutually exclusive, then P(A∪B) = P(A) + P(B). If A1, A2, …, An are mutually exclusive, then P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An).

What would mutually exclusive events look like in a Venn diagram?

Theorem 2.11: If A, B, and C are any three events, then
   P(A∪B∪C) = P(A) + P(B) + P(C) – P(A∩B) – P(A∩C) – P(B∩C) + P(A∩B∩C)
   Show this on your own through a Venn diagram!

Theorem 2.12: If A and A′ are complementary events, then P(A) + P(A′) = 1 and P(A′) = 1 - P(A).

Example: Roll a pair of dice
   Suppose the actual value of each die is of interest so that S = {(1,1), (1,2), …, (1,6), (2,1), …, (6,6)}. Let A denote the event of observing a total of 4, and let B denote the event of observing a 2 on at least one of the rolls.
   P(A∪B) = P(A) + P(B) – P(A∩B) = 3/36 + 11/36 – 1/36 = 13/36
   Note that A∩B has the single outcome (2,2).

Example: Larry Bird (bird.xls)
   Free throws (FTs) are typically shot in pairs. Below is a “contingency table” summarizing Larry Bird’s first and second FT attempts during the 1980-1 and 1981-2 NBA seasons.
   The data source is Wardrop (American Statistician, 1995).

                       Second
                  Made   Missed   Total
   First  Made     251       34     285
          Missed    48        5      53
          Total    299       39     338

   Interpreting the table:
      251 first AND second FTs were both made
      34 first FTs were made AND the second FTs were missed
      48 first FTs were missed AND the second FTs were made
      5 first AND second FTs were both missed
      285 first FTs were made regardless of what happened on the second attempt
      299 second FTs were made regardless of what happened on the first attempt
      338 FT pairs were shot during these seasons

   More formally, let A = 1st FT is made. Then A′ is 1st FT is missed. Let B = 2nd FT is made. Then B′ is 2nd FT is missed.

            A     A′
      B    251    48
      B′    34     5

   The “counts” table can be transformed into a table of probabilities by dividing each numerical cell by 338.

                       Second
                  Made     Missed   Total
   First  Made    0.7426   0.1006   0.8432
          Missed  0.1420   0.0148   0.1568
          Total   0.8846   0.1154   1

   What is P(A) = P(1st made)?
   What is P(B) = P(2nd made)?

   Probabilities on the margins of the table (total column and row) are often called “marginal probabilities”.

   What does 0.7426 represent in our symbolic notation?
   What is the most likely joint outcome of the first and second FT to occur?

   Probabilities in the body of the table are often called “joint probabilities”.

   P(1st made) = P(1st made ∩ 2nd made) + P(1st made ∩ 2nd missed) = 0.7426 + 0.1006 = 0.8432
   This can be expressed as P(A∩B) + P(A∩B′) = P(A)

   What is P(1st made ∪ 2nd made) = probability of making at least one? There are a few different ways to find this.

   1. Add the three joint probabilities in which at least one FT is made (highlighted in yellow in the original table): 0.7426 + 0.1006 + 0.1420 = 0.9852

   2. P(A∪B) = P(A) + P(B) - P(A∩B) = 0.8432 + 0.8846 - 0.7426 = 0.9852

   3. P(A∪B) = 1 – P[(A∪B)′]   using the complement
             = 1 – P(A′∩B′)    using De Morgan’s laws
             = 1 – 0.0148 = 0.9852

   The use of Excel with a contingency table: the use of absolute cell references was helpful when copying formulas.

2.6 and 2.7: Conditional probability and multiplicative rules

Conditional probability – The probability an event happens conditioned on another event happening.

Consider two events A and B. The probability that A occurs given that B occurred is called a conditional probability. It is denoted by P(A|B). This is read as “the probability of A GIVEN B”. This probability can be found from

   P(A|B) = P(A∩B)/P(B), provided P(B) > 0.

Note that another conditional probability could also be stated as P(B|A) = P(A∩B)/P(A).

Where does the formula P(A|B) = P(A∩B)/P(B) come from?

Suppose the event B occurs and it had a particular probability, P(B), of occurring. This now limits the possibility of what other events occur. To determine the probability that A occurs, we must examine P(A∩B) since B occurs. To find the probability that A occurs given that B occurred, we use P(A∩B)/P(B). This gives us the probability of A occurring out of all possibilities where B occurred.

Example: Larry Bird (bird.xls)

                       Second
                  Made     Missed   Total
   First  Made    0.7426   0.1006   0.8432
          Missed  0.1420   0.0148   0.1568
          Total   0.8846   0.1154   1

   P(2nd made | 1st missed) = P(1st missed ∩ 2nd made)/P(1st missed) = 0.1420/0.1568 = 0.9057

   Written in terms of A and B: P(B|A′) = P(A′∩B)/P(A′) = 0.1420/0.1568 = 0.9057

   Therefore it is still very likely that Larry Bird will make the second free throw even if the first one is missed.

   Question for basketball fans: Why would this probability be important to know?

   Verify on your own that P(2nd made | 1st made) = 0.8807.

Example: The Showcase Showdown on the Price is Right
   On the game show, The Price is Right, three contestants are given an opportunity to spin the big wheel.
   The big wheel has monetary values of 5, 10, …, 100 cents on it. The contestant that is closest to a dollar (100 cents) in one or a combination of two consecutive spins, without going over, wins the game. If there is a tie, the tied players are given one additional spin with the player having the highest number in that spin winning.

   Coe and Butterworth (American Statistician, 1995) compute conditional win probabilities for the first person spinning the big wheel. The probabilities are shown in the table below.

   First spin (i)   P(win | spin once & 1st spin=i)   P(win | spin twice & 1st spin=i)
         5                    .00034                            .20595
        10                    .00121                            .20589
        15                    .00285                            .20574
        20                    .00540                            .20547
        25                    .00906                            .20502
        30                    .01415                            .20431
        35                    .02101                            .20326
        40                    .03009                            .20176
        45                    .04190                            .19966
        50                    .05704                            .19681
        55                    .08346                            .19264
        60                    .11829                            .18672
        65                    .16319                            .17856
        70                    .21563                            .16778
        75                    .28416                            .15357
        80                    .36818                            .13517
        85                    .46990                            .11167
        90                    .59169                            .08209
        95                    .73606                            .04528
       100                    .90567                            .00000

   For example, P(win | spin once & 1st spin=5 cents) = 0.00034

   What is the optimal strategy the first person should follow in deciding whether or not to spin twice?

Independence – Events A and B are independent if P(A|B) = P(A) or equivalently P(B|A) = P(B)

In words, this means the probability of event A is not affected by event B and vice versa.

As a result of the conditional probability equation, P(A∩B) = P(A)P(B) also means independence. Why?

Example: Larry Bird (bird.xls)
   What does independence mean in this example?
   P(2nd made | 1st missed) = 0.9057
   P(2nd made) = 0.8846
   Dependence exists, but notice how close they are.

   Notes:
   Only one conditional probability needs to be checked.
   Typically, one would consider the 338 free throw pairs here a sample from the population of all Larry Bird’s free throw attempts. This would be especially desirable to do if Larry Bird were still playing basketball professionally.
   Questions about whether this is a representative sample would need to be addressed. Assuming it was a representative sample, one may be interested in drawing an inference from the sample to the population of all free throws. A chi-square hypothesis test for independence could be conducted using the data. The result is that there is not sufficient evidence to prove dependency. In my Chapter 2 lecture notes of my STAT 875 Categorical Data Analysis course, I do perform the test for the data if you would like to see the results.

Independence is a VERY important concept to understand and we will be using this frequently in the future. Here is another example of where independence can be used.

Example: Quality control
   Experience has shown that a manufacturing operation produces, on the average, only one defective unit in 10. These are removed from the production line, repaired, and returned to the warehouse. Suppose that during a given period of time you observe five defective units emerging in sequence from the production line.

   1) If prior history has shown that defective units emerge randomly from the production line, what is the probability of observing a sequence of five consecutive defective units?

      Since units emerge “randomly”, this implies independence. Let A1 = 1st unit defective, …, A5 = 5th unit defective.
      P(All 5 are defective) = P(A1∩A2∩A3∩A4∩A5)
                             = P(A1)P(A2)P(A3)P(A4)P(A5)   because of independence
                             = 0.1 × 0.1 × 0.1 × 0.1 × 0.1 = 0.00001
      Therefore, this would happen VERY rarely!

   2) If five consecutive defective units did emerge from the production line, what would you conclude about the process?

      There is something wrong with the manufacturing process.

Multiplicative rule - P(A|B) = P(A∩B)/P(B) implies that P(A∩B) = P(A|B)P(B)

Theorem 2.15: Consider the events A1, A2, …, Ak. Then
   o P(A1∩A2) = P(A1)P(A2|A1)
   o P(A1∩A2∩A3) = P(A1)P(A2∩A3|A1) = P(A1)P(A2|A1)P(A3|A2∩A1)
     Why is P(A2∩A3|A1) = P(A2|A1)P(A3|A2∩A1)? Remember that P(A2∩A3) = P(A2)P(A3|A2)
   o In general, P(A1∩A2∩A3∩…∩Ak) = P(A1) P(A2|A1) P(A3|A1∩A2) … P(Ak|A1∩A2∩…∩Ak-1)

Sensitivity and specificity

Diagnostic tests are used to determine if a person has a disease or not. These tests are not always correct. The makers of the tests try to make them very “accurate” in detecting a disease. However, this form of accuracy comes at a cost in terms of incorrectly saying that some people have the disease when they do not really have it.

Example: HIV testing
   Suppose a clinical trial is being conducted on a new HIV test. The test measures a number of different variables related to the presence of HIV. Using the observed results for a patient, the test decides if a person is HIV positive or not. Below are the possible outcomes:

                              HIV test results
   HIV actual    Negative                   Positive
   No            Correct = True negative    Error = False positive
   Yes           Error = False negative     Correct = True positive

   The test is correct when a person with HIV actually tests positive. Similarly, the test is correct when a person without HIV actually tests negative.

   There is the possibility the test could be incorrect. This happens when someone has HIV and the test says the person is negative. Also, this happens when someone does not have HIV and the test says the person is positive. Obviously, it is important to control the probabilities of making these errors.

   Statisticians, epidemiologists, physicians, … are specifically interested in two particular probabilities associated with the contingency table above:

   Sensitivity = P(Test is Positive | Actual is Yes)
      This is the probability a person tests positive, given the person actually has HIV.
   Specificity = P(Test is Negative | Actual is No)
      This is the probability a person tests negative, given the person does not actually have HIV.
   According to the FDA, the ELISA test has a sensitivity of 0.993 and a specificity of 0.9999. What is the probability of making each error?

      P(Test is Negative | Actual is Yes) =
      P(Test is Positive | Actual is No) =

      Hint: P(A′|B) = 1 - P(A|B).

   Johnson and Gastwirth (1991) estimate the proportion of “incidence of HIV positive in the general population of people without known risk factors” to be 0.000025. Obviously, this may be different now since this value is >10 years old. I could not find an updated value.

   This means P(Actual is Yes) = 0.000025 or 250 people out of 10,000,000 people have HIV.

   Out of these 250 people, how many would the ELISA test give a “Test is positive” result?

   How many of the 10,000,000 - 250 = 9,999,750 people who do not have HIV would the ELISA test give a “Test is positive” result?

   Therefore, the test gives ____ “Test is positive” results, but only ____ actually have HIV. ____% of the “Test is positive” results are incorrect.

   What should you do if you take the test and it turns up positive?

   How could we decrease the ____%? Lower the “sensitivity” of the test. If this was done, then more people who actually have HIV would receive a “Test is negative” result.

   For more on determining sensitivity and specificity, see my Chapter 8 lecture notes for STAT 873. Specifically, see the discussion on Receiver Operating Characteristic (ROC) curves.
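
The counting questions in the HIV testing example can be worked through with a few lines of arithmetic. The Python sketch below is an illustration added here, not part of the original notes. It uses the sensitivity, specificity, and prevalence quoted above, and it assumes that the expected number of people in each group is an adequate stand-in for what would actually be observed. All variable names are hypothetical.

```python
# Values quoted in the notes above.
population = 10_000_000
prevalence = 0.000025    # P(Actual is Yes)
sensitivity = 0.993      # P(Test is Positive | Actual is Yes)
specificity = 0.9999     # P(Test is Negative | Actual is No)

with_hiv = population * prevalence
without_hiv = population - with_hiv

# Expected "Test is positive" results in each group.
true_positives = with_hiv * sensitivity
false_positives = without_hiv * (1 - specificity)

# Share of all positive results that are incorrect.
all_positives = true_positives + false_positives
share_incorrect = false_positives / all_positives

print(f"True positives:  {true_positives:.2f}")
print(f"False positives: {false_positives:.2f}")
print(f"Share of positive results that are incorrect: {share_incorrect:.1%}")
```

Changing `prevalence` shows how strongly the share of incorrect positive results depends on how rare the disease is in the population being tested.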