Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter Six Probability Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.1 Random Experiment… …a random experiment is an action or process that leads to one of several possible outcomes. For example: Experiment Outcomes Flip a coin Heads, Tails Exam Marks Numbers: 0, 1, 2, ..., 100 Assembly Time t > 0 seconds Course Grades F, D, C, B, A, A+ Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.2 Probabilities… List the outcomes of a random experiment… List: “Called the Sample Space” Outcomes: “Called the Simple Events” This list must be exhaustive, i.e. ALL possible outcomes included. Die roll {1,2,3,4,5} Die roll {1,2,3,4,5,6} The list must be mutually exclusive, i.e. no two outcomes can occur at the same time: Die roll {odd number or even number} Die roll{ number less than 4 or even number} Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.3 Sample Space… A list of exhaustive [don’t leave anything out] and mutually exclusive outcomes [impossible for 2 different events to occur in the same experiment] is called a sample space and is denoted by S. The outcomes are denoted by O1, O2, …, Ok Using notation from set theory, we can represent the sample space and its outcomes as: S = {O1, O2, …, Ok} Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.4 Requirements of Probabilities… Given a sample space S = {O1, O2, …, Ok}, the probabilities assigned to the outcome must satisfy these requirements: (1) The probability of any outcome is between 0 and 1 i.e. 0 ≤ P(Oi) ≤ 1 for each i, and (2) The sum of the probabilities of all the outcomes equals 1 i.e. P(O1) + P(O2) + … + P(Ok) = 1 P(Oi) represents the probability of outcome i Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.5 Approaches to Assigning Probabilities… There are three ways to assign a probability, P(Oi), to an outcome, Oi, namely: Classical approach: make certain assumptions (such as equally likely, independence) about situation. Relative frequency: assigning probabilities based on experimentation or historical data. Subjective approach: Assigning probabilities based on the assignor’s judgment. [Bayesian] Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.6 Classical Approach… If an experiment has n possible outcomes [all equally likely to occur], this method would assign a probability of 1/n to each outcome. Experiment: Sample Space: Probabilities: Rolling a die S = {1, 2, 3, 4, 5, 6} Each sample point has a 1/6 chance of occurring. What about randomly selecting a student and observing their gender? S = {Male, Female} Are these probabilities ½? Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.7 Classical Approach… Experiment: Rolling 2 die [dice] and summing 2 numbers on top. Sample Space: S = {2, 3, …, 12} What are the underlying, unstated assumptions?? Probability Examples: P(2) = 1/36 1 2 3 4 5 6 P(7) = 6/36 P(10) = 3/36 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12 6.8 Relative Frequency Approach… Bits & Bytes Computer Shop tracks the number of desktop computer systems it sells over a month (30 days): For example, 10 days out of 30 2 desktops were sold. Desktops Sold # of Days 0 1 1 2 2 10 3 From this we can construct the “estimated” probabilities of an event 4 (i.e. the # of desktop sold on a given day)… Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 12 5 6.9 Relative Frequency Approach… Desktops Sold [X] # of Days Desktops Sold 0 1 1/30 = .03 =P(X=0) 1 2 2/30 = .07 = P(X=1) 2 10 10/30 = .33 = P(X=2) 3 12 12/30 = .40 = P(X=3) 4 5 5/30 = .17 = P(X=4) ∑ = 1.00 “There is a 40% chance Bits & Bytes will sell 3 desktops on any given day” [Based on estimates obtained from sample of 30 days] Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.10 Subjective Approach… “In the subjective approach we define probability as the degree of belief that we hold in the occurrence of an event” P(you drop this course) P(NASA successfully land a man on the moon) P(girlfriend says yes when you ask her to marry you) Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.11 Events & Probabilities… An individual outcome of a sample space is called a simple event [cannot break it down into several other events], An event is a collection or set of one or more simple events in a sample space. Roll of a die: S = {1, 2, 3, 4, 5, 6} Simple event: the number “3” will be rolled Event: an even number (one of 2, 4, or 6) will be rolled Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.12 Events & Probabilities… The probability of an event is the sum of the probabilities of the simple events that constitute the event. E.g. (assuming a fair die) S = {1, 2, 3, 4, 5, 6} and P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6 Then: P(EVEN) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.13 Interpreting Probability… One way to interpret probability is this: If a random experiment is repeated an infinite number of times, the relative frequency for any given outcome is the probability of this outcome. For example, the probability of heads in flip of a balanced coin is .5, determined using the classical approach. The probability is interpreted as being the long-term relative frequency of heads if the coin is flipped an infinite number of times. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.14 Joint, Marginal, Conditional Probability… We study methods to determine probabilities of events that result from combining other events in various ways. There are several types of combinations and relationships between events: •Complement of an event [everything other than that event] •Intersection of two events [event A and event B] or [A*B] •Union of two events [event A or event B] or [A+B] Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.15 Example 6.1… Why are some mutual fund managers more successful than others? One possible factor is where the manager earned his or her MBA. The following table compares mutual fund performance against the ranking of the school where the fund manager earned their MBA: Where do we get these probabilities from? [population or sample?] Mutual fund outperforms the market Mutual fund doesn’t outperform the market Top 20 MBA program .11 .29 Not top 20 MBA program .06 .54 E.g. This is the probability that a mutual fund outperforms AND the manager was in a top20 MBA program; it’s a joint probability [intersection]. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Venn Diagrams 6.16 Example 6.1… Alternatively, we could introduce shorthand notation to represent the events: A1 = Fund manager graduated from a top-20 MBA program A2 = Fund manager did not graduate from a top-20 MBA program B1 = Fund outperforms the market B2 = Fund does not outperform the market A1 A2 B1 B2 .11 .29 .06 .54 E.g. P(A2 and B1) = .06 = the probability a fund outperforms the market and the manager isn’t from a top-20 school. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.17 Marginal Probabilities… Marginal probabilities are computed by adding across rows and down columns; that is they are calculated in the margins of the table: P(A2) = .06 + .54 “what’s the probability a fund manager isn’t from a top school?” B1 B2 P(Ai) A1 A2 .11 .29 .40 .06 .54 .60 P(Bj) .17 .83 1.00 P(B1) = .11 + .06 “what’s the probability a fund outperforms the market?” Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. BOTH margins must add to 1 (useful error check) 6.18 Conditional Probability… Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event. Experiment: random select one student in class. P(randomly selected student is male) = P(randomly selected student is male/student is on 3rd row) = Conditional probabilities are written as P(A | B) and read as “the probability of A given B” and is calculated as: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.19 Conditional Probability… Again, the probability of an event given that another event has occurred is called a conditional probability… P( A and B) = P(A)*P(B/A) = P(B)*P(A/B) both are true Keep this in mind! Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.20 Conditional Probability… Example 6.2 • What’s the probability that a fund will outperform the market given that the manager graduated from a top-20 MBA program? Recall: A1 = Fund manager graduated from a top-20 MBA program A2 = Fund manager did not graduate from a top-20 MBA program B1 = Fund outperforms the market B2 = Fund does not outperform the market Thus, we want to know “what is P(B1 | A1) ?” Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.21 Conditional Probability… We want to calculate P(B1 | A1) B1 B2 P(Ai) A1 A2 .11 .29 .40 .06 .54 .60 P(Bj) .17 .83 1.00 Thus, there is a 27.5% chance that that a fund will outperform the market given that the manager graduated from a top-20 MBA program. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.22 Independence… One of the objectives of calculating conditional probability is to determine whether two events are related. In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event. Two events A and B are said to be independent if P(A|B) = P(A) and P(B|A) = P(B) P(you have a flat tire going home/radio quits working) Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.23 Independence… For example, we saw that P(B1 | A1) = .275 The marginal probability for B1 is: P(B1) = 0.17 Since P(B1|A1) ≠ P(B1), B1 and A1 are not independent events. Stated another way, they are dependent. That is, the probability of one event (B1) is affected by the occurrence of the other event (A1). Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.24 Union… Determine the probability that a fund outperforms (B1) or the manager graduated from a top-20 MBA program (A1). A1 or B1 occurs whenever: A1 and B1 occurs, A1 and B2 occurs, or A2 and B1 occurs… B1 B2 P(Ai) A1 A2 .11 .29 .40 .06 .54 .60 P(Bj) .17 .83 1.00 P(A1 or B1) = .11 + .06 + .29 = .46 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.25 Probability Rules and Trees… We introduce three rules that enable us to calculate the probability of more complex events from the probability of simpler events… The Complement Rule – May be easier to calculate the probability of the complement of an event and then substract it from 1.0 to get the probability of the event. P(at least one head when you flip coin 100 times) = 1 – P(0 heads when you flip coin 100 times) The Multiplication Rule: P(A*B) “way I write it” The Addition Rule: P(A+B) “way I write it” Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.26 Example 6.5… A graduate statistics course has seven male and three female students. The professor wants to select two students at random to help her conduct a research project. What is the probability that the two students chosen are female? P(F1 * F2) = ??? Let F1 represent the event that the first student is female P(F1) = 3/10 = .30 What about the second student? P(F2 /F1) = 2/9 = .22 P(F1 * F2) = P(F1) * P(F2 /F1) = (.30)*(.22) = 0.066 NOTE: 2 events are NOT independent. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.27 Example 6.6… The professor in Example 6.5 is unavailable. Her replacement will teach two classes. His style is to select one student at random and pick on him or her in the class. What is the probability that the two students chosen are female? Both classes have 3 female and 7 male students. P(F1 * F2) = P(F1) * P(F2 /F1) = P(F1) * P(F2) = (3/10) * (3/10) = 9/100 = 0.09 NOTE: 2 events ARE independent. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.28 Addition Rule… Addition rule provides a way to compute the probability of event A or B or both A and B occurring; i.e. the union of A and B. P(A or B) = P(A + B) = P(A) + P(B) – P(A and B) Why do we subtract the joint probability P(A and B) from the sum of the probabilities of A and B? P(A or B) = P(A) + P(B) – P(A and B) Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.29 Addition Rule… P(A1) = .11 + .29 = .40 P(B1) = .11 + .06 = .17 By adding P(A) plus P(B) we add P(A and B) twice. To correct we subtract P(A and B) from P(A) + P(B) B1 A1 B1 B2 P(Ai) A1 A2 .11 .29 .40 .06 .54 .60 P(Bj) .17 .83 1.00 P(A1 or B1) = P(A) + P(B) –P(A and B) = .40 + .17 - .11 = .46 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.30 Addition Rule for Mutually Excusive Events If and A and B are mutually exclusive the occurrence of one event makes the other one impossible. This means that P(A and B) = P(A * B) = 0 The addition rule for mutually exclusive events is P(A or B) = P(A) + P(B) Only if A and B are Mutually Exclusive. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.31 Example 6.7… In a large city, two newspapers are published, the Sun and the Post. The circulation departments report that 22% of the city’s households have a subscription to the Sun and 35% subscribe to the Post. A survey reveals that 6% of all households subscribe to both newspapers. What proportion of the city’s households subscribe to either newspaper? That is, what is the probability of selecting a household at random that subscribes to the Sun or the Post or both? P(Sun or Post) = P(Sun) + P(Post) – P(Sun and Post) = .22 + .35 – .06 = .51 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.32 Probability Trees [Decision Trees]… A probability tree is a simple and effective method of applying the probability rules by representing events in an experiment by lines. The resulting figure resembles a tree. This is P(F), the probability of selecting a female student first First selection Second selection This is P(F|F), the probability of selecting a female student second, given that a female was already chosen first Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.33 Probability Trees… At the ends of the “branches”, we calculate joint probabilities as the product of the individual probabilities on the preceding branches. First selection Second selection Joint probabilities P(FF)=(3/10)(2/9) P(FM)=(3/10)(7/9) P(MF)=(7/10)(3/9) Sample Space:[F1*F2, F1*M2, M1*F2, M1*M2] Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. P(MM)=(7/10)(6/9) 6.34 Probability Trees… Note: there is no requirement that the branches splits be binary, nor that the tree only goes two levels deep, or that there be the same number of splits at each sub node… Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.35 Example 6.8… Law school grads must pass a bar exam. Suppose pass rate for firsttime test takers is 72%. They can re-write if they fail and 88% pass their second attempt [P(pass take 2/fail take 1)]. What is the probability that a randomly grad passes the bar? [sample space?] P(Pass) = .72 First exam Second exam P(Fail and Pass)= (.28)(.88)=.2464 P(Fail and Fail) = (.28)(.12) = .0336 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.36 Bayes’ Law… Bayes’ Law is named for Thomas Bayes, an eighteenth century mathematician. In its most basic form, if we know P(B | A), we can apply Bayes’ Law to determine P(A | B) P(B|A) P(A|B) for example … Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.37 Breaking News: New test for early detection of cancer has been developed. Let C = event that patient has cancer Cc = event that patient does not have cancer + = event that the test indicates a patient has cancer - = event that the test indicates that patient does not have cancer Clinical trials indicate that the test is accurate 95% of the time in detecting cancer for those patients who actually have cancer: P(+/C) = .95 but unfortunately will give a “+” 8% of the time for those patients who are known not to have cancer: P(+/ Cc ) = .08 It has also been estimated that approximately 10% of the population have cancer and don’t know it yet: P(C) = .10 You take the test and receive a “+” test results. Should you be worried? P(C/+) = ????? Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.38 What we know. P(+/C) = .95 P(+/ Cc ) = .08 P(C) = .10 From these probabilities we can find P(-/C) = .05 P(-/ Cc ) = .92 P(Cc) = .90 Test Results True State of Nature Have Cancer: C Do Not Have Cancer: CC + - Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.39 Bayesian Terminology… The probabilities P(A) and P(AC) are called prior probabilities because they are determined prior to the decision about taking the preparatory course. The conditional probability P(A | B) is called a posterior probability (or revised probability), because the prior probability is revised after the decision about taking the preparatory course. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.40 Students Work – Bayes Problem The Rapid Test is used to determine whether someone has HIV [H]. The false positive and false negative rates are 0.05 P(+/ Hc) and 0.09 P(-/H) respectively. The doctor just received a positive test results on one of their patients [assumed to be in a low risk group for HIV]. The low risk group is known to have a 6% P(H) probability of having HIV. What is the probability that this patient actually has HIV [after they tested positive]. Feel free to use a table to work this problem P(H) = 0.06 **** P(Hc) = ? P(+/ Hc) = 0.05 **** P(-/ Hc) = ? P(-/H) = 0.09 **** P(+/H) = ? Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.41 Students Work – Bayes Problem Transplant operations for hearts have the risk that the body may reject the organ. A new test has been developed to detect early warning signs that the body may be rejecting the heart. However, the test is not perfect. When the test is conducted on someone whose heart will be rejected, approximately two out of ten tests will be negative (the test is wrong). When the test is conducted on a person whose heart will not be rejected, 10% will show a positive test result (another incorrect result). Doctors know that in about 50% of heart transplants the body tries to reject the organ. *Suppose the test was performed on my mother and the test is positive (indicating early warning signs of rejection). What is the probability that the body is attempting to reject the heart? *Suppose the test was performed on my mother and the test is negative (indicating no signs of rejection). What is the probability that the body is attempting to reject the heart? Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 6.42