Probability definitions

1. Probability of an event = chance that the event will occur.

2. Experiment = any action or process that generates observations. In some contexts, we speak of a “data-generating process”.
Examples: toss a coin (one or more times), roll 2 dice, select 5 cards from a deck, interview 100 people for market research, observe the reaction of 50 patients to a new drug.

3. Sample space = set of all possible outcomes of an experiment.
Example: if two dice (a red die and a green die) are tossed, any outcome is described by the number on the red die and the number on the green die. Note that the outcome is a “low level” description of what happened. The number on the red die is written first, so that (3, 2) indicates a 3 on the red die and a 2 on the green die. In the experiment of tossing a red die and a green die, we can list the sample space in a table:

1, 1   1, 2   1, 3   1, 4   1, 5   1, 6
2, 1   2, 2   2, 3   2, 4   2, 5   2, 6
3, 1   3, 2   3, 3   3, 4   3, 5   3, 6
4, 1   4, 2   4, 3   4, 4   4, 5   4, 6
5, 1   5, 2   5, 3   5, 4   5, 5   5, 6
6, 1   6, 2   6, 3   6, 4   6, 5   6, 6

4. Event = any collection or subset of outcomes in the sample space. Events are simple if they contain exactly one outcome, and are compound if they contain more than one.
Example: We can define the event A = both dice show the same number, and calculate P(A) = 6 / 36. (If you xerox or copy the sheet, shade in the appropriate areas.)
Define event B = sum of the two numbers is at least 10. P(B) = 6 / 36.

5. Random variables associate a number with each outcome of the experiment. If we define the random variable X as the sum of the two numbers on the dice, we can say P(X = 6) = 5/36.

6. Union of two events, denoted by ∪ as in A ∪ B, constructs a new event: both dice show the same number, or the sum of the two numbers is at least 10. This should be read as “A or B”. Find the probability of A or B in the example above. Does P(A or B) = P(A) + P(B)? Why not?

7. Intersection of two events, denoted by ∩ as in A ∩ B, constructs a new event: both dice show the same number, and the sum of the two numbers is at least 10. This should be read as “A and B”. Find the probability of A and B in the example above. Does P(A and B) = P(A) times P(B)? Why not?

8. Complement of an event A, denoted by a prime or superscript c, as in A' or A^c, indicates those outcomes not in the event A. What is the probability of A' in the example above? Note that P(A') = 1.0 - P(A).
This relationship often makes calculations much simpler, especially when the problem includes phrases such as “at least” or “at most”.
Example (Chevalier de Méré): what is the probability that at least one six turns up in 4 tosses of a die? [Hint: it is a little more than half, .5177. De Méré found that out by extensive experiment.]
What is the probability that at least one double six turns up in 24 tosses of two dice? [The chance of double six is 1/36, but we compensate by having 6 times as many tosses, so de Méré thought the probability should be the same. Is it? Hint: de Méré lost a lot of money by believing this.]

Statistics 1040 Dr. McGahagan
Probability problems

Simple occurrences:

Event                                         Probability
Get a tail in a single toss of a fair coin
Roll a 3 on a normal, 6 sided die
Draw a heart from a deck of cards
Child born on a Saturday or Sunday

More complicated events, for which it will be helpful to use random variables to keep track of outcomes (for example, one might want the random variable X = number of heads in 3 tosses of a fair coin):

Event                                                                    Probability
Toss 3 heads in 3 tosses of a coin
Roll boxcars (6 on each of 2 dice)
Be dealt a flush (5 cards of same suit) in a standard 5 card deal
Toss exactly 2 heads and two tails IN THAT ORDER in 4 tosses of a coin
Toss exactly 2 heads and two tails IN ANY ORDER in 4 tosses of a coin
Roll either boxcars or snake-eyes (2 sixes or 2 ones) in a roll of 2 dice
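The dice definitions and the complement rule above can be checked by brute-force counting. The following is a minimal Python sketch, assuming fair dice and equally likely outcomes; the names `sample_space`, `prob`, `A`, and `B` are illustrative and not part of the handout.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (red, green) for two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


A = {o for o in sample_space if o[0] == o[1]}        # both dice show the same number
B = {o for o in sample_space if o[0] + o[1] >= 10}   # sum of the two numbers is at least 10

p_a = prob(A)
p_b = prob(B)
p_union = prob(A | B)           # "A or B"
p_intersection = prob(A & B)    # "A and B"
p_not_a = 1 - p_a               # complement rule
p_sum_is_6 = prob({o for o in sample_space if sum(o) == 6})   # P(X = 6)

# Complement rule for "at least one": 1 - P(none).
p_six_in_4 = 1 - Fraction(5, 6) ** 4             # at least one six in 4 tosses of one die
p_double_six_in_24 = 1 - Fraction(35, 36) ** 24  # at least one double six in 24 tosses of two dice

print("P(A) =", p_a, " P(B) =", p_b)
print("P(A or B) =", p_union, " but P(A) + P(B) =", p_a + p_b)
print("P(A and B) =", p_intersection, " but P(A) * P(B) =", p_a * p_b)
print("P(A') =", p_not_a, " P(X = 6) =", p_sum_is_6)
print("At least one six in 4 tosses:", round(float(p_six_in_4), 4))
print("At least one double six in 24 tosses:", round(float(p_double_six_in_24), 4))
```

Counting outcomes directly makes it visible that A and B share outcomes, which is why the simple addition rule overcounts here.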
Addition and multiplication rules – the full story

In some of the above problems, we used the addition rule in the form
P(A or B) = P(A) + P(B)
and the multiplication rule in the form
P(A and B) = P(A) P(B)
We must extend the rules to take account of:

1. Events that are not mutually exclusive, that is, events which can both happen at the same time.
Suppose you have been dealt a poker hand of Jack, Queen, King and Ace of hearts along with the 2 of clubs. You are interested in the chance of coming up with a winning hand if you discard the 2 and draw another card. What are the chances of getting either a 10 or a heart? We can assume that either a flush or a straight will win, but there is the chance here of getting a royal flush with the 10 of hearts.

2. Events that are not independent, that is, events for which the occurrence of one changes the probability of the other.
What is the chance of drawing two hearts in a row? Hint: NOT 13/52 times 13/52. Why?

Suppose that, for the US as a whole, the chance that a family’s first car is domestic is .75, denoted as P(D1) = .75, and the chance that the second car is domestic is .4, denoted as P(D2) = .4. What is the chance that both cars are domestic?
If the two events are independent, we could apply the multiplication rule
P(D1 and D2) = P(D1) x P(D2) = 0.75 x 0.4 = 0.30
But it may be that purchasers show buyer loyalty, that is, those who purchased a domestic car for their first car are more likely than the average to buy a domestic car for their second car. Assume that P(D2 given D1) = 0.6, also written P(D2 | D1) = 0.6, and calculate P(D1 and D2). Hint: look back at the two hearts in a row problem.
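The two card questions above can be worked by counting cards. This is a rough Python sketch under stated assumptions: a standard 52-card deck, the replacement card is drawn from the 47 cards not in your original hand (the discarded 2 of clubs is not returned), and the two hearts are drawn from a full deck without replacement. The variable names are illustrative only.

```python
from fractions import Fraction

# Poker draw: 5 cards are already accounted for (J, Q, K, A of hearts and the 2 of clubs).
unseen_cards = 52 - 5
hearts_left = 13 - 4      # four hearts are already in the hand
tens_left = 4             # no tens have been seen
ten_of_hearts = 1         # this card is counted in both groups above

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_ten_or_heart = (
    Fraction(tens_left, unseen_cards)
    + Fraction(hearts_left, unseen_cards)
    - Fraction(ten_of_hearts, unseen_cards)
)

# General multiplication rule: P(A and B) = P(A) * P(B | A).
p_first_heart = Fraction(13, 52)
p_second_heart_given_first = Fraction(12, 51)   # one heart and one card are gone
p_two_hearts = p_first_heart * p_second_heart_given_first

# What the simple rule would give if the draws were (wrongly) treated as independent.
p_two_hearts_if_independent = Fraction(13, 52) ** 2

print("P(10 or heart) =", p_ten_or_heart)
print("P(two hearts in a row) =", p_two_hearts)
print("Independent-draw shortcut would give", p_two_hearts_if_independent)
```

Subtracting the overlap keeps the 10 of hearts from being counted twice, and conditioning the second draw on the first is what lowers 1/16 to 1/17.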
P(D1 and D2) = P(D1) x P(D2 | D1) = 0.75 x 0.6 = 0.45

A visual presentation of a similar problem:
D1 = a family's first car is domestic; F1 = a family's first car is foreign
D2 = a family's second car is domestic; F2 = a family's second car is foreign
Given that P(D1) = 0.75 and P(D2) = 0.4, and assuming there are 100 two-car families in total and that P(D1 and D2) = 0.35, fill in the following table. A few numbers have been filled in to get you started. Be sure you understand how they were arrived at, and how they reflect the statements above.

                     Car 2 is domestic   Car 2 is foreign   ROW TOTALS
Car 1 is domestic           35                                  75
Car 1 is foreign
COLUMN totals               40                                 100

After doing so, calculate all the joint probabilities:
P(D1 and D2) = 0.35
P(D1 and F2) =
P(F1 and D2) =
P(F1 and F2) =

And all the conditional probabilities:
P(D2 | D1) =
P(D2 | F1) =
P(F2 | D1) =
P(F2 | F1) =

What is the probability that (for two car families described in the table above):
a. Both a family's cars are domestic?
b. Both a family's cars are foreign?
c. A family has one domestic and one foreign car?
d. We know that the Smith family has at least one domestic car. What is the chance that they also have a foreign car?
e. We know that the Jones family has at least one foreign car. What is the chance that they also have a domestic car?

Answers:

                     Car 2 is domestic   Car 2 is foreign   ROW TOTALS
Car 1 is domestic           35                  40              75
Car 1 is foreign             5                  20              25
COLUMN totals               40                  60             100

The JOINT PROBABILITIES are easy: the table gives the NUMBERS of families in each category. There are 35 families with both car 1 and car 2 being domestic, so the probability that a two-car family chosen at random has two domestic cars is 35 / 100 = 0.35 or 35 percent.
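The completed table of counts can be turned into joint, marginal, and conditional probabilities mechanically. The following is a minimal Python sketch of that bookkeeping; the dictionary layout and the helper names `marginal` and `conditional` are assumptions made for illustration, not part of the handout.

```python
from fractions import Fraction

# Counts of two-car families from the completed table (first label: car 1, second: car 2).
counts = {
    ("D1", "D2"): 35, ("D1", "F2"): 40,
    ("F1", "D2"): 5,  ("F1", "F2"): 20,
}
total = sum(counts.values())

# Joint probabilities: each cell divided by the grand total.
joint = {cell: Fraction(n, total) for cell, n in counts.items()}


def marginal(label):
    """Marginal probability: add up every cell that carries the label."""
    return sum(p for cell, p in joint.items() if label in cell)


def conditional(target, given):
    """P(target | given) = P(target and given) / P(given)."""
    both = sum(p for cell, p in joint.items() if target in cell and given in cell)
    return both / marginal(given)


# Question c: one domestic and one foreign car (two cells joined by OR, so add).
p_mixed = joint[("D1", "F2")] + joint[("F1", "D2")]

# Questions d and e: condition on "at least one" by removing the excluded cell.
p_foreign_given_some_domestic = p_mixed / (1 - joint[("F1", "F2")])
p_domestic_given_some_foreign = p_mixed / (1 - joint[("D1", "D2")])

print("P(F1) =", marginal("F1"), " P(D2) =", marginal("D2"))
print("P(F2 | D1) =", conditional("F2", "D1"))
print("P(F1 | D2) =", conditional("F1", "D2"))
print("c:", p_mixed)
print("d:", p_foreign_given_some_domestic)
print("e:", p_domestic_given_some_foreign)
```

Using exact fractions avoids rounding noise, so each result can be compared directly with a ratio of counts read off the table.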
P(D1 and D2) = 0.35 (answer for [a])
P(D1 and F2) = 0.40
P(F1 and D2) = 0.05
P(F1 and F2) = 0.20 (answer for [b])

For [c], note that families will have one domestic and one foreign car if their two cars are in the square D1 and F2 OR in the square F1 and D2. The OR is telling us to ADD the joint probabilities:
P(D1 and F2) + P(F1 and D2) = 0.40 + 0.05 = 0.45

The ROW and COLUMN totals give the MARGINAL PROBABILITIES, since they are written in the margin of the detailed table (don't think of marginal cost!).
P(D1) = 0.75
P(F1) = 0.25
P(D2) = 0.40
P(F2) = 0.60

CONDITIONAL PROBABILITIES can be read off the table. If you are GIVEN D1 (the first car is domestic), you know you are in the first ROW of the table: the information given means the family is one of the 75 for whom the first car is domestic. You can mentally reduce the entire table to:

                     Car 2 is domestic   Car 2 is foreign   ROW TOTALS
Car 1 is domestic           35                  40              75

Hence the probability that their second car is foreign is:
P(F2 | D1) = 40 / 75 = 0.5333

If we are GIVEN that the second car is domestic, the probability that the first car is foreign is
P(F1 | D2) = 5 / 40 = 0.1250 (mentally reduce the entire table to the first column)

If we are given (as in part [d]) that the Smith family has at least one domestic car, we delete the F1-F2 square from the table, leaving 80 families, of whom 45 (40 + 5) also have a foreign car, so the chance of also having a foreign car is 45 / 80 = 0.5625.

For the Jones family in part [e], with at least one foreign car, delete the D1-D2 square, leaving 65 families, of whom 45 (40 + 5) also have a domestic car; their chance of also having a domestic car is 45 / 65 = 0.6923.

Bayes's Theorem

Suppose that 80 percent of the taxicabs in town are owned by the Yellow Cab Company and 20 percent are owned by the Blue Cab Company, and that they are painted accordingly. A taxicab driver was arrested in a bank robbery. A witness claims that a cab was used as the getaway car, and thinks that the cab was blue, although he admits that the light was poor.
As a result of repeated tests in similar lighting conditions, the defense finds that the witness is 75 percent accurate -- that he correctly identifies a blue cab as blue 75 percent of the time, but incorrectly identifies a blue cab as yellow 25 percent of the time.

We should:
(a) reject the witness testimony as not perfectly accurate, and treat the probability of the cab being blue as 20 percent.
(b) accept the witness as having a 75 percent chance of being right and the cab being blue.
(c) treat the probability of the cab being blue as somewhere between 20 and 75, but closer to 20 (that is, more than 20 but less than 47.5).
(d) treat the probability of the cab being blue as somewhere between 20 and 75, but closer to 75 (that is, more than 47.5 but less than 75).

Answer: The answer will be between the two extremes -- although not perfectly accurate, the witness is right more often than not, and his claim that the cab is blue raises the chance to more than 20 percent. But it is NOT 75 percent: the test establishes that the chance the witness SAYS the cab is blue, GIVEN THAT it is in fact blue, is 75 percent, but we are interested in the "inverse probability" that the cab is REALLY blue, GIVEN that the witness SAYS it is blue.

P(Says B | B) = 0.75
P(Says B | B) = P(B and Says B) / P(B) from the definition of conditional probability
P(B | Says B) = P(B and Says B) / P(Says B) has the same numerator, but a different denominator.

It is a simple application of the multiplication rule for dependent events to compute the numerator:
P(B and Says B) = P(B) * P(Says B | B) = 0.20 * 0.75 = 0.15

Make a table and fill in the other possibilities (assuming the witness is equally accurate with yellow cabs, so that P(Says B | Y) = 0.25):
P(B and Says Y) = P(B) * P(Says Y | B) = 0.20 * 0.25 = 0.05
P(Y and Says B) = P(Y) * P(Says B | Y) = 0.80 * 0.25 = 0.20
P(Y and Says Y) = P(Y) * P(Says Y | Y) = 0.80 * 0.75 = 0.60

              Says B   Says Y   Row totals
IS B           0.15     0.05      0.20
IS Y           0.20     0.60      0.80
Col. totals    0.35     0.65      1.00 [Grand total = 1.00 or 100 percent]

We know the witness said B, so we know that the first column is the only one that counts. Note that despite his 75 percent accuracy, the fact that there are more yellow cabs means that when our witness says "blue" he is mistaken (0.20) more often than he is correct (0.15).

P(B | Says B) = 15 / 35 = 3/7 = 0.4286 = 42.86 percent, so the answer is (c).
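The taxicab calculation above follows a pattern that can be reused for other "inverse probability" questions. The following is a minimal Python sketch of it; the function name `posterior` and its parameter names are illustrative assumptions, and the witness is assumed to be equally accurate for blue and yellow cabs, as in the table.

```python
from fractions import Fraction


def posterior(prior, p_report_given_true, p_report_given_false):
    """Bayes's theorem: P(hypothesis | report).

    prior                 -- P(hypothesis) before hearing the report
    p_report_given_true   -- P(report | hypothesis is true)
    p_report_given_false  -- P(report | hypothesis is false)
    """
    numerator = prior * p_report_given_true                        # P(hypothesis and report)
    evidence = numerator + (1 - prior) * p_report_given_false      # P(report), the column total
    return numerator / evidence


p_blue = Fraction(20, 100)                    # share of blue cabs in town
p_says_blue_given_blue = Fraction(75, 100)    # witness is right about a blue cab
p_says_blue_given_yellow = Fraction(25, 100)  # witness mistakes a yellow cab for blue

p_blue_given_says_blue = posterior(
    p_blue, p_says_blue_given_blue, p_says_blue_given_yellow
)

print("P(B | Says B) =", p_blue_given_says_blue)
print("as a decimal:", round(float(p_blue_given_says_blue), 4))
```

The denominator is the "Says B" column total from the table, which is why a large share of yellow cabs pulls the result below one half.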