Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Decision Analysis Lecture 5 Tony Cox My e-mail: [email protected] Course web site: http://cox-associates.com/DA/ Agenda • Assignment 5 – Bayes’ Rule, simulation-optimization • Readings 5: Decision psychology • Probability and Bayes’ Rule – Monty Hall problem – Conditioning. Statistical independence. Laws of probability. Joint, marginal, and conditional probabilities. Bayes’ Rule – HIV and Problem set 4 solutions via Bayes’ Rule • Decision psychology: Heuristics and biases • Probabilistic expert systems (Netica) 2 Homework #5 (Due by 4:00 PM, February 21) • Problems: (a) Soft pretzels, (b) Optimal R&D intensity • Readings – Required: Tversky and Kahneman (1974) on Judgment under uncertainty: Heuristics and biases, http://psiexp.ss.uci.edu/research/teaching/Tversky_Kahneman_1974.pdf • Know these: Representativeness, Availability, Anchoring and Adjustment, – Required: Charniak (1991), pages 50-53, http://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836 – Ok to skim for main points: Slovic et al. (2002) on affect heuristic, decision utility vs. experience utility, http://faculty.psy.ohiostate.edu/peters/lab/pubs/publications/2002_Slovic_Finucane_etal._Rational_actors_or_rational_fools.pdf – Ok to skim: Thaler (1981) on dynamic inconsistency, inconsistent discounting, http://faculty.chicagobooth.edu/richard.thaler/research/pdf/Some%20Empirical%20Evi dence%20on%20Dynamic%20Inconsistency.pdf – Optional: Camerer et al. (2012) on Neuroeconomics and ambiguity aversion, http://neuroecon.berkeley.edu/public/papers/Ambiguity-Chapter.pdf 3 Assignment 5, Problem 1: Soft pretzels • Your decision model: – If your new pretzel is a hit, market share will be 30% – If flop, then market share = 10% – Prior probabilities: Pr(hit) = 0.5, Pr(flop) = 0.5 • Evidence: 5 of 20 people preferred your pretzel in a taste test • Find: Pr(hit | 5 of 20 preferred new pretzel) – First step toward getting probabilities from data 4 Assignment 5, Problem 2: Optimal R&D effort • Each new employee a company hires has a 10% probability (independently of anyone else) of solving a certain R&D problem in the next year • If solution is obtained in the next year, it is worth $1M (else $0). • Each new employee costs $0.05M. • To maximize EMV, how many new employees should the company hire to work on this R&D problem? – Suggested approach: Simulation-optimization in R 5 Probability and Bayes’ Rule Monty Hall problem Conditioning Statistical independence Laws of probability Joint, marginal, and conditional distributions Bayes’ Rule Solutions to problems 6 Where are we going? 1. Monty Hall problem and intuitive solution 2. Bayes’ Rule – Probability background: joint, marginal, and conditional probabilities. Statistical independence. 3. Where do models, Pr(E | H), come from? – Data for inferring causality – Valid vs. non-valid statistical models • Causal inference: Biases and solutions – Simpson’s Paradox 4. Causal Graphs (and paper/presentation topics) – DAG models show conditional independence relations – Topics: Path analysis, Structural Equation Models (SEMs), and Simon causal ordering 7 Example: Monty Hall problem • A prize is hidden in one of three boxes. The location of the prize is selected at random • You pick any one of three boxes. • Before you open it, Monty opens an empty box. He offers to let you switch to the remaining unopened box. Should you? – If you picked the one with the prize, he opens another box at random – Else, he open the box without the prize 8 Example: Monty Hall problem • Pr(you picked the right box) = 1/3 • Pr(the remaining box contains the prize) = (1/3)(0) + (2/3)(1) = 2/3 • Thus, you should switch. – Doing so doubles your chances of getting the prize, from 1/3 to 2/3 – It is not true that, after Monty opens the empty box, each of the other two is equally likely to contain the prize 9 Intuitive solution to Monty Hall Problem • In effect, the host “points to” the correct box (by opening all incorrect ones) if the contestant has not already selected it. (Otherwise, the host points to a randomly selected incorrect box.) • So, with probability 2/3, we should accept the box that the host points to (i.e., Box 3). • How to discover this without intuition? 10 Our goal: Bayes’ Rule Conditional probability: Pr(H | E) = Pr(H & E)/Pr(E) Total probability: Pr(E) = Pr(E | H)Pr(H) + Pr(E | not-H)Pr(not-H) Bayes’ Rule: Pr(H | E) = Pr(E | H)Pr(H)/Pr(E), with Pr(E) calculated via Total Probability formula. • • • • H = hypothesis or theory E = evidence (observed data) Pr(H) = prior probability for H Pr(H | E) = posterior probability for H, given evidence E (or conditioned on evidence E) (“|” = “conditioned on”) • Pr(E | H) = likelihood of E, given H. • Data base interpretation: – E tells you which rows to look at – (H & E) tells you how many of those rows H holds in. 11 DAG form of Bayes Rule • Simple DAG model: H E – DAG = directed acyclic graph – H is now a random variable (e.g., with one value for each alternative hypothesis) • Inference problem: Given a prior probability for each possible value of H and given an observed value of E, what are the posterior probabilities for each value of H, conditioned on the data E? • “Arc-reversal”: See E, infer H. Find Pr(H = h | E = e). • Generalization: From one arc to a whole DAG! 12 Conditioning 13 Notation: Conditioning • Pr(A | B) = conditional probability of A, given B – A and B are events – Events are subsets of a “sample space” – We will sometimes condition on acts/decisions • “|” is the sign for “conditioned on” or “given” • Example: For a fair die, what is Pr(3)? What is Pr(3 | odd)? 14 Interpretation of conditioning • Let A and B be two events – An “event” is usually interpreted as a subset of a sample space – A sample space is often interpreted as a set of possible observed outcomes • (A | B) is the symbol for “A given B” or “A conditioned on B” • Pr(A | B) = Pr(A is true given that B is true) • Data set interpretation: Pr(A | B) = fraction of (rows in which B holds) for which (A holds too) – Rows = random cases; columns = fields/variables 15 Conditioning in a data set Record 1 Gender M Age 31 Smoker? Yes COPD? No 2 3 4 F F M 41 59 26 No Yes No No Yes No 5 6 F M 53 58 No Yes No Yes • For a randomly sampled record, what is… – Pr(smoker), Pr(COPD | smoker), Pr(smoker | COPD) – Pr(COPD | male smoker), Pr(male smoker | COPD & > 50) – Pr(smoker & COPD) 16 Conditioning in a data set Record 1 Gender M Age 31 Smoker? Yes COPD? No 2 3 4 F F M 41 59 26 No Yes No No Yes No 5 6 F M 53 58 No Yes No Yes • For a randomly sampled record, what is… – Pr(smoker) = 3/6 = 1/2, Pr(COPD | smoker) = 2/3 – Pr(smoker | COPD) = 1, Pr(COPD | male smoker) = 1/2 – Pr(male smoker | COPD & > 50) = 1/2, Pt(smoker & COPD) = 1/3 17 Why does conditioning matter? • Learning (or “updating of beliefs”) takes place by conditioning, in traditional DA • Value of information is determined by how much it increases the conditional expected utility of the best decision • Causality: Suggests (but does not prove) how changing some choices (e.g., smoking) might change the probabilities of consequences (e.g., COPD) • Inference: Can be used to answer queries (Netica) 18 Examples of conditioning • • • • A B C Case 1 3 1 4 2 1 5 9 3 2 6 5 4 3 5 8 All rows equally likely Find: Pr(B = 5 | C > 4)? Find: Pr(B > C)? (unconditional probability) Pr(B > C | A + B > C)? (conditional probability) 19 Examples of conditioning • • • • A B C Case 1 3 1 4 2 1 5 9 3 2 6 5 4 3 5 8 All rows equally likely Pr(B = 5 | C > 4) = 2/3 Pr(B > C) = 1/4 (unconditional probability) Pr(B > C | A + B > C) = 1 (conditional probability) 20 Conditioning can be used to answer queries • A universal rule for data-based inference? – Sound? (Only correct conclusions?) – Complete? (Can we use it to answer any answerable question, as well as possible?) • How to use conditioning to draw causal inferences? – Causal graph models will address this • What other inference rules are there? – Deduction, induction, analogy, insight 21 Statistical independence Laws of probability Joint, marginal, and conditional probabilities Bayes’ Rule 22 Statistical Independence • Events A and B are statistically independent if and only if: Pr(A and B) = Pr(A)*Pr(B) – (Thus, since Pr(A and B) = Pr(A)Pr(B | A), we must have Pr(B | A) = Pr(B) = Pr(B | not-A).) • Random variables X and Y are statistically independent if Pr(x, y) = Pr(x)Pr(y) (“joint PDF equals product of marginals”) – Justification: Pr(x, y) = Pr(x)*Pr(y | x), If X and Y are independent, Pr(y | x) = Pr(y) • If X and Y are not statistically independent, then each provides information about the other 23 Example • Joint probability table. Are X and Y independent? • (No!) Y X 0 1 0 1 0 0.2 0.2 0.3 0.3 0.5 0.8 0.7 24 Probability Basics • • Let X be a random variable with possible values of a, b, c,… Then a probability density function (PDF) for X assigns numbers (“probabilities”) to possible values of X such that: 1. Pr(X = a) + Pr(X = b) + Pr(X = c) + … = 1 Summation notation: xPr(X = x) = 1; xPr(x) = 1 2. Each probability number is non-negative 3. If X is continuous, then replace summation with integration: all x Pr(X = x)dx = 1 For more tutorial, www.sci.utah.edu/~gerig/CS6640F2010/prob-tut.pdf 25 Probability Basics (Cont.) Rules of probability • Pr(A or B) = Pr(A) + Pr(B) – Pr(A and B) – A and B are “events” (subsets of possible outcomes in sample space) – = intersection of events (at semantic level) corresponds to “&” at the syntactic level) • Total probability: Pr(A) + Pr(not-A) = 1 • For additive probability measures, summation of probs. represents disjoint union of events • Pr(A & B) = Pr(A | B)*Pr(B) = Pr(B | A)*Pr(A) 26 Application: “Law of Total Probability” • If exactly one of {H1, H2, H2,…Hn} is true, • Then Pr(E) = Pr(E & H1) + Pr(E & H2) + … + Pr(E & Hn) • Q: Why? • A: Because for E to happen, it must happen in exactly one of these ways (i.e., with exactly one of these hypotheses being true.) – E here represents any event, not only “evidence” • We will soon see that the above implies: Pr(E) = Pr(E | H1)Pr(H1) + … + Pr(E | Hn)Pr(Hn) 27 Joint, Marginal, and Conditional Probabilities Let X and Y be two random variables. Their joint PDF is: Pr(X, Y) = Pr(X = x and Y = y)= f(x, y) Joint distribution can be factored as a product of a marginal PDF and a conditional PDF: Key formula: Pr(x, y) = Pr(x)Pr(y | x) = marginal * conditional = Pr(X = x)Pr(Y = y | X = x) 28 Example of Joint and Marginal Distributions Q: Given the joint PDF for X and Y below, what is the PDF for X? Find Pr(X > 1). X 1 2 3 Y 4 0 0.2 0.1 8 0.1 0.2 0 16 0.1 0 0.3 29 Solution to Marginalization Problem • To marginalize out Pr(X), just sum! • Pr(X > 1) = 0.8. X 1 2 3 Y 4 0 0.2 0.1 8 0.1 0.2 0 16 0.1 0 0.3 Marginal Pr(X) 0.2 0.4 0.4 30 Why does Pr(y | x) = Pr(x, y)/Pr(x)? • Pr(x, y) = Pr(X = x and Y = y) = Pr(X = x)Pr(Y = y | X = x) = Pr(x)Pr(y | x) • Dividing both sides by Pr(x) gives Pr(y | x) = Pr(x, y)/Pr(x) By symmetry, Pr(y | x)Pr(x) = Pr(x | y)Pr(y) = Pr(x, y) So, Pr(y | x) = Pr(x | y)Pr(y)/Pr(x). This is almost Bayes’ Rule! 31 Bayes’ Rule Start with Pr(y | x) = Pr(x | y)Pr(y)/Pr(x). Now, write Pr(x) as: Pr(x) = vPr(X = x, Y = v) (“marginalize out” Y) Pr(x) = vPr(x, v) = vPr(x | v)Pr(v). Then, Pr(y | x) = Pr(x | y)Pr(y)/[vPr(x | v)Pr(v)] This is Bayes’ Rule. (v ranges over all possible values of Y.) 32 Bayes’ Rule for Conditioning on Evidence Bayes’ Rule: Pr(H | E) = Pr(E | H)Pr(H)/Pr(E) • • • • H = hypothesis or theory – or query! E = evidence (observed data) Pr(H) = prior probability for H (from model) Pr(H | E) = posterior probability for H, given evidence E (or conditioned on (“|”) evidence E) • Pr(E | H) = likelihood of E, given H (from model). 33 Solving Monty Hall with Bayes’ Rule • H = prize is in Box # 1, H3 = prize in Box 3, H2 = prize is in Box 2. • E = Host shows ball is not in Box # 2 • Pr(H | E) = Pr(E | H)Pr(H)/Pr(E) = Pr(E | H)Pr(H)/ Pr(E | v)Pr(v)] • Pr(E | H) = 0.5 • Pr(H) = 1/3 • Pr(E) = Pr(E | H)Pr(H) + Pr(E | H2)Pr(H2) + Pr(E | H3)Pr(H3) = (1/6) + 0 + (1/3) = 0.5 • Pr(H | E) = 1/3, Pr(H3 | E) = 2/3. v 34 Bayesian learning of p(s): Conditional probabilities • Pr(s | y) = Pr(y | s)Pr(s)/Pr(y) – Definition of conditional probability – s = state, Pr(s) = prior probability of s – y = evidence, data, signal, etc. – Pr(y | s) = likelihood of y given s – Pr(s | y) = posterior probability of s give y • Pr(y) = s′ Pr(y | s′)Pr(s′) Law of total probability • Bayes Rule: Pr(s | y) = Pr(y | s)Pr(s)/ s′ Pr(y | s′)Pr(s′) 35 Example: HIV screening manual solution • Pr(s) = 0.01 = fraction of population with HIV – s = has HIV, s′ = does not have HIV – y = test is positive • Pr(test positive | HIV) = 0.99 • Pr(test positive | no HIV) = 0.02 • Find: Pr(HIV | test positive) = Pr(s | y) – Subjective probability estimates? 36 Example: Screening for HIV • Bayes Rule: Pr(s | y) = Pr(y | s)Pr(s)/ s′ Pr(y | s′)Pr(s′) – Pr(s) = 0.01 – s = has HIV, s′ = does not have HIV – Pr(y | s) = 0.99 Pr(y | not-s) = 0.02 – Pr(y) = Pr(y | s)Pr(s) + Pr(y | not-s)Pr(not-s) • = (0.99*0.01)/(0.99*0.01 + 0.02*0.99) = 1/3 • If test is positive, Pr(HIV | positive test)= 1/3 – Twice as many false positives as true positives 37 Assignment 4, Problem 1: Fair Coin Problem (due 2-14-17) • A box contains two coins: (a) A fair coin; and (b) A coin with a head on each side. One coin is selected at random (we don’t know which) and tossed once. It comes up heads. • Q1: What is the probability that the coin is the fair coin? • Q2: If the same coin is tossed again and shows heads again, then what is the new (posterior) probability that it is the fair coin? Solve manually and/or using Netica. 38 Manual Solution to Fair Coin Problem, Part 1 • • • • • H1 = coin is fair H2 = coin has 2 heads E1 = head is obtained on first toss Posterior = Likelihood*Prior/Pr(evidence) Pr(H1 | E1) = Pr(E1 | H1)Pr(H1)/Pr(E1) • = Pr(E1 | H1)Pr(H1)/[Pr(E1 | H1)Pr(H1) + Pr(E1 | H2)Pr(H2)] • = (0.5)(0.5)/[(0.5)(0.5) + (1)(0.5)] = 1/3 39 Solution to Coin Tossing Part 2 Pr(Fair |HH) = Pr(HH | Fair)Pr(Fair)/Pr(HH) = (0.25)(0.5)/[Pr(HH | Fair)Pr(Fair) + Pr(HH | Unfair)Pr(Unfair)] = (0.125)/[(0.125) + (1)(0.5)] = 1/(1 + 4) = 0.2 Alternative (sequential approach): Pr(Fair | 2nd toss is H) = Pr(2nd toss is H | Fair)Pr(Fair)/Pr(H) where all information is conditioned on first toss being H = (0.5)(1/3)/[(0.5)(1/3) + (1)(2/3)] = (0.5)/[0.5 + 2] = 1/(1 + 4) = 1/5 = 0.2. Using posteriors from first stage as priors for second stage gives the same answer as conditioning on entire history. 40 Using Netica to solve fair coin problem • Step 1: Create DAG model. (Q: What is its root?) A: Root node is “Coin is fair” • Step 2: Use “Enter Findings” (right-click) to specify observations (i.e., histories of observations on which answers are to be conditioned, e.g., “Head on first toss” or “Heads on first two tosses”) • Step 3: View the “Coin is fair” root node to view the answer (i.e., Pr(Coin is fair | Observations). 41 Using Netica to solve fair coin problem • Step 1: Create DAG model. (Q: What is its root?) A: Root node is “Coin is fair” CoinIsFair Yes 50.0 No 50.0 FirstToss Head 75.0 Tail 25.0 SecondToss1 Head 75.0 Tail 25.0 42 Using Netica to solve fair coin problem • Step 1: Create DAG model. (Q: What is its root?) A: Root node is “Coin is fair” • Step 2: Use “Enter Findings” • Step 3: View the “Coin is fair” root node to view the answer (i.e., Pr(Coin is fair | Observations). CoinIsFair Yes 33.3 No 66.7 FirstToss Head 100 Tail 0 SecondToss1 Head 83.3 Tail 16.7 43 Using Netica to solve fair coin problem • Step 1: Create DAG model. (Q: What is its root?) A: Root node is “Coin is fair” • Step 2: Use “Enter Findings” • Step 3: View the “Coin is fair” root node to view the answer (i.e., Pr(Coin is fair | Observations). CoinIsFair Yes 20.0 No 80.0 FirstToss Head 100 Tail 0 SecondToss1 Head 100 Tail 0 44 Assignment 4, Problem 2: Defective Items (due 2-14-17) • Machines 1, 2, and 3 produced (20%, 30%, 50%) of items in a large batch, respectively. • The defect rates for items produced by these machines are (1%, 2%, 3%), respectively. • A randomly sampled item is found to be defective. What is the probability that it was produced by Machine 2? • Exercise: (a) Solve using Netica (b) Solve manually • E-mail answer (a single number,) to [email protected] 45 Manual solution to defective items problem • Machines 1, 2, and 3 produced (20%, 30%, 50%) of items in a large batch, respectively. • The defect rates for items produced by these machines are (1%, 2%, 3%), respectively. • A randomly sampled item is found to be defective. What is the probability that it was produced by Machine 2? • Pr(machine 2 | defective) = Pr(defective | machine 2) * Pr(machine 2)/[vPr(defective |machine = v)Pr(v)] • = (0.02)*(0.30)/[(0.01)*(0.20) + (0.02)*(0.30) + (0.03)*(0.50)] • = 0.261 (answer) 46 Netica solution Machine A B C 8.70 26.1 65.2 Defect Defect NoDefect 100 0 47 Simulation-optimization: One-dimensional choice set A and state set S 48 Mockingbird Airline’s overbooking decision • Plane has 16 seats • Each seat reserved generates $225 revenue. – Total revenue = $225*Res – Res = number of reservations sold = decision variable • C1 = operations costs: Cost of flight to Mockingbird = $900 + $100*min(arrivals,16) • C2 = penalty costs: Also, if > 16 passengers with reservations show up, then each such passenger (after 16) costs Mockingbird $325 = (325*max(0, arrivals -16)) • Pr(passenger with reservation arrives/shows up) = 0.96 • How many reservations should Mockingbird sell to maximize EMV? 49 Structuring the decision problem • What is the set of acts, A? • What is the set of states, S? • What is the probability of each state, s, if act a is taken? • What is the consequence of each act-state pair, c(a, s)? 50 Mockingbird’s decision problem in normal form • A = choice set = number of reservations to sell = res (rows of decision table) • S = state of nature = number of arrivals = arrivals (columns of decision table) • C = set of consequences = profit c(a, s) = c(Res, arrivals) = 225*Res – (900 + 100*min(arrivals,16)) – (325*max(0, arrivals -16)) • P(arrivals | res) = dbinom(arrivals, res, 0.96) 51 Mockingbird Overbooking Expected profit calculation when Res = 16 • R = 225*16 = 3600 • E(C1 | Res = 16) = 900 + 100*E(arrivals) = 900 + 100*np = 900 + 100*16*0.96 = 2436 • E(C2 | Res= 16) = 325*max(0, arrivals - 16) =0 • E(Profit | Res = 16) = 3600 - 2436-0 = 1164 52 Solution using R Profit = Rev = Ecost1 = Ecost2 = NULL for (Res in 1:20){ p = c1 = c2 = NULL # Variables to hold answers # loop over all acts # Reset variables to empty for (arrivals in 0:Res) { # loop over all states p[arrivals] = dbinom(arrivals, Res, 0.96) # calculate Pr(s | a) c1[arrivals] = 900 + 100*min(arrivals,16) # calculate c(a, s) c2[arrivals] = 325*max(0, arrivals-16)} Revenue = Res*225; EC1 = sum(p*c1); EC2 = sum(p*c2) Profit[Res] = Revenue - EC1 - EC2 # Store EU(a) = EU(c(a,s)) Rev[Res] = Revenue; Ecost1[Res] = EC1; Ecost2[Res] = EC2} plot(Profit) best_act <- which(Profit == max(Profit)) print(best_act) 53 Profit as a function of Res (reservations sold) 1000 Inspecting the answers: 0 Profit 500 > Profit[16] [1] 1164 > Profit[17] [1] 1180.593 > Profit[18] [1] 1125.245 > Profit[19] [1] 1045.438 -500 So, Mockingbird maximizes expected profit by selling 17 reservations, overbooking by 1 5 10 Index 15 20 54 Bayesian analysis and subjective probabilities 55 How to get needed probabilities? 1. Derive from other probabilities and models; condition on data – Bayes’ rule, decomposition and logic, event trees, fault trees, probability theory & models – Monte Carlo simulation models 2. Make them up (subjective probabilities), ask others (elicitation) – Calibration, biases (e.g., over-confidence) 3. Estimate them from data 56 Subjective EU theory • Theory: If preferences are coherent, then one should have coherent subjective probabilities for all events. – In theory, subjective probabilities can be quantified from preferences for betting on events vs. betting on outcomes of random numbers with known probabilities. • These subjective probabilities can be used to calculate subjective EU (SEU) for acts 57 Subjective conditional probabilities • Linda is 31 year old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice and also participated in antinuclear demonstrations. Use your judgment, conditioned on this information, to rank the following statements by their probability, from 1 = most probable to 8 = least probable… a. b. c. d. e. f. g. h. Linda is a teacher in an elementary school Linda works in a bookstore and takes Yoga classes Linda is active in the feminist movement Linda is a psychiatric social worker Linda is a member of the League of Women Voters Linda is a bank teller Linda in an insurance salesperson Linda is a bank teller and is active in the feminist movement 58 How do you rank the (conditional) probabilities? • Compare c, f, h – c Linda is active in the feminist movement – f Linda is a bank teller – h Linda is a bank teller and is active in the feminist movement • Pr(h) = Pr(c & f) = Pr(c)*Pr(f | c) = Pr(f)*Pr(c | f) • So Pr(h) cannot exceed Pr(f) • But subjectively assessed probabilities can (and often do) violate such logical constraints • Training improves consistency and calibration of subjective probability judgments 59 Tversky and Kahneman (1974) • Subjective estimates of probabilities are often/usually wrong – Overconfidence and poor calibration – Representativeness, availability, and anchoring heuristics bias subjective probability assessments • Incorrect priors – Example: Profit rates of companies • Ignoring relevant information (and base rates) – Scope insensitivity, affect heuristic • Seeking and using irrelevant information – Confirmation bias 60 Wrap-up on subjective probabilities • In theory, coherence of preferences implies that subjective probabilities and can be used to calculate subjective expected utilities to optimize choices • In reality, heuristics and biases make many/most subjective probabilities unreliable – Superforecasting discusses exceptions • We will emphasize probability models and statistical methods that rely on data rather than subjective judgments 61