Decision Analysis
Lecture 8
Tony Cox
My e-mail: [email protected]
Course web site: http://cox-associates.com/DA/

Agenda
• Charniak: 0.5 is correct (not a typo)
• Papers/projects; revised course schedule
• Homework and readings
• Car buying – solution
• Mini-review for midterm
• Other probability models and practice problems
– Discrete: geometric, Poisson
– Continuous: exponential, beta, normal

Projects

Papers and projects: 3 types
• Applied: Analyze decision using DA methods
• Review a book, explain its contributions
– Decision psychology, decision analysis with risk and uncertainty
• Research/review paper (3-5 articles)
– Explain a topic within decision analysis (e.g., Netica algorithms, multicriteria decision-making, commonly used utility functions, decision analysis of bargaining, auctions, risky investments, etc.)

Projects (cont.)
• Typical paper is about 10 pages, font 12, space 1.5. (This is typical, not required)
• Content matters; length does not
• Purposes:
1. Learn something interesting and useful;
2. Either explain/show what you learned, or show how to use it in practice (or both)

Project proposals due next week!
• If you have not yet done so, please send me a succinct description of what you want to do (and perhaps what you hope to learn by doing it).
• Due by end of day on Wednesday, March 15th (though sooner is welcome)
• Key dates: April 18 for rough draft (or very good outline)
• May 4, 8:00 PM for final

Revised course schedule
• March 14: No class: Take-home midterm (20%)
• March 28 14: Project/paper proposals due
• March 21: No class (Spring break)
• April 18: Draft of project/term paper due
• May 4: Project/term paper due by 8:00 PM (30%)
• May 9: Final Exam (20%)

Car buying – solution

Car buying
• You can buy a new car for $4000 or a used one for $2700.
• If the used car is good, you will spend only $750 on repairs. If the used car is bad, it will cost you $1750 in repairs.
• The prior probability that the used car is good is 0.4.
• The AAA offers a free road test that has a 90% chance of showing the correct state of the used car (good or bad). [So, it has a 10% error rate (shows good as bad or bad as good).]
• If you do the AAA test, there is a 1/3 probability that the car will be sold to someone else while you are waiting for the test; if so, then you have no other option but to buy the new car.
• Your Garage can test the used car now for free, with no chance that you will lose the opportunity to buy it. If it is good, they will say so; if it is bad, there is a 50% chance they say it is bad.
• What should you do to maximize EMV? What is the EMV of the optimal policy? (You can arrange at most one test.)

Solution strategy
• Setting up the problem: Very clear (and long) notation helps keep calculations clear when working with R
• Logic of set-up: Backward chaining
– Start with EMV formula you want to calculate
– Figure out how to get each piece from data you have
• Logic of solution calculations: Forward chaining
– Each step in calculation uses values from preceding steps; initial values are given in the problem data

Notation
# Notation and problem data
Pr_oldcar_is_good <- 0.4
Pr_oldcar_is_bad <- 1 - Pr_oldcar_is_good
Pr_AAA_says_good_if_oldcar_is_good <- 0.9
Pr_AAA_says_good_If_oldcar_is_bad <- 0.1
Pr_Garage_says_good_if_oldcar_is_good <- 1
Pr_Garage_says_good_If_oldcar_is_bad <- 0.5
Pr_old_car_remains_available_during_AAA_test <- 2/3
Pr_old_car_is_sold_during_AAA_test <- 1 - Pr_old_car_remains_available_during_AAA_test
cost_of_oldcar_if_good <- 3450   # 2700 purchase + 750 repairs
cost_of_oldcar_if_bad <- 4450    # 2700 purchase + 1750 repairs
cost_of_newcar <- 4000

Logic for Garage test (backward chaining)
# EMV_Garage = EMV(do Garage test, buy old car if test says good, else buy new)
# In each line below, unknown quantities are calculated from known ones on later lines
• EMV_Garage <- Pr_Garage_says_oldcar_is_good*EMV_oldcar_if_Garage_says_good + cost_of_newcar*(1 - Pr_Garage_says_oldcar_is_good)
• EMV_oldcar_if_Garage_says_good <- Pr_oldcar_is_good_if_Garage_says_good*cost_of_oldcar_if_good + Pr_oldcar_is_bad_if_Garage_says_good*cost_of_oldcar_if_bad
– Pr_oldcar_is_bad_if_Garage_says_good <- 1 - Pr_oldcar_is_good_if_Garage_says_good
– Pr_oldcar_is_good_if_Garage_says_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_Garage_says_oldcar_is_good
• Pr_Garage_says_oldcar_is_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good + Pr_Garage_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad

Evaluating Garage option (forward chaining)
# Each line can be evaluated from the results or data that precede it
Pr_Garage_says_oldcar_is_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good + Pr_Garage_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
Pr_oldcar_is_good_if_Garage_says_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_Garage_says_oldcar_is_good
Pr_oldcar_is_bad_if_Garage_says_good <- 1 - Pr_oldcar_is_good_if_Garage_says_good
EMV_oldcar_if_Garage_says_good <- Pr_oldcar_is_good_if_Garage_says_good*cost_of_oldcar_if_good + Pr_oldcar_is_bad_if_Garage_says_good*cost_of_oldcar_if_bad
EMV_Garage <- Pr_Garage_says_oldcar_is_good*EMV_oldcar_if_Garage_says_good + cost_of_newcar*(1 - Pr_Garage_says_oldcar_is_good)
> EMV_Garage
[1] 3915

Logic for AAA test (backward chaining)
# Calculate EMV_AAA = EMV(do AAA test, buy old car if test says good and it is still available, else buy new car for $4000)
# In each line below, unknown quantities are calculated from known ones on later lines
• EMV_AAA <- Pr_AAA_says_oldcar_is_good*(cost_of_newcar*Pr_old_car_is_sold_during_AAA_test + Pr_old_car_remains_available_during_AAA_test*EMV_oldcar_if_AAA_says_good) + cost_of_newcar*(1 - Pr_AAA_says_oldcar_is_good)
• EMV_oldcar_if_AAA_says_good <- Pr_oldcar_is_good_if_AAA_says_good*cost_of_oldcar_if_good + Pr_oldcar_is_bad_if_AAA_says_good*cost_of_oldcar_if_bad
– Pr_oldcar_is_good_if_AAA_says_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_AAA_says_oldcar_is_good
• Pr_AAA_says_oldcar_is_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good + Pr_AAA_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
– Pr_oldcar_is_bad_if_AAA_says_good <- 1 - Pr_oldcar_is_good_if_AAA_says_good

Evaluating AAA option (forward chaining)
# Calculate EMV_AAA = EMV(do AAA test, buy old car if test says good and it is still available, else buy new car for $4000)
Pr_AAA_says_oldcar_is_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good + Pr_AAA_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
Pr_oldcar_is_good_if_AAA_says_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_AAA_says_oldcar_is_good
Pr_oldcar_is_bad_if_AAA_says_good <- 1 - Pr_oldcar_is_good_if_AAA_says_good
EMV_oldcar_if_AAA_says_good <- Pr_oldcar_is_good_if_AAA_says_good*cost_of_oldcar_if_good + Pr_oldcar_is_bad_if_AAA_says_good*cost_of_oldcar_if_bad
EMV_AAA <- Pr_AAA_says_oldcar_is_good*(cost_of_newcar*Pr_old_car_is_sold_during_AAA_test + Pr_old_car_remains_available_during_AAA_test*EMV_oldcar_if_AAA_says_good) + cost_of_newcar*(1 - Pr_AAA_says_oldcar_is_good)

Numerical values
> Pr_AAA_says_oldcar_is_good
[1] 0.42
> Pr_oldcar_is_good_if_AAA_says_good
[1] 0.8571429
> Pr_oldcar_is_bad_if_AAA_says_good
[1] 0.1428571
> EMV_oldcar_if_AAA_says_good
[1] 3592.857
> EMV_AAA
[1] 3886

Decision table of expected costs
Decision | Expected cost
No test, buy new car | 4000
No test, buy used car | P(good)*(cost if good) + P(bad)*(cost if bad) = 0.4*3450 + 0.6*4450 = 4050
Do AAA test. Buy used car if test says good & used car is still available; else buy new | (2/3)*0.42*(0.857*3450 + 0.143*4450) + (2/3)*0.58*(4000) + (1/3)*4000 = 3886
Do Garage test. Buy used car if Garage test says good, else buy new | 0.7*(0.571*3450 + 0.429*4450) + 0.3*(4000) = 3915

Student solution 1
• Good use of Netica as a probability calculator
[Netica network panels. Prior: Used_car_state Good 40.0, Bad 60.0; AAA_test Good 42.0, Bad 58.0; Garage_test Good 70.0, Bad 30.0. With AAA_test set to Good: Used_car_state Good 85.7, Bad 14.3. With Garage_test set to Good: Used_car_state Good 57.1, Bad 42.9.]

Student solution 2

Student solution 3
[Netica influence diagram with nodes ChooseToTest (AAA, None, Garage), CarToBuy (Used, New), Diagnosis (TestsGood 42.0, TestsBad 58.0), LoseCar (Lost 33.3, NotLost 66.7), UsedCar (Good 40.0, Bad 60.0), and EMV; displayed expected value -3886.0]

Comments on car buying problem
• Information has value: The AAA test is so valuable that it is worth doing even at the risk of losing the option to buy the old car.
• More than one way to get the right answer. One student got correct answers treating “Test is correct” and “test is not correct” as the two states.
• System 1 is helpless for some problems.

Homework # 7 (Due by 4:00 PM, March 28)
• Problem
– Insurance
• Readings
– Required: Niu, 2005
    • www.utdallas.edu/~scniu/OPRE-6301/documents/Important_Probability_Distributions.pdf
    • Poisson, exponential, uniform, normal
    • Beta will be covered in class
– Required: Charniak (rest of paper, skim algorithms)
    • www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836
    • Factoring joint distributions (p. 56)
    • Influence diagrams (p. 60)
    • (NP hardness)

Insurance problem
• An EMV-maximizing sports promoter must decide if he should insure an event (“Rockies game”) against rain.
– If no rain, he makes $20,000 from net sales.
– If rain, he makes only $2,000 from net sales.
– The insurance policy costs $5000.
– The insurance policy pays him $20,000 if it rains, else $0
a) How large must p = probability of rain be for him to buy insurance? (Find the smallest such value of p.)
b) If p = 0.8, what is the most he should pay for the policy?
c) Before deciding whether to buy insurance, what is the most he should pay for a weather forecast (rain or not) if it has a 75% probability of being right? Assume p = 0.8.

Midterm exam

Midterm logistics
• Midterm problems will be posted tonight.
• Do your own work; do not discuss with anyone else
• E-mail me your answers (clearly summarized/highlighted) by 9:00 PM on March 17
• Showing work might help you get partial credit if needed.

Midterm format
• Designed to take < 2 hours
• Open-everything
– You may use R, Netica, textbooks, notes… whatever you might use in practice
• A few problems, intended to be straightforward

Midterm content
• Things you must know:
– Formulating and solving decision trees
– Normal form (table) analysis, EU theory
– Utility theory, risk premiums, certainty equivalents
– Risk profiles, stochastic dominance
– Conditional probability calculations
– Bayes’ Rule (manual and/or Netica)
– Formulating and solving (with R and/or formulas) binomial distribution models

Midterm content
• Things you will not be tested on:
– Heuristics and biases
– Other aspects of decision psychology
– Data interpretation (e.g., Simpson’s Paradox)
    • Simpson’s Paradox can be resolved using DAG (directed acyclic graph) models
    • It arises when relevant variables are omitted
– Causal analysis
– Simulation-optimization

Mini-review

Big picture: Rational decision framework
• What are the possible choices?
– Acts, actions, alternatives, decisions, decision rules, interventions, options, policies, etc.
• What are the possible consequences?
– Outcomes, gains/losses, results, rewards, returns, etc.
• How likely is each consequence for each choice?
• How desirable is each consequence?
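
To make the four framework questions concrete, here is a minimal R sketch that applies them to the two no-test options from the car-buying problem earlier in this lecture. The variable names (pr_state, cost, expected_cost, best_choice) are illustrative and do not come from the slides; the numbers are the problem data.

```r
# Sketch: choices, consequences, probabilities, and desirability (here: cost in $)
# for the no-test options of the car-buying problem. Names are illustrative.

# How likely is each state? (prior beliefs about the used car)
pr_state <- c(good = 0.4, bad = 0.6)

# What are the consequences of each choice in each state? (total cost in $)
cost <- rbind(
  buy_new  = c(good = 4000,       bad = 4000),        # new car cost ignores the state
  buy_used = c(good = 2700 + 750, bad = 2700 + 1750)  # purchase price + repairs
)

# Expected cost of each choice = sum over states of Pr(state) * cost
expected_cost <- drop(cost %*% pr_state)
print(expected_cost)

# An EMV-maximizing decision-maker picks the smallest expected cost
best_choice <- names(which.min(expected_cost))
print(best_choice)
```

The same layout (one row per choice, one column per state) extends to more choices or states; replacing costs with utilities turns the calculation into an expected-utility comparison.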

Rational decision-making
• “Rational” (consequence-driven) decision-making chooses actions based on their probable consequences
– Preferences for consequences, and beliefs about them, determine preferences for actions
[Diagram: nodes action (a), state (s, with probability Pr(s)), consequence (c), and value = u(c); each action is evaluated by EU(a)]

The essence of SEU
• Preferences for consequences represented by scores or numbers, called utilities
– “von Neumann-Morgenstern” (NM) utilities
– Notation: u(c) = utility for consequence c
• Beliefs are represented by probabilities
– Pr(c | a) = probability of consequence c if action a is taken; Pr(s) = Pr(state is s)
• Recommendation: Choose act with maximum expected utility

Five typical problem/skill types
• Calculate and compare expected values
– Buy risky prospect? How many spares?
• Calculate probabilities with logic (&, or, not)
• Calculate a conditional probability, P(A | B)
• Find P(a < X < b) if X has known distribution, using pdist(b) - pdist(a)
– Find probability of at least (or at most) x successes in n binomial trials (pbinom)
• Apply Bayes’ Rule

Using probability distributions
• If 10% of jelly beans are red, what is the probability that a well mixed bag of 10 jelly beans contains either 1 or 2 red ones?
• What is the probability that it contains more than 2 red ones?
• What is the probability of 0 reds?

Using probability distributions
• If 10% of jelly beans are red, what is the probability that a well mixed bag of 10 jelly beans contains either 1 or 2 red ones?
• pbinom(2, 10, 0.1) - pbinom(0, 10, 0.1) = 10*0.1*0.9^9 + (10*9/2)*(0.1^2)*(0.9)^8 = 0.5811307
• P(x > 2 red ones) = 1 - pbinom(2, 10, 0.1) = 1 - 0.5811307 - (0.9^10) = 0.070191
• P(0 reds) = dbinom(0, 10, 0.1) = 0.9^10 = 0.3486784

Applying Bayes’ Rule
• Bag A has 75% white marbles, 25% black
• Bag B has 25% white marbles, 75% black
• A bag is selected at random. We do not know which bag it is.
• 5 marbles are drawn (sampled) at random from the bag. 4 of them are white.
• What is the probability that it is bag A?

Applying Bayes’ Rule
• Bag A has 75% white marbles, 25% black
• Bag B has 25% white marbles, 75% black
• 4 of 5 sampled marbles are white.
• What is the probability that it is bag A?
• P(A | 4 of 5 white)
= P(4 of 5 white | A)P(A) / [P(4 of 5 white | A)P(A) + P(4 of 5 white | B)P(B)]
= P(4 of 5 white | A) / [P(4 of 5 white | A) + P(4 of 5 white | B)]   (since P(A) = P(B) = 0.5)
= 5*(0.75^4)*(0.25) / (5*(0.75^4)*(0.25) + 5*(0.25^4)*(0.75))
= 0.9642857
http://www.eecs.qmul.ac.uk/~norman/BBNs/Bayes_Rule_Example.htm

Other typical Bayes’ Rule examples
• P(smoker | lung cancer)
• P(item from machine A | defective)
• P(steroid use | test result)
• P(disease | symptom)
• P(rain | forecast)
• P(submarine is of type x | signal is y)
http://stattrek.com/probability/bayes-theorem.aspx

Midterm… Here it is!

Midterm Problem # 1
Consider a choice between the following two gambles:
• Gamble 1: 50% probability of winning $1, 50% probability of losing $0.60
• Gamble 2: 50% probability of winning $10, 50% probability of losing $5
a. Calculate the expected monetary value (EMV) of each gamble, 1 and 2.
b. If your utilities for the four possible outcomes are u(10) = 1, u(1) = 0.2, u(-0.60) = -0.1, and u(-5) = -1, calculate the expected utility (EU) of each gamble, 1 and 2. Which one should you choose?

Midterm Problem # 2
Suppose that you are indifferent between receiving $40 for certain and receiving a 50-50 chance of either $100 or $0.
(a) What is your risk premium for this uncertain prospect (50-50 chance of $100 or $0)?
(b) Does your preference pattern here exhibit risk aversion, risk proneness, risk neutrality, or is there insufficient information to be sure?

Midterm Problem # 3: Attack the Death Star?
• The Rebel Alliance will attack the Death Star only if it has at least a 95% probability of destroying it. Each missile independently has a 60% probability of hitting and destroying the Death Star.
• (a) What is the smallest number of missiles that the Alliance must fire to have at least a 95% probability of destroying the Death Star?
• (b) With this salvo size (number of missiles), what is the expected number of hits?

Midterm Problem # 4: Disease testing
• Mary tests positive for a disease
• P(test is positive | disease) = 0.95
• P(test is negative | no disease) = 0.90
• Mary is a randomly chosen woman from a population in which 3% have the disease.
• Based on this information, what is the posterior probability (given the positive test result) that Mary has the disease?

Midterm Problem # 5: Rocket launch decision
• An unmanned rocket is being launched.
• An unreliable warning light has come on.
– Light comes on with probability 1/2 if rocket has a problem
– Light comes on with probability 1/3 if no problem.
• Goal is to minimize expected loss.
– Loss = 2 if no launch when there is no problem
– Loss = 5 if rocket launched when there is a problem
– Loss = 0 otherwise.
• The prior probability of problem is p.
• If p = 0.2, should the rocket be launched even after warning light comes on?
• How small must p be to justify launching even if the warning light comes on?

Good luck!

A non-exam concept question
• We will toss a fair six-sided die (equally likely to give outcomes 1-6, each with probability 1/6)
• You may bet on either A or B
• Must all EU-maximizing decision-makers (risk-averse, risk-neutral, or risk-seeking) necessarily agree which choice, A or B, is better?
[Table of payoffs for bets A and B by die outcome, flattened in extraction: A B 1 $1.1 $2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 1]

What does FSD say?
• B dominates A by FSD
• All EU-maximizing decision makers who prefer more to less satisfy FSD, and therefore should prefer B.
– Ignore irrational regret/disappointment!
[Same payoff table for bets A and B as on the previous slide]

Another also-ran problem
• For $25, you can buy a raffle ticket that gives probability 1/3 of winning $125 (and probability 2/3 of winning nothing).
• Assume that your initial wealth is $200.
• Your utility function for final wealth is u(200 + x) = ln(200 + x), where x is the change in wealth from this transaction (i.e., $100 or -$25 if you buy the ticket, or $0 if you do not).
• What are the value (certainty equivalent, CE) and risk premium (RP) for buying the raffle ticket? Should you buy it?

Solution to CE problem
• For $25, you can buy a raffle ticket that gives probability 1/3 of winning $125 (and probability 2/3 of winning nothing).
• u(200 + x) = ln(200 + x)
• EMV = (1/3)*(100) + (2/3)*(-25) = $16.67
• log(200 + CE) = (1/3)*log(200 + 100) + (2/3)*log(200 - 25) = 5.344451
• CE = exp(5.344451) - 200 = $9.44
• RP = EMV - CE = 16.667 - 9.443 = $7.22
• CE > 0, so you should buy the ticket.

Other probability models

Preview of applied probability models
• Geometric distribution: Time until first occurrence in binomial trials
• Poisson distribution: Number of rare events or purely random, independent events in a time interval (fires, Geiger counter clicks)
• Poisson process: Random arrivals
• Exponential distribution: Time between arrivals in a Poisson process
• Normal distribution: Sum of independent random variables
• Beta distribution: Conjugate priors and posteriors for success probability in binomial trials

Waiting time until first success in a binomial process
• If probability of success on each try is p, then what is the probability that the first success occurs on trial n?
• Pr(first success on trial n) = Pr(no successes in first (n - 1) trials & success on trial n) = (1 - p)^(n - 1)*p
– Check: Sum of (1 - p)^(n - 1) from n = 1 to infinity is 1/(1 - (1 - p)) = 1/p, so sum over n of Pr(first success on trial n) = p/p = 1
– This is the geometric distribution

Example: Landlord vs. dog
• You need to keep a dog in your apartment for 12 more days, until the end of the month
• Each day, there is a probability 1/10 that the landlord will stop by.
• What is the probability that the landlord’s next visit does not occur until after the dog has left (i.e., after 12 days)?
• 1 - pgeom(11, 0.1) = 0.9^12 = 0.2824
– In R, dgeom(x, p) = p*(1-p)^x, where x is the number of failures before the first success

Poisson process
• Model for “purely random” occurrences with known intensity, m
– m = average occurrences per unit time
• The intensity, m, of a Poisson arrival process is usually denoted by a Greek letter such as λ.
• Expected number of arrivals in an interval of length t is mt
• Probability of no arrivals in [0, t] is exp(-mt)
• So, Pr(arrival by t) is F(t) = 1 - exp(-mt)

Exponential distribution
• If the probability that random variable T (e.g., failure time of a component) is at most t is P(T ≤ t) = 1 - exp(-mt), then T is said to have an exponential distribution.
– CDF is F(t) = 1 - exp(-mt) = pexp(t, m)
– PDF is m*exp(-mt) = dexp(t, m)
– Exponential distribution is memoryless: The remaining time until event occurs does not depend on how much time has already passed

Example calculation with exponential distribution
• Example: Customers arrive at an average rate of 2 arrivals per hour.
– This is a Poisson process
• What is the probability of no customers in first 15 minutes?
• Solution: 1 - pexp(0.25, 2) = exp(-2*0.25) = 0.6065307
• The time between arrivals in a Poisson process has an exponential distribution

Explanation for exp(-mt)
• Let S(t) = survival function = P(time of first occurrence > t) = 1 - F(t)
• For a pure random process, S(2*t) = [S(t)]^2 and S(n*t) = [S(t)]^n
– Also S(0) = 1, and S(infinity) = 0
• So, S(t) = a^(kt) could work
• But for small t, 1 - S(t) should be proportional to t. And E(T) should be 1/m.
• S(t) = exp(-mt) satisfies these conditions.

Poisson process (cont.)
• P(k arrivals in interval of length t) = exp(-mt)*(mt)^k/k! = dpois(k, m*t)
• More generally, if the probability of k occurrences is exp(-m)*m^k/k! = dpois(k, m), then the number of occurrences has a Poisson distribution with mean m (and possible values of k = 0, 1, 2, ….)
– Describes random number of independent rare events (fires or burglaries in a city, deaths from horse kicks in a year, etc.) quite well

Using probability distributions
• It is Saturday PM, and Joe wants to take a nap… but only if he can nap for at least an hour without being disturbed by the phone.
• Phone calls arrive at an average rate of 1 call every 2 hours
• Joe’s utilities are: 0 if starts to nap and call arrives in < 1 hour; 0.5 if no nap; and 1 if naps for > 1 hour with no call.
• What should Joe do? (Nap or no nap?)

Joe’s nap -- Solution
• P(call arrives in < 1 hour | 0.5 calls/hr.) = 1 - exp(-0.5) = pexp(1, 0.5) = 0.393469
• EU(nap) = 0.393469*0 + (1 - 0.393469)*1 = 0.60653
• This is greater than EU(no nap) = 0.5, so Joe should risk taking a nap.

Example calculation with Poisson distribution
• A roll of fabric has an average defect (blemish) rate of 1 blemish per 27 yards
• What is the probability of 9 blemishes in 200 yards of the fabric?
• Answer: dpois(9, 200/27) = 0.1122631
• What is the probability of 9 or more blemishes?
• Answer: 1 - ppois(8, 200/27) = 0.325359

Explanation of Poisson distribution
• The Poisson distribution with mean m can be obtained from the binomial distribution by holding np fixed (at value m = np) and letting n approach infinity and p approach 0. (Limit of binomial process as time steps approach zero length)
• Intuitively, exp(-m)*m^k/k! gives probability of exactly k occurrences (in any order) if expected number is m.

Protecting worker health
• To protect worker health, concentrations of crystalline silica (quartz) dust particles in air at a mine are monitored by a sampling apparatus, with the goal of triggering an alarm whenever the concentration exceeds 1 particle per liter of air.
• If the true average concentration in air is 6 particles per liter, what is the probability that a 1-liter sample will have < 2 particles?

Protecting worker health
• If the true average concentration of crystalline silica (quartz) dust particles in air is 6 particles per liter, what is the probability that a 1-liter sample will have less than 2 particles?
• Solution: ppois(1, 6) = exp(-6) + 6*exp(-6) = 0.01735127
– dpois(x, 6) = exp(-6)*(6^x)/factorial(x)
– R code to tabulate the probabilities of 0, 1, …, 14 particles:
mu = 6; p = NULL
for (r in 1:15) { p[r] = exp(-mu)*(mu^(r-1))/factorial(r-1) }   # p[r] = P(r - 1 particles)
p

Normal distribution, N(μ, σ^2)
• A continuous distribution, like unif and exp
• Describes the distribution of a sum of independent random variables (with finite means and variances)
• “Central Limit Theorem(s)” prove that such sums approach normal distributions
– Assuming finite means and variances
– Some “power law” or “heavy-tailed” distributions lack finite means or variances and are exceptions
• Notation: N(μ, σ^2) = normal distribution with mean μ and variance σ^2

Variance
• A normal distribution is specified by two parameters: its mean and its variance
– Mean is usually denoted by μ or by E(X)
– Variance is usually denoted by σ^2 or Var(X)
– Variance is defined as: Mean squared error around mean = E[(X - E(X))^2] = E[(X - μ)^2]
– Standard deviation = σ = s.d. = square root of variance
• 95% of normal distribution falls within 1.96 standard deviations of its mean

Algebra of means and variances
• If X is a random variable with expected value E(X), then E(aX + b) = aE(X) + b
• If X is a random variable with variance Var(X), then Var(aX + b) = a^2*Var(X)
• If X has a normal distribution with mean μ and variance σ^2, then (X - μ)/σ has a “standard normal distribution” with mean 0 and variance 1

dnorm(x, mean, sd)
x = c(0:100)/10; m = mean(x); x = x - m
y = dnorm(x, 0, 1); plot(x, y)
[Plot: y = dnorm(x, 0, 1) against x, the bell-shaped standard normal density; x axis from -4 to 4, y axis from 0.0 to 0.4]

Effects of different variances on normal PDF and CDF
http://en.wikipedia.org/wiki/Normal_distribution

Does batch meet buyer specifications?
• The lifetimes of a large batch of electronic components are approximately normally distributed with mean 500 days and standard deviation of 50 days.
• The buyer requires at least 95% of them to have a lifetime greater than 400 days.
• Should the buyer accept this batch (i.e., do at least 95% of the components have lifetimes greater than 400 days)?

Solving batch acceptance using pnorm
• The lifetimes of a batch of electronic components are approximately normally distributed with mean 500 days and standard deviation of 50 days.
• Does this batch meet the buyer’s requirement that at least 95% of them have a lifetime greater than 400 days?
• Solution: P(T > 400) = 1 - P(T ≤ 400) = 1 - pnorm(400, 500, 50) = 0.9772499, so Yes.

Solution using standard normal, N(0, 1)
A different way:
• P(T > 400) = P(T > mean - 2 sd)
> 1 - pnorm(-2, 0, 1)
[1] 0.9772499
• The probability of being above 2 sd below the mean is 0.977, which is greater than the required 0.95, so the lot is acceptable.
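
The two routes to the batch-acceptance answer can be checked side by side in R. The sketch below is illustrative: the variable names are not from the slides, and the quantile check with qnorm is an added cross-check rather than part of the original solution.

```r
# Sketch: batch acceptance for lifetimes ~ N(500, 50^2), requirement P(T > 400) >= 0.95.
# Variable names are illustrative; the qnorm check is an extra cross-check.
mean_life <- 500
sd_life   <- 50
threshold <- 400
required  <- 0.95

# Route 1: work directly on the original scale
p_direct <- 1 - pnorm(threshold, mean_life, sd_life)

# Route 2: standardize first, then use N(0, 1)
z <- (threshold - mean_life) / sd_life   # number of s.d. from the mean (here -2)
p_standard <- 1 - pnorm(z, 0, 1)

# Cross-check: lifetime exceeded by 95% of components (the 5th percentile)
life_exceeded_by_95pct <- qnorm(1 - required, mean_life, sd_life)

accept <- p_direct >= required
print(c(p_direct = p_direct, p_standard = p_standard))
print(life_exceeded_by_95pct)
print(accept)
```

Both routes give the same probability because standardizing only rescales the variable; the quantile check asks the equivalent question of whether the 5th-percentile lifetime exceeds 400 days.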

Solution via simulation using rnorm
• New approach: Solution via simulation in R
x = y = n = NULL; n = 100000    # n = simulation size
x = rnorm(n, 500, 50)           # simulates normal values
for (i in 1:n) { if (x[i] > 400) y[i] = 1 else y[i] = 0 }
mean(y)                         # quantifies compliance fraction
[1] 0.9774
• Solution: Simulated P(T > 400) ≈ 0.98, so Yes, the batch should be accepted.

Applications of normal distribution in statistics
• The sample mean (x1 + x2 + … + xn)/n is approximately normally distributed with mean E(X) and variance Var(X)/n
– Assuming independent random samples
– Sample mean is unbiased estimate of E(X)
• Special case: Sample proportions
• Normal approximation to binomial: If np > 5 and n(1 - p) > 5, binomial is approximately normal, mean = np, Var = n*p*(1-p)

Applications of normal distribution
• Rule of thumb: about 95% probability that a value lies within 2 standard deviations of the mean
• Storage processes (e.g., dams)
– Amount in inventory = sum of many independent contributions and withdrawals
• Insurance pool losses
– Amount paid out in a year = sum of losses on many independent policies
• Diffusion processes (e.g., random walk)
– Stock price movements

Example calculation using normal
• A machine produces parts with widths normally distributed, having mean 4 mm and s.d. 0.0019 mm.
• A part fails if its width is more than 0.005 millimeters away from 4 mm.
• What is the probability that a part fails?

Example calculation using normal
• Part fails if its width is more than 0.005/0.0019 = 2.6316 standard deviations away from mean.
• For standard normal distribution, N(0, 1), P(X is more than 2.6316 s.d. from mean) = pnorm(-2.6316, 0, 1) + 1 - pnorm(2.6316, 0, 1) = 2*pnorm(-2.6316, 0, 1) = 0.008498385

Example calculation using normal
• In general, P(normally distributed X lies more than d standard deviations from mean) = 2*pnorm(-d, 0, 1)
• Example: If d = 0.005/0.0026 = 1.923 standard deviations, then probability of failure is 2*pnorm(-1.923, 0, 1) = 0.0545

Example calculation
• Purchases at a retail store have a mean of $14.31 and a standard deviation of $6.40. Amounts are approximately normally distributed. What percentage of purchases are under $10?
• The probability of being at least 4.31/6.4 standard deviations below the mean is: pnorm(-4.31/6.4, 0, 1) = 0.2503345, so about 25% of purchases are under $10.
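
The last few normal-distribution examples all reduce to one-sided or two-sided tail areas, which can be wrapped in a small helper. This is a minimal sketch; the function name two_sided_tail and the variable names are illustrative, not from the slides.

```r
# Sketch: tail probabilities for the part-width and purchase examples.
# two_sided_tail() is a hypothetical helper name chosen for illustration.

# P(normal X lies more than d standard deviations from its mean), by symmetry
two_sided_tail <- function(d) 2 * pnorm(-d, 0, 1)

# Part fails if width is more than 0.005 mm from the mean
p_fail_sd_0019 <- two_sided_tail(0.005 / 0.0019)   # s.d. = 0.0019 mm
p_fail_sd_0026 <- two_sided_tail(0.005 / 0.0026)   # s.d. = 0.0026 mm

# Purchases ~ N(14.31, 6.40^2): fraction under $10, computed two equivalent ways
p_under_10_direct   <- pnorm(10, mean = 14.31, sd = 6.40)
p_under_10_standard <- pnorm((10 - 14.31) / 6.40, 0, 1)

print(c(p_fail_sd_0019, p_fail_sd_0026))
print(c(p_under_10_direct, p_under_10_standard))
```

Passing the unrounded ratio 0.005/0.0019 rather than 2.6316 changes the result only in the later decimal places.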