Decision Analysis
Lecture 8
Tony Cox
My e-mail: [email protected]
Course web site: http://cox-associates.com/DA/
Agenda
• Charniak: 0.5 is correct (not a typo)
• Papers/projects; revised course schedule
• Homework and readings
• Car buying – solution
• Mini-review for midterm
• Other probability models and practice problems
  – Discrete: geometric, Poisson
  – Continuous: exponential, beta, normal
2
Projects
3
Papers and projects: 3 types
• Applied: Analyze decision using DA methods
• Review a book, explain its contributions
– Decision psychology, decision analysis with risk
and uncertainty
• Research/review paper (3-5 articles)
– Explain a topic within decision analysis (e.g.,
Netica algorithms, multicriteria decision-making,
commonly used utility functions, decision analysis
of bargaining, auctions, risky investments, etc.)
4
Projects (cont.)
• A typical paper is about 10 pages, 12-point font,
1.5 line spacing. (This is typical, not required.)
• Content matters; length does not
• Purposes:
1. Learn something interesting and useful;
2. Either explain/show what you learned, or
show how to use it in practice (or both)
5
Project proposals due next week!
• If you have not yet done so, please send
me a succinct description of what you
want to do (and perhaps what you hope to
learn by doing it).
• Due by end of day on Wednesday, March
15th (though sooner is welcome)
• Key dates: April 18 for rough draft (or very
good outline)
• May 4, 8:00 PM for final
6
Revised course schedule
• March 14: No class: Take-home midterm (20%)
• March 28 14: Project/paper proposals due
• March 21: No class (Spring break)
• April 18: Draft of project/term paper due
• May 4: Project/term paper due by 8:00 PM (30%)
• May 9: Final Exam (20%)
7
Car buying – solution
8
Car buying
• You can buy a new car for $4000 or a used one for $2700.
• If the used car is good, you will spend only $750 on repairs.
If the used car is bad, it will cost you $1750 in repairs.
• The prior probability that the used car is good is 0.4.
• The AAA offers a free road test that has a 90% probability of showing the correct state
of the used car (good or bad). [So, it has a 10% error rate (shows good as
bad or bad as good).]
• If you do the AAA test, there is a 1/3 probability that the car will be sold to
someone else while you are waiting for the test; if so, then you have no other
option but to buy the new car.
• Your Garage can test the used car now for free, with no chance that you will
lose the opportunity to buy it. If it is good, they will say so; if it is bad, there
is a 50% chance they say it is bad.
• What should you do to maximize EMV? What is the EMV of the optimal
policy? (You can arrange at most one test.)
Solution strategy
• Setting up the problem: Very clear (and long)
notation helps keep calculations straight when
working with R
• Logic of set-up: Backward chaining
– Start with EMV formula you want to calculate
– Figure out how to get each piece from data you have
• Logic of solution calculations: Forward chaining
– Each step in calculation uses values from preceding
steps; initial values are given in the problem data
10
Notation
# Notation and problem data
• Pr_oldcar_is_good <- 0.4
• Pr_oldcar_is_bad <- 1 - Pr_oldcar_is_good
• Pr_AAA_says_good_if_oldcar_is_good <- 0.9
• Pr_AAA_says_good_If_oldcar_is_bad <- 0.1
• Pr_Garage_says_good_if_oldcar_is_good <- 1
• Pr_Garage_says_good_If_oldcar_is_bad <- 0.5
• Pr_old_car_remains_available_during_AAA_test <- 2/3
• Pr_old_car_is_sold_during_AAA_test <- 1 - Pr_old_car_remains_available_during_AAA_test
• cost_of_oldcar_if_good <- 3450
• cost_of_oldcar_if_bad <- 4450
• cost_of_newcar <- 4000
11
Logic for Garage test
(backward chaining)
# EMV_Garage = EMV(do Garage test, buy old car if test says good, else buy new)
# In each line below, unknown quantities to be calculated from known ones are in bold
• EMV_Garage <- Pr_Garage_says_oldcar_is_good*EMV_oldcar_if_Garage_says_good +
  cost_of_newcar*(1 - Pr_Garage_says_oldcar_is_good)
• EMV_oldcar_if_Garage_says_good <- Pr_oldcar_is_good_if_Garage_says_good*cost_of_oldcar_if_good +
  Pr_oldcar_is_bad_if_Garage_says_good*cost_of_oldcar_if_bad
  – Pr_oldcar_is_bad_if_Garage_says_good <- 1 - Pr_oldcar_is_good_if_Garage_says_good
  – Pr_oldcar_is_good_if_Garage_says_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_Garage_says_oldcar_is_good
• Pr_Garage_says_oldcar_is_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good +
  Pr_Garage_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
12
Evaluating Garage option
(forward chaining)
# Each line can be evaluated from the results or data that precede it
• Pr_Garage_says_oldcar_is_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good +
  Pr_Garage_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
• Pr_oldcar_is_good_if_Garage_says_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good/
  Pr_Garage_says_oldcar_is_good
• Pr_oldcar_is_bad_if_Garage_says_good <- 1 - Pr_oldcar_is_good_if_Garage_says_good
• EMV_oldcar_if_Garage_says_good <- Pr_oldcar_is_good_if_Garage_says_good*cost_of_oldcar_if_good +
  Pr_oldcar_is_bad_if_Garage_says_good*cost_of_oldcar_if_bad
• EMV_Garage <- Pr_Garage_says_oldcar_is_good*EMV_oldcar_if_Garage_says_good +
  cost_of_newcar*(1 - Pr_Garage_says_oldcar_is_good)
> EMV_Garage
[1] 3915
13
Logic for AAA test
(backward chaining)
# Calculate EMV_AAA = EMV(do AAA test, buy old car if test says good and it is still available,
# else buy new car for $4000)
# In each line below, unknown quantities to be calculated from known ones are in bold
• EMV_AAA <- Pr_AAA_says_oldcar_is_good*(cost_of_newcar*Pr_old_car_is_sold_during_AAA_test +
  Pr_old_car_remains_available_during_AAA_test*EMV_oldcar_if_AAA_says_good) +
  cost_of_newcar*(1 - Pr_AAA_says_oldcar_is_good)
• EMV_oldcar_if_AAA_says_good <- Pr_oldcar_is_good_if_AAA_says_good*cost_of_oldcar_if_good +
  Pr_oldcar_is_bad_if_AAA_says_good*cost_of_oldcar_if_bad
  – Pr_oldcar_is_good_if_AAA_says_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_AAA_says_oldcar_is_good
• Pr_AAA_says_oldcar_is_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good +
  Pr_AAA_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
  – Pr_oldcar_is_bad_if_AAA_says_good <- 1 - Pr_oldcar_is_good_if_AAA_says_good
14
Evaluating AAA option
(forward chaining)
# Calculate EMV_AAA = EMV(do AAA test, buy old car if test says good and it is still available,
# else buy new car for $4000)
• Pr_AAA_says_oldcar_is_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good +
  Pr_AAA_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
• Pr_oldcar_is_good_if_AAA_says_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good/Pr_AAA_says_oldcar_is_good
• Pr_oldcar_is_bad_if_AAA_says_good <- 1 - Pr_oldcar_is_good_if_AAA_says_good
• EMV_oldcar_if_AAA_says_good <- Pr_oldcar_is_good_if_AAA_says_good*cost_of_oldcar_if_good +
  Pr_oldcar_is_bad_if_AAA_says_good*cost_of_oldcar_if_bad
• EMV_AAA <- Pr_AAA_says_oldcar_is_good*(cost_of_newcar*Pr_old_car_is_sold_during_AAA_test +
  Pr_old_car_remains_available_during_AAA_test*EMV_oldcar_if_AAA_says_good) +
  cost_of_newcar*(1 - Pr_AAA_says_oldcar_is_good)
15
Numerical values
> Pr_AAA_says_oldcar_is_good
[1] 0.42
> Pr_oldcar_is_good_if_AAA_says_good
[1] 0.8571429
> Pr_oldcar_is_bad_if_AAA_says_good
[1] 0.1428571
> EMV_oldcar_if_AAA_says_good
[1] 3592.857
> EMV_AAA
[1] 3886
16
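For reference, here is a consolidated, copy-and-paste runnable version of the R calculations on the preceding slides (same notation, data, and formulas); it reproduces the expected costs EMV_Garage = 3915 and EMV_AAA = 3886:

# Problem data
Pr_oldcar_is_good <- 0.4
Pr_oldcar_is_bad <- 1 - Pr_oldcar_is_good
Pr_AAA_says_good_if_oldcar_is_good <- 0.9
Pr_AAA_says_good_If_oldcar_is_bad <- 0.1
Pr_Garage_says_good_if_oldcar_is_good <- 1
Pr_Garage_says_good_If_oldcar_is_bad <- 0.5
Pr_old_car_remains_available_during_AAA_test <- 2/3
Pr_old_car_is_sold_during_AAA_test <- 1 - Pr_old_car_remains_available_during_AAA_test
cost_of_oldcar_if_good <- 3450
cost_of_oldcar_if_bad <- 4450
cost_of_newcar <- 4000

# Garage option (forward chaining)
Pr_Garage_says_oldcar_is_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good +
  Pr_Garage_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
Pr_oldcar_is_good_if_Garage_says_good <- Pr_Garage_says_good_if_oldcar_is_good*Pr_oldcar_is_good /
  Pr_Garage_says_oldcar_is_good
Pr_oldcar_is_bad_if_Garage_says_good <- 1 - Pr_oldcar_is_good_if_Garage_says_good
EMV_oldcar_if_Garage_says_good <- Pr_oldcar_is_good_if_Garage_says_good*cost_of_oldcar_if_good +
  Pr_oldcar_is_bad_if_Garage_says_good*cost_of_oldcar_if_bad
EMV_Garage <- Pr_Garage_says_oldcar_is_good*EMV_oldcar_if_Garage_says_good +
  cost_of_newcar*(1 - Pr_Garage_says_oldcar_is_good)

# AAA option (forward chaining)
Pr_AAA_says_oldcar_is_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good +
  Pr_AAA_says_good_If_oldcar_is_bad*Pr_oldcar_is_bad
Pr_oldcar_is_good_if_AAA_says_good <- Pr_AAA_says_good_if_oldcar_is_good*Pr_oldcar_is_good /
  Pr_AAA_says_oldcar_is_good
Pr_oldcar_is_bad_if_AAA_says_good <- 1 - Pr_oldcar_is_good_if_AAA_says_good
EMV_oldcar_if_AAA_says_good <- Pr_oldcar_is_good_if_AAA_says_good*cost_of_oldcar_if_good +
  Pr_oldcar_is_bad_if_AAA_says_good*cost_of_oldcar_if_bad
EMV_AAA <- Pr_AAA_says_oldcar_is_good*(cost_of_newcar*Pr_old_car_is_sold_during_AAA_test +
  Pr_old_car_remains_available_during_AAA_test*EMV_oldcar_if_AAA_says_good) +
  cost_of_newcar*(1 - Pr_AAA_says_oldcar_is_good)

c(EMV_Garage = EMV_Garage, EMV_AAA = EMV_AAA)   # expected costs: 3915 and 3886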
Decision table of expected costs
Decision: Expected cost
• No test, buy new car: 4000
• No test, buy used car: P(good)*(cost if good) + P(bad)*(cost if bad) = 0.4*3450 + 0.6*4450 = 4050
• Do AAA test; buy used car if test says good & used car is still available, else buy new:
  (2/3)*0.42*(0.857*3450 + 0.143*4450) + (2/3)*0.58*(4000) + (1/3)*4000 = 3886
• Do Garage test; buy used car if Garage test says good, else buy new:
  0.7*(0.571*3450 + 0.429*4450) + 0.3*(4000) = 3915
[Netica screenshots: prior Used_car_state (Good 40.0, Bad 60.0) and AAA_test (Good 42.0, Bad 58.0); with AAA_test set to Good (100), the posterior Used_car_state is (Good 85.7, Bad 14.3). Likewise, Garage_test is (Good 70.0, Bad 30.0); with Garage_test set to Good (100), the posterior Used_car_state is (Good 57.1, Bad 42.9).]
Student solution 1
Good use of Netica as a probability calculator
18
Student solution 2
19
Student solution 3
[Netica influence diagram: decision nodes ChooseToTest (AAA, None, Garage) and CarToBuy (Used, New); chance nodes Diagnosis (TestsGood 42.0, TestsBad 58.0), LoseCar (Lost 33.3, NotLost 66.7), and UsedCar (Good 40.0, Bad 60.0); value node EMV. The best expected value shown is -3886.0, i.e., an expected cost of 3886.]
20
Comments on car buying problem
• Information has value: The AAA test is so
valuable that it is worth doing even at the
risk of losing the option to buy the old car.
• More than one way to get the right answer.
One student got correct answers treating
“Test is correct” and “test is not correct” as
the two states.
• System 1 is helpless for some problems.
21
Homework # 7
(Due by 4:00 PM, March 28)
• Problem
– Insurance
• Readings
– Required: Niu, 2005
• www.utdallas.edu/~scniu/OPRE-6301/documents/Important_Probability_Distributions.pdf
• Poisson, exponential, uniform, normal
• Beta will be covered in class
– Required: Charniak (rest of paper, skim algorithms)
  • www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836
  • Factoring joint distributions (p. 56)
  • Influence diagrams (p. 60)
  • (NP hardness)
22
Insurance problem
• An EMV-maximizing sports promoter must decide if he
should insure an event (“Rockies game”) against rain.
– If no rain, he makes $20,000 from net sales.
– If rain, he makes only $2,000 from net sales.
– The insurance policy costs $5000.
– The insurance policy pays him $20,000 if it rains, else $0
a) How large must p = probability of rain be for him to buy
insurance? (Find the smallest such value of p.)
b) If p = 0.8, what is the most he should pay for the policy?
c) Before deciding whether to buy insurance, what is the most
he should pay for a weather forecast (rain or not) if it has a
75% probability of being right? Assume p = 0.8.
23
Midterm exam
24
Midterm logistics
• Midterm problems will be posted tonight.
• Do your own work; do not discuss with
anyone else
• E-mail me your answers (clearly
summarized/highlighted) by 9:00 PM on
March 17
• Showing work might help you get partial
credit if needed.
25
Midterm format
• Designed to take < 2 hours
• Open-everything
– You may use R, Netica, textbooks, notes…
whatever you might use in practice
• A few problems, intended to be
straightforward
26
Midterm content
• Things you must know:
– Formulating and solving decision trees
– Normal form (table) analysis, EU theory
– Utility theory, risk premiums, certainty
equivalents
– Risk profiles, stochastic dominance
– Conditional probability calculations
– Bayes’ Rule (manual and/or Netica)
– Formulating and solving (with R and/or
formulas) binomial distribution models
27
Midterm content
• Things you will not be tested on:
– Heuristics and biases
– Other aspects of decision psychology
– Data interpretation (e.g., Simpson’s Paradox)
• Simpson’s Paradox can be resolved using DAG
(directed acyclic graph) models
• It arises when relevant variables are omitted
– Causal analysis
– Simulation-optimization
28
Mini-review
29
Big picture:
Rational decision framework
• What are the possible choices?
– Acts, actions, alternatives, decisions, decision
rules, interventions, options, policies, etc.
• What are the possible consequences?
– Outcomes, gains/losses, results, rewards,
returns, etc.
• How likely is each consequence for each
choice?
• How desirable is each consequence?
30
Rational decision-making
• “Rational” (consequence-driven) decision-making
chooses actions based on their probable consequences
  – Preferences for consequences, and beliefs
    about them, determine preferences for actions

            value
              ↑
  action → consequence ← state
31
Rational decision-making
• “Rational” (consequence-driven) decision-making
chooses actions based on their probable consequences
  – Preferences for consequences, and beliefs
    about them, determine preferences for actions

      value = u(c)  →  EU(a)
            ↑            ↑
  action → consequence ← state
      a         c        Pr(s)
The essence of SEU
• Preferences for consequences represented
by scores or numbers, called utilities
– “von Neumann-Morgenstern” (NM) utilities
– Notation: u(c) = utility for consequence c
• Beliefs are represented by probabilities
– Pr(c | a) = probability of consequence c if
action a is taken; Pr(s) = Pr(state is s)
• Recommendation: Choose act with
maximum expected utility
33
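A minimal numeric sketch of this recommendation in R, using hypothetical probabilities and utilities (all numbers below are made up purely for illustration):

# Hypothetical example: two acts, three states with probabilities Pr(s)
Pr_s <- c(0.2, 0.5, 0.3)               # beliefs: Pr(state s)
u_act1 <- c(1.0, 0.4, 0.0)             # u(c) for act 1's consequence in each state
u_act2 <- c(0.7, 0.6, 0.2)             # u(c) for act 2's consequence in each state
EU <- c(act1 = sum(Pr_s * u_act1),     # expected utility of each act
        act2 = sum(Pr_s * u_act2))
EU
names(which.max(EU))                   # SEU recommendation: the act with maximum EU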
Five typical problem/skill types
• Calculate and compare expected values
– Buy risky prospect? How many spares?
• Calculate probabilities with logic (&, or, not)
• Calculate a conditional probability, P(A | B)
• Find P(a < X < b) if X has known
distribution, using pdist(b) – pdist(a)
– Find probability of at least (or at most) x
successes in n binomial trials (pbinom)
• Apply Bayes’ Rule
34
Using probability distributions
• If 10% of jelly beans are red, what is the
probability that a well mixed bag of 10 jelly
beans contains either 1 or 2 red ones?
• What is the probability that it contains
more than 2 red ones?
• What is the probability of 0 reds?
35
Using probability distributions
• If 10% of jelly beans are red, what is the
probability that a well mixed bag of 10 jelly
beans contains either 1 or 2 red ones?
• pbinom(2, 10, 0.1) - pbinom(0, 10, 0.1) =
10*0.1*0.9^9 + (10*9/2)*(0.1^2)*(0.9)^8 =
0.5811307
• P(x > 2 red ones) = 1 - pbinom(2, 10, 0.1)
= 1 - 0.5811307 - (0.9^10) = 0.070191
• P(0 reds) = dbinom(0, 10, 0.1) = 0.9^10 =
0.3486784
36
Applying Bayes’ Rule
• Bag A has 75% white marbles, 25% black
• Bag B has 25% white marbles, 75% black
• A bag is selected at random. We do not
know which bag it is.
• 5 marbles are drawn (sampled) at random
from the bag. 4 of them are white.
• What is the probability that it is bag A?
37
Applying Bayes’ Rule
• Bag A has 75% white marbles, 25% black
• Bag B has 25% white marbles, 75% black
• 4 of 5 sampled marbles are white.
• What is the probability that it is bag A?
• P(A | 4 of 5 white) = P(4 of 5 white | A)P(A) /
  [P(4 of 5 white | A)P(A) + P(4 of 5 white | B)P(B)]
  = P(4 of 5 white | A)/[P(4 of 5 white | A) + P(4 of 5 white | B)] (since P(A) = P(B) = 0.5)
  = 5*(0.75^4)*(0.25)/(5*(0.75^4)*(0.25) + 5*(0.25^4)*(0.75)) = 0.9642857
http://www.eecs.qmul.ac.uk/~norman/BBNs/Bayes_Rule_Example.htm
38
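A short R sketch of the same Bayes update, using dbinom for the binomial likelihoods (it simply re-does the arithmetic above):

prior <- c(A = 0.5, B = 0.5)                       # bag chosen at random
p_white <- c(A = 0.75, B = 0.25)                   # P(white marble | bag)
likelihood <- dbinom(4, size = 5, prob = p_white)  # P(4 of 5 white | bag)
posterior <- prior * likelihood / sum(prior * likelihood)
posterior                                          # posterior for A = 0.9642857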
Other typical Bayes’ Rule
examples
• P(smoker | lung cancer)
• P(item from machine A | defective)
• P(steroid use | test result)
• P(disease | symptom)
• P(rain | forecast) http://stattrek.com/probability/bayes-theorem.aspx
• P(submarine is of type x | signal is y)
39
Midterm… Here it is!
40
Midterm Problem # 1
Consider a choice between the following two gambles:
• Gamble 1: 50% probability of winning $1, 50%
probability of losing $0.60
• Gamble 2: 50% probability of winning $10, 50%
probability of losing $5
a. Calculate the expected monetary value (EMV) of each
gamble, 1 and 2.
b. If your utilities for the four possible outcomes are u(10) =
1, u(1) = 0.2, u(-0.60) = -0.1, and u(-5) = -1, calculate the
expected utility (EU) of each gamble, 1 and 2. Which one
should you choose?
Midterm Problem # 2
Suppose that you are indifferent between
receiving $40 for certain and receiving a 50-50
chance of either $100 or $0.
(a) What is your risk premium for this uncertain
prospect (50-50 chance of $100 or $0)?
(b) Does your preference pattern here exhibit risk
aversion, risk proneness, risk neutrality, or is
there insufficient information to be sure?
Midterm problem # 3:
Attack the Death Star?
• The Rebel Alliance will attack the Death Star
only if it has at least a 95% probability of
destroying it. Each missile independently has a
60% probability of hitting and destroying the
Death Star.
• (a) What is the smallest number of missiles that
the Alliance must fire to have at least a 95%
probability of destroying the Death Star?
• (b) With this salvo size (number of missiles),
what is the expected number of hits?
43
Midterm Problem # 4
Disease testing
• Mary tests positive for a disease
• P(test is positive | disease) = 0.95
• P(test is negative | no disease) = 0.90
• Mary is a randomly chosen woman from a
population in which 3% have the disease.
• Based on this information, what is the
posterior probability (given the positive test
result) that Mary has the disease?
44
Midterm Problem # 5
Rocket launch decision
• An unmanned rocket is being launched.
• An unreliable warning light has come on.
– Light comes on with probability 1/2 if rocket has a problem
– Light comes on with probability 1/3 if no problem.
• Goal is to minimize expected loss.
– Loss = 2 if no launch when there is no problem
– Loss = 5 if rocket launched when there is a problem
– Loss = 0 otherwise.
• The prior probability of problem is p.
• If p = 0.2, should the rocket be launched even after warning
light comes on?
• How small must p be to justify launching even if the warning
light comes on?
45
Good luck!
46
A non-exam concept question
• We will toss a fair six-sided die (equally likely to
give outcomes 1-6, each with probability 1/6)
• You may bet on either A or B
• Must all EU-maximizing decision-makers (risk-averse,
risk-neutral, or risk-seeking) necessarily
agree which choice, A or B, is better?

  Die outcome:   1     2    3    4    5    6
  A:           $1.1   $2   $3   $4   $5   $6
  B:            $2    $3   $4   $5   $6   $1
47
What does FSD say?
• B dominates A by FSD
• All EU-maximizing decision makers who
prefer more to less satisfy FSD, and
therefore should prefer B.
– Ignore irrational regret/disappointment!
  Die outcome:   1     2    3    4    5    6
  A:           $1.1   $2   $3   $4   $5   $6
  B:            $2    $3   $4   $5   $6   $1
48
Another also-ran problem
• For $25, you can buy a raffle ticket that gives
probability 1/3 of winning $125 (and probability
2/3 of winning nothing).
• Assume that your initial wealth is $200.
• Your utility function for final wealth is u(200 + x)
= ln(200 + x), where x is the change in wealth
from this transaction (i.e., $100 or -$25 if you
buy the ticket, or $0 if you do not).
• What are the value (certainty equivalent, CE)
and risk premium (RP) for buying the raffle
ticket? Should you buy it?
Solution to CE problem
• For $25, you can buy a raffle ticket that gives
probability 1/3 of winning $125 (and probability
2/3 of winning nothing).
• u(200 + x) = ln(200 + x)
• EMV = (1/3)*(100) + (2/3)*(-25) = $16.67
• log(200 + CE) = (1/3)*log(200 + 100) +
(2/3)*log(200 - 25) = 5.344451
• CE = exp(5.344451) - 200 = $9.44
• RP = EMV – CE = 16.667 - 9.443 = $7.22
• CE > 0, so you should buy the ticket.
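The same calculation in R (a quick check of the numbers above; log() in R is the natural log):

w0 <- 200                                        # initial wealth
EU <- (1/3)*log(w0 + 100) + (2/3)*log(w0 - 25)   # expected utility of buying the ticket
EMV <- (1/3)*100 + (2/3)*(-25)                   # expected change in wealth, $16.67
CE <- exp(EU) - w0                               # certainty equivalent, about $9.44
RP <- EMV - CE                                   # risk premium, about $7.22
c(EU = EU, EMV = EMV, CE = CE, RP = RP)          # CE > 0, so buying the ticket is worthwhile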
Other probability models
51
Preview of applied probability
models
• Geometric distribution: Time until first occurrence in
binomial trials
• Poisson distribution: Number of rare events or purely
random, independent events in a time interval (fires,
Geiger counter clicks)
• Poisson process: Random arrivals
• Exponential distribution: Time between arrivals in a
Poisson process
• Normal distribution: Sum of independent random
variables
• Beta distribution: Conjugate priors and posteriors for
success probability in binomial trials
52
Waiting time until first success in a
binomial process
• If probability of success on each try is p,
then what is the probability that the first
success occurs on trial n?
• Pr(first success on trial n) =
  Pr(no successes in first (n – 1) trials & success on trial n) = (1 – p)^(n – 1) * p
  – Check: Sum of (1 – p)^(n – 1) from n = 1 to infinity
    is 1/(1 – (1 – p)) = 1/p, so sum over n of Pr(first
    success on trial n) = p/p = 1
  – This is the geometric distribution
53
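A small R sketch of this distribution (illustration only, with p = 0.1): the probabilities (1 – p)^(n – 1) * p match R's dgeom, which counts failures before the first success, and they sum toward 1.

p <- 0.1
n <- 1:50                          # trial on which the first success occurs
pmf <- (1 - p)^(n - 1) * p         # Pr(first success on trial n)
all.equal(pmf, dgeom(n - 1, p))    # dgeom is parameterized by failures before the first success
sum(pmf)                           # close to 1; approaches 1 as more terms are included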
Example: Landlord vs. dog
• You need to keep a dog in your apartment
for 12 more days, until the end of the
month
• Each day, there is a probability 1/10 that
the landlord will stop by.
• What is the probability that the landlord’s
next visit does not occur until after the dog
has left (i.e., after 12 days)?
54
Example: Landlord vs. dog
• You need to keep a dog in your apartment for 12
more days, until the end of the month
• Each day, there is a probability 1/10 that the
landlord will stop by.
• What is the probability that the landlord’s next
visit does not occur until after the dog has left
(i.e., after 12 days)?
• 1 - pgeom(11, 0.1) = 0.9^12 = 0.2824
– In R, dgeom(x, p) = p*(1-p)^x
55
Poisson process
• Model for “purely random” occurrences
with known intensity, m
– m = average occurrences per unit time
• The intensity, m, of a Poisson arrival process is
  usually denoted by λ or μ.
• Expected number of arrivals in an interval
of length t is mt
• Probability of no arrivals in [0, t] is exp(-mt)
• So, Pr(arrival by t) is F(t) = 1 – exp(-mt)
56
Exponential distribution
• If the probability that random variable T
(e.g., failure time of a component) is at
most t is P(T ≤ t) = 1 – exp(-mt), then T is
said to have an exponential distribution.
– CDF is F(t) = 1 – exp(-mt) = pexp(t, m)
– PDF is f(t) = m*exp(-mt) = dexp(t, m)
– Exponential distribution is memoryless: The
remaining time until event occurs does not
depend on how much time has already passed
57
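A quick R check of the memorylessness property (a sketch with arbitrary values m = 2, s = 1, t = 0.5): P(T > s + t | T > s) equals P(T > t).

m <- 2; s <- 1; t <- 0.5
p_cond <- (1 - pexp(s + t, m)) / (1 - pexp(s, m))   # P(T > s + t | T > s)
p_uncond <- 1 - pexp(t, m)                          # P(T > t)
c(p_cond, p_uncond)                                 # both equal exp(-m*t) = 0.3678794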
Example calculation with
exponential distribution
• Example: Customers arrive at an average
rate of 2 arrivals per hour.
– This is a Poisson process
• What is the probability of no customers in
first 15 minutes?
• Solution: 1 - pexp(0.25, 2) = exp(-2*0.25)
= 0.6065307
• The time between arrivals in a Poisson
process has an exponential distribution
58
Explanation for exp(-mt)
• Let S(t) = survival function = P(time of first
occurrence > t) = 1 – F(t)
• For a pure random process, S(2*t) = [S(t)]^2
and S(n*t) = [S(t)]^n
  – Also S(0) = 1, and S(infinity) = 0
• So, S(t) = a^(kt) could work
• But for small t, 1 – S(t) should be
proportional to t. And E(T) should be 1/m.
• S(t) = exp(-mt) satisfies these conditions.
59
Poisson process (cont.)
• P(k arrivals in interval of length t)
  = exp(-mt)*(mt)^k/k! = dpois(k, m*t)
• More generally, if the probability of k
occurrences is exp(-m)*m^k/k! = dpois(k, m), then
the number of occurrences has a Poisson
distribution with mean m (and possible
values of k = 0, 1, 2, ….)
– Describes random number of independent rare
events (fires or burglaries in a city, deaths from
horse kicks in a year, etc.) quite well
60
Using probability distributions
• It is Saturday PM, and Joe wants to take a
nap… but only if he can nap for at least an
hour without being disturbed by the phone.
• Phone calls arrive at an average rate of 1
call every 2 hours
• Joe’s utilities are: 0 if starts to nap and
call arrives in < 1 hour; 0.5 if no nap; and 1
if naps for > 1 hour with no call.
• What should Joe do? (Nap or no nap?)
61
Joe’s nap -- Solution
• P(call arrives in < 1 hour | 0.5 calls/hr.)
= 1 - exp(-0.5) = pexp(1,0.5) = 0.393469
• EU(nap) = 0.393469*0 + (1 - 0.393469)*1
= 0.60653
• This is greater than EU(no nap) = 0.5, so
Joe should risk taking a nap.
62
Example calculation with
Poisson distribution
• A roll of fabric has an average defect
(blemish) rate of 1 blemish per 27 yards
• What is the probability of 9 blemishes in
200 yards of the fabric?
• Answer: dpois(9, 200/27) = 0.1122631
• What is the probability of 9 or more
blemishes?
• Answer: 1 - ppois(8, 200/27) = 0.325359
63
Explanation of Poisson
distribution
• The Poisson distribution with mean m can
be obtained from the binomial distribution
by holding np fixed (at value m = np) and
letting n approach infinity and p approach
0. (Limit of binomial process as time steps
approach zero length)
• Intuitively, exp(-m)*m^k/k! gives the probability of
  exactly k occurrences (in any order) if the
  expected number is m.
64
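A short R sketch of this binomial-to-Poisson limit (illustrative values, m = 3): as n grows with np = m held fixed, the binomial probabilities approach the Poisson probabilities.

m <- 3; k <- 0:10
for (ni in c(10, 100, 10000)) {           # number of trials, with p = m/ni so that np = m
  cat("n =", ni, " max |binomial - Poisson| =",
      max(abs(dbinom(k, ni, m/ni) - dpois(k, m))), "\n")
}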
Protecting worker health
• To protect worker health, concentrations of
crystalline silica (quartz) dust particles in air
at a mine are monitored by a sampling
apparatus, with the goal of triggering an
alarm whenever the concentration exceeds
1 particle per liter of air.
• If the true average concentration in air is 6
particles per liter, what is the probability
that a 1-liter sample will have < 2 particles?
65
Protecting worker health
• If the true average concentration of
crystalline silica (quartz) dust particles in
air is 6 particles per liter, what is the
probability that a 1-liter sample will have
less than 2 particles?
• Solution: ppois(1,6) = exp(-6) + 6*exp(-6)
= 0.01735127
– dpois(x, 6) = exp(-6)*(6^x)/factorial(x)
– r = p = mu = NULL; mu = 6; for (r in 1:15) { p[r] = exp(-mu)*(mu^(r-1))/factorial(r-1) }; p   # p[r] = P(X = r-1)
66
Normal distribution, N(μ, σ²)
• A continuous distribution, like unif and exp
• Describes the distribution of a sum of independent
random variables (with finite means and
variances)
• “Central Limit Theorem(s)” prove that such sums
approach normal distributions
– Assuming finite means and variances
  – “Power law” or “heavy-tailed” distributions may
    not have finite means or variances and are exceptions
• Notation: N(μ, σ²) = normal distribution with mean
μ and variance σ²
67
Variance
• A normal distribution is specified by two
parameters: its mean and its variance
  – Mean is usually denoted by μ or by E(X)
  – Variance is usually denoted by σ² or Var(X)
  – Variance is defined as: Mean squared error
    around the mean = E[X – E(X)]² = E[(X – μ)²]
  – Standard deviation = σ = s.d. = square root of
    variance
• 95% of normal distribution falls within 1.96
standard deviations of its mean
68
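A one-line R check of the 95% rule of thumb stated above:

pnorm(1.96) - pnorm(-1.96)   # probability within 1.96 s.d. of the mean, about 0.95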
Algebra of means and variances
• If X is a random variable with expected
value E(X), then E(aX + b) = aE(X) + b
• If X is a random variable with variance
Var(X), then Var(aX + b) = a²Var(X)
• If X has a normal distribution with mean μ
and variance σ², then (X – μ)/σ has a
“standard normal distribution” with mean 0
and variance 1
69
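A simulation sketch in R of these facts (arbitrary example values: X ~ N(5, 4), a = 3, b = 10):

set.seed(1)
x <- rnorm(100000, mean = 5, sd = 2)   # X ~ N(5, 4)
a <- 3; b <- 10
c(mean(a*x + b), a*mean(x) + b)        # E(aX + b) = a*E(X) + b (approximately, by simulation)
c(var(a*x + b), a^2*var(x))            # Var(aX + b) = a^2*Var(X)
z <- (x - 5)/2                         # standardization
c(mean(z), var(z))                     # approximately 0 and 1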
dnorm(x, mean, sd)
• x = c(0:100)/10; m = mean(x); x = x - m
• y = dnorm(x,0,1); plot(x,y)
[Plot: the standard normal density y = dnorm(x, 0, 1) plotted against x from -5 to 5.]
70
Effects of different variances on
normal PDF and CDF
http://en.wikipedia.org/wiki/Normal_distribution
71
Does batch meet buyer
specifications?
• The lifetimes of a large batch of electronic
components are approximately normally
distributed with mean 500 days and
standard deviation of 50 days.
• The buyer requires at least 95% of them to
have a lifetime greater than 400 days.
• Should the buyer accept this batch (i.e., do
at least 95% of the components have
lifetimes greater than 400 days)?
72
Solving batch acceptance using
pnorm
• The lifetimes of a batch of electronic
components are approximately normally
distributed with mean 500 days and
standard deviation of 50 days.
• Does this batch meet the buyer’s
requirement that at least 95% of them to
have a lifetime greater than 400 days?
• Solution: P(T > 400) = 1 – P(T ≤ 400) = 1 – pnorm(400, 500, 50) = 0.9772499, so Yes.
73
Solution using standard normal,
N(0, 1)
A different way:
• P(T > 400) = P(Z > -2), where Z = (T – 500)/50 is standard normal
> 1 - pnorm(-2, 0, 1)
[1] 0.9772499
• The probability of being more than 2 standard deviations
below the mean is less than 0.05/2 = 0.025, so at least
97.5% (> 95%) of lifetimes exceed 400 days and the lot is acceptable.
74
Solution via simulation using
rnorm
• New approach: Solution via simulation in R
x = y = n = NULL; n = 100000   # n = simulation size
x = rnorm(n, 500, 50)          # simulates normal values
for (i in 1:n) {
  if (x[i] > 400) y[i] = 1 else y[i] = 0
}
mean(y)                        # quantifies compliance fraction
[1] 0.9774
• Solution: Simulated P(T > 400) ≈ 0.98, so
Yes, the batch should be accepted.
75
Applications of normal distribution
in statistics
• The sample mean (x1 + x2 + …. + xn)/n is
approximately normally distributed with
mean E(X) and variance Var(X)/n
– Assuming independent random samples
– Sample mean is unbiased estimate of E(X)
• Special case: Sample proportions
• Normal approximation to binomial: If np > 5
and n(1 – p) > 5, binomial is approximately
normal, mean = np, Var = n*p*(1-p)
76
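A brief R check of the normal approximation to the binomial mentioned above (example values: n = 100, p = 0.3, so np = 30 and n(1 – p) = 70; the continuity correction of 0.5 is a standard refinement):

n <- 100; p <- 0.3; k <- 40
exact <- pbinom(k, n, p)                                       # exact P(X <= 40)
approx <- pnorm(k + 0.5, mean = n*p, sd = sqrt(n*p*(1 - p)))   # normal approximation
c(exact = exact, approx = approx)                              # the two values are close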
Applications of normal
distribution
• Rules of thumb: 95% probability that
value < 2 standard deviations from mean
• Storage processes (e.g., dams)
– Amount in inventory = sum of many
independent contributions and withdrawals
• Insurance pool losses
– Amount paid out in a year = sum of losses on
many independent policies
• Diffusion processes (e.g., random walk)
– Stock price movements
77
Example calculation using normal
• A machine produces parts with widths
normally distributed, having mean 4 mm
and s.d. 0.0019 mm.
• A part fails if its width is more than 0.005
millimeters away from 4 mm.
• What is the probability that a part fails?
78
Example calculation using normal
• Part fails if its width is more than
0.005/0.0019 = 2.6316 standard deviations
away from mean.
• For standard normal distribution, N(0, 1), P(X
is more than 2.63158 s.d. from mean) =
pnorm(-2.6316,0,1) + 1 - pnorm(2.6316,0,1) =
2*pnorm(-2.6316,0,1) = 0.008498385
79
Example calculation using normal
• In general, P(normally distributed X lies more
than d standard deviations from mean) =
2*pnorm(-d,0,1)
• Example: If the s.d. were 0.0026 instead, then d =
0.005/0.0026 = 1.923 standard deviations, and the
probability of failure would be 2*pnorm(-1.923,0,1) = 0.0545
80
Example calculation
• Purchases at a retail store have a mean of
$14.31 and a standard deviation of $6.40.
Amounts are approximately normally
distributed. What percentage of
purchases are under $10?
81
Example calculation
• Purchases at a retail store have a mean of
$14.31 and a standard deviation of $6.40.
Amounts are approximately normally
distributed. What percentage of
purchases are under $10?
• The probability of being at least 4.31/6.4
standard deviations below the mean is:
pnorm(-4.31/6.4, 0, 1) = 0.2503345
82