The University of Chicago Booth School of Business
Statistical patterns
Business Statistics 41000
Fall 2015
Topics
1. Probability rules
2. Random variables and distributions
3. Expected value
4. Using statistics to make decisions
Statistical pattern
A statistical pattern is a pattern which holds only approximately.
For example, most professional basketball players are tall.
It isn’t necessary to be tall, nor is it sufficient to be tall.
But, as a generality, NBA players are tall.
This generality may not hold for any given player, but as a statement
about the aggregate population of NBA players, it is valid statistically
speaking.
Topic: probability
Probability is a language for talking and thinking about such statistical
patterns. The key idea is to assign a number between 0 and 1 to each
event, which reflects how likely that event is to occur. The language has
three rules:
1. If an event A is certain to occur, it has probability 1, denoted
P(A) = 1.
2. If two events A and B are mutually exclusive (both cannot occur
simultaneously), then P(A or B) = P(A) + P(B).
3. P(not-A) = 1 − P(A).
See OpenIntro sections 2.1 and 2.6.1.
Key example: random draws from a database
A critical example moving forward will be the idea of randomly selecting
items from a database of, say:
- Customers at a grocery store with purchase histories.
- Stock price histories for publicly traded firms.
- Patients at a doctor's office, along with symptoms and diagnoses.
- Anything of interest that one might measure and record...
This framework allows us to talk about the proportion of items in the
database satisfying a given property.
Probability is basically just fractions
The following statements are equivalent:
- “25% of Dr. Smith’s patients got a flu shot.”
- “1 in 4 of Dr. Smith’s patients received a flu shot.”
- “The probability that a randomly selected patient from Dr. Smith’s patient database got a flu shot is 1/4.”
- “The probability you had a flu shot if you were a patient of Dr. Smith’s is 0.25.”
Classic example: fair six-sided die
The possible outcomes are {1, 2, 3, 4, 5, 6} or {one, two, three, four, five, six},
according to our need. This is called our sample space. Each of these
events has probability 1/6.

Event   Probability
one     1/6
two     1/6
three   1/6
four    1/6
five    1/6
six     1/6
Example: fair six-sided die (cont’d)
We can calculate the probabilities of compound events, for example:

Event                      Satisfying outcomes           Probability
Starts with a consonant.   two, three, four, five, six   5/6
Divides evenly by 3.       3, 6                          2/6 = 1/3
Has three letters.         one, two, six                 3/6 = 1/2
Greater than 2.            3, 4, 5, 6                    4/6 = 2/3
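Calculations like these can be checked mechanically. A minimal Python sketch (function and variable names are my own, not from the slides) that sums the exact probabilities of the outcomes satisfying an event:

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6; a compound event's
# probability is the sum over the outcomes that satisfy it.
names = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six"}
p = {face: Fraction(1, 6) for face in names}

def prob(event):
    """Sum the probabilities of the outcomes satisfying `event`."""
    return sum((p[face] for face in p if event(face)), Fraction(0))

consonant = prob(lambda f: names[f][0] not in "aeiou")  # two..six -> 5/6
div_by_3  = prob(lambda f: f % 3 == 0)                  # 3, 6     -> 1/3
greater_2 = prob(lambda f: f > 2)                       # 3,4,5,6  -> 2/3
```

Using exact fractions avoids floating-point rounding and reproduces the table entries exactly.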
Famous example: Monty Hall problem
You’re on a game show, and are given a choice of three doors: behind
one door is a new car; behind the others, goats. You pick a door. What
is your probability of winning?
Let A = “door you picked has the car behind it”. Because there are three
doors, only one of which has the car, we say P(A) = 1/3. This gives
equal weight to the hypothetical worlds where the car is behind door one,
door two or door three.
Example: Monty Hall problem (cont’d)
Now the twist. The host – who knows what’s behind the doors – opens
one of the other two doors to reveal a goat. He asks “would you like to
switch doors?”
Which strategy do you choose?
Example: Monty Hall problem (cont’d)
Under the “always switch” policy, all of the instances where you would
have won, you lose, and vice versa. So if we switch, we win whenever
not-A occurs.
We conclude that:
P(win if stay) = P(A) = 1/3,
so
P(win if switch) = P(not-A) = 1 − P(A) = 2/3.
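The 1/3-versus-2/3 split can also be checked by simulation. A short Python sketch (my own code, not from the slides) that plays the game many times under each policy:

```python
import random

def play(switch, rng):
    """Play one round of the Monty Hall game; return True if you win the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither your pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 100_000
stay_rate = sum(play(False, rng) for _ in range(n)) / n    # close to 1/3
switch_rate = sum(play(True, rng) for _ in range(n)) / n   # close to 2/3
```

The simulated winning rates settle near the probabilities derived above.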
Visualizing probability
It may be helpful to think about probability visually, as an area. The
larger the area, the higher the probability.
[Figure: two regions labeled A and B drawn as areas inside a rectangle.]
You can think about the random process in terms of throwing darts at a
painted target.
Visualizing probability (cont’d)
Imagine picking a vacation destination by throwing darts at a map. We’d
likely end up somewhere rural west of the Mississippi river.
Definition: the overlap equation
P(A or B) = P(A) + P(B) − P(A & B).
[Figure: overlapping regions A and B.]
This formula makes sure we don’t double count the overlap region. What
does it mean if P(A & B) = 0?
Idea: new events from old
We can describe the yellow region in terms of the events A and B as
Y = not-(A or B).
[Figure: regions A and B inside a rectangle; the yellow region Y is everything outside A or B.]
From which we can determine that
P(Y ) = 1 − {P(A) + P(B) − P(A & B)}.
Idea: new events from old (cont’d)
This yellow region can be expressed as Y = A & not-B.
[Figure: regions A and B; the yellow region is the part of A outside B.]
And we can find that P(Y ) = P(A) − P(A & B).
Remark: nothing special about rectangles
All the same rules apply.
[Figure: regions A and B drawn as circles rather than rectangles.]
Similarly, what might “A” and “B” stand for? Why might knowing these
rules be useful?
Example: the “Linda” problem
Linda is 31 years old, single, outspoken, and smart. She was a philosophy
major. When a student, she was an ardent supporter of Native American
rights, and she picketed a department store that had no facilities for
nursing mothers. Rank the following statements in order of probability
from 1 (most probable) to 6 (least probable).
a Linda is an active feminist.
b Linda is a bank teller.
c Linda works in a small bookstore.
d Linda is a bank teller and an active feminist.
e Linda is a bank teller and an active feminist who takes yoga classes.
f Linda works in a small bookstore and is an active feminist who takes yoga
classes.
Example: the “Linda” problem (cont’d)
Most respondents didn’t realize that the probability of a conjunction of
two events is less than (or equal to) the probability of each of the
individual events: P(A & B) ≤ P(A) and P(A & B) ≤ P(B).
a Linda is an active feminist.
b Linda is a bank teller.
c Linda works in a small bookstore.
d Linda is a bank teller and an active feminist.
e Linda is a bank teller and an active feminist who takes yoga classes.
f Linda works in a small bookstore and is an active feminist who takes yoga
classes.
For example, we can determine that P(e) ≤ P(a) and P(f ) ≤ P(c),
irrespective of the background info we have on Linda.
Idea: decomposing events into disjoint (mutually exclusive)
pieces
Why must P(d) ≤ P(a)?
a Linda is an active feminist.
b Linda is a bank teller.
d Linda is a bank teller and an active feminist.
We can think of a as being the union of two disjoint groups: feminists
who are bank tellers and feminists who are not bank tellers. We find that
P(a) = P(a & b) + P(a & not-b),
≥ P(a & b).
Idea: decomposing events into disjoint (mutually exclusive)
pieces (cont’d)
Let a =“feminists” and b =“bank tellers”.
[Figure: region a (“feminists”) overlapping region b (“bank tellers”); the overlap is labeled a & b.]
The blue region is non-bank teller feminists and the cross-hatched region
is bank teller feminists. They are both sub-regions of the event a =
“feminists” and so necessarily have a smaller area.
Definition: the Law of Total Probability
The Law of Total Probability
P(A) = P(A & B) + P(A & not-B).
If the event A can happen in several mutually exclusive ways, to find the
overall probability of event A we add up the ways.
Definition: the Law of Total Probability (cont’d)
We can extend this idea to more than two disjoint events. If A can
happen in n mutually exclusive ways we can write
The Law of Total Probability
P(A) = P(A & B1 ) + · · · + P(A & Bn ) = ∑_{j=1}^{n} P(A & Bj ).

This gives a slogan for the LoTP: overall probability is “a sum of
separate ways”.
Example: different colored coupes
For example, let A =“is a two-door vehicle” and Bj denotes different
colors; B1 =“red”, B2 =“blue”, etc.
The overall probability of two-door cars can be expressed as:
P(A) = P(A & B1 ) + · · · + P(A & Bn ) = ∑_{j=1}^{n} P(A & Bj ).

How many total different colors of car does this equation imply?
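The decomposition can be illustrated with a toy dataset (the cars below are invented for this sketch): the overall proportion of two-door cars equals the sum, over colors, of the joint proportions.

```python
from collections import Counter

# Toy data: (color, number of doors) for eight cars.
cars = [("red", 2), ("red", 4), ("blue", 2), ("blue", 2),
        ("green", 4), ("red", 2), ("blue", 4), ("green", 2)]

n = len(cars)
p_two_door = sum(doors == 2 for _, doors in cars) / n

# Sum of the joint proportions P(two-door & color) over all colors.
joint = Counter(cars)
total_by_color = sum(joint[(c, 2)] / n for c in {"red", "blue", "green"})

# The two computations agree: P(A) = sum over j of P(A & B_j).
```

The agreement holds no matter how the colors partition the database, because every two-door car falls in exactly one color group.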
Jargon: Odds vs. Probability
The odds of an event are related to, but distinct from, the probability of
the event. The “odds in favor of event A” is defined as
P(A) / P(not-A) = P(A) / (1 − P(A)).

The “odds against A” is defined as

P(not-A) / P(A) = (1 − P(A)) / P(A).
Jargon: Odds vs. Probability (cont’d)
In a gambling setting, odds are given as odds against. So if A =
“Polson-Pony wins the derby” with P(A) = 0.20 the corresponding odds
would be,
P(not-A) / P(A) = 0.8 / 0.2 = 4,

or 4:1.
For bookies to make money, the stated odds are typically not the actual
probabilities (or even their best guess of them).
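Converting back and forth is simple arithmetic. A small helper sketch (the function names are my own):

```python
def odds_against(p):
    """Return (1 - p) / p, the odds against an event of probability p."""
    return (1 - p) / p

def prob_from_odds_against(odds):
    """Invert odds-against back to a probability: p = 1 / (1 + odds)."""
    return 1 / (1 + odds)

horse = odds_against(0.20)          # 4.0, i.e. 4:1 against
p = prob_from_odds_against(4.0)     # back to 0.20
```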
Example: dutch books
Your buddy wants to bet on the Bulls-Pacers game. He sets his odds by
judging that each team’s probability of winning is equal to their current
winning percentage, 0.50 (6-6) and 0.417 (5-7) respectively. So he gives
odds of 1:1 and 7:5.
You don’t even follow the NBA, but you jump at the chance, placing two
bets with him: $50 on the Pacers and $60 on the Bulls.
Event        Bulls bet   Pacers bet   Total profit
Bulls win    +$60        -$50         +$10
Pacers win   -$60        +$70         +$10

No matter what happens you are guaranteed to take his money! What
rule did he violate?
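The guaranteed profit traces back to the fact that the probabilities implied by his odds sum to less than 1. A quick check in Python (a sketch using the slide’s numbers):

```python
# Implied probabilities from the quoted odds-against.
bulls_implied = 1 / (1 + 1)       # 1:1 odds -> 0.5
pacers_implied = 5 / (5 + 7)      # 7:5 odds -> 5/12, about 0.417

total = bulls_implied + pacers_implied   # about 0.917, less than 1

# Payoffs of the two bets from the slide's table.
bulls_win = +60 - 50    # +10
pacers_win = -60 + 70   # +10
```

Because the implied probabilities leave probability “unclaimed,” a bettor can spread money across both outcomes and lock in a profit either way.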
Topic: random variables
A random variable refers to situations where the “event” in question is a
numerical measurement: e.g., the number of annual office visits a patient
makes, the dollar amount a customer spends, or the height of a
professional athlete.
More formally, a random variable assigns a number to each outcome in a
sample space.
The simplest example is the venerable coin toss. The outcomes are
HEAD or TAIL. We might “code” this as HEAD = 1 and TAIL = 0.
See OpenIntro sections 2.4 and 2.6.4.
Jargon: dummy variable
A random variable that assigns a 1 when some event occurs and a 0
otherwise is called a dummy variable. It is called this because the
number 1 “stands in” for the event.
Event   Value   Probability
HEAD    1       1/2
TAIL    0       1/2

As before, we assign each outcome a probability, which together
constitute the distribution of the random variable. It describes how the
total probability mass is distributed across the various outcomes.
Example: Bernoulli random variable
The distribution of a dummy variable can be described with a single
number. (Why only one and not two?) We refer to any such variable as
a Bernoulli random variable with parameter p.
Event         X   Probability
Obama wins    1   p
Romney wins   0   1 − p

(It is standard to use the letter p in this context, but it is just a
place-holder and any other would do as well: q or a, or maybe η or ξ if
you prefer Greek.)
Definition: Bernoulli RVs
Event   x   P(X = x)
A       1   p
not-A   0   1 − p

For any parameter p between 0 and 1 this describes a valid probability
distribution.
It may seem natural to call p here a “variable”, but we avoid the urge
because it collides with the “random variable” terminology, so we call it a
“parameter” instead.
Example: uniform multiple outcomes
More generally there can be any number of outcomes. Recall our fair die
example.
x   p(x) = P(X = x)
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6
6   1/6

Random variables are commonly denoted by capital letters as shorthand
for the whole list of possible outcomes. Individual outcomes are referred
to by the same letter, but in lower case.
Example: general discrete distribution
Of course, the probabilities need not be equal.
x     P(X = x)
−10   0.02
√2    0.30
20    0.07
40    0.61

The outcomes can be positive or negative, integers or real numbers, etc.
Remark: why RVs?
The probability framework does not require the outcomes to be numbers.
But working with numerical outcomes is useful:
- Many outcomes we care about are already numerical: prices, temperatures, distances, etc.
- Even if the outcomes are qualitative, our eventual analysis often assigns costs to these outcomes which are numerical.
- Defining compound events is natural when the outcomes are orderable, e.g. P(X ≤ 4) or P(2 ≤ X ≤ 4) or P(10 ≤ X).
- Because numerical outcomes can be ordered, plotting distributions is possible.
Plotting distributions
The rationale behind the term distribution makes good sense pictorially.

[Figure: bar plot of a distribution over values 1–4; y-axis “Probability” from 0.0 to 1.0.]
Plotting distributions (cont’d)
Plotting is especially helpful when the variable can take many different
values, making tables cumbersome.

[Figure: bar plot of a distribution over values 1–30; y-axis “Probability” from 0.00 to 0.12.]
At-bat outcomes
We can associate a 0, 1, 2, 3, or 4 to the outcomes of a baseball at-bat.
Event      x   P(X = x)
Out        0   0.82
Base hit   1   0.115
Double     2   0.033
Triple     3   0.008
Home run   4   0.024
Example: at-bat outcomes
We can plot this distribution.
[Figure: “At-bat results” bar plot; x-axis “Bases attained” (0–4), y-axis “Probability” (0.0–0.8).]
Example: medical expenditures
We can “bin” household medical expenditures and think of the
distribution over medical expenses.
Event                           x      P(X = x) × 10,000
Between $0 and $100             50     2,600
Between $100 and $1,000         550    3,300
Between $1,000 and $5,000       3K     2,500
Between $5,000 and $10,000      7.5K   800
Between $10,000 and $20,000     15K    500
Between $20,000 and $30,000     25K    200
Between $30,000 and $40,000     35K    60
Between $40,000 and $50,000     45K    30
Between $50,000 and $100,000    75K    7
Between $100,000 and $600,000   350K   3
Example: daily high temps for Chicago
The Midway weather station has records going back to 1929.
[Figure: “Chicago Daily High Temps” distribution; x-axis “Temperature in Degrees Fahrenheit”, y-axis “Probability” (0.000–0.015).]
Example: height of NBA players
NBA players tend to be very tall. Can we say more than that?
[Figure: “NBA heights” distribution; x-axis “Height in Inches” (60–100), y-axis “Probability” (0.00–0.08).]
Jargon: measures of central tendency
Although random variables can take many different values it is helpful to
think about them having a “tendency” or a general location. So, not all
basketball players are tall, but most are. Can we be more precise than
this?
A not-so-common example would be the mid-range: the midpoint between
the maximum and minimum values. Why might the mid-range not be very
informative?
We will focus on three common measures of central tendency, in turn:
mean, median and mode.
Definition: mean
By far the most common measure of central tendency is the mean.
Mean
The mean of a random variable X is defined as
E(X ) = ∑_{j=1}^{J} xj P(X = xj ).

The mean is also called the expectation, expected value, arithmetic
average, or first moment.
Example: mean of a Bernoulli RV
We can use the definition to compute the mean of a Bernoulli random
variable in terms of its probability parameter p.
E(X ) = 0(1 − p) + 1(p)
= p.
This is the advantage of coding dummy variables as 0 and 1 instead of
other arbitrary numbers.
Example: at-bats
Event      x   P(X = x)
Out        0   0.82
Base hit   1   0.115
Double     2   0.033
Triple     3   0.008
Home run   4   0.024

For our “bases” random variable we can calculate the mean as:

E(X ) = ∑_{j=1}^{J} xj P(X = xj )
      = 0(0.82) + 1(0.115) + 2(0.033) + 3(0.008) + 4(0.024)
      = 0.301.
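This probability-weighted sum is easy to compute for any discrete distribution. A small sketch (probabilities from the slides; the function name is my own):

```python
# Distribution of the "bases" random variable from the at-bat table.
bases = {0: 0.82, 1: 0.115, 2: 0.033, 3: 0.008, 4: 0.024}

def mean(dist):
    """E(X) = sum over j of x_j * P(X = x_j)."""
    return sum(x * p for x, p in dist.items())

e_bases = mean(bases)   # 0.301
```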
Example: medical expenditure
For our medical costs random variable we can calculate the mean as:
E(X ) = ∑_{j=1}^{J} xj P(X = xj )
      = 50(0.26) + 550(0.33) + 3000(0.25) + 7500(0.08)
        + 15000(0.05) + 25000(0.02) + 35000(0.006)
        + 45000(0.003) + 75000(0.0007) + 350000(0.0003)
      = 3297.
Example: NBA heights
The calculation is too long to show (but easy with a computer), so we
show it pictorially.
[Figure: “NBA heights” distribution; x-axis “Height in Inches” (60–100), y-axis “Probability” (0.00–0.08).]

The mean is approximately 79 inches (6 foot, 7 inches)!
Example: temps
Similarly, the mean daily high temperature in Chicago is about 59
degrees.
[Figure: “Chicago Daily High Temps” distribution; x-axis “Temperature in Degrees Fahrenheit”, y-axis “Probability” (0.000–0.015).]
Mental image: balancing point
You can think of means as “balancing points”.
[Figure: outcomes 0, 72, and 900, with probabilities 0.5, 0.35, and 0.15, balancing at the mean, 160.2.]

In fact, this is where the term “moment” comes from, an analogy with
the physics terminology.
Definition: median
Informally, the median of a random variable is the value where it’s just
as likely to see a value below it as it is to see a value above it.
Median
A random variable X has median m if
P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2.

The non-strict inequalities (“less-than OR equal-to”) are important here.
There can be more than one!
Example: bases
To find a median, we can sum up probabilities of outcomes, from
smallest to largest, stopping once we get over 1/2.

Event      x   P(X = x)
Out        0   820/1000
Base hit   1   115/1000
Double     2   33/1000
Triple     3   8/1000
Home run   4   24/1000

In this case 0 is the median: P(X ≤ 0) = 0.82 and P(X ≥ 0) = 1.
Example: Bernoulli RV
The median of a Bernoulli random variable can be written in terms of p.
m(p) = 0 if p ≤ 1/2,
m(p) = 1 if p > 1/2.
Examples: weather, NBA height, and medical costs
For our other three examples we find:
- P(high temp ≤ 60) = 0.5068 and P(high temp ≥ 60) = 0.5057.
- P(height ≤ 79) = 0.535 and P(height ≥ 79) = 0.547.
- P(med. costs ≤ $550) = 0.59 and P(med. costs ≥ $550) = 0.74.
Definition: mode
The mode of a distribution is its most likely value.
Mode
For random variable X , m is a mode of its distribution if
P(X = m) ≥ P(X = m′) for all m′ ≠ m.
As with the median, there can be multiple such values.
Remark: local versus global modes
The mode refers to the globally most likely value. But for distributions
with many possible outcomes, we sometimes refer to “local” modes:
isolated peaks of the distribution plot.
[Figure: “Chicago Daily High Temps” distribution; x-axis “Temperature in Degrees Fahrenheit”, y-axis “Probability” (0.000–0.015), showing two peaks.]
Here the global mode is 81 degrees. But there is a second local mode at
37 degrees. We say that this distribution is multimodal.
Remark: local versus global modes (cont’d)
A distribution with a single mode, like the NBA heights, is said to be
unimodal.
[Figure: “NBA heights” distribution; x-axis “Height in Inches” (60–100), y-axis “Probability” (0.00–0.08), showing a single peak.]

The modal height is 80 inches.
mean ≠ median ≠ mode
As we have already observed, these measures of central tendency differ
from one another.
variable     mean    median   mode
bases        0.301   0        0
med. costs   3,297   550      550
NBA height   78.89   79       80
High temp.   58.8    60       81

Which is a better summary depends on its intended use. Note that the
mean does not have to be one of the attainable values.
Definition: skewness
Skewness
The distribution of a random variable X is said to be right skewed if
E(X ) > m, where m is the median. It is said to be left skewed if
E(X ) < m. It is not skewed if E(X ) ≈ m.
Some sources define skewness quantitatively, but we will use this notion
qualitatively.
Example: skewed medical expenditures
Medical expenditures are strongly right skewed.
Event                           x      P(X = x) × 10,000
Between $0 and $100             50     2,600
Between $100 and $1,000         550    3,300
Between $1,000 and $5,000       3K     2,500
Between $5,000 and $10,000      7.5K   800
Between $10,000 and $20,000     15K    500
Between $20,000 and $30,000     25K    200
Between $30,000 and $40,000     35K    60
Between $40,000 and $50,000     45K    30
Between $50,000 and $100,000    75K    7
Between $100,000 and $600,000   350K   3

The median is $550 but the mean is about $3,300. The relatively low
probability of having a very large expenditure drives up the mean.
Skewness examples (cont’d)
We can check which of our previous examples exhibited skewness.
variable     mean    median   mode   skewness
bases        0.301   0        0      right
med. costs   3,297   550      550    right
NBA height   78.89   79       80     none
High temp.   58.8    60       81     none
Idea: dispersion
Measures of central tendency are not the whole story. The variability
about the trend can also be important. Informally, we call this
dispersion.
To see that neither mean, nor median, nor mode, nor skewness tell the
whole story, consider a symmetric, unimodal distribution, one for which
the mean, median and mode are all the same.
Measuring how variable random outcomes are can be very important
practically. Can you think of any examples?
Definition: variance
A common measure of the spread of a random variable is the variance.
Variance
The variance of a random variable X with distribution p(x) is defined as
V(X ) = ∑_{j=1}^{J} (xj − E(X ))² p(xj ).

See OpenIntro page 107, equation 2.72.
Definition: standard deviation
The standard deviation is the square-root of the variance.
Standard deviation
The standard deviation of a random variable X with variance V(X ) is
given by √V(X ).

The standard deviation has the advantage that it has the same units as
the random variable itself.
Example: Bernoulli RVs
We find that the variance of a Bernoulli RV with parameter p is

V(X ) = (0 − p)²(1 − p) + (1 − p)²p
      = p²(1 − p) + (1 − p)²p
      = (p² + (1 − p)p)(1 − p)
      = p(1 − p).

The standard deviation is therefore √(p(1 − p)).
Example: bases earned
Event      x   p(x)
Out        0   0.82
Base hit   1   0.115
Double     2   0.033
Triple     3   0.008
Home run   4   0.024

For our “bases” random variable we can calculate the variance as:

V(X ) = ∑_{j=1}^{J} (xj − E(X ))² p(xj )
      = (0 − 0.301)²(0.82) + (1 − 0.301)²(0.115)
        + (2 − 0.301)²(0.033) + (3 − 0.301)²(0.008)
        + (4 − 0.301)²(0.024)
      = 0.6124.

The standard deviation is then √0.6124 = 0.782.
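The same calculation in code (probabilities from the slides; variable names are my own), including a check against the shortcut identity V(X) = E(X²) − E(X)² that appears later in these slides:

```python
import math

# Distribution of the "bases" random variable.
bases = {0: 0.82, 1: 0.115, 2: 0.033, 3: 0.008, 4: 0.024}

mean = sum(x * p for x, p in bases.items())                  # 0.301
var = sum((x - mean) ** 2 * p for x, p in bases.items())     # about 0.6124
sd = math.sqrt(var)                                          # about 0.782

# The shortcut V(X) = E(X^2) - E(X)^2 gives the same answer.
var2 = sum(x * x * p for x, p in bases.items()) - mean ** 2
```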
Topic: statistical prediction
Suppose we are tasked with predicting some random event.
Assuming that we know the distribution of the random variable in
question, how should we make our prediction?
The answer depends on how we judge the goodness of our predictions.
We do this by defining a utility function. Then, we figure out what
prediction (action) gives us the best expected utility — the best long
run average utility.
Definition: expected value
An expectation is a probability-weighted sum taken over the possible
values of a random variable.
Expectation
The expectation or expected value of a function g (X ) is defined as:
E(g(X )) = ∑_{j=1}^{J} g(xj ) p(xj ).

The mean is the expected value of the identity function g(x) = x.
See OpenIntro 2.4.1.
Aside: computational shortcut for variance
A convenient way to compute the variance is via the identity
V(X ) = ∑_{j=1}^{J} (xj − E(X ))² p(xj )
      = E(X ²) − E(X )².

In words: the variance is the “expected value of the square minus the
square of the expected value.”
Properties of expectation
Here are some rules that make calculating expectations easier.
Properties of the expectation operator
For any random variables X and Y and any constant number c the
following properties hold.
- E(X + c) = E(X ) + c.
- E(cX ) = cE(X ).
- E(X + Y ) = E(X ) + E(Y ).
These facts are not hard to show directly from the definition.
(We will make sense of an expression like X + Y in lecture 3.)
Expected utility
Denote your utility of action a by u(a, x) for a given value x (of random
variable X ).
Maximum expected utility principle
Among all possible actions a, choose the action a∗ that maximizes
E (u(a, X )).
You will not necessarily win the most often, but you will in aggregate/in
the long-run get the most utility.
Example: chuck-a-luck
It costs $1 to play the following gambling game where the payoff depends
on the outcome of a single roll of a six-sided die.
If you roll a 1,2,3 or 4, you win the dollar amount of the number rolled.
So if you roll a 1, you get your $1 back. If you roll a 2, you make a dollar,
etc. If you roll a 5 or a 6, you win nothing (and so lose a dollar overall).
What is the expected value of the game? Is it worth playing?
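One way to answer is to compute the expected net payoff directly. A sketch (the net payoffs below follow from the rules above, after subtracting the $1 entry fee):

```python
from fractions import Fraction

# Net payoff for each face: rolling 1-4 pays that many dollars gross,
# rolling 5 or 6 pays nothing; subtract the $1 entry fee throughout.
net = {1: 0, 2: 1, 3: 2, 4: 3, 5: -1, 6: -1}

expected = sum(Fraction(1, 6) * v for v in net.values())   # 2/3 of a dollar
```

The expected net payoff is positive, so on a long-run-average basis the game is worth playing.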
Example: chuck-a-luck (version two)
Now the rules are reversed. If you roll a 1,2,3 or 4, you win nothing. If
you roll a 5 or a 6, you win the corresponding dollar amount.
How much would you be willing to pay to play this game?
Example: predicting milk demand
Suppose you own a cafe.
[Figure: “Gallons of Milk Needed” distribution over 1–10 gallons; y-axis “Probability” (0.00–0.30).]

On any given day you run through between one and ten gallons of milk,
with the shown probabilities. You have to order your daily milk delivery in
advance: how many should you order?
Example: predicting milk demand (cont’d)
Buying too much milk and not having enough are not necessarily equally
bad.
If you buy too much, you overpaid for unneeded milk. Let’s say this costs
us $5 a gallon.
On the other hand, for every gallon of milk you end up needing, but
don’t have, you lose (say) five customers who wanted lattes. Not only do
you forfeit the profit from the latte, but you forfeit the chips and John
Mayer CD they might have bought too if they didn’t end up going to the
cafe across the street. Let’s put this loss at $35 a gallon.
This reasoning suggests it is generally better to have milk and not need
it, than to need milk and not have it. Can we quantify this?
Example: predicting milk demand (cont’d)
Our milk demand random variable is:
x    P(X = x)
1    4%
2    15%
3    35%
4    5%
5    5%
6    5%
7    5%
8    20%
9    3%
10   3%

Our utility function is

u(a, x) = −$5(a − x) if a > x,
u(a, x) = −$35(x − a) if x > a,

where the action a is the number of gallons we order and the “state” x is
the amount of milk required.
We must now compute E(u(a, X )) for each possible value of
a = 1, . . . , 10 and order the number of gallons for which this is largest.
Example: predicting milk demand (cont’d)
Let’s work through an example. For a = 2 we have
E(u(2, X )) = ∑_{x=1}^{10} u(2, x)P(X = x)
  = −5(0.04) + 0(0.15) − 35(0.35) − 2(35)(0.05)
    − 3(35)(0.05) − 4(35)(0.05) − 5(35)(0.05)
    − 6(35)(0.2) − 7(35)(0.03) − 8(35)(0.03)
  = −$94.7.
Example: predicting milk demand (cont’d)
Computing the rest is easy with a computer:
The results are

a            1       2      3      4      5      6      7      8      9      10
E(u(a, X ))  −128.1  −94.7  −67.3  −53.9  −42.5  −33.1  −25.7  −20.3  −22.9  −26.7

We order 8 gallons.
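The full table can be reproduced with a few lines of code (distribution and costs from the slides; function names are my own):

```python
# Milk demand distribution from the slide, as probabilities.
probs = {1: 0.04, 2: 0.15, 3: 0.35, 4: 0.05, 5: 0.05,
         6: 0.05, 7: 0.05, 8: 0.20, 9: 0.03, 10: 0.03}

def u(a, x):
    """Utility of ordering a gallons when x gallons are needed."""
    if a > x:
        return -5 * (a - x)    # overbought: $5 per unneeded gallon
    if x > a:
        return -35 * (x - a)   # short: $35 per missing gallon
    return 0

def expected_utility(a):
    return sum(u(a, x) * p for x, p in probs.items())

best = max(range(1, 11), key=expected_utility)   # 8 gallons
```

Maximizing the expected utility over the ten possible orders picks out a = 8, matching the table.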
Food for thought: doctors versus patients
Consider a scenario where a patient is offered two treatments for a deadly
disease. The first treatment works 60% of the time, and the second
treatment works only 50% of the time.
What is the doctor’s rationale for recommending the first treatment?
Can the patient appeal to a similar rationale?
One of the most important things you can learn in this class is to
distinguish between one shot versus repeated decision scenarios. That
is, are you the doctor or the patient?
If you are the doctor, you can use the laws of probability to calculate your
optimal statistical decision.