Amherst College
Department of Economics
Economics 360
Fall 2012
Monday, September 10 Handout: Random Processes, Probability, Random
Variables, and Probability Distributions
Preview
• Random Processes and Probability
o Random Process: A process whose outcome cannot be predicted with certainty.
o Probability: The likelihood of a particular outcome of a random process.
• Random Variable: A variable that is associated with an outcome of a random process; a
variable whose numerical value cannot be determined beforehand.
o Discrete Random Variables and Probability Distributions
▪ Probability Distribution: Describes the probability for all possible values
of a random variable.
▪ A Random Variable’s Bad News and Good News.
▪ Relative Frequency Interpretation of Probability: When a random
process is repeated many, many times, the relative frequency of an
outcome equals its probability.
o Describing a Probability Distribution
▪ Center of the Distribution: Mean
▪ Spread of the Distribution: Variance
o Continuous Random Variables and Probability Distributions
• Estimation Procedures
o Clint’s Dilemma: Assessing Clint’s Political Prospects
o Center of an Estimate’s Probability Distribution: Mean
o Spread of an Estimate’s Probability Distribution: Variance
Random Processes and Probability
Experiment: Random card draw from a deck composed of the 2♣, 3♥, 3♦, and 4♥.
• Shuffle the 4 cards thoroughly.
• Draw one card and record it.
• Replace the card.
Computing Probabilities
There is ___ chance in ___ of drawing the 2♣; therefore, Prob[2♣] = ____.
There is ___ chance in ___ of drawing the 3♥; therefore, Prob[3♥] = ____.
There is ___ chance in ___ of drawing the 3♦; therefore, Prob[3♦] = ____.
There is ___ chance in ___ of drawing the 4♥; therefore, Prob[4♥] = ____.
Random Variable: A variable whose value _________ be predicted beforehand with certainty.
• A discrete random variable can only take on a countable number of ____________ values.
• A continuous random variable can take on a _____________________________ of values.
Discrete Random Variables and Probability Distributions
An Example: Define the random variable v:
v = “Value” of the selected card: 2, 3, or 4.
Question: What do we know about v beforehand?
Answer: While we cannot determine the value of v beforehand, we can calculate its probability distribution.

Card Drawn    v    Prob[v]
2♣            2    ____ = ____
3♥ or 3♦      3    _______ = ____ = ____
4♥            4    ____ = ____

[Figure: Probability Distribution of Numerical Values. Horizontal axis: v (2, 3, 4); vertical axis ticks at .25 and .50.]
NB: The probabilities must sum to ___. Why?
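The counting argument above can be sketched in a few lines of Python. This is only an illustration: the names DECK and probability_distribution are made up for this sketch, and it assumes that every card in the shuffled deck is equally likely to be drawn.

```python
from collections import Counter
from fractions import Fraction

# Deck from the handout, written as (value, suit) pairs.
DECK = [(2, "clubs"), (3, "hearts"), (3, "diamonds"), (4, "hearts")]

def probability_distribution(deck):
    """Return {value: Prob[v = value]}, assuming each card is equally likely."""
    counts = Counter(value for value, _suit in deck)
    return {value: Fraction(n, len(deck)) for value, n in sorted(counts.items())}

dist = probability_distribution(DECK)
for value, prob in dist.items():
    print(f"Prob[v = {value}] = {prob} = {float(prob):.2f}")
print("Sum of the probabilities:", sum(dist.values()))
```

Using exact fractions rather than decimals keeps the "chances in four" reasoning visible in the output.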
A Random Variable’s Bad News and Good News: Beforehand, that is, before the experiment
is conducted:
• Bad News: We cannot determine the numerical value of the random variable with certainty.
• Good News: On the other hand, we can often calculate the random variable’s probability
distribution telling us how likely it is for the random variable to equal each of its possible
numerical values.
Card Draw Simulation: Illustrating the Relative Frequency Interpretation of Probability
Default specification: 2♣, 3♥, 3♦, and 4♥.
Repetitions > 1,000,000:

Value    Relative Frequency
2        _______
3        _______
4        _______

Question: How are probabilities and relative frequencies related?

[Figure: Card draw simulation window. A list of cards (2 of Hearts through 4 of Hearts) shows the cards selected to be in the deck and the card drawn in this repetition. The window has Start, Stop, and Pause buttons and a Histogram of Numerical Values (horizontal axis: v = 2, 3, 4; vertical axis ticks at .25 and .50). It reports:
  Repetitions
  Value: value of the card drawn in this repetition
  Mean: mean (average) of the numerical values of the cards drawn from all repetitions
  Var: variance of the numerical values of the cards drawn from all repetitions]
Relative Frequency Interpretation of Probability: After many, many repetitions of the
experiment, the distribution of the actual numerical values mirrors the random variable’s
probability distribution.
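For readers without access to the simulation software, the experiment can be imitated with a short Python sketch. It is not the handout's simulation: the function name relative_frequencies, the fixed seed, and the repetition counts are choices made for this illustration.

```python
import random
from collections import Counter

CARD_VALUES = [2, 3, 3, 4]  # values of the 2♣, 3♥, 3♦, and 4♥

def relative_frequencies(repetitions, seed=0):
    """Draw one card with replacement `repetitions` times; return {value: share of draws}."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated exactly
    counts = Counter(rng.choice(CARD_VALUES) for _ in range(repetitions))
    return {value: counts[value] / repetitions for value in sorted(set(CARD_VALUES))}

# Watch the relative frequencies settle down as the repetitions grow.
for n in (10, 1_000, 100_000):
    print(n, relative_frequencies(n))
```

Comparing the printed rows shows how the relative frequencies move toward the probabilities as the number of repetitions increases.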
Question: How can we describe the general properties of a random variable; that is, how
can we describe the probability distribution of a random variable?
• Center of its probability distribution: Mean
• Spread of its probability distribution: Variance
Center of the Probability Distribution: Mean (Expected Value) of the Random
Variable – The average of the numerical values of v after many, many repetitions of the
experiment.
NB: The mean of a random variable is often called the expected value.
After many, many repetitions v will be
• 2 about a quarter of the time
• 3 about a half of the time
• 4 about a quarter of the time
On average, the outcome, v, will be _____.
More formally,

$\text{Mean}[v] = \sum_{\text{all } v} v \cdot \text{Prob}[v]$

For each possible value, multiply the value and its probability; then, add.

              v=2             v=3             v=4
               ↓               ↓               ↓
Mean[v] = _____×_____  +  _____×_____  +  _____×_____
        = _________    +  ________     +  __________
        = ____________
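The "multiply, then add" recipe can be checked with a minimal Python sketch. The names PROB and mean are made up for this illustration; the probabilities are the quarter, half, quarter shares stated above.

```python
from fractions import Fraction

# Probability distribution of v for the deck 2♣, 3♥, 3♦, 4♥.
PROB = {2: Fraction(1, 4), 3: Fraction(1, 2), 4: Fraction(1, 4)}

def mean(dist):
    """Mean[v]: multiply each value by its probability, then add the products."""
    return sum(value * prob for value, prob in dist.items())

# Show each product before adding, mirroring the worksheet layout.
for value, prob in PROB.items():
    print(f"{value} x {prob} = {value * prob}")
print("Mean[v] =", mean(PROB))
```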
Spread of the Probability Distribution: Variance of the Random Variable – The
average of the squared deviations of the numerical values from their mean after many,
many repetitions of the experiment:
• For each possible value of the random variable, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.
Card Drawn    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
2♣            2    3          _______                   _______              1/4 = .25
3♥ or 3♦      3    3          _______                   _______              1/2 = .50
4♥            4    3          _______                   _______              1/4 = .25

$\text{Var}[v] = \sum_{\text{all } v} (v - \text{Mean}[v])^2 \cdot \text{Prob}[v]$

For each possible value, multiply the squared deviation and its probability; then, add.

             v=2             v=3             v=4
              ↓               ↓               ↓
Var[v] = _____×_____  +  _____×_____  +  _____×_____
       = _________    +  ________     +  __________
       = ____________
NB: The distribution mean and variance are general properties of the random variable:
• The mean represents the center of the random variable’s distribution.
• The variance represents the spread of the random variable’s distribution.
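The four variance steps listed above translate directly into code. The following is a sketch under the same assumptions as the card example (probabilities of 1/4, 1/2, and 1/4); the names PROB, mean, and variance are illustrative only.

```python
from fractions import Fraction

PROB = {2: Fraction(1, 4), 3: Fraction(1, 2), 4: Fraction(1, 4)}

def mean(dist):
    """Mean[v]: sum of value times probability."""
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    """Var[v]: sum of squared deviation from the mean times probability."""
    center = mean(dist)
    total = 0
    for value, prob in dist.items():
        deviation = value - center      # step 1: deviation from the mean
        squared = deviation ** 2        # step 2: square the deviation
        total += squared * prob         # steps 3 and 4: weight by probability, then add
    return total

print("Mean[v] =", mean(PROB))
print("Var[v]  =", variance(PROB))
```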
Card Draw Simulation: Checking Our Math
Default specification: The 2♣, 3♥, 3♦, and 4♥ are included in a deck of four cards.
Repetitions > 1,000,000
Mean: ______
Variance: ______
After many, many repetitions of the experiment:
• The mean reflects the center of the distribution; more specifically, the mean equals
the average of the numerical values after many, many repetitions of the experiment.
• The variance reflects the spread of the distribution.
NB: Value of Simulations: By exploiting the relative frequency interpretation of probability
(after many, many repetitions of the experiment, the distribution of the actual numerical
values mirrors the random variable’s probability distribution), we can use simulations to
reveal the probability distribution. That is, simulations allow us to confirm our logic.
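A rough stand-in for this check, written in Python, is sketched below. It is not the handout's simulation: the function name, the seed, and the choice of 100,000 repetitions are assumptions made for this illustration.

```python
import random

CARD_VALUES = [2, 3, 3, 4]  # values of the 2♣, 3♥, 3♦, and 4♥

def simulate_mean_and_variance(repetitions, seed=0):
    """Return (average, variance) of the values drawn over all repetitions."""
    rng = random.Random(seed)
    draws = [rng.choice(CARD_VALUES) for _ in range(repetitions)]
    average = sum(draws) / repetitions
    # Average of the squared deviations from the average (divide by N, not N - 1).
    spread = sum((d - average) ** 2 for d in draws) / repetitions
    return average, spread

average, spread = simulate_mean_and_variance(100_000)
print(f"Mean of the numerical values:     {average:.4f}")
print(f"Variance of the numerical values: {spread:.4f}")
```

The simulated figures will not match the calculated mean and variance exactly; they should come closer as the number of repetitions grows.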
Continuous Random Variables and Probability Distributions
An Example: Dan Duffer
• Good news: Dan Duffer consistently hits 200 yard drives from the tee.
• Bad News: His drives can land up to 40 yards to the left and up to 40 yards to the right of his target point.
• Suppose that Dan’s target point is the center of the fairway.
• The fairway is 32 yards wide 200 yards from the tee.

[Figure: Eighteen Hole. Labels: Tee, Target (200 yards from the tee), Fairway (32 yards wide), Left Rough, Right Rough, Lake.]

Let v equal the lateral distance from Dan’s target point. A negative v indicates that the drive went to the left; a positive v indicates that the drive went to the right.

A continuous random variable, unlike a discrete random variable, can take on a continuous range of values, a ______________ of values.
v is a ______________ random variable

[Figure: Probability Distribution. Horizontal axis: v from -40 to 40, with ticks every 8 yards; vertical axis ticks at .005, .010, .015, .020, and .025.]

What does v’s probability distribution suggest?

What is the area beneath the probability distribution? Applying the equation for the area of a triangle:
Area Beneath = ___________ + ___________
             = ___________ + ___________
             = ___________
What does this imply?
Let us now calculate some probabilities:
• What is the probability that Dan’s drive will land in the left rough?
  Prob[Drive in Left Rough] = Prob[v Less Than −16]
                            = ____________________
                            = ______
• What is the probability that Dan’s drive will land in the lake?
  Prob[Drive in Lake] = Prob[v Greater Than +16]
                      = ____________________
                      = ______
• What is the probability that Dan’s drive will land in the fairway?
  Prob[Drive in Fairway] = Prob[v Between −16 and +16]
                         = ____________________
                         = ______
Prob[Drive in Left Rough] + Prob[Drive in Lake] + Prob[Drive in Fairway] = _____ + _____ + _____
= _____
What does this imply?
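The area calculations can be sketched in Python. The shape of the distribution is an assumption here, because the figure is not reproduced in this transcript. The sketch assumes a triangle that peaks at .025 when v = 0 and falls in straight lines to zero at v = −40 and v = +40, which is consistent with the axis values and the triangle-area hint. The function names are made up for this illustration.

```python
def density(v, half_width=40.0):
    """Assumed triangular density: height 1/half_width at v = 0, zero at +/- half_width."""
    if abs(v) >= half_width:
        return 0.0
    return (1.0 / half_width) * (1.0 - abs(v) / half_width)

def prob_below(x, half_width=40.0):
    """Prob[v < x], using triangle areas (one half x base x height)."""
    if x <= -half_width:
        return 0.0
    if x >= half_width:
        return 1.0
    if x <= 0:
        base = x + half_width                  # triangle from -half_width up to x
        return 0.5 * base * density(x, half_width)
    base = half_width - x                      # triangle from x up to +half_width
    return 1.0 - 0.5 * base * density(x, half_width)

left_rough = prob_below(-16)
lake = 1.0 - prob_below(16)
fairway = prob_below(16) - prob_below(-16)
print(f"Prob[Drive in Left Rough] = {left_rough:.2f}")
print(f"Prob[Drive in Lake]       = {lake:.2f}")
print(f"Prob[Drive in Fairway]    = {fairway:.2f}")
print(f"Sum                       = {left_rough + lake + fairway:.2f}")
```

If the actual figure has a different shape, only the density and prob_below functions would need to change; the three probabilities are still areas beneath the curve.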
Clint Ton’s Dilemma
On the day before the election, Clint must decide whether or not to hold a pre-election party:
• If he is comfortably ahead, he will not hold the party; he will save his campaign
funds for a future political endeavor (or perhaps a vacation to the Caribbean next
January).
• If he is not comfortably ahead, he will fund a party to try to sway some voters.
There is not enough time to poll every member of the student body, however. What should
he do?
Econometrician’s Philosophy: If you lack the information to determine the value directly, do the
best you can by estimating the value using the information you do have.
Clint’s Opinion Poll: Poll a sample of the population
• Questionnaire: Are you voting for Clint?
• Procedure: Clint selects 16 students at random and poses the question.
• Results: 12 students report that they will vote for Clint and 4 against Clint.
Estimated Fraction of the Population Supporting Clint = $\frac{12}{16} = \frac{3}{4} = .75$
Clint wishes to use the information collected from the sample to draw inferences about the
entire population. Seventy-five percent, .75, of those polled support Clint. This suggests that
Clint leads, does it not?
Clint’s Dilemma: Should Clint be confident that he has the election in hand or should he
fund the party?
Polling Simulation: Learning More about Clint’s Polling Procedure
Questionnaire: Are you voting for Clint?
Terms
ActFrac = Actual Fraction of the Population Supporting Clint
EstFrac = Estimated Fraction of the Population Supporting Clint

To decide how much confidence Clint should have, we shall learn a little more about the polling procedure. A simulation will help us.

[Figure: Polling simulation window. An Actual Population Fraction (ActFrac) list offers the values .1 through .7, and a Sample Size list offers 10, 16, 25, and 50. The window has Start, Stop, and Pause buttons and reports:
  Repetition
  EstFrac: numerical value of the estimated fraction in this repetition
  Mean: mean (average) of the numerical values of the sample fraction from all repetitions
  Var: variance of the numerical values of the sample fraction from all repetitions]

In a simulation, we can do something that we cannot do in the real world. We can specify the actual proportion of the population, ActFrac, and then observe the estimated fraction, EstFrac, when we conduct a poll. In this way, we can learn more about the polling procedure itself. To do so, suppose that the election is a toss-up; that is, suppose that the actual population fraction supporting Clint, ActFrac, equals .5.
Sample Size = 16
ActFrac = .50
Repetition    Number Supporting Clint    EstFrac
1             ______                     ______
2             ______                     ______
3             ______                     ______
4             ______                     ______
5             ______                     ______
Observations:
• The estimated fraction, EstFrac, is a random variable. Even if we knew the actual
fraction supporting Clint, ActFrac, we could not predict EstFrac before the poll.
• Only occasionally does the estimated fraction, EstFrac, in one repetition of the poll
equal the actual population fraction.
• When the election is actually a toss-up, it is entirely possible that 12 or even more of
the 16 students polled will support Clint.
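These observations can be explored with a short Python sketch. Two caveats apply. First, it assumes each polled student is drawn independently with probability ActFrac of supporting Clint, which is sampling with replacement. Second, prob_at_least relies on the binomial formula, a standard result that this handout does not introduce. All function names are made up for this illustration.

```python
import random
from math import comb

def simulate_poll(sample_size, act_frac, rng):
    """Poll `sample_size` randomly chosen students; return the estimated fraction."""
    supporters = sum(rng.random() < act_frac for _ in range(sample_size))
    return supporters / sample_size

def prob_at_least(k, sample_size, act_frac):
    """Prob[at least k of the polled students support Clint], via the binomial formula."""
    return sum(
        comb(sample_size, j) * act_frac**j * (1 - act_frac) ** (sample_size - j)
        for j in range(k, sample_size + 1)
    )

rng = random.Random(0)  # fixed seed so that the five polls can be reproduced
print("EstFrac in five polls:", [simulate_poll(16, 0.5, rng) for _ in range(5)])
print("Prob[12 or more of 16 support Clint | ActFrac = .5] =",
      round(prob_at_least(12, 16, 0.5), 4))
```

The second printed line gives one way to judge how surprising a 12-of-16 result would be if the election were really a toss-up.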
Populations and Samples: Estimates and Actual Values
Question: How can sample information be used to draw inferences about the entire
population? This is the question Clint must address.
We begin with an unrealistic, but instructive, example. So, please be patient.
Sample Size of One
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a 3×5 card, then
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
The random variable v:
v = 1 if the individual polled supports Clint.
  = 0 otherwise.
Question: Can we determine with certainty the numerical value of v before the
experiment is conducted? ______. Hence, v is a ______________ variable.
Question: What can we say about the random variable v beforehand?
Answer: _______________________________________________.
Question: How can we describe the probability distribution?
Answer: _______________________________________________.
For the moment, continue to assume that the population is split evenly; that is, suppose
that half the population supports Clint and half does not:
Individual’s Response    v    Prob[v]
For Clint                1    ______
Not for Clint            0    ______

[Figure: Diagram of the individual’s two possible responses: For Clint (v = 1, Prob ____) and Not for Clint (v = 0, Prob ____).]
Center of the Probability Distribution: Mean. The average of the numerical values
after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
• 1 about half of the time
• 0 about half of the time
On average, what will the numerical value of v equal? _____.
$\text{Mean}[v] = \sum_{\text{all } v} v \cdot \text{Prob}[v]$

For each possible value, multiply the value and its probability; then, add.

              v=1             v=0
               ↓               ↓
Mean[v] = _____×_____  +  _____×_____
        = _________    +  ________
        = ____________
Spread of the Probability Distribution: Variance. The average of the squared
deviations of the numerical values from their mean after many, many repetitions of
the experiment:
• For each possible value, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.
Individual’s Response    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
For Clint                1    ___        ______ = ___              ___                  1/2
Not for Clint            0    ___        ______ = ___              ___                  1/2

$\text{Var}[v] = \sum_{\text{all } v} (v - \text{Mean}[v])^2 \cdot \text{Prob}[v]$

For each possible value, multiply the squared deviation and its probability; then, add.

             v=1             v=0
              ↓               ↓
Var[v] = _____×_____  +  _____×_____
       = _________    +  ________
       = ____________
Opinion Poll Simulation – Sample Size of One: Checking Our Math
Actual Population Fraction = ActFrac = $p = \frac{1}{2} = .50$

Equations:
  Mean of v’s Probability Distribution: ______
  Variance of v’s Probability Distribution: _____
Simulation:
  Simulation Repetitions: __________
  Mean (Average) of Numerical Values of v from the Experiments: ≈_____
  Variance of Numerical Values of v from the Experiments: ≈_____
Conclusion: Our equations and simulation produce nearly identical results. Again, this
illustrates how we can exploit the relative frequency interpretation of probability:
After many, many repetitions of the experiment, the distribution of the actual
numerical values mirrors the random variable’s probability distribution.
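The same comparison can be sketched in Python for the sample-size-one poll. The function names, the seed, and the number of repetitions are choices made for this illustration; the population is assumed to be split evenly, as above.

```python
import random

def exact_mean_and_variance(p=0.5):
    """Mean and variance of v computed from its probability distribution."""
    dist = {1: p, 0: 1 - p}
    center = sum(v * prob for v, prob in dist.items())
    spread = sum((v - center) ** 2 * prob for v, prob in dist.items())
    return center, spread

def simulated_mean_and_variance(repetitions, p=0.5, seed=0):
    """Repeat the one-person poll; return mean and variance of the recorded values of v."""
    rng = random.Random(seed)
    values = [1 if rng.random() < p else 0 for _ in range(repetitions)]
    average = sum(values) / repetitions
    spread = sum((v - average) ** 2 for v in values) / repetitions
    return average, spread

print("Equations: ", exact_mean_and_variance())
print("Simulation:", simulated_mean_and_variance(100_000))
```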
Generalization: Let
p = ActFrac = Actual fraction of the population supporting Clint
Consider the experiment: Write the name of
each individual in the population on a 3×5 card
Individual’s Response    v    Prob[v]
For Clint                1    ______
Not for Clint            0    ______

[Figure: Diagram of the individual’s two possible responses: For Clint (v = 1, Prob ____) and Not for Clint (v = 0, Prob ____).]
Center of the Probability Distribution: Mean. The average of the numerical values
after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
• 1, _____ of the time
• 0, _____ of the time

$\text{Mean}[v] = \sum_{\text{all } v} v \cdot \text{Prob}[v]$

For each possible value, multiply the value and its probability; then, add.

              v=1            v=0
               ↓              ↓
Mean[v] = ____×_____  +  ____×_____
        = _________   +  ________
        = ____________
Spread of the Probability Distribution: Variance. The average of the squared
deviations of the numerical values from their mean after many, many repetitions of
the experiment:
• For each possible value, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.
Individual’s Response    v    Mean[v]    Deviation From Mean[v]    Squared Deviation    Prob[v]
For Clint                1    ____       ____                      ____                 p
Not for Clint            0    ____       ____                      ____                 1−p

$\text{Var}[v] = \sum_{\text{all } v} (v - \text{Mean}[v])^2 \cdot \text{Prob}[v]$

For each possible value, multiply the squared deviation and its probability; then, add.

              v=1                 v=0
               ↓                   ↓
Var[v] = ________×_____  +  ________×_____
       = ________________________________________
       = ________________________________________
       = ___________
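Once the algebra is done, it can be spot-checked for particular values of p. The sketch below is an illustration with made-up names: it applies the two definitions directly and compares the results with the expressions p and p(1 − p) that the handout uses later for Mean[v] and Var[v].

```python
from fractions import Fraction

def mean_and_variance_of_v(p):
    """Apply the definitions of Mean[v] and Var[v] to v = 1 (prob p) or 0 (prob 1 - p)."""
    dist = {1: p, 0: 1 - p}
    center = sum(v * prob for v, prob in dist.items())
    spread = sum((v - center) ** 2 * prob for v, prob in dist.items())
    return center, spread

for p in (Fraction(1, 2), Fraction(3, 4), Fraction(1, 10)):
    center, spread = mean_and_variance_of_v(p)
    print(f"p = {p}: Mean[v] = {center}, Var[v] = {spread}, "
          f"matches p and p(1 - p): {center == p and spread == p * (1 - p)}")
```

A numerical check for a few values of p is not a proof, but a mismatch would signal an algebra slip.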
Sample Size of Two
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a card
• In the first stage:
o Thoroughly shuffle the cards.
o Randomly draw one card.
o Ask that individual if he/she supports Clint and record the answer; this
yields a specific numerical value of v1 for the random variable. v1 equals
1 if the first individual polled supports Clint; 0 otherwise.
o Replace the card.
• In the second stage, the procedure is repeated:
o Thoroughly shuffle the cards.
o Randomly draw one card.
o Ask that individual if he/she supports Clint and record the answer; this
yields a specific numerical value of v2 for the random variable. v2 equals
1 if the second individual polled supports Clint; 0 otherwise.
o Replace the card.
• Calculate the fraction of those polled supporting Clint.
Fraction of Sample Supporting Clint, Estimated Fraction: $\text{EstFrac} = \frac{v_1 + v_2}{2} = \frac{1}{2}(v_1 + v_2)$
The estimated fraction of the population supporting Clint is a random variable; that is, EstFrac is a random variable. We cannot determine with certainty the numerical value of the estimated fraction, EstFrac, before the experiment is conducted.
Question: What can we say about the random variable EstFrac beforehand?
Answer: We can describe its probability distribution.
Question: How can we describe the probability distribution?
Answer: Compute its center (mean) and spread (variance).
Center of the Estimated Fraction’s Probability Distribution: Mean.
$\text{Mean}[\text{EstFrac}] = \text{Mean}[\tfrac{1}{2}(v_1 + v_2)]$
What do we know?
Mean[v1] = Mean[v] = p
Mean[v2] = Mean[v] = p
Arithmetic of Means:
Mean[cx] = cMean[x]
Mean[x + y] = Mean[x] + Mean[y]

$\text{Mean}[\tfrac{1}{2}(v_1 + v_2)]$
= ____________________        (applying Mean[cx] = cMean[x])
= _______________________     (applying Mean[x + y] = Mean[x] + Mean[y])
= ______________________________________________
= ___________________ = _____
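One way to check the result of this derivation is to list every possible outcome of the two-person poll and apply the definition of the mean directly. The sketch below does that with made-up names. It builds the probabilities of the (v1, v2) pairs by multiplying the individual probabilities, which assumes that the two draws do not influence each other because the first card is replaced.

```python
from fractions import Fraction
from itertools import product

def estfrac_distribution(p):
    """Return {EstFrac value: probability} for a poll of two, drawn with replacement."""
    single = {1: p, 0: 1 - p}
    dist = {}
    for (v1, p1), (v2, p2) in product(single.items(), repeat=2):
        est = Fraction(v1 + v2, 2)
        dist[est] = dist.get(est, 0) + p1 * p2  # probabilities of separate draws multiply
    return dist

def mean(dist):
    """Multiply each value by its probability, then add."""
    return sum(value * prob for value, prob in dist.items())

for p in (Fraction(1, 2), Fraction(3, 4)):
    dist = estfrac_distribution(p)
    print(f"p = {p}: distribution = {dist}, Mean[EstFrac] = {mean(dist)}")
```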
Spread of the Estimated Fraction’s Probability Distribution: Variance.
$\text{Var}[\text{EstFrac}] = \text{Var}[\tfrac{1}{2}(v_1 + v_2)]$
What do we know?
Var[v1] = Var[v] = p(1 − p)
Var[v2] = Var[v] = p(1 − p)
Arithmetic of Variances:
$\text{Var}[cx] = c^2\,\text{Var}[x]$
$\text{Var}[x + y] = \text{Var}[x] + 2\,\text{Cov}[x, y] + \text{Var}[y]$

$\text{Var}[\tfrac{1}{2}(v_1 + v_2)]$
= ______________                  (applying $\text{Var}[cx] = c^2\,\text{Var}[x]$)
= ___________________________     (applying $\text{Var}[x + y] = \text{Var}[x] + 2\,\text{Cov}[x, y] + \text{Var}[y]$)
= ______________
= ________________________        (v1 and v2 are independent: Cov[v1, v2] = 0)
= ________________________ = __________
Question: Why are v1 and v2 independent?
Answer:
• Since the card bearing the first name drawn is replaced, whether or not the first voter
polled supports Clint does not affect the probability that the second voter will
support Clint.
• In either case, the probability that the second voter will support Clint is p, the
actual population fraction.
• Consequently, knowing the value of v1 does not help us predict the value of v2.
More formally, the numerical value of v1 does not affect v2’s probability distribution
and vice versa. The random variables are independent. Hence, their covariance
equals 0.
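The independence argument and the variance arithmetic can both be checked by enumeration. The sketch below is illustrative only. It assumes the joint probability of each (v1, v2) pair is the product of the individual probabilities, which is what replacing the card implies, and it uses the standard definition of covariance as the probability-weighted average of the products of deviations, a definition this handout does not spell out. All names are made up.

```python
from fractions import Fraction
from itertools import product

def joint_distribution(p):
    """Joint distribution of (v1, v2) when the first card is replaced before the second draw."""
    single = {1: p, 0: 1 - p}
    return {(v1, v2): single[v1] * single[v2] for v1, v2 in product(single, repeat=2)}

def expectation(joint, f):
    """Probability-weighted average of f(v1, v2)."""
    return sum(f(v1, v2) * prob for (v1, v2), prob in joint.items())

def covariance(joint):
    m1 = expectation(joint, lambda a, b: a)
    m2 = expectation(joint, lambda a, b: b)
    return expectation(joint, lambda a, b: (a - m1) * (b - m2))

def variance_of_estfrac(joint):
    est = lambda a, b: Fraction(a + b, 2)
    center = expectation(joint, est)
    return expectation(joint, lambda a, b: (est(a, b) - center) ** 2)

p = Fraction(3, 4)
joint = joint_distribution(p)
cov = covariance(joint)
var_v = p * (1 - p)
# Var[cx] = c^2 Var[x] with c = 1/2, then Var[x + y] = Var[x] + 2Cov[x, y] + Var[y].
by_rules = Fraction(1, 4) * (var_v + 2 * cov + var_v)
print("Cov[v1, v2] =", cov)
print("Var[EstFrac] by enumeration:", variance_of_estfrac(joint))
print("Var[EstFrac] by the rules:  ", by_rules)
```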
Opinion Poll Simulation – Sample Size of Two: Checking Our Math
Actual Population Fraction = ActFrac = $p = \frac{1}{2} = .50$

Equations:
  Sample Size: 2
  Mean of EstFrac’s Probability Distribution: _______
  Variance of EstFrac’s Probability Distribution: _____
Simulations:
  Simulation Repetitions: ________
  Mean (Average) of Numerical Values of EstFrac from the Experiments: ≈_____
  Variance of Numerical Values of EstFrac from the Experiments: ≈_____
Conclusion: Our equations and simulation produce nearly identical results. Again, this
illustrates how we can exploit the relative frequency interpretation of probability: After
many, many repetitions of the experiment, the distribution of the actual numerical values
mirrors the random variable’s probability distribution.
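A Python stand-in for this simulation is sketched below. The function name, the seed, and the number of repetitions are assumptions made for this illustration. Each polled individual is modeled as an independent draw that supports Clint with probability p, which corresponds to replacing the card after each draw.

```python
import random

def simulate_estfrac(repetitions, sample_size=2, p=0.5, seed=0):
    """Repeat the poll; return (mean, variance) of EstFrac across all repetitions."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        supporters = sum(rng.random() < p for _ in range(sample_size))
        estimates.append(supporters / sample_size)
    average = sum(estimates) / repetitions
    spread = sum((e - average) ** 2 for e in estimates) / repetitions
    return average, spread

average, spread = simulate_estfrac(100_000)
print(f"Mean of EstFrac over all repetitions:     {average:.4f}")
print(f"Variance of EstFrac over all repetitions: {spread:.4f}")
```

Changing sample_size in the call gives a quick way to see how the spread of EstFrac responds to a larger poll.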
Summary of Random Variables
Before the experiment is conducted
• Bad news. What we do not know: We cannot determine the numerical value of the
random variable with certainty.
• Good news. What we do know: On the other hand, we can often calculate the
random variable’s probability distribution telling us how likely it is for the random
variable to equal each of its possible numerical values.
Relative Frequency Interpretation of Probability: After many, many repetitions of the
experiment:
• The distribution of the numerical values from the experiments mirrors the random
variable’s probability distribution; the two distributions are identical.
Distribution of the Numerical Values
    ↓ After many, many repetitions
Probability Distribution
• The distribution mean and variance describe the general properties of the random
variable:
o The mean reflects the center of the distribution; more specifically, the mean
equals the average of the numerical values after many, many repetitions.
o The variance reflects the spread of the distribution.
Mean of the Numerical Values
    ↓ After many, many repetitions
Mean of Probability Distribution for One Repetition

Variance of Numerical Values
    ↓ After many, many repetitions
Variance of Probability Distribution for One Repetition