Amherst College
Department of Economics
Economics 360
Fall 2012

Monday, September 10 Handout: Random Processes, Probability, Random Variables, and Probability Distributions

Preview
• Random Processes and Probability
  o Random Process: A process whose outcome cannot be predicted with certainty.
  o Probability: The likelihood of a particular outcome of a random process.
• Random Variable: A variable that is associated with an outcome of a random process; a variable whose numerical value cannot be determined beforehand.
  o Discrete Random Variables and Probability Distributions
    Probability Distribution: Describes the probability for all possible values of a random variable.
    A Random Variable’s Bad News and Good News.
    Relative Frequency Interpretation of Probability: When a random process is repeated many, many times, the relative frequency of an outcome equals its probability.
  o Describing a Probability Distribution
    Center of the Distribution: Mean
    Spread of the Distribution: Variance
  o Continuous Random Variables and Probability Distributions
• Estimation Procedures
  o Clint’s Dilemma: Assessing Clint’s Political Prospects
  o Center of an Estimate’s Probability Distribution: Mean
  o Spread of an Estimate’s Probability Distribution: Variance

Random Processes and Probability
Experiment: Random card draw from a deck composed of the 2♣, 3♥, 3♦, and 4♥.
• Shuffle the 4 cards thoroughly.
• Draw one card and record it.
• Replace the card.

Computing Probabilities
There is ___ chance in ___ of drawing the 2♣; therefore, Prob[2♣] = ____.
There is ___ chance in ___ of drawing the 3♥; therefore, Prob[3♥] = ____.
There is ___ chance in ___ of drawing the 3♦; therefore, Prob[3♦] = ____.
There is ___ chance in ___ of drawing the 4♥; therefore, Prob[4♥] = ____.

Random Variable: A variable whose value _________ be predicted beforehand with certainty.
• A discrete random variable can only take on a countable number of ____________ values.
• A continuous random variable can take on a _____________________________ of values.

Discrete Random Variables and Probability Distributions
An Example: Define the random variable v:
  v = “Value” of the selected card: 2, 3, or 4.

Probability Distribution of Numerical Values
Question: What do we know about v beforehand?
Answer: While we cannot determine the value of v beforehand, we can calculate its probability distribution.

Card Drawn     v     Prob[v]
2♣             2     ____ = ____
3♥ or 3♦       3     _______ = ____ = ____
4♥             4     ____ = ____

[Figure: probability distribution of v. Horizontal axis: v = 2, 3, 4. Vertical axis ticks: .25 and .50.]

NB: The probabilities must sum to ___. Why?

A Random Variable’s Bad News and Good News: Beforehand, that is, before the experiment is conducted:
• Bad News: We cannot determine the numerical value of the random variable with certainty.
• Good News: On the other hand, we can often calculate the random variable’s probability distribution telling us how likely it is for the random variable to equal each of its possible numerical values.

Card Draw Simulation: Illustrating the Relative Frequency Interpretation of Probability
Default specification: 2♣, 3♥, 3♦, and 4♥. Repetitions > 1,000,000:

Value     Relative Frequency
2         _______
3         _______
4         _______

Question: How are probabilities and relative frequencies related?

[Simulation window: a list of cards (2 of Hearts through 4 of Hearts) for selecting the cards to be in the deck; Start, Stop, and Pause buttons; a Repetitions counter; a histogram of numerical values (v = 2, 3, 4; ticks at .25 and .50); and displays for the card drawn in this repetition, Value (value of the card drawn in this repetition), Mean (mean (average) of the numerical values of the cards drawn from all repetitions), and Var (variance of the numerical values of the cards drawn from all repetitions).]

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable’s probability distribution.
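The relative frequency interpretation can be illustrated with a short simulation. The following is a minimal sketch, not the handout's own simulation software: the function name `simulate_card_draws`, the seed, and the choice of 100,000 repetitions are assumptions made for illustration.

```python
import random
from collections import Counter


def simulate_card_draws(repetitions, seed=0):
    """Draw one card, with replacement, from the four-card deck many times.

    Returns the relative frequency of each numerical value (2, 3, 4).
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    deck_values = [2, 3, 3, 4]         # values of the 2♣, 3♥, 3♦, and 4♥
    counts = Counter(rng.choice(deck_values) for _ in range(repetitions))
    return {value: counts[value] / repetitions for value in (2, 3, 4)}


relative_frequencies = simulate_card_draws(100_000)
for value, frequency in relative_frequencies.items():
    print(value, round(frequency, 3))
```

With this many repetitions the printed relative frequencies should sit close to the probabilities computed in the table, and increasing `repetitions` should bring them closer still.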
Question: How can we describe the general properties of a random variable; that is, how can we describe the probability distribution of a random variable?
• Center of its probability distribution: Mean
• Spread of its probability distribution: Variance

Center of the Probability Distribution: Mean (Expected Value) of the Random Variable – The average of the numerical values of v after many, many repetitions of the experiment.
NB: The mean of a random variable is often called the expected value.
After many, many repetitions v will be
• 2 about a quarter of the time
• 3 about a half of the time
• 4 about a quarter of the time
On average, the outcome, v, will be _____.

More formally,
$\text{Mean}[v] = \sum_{\text{all } v} v \cdot \text{Prob}[v]$
For each possible value, multiply the value and its probability; then, add.

             v = 2             v = 3             v = 4
Mean[v]  =  _____ × _____  +  _____ × _____  +  _____ × _____
         =  ________  +  ________  +  ________
         =  ____________

Spread of the Probability Distribution: Variance of the Random Variable – The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
• For each possible value of the random variable, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.

Card Drawn     v     Mean[v]     Deviation From Mean[v]     Squared Deviation     Prob[v]
2♣             2     3           _______                    _______               1/4 = .25
3♥ or 3♦       3     3           _______                    _______               1/2 = .50
4♥             4     3           _______                    _______               1/4 = .25

$\text{Var}[v] = \sum_{\text{all } v} (v - \text{Mean}[v])^2 \cdot \text{Prob}[v]$
For each possible value, multiply the squared deviation and its probability; then, add.

            v = 2             v = 3             v = 4
Var[v]  =  _____ × _____  +  _____ × _____  +  _____ × _____
        =  ________  +  ________  +  ________
        =  ____________

NB: The distribution mean and variance are general properties of the random variable:
• The mean represents the center of the random variable’s distribution.
• The variance represents the spread of the random variable’s distribution.
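The two recipes above (multiply each value by its probability and add; multiply each squared deviation by its probability and add) can be sketched directly in code. This is an illustrative sketch only; the helper names `mean` and `variance` and the dictionary `distribution` are assumptions, and exact fractions are used so that no rounding enters the arithmetic.

```python
from fractions import Fraction

# Probability distribution of v for the deck 2♣, 3♥, 3♦, 4♥
distribution = {2: Fraction(1, 4), 3: Fraction(1, 2), 4: Fraction(1, 4)}


def mean(dist):
    """Sum of value × probability over all possible values."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Sum of squared deviation from the mean × probability."""
    center = mean(dist)
    return sum((value - center) ** 2 * prob for value, prob in dist.items())


mean_v = mean(distribution)
var_v = variance(distribution)
print(mean_v, var_v)  # prints: 3 1/2
```

The same two helpers apply to any discrete distribution supplied as a mapping from values to probabilities, provided the probabilities sum to 1.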
Card Draw Simulation: Checking Our Math
Default specification: The 2♣, 3♥, 3♦, and 4♥ are included in a deck of four cards.

Repetitions > 1,000,000     Mean ______     Variance ______

After many, many repetitions of the experiment:
• The mean reflects the center of the distribution; more specifically, the mean equals the average of the numerical values after many, many repetitions of the experiment.
• The variance reflects the spread of the distribution.

NB: Value of Simulations: By exploiting the relative frequency interpretation of probability (after many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable’s probability distribution), we can use simulations to reveal the probability distribution. That is, simulations allow us to confirm our logic.

Continuous Random Variables and Probability Distributions
An Example: Dan Duffer
• Good news: Dan Duffer consistently hits 200 yard drives from the tee.
• Bad News: His drives can land up to 40 yards to the left and up to 40 yards to the right of his target point.
• Suppose that Dan’s target point is the center of the fairway.
• The fairway is 32 yards wide 200 yards from the tee.

[Figure: “Eighteen Hole” diagram with labels Tee, Target, 200 yards, Fairway (32 yards), Left Rough, Right Rough, and Lake.]

Let v equal the lateral distance from Dan’s target point. A negative v indicates that the drive went to the left; a positive v indicates that the drive went to the right.

A continuous random variable, unlike a discrete random variable, can take on a continuous range of values, a ______________ of values.
v is a ______________ random variable.

[Figure: Probability Distribution of v. Vertical axis ticks: .005, .010, .015, .020, .025. Horizontal axis: v from −40 to 40.]

What does v’s probability distribution suggest?

What is the area beneath the probability distribution? Applying the equation for the area of a triangle:
Area Beneath  =  ___________  +  ___________
              =  ___________  +  ___________
              =  ___________
What does this imply?
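The area question can be checked numerically. The sketch below assumes, based on the axis ticks and the reference to the area of a triangle, that the distribution is a symmetric triangle peaking at .025 when v = 0 and falling linearly to zero at v = −40 and v = +40; the handout does not state this shape in words, so treat it as an assumption. The function names `density`, `triangle_area`, and `prob_between` are hypothetical.

```python
def density(v, half_width=40.0, peak=0.025):
    """Assumed triangular density: peak at v = 0, zero once |v| >= half_width."""
    if abs(v) >= half_width:
        return 0.0
    return peak * (1.0 - abs(v) / half_width)


def triangle_area(base, height):
    """Area of a triangle: one half × base × height."""
    return 0.5 * base * height


def prob_between(a, b, steps=10_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


# Area beneath the distribution: left triangle plus right triangle
area_left = triangle_area(40.0, density(0.0))
area_right = triangle_area(40.0, density(0.0))
total_area = area_left + area_right

# Independent numerical check over the whole range of v
numeric_area = prob_between(-40.0, 40.0)
print(total_area, round(numeric_area, 6))
```

Under the assumed shape, `prob_between` can also be applied to narrower intervals of v, since for a continuous random variable a probability corresponds to an area beneath the distribution.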
Let us now calculate some probabilities:
• What is the probability that Dan’s drive will land in the left rough?
  Prob[Drive in Left Rough] = Prob[v Less Than −16] = ____________________ = ______
• What is the probability that Dan’s drive will land in the lake?
  Prob[Drive in Lake] = Prob[v Greater Than +16] = ____________________ = ______
• What is the probability that Dan’s drive will land in the fairway?
  Prob[Drive in Fairway] = Prob[v Between −16 and +16] = ____________________ = ______

Prob[Drive in Left Rough] + Prob[Drive in Lake] + Prob[Drive in Fairway] = _____ + _____ + _____ = _____
What does this imply?

Clint Ton’s Dilemma
On the day before the election, Clint must decide whether or not to hold a pre-election party:
• If he is comfortably ahead, he will not hold the party; he will save his campaign funds for a future political endeavor (or perhaps a vacation to the Caribbean next January).
• If he is not comfortably ahead, he will fund a party to try to sway some voters.
There is not enough time to poll every member of the student body, however. What should he do?

Econometrician’s Philosophy: If you lack the information to determine the value directly, do the best you can by estimating the value using the information you do have.

Clint’s Opinion Poll: Poll a sample of the population
• Questionnaire: Are you voting for Clint?
• Procedure: Clint selects 16 students at random and poses the question.
• Results: 12 students report that they will vote for Clint and 4 against Clint.

Estimated Fraction of the Population Supporting Clint = 12/16 = 3/4 = .75

Clint wishes to use the information collected from the sample to draw inferences about the entire population. Seventy-five percent, .75, of those polled support Clint. This suggests that Clint leads, does it not?

Clint’s Dilemma: Should Clint be confident that he has the election in hand or should he fund the party?
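One way to start thinking about Clint's question is to ask how often a poll of 16 would show 12 or more supporters if the election were really a toss-up. The sketch below makes two assumptions that are not stated at this point in the handout: that the actual fraction supporting Clint is .5, and that the 16 responses are independent (as they would be if each student were drawn with replacement). The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k_min, n, p):
    """Probability of k_min or more supporters among n independent responses,
    each supporting Clint with probability p (binomial tail)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


est_frac = 12 / 16                       # fraction of the sample supporting Clint
tail = prob_at_least(12, 16, 0.5)        # assumed toss-up: p = .5
print(est_frac, round(tail, 4))          # prints: 0.75 0.0384
```

Under these assumptions a result of 12 or more out of 16 is uncommon but far from impossible, which is why a single poll result should be weighed with care.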
Polling Simulation: Learning More about Clint’s Polling Procedure
Questionnaire: Are you voting for Clint?

Terms
  ActFrac = Actual Fraction of the Population Supporting Clint
  EstFrac = Estimated Fraction of the Population Supporting Clint

To decide how much confidence Clint should have, we shall learn a little more about the polling procedure. A simulation will help us.

[Simulation window: selection lists for ActFrac (Actual Population Fraction: .1, .2, .3, .4, .5, .6, .7) and Sample Size (10, 16, 25, 50); Start, Stop, and Pause buttons; and displays for Repetition, EstFrac (numerical value of the estimated fraction in this repetition), Mean (mean (average) of the numerical values of the sample fraction from all repetitions), and Var (variance of the numerical values of the sample fraction from all repetitions).]

In a simulation, we can do something that we cannot do in the real world. We can specify the actual proportion of the population, ActFrac, and then observe the estimated fraction, EstFrac, when we conduct a poll. In this way, we can learn more about the polling procedure itself. To do so, suppose that the election is a toss-up; that is, suppose that the actual population fraction supporting Clint, ActFrac, equals .5.

Sample Size = 16     ActFrac = .50

Repetition     Number Supporting Clint     EstFrac
1              ______                      ______
2              ______                      ______
3              ______                      ______
4              ______                      ______
5              ______                      ______

Observations:
• The estimated fraction, EstFrac, is a random variable. Even if we knew the actual fraction supporting Clint, ActFrac, we could not predict EstFrac before the poll.
• Only occasionally does the estimated fraction, EstFrac, in one repetition of the poll equal the actual population fraction.
• When the election is actually a toss-up, it is entirely possible that 12 or even more of the 16 students polled will support Clint.

Populations and Samples: Estimates and Actual Values
Question: How can sample information be used to draw inferences about the entire population? This is the question Clint must address. We begin with an unrealistic, but instructive, example.
So, please be patient.

Sample Size of One
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a 3×5 card, then
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.

The random variable v:
  v = 1 if the individual polled supports Clint
    = 0 otherwise

Question: Can we determine with certainty the numerical value of v before the experiment is conducted? ______. Hence, v is a ______________ variable.
Question: What can we say about the random variable v beforehand?
Answer: _______________________________________________.
Question: How can we describe the probability distribution?
Answer: _______________________________________________.

For the moment, continue to assume that the population is split evenly; that is, suppose that half the population supports Clint and half does not:

Individual’s Response     v     Prob[v]
For Clint                 1     ______
Not for Clint             0     ______

[Diagram: Individual branches to For Clint (v = 1, Prob ____) or Not for Clint (v = 0, Prob ____).]

Center of the Probability Distribution: Mean. The average of the numerical values after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
• 1 about half of the time
• 0 about half of the time
On average, what will the numerical value of v equal? _____.

$\text{Mean}[v] = \sum_{\text{all } v} v \cdot \text{Prob}[v]$
For each possible value, multiply the value and its probability; then, add.

             v = 1             v = 0
Mean[v]  =  _____ × _____  +  _____ × _____
         =  ________  +  ________
         =  ____________

Spread of the Probability Distribution: Variance. The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
• For each possible value, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.
Individual’s Response     v     Mean[v]     Deviation From Mean[v]     Squared Deviation     Prob[v]
For Clint                 1     ___         ______ = ___               ___                   1/2
Not for Clint             0     ___         ______ = ___               ___                   1/2

$\text{Var}[v] = \sum_{\text{all } v} (v - \text{Mean}[v])^2 \cdot \text{Prob}[v]$
For each possible value, multiply the squared deviation and its probability; then, add.

            v = 1             v = 0
Var[v]  =  _____ × _____  +  _____ × _____
        =  ________  +  ________
        =  ____________

Opinion Poll Simulation – Sample Size of One: Checking Our Math
Actual Population Fraction = ActFrac = p = 1/2 = .50

Equations:
  Mean of v’s Probability Distribution: ______
  Variance of v’s Probability Distribution: _____
Simulation:
  Simulation Repetitions: __________
  Mean (Average) of Numerical Values of v from the Experiments: ≈_____
  Variance of Numerical Values of v from the Experiments: ≈_____

Conclusion: Our equations and simulation produce nearly identical results. Again, this illustrates how we can exploit the relative frequency interpretation of probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable’s probability distribution.

Generalization: Let p = ActFrac = Actual fraction of the population supporting Clint
Consider the experiment: Write the name of each individual in the population on a 3×5 card

Individual’s Response     v     Prob[v]
For Clint                 1     ______
Not for Clint             0     ______

[Diagram: Individual branches to For Clint (v = 1, Prob ____) or Not for Clint (v = 0, Prob ____).]

Center of the Probability Distribution: Mean. The average of the numerical values after many, many repetitions of the experiment.
After many, many repetitions of the experiment, v will equal
• 1, _____ of the time
• 0, _____ of the time

$\text{Mean}[v] = \sum_{\text{all } v} v \cdot \text{Prob}[v]$
For each possible value, multiply the value and its probability; then, add.

             v = 1            v = 0
Mean[v]  =  ____ × _____  +  ____ × _____
         =  _________  +  ________
         =  ____________

Spread of the Probability Distribution: Variance.
The average of the squared deviations of the numerical values from their mean after many, many repetitions of the experiment:
• For each possible value, calculate the deviation from the mean;
• Square each value’s deviation;
• Multiply each value’s squared deviation by the value’s probability;
• Sum the products.

Individual’s Response     v     Mean[v]     Deviation From Mean[v]     Squared Deviation     Prob[v]
For Clint                 1     ____        ____                       ____                  p
Not for Clint             0     ____        ____                       ____                  1−p

$\text{Var}[v] = \sum_{\text{all } v} (v - \text{Mean}[v])^2 \cdot \text{Prob}[v]$
For each possible value, multiply the squared deviation and its probability; then, add.

            v = 1                v = 0
Var[v]  =  ________ × _____  +  ________ × _____
        =  ________________________________________
        =  ________________________________________
        =  ___________

Sample Size of Two
Questionnaire: Are you voting for Clint?
Experiment: Write the name of each individual in the population on a card
• In the first stage:
  o Thoroughly shuffle the cards.
  o Randomly draw one card.
  o Ask that individual if he/she supports Clint and record the answer; this yields a specific numerical value of v1 for the random variable. v1 equals 1 if the first individual polled supports Clint; 0 otherwise.
  o Replace the card.
• In the second stage, the procedure is repeated:
  o Thoroughly shuffle the cards.
  o Randomly draw one card.
  o Ask that individual if he/she supports Clint and record the answer; this yields a specific numerical value of v2 for the random variable. v2 equals 1 if the second individual polled supports Clint; 0 otherwise.
  o Replace the card.
• Calculate the fraction of those polled supporting Clint.

Fraction of Sample Supporting Clint, Estimated Fraction:
$\text{EstFrac} = \frac{v_1 + v_2}{2} = \frac{1}{2}(v_1 + v_2)$

The estimated fraction of the population supporting Clint is a random variable; that is, EstFrac is a random variable. We cannot determine with certainty the numerical value of the estimated fraction, EstFrac, before the experiment is conducted.

Question: What can we say about the random variable EstFrac beforehand?
Answer: We can describe its probability distribution.
Question: How can we describe the probability distribution?
Answer: Compute its center (mean) and spread (variance).

Center of the Estimated Fraction’s Probability Distribution: Mean.
$\text{Mean}[\text{EstFrac}] = \text{Mean}[\frac{1}{2}(v_1 + v_2)]$

What do we know?
$\text{Mean}[v_1] = \text{Mean}[v] = p$
$\text{Mean}[v_2] = \text{Mean}[v] = p$

Arithmetic of Means:
$\text{Mean}[cx] = c\,\text{Mean}[x]$
$\text{Mean}[x + y] = \text{Mean}[x] + \text{Mean}[y]$

Mean[EstFrac]  =  ____________________          (applying Mean[cx] = c Mean[x])
               =  _______________________       (applying Mean[x + y] = Mean[x] + Mean[y])
               =  ______________________________________________
               =  ___________________
               =  _____

Spread of the Estimated Fraction’s Probability Distribution: Variance.
$\text{Var}[\text{EstFrac}] = \text{Var}[\frac{1}{2}(v_1 + v_2)]$

What do we know?
$\text{Var}[v_1] = \text{Var}[v] = p(1 - p)$
$\text{Var}[v_2] = \text{Var}[v] = p(1 - p)$

Arithmetic of Variances:
$\text{Var}[cx] = c^2\,\text{Var}[x]$
$\text{Var}[x + y] = \text{Var}[x] + 2\,\text{Cov}[x, y] + \text{Var}[y]$

Var[EstFrac]  =  ______________                 (applying Var[cx] = c² Var[x])
              =  ___________________________    (applying Var[x + y] = Var[x] + 2Cov[x, y] + Var[y])
              =  ______________                 (v1 and v2 are independent: Cov[v1, v2] = 0)
              =  ________________________
              =  ________________________
              =  __________

Question: Why are v1 and v2 independent?
Answer:
• Since the first card drawn is replaced, whether or not the first voter polled supports Clint does not affect the probability that the second voter will support Clint.
• In either case, the probability that the second voter will support Clint is p, the actual population fraction.
• Consequently, knowing the value of v1 does not help us predict the value of v2.
More formally, the numerical value of v1 does not affect v2’s probability distribution and vice versa. The random variables are independent. Hence, their covariance equals 0.
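The mean and variance of EstFrac for a sample of two can also be obtained by brute force: list every (v1, v2) pair, attach its probability, and apply the definitions of mean and variance. The sketch below is illustrative only; it assumes independent draws (as argued above), and the names `est_frac_distribution`, `mean`, and `variance` are hypothetical. Exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction
from itertools import product


def est_frac_distribution(p):
    """Exact probability distribution of EstFrac = (v1 + v2)/2,
    assuming v1 and v2 are independent and each equals 1 with probability p."""
    dist = {}
    for v1, v2 in product((0, 1), repeat=2):
        prob = (p if v1 else 1 - p) * (p if v2 else 1 - p)
        est_frac = Fraction(v1 + v2, 2)
        dist[est_frac] = dist.get(est_frac, 0) + prob
    return dist


def mean(dist):
    """Sum of value × probability over all possible values."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Sum of squared deviation from the mean × probability."""
    center = mean(dist)
    return sum((value - center) ** 2 * prob for value, prob in dist.items())


p = Fraction(1, 2)                      # the toss-up case used in the handout
dist = est_frac_distribution(p)
mean_est = mean(dist)
var_est = variance(dist)
print(mean_est, var_est)                # prints: 1/2 1/8
```

For any p the enumeration agrees with what the arithmetic of means and variances gives: a mean of p and a variance of p(1 − p)/2. Changing `p` to another fraction, such as `Fraction(3, 4)`, is a quick way to check this.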
Opinion Poll Simulation – Sample Size of Two: Checking Our Math
Actual Population Fraction = ActFrac = p = 1/2 = .50

Equations:
  Sample Size: 2
  Mean of EstFrac’s Probability Distribution: _______
  Variance of EstFrac’s Probability Distribution: _____
Simulations:
  Simulation Repetitions: ________
  Mean (Average) of Numerical Values of EstFrac from the Experiments: ≈_____
  Variance of Numerical Values of EstFrac from the Experiments: ≈_____

Conclusion: Our equations and simulation produce nearly identical results. Again, this illustrates how we can exploit the relative frequency interpretation of probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the random variable’s probability distribution.

Summary of Random Variables
Before the experiment is conducted
• Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty.
• Good news. What we do know: On the other hand, we can often calculate the random variable’s probability distribution telling us how likely it is for the random variable to equal each of its possible numerical values.

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment:
• The distribution of the numerical values from the experiments mirrors the random variable’s probability distribution; as the number of repetitions grows, the two distributions become essentially identical.

  Distribution of the Numerical Values
    ↓ After many, many repetitions
  Probability Distribution

• The distribution mean and variance describe the general properties of the random variable:
  o The mean reflects the center of the distribution; more specifically, the mean equals the average of the numerical values after many, many repetitions.
  o The variance reflects the spread of the distribution.

  Mean of the Numerical Values
    ↓ After many, many repetitions
  Mean of Probability Distribution for One Repetition

  Variance of Numerical Values
    ↓ After many, many repetitions
  Variance of Probability Distribution for One Repetition
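The summary can be illustrated by simulating the sample-size-two poll and comparing the average and variance of the simulated EstFrac values with the mean and variance of its probability distribution. This is a minimal sketch rather than the handout's simulation software; the function name `simulate_est_frac`, the seed, and the choice of 100,000 repetitions are assumptions.

```python
import random


def simulate_est_frac(act_frac, sample_size, repetitions, seed=0):
    """Repeat the poll many times; each repetition draws `sample_size`
    individuals with replacement and records the fraction supporting Clint."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    results = []
    for _ in range(repetitions):
        supporters = sum(rng.random() < act_frac for _ in range(sample_size))
        results.append(supporters / sample_size)
    return results


est_fracs = simulate_est_frac(act_frac=0.5, sample_size=2, repetitions=100_000)

# Mean and variance of the numerical values from all repetitions
sim_mean = sum(est_fracs) / len(est_fracs)
sim_var = sum((x - sim_mean) ** 2 for x in est_fracs) / len(est_fracs)
print(round(sim_mean, 3), round(sim_var, 3))
```

With p = .5 and a sample of two, the simulated mean should land near .5 and the simulated variance near .125, the values implied by a mean of p and a variance of p(1 − p)/2; the match is approximate and tightens as `repetitions` grows.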