Introduction to Probability
Michael Ash
Lecture 2

Flexibility of the Program Evaluation Model

Follow-up Remarks

Discrimination can be mapped into the Program Evaluation Model. Consider “being non-white” (or, alternatively, “being white”) as the treatment and evaluate an outcome, e.g., hourly wage or the probability of receiving a home-mortgage loan, for treated and non-treated.

Discrete and Continuous Variables

Continuous variables can be easily converted into discrete variables. Consider:

Household Income (continuous variable)    Income Category
between 0 and 12,000                      1
12,001 and 25,000                         2
25,001 and 50,000                         3
50,001 and up                             4

For better or for worse, this mapping throws away some information: $12,001 is not the same as $25,000, but both incomes are in category 2. The reverse conversion is not possible.

Why study probability?

1. Understand and respond to a risky world (Public Policy and Administration)
   - Social Insurance (Social Security, Medicare, Medicaid, Unemployment Insurance)
   - Private Insurance (Employment-Based Health Insurance, Life Insurance)
   - Natural disasters, e.g., administrator faces risk of snow storms and costs of snow removal; floods, insurance, and permitting
   - Risky or uncertain costs, e.g., heating fuel
2. Application to statistical inference
   - Ability to discern chance outcomes from true program effects

Key concepts in probability

Some examples to motivate
- Consider the computer crash example, Table 2.1 and Figure 2.1
- Alternative example (using the same data): number of winter storms that require overtime snow removal during one winter

Key Concepts

Outcomes: the mutually exclusive results of a random process
Probability: the proportion of the time that an outcome occurs, in the long run. (Each time, only one outcome can be realized; therefore probability is never seen in a single outcome but only manifests itself over time or instances.)
Sample space: the set of all possible outcomes
Event: a subset of the sample space (one or more outcomes)
Random variable (R.V.):
a numerical summary of a random outcome
- Discrete, e.g., number of snow storms requiring overtime
- Continuous, e.g., the dollar value of overtime paid

Probability Distribution

Probability Distribution of a Discrete R.V.
- Lists each of the possible outcomes and the probability that each will occur
- The probabilities must sum to one. (Some outcome must occur.)
- Notation: Pr(M = 0)
- Graphical representation with a bar chart

Probability of an Event

The probability of an event is the sum of the probabilities of each of the mutually exclusive outcomes in the event.

Examples with the R.V. M, the number of computer crashes (snow storms).
- Probability of not more than one failure:
  Pr(M = 0 or M = 1) = Pr(M = 0) + Pr(M = 1)
- Probability of at least one failure:
  Pr(M = 1 or M = 2 or M = 3 or M = 4) = Pr(M = 1) + Pr(M = 2) + Pr(M = 3) + Pr(M = 4)

Alternative approach? “At least one failure” is the complement of “No failures”:
  Pr(at least one failure) = 1 − Pr(no failures) = 1 − Pr(M = 0)

Cumulative Probability Distribution (C.D.F.)

Defn: the probability that the random variable is less than or equal to a particular value.
Example: the probability of at most one crash, Pr(M ≤ 1)

Probability Distribution of a Continuous R.V.

Example: Commuting Time. The commute “data”: imagine collecting commuting time on 100 randomly chosen days (Figure 2.2)

Date    Commute Time
1       18.2
2       14.3
3       15.2
...     ...
99      21.1
100     15.7

Cumulative Probability Distribution (C.D.F.)

Same Defn: the probability that the random variable is less than or equal to a particular value.

Rank    Commute Time    Date
1       12.3            17
2       12.3            11
3       12.4            62
...     ...             ...
19      14.8            33
20      15.0            33
21      15.1            33
...     ...             ...
77      19.9            73
78      20.0            73
79      21.7            73
...     ...             ...
99      31.2            44
100     33.1            89

- None (0 percent) of the commutes were as short as 10 minutes.
- None (0 percent) of the commutes were as short as 12 minutes.
- 20 percent of the commutes were 15 minutes or shorter.
- 78 percent of the commutes were 20 minutes or shorter.
- All (100 percent) of the commutes were shorter than 35 minutes.

The CDF plots this relationship.

Using the CDF, we can compute the probability that the commute falls into any range of times, e.g., between 15 minutes and 20 minutes:
  Pr(15 ≤ C ≤ 20) = Pr(C ≤ 20) − Pr(C ≤ 15) = 0.78 − 0.20 = 0.58

From Histogram to Probability Density Function (p.d.f.)

Histograms
- Create equal-sized bins for different length commutes and put a token in the appropriate bin for each commute. This would generate a histogram of commute times.
- It is extremely important to understand what a histogram is and how to construct one.

PDF
A continuous version of the histogram (infinitely many, infinitely thin bins) is the Probability Density Function (p.d.f.). Heights on the CDF correspond to areas on the pdf.

Expected Value, Expectation, or Mean

The symbols for the expected value of $Y$ are $E(Y)$ or $\mu_Y$ (Greek letter mu).

Expected value: the long-run average value of a random variable over many repeated trials.
Weighted average: each outcome is “weighted” by the probability of that outcome. Low-probability outcomes (low $p_i$) receive low weight.

Expected value examples

Compute the expected value for the computer crash (snow storms) example.

\[
\begin{aligned}
E(M) &= \sum_{i=1}^{k} y_i \times p_i \\
     &= y_1 \times p_1 + y_2 \times p_2 + y_3 \times p_3 + y_4 \times p_4 + y_5 \times p_5 \\
     &= 0 \times 0.80 + 1 \times 0.10 + 2 \times 0.06 + 3 \times 0.03 + 4 \times 0.01 \\
     &= 0.35
\end{aligned}
\]

Expected value examples, continued

The Bernoulli Distribution

Defn: the distribution of a random variable that has a binary outcome, that is, “0 or 1” or “no or yes.”

\[
G = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}
\]

Compute the expected value for a Bernoulli example: the probability that a senior in the Hill Towns will use a home assistance program offered by the Hill Towns Elder Network (HEN).

\[
G = \begin{cases} 1 & \text{with probability } 0.45 \\ 0 & \text{with probability } 0.55 \end{cases}
\]

To compute the expected value:

\[
\begin{aligned}
E(G) &= \sum_{i=1}^{k} y_i \times p_i \\
     &= y_1 \times p_1 + y_2 \times p_2 \\
     &= 0 \times (1 - p) + 1 \times p \\
     &= 0 \times 0.55 + 1 \times 0.45 \\
     &= 0.45
\end{aligned}
\]

Variance and Standard Deviation

Measuring Spread, or Dispersion

Variance.
Variance is also a mean: the expected value of the square of the deviation of $Y$ from its mean. (In some contexts called the “mean square deviation.”)

\[
\begin{aligned}
\sigma_Y^2 \equiv \operatorname{var}(Y) &\equiv E\left[(Y - \mu_Y)^2\right] \\
  &\equiv \sum_{i=1}^{k} (Y_i - \mu_Y)^2 \, p_i \\
  &\equiv (Y_1 - \mu_Y)^2 p_1 + (Y_2 - \mu_Y)^2 p_2 + \cdots + (Y_k - \mu_Y)^2 p_k
\end{aligned}
\]

Standard deviation is the square root of the variance.

Main advantage: standard deviation is measured in the same units, e.g., dollars, inches, computer crashes, snow storms, as $Y$ and $\mu_Y$. (Variance is measured in the square of these units, which is hard to interpret.)

Example: variance and s.d. of computer crashes (snow storms)

\[
\begin{aligned}
\sigma_M^2 &\equiv \sum_{i=1}^{k} (M_i - \mu_M)^2 \, p_i \\
  &\equiv (M_1 - \mu_M)^2 p_1 + (M_2 - \mu_M)^2 p_2 + (M_3 - \mu_M)^2 p_3 + \cdots + (M_k - \mu_M)^2 p_k \\
  &= (0 - 0.35)^2 \times 0.80 + (1 - 0.35)^2 \times 0.10 + (2 - 0.35)^2 \times 0.06 \\
  &\quad + (3 - 0.35)^2 \times 0.03 + (4 - 0.35)^2 \times 0.01 \\
  &= 0.6475
\end{aligned}
\]

\[
\sigma_M = \sqrt{\sigma_M^2} = \sqrt{0.6475} \approx 0.80
\]

Variance and s.d. of a Bernoulli Distribution

\[
\begin{aligned}
\sigma_G^2 &\equiv \sum_{i=1}^{k} (G_i - \mu_G)^2 \, p_i \\
  &\equiv (G_1 - \mu_G)^2 p_1 + (G_2 - \mu_G)^2 p_2 \\
  &= (0 - p)^2 \times (1 - p) + (1 - p)^2 \times p \\
  &= p(1 - p)
\end{aligned}
\]

\[
\sigma_G = \sqrt{\sigma_G^2} = \sqrt{p(1 - p)}
\]

Example: variance and s.d. of a Bernoulli Distribution

$G$ is the Bernoulli-distributed HEN-use variable.

\[
\sigma_G = \sqrt{\sigma_G^2} = \sqrt{p(1 - p)} = \sqrt{0.45 \times 0.55} = \sqrt{0.2475} \approx 0.4975
\]

Mean and Variance of Linear Functions of an R.V.

Linear Function of an R.V.

\[
Y = a + bX
\]

Examples
- After-Tax Earnings: $Y = 2000 + 0.8X$
- HEN: $W = 10 + 800G$

Principles

\[
E(Y) = E(a + bX) = E(a) + E(bX) = a + bE(X)
\]

or equivalently

\[
\mu_Y = a + b\mu_X
\]

\[
\begin{aligned}
\operatorname{var}(Y) &= \operatorname{var}(a + bX) \\
  &= E\left[(a + bX - E(a + bX))^2\right] \\
  &= E\left[(a - E(a) + bX - E(bX))^2\right] \\
  &= E\left[(b(X - E(X)))^2\right] \\
  &= b^2 E\left[(X - E(X))^2\right] \\
  &= b^2 \operatorname{var}(X)
\end{aligned}
\]

Examples

After-Tax Earnings

\[
\mu_Y = 2000 + 0.8\mu_X, \qquad \sigma_Y^2 = (0.8)^2 \sigma_X^2 = 0.64\,\sigma_X^2
\]

HEN

\[
\mu_W = 10 + 800\mu_G, \qquad \sigma_W^2 = (800)^2 \sigma_G^2 = 640000 \times 0.2475 = 158400
\]

\[
\sigma_W = \sqrt{640000 \times 0.2475} = 800 \times \sqrt{0.2475} \approx 397.99
\]

Exercise 2.4

The random variable $Y$ has a mean of 1 and a variance of 4. Let $Z = \tfrac{1}{2}(Y - 1)$. Compute $\mu_Z$ and $\sigma_Z^2$.
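
The expected-value, variance, and linear-function rules above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the helper names `mean` and `variance` and the dictionary layout are illustrative assumptions, while the probabilities are those of the computer-crash (snow storm) and HEN examples. It computes the moments from the definitions and then compares the shortcut formulas for W = 10 + 800G with a direct calculation on the distribution of W.

```python
import math

def mean(dist):
    """Expected value of a discrete R.V. given as {outcome: probability}."""
    return sum(y * p for y, p in dist.items())

def variance(dist):
    """Probability-weighted mean of squared deviations from the mean."""
    mu = mean(dist)
    return sum((y - mu) ** 2 * p for y, p in dist.items())

# Distributions taken from the lecture examples.
crashes = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # R.V. M
hen = {0: 0.55, 1: 0.45}                                  # Bernoulli R.V. G

mu_m, var_m = mean(crashes), variance(crashes)
mu_g, var_g = mean(hen), variance(hen)

# Linear function W = a + b*G, using the shortcut rules.
a, b = 10, 800
mu_w = a + b * mu_g
var_w = b ** 2 * var_g
sd_w = math.sqrt(var_w)

# Direct check: build the distribution of W and apply the definitions.
w_dist = {a + b * g: p for g, p in hen.items()}

print(f"E(M) = {mu_m:.4f}, var(M) = {var_m:.4f}, sd(M) = {math.sqrt(var_m):.4f}")
print(f"E(G) = {mu_g:.4f}, var(G) = {var_g:.4f}, sd(G) = {math.sqrt(var_g):.4f}")
print(f"E(W) = {mu_w:.2f}, var(W) = {var_w:.2f}, sd(W) = {sd_w:.2f}")
print(f"direct: E(W) = {mean(w_dist):.2f}, var(W) = {variance(w_dist):.2f}")
```

The two routes to the mean and variance of W should agree up to floating-point rounding, which is the point of the linear-function rules: the moments of W follow from the moments of G without rebuilding the distribution.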
Comments on the Expected Value

Advantages
- Small samples quickly generate precise estimates
- Easy to compute
- Easy to make statistical inferences
- Means are useful for budgeting and other aggregate outcomes

Critiques
- The mean may never actually occur, i.e., 0.35 computer crashes (snow storms) is not possible.
- Susceptible to outliers, e.g., the average income in our class if Bill Gates joined PubP&A 608.
- Ignores interesting outcomes, e.g., bimodality in welfare reform outcomes
- Other measures of central tendency: the median and mode
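
The outlier critique can be illustrated with a small sketch. The incomes below are made-up numbers chosen only for illustration (they are not data from the lecture), and the single very large value stands in for a hypothetical billionaire joining the class. The sketch uses Python's standard `statistics` module to compare how the mean and the median respond.

```python
import statistics

# Hypothetical annual incomes (dollars) for a small class; made-up numbers.
incomes = [32_000, 41_000, 45_000, 52_000, 60_000]

# Hypothetical very large income joining the class; also made up.
outlier = 1_000_000_000
with_outlier = incomes + [outlier]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"before: mean = {mean_before:,.0f}, median = {median_before:,.0f}")
print(f"after:  mean = {mean_after:,.0f}, median = {median_after:,.0f}")
```

Under these assumed numbers the mean jumps by several orders of magnitude while the median barely moves, which is why the median is often reported alongside the mean for skewed quantities such as income.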