Introduction to Probability
Michael Ash
Lecture 2
Flexibility of the Program Evaluation Model
Follow-up Remarks
Discrimination can be mapped into the Program Evaluation
Model. Consider “being non-white” (or, alternatively, “being
white”) as the treatment and compare an outcome, e.g., hourly
wage or the probability of receiving a home-mortgage loan, for the
treated and the non-treated.
Discrete and Continuous Variables
Continuous variables can be easily converted into discrete
variables. Consider:

Household income between    Income category
0 and 12,000                1
12,001 and 25,000           2
25,001 and 50,000           3
50,001 and up               4
For better or for worse, this mapping throws away some
information: $12,001 is not the same as $25,000 but both incomes
are in category 2.
The reverse conversion is not possible.
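As a minimal sketch of this mapping, the function below assigns an income to one of the four categories in the table. The function name `income_category` is hypothetical, and the code assumes incomes are recorded in whole dollars so that the category boundaries do not leave gaps.

```python
def income_category(income):
    """Map a household income (whole dollars, assumed) to category 1-4."""
    if income < 0:
        raise ValueError("income must be non-negative")
    if income <= 12_000:
        return 1
    if income <= 25_000:
        return 2
    if income <= 50_000:
        return 3
    return 4


# Two different incomes land in the same category, so information is lost.
for income in (12_001, 25_000):
    print(income, "->", income_category(income))
```

Because many incomes share one category, there is no function that recovers the original income from the category alone.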
Why study probability?
1. Understand and respond to a risky world (Public Policy and
   Administration)
   - Social Insurance (Social Security, Medicare, Medicaid,
     Unemployment Insurance)
   - Private Insurance (Employment-Based Health Insurance, Life
     Insurance)
   - Natural disasters, e.g., administrator faces risk of snow storms
     and costs of snow removal; floods, insurance, and permitting
   - Risky or uncertain costs, e.g., heating fuel
2. Application to statistical inference
   - Ability to discern chance outcomes from true program effects
Key concepts in probability
Some examples to motivate
- Consider the computer crash example, Table 2.1 and Figure 2.1
- Alternative example (using the same data): number of winter
  storms that require overtime snow removal during one winter
Key Concepts
Outcomes: the mutually exclusive results of a random process
Probability: the proportion of the time that an outcome occurs,
    in the long run (Each time, only one outcome can be
    realized; therefore probability is never seen in an
    outcome but only manifests itself over time or
    instances)
Sample space: the set of all possible outcomes
Event: a subset of the sample space (one or more outcomes)
Random variable (R.V.): a numerical summary of a random
    outcome
    Discrete: e.g., number of snow storms requiring overtime
    Continuous: e.g., the dollar value of overtime paid
Probability Distribution
Probability Distribution of a Discrete R.V.
- Lists each of the possible outcomes and the probability that
  each will occur
- The probabilities must sum to one. (Some outcome must
  occur.)
- Notation: Pr(M = 0)
- Graphical representation with a bar chart
Probability of an Event
The probability of an event is the sum of the probabilities of the
mutually exclusive outcomes that make up the event. Examples with
the R.V. M, the number of computer crashes (snow storms):
- Probability of not more than one failure:
  Pr(M = 0 or M = 1) = Pr(M = 0) + Pr(M = 1)
- Probability of at least one failure:
  Pr(M = 1 or M = 2 or M = 3 or M = 4) =
  Pr(M = 1) + Pr(M = 2) + Pr(M = 3) + Pr(M = 4)
Alternative approach?
“At least one failure” is the complement of “No failures”:
Pr(at least one failure) = 1 − Pr(no failures) = 1 − Pr(M = 0)
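The two routes to “at least one failure” can be checked numerically. The sketch below uses the crash probabilities that appear in the expected-value example later in this lecture (0.80, 0.10, 0.06, 0.03, 0.01); the names `crash_dist` and `prob_event` are illustrative, not part of the lecture.

```python
# Probability distribution of M, the number of crashes (values from the
# expected-value example in this lecture).
crash_dist = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def prob_event(dist, event):
    """Sum the probabilities of the mutually exclusive outcomes in the event."""
    return sum(p for outcome, p in dist.items() if outcome in event)


at_most_one = prob_event(crash_dist, {0, 1})
at_least_one_direct = prob_event(crash_dist, {1, 2, 3, 4})
at_least_one_complement = 1 - prob_event(crash_dist, {0})

print(round(at_most_one, 4))
print(round(at_least_one_direct, 4), round(at_least_one_complement, 4))
```

Both approaches agree up to floating-point rounding; the complement needs only one probability instead of four.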
Cumulative Probability Distribution (C.D.F.)
Definition: the probability that the random variable is less than or
equal to a particular value.
Example: the probability of at most one crash, Pr(M ≤ 1)
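For a discrete random variable, the cumulative distribution is a running sum of the probabilities. The following sketch assumes the crash probabilities used in the expected-value example of this lecture and builds the C.D.F. with the standard-library function `itertools.accumulate`.

```python
from itertools import accumulate

outcomes = [0, 1, 2, 3, 4]
probs = [0.80, 0.10, 0.06, 0.03, 0.01]  # Pr(M = m) for each outcome m

# cdf[m] holds Pr(M <= m): the running total of the probabilities.
cdf = dict(zip(outcomes, accumulate(probs)))

for m in outcomes:
    print(m, round(cdf[m], 4))
```

The entry for m = 1 is Pr(M ≤ 1), and the last entry must equal one because some outcome must occur.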
Probability Distribution of a Continuous R.V.
Example: Commuting Time. The commute “data”: imagine
collecting commuting time on 100 randomly chosen days (Figure
2.2)
Date    Commute Time (minutes)
1       18.2
2       14.3
3       15.2
...     ...
99      21.1
100     15.7
Cumulative Probability Distribution (C.D.F.)
Same definition: the probability that the random variable is less
than or equal to a particular value.
Rank    Commute Time    Date
1       12.3            17
2       12.3            11
3       12.4            62
...     ...             ...
19      14.8            33
20      15.0            33
21      15.1            33
...     ...             ...
77      19.9            73
78      20.0            73
79      21.7            73
...     ...             ...
99      31.2            44
100     33.1            89
- None (0 percent) of the commutes were as short as 10 minutes.
- None (0 percent) of the commutes were as short as 12 minutes.
- 20 percent of the commutes were 15 minutes or shorter.
- 78 percent of the commutes were 20 minutes or shorter.
- All (100 percent) of the commutes were shorter than 35 minutes.
The CDF plots this relationship.
Using the CDF, we can compute the probability that the commute
falls into any range of times, e.g., between 15 minutes and 20
minutes.
Pr(15 ≤ C ≤ 20) = Pr(C ≤ 20) − Pr(C ≤ 15) = 0.78 − 0.20 = 0.58
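The same subtraction can be sketched with an empirical C.D.F. The ten commute times below are a hypothetical sample made up for illustration (the lecture's full 100-day data set is not reproduced here), and `empirical_cdf` is an illustrative name.

```python
def empirical_cdf(sample, x):
    """Share of observations in the sample that are less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


# Hypothetical ten-day sample of commute times in minutes (not the lecture data).
commutes = [12.3, 14.8, 15.0, 15.1, 18.2, 19.9, 20.0, 21.7, 31.2, 33.1]

# Share of commutes longer than 15 and no longer than 20 minutes. For a truly
# continuous variable a single point such as exactly 15 has probability zero,
# so this matches Pr(15 <= C <= 20).
p_range = empirical_cdf(commutes, 20) - empirical_cdf(commutes, 15)
print(round(p_range, 4))
```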
From Histogram to Probability Density Function (p.d.f.)
Histograms
- Create equal-sized bins for different length commutes and put
  a token in the appropriate bin for each commute. This would
  generate a histogram of commute times.
- It is extremely important to understand what a histogram is
  and how to construct one.
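The bin-and-token construction can be sketched in a few lines. The sample is hypothetical (ten made-up commute times), and the bin start, width, and count are arbitrary choices for illustration; each bin covers values from its lower edge up to, but not including, its upper edge.

```python
def histogram(sample, start, width, n_bins):
    """Count how many observations fall into each equal-sized bin."""
    counts = [0] * n_bins
    for value in sample:
        bin_index = int((value - start) // width)
        if not 0 <= bin_index < n_bins:
            raise ValueError(f"{value} falls outside the bins")
        counts[bin_index] += 1  # put one token in the appropriate bin
    return counts


# Hypothetical ten-day sample of commute times in minutes (not the lecture data).
commutes = [12.3, 14.8, 15.0, 15.1, 18.2, 19.9, 20.0, 21.7, 31.2, 33.1]

counts = histogram(commutes, start=10, width=5, n_bins=5)
for i, count in enumerate(counts):
    low = 10 + 5 * i
    print(f"{low:>2}-{low + 5:<2} {'#' * count}")
```

Dividing each count by the sample size gives the share of commutes in each bin, which is the quantity the p.d.f. generalizes as the bins become thinner.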
PDF
A continuous version of the histogram (infinitely many, infinitely
thin bins) is the Probability Density Function (p.d.f.)
Heights on the CDF correspond to areas on the pdf.
Expected Value, Expectation, or Mean
The symbols for the expected value of Y are \(E(Y)\) or \(\mu_Y\)
(Greek letter mu).
Expected value: The long-run average value of a random variable
    over many repeated trials.
Weighted average: Each outcome is “weighted” by the probability
    of that outcome. Low-probability outcomes (low \(p_i\))
    receive low weight.
Expected value examples
Compute the expected value for the computer crash (snow storms)
example.
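\[
\begin{aligned}
E(M) &= \sum_{i=1}^{k} y_i \times p_i \\
     &= y_1 \times p_1 + y_2 \times p_2 + y_3 \times p_3 + y_4 \times p_4 + y_5 \times p_5 \\
     &= 0 \times 0.80 + 1 \times 0.10 + 2 \times 0.06 + 3 \times 0.03 + 4 \times 0.01 = 0.35
\end{aligned}
\]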
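The weighted-average computation above can be written as a short function. This is a sketch; the names `crash_dist` and `expected_value` are illustrative.

```python
# Outcomes y_i and probabilities p_i for M, from the example above.
crash_dist = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def expected_value(dist):
    """Weight each outcome by its probability and add up the results."""
    return sum(y * p for y, p in dist.items())


mu_M = expected_value(crash_dist)
print(round(mu_M, 4))
```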
Expected value examples, continued
The Bernoulli Distribution
Definition: A random variable with a binary outcome, that is, “0 or
1” or “no or yes.”
\[
G = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}
\]
Compute the expected value for a Bernoulli example: the
probability that a senior in the Hill Towns will use a home
assistance program offered by the Hill Towns Elder Network (HEN).
\[
G = \begin{cases} 1 & \text{with probability } 0.45 \\ 0 & \text{with probability } 0.55 \end{cases}
\]
To compute the expected value:
\[
\begin{aligned}
E(G) &= \sum_{i=1}^{k} y_i \cdot p_i \\
     &= y_1 \times p_1 + y_2 \times p_2 \\
     &= 0 \times (1 - p) + 1 \times p \\
     &= 0 \times 0.55 + 1 \times 0.45 \\
     &= 0.45
\end{aligned}
\]
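The “long-run average” reading of this result can be illustrated by simulation. The sketch below draws many hypothetical seniors, each using the program with probability 0.45, and averages the 0/1 outcomes; the function name, the seed, and the trial counts are arbitrary choices for illustration.

```python
import random


def simulate_bernoulli_mean(p, n_trials, seed=0):
    """Average of n_trials simulated 0/1 draws, each equal to 1 with probability p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = [1 if rng.random() < p else 0 for _ in range(n_trials)]
    return sum(draws) / n_trials


# The sample average should settle near E(G) = 0.45 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, simulate_bernoulli_mean(0.45, n))
```

No single draw ever equals 0.45; the expected value only shows up in the average over many draws.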
Variance and Standard Deviation
Measuring Spread, or Dispersion
Variance. Variance is also a mean: the expected value of the square
of the deviation of Y from its mean. (In some contexts called the
“mean square deviation.”)
\[
\begin{aligned}
\sigma_Y^2 &\equiv \operatorname{var}(Y) \\
           &\equiv E\left[(Y - \mu_Y)^2\right] \\
           &= \sum_{i=1}^{k} (Y_i - \mu_Y)^2 p_i \\
           &= (Y_1 - \mu_Y)^2 p_1 + (Y_2 - \mu_Y)^2 p_2 + \cdots + (Y_k - \mu_Y)^2 p_k
\end{aligned}
\]
Standard deviation is the square root of the variance.
Main advantage: Standard deviation is measured in the same
units, e.g., dollars, inches, computer crashes, snow storms, as Y
and \(\mu_Y\). (Variance is measured in the square of these units,
which is hard to interpret.)
Example: variance and s.d. of computer crashes (snow
storms)
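\[
\begin{aligned}
\sigma_M^2 &= \sum_{i=1}^{k} (M_i - \mu_M)^2 p_i \\
           &= (M_1 - \mu_M)^2 p_1 + (M_2 - \mu_M)^2 p_2 + (M_3 - \mu_M)^2 p_3 + \cdots + (M_k - \mu_M)^2 p_k \\
           &= (0 - 0.35)^2 \times 0.80 + (1 - 0.35)^2 \times 0.10 + (2 - 0.35)^2 \times 0.06 \\
           &\quad + (3 - 0.35)^2 \times 0.03 + (4 - 0.35)^2 \times 0.01 \\
           &= 0.6475
\end{aligned}
\]
\[
\sigma_M = \sqrt{\sigma_M^2} = \sqrt{0.6475} \approx 0.80
\]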
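The arithmetic above can be reproduced with a short sketch. The names `crash_dist` and `variance` are illustrative; the probabilities are the ones used in the example.

```python
import math

crash_dist = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    mu = sum(y * p for y, p in dist.items())
    return sum((y - mu) ** 2 * p for y, p in dist.items())


var_M = variance(crash_dist)
sd_M = math.sqrt(var_M)  # back in the original units (crashes or storms)
print(round(var_M, 4), round(sd_M, 4))
```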
Variance and s.d. of a Bernoulli Distribution
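\[
\begin{aligned}
\sigma_G^2 &= \sum_{i=1}^{k} (G_i - \mu_G)^2 p_i \\
           &= (G_1 - \mu_G)^2 p_1 + (G_2 - \mu_G)^2 p_2 \\
           &= (0 - p)^2 \times (1 - p) + (1 - p)^2 \times p \\
           &= p(1 - p)
\end{aligned}
\]
\[
\sigma_G = \sqrt{\sigma_G^2} = \sqrt{p(1 - p)}
\]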
Example: variance and s.d. of a Bernoulli Distribution
G is the Bernoulli-distributed HEN-use variable
\[
\sigma_G = \sqrt{\sigma_G^2} = \sqrt{p(1 - p)} = \sqrt{0.45 \times 0.55} = \sqrt{0.2475} \approx 0.4975
\]
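As a quick check, the closed form p(1 − p) can be compared with the two-term sum from the definition of variance. This is a sketch; `bernoulli_variance` is an illustrative name.

```python
import math


def bernoulli_variance(p):
    """Closed-form variance p(1 - p) of a 0/1 variable with Pr(1) = p."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    return p * (1 - p)


p = 0.45  # probability of HEN use from the example
var_G = bernoulli_variance(p)

# The same quantity from the definition, with mean p and outcomes 0 and 1.
var_G_from_definition = (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p

sd_G = math.sqrt(var_G)
print(round(var_G, 4), round(var_G_from_definition, 4), round(sd_G, 4))
```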
Mean and Variance of Linear Functions of an R.V.
Linear Function of an R.V.
Y = a + bX
Examples
After-Tax Earnings: Y = 2000 + 0.8X
HEN: W = 10 + 800G
Principles
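\[
E(Y) = E(a + bX) = E(a) + E(bX) = a + bE(X)
\]
or equivalently
\[
\mu_Y = a + b\mu_X
\]
\[
\begin{aligned}
\operatorname{var}(Y) = \operatorname{var}(a + bX) &= E\left[(a + bX - E(a + bX))^2\right] \\
&= E\left[(a - E(a) + bX - E(bX))^2\right] \\
&= E\left[(b(X - E(X)))^2\right] \\
&= b^2 E\left[(X - E(X))^2\right] \\
&= b^2 \operatorname{var}(X)
\end{aligned}
\]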
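These two rules can be verified numerically on a small discrete distribution. The sketch below applies a linear function to the crash variable M from earlier in the lecture; the constants a = 100 and b = 50 are hypothetical (say, a fixed cost plus a cost per crash) and are not from the lecture.

```python
crash_dist = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def mean(dist):
    return sum(y * p for y, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((y - mu) ** 2 * p for y, p in dist.items())


a, b = 100.0, 50.0  # hypothetical constants for illustration

# Distribution of Y = a + bM: same probabilities, transformed outcomes.
y_dist = {a + b * m: p for m, p in crash_dist.items()}

print(round(mean(y_dist), 4), round(a + b * mean(crash_dist), 4))
print(round(variance(y_dist), 4), round(b ** 2 * variance(crash_dist), 4))
```

The constant a shifts the mean but drops out of the variance, while b scales the mean by b and the variance by b squared.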
Examples
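After-Tax Earnings
\[
\mu_Y = 2000 + 0.8\mu_X, \qquad \sigma_Y^2 = (0.8)^2 \sigma_X^2 = 0.64\sigma_X^2
\]
HEN
\[
\mu_W = 10 + 800\mu_G
\]
\[
\sigma_W^2 = (800)^2 \sigma_G^2 = 640{,}000 \times 0.2475 = 158{,}400
\]
\[
\sigma_W = \sqrt{640{,}000 \times 0.2475} = 800 \times \sqrt{0.2475} \approx 397.99
\]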
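The HEN figures can also be computed directly from the two possible values of W = 10 + 800G, without using the linear-function rules. This is a sketch that assumes p = 0.45 from the Bernoulli example; the variable names are illustrative.

```python
import math

p = 0.45  # probability that G = 1, from the HEN example

# W = 10 + 800G takes the value 10 when G = 0 and 810 when G = 1.
w_dist = {10 + 800 * 0: 1 - p, 10 + 800 * 1: p}

mu_W = sum(w * q for w, q in w_dist.items())
var_W = sum((w - mu_W) ** 2 * q for w, q in w_dist.items())
sd_W = math.sqrt(var_W)

print(round(mu_W, 2), round(var_W, 2), round(sd_W, 2))
```

The direct computation gives a mean of 370, a variance of 158,400, and a standard deviation of about 398, which is 800 times the standard deviation of G.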
Exercise 2.4
The random variable Y has a mean of 1 and a variance of 4. Let
\(Z = \tfrac{1}{2}(Y - 1)\). Compute \(\mu_Z\) and \(\sigma_Z^2\).
Comments on the Expected Value
Advantages
- Small samples quickly generate precise estimates
- Easy to compute
- Easy to make statistical inferences
- Means are useful for budgeting and other aggregate outcomes
Critiques
- The mean may never actually occur, i.e., 0.35 computer
  crashes (snow storms) is not possible.
- Susceptible to outliers, e.g., the average income in our class if
  Bill Gates joined PubP&A 608.
- Ignores interesting outcomes, e.g., bimodality in welfare
  reform outcomes
- Other measures of central tendency: the median and mode
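The outlier critique can be illustrated with a small sketch using Python's standard `statistics` module. The incomes below are hypothetical figures invented for illustration, including the very large one that stands in for a billionaire joining the class.

```python
import statistics

# Hypothetical annual incomes for a five-person class (made-up numbers).
incomes = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same class after one extremely high earner joins (hypothetical figure).
with_outlier = incomes + [100_000_000_000]

print("mean:  ", statistics.mean(incomes), "->", statistics.mean(with_outlier))
print("median:", statistics.median(incomes), "->", statistics.median(with_outlier))
```

The mean jumps by many orders of magnitude while the median barely moves, which is why the median is often reported alongside the mean for skewed variables such as income.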