5.0 Lesson Plan
• Answer Questions
• Expectation and Variance
• Densities and Cumulative Distribution Functions
• The Exponential Distribution
• The Normal Approximation to the Binomial
5.1 Expectation and Variance
The expected value of a random variable is the average value of an
infinite number of draws. It is often denoted by µ. Or, in math,

IE[X] ≡ µ ≡ Σ_{all possible x} x p(x).

This is a weighted average of the possible values of the random variable,
where the weights are the probabilities that a value will occur.
For example, the expected value of a fair die is 3.5.
The expected value of a function of a random variable, say h(X), is the
long-run average value of repeated evaluations of that function. Or

IE[h(X)] = Σ_{all possible x} h(x) p(x).
For example, the expected value of X² for a fair die is

Σ_{i=1}^{6} i² ∗ (1/6) = 15.1667.
Some functions are especially interesting. For example, the linear
function aX + b has the useful property that
IE[aX + b] = aIE[X] + b.
The variance of a random variable is the expected value of
h(X) = (X − µ)2 where µ is the expected value of X.
The variance is often written as σ², and its units are the squares of the
original units. It provides a measure of how spread out the distribution is,
since it is the average squared distance of an observation from µ. A little
algebra shows:
Var[X] = IE[(X − µ)²]
       = Σ_{all x} (x − µ)² p(x)
       = Σ_{all x} (x² − 2xµ + µ²) p(x)
       = IE[X²] − 2µIE[X] + µ²
       = IE[X²] − µ².
To measure how spread out a distribution is, we mostly use the
standard deviation (or σ). This is the square root of the variance.
The variance of the result of a roll of a fair die is IE[X²] − IE[X]² =
15.1667 − (3.5)² = 2.9167. Its standard deviation is √2.9167 = 1.7078.
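These die calculations are easy to check numerically. A minimal Python sketch (the variable names are ours, not from the notes):

```python
from fractions import Fraction
from math import sqrt

# Fair die: each face 1..6 has probability 1/6.
p = Fraction(1, 6)
faces = range(1, 7)

mean = sum(x * p for x in faces)         # IE[X]
mean_sq = sum(x**2 * p for x in faces)   # IE[X^2]
variance = mean_sq - mean**2             # IE[X^2] - (IE[X])^2
sd = sqrt(variance)

print(float(mean))                # 3.5
print(round(float(mean_sq), 4))   # 15.1667
print(round(float(variance), 4))  # 2.9167
print(round(sd, 4))               # 1.7078
```

Using exact fractions avoids any rounding until the final square root.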
One can show that the mean, variance and standard deviation of the
Bin(n, p) distribution are np, np(1 − p), and √(np(1 − p)), respectively.
For the Pois(λ) distribution, the mean is λ, the variance is λ, and the
standard deviation is √λ.
We won’t use this, but for the hypergeometric distribution, the mean is
nM/N and the variance is

σ² = ((N − n)/(N − 1)) ∗ (nM/N) ∗ (1 − M/N).
5.2 Densities and CDFs
In the previous lecture, we described discrete probability distributions,
such as the binomial, Poisson, and hypergeometric. Discrete distributions
have positive probability on finite sets (e.g., the binomial) or countably
infinite sets (e.g., the Poisson).
Note: A countably infinite set is one whose elements can be placed in
one-to-one correspondence with the integers. The even numbers are
countably infinite, but the set of numbers between 0 and 1 is uncountably
infinite.
Note: Mixed distributions combine features of both discrete and
continuous distributions. For example, the lifetime of a possibly defective
light bulb: a defective bulb fails at time zero (a point mass), while a
working bulb has a continuously distributed lifetime.
For continuous distributions, specific values have probability zero. One
can only assign probabilities to intervals.
For example, there is a positive probability of tossing exactly five heads
in 10 flips. But there is zero probability of finding someone who is
exactly five feet tall, when height is measured to an infinite number of
decimal places.
To define probabilities for intervals, we use a density function. A
density function is any function f(x) such that
• f(x) is non-negative
• f(x) integrates to 1.
Then

IP[a ≤ X ≤ b] = ∫_a^b f(x) dx.
This definition of the density function ensures the Kolmogorov
axioms: (1) all probabilities are between 0 and 1, inclusive, (2)
IP[−∞ < X < ∞] = 1, and (3) if A and B are disjoint intervals, then
IP[X ∈ A or X ∈ B] = IP[X ∈ A] + IP[X ∈ B].
It turns out to be useful to define the cumulative distribution
function (or cdf) as

F(x) = IP[X ≤ x] = ∫_{−∞}^{x} f(y) dy.

Then

IP[a ≤ X ≤ b] = ∫_a^b f(x) dx = F(b) − F(a).
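The relationship between the density and the cdf is easy to check numerically. A toy example, assuming the density f(x) = 2x on [0, 1] (our choice, not from the notes), for which F(x) = x² on that interval:

```python
# Toy density: f(x) = 2x on [0, 1], so F(x) = x^2 there.
def f(x):
    return 2 * x

def F(x):
    return x * x

def trapezoid(g, a, b, n=10_000):
    """Simple trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return total * h

a, b = 0.2, 0.7
print(trapezoid(f, a, b))   # ~ 0.45
print(F(b) - F(a))          # ~ 0.45
```

The integral of the density over [a, b] and the difference of cdf values agree, as the formula above requires.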
The expected value of a continuous random variable is

IE[X] ≡ µ ≡ ∫_{−∞}^{∞} x f(x) dx.

Clearly, this is similar to the definition of expected value for a discrete
random variable:

IE[X] ≡ µ ≡ Σ_{all x} x p(x).
Analogously, the expected value of a function h(X) of a continuous
random variable is

IE[h(X)] ≡ ∫_{−∞}^{∞} h(x) f(x) dx.
As before, IE[aX + b] = aIE[X] + b. And Var [X] = IE[X 2 ] − (IE[X])2 .
5.3 The Exponential Distribution
The exponential distribution is used to model “memoryless” processes
(among many other things). Arguably accurate approximations include
• the wait time between successive phone calls;
• the lifespan of a vacuum tube;
• the distance on a chromosome between mutations;
• monthly maximum rainfall.
The exponential distribution is ubiquitous in queueing theory, where it is
used to describe the time required to service a single customer.
The density function for an exponential random variable with parameter
λ is:

f(x) = 0 for x < 0, and f(x) = λ exp(−λx) for x ≥ 0.
The cdf is

F(x) = 0 for x < 0, and F(x) = 1 − exp(−λx) for x ≥ 0.
The parameter λ is called the rate. The mean of an exponential
distribution is 1/λ and the standard deviation is also 1/λ.
Integration by parts enables one to find the mean of the exponential
distribution. Recall that

∫_a^b u(x) v′(x) dx = u(x)v(x)|_a^b − ∫_a^b v(x)u′(x) dx.
For

IE[X] = ∫_0^∞ x f(x) dx = ∫_0^∞ x λ exp(−λx) dx,

set u(x) = x and v′(x) = λ exp(−λx). Then

IE[X] = −x exp(−λx)|_0^∞ + ∫_0^∞ exp(−λx) dx
      = −(1/λ) exp(−λx)|_0^∞
      = 1/λ.
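One can also check IE[X] = 1/λ by simulation. A sketch using Python's standard library, with λ = 2.0 chosen arbitrarily for illustration:

```python
import random

# Monte Carlo sketch: the sample mean of Exp(lam) draws should approach 1/lam.
random.seed(1)
lam = 2.0
n = 200_000
draws = [random.expovariate(lam) for _ in range(n)]
sample_mean = sum(draws) / n
print(round(sample_mean, 3))   # close to 1/lam = 0.5
```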
Example: On average, a manufacturing plant experiences 5.4 shutdowns
per year (due to power failure, mechanical breakdown, hurricane, or
other incident). What is the probability that there are no shutdowns in
the first quarter?
The yearly rate is λ = 5.4. We want to find the probability that X > 1/4,
where years are the time unit.
IP[ first shutdown is after March ] = IP[X > 1/4]
                                    = 1 − F(1/4)
                                    = 1 − (1 − exp(−5.4 ∗ 0.25))
                                    = exp(−1.35)
                                    = 0.2592.
We could have said that the average time between shutdowns is 1/5.4 =
0.185 years (or 67.59 days). This gives the mean of the X values, which
is 1/λ.
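The shutdown calculation takes a couple of lines of Python:

```python
from math import exp

lam = 5.4   # shutdowns per year
t = 0.25    # one quarter, in years

# P[X > t] = 1 - F(t) = exp(-lam * t)
prob = exp(-lam * t)
print(round(prob, 4))   # 0.2592
```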
5.4 Normal Approximation to the Binomial
A perfect normal distribution describes data that can take any possible
value—negatives, fractions, irrationals, etc. But often data can only take
non-negative integer values.
In a class of ten students, each tosses a fair coin to decide whether to
attend class. So class attendance is a random variable that has the
Bin(10, 0.5) distribution. Its mean is np = 5 and the standard
deviation is √(n ∗ p ∗ (1 − p)) = 1.581.
We can use the normal distribution to estimate the approximate
probability that, say, 3 or fewer students will attend tomorrow’s lecture.
But because only integers are possible, we can improve the accuracy of
the normal approximation by using the continuity correction.
We approximate the binomial by a normal distribution with the same
mean and standard deviation.
The bad approximation uses the z-transformation z = (3 − 5)/1.581 =
−1.265, and finds the area under the N(0,1) curve that lies below -1.265
as 0.1020.
The good way handles the area between 3 and 4 appropriately, to
take account of the fact that the histogram bar is centered at 3 and
we want to include the area up to 3.5. We use the z-transformation
z = (3.5 − 5)/1.581 = −0.949, and find the probability as 0.1711.
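Both approximations can be compared against the exact binomial probability. A sketch using the error function for the standard normal cdf (small differences from the table-based figures above come from rounding):

```python
from math import comb, erf, sqrt

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 10, 0.5
mu = n * p
sd = sqrt(n * p * (1 - p))

exact = sum(comb(n, k) * 0.5**n for k in range(4))   # P[X <= 3] = 176/1024
crude = norm_cdf((3 - mu) / sd)                      # no continuity correction
corrected = norm_cdf((3.5 - mu) / sd)                # with continuity correction

print(round(exact, 4))       # 0.1719
print(round(crude, 4))       # about 0.103
print(round(corrected, 4))   # about 0.171
```

The continuity-corrected value is far closer to the exact probability than the uncorrected one.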
The normal approximation to the binomial is helpful when n is very
large. For example, suppose we wanted to find the probability that more
than 20,000 of the 228,330 residents of Durham are unemployed, when
the unemployment rate in NC is 10.1%. To use the binomial, we would
have to calculate
Σ_{x=0}^{20,000} (228,330 choose x) (0.101)^x (1 − 0.101)^{228,330−x}.
This is intractable, but the normal approximation is not.
The normal approximation is accurate when np > 10 and n(1 − p) > 10.
So our toy example with student attendance was bogus, but it gave a
histogram that was small enough that we could visualize the role of the
continuity correction.
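A sketch of the Durham calculation under the normal approximation (with a continuity correction; the numbers are from the example above):

```python
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 228_330, 0.101
mu = n * p                  # 23061.33
sd = sqrt(n * p * (1 - p))  # about 144

# P[X > 20000] = P[X >= 20001] ~ 1 - Phi((20000.5 - mu) / sd)
z = (20_000.5 - mu) / sd
prob = 1 - norm_cdf(z)
print(round(z, 2))   # about -21.3
print(prob)          # 1.0 to machine precision
```

The z-score is so far into the left tail that the probability of more than 20,000 unemployed residents is essentially 1.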
By the way, how do we know that the mean of the binomial is np?

IE[X] = Σ_{all x} x p(x)
      = Σ_{x=0}^{n} x (n choose x) p^x (1 − p)^{n−x}
      = Σ_{x=0}^{n} x ∗ (n!/(x!(n − x)!)) p^x (1 − p)^{n−x}
      = Σ_{x=1}^{n} (n!/((x − 1)!(n − x)!)) p^x (1 − p)^{n−x}.

(The x = 0 term vanishes, so the last sum starts at x = 1.)
Make the change of variable y = x − 1 to get

IE[X] = np Σ_{y=0}^{n−1} ((n − 1)!/(y!(n − 1 − y)!)) p^y (1 − p)^{n−1−y}
      = np Σ_{y=0}^{n−1} (n − 1 choose y) p^y (1 − p)^{n−1−y}.
We recognize the summation as the sum of binomial probabilities for a
Bin(n − 1, p) distribution. The sum of all possible probabilities is always
1, so the mean is np.
A similar argument works for IE[X 2 ], from which one can verify that the
variance is np(1 − p). (Recall that the variance is IE[X 2 ] − (IE[X])2 .)
Note: One can also use the normal to approximate the Poisson.