Probability- the good parts version
I. Random variables and their distributions; continuous random variables.
A random variable (r.v.) X is continuous if its distribution is given by a probability density
function (pdf) f(x) that is positive on an interval. For real numbers a < b,
P(a < X < b) = ∫_a^b f(x) dx.
Random variables X and Y are jointly continuous if there’s a joint density function f (x, y)
such that
P(a < X < b, c < Y < d) = ∫_a^b ∫_c^d f(x, y) dy dx.
The marginal density for X is found by integrating out the y and vice-versa for Y . X and Y
are called independent if their joint density is the product of their marginals. In this case,
it follows that P (a < X < b, c < Y < d) = P (a < X < b)P (c < Y < d).
The cumulative distribution function (cdf) of X is the function F ,
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
Given a function g, the expected value of g(X) is
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.
In particular, the mean of X is E(X) = µ; the variance of X is V(X) = σ² = E(X − µ)² = E(X²) − µ²; and the standard deviation is σ = √(σ²).
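These definitions are easy to sanity-check numerically. The sketch below (assuming Python with numpy and scipy is available; the Exponential density with λ = 2 is an arbitrary illustrative choice) approximates E(X), E(X²), and V(X) = E(X²) − µ² by numerical integration.

    # Numerical check of E(X), E(X^2), and V(X) = E(X^2) - mu^2 for the
    # Exponential density f(x) = lam * exp(-lam * x), x >= 0, with lam = 2.
    import numpy as np
    from scipy.integrate import quad

    lam = 2.0
    f = lambda x: lam * np.exp(-lam * x)              # pdf of Exponential(lam)
    mu, _ = quad(lambda x: x * f(x), 0, np.inf)       # E(X) = integral of x f(x) dx
    m2, _ = quad(lambda x: x**2 * f(x), 0, np.inf)    # E(X^2)
    print(mu, 1 / lam)                                # both ~0.5
    print(m2 - mu**2, 1 / lam**2)                     # V(X), both ~0.25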
The moment generating function of X is
MX(t) = E(e^(Xt)) = ∫_{−∞}^{∞} e^(xt) f(x) dx.
Properties of the moment generating function:
(1) MX^(n)(0) = (d^n/dt^n) MX(t) evaluated at t = 0 = E(X^n).
(2) If X1 , X2 , ..., Xn are independent random variables then
MX1 +X2 +...+Xn (t) = MX1 (t) ∗ MX2 (t) ∗ · · · ∗ MXn (t).
(3) The m.g.f. specifies the distribution: if MX (t) = MY (t), then X and Y have the same
distribution.
(4) MaX+b (t) = ebt MX (at).
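As an illustration of property (1) (a numerical sketch in Python, not a statement from the notes), finite-difference derivatives of the Exponential(λ) mgf M(t) = λ/(λ − t) at t = 0 should recover the first two moments E(X) = 1/λ and E(X²) = 2/λ²:

    # Property (1): the n-th derivative of M_X at t = 0 equals E(X^n).
    # Illustrated with the Exponential(lam) mgf M(t) = lam / (lam - t).
    lam, h = 2.0, 1e-4
    M = lambda t: lam / (lam - t)

    m1 = (M(h) - M(-h)) / (2 * h)           # central difference ~ M'(0)
    m2 = (M(h) - 2 * M(0) + M(-h)) / h**2   # ~ M''(0)
    print(m1, 1 / lam)                      # E(X)   = 1/lam   = 0.5
    print(m2, 2 / lam**2)                   # E(X^2) = 2/lam^2 = 0.5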
The standard class of continuous distributions.
(1) X ∼ N (µ, σ 2 )
density: f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)), −∞ < x < ∞.
mean: E(X) = µ.
variance: V(X) = σ².
mgf: M(t) = e^(µt + σ²t²/2).
If X ∼ N (µ, σ 2 ), then (X − µ)/σ ∼ N (0, 1).
If X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) are independent, then aX + bY ∼ N(aµX + bµY, a²σX² + b²σY²).
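One way to check the linear-combination fact is by simulation; the sketch below (Python/numpy, with arbitrary parameter values) compares the sample mean and variance of aX + bY with the formulas above.

    # Simulate aX + bY for independent normals and compare the sample mean and
    # variance with a*muX + b*muY and a^2*sX^2 + b^2*sY^2.
    import numpy as np

    rng = np.random.default_rng(0)
    muX, sX, muY, sY, a, b = 1.0, 2.0, -3.0, 0.5, 2.0, 4.0
    n = 200_000
    Z = a * rng.normal(muX, sX, n) + b * rng.normal(muY, sY, n)
    print(Z.mean(), a * muX + b * muY)             # both ~ -10
    print(Z.var(), a**2 * sX**2 + b**2 * sY**2)    # both ~ 20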
(2) X ∼ Chi-Square with n degrees of freedom (X ∼ χ2 (n)) if X ∼ Gamma(n/2, 1/2).
mean: E(X) = n.
variance: V (X) = 2n.
mgf: M(t) = (1/(1 − 2t))^(n/2), t < 1/2.
(3) X ∼ Exponential(λ).
density: f(x) = λe^(−λx), x ≥ 0.
mean: E(X) = 1/λ.
variance: V(X) = 1/λ².
mgf: M(t) = λ/(λ − t), t < λ.
P(X > x) = e^(−λx), for x > 0.
Note: There are two common conventions for the Exponential(λ): (i) λ is the parameter in the density as above and the mean is 1/λ, and (ii) λ is the mean and the density is
f(x) = (1/λ)e^(−x/λ); the interpretation in a particular problem should be clear from context.
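The same warning applies in software: scipy.stats.expon, for instance, is parameterized by a scale equal to the mean, convention (ii), so the rate-λ density above corresponds to scale = 1/λ. A minimal check (assuming scipy; λ = 3 is an arbitrary choice):

    # scipy.stats.expon uses the mean/scale parameterization: scale = 1/lam.
    import numpy as np
    from scipy.stats import expon

    lam = 3.0
    X = expon(scale=1 / lam)                # Exponential with rate lam
    print(X.mean(), 1 / lam)                # mean = 1/lam
    print(X.sf(2.0), np.exp(-lam * 2.0))    # P(X > 2) = e^(-2 lam)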
(4) X ∼ Gamma(α, λ).
density: f(x) = (λ^α / Γ(α)) x^(α−1) e^(−λx), x ≥ 0.
mean: E(X) = α/λ.
variance: V (X) = α/λ2 .
mgf: M(t) = (λ/(λ − t))^α, t < λ.
Note: Γ(α) = ∫_0^∞ t^(α−1) e^(−t) dt. Γ(n) = (n − 1)! when n is a positive integer.
When X1 , X2 , . . . , Xn ∼ Exponential(λ) are independent, X1 +X2 +· · ·+Xn ∼ Gamma(n, λ).
In particular, Exponential(λ) = Gamma(1, λ).
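A simulation sketch of the last fact (Python/numpy/scipy, with arbitrary n and λ): sums of n independent Exponential(λ) variables should behave like Gamma(n, λ). Note that scipy.stats.gamma takes the shape α and a scale equal to 1/λ.

    # Sum of n iid Exponential(lam) variables compared with Gamma(n, lam).
    import numpy as np
    from scipy.stats import gamma, kstest

    rng = np.random.default_rng(1)
    n, lam, reps = 5, 2.0, 100_000
    sums = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)
    print(sums.mean(), n / lam)        # Gamma mean  alpha/lam   = 2.5
    print(sums.var(), n / lam**2)      # Gamma var   alpha/lam^2 = 1.25
    print(kstest(sums, gamma(a=n, scale=1 / lam).cdf).pvalue)   # should not be tiny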
(5) X ∼ U [a, b]
density: f (x) = 1/(b − a) for a < x < b.
mean: E(X) = (a + b)/2.
variance: V (X) = (b − a)2 /12.
mgf: M(t) = (e^(bt) − e^(at)) / ((b − a)t), −∞ < t < ∞.
If [c,d] ⊂ [a,b], then P (c ≤ X ≤ d) is Length[c,d]/Length[a,b].
II. Random variables and their distributions; discrete random variables.
A discrete rv X takes on a finite or countable number of values. Probabilities are computed
using a frequency function p(k) = P (X = k); this is also called a probability density function
(pdf) or probability mass function.
P(a < X < b) = Σ_{k ∈ (a,b)} p(k).
Given a function g, the expected value of g(X) is
E(g(X)) = Σ_k g(k) p(k);
in particular, the moment generating function of X is
MX(t) = E(e^(Xt)) = Σ_k e^(kt) p(k).
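As a concrete illustration (a toy example added here, not from the original notes), take X to be a fair six-sided die and g(k) = k²; both E(g(X)) and MX(t) reduce to short sums.

    # E(g(X)) and M_X(t) as finite sums for a fair six-sided die, g(k) = k^2.
    import numpy as np

    ks = np.arange(1, 7)
    p = np.full(6, 1 / 6)                      # p(k) = 1/6, k = 1..6
    print(np.sum(ks**2 * p), 91 / 6)           # E(X^2) = sum of k^2 p(k)

    def mgf(t):
        return np.sum(np.exp(ks * t) * p)      # M_X(t) = sum of e^(kt) p(k)

    h = 1e-4
    print((mgf(h) - mgf(-h)) / (2 * h), 3.5)   # M'(0) ~ E(X) = 3.5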
The standard class of discrete distributions.
(1) X ∼ Bernoulli(p)
frequency function: p(1) = p, p(0) = q = 1 − p.
mean: E(X) = p.
variance: V (X) = pq.
mgf: M(t) = (q + pe^t), −∞ < t < ∞.
(2) X ∼ Binomial(n, p)
frequency function: p(k) = (n choose k) p^k q^(n−k), k = 0, 1, . . . , n.
mean: E(X) = np.
variance: V (X) = npq.
mgf: M(t) = (q + pe^t)^n, −∞ < t < ∞.
If X is the number of successes in n independent Bernoulli trials, then X ∼ Binomial(n, p).
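A quick simulation of that statement (Python/numpy/scipy; n and p are arbitrary choices): summing n independent Bernoulli(p) indicators and comparing with the Binomial(n, p) formulas.

    # Number of successes in n Bernoulli(p) trials compared with Binomial(n, p).
    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(2)
    n, p, reps = 10, 0.3, 100_000
    successes = (rng.random((reps, n)) < p).sum(axis=1)   # n Bernoulli trials per row
    print(successes.mean(), n * p)                        # ~ 3.0
    print(successes.var(), n * p * (1 - p))               # ~ 2.1
    print(np.mean(successes == 4), binom.pmf(4, n, p))    # empirical vs exact P(X = 4)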
(3) X ∼ Geometric(p). There are two definitions for the Geometric(p) distribution. (I) X
is the number of failures required to see the first success in a sequence of Bernoulli trials
and (II) X is the number of trials required to see the first success in a sequence of Bernoulli
trials. If X and Y have these respective distributions, then Y = X + 1. We give
results separately for the two definitions.
Case I.
frequency function: p(k) = pq^k, k = 0, 1, . . ., where q = 1 − p.
mean: E(X) = q/p.
variance: V (X) = q/p2 .
mgf: M(t) = p/(1 − qe^t), −∞ < t < ln(1/q).
Case II.
frequency function: p(k) = pq^(k−1), k = 1, 2, . . ., where q = 1 − p.
mean: E(X) = 1/p.
variance: V (X) = q/p2 .
mgf: M(t) = pe^t/(1 − qe^t), −∞ < t < ln(1/q).
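When using software it helps to know which case is implemented; scipy.stats.geom, for example, follows Case II (number of trials, support 1, 2, . . .), and subtracting 1 gives Case I. A minimal check (p = 0.25 is an arbitrary choice):

    # scipy.stats.geom implements Case II: P(Y = k) = p q^(k-1), k = 1, 2, ...
    from scipy.stats import geom

    p = 0.25
    q = 1 - p
    Y = geom(p)
    print(Y.mean(), 1 / p)        # 4.0
    print(Y.var(), q / p**2)      # 12.0
    print(Y.pmf(3), p * q**2)     # both 0.140625
    # Case I (number of failures) is X = Y - 1, e.g. E(X) = E(Y) - 1 = q/p.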
(4) X ∼ Negative Binomial(r, p). Again, there are two definitions for the Negative Binomial(r, p)
distribution. (I) X is the number of failures before the rth success in a sequence of Bernoulli
trials and (II) Y is the number of trials required to see the rth success in a sequence of
Bernoulli trials. If X and Y have these respective distributions, then Y = X + r. We give
results separately for the two definitions.
Case I.
frequency function: pX(k) = (k + r − 1 choose r − 1) p^r q^k, k = 0, 1, . . ., where q = 1 − p.
mean: E(X) = rq/p.
variance: V (X) = rq/p2 .
mgf: M(t) = (p/(1 − qe^t))^r, −∞ < t < ln(1/q).
Case II.
frequency function: pY(k) = (k − 1 choose r − 1) p^r q^(k−r), k = r, r + 1, . . ., where q = 1 − p.
mean: E(Y) = r/p.
variance: V(Y) = rq/p².
mgf: M(t) = (pe^t/(1 − qe^t))^r, −∞ < t < ln(1/q).
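Likewise, scipy.stats.nbinom follows Case I, counting failures before the rth success. A minimal check (r = 3 and p = 0.4 are arbitrary choices):

    # scipy.stats.nbinom implements Case I: failures before the r-th success.
    from math import comb
    from scipy.stats import nbinom

    r, p = 3, 0.4
    q = 1 - p
    X = nbinom(r, p)
    print(X.mean(), r * q / p)         # 4.5
    print(X.var(), r * q / p**2)       # 11.25
    k = 2
    print(X.pmf(k), comb(k + r - 1, r - 1) * p**r * q**k)   # both ~0.138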
(5) X ∼ Poisson(λ)
frequency function: p(k) = e^(−λ) λ^k / k!, k = 0, 1, . . . .
mean: E(X) = λ.
variance: V (X) = λ.
mgf: M(t) = e^(λ(e^t − 1)), −∞ < t < ∞.
III. General Properties.
E(X)
E(aX + bY ) = aE(X) + bE(Y ) for any random variables X and Y .
E(XY ) = E(X)E(Y ) if X and Y are independent.
V (X) and Cov(X, Y )
V (X) = E[(X − µ)2 ] = E(X 2 ) − µ2 .
V (aX + b) = a2 V (X).
V (X + Y ) = V (X) + V (Y ) + 2Cov(X, Y ) for any random variables X and Y .
V (X + Y ) = V (X) + V (Y ) if X and Y are independent.
Cov(X, Y ) = E[(X − µX )(Y − µY )] = E(XY ) − E(X)E(Y ).
Cov(X, Y ) = 0 if X and Y are independent.
Cov(X, X) = V (X).
Cov(aX, bY ) = abCov(X, Y ).
Cov(X + Y, U + V ) = Cov(X, U ) + Cov(X, V ) + Cov(Y, U ) + Cov(Y, V ).
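These identities can be spot-checked on simulated data; in the sketch below (Python/numpy, with Y deliberately constructed to depend on X), the sample version of V(X + Y) matches V(X) + V(Y) + 2Cov(X, Y).

    # Spot-check V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y) on correlated samples.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 200_000
    X = rng.normal(0.0, 1.0, n)
    Y = 0.5 * X + rng.normal(0.0, 1.0, n)      # Y depends on X, so Cov(X, Y) != 0
    cov = np.cov(X, Y, ddof=0)[0, 1]
    print(np.var(X + Y), np.var(X) + np.var(Y) + 2 * cov)   # the two sides agree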
Sampling
If {Xi} are n independent, identically distributed random variables with E(Xi) = µ and
V(Xi) = σ², and X̄ = (1/n) Σ_{i=1}^{n} Xi is the sample mean, then:
E(X̄) = µ and V (X̄) = σ 2 /n.
Central limit theorem
If {Xi } are n independent, identically distributed random variables with E(Xi ) = µ and
V (Xi ) = σ 2 , then
Σ_{i=1}^{n} Xi approximately ∼ N(nµ, nσ²) as n → ∞.
X̄ approximately ∼ N (µ, σ 2 /n) as n → ∞.
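A simulation sketch of the CLT statement (Python/numpy/scipy; Exponential(1) summands and n = 50 are arbitrary choices, picked because the summands are far from normal): the sample means should be approximately N(µ, σ²/n).

    # CLT check: sample means of n Exponential(1) variables vs N(mu, sigma^2 / n).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    n, reps = 50, 100_000
    mu = sigma = 1.0                           # Exponential(1) has mean 1 and sd 1
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    print(xbar.mean(), mu)                     # ~ 1.0
    print(xbar.std(), sigma / np.sqrt(n))      # ~ 0.141
    # Tail probability vs the normal approximation.
    print(np.mean(xbar > 1.2), norm.sf(1.2, loc=mu, scale=sigma / np.sqrt(n)))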
Joint and conditional distributions
If X and Y have joint pdf fX,Y (x, y), then
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy  or  Σ_{all y} fX,Y(x, y);
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx  or  Σ_{all x} fX,Y(x, y);
fX|Y =y (x) = fX,Y (x, y)/fY (y);
fY |X=x (y) = fX,Y (x, y)/fX (x);
fX,Y (x, y) = fX (x)fY (y) if and only if X and Y are independent.
Xmax , Xmin
If {Xi } are n independent, identically distributed random variables with pdf fX (x), then
fXmax(x) = n fX(x) [FX(x)]^(n−1),
fXmin(x) = n fX(x) [1 − FX(x)]^(n−1).
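As a closing illustration (an addition to the notes): for n independent U[0, 1] variables, FX(x) = x on (0, 1), so the formulas give fXmax(x) = nx^(n−1) and fXmin(x) = n(1 − x)^(n−1), hence E(Xmax) = n/(n + 1) and E(Xmin) = 1/(n + 1). A quick simulation check (Python/numpy, arbitrary n):

    # Max and min of n iid U[0, 1] variables: E(Xmax) = n/(n+1), E(Xmin) = 1/(n+1).
    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 10, 100_000
    U = rng.random((reps, n))
    print(U.max(axis=1).mean(), n / (n + 1))   # ~ 0.909
    print(U.min(axis=1).mean(), 1 / (n + 1))   # ~ 0.091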