Probability - the good parts version

I. Random variables and their distributions; continuous random variables.

A random variable (r.v.) X is continuous if its distribution is given by a probability density function (pdf) f(x) that is positive on an interval. For real numbers a < b,

P(a < X < b) = ∫_a^b f(x) dx.

Random variables X and Y are jointly continuous if there is a joint density function f(x, y) such that

P(a < X < b, c < Y < d) = ∫_a^b ∫_c^d f(x, y) dy dx.

The marginal density for X is found by integrating out y, and vice versa for Y. X and Y are called independent if their joint density is the product of their marginals. In this case, it follows that

P(a < X < b, c < Y < d) = P(a < X < b) P(c < Y < d).

The cumulative distribution function (cdf) of X is the function F,

F(x) = P(X ≤ x) = ∫_{−∞}^x f(t) dt.

Given a function g, the expected value of g(X) is

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

In particular, the mean of X is E(X) = µ; the variance of X is V(X) = σ² = E(X − µ)² = E(X²) − µ²; and the standard deviation is σ = √σ².

The moment generating function (m.g.f.) of X is

M_X(t) = E(e^{Xt}) = ∫_{−∞}^{∞} e^{xt} f(x) dx.

Properties of the moment generating function:

(1) M_X^{(n)}(0), i.e. d^n M_X(t)/dt^n evaluated at t = 0, equals E(X^n).
(2) If X_1, X_2, ..., X_n are independent random variables, then M_{X_1 + X_2 + ... + X_n}(t) = M_{X_1}(t) · M_{X_2}(t) · ... · M_{X_n}(t).
(3) The m.g.f. specifies the distribution: if M_X(t) = M_Y(t), then X and Y have the same distribution.
(4) M_{aX + b}(t) = e^{bt} M_X(at).

The standard class of continuous distributions.

(1) X ∼ N(µ, σ²)
density: f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.
mean: E(X) = µ.
variance: V(X) = σ².
mgf: M(t) = e^{µt + σ²t²/2}.
If X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1).
If X ∼ N(µ_X, σ²_X) and Y ∼ N(µ_Y, σ²_Y) are independent, then aX + bY ∼ N(aµ_X + bµ_Y, a²σ²_X + b²σ²_Y).

(2) X ∼ Chi-square with n degrees of freedom (X ∼ χ²(n)) if X ∼ Gamma(n/2, 1/2).
mean: E(X) = n.
variance: V(X) = 2n.
mgf: M(t) = (1/(1 − 2t))^{n/2}, t < 1/2.

(3) X ∼ Exponential(λ)
density: f(x) = λ e^{−λx}, x ≥ 0.
mean: E(X) = 1/λ.
variance: V(X) = 1/λ².
mgf: M(t) = λ/(λ − t), t < λ.
P(X > x) = e^{−λx}, for x > 0.
Note: There are two common conventions for the Exponential(λ): (i) λ is the parameter in the density as above and the mean is 1/λ, and (ii) λ is the mean and the density is f(x) = (1/λ) e^{−x/λ}; the interpretation in a particular problem should be clear from context.

(4) X ∼ Gamma(α, λ)
density: f(x) = (λ^α / Γ(α)) x^{α−1} e^{−λx}, x ≥ 0.
mean: E(X) = α/λ.
variance: V(X) = α/λ².
mgf: M(t) = (λ/(λ − t))^α, t < λ.
Note: Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt, and Γ(n) = (n − 1)! when n is a positive integer.
When X_1, X_2, ..., X_n ∼ Exponential(λ) are independent, X_1 + X_2 + ... + X_n ∼ Gamma(n, λ). In particular, Exponential(λ) = Gamma(1, λ).

(5) X ∼ U[a, b]
density: f(x) = 1/(b − a) for a < x < b.
mean: E(X) = (a + b)/2.
variance: V(X) = (b − a)²/12.
mgf: M(t) = (e^{bt} − e^{at})/((b − a)t), −∞ < t < ∞.
If [c, d] ⊆ [a, b], then P(c ≤ X ≤ d) = Length[c, d]/Length[a, b].
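The moments tabulated above are easy to sanity-check numerically. Below is a minimal Monte Carlo sketch, assuming NumPy is available; the parameter values are arbitrary illustrations, not part of the notes. It compares sample moments of each continuous family against the listed means and variances, and checks that a sum of independent exponentials matches the Gamma(n, λ) moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

mu, sigma = 2.0, 3.0      # illustrative N(mu, sigma^2) parameters
lam, alpha = 0.5, 4.0     # Exponential/Gamma rate and shape (convention (i): mean 1/lam)
a, b = -1.0, 5.0          # U[a, b] endpoints
df = 7                    # chi-square degrees of freedom

checks = {
    # name: (draws, theoretical mean, theoretical variance)
    "Normal":      (rng.normal(mu, sigma, n),   mu,          sigma**2),
    "Chi-square":  (rng.chisquare(df, n),       df,          2 * df),
    "Exponential": (rng.exponential(1/lam, n),  1/lam,       1/lam**2),
    "Gamma":       (rng.gamma(alpha, 1/lam, n), alpha/lam,   alpha/lam**2),
    "Uniform":     (rng.uniform(a, b, n),       (a + b)/2,   (b - a)**2/12),
}
for name, (x, m, v) in checks.items():
    print(f"{name:12s} mean {x.mean():8.4f} (theory {m:8.4f})  "
          f"var {x.var():8.4f} (theory {v:8.4f})")

# Sum of 5 iid Exponential(lam) draws should have the Gamma(5, lam) moments.
s = rng.exponential(1/lam, (n, 5)).sum(axis=1)
print("sum of 5 exponentials: mean", s.mean(), "vs", 5/lam,
      "; var", s.var(), "vs", 5/lam**2)
```

Note that NumPy parameterizes the exponential and gamma samplers by scale = 1/λ, which is why the rate λ from the notes is inverted in the calls above.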
II. Random variables and their distributions; discrete random variables.

A discrete r.v. X takes on a finite or countable number of values. Probabilities are computed using a frequency function p(k) = P(X = k); this is also called a probability density function (pdf) or probability mass function (pmf). For real numbers a < b,

P(a < X < b) = Σ_{k ∈ (a, b)} p(k).

Given a function g, the expected value of g(X) is

E(g(X)) = Σ_k g(k) p(k);

in particular, the moment generating function of X is

M_X(t) = E(e^{Xt}) = Σ_k e^{kt} p(k).

The standard class of discrete distributions.

(1) X ∼ Bernoulli(p)
frequency function: p(1) = p, p(0) = q = 1 − p.
mean: E(X) = p.
variance: V(X) = pq.
mgf: M(t) = q + pe^t, −∞ < t < ∞.

(2) X ∼ Binomial(n, p)
frequency function: p(k) = (n choose k) p^k q^{n−k}, k = 0, 1, ..., n.
mean: E(X) = np.
variance: V(X) = npq.
mgf: M(t) = (q + pe^t)^n, −∞ < t < ∞.
If X is the number of successes in n independent Bernoulli trials, then X ∼ Binomial(n, p).

(3) X ∼ Geometric(p)
There are two definitions for the Geometric(p) distribution: (I) X is the number of failures required to see the first success in a sequence of Bernoulli trials, and (II) X is the number of trials required to see the first success in a sequence of Bernoulli trials. If X and Y have these respective distributions, then Y = X + 1. We give results separately for the two definitions.

Case I.
frequency function: p(k) = p q^k, k = 0, 1, ..., where q = 1 − p.
mean: E(X) = q/p.
variance: V(X) = q/p².
mgf: M(t) = p/(1 − qe^t), −∞ < t < ln(1/q).

Case II.
frequency function: p(k) = p q^{k−1}, k = 1, 2, ..., where q = 1 − p.
mean: E(X) = 1/p.
variance: V(X) = q/p².
mgf: M(t) = pe^t/(1 − qe^t), −∞ < t < ln(1/q).

(4) X ∼ Negative Binomial(r, p)
Again, there are two definitions for the Negative Binomial(r, p) distribution: (I) X is the number of failures before the r-th success in a sequence of Bernoulli trials, and (II) Y is the number of trials required to see the r-th success in a sequence of Bernoulli trials. If X and Y have these respective distributions, then Y = X + r. We give results separately for the two definitions.

Case I.
frequency function: p_X(k) = (k + r − 1 choose r − 1) p^r q^k, k = 0, 1, ..., where q = 1 − p.
mean: E(X) = rq/p.
variance: V(X) = rq/p².
mgf: M(t) = (p/(1 − qe^t))^r, −∞ < t < ln(1/q).

Case II.
frequency function: p_Y(k) = (k − 1 choose r − 1) p^r q^{k−r}, k = r, r + 1, ..., where q = 1 − p.
mean: E(Y) = r/p.
variance: V(Y) = rq/p².
mgf: M(t) = (pe^t/(1 − qe^t))^r, −∞ < t < ln(1/q).

(5) X ∼ Poisson(λ)
frequency function: p(k) = e^{−λ} λ^k / k!, k = 0, 1, ... .
mean: E(X) = λ.
variance: V(X) = λ.
mgf: M(t) = e^{λ(e^t − 1)}, −∞ < t < ∞.

III. General Properties.

E(X)
E(aX + bY) = aE(X) + bE(Y) for any random variables X and Y.
E(XY) = E(X)E(Y) if X and Y are independent.

V(X) and Cov(X, Y)
V(X) = E[(X − µ)²] = E(X²) − µ².
V(aX + b) = a²V(X).
V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y) for any random variables X and Y.
V(X + Y) = V(X) + V(Y) if X and Y are independent.
Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E(XY) − E(X)E(Y).
Cov(X, Y) = 0 if X and Y are independent.
Cov(X, X) = V(X).
Cov(aX, bY) = ab Cov(X, Y).
Cov(X + Y, U + V) = Cov(X, U) + Cov(X, V) + Cov(Y, U) + Cov(Y, V).

Sampling
If {X_i} are n independent, identically distributed random variables with E(X_i) = µ and V(X_i) = σ², and X̄ = (1/n) Σ_{i=1}^n X_i is the sample mean, then
E(X̄) = µ and V(X̄) = σ²/n.

Central limit theorem
If {X_i} are n independent, identically distributed random variables with E(X_i) = µ and V(X_i) = σ², then
Σ_{i=1}^n X_i is approximately N(nµ, nσ²) as n → ∞, and
X̄ is approximately N(µ, σ²/n) as n → ∞.

Joint and conditional distributions
If X and Y have joint pdf f_{X,Y}(x, y), then
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy, or Σ_{all y} f_{X,Y}(x, y) in the discrete case;
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx, or Σ_{all x} f_{X,Y}(x, y) in the discrete case;
f_{X|Y=y}(x) = f_{X,Y}(x, y)/f_Y(y);
f_{Y|X=x}(y) = f_{X,Y}(x, y)/f_X(x);
f_{X,Y}(x, y) = f_X(x) f_Y(y) if and only if X and Y are independent.
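As a quick illustration of the sampling facts and the central limit theorem, here is a small simulation sketch, again assuming NumPy; the choices of λ, the sample size n, and the repetition count are arbitrary. It estimates E(X̄) and V(X̄) for exponential samples and checks that standardized sample means behave approximately like N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 50, 100_000
mu, var = 1/lam, 1/lam**2          # mean and variance of Exponential(lam)

# Each row holds one iid sample of size n; take the sample mean of each row.
xbar = rng.exponential(1/lam, (reps, n)).mean(axis=1)
print("E(Xbar) ~", xbar.mean(), " theory:", mu)
print("V(Xbar) ~", xbar.var(),  " theory:", var / n)

# CLT: standardized sample means should be approximately N(0, 1),
# so P(Z <= 1.96) should be close to Phi(1.96) = 0.975.
z = (xbar - mu) / np.sqrt(var / n)
print("P(Z <= 1.96) ~", (z <= 1.96).mean(), " vs 0.975")
```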
X_max and X_min
If {X_i} are n independent, identically distributed random variables with pdf f_X(x) and cdf F_X(x), then
f_{X_max}(x) = n f_X(x) [F_X(x)]^{n−1},
f_{X_min}(x) = n f_X(x) [1 − F_X(x)]^{n−1}.
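These order-statistic densities can be checked in a concrete case. The sketch below (NumPy assumed, with U[0, 1] as an illustrative choice) uses the fact that for U[0, 1], F_X(x) = x, so f_{X_max}(x) = n x^{n−1} gives E(X_max) = n/(n + 1), and f_{X_min}(x) = n(1 − x)^{n−1} gives E(X_min) = 1/(n + 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200_000
u = rng.uniform(0.0, 1.0, (reps, n))   # each row: n iid U[0, 1] draws

# Compare simulated means of the row-wise max and min with the
# values implied by the densities f_max and f_min above.
print("E(Xmax) ~", u.max(axis=1).mean(), " theory:", n / (n + 1))
print("E(Xmin) ~", u.min(axis=1).mean(), " theory:", 1 / (n + 1))
```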