Summary of Common Distributions
Stat 305, Spring Semester 2006

Discrete Distributions

This section summarizes important facts about the following discrete random variables: Bernoulli, binomial, hypergeometric, geometric, negative binomial, and Poisson. Discrete random variables have sample spaces that are either finite or countably infinite. A good source of information on a wide variety of discrete distributions can be found at http://mathworld.wolfram.com/topics/DiscreteDistributions.html.

Bernoulli

• Sample Space: S = {success, failure}. The interpretation of what constitutes a success or failure depends on the application.
• Definition: X(success) = 1, X(failure) = 0.
• PDF: If 0 < p < 1, then f_X(x) = P(X = x) = p if x = 1, and 1 − p if x = 0.
• Parameters: p = probability of a success.
• Symbol: X ∼ Ber(p)
• Expectation: E(X) = p
• Variance: Var(X) = p(1 − p)
• MGF: ψ_X(t) = p e^t + (1 − p), −∞ < t < ∞

Binomial

• Sample Space: S is the set of all sequences of length n consisting of successes and failures. The interpretation of what constitutes a success or failure depends on the application.
• Definition: X(s) is the number of successes in the sequence s.
• PDF: f_X(x) = P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x}, x = 0, 1, ..., n
• Parameters: n = number of trials of the experiment; p = probability of a success.
• Symbol: X ∼ Bin(n, p)
• Expectation: E(X) = np
• Variance: Var(X) = np(1 − p)
• MGF: ψ_X(t) = (p e^t + (1 − p))^n, −∞ < t < ∞
• Facts
  1. If X_1, X_2, ..., X_n are independent Bernoulli random variables, each with parameter p, then X = X_1 + X_2 + · · · + X_n is a binomial random variable with parameters (n, p).
  2. If X_1, X_2, ..., X_k are independent binomial random variables with X_i having parameters (n_i, p), then X = X_1 + X_2 + · · · + X_k is a binomial random variable with parameters (n_1 + n_2 + · · · + n_k, p). Therefore, the sum of independent binomial random variables with the same parameter p is again binomial.

Hypergeometric

• Sample Space: S is the set of all samples consisting of n balls (drawn without replacement) from a total of r + w balls, where r is the number of red balls and w is the number of white balls.
• Definition: X(s) is the number of red balls in the sample s.
• PDF: f_X(x) = P(X = x) = \binom{r}{x} \binom{w}{n−x} / \binom{r+w}{n}, x = max{0, n − w}, ..., min{n, r}
• Parameters: r = number of red balls; w = number of white balls; n = size of the sample.
• Symbol: X ∼ Hyp(r, w, n)
• Expectation: E(X) = nr/(r + w)
• Variance: Var(X) = [nrw/(r + w)^2] · (r + w − n)/(r + w − 1) = npq · (T − n)/(T − 1), where T = r + w, p = r/T, and q = w/T.
  Note that T is the total number of balls, p is the probability of choosing a red ball, and q is the probability of choosing a white ball.
• MGF: The moment-generating function for the hypergeometric distribution is complicated and involves what is called the hypergeometric function, a special function in applied mathematics. (See http://mathworld.wolfram.com/HypergeometricDistribution.html for details.)
• Facts
  1. If Y is the binomial random variable whose value is the number of red balls obtained in n independent choices (drawn with replacement), then
     E(Y) = n · r/(r + w) = np  and  Var(Y) = n · [r/(r + w)] · [w/(r + w)] = npq.
     Notice that the expectations of X and Y are the same, and the variances differ only by the factor
     α = (T − n)/(T − 1).
     As T → ∞, that is, as the total number of balls becomes large in relation to a fixed value of n, we have that α → 1. Therefore, the variance of X converges to the variance of the binomial random variable Y as T → ∞. What this means is that there is little difference between sampling with and without replacement when T, the total number of balls, is large in comparison to the sample size n. This fact is important in applications to polling. (A small numerical comparison appears after this section.)
  2. The formula for Var(X) depends on the concept of covariance. See p. 253.
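To illustrate Fact 1 numerically, here is a minimal Python sketch (Python and the scipy library are not part of the original handout; the sample size, the proportion of red balls, and the values of T are illustrative choices only). It compares the variance and one pmf value of the hypergeometric random variable X with those of the binomial random variable Y as the total number of balls T grows.

    # Sketch: compare sampling without replacement (hypergeometric) with sampling
    # with replacement (binomial) as the total number of balls T grows.
    # scipy's hypergeom takes (M, n, N) = (population size, #red balls, #draws).
    from scipy.stats import binom, hypergeom

    n = 10      # sample size (fixed)
    p = 0.4     # proportion of red balls, so r = p * T

    for T in [20, 100, 1000, 100000]:
        r = int(p * T)                 # red balls
        X = hypergeom(T, r, n)         # without replacement
        Y = binom(n, r / T)            # with replacement
        print(f"T={T:>6}  Var(X)={X.var():.4f}  Var(Y)={Y.var():.4f}  "
              f"P(X=4)={X.pmf(4):.4f}  P(Y=4)={Y.pmf(4):.4f}")

As T grows, the two columns agree to more and more decimal places, which is the sense in which sampling without replacement resembles sampling with replacement for large populations.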
Geometric

• Sample Space: S is the set of all sequences consisting of a (possibly empty) run of failures followed by a single success. For example, s, fs, ffs, fffs, and ffffs are outcomes in S. The interpretation of what constitutes a success or failure depends on the application.
• Definition: X(t) is the number of failures in the outcome t. For example, X(ffffs) = 4.
• PDF: If 0 < p < 1, f_X(x) = P(X = x) = (1 − p)^x p, x = 0, 1, ....
• Parameters: p = probability of a success.
• Symbol: X ∼ Geo(p)
• Expectation: E(X) = (1 − p)/p
• Variance: Var(X) = (1 − p)/p^2
• MGF: ψ_X(t) = p / (1 − (1 − p)e^t), t < −ln(1 − p)

Negative Binomial

• Sample Space: S is the set of all sequences of successes and failures ending with a success and containing a total of r successes. For example, if r = 3, then fffsfsffffs is an outcome in S. The interpretation of what constitutes a success or failure depends on the application.
• Definition: X(t) is the number of failures in the outcome t. For example, X(fffsfsffffs) = 8.
• PDF: If 0 < p < 1, f_X(x) = P(X = x) = \binom{x + r − 1}{x} (1 − p)^x p^r, x = 0, 1, ....
• Parameters: p = probability of a success; r = number of successes that need to be obtained.
• Symbol: X ∼ Negbin(r, p)
• Expectation: E(X) = r(1 − p)/p
• Variance: Var(X) = r(1 − p)/p^2
• MGF: ψ_X(t) = (p / (1 − (1 − p)e^t))^r, t < −ln(1 − p)
• Facts
  1. If r = 1, the negative binomial is just the geometric. That is, the negative binomial distribution generalizes the geometric distribution.
  2. If X_1 is the number of failures obtained before the first success, and for i > 1 the random variable X_i is the number of failures obtained after the (i − 1)st success but before the ith success, then X_1, X_2, ..., X_r are independent geometric random variables, each with parameter p. Also, note that X = X_1 + X_2 + · · · + X_r is the total number of failures obtained before the rth success. Therefore, X is a negative binomial random variable with parameters (r, p). Thus, the sum of independent geometric random variables with the same parameter p is negative binomial. (A simulation sketch of this fact follows this section.)
  3. If X_1, X_2, ..., X_k are independent negative binomial random variables such that X_i has parameters (r_i, p), then X = X_1 + X_2 + · · · + X_k is a negative binomial random variable with parameters (r_1 + r_2 + · · · + r_k, p). Therefore, the sum of independent negative binomial random variables with the same parameter p is negative binomial.
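The following simulation sketch illustrates Fact 2 (Python with numpy and scipy, which are not part of the original handout; the values of p, r, the number of replications, and the seed are arbitrary illustrative choices). It draws r independent geometric counts of failures, sums them, and compares the sample mean and variance with those of Negbin(r, p).

    # Sketch: a sum of r independent geometric(p) failure counts should follow
    # the negative binomial distribution with parameters (r, p).
    import numpy as np
    from scipy.stats import nbinom

    rng = np.random.default_rng(0)      # arbitrary seed
    p, r, reps = 0.3, 5, 200_000        # illustrative values

    # numpy's geometric() counts trials up to and including the first success,
    # so subtract 1 to obtain the number of failures (the convention used above).
    failures = rng.geometric(p, size=(reps, r)) - 1
    total = failures.sum(axis=1)

    print("simulated mean, variance:   ", total.mean(), total.var())
    print("Negbin(r, p) mean, variance:", nbinom.mean(r, p), nbinom.var(r, p))
    # Both pairs should be close to r(1 - p)/p = 11.67 and r(1 - p)/p^2 = 38.89.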
Poisson

• Sample Space: S = {0, 1, 2, ...}
• Definition: X(s) = s. The value of the random variable is simply the outcome of the experiment.
• PDF: For λ > 0, f_X(x) = P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, ....
• Parameters: λ = average number of random events occurring in unit time (called the Poisson rate).
• Symbol: X ∼ Poi(λ)
• Expectation: E(X) = λ
• Variance: Var(X) = λ
• MGF: ψ_X(t) = exp(λ(e^t − 1)), −∞ < t < ∞
• Facts
  1. Poisson random variables are most often used to model the number of random events that occur in unit time. (Random events occurring in time form what is called a Poisson process. See p. 259.) Examples of random events are accidents at an intersection or hurricanes in the Gulf of Mexico. In this case, λ is the average number of random events occurring in unit time and is called the Poisson rate. For example, if the Bears fumble the ball twice per game on average, then (taking a game to be 60 minutes) the Poisson rate is λ = 2/60 ≈ 0.033 fumbles/minute. Therefore, if X is the number of fumbles the Bears commit in a minute of play,
     f_X(x) = (0.033)^x e^{−0.033} / x!, x = 0, 1, ....
  2. If X_1, X_2, ..., X_k are independent Poisson random variables such that X_i has parameter λ_i, then X = X_1 + X_2 + · · · + X_k is a Poisson random variable with parameter λ_1 + λ_2 + · · · + λ_k. The sum of independent Poisson random variables is Poisson.
  3. The Poisson distribution can be used to approximate the binomial distribution if n is large and p is small. Specifically, if X is a binomial random variable with parameters (n, p), n is large, and p is small, then with λ = np,
     f_X(x) = \binom{n}{x} p^x (1 − p)^{n−x} ≈ e^{−λ} λ^x / x!.

Continuous Distributions

This section summarizes important facts about the following continuous random variables: uniform, normal, gamma, exponential, and beta. Sample spaces of continuous random variables are intervals (either finite or infinite) on the real line. The value of a continuous random variable is simply the outcome of the experiment. That is, continuous random variables are the identity function on their respective sample spaces. Therefore, there is no need to define a continuous random variable. A good source of information on a wide variety of continuous distributions can be found at http://mathworld.wolfram.com/topics/ContinuousDistributions.html.

Uniform

• Sample Space: S = (a, b), a < b
• PDF: f_X(x) = 1/(b − a), a < x < b
• Parameters: a = left endpoint; b = right endpoint.
• Symbol: X ∼ U(a, b)
• Expectation: E(X) = (a + b)/2
• Variance: Var(X) = (b − a)^2/12
• MGF: ψ_X(t) = (e^{bt} − e^{at}) / (t(b − a)) for t ≠ 0, with ψ_X(0) = 1

Normal

• Sample Space: S = (−∞, ∞)
• PDF: If −∞ < µ < ∞ and σ > 0,
  f_X(x) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)^2), −∞ < x < ∞
• Parameters: µ = mean; σ^2 = variance.
• Symbol: X ∼ N(µ, σ^2)
• Expectation: E(X) = µ
• Variance: Var(X) = σ^2
• MGF: ψ_X(t) = exp(µt + σ^2 t^2/2), −∞ < t < ∞
• Facts
  1. If X_1, X_2, ..., X_k are independent normal random variables such that X_i has parameters (µ_i, σ_i^2), and α_1, α_2, ..., α_k, β are real numbers, then X = α_1 X_1 + α_2 X_2 + · · · + α_k X_k + β is a normal random variable with parameters (α_1 µ_1 + α_2 µ_2 + · · · + α_k µ_k + β, α_1^2 σ_1^2 + α_2^2 σ_2^2 + · · · + α_k^2 σ_k^2). That is, the mean of X is α_1 µ_1 + α_2 µ_2 + · · · + α_k µ_k + β and the variance of X is α_1^2 σ_1^2 + α_2^2 σ_2^2 + · · · + α_k^2 σ_k^2. Therefore, the sum of independent normal random variables is normal.
  2. If X_1, X_2, ..., X_n is a random sample of size n from a normal random variable with parameters (µ, σ^2), then X̄_n = (X_1 + X_2 + · · · + X_n)/n is normal with parameters (µ, σ^2/n).
  3. If X is a normal random variable with parameters (µ, σ^2), then Z = (X − µ)/σ is a standard normal random variable. That is, Z has µ = 0 and σ^2 = 1.
  4. If Φ(x) denotes the cumulative distribution function of the standard normal, then Φ(x) + Φ(−x) = 1 for −∞ < x < ∞. This formula can be used to compute probabilities from a standard normal table of values.
  5. If X is a normal random variable with parameters (µ, σ^2), and F_X(x) is the cumulative distribution function of X, then the quantile function F_X^{−1}(p) of X is given by F_X^{−1}(p) = µ + σ Φ^{−1}(p), where Φ^{−1}(p) is the quantile function of the standard normal.
  6. If X is a normal random variable with parameters (µ, σ^2), then Pr(|X − µ| ≤ kσ) = Pr(|Z| ≤ k), where Z denotes the standard normal. That is, the probability that a normal random variable is within k standard deviations of its mean is exactly equal to the probability that the standard normal is within k standard deviations of its mean (since σ = 1 for the standard normal). Therefore, the probability that any two normal random variables are within k standard deviations of their respective means is the same.
  7. If X is a continuous random variable such that log(X) is normal, then X is said to have a lognormal distribution.
  8. Let X_1, X_2, ..., X_n, ... be an infinite sequence of independent and identically distributed random variables, each with mean µ and variance σ^2. Then
     lim_{n→∞} Pr(a ≤ ((X_1 + X_2 + · · · + X_n) − nµ)/(√n σ) ≤ b) = ∫_a^b (1/√(2π)) exp(−z^2/2) dz = Φ(b) − Φ(a).
     This result, called the Central Limit Theorem, says that X = X_1 + X_2 + · · · + X_n is approximately normal with mean nµ and variance nσ^2. As a special case, we have
     lim_{n→∞} Pr(a ≤ (X̄_n − µ)/(σ/√n) ≤ b) = ∫_a^b (1/√(2π)) exp(−z^2/2) dz = Φ(b) − Φ(a).
     (A simulation sketch of this result appears below.)
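As a quick illustration of the Central Limit Theorem (Fact 8), the sketch below simulates standardized sums of independent exponential random variables and compares the resulting probability of landing in [a, b] with Φ(b) − Φ(a). Python with numpy and scipy is assumed (not part of the original handout), and the choices of distribution, n, a, b, replications, and seed are arbitrary.

    # Sketch of the Central Limit Theorem: standardized sums of iid Exp(1) draws
    # (mean 1, variance 1) are approximately standard normal for large n.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)      # arbitrary seed
    mu, sigma = 1.0, 1.0                # mean and standard deviation of Exp(1)
    n, reps = 50, 100_000               # illustrative sample size and replications

    sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
    z = (sums - n * mu) / (np.sqrt(n) * sigma)     # standardize the sum

    a, b = -1.0, 1.0
    print("simulated P(a <= Z <= b):", np.mean((z >= a) & (z <= b)))
    print("Phi(b) - Phi(a):         ", norm.cdf(b) - norm.cdf(a))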
Gamma

• Sample Space: S = (0, ∞)
• PDF: If α, β > 0,
  f_X(x) = (β^α / Γ(α)) x^{α−1} e^{−βx}, x > 0
• Parameters: α = shape parameter; β = rate parameter.
• Symbol: X ∼ Γ(α, β)
• Expectation: E(X) = α/β
• Variance: Var(X) = α/β^2
• MGF: ψ_X(t) = (β/(β − t))^α, t < β
• Facts
  1. The gamma pdf is defined in terms of the gamma function, which is traditionally denoted Γ(α). The gamma function is an important special function in applied mathematics. For α > 0, Γ(α) is defined as
     Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.
     It is easy to show that Γ(1) = 1 and (by using integration by parts) that Γ(α) = (α − 1)Γ(α − 1). In particular, if n > 1 is an integer, then Γ(n) = (n − 1)!. Therefore, Γ(2) = 1, Γ(3) = 2, Γ(4) = 6, etc.
  2. The gamma distribution is often used to model the time elapsed until the occurrence of the nth random event in a Poisson process. We take α = n and β to be the Poisson rate. For example, if the Bears fumble the ball at a Poisson rate of β = 0.033 fumbles/minute (about 2 fumbles per game) and X measures the time until the 10th fumble of the season, then
     f_X(x) = ((0.033)^{10} / Γ(10)) x^9 e^{−0.033x}, x > 0.
     (A numerical sketch of this example follows this list of facts.)
  3. Note that f_X(x) integrates to 1 (a necessary requirement for a pdf), since
     ∫_0^∞ f_X(x) dx = (β^α/Γ(α)) ∫_0^∞ x^{α−1} e^{−βx} dx = (β^α/Γ(α)) · (1/β^α) ∫_0^∞ u^{α−1} e^{−u} du (after the substitution u = βx) = (β^α/Γ(α)) · (1/β^α) · Γ(α) = 1.
  4. The graph of the gamma pdf depends on the choices made for the parameters α and β. [Figure: gamma pdfs for β = 2 and various values of α; not reproduced here.]
  5. The expectation, variance, and moment-generating function of the gamma distribution are easily derived using the definition of the gamma function. See p. 297 for details.
  6. If X_1, X_2, ..., X_k are independent gamma random variables such that X_i has parameters (α_i, β) and X = X_1 + X_2 + · · · + X_k, then X is a gamma random variable with parameters (α_1 + α_2 + · · · + α_k, β). Therefore, the sum of independent gamma random variables with the same β is a gamma random variable.
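Here is a small numerical sketch of Fact 2 (Python with scipy assumed, not part of the original handout; the interpretation of 300 minutes as five 60-minute games is only an illustration). Note that scipy parameterizes the gamma distribution by a scale equal to 1/β, so the rate must be inverted.

    # Sketch: time until the 10th fumble when fumbles follow a Poisson process
    # with rate beta = 0.033 per minute, i.e. X ~ Gamma(alpha = 10, beta = 0.033).
    # scipy parameterizes the gamma distribution by scale = 1/beta.
    from scipy.stats import gamma

    alpha, beta = 10, 0.033
    X = gamma(a=alpha, scale=1 / beta)

    print("E(X)   =", X.mean(), "  (alpha/beta   =", alpha / beta, ")")
    print("Var(X) =", X.var(),  "  (alpha/beta^2 =", alpha / beta**2, ")")
    # Probability the 10th fumble occurs within 300 minutes (five 60-minute
    # games, an illustrative horizon only):
    print("P(X <= 300) =", X.cdf(300))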
Exponential

• Sample Space: S = (0, ∞)
• PDF: If β > 0, f_X(x) = β e^{−βx}, x > 0.
• Parameters: β = rate parameter.
• Symbol: X ∼ Exp(β)
• Expectation: E(X) = 1/β
• Variance: Var(X) = 1/β^2
• MGF: ψ_X(t) = β/(β − t), t < β
• Facts
  1. The exponential distribution is a special case of the gamma distribution. If α = 1 in the pdf of the gamma, we obtain
     f_X(x) = (β^1/Γ(1)) x^{1−1} e^{−βx} = β e^{−βx}.
  2. The exponential distribution has a very special property commonly called the memoryless property. Specifically, if X is an exponential random variable with parameter β, then for t, h > 0,
     P(X ≥ t + h | X ≥ t) = P(X ≥ t + h)/P(X ≥ t) = [∫_{t+h}^∞ β e^{−βx} dx] / [∫_t^∞ β e^{−βx} dx] = e^{−β(t+h)}/e^{−βt} = e^{−βh} = P(X ≥ h).
     If we think of t as representing time (say) and X as recording the time of arrival of some random event, then the above equation says that if the event has not occurred after the first t time units have elapsed, the probability of it occurring in the next h time units is the same as if we reset the time back to t = 0. That is, the random variable X 'forgets' about the time interval [0, t]. The fact that the event has not happened by time t has no effect on whether it will happen during the next h time units. (A numerical check of this property follows this list of facts.)
  3. The exponential distribution is often used to model the elapsed time before the occurrence of the first event in a Poisson process. For example, if the Bears fumble the ball with a Poisson rate β = 0.033 fumbles/minute and X is the time until the first fumble, then
     f_X(x) = 0.033 e^{−0.033x}, x > 0.
     In fact, even more is true! Because of the memoryless property, the pdf above also models the time between fumbles. For example, the time that elapses between the 7th and 8th fumbles is given by the same pdf.
  4. If X_1, X_2, ..., X_k are independent exponential random variables, each with parameter β, and X = X_1 + X_2 + · · · + X_k, then X is a gamma random variable with parameters (k, β). Therefore, the sum of independent exponential random variables with the same β is a gamma random variable.
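The memoryless property of Fact 2 can be checked numerically. In the sketch below (Python with scipy assumed, not part of the original handout; β, t, and h are arbitrary illustrative values), the conditional survival probability P(X ≥ t + h | X ≥ t) is computed directly and compared with P(X ≥ h).

    # Numerical check of the memoryless property: for X ~ Exp(beta),
    # P(X >= t + h | X >= t) equals P(X >= h) = exp(-beta * h).
    from scipy.stats import expon

    beta = 0.033                   # illustrative rate
    X = expon(scale=1 / beta)      # scipy uses scale = 1/beta

    t, h = 20.0, 10.0              # illustrative times (minutes)
    lhs = X.sf(t + h) / X.sf(t)    # P(X >= t + h) / P(X >= t)
    rhs = X.sf(h)                  # P(X >= h)
    print(lhs, rhs)                # both are exp(-0.33), about 0.719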
Beta

• Sample Space: S = (0, 1)
• PDF: If α, β > 0,
  f_X(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}, 0 < x < 1.
• Parameters: α = first shape parameter; β = second shape parameter.
• Symbol: X ∼ Beta(α, β)
• Expectation: E(X) = α/(α + β)
• Variance: Var(X) = αβ / ((α + β)^2 (α + β + 1))
• MGF: The moment-generating function for the beta distribution is complicated and involves what is called the confluent hypergeometric function, a special function in applied mathematics. (See http://mathworld.wolfram.com/BetaDistribution.html for details.) However, for certain values of α and β, the moment-generating function has a simple form. For example, if α = 3 and β = 2, then
  ψ_X(t) = (12/t^4)(t^2 e^t − 4t e^t + 6e^t − 2t − 6), t ≠ 0, with ψ_X(0) = 1.
• Facts
  1. If α = β = 1, then the beta distribution reduces to the uniform distribution on (0, 1).
  2. Since beta random variables take values in (0, 1), they are useful in modeling proportions, percentages, or probabilities. A common use of the beta distribution is to model the unknown probability p of a success in a Bernoulli trial.
  3. Jacobians and the Change of Variables Theorem must be used to show that the pdf of a beta distribution integrates to 1. (See p. 304.)
  4. The parameters α and β are shape parameters for the beta distribution.
     (a) If α = β = 1, the beta distribution reduces to the uniform distribution. (See panel (a) below.)
     (b) If α < 1, the pdf approaches +∞ as x → 0+. (See panels (c) and (f) below.)
     (c) If β < 1, the pdf approaches +∞ as x → 1−. (See panels (d) and (f) below.)
     (d) If α < 1 and β < 1, the pdf is 'U-shaped'. (See panels (b) and (f) below.)
     (e) If α < 1 and β ≥ 1, the pdf is decreasing. (See panel (c) below.)
     (f) If α ≥ 1 and β < 1, the pdf is increasing. (See panel (d) below.)
     (g) If α > 1 and β > 1, the pdf has a single maximum. (See panel (e) below.)
     (h) If α = β, the pdf is symmetric about x = 1/2. (See panels (a) and (b) below.) If α ≠ β, the pdf is skewed. (See panels (c) through (f) below.)
     [Figure: panels (a) through (f), example beta pdfs for various values of α and β; not reproduced here.]
  5. If X_1 and X_2 are independent gamma random variables with parameters (α_1, β) and (α_2, β) respectively, then it turns out that X_1/(X_1 + X_2) is beta with parameters (α_1, α_2). (This is easy to verify by first letting Y_1 = X_1 + X_2 and Y_2 = X_1/(X_1 + X_2) and then finding the joint distribution of Y_1 and Y_2 using Jacobians. It is then straightforward to find the marginal distribution of Y_2 and verify that it is beta with parameters (α_1, α_2). Of course, Y_1 is gamma with parameters (α_1 + α_2, β).) A simulation check of this fact appears below.
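Fact 5 can also be checked by simulation. The sketch below (Python with numpy and scipy assumed, not part of the original handout; the shape parameters, the common rate β, the number of replications, and the seed are arbitrary) draws independent gamma variables X_1 and X_2 with a common β, forms X_1/(X_1 + X_2), and compares its sample mean and variance with those of Beta(α_1, α_2).

    # Simulation sketch of Fact 5: if X1 ~ Gamma(a1, beta) and X2 ~ Gamma(a2, beta)
    # are independent, then X1/(X1 + X2) ~ Beta(a1, a2).
    import numpy as np
    from scipy.stats import beta as beta_dist

    rng = np.random.default_rng(2)               # arbitrary seed
    a1, a2, rate, reps = 3.0, 2.0, 0.5, 200_000  # illustrative values

    # numpy's gamma sampler uses a scale parameter equal to 1/beta.
    x1 = rng.gamma(shape=a1, scale=1 / rate, size=reps)
    x2 = rng.gamma(shape=a2, scale=1 / rate, size=reps)
    ratio = x1 / (x1 + x2)

    print("simulated mean, variance:   ", ratio.mean(), ratio.var())
    print("Beta(a1, a2) mean, variance:", beta_dist.mean(a1, a2), beta_dist.var(a1, a2))
    # Both pairs should be close to a1/(a1 + a2) = 0.6 and 0.04.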