ST 371 (VII): Families of Continuous Distributions

1 Normal Distribution

The family of normal random variables plays a central role in probability and statistics. This distribution is also called the Gaussian distribution after Carl Friedrich Gauss, who proposed it as a model for measurement errors. The central limit theorem, which will be discussed in Chapter 5, justifies the use of the normal distribution in many applications. Roughly, the central limit theorem says that if a random variable is the sum of a large number of independent random variables, it is approximately normally distributed. The normal distribution has been used as a model for such diverse phenomena as a person's height, the distribution of IQ scores, and the velocity of a gas molecule.

The bell-shaped curve: In day-to-day life much use is made of statistics, in many cases without the person doing so even realising it. If you were to go into a shop and notice that everybody waiting to be served was over 7 feet tall, you would more than likely be surprised. You probably would have expected most people to be around the "average" height, perhaps spotting just one or two people in the shop taller than 6 and a half feet. In making this judgement you are actually employing a well-used statistical distribution known as the normal distribution. A histogram of the heights of women in the US aged 18-24 illustrates this situation (Figure 1).

We say that X is a normal random variable, or simply that X is normally distributed, with parameters µ and σ² if the pdf of X is given by

(1.1) f(x; µ, σ²) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.

Techniques from multivariate calculus can be used to show that

∫_{−∞}^{∞} f(x; µ, σ²) dx = 1,

therefore f(x; µ, σ²) fulfills the condition necessary for specifying a pdf.

Figure 1: Heights of women in the US aged 18-24 (histograms with 10 and 100 intervals).
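The normalization in (1.1) can be checked numerically. The sketch below (Python with only the standard library; the notes' own computing examples use R, and the parameter choice µ = 0, σ² = 1 is arbitrary) integrates the density over a wide interval with the trapezoid rule:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Normal density f(x; mu, sigma^2) from equation (1.1)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Trapezoid-rule check that the density integrates to 1 over a wide interval.
mu, sigma2 = 0.0, 1.0
a, b, n = -10.0, 10.0, 100000
h = (b - a) / n
total = 0.5 * (normal_pdf(a, mu, sigma2) + normal_pdf(b, mu, sigma2))
total += sum(normal_pdf(a + i * h, mu, sigma2) for i in range(1, n))
total *= h
print(round(total, 6))  # 1.0
```

The tails beyond ±10 standard deviations contribute a negligible amount, so the truncated integral agrees with 1 to many decimal places.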
Mean, variance and SD of a normal distribution: If X is normally distributed with parameters µ and σ², then it can be shown that

E(X) = µ and Var(X) = σ².

Figure 2 presents graphs of f(x; µ, σ²) for several different (µ, σ²) pairs. Each graph is symmetric about µ and bell-shaped. The center of the bell is the mean of the distribution, and the value of σ determines the spread of the distribution.

Figure 2: Density functions of the normal distribution.

Example 1. The test scores of an examination can be approximated by a normal density curve; in other words, a graph of the frequency of grade scores should have approximately the bell-shaped form of the normal density. The instructor often uses the test scores to estimate the normal parameters µ and σ² and then assigns the letter grade A to those whose test score is greater than µ + σ, B to those whose score is between µ and µ + σ, C to those whose score is between µ − σ and µ, D to those whose score is between µ − 2σ and µ − σ, and F to those getting a score below µ − 2σ. This is sometimes referred to as grading "on the curve". Determine the percent of the class receiving each of the grades A-F.

1.1 Standard Normal Distribution

An important fact about normal random variables is that if X is normally distributed with parameters µ and σ², then aX + b is normally distributed with parameters aµ + b and a²σ². Therefore if X is normally distributed with parameters µ and σ², then

(1.2) Z = (X − µ)/σ

is normally distributed with parameters 0 and 1. Such a random variable is said to be a standard normal random variable and will be denoted by Z. The pdf of a standard normal variable is

φ(z) = f(z; 0, 1) = (1/√(2π)) e^{−z²/2}, −∞ < z < ∞.

The cdf of Z is P(Z ≤ z) = ∫_{−∞}^{z} φ(y) dy, which will be denoted by Φ(z). Φ(z) = P(Z ≤ z) gives the area under the graph of the standard normal pdf to the left of z.
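Since Φ(z) has no closed form, it is usually evaluated numerically; one standard identity is Φ(z) = (1 + erf(z/√2))/2. The following sketch (Python rather than the R used later in these notes) applies this to compute the "grading on the curve" percentages asked for in Example 1:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Grade boundaries from Example 1, expressed in standardized units.
grades = {
    "A": 1.0 - Phi(1.0),       # score above mu + sigma
    "B": Phi(1.0) - Phi(0.0),  # between mu and mu + sigma
    "C": Phi(0.0) - Phi(-1.0), # between mu - sigma and mu
    "D": Phi(-1.0) - Phi(-2.0),# between mu - 2 sigma and mu - sigma
    "F": Phi(-2.0),            # below mu - 2 sigma
}
for g, p in grades.items():
    print(g, round(100 * p, 1))  # A 15.9, B 34.1, C 34.1, D 13.6, F 2.3
```

So about 16% of the class receives an A, 34% a B, 34% a C, 14% a D, and 2% an F, regardless of the particular values of µ and σ.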
It is easy to see that φ(z) = φ(−z). Since the z-curve is symmetric about 0, we have

(1.3) Φ(−z) = 1 − Φ(z).

Example 2 (z-curve). Find the normal curve areas:
• P(Z ≤ 1.96)
• P(Z > 1.96)
• P(Z ≤ −1.96)
• P(−0.38 ≤ Z ≤ 1.25)

Example 3 (z-percentile). The 100pth percentile of the standard normal distribution is the value on the horizontal axis such that the area under the curve to its left is p. Find (a) the 95th z-percentile, (b) the 5th z-percentile.

z_α notation: z_α is the value on the x-axis for which α of the area under the z-curve lies to the right of z_α. In other words, z_α is the 100(1 − α)th percentile of the standard normal distribution. The z_α's are usually referred to as z critical values. Important z critical values include: z_0.10 = 1.28, z_0.05 = 1.645, z_0.025 = 1.96, z_0.01 = 2.33 and z_0.005 = 2.58.

1.2 Non-Standard Normal Distribution

When X ∼ N(µ, σ²), probabilities involving X can be computed by "standardizing". Let Z = (X − µ)/σ; then Z ∼ N(0, 1). Thus

P(a ≤ X ≤ b) = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ) = Φ((b − µ)/σ) − Φ((a − µ)/σ).

Similarly, we have

P(X ≤ a) = Φ((a − µ)/σ) and P(X ≥ b) = 1 − Φ((b − µ)/σ).

The key idea is that by standardizing, any probability involving X can be expressed as a probability involving Z.

Example 4. If X is a normal random variable with parameters µ = 3 and σ² = 9, find (a) P(2 < X < 5); (b) P(X > 0); (c) P(|X − 3| > 6).

Example 5. The army is developing a new missile and is concerned about its precision. By observing points of impact, launchers can adjust the missile's initial trajectory, thereby controlling the mean of its impact distribution. If the standard deviation of the impact distribution is too large, though, the missile will be ineffective. Suppose the Pentagon requires that at least 95% of the missiles must fall within 1/8 mile of the target when the missiles are aimed properly. Assume the impact distribution is normal. What is the maximum allowable standard deviation for the impact distribution?
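The standardization formulas above can be exercised on Example 4. The sketch below (Python, stdlib only; the notes' own examples use R) computes all three probabilities for X ∼ N(3, 9):

```python
import math

def Phi(z):
    """Standard normal cdf, Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.0, 3.0  # Example 4: mu = 3, sigma^2 = 9

# (a) P(2 < X < 5) = Phi((5 - mu)/sigma) - Phi((2 - mu)/sigma)
a = Phi((5 - mu) / sigma) - Phi((2 - mu) / sigma)
# (b) P(X > 0) = 1 - Phi((0 - mu)/sigma) = Phi(1) by symmetry (1.3)
b = 1.0 - Phi((0 - mu) / sigma)
# (c) P(|X - 3| > 6) = P(X > 9) + P(X < -3) = 2 * (1 - Phi(2)) by symmetry
c = 2.0 * (1.0 - Phi(2.0))

print(round(a, 4), round(b, 4), round(c, 4))  # 0.3781 0.8413 0.0455
```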
1.3 R commands for Normal Distribution

The command dnorm(x, mu, s) returns the value of the normal pdf with mean mu and standard deviation s at x (note that R parameterizes the normal distribution by the standard deviation, not the variance), e.g.,

> dnorm(2, 2, 1.5)
[1] 0.2659615

is the value of the pdf f(2; µ = 2, σ² = 1.5²). The command pnorm(x, mu, s) returns the probability that a normal rv with mean mu and standard deviation s is less than or equal to x, e.g.,

> pnorm(2, 2, 1.5)
[1] 0.5

is the value of the cdf P(X ≤ 2), where X ∼ N(2, 1.5²).

1.4 Normal Approximation to Binomial

An important result in probability theory, known as the DeMoivre-Laplace limit theorem, states that when n is large, a binomial random variable with parameters n and p will have approximately the same distribution as a normal random variable with the same mean and variance as the binomial.

DeMoivre-Laplace Limit Theorem: If S_n denotes the number of successes that occur when n independent trials, each resulting in a success with probability p, are performed, then for any a < b,

(1.4) P(a ≤ (S_n − np)/√(np(1 − p)) ≤ b) → Φ(b) − Φ(a)

as n → ∞.

Normal approx. vs. Poisson approx. to binomial: The normal distribution provides an approximation to the binomial distribution when n is large [the normal approximation to the binomial will, in general, be quite good for values of n satisfying np(1 − p) ≥ 10]. The Poisson distribution provides an approximation to the binomial distribution when n is large and p is small so that np is moderate (n > 50 and np < 5).

Figure 3: The pmf of a Binomial(n, p) random variable becomes more and more "normal" as n becomes larger and larger (panels: Binomial(10, 0.7), Binomial(20, 0.7), Binomial(30, 0.7), Binomial(40, 0.7)).
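The quality of the DeMoivre-Laplace approximation (1.4) can be seen by comparing an exact binomial probability with its normal approximation. The sketch below (Python, stdlib only; n = 100, p = 0.5 and the cutoff k = 55 are illustrative choices satisfying np(1 − p) ≥ 10, and the 0.5 continuity correction is a standard refinement not stated in the notes):

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 100, 0.5, 55

# Exact binomial probability P(S_n <= k), summing the pmf directly.
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# DeMoivre-Laplace approximation with a continuity correction of 0.5.
mu, sd = n * p, math.sqrt(n * p * (1 - p))
approx = Phi((k + 0.5 - mu) / sd)

print(round(exact, 4), round(approx, 4))
```

For these values the exact and approximate probabilities agree to about three decimal places.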
Example 6. Airlines A and B offer identical service on two flights leaving at the same time (meaning that the probability of a passenger choosing either is 1/2). Suppose that both airlines are competing for the same pool of 400 potential passengers. Airline A sells tickets to everyone who requests one, and the capacity of its plane is 230. Approximate the probability that airline A overbooks.

2 The Exponential and Gamma Distributions

2.1 Exponential distribution

Exponential random variables: A random variable with cdf

F(x) = 1 − e^{−λx}, x ≥ 0

is called an exponential random variable with parameter λ. The probability density function of an exponential random variable is, for x ≥ 0,

(2.5) f(x) = (d/dx) F(x) = (d/dx)(1 − e^{−λx}) = λe^{−λx},

and 0 for x < 0. The following plot shows the pdfs for some exponential distributions.

Figure 4: Exponential distribution densities (λ = 0.5, 1, 2, 3).

Example 7. The time until the first event in a Poisson process with rate λ is an exponential random variable with parameter λ. Suppose customers arrive at a bank according to a Poisson process with arrival intensity 2 per minute. What is the probability that, starting at 1 p.m., the first customer arrives within two minutes?

Mean and variance of an exponential random variable: We will show that

E(X) = 1/λ, Var(X) = 1/λ².

For n > 0, we have

E(X^n) = ∫_0^∞ x^n λe^{−λx} dx.

Integrating by parts (u = x^n, dv = λe^{−λx} dx) yields

E(X^n) = −x^n e^{−λx} |_0^∞ + ∫_0^∞ e^{−λx} n x^{n−1} dx = 0 + (n/λ) ∫_0^∞ λe^{−λx} x^{n−1} dx = (n/λ) E(X^{n−1}).

Thus,

E(X) = (1/λ) E(X^0) = 1/λ,
E(X²) = (2/λ) E(X) = 2/λ²,
Var(X) = 2/λ² − (1/λ)² = 1/λ².

Memorylessness of the exponential random variable: Let X be an exponential random variable with parameter λ.
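Example 7 and the moment formulas above can be checked directly. The sketch below (Python, stdlib only) evaluates P(X ≤ 2) for λ = 2 in closed form, then confirms E(X) = 1/λ and Var(X) = 1/λ² by simulation using inverse-transform sampling (if U ∼ Uniform(0, 1), then −ln(U)/λ is Exponential(λ); the seed and sample size are arbitrary):

```python
import math
import random

lam = 2.0  # arrival rate from Example 7 (2 customers per minute)

# Example 7: P(first arrival within 2 minutes) = F(2) = 1 - exp(-lam * 2).
prob = 1.0 - math.exp(-lam * 2.0)
print(round(prob, 4))  # 0.9817

# Inverse-transform sampling check of the mean and variance formulas.
random.seed(0)
samples = [-math.log(random.random()) / lam for _ in range(200000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # close to 1/lam = 0.5 and 1/lam^2 = 0.25
```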
We have, for all s, t ≥ 0,

P(X > s + t | X > t) = P({X > s + t} ∩ {X > t}) / P(X > t) = P(X > s + t) / P(X > t) = (1 − (1 − e^{−λ(s+t)})) / (1 − (1 − e^{−λt})) = e^{−λs}.

In other words,

(2.6) P(X ≤ s + t | X > t) = 1 − e^{−λs} = P(X ≤ s).

If we think of X as the lifetime of some electrical device, equation (2.6) states that the probability that the device survives for at least s + t hours, given that it has survived t hours, is the same as the initial probability that it survives for at least s hours. In other words, if the device is alive at age t, the distribution of the remaining amount of time that it survives is the same as the original lifetime distribution (that is, it is as if the instrument does not remember that it has already been in use for a time t). This is called the memorylessness property of the exponential distribution.

Question: Is the exponential distribution a good model for the distribution of human lifetimes?

Example 8. Consider a post office that is staffed by two clerks. Suppose that when Mr. Smith enters the post office, he discovers that Ms. Jones is being served by one of the clerks and Mr. Brown by the other. Suppose also that Mr. Smith is told that his service will begin as soon as either Jones or Brown leaves. If the amount of time that a clerk spends with a customer is exponentially distributed with parameter λ, what is the probability that, of the three customers, Mr. Smith is the last to leave the post office?

2.2 The Gamma Distribution

Suppose events occur according to a Poisson process with arrival intensity λ, and suppose we start observing the process at some time (which we denote by time 0). The time until the first event occurs has an exponential(λ) distribution. Let X denote the time until the first α events occur, and let W denote the number of occurrences of the event in the interval [0, x]. Then W is a Poisson random variable with parameter λx. Since the event {X ≤ x} is the same as {W ≥ α}, the pdf of X can be obtained using W as

f_X(x) = (λ^α / (α − 1)!) x^{α−1} e^{−λx}, x ≥ 0.
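The memorylessness identity (2.6) can be confirmed numerically. The sketch below (Python, stdlib only; λ, s, t are arbitrary illustrative values) compares the conditional survival probability with the unconditional one:

```python
import math

lam, s, t = 1.5, 0.7, 2.0  # arbitrary illustrative values

def survival(x, lam):
    """P(X > x) for an Exponential(lam) random variable."""
    return math.exp(-lam * x)

# Memorylessness: conditioning on survival past t does not change
# the distribution of the remaining lifetime.
lhs = survival(s + t, lam) / survival(t, lam)  # P(X > s+t | X > t)
rhs = survival(s, lam)                         # P(X > s)
print(abs(lhs - rhs) < 1e-12)  # True
```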
The gamma family can be generalized to cases in which α is positive but not necessarily an integer. To do this, we replace (α − 1)! with a continuous function of (nonnegative) α, Γ(α), which reduces to (α − 1)! when α is a positive integer. For any real number α > 0, the gamma function (of α) is given by

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.

The most important properties of the gamma function are the following:

1. For any α > 1, Γ(α) = (α − 1) · Γ(α − 1).
2. For any positive integer n, Γ(n) = (n − 1)!.
3. Γ(1/2) = √π.

Gamma distribution: Let X be a random variable with pdf

f(x) = λe^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0, and f(x) = 0 for x < 0.

Then X is said to have a gamma distribution with parameters α and λ. The mean and variance of the gamma distribution are

E(X) = α/λ, Var(X) = α/λ².

The gamma family of distributions is a flexible family of probability distributions for modeling nonnegative-valued random variables. The following plot shows the pdfs for some gamma distributions.

Figure 5: Gamma distribution densities (Gamma(0.5, 1), Gamma(1, 1), Gamma(5, 1), Gamma(10, 1)).

Example 9. Suppose customers arrive at a bank according to a Poisson process with arrival intensity 2 per minute. What is the probability that, starting at 1 p.m., the first two customers have arrived within three minutes? What is the expected value of the amount of time it takes for two customers to arrive?

2.3 The Chi-Squared Distribution

The chi-squared distribution is important because it is the basis for statistical procedures for testing hypotheses and constructing confidence intervals.

Relation to the normal distribution (not required): Let X_1, …, X_n be a random sample from N(µ, σ²). Then the rv

(2.7) (n − 1)S²/σ² = Σ(X_i − X̄)²/σ²

has a χ² distribution with (n − 1) df.

Let ν be a positive integer. Then a rv X is said to have a chi-squared distribution with parameter ν if the pdf of X is the gamma density with α = ν/2 and scale β = 1/λ = 2.
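Example 9 can be answered with the Poisson-process connection above: for integer α, the gamma cdf is P(X ≤ x) = P(W ≥ α), where W ∼ Poisson(λx). The sketch below (Python, stdlib only) computes both answers for α = 2 arrivals at rate λ = 2 per minute within x = 3 minutes:

```python
import math

alpha, lam, x = 2, 2.0, 3.0  # Example 9: 2 arrivals, rate 2/min, 3 minutes

# For integer alpha, P(X <= x) = P(Poisson(lam * x) >= alpha):
# the first alpha arrivals occur by time x iff at least alpha events fall in [0, x].
mu = lam * x
cdf = 1.0 - sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(alpha))
print(round(cdf, 4))  # 0.9826

# Expected waiting time for the first two customers: E(X) = alpha / lam.
print(alpha / lam)  # 1.0 (minutes)
```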
The pdf of a chi-squared rv is thus

f(x; ν) = (1 / (2^{ν/2} Γ(ν/2))) x^{ν/2−1} e^{−x/2}, x ≥ 0.

Figure 6: Chi-square densities (ν = 4, 8, 12, 20).

The χ² curve is positively skewed and becomes flatter and more symmetric as ν increases.

3 Distribution of Functions of R.V. (optional)

It is often the case that we know the probability distribution of a random variable and are interested in determining the distribution of some function of it. Suppose that we know the distribution of X and we want to find the distribution of g(X). Our basic approach for continuous random variables is the following:

1. Find the cdf of g(X). To do so, write the event {g(X) ≤ y} in terms of X being in some set.
2. Differentiate the cdf of g(X) to find the pdf.

Example 10. Suppose that a random variable X has pdf

f_X(x) = 6x(1 − x) for 0 < x < 1, and 0 elsewhere.

Let Y be a random variable that equals 2X + 1. What is the pdf of Y?

Note first that Y maps the range of X-values (0, 1) onto the interval (1, 3). Let 1 < y < 3. Then

F_Y(y) = P(Y ≤ y) = P(2X + 1 ≤ y) = P(X ≤ (y − 1)/2) = F_X((y − 1)/2).

We have

F_X(t) = 0 for t ≤ 0; F_X(t) = ∫_0^t 6x(1 − x) dx = 3t² − 2t³ for 0 < t < 1; F_X(t) = 1 for t ≥ 1.

Therefore,

F_Y(y) = 0 for y ≤ 1; F_Y(y) = 3((y − 1)/2)² − 2((y − 1)/2)³ = −y³/4 + 3y²/2 − 9y/4 + 1 for 1 < y < 3; F_Y(y) = 1 for y ≥ 3.

Differentiating F_Y(y) gives f_Y(y):

f_Y(y) = −3y²/4 + 3y − 9/4 for 1 < y < 3, and 0 otherwise.

4 Other Useful Distributions (optional)

4.1 The Weibull Distribution

The Weibull distribution is widely used in engineering as a model for the lifetime of objects. A random variable X is said to have a Weibull distribution with parameters α, β and ν if its pdf is

f(x) = (β/α) ((x − ν)/α)^{β−1} exp{−((x − ν)/α)^β} for x > ν, and 0 for x ≤ ν.

4.2 The Lognormal Distribution

A nonnegative rv X is said to have a lognormal distribution if the rv Y = log(X) has a normal distribution.
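The cdf-then-differentiate result of Example 10 can be sanity-checked numerically: the derived f_Y should integrate to 1 over (1, 3), and a numerical derivative of F_Y should agree with f_Y. A sketch (Python, stdlib only; step sizes are arbitrary):

```python
def f_Y(y):
    """pdf of Y = 2X + 1 derived in Example 10, valid for 1 < y < 3."""
    return -3 * y**2 / 4 + 3 * y - 9 / 4

def F_Y(y):
    """cdf of Y on (1, 3): F_X((y - 1)/2) with F_X(t) = 3t^2 - 2t^3."""
    t = (y - 1) / 2
    return 3 * t**2 - 2 * t**3

# Midpoint-rule check that the pdf integrates to 1 over (1, 3).
n = 100000
h = 2 / n
integral = h * sum(f_Y(1 + (i + 0.5) * h) for i in range(n))
print(round(integral, 6))  # 1.0

# Central-difference check that dF_Y/dy matches f_Y at an interior point.
y0, eps = 2.0, 1e-6
deriv = (F_Y(y0 + eps) - F_Y(y0 - eps)) / (2 * eps)
print(abs(deriv - f_Y(y0)) < 1e-6)  # True
```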
The resulting pdf of a lognormal rv, when log(X) is normally distributed with parameters µ and σ, is

f(x; µ, σ) = (1 / (√(2π) σx)) e^{−[log(x)−µ]²/(2σ²)}, x > 0.

4.3 The Beta Distribution

The beta distribution is a distribution for a continuous random variable that lies between 0 and 1, and is widely used as a model for proportions. A random variable X is said to have a beta distribution with parameters a and b if its pdf is

f(x) = (1/B(a, b)) x^{a−1} (1 − x)^{b−1} for 0 < x < 1, and 0 otherwise,

where B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx.

4.4 The Cauchy Distribution

A random variable X is said to have a Cauchy distribution with parameter θ if its pdf is given by

f(x) = (1/π) · 1/(1 + (x − θ)²), −∞ < x < ∞.
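The beta normalizing constant B(a, b) connects back to the gamma function of Section 2.2 through the well-known identity B(a, b) = Γ(a)Γ(b)/Γ(a + b) (this identity is standard but not stated in the notes). A sketch (Python, stdlib only; a = 2.5, b = 4.0 are arbitrary illustrative parameters) verifies it numerically:

```python
import math

a, b = 2.5, 4.0  # arbitrary illustrative parameters

# B(a, b) computed two ways: midpoint-rule integration of the defining
# integral, and the identity B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b).
n = 200000
h = 1.0 / n
integral = h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(n))
identity = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
print(abs(integral - identity) < 1e-6)  # True
```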