Lecture 03: Continuous Random Variables
Ba Chu
E-mail: ba [email protected]
Web: http://www.carleton.ca/∼bchu

(Note: these are lecture notes. Please refer to the textbooks suggested in the course outline for details. Examples will be given and explained in class.)

1 Objectives

In reality, we often encounter situations where the random event under consideration has uncountably many possible outcomes. The purpose of this lecture is to introduce the concept of continuous random variables, which are associated with events having uncountably many outcomes. First, I will motivate this concept with the continuous uniform distribution. Second, I will define the probability density function (pdf) and the cumulative distribution function (cdf) of a continuous random variable. Third, I will discuss typical continuous random variables such as the normal random variable, the t random variable, and the χ2 random variable. Fourth, I will explain the concept of the moment generating function (mgf) and its usefulness in deriving moments of random variables. Examples will be explained along the way.

2 Motivation

Consider a random variable X whose set of possible values is the unit interval [0, 1]. Specifically, we ask the following question: what probability distribution formalizes the notion of 'equally likely' outcomes in the unit interval [0, 1]?

We have learnt to formalize the notion of 'equally likely' by assigning the same probability to each individual outcome in the sample space. Thus, if S = {s_1, ..., s_N}, then P(s_i) = 1/N. Letting E = {s_{i_1}, ..., s_{i_k}} denote an event in S, we have

P(E) = \sum_{j=1}^{k} P(\{s_{i_j}\}) = \sum_{j=1}^{k} \frac{1}{N} = \frac{k}{N}.

Now, if E is an event with countably many outcomes taking real values in [0, 1] (say, E = {1/2, 1/3, 1/4, ...}), a difficulty arises. If we assume that every value in [0, 1] is equally likely and carries the same positive probability, i.e., P(X = x) = c > 0 for every x ∈ [0, 1], we obtain

P(E) = \sum_{j=2}^{\infty} P(\{1/j\}) = \sum_{j=2}^{\infty} c = \infty.   (2.1)

Hence, we must assign a probability of zero to every outcome in [0, 1], i.e., P(X = x) = 0 for every x ∈ [0, 1]; thus P(E) = 0. This may seem confusing. However, in probability we always emphasize that if an event is impossible, then it necessarily has zero probability; having zero probability, however, does not necessarily mean that an event is impossible.

As we have seen, the probability of any countable event must be zero. Hence, to obtain positive probabilities, we must consider events whose cardinality is uncountable. Consider the events [0, 0.5] and [0.5, 1]. Because all outcomes are equally likely, these events must have the same probability, i.e., P(X ∈ [0, 0.5]) = P(X ∈ [0.5, 1]). Since P(X ∈ [0, 1]) = 1, we have P(X ∈ [0, 0.5]) = P(X ∈ [0.5, 1]) = 0.5. The intuitive conclusion is that the probability of each subinterval equals the proportion of the entire interval occupied by that subinterval. In mathematical notation: suppose that X(S) = [0, 1] and each x ∈ [0, 1] is equally likely; if 0 ≤ a ≤ b ≤ 1, then P(X ∈ [a, b]) = b − a.

The probability distribution discussed above is called the uniform distribution. If X ∼ Uniform[0, 1], then the cdf of X is given by:
• If y < 0, then F(y) = P(X ≤ y) = 0.
• If y ∈ [0, 1], then F(y) = y.
• If y > 1, then F(y) = 1.
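As a quick worked check of this cdf (an illustrative computation added here, not part of the original examples), the interval-length rule can be recovered directly from F:

P\left(\tfrac{1}{4} < X \le \tfrac{3}{4}\right) = F\left(\tfrac{3}{4}\right) - F\left(\tfrac{1}{4}\right) = \tfrac{3}{4} - \tfrac{1}{4} = \tfrac{1}{2},

which is exactly b − a with a = 1/4 and b = 3/4.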
3 Basic Concepts

Consider a real-valued function f: R → R. Let Area_{[a,b]}(f) denote the area of the region bounded by the graph of f between the end points a and b. By elementary integral calculus, we can determine that

Area_{[a,b]}(f) = \int_a^b f(x)\,dx.

Now we introduce the notion of the pdf of a continuous random variable on the real line.

Definition 1. A pdf is a function f: R → R such that
1. f(x) ≥ 0 for every x ∈ R;
2. Area_{(-\infty,+\infty)}(f) = \int_{-\infty}^{\infty} f(x)\,dx = 1.

If the continuous random variable X has pdf f, then its cdf is given by F(y) = P(X ≤ y) = \int_{-\infty}^{y} f(x)\,dx, and conversely f(y) = \frac{d}{dy} F(y).

Definition 2. The expected value of X is

\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.

If the function g: R → R is such that Y = g(X) is a random variable, then

E(Y) = E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.

The variance of X is

\sigma^2 = Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.

Example 1. What is the probability that a battery-powered watch will stop with its minute hand positioned between 10 and 20 minutes past the hour?
Answer: Let X denote the number of minutes past the hour to which the minute hand points when the watch stops. The pdf that describes the notion of equally likely values in the interval [0, 60) is

f(x) = \begin{cases} 0, & x \in (-\infty, 0), \\ 1/60, & x \in [0, 60), \\ 0, & x \in [60, \infty). \end{cases}

To check that f is a pdf, observe that f(x) ≥ 0 for every x ∈ R and that Area_{[0,60)}(f) = (60 − 0) · (1/60) = 1. Thus, P(X ∈ (10, 20)) = Area_{(10,20)}(f) = (20 − 10) · (1/60) = 1/6.

Example 2. Consider two battery-powered watches. Let X1 denote the number of minutes past the hour at which the first watch stops and let X2 denote the number of minutes past the hour at which the second watch stops. What is the probability that the larger of X1 and X2 will be between 30 and 50?
Answer: Here we have two independent random variables, each distributed Uniform[0, 60), and a third random variable Y = max(X1, X2). Let F denote the cdf of Y. We want to calculate P(30 < Y < 50) = F(50) − F(30). By the independence of X1 and X2, we have, for y ∈ [0, 60),

F(y) = P(Y ≤ y) = P((X1 ≤ y) ∩ (X2 ≤ y)) = P(X1 ≤ y) P(X2 ≤ y) = \frac{y^2}{60^2}.

Hence, P(30 < Y < 50) = (50^2 − 30^2)/60^2 = 1600/3600 = 4/9.

4 Normal Random Variable

This is the most important family of distributions in probability and statistics.

Definition 3. A continuous random variable X is normally distributed with mean µ and variance σ² > 0, denoted X ∼ N(µ, σ²), if the pdf of X is

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right].

Here are some useful properties of normal random variables.
1. f(x) > 0 and X(S) = (−∞, ∞).
2. f is symmetric about µ, i.e., f(µ + x) = f(µ − x).
3. f(x) decreases as |x − µ| increases; in fact, the decrease is very rapid. We express this by saying that f has very light tails.
4. P(µ − σ < X < µ + σ) ≈ 0.683.
5. P(µ − 3σ < X < µ + 3σ) ≈ 0.997.

Definition 4. The standard normal distribution is N(0, 1). Its cdf is often written as Φ(x), and the values of Φ(x) are tabulated in standard statistical tables. If X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1).

Example 3. If X ∼ N(2, 16), so that µ = 2 and σ = 4, then the probability that X assumes a value between 0 and 10 is

P(0 < X < 10) = P\left( \frac{0-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{10-\mu}{\sigma} \right) = P(Z < 2) − P(Z < −0.5) = Φ(2) − Φ(−0.5) ≈ 0.6687.

Theorem 1. If Z1, ..., Zn are independent N(0, 1) random variables and Y = Z1² + · · · + Zn², then Y is chi-squared distributed with n degrees of freedom, denoted χ²(n).

Let Z ∼ N(0, 1) be independent of Y ∼ χ²(n), and let T = Z / \sqrt{Y/n}; then the distribution of T is called the t distribution with n degrees of freedom. We will denote this distribution by t(n).

Finally, let Y1 ∼ χ²(n1) and Y2 ∼ χ²(n2) be independent random variables and consider the random variable

F = \frac{Y_1/n_1}{Y_2/n_2}.

The distribution of F is called the F distribution with n1 and n2 degrees of freedom, respectively. We will denote this distribution by F(n1, n2).
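The probabilities and distributional claims above are easy to check numerically. The following short Python sketch (an addition to the notes, assuming NumPy and SciPy are available) verifies the answer to Example 3 and illustrates Theorem 1 by simulation:

# Numerical check of Example 3 and a simulation illustrating Theorem 1.
# Assumes NumPy and SciPy are installed; not part of the original notes.
import numpy as np
from scipy import stats

# Example 3: X ~ N(2, 16), i.e. mean 2 and standard deviation 4.
p = stats.norm.cdf(10, loc=2, scale=4) - stats.norm.cdf(0, loc=2, scale=4)
print(p)  # approximately 0.6687

# Theorem 1: the sum of n squared independent N(0, 1) draws is chi-squared(n).
rng = np.random.default_rng(0)
n = 3
y = (rng.standard_normal((100_000, n)) ** 2).sum(axis=1)
print(np.mean(y <= 2.0))          # empirical P(Y <= 2)
print(stats.chi2.cdf(2.0, df=n))  # exact chi-squared(3) cdf at 2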
Theorem 2. Let F_n denote the cdf of t(n). Then

\lim_{n \to \infty} F_n(t) = \Phi(t)

for every t ∈ (−∞, +∞).

5 Moment Generating Functions (MGF) and Their Properties

The mgf of a random variable Y is defined as m(t) = E[e^{tY}], for values of t in a neighbourhood of zero at which this expectation exists. In view of the Taylor expansion

e^{tY} = 1 + tY + \frac{(tY)^2}{2!} + \frac{(tY)^3}{3!} + \cdots,

we have

m(t) = 1 + t E(Y) + \frac{t^2 E(Y^2)}{2!} + \frac{t^3 E(Y^3)}{3!} + \cdots,

m'(t) = E(Y) + \frac{2t}{2!} E(Y^2) + \frac{3t^2}{3!} E(Y^3) + \cdots,

and

m''(t) = E(Y^2) + \frac{3 \cdot 2\, t}{3!} E(Y^3) + \cdots.

It is immediate that m'(0) = E(Y) and m''(0) = E(Y²). Hence, if you have the mgf of a random variable, you can find its moments by taking derivatives and evaluating them at zero. Note that the variance of Y can then be derived using the formula Var(Y) = E(Y²) − (E(Y))².

If Y is a discrete random variable taking values 0, 1, 2, ... with probability function p(y) = P(Y = y), then

m(t) = E(e^{tY}) = \sum_{y=0}^{\infty} e^{ty} p(y) = 1 + t \sum_{y=0}^{\infty} y\, p(y) + \frac{t^2}{2!} \sum_{y=0}^{\infty} y^2 p(y) + \cdots.

Example 4. Suppose Y is a Poisson random variable with parameter λ. Then p(y) = e^{-λ} λ^y / y! for y = 0, 1, 2, .... The mgf of Y is given by

m(t) = \sum_{y=0}^{\infty} e^{ty} \frac{e^{-\lambda} \lambda^y}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(e^t \lambda)^y}{y!} = e^{-\lambda} e^{e^t \lambda} = e^{\lambda(e^t - 1)}.

The moments of Y can be found immediately by taking derivatives and evaluating them at zero.

Example 5. Suppose that Y has a geometric distribution with success probability p, so that P(Y = y) = p q^{y−1} for y = 1, 2, ..., where q = 1 − p. The mgf of Y is

m(t) = \frac{p e^t}{1 - q e^t} (for q e^t < 1).

Example 6. Suppose that Z ∼ N(0, 1). Then m(t) = E[e^{tZ}] = e^{t²/2}.

Example 7. Suppose that Y has a binomial distribution with parameters (n, p). Then the mgf of Y is m(t) = (p e^t + q)^n, where q = 1 − p.

The Uniqueness Theorem stated next is very important when using mgfs to find the probability distribution of a function of random variables.

Theorem 3. Suppose that the random variables X and Y have mgfs m_X(t) and m_Y(t), respectively. If m_X(t) = m_Y(t) for all values of t, then X and Y have the same distribution.

Example 8. If Z ∼ N(0, 1) and Y = σZ + µ, show that Y ∼ N(µ, σ²).
Answer: We know that m_Z(t) = e^{t²/2}. Hence,

m_Y(t) = e^{t\mu} m_Z(\sigma t) = e^{t\mu + \sigma^2 t^2 / 2},

which is the mgf of N(µ, σ²). This implies that Y ∼ N(µ, σ²) by the uniqueness theorem.

Example 9. Suppose that Y1 ∼ Poisson(λ1) and Y2 ∼ Poisson(λ2) and that they are independent. Show that W = Y1 + Y2 is Poisson(λ1 + λ2).
Answer: m_W(t) = m_{Y_1}(t) · m_{Y_2}(t) = e^{(\lambda_1 + \lambda_2)(e^t - 1)}. This implies that W is Poisson(λ1 + λ2) by the uniqueness theorem.

Example 10. Let Y1, Y2, ..., Yn denote independent, identically distributed (IID) normal random variables with mean µ and variance σ². Show that \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i is normally distributed with mean µ and variance σ²/n.
Answer: Since

m_{\bar{Y}}(t) = \prod_{i=1}^{n} m_{Y_i}(t/n) = \left( e^{\mu t/n + \sigma^2 t^2 / (2n^2)} \right)^n = e^{\mu t + \sigma^2 t^2 / (2n)},

we conclude that \bar{Y} ∼ N(µ, σ²/n) by the uniqueness theorem.
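The moment-taking recipe of this section can also be carried out symbolically. The short sketch below (an addition to the notes, assuming the SymPy library is available) differentiates the Poisson mgf from Example 4 and recovers E(Y) = λ and Var(Y) = λ:

# Symbolic check of Example 4 using SymPy; not part of the original notes.
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
m = sp.exp(lam * (sp.exp(t) - 1))      # mgf of Poisson(lambda)

first = sp.diff(m, t, 1).subs(t, 0)    # m'(0)  = E(Y)
second = sp.diff(m, t, 2).subs(t, 0)   # m''(0) = E(Y^2)
variance = sp.simplify(second - first ** 2)

print(first)     # lambda
print(variance)  # lambda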
6 Exercises

1. Suppose that Y is a normal random variable with mean µ and variance σ².
(a) Show that the mgf of Y is e^{µt + t²σ²/2}.
(b) Show that E(Y) = µ and Var(Y) = σ².
(c) Derive the third and fourth moments of Y, i.e., E(Y³) and E(Y⁴).

2. Find the first, second, and third moments of the random variable Y for each of the following mgfs:
(a) m(t) = \frac{1}{1 - 2t}.
(b) m(t) = e^{2(e^t - 1)}.

3. Consider the function f: R → R defined by

f(x) = \begin{cases} 0, & x < 0, \\ cx, & 0 < x < 1.5, \\ c(3 - x), & 1.5 < x < 3, \\ 0, & x > 3, \end{cases}

where c is an undetermined constant.
(a) For what value of c is f a probability density function?
(b) Suppose that a continuous random variable X has probability density function f. Compute E(X).
(c) Compute P(X > 2).
(d) Suppose that Y ∼ Uniform(0, 3). Which random variable has the larger variance, X or Y?
(e) Determine and graph the cdf of X.

4. Imagine that you are throwing darts at a circular dart board, B. Let us measure the dart board in units for which the radius of B is 1, so that the area of B is π. Suppose that the darts are thrown in such a way that they are certain to hit a point in B, and that each point in B is equally likely to be hit. Thus, if A ⊂ B, then the probability of hitting a point in A is

P(A) = \frac{area(A)}{area(B)} = \frac{area(A)}{\pi}.

Define the random variable X to be the distance from the center of B to the point that is hit.
(a) What are the possible values of X?
(b) Compute P(X ≤ 0.5).
(c) Compute P(0.5 < X ≤ 0.7).
(d) Determine and graph the cdf of X.
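For Exercise 4, a simulation can be used to sanity-check the answers obtained analytically. The following Monte Carlo sketch (an addition to the notes, assuming NumPy; the sample size is arbitrary) draws points uniformly on the unit disk by rejection sampling and estimates the probabilities in parts (b) and (c):

# Monte Carlo check for Exercise 4; not part of the original exercises.
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(1_000_000, 2))  # points in the square [-1, 1]^2
pts = pts[(pts ** 2).sum(axis=1) <= 1.0]            # keep only points inside the disk B
x = np.sqrt((pts ** 2).sum(axis=1))                 # X = distance from the centre

print(np.mean(x <= 0.5))                 # estimate of P(X <= 0.5)
print(np.mean((x > 0.5) & (x <= 0.7)))   # estimate of P(0.5 < X <= 0.7)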