MSc Maths and Statistics 2008, Department of Economics, UCL
Chapter 6: Random Variables and Distributions
Jidong Zhou

We will now study the main tools used to characterize experiments with uncertainty: random variables and their distributions.

1 Single Random Variables and Distributions

1.1 Basic definitions

• Random variables: a random variable is a function that maps the sample space Ω of an experiment into R. In other words, a random variable X is a function that assigns a real number X(ω) to each possible experimental outcome ω ∈ Ω.
— for example, for an experiment in which a coin is tossed 10 times, the sample space consists of the 2^10 sequences of 10 heads and tails. The number of heads obtained on the 10 tosses can be regarded as a random variable; denote it by X. Clearly, X maps each possible sequence into the set {0, 1, · · · , 10}.

• Distributions: suppose A is a subset of R and we wish to measure the probability that X ∈ A. This is given by
Pr(X ∈ A) = Pr({ω ∈ Ω : X(ω) ∈ A}).
Note that {ω ∈ Ω : X(ω) ∈ A} is an event, so the right-hand side is well defined. The distribution of a random variable X is the collection of all probabilities Pr(X ∈ A) for all subsets A of the real numbers.
— consider the above example again. If A = {1, · · · , 10}, then Pr(X ∈ A) is just the probability that the experiment's outcome is a sequence with at least one head, and so
Pr(X ∈ A) = 1 − (1/2)^10.

• Distribution functions: the distribution function (or df) F of a random variable X is a function defined for each real number x as follows:
F(x) = Pr(X ≤ x) = Pr({ω ∈ Ω : X(ω) ≤ x}).
It measures the probability of the event consisting of those outcomes satisfying X(ω) ≤ x. Sometimes we also call it the cumulative distribution function (or cdf).
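The coin-toss example above can be checked exactly by brute-force enumeration. The following sketch (a supplement, not part of the original notes) builds the sample space Ω, maps each outcome ω to X(ω), and recovers Pr(X ∈ A) and F(x) by counting:

```python
# Exact illustration of the 10-coin-toss example: enumerate all 2^10
# outcomes, map each outcome ω to X(ω) = number of heads, and recover
# probabilities by counting (no randomness involved).
from itertools import product

outcomes = list(product("HT", repeat=10))    # the sample space Ω, |Ω| = 2^10
X = [seq.count("H") for seq in outcomes]     # the random variable X(ω)

# Pr(X ∈ {1, ..., 10}) = probability of at least one head = 1 - (1/2)^10
p_at_least_one = sum(1 for x in X if x >= 1) / len(outcomes)

# the df F(x) = Pr(X ≤ x), evaluated for instance at x = 5
F_5 = sum(1 for x in X if x <= 5) / len(outcomes)
```

Counting agrees with the formula: p_at_least_one equals 1 − (1/2)^10 = 0.9990234375.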
— it is easy to show that F must satisfy the following properties:
∗ if x1 < x2, then F(x1) ≤ F(x2);
∗ lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1;
∗ F is continuous from the right, i.e., F(x) = F(x⁺).
— if the df of a random variable X is known, then we can derive the probability of X belonging to any interval:
∗ Pr(X > x) = 1 − F(x);
∗ Pr(x1 < X ≤ x2) = F(x2) − F(x1);
∗ Pr(X < x) = F(x⁻);
∗ Pr(X = x) = F(x) − F(x⁻).

Now we discuss two classes of random variables.

1.2 Discrete random variables

A random variable X is discrete if there are at most countably many possible values for X. For example, the above random variable counting the number of heads is a discrete one.

• For a discrete random variable taking values among {x_i}, its distribution function can be calculated as
F(x) = Σ_{i: x_i ≤ x} Pr(X = x_i).
Clearly, this df is a step function and hence discontinuous.

• A discrete random variable can also be characterized by its probability function (or pf), defined as f(x) = Pr(X = x) for x ∈ R. If x is not one of the possible values of X, clearly f(x) = 0. The pf f(x) measures the likelihood of each particular outcome x.
— the relationship between the df and pf for a discrete random variable is
f(x) = F(x) − F(x⁻)  or  F(x) = Σ_{i: x_i ≤ x} f(x_i).

• An important discrete random variable: the binomial distribution with parameters n and p is represented by the pf
f(x) = C_n^x p^x (1 − p)^{n−x} if x = 0, 1, · · · , n, and f(x) = 0 otherwise.
For example, consider n products produced by a firm. Suppose the probability of each product being defective is p and these n products are independently produced. Then f(x) measures the probability that x of them are defective. From this pf, it is easy to construct the df.
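The binomial pf and its df can be computed directly from the formula above. A minimal stdlib sketch (not part of the original notes), using `math.comb` for C_n^x:

```python
# The binomial pf f(x) = C(n, x) p^x (1-p)^(n-x) and its (step-function) df.
from math import comb, floor

def binom_pf(x, n, p):
    # probability of exactly x "successes" (e.g. defective products) out of n
    return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

def binom_df(x, n, p):
    # F(x) = Pr(X <= x): sum the pf over all possible values up to floor(x)
    return sum(binom_pf(k, n, p) for k in range(0, floor(x) + 1))

# sanity checks: the pf sums to 1, and the mean is np (here 10 * 0.3 = 3)
total = sum(binom_pf(k, 10, 0.3) for k in range(11))
mean = sum(k * binom_pf(k, 10, 0.3) for k in range(11))
```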
1.3 Continuous random variables

A random variable X is continuous if it can take any value on some (bounded or unbounded) interval. For example, a person's weight next year, the temperature tomorrow, and the house price next month can be regarded as continuous random variables.

• Probability density functions: given the distribution function F of a continuous random variable, we define its probability density function (or pdf) as a nonnegative function f satisfying
F(x) = ∫_{−∞}^x f(t) dt
for all x ∈ R.
— if F is differentiable, then f(x) = F′(x).
— given the probability density function f, we can calculate
Pr(a < X ≤ b) = ∫_a^b f(x) dx.
— for a continuous random variable with a continuous distribution function, Pr(X = x) = 0 for any x ∈ R.

• An example: the uniform distribution on [a, b] has
f(x) = 1/(b − a) if x ∈ [a, b], and f(x) = 0 otherwise,
and
F(x) = 0 if x < a;  F(x) = (x − a)/(b − a) if x ∈ [a, b];  F(x) = 1 if x > b.

1.4 Functions of a random variable

Given the distribution of X, we want to know the distribution of Y = h(X), where h(·) is a function.

• X is a discrete random variable: if g(y) is the probability function of Y, then
g(y) = Pr(Y = y) = Pr[h(X) = y] = Σ_{x: h(x)=y} f(x).

• X is a continuous random variable: if G(y) is the distribution function of Y, then
G(y) = Pr(Y ≤ y) = Pr(h(X) ≤ y) = ∫_{x: h(x)≤y} f(x) dx.
If G(y) is a differentiable function, then the pdf of Y is g(y) = G′(y).
— example: X is uniformly distributed on [−1, 1] and Y = X². Then for 0 ≤ y ≤ 1,
G(y) = Pr(X² ≤ y) = ∫_{−√y}^{√y} f(x) dx = √y.
For y > 1, G(y) = 1; and for y < 0, G(y) = 0. The pdf of Y on (0, 1] is
g(y) = 1/(2√y).
— if h is a strictly monotonic function, then the pdf of Y can be directly calculated as
g(y) = f[h⁻¹(y)] |dh⁻¹(y)/dy| = f[h⁻¹(y)] / |h′[h⁻¹(y)]|,
where h⁻¹ is the inverse function of h.
The second equality follows from the derivative rule for the inverse function. We prove this result when h is strictly increasing (the proof is similar if h is strictly decreasing):
G(y) = Pr[X ≤ h⁻¹(y)] = ∫_{−∞}^{h⁻¹(y)} f(x) dx.
So
g(y) = G′(y) = f[h⁻¹(y)] dh⁻¹(y)/dy.
The second equality uses Leibniz's rule, which we introduced in Chapter 2.

Exercise 1 (i) Suppose that the pdf of a random variable X is
f(x) = x/2 if 0 < x < 2, and f(x) = 0 otherwise.
Determine the df and pdf of the new random variable Y = X(2 − X).
(ii) Suppose the pdf of a random variable X is
f(x) = e⁻ˣ if x > 0, and f(x) = 0 otherwise.
Determine the pdf of Y = √X.
(iii) Suppose X has a continuous distribution function F, and let Y = F(X). Show that Y has a uniform distribution on [0, 1]. (This transformation from X to Y is called the probability integral transformation.)

1.5 Moments

The distribution of a random variable contains all of the probabilistic information about it. However, it is usually cumbersome to present the entire distribution. Instead, some summaries of the distribution can be useful for giving people a rough idea of what the distribution looks like. The most commonly used summaries are the moments of the random variable.

• Expectation
— for a discrete random variable X with pf f having positive values on {x_i}, its expectation is
E(X) = Σ_i x_i f(x_i).
When X has infinitely many values, this series may not converge.¹ We say E(X) exists if and only if
Σ_i |x_i| f(x_i) < ∞.
This condition guarantees that Σ_i x_i f(x_i) converges.
¹ For example, if f(n) = 1/(kn²) for n = 1, 2, · · · , where k = Σ_{n=1}^∞ 1/n² (which converges, as we have confirmed in Chapter 1), then Σ_{n=1}^∞ n f(n) does not converge.
— for a continuous random variable X with pdf f, its expectation is
E(X) = ∫_{−∞}^∞ x f(x) dx.
Similarly, this integral may not be well defined.² We say E(X) exists if and only if
∫_{−∞}^∞ |x| f(x) dx < ∞.
² For example, for the Cauchy distribution, which has pdf f(x) = 1/[π(1 + x²)], one can verify that ∫_{−∞}^∞ x f(x) dx does not exist.
— the expectation of X is also called the expected value of X or the mean of X. It can be regarded as the center of gravity of the distribution of X, but not necessarily the central position of the distribution.
— examples: (i) the expectation of the uniform distribution on [a, b] is
∫_a^b x/(b − a) dx = (a + b)/2.
(ii) the expectation of the binomial distribution is
Σ_{x=0}^n x C_n^x p^x (1 − p)^{n−x} = np.
— some properties of the expectation (we assume all expectations exist):
∗ for scalars a and b, E(a + bX) = a + bE(X);
∗ for two random variables X1 and X2, we have E(X1 + X2) = E(X1) + E(X2);
∗ if h(X) = h1(X) + h2(X) is a function of X, then E[h(X)] = E[h1(X)] + E[h2(X)];
∗ E[h(X)] = ∫_{−∞}^∞ h(x) f(x) dx, but E[h(X)] is in general not equal to h[E(X)], except when h is a linear function.

• Variance
— the variance of a distribution is given by
Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]².
It is also often denoted by σ². (The variance of a distribution may also fail to exist.) The square root of the variance is called the standard deviation and is often denoted by σ = √Var(X).
— it measures the spread or dispersion of the distribution around its mean.
— examples: (i) the variance of the uniform distribution on [a, b] is
∫_a^b x²/(b − a) dx − [(a + b)/2]² = (a² + ab + b²)/3 − [(a + b)/2]² = (b − a)²/12.
(ii) the variance of the binomial distribution is np(1 − p).
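The uniform-distribution examples above, E(X) = (a + b)/2 and Var(X) = (b − a)²/12, can be checked numerically. A stdlib sketch (not part of the original notes), approximating the defining integrals with a midpoint rule:

```python
# Midpoint-rule check of E(X) and Var(X) for the uniform distribution on [a, b].
def moments_uniform(a, b, n=200000):
    h = (b - a) / n
    m1 = m2 = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h       # midpoint of the i-th cell
        w = h / (b - a)             # f(x) dx for the uniform pdf f = 1/(b-a)
        m1 += x * w                 # accumulates ∫ x f(x) dx
        m2 += x * x * w             # accumulates ∫ x^2 f(x) dx
    return m1, m2 - m1 * m1         # mean, and variance = E(X^2) - [E(X)]^2

mean, var = moments_uniform(2.0, 5.0)   # analytic answers: 3.5 and 9/12 = 0.75
```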
— some properties of the variance:
∗ Var(c) = 0 where c is a constant;
∗ Var(aX + b) = a²Var(X) where a and b are scalars.

• Two useful inequalities:
— Markov inequality: suppose X is a random variable with Pr(X ≥ 0) = 1. Then for any real number t > 0,
Pr(X ≥ t) ≤ E(X)/t.
∗ this result can help approximate the probability distribution of a random variable when only its mean is known.
— Chebyshev inequality: suppose X is a random variable and Var(X) exists. Then for any real number t > 0,
Pr(|X − E(X)| ≥ t) ≤ Var(X)/t².
∗ this follows from the Markov inequality by realizing that
|X − E(X)| ≥ t ⟺ [X − E(X)]² ≥ t².
∗ this result says that realizations of a random variable become less likely the farther they are from the mean.
∗ see more applications of these two results in Section 4.8 of DeGroot and Schervish (2002).

• Higher order moments
— the kth moment of X is E(X^k).
∗ the mean of X is just the first moment;
∗ again, the kth moment may fail to exist for some distributions. We say the kth moment exists if and only if E(|X^k|) < ∞;
∗ if E(|X^k|) < ∞ for some positive integer k, then E(|X^j|) < ∞ for any positive integer j < k.
— the kth central moment of X is E[(X − E(X))^k].
∗ the variance of X is just the second central moment.

• Moment generating functions
The moment generating function (or mgf) of a random variable X is
ψ(t) = E(e^{tX}).
If ψ(t) exists for all values of t in an open interval around t = 0, then we have
ψ⁽ⁿ⁾(0) = E(Xⁿ).
That is, the nth derivative of the mgf of X evaluated at t = 0 is just the nth moment of X. Thus, the mean is ψ′(0) and the variance is ψ″(0) − [ψ′(0)]². In many cases, using the mgf to compute moments is more convenient than using the definition directly.
— example: the pdf of X is
f(x) = e⁻ˣ if x > 0, and f(x) = 0 otherwise.
Compute the mean and variance of X.
ψ(t) = ∫_0^∞ e^{tx} e^{−x} dx = ∫_0^∞ e^{(t−1)x} dx = 1/(1 − t)
for t < 1. So ψ(t) exists for t in an open interval around t = 0. Since
ψ′(t) = 1/(1 − t)²  and  ψ″(t) = 2/(1 − t)³,
it is easy to show that E(X) = 1 and Var(X) = 1.
— an important result: if the mgfs of two random variables X and Y are identical for all values of t in an open interval around t = 0, then the probability distributions of X and Y must be identical.

• Quantile and median
The p-quantile of a distribution is the value x that divides the distribution into two parts, one with probability p and the other with probability 1 − p. More precisely, if a random variable's distribution function is F, then its p-quantile is the smallest x such that F(x) ≥ p. In particular, the 0.5-quantile is called the median. That is, the median of a distribution divides it into two parts with equal probability.
— examples: (a) if Pr(X = 1) = 0.1, Pr(X = 2) = 0.4, Pr(X = 3) = 0.3, and Pr(X = 4) = 0.2, then the median is 2, and the 0.8-quantile is 3.
(b) if a continuous random variable has the pdf
f(x) = 1/2 for 0 ≤ x ≤ 1;  f(x) = 1 for 2.5 ≤ x ≤ 3;  f(x) = 0 otherwise,
then the median is 1, and the 0.4-quantile is 0.8.
— in some cases, the median can reflect the "average" value of a random variable X better than the mean. For example, if Pr(X = 10) = 0.99 and Pr(X = 10000) = 0.01, then the mean of X is 109.9, which is much higher than 10, but its median is 10, which is closer to the value X takes most of the time.
— the median minimizes the mean absolute error E(|X − d|), while the mean minimizes the mean squared error E[(X − d)²].
— given the pdf f(x), the value of x at which f(x) is maximized is called the mode of the distribution.

Exercise 2 (i) Let X be a random variable that can take only the values 0, 1, 2, · · · . Show that
E(X) = Σ_{n=0}^∞ n Pr(X = n) = Σ_{n=1}^∞ Pr(X ≥ n).
(ii) Prove that the variance of the binomial distribution is np(1 − p).
(iii) Let X have the discrete uniform distribution on the integers 1, · · · , n. Compute the variance of X. (You may wish to use the formula Σ_{k=1}^n k² = n(n + 1)(2n + 1)/6.)

2 Bivariate Distributions

In many cases, we need more than one random variable to describe an experiment. This part considers the bivariate case. Let (X, Y) be a pair of random variables. We first study their joint distribution.

2.1 Joint distributions

• The discrete case: if both X and Y are discrete random variables, the joint probability function is
f(x, y) = Pr(X = x, Y = y).
If (x, y) is not a possible value of (X, Y), then f(x, y) = 0. The pf is always nonnegative and satisfies
Σ_{x,y} f(x, y) = 1.
The joint distribution function is now
F(x, y) = Σ_{x_i ≤ x, y_j ≤ y} f(x_i, y_j).

• The continuous case: if X and Y are continuous random variables, the joint distribution function is
F(x, y) = Pr(X ≤ x, Y ≤ y)
for any (x, y) ∈ R². It is nondecreasing in each argument and satisfies
lim_{x→−∞, y→−∞} F(x, y) = 0  and  lim_{x→∞, y→∞} F(x, y) = 1.
The joint probability density function is a nonnegative function f defined on R² such that
F(a, b) = ∫_{−∞}^b ∫_{−∞}^a f(x, y) dx dy
for any (a, b) ∈ R². If F(x, y) is twice differentiable, then the pdf is
f(x, y) = ∂²F(x, y)/∂x∂y.
And we can calculate
Pr(a < X ≤ b, c < Y ≤ d) = ∫_c^d ∫_a^b f(x, y) dx dy.

• Example: suppose the joint pdf of X and Y is
f(x, y) = cx²y for x² ≤ y ≤ 1, and f(x, y) = 0 otherwise.
Determine the value of c and then calculate Pr(X ≥ Y).
First of all, f must satisfy
∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dx dy = 1,
which implies
c ∫_{−1}^1 ∫_{x²}^1 x²y dy dx = (c/2) ∫_{−1}^1 x²(1 − x⁴) dx = 1.
It is easy to solve c = 21/4. The probability is
Pr(X ≥ Y) = (21/4) ∫_0^1 ∫_{x²}^x x²y dy dx = 3/20.

• The expectation of a function of two random variables:
E[h(X, Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x, y) f(x, y) dx dy.

Exercise 3 Suppose that the joint pdf of X and Y is
f(x, y) = c(x² + y) for 0 ≤ y ≤ 1 − x², and f(x, y) = 0 otherwise.
Determine the value of c and then calculate Pr(Y ≤ X + 1).

2.2 Marginal distributions

Given the joint distribution function of X and Y, we want to know the distribution function of each random variable. This is called the marginal distribution. In general, given F(x, y), the marginal distribution function of X is F1(x) = Pr(X ≤ x, Y ≤ ∞), and that of Y is F2(y) = Pr(X ≤ ∞, Y ≤ y).

• The discrete case: the marginal probability function of X is
f1(x) = Σ_y f(x, y),
and that of Y is
f2(y) = Σ_x f(x, y).
Then the marginal distribution function of X is
F1(x) = Σ_{x_i ≤ x} f1(x_i),
and that of Y is
F2(y) = Σ_{y_j ≤ y} f2(y_j).

• The continuous case: the marginal distribution function of X is
F1(x) = Pr(X ≤ x, Y ≤ ∞) = ∫_{−∞}^∞ ∫_{−∞}^x f(x̃, y) dx̃ dy,
and the marginal probability density function of X is
f1(x) = ∫_{−∞}^∞ f(x, y) dy.
Similarly,
F2(y) = Pr(X ≤ ∞, Y ≤ y) = ∫_{−∞}^∞ ∫_{−∞}^y f(x, ỹ) dỹ dx
and
f2(y) = ∫_{−∞}^∞ f(x, y) dx.

• Example: suppose the joint pdf of X and Y is
f(x, y) = 1 for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and f(x, y) = 0 otherwise.
Then the marginal pdf of X is
f1(x) = ∫_0^1 f(x, y) dy = 1 for 0 ≤ x ≤ 1,
and the marginal df of X is F1(x) = x on [0, 1].

• Although the marginal distributions of X and Y can be derived from their joint distribution, it is usually impossible to reconstruct their joint distribution from their marginal distributions without additional information. (See the exceptional case below where the two random variables are independent.)

• The moments of each random variable can be calculated by using its marginal distribution.
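The joint-pdf example with f(x, y) = (21/4)x²y on x² ≤ y ≤ 1 can be checked numerically, including its marginal: integrating out y gives f1(x) = (21/8)x²(1 − x⁴). A midpoint-rule sketch (a supplement, not part of the original notes):

```python
# Midpoint-rule check of the example f(x, y) = (21/4) x^2 y on x^2 <= y <= 1:
# total mass 1, Pr(X >= Y) = 3/20, and the marginal f1(x) = (21/8) x^2 (1 - x^4).
def f(x, y):
    return 21.0 / 4.0 * x * x * y if x * x <= y <= 1.0 else 0.0

n = 800
hx, hy = 2.0 / n, 1.0 / n            # grid over [-1, 1] x [0, 1]
total = p_x_ge_y = 0.0
for i in range(n):
    x = -1.0 + (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        w = f(x, y) * hx * hy        # f(x, y) dx dy at the cell midpoint
        total += w
        if x >= y:
            p_x_ge_y += w

# marginal pdf of X at x0 = 0.5: integrate f(x0, y) over y
x0 = 0.5
f1_at_x0 = sum(f(x0, (j + 0.5) * hy) * hy for j in range(n))
```

The grid approximation is crude at the boundary y = x², so the checks below use a loose tolerance.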
Since this is very straightforward, we will not present the details.

2.3 Conditional distributions

We have encountered the concept of conditional probability before. We now apply it to distribution functions. Suppose we know the joint distribution of a pair of random variables (X, Y). In general, we can derive the revised probability of Y ∈ B conditional on having learned that X ∈ A as follows:
Pr(Y ∈ B|X ∈ A) = Pr(Y ∈ B, X ∈ A)/Pr(X ∈ A)
if Pr(X ∈ A) > 0. Both the numerator and the denominator can be computed from the joint distribution of X and Y. From now on, we focus on conditional distribution functions (i.e., B has the form Y ≤ y and A is a singleton set).

• The discrete case: given the joint probability function f(x, y), the probability function of Y conditional on X = x is
f2(y|x) ≡ Pr(Y = y|X = x) = Pr(Y = y, X = x)/Pr(X = x) = f(x, y)/f1(x).
It measures the revised probability of Y = y conditional on X = x. Then the distribution function of Y conditional on X = x is
F2(y|x) = Σ_{y_j ≤ y} f(x, y_j)/f1(x).
The conditional distribution function of X can be similarly derived.

• The continuous case: since Pr(X = x) = 0 for a continuous random variable, we derive the conditional distribution of Y in the following way:
Pr(Y ≤ y|x < X ≤ x + ∆) = [F(x + ∆, y) − F(x, y)]/[F1(x + ∆) − F1(x)].
Then we divide both the numerator and the denominator by ∆ and let ∆ tend to zero. This limit operation yields the conditional distribution function of Y:
F2(y|x) ≡ Pr(Y ≤ y|X = x) = [∂F(x, y)/∂x]/[dF1(x)/dx] = [∂F(x, y)/∂x]/f1(x).
Then the conditional probability density function of Y is
f2(y|x) = ∂F2(y|x)/∂y = f(x, y)/f1(x)
whenever F2(y|x) is differentiable with respect to y.

• In either case, we have
f(x, y) = f2(y|x)f1(x) = f1(x|y)f2(y).
That is, if we know the marginal pdf and the conditional pdf, then we can reconstruct the joint pdf. Furthermore, we also have
f1(x|y) = f2(y|x)f1(x)/f2(y) = f2(y|x)f1(x) / ∫ f2(y|x)f1(x) dx.
(In the discrete case, the integral in the denominator is replaced by a sum.) This is Bayes' theorem for random variables.

Exercise 4 Suppose the joint pdf of X and Y is
f(x, y) = (3/16)(4 − 2x − y) for x > 0, y > 0 and 2x + y < 4, and f(x, y) = 0 otherwise.
Determine the conditional pdf of Y for every given value of X, and compute Pr(Y ≥ 2|X = 0.5).

2.4 Conditional moments

Our exposition is for continuous random variables, but all results also hold for discrete ones. Consider X and Y with joint pdf f(x, y).

• Conditional expectation:
— the conditional expectation of Y given X = x is
E(Y|x) = ∫_{−∞}^∞ y f2(y|x) dy,
where f2(y|x) is the conditional pdf of Y. When x changes, this conditional expectation will also change.
— the conditional expectation of Y given X, denoted by E(Y|X), is a function of X and hence a random variable. If h(x) ≡ E(Y|x), then E(Y|X) = h(X), and its distribution can be derived from X's marginal distribution according to this functional relationship.
— then, if all related expectations exist, we have
E[E(Y|X)] = ∫_{−∞}^∞ E(Y|x) f1(x) dx
= ∫_{−∞}^∞ ∫_{−∞}^∞ y f2(y|x) f1(x) dx dy
= ∫_{−∞}^∞ ∫_{−∞}^∞ y f(x, y) dx dy
= E(Y),
where the second step uses the definition of E(Y|x). This result is called the law of iterated expectations.
— similarly, E[E(r(X, Y)|X)] = E(r(X, Y)) for any function r.

• Conditional variance:
— the conditional variance of Y given X = x is
Var(Y|x) = E{[Y − E(Y|x)]²|x} = E(Y²|x) − [E(Y|x)]².

Exercise 5 (i) Prove (a) if E(Y|X) = 0 then E(Y) = 0; and (b) if E(Y|X) = 0 then E(XY) = 0.
(ii) Suppose the distribution of X is symmetric with respect to the point x = 0 and all moments of X exist.
Suppose E(Y|X) = aX + b for constants a and b. Show that X^{2m} and Y are uncorrelated for m = 1, 2, · · · .
(iii) Show that Var(Y) = E[Var(Y|X)] + Var[E(Y|X)].

2.5 Independent random variables

• Two random variables X and Y are independent iff, for any two subsets A and B of R, we have
Pr(X ∈ A and Y ∈ B) = Pr(X ∈ A) Pr(Y ∈ B).

• Equivalently, two random variables X and Y are independent iff
F(x, y) = F1(x)F2(y)  or  f(x, y) = f1(x)f2(y)  or  f1(x|y) = f1(x) whenever f2(y) > 0.
— the last statement says that knowing the realized value of Y does not change our probability judgment of X (and vice versa).
— the last statement also indicates that if X and Y are independent random variables, then the set of all (x, y) pairs where f(x, y) > 0 must be rectangular.

• Example: suppose the joint pdf of X and Y is
f(x, y) = 2e^{−(x+2y)} for x ≥ 0 and y ≥ 0, and f(x, y) = 0 otherwise.
Are X and Y independent of each other?
It is easy to calculate that f1(x) = e⁻ˣ for x ≥ 0 and f1(x) = 0 for x < 0; and f2(y) = 2e^{−2y} for y ≥ 0 and f2(y) = 0 for y < 0. Thus f(x, y) = f1(x)f2(y), and so X and Y are indeed independent.³
³ In fact, two continuous random variables are independent iff f(x, y) = g1(x)g2(y) for all x and y, where the g_i are nonnegative functions. That is, the joint pdf can be factorized into the product of a nonnegative function of x and a nonnegative function of y.

• Properties: if X and Y are two independent random variables, then
— E(XY) = E(X)E(Y);
— Var(aX + bY) = a²Var(X) + b²Var(Y);
— E(X|Y) = E(X) and E(Y|X) = E(Y), where
E(X|Y = y) = ∫_{−∞}^{+∞} x f1(x|y) dx,
E(Y|X = x) = ∫_{−∞}^{+∞} y f2(y|x) dy.
— h(X) and g(Y) are also independent for any two functions h and g, and so
E[h(X)g(Y)] = E[h(X)]E[g(Y)];
— if ψ_x and ψ_y are the mgfs of X and Y, respectively, then the mgf of Z = X + Y is ψ_z = ψ_x ψ_y.

Exercise 6 (i) Suppose the joint pdf of X and Y is
f(x, y) = kx²y² for x² + y² ≤ 1, and f(x, y) = 0 otherwise.
Show that X and Y are not independent.
(ii) Suppose X1 and X2 are two independent random variables and their mgfs are ψ1(t) and ψ2(t), respectively. Let Y = X1 + X2 and let its mgf be ψ(t). Show that, if all mgfs exist, then ψ(t) = ψ1(t)ψ2(t).
(iii) Let (X, Y, Z) be independent random variables such that:
E(X) = −1 and Var(X) = 2,
E(Y) = 0 and Var(Y) = 3,
E(Z) = 1 and Var(Z) = 4.
Let
T = 2X + Y − 3Z + 4,
U = (X + Z)(Y + Z).
Find E(T), Var(T), E(T²) and E(U).

2.6 Covariance and correlations

These two concepts are used to measure how much two random variables X and Y depend on each other. Let E(X) and E(Y) be the expectations of X and Y, respectively. (Notice that they are calculated using X's and Y's marginal distributions.)

• The covariance of X and Y:
Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y).
— if Var(X) < ∞ and Var(Y) < ∞, then Cov(X, Y) will exist and be finite.
— the sign of the covariance indicates the direction of covariation of X and Y. But its magnitude is also influenced by the overall magnitudes of X and Y.

• The correlation of X and Y:
ρ(X, Y) = Cov(X, Y)/√(Var(X)Var(Y))
whenever both variances are nonzero.
— ρ is between −1 and 1.⁴
— X and Y are said to be positively correlated if ρ(X, Y) > 0; they are negatively correlated if ρ < 0; and they are uncorrelated if ρ = 0.

• Properties:
— if X and Y are independent, then Cov(X, Y) = 0 and ρ(X, Y) = 0.
But the converse of this statement is not true.⁵
— if Y = aX + b for some constants a and b, then ρ(X, Y) = 1 if a > 0 and ρ(X, Y) = −1 if a < 0. The converse is also true.
— the correlation only measures the linear relationship between X and Y. A large |ρ| means that X and Y are close to being linearly related and hence are closely related. But when |ρ| is small, X and Y could still be closely related through some nonlinear relationship. (See the example in footnote 5.)
— if both Var(X) and Var(Y) are finite, then
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y).   (1)
Furthermore,
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y),
and
Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) + Σ_{i≠j} Cov(X_i, X_j).
⁴ This result is based on the Schwarz inequality [E(XY)]² ≤ E(X²)E(Y²) for any two random variables. If the right-hand side is finite, then the equality holds iff there are constants a and b such that aX + bY = 0 with probability 1.
⁵ That is, even if two random variables are uncorrelated, they can be dependent. For example, X is a discrete random variable with Pr(X = 1) = Pr(X = 0) = Pr(X = −1) = 1/3, and Y = X². They are clearly dependent, but one can check that Cov(X, Y) = 0.

Exercise 7 (i) Suppose that the pair (X, Y) is uniformly distributed on the interior of the circle of radius 1. Compute Cov(X, Y).
(ii) Suppose X has the uniform distribution on the interval [−2, 2] and Y = X⁶. Show that X and Y are uncorrelated.
(iii) Prove the result (1), and show that Cov(aX + bY + c, Z) = aCov(X, Z) + bCov(Y, Z), where a, b, c are constants and all covariances exist.
(iv) Suppose that X and Y have the same variance, and the variances of X + Y and X − Y also exist. Show that X + Y and X − Y are uncorrelated.

2.7 Multivariate distributions

All of the above concepts and results can be readily extended to the case with more than two random variables.
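Before moving on, the uncorrelated-but-dependent example in footnote 5 can be verified with exact rational arithmetic. This is a supplementary sketch, not part of the original notes:

```python
# Exact check of footnote 5: X takes values -1, 0, 1 with probability 1/3 each
# and Y = X^2; X and Y are dependent, yet Cov(X, Y) = 0.
from fractions import Fraction

support = [-1, 0, 1]
p = Fraction(1, 3)                          # Pr(X = x) for each x in the support

EX = sum(p * x for x in support)            # E(X)  = 0
EY = sum(p * x * x for x in support)        # E(Y)  = E(X^2) = 2/3
EXY = sum(p * x**3 for x in support)        # E(XY) = E(X^3) = 0
cov = EXY - EX * EY                         # Cov(X, Y) = 0, exactly

# dependence: Pr(Y = 1 | X = 1) = 1, while unconditionally Pr(Y = 1) = 2/3
```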
• Let X = [X1 X2 · · · Xn]ᵀ be an n × 1 column vector of random variables.

• The joint distribution function is
F(x) = Pr(X ≤ x) = Pr(X1 ≤ x1, X2 ≤ x2, · · · , Xn ≤ xn).

• The joint pdf in the continuous case is
f(x) = ∂ⁿF(x)/(∂x1 ∂x2 · · · ∂xn).

• The marginal pdf of the first k variables is
f_{1,··· ,k}(x1, · · · , xk) = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ f(x1, · · · , xn) dx_{k+1} · · · dxn,
where the n − k integrals are over the last n − k variables.

• Without loss of generality, the joint pdf of the last n − k random variables conditional on the first k < n random variables' realized values is
f(x1, · · · , xn)/f_{1,··· ,k}(x1, · · · , xk).

• The n random variables are independent iff
F(x1, · · · , xn) = F1(x1) · · · Fn(xn)  or  f(x1, · · · , xn) = f1(x1) · · · fn(xn).

• Some of the most important moments are the following:
— expectation:
E(X) = [E(X1) · · · E(Xn)]ᵀ,
where each expectation inside the vector is computed using the corresponding marginal distribution. For example,
E(X1) = ∫_{−∞}^{+∞} x1 f1(x1) dx1.
— covariance matrix:
Σ ≡ Var(X) = E[(X − E(X))(X − E(X))ᵀ] = (σ_{ij})_{n×n} = E(XXᵀ) − E(X)E(X)ᵀ.
— for a constant column vector a,
Var(aᵀX) = E[(aᵀX − aᵀE(X))²] = E[{aᵀ(X − E(X))}²] = E[aᵀ(X − E(X))(X − E(X))ᵀa] = aᵀΣa.
This is a quadratic form. Since a variance is always nonnegative by definition, aᵀΣa ≥ 0 for any a. That is, the covariance matrix Σ is positive semidefinite.
— for a constant matrix A, we have Var(AX) = AΣAᵀ.

2.8 Functions of multiple random variables

We focus on the case with continuous random variables.

• Suppose the joint pdf of n random variables X1, · · · , Xn is f(x1, · · · , xn), and a new random variable is constructed as Y = h(X1, · · · , Xn). What is the pdf of Y?
We can compute the df of Y first:
G(y) = Pr(Y ≤ y) = ∫· · ·∫_{A(y)} f(x1, · · · , xn) dx1 · · · dxn,
where A(y) = {(x1, · · · , xn) ∈ Rⁿ : h(x1, · · · , xn) ≤ y}. If G(y) is differentiable, we can derive g(y) = G′(y).

• Example: suppose n independent random variables X1, · · · , Xn share the same distribution F, which is differentiable and has density function f. Let Ymax = max{X1, · · · , Xn} and Ymin = min{X1, · · · , Xn}. Determine the pdfs of Ymax and Ymin.
Gmax(y) = Pr(Ymax ≤ y) = Pr(X1 ≤ y, · · · , Xn ≤ y) = F(y)ⁿ,
and so
gmax(y) = nF(y)ⁿ⁻¹f(y).
Gmin(y) = Pr(Ymin ≤ y) = 1 − Pr(Ymin > y) = 1 − [1 − F(y)]ⁿ,
and so
gmin(y) = n[1 − F(y)]ⁿ⁻¹f(y).

Exercise 8 (i) Revisit the above example on Ymax and Ymin. Derive the joint pdf of (Ymax, Ymin).
(ii) Suppose X1 and X2 are two independent random variables, each distributed uniformly over [0, 1]. Find the pdf of Y = X1 + X2.

• Now consider the case with n new random variables:
Y1 = h1(X1, · · · , Xn),
...
Yn = hn(X1, · · · , Xn).   (2)
We want to derive the joint pdf of Y1, · · · , Yn. To do that, we need assumptions about the functions h_i. If S is the subset of Rⁿ such that Pr((X1, · · · , Xn) ∈ S) = 1 and T is the subset of Rⁿ such that Pr((Y1, · · · , Yn) ∈ T) = 1, we assume that the transformation from S to T defined by the h_i is a one-to-one correspondence. That is, given a point (y1, · · · , yn) in T, there is a unique preimage (x1, · · · , xn) in S. With this assumption, we can solve (2) for
X1 = s1(Y1, · · · , Yn),
...
Xn = sn(Y1, · · · , Yn).   (3)
Construct the determinant
J = det(∂s_i/∂y_j)_{i,j=1,··· ,n},
i.e., the determinant of the n × n matrix whose (i, j) entry is ∂s_i/∂y_j, for every point (y1, · · · , yn) ∈ T.
We call J the Jacobian of the transformation in (3). Then the joint pdf of the n new random variables is
g(y1, · · · , yn) = f(s1, · · · , sn)|J| for (y1, · · · , yn) ∈ T, and g(y1, · · · , yn) = 0 otherwise,
where |J| is the absolute value of the Jacobian.⁶
⁶ In particular, if Y = AX, where X and Y are vectors of random variables and A is an n × n nonsingular matrix, then g(y) = f(A⁻¹y)/|det A|.

• Example: suppose the joint pdf of X1 and X2 is
f(x1, x2) = 4x1x2 for 0 < x1, x2 < 1, and f(x1, x2) = 0 otherwise.
Let Y1 = X1/X2 and Y2 = X1X2. Find the joint pdf of Y1 and Y2.
It is easy to see that y1 > 0 and y2 ∈ (0, 1). For each pair of such y1 and y2, we can derive
x1 = √(y1y2),  x2 = √(y2/y1).
Then the Jacobian is
J = det [ (1/2)√(y2/y1)   (1/2)√(y1/y2) ; −√(y2/y1)/(2y1)   1/(2√(y1y2)) ] = 1/(2y1).
Therefore,
g(y1, y2) = 4√(y1y2) · √(y2/y1) · 1/(2y1) = 2y2/y1 for (y1, y2) ∈ T, and g(y1, y2) = 0 otherwise.
(Here the image set T requires not only y1 > 0 and 0 < y2 < 1, but also y2 < y1 and y1y2 < 1, since both x1 = √(y1y2) and x2 = √(y2/y1) must lie in (0, 1).)

• A technique for the sum (or the difference) of two random variables: suppose we want to know the pdf of Y = X1 + X2 or Y = X1 − X2. Sometimes it is quite hard to calculate G(y) = Pr(X1 + X2 ≤ y) or Pr(X1 − X2 ≤ y) directly. In that case, we can introduce another new random variable Z = X2, first derive the joint pdf of (Y, Z), and then find the marginal distribution of Y.

Exercise 9 Suppose that X1 and X2 are independent and share the same distribution
f(x) = e⁻ˣ for x > 0, and f(x) = 0 otherwise.
Find the pdf of Y = X1 − X2.
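The change-of-variables example above can be sanity-checked numerically. The sketch below (a supplement, not part of the original notes) simulates (X1, X2) with joint pdf 4x1x2 — each Xi independently has df F(x) = x² on (0, 1), so inverse-cdf sampling gives Xi = √U — and compares a simulated probability for (Y1, Y2) against the integral of the derived pdf g(y1, y2) = 2y2/y1 over the same region:

```python
# Monte Carlo vs. quadrature check of the Jacobian example:
# f(x1, x2) = 4 x1 x2 on (0,1)^2, Y1 = X1/X2, Y2 = X1*X2, g(y1, y2) = 2 y2/y1 on T.
import math
import random

rng = random.Random(0)                      # fixed seed for reproducibility
N = 200000
hits = 0
for _ in range(N):
    # inverse-cdf sampling: each Xi has df F(x) = x^2 on (0, 1), so Xi = sqrt(U)
    x1, x2 = math.sqrt(rng.random()), math.sqrt(rng.random())
    y1, y2 = x1 / x2, x1 * x2
    if 0.5 <= y1 <= 2.0 and 0.2 <= y2 <= 0.8:
        hits += 1
mc = hits / N                               # simulated Pr((Y1, Y2) in the box)

# midpoint-rule integral of g over the same box, restricted to the support T
# (equivalently: both reconstructed x1, x2 must lie in (0, 1))
n = 500
h1, h2 = 1.5 / n, 0.6 / n
integral = 0.0
for i in range(n):
    y1 = 0.5 + (i + 0.5) * h1
    for j in range(n):
        y2 = 0.2 + (j + 0.5) * h2
        x1, x2 = math.sqrt(y1 * y2), math.sqrt(y2 / y1)
        if x1 < 1.0 and x2 < 1.0:
            integral += 2.0 * y2 / y1 * h1 * h2
```

The two estimates should agree up to Monte Carlo and grid error.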