4.2 Joint probability mass functions

When the random experiment we are interested in involves more than one random variable, it is usually better to analyse all the variables together rather than separately, because they may be interrelated. In order to do this, we have to deal with joint distributions of two or more random variables, as well as conditional distributions and the relationships between them. When we analyse a single random variable we speak of the "univariate case"; when we simultaneously analyse two random variables we speak of the "bivariate case"; and, in general, when two or more variables are in play we speak of the "multivariate case".

Bivariate case

Definition 4.19. Let $X, Y$ be discrete random variables defined on the same sample space. The joint probability mass function (joint PMF) of $X$ and $Y$ is the map $p_{X,Y} : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$p_{X,Y}(x, y) = P(X = x, Y = y), \qquad \forall x, y \in \mathbb{R}. \tag{4.6}$$

The right-hand side of (4.6) employs the notation $P(X = x, Y = y) \equiv P(\{X = x\} \cap \{Y = y\})$, which will henceforth be used for the probability of an intersection of two or more events.

Joint probability mass functions satisfy the same three properties that hold in the univariate case.

Proposition 4.20. Let $X, Y$ be discrete random variables defined on the same sample space. Then their joint probability mass function satisfies the following properties:
(i) $p_{X,Y} \ge 0$;
(ii) $\{(x, y) \in \mathbb{R}^2 : p_{X,Y}(x, y) \ne 0\}$ is countable;
(iii) $\sum_{(x,y) \in \mathbb{R}^2} p_{X,Y}(x, y) = 1$.

Proof. (i)-(ii) The first two properties are trivially satisfied, by the definition in (4.6) and by noticing that the Cartesian product of two countable sets is countable.
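To make the definition concrete, the following Python sketch (not from the text; the probability values are made up for illustration) stores a joint PMF as a dictionary mapping pairs $(x, y)$ to probabilities, and checks properties (i)-(iii) of Proposition 4.20.

```python
# A joint PMF for two discrete r.v.s X, Y, stored as {(x, y): probability}.
# The numbers below are illustrative only.
p_xy = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# Property (i): non-negativity at every point.
assert all(p >= 0 for p in p_xy.values())

# Property (ii): the support is countable (here, finite).
support = [point for point, p in p_xy.items() if p > 0]
print(len(support))  # 4

# Property (iii): the total probability is one (up to floating point).
total = sum(p_xy.values())
assert abs(total - 1.0) < 1e-12
```

Points outside the dictionary implicitly have zero probability, which is exactly how a PMF with countable support behaves off its support.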
(iii) By definition we have
$$\sum_{(x,y) \in \mathbb{R}^2} p_{X,Y}(x, y) = \sum_{\substack{(x,y) \in \mathbb{R}^2 \\ p_{X,Y}(x,y) > 0}} P(X = x, Y = y),$$
but the events $\{X = x\} \cap \{Y = y\}$, over all distinct points $(x, y)$ of $\mathbb{R}^2$ such that $p_{X,Y}(x, y) > 0$, form a partition of the sample space, hence the respective probabilities sum up to one.

In the multivariate case, the probability mass functions of the single random variables are generally referred to as marginal probability mass functions. The name comes from the fact that, when the joint probabilities of two random variables are displayed in a table, the joint probabilities occupy the central rows and columns, while the marginal probabilities are usually arranged in one additional row and one additional column appended to the bottom and to the right-hand side of the table respectively, whose elements are obtained by summing up the figures in the column above and in the row to the left respectively. Table 4.1 shows the layout for two random variables.

          x_1                 x_2                 ...   x_n                 |  p_Y
y_1       p_{X,Y}(x_1, y_1)   p_{X,Y}(x_2, y_1)   ...   p_{X,Y}(x_n, y_1)   |  p_Y(y_1)
...       ...                 ...                 ...   ...                 |  ...
y_m       p_{X,Y}(x_1, y_m)   p_{X,Y}(x_2, y_m)   ...   p_{X,Y}(x_n, y_m)   |  p_Y(y_m)
p_X       p_X(x_1)            p_X(x_2)            ...   p_X(x_n)            |  1

Table 4.1: Table of joint and marginal PMFs for two discrete r.v.s $X, Y$ taking values in $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$ respectively.

Given the joint probability mass function, one can easily obtain the marginals by summing up the joint probabilities while fixing one argument.

Proposition 4.21. Let $X, Y$ be discrete random variables defined on the same sample space. Then
$$p_X(x) = \sum_{y \in \mathbb{R}} p_{X,Y}(x, y), \qquad \forall x \in \mathbb{R}, \tag{4.7}$$
and analogously
$$p_Y(y) = \sum_{x \in \mathbb{R}} p_{X,Y}(x, y), \qquad \forall y \in \mathbb{R}.$$

Unfortunately, the converse is not true: the joint PMF determines the marginal PMFs, but the marginal PMFs are not enough to determine the joint PMF. The reason is that the marginal PMFs provide no information about the relationships between the random variables.
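Proposition 4.21 can be checked numerically: summing the joint PMF over one argument while fixing the other recovers each marginal. A minimal Python sketch (the joint probabilities are illustrative, not from the text; dyadic values are chosen so the sums are exact in floating point):

```python
from collections import defaultdict

# Illustrative joint PMF as {(x, y): probability}.
p_xy = {
    (0, 0): 0.125, (0, 1): 0.125,
    (1, 0): 0.25,  (1, 1): 0.5,
}

# Equation (4.7) and its analogue: marginals by summing out one argument.
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p  # fix x, sum over y
    p_y[y] += p  # fix y, sum over x

print(dict(p_x))  # {0: 0.25, 1: 0.75}
print(dict(p_y))  # {0: 0.375, 1: 0.625}
```

Note that many different joint PMFs produce these same two marginals, which is exactly the non-uniqueness pointed out above.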
Analogously to the univariate case, we can define the joint cumulative distribution function of two or more random variables, and compute it in terms of the joint PMF.

Definition 4.22. Let $X, Y$ be discrete random variables defined on the same sample space. The joint cumulative distribution function of $X$ and $Y$ is the map $F_{X,Y} : \mathbb{R}^2 \to [0, 1]$ defined by, for all $a, b \in \mathbb{R}$,
$$F_{X,Y}(a, b) = P(X \le a, Y \le b) = \sum_{x \le a,\, y \le b} p_{X,Y}(x, y). \tag{4.8}$$

Given the joint PMF of two random variables, it is possible to compute the probability of any event that depends on the two variables.

Proposition 4.23. Let $X, Y$ be discrete random variables defined on the same sample space. Then, for any $A \subseteq \mathbb{R}^2$, we have
$$P((X, Y) \in A) = \sum_{(x,y) \in A} p_{X,Y}(x, y). \tag{4.9}$$

Note that any event determined by $X$ and $Y$ can be written in the form $\{(X, Y) \in A\}$ for some $A \subseteq \mathbb{R}^2$. For instance:
$$\{X = Y\} = \{(X, Y) \in A\}, \qquad \text{where } A = \{(x, x) : x \in \mathbb{R}\},$$
$$\{X > Y\} = \{(X, Y) \in A\}, \qquad \text{where } A = \{(x, y) \in \mathbb{R}^2 : x > y\}.$$

Multivariate case

The same definitions and properties stated for the bivariate case extend to the multivariate case.

Definition 4.24. Let $X_1, X_2, \ldots, X_n$ be $n$ discrete random variables defined on the same sample space. The joint probability mass function of $X_1, \ldots, X_n$ is the map $p_{X_1,\ldots,X_n} : \mathbb{R}^n \to \mathbb{R}$ defined by
$$p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n), \qquad \forall x_1, \ldots, x_n \in \mathbb{R}.$$

Then Proposition 4.20 and Proposition 4.23 extend straightforwardly to the multivariate case. Regarding the marginals, we not only have the univariate ones for the single variables, but also the bi- and multivariate ones for any sub-collection of variables. In general, for $n$ random variables we have $\binom{n}{m}$ $m$-variate marginal PMFs, for $1 \le m \le n - 1$.

Proposition 4.25. Let $X_1, X_2, \ldots, X_n$ be $n$ discrete random variables defined on the same sample space, $1 \le m \le n$, and $1 \le k_1 < k_2 < \ldots < k_m \le n$. The joint probability mass function $p_{X_{k_1},\ldots,X_{k_m}}$ of $X_{k_1}, \ldots, X_{k_m}$ is given by
$$p_{X_{k_1},\ldots,X_{k_m}}(x_{k_1}, \ldots, x_{k_m}) = \sum_{\substack{(y_1,\ldots,y_n) \in \mathbb{R}^n: \\ y_{k_i} = x_{k_i},\ 1 \le i \le m}} p_{X_1,\ldots,X_n}(y_1, \ldots, y_n).$$

4.3 Conditional probability mass functions

It remains to discuss how the information on the value taken by one random variable influences the probabilities of the possible values of the other random variables; this is the analogue of conditional probabilities.

Definition 4.26. Let $X, Y$ be discrete random variables defined on the same sample space, and $x \in \mathbb{R}$ such that $p_X(x) > 0$. The conditional probability mass function of $Y$ given $X = x$ is the map $p_{Y|X}(\cdot|x) : \mathbb{R} \to \mathbb{R}$ defined by
$$p_{Y|X}(y|x) = P(Y = y \,|\, X = x), \qquad \forall y \in \mathbb{R}.$$

As we know from Definition 3.2, for any $y \in \mathbb{R}$, the conditional probability of $\{Y = y\}$ given $\{X = x\}$ is computed as
$$P(Y = y \,|\, X = x) = \frac{P(Y = y, X = x)}{P(X = x)},$$
thus we get
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}. \tag{4.10}$$

The analogue of Proposition 3.12 for PMFs holds as follows.

Proposition 4.27. Let $X, Y$ be discrete random variables defined on the same sample space, and $x \in \mathbb{R}$ such that $p_X(x) > 0$. Then the conditional probability mass function of $Y$ given $X = x$ is a probability mass function, that is, it satisfies properties (i)-(iii) in Proposition 4.4.

Proof. Properties (i)-(ii) are trivially satisfied by the definition and Proposition 4.20. To verify (iii), it is enough to observe that
$$\sum_{y \in \mathbb{R}} p_{Y|X}(y|x) = \sum_{y \in \mathbb{R}} P(Y = y \,|\, X = x),$$
where $P(\cdot \,|\, X = x)$ is a probability measure by Proposition 3.12, and the events $\{Y = y\}$, for all $y \in \mathbb{R}$ such that $P(Y = y \,|\, X = x) > 0$, form a partition of the sample space.

Remark 4.28. Proposition 4.27 implies that all properties of PMFs also hold for conditional PMFs. For instance:
$$P(Y \in A \,|\, X = x) = \sum_{y \in A} p_{Y|X}(y|x), \qquad \forall A \subseteq \mathbb{R},\ \forall x \in \mathbb{R} \text{ with } p_X(x) > 0.$$

4.4 Independence of random variables

As we discussed independence of events, one can ask whether the value taken by one random variable affects the probability distribution of the other random variables.
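Both Proposition 4.23 and formula (4.10) lend themselves to direct computation. A small Python sketch (the joint PMF values are illustrative, not from the text): it evaluates $P(X > Y)$ by summing the joint PMF over the set $A = \{(x,y) : x > y\}$, and computes a conditional PMF by dividing the joint by the marginal.

```python
# Illustrative joint PMF as {(x, y): probability}.
p_xy = {
    (0, 0): 0.125, (0, 1): 0.375,
    (1, 0): 0.25,  (1, 1): 0.25,
}

# Proposition 4.23: P((X, Y) in A) for the event {X > Y}.
p_x_gt_y = sum(p for (x, y), p in p_xy.items() if x > y)
print(p_x_gt_y)  # 0.25, the mass of the single point (1, 0)

# Equation (4.10): conditional PMF of Y given X = x,
# defined only where the marginal p_X(x) is positive.
def p_y_given_x(y, x):
    p_x = sum(p for (xx, _), p in p_xy.items() if xx == x)  # marginal p_X(x)
    if p_x == 0:
        raise ValueError("conditioning on a zero-probability value of X")
    return p_xy.get((x, y), 0.0) / p_x

print(p_y_given_x(1, 0))  # 0.375 / 0.5 = 0.75
```

The guard on $p_X(x) = 0$ mirrors the hypothesis $p_X(x) > 0$ in Definition 4.26.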
Independence of random variables is defined by means of the concept of independence of events.

Definition 4.29. Two random variables $X, Y$ defined on the same sample space are said to be independent if, for all $A, B \subseteq \mathbb{R}$, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent, that is, if
$$P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B), \qquad \forall A, B \subseteq \mathbb{R}.$$

Definition 4.29 holds in general for any two random variables on the same sample space. In the particular case of discrete random variables, we can equivalently express independence in terms of the PMFs.

Proposition 4.30. Let $X, Y$ be discrete random variables defined on the same sample space. Then $X, Y$ are independent if and only if
$$p_{X,Y}(x, y) = p_X(x)\, p_Y(y), \qquad \forall x, y \in \mathbb{R}. \tag{4.11}$$

Proof. ($\Rightarrow$) If $X, Y$ are independent, then (4.11) holds by taking $A = \{x\}$ and $B = \{y\}$ in Definition 4.29.

($\Leftarrow$) If (4.11) holds, then for any subsets $A, B \subseteq \mathbb{R}$ we have
$$P(X \in A, Y \in B) = P((X, Y) \in A \times B) = \sum_{(x,y) \in A \times B} p_{X,Y}(x, y) = \sum_{(x,y) \in A \times B} p_X(x)\, p_Y(y) = \left( \sum_{x \in A} p_X(x) \right) \left( \sum_{y \in B} p_Y(y) \right) = P(X \in A)\, P(Y \in B).$$

An equivalent condition for the independence of random variables is that the conditional PMFs are in fact just the marginal PMFs. This is in accordance with Definition 3.6 of independent events.

Proposition 4.31. Let $X, Y$ be discrete random variables defined on the same sample space. Then $X, Y$ are independent if and only if either of the following holds:
(a) for all $x \in \mathbb{R}$ with $p_X(x) > 0$, the conditional PMF of $Y$ given $\{X = x\}$ coincides with the marginal PMF of $Y$, that is, $p_{Y|X}(y|x) = p_Y(y)$ for all $y \in \mathbb{R}$;
(b) for all $y \in \mathbb{R}$ with $p_Y(y) > 0$, the conditional PMF of $X$ given $\{Y = y\}$ coincides with the marginal PMF of $X$, that is, $p_{X|Y}(x|y) = p_X(x)$ for all $x \in \mathbb{R}$.

Proof. We prove both implications only for (a), since (b) is exactly analogous.

($\Rightarrow$) If $X, Y$ are independent, then by the definition of conditional PMF in (4.10) and Proposition 4.30, for any $x \in \mathbb{R}$ such that $p_X(x) > 0$ we get
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{p_X(x)\, p_Y(y)}{p_X(x)} = p_Y(y).$$
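The factorisation criterion (4.11) is easy to test on a countable support: independence holds exactly when the joint equals the product of the marginals at every point. A Python sketch with made-up values, contrasting an independent pair with a dependent one:

```python
import itertools

# Marginal PMFs (illustrative values).
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.25, 1: 0.75}

# A joint PMF built as the product of the marginals, so X, Y are independent.
p_xy = {(x, y): p_x[x] * p_y[y] for x, y in itertools.product(p_x, p_y)}

# Proposition 4.30: check p_{X,Y}(x, y) = p_X(x) p_Y(y) on the support.
independent = all(
    p_xy.get((x, y), 0.0) == p_x[x] * p_y[y]
    for x, y in itertools.product(p_x, p_y)
)
print(independent)  # True

# A dependent example: X = Y with probability 1, yet both marginals
# are uniform on {0, 1}. The joint is not the product of the marginals.
q_xy = {(0, 0): 0.5, (1, 1): 0.5}
q_x = {0: 0.5, 1: 0.5}
q_y = {0: 0.5, 1: 0.5}
independent_q = all(
    q_xy.get((x, y), 0.0) == q_x[x] * q_y[y]
    for x, y in itertools.product(q_x, q_y)
)
print(independent_q)  # False
```

The second example also illustrates the earlier remark that marginals alone cannot determine the joint: both joints here have uniform marginals on $\{0, 1\}$.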
($\Leftarrow$) If property (a) holds, then, taking $x \in \mathbb{R}$ such that $p_X(x) > 0$, we obtain
$$p_{X,Y}(x, y) = p_{Y|X}(y|x)\, p_X(x) = p_Y(y)\, p_X(x).$$
For $x \in \mathbb{R}$ such that $p_X(x) = 0$, both the left-hand and the right-hand sides are equal to $0$, so (4.11) holds and $X, Y$ are independent.

Sometimes one of the random variables of interest is in fact a function of another random variable. Knowing the PMF of the latter, we obtain the PMF of the former, as shown in the following proposition.

Proposition 4.32. Let $X$ be a discrete random variable and $f : \mathbb{R} \to \mathbb{R}$ be a real-valued function¹. Then $Y = f(X)$ is a discrete random variable on the same sample space, with PMF given by
$$p_Y(y) = \sum_{x \in f^{-1}(\{y\})} p_X(x), \qquad \forall y \in \mathbb{R}. \tag{4.12}$$

Proof. The fact that $Y = f(X)$ is a random variable is trivial, since it is the composition of two functions of which the first is a random variable: $Y : \Omega \xrightarrow{X} \mathbb{R} \xrightarrow{f} \mathbb{R}$. Then, for any $y \in \mathbb{R}$, the PMF of $Y$ at $y$ is
$$p_Y(y) = P(Y = y) = P(f(X) = y) = P(X \in f^{-1}(\{y\})) = \sum_{x \in f^{-1}(\{y\})} p_X(x),$$
where $f^{-1}(\{y\})$ is the inverse image of $\{y\}$ through $f$, that is, $f^{-1}(\{y\}) = \{x \in \mathbb{R} : f(x) = y\}$.

¹ It is enough that $f$ be defined on the range of $X$, that is, $f : X(\Omega) \to \mathbb{R}$, where $\Omega$ is the sample space on which $X$ is defined.

We then see that independence is preserved under composition with functions.

Proposition 4.33. Let $X, Y$ be discrete random variables defined on the same sample space and $f, g : \mathbb{R} \to \mathbb{R}$ be two functions. Then, if $X, Y$ are independent, $f(X), g(Y)$ are also independent.

Proof. For any $A, B \subseteq \mathbb{R}$, by independence of $X, Y$ (cf. Definition 4.29),
$$P(f(X) \in A, g(Y) \in B) = P(X \in f^{-1}(A), Y \in g^{-1}(B)) = P(X \in f^{-1}(A))\, P(Y \in g^{-1}(B)) = P(f(X) \in A)\, P(g(Y) \in B).$$

As done with events, the concept of independence of random variables can also be extended to the multivariate case. The following definition holds for all random variables (not just discrete ones).

Definition 4.34. Let $X_1, \ldots, X_n$ be random variables defined on the same sample space. They are said to be independent if
$$P(X_1 \in A_1, \ldots, X_n \in A_n) = P(X_1 \in A_1) \cdots P(X_n \in A_n), \qquad \forall A_1, \ldots, A_n \subseteq \mathbb{R}. \tag{4.13}$$
Let $X_1, X_2, \ldots$ be an infinite collection of random variables defined on the same sample space. They are said to be independent if the random variables of any finite sub-collection are independent.

Unlike with events, for a finite collection of random variables it is not necessary to require that all sub-collections satisfy the condition of multiplication of probabilities.

Remark 4.35. Let $X_1, \ldots, X_n$ be independent random variables. Then the variables of any sub-collection of them are also independent, that is: for all $m < n$ and $1 \le k_1 < \ldots < k_m \le n$, the variables $X_{k_1}, \ldots, X_{k_m}$ are independent. One can prove this by verifying condition (4.13) for $X_{k_1}, \ldots, X_{k_m}$, inserting the remaining random variables into the probability to be computed as taking values in $\mathbb{R}$, which is their whole co-domain. For instance, if $X_1, \ldots, X_5$ are independent, then
$$P(X_1 \in A_1, X_2 \in A_2, X_4 \in A_4) = P(X_1 \in A_1, X_2 \in A_2, X_3 \in \mathbb{R}, X_4 \in A_4, X_5 \in \mathbb{R}) = P(X_1 \in A_1)\, P(X_2 \in A_2)\, P(X_3 \in \mathbb{R})\, P(X_4 \in A_4)\, P(X_5 \in \mathbb{R}) = P(X_1 \in A_1)\, P(X_2 \in A_2)\, P(X_4 \in A_4),$$
since $\{X_i \in \mathbb{R}\} = \Omega$ for all $i$.

This leads to the following equivalent characterisation of independent random variables.

Proposition 4.36. Let $X_1, \ldots, X_n$ be random variables defined on the same sample space. Then they are independent if and only if the events $\{X_1 \in A_1\}, \ldots, \{X_n \in A_n\}$ are independent for any collection $A_1, \ldots, A_n \subseteq \mathbb{R}$ of subsets of the real line.

4.5 Expectation of discrete random variables

We now introduce the concept of expected value, which is fundamental in probability theory. The intuitive interpretation of the expected value sees it as the long-term average in repeated experiments.
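Formula (4.12) amounts to accumulating the mass of each $x$ onto the value $y = f(x)$, since the inverse images $f^{-1}(\{y\})$ partition the support of $X$. A Python sketch with made-up values, taking $f(x) = x^2$ so that several $x$ share the same image:

```python
from collections import defaultdict

# PMF of X on {-2, -1, 0, 1, 2} (illustrative values).
p_x = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}

# Proposition 4.32: PMF of Y = f(X), summing p_X over each
# inverse image f^{-1}({y}).
def pmf_of_function(p_x, f):
    p_y = defaultdict(float)
    for x, p in p_x.items():
        p_y[f(x)] += p  # x contributes its mass to the value y = f(x)
    return dict(p_y)

p_y = pmf_of_function(p_x, lambda x: x ** 2)
print(p_y)  # {4: 0.2, 1: 0.4, 0: 0.4}
```

For instance, $p_Y(4) = p_X(-2) + p_X(2)$ because $f^{-1}(\{4\}) = \{-2, 2\}$.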
Namely, repeating the random experiment a large number of times, and noting the value of the random variable at each repetition, the arithmetic average of all these values approximates the expected value of the random variable:
$$\frac{x_1 + \ldots + x_n}{n} \approx E[X], \qquad \text{for large } n,$$
where $x_i$ is the value taken by the random variable $X$ at the $i$-th repetition of the random experiment, and $E[X]$ denotes the expectation of $X$. We now give its formal definition in the discrete framework.

Definition 4.37. Let $X$ be a discrete random variable. The expected value, or expectation, or mean, of $X$ is defined by
$$E[X] = \sum_{x \in \mathbb{R}} x\, p_X(x) = \sum_{\substack{x \in \mathbb{R} \\ p_X(x) > 0}} x\, p_X(x). \tag{4.14}$$

In other words, $E[X]$ is a weighted average of the possible values taken by $X$, where the weights are the respective values of the PMF of $X$. Note that if the range of $X$ is finite, i.e. $X$ can take only a finite number of possible values, then the sum in (4.14) is a finite sum, which gives a real number. If the range of $X$ is instead countably infinite, then the sum in (4.14) may or may not converge. We say that $X$ has finite expectation if
$$\sum_{x \in \mathbb{R}} |x|\, p_X(x) < \infty, \tag{4.15}$$
since (4.15) implies that the sum in (4.14) converges absolutely, hence $E[X]$ is a well-defined real number.
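Both the definition (4.14) and the long-run-average interpretation can be illustrated in a few lines of Python (the PMF values are made up; the simulation uses the standard library's `random.choices` to draw from the PMF):

```python
import random

# PMF of a discrete r.v. X with finite range (illustrative values).
p_x = {1: 0.2, 2: 0.3, 3: 0.5}

# Definition 4.37: E[X] is the PMF-weighted average of the values.
expectation = sum(x * p for x, p in p_x.items())
print(expectation)  # 1*0.2 + 2*0.3 + 3*0.5 = 2.3 (up to floating point)

# Long-run average: simulate many repetitions of the experiment and
# compare the sample mean with E[X].
random.seed(0)  # fixed seed for reproducibility
values, weights = zip(*p_x.items())
sample = random.choices(values, weights=weights, k=100_000)
sample_mean = sum(sample) / len(sample)
print(abs(sample_mean - expectation) < 0.05)  # True with high probability
```

Since the range here is finite, (4.14) is a finite sum and convergence is not an issue; the absolute-convergence condition (4.15) only matters for a countably infinite range.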