4.2 Joint probability mass functions
When the random experiment we are interested in involves more than one random variable, it is usually better to analyse all the variables together rather than separately, because they may be interrelated. In order to do this, we have to deal with joint distributions of two or more random variables, as well as with conditional distributions and the relationships between them.
When we analyse a single random variable we talk about the “univariate
case”, while when simultaneously analysing two random variables we talk
about the “bivariate case”, and in general, when the variables in play are
two or more we talk about the “multivariate case”.
Bivariate case
Definition 4.19. Let $X$, $Y$ be discrete random variables defined on the same sample space. The joint probability mass function (joint PMF) of $X$ and $Y$ is the map $p_{X,Y}\colon \mathbb{R}^2 \to \mathbb{R}$ defined by
$$p_{X,Y}(x,y) = P(X = x,\, Y = y), \qquad \forall x, y \in \mathbb{R}. \tag{4.6}$$
The right-hand side of (4.6) employs the notation
$$P(X = x,\, Y = y) \equiv P\bigl(\{X = x\} \cap \{Y = y\}\bigr),$$
which will henceforth be used for the probability of an intersection of two or more events.
Joint probability mass functions satisfy the same three properties that
hold in the univariate case.
Proposition 4.20. Let $X$, $Y$ be discrete random variables defined on the same sample space. Then their joint probability mass function satisfies the following properties:

(i) $p_{X,Y} \geq 0$;

(ii) $\{(x,y) \in \mathbb{R}^2 : p_{X,Y}(x,y) \neq 0\}$ is countable;

(iii) $\displaystyle\sum_{(x,y)\in\mathbb{R}^2} p_{X,Y}(x,y) = 1$.
Proof. (i)-(ii) The first two properties are trivially satisfied, by the definition in (4.6) and by noticing that the Cartesian product of two countable sets is countable.

(iii) By definition we have
$$\sum_{(x,y)\in\mathbb{R}^2} p_{X,Y}(x,y) = \sum_{\substack{(x,y)\in\mathbb{R}^2\\ p_{X,Y}(x,y)>0}} P(X = x,\, Y = y),$$
but the events $\{X = x\} \cap \{Y = y\}$, for all different points $(x,y)$ of $\mathbb{R}^2$ such that $p_{X,Y}(x,y) > 0$, form a partition of the sample space, hence the respective probabilities sum up to one.
In the multivariate case, the probability mass functions of the single random variables are generally referred to as marginal probability mass functions. The name comes from the fact that, when the joint probabilities of two random variables are displayed in a table, the joint probabilities occupy the central rows and columns, while the marginal probabilities are usually arranged in an additional row and an additional column, appended to the bottom and to the right-hand side of the table respectively, whose entries are obtained by summing the figures in the column above and in the row to the left respectively. Table 4.1 shows this arrangement for two random variables.
Given the joint probability mass function, one can easily obtain the marginals by summing the joint probabilities over one argument while keeping the other fixed.
Proposition 4.21. Let $X$, $Y$ be discrete random variables defined on the same sample space. Then
$$p_X(x) = \sum_{y\in\mathbb{R}} p_{X,Y}(x,y), \qquad \forall x \in \mathbb{R}, \tag{4.7}$$
and analogously
$$p_Y(y) = \sum_{x\in\mathbb{R}} p_{X,Y}(x,y), \qquad \forall y \in \mathbb{R}.$$

|          | $x_1$              | $x_2$              | $\cdots$ | $x_n$              | $p_Y$      |
|----------|--------------------|--------------------|----------|--------------------|------------|
| $y_1$    | $p_{X,Y}(x_1,y_1)$ | $p_{X,Y}(x_2,y_1)$ | $\cdots$ | $p_{X,Y}(x_n,y_1)$ | $p_Y(y_1)$ |
| $\vdots$ | $\vdots$           | $\vdots$           |          | $\vdots$           | $\vdots$   |
| $y_m$    | $p_{X,Y}(x_1,y_m)$ | $p_{X,Y}(x_2,y_m)$ | $\cdots$ | $p_{X,Y}(x_n,y_m)$ | $p_Y(y_m)$ |
| $p_X$    | $p_X(x_1)$         | $p_X(x_2)$         | $\cdots$ | $p_X(x_n)$         | $1$        |

Table 4.1: Table of joint and marginal PMFs for two discrete r.v.s $X$, $Y$ taking values in $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_m\}$ respectively.
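To make the marginalisation in (4.7) concrete, here is a minimal Python sketch (not from the text): the joint PMF is stored as a dictionary with hypothetical values, and each marginal is obtained by summing over the other argument.

```python
# Sketch: marginal PMFs obtained from a joint PMF by summing over one argument.
# The joint PMF below is a hypothetical example, stored as {(x, y): probability}.
from collections import defaultdict

p_XY = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
        (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}

p_X = defaultdict(float)
p_Y = defaultdict(float)
for (x, y), p in p_XY.items():
    p_X[x] += p   # p_X(x) = sum over y of p_XY(x, y), as in (4.7)
    p_Y[y] += p   # p_Y(y) = sum over x of p_XY(x, y)

print(dict(p_X))  # approximately {0: 0.40, 1: 0.60}
print(dict(p_Y))  # approximately {0: 0.25, 1: 0.45, 2: 0.30}
```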
Unfortunately, the converse is not true, that is: the joint PMF determines
the marginal PMFs, but the marginal PMFs are not enough to determine the
joint PMF. The reason is that the marginal PMFs provide no information
about the relationships between the random variables.
Analogously to the univariate case, we can define the joint cumulative distribution function of two or more random variables, and compute it in terms of the joint PMF.
Definition 4.22. Let $X$, $Y$ be discrete random variables defined on the same sample space. The joint cumulative distribution function of $X$ and $Y$ is the map $F_{X,Y}\colon \mathbb{R}^2 \to [0,1]$ defined, for all $a, b \in \mathbb{R}$, by
$$F_{X,Y}(a,b) = P(X \leq a,\, Y \leq b) = \sum_{x \leq a,\, y \leq b} p_{X,Y}(x,y). \tag{4.8}$$
Given the joint PMF of two random variables, it is possible to compute
the probability of any event that depends on the two variables.
Proposition 4.23. Let $X$, $Y$ be discrete random variables defined on the same sample space. Then, for any $A \subseteq \mathbb{R}^2$, we have
$$P\bigl((X,Y) \in A\bigr) = \sum_{(x,y)\in A} p_{X,Y}(x,y). \tag{4.9}$$
Note that any event determined by $X$ and $Y$ can be written in the form $\{(X,Y) \in A\}$ for some $A \subseteq \mathbb{R}^2$. For instance:
$$\{X = Y\} = \{(X,Y) \in A\}, \qquad \text{where } A = \{(x,x) : x \in \mathbb{R}\},$$
$$\{X > Y\} = \{(X,Y) \in A\}, \qquad \text{where } A = \{(x,y) \in \mathbb{R}^2 : x > y\}.$$
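As an illustration of (4.9), the sketch below (again with a hypothetical joint PMF) computes $P(X = Y)$ and $P(X > Y)$ by summing the joint PMF over the corresponding sets $A$.

```python
# Sketch: probability of an event {(X, Y) in A} via (4.9).
# Hypothetical joint PMF on {0, 1} x {0, 1, 2}.
p_XY = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
        (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}

p_equal   = sum(p for (x, y), p in p_XY.items() if x == y)  # P(X = Y)
p_greater = sum(p for (x, y), p in p_XY.items() if x > y)   # P(X > Y)

print(p_equal)    # 0.10 + 0.25 = 0.35
print(p_greater)  # only the point (1, 0) contributes: 0.15
```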
Multivariate case
The same definitions and properties stated for the bivariate case are extended to the multivariate case.
Definition 4.24. Let $X_1, X_2, \ldots, X_n$ be $n$ discrete random variables defined on the same sample space. The joint probability mass function of $X_1, \ldots, X_n$ is the map $p_{X_1,\ldots,X_n}\colon \mathbb{R}^n \to \mathbb{R}$ defined by
$$p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n), \qquad \forall x_1, \ldots, x_n \in \mathbb{R}.$$
Then, Proposition 4.20 and Proposition 4.23 are straightforwardly extended to the multivariate case.
Regarding the marginals, we not only have the univariate ones for the single variables, but also the bi- and multivariate ones for any sub-collection of variables. In general, we have $\binom{n}{m}$ $m$-variate marginal PMFs for $n$ random variables, for $1 \leq m \leq n-1$.
Proposition 4.25. Let $X_1, X_2, \ldots, X_n$ be $n$ discrete random variables defined on the same sample space, and let $1 \leq m \leq n$ and $1 \leq k_1 < k_2 < \ldots < k_m \leq n$. The joint probability mass function $p_{X_{k_1},\ldots,X_{k_m}}$ of $X_{k_1}, \ldots, X_{k_m}$ is given by
$$p_{X_{k_1},\ldots,X_{k_m}}(x_{k_1}, \ldots, x_{k_m}) = \sum_{\substack{(y_1,\ldots,y_n)\in\mathbb{R}^n:\\ y_{k_i} = x_{k_i},\ 1 \leq i \leq m}} p_{X_1,\ldots,X_n}(y_1, \ldots, y_n).$$
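As a small illustration of Proposition 4.25 (not from the text), the sketch below computes the bivariate marginal of $(X_1, X_3)$ from a hypothetical trivariate PMF on $\{0,1\}^3$ by summing out the remaining coordinate.

```python
# Sketch: bivariate marginal of (X1, X3) from a trivariate joint PMF (Proposition 4.25).
# The trivariate PMF is a hypothetical example: three fair, independent bits.
from collections import defaultdict
from itertools import product

p_X1X2X3 = {t: 1 / 8 for t in product((0, 1), repeat=3)}

p_X1X3 = defaultdict(float)
for (x1, x2, x3), p in p_X1X2X3.items():
    p_X1X3[(x1, x3)] += p  # sum over the coordinate x2 that is marginalised out

print(dict(p_X1X3))  # each of the four pairs has probability 0.25
```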
4.3 Conditional probability mass functions
It remains to discuss how the information on the value taken by one random variable influences the probabilities of the possible values of the other random variables, that is, the analogue of conditional probabilities.
Definition 4.26. Let $X$, $Y$ be discrete random variables defined on the same sample space, and let $x \in \mathbb{R}$ be such that $p_X(x) > 0$. The conditional probability mass function of $Y$ given $X = x$ is the map $p_{Y|X}(\cdot|x)\colon \mathbb{R} \to \mathbb{R}$ defined by
$$p_{Y|X}(y|x) = P(Y = y \mid X = x), \qquad \forall y \in \mathbb{R}.$$
As we know from Definition 3.2, for any $y \in \mathbb{R}$, the conditional probability of $\{Y = y\}$ given $\{X = x\}$ is computed as
$$P(Y = y \mid X = x) = \frac{P(Y = y, X = x)}{P(X = x)},$$
thus we get
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}. \tag{4.10}$$
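The ratio in (4.10) is straightforward to compute numerically. The sketch below (hypothetical joint PMF, valid only for $x$ with $p_X(x) > 0$) evaluates $p_{Y|X}(\cdot|x)$ and checks that it sums to one, anticipating Proposition 4.27.

```python
# Sketch: conditional PMF p_{Y|X}(y|x) = p_XY(x, y) / p_X(x), as in (4.10).
# Hypothetical joint PMF; only meaningful for x with p_X(x) > 0.
p_XY = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
        (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}

def p_X(x):
    return sum(p for (xx, _), p in p_XY.items() if xx == x)

def p_Y_given_X(y, x):
    return p_XY.get((x, y), 0.0) / p_X(x)

print(p_Y_given_X(1, 0))                          # 0.20 / 0.40 = 0.5
print(sum(p_Y_given_X(y, 0) for y in (0, 1, 2)))  # 1.0: a genuine PMF
```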
The analogue of Proposition 3.12 for PMFs holds as follows.
Proposition 4.27. Let $X$, $Y$ be discrete random variables defined on the same sample space, and let $x \in \mathbb{R}$ be such that $p_X(x) > 0$. Then the conditional probability mass function of $Y$ given $X = x$ is a probability mass function, that is, it satisfies properties (i)-(iii) in Proposition 4.4.
Proof. Properties (i)-(ii) are trivially satisfied by the definition and by Proposition 4.20. To verify (iii), it is enough to observe that
$$\sum_{y\in\mathbb{R}} p_{Y|X}(y|x) = \sum_{y\in\mathbb{R}} P(Y = y \mid X = x),$$
where $P(\cdot \mid X = x)$ is a probability measure by Proposition 3.12 and the events $\{Y = y\}$, for all $y \in \mathbb{R}$ such that $P(Y = y \mid X = x) > 0$, form a partition of the sample space, hence these conditional probabilities sum up to one.
Remark 4.28. Proposition 4.27 implies that all properties of PMFs also hold for conditional PMFs. For instance,
$$P(Y \in A \mid X = x) = \sum_{y\in A} p_{Y|X}(y|x), \qquad \forall A \subseteq \mathbb{R},\ \forall x \in \mathbb{R} \text{ such that } p_X(x) > 0.$$

4.4 Independence of random variables
As with independence of events, one can ask whether the value taken by one random variable affects the probability distribution of the other random variables. Independence of random variables is defined by means of the concept of independence of events.
Definition 4.29. Two random variables $X$, $Y$ defined on the same sample space are said to be independent if, for all $A, B \subseteq \mathbb{R}$, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent, that is, if
$$P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B), \qquad \forall A, B \subseteq \mathbb{R}.$$
Definition 4.29 holds in general for any two random variables on the
same sample space. In the particular case of discrete random variables, we
can equivalently express independence in terms of the PMFs.
Proposition 4.30. Let $X$, $Y$ be discrete random variables defined on the same sample space. Then $X$, $Y$ are independent if and only if
$$p_{X,Y}(x,y) = p_X(x)\, p_Y(y), \qquad \forall x, y \in \mathbb{R}. \tag{4.11}$$
Proof. (⇒) If $X$, $Y$ are independent, then (4.11) holds by taking $A = \{x\}$ and $B = \{y\}$ in Definition 4.29.

(⇐) If (4.11) holds, then for any subsets $A, B \subseteq \mathbb{R}$ we have
$$\begin{aligned}
P(X \in A, Y \in B) &= P\bigl((X,Y) \in A \times B\bigr) = \sum_{(x,y)\in A\times B} p_{X,Y}(x,y) \\
&= \sum_{(x,y)\in A\times B} p_X(x)\, p_Y(y) = \Bigl(\sum_{x\in A} p_X(x)\Bigr)\Bigl(\sum_{y\in B} p_Y(y)\Bigr) \\
&= P(X \in A)\, P(Y \in B).
\end{aligned}$$
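Proposition 4.30 also gives a practical recipe for checking independence from a joint PMF table: recompute the marginals and test the factorisation (4.11) at every point. The sketch below uses hypothetical marginals; perturbing the joint PMF within a $2\times 2$ block of the table leaves the marginals unchanged but destroys independence, which also illustrates the earlier remark that the marginals do not determine the joint PMF.

```python
# Sketch: testing independence via the factorisation p_XY(x, y) = p_X(x) p_Y(y) (Prop. 4.30).
# The marginals and joint PMFs below are hypothetical examples.
from itertools import product

p_X = {0: 0.4, 1: 0.6}
p_Y = {0: 0.5, 1: 0.3, 2: 0.2}

def is_independent(p_XY):
    # Recompute the marginals from the joint PMF, then test (4.11) at every point.
    pX = {x: sum(p_XY[(x, y)] for y in p_Y) for x in p_X}
    pY = {y: sum(p_XY[(x, y)] for x in p_X) for y in p_Y}
    return all(abs(p_XY[(x, y)] - pX[x] * pY[y]) < 1e-12 for x, y in product(p_X, p_Y))

joint_indep = {(x, y): p_X[x] * p_Y[y] for x, y in product(p_X, p_Y)}
print(is_independent(joint_indep))  # True

# Shift mass within a 2x2 block: the marginals stay the same, independence is lost.
joint_dep = dict(joint_indep)
for (x, y), eps in [((0, 0), 0.05), ((0, 1), -0.05), ((1, 0), -0.05), ((1, 1), 0.05)]:
    joint_dep[(x, y)] += eps
print(is_independent(joint_dep))    # False
```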
An equivalent condition for independence of random variables is that the conditional PMFs are in fact just the marginal PMFs. This is in accordance with Definition 3.6 of independent events.
Proposition 4.31. Let $X$, $Y$ be discrete random variables defined on the same sample space. Then $X$, $Y$ are independent if and only if either of the following holds:

(a) for all $x \in \mathbb{R}$ such that $p_X(x) > 0$, the conditional PMF of $Y$ given $\{X = x\}$ coincides with the marginal PMF of $Y$, that is
$$p_{Y|X}(y|x) = p_Y(y), \qquad \forall y \in \mathbb{R};$$

(b) for all $y \in \mathbb{R}$ such that $p_Y(y) > 0$, the conditional PMF of $X$ given $\{Y = y\}$ coincides with the marginal PMF of $X$, that is
$$p_{X|Y}(x|y) = p_X(x), \qquad \forall x \in \mathbb{R}.$$
Proof. We prove both implications only for (a), since (b) is exactly analogous.

(⇒) If $X$, $Y$ are independent, then by the definition of conditional PMF in (4.10) and by Proposition 4.30, for any $x \in \mathbb{R}$ such that $p_X(x) > 0$ we get
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)} = \frac{p_X(x)\, p_Y(y)}{p_X(x)} = p_Y(y).$$

(⇐) If property (a) holds, then taking $x \in \mathbb{R}$ such that $p_X(x) > 0$ we obtain
$$p_{X,Y}(x,y) = p_{Y|X}(y|x)\, p_X(x) = p_Y(y)\, p_X(x).$$
For $x \in \mathbb{R}$ such that $p_X(x) = 0$, both sides are equal to $0$, so (4.11) holds and $X$, $Y$ are independent.
Sometimes one of the random variables of interest is in fact a function
of another random variable. Knowing the PMF of the latter, we obtain the
PMF of the former, as shown in the following proposition.
Proposition 4.32. Let $X$ be a discrete random variable and let $f\colon \mathbb{R} \to \mathbb{R}$ be a real-valued function (it is enough that $f$ be defined on the range of $X$, that is $f\colon X(\Omega) \to \mathbb{R}$, where $\Omega$ is the sample space on which $X$ is defined). Then $Y = f(X)$ is a discrete random variable on the same sample space, with PMF given by
$$p_Y(y) = \sum_{x\in f^{-1}(\{y\})} p_X(x), \qquad \forall y \in \mathbb{R}. \tag{4.12}$$
Proof. The fact that $Y = f(X)$ is a random variable is trivial, since it is the composition of two functions of which the first is a random variable:
$$Y\colon \Omega \xrightarrow{\;X\;} \mathbb{R} \xrightarrow{\;f\;} \mathbb{R}.$$
Then, for any $y \in \mathbb{R}$, the PMF of $Y$ at $y$ is
$$p_Y(y) = P(Y = y) = P(f(X) = y) = P\bigl(X \in f^{-1}(\{y\})\bigr) = \sum_{x\in f^{-1}(\{y\})} p_X(x),$$
where $f^{-1}(\{y\})$ is the inverse image of $\{y\}$ through $f$, that is
$$f^{-1}(\{y\}) = \{x \in \mathbb{R} : f(x) = y\}.$$
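As an illustration of (4.12) (not from the text), the sketch below uses a hypothetical PMF for $X$ and $f(x) = x^2$, accumulating $p_X(x)$ over the inverse image of each value $y$.

```python
# Sketch: PMF of Y = f(X) via (4.12), summing p_X over the inverse image f^{-1}({y}).
# Hypothetical PMF of X on {-2, -1, 0, 1, 2}, with f(x) = x^2.
from collections import defaultdict

p_X = {-2: 0.10, -1: 0.20, 0: 0.30, 1: 0.25, 2: 0.15}
f = lambda x: x ** 2

p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[f(x)] += p  # every x in f^{-1}({y}) contributes p_X(x) to p_Y(y)

print(dict(p_Y))  # approximately {4: 0.25, 1: 0.45, 0: 0.30}
```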
We then see that independence is preserved through function composition.
Proposition 4.33. Let $X$, $Y$ be discrete random variables defined on the same sample space and let $f, g\colon \mathbb{R} \to \mathbb{R}$ be two functions. If $X$, $Y$ are independent, then $f(X)$, $g(Y)$ are also independent.
Proof. For any $A, B \subseteq \mathbb{R}$, by independence of $X$, $Y$ (Definition 4.29),
$$\begin{aligned}
P(f(X) \in A, g(Y) \in B) &= P\bigl(X \in f^{-1}(A),\, Y \in g^{-1}(B)\bigr) \\
&= P\bigl(X \in f^{-1}(A)\bigr)\, P\bigl(Y \in g^{-1}(B)\bigr) \\
&= P(f(X) \in A)\, P(g(Y) \in B).
\end{aligned}$$
As with events, the concept of independence of random variables can also be extended to the multivariate case. The following definition holds for all random variables (not just discrete ones).
Definition 4.34. Let $X_1, \ldots, X_n$ be random variables defined on the same sample space. They are said to be independent if
$$P(X_1 \in A_1, \ldots, X_n \in A_n) = P(X_1 \in A_1) \cdots P(X_n \in A_n), \qquad \forall A_1, \ldots, A_n \subseteq \mathbb{R}. \tag{4.13}$$
Let $X_1, X_2, \ldots$ be an infinite collection of random variables defined on the same sample space. They are said to be independent if the random variables of any finite sub-collection are independent.
Unlike with events, for a finite collection of random variables it is not necessary to require separately that every sub-collection satisfies the product condition: as the following remark shows, this is automatic.
Remark 4.35. Let $X_1, \ldots, X_n$ be independent random variables. Then the variables of any sub-collection of them are also independent, that is: for all $m < n$ and $1 \leq k_1, \ldots, k_m \leq n$, the variables $X_{k_1}, \ldots, X_{k_m}$ are independent.
One can prove this by verifying Definition 4.34 for $X_{k_1}, \ldots, X_{k_m}$: in the probability to be computed, the remaining random variables are added as taking values in $\mathbb{R}$, which is their whole co-domain. For instance, if $X_1, \ldots, X_5$ are independent, then
$$\begin{aligned}
P(X_1 \in A_1, X_2 \in A_2, X_4 \in A_4) &= P(X_1 \in A_1, X_2 \in A_2, X_3 \in \mathbb{R}, X_4 \in A_4, X_5 \in \mathbb{R}) \\
&= P(X_1 \in A_1)\, P(X_2 \in A_2)\, P(X_3 \in \mathbb{R})\, P(X_4 \in A_4)\, P(X_5 \in \mathbb{R}) \\
&= P(X_1 \in A_1)\, P(X_2 \in A_2)\, P(X_4 \in A_4),
\end{aligned}$$
since $\{X_i \in \mathbb{R}\} = \Omega$ for all $i$.
This leads to the following equivalent characterisation of independent
random variables.
Proposition 4.36. Let $X_1, \ldots, X_n$ be random variables defined on the same sample space. Then they are independent if and only if the events $\{X_1 \in A_1\}, \ldots, \{X_n \in A_n\}$ are independent for any collection $A_1, \ldots, A_n \subseteq \mathbb{R}$ of subsets of the real line.
4.5 Expectation of discrete random variables
We now introduce the concept of expected value, which is fundamental
in probability theory. The intuitive interpretation of the expected value
sees it as the long-term average in repeated experiments. Namely, repeating
the random experiment a large number of times, and noting the value of the
random variable for each repetition of the experiment, the arithmetic average
of all these values approximates the expected value of the random variable:
$$\frac{x_1 + \ldots + x_n}{n} \approx E[X], \qquad \text{for large } n,$$
where xi is the value taken by the random variable X at the i-th repetition
of the random experiment, and E[X] denotes the expectation of X.
We now give its formal definition in the discrete framework.
Definition 4.37. Let $X$ be a discrete random variable. The expected value (or expectation, or mean) of $X$ is defined by
$$E[X] = \sum_{x\in\mathbb{R}} x\, p_X(x) = \sum_{\substack{x\in\mathbb{R},\\ p_X(x)>0}} x\, p_X(x). \tag{4.14}$$
In other words, E[X] is a weighted average of the possible values taken
by X, where the weights are the respective values of the PMF of X.
Note that if the range of X is finite, i.e. X can take only a finite number
of possible values, then the sum in (4.14) is a finite sum, which gives a real
number. If the range of X is instead countably infinite, then the sum in
(4.14) may or may not converge. We say that X has finite expectation if
$$\sum_{x\in\mathbb{R}} |x|\, p_X(x) < \infty, \tag{4.15}$$
since (4.15) guarantees that the sum in (4.14) converges absolutely, so that $E[X]$ is a well-defined real number.
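As a small numerical illustration of (4.14), with a hypothetical PMF taking finitely many values (so that the sum is finite and condition (4.15) is automatic):

```python
# Sketch: expected value of a discrete random variable as the weighted average in (4.14).
# Hypothetical PMF with finite range, so convergence is not an issue.
p_X = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

E_X = sum(x * p for x, p in p_X.items())
print(E_X)  # 0.1 + 0.4 + 0.9 + 1.6 = 3.0
```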