Probability Theory
Chapter 2
Conditional distributions
The discrete case. Let the random variables X and Y have joint probability
function $p_{X,Y}(x,y)$ and marginal probability functions $p_X(x)$ and $p_Y(y)$.
The conditional probability function of Y given that X=x is
$$p_{Y\mid X=x}(y) = \frac{p_{X,Y}(x,y)}{p_X(x)}, \qquad p_X(x) > 0.$$
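As a small illustration of the discrete case, the sketch below computes a conditional probability function from a joint one by dividing by the marginal. The joint table and the function names `marginal_x` and `conditional_y_given_x` are hypothetical and chosen only for this example.

```python
from fractions import Fraction

# Hypothetical joint probability function p_{X,Y}(x, y), for illustration only.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_x(joint, x):
    """p_X(x): sum the joint probability function over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(joint, x):
    """Return {y: p_{Y|X=x}(y)}, i.e. p_{X,Y}(x, y) / p_X(x) for each y."""
    px = marginal_x(joint, x)
    if px == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {y: p / px for (xi, y), p in joint.items() if xi == x}

cond = conditional_y_given_x(joint, 0)
print(cond)                # conditional probabilities of Y given X = 0
print(sum(cond.values()))  # a conditional distribution sums to one
```

Exact fractions are used so that the normalisation can be checked without rounding error.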
The continuous case. Let the random variables X and Y have joint density
function $f_{X,Y}(x,y)$ and marginal density functions $f_X(x)$ and $f_Y(y)$.
The conditional density function of Y given that X=x is
$$f_{Y\mid X=x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)}, \qquad f_X(x) > 0.$$
Problem 2.6.1
Let X and Y be independent Exp(1)-distributed random
variables. Find the conditional distribution of X given that X+Y=c.
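Due to the independence, the joint density function of X and Y is
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y) = e^{-(x+y)}, \qquad x, y > 0.$$
Let U=X+Y and V=X. The problem can now be expressed as finding the
conditional density of V given that U=c,
$$f_{V\mid U=c}(v) = \frac{f_{U,V}(c,v)}{f_U(c)}.$$
We therefore first determine the joint density function of the vector
(U,V)′, where U=X+Y and V=X.
This mapping is a bijection from {x>0, y>0} onto {0<v<u}, and therefore
Theorem 1.2.1 is applicable. Inversion yields
$$X = V, \qquad Y = U - V,$$
with Jacobian
$$J = \begin{vmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1, \qquad |J| = 1.$$
The Transformation Theorem (Theorem 1.2.1) then gives us the
joint distribution of U and V.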
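By Theorem 1.2.1 we now obtain
$$f_{U,V}(u,v) = f_{X,Y}(v,\,u-v)\cdot|J| = e^{-u}, \qquad 0 < v < u,$$
where $|J| = 1$ is the absolute value of the Jacobian of the inverse map
x=v, y=u-v, and now it follows that
$$f_U(u) = \int_0^u e^{-u}\,dv = u\,e^{-u}, \qquad u > 0,$$
$$f_{V\mid U=c}(v) = \frac{f_{U,V}(c,v)}{f_U(c)} = \frac{e^{-c}}{c\,e^{-c}} = \frac{1}{c}, \qquad 0 < v < c.$$
We thus have that V | U=c ∈ U(0,c).
Conditional expectations
A conditional probability distribution is in itself a proper
probability distribution, which means that we can define
$$E(Y \mid X=x) = \int_{-\infty}^{\infty} y\, f_{Y\mid X=x}(y)\,dy$$
(with the integral replaced by a sum over $p_{Y\mid X=x}(y)$ in the discrete case).
When facing a random variable Y we are often interested in
determining E(Y) and Var(Y). A relevant question is therefore whether
these quantities can be computed solely from $f_{Y\mid X=x}(y)$ and
$f_X(x)$, so that we do not first have to find
the marginal density $f_Y(y)$.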
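To make the idea of a conditional distribution with its own mean and variance concrete, the following rough Monte Carlo sketch revisits Problem 2.6.1. Conditioning on the event X+Y=c, which has probability zero, is approximated by keeping draws with |X+Y-c| < eps. The window width `eps`, the sample size, the seed and the function name `conditional_sample` are assumptions made for this illustration. If X given X+Y=c is U(0,c), the kept values should have mean close to c/2 and variance close to c²/12.

```python
import random

def conditional_sample(c, eps=0.05, n=200_000, seed=1):
    """Draw independent X, Y ~ Exp(1) and keep X whenever |X + Y - c| < eps."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x = rng.expovariate(1.0)  # Exp(1): rate 1 and mean 1 coincide
        y = rng.expovariate(1.0)
        if abs(x + y - c) < eps:
            kept.append(x)
    return kept

c = 2.0
xs = conditional_sample(c)
cond_mean = sum(xs) / len(xs)
cond_var = sum((x - cond_mean) ** 2 for x in xs) / (len(xs) - 1)

print("conditional mean:", cond_mean, "compare with c/2 =", c / 2)
print("conditional variance:", cond_var, "compare with c^2/12 =", c ** 2 / 12)
```

A smaller `eps` reduces the approximation error of the conditioning but also leaves fewer samples, so the two estimates become noisier.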
Let us look closer at the conditional expectation E(Y | X=x).
This conditional expectation can be regarded as a function of x,
$$h(x) = E(Y \mid X=x).$$
If we let x vary over all its values we can thus see the conditional
expectation as a function of the random variable X, that is
$$h(X) = E(Y \mid X).$$
A function of X is itself a random variable, and interesting things
happen when we determine the expectation and the variance
of h(X).
So what about E[h(X)]? In the discrete case we get
$$E[h(X)] = \sum_x h(x)\,p_X(x) = \sum_x \sum_y y\,p_{Y\mid X=x}(y)\,p_X(x) = \sum_y y \sum_x p_{X,Y}(x,y) = \sum_y y\,p_Y(y) = E(Y).$$
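The discrete identity E[h(X)] = E(Y), with h(x) = E(Y | X=x), can be checked exactly on a small table. The joint probability function below is hypothetical and serves only as an illustration. The names `p_x` and `h` are chosen to mirror the notation of the text.

```python
from fractions import Fraction

# Hypothetical joint probability function p_{X,Y}(x, y), for illustration only.
joint = {
    (1, 0): Fraction(1, 6), (1, 2): Fraction(1, 6),
    (2, 0): Fraction(1, 3), (2, 3): Fraction(1, 3),
}

def p_x(x):
    """Marginal probability function of X."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def h(x):
    """h(x) = E(Y | X = x) = sum over y of y * p_{X,Y}(x, y) / p_X(x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x(x)

x_values = sorted({x for x, _ in joint})

e_h = sum(h(x) * p_x(x) for x in x_values)       # E[h(X)], outer sum over x
e_y = sum(y * p for (_, y), p in joint.items())  # E(Y) computed directly

print({x: h(x) for x in x_values})
print(e_h, e_y)
```

Here h takes one value per possible x, and weighting those values by $p_X(x)$ reproduces E(Y) without ever forming $p_Y(y)$.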
Theorem 2.1. Let X and Y be two random variables where
E|Y| < ∞. It then holds that
$$E(Y) = E\big(E(Y \mid X)\big).$$
On the right-hand side the inner expectation is taken with respect to the
conditional distribution $f_{Y\mid X=x}(y)$, whereas the outer expectation
is taken with respect to $f_X(x)$.
Conditional variance
Definition 2.2. Let X and Y be two random variables. The
conditional variance of Y given that X=x is
$$\mathrm{Var}(Y \mid X=x) = E\big((Y - E(Y \mid X=x))^2 \mid X=x\big).$$
The conditional variance is (also) a function of x, v(x), and the
corresponding random variable is v(X)=Var(Y | X).
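The conditional variance v(x) can be computed in the same way as the conditional expectation. The sketch below does so for a hypothetical joint probability function and then checks the standard decomposition Var(Y) = E(Var(Y | X)) + Var(E(Y | X)), which is assumed here to be the result these notes cite as Corollary 2.3.1. The table and the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint probability function p_{X,Y}(x, y), for illustration only.
joint = {
    (1, 0): Fraction(1, 6), (1, 2): Fraction(1, 6),
    (2, 0): Fraction(1, 3), (2, 3): Fraction(1, 3),
}
x_values = sorted({x for x, _ in joint})

def p_x(x):
    """Marginal probability function of X."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def h(x):
    """Conditional expectation h(x) = E(Y | X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x(x)

def v(x):
    """Conditional variance v(x) = E((Y - h(x))^2 | X = x)."""
    return sum((y - h(x)) ** 2 * p for (xi, y), p in joint.items() if xi == x) / p_x(x)

e_v = sum(v(x) * p_x(x) for x in x_values)                  # E(Var(Y | X))
e_h = sum(h(x) * p_x(x) for x in x_values)                  # E(E(Y | X))
var_h = sum((h(x) - e_h) ** 2 * p_x(x) for x in x_values)   # Var(E(Y | X))

e_y = sum(y * p for (_, y), p in joint.items())
var_y = sum((y - e_y) ** 2 * p for (_, y), p in joint.items())  # Var(Y) directly

print(e_v, var_h, var_y)
```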
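Example (cont.)
Let X and Y be random variables such that Y | X=x ∈ U(0,x) and
X ∈ Exp(λ).
We want to determine E(Y) and Var(Y). The marginal density of
Y is given by
$$f_Y(y) = \int_y^\infty f_{Y\mid X=x}(y)\,f_X(x)\,dx = \int_y^\infty \frac{1}{x}\,f_X(x)\,dx, \qquad y > 0,$$
which is a tough nut to crack. It is therefore difficult to determine
E(Y) and Var(Y) via the marginal density function of Y.
We can, however, use conditional expectations to solve the problem.
Since E(Y | X=x)=x/2, it follows that E(Y | X)=X/2. Theorem 2.1 and properties of Exp(λ)
thus yield
$$E(Y) = E\big(E(Y \mid X)\big) = E(X/2) = \tfrac{1}{2}E(X),$$
where E(X)=λ if λ denotes the mean of the exponential distribution (and 1/λ if it denotes the rate),
and the sought expectation was easily found. The variance is a
little bit tougher, but not by much.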
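A simulation can serve as a sanity check on both moments of Y in this example. The sketch below draws X from an exponential distribution and then Y uniformly on (0, X). The mean `mean_x = 2.0`, the sample size and the seed are arbitrary assumptions. Since `random.expovariate` takes a rate, the mean is passed as its reciprocal. The variance is compared with E(X²)/12 + Var(X)/4, which is what the decomposition Var(Y) = E(Var(Y | X)) + Var(E(Y | X)) gives for this model.

```python
import random

def simulate(mean_x=2.0, n=200_000, seed=7):
    """Draw (X, Y) with X exponential (mean mean_x) and Y | X = x uniform on (0, x)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.expovariate(1.0 / mean_x)  # expovariate expects the rate 1/mean
        y = rng.uniform(0.0, x)            # conditional draw given X = x
        xs.append(x)
        ys.append(y)
    return xs, ys

def mean(values):
    return sum(values) / len(values)

def var(values):
    m = mean(values)
    return sum((t - m) ** 2 for t in values) / (len(values) - 1)

xs, ys = simulate()

print("E(Y) estimate:", mean(ys), "vs E(X)/2 estimate:", mean(xs) / 2)
print("Var(Y) estimate:", var(ys),
      "vs E(X^2)/12 + Var(X)/4 estimate:", mean([x * x for x in xs]) / 12 + var(xs) / 4)
```

The comparison uses sample moments of X, so it does not depend on whether Exp(λ) is read as mean λ or rate λ.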
Example (cont.)
Because Var(Y | X=x)=x²/12 it follows that Var(Y | X)=X²/12, and so
$$E\big(\mathrm{Var}(Y \mid X)\big) = \tfrac{1}{12}E(X^2),$$
and due to the fact that E(Y | X)=X/2 it further follows that
$$\mathrm{Var}\big(E(Y \mid X)\big) = \tfrac{1}{4}\mathrm{Var}(X),$$
and, finally, it follows from Corollary 2.3.1 that
$$\mathrm{Var}(Y) = E\big(\mathrm{Var}(Y \mid X)\big) + \mathrm{Var}\big(E(Y \mid X)\big) = \tfrac{1}{12}E(X^2) + \tfrac{1}{4}\mathrm{Var}(X).$$
If λ denotes the mean of X, so that E(X²)=2λ² and Var(X)=λ², this gives Var(Y)=5λ²/12.
Hierarchic models
Distributions with random parameters
Let X be a random variable whose probability distribution depends on a
parameter M. We further assume that M is itself a random variable.
The probability distribution of X that is known is therefore in fact a
conditional distribution, $f_{X\mid M=m}(x)$, where M has probability distribution $f_M(m)$.
In these situations we speak of a hierarchic model.
It is of natural interest to find the marginal (or "unconditional") distribution
of X. This can be done in two steps.
1. We first find the joint probability distribution via $f_{X,M}(x,m) = f_{X\mid M=m}(x)\,f_M(m)$.
2. We then find the marginal distribution of X, $f_X(x)$, by integrating
(or summing) over m.
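The two steps can be sketched for a discrete hierarchic model, where the integral over m becomes a sum. The model below (M uniform on {1, 2, 3} and X given M=m uniform on {1, ..., m}) is hypothetical and was chosen only because the sums are easy to follow by hand.

```python
from fractions import Fraction

# Hypothetical hierarchic model, for illustration only:
# M is uniform on {1, 2, 3}, and X | M = m is uniform on {1, ..., m}.
p_m = {m: Fraction(1, 3) for m in (1, 2, 3)}

def p_x_given_m(x, m):
    """Conditional probability function of X given M = m."""
    return Fraction(1, m) if 1 <= x <= m else Fraction(0)

# Step 1: joint probability function p_{X,M}(x, m) = p_{X|M=m}(x) * p_M(m).
joint = {(x, m): p_x_given_m(x, m) * p_m[m]
         for m in p_m for x in range(1, m + 1)}

# Step 2: marginal probability function of X, obtained by summing over m.
p_x = {}
for (x, m), p in joint.items():
    p_x[x] = p_x.get(x, Fraction(0)) + p

print(p_x)
print(sum(p_x.values()))
```

In the continuous case the same structure applies with the sum in step 2 replaced by an integral over m.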
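Exercise 3.3
Suppose X has a normal distribution such that the mean is zero and the
inverse of the variance is gamma distributed, viz.,
$$X \mid \Sigma^2 = y \in N(0,\,1/y) \quad\text{with}\quad \Sigma^2 \in \Gamma(n/2,\,2/n),$$
where $\Gamma(p,a)$ has density $\frac{1}{\Gamma(p)}\,y^{p-1}a^{-p}e^{-y/a}$, y>0.
Show that X ∈ t(n).
The marginal density of X is found by integrating the joint density over y,
$$f_X(x) = \int_0^\infty f_{X\mid\Sigma^2=y}(x)\,f_{\Sigma^2}(y)\,dy = \int_0^\infty \sqrt{\frac{y}{2\pi}}\,e^{-x^2y/2}\cdot\frac{1}{\Gamma(n/2)}\,y^{n/2-1}\left(\frac{n}{2}\right)^{n/2}e^{-ny/2}\,dy = \frac{(n/2)^{n/2}}{\sqrt{2\pi}\,\Gamma(n/2)}\int_0^\infty y^{(n+1)/2-1}e^{-y(n+x^2)/2}\,dy.$$
The integrand shows similarities with the density function of $\Gamma((n+1)/2,\,2/(n+x^2))$,
which means that we try to manipulate it into a full density function. This gives
$$f_X(x) = \frac{(n/2)^{n/2}}{\sqrt{2\pi}\,\Gamma(n/2)}\,\Gamma\!\left(\frac{n+1}{2}\right)\left(\frac{2}{n+x^2}\right)^{(n+1)/2} = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1+\frac{x^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < x < \infty,$$
and it is clear that X ∈ t(n).
Exercise 3.3 (extended)
As an extra exercise we use Theorem 2.1 and Corollary 2.3.1 to find the
mean and variance of the t(n)-distribution. For the mean we get, for n>1,
$$E(X) = E\big(E(X \mid \Sigma^2)\big) = E(0) = 0.$$
The variance is a little trickier.
$$\mathrm{Var}(X) = E\big(\mathrm{Var}(X \mid \Sigma^2)\big) + \mathrm{Var}\big(E(X \mid \Sigma^2)\big) = E\!\left(\frac{1}{\Sigma^2}\right) + 0 = \frac{(n/2)^{n/2}}{\Gamma(n/2)}\int_0^\infty y^{(n-2)/2-1}e^{-ny/2}\,dy.$$
This expectation/integral is solved using our "usual" technique.
The integrand shows similarities with the density function of $\Gamma((n-2)/2,\,2/n)$, so
$$\mathrm{Var}(X) = \frac{(n/2)^{n/2}}{\Gamma(n/2)}\,\Gamma\!\left(\frac{n-2}{2}\right)\left(\frac{2}{n}\right)^{(n-2)/2} = \frac{n/2}{n/2-1} = \frac{n}{n-2}, \qquad n > 2.$$
The Bayesian approach
The probability distribution of a random variable is often parametrized, that
is, it depends on the value(s) of one (or more) parameter(s).
In the Bayesian approach the parameters are not completely unknown. The
parameters are considered to be random variables and their probability
distributions represent our prior knowledge about them. Such a distribution
is called the prior (or a priori) distribution of the parameter.
The Bayesian way to perform statistical inference is to use the result from a
random sample to get an update of the prior distribution of the parameter.
We find the posterior distribution of the parameter by taking our findings
about conditional distributions one step further.
Let X be a random variable whose probability distribution
depends on a parameter M.
The probability distribution of X is therefore a conditional
distribution, $f_{X\mid M=m}(x)$. Our prior knowledge about M is
represented by the prior $f_M(m)$.
Given that X=x we are interested in finding the posterior density
$f_{M\mid X=x}(m)$, which we do by using the fact that
$$f_{M\mid X=x}(m) = \frac{f_{X,M}(x,m)}{f_X(x)} = \frac{f_{X\mid M=m}(x)\,f_M(m)}{\int f_{X\mid M=m}(x)\,f_M(m)\,dm}.$$
Exercise 3.3 (extended, cont.)
We continue (and further extend) Exercise 3.3 and determine the posterior
distribution of Σ² given that X=x. Using the results so far we get that
$$f_{\Sigma^2\mid X=x}(y) = \frac{f_{X\mid\Sigma^2=y}(x)\,f_{\Sigma^2}(y)}{f_X(x)} = \frac{1}{\Gamma((n+1)/2)}\,y^{(n+1)/2-1}\left(\frac{n+x^2}{2}\right)^{(n+1)/2}e^{-y(n+x^2)/2}, \qquad y > 0.$$
That is, Σ² | X=x ∈ Γ((n+1)/2, 2/(n+x²)).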
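The results of Exercise 3.3 and its extensions can be sanity-checked by simulating the hierarchic model directly. The sketch below assumes the model is Σ² ∈ Γ(n/2, 2/n) in the shape and scale parametrization and X | Σ²=y ∈ N(0, 1/y), which is the reading consistent with the gamma distributions quoted in the text. The choices n=10, x0=1, the window `eps`, the sample size and the seed are arbitrary. Python's `random.gammavariate(alpha, beta)` takes shape `alpha` and scale `beta`. The sample variance of X is compared with n/(n-2), and the mean of Σ² among draws with X near x0 is compared with the posterior mean (n+1)/(n+x0²), that is, shape times scale of Γ((n+1)/2, 2/(n+x0²)).

```python
import random

def sample_pairs(n_df, size=200_000, seed=3):
    """Draw (X, Sigma^2) pairs from the assumed hierarchic model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        y = rng.gammavariate(n_df / 2, 2 / n_df)  # shape n/2, scale 2/n
        x = rng.gauss(0.0, y ** -0.5)             # X | Sigma^2 = y is N(0, 1/y)
        pairs.append((x, y))
    return pairs

n_df = 10
pairs = sample_pairs(n_df)

# Marginal check: the variance of X should be close to n / (n - 2).
xs = [x for x, _ in pairs]
m = sum(xs) / len(xs)
var_x = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Posterior check: average Sigma^2 over draws with X close to x0.
x0, eps = 1.0, 0.1
near = [y for x, y in pairs if abs(x - x0) < eps]
post_mean = sum(near) / len(near)

print("Var(X) estimate:", var_x, "vs n/(n-2) =", n_df / (n_df - 2))
print("E(Sigma^2 | X near x0) estimate:", post_mean,
      "vs (n+1)/(n+x0^2) =", (n_df + 1) / (n_df + x0 ** 2))
```

The window around x0 is only an approximation to conditioning on X=x0, so the posterior check is rough by construction.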
The Regression function
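Definition 5.1. Let X1, X2,…,Xn and Y be jointly distributed
random variables, and set
$$h(\mathbf{x}) = h(x_1,\ldots,x_n) = E(Y \mid X_1=x_1,\ldots,X_n=x_n).$$
The function h is called the regression function Y on X, where X=(X1,…,Xn)′.
Definition 5.2. A predictor for Y based on X is a function d(X).
The predictor is called linear if
$$d(\mathbf{X}) = a_0 + a_1X_1 + \cdots + a_nX_n,$$
where a0, a1,…,an are constants.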
