Chapter 2
Conditioning
Thommy Perlinger, Probability Theory

Conditional distributions

The discrete case. Let the random variables X and Y have joint probability
function pX,Y(x,y) and marginal probability functions pX(x) and pY(y).
The conditional probability function of Y given that X=x is

  pY│X=x(y) = pX,Y(x,y)/pX(x),   for x such that pX(x) > 0.
The continuous case. Let the random variables X and Y have joint density
function fX,Y(x,y) and marginal density functions fX(x) and fY(y).
The conditional density function of Y given that X=x is

  fY│X=x(y) = fX,Y(x,y)/fX(x),   for x such that fX(x) > 0.
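To make the discrete definition concrete, here is a minimal numerical sketch in Python (the joint probability table is an invented toy example, not one from the text):

```python
import numpy as np

# Toy joint pmf p_{X,Y}(x, y) for x in {0, 1} (rows) and y in {0, 1, 2} (columns).
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.15, 0.40]])

p_x = p_xy.sum(axis=1)   # marginal pmf of X
p_y = p_xy.sum(axis=0)   # marginal pmf of Y

# Conditional pmf of Y given X = x:  p_{Y|X=x}(y) = p_{X,Y}(x, y) / p_X(x).
x = 1
p_y_given_x = p_xy[x] / p_x[x]
print(p_y_given_x, p_y_given_x.sum())   # a proper pmf: the probabilities sum to 1
```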
Problem 2.6.1
Let X and Y be independent Exp(1)-distributed random
variables. Find the conditional distribution of X given that X+Y=c.
Due to the independence, the joint density function of X and Y is

  fX,Y(x,y) = e^(−x)·e^(−y) = e^(−(x+y)),   x, y > 0.
Let U=X+Y and V=X. The problem can now be expressed as: determine the joint
density function of U=(U,V)′, where U=X+Y and V=X.
It is obvious that this is a bijection and therefore Theorem 1.2.1
is applicable. Inversion yields

  X = V  and  Y = U − V,

and the Jacobian of the inverse transformation has absolute value 1.
First it can be proven that the sum of two independent Exp(1)-distributed
random variables is Γ(2,1)-distributed, that is, U ∈ Γ(2,1); see Problem 1.3.39.
The Transformation Theorem (Theorem 1.2.1) then gives us the
joint distribution of U and V.
Problem 2.6.1 (cont.)

By Theorem 1.2.1 we now obtain

  fU,V(u,v) = fX,Y(v, u−v)·|J| = e^(−v)·e^(−(u−v)) = e^(−u),   0 < v < u,

and now it follows that

  fV│U=c(v) = fU,V(c,v)/fU(c) = e^(−c)/(c·e^(−c)) = 1/c,   0 < v < c.

We thus have that V│U=c ∈ U(0,c); that is, given that X+Y=c, X is uniformly
distributed on (0,c).
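As a sanity check of this result (a sketch, not part of the original solution), the Monte Carlo below approximates the zero-probability event X+Y=c by a narrow window around c and compares the retained X-values with U(0,c); the window width eps, the value of c and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
c, eps, n = 2.0, 0.01, 2_000_000

x = rng.exponential(1.0, n)
y = rng.exponential(1.0, n)

keep = np.abs(x + y - c) < eps          # approximate the event {X + Y = c}
x_cond = x[keep]

# Under V | U=c ~ U(0,c): mean c/2 and variance c^2/12.
print(x_cond.mean(), c / 2)             # both close to 1.0
print(x_cond.var(), c**2 / 12)          # both close to 0.333
```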
Conditional expectations

A conditional probability distribution is in itself a proper probability
distribution, which means that we can define expectations in the usual manner.

When facing a random variable Y we are often interested in determining E(Y)
and Var(Y). A relevant question is therefore whether these expectations can
be computed solely using fY│X=x(y) and fX(x), which, in that case, means that
we do not first have to find the marginal density fY(y).
Let us look more closely at the conditional expectation E(Y│X=x). This
conditional expectation can be regarded as a function of x,

  h(x) = E(Y│X=x).

If we let x vary over all its values we can thus see the conditional
expectation as a function of the random variable X, that is,

  h(X) = E(Y│X).

A function of X is itself a random variable, and interesting things happen
when we determine the expectation and the variance of h(X).

So what about E[h(X)]? In the discrete case we get

  E[h(X)] = Σx E(Y│X=x)·pX(x) = Σx Σy y·pY│X=x(y)·pX(x)
          = Σy y·Σx pX,Y(x,y) = Σy y·pY(y) = E(Y).
Theorem 2.1. Let X and Y be two random variables where E|Y| < ∞. It then
holds that

  E(Y) = E[E(Y│X)].

On the right-hand side the inner expectation concerns the conditional
distribution fY│X=x(y), whereas the outer expectation regards fX(x).

Conditional variance

Definition 2.2. Let X and Y be two random variables. The conditional
variance of Y given that X=x is

  Var(Y│X=x) = E[(Y − E(Y│X=x))²│X=x].

The conditional variance is (also) a function of x, v(x), and the
corresponding random variable is v(X)=Var(Y│X).

Corollary 2.3.1. Let X and Y be two random variables where E(Y²) < ∞. It
then holds that

  Var(Y) = E[Var(Y│X)] + Var[E(Y│X)].
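A compact numerical illustration of Theorem 2.1 and Corollary 2.3.1 (the joint probability table is again an invented toy example); both sides of each identity are computed exactly and agree:

```python
import numpy as np

# Toy joint pmf for x in {0, 1} (rows) and y in {0, 1, 2} (columns).
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.05, 0.15, 0.40]])
ys = np.array([0.0, 1.0, 2.0])

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditional moments of Y given X = x.
p_y_given_x = p_xy / p_x[:, None]
m = p_y_given_x @ ys                       # E(Y | X = x) for each x
v = p_y_given_x @ ys**2 - m**2             # Var(Y | X = x) for each x

# Theorem 2.1: E(Y) = E[E(Y | X)].
print(p_y @ ys, p_x @ m)

# Corollary 2.3.1: Var(Y) = E[Var(Y | X)] + Var[E(Y | X)].
var_y = p_y @ ys**2 - (p_y @ ys)**2
print(var_y, p_x @ v + (p_x @ m**2 - (p_x @ m)**2))
```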
Example

Let X and Y be random variables such that Y│X=x ∈ U(0,x) and X ∈ Exp(λ).

We want to determine E(Y) and Var(Y). The marginal density of Y is given by

  fY(y) = ∫ fY│X=x(y)·fX(x) dx = ∫_y^∞ (1/x)·fX(x) dx,   y > 0,

which is a tough nut to crack. It is therefore difficult to determine E(Y)
and Var(Y) via the marginal density function of Y. We can, however, use
conditional expectations to solve the problem.

Example (cont.)

Because E(Y│X=x) = x/2 (the mean of U(0,x)), it follows that E(Y│X)=X/2.
Theorem 2.1 and properties of Exp(λ) thus yield

  E(Y) = E[E(Y│X)] = E(X/2) = E(X)/2,

and the sought expectation was easily found. The variance is a little bit
tougher, but not by much.
Example (cont.)

Because Var(Y│X=x)=x²/12 it follows that Var(Y│X)=X²/12, and so

  E[Var(Y│X)] = E(X²)/12,

and due to the fact that E(Y│X)=X/2 it further follows that

  Var[E(Y│X)] = Var(X/2) = Var(X)/4,

and, finally, it follows from Corollary 2.3.1 that

  Var(Y) = E[Var(Y│X)] + Var[E(Y│X)] = E(X²)/12 + Var(X)/4.
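A Monte Carlo sketch of this example (not from the text): since the excerpt does not pin down the parametrization of Exp(λ), the simulation simply fixes E(X)=2 and checks E(Y)=E(X)/2 and Var(Y)=E(X²)/12 + Var(X)/4 against the sample moments:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mean_x = 1_000_000, 2.0       # arbitrary concrete choices for the sketch

x = rng.exponential(mean_x, n)   # X exponential with E(X) = mean_x
y = rng.uniform(0.0, x)          # Y | X=x ~ U(0, x)

# Theorem 2.1: E(Y) = E(X)/2.  Corollary 2.3.1: Var(Y) = E(X^2)/12 + Var(X)/4.
print(y.mean(), x.mean() / 2)
print(y.var(), (x**2).mean() / 12 + x.var() / 4)
```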
Hierarchic models
Distributions with random parameters

Let X be a random variable whose probability distribution depends on a
parameter M. We further assume that M is itself a random variable. The
probability distribution of X that is known is therefore in fact a
conditional distribution, fX│M=m(x), where M has probability distribution fM(m).

In these situations we speak of a hierarchic model.

It is of natural interest to find the marginal (or "unconditional") distribution
of X. This can be done in two steps (see the sketch after this list).

1. We first find the joint probability distribution via fX,M(x,m) = fX│M=m(x)·fM(m).
2. We then find the marginal distribution of X, fX(x), by integrating
   (or summing) over m.
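A small discrete sketch of the two-step recipe (the prior on M and the conditional probability tables are invented for illustration):

```python
import numpy as np

# Parameter M takes values {1, 2} with prior probabilities p_M.
p_m = np.array([0.3, 0.7])

# Conditional pmf of X given M = m, for x in {0, 1, 2} (rows = m, columns = x).
p_x_given_m = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.3, 0.6]])

# Step 1: joint distribution  p_{X,M}(x, m) = p_{X|M=m}(x) * p_M(m).
p_xm = p_x_given_m * p_m[:, None]

# Step 2: marginal ("unconditional") distribution of X, summing over m.
p_x = p_xm.sum(axis=0)
print(p_x, p_x.sum())    # [0.22 0.30 0.48], sums to 1
```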
Exercise 3.3

Suppose X has a normal distribution such that the mean is zero and the
inverse of the variance is gamma distributed. Show that X ∈ t(n).

The marginal density of X is obtained by integrating the joint density over
the values of the variance. The integrand shows similarities with the density
function of a gamma distribution, which means that we try to manipulate it
into a full density function, and it is clear that X ∈ t(n).

Exercise 3.3 (extended)

As an extra exercise we use Theorem 2.1 and Corollary 2.3.1 to find the mean
and variance of the t(n)-distribution. The mean follows directly from
Theorem 2.1: since the conditional mean is zero for every value of the
variance, E(X) = 0.

The variance is a little trickier. Because the conditional mean is constant,
Corollary 2.3.1 reduces the variance to the expected value of the conditional
variance, and this expectation/integral is solved using our "usual" technique:
the integrand shows similarities with the density function of Γ((n−2)/2, 2/n),
so the integral evaluates to n/(n−2). Hence Var(X) = n/(n−2) (for n > 2), in
agreement with the variance of the t(n)-distribution.
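A simulation sketch of Exercise 3.3 (an assumption-laden illustration, since the excerpt does not show the exact gamma parameters): it uses the standard construction in which the inverse of the variance is Γ(n/2, 2/n)-distributed, equivalently X = Z/√(V/n) with Z ∈ N(0,1) and V ∈ χ²(n) independent, and compares the resulting mixture with the t(n) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_df, n_sim = 5, 500_000

# Assumed construction: precision 1/sigma^2 ~ Gamma(shape=n/2, scale=2/n),
# then X | sigma^2 ~ N(0, sigma^2).
precision = rng.gamma(shape=n_df / 2, scale=2 / n_df, size=n_sim)
x = rng.normal(0.0, 1.0 / np.sqrt(precision))

# Kolmogorov-Smirnov comparison with the t(n) distribution.
print(stats.kstest(x, stats.t(df=n_df).cdf))
```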
The Bayesian approach

The probability distribution of a random variable is often parametrized, that
is, it depends on the value(s) of one (or more) parameter(s).

In the Bayesian approach the parameters are not completely unknown. The
parameters are considered to be random variables and their probability
distributions represent our prior knowledge about them. Such a distribution
is called the prior (or a priori) distribution of the parameter.

The Bayesian way to perform statistical inference is to use the result from a
random sample to get an update of the prior distribution of the parameter.
This updated version is called the posterior (or a posteriori) distribution
of the parameter.

We find the posterior distribution of the parameter by taking our findings
about conditional distributions one step further.
The Bayesian approach (cont.)

Let X be a random variable whose probability distribution depends on a
parameter M. The probability distribution of X is therefore a conditional
distribution, fX│M=m(x). Our prior knowledge about M is represented by the
prior fM(m).

Given that X=x we are interested in finding the posterior density fM│X=x(m),
which we do by using the fact that

  fM│X=x(m) = fX,M(x,m)/fX(x) = fX│M=m(x)·fM(m)/fX(x).

Exercise 3.3 (extended, cont.)

We continue (and further extend) Exercise 3.3 and determine the posterior
distribution of Σ² given that X=x. Using the results so far, and applying the
relation above with M = Σ², we get that Σ² │ X=x ∈ Γ((n+1)/2, 2/(n+x²)).
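A numerical check of this posterior (a sketch under the same assumed parametrization as in the previous code block, with the gamma-distributed quantity taken to be the inverse variance; the observed value x_obs and the grid are arbitrary choices):

```python
import numpy as np
from scipy import stats

n_df, x_obs = 5, 1.3

# Grid over w = the gamma-distributed inverse variance (assumption).
w = np.linspace(1e-4, 20, 200_001)

prior = stats.gamma.pdf(w, a=n_df / 2, scale=2 / n_df)           # assumed prior
likelihood = stats.norm.pdf(x_obs, loc=0.0, scale=1.0 / np.sqrt(w))

posterior = prior * likelihood
posterior /= posterior.sum()                                     # normalize on the grid

claimed = stats.gamma.pdf(w, a=(n_df + 1) / 2, scale=2 / (n_df + x_obs**2))
claimed /= claimed.sum()

print(np.max(np.abs(posterior - claimed)))                       # close to 0
```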
The regression function

Definition 5.1. Let X1, X2,…,Xn and Y be jointly distributed random
variables, and set

  h(x1, x2,…,xn) = E(Y│X1=x1, X2=x2,…, Xn=xn).

The function h is called the regression function of Y on X.

Definition 5.2. A predictor for Y based on X is a function d(X). The
predictor is said to be linear if d is linear, that is, if

  d(X) = a0 + a1X1 + a2X2 + … + anXn,

where a0, a1,…,an are constants.
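As a closing sketch (an invented bivariate normal example, not from the text), the coefficients of the best linear predictor of Y based on a single X follow from means and covariances, and for jointly normal variables this linear predictor coincides with the regression function E(Y│X):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Jointly normal (X, Y) with covariance 0.6 (invented numbers for illustration).
mean = [1.0, 2.0]
cov = [[1.0, 0.6],
       [0.6, 2.0]]
x, y = rng.multivariate_normal(mean, cov, size=n).T

# Best linear predictor d(X) = a0 + a1*X with a1 = Cov(X,Y)/Var(X), a0 = E(Y) - a1*E(X).
a1 = np.cov(x, y)[0, 1] / x.var()
a0 = y.mean() - a1 * x.mean()
print(a0, a1)                      # close to 1.4 and 0.6

# For jointly normal variables this equals the regression function E(Y | X = x):
# empirical check in a thin slice around x = 2.
sel = np.abs(x - 2.0) < 0.01
print(y[sel].mean(), a0 + a1 * 2.0)
```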