Now we wish to discuss the case where we have multiple random variables.
Two random variables are jointly continuous if we can calculate probabilities by integrating a
function called the joint density function over the set of interest.
$$P((X,Y) \in S) = \iint_S f_{X,Y}(x,y)\,dx\,dy$$
Recall the following reasoning from the single-variable case.
Let Δx be small and positive, and say we are interested in P(x ≤ X ≤ x + Δx).
We note that
$$P(x \le X \le x + \Delta x) = \int_x^{x+\Delta x} f_X(t)\,dt \approx f_X(x)\cdot\Delta x$$
This means that for sufficiently small Δx, we can say
$$P(x \le X \le x + \Delta x) \approx f_X(x)\cdot\Delta x$$
Or, for a small δ,
$$P(x \le X \le x + \delta) \approx f_X(x)\cdot\delta$$
We will use similar reasoning to get an intuition for joint density functions.
Suppose we are given an event S and we wish to calculate its probability. We are given a
function fX,Y (i.e. the joint density function); this is a surface that sits on top of the 2-dimensional
plane. To calculate the probability, we look at the volume under the part of the surface that
sits on top of S.
What should the total volume under the surface be? Volumes under the surface are probabilities,
and (X,Y) is certain to land somewhere in the plane, so the total volume should be 1.
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1$$
Also, since we are talking about probabilities, the joint density must be a nonnegative function.
$$f_{X,Y}(x,y) \ge 0$$
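As a quick numerical sanity check of these two properties, the sketch below uses a hypothetical example density, f(x, y) = x + y on the unit square (an assumption for illustration, not from the lecture), checks that it is nonnegative on a grid, and approximates the total volume with a midpoint-rule double sum. The function names are illustrative only.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint density: x + y on the unit square, 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def midpoint_double_integral(g, a, b, c, d, n=400):
    """Approximate the integral of g over [a, b] x [c, d] with an n-by-n midpoint grid."""
    hx = (b - a) / n
    hy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


# Nonnegativity: evaluate the density on the same grid of midpoints.
grid = [(k + 0.5) / 400 for k in range(400)]
nonnegative = all(f(x, y) >= 0.0 for x in grid for y in grid)

# Normalization: total volume under the surface (the density is 0 off the unit square).
volume = midpoint_double_integral(f, 0.0, 1.0, 0.0, 1.0)
print(nonnegative, round(volume, 6))  # True 1.0
```

The midpoint rule happens to be exact for a linear integrand, so the volume matches 1 up to floating-point rounding; a grid check of nonnegativity is only evidence, not proof.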
Let δ be small and positive. Suppose we are interested in the event x ≤ X ≤ x + δ and y ≤ Y ≤ y + δ.
What is the probability of this small square?
It is the volume under the part of the surface that sits on top of that square. Since δ is very small,
the joint density function fX,Y does not change very much over the square, so we can treat it as a
constant. Therefore the volume is approximately the height times the area of the base. The base has
area δ² and the height is fX,Y(x,y).
Therefore,
$$P(x \le X \le x+\delta,\ y \le Y \le y+\delta) \approx f_{X,Y}(x,y)\cdot\delta^2$$
Thus, the joint density function gives us probability per unit area.
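To see how good the small-square approximation is, here is a sketch using a hypothetical density, f(x, y) = x + y on the unit square (an assumption for illustration). For a square inside the unit square, integrating x + y exactly gives δ²(x + y) + δ³, so the ratio of the exact probability to fX,Y(x,y)·δ² is 1 + δ/(x + y), which approaches 1 as δ shrinks. The point (0.3, 0.5) and the function names are arbitrary choices.

```python
def exact_square_prob(x: float, y: float, delta: float) -> float:
    """Exact P(x <= X <= x + delta, y <= Y <= y + delta) for the hypothetical
    density f(x, y) = x + y, valid when the square lies inside [0, 1]^2."""
    return delta**2 * (x + y) + delta**3


def density_times_area(x: float, y: float, delta: float) -> float:
    """The approximation f_{X,Y}(x, y) * delta^2 from the text."""
    return (x + y) * delta**2


x, y = 0.3, 0.5
ratios = []
for delta in (0.1, 0.01, 0.001):
    ratio = exact_square_prob(x, y, delta) / density_times_area(x, y, delta)
    ratios.append(ratio)
    print(f"delta={delta}: exact/approx = {ratio:.5f}")
```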
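With the joint density function in hand, we can calculate things like expectations using
$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy$$
Note that this is completely analogous to the formula for the discrete case,
E[g(X,Y)] = Σx Σy g(x,y) pX,Y(x,y), with the sums replaced by integrals and the joint PMF replaced by the joint density.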
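The expectation formula can be evaluated numerically in the same way. The sketch below assumes the hypothetical density f(x, y) = x + y on the unit square and takes g(x, y) = xy; by direct integration, E[XY] = ∫∫ xy(x + y) dx dy = 1/6 + 1/6 = 1/3, and the midpoint-rule sum should land close to that. Names and grid size are illustrative.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint density x + y on [0, 1]^2 (zero elsewhere)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def expectation(g, density, n=300):
    """Approximate E[g(X, Y)], the double integral of g * density over [0, 1]^2,
    using an n-by-n midpoint grid (the density is zero outside the unit square)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y) * density(x, y)
    return total * h * h


e_xy = expectation(lambda x, y: x * y, f)
print(e_xy)  # close to 1/3
```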
Recall the following formula
$$f_X(x)\cdot\delta \approx P(x \le X \le x+\delta)$$
We wish to know a formula for this marginal density, fX(x), in terms of the joint density,
fX,Y(x,y).
To do this, we use the joint density to compute the probability that X falls in the interval
x ≤ X ≤ x + δ.
This is the probability that (X,Y) falls in the following vertical strip of width δ.
To find that probability we take the following integral (writing s for the variable of integration in the x-direction):
$$P(x \le X \le x+\delta) = \int_{-\infty}^{\infty}\int_x^{x+\delta} f_{X,Y}(s,y)\,ds\,dy$$
Note that since s varies only a little bit, the integrand is approximately constant in s, so the inner
integral is approximately δ · fX,Y(x,y).
So,
$$P(x \le X \le x+\delta) = \int_{-\infty}^{\infty}\int_x^{x+\delta} f_{X,Y}(s,y)\,ds\,dy \approx \int_{-\infty}^{\infty}\delta\cdot f_{X,Y}(x,y)\,dy$$
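But recall that
$$P(x \le X \le x+\delta) \approx f_X(x)\cdot\delta$$
Equating these two, we have
$$f_X(x)\cdot\delta \approx \int_{-\infty}^{\infty}\delta\cdot f_{X,Y}(x,y)\,dy$$
Cancelling δ (the approximation becomes exact as δ → 0), we have the formula
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$$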
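The marginal formula integrates out y along a vertical line. A minimal sketch, assuming the hypothetical density f(x, y) = x + y on the unit square: because the density is zero for y outside [0, 1], the integral over the whole real line reduces to one over [0, 1], and the exact answer is fX(x) = x + 1/2 for 0 ≤ x ≤ 1. The helper name is illustrative.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint density x + y on [0, 1]^2 (zero elsewhere)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def marginal_x(density, x: float, n: int = 1000) -> float:
    """f_X(x) = integral of f_{X,Y}(x, y) dy, by the midpoint rule over y in [0, 1]."""
    h = 1.0 / n
    return sum(density(x, (j + 0.5) * h) for j in range(n)) * h


for x in (0.0, 0.25, 0.5, 1.0):
    print(x, marginal_x(f, x), x + 0.5)  # numerical vs exact marginal
```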
Now we can describe independence.
X and Y are called independent if and only if
$$f_{X,Y}(x,y) = f_X(x)f_Y(y)\quad\text{for all } x, y$$
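One way to probe this definition numerically is to compute both marginals and compare their product with the joint density on a grid. The sketch assumes two hypothetical densities on the unit square: 4xy, which factors as (2x)(2y), and x + y, which does not. A finite grid check can only refute independence; agreement at finitely many points merely suggests it.

```python
def f_product(x: float, y: float) -> float:
    """Hypothetical density 4xy on [0, 1]^2; factors as (2x)(2y)."""
    return 4.0 * x * y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def f_sum(x: float, y: float) -> float:
    """Hypothetical density x + y on [0, 1]^2; does not factor."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def marginal_x(f, x: float, n: int = 1000) -> float:
    h = 1.0 / n
    return sum(f(x, (j + 0.5) * h) for j in range(n)) * h


def marginal_y(f, y: float, n: int = 1000) -> float:
    h = 1.0 / n
    return sum(f((i + 0.5) * h, y) for i in range(n)) * h


def factors_on_grid(f, points: int = 11, tol: float = 1e-6) -> bool:
    """True if f(x, y) equals f_X(x) * f_Y(y), within tol, at every grid point."""
    grid = [k / (points - 1) for k in range(points)]
    fx = {x: marginal_x(f, x) for x in grid}
    fy = {y: marginal_y(f, y) for y in grid}
    return all(abs(f(x, y) - fx[x] * fy[y]) <= tol for x in grid for y in grid)


print(factors_on_grid(f_product))  # True
print(factors_on_grid(f_sum))      # False
```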
Example
Consider a 2-dimensional plane which contains a set of parallel lines which are a distance d
apart. Suppose we are throwing a needle of length L < d completely at random onto the plane.
Note that there are two possibilities of interest: either the needle intersects one of the lines or it
doesn't.
We wish to know the probability that the needle will intersect a line
(i.e. P(needle will intersect a line)).
To approach this type of problem, there is a standard four-step procedure:
1. Set up the sample space.
2. Describe a probability law.
3. Identify the set of interest.
4. Calculate.
The needle must land somewhere and we can specify where it lands by using the center of the
needle. We will let the random variable X denote the distance from the center of the needle to
the nearest line. To specify the orientation of the needle, we will let the random variable θ
denote the acute angle between the needle and the parallel lines in the plane.
We have the following picture
What are the possible values of X and θ?
Well, 0 ≤ X ≤ d/2 and 0 ≤ θ ≤ π/2.
What is the probability law?
We say that X and θ have uniform distributions: since the needle is thrown completely at random,
any value in each range should be as likely as any other. We also assume that X and θ are
independent.
What is the joint density function?
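$$f_{X,\theta}(x,\theta) = f_X(x)\cdot f_\theta(\theta) \quad \text{(because we assumed independence)}$$
where
fX(x) = 2/d for 0 ≤ x ≤ d/2 (since it is uniform and needs to integrate to 1), and
fθ(θ) = 2/π for 0 ≤ θ ≤ π/2.
So, fX,θ(x,θ) = 4/(πd) on the rectangle 0 ≤ x ≤ d/2, 0 ≤ θ ≤ π/2, and 0 elsewhere.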
To identify the set of interest, we return to the picture. Recall that the needle can either intersect
or not.
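Note that the needle intersects a line exactly when X < (L/2) sin θ: the half of the needle on the side
of the nearest line reaches a distance (L/2) sin θ from the center, measured perpendicular to the lines.
Because L < d, this distance never exceeds d/2, so the region below lies inside the rectangle where the
density equals 4/(πd).
Therefore,
$$\begin{aligned}
P(\text{needle intersects a line}) &= P\left(X < \tfrac{L}{2}\sin\theta\right) \\
&= \iint_{x < (L/2)\sin\theta} \frac{4}{\pi d}\,dx\,d\theta \\
&= \frac{4}{\pi d}\int_0^{\pi/2}\int_0^{(L/2)\sin\theta} dx\,d\theta \\
&= \frac{4}{\pi d}\int_0^{\pi/2}\frac{L}{2}\sin\theta\,d\theta \\
&= \frac{2L}{\pi d}\left(-\cos\theta\right)\Big|_0^{\pi/2} \\
&= \frac{2L}{\pi d}
\end{aligned}$$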
To recapitulate,
$$P(\text{needle intersects a line}) = \frac{2L}{\pi d}$$
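The answer can be checked by simulating the probability law set up above: draw X uniformly on [0, d/2] and θ uniformly on [0, π/2], independently, and count how often X < (L/2) sin θ. This is a minimal Monte Carlo sketch; the function name, the seed, and the values L = 1, d = 2 are arbitrary choices for illustration.

```python
import math
import random


def buffon_estimate(L: float, d: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(needle intersects a line), assuming 0 < L < d.

    X ~ Uniform[0, d/2] is the distance from the needle's center to the nearest line,
    theta ~ Uniform[0, pi/2] is the acute angle with the lines, and X, theta are independent.
    """
    if not 0 < L < d:
        raise ValueError("this sketch assumes 0 < L < d")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, d / 2)
        theta = rng.uniform(0.0, math.pi / 2)
        if x < (L / 2) * math.sin(theta):  # the needle reaches the nearest line
            hits += 1
    return hits / trials


L, d = 1.0, 2.0
estimate = buffon_estimate(L, d, trials=200_000)
exact = 2 * L / (math.pi * d)
print(f"simulated {estimate:.4f}, formula {exact:.4f}")
```

With 200,000 trials the simulated frequency typically lands within about 0.003 of 2L/(πd).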
We now turn to conditioning.
Recall that
$$P(x \le X \le x+\delta) \approx f_X(x)\cdot\delta$$
By analogy, we would like
$$P(x \le X \le x+\delta \mid Y \approx y) \approx f_{X|Y}(x|y)\cdot\delta$$
We define the conditional density by the formula
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0$$
The best way to think about the conditional density is to imagine that y has been fixed and to look
at the joint density as a function of x. In other words, if we are told the value of Y, what do we
know about X? The values of X will have a certain distribution. What does that distribution look
like? It has the same shape as the joint density along that slice (it is a rescaled slice of the joint density).
To get a picture for the concept fY|X(y|x), consider what happens if we fix x and integrate over all y.
Integrating the joint density over all y at a given value x yields fX(x), the marginal density of X at x.
This is the calculation that corresponded to the following picture.
By comparing the different slices, we can see how likely the different values of x are.
If we are interested in fY|X(y|x), how should we think about it? This refers to a universe
where X has taken on a particular value x. Various values of Y are still possible, and the shape of the
slice of the joint density at x tells us the relative likelihood of those Y values. This is the shape of the
conditional distribution of Y given X = x; to turn it into a density, the slice must be
renormalized so that the area under it is 1. Dividing by the total area of the slice, which is the
marginal density fX(x), accomplishes this renormalization and gives us the conditional density.
To recapitulate, we start with the joint density and take slices. Next, we rescale each slice so that
its total area is 1. This gives us the conditional density.
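Hence the definition,
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0$$
and, with the roles of X and Y interchanged, fY|X(y|x) = fX,Y(x,y)/fX(x) if fX(x) > 0.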
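The slice-and-renormalize picture translates directly into a computation: fix y, evaluate the joint density along the slice, and divide by the slice's area, which is fY(y). The sketch below assumes the hypothetical density f(x, y) = x + y on the unit square, for which fX|Y(x|y) = (x + y)/(y + 1/2); the helper name and the choice y = 0.4 are illustrative.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint density x + y on [0, 1]^2 (zero elsewhere)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def conditional_x_given_y(y: float, n: int = 1000):
    """Return the function x -> f_{X|Y}(x | y), built by renormalizing the slice at y."""
    h = 1.0 / n
    slice_area = sum(f((i + 0.5) * h, y) for i in range(n)) * h  # numerically f_Y(y)
    if slice_area <= 0.0:
        raise ValueError("f_Y(y) must be positive to condition on Y = y")
    return lambda x: f(x, y) / slice_area


cond = conditional_x_given_y(0.4)
h = 1.0 / 1000
area_after = sum(cond((i + 0.5) * h) for i in range(1000)) * h
print(area_after)             # 1.0 up to rounding: the rescaled slice has unit area
print(cond(0.2), 0.6 / 0.9)   # numerical vs exact (0.2 + 0.4) / (0.4 + 0.5)
```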
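We recall the definition of independence.
X and Y are called independent if and only if
$$f_{X,Y}(x,y) = f_X(x)f_Y(y)\quad\text{for all } x, y$$
Combining this with the definition of the conditional density, we see that if X and Y are independent,
then for every y with fY(y) > 0,
$$f_{X|Y}(x|y) = \frac{f_X(x)f_Y(y)}{f_Y(y)} = f_X(x)$$
This matches our intuition about conditioning: when X and Y are independent, learning the value of Y
does not change the distribution of X.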
Example:
Suppose we have a stick of length L. We break it at a completely random location X, keep the piece
[0, X], and then break that piece again at a completely random location Y.
That is, X has a uniform distribution on [0, L], and, given X = x, Y has a uniform distribution on [0, x].
We have the following picture:
What is the joint density function of these two random variables?
Well, by definition of conditional density function we have
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$$
meaning that
$$f_{X,Y}(x,y) = f_X(x)\,f_{Y|X}(y|x) = \frac{1}{L}\cdot\frac{1}{x}$$
Note that this formula holds only on the set 0 ≤ y ≤ x ≤ L; elsewhere the joint density is 0.
As a picture, the set is
Using the conditional density, we can calculate the conditional expectation of Y given X = x.
$$E[Y \mid X = x] = \int_0^x y\,f_{Y|X}(y|x)\,dy = \int_0^x y\cdot\frac{1}{x}\,dy = \frac{x}{2}$$
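One way to check E[Y | X = x] = x/2 is to simulate the two breaks and average Y over the runs in which X lands close to a chosen x. This is a Monte Carlo sketch; the stick length L = 1, the target x = 0.6, the window half-width 0.005, and the seed are arbitrary choices, and the narrow window only approximates conditioning on X = x exactly.

```python
import random


def conditional_mean_near(x0: float, L: float = 1.0, half_width: float = 0.005,
                          trials: int = 400_000, seed: int = 1) -> float:
    """Average of Y over simulated stick breaks whose first break X lands within
    half_width of x0; approximates E[Y | X = x0] for a narrow window."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        x = rng.uniform(0.0, L)              # first break: X ~ Uniform[0, L]
        if abs(x - x0) < half_width:
            total += rng.uniform(0.0, x)     # second break: Y | X = x ~ Uniform[0, x]
            count += 1
    if count == 0:
        raise ValueError("no samples landed in the window; widen it or add trials")
    return total / count


x0 = 0.6
estimate = conditional_mean_near(x0)
print(f"simulated {estimate:.4f}, formula x/2 = {x0 / 2:.4f}")
```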
Since we have the joint density, we can compute the marginal density of Y.
Recall the following:
$$f_{X,Y}(x,y) = \frac{1}{L}\cdot\frac{1}{x}\quad\text{on } 0 \le y \le x \le L$$
To find the density of Y at some value y (shown below), we integrate the joint density over x along
the interval shown below: for fixed y, the joint density is nonzero only for y ≤ x ≤ L.
So we integrate over x from x = y to x = L.
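We therefore have
$$\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx \\
&= \int_y^L \frac{1}{L}\cdot\frac{1}{x}\,dx \\
&= \frac{1}{L}\ln x\,\Big|_y^L \\
&= \frac{1}{L}\left(\ln L - \ln y\right) \\
&= \frac{1}{L}\ln\frac{L}{y}, \qquad 0 < y \le L
\end{aligned}$$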
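As a check on this marginal, the sketch below (i) integrates fY(y) = (1/L) ln(L/y) numerically over (0, L] to confirm that its total area is 1, and (ii) compares the integral of fY from 0 to L/4 with the fraction of simulated stick breaks for which Y ≤ L/4. The midpoint rule never evaluates the density at y = 0, where the logarithm blows up. L = 2, the grid size, and the seed are arbitrary choices for illustration.

```python
import math
import random

L = 2.0


def f_Y(y: float) -> float:
    """Marginal density of Y from the text: (1/L) ln(L/y) for 0 < y <= L, else 0."""
    return math.log(L / y) / L if 0.0 < y <= L else 0.0


def midpoint_integral(g, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


total_area = midpoint_integral(f_Y, 0.0, L)      # should be close to 1
p_formula = midpoint_integral(f_Y, 0.0, L / 4)   # P(Y <= L/4) from the density

rng = random.Random(2)
trials = 200_000
hits = 0
for _ in range(trials):
    x = rng.uniform(0.0, L)   # first break: X ~ Uniform[0, L]
    y = rng.uniform(0.0, x)   # second break: Y ~ Uniform[0, X]
    if y <= L / 4:
        hits += 1
p_simulated = hits / trials

print(total_area, p_formula, p_simulated)
```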
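Given this, we can compute the expected value of Y.
$$\begin{aligned}
E[Y] &= \int_{-\infty}^{\infty} y\,f_Y(y)\,dy \\
&= \int_0^L y\,f_Y(y)\,dy \\
&= \int_0^L y\cdot\frac{1}{L}\ln\frac{L}{y}\,dy
\end{aligned}$$
After we integrate by parts (with u = ln(L/y) and dv = y dy; the boundary term vanishes at both ends), we get
$$E[Y] = \frac{1}{L}\int_0^L \frac{y}{2}\,dy = \frac{1}{L}\cdot\frac{L^2}{4} = \frac{L}{4}$$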
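The value E[Y] = L/4 can be cross-checked two ways: by numerically integrating y·fY(y) over (0, L], and by simulating the two breaks directly and averaging Y. This is a sketch under arbitrary choices (L = 2, so the target is 0.5; the grid size and seed are also arbitrary).

```python
import math
import random

L = 2.0


def f_Y(y: float) -> float:
    """Marginal density of Y from the text: (1/L) ln(L/y) for 0 < y <= L, else 0."""
    return math.log(L / y) / L if 0.0 < y <= L else 0.0


# E[Y] as the integral of y * f_Y(y) over (0, L], by the midpoint rule.
n = 100_000
h = L / n
e_y_integral = sum((k + 0.5) * h * f_Y((k + 0.5) * h) for k in range(n)) * h

# Direct simulation: X ~ Uniform[0, L], then Y ~ Uniform[0, X].
rng = random.Random(3)
trials = 200_000
e_y_simulated = sum(rng.uniform(0.0, rng.uniform(0.0, L)) for _ in range(trials)) / trials

print(e_y_integral, e_y_simulated, L / 4)
```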