Now we wish to discuss the case where we have multiple random variables. Two random variables are jointly continuous if we can calculate probabilities by integrating a function, called the joint density function, over the set of interest:

$$P((X,Y) \in S) = \iint_S f_{X,Y}(x,y)\,dx\,dy$$

Recall the following reasoning from the single-variable case. Let $\Delta x$ be small and positive, and say we are interested in $P(x \le X \le x + \Delta x)$. We note that

$$P(x \le X \le x + \Delta x) = \int_x^{x+\Delta x} f_X(t)\,dt \approx f_X(x)\,\Delta x$$

This means that for sufficiently small $\Delta x$, we can say $P(x \le X \le x + \Delta x) \approx f_X(x)\,\Delta x$, or, for a small $\delta$, $P(x \le X \le x + \delta) \approx f_X(x)\,\delta$.

We will use similar reasoning to get an intuition for joint density functions. Suppose we are given an event $S$ and we wish to calculate its probability. We are given a function $f_{X,Y}$ (the joint density function); this is a surface that sits on top of the 2-dimensional plane. To calculate the probability, we look at the volume under the part of the surface that sits on top of $S$.

What should the total volume under the surface be? Volumes under the surface are probabilities, so the total volume must be 1:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1$$

Also, since we are talking about probabilities, the joint density must be a nonnegative function: $f_{X,Y}(x,y) \ge 0$.

Let $\delta$ be small and positive, and consider the small square $x \le X \le x + \delta$, $y \le Y \le y + \delta$. What is the probability of this small square? It is the volume under the part of the surface that sits on top of that square. Since $\delta$ is very small, the joint density $f_{X,Y}(x,y)$ does not change much over the square, so we can treat it as a constant. The volume is then the height times the area of the base: the base has area $\delta^2$ and the height is $f_{X,Y}(x,y)$. Therefore

$$P(x \le X \le x + \delta,\ y \le Y \le y + \delta) \approx f_{X,Y}(x,y)\,\delta^2$$

Thus, the joint density function gives us probability per unit area.
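The two properties above, total volume 1 and $P(\text{small square}) \approx f_{X,Y}(x,y)\,\delta^2$, can be checked numerically. The sketch below is a minimal Python illustration that assumes the example density $f_{X,Y}(x,y) = x + y$ on the unit square (not a density from these notes; it is nonnegative and integrates to 1) and approximates volumes with a midpoint rule. The names `joint_density` and `volume` are hypothetical.

```python
# Minimal sketch, assuming the example density f(x, y) = x + y on [0, 1]^2
# (chosen for illustration; it is nonnegative and integrates to 1).


def joint_density(x, y):
    """Example joint density: x + y on the unit square, zero elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def volume(f, x0, x1, y0, y1, n=200):
    """Midpoint-rule approximation of the volume under f over [x0, x1] x [y0, y1]."""
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f(x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
    return total * hx * hy


# Total volume under the surface should be 1.
print(volume(joint_density, 0.0, 1.0, 0.0, 1.0))

# Probability of a small delta-by-delta square versus f(x, y) * delta**2.
x, y, delta = 0.3, 0.6, 0.01
exact = volume(joint_density, x, x + delta, y, y + delta)
approx = joint_density(x, y) * delta**2
print(exact, approx, exact / approx)
```

For this density the exact probability of the square is $\delta^2(x + y + \delta)$, so the printed ratio is about 1.011 for $\delta = 0.01$ and approaches 1 as $\delta$ shrinks. The midpoint rule happens to be exact for a density that is linear in each variable, which keeps this check free of discretization error.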
With the joint density function in hand, we can calculate things like expectations using

$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy$$

This is completely analogous to the formula for the discrete case, with the double sum replaced by a double integral.

Next, recall that $f_X(x)\,\delta \approx P(x \le X \le x + \delta)$. We want a formula for the marginal density $f_X(x)$ in terms of the joint density $f_{X,Y}(x,y)$. To get one, we use the joint density to compute the probability that $X$ falls in the interval $[x, x + \delta]$. This is the probability that $(X,Y)$ falls in the following strip: the vertical band above the interval $[x, x + \delta]$, extending over all values of $y$. To find that probability we take the integral

$$P(x \le X \le x + \delta) = \int_{-\infty}^{\infty}\int_x^{x+\delta} f_{X,Y}(s,y)\,ds\,dy$$

Since $s$ varies only over a short interval, the integrand of the inner integral is approximately constant in $s$, so the inner integral is approximately $\delta\, f_{X,Y}(x,y)$:

$$P(x \le X \le x + \delta) \approx \int_{-\infty}^{\infty} \delta\, f_{X,Y}(x,y)\,dy$$

But recall that $P(x \le X \le x + \delta) \approx f_X(x)\,\delta$. Equating these two expressions gives

$$f_X(x)\,\delta \approx \int_{-\infty}^{\infty} \delta\, f_{X,Y}(x,y)\,dy$$

and cancelling $\delta$ we obtain the formula

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$$

Now we can describe independence: $X$ and $Y$ are called independent if and only if

$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$$

for all $x$ and $y$.

Example. Consider a 2-dimensional plane containing a set of parallel lines a distance $d$ apart. Suppose we throw a needle of length $L < d$ completely at random onto the plane. There are two possibilities of interest: either the needle intersects one of the lines or it doesn't. We wish to find the probability that the needle intersects a line, $P(\text{needle intersects a line})$.

To approach this type of problem, there is a standard four-step procedure:

1. Set up the sample space.
2. Describe a probability law.
3. Identify the set of interest.
4. Calculate.

The needle must land somewhere, and we can specify where it lands by the position of its center. We let the random variable $X$ denote the distance from the center of the needle to the nearest line.
To specify the orientation of the needle, we let the random variable $\Theta$ denote the acute angle formed by the needle and a line parallel to the lines in the plane. We have the following picture.

What are the possible values of $X$ and $\Theta$? They are $0 \le X \le d/2$ and $0 \le \Theta \le \pi/2$.

What is the probability law? We say that $X$ and $\Theta$ are uniformly distributed: since the needle is thrown completely at random, any value should be as likely as any other. We also take $X$ and $\Theta$ to be independent.

What is the joint density function? By independence, $f_{X,\Theta}(x,\theta) = f_X(x)\,f_\Theta(\theta)$, where $f_X(x) = 2/d$ on $[0, d/2]$ (it is uniform and must integrate to 1) and $f_\Theta(\theta) = 2/\pi$ on $[0, \pi/2]$. So

$$f_{X,\Theta}(x,\theta) = \frac{4}{d\pi}, \qquad 0 \le x \le d/2,\ 0 \le \theta \le \pi/2$$

To identify the set of interest, we return to the picture. Each half of the needle extends a distance $(L/2)\sin\Theta$ from the center in the direction perpendicular to the lines, so the needle intersects a line exactly when $X < (L/2)\sin\Theta$. Therefore

$$\begin{aligned}
P(\text{needle intersects a line}) &= P\left(X < \tfrac{L}{2}\sin\Theta\right) = \iint_{x < \frac{L}{2}\sin\theta} \frac{4}{d\pi}\,dx\,d\theta \\
&= \frac{4}{d\pi}\int_0^{\pi/2}\int_0^{\frac{L}{2}\sin\theta} dx\,d\theta
= \frac{4}{d\pi}\int_0^{\pi/2} \frac{L}{2}\sin\theta\,d\theta \\
&= \frac{2L}{d\pi}\Big[-\cos\theta\Big]_0^{\pi/2} = \frac{2L}{d\pi}
\end{aligned}$$

To recapitulate, $P(\text{needle intersects a line}) = \dfrac{2L}{d\pi}$.

We now turn to conditioning. Recall that $P(x \le X \le x + \delta) \approx f_X(x)\,\delta$. By analogy, we would like

$$P(x \le X \le x + \delta \mid Y \approx y) \approx f_{X|Y}(x \mid y)\,\delta$$

We define the conditional density by the formula

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0$$

The best way to think about the conditional density is that $y$ has been fixed and we look at the joint density as a function of $x$. In other words, if we are told what $Y$ is, what do we know about $X$? The values of $X$ will have a certain distribution. What does that distribution look like? It has the same shape as the joint density along the slice at that value of $y$ (it is a slice of the joint density, rescaled).

To get a picture for the concept $f_{Y|X}(y \mid x)$, consider fixing $x$ and integrating the joint density over all $y$. What do we get? Integrating over all $y$ with $x$ held fixed gives the marginal density of $X$ at $x$.
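The marginal, expectation, and conditional-density formulas above can be tried out on a concrete density. This is a minimal sketch that assumes the illustrative density $f_{X,Y}(x,y) = x + y$ on $[0,1]^2$ (an assumption, not from these notes) and computes integrals with a midpoint rule; all function names are hypothetical. `conditional_x_given_y` fixes $y$ first and returns the renormalized slice as a function of $x$, mirroring the slice picture.

```python
# Minimal sketch, assuming the example density f(x, y) = x + y on [0, 1]^2
# (not from the notes). Integrals use the midpoint rule.

N = 1000  # number of midpoint cells per axis (hypothetical choice)


def joint_density(x, y):
    """Example joint density: x + y on the unit square, zero elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def midpoints(n=N):
    """Midpoints of n equal cells on [0, 1]."""
    return [(k + 0.5) / n for k in range(n)]


def marginal_x(x, n=N):
    """f_X(x): integrate the joint density over y."""
    return sum(joint_density(x, y) for y in midpoints(n)) / n


def marginal_y(y, n=N):
    """f_Y(y): integrate the joint density over x."""
    return sum(joint_density(x, y) for x in midpoints(n)) / n


def expectation(g, n=400):
    """E[g(X, Y)]: double integral of g(x, y) * f(x, y) over the unit square."""
    pts = midpoints(n)
    return sum(g(x, y) * joint_density(x, y) for x in pts for y in pts) / n**2


def conditional_x_given_y(y):
    """Fix y, take the slice x -> f(x, y), and divide by its area f_Y(y)."""
    area = marginal_y(y)
    if area <= 0.0:
        raise ValueError("f_Y(y) must be positive to condition on Y = y")
    return lambda x: joint_density(x, y) / area


def is_independent_at(points, tol=1e-6):
    """Check f(x, y) == f_X(x) * f_Y(y) at each given (x, y) point."""
    return all(abs(joint_density(x, y) - marginal_x(x) * marginal_y(y)) < tol
               for x, y in points)
```

For example, `marginal_x(0.25)` is about 0.75 (here $f_X(x) = x + 1/2$), and `conditional_x_given_y(0.3)(0.2)` is about $0.5/0.8 = 0.625$. Because $f_{X,Y}(0,0) = 0$ while $f_X(0)\,f_Y(0) = 1/4$, `is_independent_at([(0.0, 0.0)])` returns `False`, so this $X$ and $Y$ are not independent.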
This integration over $y$ corresponds to the following picture. By comparing the areas of the different slices, we can see how likely the different values of $x$ are. If we are interested in $f_{Y|X}(y \mid x)$, how should we think about it? It refers to a universe where $X$ has taken on a certain value $x$. Various values of $Y$ can still occur, and the slice of the joint density at that $x$ has a shape that tells us the relative likelihood of the various $Y$ values. This is the shape of the conditional distribution of $Y$ given that the particular value $x$ has occurred; to turn it into the conditional density, the slice must be renormalized so that the area under it is 1. Dividing by the total area of the slice (that is, by the marginal density $f_X(x)$) accomplishes this renormalization, and we have the conditional density.

To recapitulate: we start with the joint density and take slices. Next, we rescale each slice so that its total area is 1. This gives us the conditional density, hence the definition

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0$$

Recall the definition of independence: $X$ and $Y$ are independent if and only if $f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$. Combining this with the definition of the conditional density, we have that if $X$ and $Y$ are independent, then

$$f_{X|Y}(x \mid y) = f_X(x) \quad \text{whenever } f_Y(y) > 0$$

That is, learning the value of $Y$ tells us nothing new about $X$, which solidifies our intuition about conditioning.

Example. Suppose we have a stick of length $L$. We break it at a completely random location $X$, and then break the piece $[0, X]$ again at a completely random location $Y$. So $X$ is uniformly distributed on $[0, L]$ and, given $X = x$, $Y$ is uniformly distributed on $[0, x]$. We have the following picture.

What is the joint density function of these two random variables? By the definition of the conditional density,

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$$

meaning that

$$f_{X,Y}(x,y) = f_X(x)\,f_{Y|X}(y \mid x) = \frac{1}{L}\cdot\frac{1}{x}$$

This formula holds only on the set $0 \le y \le x \le L$; the joint density is zero elsewhere. As a picture, the set is the triangle below the diagonal $y = x$ in the square $[0, L] \times [0, L]$. With the joint density we can calculate the conditional expectation of $Y$ given $X = x$.
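$$E[Y \mid X = x] = \int_0^x y\, f_{Y|X}(y \mid x)\,dy = \int_0^x y\cdot\frac{1}{x}\,dy = \frac{x}{2}$$

Since we have the joint density, we can also compute the marginal density of $Y$. Recall that

$$f_{X,Y}(x,y) = \frac{1}{Lx} \quad \text{on } 0 \le y \le x \le L$$

To find the density of $Y$ at some value $y$ (shown below), we integrate over $x$ along the horizontal segment of the triangle at height $y$, that is, from $x = y$ to $x = L$. We therefore have

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_y^L \frac{1}{Lx}\,dx = \frac{1}{L}\ln x\,\Big|_y^L = \frac{1}{L}(\ln L - \ln y) = \frac{1}{L}\ln\frac{L}{y}, \qquad 0 < y \le L$$

Given this, we can compute the expected value of $Y$:

$$E[Y] = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = \int_0^L y\,f_Y(y)\,dy = \int_0^L y\,\frac{1}{L}\ln\frac{L}{y}\,dy$$

Integrating by parts with $u = \ln(L/y)$ and $dv = y\,dy$ (so $du = -dy/y$ and $v = y^2/2$),

$$\int_0^L y\ln\frac{L}{y}\,dy = \left[\frac{y^2}{2}\ln\frac{L}{y}\right]_0^L + \int_0^L \frac{y}{2}\,dy = 0 + \frac{L^2}{4}$$

since $y^2 \ln(L/y) \to 0$ as $y \to 0$. Therefore $E[Y] = \dfrac{1}{L}\cdot\dfrac{L^2}{4} = \dfrac{L}{4}$.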
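The stick-breaking results can be checked by simulation. Below is a minimal Monte Carlo sketch using Python's standard `random` module, assuming $L = 1$, a fixed seed, and the hypothetical helper names `simulate_stick` and `cdf_y`. It estimates $E[Y]$, $E[Y \mid X \approx 0.8]$ (by averaging over samples whose $X$ falls in a thin window), and $P(Y \le L/2)$, comparing the last against the CDF obtained by integrating $f_Y$: $P(Y \le a) = \frac{a}{L}\left(1 + \ln\frac{L}{a}\right)$.

```python
import math
import random


def simulate_stick(L=1.0, trials=200_000, seed=0):
    """Return (x, y) pairs: X ~ Uniform[0, L], then Y given X = x ~ Uniform[0, x]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    pairs = []
    for _ in range(trials):
        x = rng.uniform(0.0, L)  # first break point
        y = rng.uniform(0.0, x)  # second break point, on the piece [0, x]
        pairs.append((x, y))
    return pairs


def cdf_y(a, L=1.0):
    """P(Y <= a) = (a / L) * (1 + ln(L / a)), from integrating f_Y(y) = ln(L / y) / L."""
    return (a / L) * (1.0 + math.log(L / a))


if __name__ == "__main__":
    L = 1.0
    pairs = simulate_stick(L)
    ys = [y for _, y in pairs]
    print("E[Y]:", sum(ys) / len(ys), "vs", L / 4)

    # Approximate conditioning on X = 0.8 with a thin window around 0.8.
    window = [y for x, y in pairs if abs(x - 0.8) < 0.02]
    print("E[Y | X ~ 0.8]:", sum(window) / len(window), "vs", 0.8 / 2)

    # Check the marginal f_Y through its CDF at a = L / 2.
    frac = sum(1 for y in ys if y <= L / 2) / len(ys)
    print("P(Y <= L/2):", frac, "vs", cdf_y(L / 2, L))
```

The estimate of $E[Y]$ should be close to $L/4$, which also agrees with averaging the conditional expectation: $E[Y] = E[X/2] = L/4$. A simulation cannot condition on an exact value of the continuous $X$, so the thin window is only an approximation to conditioning on $X = x$.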