Probability Theory
Chapter 2: Conditioning
Thommy Perlinger, Probability Theory

Conditional distributions

The discrete case. Let the random variables X and Y have joint probability function p_{X,Y}(x,y) and marginal probability functions p_X(x) and p_Y(y). The conditional probability function of Y given that X = x is

    p_{Y|X=x}(y) = p_{X,Y}(x,y) / p_X(x),   for p_X(x) > 0.

The continuous case. Let the random variables X and Y have joint density function f_{X,Y}(x,y) and marginal density functions f_X(x) and f_Y(y). The conditional density function of Y given that X = x is

    f_{Y|X=x}(y) = f_{X,Y}(x,y) / f_X(x),   for f_X(x) > 0.

Problem 2.6.1

Let X and Y be independent Exp(1)-distributed random variables. Find the conditional distribution of X given that X + Y = c.

Due to the independence, the joint density function of X and Y is

    f_{X,Y}(x,y) = e^{-x} · e^{-y} = e^{-(x+y)},   x, y > 0.

Let U = X + Y and V = X. The problem can now be expressed as: determine the joint density function of U = (U,V)′ where U = X + Y and V = X. It is obvious that this is a bijection, and therefore Theorem 1.2.1 (the Transformation Theorem) is applicable. Inversion yields

    X = V,   Y = U − V,

with Jacobian |J| = 1. The Transformation Theorem then gives us the joint distribution of U and V:

    f_{U,V}(u,v) = e^{-u},   0 < v < u.

First, it can be proven that the sum of two independent Exp(1) variables is Γ(2,1), that is U ∈ Γ(2,1); see Problem 1.3.39. Hence

    f_U(u) = u e^{-u},   u > 0,

and now it follows that

    f_{V|U=c}(v) = f_{U,V}(c,v) / f_U(c) = e^{-c} / (c e^{-c}) = 1/c,   0 < v < c.

We thus have that V | U = c ∈ U(0,c): given that X + Y = c, X is uniformly distributed on (0,c).

Conditional expectations

A conditional probability distribution is in itself a proper probability distribution, which means that we can define expectations in the usual manner.

When facing a random variable Y we are often interested in determining E(Y) and Var(Y). A relevant question is therefore whether these expectations can be computed solely using f_{Y|X=x}(y) and f_X(x), which, in that case, means that we do not first have to find the marginal density f_Y(y).
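The result of Problem 2.6.1 can be illustrated numerically. The following is a minimal Monte Carlo sketch (not from the slides, and assuming NumPy is available): the conditioning event {X + Y = c} has probability zero, so it is approximated by the thin slab {|X + Y − c| < eps}. If the result is correct, the retained X-values should behave like a U(0,c) sample.

```python
import numpy as np

# Monte Carlo sketch: for independent X, Y ~ Exp(1), check that the
# conditional distribution of X given X + Y = c is uniform on (0, c).
# The zero-probability event {X + Y = c} is approximated by a thin slab.

rng = np.random.default_rng(0)
c, eps, n = 2.0, 0.02, 2_000_000

x = rng.exponential(1.0, n)
y = rng.exponential(1.0, n)
kept = x[np.abs(x + y - c) < eps]

# If X | X+Y=c ~ U(0, c), the conditional mean is c/2 and the
# conditional variance is c^2/12.
print(kept.mean())   # close to c/2 = 1.0
print(kept.var())    # close to c^2/12 ~ 0.333
```

Tightening `eps` (at the cost of fewer retained samples) makes the slab a better approximation of the exact conditioning event.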
Conditional expectations (cont.)

Let us look closer at the conditional expectation E(Y | X = x). In the discrete case we get

    E(Y | X = x) = Σ_y y · p_{Y|X=x}(y).

This conditional expectation can be regarded as a function of x, say h(x) = E(Y | X = x). If we let x vary over all its values, we can thus see the conditional expectation as a function of the random variable X, that is

    h(X) = E(Y | X).

A function of X is itself a random variable, and interesting things happen when we determine the expectation and the variance of h(X). So what about E[h(X)]?

Theorem 2.1. Let X and Y be two random variables where E|Y| < ∞. It then holds that

    E(Y) = E[E(Y | X)].

On the right-hand side the inner expectation concerns the conditional distribution f_{Y|X=x}(y), whereas the outer expectation regards f_X(x).

Conditional variance

Definition 2.2. Let X and Y be two random variables. The conditional variance of Y given that X = x is

    Var(Y | X = x) = E[(Y − E(Y | X = x))² | X = x].

The conditional variance is (also) a function of x, v(x), and the corresponding random variable is v(X) = Var(Y | X).

Corollary 2.3.1. Let X and Y be two random variables where E(Y²) < ∞. It then holds that

    Var(Y) = E[Var(Y | X)] + Var[E(Y | X)].

Example

Let X and Y be random variables such that Y | X = x ∈ U(0,x) and X ∈ Exp(λ). We want to determine E(Y) and Var(Y). The marginal density of Y is given by

    f_Y(y) = ∫_y^∞ (1/x) · (1/λ) e^{-x/λ} dx,   y > 0,

which is a tough nut to crack. It is therefore difficult to determine E(Y) and Var(Y) via the marginal density function of Y. We can, however, use conditional expectations to solve the problem.

Example (cont.)

Because E(Y | X = x) = x/2, it follows that E(Y | X) = X/2. Theorem 2.1 and the properties of Exp(λ) (mean λ) thus yield

    E(Y) = E[E(Y | X)] = E(X/2) = λ/2,

and the sought expectation was easily found. The variance is a little bit tougher, but not by much.
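Theorem 2.1 can be checked numerically on the example above. The following is a minimal Monte Carlo sketch (not from the slides, assuming NumPy): it estimates E(Y) both directly from a Y-sample and via the tower property E[E(Y | X)] = E(X/2).

```python
import numpy as np

# Monte Carlo sketch of Theorem 2.1, E(Y) = E[E(Y | X)], for the example
# Y | X=x ~ U(0, x) and X ~ Exp(lam). Here Exp(lam) is parametrized by
# its MEAN lam, matching the text's convention.

rng = np.random.default_rng(1)
lam, n = 3.0, 1_000_000

x = rng.exponential(lam, n)   # X ~ Exp(lam), E(X) = lam
y = rng.uniform(0.0, x)       # Y | X=x ~ U(0, x)

direct = y.mean()             # E(Y) estimated from the Y-sample
tower = (x / 2).mean()        # E[E(Y | X)] = E(X/2)

# Both estimates should be near lam/2 = 1.5.
print(direct, tower)
```

The two estimates agree (up to Monte Carlo error), without ever touching the awkward marginal density f_Y(y).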
Example (cont.)

Because Var(Y | X = x) = x²/12, it follows that Var(Y | X) = X²/12, and so

    E[Var(Y | X)] = E(X²)/12 = 2λ²/12 = λ²/6,

and, due to the fact that E(Y | X) = X/2, it further follows that

    Var[E(Y | X)] = Var(X/2) = Var(X)/4 = λ²/4,

and, finally, it follows from Corollary 2.3.1 that

    Var(Y) = E[Var(Y | X)] + Var[E(Y | X)] = λ²/6 + λ²/4 = 5λ²/12.

Hierarchic models: distributions with random parameters

Let X be a random variable whose probability distribution depends on a parameter M. We further assume that M is itself a random variable. The probability distribution of X that is known is therefore in fact a conditional distribution, f_{X|M=m}(x), where M has probability distribution f_M(m).

In these situations we speak of a hierarchic model. It is of natural interest to find the marginal (or "unconditional") distribution of X. This can be done in two steps.

1. We first find the joint probability distribution via f_{X,M}(x,m) = f_{X|M=m}(x) · f_M(m).
2. We then find the marginal distribution of X, f_X(x), by integrating (or summing) over m.

Exercise 3.3

Suppose X has a normal distribution such that the mean is zero and the inverse of the variance is gamma distributed, viz.,

    X | Σ² = y ∈ N(0, y)   with   Σ⁻² ∈ Γ(n/2, 2/n).

Show that X ∈ t(n). Integrating the joint density over the inverse variance t = 1/y gives

    f_X(x) = ∫₀^∞ √(t/2π) e^{-tx²/2} · ((n/2)^{n/2}/Γ(n/2)) t^{n/2−1} e^{-nt/2} dt.

The integrand shows similarities with the density function of Γ((n+1)/2, 2/(n+x²)), which means that we try to manipulate it into a full density function. Doing so leaves

    f_X(x) = (Γ((n+1)/2) / (√(nπ) Γ(n/2))) · (1 + x²/n)^{-(n+1)/2},

and it is clear that X ∈ t(n).

Exercise 3.3 (extended)

As an extra exercise we use Theorem 2.1 and Corollary 2.3.1 to find the mean and variance of the t(n)-distribution. Since E(X | Σ² = y) = 0, Theorem 2.1 immediately gives E(X) = 0. The variance is a little trickier:

    Var(X) = E[Var(X | Σ²)] + Var[E(X | Σ²)] = E(Σ²) + 0 = E(Σ²).

This expectation/integral is solved using our "usual" technique.
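The hierarchic construction in Exercise 3.3 can be simulated directly. The following is a minimal sketch (not from the slides, assuming NumPy): draw the inverse variance from Γ(n/2, 2/n), draw X conditionally normal, and compare the sample with a reference t(n) sample.

```python
import numpy as np

# Monte Carlo sketch of the hierarchic model in Exercise 3.3:
# tau = 1/Sigma^2 ~ Gamma(shape n/2, scale 2/n), then X | tau ~ N(0, 1/tau).
# The marginal of X should be t(n).

rng = np.random.default_rng(2)
n_df, m = 10, 1_000_000

tau = rng.gamma(shape=n_df / 2, scale=2 / n_df, size=m)  # inverse variance
x = rng.normal(0.0, 1.0 / np.sqrt(tau))                  # X | tau ~ N(0, 1/tau)
t = rng.standard_t(n_df, size=m)                         # reference t(n) sample

# Both samples should have mean ~0 and variance ~ n/(n-2) = 1.25 for n = 10.
print(x.mean(), x.var())
print(t.mean(), t.var())
```

The agreement of the low-order moments (and, if one wishes, of histograms) illustrates the normal-with-gamma-inverse-variance mixture representation of the t-distribution.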
Exercise 3.3 (extended, cont.)

Writing E(Σ²) = E(1/τ) with τ ∈ Γ(n/2, 2/n), the integrand shows similarities with the density function of Γ((n−2)/2, 2/n), so for n > 2,

    Var(X) = E(Σ²) = ((n/2)^{n/2}/Γ(n/2)) · Γ((n−2)/2) (2/n)^{(n−2)/2} = n/(n−2),

which is the well-known variance of the t(n)-distribution.

The Bayesian approach

The probability distribution of a random variable is often parametrized, that is, it depends on the value(s) of one (or more) parameter(s). In the Bayesian approach the parameters are not completely unknown.

The parameters are considered to be random variables, and their probability distributions represent our prior knowledge about them. Such a distribution is called the prior (or a priori) distribution of the parameter.

The Bayesian way to perform statistical inference is to use the result from a random sample to update the prior distribution of the parameter. This updated version is called the posterior (or a posteriori) distribution of the parameter. We find the posterior distribution of the parameter by taking our findings about conditional distributions one step further.

Let X be a random variable whose probability distribution depends on a parameter M. The probability distribution of X is therefore a conditional distribution, f_{X|M=m}(x). Our prior knowledge about M is represented by the prior f_M(m). Given that X = x, we are interested in finding the posterior density f_{M|X=x}(m), which we do by using the fact that

    f_{M|X=x}(m) = f_{X,M}(x,m) / f_X(x) = f_{X|M=m}(x) f_M(m) / f_X(x).

Exercise 3.3 (extended, cont.)

We continue (and further extend) Exercise 3.3 and determine the posterior distribution of the inverse variance Σ⁻² given that X = x. Using the results so far, the posterior density of t = 1/y is proportional to

    √t e^{-tx²/2} · t^{n/2−1} e^{-nt/2} = t^{(n+1)/2−1} e^{-t(n+x²)/2}.

That is, Σ⁻² | X = x ∈ Γ((n+1)/2, 2/(n+x²)).

The regression function

Definition 5.1. Let X₁, X₂, …, Xₙ and Y be jointly distributed random variables, and set

    h(x₁, x₂, …, xₙ) = E(Y | X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ).

The function h is called the regression function of Y on X.

Definition 5.2. A predictor for Y based on X is a function d(X). The predictor is said to be linear if d is linear, that is, if

    d(X) = a₀ + a₁X₁ + ⋯ + aₙXₙ,

where a₀, a₁, …, aₙ are constants.
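The posterior computation in the extended Exercise 3.3 can be verified numerically. The following is a minimal sketch (not from the slides, assuming NumPy): evaluate likelihood × prior on a grid of inverse-variance values, normalize numerically, and compare the grid-based posterior mean with the closed-form mean of Γ((n+1)/2, 2/(n+x²)), which is shape × scale = (n+1)/(n+x²).

```python
import numpy as np

# Numerical check of the posterior in Exercise 3.3 (extended):
# Sigma^{-2} | X=x ~ Gamma((n+1)/2, 2/(n+x^2)).
# Build the unnormalized posterior as likelihood * prior on a grid.

n, x = 5, 1.3
t = np.linspace(1e-6, 60.0, 400_000)   # grid for tau = 1/Sigma^2
dt = t[1] - t[0]

prior = t ** (n / 2 - 1) * np.exp(-n * t / 2)   # Gamma(n/2, 2/n) kernel
like = np.sqrt(t) * np.exp(-t * x**2 / 2)       # N(0, 1/tau) at x, up to a constant
post = prior * like
post /= post.sum() * dt                         # normalize numerically

grid_mean = (t * post).sum() * dt
closed_form = ((n + 1) / 2) * (2 / (n + x**2))  # Gamma mean = shape * scale

print(grid_mean, closed_form)   # the two means should agree closely
```

Since both the prior kernel and the likelihood are only needed up to constants, the normalization step absorbs every factor not depending on t, exactly as in the analytic derivation.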