Outline

• Random Variables
• Functions of Random Variables: Towards Least Squares
• Distributions: Binomial Distribution, Poisson Distribution

Practical Scenario

The adjoining signals are from independent channels, each giving information. Can we conclude anything at all from the values? Are they biased?

Law of Large Numbers

• Let $\{X_n\}$ be the sequence of measurements we take every time.
• Does the measurement average $\bar{X}$ converge to a fixed value?
• How do we define convergence in probability spaces?
• The Law of Large Numbers gives an answer!

Let $E(X_i) = \mu_i$, $V(X_i) = \sigma_i^2$, and $\mathrm{Cov}(X_i, X_j) = 0$ for $i \neq j$. If

$$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 = 0,$$

then

$$\frac{1}{n} \sum_{i=1}^{n} X_i \;\xrightarrow{P}\; \frac{1}{n} \sum_{i=1}^{n} \mu_i.$$

This is the hieroglyphic way of saying: the sample mean tends to the mean value being measured. And it does converge!

Central Limit Theorem

• In applications, limiting distributions are required for further analysis of experimental data.
• The random variable $X_n$ stands for a statistic computed from a sample of size $n$.
• The actual distribution of such an RV is often difficult to find.
• In such cases, we approximate the distribution of such RVs in practice by limiting distributions.
• This is justified, for large values of $n$, by the Central Limit Theorem.

As was shown practically in the last class, for a large number of samples the distribution is approximately Gaussian (under mild assumptions). This fact is called the "Central Limit Theorem".
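Both limit theorems are easy to check numerically. A minimal sketch, assuming NumPy is available; the exponential source distribution, the seed, and the sample sizes are arbitrary illustrative choices.

```python
# Monte Carlo check of the LLN and the CLT.
import numpy as np

rng = np.random.default_rng(0)
mu = 1.0  # mean of Exp(1); its variance is also 1

# LLN: the running sample mean settles toward the true mean.
x = rng.exponential(mu, size=100_000)
for n in (10, 1_000, 100_000):
    print(f"n={n:>7}: sample mean = {x[:n].mean():.4f} (true mean = {mu})")

# CLT: standardized sums of n iid draws look standard normal.
n, trials = 50, 20_000
sums = rng.exponential(mu, size=(trials, n)).sum(axis=1)
z = (sums - n * mu) / np.sqrt(n * mu**2)  # Var(Exp(mu)) = mu^2
print(f"standardized sums: mean {z.mean():.3f}, var {z.var():.3f}, "
      f"P(|Z|<1) = {np.mean(np.abs(z) < 1):.3f}")  # approx 0, 1, 0.683
```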
Expectation Revisited: Two Random Variables, Important Definitions

The marginal probability density functions of the R.V.s $X$ and $Y$ are

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx.$$

Expectations can be taken against the marginal or the joint density:

$$E(X) = \int x f_X(x)\,dx = \iint x f_{X,Y}(x,y)\,dy\,dx, \qquad E(Y) = \int y f_Y(y)\,dy = \iint y f_{X,Y}(x,y)\,dx\,dy,$$

$$E(g(X)) = \iint g(x)\, f_{X,Y}(x,y)\,dy\,dx, \qquad E(h(Y)) = \iint h(y)\, f_{X,Y}(x,y)\,dx\,dy,$$

$$E(g(X,Y)) = \iint g(x,y)\, f_{X,Y}(x,y)\,dx\,dy.$$

What about $g(X,Y) = aX + bY$? By the formulas above, $E(aX + bY) = aE(X) + bE(Y)$: expectation is linear.

Multidimensional Random Variables

Joint probability distribution function:

$$F(\mathbf{x}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]$$

Joint probability density function:

$$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{x})}{\partial x_1\, \partial x_2 \cdots \partial x_n}$$

Marginal probability functions: a marginal probability function is obtained by integrating out the variables that are of no interest.

Multivariate Expectations

Mean vector:

$$E[\mathbf{x}] = [\, E[x_1] \;\; E[x_2] \;\; \dots \;\; E[x_n] \,]^T$$

The expected value of $g(x_1, x_2, \dots, x_n)$ is given by

$$E[g(\mathbf{x})] = \sum_{x_n} \cdots \sum_{x_1} g(\mathbf{x}) f(\mathbf{x}) \quad \text{or} \quad \int_{x_n} \cdots \int_{x_1} g(\mathbf{x}) f(\mathbf{x})\,d\mathbf{x}.$$

Covariance matrix:

$$\mathrm{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T,$$

where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.

NOTE: the covariance factors into standard deviations and correlations, $P = D R D$:

$$P = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n \end{bmatrix} \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & & \vdots \\ \vdots & & \ddots & \\ \rho_{n1} & \cdots & & 1 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n \end{bmatrix}.$$

Gaussian or Normal Distribution

The normal distribution is the most widely known and used distribution in the field of statistics. Many natural phenomena can be approximated by the normal distribution (owing to the Central Limit Theorem).

Normal density function:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: the bell curve, peak value $0.399/\sigma$ at $x = \mu$, with markers at $\mu \pm \sigma$ and $\mu \pm 2\sigma$.]

An important trick (normalization):

$$I = \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}$$

Two Dimensional Gaussian R.V.

Two random variables $X$ and $Y$ are said to be jointly Gaussian if their density is of the form

$$f(x,y) = A\, e^{-(ax^2 + bxy + cy^2 + dx + ey)},$$

where $ax^2 + bxy + cy^2 + dx + ey$ is a quadratic form which is positive semidefinite in general. This density can be written in the form

$$f(x,y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}} \exp\!\left( -\frac{1}{2(1-r^2)} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2r(x-\mu_1)(y-\mu_2)}{\sigma_1 \sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right),$$

with $\mu_1, \mu_2, \sigma_1, \sigma_2, r$ as the parameters of the distribution, $\sigma_i > 0$ and $|r| < 1$. (A numerical sketch that samples this density appears at the end of this part.)

Positive Semidefinite Matrices

A symmetric matrix $A$ is said to be positive semidefinite if any one of the following equivalent conditions is satisfied (the strict versions in parentheses characterize positive definiteness):

1. All eigenvalues of $A$ are positive or zero (all positive).
2. There exists a matrix $A_1$, possibly singular (nonsingular), such that $A = A_1 A_1^T$ (Cholesky decomposition).
3. Every principal minor of $A$ is nonnegative (every leading principal minor is positive).
4. $x^T A x \ge 0$ for all $x$ ($x^T A x \ge a|x|^2$ for all $x$ and some $a > 0$).

An additional useful property is that $A = U D U^T$, where $U$ is an orthogonal matrix and $D$ is diagonal. We can use the above to show that the covariance matrix is at least positive semidefinite, a little proof: for any fixed vector $a$,

$$a^T P a = E[(a^T(\mathbf{x} - \boldsymbol{\mu}))^2] \ge 0.$$

Geometrical Interpretation

Consider the positive definite quadratic form again:

$$f = \mathbf{x}^T A \mathbf{x}.$$

When $n = 3$, with the special choice of principal coordinates, the surface $f = 1$ becomes

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

The gradient is $\nabla f = 2A\mathbf{x}$. The geometry of ellipsoids tells us that the principal axes are normal to the surface, hence proportional to the gradient. Let the unknown proportionality constants be $2\lambda$. Then, for the principal directions $\mathbf{x}$,

$$2\lambda \mathbf{x} = \nabla f = 2A\mathbf{x} \quad \Longrightarrow \quad A\mathbf{x} = \lambda \mathbf{x},$$

an eigenvalue problem! Now we can locate the size of the ellipsoid: along a principal direction,

$$f = \mathbf{x}^T A \mathbf{x} = \lambda\, \mathbf{x}^T \mathbf{x},$$

so the semi-axis length there is $\sqrt{f/\lambda}$. (A numerical sketch appears at the end of this part.)

Multivariate Normal Distribution

Multivariate Gaussian density function:

$$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{X} - \boldsymbol{\mu})^T P^{-1} (\mathbf{X} - \boldsymbol{\mu}) \right)$$

How do we find an equal-probability surface?

$$(\mathbf{X} - \boldsymbol{\mu})^T P^{-1} (\mathbf{X} - \boldsymbol{\mu}) = \text{constant}$$

Moreover, one is interested in the probability that $\mathbf{X}$ lies inside such a quadratic hypersurface: for example, what is the probability of lying inside the 1-σ ellipsoid? Transform to principal axes, $\mathbf{Y} = C(\mathbf{X} - \boldsymbol{\mu})$ with $C$ chosen so that $C P C^T = \Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$, and normalize, $z_i = Y_i / \sigma_i$. Then

$$P\!\left( \sum_i z_i^2 \le c^2 \right) = \int_V f(\mathbf{z})\,dV, \qquad V:\; z_1^2 + z_2^2 + \cdots + z_n^2 \le c^2.$$

Illustration of Expectation

A lottery has two schemes; the first scheme has two outcomes (denoted by 1 and 2) and the second has three (denoted by 1, 2 and 3). It is agreed that the participant in the first scheme gets $1 if the outcome is 1 and $2 if the outcome is 2. The participant in the second scheme gets $3 if the outcome is 1, -$2 if the outcome is 2, and $3 if the outcome is 3. The probabilities of each outcome pair are listed as follows:

p(1, 1) = 0.1; p(1, 2) = 0.2; p(1, 3) = 0.3
p(2, 1) = 0.2; p(2, 2) = 0.1; p(2, 3) = 0.1

Help the investor decide which scheme to prefer. [Bryson] (A worked numerical sketch appears at the end of this part.)

Constant Probability Surfaces: 2D Observations

• What does it tell us?
• How do we draw these?
• Importance of the shape of the ellipses.
• In $n$ dimensions they are hyper-ellipsoids.

Example problem: the $P$ matrix was found to be

$$P = \begin{bmatrix} 2.2 & 2.2 \\ 2.2 & 7.5 \end{bmatrix}.$$

[Figure: constant-probability surfaces plotted for the problem considered in the previous lecture.]

Clearly, the money gain has a lot of uncertainty; the investor had better not take the risk!

Multivariate Normal Distribution (continued)

$Y_i$ represents coordinates based on the Cartesian principal-axis system, and $\sigma_i^2$ is the variance along the principal axes. The probability of lying inside the 1σ, 2σ or 3σ ellipsoid decreases with increasing dimensionality, the "curse of dimensionality". Probability of lying inside the cσ ellipsoid:

n \ c      1        2        3
1        0.683    0.955    0.997
2        0.394    0.865    0.989
3        0.200    0.739    0.971

(A one-line chi-square computation reproducing this table appears at the end of this part.)

Functions of Random Variables

Let $Y$ be a function of the random variable $X$: $Y = g(X)$.

• We are interested in deriving the pdf and cdf of $Y$ in terms of those of $X$:

$$f_Y(y) = \sum_i \frac{f_X(x_i)}{|J(x_i)|},$$

where the $x_i$ are the solutions of the algebraic mapping $y = g(x)$ and $J(x) = \partial g / \partial x$ is the Jacobian.

• This property can be used to derive the important result that "a linear mapping of jointly Gaussian random variables is still jointly Gaussian" (a 2D demonstration follows).

Example: let $y = ax^2$ and

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp(-x^2 / 2\sigma_x^2).$$

NOTE: for each value of $y > 0$ there are two values of $x$, namely $x = \pm\sqrt{y/a}$. Then

$$p(y) = \frac{1}{\sigma_x \sqrt{2\pi a y}} \exp(-y / 2a\sigma_x^2), \quad y \ge 0,$$

and $p(y) = 0$ otherwise. We can also show that $E(y) = a\sigma_x^2$ and $V(y) = 2a^2 \sigma_x^4$.
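These two moments are easy to verify by simulation. A minimal sketch, assuming NumPy; the values of $a$ and $\sigma_x$ and the seed are arbitrary.

```python
# Verify E(y) = a*sigma_x^2 and V(y) = 2*a^2*sigma_x^4 for y = a*x^2.
import numpy as np

rng = np.random.default_rng(2)
a, sigma = 3.0, 1.5
x = rng.normal(0.0, sigma, size=1_000_000)
y = a * x**2
print("E(y):", round(y.mean(), 3), " theory:", a * sigma**2)         # 6.75
print("V(y):", round(y.var(), 2), " theory:", 2 * a**2 * sigma**4)   # 91.125
```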
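Returning to the two-dimensional Gaussian and the NOTE $P = DRD$ above: the covariance can be assembled from $(\sigma_1, \sigma_2, r)$, the density sampled, and the parameters recovered from the samples. A minimal sketch, assuming NumPy; all numerical values are arbitrary illustrative choices.

```python
# Build P = D R D from standard deviations and the correlation r,
# sample the bivariate Gaussian, and estimate the parameters back.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
s1, s2, r = 2.0, 0.5, 0.6           # sigma_1, sigma_2, correlation
D = np.diag([s1, s2])
R = np.array([[1.0, r], [r, 1.0]])
P = D @ R @ D                        # covariance matrix

xy = rng.multivariate_normal(mu, P, size=200_000)
P_hat = np.cov(xy, rowvar=False)
r_hat = P_hat[0, 1] / np.sqrt(P_hat[0, 0] * P_hat[1, 1])
print("estimated mean:", xy.mean(axis=0).round(3))   # close to [1, -2]
print("estimated r   :", round(r_hat, 3))            # close to 0.6
```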
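The positive-semidefinite conditions and the principal-axis eigenvalue problem can both be exercised on the $P$ matrix from the 2D observations example. A small sketch, assuming NumPy; reading $\sqrt{\lambda_i}$ as semi-axis lengths refers to the equal-probability surface $(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu}) = 1$.

```python
# Check positive definiteness of the example P and find the principal
# axes of its 1-sigma ellipse via the eigenvalue problem P u = lambda u.
import numpy as np

P = np.array([[2.2, 2.2],
              [2.2, 7.5]])
lam, U = np.linalg.eigh(P)           # spectral decomposition P = U D U^T
print("eigenvalues:", lam.round(3))  # both > 0 -> positive definite
L = np.linalg.cholesky(P)            # succeeds because P is positive definite
print("Cholesky check:", np.allclose(L @ L.T, P))

# Semi-axes of the 1-sigma ellipse x^T P^{-1} x = 1 have lengths
# sqrt(lambda_i), directed along the eigenvectors (columns of U).
print("semi-axis lengths:", np.sqrt(lam).round(3))
print("axis directions:\n", U.round(3))
```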
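For the lottery illustration, the decision reduces to comparing each scheme's expected gain and its spread, computed from the marginals of the joint probability table. A sketch of that arithmetic in NumPy:

```python
# Expected gain and variance of each lottery scheme.
import numpy as np

p = np.array([[0.1, 0.2, 0.3],      # p(1, j), j = 1, 2, 3
              [0.2, 0.1, 0.1]])     # p(2, j)
gain1 = np.array([1.0, 2.0])        # scheme-1 payoffs
gain2 = np.array([3.0, -2.0, 3.0])  # scheme-2 payoffs

p1 = p.sum(axis=1)                  # marginal of scheme-1 outcomes
p2 = p.sum(axis=0)                  # marginal of scheme-2 outcomes
E1, E2 = gain1 @ p1, gain2 @ p2
V1 = (gain1**2) @ p1 - E1**2
V2 = (gain2**2) @ p2 - E2**2
print(f"scheme 1: E = {E1:.2f}, Var = {V1:.2f}")   # E = 1.40, Var = 0.24
print(f"scheme 2: E = {E2:.2f}, Var = {V2:.2f}")   # E = 1.50, Var = 5.25
```

Scheme 2 offers a slightly higher expected gain but a far larger variance, which is the basis of the "better not take the risk" remark above.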
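The curse-of-dimensionality table can be reproduced directly: $\sum_i z_i^2$ is chi-square distributed with $n$ degrees of freedom, so the probability of lying inside the cσ ellipsoid is the $\chi_n^2$ CDF evaluated at $c^2$. A sketch, assuming SciPy is available:

```python
# P(z_1^2 + ... + z_n^2 <= c^2) = chi2.cdf(c^2, df=n);
# reproduces the table above (0.683 ... 0.971).
from scipy.stats import chi2

print("n\\c" + "".join(f"{c:>9}" for c in (1, 2, 3)))
for n in (1, 2, 3):
    row = "".join(f"{chi2.cdf(c**2, df=n):9.3f}" for c in (1, 2, 3))
    print(f"{n:>3}" + row)
```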
A linear mapping of jointly Gaussian random variables is still jointly Gaussian: a simple demonstration. Let

$$Z = aX + bY, \qquad W = cX + dY.$$

Then

$$f_{ZW}(z, w) = \frac{f_{XY}(a_1 z + b_1 w,\; c_1 z + d_1 w)}{|ad - bc|},$$

where $x = a_1 z + b_1 w$ and $y = c_1 z + d_1 w$ is the inverse mapping. The exponent remains a quadratic form in $(z, w)$, so $(Z, W)$ is jointly Gaussian.

Covariance Matrix

The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e. to "covary".

Properties of the covariance matrix:
• The covariance matrix is square.
• The covariance matrix is always positive semidefinite, i.e. $\mathbf{x}^T P \mathbf{x} \ge 0$ (and positive definite, $\mathbf{x}^T P \mathbf{x} > 0$, in the nondegenerate case).
• The covariance matrix is symmetric, i.e. $P = P^T$.
• If $x_i$ and $x_j$ tend to increase together, then $P_{ij} > 0$.
• If $x_i$ and $x_j$ are uncorrelated, then $P_{ij} = 0$.

Independent Variables

Recall, two random variables are said to be independent if knowing the value of one tells you nothing about the other. Then:
• The joint probability density function is the product of the marginal probability density functions.
• $E(XY) = E(X)E(Y)$, so $\mathrm{Cov}(X, Y) = 0$ if $X$ and $Y$ are independent.
• Two variables are said to be uncorrelated if $\mathrm{Cov}(X, Y) = 0$.
• Independent variables are uncorrelated, but the converse is not true in general. The converse holds when the variables are jointly Gaussian: setting $r = 0$ in

$$f(x,y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}} \exp\!\left( -\frac{1}{2(1-r^2)} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2r(x-\mu_1)(y-\mu_2)}{\sigma_1 \sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right)$$

gives

$$f(x,y) = \frac{1}{2\pi \sigma_1 \sigma_2} \exp\!\left( -\frac{1}{2} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right) = f_X(x)\, f_Y(y).$$

Conditioning in Random Vectors

Conditional density and distribution functions are given as:

$$f_Y(y \mid X = x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{f_{XY}(x, y)}{\int_{-\infty}^{\infty} f_{XY}(x, y)\,dy},$$

$$F_Y(y \mid X = x) = \frac{\int_{-\infty}^{y} f_{XY}(x, \eta)\,d\eta}{\int_{-\infty}^{\infty} f_{XY}(x, \eta)\,d\eta}.$$

Useful tricks:

• Left removal: $f(x_1 \mid x_3) = \int f(x_1, x_2 \mid x_3)\,dx_2$
• Right removal: $f(x_1 \mid x_4) = \iint f(x_1 \mid x_2, x_3, x_4)\, f(x_2, x_3 \mid x_4)\,dx_2\,dx_3$

Conditional expectation:

$$E(g(Y) \mid X = x) = \int g(y)\, f_Y(y \mid X = x)\,dy.$$

Special case:

$$E(Y \mid X = x) = \int y\, f_Y(y \mid X = x)\,dy = \frac{\int y\, f_{XY}(x, y)\,dy}{\int f_{XY}(x, y)\,dy}.$$

Note that $E(Y \mid X = x)$ is a number, whereas $E\{Y \mid X\}$ is a random variable: it is the locus of the centers of mass of the conditional density of $Y$ along $X$.

Useful result: $E\{E\{Y \mid X\}\} = E\{Y\}$.

Theorem: the function $g(X)$ that minimizes $E\{[Y - g(X)]^2\}$ is the conditional expected value of $Y$ given $X = x$, namely $E\{Y \mid X = x\}$.

More generally,

$$E\{E\{g(X, Y) \mid X\}\} = E\{g(X, Y)\},$$
$$E\{E\{g(X)g(Y) \mid X\}\} = E\{g(X)\, E\{g(Y) \mid X\}\}. \quad (1)$$

Orthogonality and Least Squares

Theorem: the constant $a$ that minimizes the mean-square error $E\{(Y - aX)^2\}$ is such that $Y - aX$ is orthogonal to $X$:

$$E\{(Y - aX) X\} = 0 \quad \Longrightarrow \quad a = \frac{E\{XY\}}{E\{X^2\}}.$$

How would we measure angles in the probability space, then? This is precisely what the correlation coefficient does: for a 2D space,

$$\cos\theta = r = \frac{E\{XY\}}{\sqrt{E\{X^2\}\, E\{Y^2\}}}.$$

Consider the example of the 2D jointly Gaussian problem ...
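The orthogonality condition is easy to check numerically: estimate $a = E\{XY\}/E\{X^2\}$ from samples and verify that the residual $Y - aX$ is uncorrelated with $X$. A minimal sketch, assuming NumPy; the linear test model and noise level are arbitrary choices.

```python
# Least-squares coefficient from the orthogonality condition
# E{(Y - aX) X} = 0  =>  a = E{XY} / E{X^2}.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=200_000)
y = 2.5 * x + rng.normal(0.0, 1.0, size=x.size)   # test case, true a = 2.5

a = np.mean(x * y) / np.mean(x**2)
resid = y - a * x
print("a              :", round(a, 3))                    # close to 2.5
print("E{(Y - aX) X}  :", round(np.mean(resid * x), 5))   # close to 0
r = np.mean(x * y) / np.sqrt(np.mean(x**2) * np.mean(y**2))
print("cos(angle) = r :", round(r, 3))
```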
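The useful result $E\{E\{Y \mid X\}\} = E\{Y\}$ from the conditioning section can also be checked by Monte Carlo, approximating $E\{Y \mid X\}$ with per-bin averages of $Y$ over equal-probability bins in $X$. A sketch; the test relation $Y = X^2 + \text{noise}$ is an arbitrary choice.

```python
# Check the tower property E{ E{Y|X} } = E{Y} by binning in X.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=500_000)
y = x**2 + rng.normal(0.0, 0.5, size=x.size)        # arbitrary test relation

edges = np.quantile(x, np.linspace(0.0, 1.0, 101))  # 100 equal-mass bins
idx = np.clip(np.digitize(x, edges) - 1, 0, 99)
cond_mean = np.array([y[idx == k].mean() for k in range(100)])

# Each bin carries probability 1/100, so the average of the per-bin
# conditional means approximates E{ E{Y|X} }.
print("E{E{Y|X}} ~", round(cond_mean.mean(), 3))    # close to 1.0
print("E{Y}      =", round(y.mean(), 3))            # close to 1.0
```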
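Finally, the claim that uncorrelated does not imply independent can be made concrete: with $X \sim N(0, 1)$ and $Y = X^2$, $\mathrm{Cov}(X, Y) = E\{X^3\} = 0$, yet $Y$ is a deterministic function of $X$. A quick numerical confirmation of this textbook example:

```python
# Uncorrelated but not independent: X ~ N(0,1), Y = X^2.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = x**2
cov = np.mean(x * y) - x.mean() * y.mean()
print("Cov(X, Y)  ~", round(cov, 4))                      # close to 0
# Dependence shows up in higher moments: E{X^2 Y} != E{X^2} E{Y}.
print("E{X^2 Y}   =", round(np.mean(x**2 * y), 3))        # close to 3
print("E{X^2}E{Y} =", round(np.mean(x**2) * y.mean(), 3)) # close to 1
```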