Contents
• Random variables, distributions, and probability density functions
• Discrete random variables
• Continuous random variables
• Expected values and moments
• Joint and marginal probability
• Means and variances
• Covariance matrices
• Univariate normal density
• Multivariate normal densities

Random variables, distributions, and probability density functions
A random variable $X$ is a variable whose value is set as a consequence of random events, that is, events whose outcomes are impossible to know in advance. The set of all possible outcomes is called the sample space and is denoted by $\Omega$. Such a random variable can be treated as a "nondeterministic" function $X$ which relates every possible random event $\omega \in \Omega$ to some value $X(\omega)$. We will be dealing with real random variables $X : \Omega \to \mathbb{R}$.
The probability distribution function is a function $F : \mathbb{R} \to [0,1]$ for which, for every $x$,
$$F(x) = \Pr(X \le x).$$

Discrete Random Variable
Let $X$ be a discrete random variable (d.r.v.) that can assume $m$ different values in the countable set $\{v_1, v_2, \dots, v_m\}$. Let $p_i$ be the probability that $X$ assumes the value $v_i$:
$$p_i = \Pr(X = v_i), \qquad i = 1, \dots, m.$$
The mass function $P(x)$ and the probabilities $p_i$ must satisfy
$$P(x) \ge 0, \quad \sum_x P(x) = 1, \qquad p_i \ge 0, \quad \sum_{i=1}^{m} p_i = 1.$$
A connection between the distribution and the mass function is given by
$$F(x) = \sum_{y \le x} P(y), \qquad P(x) = F(x) - \lim_{y \to x,\; y < x} F(y).$$

Continuous Random Variable
The domain of a continuous random variable (c.r.v.) is uncountable. The distribution function of a c.r.v. can be defined as
$$F(x) = \int_{-\infty}^{x} p(y)\, dy,$$
where the function $p(x)$ is called a probability density function. It is important to note that the numerical value of $p(x)$ is not the "probability of $x$". In the continuous case, $p(x)\,dx$ is a value which approximately equals the probability $\Pr[x < X \le x + dx]$:
$$\Pr[x < X \le x + dx] = F(x + dx) - F(x) \approx p(x)\, dx.$$
Important features of the probability density function:
$$\int_{-\infty}^{\infty} p(x)\, dx = 1,$$
$$\forall x \in \mathbb{R} : \Pr(X = x) = 0,$$
$$\Pr(a < X \le b) = \int_{a}^{b} p(x)\, dx.$$

Expected Values and Moments
The mean or expected value or average of $x$ is defined by
$$\mu = E[x] = \sum_x x\, P(x) = \sum_{i=1}^{m} v_i p_i \quad \text{for a d.r.v.}, \qquad \mu = E[x] = \int x\, p(x)\, dx \quad \text{for a c.r.v.}$$
If $Y = g(X)$ we have
$$E[Y] = E[g(X)] = \sum_{x :\, P(x) > 0} g(x) P(x) \quad \text{for a d.r.v. } X, \qquad E[Y] = \int g(x)\, p(x)\, dx \quad \text{for a c.r.v. } X.$$
The variance is defined as
$$\operatorname{var}(X) = \sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(x) = E[X^2] - (E[X])^2,$$
where $\sigma$ is the standard deviation of $x$.
Intuitively, the variance of $x$ indicates the spread of its samples around its expected value (mean). An important property of the mean is its linearity:
$$E[aX + bY] = aE[X] + bE[Y].$$
At the same time, the variance is not linear:
$$\operatorname{var}(aX) = a^2 \operatorname{var}(X).$$
• The $k$-th moment of a r.v. $X$ is $E[X^k]$ (the expected value is the first moment). The $k$-th central moment is
$$\mu_k = E[(X - \mu)^k] = E[(X - E[X])^k].$$

Joint and Marginal Probability
Let $X$ and $Y$ be two random variables with domains $\{v_1, v_2, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$. For each pair of values $(v_i, w_j)$ we have a joint probability
$$p_{ij} = \Pr\{X = v_i,\, Y = w_j\}.$$
The joint mass function $P(x, y)$ satisfies
$$P(x, y) \ge 0, \qquad \sum_x \sum_y P(x, y) = 1.$$
The marginal distributions for $x$ and $y$ are defined as
$$P_x(x) = \sum_y P(x, y), \qquad P_y(y) = \sum_x P(x, y) \quad \text{for d.r.v.}$$
For c.r.v. the marginal distributions can be calculated as
$$P_X(x) = \int P(x, y)\, dy.$$
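To make these definitions concrete, here is a minimal NumPy sketch (an illustration added to these notes, not part of the original tutorial) that builds a small joint mass function, checks the mass-function axioms, and computes the marginals, mean, variance, and distribution function of $X$; the table `P` and the value sets `v`, `w` are made-up numbers.

```python
import numpy as np

# Hypothetical joint mass function P(x, y) for X in {0, 1, 2} and Y in {0, 1};
# rows index values of X, columns index values of Y. Entries sum to 1.
P = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.05, 0.25]])
v = np.array([0.0, 1.0, 2.0])   # values of X
w = np.array([0.0, 1.0])        # values of Y
assert np.isclose(P.sum(), 1.0) and (P >= 0).all()  # mass-function axioms

# Marginals: P_x(x) = sum_y P(x, y),  P_y(y) = sum_x P(x, y)
Px = P.sum(axis=1)
Py = P.sum(axis=0)

# Mean and variance of X from the definitions:
# E[X] = sum_i v_i p_i,  var(X) = E[X^2] - (E[X])^2
mean_x = (v * Px).sum()
var_x = (v**2 * Px).sum() - mean_x**2

# Distribution function F(x) = Pr(X <= x) as a cumulative sum of the mass
F = np.cumsum(Px)

print(mean_x, var_x, F)
```

With this particular `P`, the marginal of $X$ is $(0.30, 0.40, 0.30)$, giving $E[X] = 1.0$ and $\operatorname{var}(X) = 0.6$.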
Means and variances
The variables $x$ and $y$ are said to be statistically independent if and only if
$$P(x, y) = P_x(x)\, P_y(y).$$
The expected value of a function $f(x, y)$ of two random variables $x$ and $y$ is defined as
$$E[f(x, y)] = \sum_x \sum_y f(x, y)\, P(x, y) \quad \text{for d.r.v.}, \qquad E[f(x, y)] = \iint f(x, y)\, P(x, y)\, dx\, dy \quad \text{for c.r.v.}$$
The means and variances are:
$$\mu_x = E[x] = \sum_x \sum_y x\, P(x, y), \qquad \mu_y = E[y] = \sum_x \sum_y y\, P(x, y),$$
$$\sigma_x^2 = V[x] = E[(x - \mu_x)^2] = \sum_x \sum_y (x - \mu_x)^2 P(x, y),$$
$$\sigma_y^2 = V[y] = E[(y - \mu_y)^2] = \sum_x \sum_y (y - \mu_y)^2 P(x, y).$$

Covariance matrices
$$\boldsymbol{\mu} = E[\mathbf{x}] = \begin{bmatrix} E[x_1] \\ E[x_2] \\ \vdots \\ E[x_d] \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_d \end{bmatrix} = \sum_{\mathbf{x}} \mathbf{x}\, P(\mathbf{x}).$$
The covariance matrix $\Sigma$ is defined as the square matrix
$$\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t],$$
whose $ij$-th element $\sigma_{ij}$ is the covariance of $x_i$ and $x_j$:
$$\sigma_{ij} = \operatorname{cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)], \qquad i, j = 1, \dots, d.$$

Cauchy-Schwarz inequality
For any real $\lambda$,
$$\operatorname{var}(\lambda X + Y) = E[(\lambda X + Y - (\lambda \mu_x + \mu_y))^2] = E[(\lambda (X - \mu_x) + (Y - \mu_y))^2] = \lambda^2 \sigma_x^2 + 2\lambda \sigma_{xy} + \sigma_y^2 \ge 0.$$
Since this quadratic in $\lambda$ is never negative, its discriminant must be non-positive, and from this we have the Cauchy-Schwarz inequality
$$\sigma_{xy}^2 \le \sigma_x^2 \sigma_y^2.$$
The correlation coefficient is the normalized covariance
$$\rho(x, y) = \sigma_{xy} / (\sigma_x \sigma_y).$$
It always satisfies $-1 \le \rho(x, y) \le 1$. If $\rho(x, y) = 0$, the variables $x$ and $y$ are uncorrelated. If $y = ax + b$ and $a > 0$, then $\rho(x, y) = 1$; if $a < 0$, then $\rho(x, y) = -1$.
Question. Prove that if $X$ and $Y$ are independent r.v. then $\rho(x, y) = 0$.

Covariance matrices
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_{dd} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}.$$
If the variables are statistically independent, the covariances are zero, and the covariance matrix is diagonal. The covariance matrix is positive semi-definite: if $\mathbf{w}$ is any $d$-dimensional vector, then $\mathbf{w}^t \Sigma \mathbf{w} \ge 0$. This is equivalent to the requirement that none of the eigenvalues of $\Sigma$ can ever be negative.

Univariate normal density
The normal or Gaussian probability function is very important. In the 1-dimensional case, it is defined by the probability density function
$$p(x) = \frac{1}{\sqrt{2\pi}\, \sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}.$$
The normal density is described as a "bell-shaped curve", and it is completely determined by $\mu, \sigma$. The probabilities obey
$$\Pr(|x - \mu| \le \sigma) \approx 0.68, \qquad \Pr(|x - \mu| \le 2\sigma) \approx 0.95, \qquad \Pr(|x - \mu| \le 3\sigma) \approx 0.997.$$

Multivariate Normal densities
Suppose that each of the $d$ random variables $x_i$ is normally distributed, each with its own mean and variance: $p(x_i) \sim N(\mu_i, \sigma_i^2)$. If these variables are independent, their joint density has the form
$$p(\mathbf{x}) = \prod_{i=1}^{d} p_{x_i}(x_i) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\, \sigma_i}\, e^{-\frac{1}{2}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2} = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i}\, e^{-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2}.$$
This can be written in a compact matrix form if we observe that for this case the covariance matrix is diagonal, i.e.,
$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_d^2 \end{bmatrix},$$
• and hence the inverse of the covariance matrix is easily written as
$$\Sigma^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_d^2 \end{bmatrix},$$
and
$$\sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2 = (\mathbf{x} - \boldsymbol{\mu})^t\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\mu}).$$
• Finally, by noting that the determinant of $\Sigma$ is just the product of the variances, we can write the joint density in the form
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}.$$
• This is the general form of a multivariate normal density function, where the covariance matrix is no longer required to be diagonal.
The natural measure of the distance from $\mathbf{x}$ to the mean is provided by the quantity
$$r^2 = (\mathbf{x} - \boldsymbol{\mu})^t\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\mu}),$$
which is the square of the Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$.
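As an illustration (added here; the values of `mu`, `Sigma`, and the test point `x` are arbitrary), the general density formula above translates almost line-for-line into NumPy, with the squared Mahalanobis distance $r^2$ computed by solving a linear system rather than forming the explicit inverse of $\Sigma$:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density
    p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 * r^2),
    where r^2 = (x - mu)^t Sigma^{-1} (x - mu) is the squared
    Mahalanobis distance from x to mu."""
    d = len(mu)
    diff = x - mu
    # Solve Sigma z = diff instead of inverting Sigma explicitly
    r2 = diff @ np.linalg.solve(Sigma, diff)   # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * r2) / norm

# Illustrative 2-D example (all numbers made up): correlated components
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.0, 0.5])
print(mvn_pdf(x, mu, Sigma))
```

For a diagonal $\Sigma$ this reduces to the product of the $d$ univariate densities, matching the derivation above.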
Example: Bivariate Normal Density
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix},$$
where
$$\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$$
is a correlation coefficient; $|\rho| \le 1$. Thus
$$|\Sigma| = \sigma_1^2 \sigma_2^2 (1 - \rho^2), \qquad \Sigma^{-1} = \frac{1}{1 - \rho^2} \begin{bmatrix} 1/\sigma_1^2 & -\rho/(\sigma_1 \sigma_2) \\ -\rho/(\sigma_1 \sigma_2) & 1/\sigma_2^2 \end{bmatrix},$$
and after doing the dot products in $(\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$ we get the expression for the bivariate normal density:
$$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 \right] \right\}.$$

Some Geometric Features
The level curves of the 2D Gaussian are ellipses; the principal axes are in the direction of the eigenvectors of $\Sigma$, and the different widths correspond to the corresponding eigenvalues. For uncorrelated r.v. ($\rho = 0$) the axes are parallel to the coordinate axes. For the extreme case of $\rho = \pm 1$ the ellipses collapse into straight lines (in fact there is only one independent r.v.). Marginal and conditional densities are unidimensional normal.

Law of Large Numbers and Central Limit Theorem
Law of large numbers. Let $X_1, X_2, \dots$ be a series of i.i.d. (independent and identically distributed) random variables with $E[X_i] = \mu$. Then for $S_n = X_1 + \dots + X_n$,
$$\lim_{n \to \infty} \frac{1}{n} S_n = \mu.$$
Central Limit Theorem. Let $X_1, X_2, \dots$ be a series of i.i.d. r.v. with $E[X_i] = \mu$ and variance $\operatorname{var}(X_i) = \sigma^2$. Then for $S_n = X_1 + \dots + X_n$,
$$\frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{D} N(0, 1).$$
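Both limit theorems are easy to check numerically. The following simulation (a sketch added to these notes; the uniform distribution, the sample sizes, and the 10,000 replications are arbitrary choices) shows that $S_n / n$ concentrates around $\mu = 1/2$ and that the standardized sums approach mean $0$ and standard deviation $1$, as the CLT predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. uniform(0, 1) variables: mu = 1/2, sigma^2 = 1/12
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)

for n in (1, 10, 100, 1000):
    # 10,000 independent copies of S_n = X_1 + ... + X_n
    S = rng.uniform(0.0, 1.0, size=(10_000, n)).sum(axis=1)

    # Law of large numbers: S_n / n concentrates around mu
    print(f"n={n:5d}  mean of S_n/n = {S.mean() / n:.4f}")

    # CLT: Z = (S_n - n*mu) / (sigma*sqrt(n)) is approximately N(0, 1)
    Z = (S - n * mu) / (sigma * np.sqrt(n))
    print(f"         Z mean = {Z.mean():+.3f},  Z std = {Z.std():.3f}")
```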