Contents
• Random variables, distributions, and probability density functions
• Discrete Random Variables
• Continuous Random Variables
• Expected Values and Moments
• Joint and Marginal Probability
• Means and variances
• Covariance matrices
• Univariate normal density
• Multivariate Normal densities
Random variables, distributions, and probability density functions
A random variable $X$ is a variable whose value is determined by the outcome of a random event, that is, an event whose result cannot be known in advance. The set of all possible outcomes is called the sample space and is denoted by $\Omega$. A random variable can be viewed as a "non-deterministic" function $X$ that associates every possible outcome $\omega \in \Omega$ with some value $X(\omega)$. We will be dealing with real random variables $X : \Omega \to \mathbb{R}$.
The probability distribution function is a function $F : \mathbb{R} \to [0,1]$ for which, for every $x$,
$$F(x) = \Pr(X \le x).$$
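As an illustration, $F(x)$ can be approximated empirically: the fraction of sampled values not exceeding $x$ estimates $\Pr(X \le x)$. A minimal sketch in Python (NumPy assumed; the distribution and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)  # X ~ N(0, 1)

def empirical_cdf(samples, x):
    """Estimate F(x) = Pr(X <= x) as the fraction of samples <= x."""
    return np.mean(samples <= x)

print(empirical_cdf(samples, 0.0))  # ~0.5 for a symmetric, zero-mean X
```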
Discrete Random Variable
Let $X$ be a discrete random variable (d.r.v.) that can assume $m$ different values in the countable set
$$\Omega = \{v_1, v_2, \ldots, v_m\}.$$
Let $p_i$ be the probability that $X$ assumes the value $v_i$:
$$p_i = \Pr(X = v_i), \quad i = 1, \ldots, m.$$
The probabilities $p_i$ must satisfy
$$p_i \ge 0, \qquad \sum_{i=1}^{m} p_i = 1,$$
or, in terms of the mass function $P$,
$$P(x) \ge 0, \qquad \sum_{x} P(x) = 1.$$
A connection between the distribution function and the mass function is given by
$$F(x) = \sum_{y \le x} P(y), \qquad P(x) = F(x) - \lim_{y \to x,\, y < x} F(y).$$
Continuous Random Variable
The domain of a continuous random variable (c.r.v.) is uncountable.
The distribution function of a c.r.v. can be defined as
$$F(x) = \int_{-\infty}^{x} p(y)\, dy,$$
where the function $p(x)$ is called a probability density function. It is important to note that the numerical value of $p(x)$ is not a "probability of $x$". In the continuous case, $p(x)\,dx$ is a value which approximately equals the probability $\Pr[x < X \le x + dx]$:
$$\Pr[x < X \le x + dx] = F(x + dx) - F(x) \approx p(x)\, dx.$$
Important features of the probability density function:
$$\int_{-\infty}^{\infty} p(x)\, dx = 1,$$
$$\forall x \in \mathbb{R} : \Pr(X = x) = 0,$$
$$\Pr(a \le X \le b) = \int_{a}^{b} p(x)\, dx.$$
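These properties can be checked numerically for a concrete density. A minimal sketch for the standard Gaussian density, using a simple Riemann sum (the grid resolution is an arbitrary choice):

```python
import numpy as np

def p(x):
    """Standard normal density, used here only as a concrete example."""
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
print(np.sum(p(x)) * dx)           # integral over (most of) R: ~1.0
mask = (x >= 0.0) & (x <= 1.0)
print(np.sum(p(x[mask])) * dx)     # Pr(0 <= X <= 1): ~0.3413
```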
Expected Values and Moments
The mean (or expected value, or average) of $x$ is defined by
$$E[x] = \mu = \sum_{x} x P(x) = \sum_{i=1}^{m} v_i p_i \quad \text{for a d.r.v.},$$
$$E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx \quad \text{for a c.r.v.}$$
If $Y = g(X)$ we have:
$$E[Y] = E[g(X)] = \sum_{x : P(x) > 0} g(x) P(x) \quad \text{for a d.r.v. } X,$$
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, p(x)\, dx \quad \text{for a c.r.v. } X.$$
The variance is defined as:
$$\operatorname{var}(X) = \sigma^2 = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2 P(x) = E[X^2] - (E[X])^2,$$
where $\sigma$ is the standard deviation of $x$.
Intuitively, the variance of $x$ indicates the spread of its samples around its expected value (mean).
An important property of the mean is its linearity:
$$E[aX + bY] = aE[X] + bE[Y].$$
At the same time, the variance is not linear:
$$\operatorname{var}(aX) = a^2 \operatorname{var}(X).$$
• The $k$-th moment of a r.v. $X$ is $E[X^k]$ (the expected value is the first moment). The $k$-th central moment is
$$\sigma_k = E[(X - \mu)^k] = E[(X - E[X])^k].$$
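A quick numerical illustration of the linearity of the mean and of $\operatorname{var}(aX) = a^2 \operatorname{var}(X)$, using sample estimates (the distributions and the constants $a$, $b$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=1_000_000)
Y = rng.normal(loc=1.0, scale=3.0, size=1_000_000)
a, b = 2.0, -3.0

# Linearity of the mean: E[aX + bY] = a E[X] + b E[Y].
print(np.mean(a * X + b * Y), a * np.mean(X) + b * np.mean(Y))

# Variance is not linear: var(aX) = a^2 var(X).
print(np.var(a * X), a**2 * np.var(X))

# k-th central moment: sigma_k = E[(X - E[X])^k], here with k = 3.
print(np.mean((X - np.mean(X))**3))
```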
Joint and Marginal Probability
Let $X$ and $Y$ be two random variables with domains
$$\Omega_X = \{v_1, v_2, \ldots, v_m\} \quad \text{and} \quad \Omega_Y = \{w_1, \ldots, w_n\}.$$
For each pair of values $(v_i, w_j)$ we have a joint probability
$$p_{ij} = \Pr\{X = v_i, Y = w_j\}.$$
The joint mass function satisfies $P(x, y) \ge 0$ and $\sum_{x} \sum_{y} P(x, y) = 1$.
The marginal distributions for $x$ and $y$ are defined as
$$P_X(x) = \sum_{y} P(x, y) \quad \text{and} \quad P_Y(y) = \sum_{x} P(x, y) \quad \text{for d.r.v.}$$
For c.r.v. the marginal distributions can be calculated as
$$p_X(x) = \int_{-\infty}^{\infty} p(x, y)\, dy.$$
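With the joint mass function stored as a table, marginalization is just a sum over one axis. A minimal sketch (the 2×3 table is made up for illustration):

```python
import numpy as np

# Joint mass function P(x, y): rows index x, columns index y.
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
assert np.isclose(P.sum(), 1.0) and np.all(P >= 0)

P_x = P.sum(axis=1)  # marginal P_X(x): sum over y
P_y = P.sum(axis=0)  # marginal P_Y(y): sum over x
print(P_x, P_y)
```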
Means and variances
The variables $x$ and $y$ are said to be statistically independent if and only if
$$P(x, y) = P_X(x) P_Y(y).$$
The expected value of a function $f(x, y)$ of two random variables $x$ and $y$ is defined as
$$E[f(x, y)] = \sum_{x} \sum_{y} f(x, y) P(x, y), \quad \text{or} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, p(x, y)\, dx\, dy.$$
The means and variances are:
$$\mu_x = E[x] = \sum_{x} \sum_{y} x P(x, y),$$
$$\mu_y = E[y] = \sum_{x} \sum_{y} y P(x, y),$$
$$\sigma_x^2 = V[x] = E[(x - \mu_x)^2] = \sum_{x} \sum_{y} (x - \mu_x)^2 P(x, y),$$
$$\sigma_y^2 = V[y] = E[(y - \mu_y)^2] = \sum_{x} \sum_{y} (y - \mu_y)^2 P(x, y).$$
Covariance matrices
$$\boldsymbol{\mu} = E[\mathbf{x}] = \begin{bmatrix} E[x_1] \\ E[x_2] \\ \vdots \\ E[x_d] \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_d \end{bmatrix} = \sum_{\mathbf{x}} \mathbf{x}\, P(\mathbf{x}).$$
The covariance matrix $\Sigma$ is defined as the square matrix
$$\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t],$$
whose $ij$-th element $\sigma_{ij}$ is the covariance of $x_i$ and $x_j$:
$$\operatorname{cov}(x_i, x_j) = \sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)], \quad i, j = 1, \ldots, d.$$
Cauchy-Schwarz inequality
For any real $\lambda$,
$$\operatorname{var}(\lambda X + Y) = E[(\lambda X + Y - (\lambda \mu_x + \mu_y))^2] = E[(\lambda (X - \mu_x) + (Y - \mu_y))^2]$$
$$= \lambda^2 E[(X - \mu_x)^2] + 2\lambda E[(X - \mu_x)(Y - \mu_y)] + E[(Y - \mu_y)^2]$$
$$= \lambda^2 \sigma_x^2 + 2\lambda \sigma_{xy} + \sigma_y^2 \ge 0.$$
Since this quadratic in $\lambda$ is non-negative for every $\lambda$, its discriminant must be non-positive, and from this we have the Cauchy-Schwarz inequality
$$\sigma_{xy}^2 \le \sigma_x^2 \sigma_y^2.$$
The correlation coefficient is the normalized covariance
$$\rho(x, y) = \sigma_{xy} / (\sigma_x \sigma_y).$$
It always satisfies $-1 \le \rho(x, y) \le 1$. If $\rho(x, y) = 0$, the variables $x$ and $y$ are uncorrelated. If $y = ax + b$ and $a > 0$, then $\rho(x, y) = 1$; if $a < 0$, then $\rho(x, y) = -1$.
Question. Prove that if $X$ and $Y$ are independent r.v. then $\rho(x, y) = 0$.
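A small numerical check of these bounds (the distribution and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500_000)
y = 2.0 * x + 1.0 + rng.normal(size=500_000)   # linear trend plus noise

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())
print(rho)                                     # strictly inside [-1, 1]
print(np.corrcoef(x, 2.0 * x + 1.0)[0, 1])     # exact linear relation: ~1.0
```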
Covariance matrices
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_{dd} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}.$$
If the variables are statistically independent, the covariances are zero, and the covariance matrix is diagonal.
The covariance matrix is positive semi-definite: if $\mathbf{w}$ is any $d$-dimensional vector, then $\mathbf{w}^t \Sigma \mathbf{w} \ge 0$. This is equivalent to the requirement that none of the eigenvalues of $\Sigma$ can ever be negative.
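A minimal sketch checking positive semi-definiteness via the eigenvalues of a sample covariance matrix (the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10_000, 3))         # rows are d-dimensional samples
Sigma = np.cov(X, rowvar=False)          # sample covariance matrix

eigenvalues = np.linalg.eigvalsh(Sigma)  # symmetric matrix => eigvalsh
print(eigenvalues)                       # all >= 0 (up to round-off)

w = rng.normal(size=3)
print(w @ Sigma @ w >= 0)                # quadratic form is non-negative
```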
Univariate normal density
The normal or Gaussian probability function is very important. In the 1-dimensional case, it is defined by the probability density function
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right].$$
The normal density is described as a "bell-shaped curve", and it is completely determined by $(\mu, \sigma)$.
The probabilities obey
$$\Pr[\,|x - \mu| \le \sigma\,] \approx 0.68,$$
$$\Pr[\,|x - \mu| \le 2\sigma\,] \approx 0.95,$$
$$\Pr[\,|x - \mu| \le 3\sigma\,] \approx 0.997.$$
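These probabilities are easy to verify by Monte Carlo (the parameters and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

for k in (1, 2, 3):
    # Fraction of samples within k standard deviations of the mean.
    print(k, np.mean(np.abs(x - mu) <= k * sigma))  # ~0.68, ~0.95, ~0.997
```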
Multivariate Normal densities
Suppose that each of the $d$ random variables $x_i$ is normally distributed, each with its own mean and variance: $p(x_i) \sim N(\mu_i, \sigma_i^2)$.
If these variables are independent, their joint density has the form
$$p(\mathbf{x}) = \prod_{i=1}^{d} p_{x_i}(x_i) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \right] = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[ -\frac{1}{2} \sum_{i=1}^{d} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \right].$$
This can be written in a compact matrix form if we observe that for this case the covariance matrix is diagonal, i.e.,
$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_d^2 \end{bmatrix},$$
• and hence the inverse of the covariance matrix is easily written as
$$\Sigma^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_d^2 \end{bmatrix},$$
and
$$(\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \sum_{i=1}^{d} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2.$$
• Finally, by noting that the determinant of $\Sigma$ is just the product of the variances, we can write the joint density in the form
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right].$$
• This is the general form of a multivariate normal density function, where the covariance matrix is no longer required to be diagonal.
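A direct transcription of this formula into code might look as follows (a sketch; for numerical stability a production implementation would typically prefer a Cholesky-based solver):

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Evaluate the multivariate normal density at x via the matrix formula."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^t Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mvn_density(np.array([0.5, 0.5]), mu, Sigma))
```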
The natural measure of the distance from $\mathbf{x}$ to the mean $\boldsymbol{\mu}$ is provided by the quantity
$$r^2 = (\mathbf{x} - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}),$$
which is the square of the Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$.
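The corresponding computation, reusing the same quadratic form as in the sketch above:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance r^2 = (x-mu)^t Sigma^{-1} (x-mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mahalanobis_sq(np.array([0.5, 0.5]), mu, Sigma))
```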
Example: Bivariate Normal Density
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix},$$
where
$$\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$$
is a correlation coefficient, $|\rho| \le 1$; thus
$$|\Sigma| = \sigma_1^2 \sigma_2^2 (1 - \rho^2), \qquad \Sigma^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho \sigma_1 \sigma_2 \\ -\rho \sigma_1 \sigma_2 & \sigma_1^2 \end{bmatrix},$$
and after doing the dot products in $(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$ we get the expression for the bivariate normal density $p(x_1, x_2)$ with parameters $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho$:
$$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\! \left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\}.$$
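One can sanity-check this closed form against the general matrix formula; a sketch with arbitrary parameters:

```python
import numpy as np

def bivariate_density(x1, x2, mu1, mu2, s1, s2, rho):
    """Closed-form bivariate normal density."""
    z = (((x1 - mu1) / s1) ** 2
         - 2 * rho * ((x1 - mu1) / s1) * ((x2 - mu2) / s2)
         + ((x2 - mu2) / s2) ** 2)
    norm = 2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2)
    return np.exp(-z / (2 * (1 - rho**2))) / norm

mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.5, 0.8, 0.3
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

# Compare with the general matrix form at an arbitrary point.
diff = np.array([0.7 - mu1, 0.2 - mu2])
quad = diff @ np.linalg.solve(Sigma, diff)
general = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
print(bivariate_density(0.7, 0.2, mu1, mu2, s1, s2, rho), general)  # equal
```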
Some Geometric Features
The level curves of the 2D Gaussian are ellipses; the principal axes point along the eigenvectors of $\Sigma$, and the widths along these axes correspond to the respective eigenvalues, as illustrated in the sketch below.
For uncorrelated r.v. ($\rho = 0$) the axes are parallel to the coordinate axes.
For the extreme case of $|\rho| = 1$ the ellipses collapse into straight lines (in fact there is only one independent r.v.).
Marginal and conditional densities are unidimensional normal.
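A short sketch extracting these principal axes from a covariance matrix (the parameter values are arbitrary):

```python
import numpy as np

rho, s1, s2 = 0.6, 2.0, 1.0
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

# Columns of `vecs` are the eigenvectors: directions of the ellipse axes.
vals, vecs = np.linalg.eigh(Sigma)
print(vals)   # eigenvalues: the axis half-lengths scale as their square roots
print(vecs)   # principal axis directions
```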
Law of Large Numbers and Central Limit Theorem
Law of large numbers. Let $X_1, X_2, \ldots$ be a series of i.i.d. (independent and identically distributed) random variables with $E[X_i] = \mu$. Then for $S_n = X_1 + \cdots + X_n$,
$$\lim_{n \to \infty} \frac{1}{n} S_n = \mu.$$
Central Limit Theorem. Let $X_1, X_2, \ldots$ be a series of i.i.d. r.v. with $E[X_i] = \mu$ and variance $\operatorname{var}(X_i) = \sigma^2$. Then for $S_n = X_1 + \cdots + X_n$,
$$\frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{D} N(0, 1).$$
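A quick simulation illustrating the CLT: standardized sums of i.i.d. uniform variables look approximately standard normal (the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 1_000, 10_000
X = rng.uniform(0.0, 1.0, size=(trials, n))  # i.i.d., mu = 0.5, sigma^2 = 1/12

S = X.sum(axis=1)
Z = (S - n * 0.5) / (np.sqrt(1.0 / 12.0) * np.sqrt(n))  # (S_n - n mu)/(sigma sqrt(n))

print(Z.mean(), Z.std())          # ~0 and ~1
print(np.mean(np.abs(Z) <= 1.0))  # ~0.68, matching N(0, 1)
```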