Random Variables …
Functions of Random Variables
Towards Least Squares …
Distributions - Binomial Distribution
Poisson Distribution
Practical Scenario



The adjoining signals come from independent channels and carry information.
Can we conclude anything at all from their values?
Are they biased???
Law of Large Numbers
• Let {Xn} be the sequence of measurements we take each time.
• Does the measurement average \bar{X} converge to a fixed value?
• How do we define convergence in probability spaces?
The Law of Large Numbers gives an answer !!!
Let E(X_i) = \mu_i,\; V(X_i) = \sigma_i^2,\; \mathrm{Cov}(X_i, X_j) = 0 \text{ for } i \ne j. Then

\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 = 0 \;\Longrightarrow\; \bar{X} - \bar{\mu} \to 0 \text{ (in probability)},

where \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i and \bar{\mu} = \frac{1}{n}\sum_{i=1}^{n} \mu_i.
This is the hieroglyphic way of saying that the sample mean tends to the true mean value being measured.
And it does converge!!!
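A quick numerical sketch of this (added here, not in the original slides; the channel mean and spread are arbitrary assumptions): the running sample mean of independent measurements settles near the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)

true_mean, true_std = 2.0, 1.5     # assumed channel parameters (illustrative only)
n = 100_000
x = rng.normal(true_mean, true_std, size=n)   # independent measurements X_1 .. X_n

# Running sample mean: X_bar_k = (1/k) * sum_{i<=k} X_i
running_mean = np.cumsum(x) / np.arange(1, n + 1)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {k:>6}:  sample mean = {running_mean[k - 1]:.4f}")
# The printed values drift toward true_mean = 2.0, as the Law of Large Numbers predicts.
```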
Central Limit Theorem
• In applications, limiting distributions are required for further analysis
of experimental data.
• The random variable Xn stands for a statistic computed from a sample of size n.
• The actual distribution of such an RV is often difficult to find.
• In such cases, in practice we often approximate the distribution of such RVs with limiting distributions.
• This is justified for large values of n by the Central Limit Theorem.
As was shown practically in the last class, for a large number of samples the distribution is well approximated by a Gaussian (under mild assumptions).
This fact is called the “Central Limit Theorem”.
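A small added check, with an arbitrarily chosen and decidedly non-Gaussian population: the standardized sample mean already behaves like a standard normal for moderate n.

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps = 1_000, 20_000                  # sample size and number of replications (arbitrary)
samples = rng.exponential(scale=1.0, size=(reps, n))   # skewed, non-Gaussian population

# Standardize the sample mean: Z = (X_bar - mu) / (sigma / sqrt(n)); for Exp(1), mu = sigma = 1.
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

print("fraction of |Z| <= 1:", np.mean(np.abs(z) <= 1.0))   # close to 0.683 (standard normal)
print("fraction of |Z| <= 2:", np.mean(np.abs(z) <= 2.0))   # close to 0.954
```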
Expectation Revisited
Two Random Variables – Important Definitions
(all integrals run over (-\infty, \infty))

Marginal densities:
f_X(x) = \int f_{X,Y}(x, y)\,dy
f_Y(y) = \int f_{X,Y}(x, y)\,dx

where f_{X(\text{or } Y)}(\cdot) is the marginal probability density function of the R.V. X (or Y).

Expectations:
E(X) = \int x\, f_X(x)\,dx = \int\!\!\int x\, f_{X,Y}(x, y)\,dy\,dx
E(Y) = \int y\, f_Y(y)\,dy = \int\!\!\int y\, f_{X,Y}(x, y)\,dx\,dy
E(g(X)) = \int g(x)\, f_X(x)\,dx = \int\!\!\int g(x)\, f_{X,Y}(x, y)\,dy\,dx
E(h(Y)) = \int h(y)\, f_Y(y)\,dy = \int\!\!\int h(y)\, f_{X,Y}(x, y)\,dx\,dy
E(g(X, Y)) = \int\!\!\int g(x, y)\, f_{X,Y}(x, y)\,dx\,dy

What about g(X, Y) = aX + bY? (See the sketch after this slide.)
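To answer the question above: by linearity of the integral, E(aX + bY) = a E(X) + b E(Y), whatever the dependence between X and Y. The added sketch below checks this against an arbitrarily chosen correlated pair.

```python
import numpy as np

rng = np.random.default_rng(2)

# An arbitrary correlated pair (X, Y) for illustration.
mean = [1.0, -2.0]
cov = [[2.0, 1.2],
       [1.2, 3.0]]
x, y = rng.multivariate_normal(mean, cov, size=200_000).T

a, b = 3.0, -0.5
print("E(aX + bY), Monte Carlo :", np.mean(a * x + b * y))
print("a E(X) + b E(Y), exact  :", a * mean[0] + b * mean[1])   # = 4.0
# The two agree up to sampling noise, whatever the correlation between X and Y.
```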
Multidimensional Random Variables
Joint Probability Functions:
• Joint Probability Distribution Function:
F(\mathbf{X}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]
• Joint Probability Density Function:
f(\mathbf{x}) = \frac{\partial^n F(\mathbf{X})}{\partial x_1\,\partial x_2 \cdots \partial x_n}
Marginal Probability Functions: marginal probability functions are obtained by integrating out the variables that are not of interest.
Multivariate Expectations
Mean Vector:
E[\mathbf{x}] = [\,E[x_1]\;\; E[x_2]\;\; \ldots\;\; E[x_n]\,]

The expected value of g(x_1, x_2, \ldots, x_n) is given by
E[g(\mathbf{x})] = \sum_{x_n}\sum_{x_{n-1}} \cdots \sum_{x_1} g(\mathbf{x})\, f(\mathbf{x}) \quad\text{or}\quad \int_{x_n}\int_{x_{n-1}} \cdots \int_{x_1} g(\mathbf{x})\, f(\mathbf{x})\,d\mathbf{x}

Covariance Matrix:
\mathrm{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T
where S = E[\mathbf{x}\mathbf{x}^T] is known as the autocorrelation matrix.

NOTE:
P = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix}
    \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix}
    \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix}

i.e. P = D\,R\,D, where D = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) and R = [\rho_{ij}] is the correlation matrix.
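A brief added check (the covariance values are made up) that the covariance matrix indeed factors as P = D R D, with D the diagonal matrix of standard deviations and R the correlation matrix.

```python
import numpy as np

# An assumed covariance matrix for illustration.
P = np.array([[4.0, 1.5, 0.3],
              [1.5, 2.0, -0.4],
              [0.3, -0.4, 1.0]])

sigma = np.sqrt(np.diag(P))          # standard deviations sigma_i
D = np.diag(sigma)
R = P / np.outer(sigma, sigma)       # correlation matrix, R_ij = P_ij / (sigma_i * sigma_j)

print("R =\n", np.round(R, 3))
print("D R D == P ?", np.allclose(D @ R @ D, P))   # True: P = D R D
```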
Gaussian or Normal Distribution
• The normal distribution is the most widely known and used distribution in the field of statistics.
• Many natural phenomena can be approximated by the Normal distribution (owing to the Central Limit Theorem).

Normal Density Function:
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty

[Figure: the bell-shaped density, with peak value 0.399/\sigma at x = \mu and markers at \mu \pm \sigma and \mu \pm 2\sigma.]

An Important Trick:
I = \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi} \qquad (\text{Normalization})
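As an added sanity check on the normalization trick, the sketch below evaluates the integral numerically and compares it with sqrt(2*pi).

```python
import numpy as np
from scipy.integrate import quad

# Integrate exp(-t^2 / 2) over the whole real line.
value, abs_err = quad(lambda t: np.exp(-t**2 / 2.0), -np.inf, np.inf)

print("numerical integral:", value)                 # about 2.5066
print("sqrt(2*pi)        :", np.sqrt(2.0 * np.pi))  # the two agree to many decimal places
```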
Two Dimensional Gaussian R.V.
Two Random Variables, X and Y are said to be jointly Gaussian
if their density is of the form
f(x, y) = A\, e^{-(ax^2 + bxy + cy^2 + dx + ey)}

where ax^2 + bxy + cy^2 + dx + ey is a quadratic form which is positive semi-definite in general (positive definite for a proper, normalizable density).

This density can be written in the form

f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{ -\frac{1}{2(1-r^2)} \left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - \frac{2r(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\}

with \mu_1, \mu_2, \sigma_1, \sigma_2, r as the parameters of the distribution, where \sigma_i > 0 and |r| < 1.
Positive semi definite Matrices
A symmetric matrix A is said to be positive semi-definite if any one of the following equivalent conditions is satisfied.
1. All eigenvalues of A are non-negative.
2. There exists a matrix A_1 such that A = A_1 A_1^T (for a positive definite A, A_1 can be taken nonsingular; this is the Cholesky decomposition).
3. Every principal minor of A is non-negative.
4. x^T A x \ge 0 for all x (for positive definiteness: x^T A x \ge a\,|x|^2 for all x and some a > 0).

An additional useful property is the spectral decomposition
A = U D U^T, where U is an orthogonal matrix and D is diagonal.
We can use the above to show that the covariance matrix is at least
positive semi-definite … a little proof (sketched below)!
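A sketch of that little proof (added here): for any constant vector a, aᵀPa = aᵀE[(x − μ)(x − μ)ᵀ]a = E[(aᵀ(x − μ))²] ≥ 0, so P is positive semi-definite; it is strictly positive definite unless some linear combination aᵀx has zero variance. The snippet below, with made-up data, checks this empirically.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data: mix 5 independent standard normals with an arbitrary full-rank matrix.
A = rng.standard_normal((5, 5))
x = rng.standard_normal((20_000, 5)) @ A.T + 1.0     # samples of a 5-D random vector

P = np.cov(x, rowvar=False)                  # sample covariance matrix
print("eigenvalues of P:", np.round(np.linalg.eigvalsh(P), 4))   # all positive here

a = rng.standard_normal(5)
print("a^T P a =", a @ P @ a)                # E[(a^T (x - mu))^2] >= 0, so this is non-negative
```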
Geometrical Interpretation
Consider the positive definite quadratic form again:
f = \mathbf{x}^T A \mathbf{x}

When n = 3, in the special choice of principal coordinates this is the ellipsoid
\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1

The gradient is \nabla f = 2A\mathbf{x}.
The geometry of ellipsoids tells us that the principal axes are normal to the surface \Rightarrow proportional to the gradient.
Let the unknown proportionality constant be 2\lambda. Then, for a principal direction \mathbf{x},
2\lambda\,\mathbf{x} = \nabla f = 2A\mathbf{x} \;\Longrightarrow\; A\mathbf{x} = \lambda\,\mathbf{x}
An eigenvalue problem!!! Now we can locate the size of the ellipsoid: along a principal direction, f = \mathbf{x}^T A\mathbf{x} = \lambda\,\mathbf{x}^T\mathbf{x}, so the semi-axis length is \sqrt{f/\lambda}.
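An added numerical sketch of this idea, with an arbitrary positive definite A: the principal axes of xᵀAx = f are the eigenvectors of A and the semi-axis lengths are √(f/λᵢ).

```python
import numpy as np

# An arbitrary symmetric positive definite matrix.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
f = 1.0                                   # level of the quadratic form x^T A x = f

eigvals, eigvecs = np.linalg.eigh(A)      # columns of eigvecs are the principal directions
semi_axes = np.sqrt(f / eigvals)          # semi-axis lengths sqrt(f / lambda_i)

print("eigenvalues      :", eigvals)
print("principal axes   :\n", eigvecs)
print("semi-axis lengths:", semi_axes)

# Check: a point at distance sqrt(f/lambda) along an eigenvector lies on the ellipsoid.
x = semi_axes[0] * eigvecs[:, 0]
print("x^T A x =", x @ A @ x)             # equals f
```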
Multivariate Normal Distribution
Multivariate Gaussian Density Function:
f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\,|P|^{1/2}} \exp\left\{ -\frac{1}{2}\, (\mathbf{X} - \boldsymbol{\mu})^T P^{-1} (\mathbf{X} - \boldsymbol{\mu}) \right\}

How do we find an equal-probability surface?
(\mathbf{X} - \boldsymbol{\mu})^T P^{-1} (\mathbf{X} - \boldsymbol{\mu}) = \text{constant}

Moreover, one is interested in the probability that \mathbf{X} lies inside such a quadratic hyper-surface: for example, what is the probability of lying inside the 1-\sigma ellipsoid?

Transform to principal coordinates: \mathbf{Y} = C(\mathbf{X} - \boldsymbol{\mu}), with
P^{-1} = C^T \Sigma\, C, \qquad \Sigma = \mathrm{diag}\!\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \ldots, \frac{1}{\sigma_n^2}\right)

so that, with z_i = Y_i/\sigma_i,
(\mathbf{X} - \boldsymbol{\mu})^T P^{-1} (\mathbf{X} - \boldsymbol{\mu}) = \sum_i \frac{Y_i^2}{\sigma_i^2} = z_1^2 + z_2^2 + \cdots + z_n^2

and the probability of lying inside the c-\sigma ellipsoid is
P\left[\sum_i z_i^2 \le c^2\right] = \int_V f(\mathbf{z})\,dV.
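An added check (mean, covariance and the evaluation point are arbitrary assumptions) that evaluates the density formula above directly and compares it with scipy's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
P = np.array([[2.0, 0.8],
              [0.8, 1.0]])                    # an assumed covariance matrix
X = np.array([0.5, 0.2])                      # an arbitrary evaluation point

n = len(mu)
d = X - mu
quad = d @ np.linalg.inv(P) @ d               # (X - mu)^T P^{-1} (X - mu)
f = np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(P)))

print("formula  :", f)
print("scipy pdf:", multivariate_normal(mean=mu, cov=P).pdf(X))   # matches
```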
Illustration of Expectation
A lottery has two schemes. The first scheme has two outcomes (denoted by 1 and 2) and the second has three (denoted by 1, 2 and 3). It is agreed that the participant in the first scheme gets $1 if the outcome is 1 and $2 if the outcome is 2. The participant in the second scheme gets $3 if the outcome is 1, -$2 if the outcome is 2 and $3 if the outcome is 3. The joint probabilities of the outcomes are listed as follows.
p(1, 1) = 0.1; p(1, 2) = 0.2; p(1, 3) = 0.3
p(2, 1) = 0.2; p(2, 2) = 0.1; p(2, 3) = 0.1
Help the investor decide which scheme to prefer. [Bryson]
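A worked computation (added as a sketch, reading the table as p(scheme-1 outcome, scheme-2 outcome)) using the marginals of the joint probabilities above.

```python
# Joint probabilities p[(i, j)]: i = outcome of scheme 1, j = outcome of scheme 2.
p = {(1, 1): 0.1, (1, 2): 0.2, (1, 3): 0.3,
     (2, 1): 0.2, (2, 2): 0.1, (2, 3): 0.1}

payoff1 = {1: 1.0, 2: 2.0}              # scheme 1 payoffs ($)
payoff2 = {1: 3.0, 2: -2.0, 3: 3.0}     # scheme 2 payoffs ($)

# Marginal probabilities of each scheme's outcomes.
p1 = {i: sum(v for (a, b), v in p.items() if a == i) for i in payoff1}
p2 = {j: sum(v for (a, b), v in p.items() if b == j) for j in payoff2}

E1 = sum(payoff1[i] * p1[i] for i in payoff1)
E2 = sum(payoff2[j] * p2[j] for j in payoff2)
V1 = sum((payoff1[i] - E1) ** 2 * p1[i] for i in payoff1)
V2 = sum((payoff2[j] - E2) ** 2 * p2[j] for j in payoff2)

print(f"Scheme 1: E = ${E1:.2f}, Var = {V1:.2f}")   # E = $1.40
print(f"Scheme 2: E = ${E2:.2f}, Var = {V2:.2f}")   # E = $1.50, but with a much larger spread
```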
Constant Probability Surfaces: 2D
OBSERVATIONS
Example Problem.
• What does it tell us?
• How do we draw these?
• Importance of the shape of the ellipses.
• In n dimensions they are hyper-ellipsoids.
The P matrix was found to be
P = \begin{bmatrix} 2.2 & 2.2 \\ 2.2 & 7.5 \end{bmatrix}
Surfaces Plotted for the Problem Considered In The Previous Lecture
Clearly, the dime of extra expected gain comes with a lot of uncertainty.
The investor had better not take the risk!!!
Multivariate Normal Distribution
• Y_i represents the coordinates in the Cartesian principal-axis system, and \sigma_i^2 is the variance along the i-th principal axis.
• The probability of lying inside the 1\sigma, 2\sigma or 3\sigma ellipsoid decreases as the dimensionality increases.
Curse of Dimensionality:

n \ c     1       2       3
  1     0.683   0.955   0.997
  2     0.394   0.865   0.989
  3     0.200   0.739   0.971
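These entries are values of the chi-square CDF, P(χ²_n ≤ c²), since the zᵢ are independent standard normals; the added sketch below reproduces the table with scipy.

```python
from scipy.stats import chi2

# P(inside the c-sigma ellipsoid in n dimensions) = P(chi-square with n d.o.f. <= c^2)
for n in (1, 2, 3):
    row = [chi2.cdf(c**2, df=n) for c in (1, 2, 3)]
    print(n, [f"{p:.3f}" for p in row])
# Prints 0.683/0.955/0.997, 0.394/0.865/0.989, 0.199/0.739/0.971, matching the table up to rounding.
```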
Functions of Random Variables.
Let Y be a function of the random variable X.
Y = g(X)

• We are interested in deriving the pdf and cdf of Y in terms of those of X.

f_Y(y) = \sum_i \frac{f_X(x_i)}{|J(x_i)|}

where the x_i are the solutions of the algebraic mapping y = g(x), and J(x) is the Jacobian, defined as
J(x) = \frac{\partial g}{\partial x}
• This property can be used to derive the important result that,
“A linear mapping of jointly gaussian random variables is still
jointly gaussian” (2D demonstration follows …)
Example:
Let y = ax^2 and p(x) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma_x^2}\right).

NOTE: for each value of y > 0 there are two values of x, namely x = \pm\sqrt{y/a}.

p(y) = \frac{1}{\sigma_x\sqrt{2\pi a y}} \exp\!\left(-\frac{y}{2a\sigma_x^2}\right), \quad y > 0,
and p(y) = 0 otherwise.

We can also show that E(y) = a\sigma_x^2 and V(y) = 2a^2\sigma_x^4.
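An added Monte Carlo check of this example (a and σ_x are arbitrary choices), comparing the simulated moments of Y = aX² with aσ_x² and 2a²σ_x⁴.

```python
import numpy as np

rng = np.random.default_rng(4)

a, sigma_x = 2.0, 1.5                       # arbitrary parameters for the check
x = rng.normal(0.0, sigma_x, size=1_000_000)
y = a * x**2

print("E(Y) simulated:", y.mean(), "  analytic:", a * sigma_x**2)        # 4.5
print("V(Y) simulated:", y.var(),  "  analytic:", 2 * a**2 * sigma_x**4)  # 40.5
# Both pairs agree up to Monte Carlo error, confirming E(Y) = a*sigma^2 and V(Y) = 2*a^2*sigma^4.
```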
A linear mapping of jointly Gaussian random variables is still jointly Gaussian: a simple demonstration.

Let Z = aX + bY and W = cX + dY. Inverting the mapping gives x = a_1 z + b_1 w and y = c_1 z + d_1 w, and

f_{ZW}(z, w) = \frac{f_{XY}(a_1 z + b_1 w,\; c_1 z + d_1 w)}{|ad - bc|}

Since f_{XY} is the exponential of a quadratic form in (x, y), substituting the linear expressions above keeps the exponent a quadratic form in (z, w); hence Z and W are again jointly Gaussian.
Covariance Matrix
The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e. to "covary".
Properties of the covariance matrix:
• The covariance matrix is square.
• The covariance matrix is positive semi-definite, i.e. x^T P x \ge 0 (and positive definite unless some linear combination of the components has zero variance).
• The covariance matrix is symmetric, i.e. P = P^T.
• If x_i and x_j tend to increase together, then P_ij > 0.
• If x_i and x_j are uncorrelated, then P_ij = 0.
Independent Variables
Recall that two random variables are said to be independent if knowing the value of one tells you nothing about the other variable.
• The joint probability density function is the product of the marginal probability density functions.
• Cov(X, Y) = 0 if X and Y are independent.
• E(XY) = E(X)E(Y).
Two variables are said to be uncorrelated if Cov(X, Y) = 0.
• Independent variables are uncorrelated, but the converse is not true in general.
• The converse does hold when the variables are jointly Gaussian, as the factorization below shows.

f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{ -\frac{1}{2(1-r^2)} \left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - \frac{2r(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\}

r = 0 \;\Longrightarrow\; f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left\{ -\frac{1}{2}\left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\} = f_X(x)\, f_Y(y)
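A standard counterexample for the converse (added as a sketch): with X standard normal and Y = X², the pair is clearly dependent, yet Cov(X, Y) = E(X³) = 0.

```python
import numpy as np

rng = np.random.default_rng(5)

x = rng.standard_normal(1_000_000)
y = x**2                                    # completely determined by x, hence dependent

print("Cov(X, Y) ~", np.cov(x, y)[0, 1])    # close to 0: uncorrelated
print("E(XY) ~", np.mean(x * y), "  E(X)E(Y) ~", x.mean() * y.mean())
# Uncorrelated, yet knowing X pins down Y exactly; uncorrelated does not imply independent.
```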
Conditioning in Random Vectors
Conditional marginal density and distribution functions are given as:

f_Y(y \mid X = x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{f_{XY}(x, y)}{\int_{-\infty}^{\infty} f_{XY}(x, y)\,dy}

F_Y(y \mid X = x) = \frac{\int_{-\infty}^{y} f_{XY}(x, y')\,dy'}{\int_{-\infty}^{\infty} f_{XY}(x, y)\,dy}

and similarly for X given Y.

Useful Tricks:
• Left removal:
f(x_1 \mid x_3) = \int_{-\infty}^{\infty} f(x_1, x_2 \mid x_3)\,dx_2
• Right removal:
f(x_1 \mid x_4) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x_1 \mid x_2, x_3, x_4)\, f(x_2, x_3 \mid x_4)\,dx_2\,dx_3

Conditional Expectation:
E(g(Y) \mid X = x) = \int_{-\infty}^{\infty} g(y)\, f_Y(y \mid X = x)\,dy = \frac{\int g(y)\, f_{XY}(x, y)\,dy}{\int f_{XY}(x, y)\,dy}

Special case:
E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_Y(y \mid X = x)\,dy = \frac{\int y\, f_{XY}(x, y)\,dy}{\int f_{XY}(x, y)\,dy}

Note that E[Y \mid X = x] \in \mathbb{R} is a number, whereas E\{Y \mid X\} is a random variable.
This is the locus of the centres of mass of the marginal density of Y along X.
[Figure: conditional densities of Y at different values of X, with E\{Y \mid X\} tracing their centres of mass in the X-Y plane.]
Useful Result:
E\{ E\{Y \mid X\} \} = E\{Y\}

Theorem:
The function g(X) that minimizes E\{ (Y - g(X))^2 \} is the conditional expected value of Y given X = x:
g(x) = E\{Y \mid X = x\}

More generally,
E\{ E\{ g(X, Y) \mid X \} \} = E\{ g(X, Y) \}
E\{ E\{ g(X) g(Y) \mid X \} \} = E\{ g(X)\, E\{ g(Y) \mid X \} \}   (1)
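A small added illustration of the tower rule E{E{Y|X}} = E{Y}; the joint model (uniform X, and Y normal with conditional mean 2X) is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(6)

n = 1_000_000
x = rng.uniform(0.0, 1.0, size=n)
y = rng.normal(loc=2.0 * x, scale=0.5)      # E{Y | X} = 2X, itself a random variable

e_y_given_x = 2.0 * x                        # the conditional expectation, evaluated per sample
print("E{ E{Y|X} } ~", e_y_given_x.mean())   # about 2 * E{X} = 1.0
print("E{Y}        ~", y.mean())             # about 1.0 as well; the two agree
```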
Orthogonality and Least Squares
Theorem:
The constant a that minimizes the mean-square error E\{ (Y - aX)^2 \} is such that the error Y - aX is orthogonal to X:
E\{ (Y - aX)\, X \} = 0, \quad\text{i.e.}\quad a = \frac{E\{XY\}}{E\{X^2\}}

[Figure: Y decomposed into its projection aX along X and the orthogonal residual Y - aX.]

How would we measure angles in the probability space then? This is precisely what correlation does: the angle \theta between X and Y satisfies

\cos\theta = r = \frac{E\{XY\}}{\sqrt{E\{X^2\}\,E\{Y^2\}}}
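An added check of the orthogonality condition with an arbitrarily chosen correlated pair: the minimizing constant is a = E{XY}/E{X²}, and the residual Y − aX is numerically uncorrelated with X.

```python
import numpy as np

rng = np.random.default_rng(7)

# An arbitrary zero-mean correlated pair for illustration.
x, y = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 2.0]], size=500_000).T

a = np.mean(x * y) / np.mean(x**2)           # minimizer of E{(Y - aX)^2}
residual = y - a * x

print("a =", a)                              # about 0.6, i.e. E{XY}/E{X^2}
print("E{(Y - aX) X} ~", np.mean(residual * x))     # about 0: residual orthogonal to X
r = np.mean(x * y) / np.sqrt(np.mean(x**2) * np.mean(y**2))
print("correlation r ~", r, "  angle ~", np.degrees(np.arccos(r)), "deg")
```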
Correlation Coefficient
For a 2D space, consider the example of the 2D jointly Gaussian problem …