Review of Probability, Random Process, Random Field
for Image Processing
© 2002-2003 by Yu Hen Hu
ECE533 Digital Image Processing

Probability Models

Experiment:
» Throw a die, toss a coin, …
» Each experiment has an outcome
» Experiments can be repeated.

Sample space (Ω)
» The set of all possible outcomes of an experiment

Event
» A subset of outcomes in Ω that has a particular meaning.

Probability of an event A (equally likely outcomes)
» P(A) = |A|/|Ω|
» |A|: cardinality of A, # of elements in set A.

Example: Card draw
» A single card is drawn from a well-shuffled deck of playing cards. Find P(drawing an Ace).
» Ω = {1, 2, …, 52}, |Ω| = 52.
» Event A = drawing an Ace. Assume the four Ace cards are labeled 1, 2, 3, 4; then event A = {1, 2, 3, 4}, |A| = 4.
» Thus, P(A) = 4/52 = 1/13
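The counting rule P(A) = |A|/|Ω| from the card-draw example can be checked with a short sketch. The helper name `prob` is hypothetical, and the labeling of the Aces as cards 1 to 4 follows the convention assumed on the slide.

```python
from fractions import Fraction

# Sample space: cards labeled 1..52; the Aces are labeled 1..4 (slide's assumption).
omega = set(range(1, 53))
event_ace = {1, 2, 3, 4}

def prob(event, sample_space):
    """P(A) = |A| / |Omega|, valid only when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_ace = prob(event_ace, omega)
print(p_ace)  # 1/13
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand-computed 4/52 = 1/13.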

Axioms of a Probability Model

Each outcome ωi of an experiment can be assigned a probability measure P(ωi) such that 0 ≤ P(ωi) ≤ 1.

For fair experiments where each outcome is equally likely to occur, P(ωi) = 1/|Ω|.

In general, the probability of an event, which is a set of outcomes, is evaluated as:
$$P(A) = \sum_{\omega_i \in A} P(\omega_i)$$

Given a set A, its corresponding probability measure P(A) has the following properties:
1. P(∅) = 0.
   • The empty set is an impossible event.
2. P(A) ≥ 0 for every event A.
3. If Am ∩ An = ∅ for m ≠ n, then
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$$
4. P(Ω) = 1.
   • The probability of the entire sample space = unity.

Independence

Question:
Given a fair coin and a well-shuffled deck of cards, what is the probability of tossing the coin and observing a Head AND drawing the Jack of hearts?

Answer:
P(Head) = ½
P(Jack of hearts) = 1/52.
The events of tossing a coin and drawing a card are independent. Hence
P(Head AND Jack of hearts) = P(Head)P(Jack of hearts) = 1/104.

Independence
Two events A and B are statistically independent if
P(A ∩ B) = P(A)P(B)

Independence of N events
Given N events {An; 1 ≤ n ≤ N}, we say these N events are mutually independent iff
$$P\left(\bigcap_{j \in J} A_j\right) = \prod_{j \in J} P(A_j)$$
where J ⊆ {1, 2, …, N} is any subset of the indices.
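As an illustration of the product rule, the sketch below enumerates the joint sample space of the coin toss and the card draw and confirms that P(Head AND Jack of hearts) factors. Labeling the Jack of hearts as card 1 and the helper `prob` are assumptions made only for this example.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
deck = list(range(1, 53))            # card 1 stands for the Jack of hearts (assumed label)
omega = list(product(coin, deck))    # 104 equally likely joint outcomes

def prob(predicate):
    """Probability of the event {w in omega : predicate(w)} by counting."""
    return Fraction(sum(1 for w in omega if predicate(w)), len(omega))

p_head = prob(lambda w: w[0] == "H")
p_jack = prob(lambda w: w[1] == 1)
p_both = prob(lambda w: w[0] == "H" and w[1] == 1)

print(p_head, p_jack, p_both)  # 1/2 1/52 1/104
```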

Conditional Probability

Let A and B be two events in the same sample space Ω. Given that B has occurred, the conditional probability that A will also occur is defined as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
assuming P(B) ≠ 0.

Theorem.
If A and B are independent events, then
P(A|B) = P(A)P(B)/P(B) = P(A)

Example
A perfect die is thrown twice. Given that the sum of the two outcomes is 9, what is the probability that the outcome of the first throw is 4?

Answer
Let the outcome of the first throw be m and that of the second throw be n. Then
B = {(m,n); m+n = 9, 1 ≤ m,n ≤ 6} = {(3,6),(4,5),(5,4),(6,3)}
A ∩ B = {(m,n); m = 4, n = 5, 1 ≤ m,n ≤ 6} = {(4,5)}
P(A|B) = P(A ∩ B)/P(B) = (1/36)/(4/36) = ¼
Note that P(A) = 1/6.
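The two-dice example can be reproduced by brute-force enumeration. This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (m, n) of two throws of a fair die.
omega = list(product(range(1, 7), repeat=2))

B = [w for w in omega if w[0] + w[1] == 9]   # sum of the two throws is 9
A_and_B = [w for w in B if w[0] == 4]        # ... and the first throw is 4

p_B = Fraction(len(B), len(omega))
p_A_and_B = Fraction(len(A_and_B), len(omega))
p_A_given_B = p_A_and_B / p_B

print(B)            # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(p_A_given_B)  # 1/4
```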

Law of Total Probability & Bayes' Rule

Law of total probability
Let {Bn} be a set of events that partitions the sample space Ω:
$$\bigcup_n B_n = \Omega \quad \text{and} \quad B_m \cap B_n = \emptyset \ \text{for } m \neq n$$
Then for any event A ⊆ Ω,
$$A = A \cap \Omega = A \cap \Big(\bigcup_n B_n\Big) = \bigcup_n (A \cap B_n)$$
Thus,
$$P(A) = P\Big(\bigcup_n (A \cap B_n)\Big) = \sum_n P(A \cap B_n) = \sum_n P(A \mid B_n) P(B_n)$$

Bayes' Rule
$$P(B_n \mid A) = \frac{P(B_n \cap A)}{P(A)} = \frac{P(A \cap B_n)}{P(A)}$$
Thus,
$$P(B_n \mid A) = \frac{P(A \mid B_n) P(B_n)}{\sum_m P(A \mid B_m) P(B_m)}$$
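A small numeric sketch may help connect the two formulas. The priors P(Bn) and likelihoods P(A|Bn) below are made-up values for a three-event partition, not data from the slides.

```python
from fractions import Fraction

# Hypothetical partition {B1, B2, B3}: priors P(Bn) and likelihoods P(A | Bn).
prior = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihood = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

# Law of total probability: P(A) = sum_n P(A | Bn) P(Bn).
p_A = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' rule: P(Bn | A) = P(A | Bn) P(Bn) / P(A).
posterior = [l * p / p_A for l, p in zip(likelihood, prior)]

print(p_A)        # 21/100
print(posterior)  # [Fraction(5, 21), Fraction(2, 7), Fraction(10, 21)]
```

The posteriors sum to 1, as they must, because the denominator is exactly the total probability P(A).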

Random Variable

A random variable X(ω) is a real-valued function defined for points ω in a sample space Ω.

Example: If Ω is the whole class of students, and ω is an individual student, we may define X(ω) as the height of an individual student (in feet).

Question: What is the probability that a student's height is between 5 feet and 6 feet?

Define B = [5, 6]. Our goal is to find the probability
P({ω : 5 ≤ X(ω) ≤ 6}) = P({ω : X(ω) ∈ B}) = P({X ∈ B})

In general, we are interested in the probability P({X ∈ B}) or, for convenience, P(X ∈ B).
» If B = {xo} is a singleton set, we may simply write P(X = xo).

Example 2.1
P(a < X ≤ b) = P(X ≤ b) - P(X ≤ a)

Example 2.2
P(X = 0 or X = 1) = P(X = 0) + P(X = 1)

Probability Mass Functions (PMF) and Expectations

Probability mass function (PMF) is defined on a discrete random variable X by pX(xi) = P(X = xi). Hence,
$$P(X \in B) = \sum_i I_B(x_i) P(X = x_i) = \sum_i I_B(x_i)\, p_X(x_i)$$
where I_B is the indicator function of the set B.

Joint PMF of X and Y:
$$p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j) = P(\{X = x_i\} \cap \{Y = y_j\})$$

Marginal PMF:
$$p_X(x_i) = \sum_j p_{XY}(x_i, y_j), \qquad p_Y(y_j) = \sum_i p_{XY}(x_i, y_j)$$

Expectations (mean, average):
$$E[X] = \sum_i x_i P(X = x_i) = \sum_i x_i\, p_X(x_i)$$
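The marginal and expectation sums can be sketched on a tiny table. The joint PMF values below are hypothetical and chosen only so that they sum to 1; `marginal` and `expectation` are illustrative helper names.

```python
from fractions import Fraction

# Hypothetical joint PMF p_XY(x, y); the four values are illustrative only.
p_xy = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def marginal(joint, axis):
    """Sum the joint PMF over the other variable (axis 0 gives p_X, axis 1 gives p_Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], Fraction(0)) + p
    return out

def expectation(pmf):
    """E[X] = sum_i x_i p_X(x_i)."""
    return sum(x * p for x, p in pmf.items())

p_x = marginal(p_xy, 0)
p_y = marginal(p_xy, 1)
mean_x = expectation(p_x)
mean_y = expectation(p_y)
```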

Moments and Standard Deviation

n-th moment: E[X^n]
Defined over a real-valued random variable X.

Variance: var(X) (the standard deviation is its square root)
Let m = E[X], then
Var[X] = E[(X - m)²]
= E[X² - 2Xm + m²]
= E[X²] - 2mE[X] + m²
= E[X²] - m²
= E[X²] - (E[X])²

Example Find E[X²] and var(X) of a Bernoulli r.v. X with P(X = 1) = p:
E[X²] = 0²(1 - p) + 1²p = p
Since E[X] = p,
Var(X) = E[X²] - (E[X])² = p - p² = p(1 - p)

Example Let X ~ Poisson(λ). Since E[X(X - 1)] = λ², we have E[X²] = λ² + λ. Thus,
var(X) = (λ² + λ) - λ² = λ.
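Both variance results can be checked numerically from the identity Var(X) = E[X²] - (E[X])². In this sketch the parameter values p = 3/10 and λ = 2.5 are arbitrary choices, and the Poisson PMF is truncated at 100 terms, which is an approximation.

```python
import math
from fractions import Fraction

def moment(pmf, n):
    """n-th moment E[X^n] of a discrete PMF given as {value: probability}."""
    return sum((x ** n) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    return moment(pmf, 2) - moment(pmf, 1) ** 2

# Bernoulli(p), exact arithmetic; p is an arbitrary illustrative value.
p = Fraction(3, 10)
bernoulli = {0: 1 - p, 1: p}
var_bernoulli = variance(bernoulli)      # equals p(1 - p)

# Poisson(lam), truncated to the first 100 terms (the tail is negligible here).
lam = 2.5
poisson = {}
term = math.exp(-lam)                    # P(X = 0)
for k in range(100):
    poisson[k] = term
    term *= lam / (k + 1)                # P(X = k + 1) from P(X = k)
var_poisson = variance(poisson)          # approximately lam
```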

Conditional Probability

The conditional probability is defined as follows:
$$P(X \in B \mid Y \in C) = P(\{X \in B\} \mid \{Y \in C\}) = \frac{P(\{X \in B\} \cap \{Y \in C\})}{P(\{Y \in C\})} = \frac{P(X \in B,\, Y \in C)}{P(Y \in C)}$$
In terms of pmf, we have
$$p_{X|Y}(x_i \mid y_j) = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)}$$

Example Let X = message to be sent (an integer). For X = i, light intensity λi is directed at a photo detector. Y ~ Poisson(λi) = # of photoelectrons generated at the detector.

Solution: for n = 0, 1, 2, …
$$P(Y = n \mid X = i) = \frac{\lambda_i^{n} e^{-\lambda_i}}{n!}$$
Thus, P(Y < 2 | X = i) = P(Y = 0 | X = i) + P(Y = 1 | X = i) = (1 + λi) exp(-λi)
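The closed form for P(Y < 2 | X = i) can be compared against a direct sum of the Poisson PMF. The intensity value `lam_i = 1.7` is a made-up number used only for illustration.

```python
import math

def poisson_pmf(n, lam):
    """P(Y = n) for Y ~ Poisson(lam)."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

lam_i = 1.7  # hypothetical light intensity for message X = i

# Direct sum of the two terms versus the closed form (1 + lam_i) exp(-lam_i).
p_less_than_2 = poisson_pmf(0, lam_i) + poisson_pmf(1, lam_i)
closed_form = (1 + lam_i) * math.exp(-lam_i)
```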

Definitions of Continuous R.V.s

Definition: Continuous R.V.
Let X(ω) be a random variable defined on a sample space Ω. X is a continuous random variable if
$$P(X \in B) = \int_B f(x)\,dx = \int_{-\infty}^{\infty} I_B(x) f(x)\,dx$$

Definition: probability density function (pdf)
f(x) is a probability density function if
1. f(x) ≥ 0
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Cumulative Distribution Function

Definition: The cumulative distribution function (cdf) of a random variable X is defined by
FX(x) = P(X ≤ x)

If X is a continuous random variable,
$$F_X(x) = \int_{-\infty}^{x} f(v)\,dv, \qquad f(x) = \frac{dF_X(x)}{dx}$$

Properties of CDFs
(a) 0 ≤ F(x) ≤ 1.
(b) F(b) - F(a) = P(a < X ≤ b)
(c) a < b implies F(a) ≤ F(b)
(d) F(x) → 1 as x → ∞
(e) F(x) → 0 as x → -∞
(f) F(x) is right continuous, i.e.
$$F(x_0^{+}) = \lim_{x \downarrow x_0} F(x) = P(X \le x_0) = F(x_0)$$
(g) The left limit satisfies
$$F(x_0^{-}) = \lim_{x \uparrow x_0} F(x) = P(X < x_0)$$
(h) P(X = x0) = F(x0) - F(x0-)

Note that if F(x) is continuous at x0, F(x0+) = F(x0-) = F(x0). From (h), P(X = x0) = 0!

Functions of Random Variables

Let X be a random variable, and g(x) a real-valued function. Y = g(X) is a new random variable. We want to find P(Y ∈ C) in terms of FX(x). For this, we must find the set B
B = {x ∈ R; g(x) ∈ C}
such that
P(Y ∈ C) = P(g(X) ∈ C) = P(X ∈ B)
To find FY(y), C = (-∞, y], or equivalently,
B = {x ∈ R; g(x) ≤ y}

Example X: input voltage, a random variable. Y = g(X) = aX + b, where a ≠ 0 is the gain and b is the offset voltage.

Solution: g(x) ≤ y iff x ≤ (y-b)/a for a > 0, and x ≥ (y-b)/a for a < 0. For a continuous X:
a > 0: FY(y) = FX((y-b)/a),
fY(y) = dFY/dy = (1/a) fX((y-b)/a)
a < 0: FY(y) = 1 - FX((y-b)/a),
fY(y) = dFY/dy = (-1/a) fX((y-b)/a)
In summary,
fY(y) = (1/|a|) fX((y-b)/a)
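The summary formula fY(y) = (1/|a|) fX((y-b)/a) can be sketched for one concrete case. Taking X to be standard normal is an assumption made for this example (the slides do not fix a distribution); in that case Y = aX + b is known to be normal with mean b and standard deviation |a|, which gives something to compare against. The gain and offset values are arbitrary.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_of_affine(f_x, a, b):
    """Return f_Y for Y = a X + b using f_Y(y) = f_X((y - b) / a) / |a|."""
    if a == 0:
        raise ValueError("gain a must be nonzero")
    return lambda y: f_x((y - b) / a) / abs(a)

a, b = -2.0, 1.0                       # hypothetical gain and offset
f_y = pdf_of_affine(normal_pdf, a, b)  # X ~ N(0, 1) is an assumed input model

# Compare with the known density of N(b, |a|) at a few points.
for y in (-3.0, 0.0, 1.0, 4.0):
    print(y, f_y(y), normal_pdf(y, mu=b, sigma=abs(a)))
```

The negative gain exercises the a < 0 branch, where the absolute value in 1/|a| keeps the density non-negative.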

Random Processes and Random Fields

Random Process: A family of random variables Xt(ω).
For each fixed outcome ω, Xt(ω) is a function of t (time).
For fixed index t, Xt(ω) is a random variable defined on Ω.

Example a. A jukebox has 6 songs. You roll a die and pick a song based on its outcome.

Example b. Let t ∈ {0, 1, 2, …}. At each t, toss a coin. Xt(ω) = 0 if the outcome is tail, = 1 if the outcome is head.

Random Field: A random field is a random process that is defined on 2D space rather than on 1D (time).
For a monochrome image, the intensity of each pixel f(x,y) = Xx,y(ω) is modeled as a random variable. For a particular outcome ωi, f(x, y) is a deterministic function of x and y. All results applicable to random processes can be applied to random fields.

Mean, Correlation and Covariance

Mean
If Xt is a random process, its mean function is
$$m_X(t) = E[X_t]$$
where the expectation is taken w.r.t. the pmf or pdf of Xt at time t.

Correlation
$$R_X(t_1, t_2) = E[X_{t_1} X_{t_2}]$$

Covariance
$$C_X(t_1, t_2) = E[(X_{t_1} - m_X(t_1))(X_{t_2} - m_X(t_2))]$$

Example a. Denote by si(t) the time function of the i-th song. Since a single die roll selects the song for all t,
$$m_X(t) = \sum_{i=1}^{6} \frac{1}{6}\, s_i(t)$$
$$R_X(u, v) = \frac{1}{6} \sum_{i=1}^{6} s_i(u)\, s_i(v)$$
Example d. Given that X0 = 5.
Hence P(X1 = 4) = 1, mX(1) = 4.
P(X2 = 3) = (4/5)(4/5) = 16/25,
P(X2 = 4) = (4/5)(1/5) + (1/5)(4/5)
= 8/25, P(X2 = 5) = 1/25. Thus,
mX(2) = 3(16/25)+4(8/25)+5(1/25)
= 17/5
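The arithmetic in Example d can be verified directly from the PMF of X2 stated above. This sketch only re-computes the numbers on the slide; it does not model the underlying process, whose definition is not given here.

```python
from fractions import Fraction

# PMF of X_2 as stated in Example d.
pmf_x2 = {
    3: Fraction(4, 5) * Fraction(4, 5),
    4: Fraction(4, 5) * Fraction(1, 5) + Fraction(1, 5) * Fraction(4, 5),
    5: Fraction(1, 25),
}

total = sum(pmf_x2.values())                    # should be 1 for a valid PMF
mean_x2 = sum(x * p for x, p in pmf_x2.items())  # m_X(2)

print(total, mean_x2)  # 1 17/5
```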

Stationary Process and WSS

Any property that depends on the value of {Xt} at k index points t1, t2, …, tk is completely characterized by the joint pdf (or pmf) of Xt1, Xt2, …, Xtk, denoted (pdf case) by f(X(t1), …, X(tk)).

Definition Stationary Process
{Xt} is (strictly) stationary if for any finite set of time points {t1, t2, …, tk}, their joint pdf is time invariant:
f(X(t1), …, X(tk)) = f(X(t1+τ), …, X(tk+τ))

Definition Wide-sense stationary
{Xt} is wide-sense stationary if its first two moments are independent of time. That is,
mX(t) = E[Xt] = mX
RX(u,v) = RX(u-v)
Let u = t + τ, v = t; we may write
RX(u,v) = RX((t + τ) - t) = RX(τ)

Power Spectral Density and Power

Definition Power Spectral Density
$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j 2\pi f \tau}\, d\tau \ \ge 0$$
$$R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\, e^{j 2\pi f \tau}\, df$$
SX(f) gives the power density of the process Xt distributed over each frequency f. Hence it must be non-negative.

Definition Power PX
$$P_X = E[|X_t|^2] = R_X(0) = \int_{f=-\infty}^{\infty} S_X(f)\, df$$

Properties of PSD and Correlation function:
a) R(-τ) = R(τ). Hence SX(f) is a real-valued, even function.
b) |R(τ)| ≤ R(0).
To prove, use the Cauchy-Schwarz inequality:
E[UV]² ≤ E[U²]E[V²]
c) SX(f) is real, even and nonnegative.
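To make the transform pair concrete, the sketch below numerically integrates an assumed autocorrelation RX(τ) = exp(-|τ|) and compares the result with its known closed-form transform 2/(1 + (2πf)²). The choice of RX, the truncation to |τ| ≤ 40, and the step size are all assumptions of this example. Because RX is even, only the cosine part of the exponential contributes.

```python
import math

def autocorr(tau):
    """Assumed example autocorrelation R_X(tau) = exp(-|tau|)."""
    return math.exp(-abs(tau))

def psd(f, half_width=40.0, step=1e-3):
    """Trapezoidal approximation of S_X(f) = integral of R_X(tau) cos(2 pi f tau) dtau."""
    n = int(round(half_width / step))
    total = 0.0
    for k in range(-n, n + 1):
        tau = k * step
        weight = 0.5 if abs(k) == n else 1.0   # half weight at the two end points
        total += weight * autocorr(tau) * math.cos(2.0 * math.pi * f * tau)
    return total * step

def psd_closed_form(f):
    """Known Fourier transform of exp(-|tau|)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

for f in (0.0, 0.1, 0.5):
    print(f, psd(f), psd_closed_form(f))
```

The values are real and non-negative, and the value at f = 0 equals the area under RX, in line with properties (a) to (c).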

LTI System: A brief review

A system y(t) = L[x(t)] is a mapping of a function x(t) to a function y(t).

L[·] is a linear system iff
L[ax1 + bx2] = a L[x1] + b L[x2]

L[·] is time invariant iff
L[x(t+u)] = y(t+u)

An LTI (linear, time invariant) system can be uniquely characterized by its impulse response h(t) = L[δ(t)].

Given an LTI system y(t) = L[x(t)], y(t) can be obtained via the convolution between x(t) and the impulse response h(t):
$$y(t) = \int_{u=-\infty}^{\infty} h(u)\, x(t-u)\, du = h(t) * x(t)$$
The Fourier transform of h(t) is called the transfer function:
$$H(f) = \int_{t=-\infty}^{\infty} h(t)\, e^{-j 2\pi f t}\, dt$$
$$Y(f) = H(f) X(f)$$
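A discrete-time analogue of the convolution integral can be sketched in a few lines. Replacing the integral by a finite sum over samples is an assumption of this example (the slides work in continuous time), and the impulse response values are arbitrary.

```python
def convolve(h, x):
    """Discrete convolution y[n] = sum_u h[u] x[n - u] for finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, h_i in enumerate(h):
        for j, x_j in enumerate(x):
            y[i + j] += h_i * x_j
    return y

h = [1.0, 2.0]                       # hypothetical impulse response
x1 = [1.0, 1.0, 1.0]
x2 = [0.0, 3.0, -1.0]

y1 = convolve(h, x1)                 # [1.0, 3.0, 3.0, 2.0]
impulse_response = convolve(h, [1.0])  # feeding a unit impulse returns h itself

# Linearity: L[a x1 + b x2] = a L[x1] + b L[x2]
a, b = 2.0, -3.0
lhs = convolve(h, [a * u + b * v for u, v in zip(x1, x2)])
rhs = [a * u + b * v for u, v in zip(convolve(h, x1), convolve(h, x2))]
```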

LTI System with WSS input

Let a WSS process Xt be the input to an LTI system with impulse response h(t). The output is denoted by Yt.
$$Y_t = \int_{-\infty}^{\infty} h(u)\, X_{t-u}\, du$$

If E[Xt] = m, then
$$E[Y_t] = E\left[\int_{-\infty}^{\infty} h(u)\, X_{t-u}\, du\right] = m \int_{-\infty}^{\infty} h(u)\, du = m\, H(0)$$

Cross correlation between Xt and Ys
$$E[X_t Y_s] = E\left[X_t \int_{u=-\infty}^{\infty} h(u)\, X_{s-u}\, du\right] = \int_{u=-\infty}^{\infty} h(u)\, E[X_t X_{s-u}]\, du = \int_{u=-\infty}^{\infty} h(u)\, R_X(t - s + u)\, du = R_{XY}(t - s)$$

Define
$$R_{XY}(\tau) = \int_{-\infty}^{\infty} h(u)\, R_X(\tau + u)\, du = \int_{-\infty}^{\infty} h(-v)\, R_X(\tau - v)\, dv = h(-\tau) * R_X(\tau)$$

WSS Input to an LTI System (Cont'd)

Define the cross PSD as the Fourier transform of RXY(τ),
$$S_{XY}(f) = \int_{-\infty}^{\infty} R_{XY}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$
then, for real-valued h(t),
SXY(f) = H*(f) SX(f)

Therefore,
$$E[Y_t Y_s] = E\left[\left(\int_{u=-\infty}^{\infty} h(u)\, X_{t-u}\, du\right) Y_s\right] = \int_{u=-\infty}^{\infty} h(u)\, E[X_{t-u} Y_s]\, du = \int_{u=-\infty}^{\infty} h(u)\, R_{XY}(t - s - u)\, du = R_Y(t - s)$$

Substituting τ for t - s, we have
$$R_Y(\tau) = \int_{-\infty}^{\infty} h(u)\, R_{XY}(\tau - u)\, du$$

Taking the Fourier transform,
$$S_Y(f) = H(f)\, S_{XY}(f) = H(f)\{H^{*}(f)\, S_X(f)\} = |H(f)|^2\, S_X(f)$$
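The chain RX → RXY → RY and the result SY(f) = |H(f)|² SX(f) can be checked numerically in a discrete-time setting. Everything concrete here is an assumption of the sketch: the integrals are replaced by sums over integer lags, `h` is a made-up three-tap real impulse response, and `r_x` is a made-up autocorrelation sequence.

```python
import cmath
import math

h = {0: 1.0, 1: 0.5, 2: 0.25}     # hypothetical real FIR impulse response h[u]
r_x = {-1: 0.5, 0: 2.0, 1: 0.5}   # hypothetical input autocorrelation R_X[k]

def r_xy(tau):
    """R_XY[tau] = sum_u h[u] R_X[tau + u]."""
    return sum(h_u * r_x.get(tau + u, 0.0) for u, h_u in h.items())

def r_y(tau):
    """R_Y[tau] = sum_u h[u] R_XY[tau - u]."""
    return sum(h_u * r_xy(tau - u) for u, h_u in h.items())

lags = range(-5, 6)               # covers every lag where R_X, R_XY, R_Y are nonzero

def transform(seq, f):
    """Discrete-time Fourier transform sum_k seq(k) exp(-j 2 pi f k) over `lags`."""
    return sum(seq(k) * cmath.exp(-2j * math.pi * f * k) for k in lags)

def s_x(f):
    return transform(lambda k: r_x.get(k, 0.0), f)

def s_xy(f):
    return transform(r_xy, f)

def s_y(f):
    return transform(r_y, f)

def transfer(f):
    """H(f) = sum_u h[u] exp(-j 2 pi f u)."""
    return sum(h_u * cmath.exp(-2j * math.pi * f * u) for u, h_u in h.items())

for f in (0.0, 0.13, 0.4):
    print(f, s_y(f).real, abs(transfer(f)) ** 2 * s_x(f).real)
```

Working with exact finite sums avoids any estimation noise, so the two printed columns agree up to floating-point rounding.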