Discrete Random Variables: Joint PMFs,
Conditioning and Independence
Berlin Chen
Department of Computer Science & Information Engineering
National Taiwan Normal University
Reference:
- D. P. Bertsekas, J. N. Tsitsiklis, Introduction to Probability, Sections 2.5-2.7
Motivation
• Given an experiment, e.g., a medical diagnosis
– The results of a blood test are modeled as numerical values of a
random variable X
– The results of magnetic resonance imaging (MRI) are also
modeled as numerical values of a random variable Y
We would like to consider probabilities of events involving
simultaneously the numerical values of these two variables and
to investigate their mutual couplings
P(X = x and Y = y) = ?
Probability-Berlin Chen 2
Joint PMF of Random Variables
• Let X and Y be random variables associated with
the same experiment (also the same sample space and
probability law); the joint PMF of X and Y is defined by
p_{X,Y}(x, y) = P({X = x} ∩ {Y = y}) = P(X = x, Y = y)
• If event A is the set of all pairs (x, y) that have a
certain property, then the probability of A can be
calculated by
P((X, Y) ∈ A) = Σ_{(x,y)∈A} p_{X,Y}(x, y)
– Namely, A can be specified in terms of X and Y
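The formula above can be sketched in code: a joint PMF stored as a dictionary, with P((X, Y) ∈ A) obtained by summing over the pairs in A. The numbers below are illustrative, not from the slides.

```python
# Joint PMF as a dict mapping (x, y) -> probability (illustrative values).
p_xy = {(1, 1): 0.1, (1, 2): 0.2, (2, 1): 0.3, (2, 2): 0.4}

def prob_event(p_xy, in_A):
    """P((X, Y) in A) = sum of p_{X,Y}(x, y) over the pairs (x, y) in A."""
    return sum(p for (x, y), p in p_xy.items() if in_A(x, y))

# Event A: the pairs where X equals Y.
p_equal = prob_event(p_xy, lambda x, y: x == y)  # 0.1 + 0.4
```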
Marginal PMFs of Random Variables (1/2)
• The PMFs of random variables X and Y can be
calculated from their joint PMF
p_X(x) = Σ_y p_{X,Y}(x, y)
p_Y(y) = Σ_x p_{X,Y}(x, y)
– p_X(x) and p_Y(y) are often referred to as the marginal PMFs
– The above two equations can be verified by
p_X(x) = P(X = x)
= Σ_y P(X = x, Y = y)
= Σ_y p_{X,Y}(x, y)
Marginal PMFs of Random Variables (2/2)
• Tabular method: when the joint PMF of random
variables X and Y is specified in a two-dimensional
table, the marginal PMF of X or Y at a given value
is obtained by adding the table entries along the
corresponding column or row, respectively
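The tabular method can be sketched as follows, with an illustrative 2x2 table (rows indexed by x, columns by y); marginals are the row sums and column sums.

```python
# Illustrative joint-PMF table; entries are not from the slides.
table = {
    (1, 1): 0.10, (1, 2): 0.15,
    (2, 1): 0.25, (2, 2): 0.50,
}
xs = sorted({x for x, _ in table})
ys = sorted({y for _, y in table})

p_X = {x: sum(table[(x, y)] for y in ys) for x in xs}  # row sums
p_Y = {y: sum(table[(x, y)] for x in xs) for y in ys}  # column sums
```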
Functions of Multiple Random Variables (1/2)
• A function Z = g(X, Y) of the random variables X and Y
defines another random variable. Its PMF can be
calculated from the joint PMF p_{X,Y}
p_Z(z) = Σ_{(x,y): g(x,y)=z} p_{X,Y}(x, y)
• The expectation for a function of several random
variables
E[Z] = E[g(X, Y)] = Σ_x Σ_y g(x, y) p_{X,Y}(x, y)
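A sketch of the PMF of Z = g(X, Y): group the joint-PMF mass over the level sets {(x, y) : g(x, y) = z}. The joint PMF values here are illustrative.

```python
from collections import defaultdict

p_xy = {(1, 1): 0.2, (1, 2): 0.3, (2, 1): 0.1, (2, 2): 0.4}
g = lambda x, y: x + y  # any function of (x, y)

# p_Z(z) = sum of p_{X,Y}(x, y) over the pairs with g(x, y) = z.
p_Z = defaultdict(float)
for (x, y), p in p_xy.items():
    p_Z[g(x, y)] += p

E_Z = sum(z * p for z, p in p_Z.items())
```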
Functions of Multiple Random Variables (2/2)
• If the function of several random variables is linear and
of the form Z = g(X, Y) = aX + bY + c, then
E[Z] = aE[X] + bE[Y] + c
– How can we verify the above equation?
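One way to verify it, using the expectation formula from the previous slide and the marginal PMFs:

```latex
\begin{aligned}
E[aX + bY + c]
  &= \sum_x \sum_y (ax + by + c)\, p_{X,Y}(x,y) \\
  &= a \sum_x x \sum_y p_{X,Y}(x,y)
   + b \sum_y y \sum_x p_{X,Y}(x,y)
   + c \sum_x \sum_y p_{X,Y}(x,y) \\
  &= a \sum_x x\, p_X(x) + b \sum_y y\, p_Y(y) + c \\
  &= a E[X] + b E[Y] + c
\end{aligned}
```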
An Illustrative Example
• Given the random variables X and Y whose joint PMF is
given in the following figure, and a new random variable Z
defined by Z = X + 2Y, calculate E[Z]
– Method 1 (linearity of expectation):
E[X] = 1·(3/20) + 2·(6/20) + 3·(8/20) + 4·(3/20) = 51/20
E[Y] = 1·(3/20) + 2·(7/20) + 3·(7/20) + 4·(3/20) = 50/20
E[Z] = E[X] + 2E[Y] = 51/20 + 2·(50/20) = 151/20 = 7.55
– Method 2 (PMF of Z):
p_Z(z) = Σ_{(x,y): x+2y=z} p_{X,Y}(x, y)
p_Z(3) = 1/20,  p_Z(4) = 1/20,  p_Z(5) = 2/20,  p_Z(6) = 2/20,
p_Z(7) = 4/20,  p_Z(8) = 3/20,  p_Z(9) = 3/20,  p_Z(10) = 2/20,
p_Z(11) = 1/20, p_Z(12) = 1/20
E[Z] = 3·(1/20) + 4·(1/20) + 5·(2/20) + 6·(2/20) + 7·(4/20)
+ 8·(3/20) + 9·(3/20) + 10·(2/20) + 11·(1/20) + 12·(1/20)
= 151/20 = 7.55
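A quick check of Method 2: the PMF of Z = X + 2Y read off the slide, with E[Z] recomputed from it using exact rational arithmetic.

```python
from fractions import Fraction as F

# p_Z as given in the example.
p_Z = {3: F(1, 20), 4: F(1, 20), 5: F(2, 20), 6: F(2, 20),
       7: F(4, 20), 8: F(3, 20), 9: F(3, 20), 10: F(2, 20),
       11: F(1, 20), 12: F(1, 20)}

assert sum(p_Z.values()) == 1              # a valid PMF
E_Z = sum(z * p for z, p in p_Z.items())   # 151/20
```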
More than Two Random Variables (1/2)
• The joint PMF of three random variables X, Y and Z
is defined in analogy with the above as
p_{X,Y,Z}(x, y, z) = P(X = x, Y = y, Z = z)
– The corresponding marginal PMFs
p_{X,Y}(x, y) = Σ_z p_{X,Y,Z}(x, y, z)
and
p_X(x) = Σ_y Σ_z p_{X,Y,Z}(x, y, z)
More than Two Random Variables (2/2)
• The expectation for a function of random variables X,
Y and Z
E[g(X, Y, Z)] = Σ_x Σ_y Σ_z g(x, y, z) p_{X,Y,Z}(x, y, z)
– If the function is linear and has the form
aX + bY + cZ + d, then
E[aX + bY + cZ + d] = aE[X] + bE[Y] + cE[Z] + d
• A generalization to more than three random variables
E[a_1 X_1 + a_2 X_2 + … + a_n X_n]
= a_1 E[X_1] + a_2 E[X_2] + … + a_n E[X_n]
An Illustrative Example
• Example 2.10. Mean of the Binomial. Your probability
class has 300 students and each student has probability
1/3 of getting an A, independently of any other student.
– What is the mean of X, the number of students that get an A?
Let
X_i = 1, if the i-th student gets an A
X_i = 0, otherwise
X_1, X_2, …, X_300 are Bernoulli random variables with common mean p = 1/3.
Their sum X = X_1 + X_2 + … + X_300 can be interpreted as a binomial random
variable with parameters n (n = 300) and p (p = 1/3). That is, X is the number
of successes in n (n = 300) independent trials.
⇒ E[X] = E[X_1 + X_2 + … + X_300] = Σ_{i=1}^{300} E[X_i] = 300 · (1/3) = 100
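A sketch of Example 2.10: linearity of expectation gives the binomial mean n·p, checked here against the mean computed directly from the binomial PMF (exact with rationals).

```python
import math
from fractions import Fraction as F

n, p = 300, F(1, 3)

# Linearity of expectation: E[X] = sum_i E[X_i] = n*p.
E_linear = n * p

# Direct check: E[X] = sum_k k * C(n, k) * p^k * (1-p)^(n-k).
E_direct = sum(k * math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))
```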
Conditioning
• Recall that conditional probability provides us with a way
to reason about the outcome of an experiment, based on
partial information
• In the same spirit, we can define conditional PMFs,
given the occurrence of a certain event or given the
value of another random variable
Conditioning a Random Variable on an Event (1/2)
• The conditional PMF of a random variable X,
conditioned on a particular event A with P(A) > 0, is
defined by (where X and A are associated with the same experiment)
p_{X|A}(x) = P(X = x | A) = P({X = x} ∩ A) / P(A)
• Normalization property
– Note that the events {X = x} ∩ A are disjoint for different
values of x, and their union is A; by the total probability theorem,
P(A) = Σ_x P({X = x} ∩ A)
⇒ Σ_x p_{X|A}(x) = Σ_x P({X = x} ∩ A) / P(A)
= (1/P(A)) Σ_x P({X = x} ∩ A)
= P(A) / P(A) = 1
Conditioning a Random Variable on an Event (2/2)
• A graphical illustration
– p_{X|A}(x) is obtained by adding the probabilities of the outcomes
that give rise to X = x and belong to the conditioning event A
Illustrative Examples (1/2)
• Example 2.12. Let X be the roll of a fair six-sided die
and A be the event that the roll is an even number
p_{X|A}(x) = P(X = x | roll is even)
= P(X = x and X is even) / P(X is even)
= { 1/3, if x = 2, 4, 6
  { 0,   otherwise
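Example 2.12 can be sketched directly from the definition of the conditional PMF:

```python
from fractions import Fraction as F

p_X = {x: F(1, 6) for x in range(1, 7)}  # fair six-sided die
A = {2, 4, 6}                            # the roll is even

P_A = sum(p_X[x] for x in A)             # P(A) = 1/2
p_X_given_A = {x: (p_X[x] / P_A if x in A else F(0)) for x in p_X}
```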
Illustrative Examples (2/2)
• Example 2.13. A student will take a certain test
repeatedly, up to a maximum of n times, each time with
a probability p of passing, independently of the number
of previous attempts.
– What is the PMF of the number of attempts, given that the
student passes the test?
Let X be a geometric random variable with parameter p,
representing the number of attempts until the
first success comes up:
p_X(x) = (1 - p)^{x-1} p
Let A be the event that the student passes the test
within n attempts (A = {X ≤ n}). Then
p_{X|A}(x) = { (1 - p)^{x-1} p / Σ_{m=1}^{n} (1 - p)^{m-1} p, if x = 1, 2, …, n
             { 0, otherwise
(Figure: the unconditional PMF p_X(x) for x = 1, 2, …, and the conditional
PMF p_{X|A}(x), which is nonzero only for x = 1, …, n.)
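A sketch of this conditional PMF: a geometric random variable truncated to {X ≤ n} and renormalized. The values of p and n are illustrative.

```python
from fractions import Fraction as F

p, n = F(1, 4), 5
geom = lambda x: (1 - p) ** (x - 1) * p      # p_X(x)

P_A = sum(geom(m) for m in range(1, n + 1))  # P(X <= n)
p_X_given_A = {x: geom(x) / P_A for x in range(1, n + 1)}
```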
Conditioning a Random Variable on Another (1/2)
• Let X and Y be two random variables associated with
the same experiment. The conditional PMF p_{X|Y} of X
given Y is defined as
p_{X|Y}(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
= p_{X,Y}(x, y) / p_Y(y)
• Normalization property (Y is fixed at some value y)
Σ_x p_{X|Y}(x|y) = 1
• The conditional PMF is often convenient for the
calculation of the joint PMF (multiplication/chain rule)
p_{X,Y}(x, y) = p_Y(y) p_{X|Y}(x|y) = p_X(x) p_{Y|X}(y|x)
Conditioning a Random Variable on Another (2/2)
• The conditional PMF can also be used to calculate the
marginal PMFs
p_X(x) = Σ_y p_{X,Y}(x, y) = Σ_y p_Y(y) p_{X|Y}(x|y)
• Visualization of the conditional PMF p_{X|Y}
p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
= p_{X,Y}(x, y) / Σ_x p_{X,Y}(x, y)
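Computing p_{X|Y}(x|y) amounts to slicing the joint PMF at Y = y and renormalizing by p_Y(y), as in this sketch with an illustrative joint PMF:

```python
from fractions import Fraction as F

p_xy = {(1, 1): F(1, 8), (2, 1): F(3, 8), (1, 2): F(2, 8), (2, 2): F(2, 8)}

def conditional_X_given_Y(p_xy, y):
    """p_{X|Y}(.|y): slice the joint PMF at Y = y, divide by p_Y(y)."""
    p_y = sum(p for (_, yy), p in p_xy.items() if yy == y)
    return {x: p / p_y for (x, yy), p in p_xy.items() if yy == y}

p_X_given_Y1 = conditional_X_given_Y(p_xy, 1)
```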
An Illustrative Example (1/2)
• Example 2.14. Professor May B. Right often has her
facts wrong, and answers each of her students'
questions incorrectly with probability 1/4, independently
of other questions. In each lecture May is asked 0, 1, or
2 questions with equal probability 1/3.
– What is the probability that she gives at least one wrong answer?
Let X be the number of questions asked, and
Y be the number of questions answered wrong.
Given X = n, Y is modeled as binomial:
P(Y = k | X = n) = C(n, k) (1/4)^k (3/4)^{n-k}
P(Y ≥ 1) = P(Y = 1) + P(Y = 2)
= P(X = 1, Y = 1) + P(X = 2, Y = 1) + P(X = 2, Y = 2)
= P(X = 1) P(Y = 1 | X = 1) + P(X = 2) P(Y = 1 | X = 2)
+ P(X = 2) P(Y = 2 | X = 2)
= (1/3)(1/4) + (1/3)·C(2,1)·(1/4)(3/4) + (1/3)(1/4)² = 11/48
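The 11/48 can be checked with the multiplication rule, modeling Y | X = n as binomial(n, 1/4):

```python
import math
from fractions import Fraction as F

q = F(1, 4)                              # probability of a wrong answer
p_X = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}

def p_Y_given_X(k, n):
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

# P(Y >= 1) = sum over n of P(X = n) * P(Y = k | X = n) for k >= 1.
P_at_least_one = sum(p_X[n] * p_Y_given_X(k, n)
                     for n in p_X for k in range(1, n + 1))
```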
An Illustrative Example (2/2)
• Calculation of the joint PMF p_{X,Y}(x, y) in Example 2.14
Conditional Expectation
• Recall that a conditional PMF can be thought of as an
ordinary PMF over a new universe determined by the
conditioning event
• In the same spirit, a conditional expectation is the same
as an ordinary expectation, except that it refers to the
new universe, and all probabilities and PMFs are
replaced by their conditional counterparts
Summary of Facts About Conditional Expectations
• Let X and Y be two random variables associated with
the same experiment
– The conditional expectation of X given an event A
with P(A) > 0 is defined by
E[X | A] = Σ_x x p_{X|A}(x)
• For a function g(X), it is given by
E[g(X) | A] = Σ_x g(x) p_{X|A}(x)
Total Expectation Theorem (1/2)
• The conditional expectation of X given a value y of Y
is defined by
E[X | Y = y] = Σ_x x p_{X|Y}(x|y)
– We have
E[X] = Σ_y p_Y(y) Σ_x x p_{X|Y}(x|y)
• Let A_1, …, A_n be disjoint events that form a partition of the
sample space, and assume that P(A_i) > 0 for all i.
Then,
E[X] = Σ_{i=1}^{n} P(A_i) E[X | A_i]
Total Expectation Theorem (2/2)
• Let A_1, …, A_n be disjoint events that form a partition of an
event B, and assume that P(A_i ∩ B) > 0 for all i. Then,
E[X | B] = Σ_{i=1}^{n} P(A_i | B) E[X | A_i ∩ B]
• Verification of the total expectation theorem
E[X] = Σ_x x p_X(x)
= Σ_x x Σ_y p_{X,Y}(x, y)
= Σ_x x Σ_y p_Y(y) p_{X|Y}(x|y)
= Σ_y p_Y(y) Σ_x x p_{X|Y}(x|y)
= Σ_y p_Y(y) E[X | Y = y]
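A numeric sanity check of the total expectation theorem, E[X] = Σ_y p_Y(y) E[X | Y = y], on an illustrative joint PMF:

```python
from fractions import Fraction as F

p_xy = {(1, 1): F(1, 8), (2, 1): F(3, 8), (1, 2): F(2, 8), (2, 2): F(2, 8)}
ys = {y for _, y in p_xy}

E_X = sum(x * p for (x, _), p in p_xy.items())     # direct E[X]

E_X_total = F(0)
for y in ys:
    p_y = sum(p for (_, yy), p in p_xy.items() if yy == y)       # p_Y(y)
    E_X_given_y = sum(x * p / p_y for (x, yy), p in p_xy.items() if yy == y)
    E_X_total += p_y * E_X_given_y
```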
An Illustrative Example (1/2)
• Example 2.17. Mean and Variance of the Geometric
Random Variable
– A geometric random variable X has PMF
p_X(x) = (1 - p)^{x-1} p, x = 1, 2, …
Let A_1 be the event that X = 1,
A_2 be the event that X > 1
E[X] = P(A_1) E[X | A_1] + P(A_2) E[X | A_2]
where
P(A_1) = p, P(A_2) = 1 - p (why?)
p_{X|A_1}(x) = { 1, if x = 1
               { 0, otherwise
p_{X|A_2}(x) = { (1 - p)^{x-2} p, if x ≥ 2 (why?)
               { 0, otherwise
Note that (see Example 2.13):
p_{X|A}(x) = { (1 - p)^{x-1} p / Σ_{m=1}^{n} (1 - p)^{m-1} p, if x = 1, 2, …, n
             { 0, otherwise
E[X | A_1] = 1 · 1 + Σ_{x=2}^{∞} x · 0 = 1
E[X | A_2] = 1 · 0 + Σ_{x=2}^{∞} x (1 - p)^{x-2} p
= Σ_{x=2}^{∞} ((x - 1) + 1)(1 - p)^{x-2} p
= Σ_{x=1}^{∞} x (1 - p)^{x-1} p + Σ_{x=1}^{∞} (1 - p)^{x-1} p
= E[X] + 1
⇒ E[X] = P(A_1) E[X | A_1] + P(A_2) E[X | A_2]
= p · 1 + (1 - p)(E[X] + 1)
⇒ E[X] = 1/p
An Illustrative Example (2/2)
E[X²] = P(A_1) E[X² | A_1] + P(A_2) E[X² | A_2]
E[X² | A_1] = 1² · 1 + Σ_{x=2}^{∞} x² · 0 = 1
E[X² | A_2] = 1² · 0 + Σ_{x=2}^{∞} x² (1 - p)^{x-2} p
Substituting x² = (x - 1)² + 2(x - 1) + 1:
Σ_{x=2}^{∞} x² (1 - p)^{x-2} p
= Σ_{x=2}^{∞} (x - 1)² (1 - p)^{x-2} p + 2 Σ_{x=2}^{∞} (x - 1)(1 - p)^{x-2} p
+ Σ_{x=2}^{∞} (1 - p)^{x-2} p
= Σ_{x=1}^{∞} x² (1 - p)^{x-1} p + 2 Σ_{x=1}^{∞} x (1 - p)^{x-1} p
+ Σ_{x=1}^{∞} (1 - p)^{x-1} p
= E[X²] + 2E[X] + 1
⇒ E[X²] = p · 1 + (1 - p)(E[X²] + 2E[X] + 1)
⇒ p E[X²] = p + (1 - p)(2E[X] + 1)
Using E[X] = 1/p (shown above):
E[X²] = 2/p² - 1/p
⇒ var(X) = E[X²] - (E[X])² = 2/p² - 1/p - 1/p² = (1 - p)/p²
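A numeric sanity check of Example 2.17, E[X] = 1/p and var(X) = (1 - p)/p², approximating the infinite sums by truncation (the tail mass beyond the cutoff is negligible for the p used here):

```python
p = 0.3
geom = lambda x: (1 - p) ** (x - 1) * p   # geometric PMF

N = 2000                                  # truncation point
E_X  = sum(x * geom(x) for x in range(1, N + 1))
E_X2 = sum(x * x * geom(x) for x in range(1, N + 1))
var_X = E_X2 - E_X ** 2
```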
Independence of a Random Variable from an Event
• A random variable X is independent of an event A if
P(X = x and A) = P(X = x) P(A), for all x
– This requires the two events {X = x} and A to be
independent, for all x
• If a random variable X is independent of an event A
and P(A) > 0:
p_{X|A}(x) = P(X = x and A) / P(A)
= P(X = x) P(A) / P(A)
= P(X = x)
= p_X(x), for all x
An Illustrative Example
• Example 2.19. Consider two independent tosses of a fair coin.
– Let random variable X be the number of heads
– Let random variable Y be 0 if the first toss is a head, and 1 if the first
toss is a tail
– Let A be the event that the number of heads is even
• Possible outcomes: (T,T), (T,H), (H,T), (H,H), each with probability 1/4
p_X(x) = { 1/4, if x = 0
         { 1/2, if x = 1
         { 1/4, if x = 2
P(A) = 1/2
p_{X|A}(x) = { 1/2, if x = 0
             { 0,   if x = 1
             { 1/2, if x = 2
p_{X|A}(x) ≠ p_X(x) ⇒ X and A are not independent!
p_Y(y) = { 1/2, if y = 0
         { 1/2, if y = 1
p_{Y|A}(y) = P(Y = y and A) / P(A) = { 1/2, if y = 0
                                     { 1/2, if y = 1
p_{Y|A}(y) = p_Y(y) ⇒ Y and A are independent!
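Example 2.19 can be checked by enumerating the four equally likely outcomes:

```python
from fractions import Fraction as F

outcomes = [("T", "T"), ("T", "H"), ("H", "T"), ("H", "H")]
prob = F(1, 4)                                  # each outcome

X = lambda o: sum(t == "H" for t in o)          # number of heads
Y = lambda o: 0 if o[0] == "H" else 1           # 0 iff first toss is a head
A = lambda o: X(o) % 2 == 0                     # even number of heads

P_A = sum(prob for o in outcomes if A(o))
p_X1 = sum(prob for o in outcomes if X(o) == 1)
p_X1_given_A = sum(prob for o in outcomes if X(o) == 1 and A(o)) / P_A
p_Y0 = sum(prob for o in outcomes if Y(o) == 0)
p_Y0_given_A = sum(prob for o in outcomes if Y(o) == 0 and A(o)) / P_A
```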
Independence of Random Variables (1/2)
• Two random variables X and Y are independent if
p_{X,Y}(x, y) = p_X(x) p_Y(y), for all x, y
or P(X = x, Y = y) = P(X = x) P(Y = y), for all x, y
• If a random variable X is independent of a random
variable Y:
p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
= p_X(x) p_Y(y) / p_Y(y)
= p_X(x), for all y with p_Y(y) > 0 and all x
Independence of Random Variables (2/2)
• Random variables X and Y are said to be conditionally
independent, given a positive probability event A, if
p_{X,Y|A}(x, y) = p_{X|A}(x) p_{Y|A}(y), for all x, y
– Or equivalently,
p_{X|Y,A}(x|y) = p_{X|A}(x), for all y with p_{Y|A}(y) > 0 and all x
• Note here that, as in the case of events, conditional
independence may not imply unconditional
independence and vice versa
An Illustrative Example (1/2)
• Figure 2.15: example illustrating that conditional
independence may not imply unconditional independence
– For the PMF shown, the random variables X and Y are not
independent
• To show X and Y are not independent, we only have to find
a pair of values (x, y) of X and Y such that
p_{X|Y}(x|y) ≠ p_X(x)
– For example,
p_{X|Y}(1|1) = 0 ≠ p_X(1) = 3/20,
so X and Y are not independent
An Illustrative Example (2/2)
• To show X and Y are independent (conditioned on A), we have to
check that for all pairs of values (x, y) of X and Y,
p_{X|Y,A}(x|y) = p_{X|A}(x)
– For example, X and Y are independent, conditioned
on the event A = {X ≤ 2, Y ≥ 3}
P(A) = 9/20
p_{X|Y,A}(1|3) = (2/20) / (6/20) = 1/3,  p_{X|Y,A}(1|4) = (1/20) / (3/20) = 1/3
p_{X|Y,A}(2|3) = (4/20) / (6/20) = 2/3,  p_{X|Y,A}(2|4) = (2/20) / (3/20) = 2/3
p_{X|A}(x) = P({X = x} ∩ A) / P(A):
p_{X|A}(1) = (3/20) / (9/20) = 1/3,  p_{X|A}(2) = (6/20) / (9/20) = 2/3
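The Figure 2.15 computations can be replayed from the four joint-PMF entries the slide gives inside A = {X ≤ 2, Y ≥ 3}:

```python
from fractions import Fraction as F

# Joint-PMF entries inside A, as read off the slide.
p_xy_in_A = {(1, 3): F(2, 20), (2, 3): F(4, 20),
             (1, 4): F(1, 20), (2, 4): F(2, 20)}
P_A = sum(p_xy_in_A.values())                    # 9/20

for y in (3, 4):
    p_y = sum(p for (_, yy), p in p_xy_in_A.items() if yy == y)
    for x in (1, 2):
        p_x = sum(p for (xx, _), p in p_xy_in_A.items() if xx == x)
        # conditional independence given A: p_{X|Y,A}(x|y) == p_{X|A}(x)
        assert p_xy_in_A[(x, y)] / p_y == p_x / P_A
```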
Functions of Two Independent Random Variables
• Let X and Y be two independent random variables, and
let g(X) and h(Y) be two functions of X and Y,
respectively. Show that g(X) and h(Y) are independent.
Let U = g(X) and V = h(Y); then
p_{U,V}(u, v) = Σ_{(x,y): g(x)=u, h(y)=v} p_{X,Y}(x, y)
= Σ_{(x,y): g(x)=u, h(y)=v} p_X(x) p_Y(y)
= Σ_{x: g(x)=u} p_X(x) · Σ_{y: h(y)=v} p_Y(y)
= p_U(u) p_V(v)
More Facts About Independent Random Variables (1/2)
• If X and Y are independent random variables, then
E[XY] = E[X] E[Y]
– As shown by the following calculation
E[XY] = Σ_x Σ_y x y p_{X,Y}(x, y)
= Σ_x Σ_y x y p_X(x) p_Y(y)   (by independence)
= (Σ_x x p_X(x)) (Σ_y y p_Y(y))
= E[X] E[Y]
• Similarly, if X and Y are independent random variables,
then
E[g(X) h(Y)] = E[g(X)] E[h(Y)]
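A numeric check of E[XY] = E[X]E[Y], building the joint PMF as the product of two illustrative marginals (which is exactly what independence means):

```python
p_X = {0: 0.3, 1: 0.7}
p_Y = {1: 0.5, 2: 0.5}

# Under independence the joint PMF factorizes.
p_xy = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

E_X  = sum(x * p for x, p in p_X.items())
E_Y  = sum(y * p for y, p in p_Y.items())
E_XY = sum(x * y * p for (x, y), p in p_xy.items())
```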
More Facts About Independent Random Variables (2/2)
• If X and Y are independent random variables, then
var(X + Y) = var(X) + var(Y)
– As shown by the following calculation
var(X + Y) = E[((X + Y) - E[X + Y])²]
= E[(X + Y)²] - (E[X] + E[Y])²
= E[X²] + 2E[XY] + E[Y²] - (E[X])² - 2E[X]E[Y] - (E[Y])²
= (E[X²] - (E[X])²) + (E[Y²] - (E[Y])²) + 2(E[XY] - E[X]E[Y])
= var(X) + var(Y)
since E[XY] = E[X]E[Y] by independence
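And a numeric check of var(X + Y) = var(X) + var(Y) for independent X and Y, again with illustrative marginals:

```python
def var(pmf):
    """Variance of a PMF given as a dict value -> probability."""
    m = sum(v * p for v, p in pmf.items())
    return sum((v - m) ** 2 * p for v, p in pmf.items())

p_X = {0: 0.25, 1: 0.75}
p_Y = {-1: 0.5, 1: 0.5}

# PMF of Z = X + Y under independence.
p_Z = {}
for x, px in p_X.items():
    for y, py in p_Y.items():
        p_Z[x + y] = p_Z.get(x + y, 0.0) + px * py
```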
More than Two Random Variables
• Independence of several random variables
– Three random variables X, Y and Z are independent if
p_{X,Y,Z}(x, y, z) = p_X(x) p_Y(y) p_Z(z), for all x, y, z
(Compare with the conditions to be satisfied for three independent
events A1, A2 and A3, in p. 39 of the textbook)
• Any three random variables of the form f(X), g(Y) and h(Z)
are then also independent
• Variance of the sum of independent random variables
– If X_1, X_2, …, X_n are independent random variables, then
var(X_1 + X_2 + … + X_n) = var(X_1) + var(X_2) + … + var(X_n)
Illustrative Examples (1/3)
• Example 2.20. Variance of the Binomial. We consider
n independent coin tosses, with each toss having
probability p of coming up a head. For each i, we let X_i
be the Bernoulli random variable which is equal to 1 if
the i-th toss comes up a head, and is 0 otherwise.
– Then X = X_1 + X_2 + … + X_n is a binomial random variable.
var(X_i) = p(1 - p), for all i
⇒ var(X) = Σ_{i=1}^{n} var(X_i) = np(1 - p)
(Note that the X_i's are independent!)
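Example 2.20 can be checked by comparing np(1 - p) against the variance computed directly from the binomial PMF (illustrative n and p):

```python
import math

n, p = 10, 0.4
pmf = {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * q for k, q in pmf.items())                  # should be n*p
var_direct = sum((k - mean) ** 2 * q for k, q in pmf.items())
var_formula = n * p * (1 - p)
```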
Illustrative Examples (2/3)
• Example 2.21. Mean and Variance of the Sample Mean. We wish
to estimate the approval rating of a president, to be called B. To this
end, we ask n persons drawn at random from the voter population,
and we let X_i be a random variable that encodes the response of
the i-th person:
X_i = 1, if the i-th person approves B's performance
X_i = 0, if the i-th person disapproves B's performance
– Assume that the X_i are independent and are the same random variable
(Bernoulli) with the common parameter (p for the Bernoulli), which is
unknown to us
• The X_i are independent and identically distributed (i.i.d.)
– The sample mean S_n (itself a random variable) is defined as
S_n = (X_1 + X_2 + … + X_n) / n
Illustrative Examples (3/3)
– The expectation of S_n is the true mean of the X_i
E[S_n] = E[(X_1 + X_2 + … + X_n) / n]
= (1/n) Σ_{i=1}^{n} E[X_i]
= E[X_i]  (= p for the Bernoulli we assumed here)
– The variance of S_n approaches 0 as n grows large
lim_{n→∞} var(S_n) = lim_{n→∞} var((X_1 + X_2 + … + X_n) / n)
= lim_{n→∞} (Σ_{i=1}^{n} var(X_i)) / n²
= lim_{n→∞} np(1 - p) / n²
= lim_{n→∞} p(1 - p) / n = 0
• This means that S_n will be a good estimate of E[X_i] if n
is large enough
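A small simulation sketch of Example 2.21: sample means of i.i.d. Bernoulli(p) responses cluster around p, with spread governed by var(S_n) = p(1 - p)/n. The values of p, n and the trial count are illustrative.

```python
import random

random.seed(0)
p, n, trials = 0.6, 2000, 100

def sample_mean(n, p):
    """One realization of S_n for i.i.d. Bernoulli(p) responses."""
    return sum(random.random() < p for _ in range(n)) / n

means = [sample_mean(n, p) for _ in range(trials)]
avg = sum(means) / trials            # should be close to p
theoretical_var = p * (1 - p) / n    # var(S_n) = p(1-p)/n
```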
Recitation
• SECTION 2.5 Joint PMFs of Multiple Random Variables
– Problems 27, 28, 30
• SECTION 2.6 Conditioning
– Problems 33, 34, 35, 37
• SECTION 2.7 Independence
– Problems 42, 43, 45, 46