Discrete Distributions
Random Variable



A random variable X is a function that maps the possible outcomes of an experiment to real numbers. That is, X: C → R, where C is the set of all outcomes of the experiment and R is the set of real numbers.
The space of X is the set of real numbers S = {x : X(c) = x, c ∈ C}.
An Example of a Random Variable



If we toss a coin one time, then there are two possible outcomes, namely "head up" and "tail up".
We can define a random variable X that maps "head up" to 1 and "tail up" to 0.
We can also define a random variable Y that maps "head up" to 0 and "tail up" to 1.
The spaces of both random variables X and Y are {0, 1}.
Further Illustration of Random Variables


A random variable corresponds to a quantitative interpretation of the outcomes of an experiment.
For example, a company offers its employees a drawing at its year-end party. A computer will randomly select an employee for the first prize of $100,000 based on the employees' ID numbers, which range from 1 to 100.



In addition, the computer will randomly select two more employees for the second and third prizes of $50,000 and $10,000, respectively.
Assume that each employee can receive only one prize and that the drawing starts with the third prize and ends with the first prize.
Then, there are in total 100 × 99 × 98 = 970,200 possible outcomes.


To Edward, whose employee ID number is 10, the random variable of his interest is as follows:
X(<10, *, *>) = 10,000
X(<*, 10, *>) = 50,000
X(<*, *, 10>) = 100,000
X(all other outcomes) = 0
To Grace, whose employee ID number is 30, the random variable of her interest is as follows:
Y(<30, *, *>) = 10,000
Y(<*, 30, *>) = 50,000
Y(<*, *, 30>) = 100,000
Y(all other outcomes) = 0



The outcome spaces of random variables X and Y are identical. However, X and Y map some outcomes to different real numbers.
The spaces of X and Y are also identical; both are {0, 10000, 50000, 100000}.
The probability functions of X and Y are also equal:
Prob(X = 10,000) = Prob(Y = 10,000) = 0.01
Prob(X = 50,000) = Prob(Y = 50,000) = 0.01
Prob(X = 100,000) = Prob(Y = 100,000) = 0.01
Prob(X = 0) = Prob(Y = 0) = 0.97

The expected values of X and Y are equal:
E[X] = E[Y] = 10,000 × 0.01 + 50,000 × 0.01 + 100,000 × 0.01 = 1,600.
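A brute-force check of these numbers is easy to script. The following is a minimal sketch (the <third, second, first> enumeration order and all variable names are my own, not from the slides); it reproduces the probabilities 0.01/0.97 and the expected value 1,600:

```python
from fractions import Fraction
from itertools import permutations

# Enumerate all ordered outcomes <third, second, first> of the drawing
# among employees 1..100 and tally Edward's (ID 10) prize X.
ids = range(1, 101)
counts = {0: 0, 10_000: 0, 50_000: 0, 100_000: 0}
total = 0
for third, second, first in permutations(ids, 3):
    total += 1
    if third == 10:
        counts[10_000] += 1
    elif second == 10:
        counts[50_000] += 1
    elif first == 10:
        counts[100_000] += 1
    else:
        counts[0] += 1

probs = {x: Fraction(c, total) for x, c in counts.items()}
expected = sum(x * p for x, p in probs.items())
print(total)                                      # 970200
print({x: float(p) for x, p in probs.items()})    # {0: 0.97, 10000: 0.01, 50000: 0.01, 100000: 0.01}
print(float(expected))                            # 1600.0
```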
Discrete Random Variables


Given a random variable X, let S denote the space of X. If S is a finite or countably infinite set, then X is said to be a discrete random variable.
Countably Infinite Sets

A set is said to be countably infinite if it contains an infinite number of elements and there exists a one-to-one correspondence between the elements of the set and the positive integers.
Examples of Countably / Uncountably Infinite Sets

The set of integers is countable.
The set of rational numbers is countable.
The set of real numbers is uncountable.
Probability Mass Function

The probability mass function (p.m.f.) of a discrete random variable X is defined to be

P_X(k) = Prob(X = k) = \sum_{q \in Q_k} Prob(q),

where Q_k contains all outcomes that are mapped to k by the random variable X.

In the previous drawing example,

P_X(10,000) = Prob(X = 10,000) = \sum_{<10,i,j>,\; i \ne 10,\; j \ne 10,\; i \ne j} Prob(<10,i,j>) = \sum_{<10,i,j>,\; i \ne 10,\; j \ne 10,\; i \ne j} \frac{1}{100 \times 99 \times 98} = 0.01.


In fact, the p.m.f. of a random variable is defined on a set of events of the experiment conducted.
In the previous drawing example, the set of outcomes that are mapped to 10,000 by X is an event.

Furthermore, in the previous drawing example, random variables X and Y map some outcomes to different real numbers. However, X and Y have the same distribution, i.e., the p.m.f. of X and the p.m.f. of Y are equal. More precisely,

P_X(k) = P_Y(k) for every k ∈ {0, 10000, 50000, 100000}.
Properties of the Probability Mass Function

The p.m.f. of a random variable X satisfies the following three properties:
1. P_X(x) ≥ 0 for every x ∈ S, the space of X; if S is finite, then P_X(x) > 0 for every x ∈ S.
2. \sum_{x_i \in S} P_X(x_i) = 1.
3. Prob(A) = \sum_{x_j \in A} P_X(x_j), where A ⊆ S.
Probability Distribution Function

For a random variable X, we define its probability distribution function F as

F_X(t) = Prob(X ≤ t).
Properties of a Probability Distribution Function

1. \lim_{t \to \infty} F_X(t) = 1.
2. \lim_{t \to -\infty} F_X(t) = 0.
3. F_X(w) ≥ F_X(t), if w ≥ t.

Any function that satisfies the conditions above can serve as a distribution function.
An Example of the Probability Distribution Function of a Discrete Random Variable

Assume that we toss a 4-sided die twice. Then, we have 16 possible outcomes:
(1,1), (1,2), (1,3), (1,4),
(2,1), (2,2), (2,3), (2,4),
(3,1), (3,2), (3,3), (3,4),
(4,1), (4,2), (4,3), (4,4).

Let random variable X be the sum of the two outcomes. Then,

Prob(X = 2) = 1/16, Prob(X = 3) = 2/16,
Prob(X = 4) = 3/16, Prob(X = 5) = 4/16,
Prob(X = 6) = 3/16, Prob(X = 7) = 2/16,
Prob(X = 8) = 1/16.

F_X(5) = Prob(X ≤ 5) = 1/16 + 2/16 + 3/16 + 4/16 = 10/16 = 5/8.
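These values can be tabulated mechanically. Below is a small sketch (the names pmf and cdf are illustrative) that enumerates the 16 outcomes and evaluates F_X(5):

```python
from fractions import Fraction
from itertools import product

# Tally the sum of two tosses of a fair 4-sided die and build the p.m.f. and c.d.f.
outcomes = list(product(range(1, 5), repeat=2))   # the 16 ordered pairs
pmf = {}
for a, b in outcomes:
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, len(outcomes))

def cdf(t):
    # F_X(t) = Prob(X <= t)
    return sum(p for x, p in pmf.items() if x <= t)

print(pmf)       # {2: 1/16, 3: 1/8, 4: 3/16, 5: 1/4, 6: 3/16, 7: 1/8, 8: 1/16}
print(cdf(5))    # 5/8
```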
Operations of Random Variables

Let X and Y be two random variables defined on the same outcome space of an experiment. Then, we can define a new random variable Z = f(X, Y).
For example, in the drawing example, if Edward and Grace are husband and wife, then we can define a new random variable Z = X + Y. We have
X(<30, 10, *>) = 50,000
Y(<30, 10, *>) = 10,000
Z(<30, 10, *>) = 60,000
Function of Random Variables

Let X be a random variable and G be a function. Then, the random variable Y = G(X) maps an outcome ν in the outcome space of X to the value G(X(ν)).
With respect to the probability distribution functions, if G is a monotonically increasing, one-to-one mapping, then

F_Y(t) = Prob(Y ≤ t) = Prob(G(X) ≤ t) = Prob(X ≤ G^{-1}(t)) = F_X(G^{-1}(t)).
An Example of Functions of Random Variables

Let random variable X be the sum of two tosses of a 4-sided die and let Y = X².
Then,

F_Y(16) = Prob(Y ≤ 16) = Prob(X² ≤ 16) = Prob(X ≤ 4) = F_X(4)
        = P_X(4) + P_X(3) + P_X(2) = 6/16 = 3/8.
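A tiny self-contained check of this computation, assuming the same 16 equally likely outcomes as above:

```python
from fractions import Fraction
from itertools import product

# X = sum of two tosses of a fair 4-sided die, Y = X**2; F_Y(16) = Prob(X**2 <= 16).
pmf = {}
for a, b in product(range(1, 5), repeat=2):
    s = a + b
    pmf[s] = pmf.get(s, 0) + Fraction(1, 16)

print(sum(p for x, p in pmf.items() if x**2 <= 16))   # 3/8
```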
Expected Value of a Discrete Random Variable

Let X be a discrete random variable and S be its space. Then, the expected value of X is

E[X] = \sum_{z \in C} Prob(z) X(z) = \sum_{x_i \in S} P_X(x_i)\, x_i.

μ is a widely used symbol for the expected value.
Expected Value of a Function of a Random Variable

Let X be a random variable and G be a function. Then, the expected value of the random variable Y = G(X) is equal to

E[Y] = \sum_{x_i \in S} G(x_i) P_X(x_i).
Expected Value of a Function of a Random Variable

Proof:

E[Y] = \sum_{y_i \in S'} P_Y(y_i)\, y_i, where S' is the space of Y
     = \sum_{y_i \in S'} Prob(Y = y_i)\, y_i
     = \sum_{y_i \in S'} \sum_{\text{all } x_j \text{ such that } G(x_j) = y_i} Prob(X = x_j)\, G(x_j)
     = \sum_{x_j \in S} P_X(x_j)\, G(x_j).

For example, let X correspond to the outcome of tossing a die once. Then,
P_X(1) = P_X(2) = P_X(3) = P_X(4) = P_X(5) = P_X(6) = 1/6, and E[X] = 3.5.
Suppose we are concerned about the difference between the observed outcome and the mean, and define Y = |X − E[X]|. Then P_Y(1/2) = 1/3, P_Y(3/2) = 1/3, P_Y(5/2) = 1/3.
Therefore,

E[Y] = (1/2)(1/3) + (3/2)(1/3) + (5/2)(1/3) = (9/2)(1/3) = 3/2.

On the other hand,

\sum_{x_i} |x_i - E[X]|\, P_X(x_i) = \sum_{x_i} |x_i - 3.5| \cdot \frac{1}{6} = (2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5) \cdot \frac{1}{6} = \frac{9}{6} = \frac{3}{2}.
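The two routes to E[Y] can be compared directly. A minimal sketch using exact fraction arithmetic (the helper names are mine):

```python
from fractions import Fraction

# Fair six-sided die: compute E[|X - E[X]|] two ways.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}
mean_x = sum(x * p for x, p in pmf_x.items())          # 7/2

# Way 1: build the p.m.f. of Y = |X - E[X]| first, then take its expectation.
pmf_y = {}
for x, p in pmf_x.items():
    y = abs(x - mean_x)
    pmf_y[y] = pmf_y.get(y, 0) + p
e_y = sum(y * p for y, p in pmf_y.items())

# Way 2: sum G(x) * P_X(x) directly over the space of X.
e_y_direct = sum(abs(x - mean_x) * p for x, p in pmf_x.items())

print(pmf_y)              # {5/2: 1/3, 3/2: 1/3, 1/2: 1/3}
print(e_y, e_y_direct)    # 3/2 3/2
```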
Theorems about the Expected Value

(a) If c is a constant, E[c] = c.
(b) If c is a constant and g is a function, E[c g(X)] = c E[g(X)].
(c) If c_1 and c_2 are constants and g_1 and g_2 are functions, then
E[c_1 g_1(X) + c_2 g_2(X)] = c_1 E[g_1(X)] + c_2 E[g_2(X)].
Theorems about the Expected Value

Proof of (a): Trivial.
Proof of (b):

E[c g(X)] = \sum_{x_i \in S} c\, g(x_i) P_X(x_i), where S is the space of X and P_X(x) is the p.m.f. of X
          = c \sum_{x_i \in S} g(x_i) P_X(x_i)
          = c E[g(X)].
Theorems about the Expected Value

Proof of (c):

E[c_1 g_1(X) + c_2 g_2(X)] = \sum_{x_i \in S} (c_1 g_1(x_i) + c_2 g_2(x_i)) P_X(x_i)
                           = \sum_{x_i \in S} c_1 g_1(x_i) P_X(x_i) + \sum_{x_i \in S} c_2 g_2(x_i) P_X(x_i)
                           = c_1 E[g_1(X)] + c_2 E[g_2(X)].

An extension of (c):

E\left[\sum_{i=1}^{k} c_i g_i(X)\right] = \sum_{i=1}^{k} c_i E[g_i(X)].
Variance of a Discrete Random Variable

The variance of a random variable is defined to be E[(X − μ)²] and is typically denoted by σ².
For a discrete random variable X,

Var(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.

σ is normally called the standard deviation.
Variance of a Discrete Random Variable

Let X be a random variable with mean μ_X and variance σ_X². Let Y = aX + b, where a and b are constants. Then,

E[Y] = E[aX + b] = a E[X] + b = a\mu_X + b

Var(Y) = E[(Y - \mu_Y)^2] = E[(aX + b - a\mu_X - b)^2] = E[a^2 (X - \mu_X)^2] = a^2 E[(X - \mu_X)^2] = a^2 \sigma_X^2.
Variance of a Random Variable

The variance of a random variable measures the deviation of its distribution from the mean.
For example, in one drawing, Robert has a 0.1% chance of winning $100,000, while in another drawing, he has a 0.01% chance of winning $1,000,000.


The expected amounts won in these two drawings are equal:
0.001 × 100,000 = 100
0.0001 × 1,000,000 = 100
However, their variances are different:
0.001 × (100,000 − 100)² + 0.999 × (0 − 100)² = 9,990,000
0.0001 × (1,000,000 − 100)² + 0.9999 × (0 − 100)² = 99,990,000
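The arithmetic above is easy to reproduce; here is a small sketch with an assumed helper mean_var that works on any finite p.m.f. given as a dict:

```python
# Compare the two drawings: same mean of 100, very different spread.
def mean_var(pmf):
    # pmf: dict mapping value -> probability
    m = sum(x * p for x, p in pmf.items())
    v = sum((x - m) ** 2 * p for x, p in pmf.items())
    return m, v

drawing_1 = {100_000: 0.001, 0: 0.999}
drawing_2 = {1_000_000: 0.0001, 0: 0.9999}
print(mean_var(drawing_1))   # approximately (100.0, 9990000.0)
print(mean_var(drawing_2))   # approximately (100.0, 99990000.0)
```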

For many distributions, the mean and variance together uniquely determine the parameters of the distribution.
The Bernoulli Experiment and Distribution

A Bernoulli experiment is a random experiment, the outcome of which can be classified in one of two mutually exclusive and exhaustive ways, say, success and failure.
A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times, so that the probability of success, say p, remains the same from trial to trial.
The Bernoulli Distribution

Let X be a Bernoulli random variable. The p.m.f. of X can be written as

P_X(k) = p^k (1 - p)^{1-k},

where k = 0 or 1 and p is the probability of success.
The expected value of X is

\sum_{k=0}^{1} k\, p^k (1 - p)^{1-k} = p.

The variance of X is

\sum_{k=0}^{1} (k - p)^2\, p^k (1 - p)^{1-k} = p(1 - p).
The Binomial Distribution

Let X be the random variable corresponding to the number of successes in a sequence of Bernoulli trials. Then,

P_X(k) = Prob(X = k) = C^n_k\, p^k (1 - p)^{n-k},

where n is the number of Bernoulli trials and p is the probability of success in one trial.
X is said to have a binomial distribution, normally denoted by b(n, p).
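As a quick illustration of the formula (the choice n = 10, p = 0.25 is arbitrary), the p.m.f. of b(n, p) can be tabulated and sanity-checked:

```python
from math import comb

# Binomial p.m.f. b(n, p): P_X(k) = C(n, k) * p**k * (1 - p)**(n - k).
n, p = 10, 0.25
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

print(sum(pmf.values()))                   # 1.0 (up to rounding): the p.m.f. sums to one
print(sum(k * q for k, q in pmf.items()))  # 2.5, which equals n * p
print(max(pmf, key=pmf.get))               # 2, the most likely number of successes
```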
Example of the Binomial Distribution

Assume that Tiger and Whale are the two teams that enter the championship series of a professional basketball league. Based on prior records, Tiger has a 60% chance of beating Whale in a single game. Larry, who is a fan of Tiger, makes a bet with Peter, who is a fan of Whale. According to their agreement, Larry will pay Peter $1,000 should Whale win the 5-game series. In order to make the bet fair, how much should Peter pay Larry if Tiger wins the series?

The probability that Tiger wins the series is

C^5_3 (0.6)^3 (0.4)^2 + C^5_4 (0.6)^4 (0.4) + C^5_5 (0.6)^5 ≈ 0.6826.

Z × 0.6826 = 1000 × (1 − 0.6826), so Z ≈ 465.

If the championship series consists of 3 games, then what is the probability that Tiger wins the series?

C^3_2 (0.6)^2 (0.4) + C^3_3 (0.6)^3 = 0.648 < 0.6826.
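Both series probabilities and the fair-bet amount follow from the same binomial sum. A short sketch (the function name series_win_prob is my own), assuming the series is played out to a fixed number of games:

```python
from math import comb

def series_win_prob(p_single, n_games):
    # Probability of winning a majority of n_games independent games,
    # treating the series as n_games Bernoulli trials with success prob p_single.
    need = n_games // 2 + 1
    return sum(comb(n_games, k) * p_single**k * (1 - p_single)**(n_games - k)
               for k in range(need, n_games + 1))

p5 = series_win_prob(0.6, 5)
p3 = series_win_prob(0.6, 3)
print(round(p5, 4), round(p3, 4))    # 0.6826 0.648

# Fair bet: Z * p5 = 1000 * (1 - p5)
print(round(1000 * (1 - p5) / p5))   # 465
```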
The Moment-Generating Function

Let X be a discrete random variable with p.m.f. P_X(x) and space S. If there is a positive number h such that

E[e^{tX}] = \sum_{x_i \in S} e^{t x_i} P_X(x_i)

exists and is finite for −h < t < h, then the function of t defined by

M(t) = E[e^{tX}]

is called the moment-generating function of X, often abbreviated as m.g.f.
The Moment-Generating Function

Let X and Y be two discrete random variables with the same space S. If E[e^{tX}] = E[e^{tY}], then the probability mass functions of X and Y are equal.
Insight of the argument above: assume that S = {s_1, s_2, ..., s_k} contains only positive integers. Then, we have

P_X(s_1) e^{t s_1} + P_X(s_2) e^{t s_2} + ... + P_X(s_k) e^{t s_k} = P_Y(s_1) e^{t s_1} + P_Y(s_2) e^{t s_2} + ... + P_Y(s_k) e^{t s_k}.

Therefore, P_X(s_i) = P_Y(s_i), i.e., X and Y have the same p.m.f.
The Moment-Generating Function

Let M_X(t) be the m.g.f. of a discrete random variable X. Then

\frac{d^k M_X(t)}{dt^k} = \sum_{x_i \in S} x_i^k\, e^{t x_i} P_X(x_i).

Furthermore,

\frac{d^k M_X(0)}{dt^k} = \sum_{x_i \in S} x_i^k\, P_X(x_i) = E[X^k].

In particular,

\mu_X = M_X'(0) and \sigma_X^2 = M_X''(0) - [M_X'(0)]^2.
The Moment-Generating Function of the Binomial Distribution

Let X be b(n, p). The direct sums

E[X] = \sum_{k=0}^{n} k\, C^n_k\, p^k (1 - p)^{n-k} = \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!}\, p^k (1 - p)^{n-k}

and

E[X^2] = \sum_{k=0}^{n} k^2\, C^n_k\, p^k (1 - p)^{n-k}

are both difficult to compute. On the other hand, we can easily derive the m.g.f. of a binomial distribution:

M_X(t) = E[e^{tX}] = \sum_{k=0}^{n} e^{tk}\, C^n_k\, p^k (1 - p)^{n-k} = \sum_{k=0}^{n} C^n_k (p e^t)^k (1 - p)^{n-k} = (p e^t + 1 - p)^n.
The Moment-Generating Function of the Binomial Distribution

M_X'(t) = n (p e^t + 1 - p)^{n-1}\, p e^t
M_X''(t) = n(n-1) (p e^t + 1 - p)^{n-2} (p e^t)^2 + n p e^t (p e^t + 1 - p)^{n-1}

M_X'(0) = np
M_X''(0) = n(n-1) p^2 + np.

Therefore,

\mu_X = M_X'(0) = np
\sigma_X^2 = M_X''(0) - [M_X'(0)]^2 = n^2 p^2 - n p^2 + np - n^2 p^2 = np(1 - p).
The Poisson Process

A Poisson process models the number of times that a particular type of event occurs during a time interval.
The Poisson process is based on the following 3 assumptions:
(1) The numbers of event occurrences in non-overlapping intervals are independent.
(2) \lim_{\Delta t \to 0} Prob(one occurrence between times t and t + Δt) = λΔt.
The Poisson Process

(3) \lim_{\Delta t \to 0} Prob(two occurrences between times t and t + Δt) = 0.
λ is the only parameter of the Poisson process.
One example of the Poisson process is to model the number of Web accesses that a Web server receives between 8 AM and 9 AM.
The Basis of the Assumptions of the Poisson Process

Assume that an ideal random number generator generates λ numbers in [0, 1].
If we divide [0, 1] evenly into n subintervals, then the probability that exactly one of the generated numbers falls in [0, 1/n] is

C^{\lambda}_1 \left(\frac{1}{n}\right) \left(1 - \frac{1}{n}\right)^{\lambda - 1} = \frac{\lambda}{n} \left(1 - \frac{1}{n}\right)^{\lambda - 1}.
The Basis of the Assumptions of the Poisson Process

The probability that exactly two of the generated numbers fall in [0, 1/n] is

C^{\lambda}_2 \left(\frac{1}{n}\right)^2 \left(1 - \frac{1}{n}\right)^{\lambda - 2} = \frac{\lambda(\lambda - 1)}{2 n^2} \left(1 - \frac{1}{n}\right)^{\lambda - 2}.

Let Δt = 1/n. Then,

\lim_{\Delta t \to 0} Prob(one occurrence in [0, \Delta t)) = \lim_{n \to \infty} \lambda \Delta t \left(1 - \frac{1}{n}\right)^{\lambda - 1} = \lambda \Delta t,

\lim_{\Delta t \to 0} Prob(two occurrences in [0, \Delta t)) = \lim_{n \to \infty} \frac{\lambda(\lambda - 1)}{2} \Delta t^2 \left(1 - \frac{1}{n}\right)^{\lambda - 2} = \frac{\lambda(\lambda - 1)}{2} \Delta t^2,

which vanishes faster than Δt, in line with assumption (3).
The Poisson Distribution

Assume that we are concerned with a Poisson process with parameter λ and want to count the number of event occurrences during one time interval.
We can divide the time interval evenly into n subintervals, as the following figure shows.

[Figure: the interval from Time = 0 to Time = 1 divided into n subintervals, each of length 1/n.]
The Poisson Distribution

The probability that the event occurs k times during the time interval is

\lim_{n \to \infty} C^n_k \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}
= \lim_{n \to \infty} \frac{n!}{k!\,(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}
= \lim_{n \to \infty} \frac{\lambda^k}{k!} \cdot \frac{n!}{(n-k)!\, n^k} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k}.

Since

\lim_{n \to \infty} \frac{n!}{(n-k)!\, n^k} \left(1 - \frac{\lambda}{n}\right)^{-k} = 1 and \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda},

the final result is \frac{\lambda^k}{k!} e^{-\lambda}.
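One way to see the limit concretely is to evaluate the binomial probability C^n_k (λ/n)^k (1 − λ/n)^{n−k} for growing n and compare it with λ^k e^{−λ}/k!. A small sketch with the assumed values λ = 2 and k = 3:

```python
from math import comb, exp, factorial

# Watch the binomial(n, lam/n) p.m.f. approach the Poisson(lam) p.m.f. as n grows.
lam, k = 2.0, 3
poisson = lam**k / factorial(k) * exp(-lam)
for n in (10, 100, 1000, 10000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, round(binom, 6))
print("Poisson limit:", round(poisson, 6))   # about 0.180447
```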
The Poisson Distribution

We say that a random variable X has a Poisson distribution if

P_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}.

By the Maclaurin series, we have

e^{\lambda} = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}.

Therefore,

\sum_{k=0}^{\infty} P_X(k) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.
The Poisson Distribution

The moment-generating function of a random variable with a Poisson distribution is

M_X(t) = E[e^{Xt}] = \sum_{k=0}^{\infty} e^{kt} \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda (e^t - 1)}.

M_X'(t) = \lambda e^t\, e^{\lambda (e^t - 1)}
M_X''(t) = (\lambda e^t)^2\, e^{\lambda (e^t - 1)} + \lambda e^t\, e^{\lambda (e^t - 1)}.
The Poisson Distribution

Therefore,

\mu_X = M_X'(0) = \lambda and \sigma_X^2 = M_X''(0) - [M_X'(0)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.

Therefore, λ is the average rate of event occurrence per unit of time.
Let Y be the random variable corresponding to the number of event occurrences during a time interval of length t. Then,

P_Y(k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t}.
The Poisson Distribution

The probability that the event occurs k times during a time interval of length t is

\lim_{n \to \infty} C^n_k \left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k}
= \lim_{n \to \infty} \frac{(\lambda t)^k}{k!} \left(1 - \frac{\lambda t}{n}\right)^{n} \left(1 - \frac{\lambda t}{n}\right)^{-k}
= \frac{(\lambda t)^k}{k!} e^{-\lambda t}.
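A rough Monte Carlo sketch of the same idea: chop the interval of length t into n small subintervals, give each an event with probability λt/n, and compare the empirical counts with the Poisson(λt) p.m.f. The parameter values below are assumptions chosen only for illustration:

```python
import random
from math import exp, factorial

# Approximate a rate-lam Poisson process over an interval of length t by
# n tiny subintervals, each holding an event with probability lam * t / n.
random.seed(0)
lam, t, n, trials = 3.0, 2.0, 1000, 5000
p = lam * t / n
counts = {}
for _ in range(trials):
    k = sum(1 for _ in range(n) if random.random() < p)
    counts[k] = counts.get(k, 0) + 1

for k in range(12):
    empirical = counts.get(k, 0) / trials
    theoretical = (lam * t) ** k / factorial(k) * exp(-lam * t)
    print(k, round(empirical, 3), round(theoretical, 3))
```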
Joint Distributions
Joint Probability Mass Function

Let X and Y be two discrete random variables defined on the same outcome set. The probability that X = x and Y = y is denoted by P_{X,Y}(x, y) = Prob(X = x, Y = y) and is called the joint probability mass function (joint p.m.f.) of X and Y. P_{X,Y}(x, y) satisfies the following 3 properties:

(1) 0 ≤ P_{X,Y}(x, y) ≤ 1
(2) \sum_{(x, y) \in S \times S} P_{X,Y}(x, y) = 1
(3) Prob((X, Y) ∈ A) = \sum_{(x, y) \in A} P_{X,Y}(x, y), where A is a subset of S × S.
Example of Joint Distributions

Assume that a supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing Wine    Not Purchasing Wine
  Male            45                  255
  Female          70                  630

            Purchasing Juice   Not Purchasing Juice
  Male            60                  240
  Female         210                  490
Example of Joint Distributions

Let random variable M correspond to whether a customer is male, random variable W correspond to whether a customer purchases wine, and random variable J correspond to whether a customer purchases juice.

The joint p.m.f. of M and W is
  P_MW(0,0) = 0.63    P_MW(0,1) = 0.07
  P_MW(1,0) = 0.255   P_MW(1,1) = 0.045

The joint p.m.f. of M and J is
  P_MJ(0,0) = 0.49    P_MJ(0,1) = 0.21
  P_MJ(1,0) = 0.24    P_MJ(1,1) = 0.06
Marginal Probability Mass Function

Let P_{X,Y}(x, y) be the joint p.m.f. of discrete random variables X and Y. Then

P_X(x) = Prob(X = x) = \sum_{y_j} Prob(X = x, Y = y_j) = \sum_{y_j} P_{X,Y}(x, y_j)

is called the marginal p.m.f. of X. Similarly,

P_Y(y) = \sum_{x_i} P_{X,Y}(x_i, y)

is called the marginal p.m.f. of Y.
More on Joint Probability Mass Function

Note that we can always create a common outcome set for any two or more random variables. For example, let X and Y correspond to the outcomes of the first and second tosses of a coin, respectively. Then, the outcome set of X is {head up, tail up} and the outcome set of Y is also {head up, tail up}. The common outcome set of X and Y is {(head up, head up), (head up, tail up), (tail up, head up), (tail up, tail up)}.
Independent Random Variables

Two discrete random variables X and Y are said to be independent if and only if, for all possible combinations of x and y,

P_{X,Y}(x, y) = P_X(x) P_Y(y).

Otherwise, X and Y are said to be dependent.
Example of Independent Random Variables

Assume that a supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing soft drinks   Not purchasing soft drinks
  Male               90                        210
  Female            210                        490
Example of Independent Random Variables

Let random variable M correspond to whether a customer is male and random variable S correspond to whether a customer purchases soft drinks. Then, M and S are independent, since for all possible combinations of the values of M and S, we have
Prob(M = i, S = j) = Prob(M = i) Prob(S = j).
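The independence check can be automated from the raw counts; a minimal sketch (the dictionary layout and the tolerance are my own choices):

```python
from itertools import product

# Contingency counts from the soft-drink table: key (m, s) with
# m = 1 for male and s = 1 for purchasing soft drinks.
counts = {(1, 1): 90, (1, 0): 210, (0, 1): 210, (0, 0): 490}
total = sum(counts.values())

joint = {k: v / total for k, v in counts.items()}
p_m = {m: sum(v for (mm, _), v in joint.items() if mm == m) for m in (0, 1)}
p_s = {s: sum(v for (_, ss), v in joint.items() if ss == s) for s in (0, 1)}

independent = all(abs(joint[(m, s)] - p_m[m] * p_s[s]) < 1e-12
                  for m, s in product((0, 1), repeat=2))
print(joint)         # {(1, 1): 0.09, (1, 0): 0.21, (0, 1): 0.21, (0, 0): 0.49}
print(independent)   # True
```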
Another Example of Joint Distribution

  Object    X      Y    Class      Object    X      Y    Class
     1     7.1    9.1     1          11     10.9    8.8    2
     2     6.7   10.2     1          12     10.8   10.3    2
     3     7.5   10.6     1          13     11.1   11.0    2
     4     7.6    8.8     1          14     12.3    9.1    2
     5     8.1   10.3     1          15     12.1    9.7    2
     6     8.0   11.0     1          16     12.0   10.9    2
     7     8.6    8.9     1          17     13.1    8.9    2
     8     8.7    9.8     1          18     12.8   10.1    2
     9     9.2   11.2     1          19     13.2   11.3    2
    10     6.5   10.1     1          20     13.7    9.9    2
  Average  7.8   10.0     -        Average  12.2   10.0    -
[Figures: scatter plots of the joint p.m.f. of X, Y, and C; the joint p.m.f. of X and C; and the joint p.m.f. of Y and C, with class labels 1 and 2 plotted at the corresponding X and Y values from the table above.]
Joint Distribution Function

Let X and Y be two random variables. The joint distribution function is defined as follows:

F_{XY}(x, y) = Prob(X ≤ x, Y ≤ y).

Note that this definition applies to both discrete and continuous random variables.
Joint Probability Density Function

Assume that X and Y are two continuous random variables defined on the same space S. The joint probability density function of X and Y is defined as follows:

f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}.

X and Y are said to be independent if and only if

f_{XY}(x, y) = f_X(x) f_Y(y).

In some textbooks, it is defined that two random variables are independent if and only if

F_{XY}(x, y) = F_X(x) F_Y(y).

We have

f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = \frac{\partial^2 [F_X(x) F_Y(y)]}{\partial x\, \partial y} = f_X(x) f_Y(y).

The marginal p.d.f. of X is

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy

and the marginal p.d.f. of Y is

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx.
Jointly Independent and Pairwise Independent

Note that even if we have
  P_{X,Y}(x, y) = P_X(x) P_Y(y)
  P_{Y,Z}(y, z) = P_Y(y) P_Z(z)
  P_{X,Z}(x, z) = P_X(x) P_Z(z),
it is not necessarily true that
  P_{X,Y,Z}(x, y, z) = P_X(x) P_Y(y) P_Z(z).
An Example of Pairwise Independence

Let X and Y be two random variables that correspond to tossing an unbiased coin two times. Let Z = X ⊕ Y (exclusive or). Then
  Prob(Z=0) = Prob(X=0, Y=0) + Prob(X=1, Y=1) = 1/2
  Prob(X=0, Z=0) = Prob(X=0, Y=0) = 1/4 = Prob(X=0) Prob(Z=0).
Therefore, X, Y, and Z are pairwise independent.
However, Prob(X=0, Y=0, Z=1) = 0 while Prob(X=0) Prob(Y=0) Prob(Z=1) = 1/8.
Hence, X, Y, and Z are not jointly independent.
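The same bookkeeping can be done by enumerating the four equally likely outcomes; a small sketch (the helper prob is illustrative):

```python
from itertools import product

# Two fair coin tosses X, Y and Z = X xor Y: pairwise independent but not jointly.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]   # each has prob 1/4

def prob(pred):
    return sum(1 for o in outcomes if pred(o)) / len(outcomes)

# Pairwise check for (X, Z); the (X, Y) and (Y, Z) pairs work the same way.
for x, z in product((0, 1), repeat=2):
    joint = prob(lambda o: o[0] == x and o[2] == z)
    product_of_marginals = prob(lambda o: o[0] == x) * prob(lambda o: o[2] == z)
    assert joint == product_of_marginals

# Joint check fails: Prob(X=0, Y=0, Z=1) = 0, but the product of marginals is 1/8.
print(prob(lambda o: o == (0, 0, 1)))                                                      # 0.0
print(prob(lambda o: o[0] == 0) * prob(lambda o: o[1] == 0) * prob(lambda o: o[2] == 1))   # 0.125
```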

On the other hand, joint independence implies pairwise independence. For example,

P_{X,Y}(x, y) = \sum_{z} P_{X,Y,Z}(x, y, z) = \sum_{z} P_X(x) P_Y(y) P_Z(z) = P_X(x) P_Y(y) \sum_{z} P_Z(z) = P_X(x) P_Y(y).
Addition of Two Random Variables

Let X and Y be two random variables. Then, E[X + Y] = E[X] + E[Y].
Note that the above equation holds even if X and Y are dependent.
Proof of the discrete case:

E[X + Y] = \sum_{x} \sum_{y} P_{XY}(x, y)(x + y)
         = \sum_{x} \sum_{y} x\, P_{XY}(x, y) + \sum_{x} \sum_{y} y\, P_{XY}(x, y)
         = \sum_{x} x \sum_{y} P_{XY}(x, y) + \sum_{y} y \sum_{x} P_{XY}(x, y)
         = \sum_{x} x\, P_X(x) + \sum_{y} y\, P_Y(y) = E[X] + E[Y].

On the other hand,

Var[X + Y] = E[((X + Y) - (\mu_X + \mu_Y))^2]
           = E[(X + Y)^2 + (\mu_X + \mu_Y)^2 - 2(X + Y)(\mu_X + \mu_Y)]
           = E[(X + Y)^2] - (\mu_X + \mu_Y)^2
           = E[X^2] + E[Y^2] + 2E[XY] - \mu_X^2 - \mu_Y^2 - 2\mu_X \mu_Y
           = (E[X^2] - \mu_X^2) + (E[Y^2] - \mu_Y^2) + 2(E[XY] - \mu_X \mu_Y)
           = Var[X] + Var[Y] + 2(E[XY] - E[X]E[Y]).

Note that if X and Y are independent, then
E[ XY ]   xyPXY ( x, y )
x
y
  xyPX ( x) PY ( y )
x
y
  xPX ( x) yPY ( y )
x
y
 E[ X ]E[Y ]

Therefore, if X and Y are independent,
then Var[X+Y]=Var[X]+Var[Y].
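A quick numerical confirmation for one assumed pair of independent variables (a fair 4-sided die and a Bernoulli(0.3) indicator; both choices are arbitrary):

```python
from itertools import product

# Independent X (fair 4-sided die) and Y (Bernoulli 0.3):
# check E[X+Y] = E[X] + E[Y] and Var[X+Y] = Var[X] + Var[Y].
pmf_x = {x: 0.25 for x in (1, 2, 3, 4)}
pmf_y = {0: 0.7, 1: 0.3}
joint = {(x, y): px * py for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items())}

def mean_var(pmf, value=lambda k: k):
    m = sum(value(k) * p for k, p in pmf.items())
    v = sum((value(k) - m) ** 2 * p for k, p in pmf.items())
    return m, v

mx, vx = mean_var(pmf_x)
my, vy = mean_var(pmf_y)
ms, vs = mean_var(joint, value=lambda k: k[0] + k[1])
print(ms, mx + my)   # both approximately 2.8
print(vs, vx + vy)   # both approximately 1.46
```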
Covariance

Let X and Y be two random variables. Then, E[(X − µ_X)(Y − µ_Y)] is called the covariance of X and Y and is denoted by σ_XY, where µ_X and µ_Y are the means of X and Y, respectively.
Covariance

E[(X − µ_X)(Y − µ_Y)]
  = E[XY − µ_Y X − µ_X Y + µ_X µ_Y]
  = E[XY] − µ_Y E[X] − µ_X E[Y] + µ_X µ_Y
  = E[XY] − µ_X µ_Y

Therefore, if X and Y are independent, then Cov[X, Y] = 0.
Examples of Correlated Random Variables

Recall the supermarket statistics on wine and juice purchases from the earlier joint-distribution example:

            Purchasing Wine    Not Purchasing Wine
  Male            45                  255
  Female          70                  630

            Purchasing Juice   Not Purchasing Juice
  Male            60                  240
  Female         210                  490
Examples of Correlated Random Variables

Let random variable M correspond to whether a customer is male, random variable W correspond to whether a customer purchases wine, and random variable J correspond to whether a customer purchases juice.

The joint p.m.f. of M and W is
  P_MW(0,0) = 0.63    P_MW(0,1) = 0.07
  P_MW(1,0) = 0.255   P_MW(1,1) = 0.045

Cov(M, W) = E[MW] − E[M]E[W] = 0.045 − 0.3 × 0.115 = 0.0105 > 0,
so M and W are positively correlated.

The joint p.m.f. of M and J is
  P_MJ(0,0) = 0.49    P_MJ(0,1) = 0.21
  P_MJ(1,0) = 0.24    P_MJ(1,1) = 0.06

Cov(M, J) = E[MJ] − E[M]E[J] = 0.06 − 0.3 × 0.27 = −0.021 < 0,
so M and J are negatively correlated.
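Both covariances can be recomputed from the raw counts; a minimal sketch (the helper cov_from_counts is my own):

```python
# Covariance of two 0/1 indicators from supermarket counts.
def cov_from_counts(counts):
    # counts: dict mapping (m, w) -> number of customers
    total = sum(counts.values())
    joint = {k: v / total for k, v in counts.items()}
    e_m = sum(m * p for (m, _), p in joint.items())
    e_w = sum(w * p for (_, w), p in joint.items())
    e_mw = sum(m * w * p for (m, w), p in joint.items())
    return e_mw - e_m * e_w

wine  = {(1, 1): 45, (1, 0): 255, (0, 1): 70, (0, 0): 630}
juice = {(1, 1): 60, (1, 0): 240, (0, 1): 210, (0, 0): 490}
print(cov_from_counts(wine))    # approximately  0.0105 (> 0, positively correlated)
print(cov_from_counts(juice))   # approximately -0.021  (< 0, negatively correlated)
```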
Covariance of Independent Random Variables

Assume that the supermarket also collected the following statistics of customers' purchasing behavior:

            Purchasing soft drinks   Not purchasing soft drinks
  Male               90                        210
  Female            210                        490

The joint p.m.f. of M and S is
  P_MS(0,0) = 0.49    P_MS(0,1) = 0.21
  P_MS(1,0) = 0.21    P_MS(1,1) = 0.09

Cov(M, S) = E[MS] − E[M]E[S] = 0.09 − 0.3 × 0.3 = 0,
due to the fact that M and S are independent.
Correlation Coefficient

The correlation coefficient of two random variables X and Y is defined as follows:

\rho = \frac{cov(X, Y)}{\sigma_X \sigma_Y}.
Bounds of a Correlation Coefficient

Let

K(b) = E[((Y - \mu_Y) - b(X - \mu_X))^2] = \sigma_Y^2 - 2b\rho\sigma_X\sigma_Y + b^2\sigma_X^2.

We have

K\left(\frac{\rho\sigma_Y}{\sigma_X}\right) = \sigma_Y^2 (1 - \rho^2).

Since K(b) is the expected value of a square, K(b) ≥ 0 for all b ∈ R. Therefore, −1 ≤ ρ ≤ 1.

Implication of the Value of the Correlation Coefficient

Assume that the supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing cosmetics   Not purchasing cosmetics
  Male              10                      290
  Female           260                      440
Implication of the Value of the Correlation Coefficient

Let random variable M correspond to whether a customer is male and random variable C correspond to whether a customer purchases cosmetics.
Then, the correlation coefficient of M and C is −0.349.
Implication of the Value of the Correlation Coefficient

On the other hand, we also have the following dataset:

            Purchasing juice   Not purchasing juice
  Male            60                  240
  Female         210                  490

The correlation coefficient of M and J is −0.103.
Another Example of Correlation Coefficient

Consider again the 20-object dataset (X, Y, Class) from the earlier joint-distribution example: Class-1 objects have average X = 7.8, Class-2 objects have average X = 12.2, and both classes have average Y = 10.0.
Another Example of Correlation Coefficients

The correlation coefficient of X and C is

\frac{E[XC] - E[X]E[C]}{\sigma_X \sigma_C} = \frac{16.1 - 10 \times 1.5}{2.379 \times 0.5} \approx 0.925.

On the other hand, the covariance of Y and C is

E[YC] − E[Y]E[C] = 15 − 10 × 1.5 = 0,

and therefore the correlation coefficient of Y and C is 0.
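The two correlation coefficients can be recomputed from the 20-object table; a small sketch using population (divide-by-n) moments, which is what the 2.379 and 0.5 above correspond to:

```python
from math import sqrt

# The 20-object dataset: (X, Y, Class) for objects 1-10 (class 1) and 11-20 (class 2).
data = [
    (7.1, 9.1, 1), (6.7, 10.2, 1), (7.5, 10.6, 1), (7.6, 8.8, 1), (8.1, 10.3, 1),
    (8.0, 11.0, 1), (8.6, 8.9, 1), (8.7, 9.8, 1), (9.2, 11.2, 1), (6.5, 10.1, 1),
    (10.9, 8.8, 2), (10.8, 10.3, 2), (11.1, 11.0, 2), (12.3, 9.1, 2), (12.1, 9.7, 2),
    (12.0, 10.9, 2), (13.1, 8.9, 2), (12.8, 10.1, 2), (13.2, 11.3, 2), (13.7, 9.9, 2),
]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum(x * y for x, y in zip(a, b)) / n - ma * mb
    sa = sqrt(sum(x * x for x in a) / n - ma * ma)
    sb = sqrt(sum(y * y for y in b) / n - mb * mb)
    return cov / (sa * sb)

xs = [d[0] for d in data]
ys = [d[1] for d in data]
cs = [d[2] for d in data]
print(round(corr(xs, cs), 3))   # about 0.925
print(round(corr(ys, cs), 3))   # about 0.0 (up to floating-point error)
```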


With respect to data analysis, random variable X provides valuable information about the class of an object. On the other hand, random variable Y provides essentially no information about the class of an object.
Example of Uncorrelated Random Variables

Assume X and Y have the following joint p.m.f.:
P_XY(0,1) = P_XY(1,0) = P_XY(2,1) = 1/3.
We have the following marginal p.m.f.s:

P_X(0) = \sum_{y} P_{XY}(0, y) = 1/3;  P_X(1) = \sum_{y} P_{XY}(1, y) = 1/3;  P_X(2) = \sum_{y} P_{XY}(2, y) = 1/3;
P_Y(0) = \sum_{x} P_{XY}(x, 0) = 1/3;  P_Y(1) = \sum_{x} P_{XY}(x, 1) = 2/3.
Example of Uncorrelated Random Variables

Since P_XY(0,1) = 1/3 while P_X(0) × P_Y(1) = 1/3 × 2/3 = 2/9, X and Y are not independent.
However,
Cov(X, Y) = E[XY] − E[X]E[Y] = [2 × 1 × (1/3)] − [1 × 2/3] = 2/3 − 2/3 = 0.
Therefore, independence implies uncorrelatedness, but the converse is not true.
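A final check that this joint p.m.f. is dependent yet has zero covariance (exact arithmetic with fractions; the variable names are mine):

```python
from fractions import Fraction

# Joint p.m.f. with three equally likely points: dependent but uncorrelated.
joint = {(0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3), (2, 1): Fraction(1, 3)}

px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1, 2)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

dependent = any(joint.get((x, y), 0) != px[x] * py[y] for x in px for y in py)
e_x = sum(x * p for x, p in px.items())
e_y = sum(y * p for y, p in py.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

print(dependent)          # True: e.g. P(0,1) = 1/3, which differs from 1/3 * 2/3
print(e_xy - e_x * e_y)   # 0: the covariance vanishes anyway
```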