Random Variables & Expectation
Random Variable
A random variable (r.v.) is a well-defined rule for
assigning a numerical value to each
possible outcome of an experiment.
Example:
experiment: taking a course
outcomes: grades A, B, C, D, F
sample space S: discrete & finite
random variable:
Y = 4 if grade is A
Y = 3 if grade is B
Y = 2 if grade is C
Y = 1 if grade is D
Y = 0 if grade is F
Experiment: throw 2 dice
What are the possible outcomes?
1,1  2,1  3,1  4,1  5,1  6,1
1,2  2,2  3,2  4,2  5,2  6,2
1,3  2,3  3,3  4,3  5,3  6,3
1,4  2,4  3,4  4,4  5,4  6,4
1,5  2,5  3,5  4,5  5,5  6,5
1,6  2,6  3,6  4,6  5,6  6,6
Define the random variable X to be
the sum of the dots on the 2 dice.
For which outcomes does X = 9?
6,3  5,4  4,5  3,6
What is Pr(X=9)?
Since there are 36 equally likely outcomes, each has a probability of 1/36.
Since 4 outcomes yield X=9,
Pr(X=9) = 4/36 = 1/9
Let’s calculate the probabilities of all the
possible values x of the random variable X.
x     Pr(X=x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36
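The counting above can also be done by brute force. The following is a minimal Python sketch, assuming two fair six-sided dice; the names `outcomes`, `counts`, and `pmf` are illustrative and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each sum x, then divide by 36.
counts = Counter(a + b for a, b in outcomes)
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x in pmf:
    # Print the unreduced fraction so it matches the table.
    print(x, f"{counts[x]}/36")

# Property 2 of a probability distribution: the probabilities sum to 1.
print(sum(pmf.values()) == 1)
```

Exact fractions are used instead of floats so that the check that the probabilities sum to 1 is not affected by rounding.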
Let’s graph the probability distribution of X.
[Figure: graph of Pr(X=x) (vertical axis, 0 to 8/36) against x (horizontal axis, 2 to 12), shown beside the table of x and Pr(X=x).]
Pr(X=x) = f(x) = p(x)
as described in this table or graph is called the
probability distribution or probability mass function (p.m.f.)
Properties of Probability Distributions
1. 0 ≤ Pr(X=x) ≤ 1 for all x
2. $\sum_x p(x) = 1$
Cumulative Mass Function
$F(x_0) = \Pr(X \le x_0) = \sum_{x \le x_0} p(x)$
Cumulative Mass Function (2 dice problem)
x     Pr(X=x)    Pr(X≤x)
2     1/36       1/36
3     2/36       3/36
4     3/36       6/36
5     4/36       10/36
6     5/36       15/36
7     6/36       21/36
8     5/36       26/36
9     4/36       30/36
10    3/36       33/36
11    2/36       35/36
12    1/36       1
[Figure: graph of F(x) (vertical axis, 0 to 1) against x (horizontal axis, 0 to 13).]
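The Pr(X≤x) column is a running total of the Pr(X=x) column. A short Python sketch of that idea, again assuming two fair dice; `xs`, `pmf`, and `cdf` are illustrative names.

```python
from fractions import Fraction
from itertools import accumulate, product

# p.m.f. of the sum of two fair dice, built by counting outcomes.
xs = list(range(2, 13))
pmf = [
    Fraction(sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == x), 36)
    for x in xs
]

# F(x0) = Pr(X <= x0) is the running total of p(x) over x <= x0.
cdf = dict(zip(xs, accumulate(pmf)))

for x in xs:
    print(x, float(cdf[x]))
```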
Expectation, Expected Value, or Mean
of a Random Variable
$\mu = E(X) = \sum_x x\,p(x)$
Notice the similarity of the definitions of
the mean of a random variable & the mean of
a frequency distribution for a population
random variable: $\mu = E(X) = \sum_x x\,p(x)$
pop. freq. distrib.: $\mu = (1/N) \sum_{i=1}^{c} x_i f_i = \sum_{i=1}^{c} x_i \left( \frac{f_i}{N} \right)$
Recall that probability [p(x)] is the relative frequency
[f/N] with which something occurs over the long run.
So these definitions are saying the same thing.
Example: Suppose that a stock broker wants to
estimate the price of a certain stock one year from
now. If the probability mass function of the price in
a year is as given, determine the expected price.
x = price in one year    p(x)    xp(x)
94                       0.25    23.5
98                       0.25    24.5
102                      0.25    25.5
106                      0.25    26.5
Total                    1.00    100.0
The expected price is 100.0.
Notice that you do NOT divide by the number of observations
when you’re done adding.
Also, the probabilities do not have to be equal; they just have
to add up to one.
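The table calculation can be written as a few lines of Python. This is only a sketch; the helper name `expected_value` and its check on the probabilities are assumptions added for illustration.

```python
prices = [94, 98, 102, 106]
probs = [0.25, 0.25, 0.25, 0.25]


def expected_value(values, probabilities):
    """Return the sum of x * p(x) after checking the probabilities sum to 1."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # No division by the number of values: the weights already sum to 1.
    return sum(x * p for x, p in zip(values, probabilities))


expected_price = expected_value(prices, probs)
print(expected_price)  # 100.0
```

The same function works when the probabilities are unequal, as long as they add up to one.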
Theorem: Suppose that g(X) is a function of a
random variable X, & the probability mass function of
X is p_X(x). Then the expected value of g(X) is
$E[g(X)] = \sum_x g(x)\,p_X(x)$
Example: Suppose Y = X^2 & the distribution of X is as
given below. Determine the mean of g(X) by using
1. the definition of expected value, &
2. the previous theorem.
x      p(x)
-2     0.1
-1     0.2
1      0.3
2      0.4
1. Using the definition of expected value, with the distribution of Y:
y      p(y)    yp(y)
1      0.5     0.5
4      0.5     2.0
E(Y) = 2.5
2. Using the previous theorem, with the distribution of X:
x      p(x)    y = x^2    y p_X(x)
-2     0.1     4          0.4
-1     0.2     1          0.2
1      0.3     1          0.3
2      0.4     4          1.6
E(Y) = 2.5
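Both routes to E(Y) can be checked with a small Python sketch. The names `pmf_x`, `pmf_y`, and `g` are illustrative; exact fractions are used so the two answers can be compared for equality.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {
    -2: Fraction(1, 10),
    -1: Fraction(2, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
}


def g(x):
    return x ** 2


# Method 1: build the distribution of Y = g(X), then apply the definition.
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p  # x = -1 and x = 1 both land on y = 1, and so on
mean_by_definition = sum(y * p for y, p in pmf_y.items())

# Method 2: the theorem, summing g(x) * p_X(x) directly over x.
mean_by_theorem = sum(g(x) * p for x, p in pmf_x.items())

print(float(mean_by_definition), float(mean_by_theorem))
```

The theorem saves the step of working out the distribution of Y.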
Definition:
Variance of a random variable X
$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2\,p(x)$
Theorem:
The variance of X can also be
calculated as follows:
$\sigma^2 = V(X) = E(X^2) - [E(X)]^2$
Standard Deviation of a random variable X
$\sigma = \sqrt{\sigma^2} = \sqrt{V(X)}$
Example: Suppose sales at a donut shop are distributed as below.
Calculate (a) the mean number of donuts sold, (b) the variance
(using both the definition of the variance & the theorem), & (c) the
standard deviation.
x     p(x)
1     0.08
2     0.27
4     0.10
6     0.33
12    0.22
First, the mean:
x     p(x)    xp(x)
1     0.08    0.08
2     0.27    0.54
4     0.10    0.40
6     0.33    1.98
12    0.22    2.64
μ = 5.64
Next, the variance using the definition:
$\sigma^2 = V(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2\,p(x)$
x     p(x)    x-μ      (x-μ)^2    (x-μ)^2 p(x)
1     0.08    -4.64    21.53      1.72
2     0.27    -3.64    13.25      3.58
4     0.10    -1.64    2.69       0.27
6     0.33    0.36     0.13       0.04
12    0.22    6.36     40.45      8.90
σ^2 = 14.51
Now, the variance using the theorem:
V(X) = E(X^2) - [E(X)]^2.
x     p(x)    x^2    x^2 p(x)
1     0.08    1      0.08
2     0.27    4      1.08
4     0.10    16     1.60
6     0.33    36     11.88
12    0.22    144    31.68
E(X^2) = 46.32
σ^2 = V(X) = E(X^2) - [E(X)]^2 = 46.32 - (5.64)^2 = 14.51
And lastly, the standard deviation,
by taking the square root of the variance.
σ = 3.81
Important Theorem
If X has mean μ and variance σ^2,
then (X-μ)/σ has mean 0 and variance 1.
Example: (G-μ)/σ
Suppose your course grades have a mean of
2.7 and a standard deviation of 1.2.
Suppose you took your grades, subtracted
2.7 from each one, then divided those
results by 1.2.
The new set of numbers would have a mean
of 0 and a standard deviation of 1.
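The theorem can be checked numerically on any distribution. The sketch below reuses the donut-shop p.m.f. purely as an assumed test case (the slides use course grades, whose full distribution is not given); `z_pmf`, `z_mean`, and `z_var` are illustrative names.

```python
import math

# Assumed test distribution: the donut-shop p.m.f. {x: p(x)}.
pmf = {1: 0.08, 2: 0.27, 4: 0.10, 6: 0.33, 12: 0.22}

mu = sum(x * p for x, p in pmf.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

# Z = (X - mu) / sigma takes the value (x - mu) / sigma with probability p(x).
z_pmf = [((x - mu) / sigma, p) for x, p in pmf.items()]

z_mean = sum(z * p for z, p in z_pmf)
z_var = sum((z - z_mean) ** 2 * p for z, p in z_pmf)

# Expect values within floating-point error of 0 and 1.
print(z_mean, z_var)
```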
Expectation Rules
Let k, a, & b be constants.
1. E(k) = k
The mean of a constant is the constant.
2. V(k) = 0
The variance of a constant is zero.
3. E(a + bX) = a + b E(X)
4. V(a + bX) = b^2 V(X)
Example: If X has a mean of 3
and a variance of 2/3, what are the
mean and variance of Y=5+2X ?
First find the mean E(Y) = E(5+2X).
E(a + bX) = a + b E(X).
Let a=5 & b=2. Then just plug into the formula. So,
E(Y) = E(5+2X) = 5 + 2 E(X) = 5 + 2(3) = 11.
Next find the variance V(Y) = V(5+2X).
V(a + bX) = b^2 V(X).
Again let a=5 and b=2 and just plug into the formula.
V(Y) = V(5+2X) = 2^2 V(X) = 4 V(X) = 4(2/3) = 8/3.
Notice that the constant term shifts the mean but has no
effect on the spread of the distribution.
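Rules 3 and 4 can be verified on a concrete distribution. The example only states the mean and variance of X, so the sketch below assumes a hypothetical p.m.f. (X equally likely to be 2, 3, or 4) chosen because it has mean 3 and variance 2/3.

```python
from fractions import Fraction

# Hypothetical p.m.f. chosen so that E(X) = 3 and V(X) = 2/3.
pmf_x = {2: Fraction(1, 3), 3: Fraction(1, 3), 4: Fraction(1, 3)}
a, b = 5, 2


def mean(pmf):
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())


# Y = a + bX takes the value a + b*x with probability p(x).
pmf_y = {a + b * x: p for x, p in pmf_x.items()}

print(mean(pmf_y), a + b * mean(pmf_x))           # both 11
print(variance(pmf_y), b ** 2 * variance(pmf_x))  # both 8/3
```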
Joint Probability Distribution for 2
Discrete Random Variables X & Y
p(x,y) = Pr(X=x and Y=y)
Properties
of Joint Probability Distributions
1. $0 \le p(x, y) \le 1$ for all x and y
2. $\sum_x \sum_y p(x, y) = 1$
Example: Consider the following joint distribution of
the number of jobs & the number of promotions of
college graduates in their 1st 5 years out of college.
Number of      Number of Promotions (y)
jobs (x)       1       2       3       4
1              0.10    0.15    0.12    0.06
2              0.05    0.07    0.10    0.05
3              0.04    0.02    0.14    0.10
For example,
the probability of 3 jobs & 2 promotions is 0.02.
We can determine the marginal distribution
of the 2 random variables X & Y
just as we did before for 2 events.
Just add across the row or down the column.
For the probability of 1 job, add across the first row: 0.10 + 0.15 + 0.12 + 0.06 = 0.43.
Similarly for the probabilities of 2 or 3 jobs.
For the probability of 1 promotion, add down the first column: 0.10 + 0.05 + 0.04 = 0.19,
and similarly for the probabilities of 2, 3, or 4 promotions.
Number of      Number of Promotions (y)           pX(x): marginal
jobs (x)       1       2       3       4          prob. of x
1              0.10    0.15    0.12    0.06       0.43
2              0.05    0.07    0.10    0.05       0.27
3              0.04    0.02    0.14    0.10       0.30
pY(y):         0.19    0.24    0.36    0.21       1.00
marginal
prob. of y
Notice again that you must get a total of one when
you total the marginal probabilities for x and for y.
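Adding across rows and down columns is easy to express in code. A minimal Python sketch using the jobs and promotions table; the layout of `rows` and the names `joint`, `p_x`, and `p_y` are illustrative choices.

```python
from fractions import Fraction as F

# Rows are x (jobs) = 1..3, columns are y (promotions) = 1..4.
rows = [
    ["0.10", "0.15", "0.12", "0.06"],
    ["0.05", "0.07", "0.10", "0.05"],
    ["0.04", "0.02", "0.14", "0.10"],
]
joint = {
    (x, y): F(p)
    for x, row in enumerate(rows, start=1)
    for y, p in enumerate(row, start=1)
}

# Marginal of X: add across each row. Marginal of Y: add down each column.
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (1, 2, 3)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (1, 2, 3, 4)}

print({x: float(p) for x, p in p_x.items()})
print({y: float(p) for y, p in p_y.items()})
print(sum(p_x.values()) == 1, sum(p_y.values()) == 1)
```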
Conditional Probabilities for Random Variables
Example
The probability that X is 2 given that Y is 3:
pX|Y(2|3) = Pr(X=2|Y=3)
= Pr(X=2 & Y=3)/Pr(Y=3).
The probability that Y is 2 given that X is 3:
pY|X(2|3) = Pr(Y=2|X=3)
= Pr(Y=2 & X=3)/Pr(X=3).
Let’s do the calculations using our previous example.
pX|Y(2|3) = Pr(X=2|Y=3)
= Pr(X=2 & Y=3)/Pr(Y=3)
= 0.10/0.36 = 0.278.
pY|X(2|3) = Pr(Y=2|X=3)
= Pr(Y=2 & X=3)/Pr(X=3)
= 0.02/0.30 = 0.067.
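The two conditional probabilities can be computed directly from the joint table. A sketch under the same table layout as the example; the function names `p_x_given_y` and `p_y_given_x` are hypothetical helpers, not notation from the slides.

```python
from fractions import Fraction as F

# Rows are x (jobs) = 1..3, columns are y (promotions) = 1..4.
rows = [
    ["0.10", "0.15", "0.12", "0.06"],
    ["0.05", "0.07", "0.10", "0.05"],
    ["0.04", "0.02", "0.14", "0.10"],
]
joint = {
    (x, y): F(p)
    for x, row in enumerate(rows, start=1)
    for y, p in enumerate(row, start=1)
}


def p_x_given_y(x, y):
    """Pr(X=x | Y=y) = p(x, y) / pY(y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return joint[(x, y)] / p_y


def p_y_given_x(y, x):
    """Pr(Y=y | X=x) = p(x, y) / pX(x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return joint[(x, y)] / p_x


print(round(float(p_x_given_y(2, 3)), 3))  # 0.278
print(round(float(p_y_given_x(2, 3)), 3))  # 0.067
```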
Cumulative Joint Mass Function for
2 Discrete Random Variables X & Y
F(x,y) = Pr(X ≤ x and Y ≤ y)
Job/Promotion Example: Find the probability that a
person had 2 or fewer jobs & 3 or fewer promotions.
F(2,3) = f(1,1) + f(1,2) + f(1,3)
+ f(2,1) + f(2,2) + f(2,3)
= 0.10 + 0.15 + 0.12
+ 0.05 + 0.07 + 0.10
= 0.59
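The cumulative joint probability is a sum over every cell with x ≤ x0 and y ≤ y0. A brief Python sketch; `joint_cdf` is a hypothetical helper name.

```python
from fractions import Fraction as F

# Rows are x (jobs) = 1..3, columns are y (promotions) = 1..4.
rows = [
    ["0.10", "0.15", "0.12", "0.06"],
    ["0.05", "0.07", "0.10", "0.05"],
    ["0.04", "0.02", "0.14", "0.10"],
]
joint = {
    (x, y): F(p)
    for x, row in enumerate(rows, start=1)
    for y, p in enumerate(row, start=1)
}


def joint_cdf(x0, y0):
    """F(x0, y0) = Pr(X <= x0 and Y <= y0)."""
    return sum(p for (x, y), p in joint.items() if x <= x0 and y <= y0)


print(float(joint_cdf(2, 3)))  # 0.59
```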
Independence
Recall that 2 events A & B were independent
if Pr(A∩B)=Pr(A) Pr(B)
Similarly 2 random variables are independent
if p(x,y) = pX(x) pY(y) for all values of x & y
In our previous example, are the number of
jobs & number of promotions independent?
We must have
p(x,y) = pX(x) pY(y)
for all values of x & y.
To start, does
p(1,1) equal pX(1) pY(1) ?
p(1,1) = 0.10
pX(1) pY(1) = 0.43 • 0.19
= 0.0817
≠ 0.10
So X & Y are not independent.
If that case had been equal,
we wouldn’t be done yet.
We’d have to verify that
equality held for all the cells.
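Checking every cell by hand is tedious, so here is a hedged Python sketch of the test. The helper names `marginals` and `is_independent` are assumptions for illustration; exact fractions avoid false mismatches from floating-point rounding.

```python
from fractions import Fraction as F
from itertools import product

# Rows are x (jobs) = 1..3, columns are y (promotions) = 1..4.
rows = [
    ["0.10", "0.15", "0.12", "0.06"],
    ["0.05", "0.07", "0.10", "0.05"],
    ["0.04", "0.02", "0.14", "0.10"],
]
joint = {
    (x, y): F(p)
    for x, row in enumerate(rows, start=1)
    for y, p in enumerate(row, start=1)
}


def marginals(table):
    p_x, p_y = {}, {}
    for (x, y), p in table.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y


def is_independent(table):
    """True only if p(x, y) = pX(x) * pY(y) holds in every cell."""
    p_x, p_y = marginals(table)
    return all(
        table.get((x, y), 0) == p_x[x] * p_y[y] for x, y in product(p_x, p_y)
    )


p_x, p_y = marginals(joint)
print(float(joint[(1, 1)]), float(p_x[1] * p_y[1]))
print(is_independent(joint))  # False
```

A single failing cell is enough to rule out independence, but a `True` result requires every cell to match.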
Theorem: mean of a function of
2 random variables X & Y
$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y)$
Suppose that based on the joint distribution of the length X & width
Y of lumber sold by a lumberyard, we would like to determine the
mean length, mean width, & mean area of the lumber.
So we want to calculate
E(X),
E(Y), and
E(XY).
Given the joint distribution below,
calculate E(X), E(Y), & E(XY).
          Y
X         2       4       6
4         0.05    0.05    0.10
8         0.10    0.50    0.20
First, determine the marginal distributions.
The marginal distribution of X is in the last column, and
the marginal distribution of Y is in the last row.
          Y
X         2       4       6       pX(x)
4         0.05    0.05    0.10    0.20
8         0.10    0.50    0.20    0.80
pY(y)     0.15    0.55    0.30    1.00
Check that the marginal distribution
probabilities sum to 1.
Next we calculate the mean length & mean width.
For E(X),
remember we need to multiply the values by their probabilities
and add up.
x     p(x)    xp(x)
4     0.20    0.80
8     0.80    6.40
E(X) = 7.20
For E(Y), we do the same thing.
y     p(y)    yp(y)
2     0.15    0.30
4     0.55    2.20
6     0.30    1.80
E(Y) = 4.30
To calculate the mean area E(XY), we use the theorem
$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y)$
For the mean area, E(XY), the theorem translates to
$E[XY] = \sum_x \sum_y xy\,p(x, y)$
To keep track of the xy terms, we are going to put them in our table,
in parentheses after each probability.
          Y
X         2            4            6            pX(x)
4         0.05 (8)     0.05 (16)    0.10 (24)    0.20
8         0.10 (16)    0.50 (32)    0.20 (48)    0.80
pY(y)     0.15         0.55         0.30         1.00
Next, we need to multiply the xy terms by the corresponding probabilities,
and then add it all up.
So we have 0.05 (8) + 0.05 (16) + 0.10 (24) + 0.10 (16) + 0.50 (32) + 0.20 (48)
= 30.8 for the mean area.
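All three means come from the same theorem with a different g(x, y). A minimal Python sketch of the lumber calculation; the helper name `expect` is an assumption for illustration.

```python
from fractions import Fraction as F

# Joint p.m.f. of length X and width Y: {(x, y): p(x, y)}.
joint = {
    (4, 2): F("0.05"), (4, 4): F("0.05"), (4, 6): F("0.10"),
    (8, 2): F("0.10"), (8, 4): F("0.50"), (8, 6): F("0.20"),
}


def expect(g):
    """E[g(X, Y)]: sum of g(x, y) * p(x, y) over every cell."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_length = expect(lambda x, y: x)      # E(X)
mean_width = expect(lambda x, y: y)       # E(Y)
mean_area = expect(lambda x, y: x * y)    # E(XY)

print(float(mean_length), float(mean_width), float(mean_area))
# 7.2 4.3 30.8
```

Taking g(x, y) = x gives the same E(X) as the marginal distribution, without having to build the marginal first.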
You might wonder if we could get E(XY)
by just multiplying E(X) by E(Y).
In general, the answer is no.
In our example, we had
E(X) = 7.2, E(Y) =4.3, & E(XY) = 30.8
E(X) E(Y) = 30.96, not 30.80.
Close in this case, but not the same.
If X and Y are independent, then it is true
that E(XY) = E(X) E(Y).
It may also hold occasionally in other cases.
But generally, it doesn’t work.
Definition: Covariance of X & Y
$C(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x, y)$
What does this mean?
Suppose that two variables tend to move in the same direction,
like study time and grades.
When x is large, so that it is larger than its mean, x - μ_X > 0.
When x is large, y tends to be large as well, so that y - μ_Y > 0 also.
Remember that the p(x,y) values are probabilities and therefore
cannot be negative.
So in those terms of the formula
$C(X, Y) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x, y)$
we have (x - μ_X) > 0, (y - μ_Y) > 0, and p(x,y) > 0.
These products are positive.
Similarly, since x and y tend to be small together,
we have x - μ_X < 0 with y - μ_Y < 0 too.
In those terms, two negative factors are multiplied by a positive probability.
These products are positive too.
So we’re adding up a lot of positive numbers.
What all that means is that when 2 variables tend to move in the
same direction, the covariance will be positive.
When 2 variables tend to move in opposite directions,
their covariance C(X,Y) < 0,
perhaps like party time and grades.
If variables don’t tend to move either
in the same or opposite directions,
their covariance C(X,Y) = 0.
This case includes independent variables.
It is usually easier to calculate covariances
using this theorem.
Theorem: C(X,Y) = E(XY) – E(X) E(Y)
Returning to the lumber example
Remember we had
E(X) = 7.2, E(Y) = 4.3, & E(XY) = 30.8
Then the covariance would be
C(X,Y) = E(XY) – E(X) E(Y)
= (30.8) – (7.2)(4.3)
= - 0.16
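The definition and the shortcut theorem should give the same covariance. A short Python sketch for the lumber table that computes both; `cov_definition` and `cov_theorem` are illustrative names.

```python
from fractions import Fraction as F

# Joint p.m.f. of length X and width Y: {(x, y): p(x, y)}.
joint = {
    (4, 2): F("0.05"), (4, 4): F("0.05"), (4, 6): F("0.10"),
    (8, 2): F("0.10"), (8, 4): F("0.50"), (8, 6): F("0.20"),
}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())

# Definition: sum of (x - mu_X)(y - mu_Y) p(x, y) over every cell.
cov_definition = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Theorem: C(X, Y) = E(XY) - E(X) E(Y).
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_theorem = e_xy - mu_x * mu_y

print(float(cov_definition), float(cov_theorem))  # -0.16 -0.16
```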
Difficulty
The value of the covariance changes when
you change units.
That is, you get different answers if you use
feet, inches, or meters.
So it’s difficult to tell if a particular answer
means a strong relationship or not.
Fortunately, we have a solution to this
problem …
Correlation Coefficient
The correlation coefficient is similar to the
covariance, but it doesn’t vary with the
units used.
$\rho(X, Y) = \frac{C(X, Y)}{\sigma_X \sigma_Y}$
The correlation coefficient is denoted by the
Greek letter rho, ρ.
It’s computed by dividing the covariance of X
& Y by the standard deviations of X & of Y.
The correlation coefficient is
always between -1 and 1.
-1 ≤ ρ ≤ 1
So, if your correlation coefficient ρ is close to
1, you have a strong positive relationship.
If it is close to -1, you have a strong negative
relationship.
If it is close to zero, there is no strong linear
relationship at all.
Back to the lumber example again
$\rho(X, Y) = \frac{C(X, Y)}{\sigma_X \sigma_Y}$
We had C(X,Y) = -0.16.
We need the standard deviations of X and Y,
which we have not calculated yet.
This is what we had for X so far.
x     p(x)    xp(x)
4     0.20    0.80
8     0.80    6.40
E(X) = 7.20
Recall we said previously that we can calculate V(X)
as V(X) = E(X^2) - [E(X)]^2.
We have E(X) but
we need E(X^2).
The theorem
$E[g(X)] = \sum_x g(x)\,p(x)$
gives us
$E(X^2) = \sum_x x^2\,p(x)$
x     p(x)    xp(x)    x^2    x^2 p(x)
4     0.20    0.80     16     3.2
8     0.80    6.40     64     51.2
E(X) = 7.20
E(X^2) = 54.4
Now we need to subtract to get V(X).
V(X) = E(X^2) - [E(X)]^2 = 54.4 - (7.2)^2 = 2.56
Take the square root
to get the standard deviation σ_X.
σ_X = 1.60
We do the same thing with Y.
Get y², multiply by p(y), and add to get E(Y²).
y     p(y)    yp(y)    y²    y²p(y)
2     0.15    0.30      4     0.60
4     0.55    2.20     16     8.80
6     0.30    1.80     36    10.80
E(Y) = 4.30
E(Y²) = 20.20
Subtract to get V(Y).
V(Y) = E(Y²) – [E(Y)]² = 20.20 – (4.3)² = 1.71
Take the square root to get the standard deviation σ_Y.
σ_Y = 1.31
Now we have everything we need to compute the
correlation coefficient for the lumber problem.
$\rho(X,Y) = \dfrac{C(X,Y)}{\sigma_X \sigma_Y} = \dfrac{-0.16}{(1.60)(1.31)} = -0.076$
This number is much closer to 0 than it is to -1.
So the negative relation between the length &
width of the lumber is very weak.
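The lumber calculation above can be retraced with a short Python sketch. It assumes the marginal distributions shown in the tables and takes E(XY) = 30.8 as given. The names `p_x`, `p_y`, `mean`, and `variance` are illustrative and not part of the original slides.

```python
from math import sqrt

# Marginal distributions from the lumber example (value -> probability).
p_x = {4: 0.20, 8: 0.80}
p_y = {2: 0.15, 4: 0.55, 6: 0.30}
e_xy = 30.8  # E(XY), taken as given from the earlier calculation


def mean(dist):
    """E(X) = sum of x * p(x)."""
    return sum(v * p for v, p in dist.items())


def variance(dist):
    """Shortcut formula: V(X) = E(X^2) - [E(X)]^2."""
    return sum(v * v * p for v, p in dist.items()) - mean(dist) ** 2


cov = e_xy - mean(p_x) * mean(p_y)      # C(X,Y) = E(XY) - E(X)E(Y)
sd_x = sqrt(variance(p_x))
sd_y = sqrt(variance(p_y))
rho = cov / (sd_x * sd_y)

print(round(cov, 2), round(sd_x, 2), round(sd_y, 2), round(rho, 3))
```

Rounded to the same number of places, the printed values should agree with the covariance, standard deviations, and correlation coefficient worked out by hand.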
Theorem
1. E(aX + bY) = aE(X) + bE(Y)
2. V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]
Example:
The mean & variance of X are 1 & 5 respectively.
The mean & variance of Y are 2 & 6 respectively.
The covariance of X & Y is 7.
Determine the mean & variance of 4X + 3Y.
Recall:
E(aX + bY) = aE(X) + bE(Y)
V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]
To solve this problem what should “a” & “b” be?
a is 4 & b is 3.
E(aX + bY) = aE(X) + bE(Y) = 4(1) + 3(2)
= 4 + 6 = 10
V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]
= 4²V(X) + 3²V(Y) + 2(4)(3)C(X,Y)
= 16(5) + 9(6) + 24(7)
= 80 + 54 + 168
= 302
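As a quick check of the two formulas, here is a minimal Python sketch. The function name `linear_combination_moments` is hypothetical and simply wraps the theorem.

```python
def linear_combination_moments(a, b, mean_x, mean_y, var_x, var_y, cov_xy):
    """Return (mean, variance) of aX + bY using the theorem:
    E(aX + bY) = a E(X) + b E(Y)
    V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab C(X,Y)
    """
    mean = a * mean_x + b * mean_y
    var = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return mean, var


# The example: E(X)=1, V(X)=5, E(Y)=2, V(Y)=6, C(X,Y)=7, and we want 4X + 3Y.
mean_w, var_w = linear_combination_moments(4, 3, 1, 2, 5, 6, 7)
print(mean_w, var_w)  # 10 302
```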
Consider the following joint distribution of X & Y.
            y
x        2       4
1      0.20    0.25
3      0.15    0.20
5      0.15    0.05
Determine the following:
a. The mean & variance of X
b. The mean & variance of Y
c. The covariance & correlation
coefficient of X & Y
d. The mean & variance of X+Y
First, determine the marginal distribution of X
and the marginal distribution of Y.
Verify that they sum to 1.
            y
x        2       4      pX(x)
1      0.20    0.25     0.45
3      0.15    0.20     0.35
5      0.15    0.05     0.20
pY(y)  0.50    0.50     1
Set up table to compute the mean & variance of X.
Fill in the values of X and their probabilities.
Multiply x by p(x).
Add to get the mean of X.
To calculate the variance,
first compute E(X²) = Σ x²p(x).
x     p(x)    xp(x)    x²p(x)
1     0.45    0.45     0.45
3     0.35    1.05     3.15
5     0.20    1.00     5.00
E(X) = 2.50
E(X²) = 8.60
Calculate the variance as
V(X) = E(X²) – [E(X)]².
V(X) = E(X²) – [E(X)]² = 8.6 – (2.5)² = 2.35
Set up table to compute the mean & variance of Y.
Fill in the values of Y and their probabilities.
Multiply y by p(y)
and add to get E(Y).
To calculate the variance,
first compute E(Y²) = Σ y²p(y).
y     p(y)    yp(y)    y²p(y)
2     0.5     1        2
4     0.5     2        8
E(Y) = 3
E(Y²) = 10
Calculate the variance as
V(Y) = E(Y²) – [E(Y)]².
V(Y) = E(Y²) – [E(Y)]² = 10 – (3)² = 1
To determine C(X,Y) = E(XY) - E(X) E(Y), we need
$E(XY) = \sum_x \sum_y xy\, p(x,y)$
As before, we’ll put the xy values in the table
next to the probability values.
            y
x        2            4            pX(x)
1      0.20 (2)     0.25 (4)       0.45
3      0.15 (6)     0.20 (12)      0.35
5      0.15 (10)    0.05 (20)      0.20
pY(y)  0.50         0.50           1.00
Then we multiply and add.
E(XY) = (0.20)(2) + (0.25)(4) + (0.15)(6) + (0.20)(12) + (0.15)(10) + (0.05)(20)
= 0.40 + 1.00 + 0.90 + 2.40 + 1.50 + 1.00
= 7.20
C(X,Y) = E(XY) – E(X) E(Y)
Since E(XY) = 7.2, E(X) = 2.5, & E(Y) = 3.0,
C(X,Y) = 7.2 – (2.5)(3)
= 7.2 – 7.5
= -0.3
Next, the correlation coefficient.
Since C(X,Y) = -0.3, V(X) = 2.35, & V(Y) = 1,
$\rho(X,Y) = \dfrac{C(X,Y)}{\sigma_X \sigma_Y} = \dfrac{-0.3}{\sqrt{2.35}\,\sqrt{1}} = -0.196$
The next part of the problem asked
for E(X+Y)
We know that E(X) = 2.5 and E(Y) = 3.0.
E(aX+bY) = a E(X) + b E(Y)
What should “a” & “b” be?
a = 1 & b = 1
So E(X+Y) = 1 E(X) + 1 E(Y)
= E(X) + E(Y)
= 2.5 + 3.0
= 5.5
Lastly: V(X+Y)
We know V(X) = 2.35, V(Y) = 1, & C(X,Y) = -0.3.
V(aX+bY) = a² V(X) + b² V(Y) + 2ab [C(X,Y)]
What are “a” & “b” ?
a = 1 & b = 1
V(aX+bY) = a² V(X) + b² V(Y) + 2ab [C(X,Y)]
= 1² V(X) + 1² V(Y) + 2(1)(1)[C(X,Y)]
= V(X) + V(Y) + 2[C(X,Y)]
= 2.35 + 1 + 2 (-0.3)
= 2.75
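All four parts of this problem can be retraced from the joint table with one helper. The following is a sketch only. The dictionary `joint` and the function `expect` are illustrative names, with `expect` implementing E[g(X,Y)] = Σ Σ g(x,y) p(x,y).

```python
from math import sqrt

# Joint distribution p(x, y) from the table: keys are (x, y) pairs.
joint = {
    (1, 2): 0.20, (1, 4): 0.25,
    (3, 2): 0.15, (3, 4): 0.20,
    (5, 2): 0.15, (5, 4): 0.05,
}


def expect(g):
    """E[g(X, Y)]: sum g(x, y) * p(x, y) over every cell of the table."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
v_x = expect(lambda x, y: x * x) - e_x ** 2
v_y = expect(lambda x, y: y * y) - e_y ** 2
cov = expect(lambda x, y: x * y) - e_x * e_y   # C(X,Y) = E(XY) - E(X)E(Y)
rho = cov / sqrt(v_x * v_y)

e_sum = e_x + e_y                # E(X+Y) with a = b = 1
v_sum = v_x + v_y + 2 * cov      # V(X+Y) with a = b = 1

print(round(e_x, 2), round(v_x, 2), round(e_y, 2), round(v_y, 2))
print(round(cov, 2), round(rho, 3), round(e_sum, 2), round(v_sum, 2))
```

Summing over the joint cells directly gives the same means and variances as working from the marginals, because each marginal probability is just a row or column total.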
Specific Discrete Distributions
1. Uniform
2. Binomial
3. Hypergeometric
4. Multinomial
5. Poisson
Uniform Distribution
The uniform distribution assigns all the
possible values equal probabilities.
example: a fair die has possible values
1, 2, 3, 4, 5, and 6
each with probability 1/6.
Graph of Uniform Distribution
Example: Fair Die
[Bar chart: probability (1/6 for each bar) against value on die (1 through 6)]
Binomial Distribution
Example: What is the probability of getting
3 heads on 5 tosses of an unfair (lopsided)
coin whose probability on any toss of getting
a head is 1/3.
What is the probability of getting specifically
HTHHT ?
$(1/3)(2/3)(1/3)(1/3)(2/3) = (1/3)^3 (2/3)^2$
What is the probability of any other specific outcome
with 3 heads on 5 tosses?
The same.
So we just have to figure out how many different ways
you can get 3 heads on 5 tosses, and multiply that by
the probability of each individual outcome.
That will give us the probability of getting 3 heads on 5
tosses.
How many ways can you get 3 heads on 5 tosses?
It’s the number of combinations of 5 objects taken 3 at a time.
${}_5C_3 = \dfrac{5!}{3!\,(5-3)!} = \dfrac{5!}{3!\,2!} = \dfrac{120}{(6)(2)} = 10$
So the probability of getting 3 heads on 5 tosses is
${}_5C_3 \left(\dfrac{1}{3}\right)^3 \left(\dfrac{2}{3}\right)^2 = (10)\left(\dfrac{1}{27}\right)\left(\dfrac{4}{9}\right) = \dfrac{40}{243} = 0.1646$
In general, the probability of getting
x successes on n trials in which the probability of
success on any given trial is p is
$({}_nC_x)\, p^x (1-p)^{n-x}$
This is the binomial distribution.
Notes
1. 0! = 1
2. Each trial that can result in either success
or failure is called a Bernoulli trial.
Example: If the probability that any person passes
this course is 0.95, what is the probability that in a
class of 30 people, exactly 28 people pass?
$({}_nC_x)\, p^x (1-p)^{n-x} = ({}_{30}C_{28})(0.95)^{28}(0.05)^2 = 0.259$
where ${}_{30}C_{28} = \dfrac{30!}{28!\,2!} = \dfrac{30 \cdot 29 \cdot 28!}{28!\,2!} = \dfrac{30 \cdot 29}{2} = 15 \cdot 29 = 435$
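The binomial formula translates almost directly into code. Below is a minimal Python sketch using the standard library's `math.comb` for the number of combinations. The function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes on n independent trials,
    each with success probability p: (nCx) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# 3 heads on 5 tosses of the lopsided coin with Pr(head) = 1/3
print(round(binomial_pmf(3, 5, 1 / 3), 4))     # 0.1646

# exactly 28 of 30 people pass when each passes with probability 0.95
print(round(binomial_pmf(28, 30, 0.95), 3))    # 0.259
```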
Let’s go back to the example in which we flipped a coin
5 times & the probability of heads on each toss was 1/3.
For 3 heads, the probability was 0.1646.
Using the binomial formula, we can determine the
probabilities of the other possibilities.
x      p(x)
0      0.1317
1      0.3292
2      0.3292
3      0.1646
4      0.0412
5      0.0041
Total  1
If we graph this distribution, it looks like:
[Bar chart: probability (vertical axis, 0 to 0.35) against number of heads (horizontal axis, 0 to 5)]
Notice that there is a bump on the left and a tail on the right.
Such a distribution is said to be skewed to the right.
The skew is where the tail is.
Binomial Distribution
The binomial distribution graph we just did was for
p = 1/3 and the skew was to the right.
A binomial distribution with p < ½ will always have
a skew to the right.
What do you think the distribution will look like if
p>½?
It will be skewed to the left. (The tail will be on the
left & the bump will be on the right.)
Binomial Distribution
What do you think the distribution will look like if p = ½ ?
It will be symmetric. The left and right sides will be mirror
images of each other.
If the number of trials n (tosses in our example) is large, the
graph will be roughly symmetric even if p ≠ ½ .
How large does n have to be for the graph to be roughly
symmetric? That depends on how far p is from ½.
There are two sets of rules that are sometimes used to
determine if the graph is roughly symmetric.
One rule requires that np ≥ 5 and n(1 − p) ≥ 5.
The other rule requires that np(1 − p) ≥ 3.
These rules are not exactly equivalent, but they both work
reasonably well.
Mean & Variance
of the Binomial Distribution
Mean: μ = np
Variance: σ² = np(1 − p)
Example: What are the mean, variance, &
standard deviation for our binomial
distribution example in which n=5 & p=1/3?
Mean: μ = np = (5)(1/3) = 5/3
Variance: σ² = np(1 − p) = (5)(1/3)(2/3) = 10/9
Standard Deviation: σ = √(10/9) = 1.054
Using Excel to calculate
Binomial Probabilities
On an Excel spreadsheet, you can get the
binomial distribution as follows:
click insert, and then click function
select statistical as the category of function,
scroll down to the binomdist function, and
click on it
fill in the information in the dialog box .
Suppose that you wanted to calculate a messy
binomial, such as the probability of between 60
and 70 successes inclusive, on 100 trials with
success probability on each trial of 0.64.
This would be a lot of work with just a calculator.
You would have to calculate 11 separate
binomial probabilities (the probabilities for 60,
61, 62, … 70) and then add them up.
It’s much easier with Excel.
Remember: you want the probability of between
60 and 70 successes inclusive, on 100 trials with
success probability on each trial of 0.64.
You can calculate the (cumulative) probability of
70 or fewer successes.
Then calculate the cumulative probability of 59 or
fewer successes.
Then take the difference.
To get the probability of 70 or fewer successes,
specify the following:
# of successes: 70
# of trials: 100
prob.of success on any trial: 0.64
cumulative: True (because you want 70
or fewer, not just 70)
To get the probability of 59 or fewer successes,
specify the following:
# of successes: 59
# of trials: 100
prob.of success on any trial: 0.64
cumulative: True
Then just subtract the two cumulative
function values you calculated.
If you do this, you get
0.91368 – 0.17394 = 0.7397
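The same difference of two cumulative probabilities can be sketched outside of Excel. The Python below is an illustration only. `binomial_cdf` is a hypothetical helper that plays the role of the cumulative setting, adding up the individual binomial probabilities from 0 through k.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Cumulative probability Pr(X <= k) for a binomial with n trials
    and success probability p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


n, p = 100, 0.64
at_most_70 = binomial_cdf(70, n, p)
at_most_59 = binomial_cdf(59, n, p)

# Pr(60 <= X <= 70) = Pr(X <= 70) - Pr(X <= 59)
prob = at_most_70 - at_most_59
print(round(at_most_70, 5), round(at_most_59, 5), round(prob, 4))
```

The printed values can be compared against the two cumulative values and the difference shown above. Note that 59, not 60, is subtracted, since 60 successes must stay in the interval.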
We can also study binomial problems
using proportions.
For example, we might want to know the probability of
getting 60% heads on 5 tosses of a coin with probability
of heads on each toss of 1/3. (This is the same as
getting 3 heads.)
In general, if X is the number of successes on n trials,
the proportion of successes is X/n.
We can easily determine the mean & variance of this
binomial proportion variable X/n.
If p again is the probability of success on any given trial,
E(X/n) = p
V(X/n) = p(1 − p)/n
When can we use the binomial distribution?
1. We have exactly two possibilities on
each trial (success or failure, heads or
tails, male or female, yes or no, etc.)
2. The probability of success is the same
on each trial.
3. The trials are independent. (What
happens on one trial has no effect on
what happens on the next trial.)
Sampling with & without Replacement
Suppose we have a bowl with 6 red and 4 green marbles. We
select 3 marbles at random without replacement. We want
to know the probability of selecting exactly 2 red marbles.
What’s the probability of getting a red marble on the 1st draw?
6/10
What’s the probability of getting a red marble on the 2nd draw?
It depends on what we got on the first draw.
If we got a red one, then the probability is 5/9.
If we got a green one, then the probability is 6/9.
Since the probability varies from trial to trial, we can not use
the binomial distribution.
We will discuss very shortly what we use instead.
What if we selected the marbles
with replacement?
Then the probability of a red marble would
be the same on each draw, regardless of
what you pulled out previously.
Then we could use the binomial distribution.
Suppose that, instead of having 6 red marbles
and 4 green marbles, we had 6000 red ones
and 4000 green ones.
The probability of red on the 1st draw would be
6,000/10,000 = 0.6 .
If we got red on the 1st draw, the probability of red
on the 2nd draw would be 5999/9999 = 0.59996
If we got green on the 1st draw, the probability of
red on the 2nd would be 6000/9999 = 0.60006
These three numbers are very close.
So you could use the binomial distribution to get a
very good approximation of the probability.
So if we have two options on each trial,
when can we use the binomial distribution?
1. If we sample with replacement, or
2. If we sample without replacement, but the
sample is small relative to the population.
A rule that is often used is that the
sample is less than 5% of the population
(n < 0.05 N).
If our sample is more than 5% of
our population, then we will use the
hypergeometric distribution.
Let’s return to our marble problem.
Suppose we have a bowl with 6 red and 4 green marbles.
We select 3 marbles at random without replacement. We
want to know the probability of selecting exactly 2 red
marbles.
Remember that the number of ways of selecting x objects
from n is ${}_nC_x$.
So there are ${}_6C_2$ ways of selecting 2 red marbles from 6.
There are ${}_4C_1$ ways of selecting 1 green marble from 4.
There are ${}_{10}C_3$ ways of selecting 3 marbles from 10.
So the probability of getting exactly
2 red marbles on 3 draws will be
$\dfrac{({}_6C_2)({}_4C_1)}{{}_{10}C_3}$
The numerator is the # of ways of getting the 2 red marbles out of 6
times the # of ways of getting the 1 green marble out of 4.
The denominator is the # of ways of getting 3 marbles out of 10.
and our probability is
$\dfrac{({}_6C_2)({}_4C_1)}{{}_{10}C_3} = \dfrac{\left(\dfrac{6!}{2!\,4!}\right)\left(\dfrac{4!}{1!\,3!}\right)}{\left(\dfrac{10!}{3!\,7!}\right)} = \dfrac{(15)(4)}{120} = \dfrac{60}{120} = 0.5$
The hypergeometric distribution
can also be used if you have more
than 2 categories.
If you had 3 categories, for example, you
would have 3 combinations in the numerator
instead of two.
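The marble calculation can be sketched in Python with `math.comb`. The function name `hypergeom_prob` and its parameter names are illustrative. The second half revisits the bowl with 6000 red and 4000 green marbles to show how close the binomial approximation gets when the sample is tiny relative to the population.

```python
from math import comb


def hypergeom_prob(reds_drawn, draws, reds, greens):
    """Probability of exactly reds_drawn red marbles when drawing
    `draws` marbles without replacement from `reds` + `greens`."""
    greens_drawn = draws - reds_drawn
    return comb(reds, reds_drawn) * comb(greens, greens_drawn) / comb(reds + greens, draws)


# Small bowl: 6 red, 4 green, draw 3, want exactly 2 red.
small = hypergeom_prob(2, 3, 6, 4)
print(small)  # 0.5

# Large bowl: 6000 red, 4000 green. The sample (3) is far below 5% of 10,000,
# so the binomial with p = 0.6 should be a very good approximation.
large_exact = hypergeom_prob(2, 3, 6000, 4000)
binomial_approx = comb(3, 2) * 0.6**2 * 0.4
print(round(large_exact, 5), round(binomial_approx, 5))
```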
What do you do if the probabilities are constant from
trial to trial but you have more than 2 categories?
You use the multinomial distribution,
which is a generalization of the binomial.
Recall that the formula for the binomial is
$({}_nC_x)\, p^x (1-p)^{n-x}$
where p is the probability of success and 1 − p is the
probability of failure.
Remember that this is equal to
$\dfrac{n!}{x!\,(n-x)!}\; p^x (1-p)^{n-x}$
Suppose we have k outcomes for each trial instead
of 2, and their probabilities are p1, p2, p3, … pk.
Then on n trials, the probability of x1 outcomes of
type 1, x2 outcomes of type 2, x3 outcomes of type 3,
and … xk outcomes of type k would be
$\text{prob.} = \dfrac{n!}{x_1!\, x_2!\, x_3! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}$
where x1 + x2 + x3 + … + xk = n
and
p1 + p2 + p3 + … + pk = 1
Example: Suppose that at a fair, children pay money to
reach into a container, which holds a large number of toys.
50% are of type 1, 30% are of type 2, & 20% are of type 3.
Sally pays for 3 toys, and reaches into the box and grabs 3
at random. What is the probability that she gets one of
each type?
$\text{prob.} = \dfrac{n!}{x_1!\, x_2!\, x_3! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}$
$= \dfrac{3!}{1!\,1!\,1!}\,(0.50)^1 (0.30)^1 (0.20)^1$
$= \dfrac{6}{(1)(1)(1)}\,(0.50)(0.30)(0.20)$
$= 6(0.03) = 0.18$
Our fifth discrete probability distribution
is the Poisson distribution.
The Poisson distribution has outcome
possibilities 0,1, 2, 3, …. that describe the
number of occurrences per unit of time or
per unit of space.
It applies in problems involving requests for
service such as at expressway tollbooths,
supermarket checkout counters, bank teller
windows, airport runways, and repair shops.
Poisson Distribution Formula
$p(x) = \dfrac{e^{-\lambda}\, \lambda^x}{x!}$
where x is the number of occurrences and λ is the
mean rate of occurrence.
Remember that e is a constant that is approximately
equal to 2.71828.
Example: If a bank serves on average 1 customer per
minute,
(a) what is the probability that exactly 2 customers will
enter the bank in the same particular minute?
The mean rate of occurrence λ = 1.
$\Pr(X = 2) = \dfrac{e^{-\lambda}\, \lambda^x}{x!} = \dfrac{e^{-1}(1)^2}{2!} = \dfrac{e^{-1}}{2} = \dfrac{0.368}{2} = 0.184$
(b) What is the probability that 2 or more
customers will enter in the same minute?
We want Pr(X ≥ 2)
= Pr(X=2) + Pr(X=3) + Pr(X=4) + ….
Even though these calculations are going to diminish in
size, you’re going to have to do a lot of calculations to
get a good approximation.
There’s a much easier way to do this problem.
Use the complement.
The complement (or opposite) of “2 or more customers”
is “1 or fewer customers.”
So Pr(X ≥ 2) = 1 - Pr(X ≤ 1) .
Let’s do the problem that way.
The mean rate of occurrence λ is still 1.
$\Pr(X \ge 2) = 1 - \Pr(X \le 1)$
$= 1 - [\Pr(X = 0) + \Pr(X = 1)]$
$= 1 - \left[\dfrac{e^{-1}(1)^0}{0!} + \dfrac{e^{-1}(1)^1}{1!}\right]$
$= 1 - [e^{-1} + e^{-1}]$
$= 1 - [0.368 + 0.368] = 0.264$
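Both parts of the bank example can be retraced with a short Python sketch. The function name `poisson_pmf` and the parameter name `lam` (for the mean rate of occurrence) are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean rate is lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 1  # on average 1 customer per minute

# (a) exactly 2 customers in a given minute
p_two = poisson_pmf(2, lam)
print(round(p_two, 3))           # 0.184

# (b) 2 or more customers, using the complement: 1 - Pr(X <= 1)
p_two_or_more = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))
print(round(p_two_or_more, 3))   # 0.264
```

Using the complement replaces an infinite sum with just two terms, which is why it is the easier route for "2 or more".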
Mean & Variance of a
Poisson Distributed Random Variable
Not surprisingly, the mean is λ since we’ve
been referring to that Poisson parameter
as the mean rate of occurrence.
It turns out that the variance is also λ.