Stats 241.3
Probability Theory
Summary
The Sample Space, S
The sample space, S, for a random phenomenon
is the set of all possible outcomes.
An Event, E
The event, E, is any subset of the sample space, S,
i.e. any set of outcomes (not necessarily all
outcomes) of the random phenomenon.
Probability
Suppose we are observing a random phenomenon.
Let S denote the sample space for the phenomenon, the
set of all possible outcomes.
An event E is a subset of S.
A probability measure P is defined on S by defining,
for each event E, P[E] with the following properties:
1. P[E] ≥ 0, for each E.
2. P[S] = 1.
3. P[∪i Ei] = ∑i P[Ei] if Ei ∩ Ej = ∅ for all i ≠ j,
   i.e. P[E1 ∪ E2 ∪ …] = P[E1] + P[E2] + …
Finite uniform probability space
Many examples fall into this category
1. Finite number of outcomes
2. All outcomes are equally likely
3. P[E] = n(E)/n(S) = n(E)/N
   = (no. of outcomes in E)/(total no. of outcomes)
Note: n(A) = the number of elements of A.
To handle problems of this kind we have to be able to
count: count n(E) and n(S).
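As a quick illustration of the rule P[E] = n(E)/n(S) (a minimal sketch of my own, not from the original slides), the following Python snippet counts outcomes for two fair dice; the event chosen here, the faces summing to 7, is just an example.

```python
from itertools import product

# Sample space for two fair dice: all ordered pairs of faces.
S = list(product(range(1, 7), repeat=2))

# Example event: the two faces sum to 7.
E = [outcome for outcome in S if sum(outcome) == 7]

# Finite uniform probability space: P[E] = n(E) / n(S).
print(len(E), len(S), len(E) / len(S))   # 6 36 0.1666...
```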
Techniques for counting
Basic Rule of counting
Suppose we carry out k operations in sequence
Let
n1 = the number of ways the first operation
can be performed
ni = the number of ways the ith operation can be
performed once the first (i - 1) operations
have been completed. i = 2, 3, … , k
Then N = n1n2 … nk = the number of ways the
k operations can be performed in sequence.
Diagram: a tree with n1 branches for the first operation, each followed by
n2 branches for the second operation, then n3 for the third, and so on.
Basic Counting Formulae
1. Permutations: How many ways can you order n
objects?
n!
2. Permutations of size k (< n): How many ways can
you choose k objects from n objects in a specific
order?
nPk = n(n − 1)⋯(n − k + 1) = n!/(n − k)!
3. Combinations of size k (≤ n): A combination of
size k chosen from n objects is a subset of size k
where the order of selection is irrelevant. How many
ways can you choose a combination of size k objects
from n objects (order of selection is irrelevant)?
nCk = (n choose k) = n(n − 1)(n − 2)⋯(n − k + 1) / (k(k − 1)(k − 2)⋯1)
    = n!/((n − k)! k!)
(A small numeric check of these formulae follows the notes below.)
Important Notes
1. In combinations ordering is irrelevant.
Different orderings result in the same
combination.
2. In permutations order is relevant. Different
orderings result in different permutations.
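The counting formulae above can be checked directly; this small sketch (not part of the original slides) uses the standard-library functions math.perm and math.comb, which compute nPk and nCk.

```python
import math

n, k = 10, 3

# Permutations of size k: nPk = n! / (n - k)!
print(math.perm(n, k), math.factorial(n) // math.factorial(n - k))   # 720 720

# Combinations of size k: nCk = n! / ((n - k)! k!)
print(math.comb(n, k), math.perm(n, k) // math.factorial(k))         # 120 120
```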
Rules of Probability
The additive rule
P[A  B] = P[A] + P[B] – P[A  B]
and
P[A  B] = P[A] + P[B] if P[A  B] = 
The additive rule for more than two events
n  n
P  Ai    P  Ai    P  Ai  Aj 
i
j
 i 1  i 1
 P  Ai  Aj  Ak  
i
j
  1
k
n 1
P  A1  A2 
and if Ai  Aj =  for all i ≠ j.
then
n  n
P  Ai    P  Ai 
 i 1  i 1
 An 
The Rule for complements
for any event E
P  E   1  P  E 
Conditional Probability,
Independence
and
The Multiplicative Rule
The conditional probability of A given B is
defined to be:
P[A|B] = P[A ∩ B] / P[B],  if P[B] ≠ 0
The multiplicative rule of probability
P[A ∩ B] = P[A] P[B|A]  if P[A] ≠ 0
         = P[B] P[A|B]  if P[B] ≠ 0
and
P[A ∩ B] = P[A] P[B]
if A and B are independent.
This is the definition of independence.
The multiplicative rule for more than
two events
P  A1  A2 
 An  
P  A1  P  A2 A1  P  A3 A2  A1 
P  An An 1  An 2
 A1 
Independence
for more than 2 events
Definition:
The set of k events A1, A2, … , Ak are called
mutually independent if:
P[Ai1 ∩ Ai2 ∩… ∩ Aim] = P[Ai1] P[Ai2] …P[Aim]
For every subset {i1, i2, … , im } of {1, 2, …, k }
i.e. for k = 3 A1, A2, … , Ak are mutually independent if:
P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3],
P[A2 ∩ A3] = P[A2] P[A3],
P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3]
Definition:
The set of k events A1, A2, … , Ak are called
pairwise independent if:
P[Ai ∩ Aj] = P[Ai] P[Aj] for all i and j.
i.e. for k = 3 A1, A2, … , Ak are pairwise independent if:
P[A1 ∩ A2] = P[A1] P[A2], P[A1 ∩ A3] = P[A1] P[A3],
P[A2 ∩ A3] = P[A2] P[A3],
It is not necessarily true that P[A1 ∩ A2 ∩ A3] = P[A1]
P[A2] P[A3]
Bayes Rule for probability
P  A P  B A
P  A B  
P  A P  B A  P  A  P  B A 
A generalization of Bayes Rule
Let A1, A2, … , Ak denote a set of events such that
S = A1 ∪ A2 ∪ ⋯ ∪ Ak and Ai ∩ Aj = ∅
for all i ≠ j. Then
P[Ai|B] = P[Ai] P[B|Ai] / (P[A1] P[B|A1] + ⋯ + P[Ak] P[B|Ak])
Random Variables
an important concept in probability
A random variable, X, is a numerical quantity
whose value is determined by a random
experiment.
Definition – The probability function, p(x), of
a random variable, X.
For any random variable, X, and any real
number, x, we define
p  x   P  X  x   P  X  x
where {X = x} = the set of all outcomes (event)
with X = x.
For continuous random variables p(x) = 0 for all
values of x.
Definition – The cumulative distribution
function, F(x), of a random variable, X.
For any random variable, X, and any real
number, x, we define
F  x   P  X  x   P  X  x
where {X ≤ x} = the set of all outcomes (event)
with X ≤ x.
Discrete Random Variables
For a discrete random variable X the probability
distribution is described by the probability
function p(x), which has the following properties
1. 0 ≤ p(x) ≤ 1
2. ∑_x p(x) = ∑_i p(xi) = 1
3. P[a ≤ X ≤ b] = ∑_{a ≤ x ≤ b} p(x)
Graph: Discrete Random Variable
P[a ≤ X ≤ b] = ∑_{a ≤ x ≤ b} p(x) is the sum of the heights of p(x) for the values x between a and b.
Continuous random variables
For a continuous random variable X the probability
distribution is described by the probability density
function f(x), which has the following properties :
1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
3. P[a ≤ X ≤ b] = ∫_a^b f(x) dx
Graph: Continuous Random Variable
The probability density function f(x) has total area 1 under it
(∫_{−∞}^{∞} f(x) dx = 1), and P[a ≤ X ≤ b] = ∫_a^b f(x) dx is the area under f(x) between a and b.
The distribution function F(x)
This is defined for any random variable, X.
F(x) = P[X ≤ x]
Properties
1. F(−∞) = 0 and F(∞) = 1.
2. F(x) is non-decreasing (i.e. if x1 < x2 then F(x1) ≤ F(x2)).
3. F(b) − F(a) = P[a < X ≤ b].
4. p(x) = P[X = x] = F(x) − F(x−)
Here
F(x−) = lim_{u→x−} F(u)
5. If p(x) = 0 for all x (i.e. X is continuous)
then F(x) is continuous.
6. For Discrete Random Variables
F(x) = P[X ≤ x] = ∑_{u ≤ x} p(u)
F(x) is a non-decreasing step function with
F(−∞) = 0 and F(∞) = 1
p(x) = F(x) − F(x−) = the jump in F(x) at x.
Graph: F(x) for a discrete random variable is a step function; p(x) is the size of the jump at each x.
7. For Continuous Random Variables
F(x) = P[X ≤ x] = ∫_{−∞}^{x} f(u) du
F(x) is a non-decreasing continuous function with
F(−∞) = 0 and F(∞) = 1
f(x) = F′(x).
Graph: F(x) for a continuous random variable; f(x) is the slope of F(x).
To find the probability density function, f(x), one first
finds F(x); then
f(x) = F′(x).
Some Important Discrete
distributions
The Bernoulli distribution
Suppose that we have an experiment that has two
outcomes:
1. Success (S)
2. Failure (F)
These terms are used in reliability testing.
Suppose that p is the probability of success (S) and
q = 1 – p is the probability of failure (F)
This experiment is sometimes called a Bernoulli Trial
Let
X = 0 if the outcome is F, and X = 1 if the outcome is S.
Then p(x) = P[X = x] = q if x = 0, and p if x = 1.
The probability distribution with probability function
p(x) = P[X = x] = { q for x = 0; p for x = 1 }
is called the Bernoulli distribution.
Graph: the Bernoulli probability function has height q = 1 − p at x = 0 and height p at x = 1.
The Binomial distribution
We observe a Bernoulli trial (S,F) n times.
Let X denote the number of successes in the n trials.
Then X has a binomial distribution, i.e.
p(x) = P[X = x] = (n choose x) p^x q^(n−x),  x = 0, 1, 2, …, n
where
1. p = the probability of success (S), and
2. q = 1 − p = the probability of failure (F).
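A minimal sketch (not from the slides) of the binomial probability function, using math.comb for the binomial coefficient; the parameters n = 10, p = 0.3 are arbitrary examples.

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = (n choose x) p^x q^(n-x), with q = 1 - p."""
    q = 1.0 - p
    return math.comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(round(sum(probs), 10))            # 1.0 (the pmf sums to one)
print(round(binomial_pmf(3, n, p), 4))  # P[X = 3], about 0.2668
```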
The Poisson distribution
• Suppose events are occurring randomly and
uniformly in time.
• Let X be the number of events occurring in a
fixed period of time. Then X will have a
Poisson distribution with parameter λ.
p(x) = (λ^x / x!) e^(−λ),  x = 0, 1, 2, 3, 4, …
The Geometric distribution
Suppose a Bernoulli trial (S,F) is repeated until a
success occurs.
X = the trial on which the first success (S)
occurs.
The probability function of X is:
p(x) = P[X = x] = (1 − p)^(x−1) p = p q^(x−1),  x = 1, 2, 3, …
The Negative Binomial distribution
Suppose a Bernoulli trial (S,F) is repeated until k
successes occur.
Let X = the trial on which the kth success (S)
occurs.
The probability function of X is:
p(x) = P[X = x] = (x−1 choose k−1) p^k q^(x−k),  x = k, k+1, k+2, …
The Hypergeometric distribution
Suppose we have a population containing N objects.
Suppose the elements of the population are partitioned into two
groups. Let a = the number of elements in group A and let b = the
number of elements in the other group (group B). Note N = a + b.
Now suppose that n elements are selected from the population at
random. Let X denote the number of selected elements that come from group A.
The probability distribution of X is
p(x) = P[X = x] = (a choose x)(b choose n−x) / (N choose n)
Continuous Distributions
The Uniform distribution from a to b
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
Graph: the uniform density f(x) is constant at height 1/(b − a) between a and b, and 0 elsewhere.
The Normal distribution
(mean μ, standard deviation σ)
f(x) = (1/(√(2π) σ)) e^(−(x − μ)² / (2σ²))
The Exponential distribution
lel x
f  x  
 0
x0
x0
0.2
0.1
0
-2
0
2
4
6
8
10
The Weibull distribution
A model for the lifetime of objects
that do age.
The Weibull distribution with parameters α and β:
f(x) = α β x^(β−1) e^(−α x^β) for x ≥ 0, and 0 for x < 0
Graph: the Weibull density f(x) for β = 2 and α = 0.5, 0.7, 0.9.
The Gamma distribution
An important family of distributions
The Gamma distribution
Let the continuous random variable X have
density function:
f(x) = (λ^α / Γ(α)) x^(α−1) e^(−λx) for x ≥ 0, and 0 for x < 0
Then X is said to have a Gamma distribution
with parameters α and λ.
Graph: gamma densities for (α = 2, λ = 0.9), (α = 2, λ = 0.6) and (α = 3, λ = 0.6).
Comments
1. The set of gamma distributions is a family of
distributions (parameterized by α and λ).
2. Contained within this family are other distributions:
a. The Exponential distribution – in this case α = 1, and the
gamma distribution becomes the exponential distribution
with parameter λ. The exponential distribution arises if
we are measuring the lifetime, X, of an object that does
not age. It is also used as a distribution for waiting times
between events occurring uniformly in time.
b. The Chi-square distribution – in the case α = ν/2 and
λ = ½, the gamma distribution becomes the chi-square
(χ²) distribution with ν degrees of freedom. Later we
will see that a sum of squares of independent standard
normal variates has a chi-square distribution, with degrees
of freedom = the number of independent terms in the
sum of squares.
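The comment that the gamma density with α = 1 reduces to the exponential density can be checked numerically; this sketch (not from the slides) evaluates both at a few points, with λ = 0.5 chosen arbitrarily.

```python
import math

def gamma_density(x, alpha, lam):
    """f(x) = lambda^alpha / Gamma(alpha) * x^(alpha - 1) * e^(-lambda x), x >= 0."""
    if x < 0:
        return 0.0
    return lam**alpha / math.gamma(alpha) * x**(alpha - 1) * math.exp(-lam * x)

def exponential_density(x, lam):
    """f(x) = lambda * e^(-lambda x), x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

lam = 0.5
for x in (0.5, 1.0, 2.0, 5.0):
    # The two columns agree: gamma(alpha = 1, lambda) is the exponential(lambda) density.
    print(x, round(gamma_density(x, 1.0, lam), 6), round(exponential_density(x, lam), 6))
```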
Expectation
Let X denote a discrete random variable with
probability function p(x) (probability density function
f(x) if X is continuous) then the expected value of X,
E(X) is defined to be:
E  X    xp  x    xi p  xi 
x
i
and if X is continuous with probability density function
f(x)
EX  

 xf  x  dx

Expectation of functions
Let X denote a discrete random variable with
probability function p(x); then the expected value of g(X),
E[g(X)], is defined to be:
E[g(X)] = ∑_x g(x) p(x)
and if X is continuous with probability density function
f(x)
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
Moments of a Random Variable
the kth moment of X:
μk = E[X^k]
   = ∑_x x^k p(x)  if X is discrete
   = ∫_{−∞}^{∞} x^k f(x) dx  if X is continuous
• The first moment of X, μ = μ1 = E(X), is the center of gravity
of the distribution of X.
• The higher moments give different information regarding the
distribution of X.
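As a small numeric illustration (my own example, not from the slides), the first and second moments of a fair six-sided die follow directly from the definitions of E[X] and E[g(X)] above.

```python
# Fair die: p(x) = 1/6 for x = 1, ..., 6.
p = {x: 1 / 6 for x in range(1, 7)}

# First moment: E[X] = sum over x of x p(x)
expected_X = sum(x * px for x, px in p.items())

# Second moment: E[X^2] = sum over x of x^2 p(x)
expected_X_squared = sum(x**2 * px for x, px in p.items())

print(expected_X, expected_X_squared)   # 3.5 and about 15.167
```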
the kth central moment of X:
μk⁰ = E[(X − μ)^k]
    = ∑_x (x − μ)^k p(x)  if X is discrete
    = ∫_{−∞}^{∞} (x − μ)^k f(x) dx  if X is continuous
Moment generating functions
Definition
Let X denote a random variable. Then the moment
generating function of X, mX(t), is defined by:
mX(t) = E[e^(tX)] = ∑_x e^(tx) p(x)  if X is discrete
                  = ∫_{−∞}^{∞} e^(tx) f(x) dx  if X is continuous
Properties
1. mX(0) = 1
2. mXk   0   k th derivative of mX  t  at t  0.
 
 mk  E X
mk  E  X
3.
k

k
k

 x f  x  dx

k

  x p  x
mX  t   1  m1t 
m2
2!
t 
2
X continuous
X discrete
m3
3!
t 
3

mk
k!
t 
k
.
4. Let X be a random variable with moment
generating function mX(t). Let Y = bX + a
Then mY(t) = m(bX+a)(t)
= E(e^((bX + a)t)) = e^(at) E(e^(X(bt)))
= e^(at) mX(bt)
5. Let X and Y be two independent random
variables with moment generating function
mX(t) and mY(t) .
Then mX+Y(t) = E(e^((X + Y)t)) = E(e^(Xt) e^(Yt))
= E(e^(Xt)) E(e^(Yt))
= mX(t) mY(t)
6. Let X and Y be two random variables with
moment generating function mX(t) and mY(t)
and two distribution functions FX(x) and
FY(y) respectively.
If mX(t) = mY(t), then FX(x) = FY(x).
This ensures that the distribution of a random
variable can be identified by its moment
generating function.
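Property 2 can be checked numerically; this sketch (not from the slides) approximates the first two derivatives of the binomial moment generating function mX(t) = (q + pe^t)^n at t = 0 by finite differences and compares them with E[X] = np and E[X²] = npq + (np)².

```python
import math

n, p = 10, 0.3
q = 1.0 - p

def m(t):
    """Binomial moment generating function m_X(t) = (q + p e^t)^n."""
    return (q + p * math.exp(t)) ** n

h = 1e-4
first_deriv = (m(h) - m(-h)) / (2 * h)           # approximates m'_X(0) = E[X]
second_deriv = (m(h) - 2 * m(0) + m(-h)) / h**2  # approximates m''_X(0) = E[X^2]

print(round(first_deriv, 4), n * p)                      # about 3.0
print(round(second_deriv, 4), n * p * q + (n * p) ** 2)  # about 11.1
```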
M. G. F.’s - Continuous distributions
Name: Moment generating function MX(t)
Continuous Uniform: (e^(bt) − e^(at)) / ((b − a)t)
Exponential: λ/(λ − t), for t < λ
Gamma: (λ/(λ − t))^α, for t < λ
χ² (ν d.f.): (1/(1 − 2t))^(ν/2), for t < 1/2
Normal: e^(tμ + (1/2)t²σ²)
M. G. F.’s - Discrete distributions
Name: Moment generating function MX(t)
Discrete Uniform: e^t(e^(tN) − 1) / (N(e^t − 1))
Bernoulli: q + pe^t
Binomial: (q + pe^t)^n
Geometric: pe^t / (1 − qe^t)
Negative Binomial: (pe^t / (1 − qe^t))^k
Poisson: e^(λ(e^t − 1))
Note:
The distribution of a random variable X can be described by:
1. Probability function / density:
p(x) = probability function, if X is discrete
f(x) = probability density function, if X is continuous
2. Distribution function:
F(x) = ∑_{u ≤ x} p(u)  if X is discrete
     = ∫_{−∞}^{x} f(u) du  if X is continuous
3. Moment generating function:
mX(t) = E[e^(tX)] = ∑_x e^(tx) p(x)  if X is discrete
                  = ∫_{−∞}^{∞} e^(tx) f(x) dx  if X is continuous
Summary of Discrete Distributions
Discrete Uniform: p(x) = 1/N, x = 1, 2, …, N; Mean = (N + 1)/2; Variance = (N² − 1)/12; MX(t) = e^t(e^(tN) − 1)/(N(e^t − 1))

Bernoulli: p(x) = p for x = 1, q for x = 0; Mean = p; Variance = pq; MX(t) = q + pe^t

Binomial: p(x) = (n choose x) p^x q^(n−x), x = 0, 1, …, n; Mean = np; Variance = npq; MX(t) = (q + pe^t)^n

Geometric: p(x) = p q^(x−1), x = 1, 2, …; Mean = 1/p; Variance = q/p²; MX(t) = pe^t/(1 − qe^t)

Negative Binomial: p(x) = (x−1 choose k−1) p^k q^(x−k), x = k, k+1, …; Mean = k/p; Variance = kq/p²; MX(t) = (pe^t/(1 − qe^t))^k

Poisson: p(x) = (λ^x/x!) e^(−λ), x = 0, 1, 2, …; Mean = λ; Variance = λ; MX(t) = e^(λ(e^t − 1))

Hypergeometric: p(x) = (A choose x)(N−A choose n−x)/(N choose n); Mean = n(A/N); Variance = n(A/N)(1 − A/N)(N − n)/(N − 1); MX(t) not useful
Summary of Continuous Distributions
Continuous Uniform: f(x) = 1/(b − a) for a ≤ x ≤ b, 0 otherwise; Mean = (a + b)/2; Variance = (b − a)²/12; MX(t) = (e^(bt) − e^(at))/((b − a)t)

Exponential: f(x) = λe^(−λx) for x ≥ 0, 0 for x < 0; Mean = 1/λ; Variance = 1/λ²; MX(t) = λ/(λ − t) for t < λ

Gamma: f(x) = (λ^α/Γ(α)) x^(α−1) e^(−λx) for x ≥ 0, 0 for x < 0; Mean = α/λ; Variance = α/λ²; MX(t) = (λ/(λ − t))^α for t < λ

χ² (ν d.f.): f(x) = ((1/2)^(ν/2)/Γ(ν/2)) x^(ν/2 − 1) e^(−x/2) for x ≥ 0, 0 for x < 0; Mean = ν; Variance = 2ν; MX(t) = (1/(1 − 2t))^(ν/2) for t < 1/2

Normal: f(x) = (1/(√(2π)σ)) e^(−(x − μ)²/(2σ²)); Mean = μ; Variance = σ²; MX(t) = e^(tμ + (1/2)t²σ²)

Weibull: f(x) = αβ x^(β−1) e^(−αx^β) for x ≥ 0, 0 for x < 0; Mean = Γ(1/β + 1)/α^(1/β); Variance = [Γ(2/β + 1) − Γ(1/β + 1)²]/α^(2/β); MX(t) not available
Jointly distributed Random
variables
Multivariate distributions
Discrete Random Variables
The joint probability function:
p(x,y) = P[X = x, Y = y]
1. 0 ≤ p(x, y) ≤ 1
2. ∑_x ∑_y p(x, y) = 1
3. P[(X, Y) ∈ A] = ∑_{(x, y) ∈ A} p(x, y)
Continuous Random Variables
Definition: Two random variables are said to have
joint probability density function f(x, y) if
1. 0 ≤ f(x, y)
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
3. P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy
Marginal and conditional
distributions
Marginal Distributions (Discrete case):
Let X and Y denote two random variables with
joint probability function p(x,y) then
the marginal density of X is
pX(x) = ∑_y p(x, y)
the marginal density of Y is
pY(y) = ∑_x p(x, y)
Marginal Distributions (Continuous case):
Let X and Y denote two random variables with
joint probability density function f(x,y) then
the marginal density of X is
fX(x) = ∫_{−∞}^{∞} f(x, y) dy
the marginal density of Y is
fY(y) = ∫_{−∞}^{∞} f(x, y) dx
Conditional Distributions (Discrete Case):
Let X and Y denote two random variables with
joint probability function p(x,y) and marginal
probability functions pX(x), pY(y) then
the conditional density of Y given X = x
pY|X(y|x) = p(x, y) / pX(x)
the conditional density of X given Y = y
pX|Y(x|y) = p(x, y) / pY(y)
Conditional Distributions (Continuous Case):
Let X and Y denote two random variables with
joint probability density function f(x,y) and
marginal densities fX(x), fY(y) then
the conditional density of Y given X = x
fY|X(y|x) = f(x, y) / fX(x)
the conditional density of X given Y = y
fX|Y(x|y) = f(x, y) / fY(y)
The bivariate Normal distribution
Let
f(x1, x2) = (1 / (2π σ1 σ2 √(1 − ρ²))) e^(−Q(x1, x2)/2)
where
Q(x1, x2) = (1/(1 − ρ²)) [ ((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)² ]
This distribution is called the bivariate
Normal distribution.
The parameters are μ1, μ2, σ1, σ2 and ρ.
Surface Plots of the bivariate
Normal distribution
Marginal distributions
1. The marginal distribution of x1 is Normal with
mean μ1 and standard deviation σ1.
2. The marginal distribution of x2 is Normal with
mean μ2 and standard deviation σ2.
Conditional distributions
1. The conditional distribution of x1 given x2 is
Normal with:
mean μ1|2 = μ1 + ρ(σ1/σ2)(x2 − μ2), and
standard deviation σ1|2 = σ1 √(1 − ρ²)
2. The conditional distribution of x2 given x1 is
Normal with:
mean μ2|1 = μ2 + ρ(σ2/σ1)(x1 − μ1), and
standard deviation σ2|1 = σ2 √(1 − ρ²)
Independence
Definition:
Two random variables X and Y are defined to be
independent if
p  x, y   pX  x  pY  y 
if X and Y are discrete
f  x, y   f X  x  fY  y 
if X and Y are continuous
Multivariate distributions (k ≥ 2)
Definition
Let X1, X2, …, Xn denote n discrete random
variables, then
p(x1, x2, …, xn )
is the joint probability function of X1, X2, …, Xn if
1. 0 ≤ p(x1, …, xn) ≤ 1
2. ∑_{x1} ⋯ ∑_{xn} p(x1, …, xn) = 1
3. P[(X1, …, Xn) ∈ A] = ∑_{(x1, …, xn) ∈ A} p(x1, …, xn)
Definition
Let X1, X2, …, Xk denote k continuous random
variables, then
f(x1, x2, …, xk )
is the joint density function of X1, X2, …, Xk if
1. f(x1, …, xk) ≥ 0
2. ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x1, …, xk) dx1 ⋯ dxk = 1
3. P[(X1, …, Xk) ∈ A] = ∫⋯∫_A f(x1, …, xk) dx1 ⋯ dxk
The Multinomial distribution
Suppose that we observe an experiment that has k
possible outcomes {O1, O2, …, Ok } independently n
times.
Let p1, p2, …, pk denote probabilities of O1, O2, …,
Ok respectively.
Let Xi denote the number of times that outcome Oi
occurs in the n repetitions of the experiment.
The joint probability function of X1, X2, …, Xk is
p(x1, …, xk) = (n!/(x1! x2! ⋯ xk!)) p1^(x1) p2^(x2) ⋯ pk^(xk)
             = (n choose x1 x2 ⋯ xk) p1^(x1) p2^(x2) ⋯ pk^(xk)
This distribution is called the Multinomial distribution.
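A minimal sketch (not from the slides) of the multinomial probability function, built from math.factorial; the example evaluates p(x1, x2, x3) for n = 10 trials with three hypothetical outcome probabilities.

```python
import math

def multinomial_pmf(counts, probs):
    """p(x1,...,xk) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk, with n = sum of the counts."""
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)   # exact integer division at each step
    value = float(coef)
    for x, p in zip(counts, probs):
        value *= p ** x
    return value

# Hypothetical example: n = 10, outcome probabilities 0.5, 0.3, 0.2.
print(round(multinomial_pmf((5, 3, 2), (0.5, 0.3, 0.2)), 6))   # about 0.085
```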
The Multivariate Normal distribution
Recall the univariate normal distribution
f(x) = (1/(√(2π) σ)) e^(−(1/2)((x − μ)/σ)²)
the bivariate normal distribution
f  x, y  
1
2s xs y 1  
2
e
 12
2 1 





   2   
xm x 2
sx
xm x
sx
xm y
sy
  
xm y 2 
sy
The k-variate Normal distribution
f  x1 ,
, xk   f  x  
1
 2 
k /2

1/ 2
e
 12  x μ   1  x μ 
where
 x1 
x 
2

x
 
 
 xk 
 m1 
m 
μ   2
 
 
 mk 
s 11 s 12
s
s
12
22



s 1k s 2 k
s 1k 
s 2 k 


s kk 
Marginal distributions
Definition
Let X1, X2, …, Xq, Xq+1 …, Xk denote k discrete
random variables with joint probability function
p(x1, x2, …, xq, xq+1 …, xk )
then the marginal joint probability function
of X1, X2, …, Xq is
p12⋯q(x1, …, xq) = ∑_{xq+1} ⋯ ∑_{xk} p(x1, …, xk)
Definition
Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous
random variables with joint probability density
function
f(x1, x2, …, xq, xq+1 …, xk )
then the marginal joint density function
of X1, X2, …, Xq is
f12⋯q(x1, …, xq) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x1, …, xk) dxq+1 ⋯ dxk
Conditional distributions
Definition
Let X1, X2, …, Xq, Xq+1 …, Xk denote k discrete
random variables with joint probability function
p(x1, x2, …, xq, xq+1 …, xk )
then the conditional joint probability function
of X1, X2, …, Xq given Xq+1 = xq+1 , …, Xk = xk is
p1⋯q|q+1⋯k(x1, …, xq | xq+1, …, xk) = p(x1, …, xk) / pq+1⋯k(xq+1, …, xk)
Definition
Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous
random variables with joint probability density
function
f(x1, x2, …, xq, xq+1 …, xk )
then the conditional joint density function
of X1, X2, …, Xq given Xq+1 = xq+1, …, Xk = xk is
f1⋯q|q+1⋯k(x1, …, xq | xq+1, …, xk) = f(x1, …, xk) / fq+1⋯k(xq+1, …, xk)
Definition – Independence of sets of vectors
Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous
random variables with joint probability density
function
f(x1, x2, …, xq, xq+1 …, xk )
then the variables X1, X2, …, Xq are independent
of Xq+1, …, Xk if
f  x1 ,
, xk   f1
q
 x , , x  f
1
q
q 1 k
x
q 1
, , xk 
A similar definition for discrete random variables.
Definition – Mutual Independence
Let X1, X2, …, Xk denote k continuous random
variables with joint probability density function
f(x1, x2, …, xk )
then the variables X1, X2, …, Xk are called
mutually independent if
f  x1 ,
, xk   f1  x1  f 2  x2  f k  xk 
A similar definition for discrete random variables.
Expectation
for multivariate distributions
Definition
Let X1, X2, …, Xn denote n jointly distributed
random variables with joint density function
f(x1, x2, …, xn )
then
E  g  X 1 ,


, X n  

  g x ,
1


, xn  f  x1 ,
, xn  dx1 ,
, dxn
Some Rules for Expectation
1. E[Xi] = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} xi f(x1, …, xn) dx1 ⋯ dxn
         = ∫_{−∞}^{∞} xi fi(xi) dxi
Thus you can calculate E[Xi] either from the joint distribution of
X1, … , Xn or the marginal distribution of Xi.
2. (The Linearity property)
E[a1X1 + ⋯ + anXn + b] = a1E[X1] + ⋯ + anE[Xn] + b
3. (The Multiplicative property) Suppose X1, … , Xq
are independent of Xq+1, … , Xk; then
E[g(X1, …, Xq) h(Xq+1, …, Xk)] = E[g(X1, …, Xq)] E[h(Xq+1, …, Xk)]
In the simple case when k = 2,
E[XY] = E[X] E[Y]
if X and Y are independent.
Some Rules for Variance
Var(X) = E[(X − μX)²] = E[X²] − μX²
Tchebychev’s inequality
P[|X − μ| < kσ] ≥ 1 − 1/k²
Ex:
P[|X − μ| < 2σ] ≥ 3/4
P[|X − μ| < 3σ] ≥ 8/9
P[|X − μ| < 4σ] ≥ 15/16
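Tchebychev's inequality can be checked by simulation; this sketch (not from the slides) draws exponential samples, for which the bound is far from tight, and compares the observed proportion within kσ of the mean to the guaranteed lower bound 1 − 1/k².

```python
import random

random.seed(0)
lam = 1.0                       # exponential with mean 1/lam and sd 1/lam
mu, sigma = 1.0 / lam, 1.0 / lam
sample = [random.expovariate(lam) for _ in range(100_000)]

for k in (2, 3, 4):
    inside = sum(abs(x - mu) < k * sigma for x in sample) / len(sample)
    print(k, round(inside, 4), ">=", 1 - 1 / k**2)
```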
1. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
where
Cov(X, Y) = E[(X − μX)(Y − μY)]
Note: If X and Y are independent, then
Cov(X, Y) = 0
and
Var(X + Y) = Var(X) + Var(Y)
The correlation coefficient ρXY
ρXY = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σX σY)
Properties:
1. If X and Y are independent then ρXY = 0.
2. −1 ≤ ρXY ≤ 1,
and |ρXY| = 1 if there exist a and b such that
P[Y = bX + a] = 1,
where ρXY = +1 if b > 0 and ρXY = −1 if b < 0.
Some other properties of variance
2. Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
3. Var(a1X1 + ⋯ + anXn)
   = a1²Var(X1) + ⋯ + an²Var(Xn)
     + 2a1a2Cov(X1, X2) + ⋯ + 2a1anCov(X1, Xn)
     + 2a2a3Cov(X2, X3) + ⋯ + 2a2anCov(X2, Xn)
     + ⋯ + 2an−1anCov(Xn−1, Xn)
   = ∑_{i=1}^{n} ai²Var(Xi) + 2 ∑_{i<j} ai aj Cov(Xi, Xj)
4. (Multiplicative rule for independent random variables)
Suppose that X and Y are independent random variables;
then:
Var(XY) = Var(X)Var(Y) + μX²Var(Y) + μY²Var(X)
Mean and Variance of averages
Let X1, … , Xn be n mutually independent random
variables each having mean m and standard deviation s
(variance σ²).
Let X̄ = (1/n) ∑_{i=1}^{n} Xi
Then
μX̄ = E[X̄] = μ
and
σX̄² = Var[X̄] = σ²/n
The Law of Large Numbers
Let X1, … , Xn be n mutually independent random
variables each having mean m.
Let X̄ = (1/n) ∑_{i=1}^{n} Xi
Then for any δ > 0 (no matter how small)
P[|X̄ − μ| < δ] = P[μ − δ < X̄ < μ + δ] → 1 as n → ∞
Conditional Expectation:
Definition
Let X1, X2, …, Xq, Xq+1 …, Xk denote k continuous
random variables with joint probability density function
f(x1, x2, …, xq, xq+1 …, xk )
then the conditional joint probability function
of X1, X2, …, Xq
given Xq+1 = xq+1 , …, Xk = xk is
f1⋯q|q+1⋯k(x1, …, xq | xq+1, …, xk) = f(x1, …, xk) / fq+1⋯k(xq+1, …, xk)
Definition
Let U = h( X1, X2, …, Xq, Xq+1 …, Xk )
then the Conditional Expectation of U
given Xq+1 = xq+1 , …, Xk = xk is
E U xq 1 ,, xk  


  h  x , , x  f
1


k
1 q q 1 k
 x , , x
1
q

xq 1 ,, xk dx1  dxq
Note this will be a function of xq+1 , …, xk.
A very useful rule
Let (x1, x2, … , xq, y1, y2, … , ym) = (x, y) denote q + m
random variables.
Let U  g  x1 , , xq , y1 , , ym   g  x, y 
Then
E U   Ey  E U y  
Var U   Ey Var U y    Vary  E U y  
Functions of Random Variables
Methods for determining the distribution of
functions of Random Variables
1. Distribution function method
2. Moment generating function method
3. Transformation method
Distribution function method
Let X, Y, Z …. have joint density f(x,y,z, …)
Let W = h( X, Y, Z, …)
First step
Find the distribution function of W
G(w) = P[W ≤ w] = P[h( X, Y, Z, …) ≤ w]
Second step
Find the density function of W
g(w) = G'(w).
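As a sketch of the distribution function method (the worked case below is my own example, not from the slides), take X uniform on (0, 1) and W = X². Then G(w) = P[X² ≤ w] = P[X ≤ √w] = √w for 0 < w < 1, so g(w) = G′(w) = 1/(2√w). The code checks the first step by simulation and evaluates the resulting density at one point.

```python
import random

random.seed(2)
n = 200_000
w_values = [0.1, 0.25, 0.5, 0.9]

# Step 1: G(w) = P[W <= w] for W = X^2, X uniform on (0, 1).
sample_W = [random.random() ** 2 for _ in range(n)]
for w in w_values:
    G_hat = sum(x <= w for x in sample_W) / n
    print(w, round(G_hat, 3), "vs", round(w ** 0.5, 3))   # G(w) = sqrt(w)

# Step 2: g(w) = G'(w) = 1 / (2 sqrt(w)); for example g(0.25) = 1.0
print(1 / (2 * 0.25 ** 0.5))
```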
Use of moment generating
functions
1. Using the moment generating functions of
X, Y, Z, …determine the moment
generating function of W = h(X, Y, Z, …).
2. Identify the distribution of W from its
moment generating function
This procedure works well for sums, linear
combinations, averages etc.
Let x1, x2, … denote a sequence of independent
random variables
Sums
Let S = x1 + x2 + … + xn then
mS t   mx1  x2 
 xn
t  =mx t  mx t 
1
2
mxn t 
Linear Combinations
Let L = a1x1 + a2x2 + … + anxn then
mL  t   ma1x1 a2 x2 
 an xn
t  =mx  a1t  mx  a2t 
1
2
mxn  ant 
Arithmetic Means
Let x1, x2, … denote a sequence of independent
random variables coming from a distribution with
moment generating function m(t)
x1  x2 
Let x 
n
mx  t   m1
1
x1  x2 
n
n
 xn
, then
1  1 
t m t 
1 t   m 
 xn
n  n 
n
  t 
 m  
  n 
n
1 
m t 
n 
The Transformation Method
Theorem
Let X denote a random variable with
probability density function f(x) and U = h(X).
Assume that h(x) is either strictly increasing
(or decreasing) then the probability density of
U is:
g(u) = f(h⁻¹(u)) |dh⁻¹(u)/du| = f(x) |dx/du|
The Transformation Method
Theorem
(many variables)
Let x1, x2,…, xn denote random variables
with joint probability density function
f(x1, x2,…, xn )
Let u1 = h1(x1, x2,…, xn).
u2 = h2(x1, x2,…, xn).
un = hn(x1, x2,…, xn).
define an invertible transformation from the x’s to the u’s
Then the joint probability density function of
u1, u2,…, un is given by:
g  u1 ,
, un   f  x1 ,
 f  x1 ,
, xn 
d  x1 ,
d  u1 ,
, xn  J
, xn 
, un 
 dx1
 du
 1
d  x1 , , xn 
where J 
 det 
d  u1 , , un 

 dxn
 du1
Jacobian of the transformation
dx1 
dun 



dxn 
dun 
Some important results
Distribution of functions of random
variables
The method used to derive these results will be
indicated by:
1. DF - Distribution Function Method.
2. MGF - Moment generating function method
3. TF - Transformation method
Student’s t distribution
Let Z and U be two independent random variables with:
1. Z having a Standard Normal distribution
and
2. U having a χ² distribution with ν degrees of
freedom,
then the distribution of:
t = Z / √(U/ν)
is:
g(t) = K (1 + t²/ν)^(−(ν+1)/2)
where
K = Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))
DF
The Chi-square distribution
Let Z1, Z2, … , Zν be ν independent random variables
having a Standard Normal distribution, then
U = ∑_{i=1}^{ν} Zi²
has a χ² distribution with ν degrees of freedom.
(Derived via DF for ν = 1, and via MGF for ν > 1.)
Distribution of the sample mean
Let x1, x2, …, xn denote a sample from the normal
distribution with mean m and variance s2.
then
x̄ = (∑_{i=1}^{n} xi)/n
has a Normal distribution with:
mean μx̄ = μ and standard deviation σx̄ = σ/√n
MGF
The Central Limit theorem
If x1, x2, …, xn is a sample from a distribution
with mean μ and standard deviation σ, then
if n is large, x̄ (the sample mean)
has a normal distribution with mean
μx̄ = μ
and variance
σx̄² = σ²/n
(standard deviation σx̄ = σ/√n)
MGF
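A simulation sketch of the central limit theorem (not from the slides): means of n uniform(0, 1) variables, which have μ = 1/2 and σ² = 1/12, should be approximately normal with variance 1/(12n); the code compares the simulated standard deviation of x̄ with σ/√n.

```python
import random
import statistics

random.seed(4)
n, reps = 50, 20_000

# Each entry is the mean of n independent uniform(0, 1) draws.
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(round(statistics.fmean(means), 4))      # about 0.5  (mu)
print(round(statistics.stdev(means), 4))      # about sigma / sqrt(n)
print(round((1 / 12) ** 0.5 / n ** 0.5, 4))   # 0.0408
```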
Distribution of the sample variance
Let x1, x2, …, xn denote a sample from the normal
distribution with mean m and variance s2.
Let
x̄ = (∑_{i=1}^{n} xi)/n  and  s² = ∑_{i=1}^{n} (xi − x̄)²/(n − 1)
then
U = ∑_{i=1}^{n} (xi − x̄)²/σ² = (n − 1)s²/σ²
has a χ² distribution with ν = n − 1 degrees of freedom.
MGF
Distribution of sums of Gamma R. V.’s
Let X1, X2, … , Xn denote n independent random variables
each having a gamma distribution with parameters
(λ, αi), i = 1, 2, …, n.
Then W = X1 + X2 + … + Xn has a gamma distribution with
parameters (λ, α1 + α2 + … + αn).
MGF
Distribution of a multiple of a Gamma R. V.
Suppose that X is a random variable having a gamma
distribution with parameters (λ, α).
Then W = aX has a gamma distribution with parameters
(λ/a, α).
MGF
Distribution of sums of Binomial R. V.’s
Let X1, X2, … , Xk denote k independent random variables each
having a binomial distribution with parameters
(p,ni), i = 1, 2, …, k.
Then W = X1 + X2 + … + Xk has a binomial distribution with
parameters (p, n1 + n2 +… + nk).
MGF
Distribution of sums of Negative Binomial R. V.’s
Let X1, X2, … , Xn denote n independent random variables each
having a negative binomial distribution with parameters
(p,ki), i = 1, 2, …, n.
Then W = X1 + X2 + … + Xn has a negative binomial distribution
with parameters (p, k1 + k2 +… + kn).
MGF