Lecture notes: Joint distributions

Sheets L4
Jointly Distributed Random Variables
Discrete joint distributions:
The joint distribution of the discrete variables X
and Y is given by the joint probability mass
function:
P(X = x and Y = y) for all x ∈ SX and y ∈ SY
(given in a two-dimensional table or by a formula).
Notation: P(X = x and Y = y) = P(X = x, Y = y)
Requirements:
1. P(X = x and Y = y) ≥ 0
2. Σx Σy P(X = x and Y = y) = 1
The marginal or individual prob. functions:
P(X = x) = Σy P(X = x, Y = y)
and
P(Y = y) = Σx P(X = x, Y = y)
The conditional probability function of X,
under the condition Y = y, is defined as:
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)   (for P(Y = y) > 0)
For fixed y,
P(X = x | Y = y) is a probability function
and
the conditional expectation is:
E(X | Y = y) = Σx x·P(X = x | Y = y)
Analogously for P(Y = y |X = x) and E(Y|X= x)
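As an illustration of these definitions, the sketch below (not part of the original notes; the joint probabilities are made up) computes the marginals, a conditional probability function and a conditional expectation from a small joint probability table.

```python
# Minimal sketch: marginal and conditional probability functions
# for a small, hypothetical joint pmf P(X = x, Y = y).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Marginals: P(X = x) = sum over y, P(Y = y) = sum over x.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Conditional pmf of X given Y = 1: P(X = x | Y = 1) = P(X = x, Y = 1) / P(Y = 1).
cond = {x: joint[(x, 1)] / p_y[1] for x in p_x}

# Conditional expectation E(X | Y = 1) = sum_x x * P(X = x | Y = 1).
e_x_given_y1 = sum(x * p for x, p in cond.items())

print("P(X=x):", p_x)
print("P(Y=y):", p_y)
print("P(X=x | Y=1):", cond)
print("E(X | Y=1):", e_x_given_y1)
```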
Independence of random variables:
Recall: events A and B are
independent ⇔ P(A ∩ B) = P(A)·P(B)
Two r.v.'s X and Y will be called independent
when every event concerning X and every event
concerning Y are independent.
General definition:
X and Y are independent ⇔
P(X ∈ A and Y ∈ B) = P(X ∈ A)·P(Y ∈ B),
for any combination A ⊂ R and B ⊂ R
For discrete variables this is equivalent to:
X and Y are independent ⇔
P(X = x and Y = y) = P(X = x)·P(Y = y)
for all x ∈ R and y ∈ R
Consequence: if X and Y are independent, then
the joint distribution follows directly from the
marginal distributions.
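A quick numerical check of the discrete criterion (a sketch with hypothetical numbers): the joint pmf below factorizes into its marginals, so the test reports independence.

```python
# Sketch: verify P(X = x, Y = y) = P(X = x) * P(Y = y) for all (x, y).
joint = {
    (0, 0): 0.12, (0, 1): 0.28,   # row for x = 0 (P(X=0) = 0.4)
    (1, 0): 0.18, (1, 1): 0.42,   # row for x = 1 (P(X=1) = 0.6)
}

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

independent = all(
    abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12
    for (x, y) in joint
)
print("X and Y independent:", independent)   # True for this table
```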
Functions of 2 discrete r.v.
Theorem: E g(X, Y) = Σx Σy g(x, y)·P(X = x, Y = y)
e.g. with g(x, y) = x + y
it follows that: E(X+Y) = E(X) + E(Y)
The distribution of Z = X+Y can be found if the
joint probability function is known:
P(Z = z) = Σx P(X = x, Y = z − x)
If X and Y are independent, we can write for
the sum or convolution Z = X+Y
(for the case X, Y ∈ {0, 1, 2,…}):
P(Z = z) = Σx=0,…,z P(X = x)·P(Y = z − x)
Convolutions of common discrete distributions:
1. If X ~ Poisson(µ1), Y ~ Poisson(µ2) and
X and Y are independent,
then X+Y ~ Poisson (µ1 + µ2)
2. If X ~ B(m, p), Y ~ B(n, p) and X and Y are
independent => X + Y ~ B(m+n, p)
The properties above can be extended to n
discrete independent r.v. X1, …, Xn
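The convolution formula and property 1 can be checked numerically. The sketch below (my own illustration; the parameters and the truncation point are arbitrary choices) convolves the pmfs of Poisson(µ1) and Poisson(µ2) and compares the result with the pmf of Poisson(µ1 + µ2).

```python
import math

def poisson_pmf(mu, k):
    # P(X = k) for X ~ Poisson(mu)
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu1, mu2, n_max = 2.0, 3.0, 40   # hypothetical parameters and truncation point

# Discrete convolution: P(Z = z) = sum_{x=0}^{z} P(X = x) * P(Y = z - x)
def convolution(z):
    return sum(poisson_pmf(mu1, x) * poisson_pmf(mu2, z - x) for x in range(z + 1))

# Compare with the claimed Poisson(mu1 + mu2) pmf.
max_error = max(abs(convolution(z) - poisson_pmf(mu1 + mu2, z)) for z in range(n_max))
print("largest difference over z = 0..39:", max_error)   # ~ 1e-16: the pmfs agree
```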
X and Y are dependent if they are not
independent
A measure of dependence: the covariance
cov(X, Y) = E{(X − EX)(Y − EY)}
For discrete X and Y:
cov(X, Y) = Σx Σy (x − EX)(y − EY)·P(X = x, Y = y)
We say that X and Y are uncorrelated if
cov(X, Y) = 0
Properties of the covariance:
1. cov(X, Y) = E(XY) − EX×EY
2. cov(X, Y) = cov(Y, X)
3. cov(X, X) = var(X)
4. cov(X + a, Y) = cov(X, Y)
5. cov(aX, Y) = a cov(X, Y)
6. cov(X, Y + Z) = cov(X, Y) + cov(X, Z)
7. var(X+Y) = var(X)+var(Y)+2cov(X,Y) ,
var(X- Y) = var(X)+var(Y) - 2cov(X,Y)
Property 7 for X1, …, Xn :
var(X1 +…+ Xn) = Σi var(Xi) + 2 Σi<j cov(Xi, Xj)
Properties for independent X and Y :
1. E(XY) = EX×EY
2. cov(X, Y) = 0
3. var(X±Y) = var(X)+var(Y)
Note 1: E(X+Y) = EX+EY always holds,
but in general
E(XY) ≠ EX×EY
Note 2: independence => no correlation,
but not vice versa!
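Note 2 can be illustrated with a standard counterexample (my own sketch, not from the notes): take X uniform on {−1, 0, 1} and Y = X²; then cov(X, Y) = 0, yet Y is completely determined by X.

```python
# Sketch: X uniform on {-1, 0, 1}, Y = X**2 -> uncorrelated but clearly dependent.
support = [-1, 0, 1]
p = 1.0 / 3.0

ex  = sum(x * p for x in support)                 # E(X)  = 0
ey  = sum((x ** 2) * p for x in support)          # E(Y)  = 2/3
exy = sum(x * (x ** 2) * p for x in support)      # E(XY) = E(X^3) = 0

cov = exy - ex * ey                               # property 1: cov = E(XY) - EX*EY
print("cov(X, Y) =", cov)                         # 0.0 -> uncorrelated

# Dependence: P(X = 1, Y = 0) = 0, but P(X = 1) * P(Y = 0) = (1/3)*(1/3) != 0.
print("P(X=1)*P(Y=0) =", p * p, "while P(X=1, Y=0) = 0.0")
```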
The value of the covariance:
- is positive if large values of X in general
coincide with large values of Y and vice versa
- depends on the unit of measurement
(e.g. 1 m = 100 cm): cov(aX, Y) = a cov(X, Y)
The correlation coefficient ρ(X,Y) does not
depend on the unit of measurement:
ρ(X,Y) = cov(X, Y) / √(var(X)·var(Y))
Properties of ρ(X,Y):
1. −1 ≤ ρ(X,Y) ≤ 1
2. ρ(X,Y) = +1 if P(Y = aX+b) = 1 for some a > 0
   ρ(X,Y) = −1 if P(Y = aX+b) = 1 for some a < 0
3. ρ(aX+b, cY+d) = ρ(X,Y) if a×c > 0
   ρ(aX+b, cY+d) = −ρ(X,Y) if a×c < 0
So “ρ(X,Y) is a standardized measure of
linear dependence between X and Y”
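A small numerical check of property 3 (a sketch; the paired values are arbitrary made-up data treated as an equally likely joint distribution): ρ is unchanged when X and Y are rescaled or shifted with a·c > 0.

```python
import math

# Made-up paired values, each pair equally likely.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 9.0, 12.0]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def rho(u, v):
    # correlation coefficient: cov(U, V) / sqrt(var(U) * var(V))
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

# rho is unit-free: transform X -> 100*X + 5 (e.g. metres to centimetres), Y -> 2*Y - 1.
xs2 = [100 * x + 5 for x in xs]
ys2 = [2 * y - 1 for y in ys]

print("rho(X, Y)         =", rho(xs, ys))
print("rho(100X+5, 2Y-1) =", rho(xs2, ys2))   # equal, since a*c = 200 > 0
```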
Continuous joint distributions:
If there is a function f(x, y) so that, for every B ⊂ R²,
P((X,Y) ∈ B) = ∫∫B f(x, y) dx dy,
then (X,Y) are jointly continuous r.v.
Requirements for the joint density function f(x,y):
1. f(x, y) ≥ 0
2. ∫∫ f(x, y) dx dy = 1 (integrated over all of R²)
Notation: f(x, y) = fX,Y(x, y)
The cumulative joint distribution function F
or FX,Y:
F(x, y) = P(X ≤ x and Y ≤ y) = ∫−∞..x ∫−∞..y f(u, v) dv du
Properties f and F of jointly continuous r.v. :
1. F(x, y) is a continuous function
2. F(x, y) has properties similar to those of F(x): non-decreasing, right continuous and with similar limits in both arguments (x and y)
3. fX(x) = ∫ f(x, y) dy and fY(y) = ∫ f(x, y) dx (marginal densities)
4. f(x, y) = ∂²F(x, y)/∂x∂y
5. FX(x) = lim y→∞ F(x, y) and FY(y) = lim x→∞ F(x, y) (marginal distribution functions)
6. P(X = x and Y = y) = P(X= x) = P(Y = y)= 0
7. Probability of values of (X, Y) in a rectangle:
P(a < X ≤ b and c < Y ≤ d) = F(b, d) − F(a, d) − F(b, c) + F(a, c)
Independence of continuous r.v.
General definition (see also discrete joint distr.):
X and Y are independent ⇔
P(X ∈ A and Y ∈ B) = P(X ∈ A)·P(Y ∈ B),
for any combination A ⊂ R and B ⊂ R
For continuous variables this is equivalent to:
X and Y are independent ⇔
f(x, y) = fX(x)·fY(y) for all x ∈ R and y ∈ R
or
F(x, y) = FX(x)·FY(y) for all x ∈ R and y ∈ R
The distribution of a function of two (or
more) continuous r.v.
The density function fZ(z) of Z = g(X, Y) can
usually be determined in 3 steps:
1. Express FZ(z) = P(g(X,Y) ≤ z) in terms of the joint
or individual distribution of X and Y.
2. Determine fZ(z) = (d/dz) FZ(z).
3. Use the known distribution.
We will restrict ourselves to simple two-variable
functions: X+Y, XY, max(X,Y) and
min(X,Y). Later on we will extend these
properties to n variables X1, …, Xn.
Property exponential distribution:
If X ~ E(λ) and Y ~ E(λ) and X and Y are
independent, then min(X,Y) ~ E(2 λ).
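This property can be checked by simulation. The sketch below (my own illustration; λ and the sample size are arbitrary choices) compares the simulated mean of min(X, Y) with the theoretical mean 1/(2λ) of an E(2λ) distribution.

```python
import random

random.seed(1)
lam = 0.5            # hypothetical rate: X, Y ~ E(lam), each with mean 1/lam
n = 100_000

# Simulate min(X, Y) for independent X, Y ~ Exponential(lam).
mins = [min(random.expovariate(lam), random.expovariate(lam)) for _ in range(n)]

print("simulated E[min(X, Y)]:", sum(mins) / n)   # close to 1 / (2 * lam) = 1.0
print("theoretical 1/(2*lam) :", 1 / (2 * lam))
```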
The sum or convolution of two independent
continuous r.v. X and Y:
fZ(z) = ∫−∞..∞ fX(x)·fY(z − x) dx
Note: if X and Y are non-negative the
boundaries of the integral can be 0 and z.
Consequence for the normal distribution:
If X ~ N(µ1, σ1²) and Y ~ N(µ2, σ2²) and X and
Y are independent, then
X + Y ~ N(µ1 + µ2, σ1² + σ2²)
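A quick simulation check of this consequence (the parameters are my own hypothetical choices): the simulated mean and variance of X + Y match µ1 + µ2 and σ1² + σ2².

```python
import random
import statistics

random.seed(2)
mu1, sigma1 = 1.0, 2.0     # hypothetical parameters for X ~ N(mu1, sigma1^2)
mu2, sigma2 = 3.0, 1.5     # hypothetical parameters for Y ~ N(mu2, sigma2^2)
n = 200_000

sums = [random.gauss(mu1, sigma1) + random.gauss(mu2, sigma2) for _ in range(n)]

print("simulated mean:", statistics.fmean(sums))      # close to mu1 + mu2 = 4.0
print("simulated var :", statistics.variance(sums))   # close to sigma1^2 + sigma2^2 = 6.25
print("theory        :", mu1 + mu2, sigma1 ** 2 + sigma2 ** 2)
```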
For the expectation we can derive:
E g(X, Y) = ∫∫ g(x, y)·f(x, y) dx dy
In particular: E(XY) = ∫∫ x·y·f(x, y) dx dy, which can
be used to compute cov(X, Y).
Many of the definitions and properties for two
variables can be easily extended to 3 or more
variables. Some examples:
- E(X1 +…+ X n) = E(X1) +…+ E(X n)
If X1,…, Xn are independent, then
- var(X1 +…+ X n) = var(X1) +…+ var(X n)
In statistics we use (relatively) small samples to
get an idea about the population properties.
We model the observed values x1,…, xn in the
sample as random variables X1,…, Xn .
We call X1,…, Xn a random sample if
1. X1,…, Xn are independent.
2. X1,…, Xn all have the same distribution
This common distribution is called the population distribution, so if E(Xi) = µ and var(Xi) = σ²,
then:
- The sum X1 +…+ Xn:
E(X1 +…+ Xn) = nµ and
var(X1 +…+ Xn) = nσ².
- The sample mean X̄ = (X1 +…+ Xn)/n:
E(X̄) = µ and var(X̄) = σ²/n
- The weak law of large numbers:
for any c > 0: P(|X̄ − µ| ≥ c) → 0 as n → ∞
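The behaviour of the sample mean can be seen in a short simulation (a sketch; the population distribution, a fair die, and the sample sizes are arbitrary choices): as n grows, X̄ concentrates around µ.

```python
import random

random.seed(3)
mu = 3.5   # population mean of a fair die roll, used as a hypothetical population

def sample_mean(n):
    # mean of a random sample X1, ..., Xn of independent die rolls
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: sample mean = {sample_mean(n):.4f} (mu = {mu})")
```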
Random samples and the normal distribution
If X1,…, Xn are independent and Xi ~ N(µ, σ²)
(i = 1,…, n) then:
- The sum X1 +…+ Xn ~ N(nµ, nσ²)
- The sample mean X̄ ~ N(µ, σ²/n)
The Central Limit Theorem
If X1, X2,… are independent and E(Xi) = µ and
var(Xi) = σ², i = 1, 2,… then:
P( (X1 +…+ Xn − nµ) / (σ√n) ≤ z ) → Ф(z) as n → ∞
So for large n (≥ 25) we have approximately:
X1 +…+ Xn ~ N(nµ, nσ²) and X̄ ~ N(µ, σ²/n)
Normal approximation of X ~ B(n, p) for large
n (rule of thumb np(1−p) > 5): X is approximately N(np, np(1−p)).
Use the continuity correction: P(X ≤ k) ≈ P(X* ≤ k + ½), where X* ~ N(np, np(1−p)).
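The rule of thumb and the continuity correction can be compared with the exact binomial probability. The sketch below uses hypothetical values n = 50, p = 0.3, k = 18 (so np(1−p) = 10.5 > 5).

```python
import math

def binom_cdf(k, n, p):
    # exact P(X <= k) for X ~ B(n, p)
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 50, 0.3, 18
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact  = binom_cdf(k, n, p)
approx = phi((k + 0.5 - mu) / sigma)   # continuity correction: P(X <= k) ~ P(X* <= k + 1/2)

print("exact P(X <= 18)        :", exact)
print("normal approx (with +½) :", approx)
```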