5 Joint Probability Distributions
and Random Samples
Li Jie
5.1 Jointly Distributed Random Variables
5.2 Expected Values, Covariance, and Correlation
5.3 Statistics and Their Distributions
5.4 The Distribution of the Sample Mean
5.5 The Distribution of a Linear Combination
Supplementary Exercises
Bibliography
Introduction
In this chapter, we first discuss probability models for the joint behavior of several random variables, putting special emphasis on the case in which the variables are independent of one another. We then study expected values of functions of several random variables, including covariance and correlation as measures of the degree of association between two variables.
5.1 Jointly Distributed Random Variables
There are many experimental situations in which more than one random variable (rv) will be of interest to an investigator. We shall first consider joint probability distributions for two discrete rv's, then for two continuous rv's.
The Joint Probability Mass Function for Two Discrete Random Variables
The probability mass function (pmf) of a single discrete rv X specifies how much probability mass is placed on each x value. The joint pmf of two discrete rv's X and Y describes how much probability mass is placed on each possible pair of values (x, y).
Example 1 To study the distribution of people's height and weight, let X = (X1, X2), where X1 is a person's height and X2 is a person's weight.
Example 2 When two fair dice are rolled honestly, let Y = (Y1, Y2), where Y1 is the number shown on the first die and Y2 is the number shown on the second die.
Example 3 A box has six tickets, labeled 1 to 6. Two tickets are selected from the box by sampling without replacement. Let X = (X1, X2), where X1 and X2 denote the labels of the first and second tickets selected.
DEFINITION
Let X and Y be two discrete rv's defined on the sample space S of an experiment. The joint probability mass function p(x, y) is defined for each pair of numbers (x, y) by

p(x, y) = P(X = x and Y = y)

Let A be any set consisting of pairs of (x, y) values. Then the probability P[(X, Y) ∈ A] is obtained by summing the joint pmf over pairs in A:

$$P[(X,Y)\in A] = \sum_{(x,y)\in A} p(x,y)$$

A function p(x, y) can be used as a joint pmf provided that p(x, y) ≥ 0 for all x and y, and

$$\sum_x \sum_y p(x,y) = 1$$
Example When two fair dice are rolled honestly, let Y = (Y1, Y2), where Y1 is the number shown on the first die and Y2 the number shown on the second die. Each of the 36 outcomes is equally likely, so the joint pmf is

p(y1, y2) = 1/36 for y1 = 1, ..., 6 and y2 = 1, ..., 6;

every cell of the 6 × 6 table of (y1, y2) values contains 1/36.
Example 5.1
Let X = the deductible amount on an auto policy and Y = the deductible amount on a homeowner's policy. The joint pmf p(x, y) is

p(x, y)    y = 0    y = 100    y = 200
x = 100     .20       .10        .20
x = 250     .05       .15        .30
DEFINITION The marginal probability mass functions of X and of Y, denoted by pX(x) and pY(y), respectively, are given by

$$p_X(x) = \sum_y p(x,y) \qquad\qquad p_Y(y) = \sum_x p(x,y)$$

The pmf of one of the variables alone is obtained by summing p(x, y) over values of the other variable. The result is called a marginal pmf because when the p(x, y) values appear in a rectangular table, the sums are just marginal (row or column) totals.
In the dice example, summing the 6 × 6 table across each row gives the marginal pmf of Y1 and down each column gives the marginal pmf of Y2:

p_Y1(i) = 6(1/36) = 1/6 for i = 1, ..., 6, and likewise p_Y2(j) = 1/6 for j = 1, ..., 6.
Example 5.2 (Example 5.1 continued)
The possible X values are x = 100 and x = 250, so computing row totals in the joint probability table yields

pX(100) = p(100, 0) + p(100, 100) + p(100, 200) = .50

and

pX(250) = p(250, 0) + p(250, 100) + p(250, 200) = .50

The marginal pmf of X is then

$$p_X(x) = \begin{cases} .5 & x = 100, 250 \\ 0 & \text{otherwise} \end{cases}$$
Similarly, the marginal pmf of Y is obtained from column totals as

$$p_Y(y) = \begin{cases} .25 & y = 0, 100 \\ .50 & y = 200 \\ 0 & \text{otherwise} \end{cases}$$

so

P(Y ≥ 100) = pY(100) + pY(200) = .75

as before.
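A quick way to double-check marginal computations like these is to sum a joint pmf table programmatically. Here is a minimal Python sketch with the Example 5.1 joint pmf hard-coded (the dictionary layout is just one convenient choice):

```python
# Joint pmf of Example 5.1: keys are (x, y) pairs, values are probabilities.
joint_pmf = {
    (100, 0): .20, (100, 100): .10, (100, 200): .20,
    (250, 0): .05, (250, 100): .15, (250, 200): .30,
}

# Marginal pmfs: sum the joint pmf over the other variable.
p_X, p_Y = {}, {}
for (x, y), p in joint_pmf.items():
    p_X[x] = p_X.get(x, 0) + p
    p_Y[y] = p_Y.get(y, 0) + p

print(p_X)  # ~{100: 0.5, 250: 0.5}
print(p_Y)  # ~{0: 0.25, 100: 0.25, 200: 0.5}

# P(Y >= 100) by summing the joint pmf over the event, as in the text.
print(sum(p for (x, y), p in joint_pmf.items() if y >= 100))  # ~0.75
```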
The Joint Probability Density Function for Two Continuous Random Variables
DEFINITION Let X and Y be continuous rv's. Then f(x, y) is the joint probability density function for X and Y if for any two-dimensional set A

$$P[(X,Y)\in A] = \iint_A f(x,y)\,dx\,dy$$
In particular, if A is the two-dimensional rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, then

$$P[(X,Y)\in A] = P(a \le X \le b,\; c \le Y \le d) = \int_a^b\!\!\int_c^d f(x,y)\,dy\,dx$$

For f(x, y) to be a candidate for a joint pdf, it must satisfy

(1) f(x, y) ≥ 0 for all (x, y), and

(2) $$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$$
We can think of f(x, y) as specifying a surface at height f(x, y) above the point (x, y) in a three-dimensional coordinate system. Then P[(X, Y) ∈ A] is the volume underneath this surface and above the region A, analogous to the area under a curve in the one-dimensional case. This is illustrated in Figure 5.1.

[Figure 5.1: The surface f(x, y) over the (x, y) plane; the probability is the volume above the shaded rectangle A.]
Example 5.3
Suppose the joint pdf of (X, Y) is given by

$$f(x,y) = \begin{cases} \dfrac{6}{5}(x + y^2) & 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

To verify that this is a legitimate pdf, note that f(x, y) ≥ 0 and

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = \int_0^1\!\!\int_0^1 \frac{6}{5}(x+y^2)\,dx\,dy = \int_0^1 \frac{6}{5}x\,dx + \int_0^1 \frac{6}{5}y^2\,dy = \frac{6}{10} + \frac{6}{15} = 1$$
$$P\left(0 \le X \le \tfrac14,\; 0 \le Y \le \tfrac14\right) = \int_0^{1/4}\!\!\int_0^{1/4} \frac{6}{5}(x+y^2)\,dx\,dy = \frac{6}{5}\int_0^{1/4}\!\!\int_0^{1/4} x\,dx\,dy + \frac{6}{5}\int_0^{1/4}\!\!\int_0^{1/4} y^2\,dx\,dy = .0109$$
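The double integrals above are easy to verify numerically; the sketch below uses SciPy's dblquad (any numerical integrator would do). Note that dblquad integrates its first argument, here y, first.

```python
from scipy.integrate import dblquad

# Joint pdf of Example 5.3 on the unit square; 6/5 = 1.2.
f = lambda y, x: 1.2 * (x + y**2)

total, _ = dblquad(f, 0, 1, 0, 1)        # x in [0, 1], y in [0, 1]
print(total)                              # ~1.0, confirming a legitimate pdf

prob, _ = dblquad(f, 0, 0.25, 0, 0.25)    # x, y in [0, 1/4]
print(prob)                               # ~0.0109
```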
Example The pdf of the vector (X, Y) is given by

$$f(x,y) = \begin{cases} c e^{-3(x+y)} & x > 0,\; y > 0 \\ 0 & \text{otherwise} \end{cases}$$

Determine: (1) the constant c; (2) the probability P{(X, Y) ∈ G}, where G is the triangle bounded by the coordinate axes and the line x + y = 1.
Solution: (1) Since

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = \int_0^{\infty}\!\!\int_0^{\infty} c e^{-3(x+y)}\,dx\,dy = c\int_0^{\infty} e^{-3x}\,dx \int_0^{\infty} e^{-3y}\,dy = \frac{c}{9} = 1$$

it follows that c = 9.
(2) Integrating the joint pdf over the triangle G:

$$P\{(X,Y)\in G\} = \int_0^1\!\!\int_0^{1-x} 9e^{-3(x+y)}\,dy\,dx = \int_0^1 3e^{-3x}\left[1 - e^{-3(1-x)}\right]dx = 1 - 4e^{-3} \approx .80$$
Example: The pdf of the vector (X, Y), whose components are positive random variables, is given by

$$f(x,y) = y e^{-x} e^{-y}, \qquad x > 0,\; y > 0$$

Determine the probability that Y > X.

Solution:

$$P(Y > X) = \int_0^{\infty}\!\!\int_0^{y} y e^{-(x+y)}\,dx\,dy = \int_0^{\infty} y e^{-y}\left(1 - e^{-y}\right)dy = 1 - \frac14 = \frac34$$
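The same numerical check works for this non-rectangular region by using a variable inner limit (a sketch; the finite upper limit 40 stands in for infinity, since the integrand is negligible beyond it):

```python
import numpy as np
from scipy.integrate import dblquad

# f(x, y) = y e^{-x} e^{-y}; dblquad integrates the first argument (y) first.
f = lambda y, x: y * np.exp(-(x + y))

# P(Y > X): outer x in (0, 40), inner y in (x, 40).
prob, _ = dblquad(f, 0, 40, lambda x: x, 40)
print(prob)  # ~0.75 = 3/4
```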
DEFINITION The marginal probability density functions of X and Y, denoted by fX(x) and fY(y), respectively, are given by

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy \quad \text{for } -\infty < x < \infty$$

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx \quad \text{for } -\infty < y < \infty$$

As with joint pmf's, each of the two marginal density functions can be computed from the joint pdf of X and Y.
Example: The joint pdf of X and Y is given by

$$f(x,y) = x e^{-x(y+1)}, \qquad x > 0,\; y > 0$$

Determine the marginal density functions.

Solution:

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \int_0^{\infty} x e^{-x(y+1)}\,dy = e^{-x} \qquad (x > 0)$$

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_0^{\infty} x e^{-x(y+1)}\,dx = \frac{1}{(y+1)^2} \qquad (y > 0)$$
Example: Suppose the pdf of the random vector (X, Y) is given by

$$f(x,y) = \begin{cases} x^2 + \dfrac{xy}{3} & 0 \le x \le 1,\; 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}$$

Determine (1) the marginal density functions fX(x) and fY(y); (2) the probabilities P{X + Y > 1} and P{Y > X}.
Solution: (1) The marginal density functions are

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \int_0^2 \left(x^2 + \frac{xy}{3}\right)dy = 2x^2 + \frac{2}{3}x$$

so

$$f_X(x) = \begin{cases} 2x^2 + \dfrac{2}{3}x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Similarly,

$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_0^1 \left(x^2 + \frac{xy}{3}\right)dx = \frac13 + \frac{y}{6}$$
1 1
  y, 0  y  2
fY ( y )   3 6
0,
otherwise
2
D
1
(2) Computer the probability
x+y=1
0
P{ X  Y  1}   f ( x, y )dxdy
1
D
1 2
0 1x
 
1
65
( x  xy)dxdy 
3
72
2
Li Jie
Similarly, let G denote the part of the rectangle lying above the line y = x. Then

$$P\{Y>X\} = \iint_G f(x,y)\,dx\,dy = \int_0^1\!\!\int_{x}^{2}\left(x^2 + \frac{xy}{3}\right)dy\,dx = \frac{17}{24}$$
Exercise 1: The probability density function of the continuous random vector (X, Y) is given by

$$f(x,y) = \begin{cases} 2x & 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Determine the marginal density functions fX(x) and fY(y).
Exercise 2: The probability density function of the continuous random vector (X, Y) is given by

$$f(x,y) = \begin{cases} 1/2 & x + y \le 2,\; x > 0,\; y > 0 \\ 0 & \text{otherwise} \end{cases}$$

a. Determine the marginal density functions fX(x) and fY(y).
b. Calculate the probability that X > 2Y.
Independent Random Variables
In Chapter 2 we pointed out that one way of defining independence of two events is to say that A and B are independent if P(A ∩ B) = P(A) · P(B). We now use an analogous definition for the independence of two rv's.
DEFINITION Two random variables X and Y are said to be independent if for every pair of x and y values

p(x, y) = pX(x) · pY(y) when X and Y are discrete

or

f(x, y) = fX(x) · fY(y) when X and Y are continuous

If this condition is not satisfied for all (x, y), then X and Y are said to be dependent.
Example 5.6 In the insurance situation of Examples 5.1 and 5.2,

p(100, 100) = .10 ≠ (.5)(.25) = pX(100) · pY(100)

so X and Y are not independent.
Example 5.8 Suppose that the lifetimes of two components are independent of one another and that the first lifetime, X1, has an exponential distribution with parameter λ1, whereas the second, X2, has an exponential distribution with parameter λ2. Then the joint pdf is

$$f(x_1,x_2) = f_{X_1}(x_1)\cdot f_{X_2}(x_2) = \begin{cases} \lambda_1 e^{-\lambda_1 x_1}\,\lambda_2 e^{-\lambda_2 x_2} & x_1 > 0,\; x_2 > 0 \\ 0 & \text{otherwise} \end{cases}$$
Let 1=1/1000 and  2 =1/1200 ,so that the expected
lifetimes are 1000 hours and 1200 hours,respectively.
The probability that both component lifetimes are at
least 1500 hours is
P (1500  X 1 , 1500  X 2 )
 P (1500  X 1 )  P (1500  X 2 )
 e 1 (1500)  e  2 (1500)
 (.2231)(.2865)  .0639
Li Jie
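This product of survival probabilities is a one-liner with SciPy (a sketch; for scipy.stats.expon the scale parameter is the mean lifetime):

```python
from scipy.stats import expon

# Independent exponential lifetimes with means 1000 h and 1200 h;
# sf is the survival function, sf(t) = P(X > t) = 1 - cdf(t).
p = expon.sf(1500, scale=1000) * expon.sf(1500, scale=1200)
print(p)  # ~0.0639
```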
Example Three white balls and three red balls are distributed at random among three boxes labeled 1, 2, 3 (each ball independently lands in any given box with probability 1/3). Let X denote the number of white balls in the first box and Y the number of red balls in the second box. Determine the joint probability distribution of (X, Y).

Because the white balls and the red balls are placed independently, X and Y are independent, each with a Bin(3, 1/3) distribution, so

$$p_{ij} = P\{X=i,\,Y=j\} = P\{X=i\}\cdot P\{Y=j\} = \binom{3}{i}\!\left(\frac13\right)^{\!i}\!\left(\frac23\right)^{\!3-i}\binom{3}{j}\!\left(\frac13\right)^{\!j}\!\left(\frac23\right)^{\!3-j}$$
The joint distribution of (X, Y) (rows x, columns y):

p(x, y)    y = 0     y = 1     y = 2     y = 3
x = 0     64/729    96/729    48/729     8/729
x = 1     96/729    16/81      8/81     12/729
x = 2     48/729     8/81      4/81      6/729
x = 3      8/729    12/729     6/729     1/729
Example 5 Three indistinguishable balls are placed at random into three boxes labeled 1, 2, 3. Let X denote the number of balls in the first box and Y the number of balls in the second box. Determine the joint probability distribution of (X, Y).

$$p_{ij} = P\{X=i,\,Y=j\} = P\{X=i \mid Y=j\}\cdot P\{Y=j\}, \qquad 0 \le i+j \le 3$$

$$P\{Y=j\} = \binom{3}{j}\!\left(\frac13\right)^{\!j}\!\left(\frac23\right)^{\!3-j}, \qquad 0 \le j \le 3$$
Given Y = j, each of the remaining 3 - j balls is equally likely to be in box 1 or box 3, so

$$P\{X=i \mid Y=j\} = \binom{3-j}{i}\!\left(\frac12\right)^{\!i}\!\left(\frac12\right)^{\!3-j-i}, \qquad 0 \le i+j \le 3$$

Multiplying the two factors gives the multinomial probabilities

$$p_{ij} = \frac{1}{27}\cdot\frac{3!}{i!\,j!\,(3-i-j)!}$$
The resulting joint distribution of (X, Y) (rows x, columns y):

p(x, y)    y = 0    y = 1    y = 2    y = 3
x = 0       1/27     1/9      1/9     1/27
x = 1       1/9      2/9      1/9      0
x = 2       1/9      1/9       0       0
x = 3       1/27      0        0       0

Here X and Y are dependent: for example, p(3, 3) = 0 even though P(X = 3) and P(Y = 3) are both positive.
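Tables like this one can be sanity-checked by brute-force enumeration of the 3³ = 27 equally likely placements; a small illustrative sketch:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Each of the 3 balls independently lands in box 1, 2, or 3.
counts = Counter()
for placement in product([1, 2, 3], repeat=3):
    x = placement.count(1)  # number of balls in box 1
    y = placement.count(2)  # number of balls in box 2
    counts[(x, y)] += 1

joint = {xy: Fraction(c, 27) for xy, c in counts.items()}
print(joint[(1, 1)])          # 2/9, matching the table
print(joint.get((3, 3), 0))   # 0: all balls cannot be in both boxes
```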
Exercise 1: Two cards are drawn from a special deck consisting of two hearts and two diamonds, and they are placed face down in front of us. The two cards are then turned over and their suits are observed. Let X1 and X2 be defined as follows:

X1 = 1 if the card on our left is a heart, 0 if it is a diamond
X2 = 1 if the card on our right is a heart, 0 if it is a diamond

Determine the joint probability distribution of X = (X1, X2):

p(x1, x2)    x2 = 0    x2 = 1
x1 = 0        1/6       1/3
x1 = 1        1/3       1/6
5.2 Expected Values, Covariance, and Correlation
PROPOSITION
Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y), according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or μ_h(X,Y), is given by

$$E[h(X,Y)] = \begin{cases} \displaystyle\sum_x\sum_y h(x,y)\,p(x,y) & \text{if } X, Y \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} h(x,y)\,f(x,y)\,dx\,dy & \text{if } X, Y \text{ are continuous} \end{cases}$$
Example 5.13
Five friends have purchased tickets to a certain concert. If the tickets are for seats 1-5 in a particular row and the tickets are randomly distributed among the five, what is the expected number of seats separating any particular two of the five? Let X and Y denote the seat numbers of the first and second individuals, respectively. Possible (X, Y) pairs are {(1,2), (1,3), ..., (5,4)}, and the joint pmf of (X, Y) is

$$p(x,y) = \begin{cases} \dfrac{1}{20} & x = 1,\ldots,5;\; y = 1,\ldots,5;\; x \ne y \\ 0 & \text{otherwise} \end{cases}$$

The number of seats separating the two individuals is h(X, Y) = |X - Y| - 1. The accompanying table gives h(x, y) for each possible (x, y) pair.
h(x, y)   x = 1   x = 2   x = 3   x = 4   x = 5
y = 1       --      0       1       2       3
y = 2       0       --      0       1       2
y = 3       1       0       --      0       1
y = 4       2       1       0       --      0
y = 5       3       2       1       0       --

Thus,

$$E[h(X,Y)] = \sum_{(x,y)} h(x,y)\cdot p(x,y) = \frac{1}{20}\sum_{\substack{x,y=1 \\ x\ne y}}^{5}\left(|x-y|-1\right) = 1$$
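This expectation is easy to reproduce by summing over the 20 equally likely ordered pairs (a two-line check):

```python
# E[h(X, Y)] for Example 5.13: X, Y are distinct seats in {1, ..., 5},
# and each of the 20 ordered pairs has probability 1/20.
pairs = [(x, y) for x in range(1, 6) for y in range(1, 6) if x != y]
print(sum((abs(x - y) - 1) / 20 for x, y in pairs))  # 1.0
```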
Covariance
When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another.
DEFINITION
The covariance between two rv's X and Y is

$$\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = \begin{cases} \displaystyle\sum_x\sum_y (x-\mu_X)(y-\mu_Y)\,p(x,y) & X, Y \text{ discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}(x-\mu_X)(y-\mu_Y)\,f(x,y)\,dx\,dy & X, Y \text{ continuous} \end{cases}$$
Example 5.15
The joint and marginal pmf's for X = automobile policy deductible amount and Y = homeowner policy deductible amount in Example 5.1 were

p(x, y)    y = 0    y = 100    y = 200
x = 100     .20       .10        .20
x = 250     .05       .15        .30

with marginals pX(100) = pX(250) = .5 and pY(0) = .25, pY(100) = .25, pY(200) = .5, from which μX = Σ x pX(x) = 175 and μY = 125. Therefore,

$$\mathrm{Cov}(X,Y) = \sum_{(x,y)}(x-175)(y-125)\,p(x,y) = (100-175)(0-125)(.20) + \cdots + (250-175)(200-125)(.30) = 1875$$
The following shortcut formula for Cov(X, Y) simplifies the computations.
PROPOSITION

Cov(X, Y) = E(XY) - μX μY
Cov(X, X) = V(X)
Cov(X, Y) = Cov(Y, X)
Cov(aX + b, Y) = a Cov(X, Y)
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
V(X ± Y) = Cov(X ± Y, X ± Y) = V(X) + V(Y) ± 2 Cov(X, Y)
If the Xi's are independent, then Cov(Xi, Xj) = 0 for i ≠ j, and we have another corollary: if the Xi's are independent, then

$$V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i)$$
Example 5.16
The joint and marginal pdf's of X = amount of almonds and Y = amount of cashews were

$$f(x,y) = \begin{cases} 24xy & 0 \le x \le 1,\; 0 \le y \le 1,\; x+y \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad f_X(x) = \begin{cases} 12x(1-x)^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

with fY(y) obtained by replacing x by y in fX(x). It is easily verified that μX = μY = 2/5, and

$$E(XY) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} xy\,f(x,y)\,dx\,dy = \int_0^1\!\!\int_0^{1-x} xy\cdot 24xy\,dy\,dx = \int_0^1 8x^2(1-x)^3\,dx = \frac{2}{15}$$

Thus Cov(X, Y) = 2/15 - (2/5)² = 2/15 - 4/25 = -2/75.
Example Suppose X ~ Bin(12, 0.5) and Y ~ N(0, 1), with Cov(X, Y) = -1. Find the variances of V = 4X + 3Y + 1 and W = -2X + 4Y, and the covariance Cov(V, W).

Var(V) = Var(4X + 3Y + 1) = 16 Var(X) + 9 Var(Y) + 24 Cov(X, Y)
Var(W) = Var(-2X + 4Y) = 4 Var(X) + 16 Var(Y) - 16 Cov(X, Y)
Cov(V, W) = Cov(4X + 3Y, -2X + 4Y) = -8 Var(X) + 12 Var(Y) + 10 Cov(X, Y)

Since Var(X) = 12(.5)(.5) = 3 and Var(Y) = 1, these give Var(V) = 48 + 9 - 24 = 33, Var(W) = 12 + 16 + 16 = 44, and Cov(V, W) = -24 + 12 - 10 = -22.
Example: Let X have the binomial distribution with parameters n and p; determine V(X).
Solution: A binomial rv is the sum of n independent Bernoulli rv's X1, X2, ..., Xn, each taking the value 1 with probability p and 0 with probability q = 1 - p. For each i,

E(Xi) = p,  E(Xi²) = p,  V(Xi) = p - p² = p(1 - p) = pq

and X = X1 + X2 + ... + Xn. Because X1, X2, ..., Xn are independent,

V(X) = V(X1) + V(X2) + ... + V(Xn) = npq
Correlation
DEFINITION
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρ_X,Y, or just ρ, is defined by

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

Example 5.17
It is easily verified that in the insurance problem of Example 5.15, E(X²) = 36,250, σ²X = 36,250 - (175)² = 5625, σX = 75, E(Y²) = 22,500, σ²Y = 6875, and σY = 82.92. This gives

ρ = 1875/[(75)(82.92)] = .301
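The covariance and correlation computations of Examples 5.15 and 5.17 fit in a few lines; a sketch using a generic expected-value helper over the joint pmf:

```python
# Joint pmf of Examples 5.1 / 5.15.
joint_pmf = {
    (100, 0): .20, (100, 100): .10, (100, 200): .20,
    (250, 0): .05, (250, 100): .15, (250, 200): .30,
}

def ev(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

mu_x, mu_y = ev(lambda x, y: x), ev(lambda x, y: y)   # 175, 125
cov = ev(lambda x, y: (x - mu_x) * (y - mu_y))        # ~1875
sd_x = (ev(lambda x, y: x * x) - mu_x**2) ** 0.5      # ~75
sd_y = (ev(lambda x, y: y * y) - mu_y**2) ** 0.5      # ~82.92
print(cov / (sd_x * sd_y))                             # ~0.301
```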
PROPOSITION
1. If a and c are either both positive or both negative,
Corr(aX + b, cY + d) = Corr(X, Y)
2. For any two rv's X and Y, -1 ≤ Corr(X, Y) ≤ 1.
PROPOSITION
1. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.
2. ρ = 1 or -1 iff Y = aX + b for some numbers a and b with a ≠ 0.
Example 5.18 Let X and Y be discrete rv's with joint pmf

$$p(x,y) = \begin{cases} \dfrac14 & (x,y) = (-4,1),\,(4,-1),\,(2,2),\,(-2,-2) \\ 0 & \text{otherwise} \end{cases}$$

Since Y is completely determined by X, the two variables are completely dependent. However, ρ_XY = 0: E(X) = E(Y) = E(XY) = 0, so Cov(X, Y) = 0. Although there is perfect dependence, there is also complete absence of any linear relationship!
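A quick numerical confirmation of this example:

```python
# The four equally likely support points of Example 5.18.
points = [(-4, 1), (4, -1), (2, 2), (-2, -2)]
mu_x = sum(x for x, y in points) / 4   # 0
mu_y = sum(y for x, y in points) / 4   # 0
cov = sum((x - mu_x) * (y - mu_y) for x, y in points) / 4
print(cov)  # 0.0, so rho = 0 even though Y is a function of X
```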
5.3 Statistics and Their Distributions
The observations in a single sample were denoted in
Chapter 1 by x1,x2,…,xn. Consider selecting two
different samples of size n from the same population
distribution. The xi’s in the second sample will virtually
always differ at least a bit from those in the first sample.
For example, a first sample of n = 3 cars of a particular type might result in fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a second sample may give x1 = 28.8, x2 = 30.0, and x3 = 31.1. Before we obtain data, there is uncertainty about the value of each xi. Because of this uncertainty, before the data becomes available we view each observation as a random variable and denote the sample by X1, X2, ..., Xn (uppercase letters for random variables).
This variation in observed values in turn implies that the value of any function of the sample observations, such as the sample mean, sample standard deviation, or sample fourth spread, also varies from sample to sample. That is, prior to obtaining x1, ..., xn, there is uncertainty as to the value of x̄, the value of s, and so on.
DEFINITION
A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. A statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
Random Samples
The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the method of sampling.
DEFINITION
The rv's X1, X2, ..., Xn are said to form a (simple) random sample of size n if
1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.
That is, the Xi's are independent and identically distributed (i.i.d.).
Deriving the Sampling Distribution of a Statistic
Probability rules can be used to obtain the distribution of a statistic provided that it is a "fairly simple" function of the Xi's and either there are relatively few different X values in the population or else the population distribution has a "nice" form. (See Examples 5.20 and 5.21.)
Simulation Experiments (omit)
The second method of obtaining information about a statistic's sampling distribution is to perform a simulation experiment. This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer. The following characteristics of an experiment must be specified:
1. The statistic of interest (X̄, S, a particular trimmed mean, etc.)
2. The population distribution (normal with μ = 100 and σ = 15, uniform with lower limit A = 5 and upper limit B = 10, etc.)
3. The sample size n (e.g., n = 10 or n = 50)
4. The number of replications k (e.g., k = 500)
Then use a computer to obtain k different random samples, each of size n, from the designated population distribution. For each such sample, calculate the value of the statistic and construct a histogram of the k calculated values. This histogram gives the approximate sampling distribution of the statistic. The larger the value of k, the better the approximation will tend to be (the actual sampling distribution emerges as k → ∞). In practice, k = 500 or 1000 is usually enough if the statistic is "fairly simple".
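As a concrete illustration, here is a minimal sketch of such an experiment in Python with NumPy; the normal population, n = 10, and k = 500 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
k, n = 500, 10            # replications and sample size
mu, sigma = 100, 15       # population: normal with mean 100, SD 15

# k samples of size n; one sample mean per replication.
xbars = rng.normal(mu, sigma, size=(k, n)).mean(axis=1)

# A histogram of xbars approximates the sampling distribution of X-bar;
# its mean should be near mu and its SD near sigma / sqrt(n) = 4.74.
print(xbars.mean(), xbars.std())
```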
Example 5.23
Consider a simulation experiment in which the population distribution is quite skewed. Figure 5.12 shows the density curve for a certain type of electronic control (actually a lognormal distribution with E(ln(X)) = 3 and V(ln(X)) = .16). Again the statistic of interest is the sample mean X̄. The experiment utilized 500 replications and considered the same four sample sizes as in Example 5.22. The resulting histograms, along with a normal probability plot from MINITAB for the 500 x̄ values based on n = 30, are shown in Figure 5.13.
[Figure 5.12: Density curve for the simulation experiment of Example 5.23; E(X) = μ = 21.7584, V(X) = σ² = 82.1449]
5.4 The Distribution of the Sample Mean
PROPOSITION
Let X1, X2, ..., Xn be a random sample from a distribution with mean value μ and standard deviation σ. Then
1. E(X̄) = μ_X̄ = μ
2. V(X̄) = σ²_X̄ = σ²/n, and σ_X̄ = σ/√n
In addition, with T0 = X1 + ... + Xn (the sample total), E(T0) = nμ, V(T0) = nσ², and σ_T0 = √n σ.
Example 5.24
In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is μ = 28,000, and the standard deviation of the number of cycles is σ = 5000. Let X1, X2, ..., X25 be a random sample of size 25, where each Xi is the number of cycles on a different randomly selected specimen. Then E(X̄) = μ = 28,000, and the expected total number of cycles for the 25 specimens is E(T0) = nμ = 25(28,000) = 700,000. The standard deviations of X̄ and T0 are

σ_X̄ = σ/√n = 5000/√25 = 1000
σ_T0 = √n σ = √25 (5000) = 25,000

If the sample size increases to n = 100, E(X̄) is unchanged, but σ_X̄ = 500, half of its previous value (the sample size must be quadrupled to halve the standard deviation of X̄).
The Case of a Normal Population Distribution
Looking back to the simulation experiment of Example 5.22, we see that when the population distribution is normal, each histogram of x̄ values is well approximated by a normal curve. The precise result follows.
PROPOSITION
Let X1, X2, ..., Xn be a random sample from a normal distribution with mean μ and standard deviation σ. Then for any n, X̄ is normally distributed (with mean μ and standard deviation σ/√n), as is T0 (with mean nμ and standard deviation √n σ).
Example 5.25
The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed rv with μ = 1.5 min and σ = .35 min. Suppose five rats are selected. Let X1, X2, ..., X5 denote their times in the maze. Assuming the Xi's to be a random sample from this normal distribution, what is the probability that the total time T0 = X1 + X2 + ... + X5 for the five is between 6 and 8 min? By the proposition, T0 has a normal distribution with μ_T0 = nμ = 5(1.5) = 7.5 and variance σ²_T0 = nσ² = 5(.1225) = .6125, so σ_T0 = .783. To standardize T0, subtract μ_T0 and divide by σ_T0:
$$P(6 \le T_0 \le 8) = P\left(\frac{6-7.5}{.783} \le Z \le \frac{8-7.5}{.783}\right) = P(-1.92 \le Z \le .64) = \Phi(.64) - \Phi(-1.92) = .7115$$

Determination of the probability that the sample average time X̄ (a normally distributed variable) is at most 2.0 min requires μ_X̄ = μ = 1.5 and σ_X̄ = σ/√n = .1565. Then

$$P(\bar{X} \le 2.0) = P\left(Z \le \frac{2.0 - 1.5}{.1565}\right) = P(Z \le 3.19) = \Phi(3.19) = .9993$$
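These normal probabilities are quick to reproduce with SciPy (a sketch mirroring Example 5.25):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 1.5, 0.35, 5

# Total T0 is normal with mean n*mu and SD sqrt(n)*sigma.
t0 = norm(loc=n * mu, scale=sqrt(n) * sigma)
print(t0.cdf(8) - t0.cdf(6))   # ~0.7115

# Sample mean X-bar is normal with mean mu and SD sigma/sqrt(n).
xbar = norm(loc=mu, scale=sigma / sqrt(n))
print(xbar.cdf(2.0))            # ~0.9993
```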
The Central Limit Theorem
When the Xi's are normally distributed, so is X̄ for every sample size n. The simulation experiment of Example 5.23 suggests that even when the population distribution is highly nonnormal, averaging produces a distribution more bell-shaped than the one being sampled. A reasonable conjecture is that if n is large, a suitable normal curve will approximate the actual distribution of X̄. The formal statement of this result is the most important theorem of probability.
THEOREM The Central Limit Theorem (CLT)
Let X1, X2, ..., Xn be a random sample from a distribution with mean μ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with μ_X̄ = μ and σ²_X̄ = σ²/n, and T0 also has approximately a normal distribution with μ_T0 = nμ and σ²_T0 = nσ². The larger the value of n, the better the approximation.
Example 5.26
When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity X̄ is between 3.5 and 3.8 g? According to the rule of thumb to be stated shortly, n = 50 is large enough for the CLT to be applicable. X̄ then has approximately a normal distribution with mean value μ_X̄ = 4.0 and σ_X̄ = 1.5/√50 = .2121, so

$$P(3.5 \le \bar{X} \le 3.8) = P\left(\frac{3.5-4.0}{.2121} \le Z \le \frac{3.8-4.0}{.2121}\right) = \Phi(-.94) - \Phi(-2.36) = .1645$$
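A sketch comparing the CLT answer with a direct simulation. The population distribution is not specified in the example, so the gamma population below is an assumption chosen only to match the stated mean 4.0 and SD 1.5:

```python
import numpy as np
from scipy.stats import norm

mu, sigma, n = 4.0, 1.5, 50

# CLT approximation for P(3.5 <= X-bar <= 3.8).
clt = norm(loc=mu, scale=sigma / np.sqrt(n))
print(clt.cdf(3.8) - clt.cdf(3.5))   # ~0.1645

# Simulation under an assumed gamma population with the same mean and SD:
# shape * scale = 4.0 and shape * scale**2 = 2.25.
rng = np.random.default_rng(seed=2)
scale = sigma**2 / mu                 # 0.5625
shape = mu / scale                    # ~7.11
xbars = rng.gamma(shape, scale, size=(100_000, n)).mean(axis=1)
print(np.mean((3.5 <= xbars) & (xbars <= 3.8)))  # close to the CLT value
```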
Other Applications of the Central Limit Theorem
The CLT can be used to justify the normal approximation to the binomial distribution discussed in Chapter 4. Recall that a binomial variable X is the number of successes in a binomial experiment consisting of n independent success/failure trials with p = P(S) for any particular trial. Define new rv's X1, X2, ..., Xn by

Xi = 1 if the ith trial results in a success, 0 if the ith trial results in a failure  (i = 1, ..., n)
Because the trials are independent and P(S) is constant from trial to trial, the Xi's are iid (a random sample from a Bernoulli distribution). The CLT then implies that if n is sufficiently large, both the sum and the average of the Xi's have approximately normal distributions. When the Xi's are summed, a 1 is added for every S that occurs and a 0 for every F, so X1 + ... + Xn = X. The sample mean of the Xi's is X/n, the sample proportion of successes. That is, both X and X/n are approximately normal when n is large. The necessary sample size for this approximation depends on the value of p: when p is close to .5, the underlying Bernoulli distribution is reasonably symmetric, whereas it is quite skewed when p is near 0 or 1. Using the approximation only if both np ≥ 10 and n(1 - p) ≥ 10 ensures that n is large enough to overcome any skewness in the underlying Bernoulli distribution.
[Figure 5.17: Two Bernoulli distributions: (a) p = .4 (reasonably symmetric); (b) p = .1 (very skewed)]
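A quick sketch of the approximation in action (n and p here are arbitrary but satisfy the rule; 45.5 is the usual continuity correction from Chapter 4):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.4                    # np = 40 and n(1-p) = 60, both >= 10
sd = sqrt(n * p * (1 - p))
exact = binom.cdf(45, n, p)        # exact P(X <= 45)
approx = norm.cdf(45.5, loc=n * p, scale=sd)  # normal approximation
print(exact, approx)               # both ~0.87
```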
PROPOSITION
Let X1, X2, ..., Xn be a random sample from a distribution for which only positive values are possible [P(Xi > 0) = 1]. Then if n is sufficiently large, the product Y = X1X2⋯Xn has approximately a lognormal distribution.
To verify this, note that

ln(Y) = ln(X1) + ln(X2) + ... + ln(Xn)

Since the ln(Xi)'s form a random sample, the CLT implies that ln(Y) is approximately normal, and a variable whose logarithm is normally distributed is, by definition, lognormal.
5.5 The Distribution of a Linear Combination
The sample mean X̄ and sample total T0 are special cases of a type of random variable that arises very frequently in statistical applications.
DEFINITION
Given a collection of n random variables X1, ..., Xn and n numerical constants a1, ..., an, the rv

$$Y = a_1X_1 + \cdots + a_nX_n = \sum_{i=1}^{n} a_iX_i$$

is called a linear combination of the Xi's.
PROPOSITION
Let X1, X2, ..., Xn have mean values μ1, ..., μn, respectively, and variances σ1², ..., σn², respectively.
1. Whether or not the Xi's are independent,

E(a1X1 + a2X2 + ... + anXn) = a1E(X1) + a2E(X2) + ... + anE(Xn) = a1μ1 + ... + anμn
2. If X1, X2, ..., Xn are independent,

V(a1X1 + a2X2 + ... + anXn) = a1²V(X1) + a2²V(X2) + ... + an²V(Xn) = a1²σ1² + ... + an²σn²

and

σ_{a1X1+...+anXn} = √(a1²σ1² + ... + an²σn²)

3. For any X1, X2, ..., Xn,

$$V(a_1X_1 + \cdots + a_nX_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_ia_j\,\mathrm{Cov}(X_i,X_j)$$
Example 5.28
A gas station sells three grades of gasoline: regular unleaded, extra unleaded, and super unleaded. These are priced at $1.20, $1.35, and $1.50 per gallon, respectively. Let X1, X2, and X3 denote the amounts of these grades purchased (gallons) on a particular day. Suppose the Xi's are independent with μ1 = 1000, μ2 = 500, μ3 = 300, σ1 = 100, σ2 = 80, and σ3 = 50. The revenue from sales is Y = 1.2X1 + 1.35X2 + 1.5X3, and

E(Y) = 1.2μ1 + 1.35μ2 + 1.5μ3 = $2325
V(Y) = (1.2)²σ1² + (1.35)²σ2² + (1.5)²σ3² = 31,689
σY = √31,689 = $178.01
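The same arithmetic as a tiny sketch:

```python
from math import sqrt

a = [1.20, 1.35, 1.50]      # prices per gallon
mu = [1000, 500, 300]       # mean gallons sold per day
sigma = [100, 80, 50]       # standard deviations; the Xi's are independent

mean_rev = sum(ai * mi for ai, mi in zip(a, mu))           # 2325.0
var_rev = sum(ai**2 * si**2 for ai, si in zip(a, sigma))   # 31689.0
print(mean_rev, sqrt(var_rev))                              # 2325.0 ~178.01
```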
The Difference Between Two Random Variables
An important special case of a linear combination results from taking n = 2, a1 = 1, and a2 = -1:
COROLLARY
E(X1 - X2) = E(X1) - E(X2) and, if X1 and X2 are independent, V(X1 - X2) = V(X1) + V(X2).
Example 5.29
A certain automobile manufacturer equips a particular model with either a six-cylinder engine or a four-cylinder engine. Let X1 and X2 be fuel efficiencies for independently and randomly selected six-cylinder and four-cylinder cars, respectively. With μ1 = 22, μ2 = 26, σ1 = 1.2, and σ2 = 1.5,

E(X1 - X2) = μ1 - μ2 = 22 - 26 = -4
V(X1 - X2) = σ1² + σ2² = (1.2)² + (1.5)² = 3.69
σ_{X1-X2} = √3.69 = 1.92

If we relabel so that X1 refers to the four-cylinder car, then E(X1 - X2) = 4, but the variance of the difference is still 3.69.
The Case of Normal Random Variables
PROPOSITION
If X1, X2, ..., Xn are independent, normally distributed rv's (with possibly different means and/or variances), then any linear combination of the Xi's also has a normal distribution. In particular, the difference X1 - X2 between two independent, normally distributed variables is itself normally distributed.
Example 5.30
The total revenue from the sale of the three grades of gasoline on a particular day was Y = 1.2X1 + 1.35X2 + 1.5X3, and we calculated μY = 2325 and (assuming independence) σY = 178.01. If the Xi's are normally distributed, the probability that revenue exceeds 2500 is

$$P(Y > 2500) = P\left(Z > \frac{2500 - 2325}{178.01}\right) = P(Z > .98) = 1 - \Phi(.98) = .1635$$
Proofs for the case n = 2
For the result concerning expected values, suppose that X1 and X2 are continuous with joint pdf f(x1, x2). Then

$$E(a_1X_1 + a_2X_2) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}(a_1x_1 + a_2x_2)\,f(x_1,x_2)\,dx_1\,dx_2$$
$$= a_1\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} x_1\,f(x_1,x_2)\,dx_2\,dx_1 + a_2\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} x_2\,f(x_1,x_2)\,dx_1\,dx_2$$
$$= a_1\!\int_{-\infty}^{\infty} x_1\,f_{X_1}(x_1)\,dx_1 + a_2\!\int_{-\infty}^{\infty} x_2\,f_{X_2}(x_2)\,dx_2 = a_1E(X_1) + a_2E(X_2)$$

Summation replaces integration in the discrete case. The argument for the variance result does not require specifying whether either variable is discrete or continuous. Recalling that V(Y) = E[(Y - μ_Y)²],
$$V(a_1X_1 + a_2X_2) = E\{[a_1X_1 + a_2X_2 - (a_1\mu_1 + a_2\mu_2)]^2\}$$
$$= E\{a_1^2(X_1-\mu_1)^2 + a_2^2(X_2-\mu_2)^2 + 2a_1a_2(X_1-\mu_1)(X_2-\mu_2)\}$$

The expression inside the braces is a linear combination of the variables Y1 = (X1 - μ1)², Y2 = (X2 - μ2)², and Y3 = (X1 - μ1)(X2 - μ2), so carrying the E operation through to the three terms gives a1²V(X1) + a2²V(X2) + 2a1a2 Cov(X1, X2), as required.