Supplemental Material for Chapter 3
S3.1. Independent Random Variables
Preliminary Remarks
Readers encounter random variables throughout the textbook, where an informal definition of and notation for random variables is used. A random variable may be thought of informally as any variable whose measured or observed value depends on a random or chance mechanism; that is, the value of a random variable cannot be known in advance of actually observing the phenomenon. Formally, of course, a random variable is a function that assigns a real number to each outcome in the sample space of the observed phenomenon. Furthermore, it is customary to distinguish between the random variable and its observed value or realization by using an upper-case letter (say X) to denote the random variable and a lower-case letter (say x) to denote the actual numerical value resulting from an observation or measurement. This formal notation is not used in the book because (1) it is not widely employed in the statistical quality control field and (2) it is usually quite clear from the context whether we are discussing the random variable or its realization.
Independent Random Variables
In the textbook, we make frequent use of the concept of independent random variables.
Most readers have been exposed to this in a basic statistics course, but here a brief review
of the concept is given. For convenience, we consider only the case of continuous
random variables. For the case of discrete random variables, refer to Montgomery and
Runger (2011).
Often there will be two or more random variables that jointly describe some physical phenomenon of interest. For example, suppose we consider injection-molded components used to assemble a connector for an automotive application. To adequately describe the connector, we might need to study both the hole interior diameter and the wall thickness of the component. Let $x_1$ represent the hole interior diameter and $x_2$ represent the wall thickness. The joint probability distribution (or density function) of these two continuous random variables can be specified by providing a method for calculating the probability that $x_1$ and $x_2$ assume values in any region $R$ of two-dimensional space; the set of all possible values is often called the range space of the random variables. This is analogous to the probability density function for a single random variable. Let this joint probability density function be denoted by $f(x_1, x_2)$. The double integral of this joint probability density function over a specified region $R$ then gives the probability that $x_1$ and $x_2$ assume values in $R$.
A joint probability density function has the following properties:
a. $f(x_1, x_2) \ge 0$ for all $x_1, x_2$
b. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = 1$
c. For any region $R$ of two-dimensional space, $P\{(x_1, x_2) \in R\} = \iint_R f(x_1, x_2)\,dx_1\,dx_2$
The two random variables $x_1$ and $x_2$ are independent if $f(x_1, x_2) = f_1(x_1) f_2(x_2)$, where $f_1(x_1)$ and $f_2(x_2)$ are the marginal probability distributions of $x_1$ and $x_2$, respectively, defined as
$$ f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \qquad \text{and} \qquad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 $$
In general, if there are $p$ random variables $x_1, x_2, \ldots, x_p$, then the joint probability density function is $f(x_1, x_2, \ldots, x_p)$, with the properties:
a. $f(x_1, x_2, \ldots, x_p) \ge 0$ for all $x_1, x_2, \ldots, x_p$
b. $\int \cdots \int f(x_1, x_2, \ldots, x_p)\,dx_1\,dx_2 \cdots dx_p = 1$, where the integral is taken over the entire range space
c. For any region $R$ of $p$-dimensional space,
$$ P\{(x_1, x_2, \ldots, x_p) \in R\} = \int \cdots \int_R f(x_1, x_2, \ldots, x_p)\,dx_1\,dx_2 \cdots dx_p $$
The random variables $x_1, x_2, \ldots, x_p$ are independent if
$$ f(x_1, x_2, \ldots, x_p) = f_1(x_1) f_2(x_2) \cdots f_p(x_p) $$
where the $f_i(x_i)$ are the marginal probability distributions of $x_1, x_2, \ldots, x_p$, respectively, defined as
$$ f_i(x_i) = \int \cdots \int f(x_1, x_2, \ldots, x_p)\,dx_1\,dx_2 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_p $$
where the integration is over the range space of all variables except $x_i$.
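To make the factorization condition concrete, the short sketch below numerically checks the normalization property and the independence condition for a simple product-form joint density. It is a minimal illustration; the particular exponential joint density is a hypothetical example chosen for the sketch, not one taken from the text.

```python
# A minimal sketch (not from the text): numerically verify that a simple joint
# density integrates to 1 and factorizes into the product of its marginals.
import numpy as np
from scipy import integrate

# Hypothetical joint density on [0, inf) x [0, inf): f(x1, x2) = 2*exp(-x1 - 2*x2)
f = lambda x1, x2: 2.0 * np.exp(-x1 - 2.0 * x2)

# Property (b): the double integral over the range space equals 1.
total, _ = integrate.dblquad(lambda x2, x1: f(x1, x2), 0, np.inf, 0, np.inf)
print(total)  # ~1.0

# Marginals f1(x1) and f2(x2), obtained by integrating out the other variable.
f1 = lambda x1: integrate.quad(lambda x2: f(x1, x2), 0, np.inf)[0]
f2 = lambda x2: integrate.quad(lambda x1: f(x1, x2), 0, np.inf)[0]

# Independence: f(x1, x2) = f1(x1) * f2(x2) at a few test points.
for x1, x2 in [(0.5, 0.2), (1.0, 1.0), (2.3, 0.7)]:
    print(np.isclose(f(x1, x2), f1(x1) * f2(x2)))  # True for this product-form density
```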
S3.2. Development of the Poisson Distribution
The Poisson distribution is widely used in statistical quality control and improvement,
frequently as the underlying probability model for count data. As noted in Section 3.2.3
of the text, the Poisson distribution can be derived as a limiting form of the binomial
distribution, and it can also be developed from a probability argument based on a pure
birth process. We now give a summary of both developments.
The Poisson Distribution as a Limiting Form of the Binomial Distribution
Consider the binomial distribution
$$ p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n $$
Let $\lambda = np$, so that $p = \lambda/n$. We may now write the binomial distribution as
$$ p(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{\lambda^x}{x!}\,(1)\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{n-x} $$
Let $n \to \infty$ and $p \to 0$ so that $\lambda = np$ remains constant. The terms
$$ \left(1 - \frac{1}{n}\right),\ \left(1 - \frac{2}{n}\right),\ \ldots,\ \left(1 - \frac{x-1}{n}\right) \quad \text{and} \quad \left(1 - \frac{\lambda}{n}\right)^{-x} $$
all approach unity. Furthermore,
$$ \left(1 - \frac{\lambda}{n}\right)^{n} \to e^{-\lambda} \quad \text{as } n \to \infty $$
Thus, upon substitution we see that the limiting form of the binomial distribution is
$$ p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots $$
which is the Poisson distribution.
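The limit can also be checked numerically. The sketch below is a minimal illustration (not from the text): it holds $\lambda = np$ fixed while $n$ grows and compares the binomial and Poisson probability mass functions.

```python
# A minimal numerical sketch (not from the text) of the binomial-to-Poisson limit:
# hold lambda = n*p fixed, let n grow, and compare the pmfs at a range of counts x.
from scipy.stats import binom, poisson

lam = 4.0  # hypothetical value of lambda = n*p
for n in [10, 100, 1000, 10000]:
    p = lam / n
    # Maximum absolute difference between the binomial(n, p) and Poisson(lam) pmfs
    diff = max(abs(binom.pmf(x, n, p) - poisson.pmf(x, lam)) for x in range(20))
    print(f"n = {n:6d}   max |binomial - Poisson| = {diff:.2e}")
```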
Development of the Poisson Distribution from the Poisson Process
Consider a collection of time-oriented events, arbitrarily called “arrivals” or “births”. Let $x_t$ be the number of these arrivals or births that occur in the interval $[0, t)$. Note that the range space of $x_t$ is $R = \{0, 1, \ldots\}$. Assume that the numbers of births occurring in non-overlapping time intervals are independent random variables, and that there is a positive constant $\lambda$ such that for any small time interval $\Delta t$ the following statements are true:
1. The probability that exactly one birth will occur in an interval of length $\Delta t$ is $\lambda \Delta t$.
2. The probability that zero births will occur in the interval is $1 - \lambda \Delta t$.
3. The probability that more than one birth will occur in the interval is zero.
The parameter $\lambda$ is often called the mean arrival rate or the mean birth rate. This type of process, in which the probability of observing exactly one event in a small interval of time is constant (that is, the probability of an event occurring is directly proportional to the length of the time interval) and the occurrence of events in non-overlapping time intervals is independent, is called a Poisson process.
In the following, let
$$ P\{x_t = x\} = p_x(t), \qquad x = 0, 1, 2, \ldots $$
Suppose that there have been no births up to time $t$. The probability that there are no births by the end of time $t + \Delta t$ is then
$$ p_0(t + \Delta t) = (1 - \lambda \Delta t)\, p_0(t) $$
since there must be no births up to time $t$ and no births in the interval of length $\Delta t$ that follows.
Note that
$$ \frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda\, p_0(t) $$
so consequently
$$ \lim_{\Delta t \to 0} \frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = p_0'(t) = -\lambda\, p_0(t) $$
For $x > 0$ births by the end of time $t + \Delta t$ we have
$$ p_x(t + \Delta t) = p_{x-1}(t)\, \lambda \Delta t + (1 - \lambda \Delta t)\, p_x(t) $$
and
$$ \lim_{\Delta t \to 0} \frac{p_x(t + \Delta t) - p_x(t)}{\Delta t} = p_x'(t) = \lambda\, p_{x-1}(t) - \lambda\, p_x(t) $$
Thus we have a system of differential equations that describe the arrivals or births:
p0 (t )   p0 (t ) for x  0
px (t )   px 1 (t )   px (t ) for x  1, 2,...
The solution to this set of equations is
px (t ) 
( t ) x e   t
x  0,1, 2,...
x!
For a fixed value of $t$, this is the Poisson distribution with parameter $\lambda t$.
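The birth-process assumptions can also be examined by simulation. The sketch below is a minimal illustration (not from the text): it divides $[0, t)$ into many small intervals of length $\Delta t$, lets a birth occur in each interval with probability $\lambda \Delta t$, and compares the distribution of the total count with the Poisson distribution with parameter $\lambda t$.

```python
# A minimal simulation sketch (not from the text): approximate the birth process with
# Bernoulli trials on intervals of length dt; the count by time t is then
# Binomial(n_steps, lam*dt), which should be close to Poisson(lam*t) for small dt.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
lam, t, dt, reps = 2.0, 3.0, 0.001, 50_000  # hypothetical rate, horizon, step, replicates
n_steps = int(t / dt)

counts = rng.binomial(n_steps, lam * dt, size=reps)  # simulated number of births by time t

for x in range(5):
    print(f"x = {x}: simulated {np.mean(counts == x):.4f}   Poisson {poisson.pmf(x, lam * t):.4f}")
```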
S3.3. The Mean and Variance of the Normal Distribution
In Section 3.3.1 we introduce the normal distribution, with probability density function
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}, \qquad -\infty < x < \infty $$
and we stated that $\mu$ and $\sigma^2$ are the mean and variance, respectively, of the distribution.
We now show that this claim is correct.
Note that $f(x) \ge 0$. We first evaluate the integral $I = \int_{-\infty}^{\infty} f(x)\,dx$, showing that it is equal to 1. In the integral, change the variable of integration to $z = (x - \mu)/\sigma$. Then
$$ I = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz $$
Since $I \ge 0$, if $I^2 = 1$, then $I = 1$. Now we may write
$$ I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \frac{1}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy $$
If we switch to polar coordinates, then $x = r\cos\theta$, $y = r\sin\theta$, and
$$ I^2 = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} 1\, d\theta = \frac{1}{2\pi}(2\pi) = 1 $$
So we have shown that $f(x)$ has the properties of a probability density function.
The integrand obtained by the substitution $z = (x - \mu)/\sigma$ is, of course, the probability density function of the standard normal distribution, an important special case of the more general normal distribution.
The standard normal probability density function has a special notation, namely
$$ \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty $$
and the cumulative standard normal distribution is
$$ \Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt $$
Several useful properties of the standard normal distribution can be found by basic
calculus:
1. $\phi(-z) = \phi(z)$ for all real $z$, so $\phi(z)$ is an even function of $z$ (symmetric about 0)
2. $\phi'(z) = -z\,\phi(z)$
3. $\phi''(z) = (z^2 - 1)\,\phi(z)$
Consequently, $\phi(z)$ has a unique maximum at $z = 0$, inflection points at $z = \pm 1$, and both $\phi(z) \to 0$ and $\phi'(z) \to 0$ as $z \to \pm\infty$.
The mean and variance of the standard normal distribution are found as follows:
$$ E(z) = \int_{-\infty}^{\infty} z\,\phi(z)\,dz = \int_{-\infty}^{\infty} \left[-\phi'(z)\right] dz = -\phi(z)\,\Big|_{-\infty}^{\infty} = 0 $$
and
$$ E(z^2) = \int_{-\infty}^{\infty} z^2\,\phi(z)\,dz = \int_{-\infty}^{\infty} \left[\phi(z) + \phi''(z)\right] dz = \phi'(z)\,\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \phi(z)\,dz = 0 + 1 = 1 $$
Because the variance of a random variable can be expressed in terms of expectations as $\sigma^2 = E(z - \mu)^2 = E(z^2) - \mu^2$, we have shown that the mean and variance of the standard normal distribution are 0 and 1, respectively.
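The derivative identities in properties 2 and 3, which the calculation above relies on, can be confirmed symbolically. The sketch below is a minimal illustration (not from the text) using sympy.

```python
# A minimal symbolic sketch (not from the text): verify the derivative identities
# phi'(z) = -z*phi(z) and phi''(z) = (z**2 - 1)*phi(z) for the standard normal pdf.
import sympy as sp

z = sp.symbols('z', real=True)
phi = sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi)

print(sp.simplify(sp.diff(phi, z) + z * phi))              # 0
print(sp.simplify(sp.diff(phi, z, 2) - (z**2 - 1) * phi))  # 0
```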
Now consider the case where $x$ follows the more general normal distribution. Based on the substitution $z = (x - \mu)/\sigma$, we have
$$ E(x) = \int_{-\infty}^{\infty} x\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\, dx = \int_{-\infty}^{\infty} (\mu + \sigma z)\,\phi(z)\,dz = \mu \int_{-\infty}^{\infty} \phi(z)\,dz + \sigma \int_{-\infty}^{\infty} z\,\phi(z)\,dz = \mu(1) + \sigma(0) = \mu $$
and
$$ E(x^2) = \int_{-\infty}^{\infty} x^2\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\, dx = \int_{-\infty}^{\infty} (\mu + \sigma z)^2\,\phi(z)\,dz = \mu^2 \int_{-\infty}^{\infty} \phi(z)\,dz + 2\mu\sigma \int_{-\infty}^{\infty} z\,\phi(z)\,dz + \sigma^2 \int_{-\infty}^{\infty} z^2\,\phi(z)\,dz = \mu^2 + \sigma^2 $$
Therefore, it follows that $V(x) = E(x^2) - \mu^2 = (\mu^2 + \sigma^2) - \mu^2 = \sigma^2$.
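These moment calculations can also be confirmed numerically. The sketch below is a minimal illustration (not from the text) with arbitrarily chosen parameter values; it integrates $x f(x)$ and $x^2 f(x)$ directly.

```python
# A minimal numerical sketch (not from the text): integrate x*f(x) and x^2*f(x) for a
# normal density with hypothetical parameters and confirm E(x) = mu, V(x) = sigma^2.
import numpy as np
from scipy import integrate

mu, sigma = 3.0, 1.5
f = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

m1, _ = integrate.quad(lambda x: x * f(x), -np.inf, np.inf)        # first moment
m2, _ = integrate.quad(lambda x: x ** 2 * f(x), -np.inf, np.inf)   # second moment

print(m1, m2 - m1 ** 2)  # ~3.0 and ~2.25 (= sigma^2)
```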
S3.4. More about the Lognormal Distribution
The lognormal distribution is a general distribution of wide applicability. It is defined only for positive values of the random variable $x$, and its probability density function is
$$ f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}, \qquad x > 0 $$
The parameters of the lognormal distribution are $-\infty < \mu < \infty$ and $0 < \sigma^2 < \infty$. The lognormal random variable is related to the normal random variable in that $y = \ln x$ is normally distributed with mean $\mu$ and variance $\sigma^2$.
The mean and variance of the lognormal distribution are
$$ E(x) = \mu_x = e^{\mu + \frac{1}{2}\sigma^2} \qquad \text{and} \qquad V(x) = \sigma_x^2 = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) $$
The median and mode of the lognormal distribution are
$$ \tilde{x} = e^{\mu} \qquad \text{and} \qquad \text{mo} = e^{\mu - \sigma^2} $$
In general, the $k$th origin moment of the lognormal random variable is
$$ E(x^k) = e^{k\mu + \frac{1}{2}k^2\sigma^2} $$
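As a quick check, the sketch below (a minimal illustration with hypothetical parameter values, not from the text) compares these formulas with scipy.stats.lognorm, which is parameterized by shape $s = \sigma$ and scale $e^{\mu}$.

```python
# A minimal sketch (not from the text): check the mean/variance/median formulas against
# scipy.stats.lognorm, which uses shape s = sigma and scale = exp(mu).
import numpy as np
from scipy.stats import lognorm

mu, sigma = 1.0, 0.5  # hypothetical parameters
dist = lognorm(s=sigma, scale=np.exp(mu))

print(dist.mean(),   np.exp(mu + sigma**2 / 2))                         # E(x)
print(dist.var(),    np.exp(2*mu + sigma**2) * (np.exp(sigma**2) - 1))  # V(x)
print(dist.median(), np.exp(mu))                                        # median
```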
Like the gamma and Weibull distributions, the lognormal finds application in reliability
engineering, often as a model for survival time of components or systems. Some
important properties of the lognormal distribution are:
1. If $x_1$ and $x_2$ are independent lognormal random variables with parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, respectively, then $y = x_1 x_2$ is a lognormal random variable with parameters $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$ (see the simulation sketch following this list).
2. If $x_1, x_2, \ldots, x_k$ are independently and identically distributed lognormal random variables with parameters $\mu$ and $\sigma^2$, then the geometric mean of the $x_i$,
$$ \left(\prod_{i=1}^{k} x_i\right)^{1/k}, $$
has a lognormal distribution with parameters $\mu$ and $\sigma^2/k$.
3. If $x$ is a lognormal random variable with parameters $\mu$ and $\sigma^2$, and if $a$, $b$, and $c$ are constants such that $b = e^c$, then the random variable $y = b x^a$ has a lognormal distribution with parameters $c + a\mu$ and $a^2\sigma^2$.
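The following short Monte Carlo sketch (a minimal illustration with hypothetical parameter values, not from the text) checks property 1 by simulating the product of two independent lognormal random variables.

```python
# A minimal Monte Carlo sketch (not from the text) of property 1: the product of two
# independent lognormal random variables is lognormal with parameters mu1 + mu2 and
# sigma1^2 + sigma2^2 (equivalently, the log of the product is normal with those parameters).
import numpy as np

rng = np.random.default_rng(7)
mu1, s1, mu2, s2 = 0.5, 0.4, 1.2, 0.3  # hypothetical parameters
x1 = rng.lognormal(mean=mu1, sigma=s1, size=200_000)
x2 = rng.lognormal(mean=mu2, sigma=s2, size=200_000)

logy = np.log(x1 * x2)
print(logy.mean(), mu1 + mu2)       # ~1.7
print(logy.var(),  s1**2 + s2**2)   # ~0.25
```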
S3.5. More about the Gamma Distribution
The gamma distribution is introduced in Section 3.3.4. The gamma probability density
function is
$$ f(x) = \frac{\lambda}{\Gamma(r)}\, (\lambda x)^{r-1} e^{-\lambda x}, \qquad x \ge 0 $$
where $r > 0$ is a shape parameter and $\lambda > 0$ is a scale parameter. The parameter $r$ is called a shape parameter because it determines the basic shape of the graph of the density function. For example, if $r = 1$, the gamma distribution reduces to an exponential distribution. There are actually three basic shapes: $r < 1$ (hyperexponential), $r = 1$ (exponential), and $r > 1$ (unimodal with right skew).
The cumulative distribution function of the gamma is
$$ F(x; r, \lambda) = \int_0^x \frac{\lambda}{\Gamma(r)}\, (\lambda t)^{r-1} e^{-\lambda t}\, dt $$
The substitution $u = \lambda t$ in this integral results in $F(x; r, \lambda) = F(\lambda x; r, 1)$, which depends on $\lambda$ and $x$ only through the product $\lambda x$; that is, $\lambda$ simply rescales the measurement axis. We typically call such a parameter a scale parameter. It can be important to have a scale parameter in a probability distribution so that the results do not depend on the scale of measurement actually used. For example, suppose that we are measuring time in months and $\lambda = 1/6$ per month. The probability that $x$ is less than or equal to 12 months is $F(12(1/6); r, 1) = F(2; r, 1)$. If we wish to consider measuring time in weeks, then $\lambda = 1/24$ per week, and the probability that $x$ is less than or equal to 48 weeks is just $F(48(1/24); r, 1) = F(2; r, 1)$. Therefore, different scales of measurement can be accommodated by changing the scale parameter without having to change to a more general form of the distribution.
When r is an integer, the gamma distribution is sometimes called the Erlang distribution.
Another special case of the gamma distribution arises when we let $r = \tfrac{1}{2}, 1, \tfrac{3}{2}, 2, \ldots$ and $\lambda = \tfrac{1}{2}$; this is the chi-square distribution with degrees of freedom $r/\lambda = 1, 2, \ldots$. The chi-square distribution is very important in statistical inference.
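Both the scale-of-measurement argument and the chi-square special case are easy to check numerically. The sketch below is a minimal illustration (not from the text) using scipy.stats.gamma, which is parameterized by shape $a = r$ and scale $= 1/\lambda$, so the month/week example above corresponds to scale values of 6 and 24.

```python
# A minimal sketch (not from the text): with scipy.stats.gamma (shape a = r, scale = 1/lambda),
# probabilities depend only on x/scale, and a gamma with a = k/2, scale = 2 matches the
# chi-square distribution with k degrees of freedom.
from scipy.stats import gamma, chi2

r = 2.5  # hypothetical shape parameter
# Same probability whether time is measured in months (scale 6) or weeks (scale 24).
print(gamma.cdf(12, r, scale=6), gamma.cdf(48, r, scale=24), gamma.cdf(2, r, scale=1))

# Chi-square with k degrees of freedom as a special case of the gamma distribution.
k = 4
print(chi2.cdf(7.5, k), gamma.cdf(7.5, k / 2, scale=2))
```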
S3.6. The Failure Rate for the Exponential Distribution
The exponential distribution
$$ f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0 $$
was introduced in Section 3.3.3 of the text. The exponential distribution is frequently
used in reliability engineering as a model for the lifetime or time to failure of a
component or system. Generally, we define the reliability function of the unit as
$$ R(t) = P\{x > t\} = 1 - \int_0^t f(x)\,dx = 1 - F(t) $$
where, of course, F (t ) is the cumulative distribution function. In biomedical
applications, the reliability function is usually called the survival function. For the
exponential distribution, the reliability function is
$$ R(t) = e^{-\lambda t} $$
The Hazard Function
The mean and variance of a distribution are quite important in reliability applications, but
an additional property called the hazard function or the instantaneous failure rate is also
useful. The hazard function is the conditional density function of failure at time t, given
that the unit has survived until time t. Therefore, letting X denote the random variable
and x denote the realization,
$$ h(x) = f(x \mid X \ge x) = \lim_{\Delta x \to 0} \frac{F(x + \Delta x \mid X \ge x) - F(x \mid X \ge x)}{\Delta x} = \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x \mid X \ge x)}{\Delta x} $$
$$ = \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x,\ X \ge x)}{\Delta x\, P\{X \ge x\}} = \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x)}{\Delta x\,[1 - F(x)]} = \frac{f(x)}{1 - F(x)} $$
It turns out that specifying a hazard function completely determines the cumulative
distribution function (and vice versa).
The Hazard Function for the Exponential Distribution
For the exponential distribution, the hazard function is
$$ h(x) = \frac{f(x)}{1 - F(x)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda $$
That is, the hazard function for the exponential distribution is constant, or the failure rate
is just the reciprocal of the mean time to failure.
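A quick numerical check of this constancy is shown below. It is a minimal sketch with a hypothetical rate (not from the text), using scipy.stats.expon, which is parameterized by scale $= 1/\lambda$.

```python
# A minimal numerical sketch (not from the text): the hazard f(x)/(1 - F(x)) of an
# exponential distribution is constant and equal to lambda = 1/(mean time to failure).
import numpy as np
from scipy.stats import expon

lam = 0.5                      # hypothetical failure rate
dist = expon(scale=1 / lam)    # scipy parameterizes the exponential by scale = 1/lambda

x = np.array([0.1, 1.0, 5.0, 20.0])
hazard = dist.pdf(x) / dist.sf(x)   # sf(x) = 1 - F(x), the reliability function
print(hazard)                        # all ~0.5, independent of age x
```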
A constant failure rate implies that the reliability of the unit at time t does not depend on
its age. This may be a reasonable assumption for some types of units, such as electrical
components, but it’s probably unreasonable for mechanical components. It is probably
not a good assumption for many types of system-level products that are made up of many
components (such as an automobile). Generally, an increasing hazard function indicates
that the unit is more likely to fail in the next increment of time than it would have been in
an earlier increment of time of the same length. This is likely due to aging or wear.
Despite the apparent simplicity of its hazard function, the exponential distribution has
been an important distribution in reliability engineering. This is partly because the
constant failure rate assumption is probably not unreasonable over some region of the
unit’s life.
S3.7. The Failure Rate for the Weibull Distribution
The instantaneous failure rate or the hazard function was defined in Section S3.6 of the
Supplemental Text Material. For the Weibull distribution, the hazard function is
$$ h(x) = \frac{f(x)}{1 - F(x)} = \frac{(\beta/\theta)(x/\theta)^{\beta - 1}\, e^{-(x/\theta)^{\beta}}}{e^{-(x/\theta)^{\beta}}} = \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta - 1} $$
Note that if $\beta = 1$ the Weibull hazard function is constant. This should be no surprise, since for $\beta = 1$ the Weibull distribution reduces to the exponential. When $\beta > 1$, the Weibull hazard function increases, approaching $\infty$ as $x \to \infty$. Consequently, the Weibull is a fairly common choice as a model for components or systems that experience deterioration due to wear-out or fatigue. For the case where $\beta < 1$, the Weibull hazard function decreases, approaching 0 as $x \to \infty$.
For comparison purposes, note that the hazard function for the gamma distribution with parameters $r$ and $\lambda$ is also constant for the case $r = 1$ (the gamma also reduces to the exponential when $r = 1$). Also, when $r > 1$ the hazard function increases, and when $r < 1$ the hazard function decreases. However, when $r > 1$ the gamma hazard function approaches $\lambda$ from below, while if $r < 1$ it approaches $\lambda$ from above. Therefore, even though the graphs of the gamma and Weibull density functions can look very similar, and both distributions can produce reasonable fits to the same sample of data, they clearly have very different characteristics in terms of describing survival or reliability data.
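The contrast between the two hazard functions is easy to see numerically. The sketch below is a minimal illustration with hypothetical parameter values (not from the text); it evaluates $h(x) = f(x)/[1 - F(x)]$ for both families using scipy.

```python
# A minimal sketch (not from the text): compare Weibull and gamma hazard functions
# h(x) = f(x) / (1 - F(x)) for shape parameters below, at, and above 1.
import numpy as np
from scipy.stats import weibull_min, gamma

x = np.array([0.5, 1.0, 2.0, 5.0, 10.0])

def hazard(dist, x):
    # sf(x) = 1 - F(x)
    return dist.pdf(x) / dist.sf(x)

theta = 1.0  # hypothetical Weibull scale
for beta in (0.5, 1.0, 2.0):
    # Weibull hazards decrease (beta < 1), stay constant (beta = 1), or increase (beta > 1).
    print("Weibull beta =", beta, hazard(weibull_min(beta, scale=theta), x).round(3))

lam = 1.0    # hypothetical gamma rate; scipy uses scale = 1/lambda
for r in (0.5, 1.0, 2.0):
    # Gamma hazards approach lam = 1 as x grows, from above (r < 1) or from below (r > 1).
    print("gamma r =", r, hazard(gamma(r, scale=1 / lam), x).round(3))
```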