Chapter 5 Continuous Random Variables

A continuous random variable can take any numerical value in some interval. Assigning probabilities to individual values is not possible. Probabilities can be measured in a given range.

For a continuous random variable X with a numerical value of interest x, the cumulative distribution function (CDF) is denoted by:

\[ F(x) = P(X \le x) \]

Because \( P(X = x) = 0 \), it follows that \( P(X \le x) = P(X < x) \).

The probability density function (PDF) is denoted by \( f(x) \), with \( f(x) \ge 0 \) for all values of x.

For two numerical values a and b, with a < b, the probability that the outcome is in a range is:

\[ P(a \le X \le b) = P(a < X < b) = P(X \le b) - P(X \le a) = F(b) - F(a) \]

The properties of a probability density function can be illustrated with a special distribution called the uniform distribution. The uniform distribution over the interval [0, 1] has the PDF:

\[ f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

A graph of the probability density function is below.
Econ 325 – Chapter 5
The important properties of the PDF are:

• the total area under the PDF is equal to one.
• the area under the PDF to the left of the value a is \( F(a) \).

The next graph illustrates that the PDF can also be used to find a range probability. For a continuous random variable, the range probability \( P(a \le X \le b) = P(a < X < b) \) is the area under the PDF between the values a and b.

In general, the uniform distribution over the interval \( [x_{\min}, x_{\max}] \) has the PDF:

\[ f(x) = \begin{cases} \dfrac{1}{x_{\max} - x_{\min}} & \text{for } x_{\min} \le x \le x_{\max} \\ 0 & \text{otherwise} \end{cases} \]

For example, the graph below compares the probability density function for the uniform distribution over the interval [0, 1] and the uniform distribution over the interval [1/4, 3/4]. Again, note that the total area under a PDF is equal to one.
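The uniform PDF and CDF described above can be sketched in a few lines of Python. This is only an illustration: the function names `uniform_pdf` and `uniform_cdf` are assumptions made for this sketch and do not come from the course material.

```python
def uniform_pdf(x, x_min, x_max):
    """Height of the uniform PDF on [x_min, x_max] at the point x."""
    if x_min <= x <= x_max:
        return 1.0 / (x_max - x_min)
    return 0.0


def uniform_cdf(x, x_min, x_max):
    """F(x) = P(X <= x): area under the uniform PDF to the left of x."""
    if x < x_min:
        return 0.0
    if x > x_max:
        return 1.0
    return (x - x_min) / (x_max - x_min)


# Heights of the two PDFs at the common center 1/2.
height_wide = uniform_pdf(0.5, 0.0, 1.0)      # uniform over [0, 1]
height_narrow = uniform_pdf(0.5, 0.25, 0.75)  # uniform over [1/4, 3/4]

# For a uniform distribution the total area is (height) * (width).
area_wide = height_wide * (1.0 - 0.0)
area_narrow = height_narrow * (0.75 - 0.25)

print(height_wide, height_narrow)  # 1.0 2.0
print(area_wide, area_narrow)      # 1.0 1.0
```

The narrower interval has the taller PDF, yet both areas equal one, which is the first property listed above.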
By comparing the graphs of the PDFs for the uniform distribution over the interval [0, 1] and the uniform distribution over [1/4, 3/4] it can be seen that both are centered at 1/2. However, the two distributions have different dispersion. That is, the PDF for the uniform distribution over [1/4, 3/4] has a higher peak, which indicates smaller dispersion.

Example

An emergency rescue team operates on a 4-mile stretch of river. Let the random variable X be the distance (in miles) of an emergency from the northernmost point of this stretch of river. X follows a uniform distribution over the interval [0, 4] with PDF:

\[ f(x) = \begin{cases} 0.25 & \text{for } 0 \le x \le 4 \\ 0 & \text{otherwise} \end{cases} \]

Questions and answers:

• Find the probability that a given emergency arises within one mile of the northernmost point of this stretch of river.

A graph of the PDF illustrates the problem. The area of the shaded box is calculated as (height)∙(width). The answer is:

\[ P(X \le 1) = F(1) = (0.25)(1 - 0) = 0.25 \]

• The rescue team's base is at the mid-point of this stretch of river. Find the probability that a given emergency arises more than 1.5 miles from this base.

The base is at x = 2, so an emergency within 1.5 miles of the base lies between x = 0.5 and x = 3.5. First, calculate the probability that an emergency arises within 1.5 miles from the base. A graph of the PDF illustrates the problem. The range probability is:

\[ P(0.5 \le X \le 3.5) = (0.25)(3.5 - 0.5) = 0.75 \]

Also note:

\[ P(0.5 \le X \le 3.5) = F(3.5) - F(0.5) \]

Therefore, the probability that an emergency is outside the 1.5 mile limit is:

\[ 1 - P(0.5 \le X \le 3.5) = 1 - 0.75 = 0.25 \]

Another way of getting the answer is to calculate:

\[ P(X < 0.5) + P(X > 3.5) \]

The graph of the PDF shows:

\[ P(X < 0.5) = P(X > 3.5) \]

Therefore:

\[ P(X < 0.5) + P(X > 3.5) = 2\,P(X < 0.5) = 2\,(0.25)(0.5) = 0.25 \]
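The rescue-team answers can be checked numerically. The sketch below is one possible way to do it; the helper `uniform_cdf` and the variable names are assumptions introduced here for illustration.

```python
def uniform_cdf(x, x_min, x_max):
    """F(x) = P(X <= x) for a uniform distribution on [x_min, x_max]."""
    if x < x_min:
        return 0.0
    if x > x_max:
        return 1.0
    return (x - x_min) / (x_max - x_min)


X_MIN, X_MAX = 0.0, 4.0  # the 4-mile stretch of river

# Question 1: emergency within one mile of the northernmost point.
p_within_one_mile = uniform_cdf(1.0, X_MIN, X_MAX)

# Question 2: emergency more than 1.5 miles from the base at x = 2.
p_within_limit = uniform_cdf(3.5, X_MIN, X_MAX) - uniform_cdf(0.5, X_MIN, X_MAX)
p_outside_limit = 1.0 - p_within_limit

# Alternative route: add the two tail probabilities directly.
p_two_tails = uniform_cdf(0.5, X_MIN, X_MAX) + (1.0 - uniform_cdf(3.5, X_MIN, X_MAX))

print(p_within_one_mile)  # 0.25
print(p_within_limit)     # 0.75
print(p_outside_limit)    # 0.25
print(p_two_tails)        # 0.25
```

Both routes to the second answer agree, as the text argues from the symmetry of the two tails.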
Chapter 5.2 Expectations

Summary information about a probability distribution is provided by the mean and variance.

E(X) is the expected value of a random variable X. The expected value can be viewed as the average of the observed values from a "large" number of trials of a random experiment.

The mean of a random variable X is denoted by:

\[ \mu_X = E(X) \]

A measure of dispersion is the variance:

\[ \sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu_X)^2] = E(X^2) - \mu_X^2 \]

The standard deviation of a random variable X is defined as:

\[ \sigma_X = \sqrt{\sigma_X^2} = \sqrt{\mathrm{Var}(X)} \ge 0 \]

Recall the rules introduced for discrete random variables. That is, for constant fixed numbers a and b:

\[ E(a + bX) = a + b\,E(X) = a + b\,\mu_X \quad \text{and} \quad \mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X) \]

As a special case, the standardized random variable is defined as:

\[ Z = \frac{X - \mu_X}{\sigma_X} \]

The properties of Z are:

\[ E(Z) = E\!\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{1}{\sigma_X}\,E(X - \mu_X) = 0 \]

and

\[ \mathrm{Var}(Z) = \mathrm{Var}\!\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{1}{\sigma_X^2}\,\mathrm{Var}(X) = 1 \]

That is, the standardized random variable Z has mean 0 and variance 1.
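The standardization result can be illustrated with numbers. The sketch below treats a small, made-up list of values as an entire population (an assumption for illustration only, not data from the course) and checks that the standardized values have mean 0 and variance 1.

```python
from statistics import fmean, pstdev, pvariance

# Hypothetical population values, chosen only so the arithmetic is clean.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu_x = fmean(observations)      # population mean
sigma_x = pstdev(observations)  # population standard deviation

# Z = (X - mu_X) / sigma_X applied to every value.
z_values = [(x - mu_x) / sigma_x for x in observations]

mean_z = fmean(z_values)
var_z = pvariance(z_values)

print(mu_x, sigma_x)
print(mean_z, var_z)
```

Population formulas (`pstdev`, `pvariance`) are used deliberately so that the check mirrors the definitions of the mean and variance given above rather than sample estimates.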
Chapter 5.3 The Normal Distribution

The normal distribution is widely used in applied work.

The probability density function (PDF) for a normally distributed random variable X with mean \( \mu_X \) and variance \( \sigma_X^2 \) is:

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\,\exp\!\left(-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right) \quad \text{for } -\infty < x < \infty \]

where the function exp(a) is the exponential function \( e^a \).

The shape of the PDF is a symmetric, bell-shaped curve centered on the mean.

To state that a random variable X follows a normal distribution summarized by the parameters mean \( \mu_X \) and variance \( \sigma_X^2 \) the notation is:

\[ X \sim N(\mu_X, \sigma_X^2) \]

where the symbol \( \sim \) is read "is distributed as".

The cumulative distribution function (CDF) is:

\[ F(x) = P(X \le x) \]

The graph shows that the shaded area under the PDF to the left of the value a is the cumulative probability \( F(a) \).

Note: the total area under the PDF curve is equal to one.
For two values a and b, with a < b, the range probability is calculated from the CDF as:

\[ P(a < X < b) = F(b) - F(a) \]

The next graph shows that the shaded area under the PDF between the values a and b is the range probability.

A practical problem is that, for the normal distribution, there is no closed-form mathematical formula for computing cumulative probabilities. A quick solution is that computer software offers high accuracy methods for calculating probabilities.

With Microsoft Excel normal distribution probabilities can be obtained by selecting Insert Function NORM.DIST. The general usage is:

NORM.DIST(x, μ_X, σ_X, cumulative)

where cumulative = 0 for the PDF and cumulative = 1 for the CDF.
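For readers without Excel, a rough Python counterpart to NORM.DIST can be sketched with the standard library class `statistics.NormalDist`. The wrapper name `norm_dist` and its argument names are assumptions made for this sketch; only `NormalDist`, `pdf` and `cdf` are actual library names.

```python
from statistics import NormalDist


def norm_dist(x, mean, standard_dev, cumulative):
    """Mimic the Excel call NORM.DIST(x, mean, standard_dev, cumulative)."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    # cumulative = 1 returns the CDF F(x); cumulative = 0 returns the PDF f(x).
    return dist.cdf(x) if cumulative else dist.pdf(x)


def range_probability(a, b, mean, standard_dev):
    """P(a < X < b) = F(b) - F(a)."""
    return norm_dist(b, mean, standard_dev, 1) - norm_dist(a, mean, standard_dev, 1)


# Standard normal checks (mean 0, standard deviation 1).
print(round(norm_dist(0.0, 0.0, 1.0, 1), 4))             # 0.5
print(round(norm_dist(0.0, 0.0, 1.0, 0), 4))             # 0.3989
print(round(range_probability(-1.0, 1.0, 0.0, 1.0), 4))  # 0.6827
```

Note that, like NORM.DIST, the third argument is the standard deviation and not the variance.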
Working with the Normal Distribution

Before the days of high speed laptop computers, applied workers used statistical tables (printed in the Appendix to statistical textbooks) to look up normal distribution probabilities.

Working with the statistical tables can be useful as a learning exercise as it gives emphasis to understanding the properties of the normal distribution. Therefore, as a check on the calculations that can be obtained with Microsoft Excel, the use of the normal distribution tables will be described here.

It can be noted that probabilities depend on the setting of \( \mu_X \) and \( \sigma_X \), the mean and standard deviation of the random variable. However, it turns out that probabilities for the standard normal random variable Z with mean 0 and variance 1 can be used to calculate probabilities for any other normal distribution.

Textbooks provide an appendix table for the cumulative distribution function (CDF) for the standard normal random variable:

\[ Z = \frac{X - \mu_X}{\sigma_X} \sim N(0, 1) \]

For a value of interest \( z_0 \) the table gives the cumulative probability:

\[ F(z_0) = P(Z \le z_0) \]

How is the table read? A graph is useful.

The table lists values for \( z_0 \ge 0 \) only. From symmetry of the normal distribution:

\[ F(-z_0) = P(Z \le -z_0) = P(Z \ge z_0) = 1 - F(z_0) \]
A result for a range probability with symmetric upper and lower values can be stated. For some positive value \( z_0 \):

\[ P(-z_0 \le Z \le z_0) = P(Z \le z_0) - P(Z \le -z_0) = F(z_0) - [1 - F(z_0)] = 2F(z_0) - 1 \]

This is shown with a graph. By symmetry of the normal distribution the area in the "lower tail" is identical to the area in the "upper tail."

Now suppose the random variable to work with is:

\[ X \sim N(\mu, \sigma^2) \]

For two numerical values a and b, with a < b, a probability of interest is:

\[ P(a < X < b) \]

This probability statement can be transformed to a probability statement about the standard normal random variable Z. This is done as follows:

\[ P(a < X < b) = P\!\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = P\!\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = F\!\left(\frac{b - \mu}{\sigma}\right) - F\!\left(\frac{a - \mu}{\sigma}\right) \]

The Appendix Table can now be used to look up the cumulative probabilities for the standard normal distribution.
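The transformation to Z can also be followed step by step in code. The sketch below computes the standard normal CDF from the error function `math.erf`, which plays the role of the Appendix Table here; the function names and the illustrative values μ = 10, σ = 2 are assumptions introduced for this example.

```python
import math


def standard_normal_cdf(z):
    """F(z) = P(Z <= z) for Z ~ N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_range_probability(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2) by standardizing both limits."""
    z_low = (a - mu) / sigma
    z_high = (b - mu) / sigma
    return standard_normal_cdf(z_high) - standard_normal_cdf(z_low)


def symmetric_range_probability(z0):
    """P(-z0 <= Z <= z0) = 2 F(z0) - 1 for a positive z0."""
    return 2.0 * standard_normal_cdf(z0) - 1.0


# Symmetry of the lower and upper tails: F(-z0) = 1 - F(z0).
z0 = 1.0
lower_tail = standard_normal_cdf(-z0)
upper_tail = 1.0 - standard_normal_cdf(z0)

# Hypothetical X ~ N(10, 2^2): the range 8 to 12 is one standard
# deviation either side of the mean, so it matches z0 = 1.
p_range = normal_range_probability(8.0, 12.0, 10.0, 2.0)
p_symmetric = symmetric_range_probability(z0)

print(round(lower_tail, 4), round(upper_tail, 4))
print(round(p_range, 4), round(p_symmetric, 4))
```

The two printed range probabilities should agree, since standardizing the limits 8 and 12 gives exactly -1 and 1.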
Example

Let the continuous random variable X be the amount of money spent on clothing in a year by a university student. It is known that:

\[ X \sim N(\mu, \sigma^2) \quad \text{with } \mu = \$380 \text{ and } \sigma = \$50 \]

Questions and Answers

• Find \( P(X < 400) \).

This gives the probability that a randomly chosen student will spend less than $400 on clothing in a year.

First state the probability as a probability about the standard normal variable Z:

\[ P(X < 400) = P\!\left(\frac{X - \mu}{\sigma} < \frac{400 - \mu}{\sigma}\right) = P\!\left(Z < \frac{400 - 380}{50}\right) = P(Z < 0.4) = F(0.4) \]

Now look up the answer in the Appendix Table. The table gives:

\[ F(0.4) = 0.6554 \]

Therefore, \( P(X < 400) = 0.6554 \).

A graph gives a helpful illustration of the use of the statistical tables for this problem.

Now check the answer with Microsoft Excel by selecting Insert Function:

NORM.DIST(x, μ_X, σ_X, cumulative)

Enter the values:

NORM.DIST(400, 380, 50, 1)

This returns the probability: 0.6554
• Find \( P(X > 360) \).

This gives the probability that a randomly chosen student will spend more than $360 on clothing in a year.

Express the problem in the form of a probability statement about the standard normal variable Z:

\[ P(X > 360) = P\!\left(\frac{X - \mu}{\sigma} > \frac{360 - \mu}{\sigma}\right) = P\!\left(Z > \frac{360 - 380}{50}\right) = P(Z > -0.4) \]

\[ P(Z > -0.4) = P(Z < 0.4) \ \text{(by symmetry)} = F(0.4) \]

This is identical to the probability calculated for \( P(X < 400) \). That is,

\[ P(X > 360) = P(X < 400) = 0.6554 \]

This result holds since the normal distribution is symmetric about the mean \( \mu = \$380 \).

The graph below demonstrates that because of symmetry about the mean:

\[ P(X > 360) = P(X < 400) \]

Also,

\[ P(X < 360) = P(X > 400) \]
• Find \( P(300 < X < 400) \).

This gives the probability that a randomly chosen student will spend between $300 and $400 on clothing in a year.

The range probability is calculated as:

\[ P(300 < X < 400) = P(X < 400) - P(X < 300) \]

A graph gives a helpful picture of the calculations.

From the previous calculations:

\[ P(X < 400) = 0.6554 \]

Now find:

\[ P(X < 300) = P\!\left(\frac{X - \mu}{\sigma} < \frac{300 - \mu}{\sigma}\right) = P\!\left(Z < \frac{300 - 380}{50}\right) = P(Z < -1.6) \]

\[ P(Z < -1.6) = 1 - P(Z < 1.6) \ \text{(by symmetry)} = 1 - F(1.6) \]

A look-up in the Appendix Table gives:

\[ F(1.6) = 0.9452 \]

The answer is:

\[ P(300 < X < 400) = 0.6554 - (1 - 0.9452) = 0.60 \]
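The three clothing-expenditure answers can be reproduced with Python's standard library as a cross-check on the table look-ups. This is a sketch only; the variable names are assumptions for illustration, and `statistics.NormalDist` takes the standard deviation (50), not the variance.

```python
from statistics import NormalDist

# X ~ N(380, 50^2): yearly clothing expenditure in dollars.
clothing = NormalDist(mu=380, sigma=50)

# P(X < 400)
p_less_400 = clothing.cdf(400)

# P(X > 360) = 1 - P(X < 360)
p_more_360 = 1.0 - clothing.cdf(360)

# P(300 < X < 400) = F(400) - F(300)
p_300_to_400 = clothing.cdf(400) - clothing.cdf(300)

print(round(p_less_400, 4))    # 0.6554
print(round(p_more_360, 4))    # 0.6554
print(round(p_300_to_400, 4))  # 0.6006
```

The last value rounds to 0.60, matching the table-based answer.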
• Finding Cutoff Points or Critical Values

A problem that has been presented is: What is the probability that values will occur in some range?

Another problem is: What numerical value corresponds to a probability of 10%? That is, find the value b such that:

\[ P(X > b) = 0.10 \]

A graph of the problem is below.

Note: the upper tail probability can be set to any level of interest. The value of 10% is chosen here.

A probability result is:

\[ P(X < b) = 1 - P(X > b) \]

Therefore, as shown in the above graph, the problem is to find the value b such that:

\[ P(X < b) = 0.90 \]

A result is:

\[ P(X < b) = P\!\left(Z < \frac{b - \mu}{\sigma}\right) = F\!\left(\frac{b - \mu}{\sigma}\right) \]

The Appendix Table gives \( F(1.28) = 0.90 \) (some approximation was used). Therefore,

\[ \frac{b - \mu}{\sigma} = 1.28 \]

Rearranging gives:

\[ b = \mu + 1.28\,\sigma \]
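The cutoff calculation can be sketched in Python using the inverse CDF method `inv_cdf` of `statistics.NormalDist`. The function name `upper_tail_cutoff` and the illustrative values μ = 100 and σ = 15 are assumptions made for this sketch, not values from the course material.

```python
from statistics import NormalDist


def upper_tail_cutoff(mu, sigma, upper_tail_prob):
    """Return b such that P(X > b) = upper_tail_prob for X ~ N(mu, sigma^2)."""
    # P(X > b) = p is equivalent to P(X < b) = 1 - p.
    return NormalDist(mu=mu, sigma=sigma).inv_cdf(1.0 - upper_tail_prob)


# Standard normal cutoff z0 with F(z0) = 0.90 (the table value is about 1.28).
z0 = NormalDist().inv_cdf(0.90)

# Hypothetical example: X ~ N(100, 15^2) with a 10% upper tail.
mu, sigma = 100.0, 15.0
b = upper_tail_cutoff(mu, sigma, 0.10)

print(round(z0, 4))                      # 1.2816
print(round(b, 2))                       # 119.22
print(abs(b - (mu + z0 * sigma)) < 1e-9)  # True: b = mu + z0 * sigma
```

Using the unrounded z0 instead of the table value 1.28 changes b only slightly, which is the approximation the text mentions.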
The cutoff point (or critical value) b can be computed with Microsoft Excel with the function:

NORM.INV(probability, μ_X, σ_X)

Cutoff points from the standard normal distribution are computed with the function:

NORM.S.INV(probability)

For example, to find the value \( z_0 \) such that \( F(z_0) = 0.90 \) with Microsoft Excel select Insert Function:

NORM.S.INV(0.9) or NORM.INV(0.9, 0, 1)

Both these functions return the answer \( z_0 = 1.2816 \).

Example: student clothing expenditure exercise (continued)

• Find a range of dollar clothing expenditure that includes 80% of all students.

Any number of ranges can be found. That is, a variety of values \( x_0 \) and \( x_1 \) with \( x_0 < 380 \) and \( x_1 > 380 \) will satisfy:

\[ P(x_0 < X < x_1) = 0.80 \]

The shortest range is centered at the mean $380. To calculate this range, find a number a such that:

\[ P(380 - a < X < 380 + a) = 0.80 \]

This is illustrated with a graph.
By inspecting the graph, it can be seen that an equivalent statement of the problem is: find a number a such that:

\[ P(X < 380 + a) = 0.90 \]

This holds because a central range containing 80% of students leaves 10% in each tail.

To work with the standard normal distribution consider:

\[ P(X < 380 + a) = P\!\left(\frac{X - \mu}{\sigma} < \frac{(380 + a) - 380}{50}\right) = P\!\left(Z < \frac{a}{50}\right) = F\!\left(\frac{a}{50}\right) \]

The Appendix Table gives \( F(1.28) = 0.90 \). Therefore,

\[ \frac{a}{50} = 1.28 \quad \text{and} \quad a = (1.28)(50) = 64 \]

The range centered at $380 is:

[$380 - 64, $380 + 64] = [$316, $444]

As a check on the calculations, the upper limit can be calculated with Microsoft Excel by using the function:

NORM.INV(0.9, 380, 50)

Chapter 5.6 Jointly Distributed Continuous Random Variables

Results stated earlier for jointly distributed discrete random variables can be extended to work with continuous random variables.

Let X and Y be two continuous random variables that take numeric values denoted by x and y, respectively. The joint cumulative distribution function (CDF) is:

\[ F_{X,Y}(x, y) = P(X \le x \text{ and } Y \le y) \]

The marginal distribution functions are:

\[ F_X(x) = P(X \le x) \quad \text{and} \quad F_Y(y) = P(Y \le y) \]

X and Y are statistically independent if and only if:

\[ F_{X,Y}(x, y) = F_X(x)\,F_Y(y) \]
A measure of linear association is covariance:

\[ \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X\,\mu_Y \]

where \( \mu_X = E(X) \) and \( \mu_Y = E(Y) \).

If X and Y are independent then \( \mathrm{Cov}(X, Y) = 0 \). However, zero covariance does not guarantee independence. X and Y may have some complicated non-linear relationship.

• Special Case: If X and Y are joint normally distributed random variables then zero covariance also gives the result that X and Y are independent.

• Linear Combinations of Random Variables

For constant fixed numbers a and b, a linear combination of random variables X and Y is:

\[ W = aX + bY \]

The mean of the random variable W is:

\[ \mu_W = E(W) = a\,E(X) + b\,E(Y) \]

The variance of W is:

\[ \sigma_W^2 = \mathrm{Var}(W) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y) \]

• Special Case: If X and Y are joint normally distributed random variables then \( W = aX + bY \) is also normally distributed with mean and variance as given above. That is,

\[ W \sim N(\mu_W, \sigma_W^2) \]
Now consider three random variables \( X_1 \), \( X_2 \) and \( X_3 \) with means \( \mu_1 \), \( \mu_2 \) and \( \mu_3 \) and variances \( \sigma_1^2 \), \( \sigma_2^2 \) and \( \sigma_3^2 \). The sum of these random variables has the properties:

\[ E(X_1 + X_2 + X_3) = \mu_1 + \mu_2 + \mu_3 \]

and

\[ \mathrm{Var}(X_1 + X_2 + X_3) = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + 2\,\mathrm{Cov}(X_1, X_2) + 2\,\mathrm{Cov}(X_1, X_3) + 2\,\mathrm{Cov}(X_2, X_3) \]

With independence the covariance between every pair of these random variables is zero to give a simpler result for the variance of the sum:

\[ \mathrm{Var}(X_1 + X_2 + X_3) = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 \]

Example: Portfolio Analysis

The random variables X and Y are the share prices of two companies trading on the stock market such that

\[ X \sim N(25, 81) \quad \text{and} \quad Y \sim N(40, 121) \]

so that \( \mu_X = 25 \), \( \sigma_X^2 = 81 \), \( \mu_Y = 40 \) and \( \sigma_Y^2 = 121 \).

The correlation between the two stock prices is:

\[ \rho_{XY} = -0.4 \]

A portfolio is the random variable:

\[ W = 20X + 30Y \]

Find the probability that the portfolio value exceeds 2,000.

W is a linear combination of normal random variables and therefore W also follows a normal distribution. The mean of W is found as:

\[ E(W) = 20\,E(X) + 30\,E(Y) = 20 \cdot 25 + 30 \cdot 40 = 1700 \]
Recall the definition of correlation:

\[ \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} \]

By rearranging the covariance can be calculated as:

\[ \mathrm{Cov}(X, Y) = \rho_{XY}\,\sigma_X\,\sigma_Y = -0.4 \cdot \sqrt{81} \cdot \sqrt{121} = -39.6 \]

The variance of W is found as:

\[ \mathrm{Var}(W) = 20^2\,\mathrm{Var}(X) + 30^2\,\mathrm{Var}(Y) + 2 \cdot 20 \cdot 30 \cdot \mathrm{Cov}(X, Y) = 20^2 \cdot 81 + 30^2 \cdot 121 + 2 \cdot 20 \cdot 30 \cdot (-39.6) = 93780 \]

The standard deviation of W is:

\[ \sigma_W = \sqrt{93780} = 306.235 \]

To find \( P(W < 2000) \) with Microsoft Excel select Insert Function:

NORM.DIST(2000, 1700, 306.235, 1)

where the arguments are x, μ_W, σ_W and cumulative. This returns the probability 0.8364.

Therefore, the probability that the portfolio value exceeds 2,000 is:

\[ 1 - 0.8364 = 0.16 \]

Now check this result using the table for the standard normal distribution. Write the probability for W as a probability for Z:

\[ P(W < 2000) = P\!\left(\frac{W - \mu_W}{\sigma_W} < \frac{2000 - \mu_W}{\sigma_W}\right) = P\!\left(Z < \frac{2000 - 1700}{306.235}\right) = P(Z < 0.98) \]

\[ P(Z < 0.98) = F(0.98) = 0.8365 \ \text{(look up in the Appendix Table)} \]

The use of the standard normal distribution table may give slight rounding differences in results compared to Microsoft Excel.
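The whole portfolio calculation can be traced in a short Python sketch. The variable names are assumptions chosen for this illustration; the inputs are the values stated in the example, with the correlation taken as -0.4 (the negative sign is what reproduces Var(W) = 93780).

```python
import math
from statistics import NormalDist

# Inputs from the example: X ~ N(25, 81), Y ~ N(40, 121), W = 20 X + 30 Y.
mu_x, var_x = 25.0, 81.0
mu_y, var_y = 40.0, 121.0
rho_xy = -0.4
a, b = 20.0, 30.0

# Cov(X, Y) = rho_XY * sigma_X * sigma_Y
cov_xy = rho_xy * math.sqrt(var_x) * math.sqrt(var_y)

# Mean and variance of the linear combination W = a X + b Y.
mean_w = a * mu_x + b * mu_y
var_w = a**2 * var_x + b**2 * var_y + 2.0 * a * b * cov_xy
sd_w = math.sqrt(var_w)

# P(W > 2000) = 1 - F(2000) under W ~ N(mean_w, var_w).
p_below_2000 = NormalDist(mu=mean_w, sigma=sd_w).cdf(2000.0)
p_exceeds_2000 = 1.0 - p_below_2000

print(round(cov_xy, 1))          # -39.6
print(round(mean_w, 1))          # 1700.0
print(round(var_w, 1))           # 93780.0
print(round(sd_w, 3))            # 306.235
print(round(p_below_2000, 4))    # 0.8364
print(round(p_exceeds_2000, 4))  # 0.1636
```

The small gap between 0.8364 here and the table value 0.8365 comes from rounding z to 0.98 before the look-up.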