Download pd - Dartmouth Math Home

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Mathematics 50
Probability and Statistical Inference
Winter 1999
1. Univariate Continuous Random Variables
(continued)
Week #3
January 18-22
1. Delta method
is for approximate calculation of the variance.
Let X be a random variable and g any function. The exact calculation of mean and variance of
g(X) involves integral which may be cumbersome. Can we approximate mean and variance?
Yes, using Taylor series expansion.
Denote E(X) = ¹ and var(X) = ¾ 2 :
Mean: Using Taylor series expansion around x0 = ¹ we obtain
¯
dg ¯¯
g(X) ' g(¹) + (X ¡ ¹)
¯
dx ¯x=¹
= g(¹) + (X ¡ ¹)g 0 (¹):
Take expectation:
E(g(X)) ' E(g(¹)) + E ((X ¡ ¹)g 0 (¹))
= g(¹) + g 0 (¹)E(X ¡ ¹)
= g(¹) + g 0 (¹) ¢ 0
= g(¹)
Thus,
E(g(X)) ' g(¹):
Variance: Again using Taylor series expansion
g(X) ' g(¹) + (X ¡ ¹)g 0 (¹);
and using the above,
var(g(X)) = E (g(X) ¡ E(g(X)))2
' E (g(X) ¡ g(¹))2
(1.1)
2
' E ((X ¡ ¹)g 0 (¹))
³
= E (X ¡ ¹)2 (g 0 (¹))2
= (g 0 (¹))2 E(X ¡ ¹)2
´
= ¾ 2 (g 0 (¹))2
Thus,
var(g(X)) ' ¾ 2 (g 0 (¹))2 :
(1.2)
Prove that (1.1) and (1.2) are exact if g is a linear function.
Example. Let X » N (¹; ¾ 2 ). Approximate mean and variance of eX :
Solution. We have
g(x) = ex ; g 0 (x) = ex
Therefore,
EeX ' e¹ ;
var(eX ) ' ¾ 2 e2¹ :
Exact mean can be calculated (eX has lognormal distribution),
2
EeX = e¹+:5¾ :
4
3
2
1
-1
-0.5
0
0.5
1
Delta-method to approximate the mean of the lognormal distribution.
Solid: exact mean (¾ = 1)
Dashed line: exact mean (¾ = :5)
Dotted line: approximate mean.
For lognormal distribution
var(eX ) = Ee2X ¡ E 2 (eX )
= EeZ ¡ E 2 (eX )
= EeZ ¡ e2(¹+:5¾
2
2)
where Z » N(2¹; 4¾ 2 ): Hence
2
EeZ = e2¹+2¾ :
Continue,
2
var(eX ) = e2¹+2¾ ¡ e2¹+¾
2
2
2
= e2¹+¾ (e¾ ¡ 1)
2.5
2
1.5
1
0.5
-1
00
-0.5
0.5
1
Delta-method to approximate the variance of the lognormal distribution.
Solid line: exact variance (¾ = :5)
Dotted line: approximate variance.
2. Normal distribution
is the most important distribution in statistics!
Sometimes it is called Gaussian by the name of German mathematician Carl Gauss.
Jargon: A normally distributed RV is called normal RV.
This distribution has two parameters, mean (¹) and variance (¾ 2 > 0): If CRV X has normal
distribution we write
X » N(¹; ¾ 2 ):
The density of X is
1
2
1
Á(x; ¹; ¾ 2 ) = p e¡ 2¾2 (x¡¹)
¾ 2¼
and the distribution function is
2
©(x; ¹; ¾ ) =
Z
1
2
1
p e¡ 2¾2 (t¡¹) dt:
¡1 ¾ 2¼
x
The density of normally distributed variable is bell-shaped.
3
0.4
0.3
0.2
0.1
-4
-2
00
2
x
4
Three normal densities with unit variance (¾ 2 = 1) and
means ¹ =-1 (dotted)
¹ =0 (solid), ¹ =1 (dashed).
Parameter ¹ determines the location.
0.4
0.3
0.2
0.1
-4
-2
00
2
x
4
Three normal densities with zero mean and di®erent variance:
¾ 2 = 1 (solid), ¾ 2 = 2 (dotted), ¾ 2 = 4 dotted.
Variance (SD) de¯nes the shape (the width of the bell).
Mean and variance of the normal distribution: if X » N(¹; ¾ 2 ) then
E(X) = ¹ and var(X) = ¾ 2 :
We have for the mean:
Z
1
1
2
t
p e¡ 2¾2 (t¡¹) dt
¡1 ¾ 2¼
1
2
1 Z1
p
=
(u + ¹)e¡ 2¾2 u du
¾ 2¼ ¡1
1 Z 1 ¡ 12 u2
1 Z 1 ¡ 12 u2
p
=
ue 2¾ du + ¹ p
e 2¾ du
¾ 2¼ ¡1
¾ 2¼ ¡1
= 0 + ¹ £ 1 = ¹:
E(X) =
4
Use integration by part to prove var(X) = ¾ 2 :
Linear transformation of a variable with normal distribution leads again to a normal random
variable.
Proof. Let X » N (¹; ¾ 2 ) and Y = aX + b: Then using
1
fY (y) = fX
a
we obtain
Ã
y¡b
a
!
y¡b
1
2
1
1 1
¡ 1 (y¡(a¹+b))2
p e¡ 2¾2 ( a ¡¹) =
p e 2(¾a)2
a ¾ 2¼
(¾a) 2¼
Denote
¾Y2 = (¾a)2 and ¹Y = a¹ + b
we see that Y has a normal distribution with mean a¹ + b and variance a2 ¾ 2 :
Fact to remember:
if X » N (¹; ¾ 2 ) and Y = aX + b then Y » N(a¹ + b; a2 ¾ 2 ):
Standard normal distribution N (0; 1): If X » N(¹; ¾ 2 ) then
Z=
X ¡¹
¾
is called Z-score or standardized normal random variable, and
Z » N(0; 1)
i.e. it has standard normal distribution.
The standard normal density is
1 2
1
Á(x) = p e¡ 2 x
2¼
and the standard normal distribution is
1 Z x ¡ 1 t2
p
©(x) =
e 2 dt:
2¼ ¡1
Tables are provided in the Appendix A (Rice, p.A7).
It implies that if X » N(¹; ¾ 2 ) then
µ
¶
a¡¹
Pr(X · a) = FX (a) = ©
:
¾
Problem 1. Let X » N (0; 1), ¯nd the median, and the lower and upper quartiles.
Solution. Since X is symmetric, median=mode=mean=0. Also since X is symmetric lower
quartile=-upper quartile. From Table 2 we ¯nd that approximately the upper quartile, x:75 = :57;
and x:25 = ¡:57: Therefore, Pr(¡:57 < X < :57) = :5:
Problem 2. Let X » N(1; 4); ¯nd Pr(jXj < 2):
5
0.2
0.15
0.1
0.05
-4
-2
00
2
x
4
6
Density of X » N (1; 4):
Solution. We have
Pr(jXj < 2) = Pr(¡2 < X < 2)
= Pr(X < 2) ¡ Pr(X < ¡2)
¶
µ
¶
µ
¡2 ¡ 1
2¡1
¡©
= ©
2
2
= © (:5) ¡ © (¡1:5)
Now look up Table 2 (Rice, A7).
© (:5) = :6915;
© (¡1:5) = 1 ¡ © (1:5)
= 1 ¡ :9332 = :0668
Finally,
Pr(jXj < 2) = :6915 ¡ :0668 = :6247:
Problem 3. Given X » N(¹; ¾ 2 ); ¯nd the distribution and the density function of Y = X 2 .
Solution. Warning: we cannot use the formula for the distribution and density derived above
because g(x) = x2 is not monotone. In order to ¯nd the distribution of Y we proceed directly from
the de¯nition of the distribution function (sometimes this method is called "method of distribution
function"):
p
p
FY (y) = Pr(Y · y) = Pr(X 2 · y) = Pr(¡ y · X · y)
p
p
= Pr(X · y) ¡ Pr X < ¡ y):
where y > 0: Recall, if X » N(¹; ¾ 2 )
µ
a¡¹
Pr(X < a) = ©
¾
6
¶
where © is the standard normal distribution. Therefore,
Ãp
!
à p
!
y¡¹
¡ y¡¹
p
p
Pr(¡ y · X · y) = ©
¡©
¾
¾
and the distribution function of Y is
Pr(Y · y) = ©
!
à p
!
y¡¹
¡ y¡¹
¡©
:
¾
¾
Ãp
Take derivative with respect to y and obtain the density (use ©0 = Á and the chain rule)
Ãp
!
à p
!
¡ y¡¹
y¡¹
1
1
+ p Á
p Á
2¾ y
¾
2¾ y
¾
" Ãp
!
à p
!#
y¡¹
¡ y¡¹
1
=
+Á
p Á
2¾ y
¾
¾
If ¹ = 0 then the density is simpler
1
p Á
¾ y
or
¡1
1
p
e 2
¾ 2¼y
Ãp !
³ p ´2
y
¾
y
¾
y
1
= p
e¡ 2¾2
¾ 2¼y
2.5
2
1.5
1
0.5
00
1
2
y
3
4
Density function of X 2 where X » N(0; 1):
Problem (continued). You need to measure the area of the square. You measured the side of
the square, 10 feet with the measurement error SD=1. Assuming the measurement follows normal
distribution, what is the probability that the area of the square is more than:
(a) 100 sq. feet,
(b) (10+1)2 = 121 sq. feet?
Solution: Let X be the length of the square side. We know that X » N (¹; ¾ 2 ) where ¹ =
10; ¾ = 1: The ¯rst probability is
Pr(X 2 > 100) = 1 ¡ Pr(X 2 · 100) = 1 ¡ FY (100)
7
where Y = X 2 : But
Ãp
!
à p
!
y¡¹
¡ y¡¹
FY (100) = ©
¡©
¾
¾
!
à p
!
Ãp
100 ¡ 10
¡ 100 ¡ 10
¡©
= ©
1
1
= © (0) ¡ © (¡20)
' © (0)
1
=
:
2
Finally,
1 ¡ FY (100) = 1 ¡
1
1
= :
2
2
The second probability is
Pr(X 2 > 121) = 1 ¡ FY (121):
But
Ãp
!
à p
!
121 ¡ 10
¡ 121 ¡ 10
¡©
FY (121) = ©
1
1
= © (1) ¡ ©(¡21)
' © (1)
= :8413:
Finally,
Pr(X 2 > 121) ' 1 ¡ :8413
= :1587
What assumption is week?
What is unrealistic in this problem?
How would you reformulate this problem to make it more realistic?
Problem 4. X is normally distributed with mean=¹ and variance ¾ 2 : Find the probabilities
Pr(jX ¡ ¹j < 2¾) and Pr(jX ¡ ¹j < 3¾):
Solution.
Pr(jX ¡ ¹j < 2¾)
= 1 ¡ 2 ¢ © (¡2) = 0:9544997
' 0:954
8
The rule of 2 sigma:
if RV is normally distributed then with probability
:95
the range of RV is plus/minus 2 sigma around the mean.
Pr(jX ¡ ¹j < 3¾)
= 1 ¡ 2 ¢ © (¡3) = 0:9973002
' 0:997
The rule of 3 sigma:
if RV is normally distributed then with probability close to 1 the range of RV is
plus/minus 3 sigma around the mean.
3. Homework (due Wednesday, January 27)
Maximum number of points is 29.
1. (5 points). During summer John decided to get some extra cash by painting. John had to
paint a square wall. He roughly measured the width=10 (mean) with the error plus/minus 2 feet
(SD). Assuming that his measurement has normal distribution and he spends 14 gallon/sq. foot of
paint, what is the expected amount of paint he needs to buy? What is the probability to run out of
paint if he buys 25 gallons? How many gallons is needed to buy to run out of paint with probability
0:95?
Solution. Let X denote the size, we know that X » N (10; 4): Then, the expected area is
E(X 2 ) = ¹2 + ¾ 2 = 100 + 4 = 104: Thus, he needs 104£.25=26 gallons of paint. If he buys 25
gallons than the probability to run out of paint is
Pr(X 2 > 25 £ 4) = Pr(X 2 > 100) = Pr(jXj > 10) = Pr(X > 10) + Pr(X < ¡10)
¶
µ
¶
µ
¡10 ¡ 10
10 ¡ 10
= 1 ¡ Pr(X < 10) + Pr(X < ¡10) = 1 ¡ ©
+©
2
2
1
= 1 ¡ © (0) + © (¡10) ' :
2
Let g be the amount of paint. Then John runs out of paint with probability .95 if Pr(X 2 > 4g) = :95:
Since
à p
!
Ã
!
p
2 g ¡ 10
¡2 g ¡ 10
2
Pr(X > 4g) = 1 ¡ ©
+©
:
2
2
9
The quantity ©
³
´
p
¡2 g¡10
2
is negligible, so that the needed probability can be approximated as
à p
2 g ¡ 10
©
2
!
= :05
which gives
p
2 g ¡ 10
= ¡1:65;
2
and the amount of paint is g = (5 ¡ 1:65)2 = 11: 2 gallons.
3. (2 points). Let E(X) = ¹ and var(X) = ¾ 2 : Show that for RV Z = (X ¡ ¹)=¾ we have
E(Z) = 0 and var(Z) = 1:
Solution. We have
µ
¶
X ¡¹
1
1
1
= E(X ¡ ¹) = (E(X) ¡ ¹)) = (¹ ¡ ¹) = 0
¾
¾
¾
¾
µ
¶
X ¡¹
1
1
1
var(Z) = var
= 2 var(X ¡ ¹) = 2 var(X) = 2 ¾ 2 = 1:
¾
¾
¾
¾
E(Z) = E
4. (3 points). Find the density of jXj where X » N(¹; ¾ 2 ): Approximate the probability
Pr(jXj > c) for large ¹ and relatively small ¾:
Solution. First we ¯nd the distribution of jXj :
FjXj (x) = Pr(jXj · x) = Pr(X · x) ¡ Pr(X · ¡x)
1 Z ¡x ¡ 12 (t¡¹)2
1 Z x ¡ 12 (t¡¹)2
p
dt ¡ p
dt:
=
e 2¾
e 2¾
¾ 2¼ ¡
¾ 2¼ ¡
Then, the density is the derivative of FjXj (x);
1
1
2
2
1
1
p e¡ 2¾2 (x¡¹) + p e¡ 2¾2 (x+¹)
¾ 2¼
¾ 2¼
³
´
1
1
2
2
1
p
e¡ 2¾2 (x¡¹) + e¡ 2¾2 (x+¹) :
=
¾ 2¼
fjXj (x) =
The probability Pr(jXj > c) is
µ
¶
µ
¶
c¡¹
¡c ¡ ¹
1 ¡ Pr(X · c) + Pr(X · ¡c) = 1 ¡ ©
+©
:
¾
¾
Since ¹ is large, the amount
¡c¡¹
¾
is negative and very large in absolute value. This implies that
©
µ
¶
¡c ¡ ¹
'0
¾
so that
µ
¶
c¡¹
= Pr(X > c):
¾
5. (6 points). The lognormal distribution is de¯ned as the distribution of eX where X »
N (¹; ¾ 2 ): Derive the density of this distribution. Find the mean of eX :
Pr(jXj > c) ' 1 ¡ ©
10
Solution. We use formula
fY (y) =
where g = exp and g ¡1 = ln : Since
dg ¡1 (y) ³ ¡1 ´
fX g (y)
dy
1
2
1
fX (x) = p e¡ 2¾2 (x¡¹)
¾ 2¼
and
1
dg ¡1 (y)
=
dy
y
we obtain for Y = exp(X)
fY (y) =
1
2
1
p e¡ 2¾2 (ln y¡¹) ;
y¾ 2¼
y > 0:
To ¯nd the mean we calculate the integral
1 Z 1 x¡ 12 (x¡¹)2
1 Z 1 x ¡ 12 (x¡¹)2
dx = p
dx:
E(eX ) = p
e e 2¾
e 2¾
¾ 2¼ ¡1
¾ 2¼ ¡1
But
x¡
1
(x ¡ ¹)2
2¾ 2
´
´
1 ³
1 ³ 2
2
2
2
2
(x
¡
¹)
¡
2¾
x
=
¡
x
¡
2x¹
+
¹
¡
2¾
x
2¾ 2
2¾ 2
³
´
1
¡ 2 x2 ¡ 2x¹ + ¹2 ¡ 2¾ 2 x
2¾
´
1 ³
¡ 2 x2 ¡ 2x(¹ + ¾ 2 ) + (¹ + ¾ 2 )2 + (¹2 ¡ (¹ + ¾ 2 )2 )
2¾
´
1 ³
¡ 2 (x ¡ (¹ + ¾ 2 ))2 + (¹2 ¡ (¹ + ¾ 2 )2 )
2¾
1
1
¡ 2 (x ¡ (¹ + ¾ 2 ))2 ¡ 2 (¹2 ¡ (¹ + ¾ 2 )2 ):
2¾
2¾
= ¡
=
=
=
=
But
1
1
(¹2 ¡ (¹ + ¾ 2 )2 ) = 2 (¡2¾ 2 ¹ + ¾ 4 ) = ¡¹ ¡ :5¾ 2
2
2¾
2¾
Then,
E(eX )
1 Z 1 ¡ 12 (x¡(¹+¾2 ))2
p
= e
e 2¾
dx
¾ 2¼ ¡1
2
= e¹+:5¾ :
¹+:5¾ 2
6. (4 points) Scores on an examination are assumed to be normally distributed with a mean of
78 and a variance 36.
(a) What is the probability that a person taking the examination scores higher than 72?.
11
(b) Suppose that students scoring in the top 10% of this distribution are to receive an A grade.
What is the minimum score a student must achieve to earn an A grade?
Solution. Let X be the score, we know that X » N(78; 36): The ¯rst probability is
Ã
72 ¡ 78
p
Pr(X > 72) = 1 ¡ Pr(X · 72) = 1 ¡ ©
36
= 1 ¡ © (¡1) = 1 ¡ :1587 = :9413:
!
=1¡©
µ
72 ¡ 78
6
¶
The minimum score, x is the solution to the following equation
Pr(X > x) = :1
or
µ
¶
x ¡ 78
Pr(X · x) = ©
= :9
6
If x:9 is the 90% quantile then
x ¡ 78
= x:9
6
But from Rice, Table 2, x:9 = 1:28 we obtain
x ¡ 78
= 1:28
6
which gives x = 78 + 1:28 £ 6 = 85: 68:
7. (4 points). Does the rule of 3 sigma works for uniform distribution? What is the probability
to fall in the range of 3 sigma around the mean or a uniform distribution?
Solution. Let X » U(a; b): Then we know
E(X) =
a+b
;
2
var(X) =
(b ¡ a)2
:
12
Also, we know that for X » U (a; b)
Pr(c < X < d) =
In our case
d=
and
a+b
b¡a
+3 p ;
2
2 3
c=
d¡c
:
b¡a
a+b
b¡a
¡3 p
2
2 3
b¡a p
d ¡ c = 6 p = 3(b ¡ a)
2 3
and
p
d¡c
3(b ¡ a) p
=
= 3 > 1:
b¡a
b¡a
This means that the interval (c; d) cover the interval (a; b): In other words, for uniform RV §3¾
covers the entire range of the variable: the rule of 3 sigma works.
12
8. (5 points). The US standard for fat concentration in whole milk is from 11.5 to 12.5 percent.
A Vermont company produces milk with the mean fat concentration 12.2 percent and SD=.2.
Assuming that the fat concentration has normal distribution, what is the probability that the fat
concentration in a pack of milk taken at random: (a) will be more than 12.5, (b) will not meet the
US standard?
Solution. If X denotes the fat concentration in Vermont milk then we know that it varies
according X » N(12:2; 0:22 ): The ¯rst probability is
µ
12:5 ¡ 12:2
Pr(X > 12:5) = 1 ¡ Pr(X · 12:5) = 1 ¡ ©
:2
= 1 ¡ © (1:5) = 1 ¡ :9332
¶
= 0:0668:
The second probability is
Pr(X > 12:5 OR X < 11:5) = Pr(X > 12:5) + Pr(X < 11:5)
¶
µ
11:5 ¡ 12:2
= 0:0668 + ©
:2
= 0:0668 + © (¡3:5) =
= 0:0668 + (1 ¡ © (3:5))
= 0:0668 + (1 ¡ 0:9999)
= 0:0669
' 0:0668:
13
Related documents