U.C. Berkeley — Stat 135: Concepts of Statistics
Professor: Noureddine El Karoui
Homework 5
Due March 15, 2007
Solutions for Homework 5
Problems
Chapter 9: 12, 13, 24, 30, 32, 36, 43, 57
Problem 9.12
We know the MLE θ̂ is given by 1/X̄ in this situation. Since Σi Xi = nX̄, the generalized likelihood ratio statistic is now

Λ = [Πi θ0 exp(−θ0 Xi)] / [Πi (1/X̄) exp(−Xi/X̄)]
  = [θ0^n exp(−θ0 nX̄)] / [(1/X̄)^n exp(−n)]
  = [e θ0 X̄ exp(−θ0 X̄)]^n.
Since n, e, and θ0 are all positive, Λ is small precisely when X̄ exp(−θ0 X̄) is small. Since the generalized
likelihood ratio test rejects when Λ < c0 , we see that this is equivalent to rejecting when X̄ exp(−θ0 X̄) ≤ c.
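As a quick sanity check on this equivalence, one could plot Λ as a function of X̄ in R (a sketch, not part of the original solution; the values θ0 = 1 and n = 10 are illustrative choices):

# Lambda = (e * theta0 * xbar * exp(-theta0 * xbar))^n as a function of xbar
theta0 <- 1; n <- 10
xbar <- seq(.01, 6, by = .01)
Lambda <- (exp(1) * theta0 * xbar * exp(-theta0 * xbar))^n
plot(xbar, Lambda, type = "l")  # small exactly where xbar * exp(-theta0 * xbar) is small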
Problem 9.13
Recall that in this problem, θ0 = 1, so the results of 9.12 apply with θ0 replaced by 1 everywhere.
a
Define f(x) = x exp(−x). Based on 9.12, we see that our test will be to reject H0 when f(X̄) is small. From Figure 1 it is clear that this is precisely when X̄ < x0(c) or X̄ > x1(c).
To show this a bit more analytically, observe that f′(x) = exp(−x)(1 − x), which is positive for x ∈ (0, 1) and negative for x > 1. Thus f(x) is strictly increasing on (0, 1) and strictly decreasing on (1, ∞). Further, f(0) = 0 and f(x) → 0 as x → ∞. This, plus continuity of f, implies that for any c ∈ (0, f(1)), there are exactly two solutions to f(x) = c, and that {x : f(x) ≤ c} is of the desired form. (See Figure 1: the line y = c cuts the graph of f exactly twice because c < 1/e, where 1/e = f(1) is the maximum of f.)
b
In the Neyman-Pearson framework, the rejection region should be such that P0 (reject H0 ) = α. From part
a we see that choosing a rejection region for Λ is equivalent to choosing a rejection region of the form [0, c] for f(X̄). We should now choose this c to give the correct probability, and here α = .05.

[Figure 1 (Question 13a): plot of f(x) = x exp(−x) for x from 0 to 6, with the horizontal line c = .1 cutting the curve at x0 and x1; the vertical axis runs from 0.0 to 0.3.]
c
On page 147 we see that for independent X ∼ Γ(α1, λ) and Y ∼ Γ(α2, λ), the sum X + Y is distributed as Γ(α1 + α2, λ). It is also true that, for a positive constant κ, κX ∼ Γ(α1, λ/κ). (Apply Proposition B on page 60 with g(x) = κx.) Observe also that X ∼ Exp(1) ⇔ X ∼ Γ(1, 1). Thus under H0, Σi Xi ∼ Γ(n, 1) and X̄ ∼ Γ(n, n).
Let F be the CDF for Γ(n, n). Then for any c ∈ (0, 1/e), we can numerically solve f(x) = c to get x0(c) and x1(c), and

α(c) = P0(X̄ exp(−X̄) ≤ c)
     = P0(X̄ ∈ [0, x0(c)] ∪ [x1(c), ∞))
     = F(x0(c)) + 1 − F(x1(c)).
To obtain a specific α, we’d now need to have the computer try a range of values for c until α(c) came out close to the desired value.
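For concreteness, here is one way that search could look in R (a sketch, not part of the original solution; n = 10 as in part (d), and uniroot brackets the two solutions of f(x) = c on either side of the maximum at x = 1):

# alpha(c) = F(x0(c)) + 1 - F(x1(c)), where F is the Gamma(n, n) CDF
f <- function(x) x * exp(-x)
alpha.of.c <- function(c, n = 10) {
  x0 <- uniroot(function(x) f(x) - c, c(1e-8, 1))$root  # root in (0, 1)
  x1 <- uniroot(function(x) f(x) - c, c(1, 50))$root    # root in (1, Inf)
  pgamma(x0, shape = n, rate = n) + 1 - pgamma(x1, shape = n, rate = n)
}
cs <- seq(.01, .3, by = .001)      # candidate values of c
alphas <- sapply(cs, alpha.of.c)
cs[which.min(abs(alphas - .05))]   # c with alpha(c) closest to .05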
d
We could repeatedly generate sets of n = 10 independent Exp(1) variables, and for each such set compute Wi ≡ X̄i exp(−X̄i). Given B such Wi, let i* = ⌊αB⌋. Then W(i*) (basically our estimate of the α quantile of our test statistic) provides a good approximation to c, particularly for very large B.
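A minimal R sketch of this simulation (the choice B = 100000 and the seed are ours):

# Monte Carlo approximation of the critical value c for alpha = .05
set.seed(1)
B <- 100000
n <- 10
W <- replicate(B, { xbar <- mean(rexp(n, rate = 1)); xbar * exp(-xbar) })
quantile(W, probs = .05)   # estimate of c, the alpha-quantile of W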
Problem 9.24
a
In this case, we can write out the likelihood under the null hypothesis explicitly:

L(1/2) = (n choose X) (1/2)^X (1/2)^(n−X) = (n choose X) (1/2)^n.
Furthermore we know that the maximum likelihood estimator of p for the binomial distribution is X/n. This
means we can write the generalized likelihood ratio as
Λ = [(n choose X) (1/2)^n] / [(n choose X) (X/n)^X (1 − X/n)^(n−X)]
  = [n^n (1/2)^n] / [n^X (X/n)^X · n^(n−X) (1 − X/n)^(n−X)]
  = (n/2)^n / [X^X (n − X)^(n−X)].
b
First, let g(x) = 1/(x^x (n − x)^(n−x)). Notice that this is the non-constant part of the generalized likelihood
ratio. Notice also that g(x) = g(n − x) (try plugging in n − x to see why this is true), which means that g(x)
is symmetric about n/2. This means that letting y = x − n/2 we have g(n/2 + y) = g(n/2 − y). Without
loss of generality, we will consider the function g(n/2 + y) for y ≥ 0. For the argument we will make, an
equivalent argument will hold for g(n/2 − y).
Call h(y) = log(g(n/2 + y)). We want to show that h is non-increasing. This will mean that as y gets bigger,
h(y) gets smaller. Note that this is what we want to show since, as y gets bigger, so does x − n/2, and as h
gets smaller, so does our likelihood ratio statistic. Since a symmetric argument holds for g(n/2 − y) (which
will be non-increasing as y gets big, assuming y ≥ 0), we know that, if h is indeed non-increasing for y ≥ 0,
as |y| gets big, our likelihood ratio gets small which corresponds to rejecting the null hypothesis.
To show why h(y) is non-increasing, we see first that

h(y) = log(g(n/2 + y)) = −(n/2 + y) log(n/2 + y) − (n/2 − y) log(n/2 − y),

and therefore that

h′(y) = log(n/2 − y) − log(n/2 + y) ≤ 0,

since y ≥ 0 and log is an increasing function. This means that h(y) is a non-increasing function of y, as we wanted to show.
Note that this test statistic and rejection region make intuitive sense. Under the null hypothesis, E(X) = n/2,
so the farther X deviates from n/2, the more intuitive “evidence” we have against the null, using the weak
law of large numbers.
c
To find the significance level for a test corresponding to a rejection region |X −n/2| > k, we want to compute
α = P0 (|X − n/2| > k) = P0 ((X − n/2) < −k or (X − n/2) > k) = P0 (X < −k + n/2) + P0 (X > k + n/2).
The last equality holds because the events {X < −k + n/2} and {X > k + n/2} are disjoint. Since our null
distribution is Bin(n, 1/2), we can compute these probabilities explicitly for any values of n and k.
d
Given that n = 10 and k = 2, our null distribution is Bin(10,1/2) and we want to calculate
α = P0 (X < 3 or X > 7) = P0 (X ∈ {0, 1, 2, 8, 9, 10}).
We can compute this using a binomial table or R to get α = 112/1024 ≈ 0.1094.
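In R, for instance (a sketch):

# P0(X in {0,1,2,8,9,10}) for X ~ Bin(10, 1/2)
sum(dbinom(c(0, 1, 2, 8, 9, 10), size = 10, prob = 0.5))   # 0.109375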
e
In this case, our null distribution is Bin(100,1/2) and we want to find
α = P0 (X < 40 or X > 60) = P0 (X < 40) + P0 (X > 60).
In this case, we have E(X) = 50 and Var(X) = np(1 − p) = 25. We can use the normal approximation (recall that we can view X as X = Σi Yi, where the Yi are i.i.d. Bernoulli(p), so the central limit theorem applies when n is large), writing

P0(X < 40) = P0((X − 50)/5 < (40 − 50)/5) ≈ Φ(−2) = 0.0228.

Due to symmetry, we also have P0(X > 60) ≈ 0.0228, and our significance level is therefore approximately 0.0455.
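As a check, the exact binomial level could be computed in R (a sketch); it comes out near 0.035, a bit smaller than the approximation above since no continuity correction was used:

# Exact level for X ~ Bin(100, 1/2), rejecting when X < 40 or X > 60
pbinom(39, size = 100, prob = 0.5) + (1 - pbinom(60, size = 100, prob = 0.5))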
Problem 9.30
Part A
We reject the null when T > c where c is determined by the significance level α.
Pr0 (Reject Null) = Pr0 (T > c) = 1 − F0 (c) = α
Suppose we observe T = t0 and we want to find the p-value.
As explained in section 9.2.1, the p-value is the probability under the null of getting something as or more
extreme than what was observed. More formally,
1. Assume T′ is a new independent copy of T.
2. The p-value is then Pr0(T′ ≥ t0).
Now,
Pr0(T′ ≥ t0) = 1 − F0(t0) + Pr0(T′ = t0) = 1 − F0(t0).
The last step follows from the assumption that T is continuous. This implies that the p-value is given by
V = 1 − F(T)
Part B
First note that if X is uniform on (0, 1) then 1 − X is as well.
So, if we can conclude that F (T ) is uniform on (0, 1) then we will be done.
We would like to argue as in Proposition C of 2.3, but F⁻¹ might not be well defined.
In general, F(x) may have jumps and may be flat in some regions, in which case the definition of F⁻¹ is unclear. For this problem T is a continuous random variable, so we don’t have to worry about jumps. However, if the density of T is zero on some interval and positive on both sides, then the CDF has a flat region where an inverse is not well defined.
Define
F⁻¹(p) = min{x : F(x) ≥ p}.
We note the following properties:
1. If F is strictly increasing at p, then this coincides with the regular definition.
2. F⁻¹ is non-decreasing.
3. F(F⁻¹(z)) = z for all z ∈ (0, 1).
Let A = {t : fT(t) > 0} and let B = {ω : T(ω) ∈ A}. Then P(B) = 1, so
Pr(F(T) ≤ z) = Pr(F⁻¹(F(T)) ≤ F⁻¹(z)) = Pr({F⁻¹(F(T)) ≤ F⁻¹(z)} ∩ B).
The first equality follows by property 2 above, and the second follows since P(B) = 1. Now, if ω ∈ B then by property 1 above, F⁻¹(F(T(ω))) = T(ω). So,
Pr({F⁻¹(F(T)) ≤ F⁻¹(z)} ∩ B) = Pr({T ≤ F⁻¹(z)} ∩ B) = Pr(T ≤ F⁻¹(z)) = F(F⁻¹(z)) = z,
where the last equality follows by property 3.
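A quick simulation makes this concrete (a sketch; taking T ∼ Exp(1) is our illustrative choice):

# If T ~ Exp(1), then F(T) = pexp(T) should look Uniform(0, 1)
set.seed(1)
t.obs <- rexp(10000, rate = 1)
v <- pexp(t.obs, rate = 1)                      # F(T)
quantile(v, probs = c(.1, .25, .5, .75, .9))    # should be close to .1, .25, .5, .75, .9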
Extra Comment: Not part of the problem
For those of you who have taken analysis, the definition of the quantile function is
F⁻¹(p) = inf{x : F(x) ≥ p}
Part C
Since the p-value is uniform on (0, 1) under the null, this probability is .9.
Part D
This follows since Pr0(V < α) = Pr(U < α) = α, where U is uniform on (0, 1).
Problem 9.32
Part A
The likelihood ratio is

fA(x)/fB(x) = exp(−[(x − 100)² − (x − 125)²]/(2 · 25²)) = exp(4.5) · exp(−x/25).

If we observe x = 120, then we get exp(4.5 − 4.8) = exp(−.3) ≈ .74.
Part B
If our priors are equal, then the ratio of posteriors is simply the likelihood ratio (see page 330).
So if β is the posterior probability for alternative B, then we know that

(1 − β)/β = .74,

which implies that

β = 1/(1 + .74) ≈ .57.
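These numbers are easy to reproduce in R (a sketch):

# Likelihood ratio fA(120)/fB(120) and the posterior probability of B under equal priors
LR <- dnorm(120, mean = 100, sd = 25) / dnorm(120, mean = 125, sd = 25)
LR                    # exp(-0.3), about 0.74
beta <- 1 / (1 + LR)  # posterior probability of hypothesis B
beta                  # about 0.57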
Part C
This is the probability under A that X > 125. But under A, X = 100 + 25Z in distribution where Z is
standard normal. Therefore,
α = PrA (X > 125) = Pr(100 + 25Z > 125) = Pr(Z > 1) = 1 − Φ(1) = .1587
Part D
This is the probability under B that X > 125. But under B, X = 125 + 25Z in distribution where Z is
standard normal. Therefore,
Power = PrB (X > 125) = Pr(125 + 25Z > 125) = Pr(Z > 0) = .5
Part E
This is the probability under A that X > 120. But under A, X = 100 + 25Z in distribution where Z is
standard normal. Therefore,
α = PrA (X > 120) = Pr(100 + 25Z > 120) = Pr(Z > .8) = 1 − Φ(.8) = .2119
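Parts C through E can be checked directly with pnorm (a sketch):

1 - pnorm(125, mean = 100, sd = 25)  # part C: 1 - Phi(1)  = 0.1587
1 - pnorm(125, mean = 125, sd = 25)  # part D: 1 - Phi(0)  = 0.5
1 - pnorm(120, mean = 100, sd = 25)  # part E: 1 - Phi(.8) = 0.2119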
Problem 9.36
Let Xi be the number of suicides during month i. The model is that the Xi are cell counts of a multinomial with n = 23,480 total suicides and cell probabilities pi. We are testing
H0 : pi = ni/365 for all i versus HA : pi ≠ ni/365 for some i,
where ni is the number of days in month i. Under the null distribution, the expected number of suicides in month i is ni × 23,480/365. Construct the following table.
Month   Days/Month   Number of Suicides   Expected Number of Suicides   ±(O − E)²/E
Jan.    31           1867                 1994                          −8.11
Feb.    28           1789                 1801                          −0.0827
Mar.    31           1944                 1994                          −1.26
Apr.    30           2094                 1930                          +14.0
May.    31           2097                 1994                          +5.30
Jun.    30           1981                 1930                          +1.36
Jul.    31           1887                 1994                          −5.76
Aug.    31           2024                 1994                          +0.446
Sep.    30           1928                 1930                          −0.00180
Oct.    31           2032                 1994                          +0.717
Nov.    30           1978                 1930                          +1.20
Dec.    31           1859                 1994                          −9.17
Total   365          23480                23480                         47.4
The bottom-right entry of the table is the observed χ² statistic. Its null distribution is approximately chi-square with 11 degrees of freedom (11 free parameters under the alternative, and 0 estimated parameters under the null). The p-value is practically 0. A hanging chi-gram, i.e. the plot of the Pearson residuals (O − E)/√E, is shown below. There is an unusually high number of suicides in April, and unusually low numbers in January and December. If there is a seasonal trend, it appears to reverse around July.
[Figure: hanging chi-gram of the Pearson residuals (O − E)/√E by month (J F M A M J J A S O N D); the vertical axis runs from −3 to 3.]
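The whole computation is easy to reproduce in R (a sketch; chisq.test takes the null cell probabilities through its p argument):

# Chi-square goodness-of-fit test for the suicide counts
obs  <- c(1867, 1789, 1944, 2094, 2097, 1981, 1887, 2024, 1928, 2032, 1978, 1859)
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
test <- chisq.test(obs, p = days / 365)
test             # X-squared about 47.4 on 11 df; p-value practically 0
test$residuals   # Pearson residuals (O - E)/sqrt(E), as in the chi-gram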
Problem 9.43
Part A
From problem 24, we reject the null hypothesis when |X − n/2| is large.
I use the normal approximation to say

(X − n/2)/√(n/4) ≈ Z,

where Z is standard normal. Now,

Pr0(|X − n/2| > c) = Pr0(|(X − n/2)/√(n/4)| > c/√(n/4)) ≈ Pr(|Z| > c/√(n/4)) = Φ(−c/√(n/4)) + 1 − Φ(c/√(n/4)).

To get the p-value for the observation X = 9207, we plug in c = 9207 − 17950/2 = 232 above.
Then c/√(n/4) = 3.46, so Φ(c/√(n/4)) = .9997, which implies that our p-value is .0006.
The model is extremely doubtful.
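The same computation in R (a sketch):

# Two-sided normal-approximation p-value for X = 9207 heads in n = 17950 tosses
n <- 17950
crit <- 9207 - n / 2       # c = 232
z <- crit / sqrt(n / 4)    # about 3.46
2 * (1 - pnorm(z))         # about .0005, in line with the .0006 quoted above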
Parts B and C
Here there are 17950/5 = 3590 trials, and each trial produces a number of heads between 0 and 5.
The numerator of the generalized likelihood ratio statistic will be the likelihood evaluated at either p = 1/2 or p = p̂MLE = 9207/17950 = .5129. From these probabilities we will get the expected counts for the cells.
If we assume that all 5 coins have probability p of heads, then the cell probabilities are

(1 − p)^5, 5p(1 − p)^4, 10p^2(1 − p)^3, 10p^3(1 − p)^2, 5p^4(1 − p), p^5
We find our expected counts as follows
> N = 3590
> p=.5
> EXP1 = N * c( (1-p)^5, 5*p*(1-p)^4, 10*p^2*(1-p)^3, 10*p^3*(1-p)^2, 5*p^4*(1-p), p^5 )
> p=9207/17950
> EXP2 = N * c( (1-p)^5, 5*p*(1-p)^4, 10*p^2*(1-p)^3, 10*p^3*(1-p)^2, 5*p^4*(1-p), p^5 )
> EXP1
[1] 112.1875 560.9375 1121.8750 1121.8750 560.9375 112.1875
> EXP2
[1]
98.4180 518.2058 1091.4150 1149.3375 605.1670 127.4568
And the resulting chi square statistics are
> OBS = c(100, 524, 1080, 1126, 655, 105)
> sum((OBS-EXP1)^2/EXP1)
[1] 21.56813
> sum((OBS-EXP2)^2/EXP2)
[1] 8.743703
In the first case, the dimension of the null is 0, while in the second case it is 1. So under the null, these statistics are approximately chi-square with 5 and 4 degrees of freedom, respectively.
Consulting the chi-square tables, the first result (21.57 on 5 degrees of freedom, p ≈ .0006) is very unlikely under the null; the second (8.74 on 4 degrees of freedom, p ≈ .07) is only marginally unusual.
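The p-values can be read off in R (a sketch):

1 - pchisq(21.568, df = 5)   # about 6e-4
1 - pchisq(8.744, df = 4)    # about 0.068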
The observed and expected counts are summarized below (expected counts and chi-square contributions taken from the R output above):

# Heads   Frequency   Expected (p = 1/2)   (O − E)²/E   Expected (p = .5129)   (O − E)²/E
0         100         112.19               1.32         98.42                  0.03
1         524         560.94               2.43         518.21                 0.06
2         1080        1121.88              1.56         1091.42                0.12
3         1126        1121.88              0.02         1149.34                0.47
4         655         560.94               15.77        605.17                 4.10
5         105         112.19               0.46         127.46                 3.96
Total     3590        3590                 21.57        3590                   8.74
Problem 9.57
The tails of the Cauchy distribution decay less rapidly than those of the normal distribution, i.e. it has heavy
tails. We can see this by comparing their density functions. The normal distribution has density function
φ(x) = (1/(√(2π)σ)) exp(−(x − µ)²/(2σ²))

Its tails decay exponentially, like exp(−x²). However, the tails of the Cauchy density decay algebraically, like (1 + x²)^(−1). (Exponential decay is much more rapid than algebraic decay.) This means that the upper quantiles of the Cauchy are greater than those of the normal distribution, and similarly the lower quantiles of the Cauchy are smaller than those of the normal distribution. This is manifested in the normal probability plot by a systematic departure from a straight line at the ends, curving below on the left and above on the right.
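This pattern is easy to see by simulation (a sketch):

# Normal probability plot of Cauchy data: the heavy tails bend the points
# below the reference line on the left and above it on the right
set.seed(1)
x <- rcauchy(500)
qqnorm(x)
qqline(x)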