Download Question 1 25 Points, 45 minutes Question 2 30 Points, 54 minutes

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
Transcript
16th of December 1999
GSBA 603 Fall 99
Question 1 (25 Points, 45 minutes)
We are interested in the average sales (y ) of rms in a certain industry. (All numbers are in
millions). We take a random sample of 100 rms and ask them for their last years sales. This gives
Y = 27. However we also ask them for last years prot (X ). This gives X = 15. It is known that
the average prot in the industry last year was x = 10. Use the following information to answer
the questions.
x = 10 x = 50
y =? y = 100
r = 2 = :95
Part a
Give the simple random sample (YSRS ) and ratio (YR ) estimates for y .
27
YSRS = Y = 27. YR = XY x = 15
10 = 18.
Part b
Give the exact bias of YSRS and the approximate bias of YR (ignore the nite population correction).
E (YSRS ) = y so bias(YSRS ) = 0
1 1 (2 502 , :95 50 100) = 0:25
bias(YR) n1 1 (rx2 , xy ) = 100
10
x
Part c
Give the exact variance of YSRS and the approximate variance of YR (ignore the nite population
correction).
V ar(YSRS ) = y2 =n = 1002=100 = 100
1 (22 502 + 1002 , 2 :95 50 100) = 10
V ar(YR ) n1 (r2x2 + y2 , 2rxy ) = 100
Part d
Give the mean squared errors of YSRS and YR (ignore the nite population correction).
MSE (YSRS ) = bias(YSRS )2 + V ar(YSRS ) = 100
MSE (YR) = bias(YR)2 + V ar(YR) = 0:252 + 10 = 10:0625
Question 2 (30 Points, 54 minutes)
Suppose that X1; X2; : : : ; Xn are independent random variables and that
Xi Poisson(ai )
where a1 ; a2; : : : ; an are known constants.
1
16th of December 1999
GSBA 603 Fall 99
Part a
Show that the MLE for is
L =
^ =
P
X
P i
ai
n ,ai Y
e (ai )xi
xi !
X
X
X
) l = , ai + xi log(ai) + log() xi , log(xi!)
P
X
0
) l = , ai + ^xi = 0 for min
P
x
i
) ^ = P a
i
i=1 X
Part b
Calculate E ^ and V ar(^).
P
X
i
P
P
EXi
P
P
=
= P ai = ai
ai
ai
P
P
V ar(Xi ) = P ai = P
V ar(^) = (P
ai)2
( ai )2
ai
E ^ = E
Part c
Calculate IXi () the information for one observation Xi .
From part a @log(@f (xi)) = ,ai + xi so
@ 2 log(f (xi)) = , xi
(@)2
2 ) IXi () = ,E , X2i
i
= EX
2
= ai
Part d
Is ^ the UMVUE for ? Note, since the Xi are not iid, the Cramer-Rao inequality implies that for
any unbiased estimator ^
V ar(^) P I1 ()
i Xi
1
= P1 ai = Pa
I
(
)
i
i Xi
i Therefore since ^ is unbiased and V ar(^) reaches the Cramer-Rao lower bound it is UMVUE for
.
P
2
16th of December 1999
GSBA 603 Fall 99
Part e
Suppose you wish to test H0 : = 0 vs the alternative HA : > 0. Use Neyman-Pearson to nd
the form of the optimal hypothesis test. (Note you do not need to specify how to calculate the
constant C .)
Start by nding the optimal test for HA : = A (A > 0 ).
(x) = ff0((xx))
A
=
=
e,ai 0 (ai 0 )xi
i=1
xi !
Qn e,ai A (ai A)xi
i=1 P xi !
P
e 0 ai 0P xi
P
eA ai A xi
Qn
P
Since P
0; A and ai are known constants this value is small when xi is large. We will reject
when xi > C . However, note that this test is the same for any A > 0 so it is also optimal for
HA : > 0.
Part f
We are performing a wildlife study and are interested in whether animals in a certain species are
randomly distributed about a region or whether they tend to clump together. We divide the region
into 5 dierent sub-regions and count the number of animals in each sub-region. Unfortunately
because of the geography of our region we can not divide it into 5 equal sized sub-regions. Let be the average number of animals per square mile and let Xi be the number of animals observed
in the ith sub-region. If the animals are randomly distributed then
Xi Poisson(ai )
where ai is the area of the region. Use the following data to perform a goodness of t test to see
whether the animals are randomly distributed or not.
Sub-region Xi Area (sq miles)
1
9
5
34
10
2
3
41
6
4
70
30
59
20
5
P
P
First note that Xi = 213 and ai = 71 so ^lambda = 3. From this we can calculate expected
values assuming the animals are evenly distributed using the fact that EXi = ai under the null
hypothesis so we get
Sub-region Xi Area (sq miles) Ei
1
9
5
15
2
34
10
30
3
41
6
18
4
70
30
90
5
59
20
60
3
16th of December 1999
GSBA 603 Fall 99
So the chi-square statistic is
X 2 = (9 ,1515) + + (59 ,6060) = 36:78
2
2
We should compare this to a 23 because there are 4 free parameters under the alternative and 1
under the null. 23 (0:005) = 12:84 so there is extremely strong evidence to reject the null hypothesis.
It is clear that the animals are not evenly distributed.
Question 3 (15 Points, 27 minutes)
Suppose you are performing an analysis of CEO compensation across 4 dierent industries. For
each of the 4 industries you randomly choose 10 companies and record the compensation for their
CEOs. The mean compensation for each industry was as follows. (All numbers are in thousands
of dollars.)
Y1: = 410; Y2: = 511; Y3: = 399; Y4: = 488
This gave a grand mean of Y:: = 452
(a) Use this information to complete the partial One Way Anova table below. Is there signicant
evidence that there is a dierence in compensation across the four industries?
Source
SS
Between Groups (Industries)
93500
Within Groups
Total
df
MS
F
3
31166:7
37:5
29906
36
830:7
123406
39
F3;30(0:005) = 5:2388 so there is extremely strong evidence to conclude that there is a dier-
ence between industries.
(b) Suppose that Industries 1 and 3 are heavily technology oriented while Industries 2 and 4 are
not. As part of your study you are interested in comparing technology with non technology
industries. Produce a 95% condence interval for the dierence in the average level of compensation for technology industries with that of non technology industries. (Assume that the
data was not used to make this decision).
Answering this question requires using a contrast i.e.
L = c11 + c2 2 + c33 + c4 4
where c1 = 0:5; c2 = ,0:5; c3 = 0:5; c4 = ,0:5. The condence interval is then
q
L^ t(0:975; 36) MSE
X
p
c2i =J = ,95 2:042 830:7 0:1 = [,113:6; ,76:4]
4
16th of December 1999
GSBA 603 Fall 99
Question 4 (30 Points, 54 minutes)
You are performing a study of the \hits" on a certain website. Let N denote the number
of people that attempt to connect to the website in a one day period. Suppose that N Poisson() with unknown. As with all websites a certain percentage of the time people
are unable to connect. These people are unobserved. Let p denote the probability a random
person manages to connect. Assume the probability of connecting is independent for all
people and that p is known. Let X denote the actual number of hits i.e. the number of
people that connect. X is observed but N is the quantity we are interested in.
Part a
What is the conditional distribution of X given N = n?
X jN = n Bin(n; p)
Part b
What is the joint probability function of X and N i.e. P (X = x; N = n)?
, n
P (X = x; N = n) = P (X = xjN = n)P (N = n) = nx px(1 , p)n,x e n! ; n x
Part c
Show that the marginal distribution of X is Poisson(p). Hint recall
1
X
x=0
x=x! = e
5
16th of December 1999
GSBA 603 Fall 99
P (X = x) =
=
=
=
=
=
=
Hence X P (p).
1
X
P (X = x; N = n)
n=x
1 n
X
, n
x(1 , p)n,x e p
n!
n=x x
1
e, px X (1 , p)n,x n
x! n=x (n , x)!
1 ((1 , p))n,x
(p)xe, X
x! n=x (n , x)!
1 ((1 , p))k
(p)xe, X
k =n,x
x! k=0
k!
(p)xe, e(1,p)
x!
,
p
e (p)x
x!
Part d
Assuming that p is known give both the maximum likelihood and method of moments estimate for .
E (X ) = p ! = 1=p
Therefore the MOM for is MOM = X=p. For MLE note
l = ,p + x log(p) , log(x!)
If we dierentiate this with respect to and set it equal to zero we get ^ = X=p.
Part e
What is the conditional distribution of N given X = x?
n)
P (N = njX = x) = P (XP =(Xx;=Nx=
)
,n x
p (1 , p)n,x e,n!n
x
=
e,p (p)n
1
n!
= (n , x)! ((1 , p))n,x exp (,(1 , p))
= x + Poisson((1 , p))
6
16th of December 1999
GSBA 603 Fall 99
Part f
Show that the expected value of N given X = x is x + (1 , p).
Since we showed in part e that N jX = x x + P ((1 , p)) and the expected value of a
P ((1 , p)) is (1 , p) we know
E (N jX = x) = E (x + P ((1 , p))) = x + (1 , p)
Part g
Suppose p = :95 and on a given day X = 10; 000. Give reasonable estimates for , N , and
the number of lost customers i.e. the number of people that failed to connect.
A reasonable guess for N is
^ = X=p = 10000=:95 = 10526
E (N jX = 1000) = 10000 + ^(1 , :95) = 10526
and the number of lost customers is
10526 , 10000 = 526
7