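16th of December 1999 GSBA 603 Fall 99

Question 1 (25 Points, 45 minutes)

We are interested in the average sales ($\mu_y$) of firms in a certain industry. (All numbers are in millions.) We take a random sample of 100 firms and ask them for their last year's sales. This gives $\bar{Y} = 27$. However, we also ask them for last year's profit ($X$). This gives $\bar{X} = 15$. It is known that the average profit in the industry last year was $\mu_x = 10$. Use the following information to answer the questions.

$\mu_x = 10$, $\sigma_x = 50$, $\mu_y = ?$, $\sigma_y = 100$, $r = 2$, $\rho = 0.95$

Part a
Give the simple random sample ($\bar{Y}_{SRS}$) and ratio ($\bar{Y}_R$) estimates for $\mu_y$.

$$\bar{Y}_{SRS} = \bar{Y} = 27, \qquad \bar{Y}_R = \frac{\bar{Y}}{\bar{X}}\,\mu_x = \frac{27}{15} \cdot 10 = 18.$$

Part b
Give the exact bias of $\bar{Y}_{SRS}$ and the approximate bias of $\bar{Y}_R$ (ignore the finite population correction).

$E(\bar{Y}_{SRS}) = \mu_y$, so $\mathrm{bias}(\bar{Y}_{SRS}) = 0$.

$$\mathrm{bias}(\bar{Y}_R) \approx \frac{1}{n}\,\frac{1}{\mu_x}\left(r\sigma_x^2 - \rho\sigma_x\sigma_y\right) = \frac{1}{100} \cdot \frac{1}{10}\left(2 \cdot 50^2 - 0.95 \cdot 50 \cdot 100\right) = 0.25$$

Part c
Give the exact variance of $\bar{Y}_{SRS}$ and the approximate variance of $\bar{Y}_R$ (ignore the finite population correction).

$$\mathrm{Var}(\bar{Y}_{SRS}) = \sigma_y^2/n = 100^2/100 = 100$$

$$\mathrm{Var}(\bar{Y}_R) \approx \frac{1}{n}\left(r^2\sigma_x^2 + \sigma_y^2 - 2r\rho\sigma_x\sigma_y\right) = \frac{1}{100}\left(2^2 \cdot 50^2 + 100^2 - 2 \cdot 2 \cdot 0.95 \cdot 50 \cdot 100\right) = 10$$

Part d
Give the mean squared errors of $\bar{Y}_{SRS}$ and $\bar{Y}_R$ (ignore the finite population correction).

$$\mathrm{MSE}(\bar{Y}_{SRS}) = \mathrm{bias}(\bar{Y}_{SRS})^2 + \mathrm{Var}(\bar{Y}_{SRS}) = 100$$

$$\mathrm{MSE}(\bar{Y}_R) = \mathrm{bias}(\bar{Y}_R)^2 + \mathrm{Var}(\bar{Y}_R) = 0.25^2 + 10 = 10.0625$$

Question 2 (30 Points, 54 minutes)

Suppose that $X_1, X_2, \ldots, X_n$ are independent random variables and that $X_i \sim \mathrm{Poisson}(\lambda a_i)$, where $a_1, a_2, \ldots, a_n$ are known constants.

Part a
Show that the MLE for $\lambda$ is $\hat{\lambda} = \sum_i X_i / \sum_i a_i$.

$$L = \prod_{i=1}^{n} \frac{e^{-\lambda a_i}(\lambda a_i)^{x_i}}{x_i!}$$

$$\Rightarrow\; l = -\lambda\sum_i a_i + \sum_i x_i\log(a_i) + \log(\lambda)\sum_i x_i - \sum_i \log(x_i!)$$

$$\Rightarrow\; l' = -\sum_i a_i + \frac{\sum_i x_i}{\hat{\lambda}} = 0 \quad \text{at the maximum}$$

$$\Rightarrow\; \hat{\lambda} = \frac{\sum_i x_i}{\sum_i a_i}$$

Part b
Calculate $E\hat{\lambda}$ and $\mathrm{Var}(\hat{\lambda})$.

$$E\hat{\lambda} = E\left(\frac{\sum_i X_i}{\sum_i a_i}\right) = \frac{\sum_i EX_i}{\sum_i a_i} = \frac{\lambda\sum_i a_i}{\sum_i a_i} = \lambda$$

$$\mathrm{Var}(\hat{\lambda}) = \frac{\sum_i \mathrm{Var}(X_i)}{\left(\sum_i a_i\right)^2} = \frac{\lambda\sum_i a_i}{\left(\sum_i a_i\right)^2} = \frac{\lambda}{\sum_i a_i}$$

Part c
Calculate $I_{X_i}(\lambda)$, the information for one observation $X_i$.

From part a,

$$\frac{\partial \log f(x_i)}{\partial\lambda} = -a_i + \frac{x_i}{\lambda} \quad\text{so}\quad \frac{\partial^2 \log f(x_i)}{\partial\lambda^2} = -\frac{x_i}{\lambda^2}$$

$$\Rightarrow\; I_{X_i}(\lambda) = -E\left(-\frac{X_i}{\lambda^2}\right) = \frac{EX_i}{\lambda^2} = \frac{a_i}{\lambda}$$

Part d
Is $\hat{\lambda}$ the UMVUE for $\lambda$?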
Note that, since the $X_i$ are not iid, the Cramér-Rao inequality implies that for any unbiased estimator $\hat{\lambda}$

$$\mathrm{Var}(\hat{\lambda}) \ge \frac{1}{\sum_i I_{X_i}(\lambda)}, \qquad \frac{1}{\sum_i I_{X_i}(\lambda)} = \frac{1}{\sum_i a_i/\lambda} = \frac{\lambda}{\sum_i a_i}.$$

Therefore, since $\hat{\lambda} = \sum_i X_i / \sum_i a_i$ is unbiased and $\mathrm{Var}(\hat{\lambda})$ reaches the Cramér-Rao lower bound, it is UMVUE for $\lambda$.

Part e
Suppose you wish to test $H_0: \lambda = \lambda_0$ vs the alternative $H_A: \lambda > \lambda_0$. Use Neyman-Pearson to find the form of the optimal hypothesis test. (Note you do not need to specify how to calculate the constant $C$.)

Start by finding the optimal test for $H_A: \lambda = \lambda_A$ ($\lambda_A > \lambda_0$).

$$\Lambda(x) = \frac{f_0(x)}{f_A(x)} = \frac{\prod_{i=1}^{n} e^{-a_i\lambda_0}(a_i\lambda_0)^{x_i}/x_i!}{\prod_{i=1}^{n} e^{-a_i\lambda_A}(a_i\lambda_A)^{x_i}/x_i!} = \frac{e^{-\lambda_0\sum_i a_i}\,\lambda_0^{\sum_i x_i}}{e^{-\lambda_A\sum_i a_i}\,\lambda_A^{\sum_i x_i}}$$

Since $\lambda_0$, $\lambda_A$ and $\sum_i a_i$ are known constants and $\lambda_0 < \lambda_A$, this value is small when $\sum_i x_i$ is large. We will reject when $\sum_i x_i > C$. However, note that this test is the same for any $\lambda_A > \lambda_0$, so it is also optimal for $H_A: \lambda > \lambda_0$.

Part f
We are performing a wildlife study and are interested in whether animals in a certain species are randomly distributed about a region or whether they tend to clump together. We divide the region into 5 different sub-regions and count the number of animals in each sub-region. Unfortunately, because of the geography of our region, we cannot divide it into 5 equal sized sub-regions. Let $\lambda$ be the average number of animals per square mile and let $X_i$ be the number of animals observed in the $i$th sub-region. If the animals are randomly distributed then $X_i \sim \mathrm{Poisson}(\lambda a_i)$, where $a_i$ is the area of the $i$th sub-region. Use the following data to perform a goodness of fit test to see whether the animals are randomly distributed or not.

Sub-region    X_i    Area (sq miles)
1               9     5
2              34    10
3              41     6
4              70    30
5              59    20

First note that $\sum_i X_i = 213$ and $\sum_i a_i = 71$, so $\hat{\lambda} = 213/71 = 3$.
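The arithmetic for this part can be cross-checked with a few lines of Python. The following is a minimal sketch, not part of the original solution: the function names `poisson_rate_mle` and `pearson_statistic` are hypothetical, and the data are the counts and areas from the sub-region table. It computes $\hat{\lambda}$, the expected counts $\hat{\lambda} a_i$ under the null hypothesis, and the Pearson chi-square statistic $\sum_i (X_i - E_i)^2 / E_i$.

```python
# Counts and areas copied from the sub-region table in Part f.
counts = [9, 34, 41, 70, 59]
areas = [5, 10, 6, 30, 20]


def poisson_rate_mle(x, a):
    """MLE of the rate per unit area: sum of counts over sum of areas."""
    return sum(x) / sum(a)


def pearson_statistic(x, a):
    """Return (rate estimate, expected counts, Pearson chi-square statistic)."""
    lam = poisson_rate_mle(x, a)
    expected = [lam * ai for ai in a]  # E_i = lambda_hat * a_i under the null
    stat = sum((xi - ei) ** 2 / ei for xi, ei in zip(x, expected))
    return lam, expected, stat


lam_hat, expected, chi2 = pearson_statistic(counts, areas)
print(lam_hat, expected, round(chi2, 2))
```

The statistic is then compared with a chi-square critical value taken from a table; the sketch deliberately stops short of computing a p-value so that it needs nothing beyond the standard library.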
From this we can calculate expected values assuming the animals are evenly distributed, using the fact that $EX_i = \lambda a_i$ under the null hypothesis, so we get

Sub-region    X_i    Area (sq miles)    E_i
1               9     5                 15
2              34    10                 30
3              41     6                 18
4              70    30                 90
5              59    20                 60

So the chi-square statistic is

$$X^2 = \frac{(9 - 15)^2}{15} + \cdots + \frac{(59 - 60)^2}{60} = 36.78$$

We should compare this to a $\chi^2_3$ because there are 4 free parameters under the alternative and 1 under the null. $\chi^2_3(0.005) = 12.84$, so there is extremely strong evidence to reject the null hypothesis. It is clear that the animals are not evenly distributed.

Question 3 (15 Points, 27 minutes)

Suppose you are performing an analysis of CEO compensation across 4 different industries. For each of the 4 industries you randomly choose 10 companies and record the compensation for their CEOs. The mean compensation for each industry was as follows. (All numbers are in thousands of dollars.)

$$\bar{Y}_{1\cdot} = 410, \quad \bar{Y}_{2\cdot} = 511, \quad \bar{Y}_{3\cdot} = 399, \quad \bar{Y}_{4\cdot} = 488$$

This gave a grand mean of $\bar{Y}_{\cdot\cdot} = 452$.

(a) Use this information to complete the partial One Way Anova table below. Is there significant evidence that there is a difference in compensation across the four industries?

Source                             SS    df         MS       F
Between Groups (Industries)     93500     3    31166.7    37.5
Within Groups                   29906    36      830.7
Total                          123406    39

$F_{3,30}(0.005) = 5.2388$, so there is extremely strong evidence to conclude that there is a difference between industries.

(b) Suppose that Industries 1 and 3 are heavily technology oriented while Industries 2 and 4 are not. As part of your study you are interested in comparing technology with non technology industries. Produce a 95% confidence interval for the difference in the average level of compensation for technology industries with that of non technology industries. (Assume that the data was not used to make this decision.)

Answering this question requires using a contrast, i.e. $L = c_1\mu_1 + c_2\mu_2 + c_3\mu_3 + c_4\mu_4$ where $c_1 = 0.5$, $c_2 = -0.5$, $c_3 = 0.5$, $c_4 = -0.5$.
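The ANOVA entries and the point estimate of the contrast can be reproduced from the four group means. This is a minimal sketch under stated assumptions: the within-groups sum of squares (29906) is taken as given from the table, since it cannot be recovered from the means alone, and all variable names are illustrative rather than part of the original solution.

```python
# Group means (thousands of dollars) and design from Question 3.
means = [410, 511, 399, 488]
J = 10               # companies sampled per industry
ss_within = 29906    # taken as given from the ANOVA table

I = len(means)
grand_mean = sum(means) / I

# Between-groups sum of squares: J * sum of squared deviations of group means.
ss_between = J * sum((m - grand_mean) ** 2 for m in means)
df_between = I - 1
df_within = I * (J - 1)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Contrast comparing technology (1, 3) with non-technology (2, 4) industries.
c = [0.5, -0.5, 0.5, -0.5]
L_hat = sum(ci * m for ci, m in zip(c, means))

print(grand_mean, ss_between, round(ms_between, 1), round(ms_within, 1))
print(round(f_stat, 1), L_hat)
```

Because the contrast coefficients sum to zero and $\sum_i c_i^2 = 1$, the estimate is simply the mean of the two technology industries minus the mean of the two non-technology industries.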
The confidence interval is then

$$\hat{L} \pm t(0.975; 36)\sqrt{MSE \sum_i c_i^2 / J} = -95 \pm 2.042\sqrt{830.7 \times 0.1} = [-113.6, -76.4]$$

Question 4 (30 Points, 54 minutes)

You are performing a study of the "hits" on a certain website. Let $N$ denote the number of people that attempt to connect to the website in a one day period. Suppose that $N \sim \mathrm{Poisson}(\lambda)$ with $\lambda$ unknown. As with all websites, a certain percentage of the time people are unable to connect. These people are unobserved. Let $p$ denote the probability a random person manages to connect. Assume the probability of connecting is independent for all people and that $p$ is known. Let $X$ denote the actual number of hits, i.e. the number of people that connect. $X$ is observed but $N$ is the quantity we are interested in.

Part a
What is the conditional distribution of $X$ given $N = n$?

$$X \mid N = n \sim \mathrm{Bin}(n, p)$$

Part b
What is the joint probability function of $X$ and $N$, i.e. $P(X = x, N = n)$?

$$P(X = x, N = n) = P(X = x \mid N = n)\,P(N = n) = \binom{n}{x} p^x (1 - p)^{n - x}\,\frac{e^{-\lambda}\lambda^n}{n!}, \qquad n \ge x$$

Part c
Show that the marginal distribution of $X$ is $\mathrm{Poisson}(\lambda p)$. Hint: recall

$$\sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{\lambda}$$

$$\begin{aligned}
P(X = x) &= \sum_{n=x}^{\infty} P(X = x, N = n) \\
&= \sum_{n=x}^{\infty} \binom{n}{x} p^x (1 - p)^{n - x}\,\frac{e^{-\lambda}\lambda^n}{n!} \\
&= \frac{e^{-\lambda} p^x}{x!} \sum_{n=x}^{\infty} \frac{(1 - p)^{n - x}\lambda^n}{(n - x)!} \\
&= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{n=x}^{\infty} \frac{(\lambda(1 - p))^{n - x}}{(n - x)!} \\
&= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{k=0}^{\infty} \frac{(\lambda(1 - p))^{k}}{k!} \qquad (k = n - x) \\
&= \frac{(\lambda p)^x e^{-\lambda} e^{\lambda(1 - p)}}{x!} \\
&= \frac{e^{-\lambda p}(\lambda p)^x}{x!}
\end{aligned}$$

Hence $X \sim P(\lambda p)$.

Part d
Assuming that $p$ is known, give both the maximum likelihood and method of moments estimate for $\lambda$.

$E(X) = \lambda p \Rightarrow \lambda = \mu_1/p$. Therefore the MOM estimate for $\lambda$ is $\hat{\lambda}_{MOM} = X/p$. For the MLE note

$$l = -\lambda p + x\log(\lambda p) - \log(x!)$$

If we differentiate this with respect to $\lambda$ and set it equal to zero we get $\hat{\lambda} = X/p$.

Part e
What is the conditional distribution of $N$ given $X = x$?

$$\begin{aligned}
P(N = n \mid X = x) &= \frac{P(X = x, N = n)}{P(X = x)} \\
&= \frac{\binom{n}{x} p^x (1 - p)^{n - x}\, e^{-\lambda}\lambda^n / n!}{e^{-\lambda p}(\lambda p)^x / x!} \\
&= \frac{1}{(n - x)!}\,(\lambda(1 - p))^{n - x} \exp(-\lambda(1 - p)), \qquad n \ge x
\end{aligned}$$

so, given $X = x$, $N$ is distributed as $x + \mathrm{Poisson}(\lambda(1 - p))$.

Part f
Show that the expected value of $N$ given $X = x$ is $x + \lambda(1 - p)$.
Since we showed in part e that $N \mid X = x \sim x + P(\lambda(1 - p))$, and the expected value of a $P(\lambda(1 - p))$ is $\lambda(1 - p)$, we know

$$E(N \mid X = x) = E(x + P(\lambda(1 - p))) = x + \lambda(1 - p)$$

Part g
Suppose $p = 0.95$ and on a given day $X = 10{,}000$. Give reasonable estimates for $\lambda$, $N$, and the number of lost customers, i.e. the number of people that failed to connect.

A reasonable guess for $\lambda$ is

$$\hat{\lambda} = X/p = 10000/0.95 = 10526$$

A reasonable guess for $N$ is

$$E(N \mid X = 10000) = 10000 + \hat{\lambda}(1 - 0.95) = 10526$$

and the number of lost customers is $10526 - 10000 = 526$.
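The Part g estimates follow mechanically from the formulas in Parts d and f, and can be checked with a short script. This is a minimal sketch; the function name `estimate_traffic` is hypothetical and the rounding to whole people is an illustrative choice.

```python
def estimate_traffic(x, p):
    """Plug-in estimates for Question 4, Part g.

    x: observed number of hits; p: known probability that a visitor connects.
    Returns (lambda_hat, n_hat, lost), where
      lambda_hat = x / p                     (MLE and MOM from Part d)
      n_hat      = x + lambda_hat * (1 - p)  (E(N | X = x) from Part f)
      lost       = n_hat - x                 (estimated failed connections)
    """
    lambda_hat = x / p
    n_hat = x + lambda_hat * (1 - p)
    lost = n_hat - x
    return lambda_hat, n_hat, lost


lambda_hat, n_hat, lost = estimate_traffic(10_000, 0.95)
print(round(lambda_hat), round(n_hat), round(lost))
```

Note that $x + (x/p)(1 - p) = x/p$, so with $\hat{\lambda}$ plugged in the estimates of $\lambda$ and $N$ coincide, which is why both come out to 10526.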