Exam 3 STAT305A Spring 2017 Due 4/27(R)

Name_______________________________________________

PROBLEM 1(30pts) You are charged with conducting an investigation of herbicide pollution of IA lakes. Let $X$ denote the act of measuring the level of a certain chemical in any randomly chosen lake. Assume $X \sim N(\mu_X, \sigma_X)$. Also assume that the lakes chosen for testing are such that the data collection variables $\{X_k\}_{k=1}^{n}$ can be treated as mutually independent. Let $(\hat{\mu}_X, \hat{\sigma}_X)$ denote the usual estimators of $(\mu_X, \sigma_X)$. Data collected on $n = 50$ lakes resulted in $(\hat{\mu}_X, \hat{\sigma}_X) = (323, 69)$.

(a)(10pts) Compute the estimate of the 95% 2-sided confidence interval (CI) for each of $(\mu_X, \sigma_X)$. Show ALL steps.
Solution:

(b)(5pts) Federal law requires any state having $\mu_X > 300$ to develop a plan to rectify the situation. Conduct the test $H_0: \mu_X \le 300$ vs. $H_1: \mu_X > 300$ at significance level $\alpha = 0.05$ to determine whether or not such a plan will be ordered.
Solution:

(c)(5pts) Find the p-value of the test in (b).
Solution:

(d)(5pts) In view of (b)-(c), you should have found that a clean-up plan will be called for. Your company has been contacted to submit a bid for the work. Before deciding whether to bid on the project, you asked for, and received, the data associated with the investigation. A careful look at the data revealed these results:

- Northern half of IA: $(\hat{\mu}_1, \hat{\sigma}_1, n_1) = (316, 52.01, 33)$.
- Southern half of IA: $(\hat{\mu}_2, \hat{\sigma}_2, n_2) = (330, 45.35, 18)$.

Test the hypotheses $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \ne 0$ for $\alpha = 0.05$.
Solution:

(e)(5pts) Test the hypotheses $H_0: \sigma_1/\sigma_2 = 1$ vs. $H_1: \sigma_1/\sigma_2 \ne 1$ for $\alpha = 0.05$.
Solution:

PROBLEM 2(25pts) This problem addresses the relation between two quantities for a package air-shipped to a given location: its weight ($X$) and the amount of fuel used ($Y$). The data for $n = 100$ packages are in the file wgtfueldata.txt, located in the exam folder.

(a)(10pts) Consider the model $\hat{Y}(x) = b_1 x + b_0$. Denote the associated model error as $W(x) = Y(x) - \hat{Y}(x)$.
Compute the estimates $(\hat{b}_1, \hat{b}_0)$ using the method covered under linear modeling. Then overlay your model on a scatter plot of the data. Finally, obtain an estimate, $\hat{\sigma}_W$, of the error std. deviation.
Solution: [See code @ 2(a).]
Figure 2(a) Scatter plot and linear model.

(b)(8pts) Lecture 19 gave the following fact.
FACT: For a given $\hat{\sigma}_x$ (resulting from a given x-data set):
$$T = (\hat{b}_1 - b_1)\sqrt{\frac{(n-2)\,\hat{\sigma}_x^2}{\hat{\sigma}_W^2}} \sim t_{n-2}$$
[Miller & Miller p.395]
Use this fact to arrive at a 95% 2-sided CI for the slope, $b_1$.
Solution:

(c)(7pts) Formula (11-30) on p.447 gives the CI for $b_0$:
$$\hat{b}_0 - t_{\alpha/2,n-2}\,\hat{\sigma}_W\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \;\le\; b_0 \;\le\; \hat{b}_0 + t_{\alpha/2,n-2}\,\hat{\sigma}_W\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
Use this formula to obtain the CI for $b_0$. [Note: from (11-10) we have $S_{xx} = \sum_{k=1}^{n}(x_k - \bar{x})^2$.]
Solution: [See code @ 2(c).]

PROBLEM 3(20pts) The sample mean $\bar{X}$ is the most popular of all statistics. A close second is the sample correlation coefficient $\hat{\rho}$. It is not a 'pretty' statistic, as (11-43) on p.459 makes evident. Let $X$ = the act of measuring the temperature at which a reaction is carried out, and let $Y$ = the act of measuring the reaction rate. [c.f. https://en.wikipedia.org/wiki/Reaction_rate]

(a)(6pts) To test $H_0: \rho = 0$ vs. $H_1: \rho \ne 0$, the appropriate test statistic is [see (11-46)]:
$$T_{n-2} = \frac{\hat{\rho}\sqrt{n-2}}{\sqrt{1-\hat{\rho}^2}}$$
For $n = 30$ samples of $(X, Y)$, the estimate was $\hat{\rho} = 0.248$. Conduct this test with a false alarm probability $\alpha = 0.05$.
Solution:

(b)(8pts) The Appendix gives the code that produced the estimate $\hat{\rho}$ in (a). Modify it to generate nsim $= 10^5$ simulations of $\hat{\rho}$. Then use these to compute a simulation-based pdf for $T_{n-2} = \hat{\rho}\sqrt{n-2}/\sqrt{1-\hat{\rho}^2}$, and overlay the pdf for $t_{n-2}$ on it. Comment on how they compare.
Solution: [See code @ 3(b).]
Figure 3(b) Plots of pdfs for $T_{n-2}$ (simulated) and $t_{n-2}$.

(c)(6pts) When $\rho \ne 0$, the test statistic $T_{n-2}$ no longer has a $t_{n-2}$ pdf. However, as noted on p.459, for $n \ge 25$ the statistic $W = \operatorname{atanh}(\hat{\rho}) \sim N(\mu_W, \sigma_W)$, where $\mu_W = \operatorname{atanh}(\rho)$ and $\sigma_W = 1/\sqrt{n-3}$. Use $W$ and (A) in Table 1 to arrive at the 95% 2-sided CI for $\rho$. [Note: You still have $n = 30$ and $\hat{\rho} = 0.248$.]
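The arithmetic behind 3(a) and 3(c) can be checked with a minimal Python sketch. It uses only the standard library. Python is an assumption here, since the exam's own code is MATLAB, and the function names `corr_t_stat` and `fisher_z_ci` are hypothetical. The sketch evaluates the (11-46) statistic and the atanh-based interval for $\rho$. The critical value $t_{0.025,28}$ is not computed, because the standard library has no t quantile function; read it from a t table.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def corr_t_stat(rhat: float, n: int) -> float:
    """Statistic of (11-46): T = rhat*sqrt(n-2)/sqrt(1-rhat^2), ~ t_{n-2} under H0: rho = 0."""
    return rhat * math.sqrt(n - 2) / math.sqrt(1.0 - rhat ** 2)


def fisher_z_ci(rhat: float, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Approximate 2-sided CI for rho, using W = atanh(rhat) ~ N(atanh(rho), 1/sqrt(n-3))."""
    w = math.atanh(rhat)                         # estimate of mu_W = atanh(rho)
    sigma_w = 1.0 / math.sqrt(n - 3)             # sigma_W from the text
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)   # z_{alpha/2}; about 1.96 when conf = 0.95
    # Standardize W as in (A) of Table 1 to get a CI for mu_W, then map both
    # endpoints back through tanh (monotone increasing, so their order is kept).
    return math.tanh(w - z * sigma_w), math.tanh(w + z * sigma_w)


if __name__ == "__main__":
    rhat, n = 0.248, 30
    print(f"T = {corr_t_stat(rhat, n):.4f}; compare with t_(0.025,{n - 2}) from a t table")
    lo, hi = fisher_z_ci(rhat, n)
    print(f"95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

The interval is built on the atanh scale, where the normal approximation applies, and only then transformed back. The resulting CI for $\rho$ is therefore not symmetric about $\hat{\rho}$.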
Solution:

PROBLEM 4(25pts) This problem addresses a situation where announcing $H_1$ does not cost anything. In fact, you can profit by it. Example 9-10 on p.345 describes the setting:

- A semiconductor manufacturer claims that its defect rate does not exceed $p = 0.05$, and that it demonstrates process capability at this level using $\alpha = 0.05$.
- A recent inspection of $n = 200$ devices found only 4 defective ones, which corresponds to $\hat{p} = 0.02$.
- Management would like to use this result to convince potential customers that its defect rate is actually lower than $p = 0.05$.

To this end, consider the test $H_0: p \ge 0.05$ vs. $H_1: p < 0.05$. The decision rule is: if $\hat{p}$ is sufficiently smaller than 0.05, we will announce $H_1$. That would support the claim that the actual defect rate is lower than the advertised maximum. The authors first carry out the test assuming that the CLT holds (i.e. they can use a normal test statistic). They then carry it out using the fact that the number of defects is $Y = n\hat{p} \sim \text{bino}(n, p)$. We will focus on this latter approach.

(a)(5pts) The false alarm probability is $\alpha = \Pr[\hat{p} \le p_{th}] = \Pr[Y \le y_{th}]$. Show that the p-value of the test is 0.0264 (as given at the top of p.347).
Solution:

(b)(5pts) Compute the Type-2 error for a true value $p = 0.025$.
Solution:

(c)(10pts) Suppose that we now consider the hypotheses $H_0: p \ge 0.025$ vs. $H_1: p < 0.025$. We chose the value 0.025 because our data defect proportion of 0.02 will result in announcing $H_1$. Write $Y \sim \text{bino}(200, p) \triangleq Y(p)$.

- The false alarm probability for this new test, $\alpha(p) = \Pr[Y(p) \le 4]$, is only valid for $p \ge 0.025$.
- Similarly, the Type-2 error probability, $\beta(p) = \Pr[Y(p) > 4]$, is only valid for $p < 0.025$.
- A Type-2 error is the event that we announce $H_0$ when $H_1$ is true. So the probability of announcing $H_1$ when it is, indeed, true is $1 - \beta(p)$. This probability, too, is only valid for $p < 0.025$.

Hence, the probability that we will announce $H_1$, whether or not it is true, is:
$$\pi(p) = \begin{cases} \alpha(p) & \text{for } p \ge 0.025 \\ 1 - \beta(p) & \text{for } p < 0.025 \end{cases}$$
This quantity is called the power function for our new test. Show that $\pi(p) = \text{binocdf}(4, 200, p)$. Then plot it over the range p = 0:.001:0.1.
Solution: $\pi(p) =$ (1)
Figure 4(c) Plot of $\pi(p)$ for $p_{th} = 0.02$.

(d)(5pts)
(i) A random sample of 1000 students' statistics exam scores was drawn from the population of all possible scores. The computed sample mean is the true population mean. TRUE / FALSE (circle your answer)
(ii) Suppose you want the probability that the sample mean of a sample of size n = 10 from a population would exceed a specified value. Use of the Central Limit Theorem is usually justified here. TRUE / FALSE (circle your answer)

APPENDIX Table 1 and Your Matlab Code

Table 1. Some Handy-Dandy Test Statistics
For $X \sim N(x; \mu_X, \sigma_X^2)$ and associated iid data collection variables $\{X_k\}_{k=1}^{n}$:
(A): $Z = (\bar{X} - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$;
(B): $T = (\bar{X} - \mu_X)/\sqrt{\hat{\sigma}_X^2/n} \sim t_{n-1}$;
(C): $n\hat{\sigma}_X^2/\sigma_X^2 \sim \chi^2_{n}$ when $\mu_X$ is used;
(D): $(n-1)\hat{\sigma}_X^2/\sigma_X^2 \sim \chi^2_{n-1}$ when $\bar{X}$ is used;
(E): $F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1,n_2}$ when $\mu_{X_1}$ & $\mu_{X_2}$ are used;
(F): $F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1-1,n_2-1}$ when $\bar{X}_1$ & $\bar{X}_2$ are used.

    %PROGRAM NAME: exam3.m (Spring 2017)
    %PROBLEM 2: X=pkg wgt (lb) & Y=fuel used (gal.)
    %(a):
    load wgtfueldata.txt
    xy=wgtfueldata;
    x=xy(:,1); y=xy(:,2);   %assumed layout: col 1 = weight (lb), col 2 = fuel (gal.)
    b=polyfit(x,y,1);       %least-squares line: b(1)=b1hat, b(2)=b0hat
    yhat=polyval(b,x);      %fitted model yhat(x)=b1hat*x+b0hat
    figure(20)
    plot(x,y,'*')
    hold on
    plot(x,yhat,'r','LineWidth',2)
    title('Scatter Plot & Linear Model of Fuel vs. Weight')
    xlabel('Weight (lbs)')
    ylabel('gal.')
    grid
    %(c):
    %=======================================================
    %PROBLEM 3: X=temperature(C) &
    %Y=rate(moles/ltr)/sec
    %Truth Model Parameters:
    muX=100; muY=20; Mu=[muX muY];
    stdX=5; stdY=2;
    C=[stdX^2 0 ; 0 stdY^2]; %Assumes rho=0
    n=30; %Sample Size
    xy=mvnrnd(Mu,C,n);       %n draws of (X,Y) from the bivariate normal
    Rhat=corrcoef(xy);       %2x2 sample correlation matrix
    rhat=Rhat(1,2);          %sample correlation coefficient rhohat
    figure(30)
    %============================================
    %PROBLEM 4
    %(c):
    p=0:.001:.1;
    np=length(p);
    pwr=zeros(1,np);         %placeholder for the power function pi(p)
    figure(40)
    plot(p,pwr)
    title('Power Function for p_t_h=0.025')
    xlabel('p')
    ylabel('power')
    grid
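The binomial quantities in Problem 4 can be cross-checked without MATLAB using a minimal Python sketch. It assumes Python 3.8+ for `math.comb`. The helper names `binocdf` and `power` are hypothetical; `binocdf` only mimics the argument order of MATLAB's `binocdf(x,n,p)`. It sums the bino(n, p) pmf directly, which is exact up to floating-point rounding and fast enough for n = 200.

```python
from math import comb


def binocdf(k: int, n: int, p: float) -> float:
    """Pr[Y <= k] for Y ~ bino(n, p), summing the pmf directly (MATLAB-style argument order)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k + 1))


def power(p: float, y_th: int = 4, n: int = 200) -> float:
    """Power function pi(p) = Pr[Y(p) <= y_th]: probability of announcing H1 (defaults match Problem 4)."""
    return binocdf(y_th, n, p)


if __name__ == "__main__":
    # (a): p-value = Pr[Y <= 4] evaluated at the H0 boundary p = 0.05
    print(f"p-value = {binocdf(4, 200, 0.05):.4f}")
    # (c): pi(p) on the grid p = 0:.001:.1 used in the MATLAB code; print every 10th point
    grid = [i / 1000 for i in range(101)]
    for p in grid[::10]:
        print(f"p = {p:.3f}   pi(p) = {power(p):.4f}")
```

Running the script prints the p-value for (a), which reproduces the 0.0264 quoted from p.347. It then prints a coarse table of $\pi(p)$, which decreases from 1 at $p = 0$ as $p$ grows.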