Exam 3 STAT305A Spring 2017 Due 4/27(R) Name_______________________________________________
PROBLEM 1 (30pts) You are charged with conducting an investigation in relation to herbicide pollution of IA lakes. Let
X denote the act of measuring the level of a certain chemical in any randomly chosen lake. Assume $X \sim N(\mu_X, \sigma_X)$, and that
the lakes to be chosen for testing are such that the data collection variables $\{X_k\}_{k=1}^{n}$ can be assumed mutually independent.
Let $(\hat{\mu}_X, \hat{\sigma}_X)$ denote the usual estimators of $(\mu_X, \sigma_X)$. Data collected on $n = 50$ lakes resulted in $(\hat{\mu}_X = 323, \hat{\sigma}_X = 69)$.
(a)(10pts) Compute the estimate of the 95% 2-sided confidence interval (CI) for each of $(\mu_X, \sigma_X)$. Show ALL steps.
Solution:
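One possible way to set up the two intervals in part (a) is sketched below. It assumes that $\hat{\sigma}_X$ is the usual $(n-1)$-normalized sample standard deviation, so that statistics (B) and (D) of Table 1 apply; the variable names are illustrative only.

```matlab
% Sketch for Problem 1(a): 95% two-sided CIs from the summary statistics.
% ASSUMPTION: sighat is the (n-1)-normalized sample std, so (B) and (D)
% of Table 1 are the relevant statistics.
n = 50; muhat = 323; sighat = 69; alpha = 0.05;

% CI for mu_X from T = (muhat - mu)/(sighat/sqrt(n)) ~ t_{n-1}
tcrit = tinv(1 - alpha/2, n - 1);
ci_mu = muhat + [-1 1]*tcrit*sighat/sqrt(n);

% CI for sigma_X from (n-1)*sighat^2/sigma^2 ~ chi2_{n-1}
chi_lo = chi2inv(alpha/2, n - 1);
chi_hi = chi2inv(1 - alpha/2, n - 1);
ci_sigma = sighat*sqrt((n - 1)./[chi_hi chi_lo]);

fprintf('95%% CI for mu_X:    [%.2f, %.2f]\n', ci_mu);
fprintf('95%% CI for sigma_X: [%.2f, %.2f]\n', ci_sigma);
```

Note that the larger chi-squared quantile produces the lower endpoint for $\sigma_X$, because the quantile sits in the denominator.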
(b)(5pts) Federal law requires that any state having $\mu_X > 300$ must develop a plan to rectify the situation. Conduct the test
$H_0: \mu_X = 300$ vs. $H_1: \mu_X > 300$ at a significance level $\alpha = 0.05$ to determine whether or not such a plan will be ordered.
Solution:
(c)(5pts) Find the p-value of the test in (b).
Solution:
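A minimal sketch of the test in (b) and its p-value in (c) follows. It assumes the alternative is one-sided, $H_1: \mu_X > 300$, and that the t statistic (B) of Table 1 is the intended one; both are readings of the problem rather than something it states outright.

```matlab
% Sketch for Problem 1(b)-(c): upper-tailed t test of mu_X = 300.
% ASSUMPTION: H1 is mu_X > 300 and statistic (B) of Table 1 is used.
n = 50; muhat = 323; sighat = 69; mu0 = 300; alpha = 0.05;

T = (muhat - mu0)/(sighat/sqrt(n));   % observed test statistic
tcrit = tinv(1 - alpha, n - 1);       % upper-tail critical value
pval = 1 - tcdf(T, n - 1);            % upper-tail p-value
announceH1 = T > tcrit;               % true means "announce H1"

fprintf('T = %.3f, tcrit = %.3f, p-value = %.4f, announce H1 = %d\n', ...
        T, tcrit, pval, announceH1);
```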
(d)(5pts) In view of (b-c), you should have found that a clean-up plan will be called for. Your company has been
contacted to submit a bid for the work. Before deciding whether you will bid on the project, you asked for, and received,
the data associated with the investigation. A careful look at it revealed that for the northern half of IA the results were
$(\hat{\mu}_1 = 316, \hat{\sigma}_1 = 52.01, n_1 = 33)$, and for the southern half they were $(\hat{\mu}_2 = 330, \hat{\sigma}_2 = 45.35, n_2 = 18)$. Test the hypotheses:
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$ for $\alpha = 0.05$.
Solution:
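The problem does not say which two-sample statistic is intended. As one hedged possibility, the sketch below uses the unequal-variance (Welch) t statistic with the Satterthwaite degrees of freedom, and treats the alternative as two-sided. A pooled-variance test would be an equally reasonable reading.

```matlab
% Sketch for Problem 1(d): two-sample test of mu_1 - mu_2 = 0.
% ASSUMPTION: Welch (unequal-variance) statistic, two-sided alternative.
mu1 = 316; s1 = 52.01; n1 = 33;   % northern half
mu2 = 330; s2 = 45.35; n2 = 18;   % southern half
alpha = 0.05;

v1 = s1^2/n1; v2 = s2^2/n2;       % variances of the two sample means
T = (mu1 - mu2)/sqrt(v1 + v2);    % observed test statistic
df = (v1 + v2)^2/(v1^2/(n1 - 1) + v2^2/(n2 - 1));  % Satterthwaite d.o.f.
pval = 2*(1 - tcdf(abs(T), df));  % two-sided p-value
announceH1 = pval < alpha;

fprintf('T = %.3f, df = %.1f, p-value = %.3f, announce H1 = %d\n', ...
        T, df, pval, announceH1);
```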
(e)(5pts) Test the hypotheses: $H_0: \sigma_1/\sigma_2 = 1$ vs. $H_1: \sigma_1/\sigma_2 \neq 1$ for $\alpha = 0.05$.
Solution:
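A short sketch of the variance-ratio test in (e), assuming statistic (F) of Table 1 (sample means used, so the degrees of freedom are $n_1 - 1$ and $n_2 - 1$) and a two-sided rejection region:

```matlab
% Sketch for Problem 1(e): F test of sigma_1/sigma_2 = 1.
% ASSUMPTION: statistic (F) of Table 1 with a two-sided rejection region.
s1 = 52.01; n1 = 33;   % northern half
s2 = 45.35; n2 = 18;   % southern half
alpha = 0.05;

F = s1^2/s2^2;                            % observed ratio under H0
Flo = finv(alpha/2, n1 - 1, n2 - 1);      % lower critical value
Fhi = finv(1 - alpha/2, n1 - 1, n2 - 1);  % upper critical value
announceH1 = (F < Flo) || (F > Fhi);

fprintf('F = %.4f, acceptance region = [%.3f, %.3f], announce H1 = %d\n', ...
        F, Flo, Fhi, announceH1);
```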
PROBLEM 2 (25pts) This problem addresses the relation between the weight of a package being air-shipped (X) to a
given location, and the amount of fuel used (Y). The data associated with $n = 100$ packages is included in the file named
wgtfueldata.txt located in the exam folder.

(a)(10pts) Consider the model: $\hat{Y}(x) = b_1 x + b_0$. Denote the associated model error as: $W(x) = Y(x) - \hat{Y}(x)$. Compute the
estimates $(\hat{b}_1, \hat{b}_0)$ using the method addressed in relation to linear
modeling. Then overlay your model on a scatter plot of the data.
Finally, obtain an estimate, $\hat{\sigma}_W$, of the error std. deviation.
Solution: [See code @ 2(a).]
Figure 2(a) Scatter plot and linear model.
(b)(8pts) In Lecture 19 the following fact was given:
FACT: For a given $\hat{\sigma}_x$ (resulting from a given x-data set): $T = (\hat{b}_1 - b_1)\,\hat{\sigma}_x \sqrt{n-2}\,/\,\hat{\sigma}_W \sim t_{n-2}$. [Miller & Miller p.395].
Use this fact to arrive at a 95% 2-sided CI for the slope, $b_1$.
Solution:

(c)(7pts) Formula (11-30) on p.447 gives the CI for $b_0$: $\hat{b}_0 - t_{\alpha/2,n-2}\,\hat{\sigma}_W\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} \le b_0 \le \hat{b}_0 + t_{\alpha/2,n-2}\,\hat{\sigma}_W\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$. Use this to
arrive at the CI for $b_0$. [Note: from (11-10) we have $S_{xx} = \sum_{k=1}^{n} (x_k - \bar{x})^2$.]
Solution: [See code @ 2(c).]
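The steps in Problem 2 (fit, error std. deviation, and the two CIs) can be sketched as follows. Since wgtfueldata.txt is not reproduced here, the sketch runs on synthetic stand-in data; the data-generating values are hypothetical. It also assumes the $(n-2)$-normalized $\hat{\sigma}_W$, under which the slope interval is $\hat{b}_1 \pm t_{\alpha/2,n-2}\,\hat{\sigma}_W/\sqrt{S_{xx}}$. This agrees with the FACT in (b) only if $\hat{\sigma}_x$ and $\hat{\sigma}_W$ there are the $1/n$-normalized versions, which is an assumption.

```matlab
% Sketch for Problem 2(a)-(c) on SYNTHETIC data (not the exam data file).
rng(1);                            % fixed seed for repeatability
n = 100;
x = 50*rand(n,1);                  % hypothetical weights (lb)
y = 0.2*x + 3 + 0.5*randn(n,1);    % hypothetical fuel (gal.): b1=0.2, b0=3

% (a) least-squares estimates and error std. deviation
xbar = mean(x);
Sxx = sum((x - xbar).^2);
b1hat = sum((x - xbar).*(y - mean(y)))/Sxx;
b0hat = mean(y) - b1hat*xbar;
yhat = b1hat*x + b0hat;
sigW = sqrt(sum((y - yhat).^2)/(n - 2));   % ASSUMES (n-2) normalization

% (b) 95% CI for the slope
tcrit = tinv(0.975, n - 2);
ci_b1 = b1hat + [-1 1]*tcrit*sigW/sqrt(Sxx);

% (c) 95% CI for the intercept, following the form of (11-30)
ci_b0 = b0hat + [-1 1]*tcrit*sigW*sqrt(1/n + xbar^2/Sxx);

fprintf('b1hat = %.4f, CI = [%.4f, %.4f]\n', b1hat, ci_b1);
fprintf('b0hat = %.4f, CI = [%.4f, %.4f]\n', b0hat, ci_b0);
```

With the real data, `x` and `y` would instead come from the columns of the loaded file.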

PROBLEM 3 (20pts) The sample mean $\hat{\mu}_X$ is the most popular of all statistics. A close second is the sample correlation
coefficient $\hat{\rho}$. It is not a 'pretty' statistic, as is evident in (11-43) on p.459. Let X = the act of measuring the temperature at
which a reaction is carried out, and let Y = the act of measuring the reaction rate.
[c.f. https://en.wikipedia.org/wiki/Reaction_rate ]

(a)(6pts) To test $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$ the appropriate test statistic is [see (11-46)]: $T_{n-2} = \hat{\rho}\sqrt{n-2}\,/\sqrt{1-\hat{\rho}^2}$. For
$n = 30$ samples of $(X, Y)$, the estimate was $\hat{\rho} = 0.248$. Conduct this test with a false alarm probability $\alpha = 0.05$.
Solution:
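A minimal sketch of the test in (a), assuming the two-sided alternative $\rho \neq 0$ and the $t_{n-2}$ reference distribution:

```matlab
% Sketch for Problem 3(a): t test of rho = 0 from the sample correlation.
% ASSUMPTION: two-sided alternative.
n = 30; rhohat = 0.248; alpha = 0.05;

T = rhohat*sqrt(n - 2)/sqrt(1 - rhohat^2);   % observed test statistic
tcrit = tinv(1 - alpha/2, n - 2);            % two-sided critical value
pval = 2*(1 - tcdf(abs(T), n - 2));          % two-sided p-value
announceH1 = abs(T) > tcrit;

fprintf('T = %.4f, tcrit = %.4f, p-value = %.3f, announce H1 = %d\n', ...
        T, tcrit, pval, announceH1);
```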
(b)(8pts) The code that resulted in the estimate in (a) is given in the
Appendix. Modify it to generate $nsim = 10^5$ simulations of $\hat{\rho}$. Then
use these to obtain a simulation-based pdf for $T_{n-2} = \hat{\rho}\sqrt{n-2}\,/\sqrt{1-\hat{\rho}^2}$, and
overlay the $t_{n-2}$ pdf on it. Comment on how they compare.
Solution: [See code @ 3(b).]

Figure 3(b) Plots of the simulation-based and theoretical pdfs for $T_{n-2}$.
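One way the Appendix code might be extended for (b) is sketched below. It reuses the truth-model parameters from the Appendix (with $\rho = 0$); the loop structure, the seed, and the use of `histogram` with pdf normalization are illustrative choices rather than the required approach.

```matlab
% Sketch for Problem 3(b): simulate rhohat under rho = 0 and compare the
% resulting T statistic with the t_{n-2} pdf.
rng(0);                               % fixed seed for repeatability
n = 30; nsim = 1e5;
Mu = [100 20];                        % truth-model means from the Appendix
C = [5^2 0; 0 2^2];                   % truth-model covariance (rho = 0)

rhat = zeros(nsim,1);
for k = 1:nsim
    xy = mvnrnd(Mu, C, n);            % one simulated data set of size n
    R = corrcoef(xy);
    rhat(k) = R(1,2);                 % sample correlation for this data set
end
That = rhat*sqrt(n - 2)./sqrt(1 - rhat.^2);

figure(30)
histogram(That, 'Normalization', 'pdf')   % simulation-based pdf
hold on
t = linspace(-5, 5, 401);
plot(t, tpdf(t, n - 2), 'r', 'LineWidth', 2)   % theoretical t_{n-2} pdf
title('Simulation-Based vs. Theoretical pdf of T_{n-2}')
xlabel('t'); ylabel('pdf'); grid
legend('simulation', 't_{n-2}')
```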

(c)(6pts) In the case where $\rho \neq 0$, the test statistic $T_{n-2} = \hat{\rho}\sqrt{n-2}\,/\sqrt{1-\hat{\rho}^2}$ no longer has a $t_{n-2}$ pdf. However, as noted on p.459, for
$n \ge 25$ the statistic $\hat{W} = \mathrm{atanh}(\hat{\rho}) \sim N(\mu_W, \sigma_W)$ where $\mu_W = \mathrm{atanh}(\rho)$ and $\sigma_W = 1/\sqrt{n-3}$. Use $\hat{W}$ and (A) in Table 1 to arrive
at the 95% 2-sided CI for $\rho$. [Note: You still have $n = 30$ and $\hat{\rho} = 0.248$.]
Solution:
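The interval in (c) can be sketched by building a normal-based CI for $\mu_W$ and mapping its endpoints back through `tanh`, the inverse of `atanh`. Variable names are illustrative.

```matlab
% Sketch for Problem 3(c): CI for rho via the atanh transform.
n = 30; rhohat = 0.248; alpha = 0.05;

What = atanh(rhohat);              % transformed estimate
sigW = 1/sqrt(n - 3);              % std. deviation of What
z = norminv(1 - alpha/2);          % standard normal quantile
ci_W = What + [-1 1]*z*sigW;       % CI for mu_W = atanh(rho)
ci_rho = tanh(ci_W);               % map endpoints back to rho

fprintf('95%% CI for rho: [%.3f, %.3f]\n', ci_rho);
```

Because `tanh` is increasing, the endpoints keep their order after the back-transformation.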
PROBLEM 4 (25pts) This problem addresses a situation where announcing $H_1$ does not cost anything. In fact, you can
profit by it. Example 9-10 on p.345: A semiconductor manufacturer claims that its defect rate does not exceed $p = 0.05$,
and that it demonstrates process capability at this level using $\alpha = 0.05$. A recent inspection of $n = 200$ devices found
only 4 defective ones. This corresponds to $\hat{p} = 0.02$. Management would like to use this result to convince potential
customers that its defect rate is actually lower than $p = 0.05$. To this end, consider the test
$H_0: p = 0.05$ vs. $H_1: p < 0.05$.

The decision rule is: If $\hat{p}$ is sufficiently smaller than 0.05, we will announce $H_1$, supporting the claim that the printed
maximum defect rate is actually lower than advertised. The authors carry out the test, first, assuming that the CLT holds
(i.e. they can use a normal test statistic). They then carry out the test using the fact that the number of defects
$Y = n\hat{p} \sim bino(n, p)$. We will focus on this latter approach.

(a)(5pts) The false alarm probability is $\Pr[\hat{p} \le p_{th}] = \Pr[Y \le y_{th}]$. Show that the p-value of the test is 0.0264 (as is given
at the top of p.347).
Solution:
(b)(5pts) Compute the Type-2 error for a true value $p = 0.025$.
Solution:
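Parts (a) and (b) can be sketched with `binocdf`. The threshold search below is one possible reading: it takes $y_{th}$ to be the largest count whose false alarm probability under $p = 0.05$ does not exceed $\alpha = 0.05$, and the Type-2 error as the probability of exceeding that count when $p = 0.025$.

```matlab
% Sketch for Problem 4(a)-(b): exact binomial p-value and Type-2 error.
n = 200; yobs = 4; p0 = 0.05; alpha = 0.05;

% (a) p-value: Pr[Y <= 4] when p = 0.05
pval = binocdf(yobs, n, p0);

% ASSUMPTION: y_th is the largest y with Pr[Y <= y | p0] <= alpha
cdf0 = binocdf(0:n, n, p0);
yth = find(cdf0 <= alpha, 1, 'last') - 1;   % minus 1 because y starts at 0

% (b) Type-2 error: announce H0 (Y > y_th) when the true p is 0.025
beta = 1 - binocdf(yth, n, 0.025);

fprintf('p-value = %.4f, y_th = %d, Type-2 error = %.4f\n', pval, yth, beta);
```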
(c)(10pts) Suppose that we now consider the hypotheses: $H_0: p \ge 0.025$ vs. $H_1: p < 0.025$. We chose the value 0.025
since our data defect proportion 0.02 will result in announcing $H_1$. Write $Y \sim bino(200, p) = Y(p)$. Then our false alarm
probability for this new test, $\alpha(p) = \Pr[Y(p) \le 4]$, is only valid for $p \ge 0.025$. Similarly, our Type-2 error probability
$\beta(p) = \Pr[Y(p) > 4]$ is only valid for $p < 0.025$. Since the Type-2 error is the event that we announce $H_0$ when $H_1$ is
true, the probability of announcing $H_1$ when it is, indeed, true is $1 - \beta(p)$. Here again, we note that this probability
is only valid for $p < 0.025$. Hence, the probability that we will announce $H_1$, whether or not it is true, is:
$\pi(p) = \begin{cases} \alpha(p) & \text{for } p \ge 0.025 \\ 1 - \beta(p) & \text{for } p < 0.025 \end{cases}$   (1)
This quantity is called the power function for our new test. Show that
$\pi(p) = binocdf(4,200,p)$. Then plot it over the range p = 0 : .001 : 0.1.
Solution:
Figure 4(c) Plot of $\pi(p)$ for $p_{th} = 0.02$.
(d)(5pts) (i) A random sample of 1000 students' statistics exam scores was drawn from the population of all possible
scores. The computed sample mean is the true population mean. TRUE / FALSE (circle your answer)
(ii) While trying to figure out the probability that the sample mean for a sample size n=10 from a population would exceed
a specified value, use of the Central Limit Theorem is usually justified. TRUE / FALSE (circle your answer)
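APPENDIX Table 1 and Your Matlab Code
Table 1. Some Handy-Dandy Test Statistics
For $X \sim N(x; \mu_X, \sigma_X^2)$ and associated iid data collection variables $\{X_k\}_{k=1}^{n}$:
(A): $Z = (\hat{\mu}_X - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$ ;
(B): $T = (\hat{\mu}_X - \mu_X)/\sqrt{\hat{\sigma}_X^2/n} \sim t_{n-1}$
(C): $n\hat{\sigma}_X^2/\sigma_X^2 \sim \chi^2_{n}$ when $\mu_X$ is used ;
(D): $(n-1)\hat{\sigma}_X^2/\sigma_X^2 \sim \chi^2_{n-1}$ when $\hat{\mu}_X$ is used.
(E): $F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1,n_2}$ when $\mu_X$ & $\mu_Y$ (the two true means) are used ;
(F): $F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1-1,n_2-1}$ when $\hat{\mu}_X$ & $\hat{\mu}_Y$ (the two sample means) are used.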
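%PROGRAM NAME: exam3.m (Spring 2017)
%PROBLEM 2: X=pkg wgt (lb) & Y=fuel used (gal.)
%(a):
load wgtfueldata.txt
xy=wgtfueldata;
x=xy(:,1); y=xy(:,2); %ASSUMES column 1 = weight and column 2 = fuel
b=polyfit(x,y,1); %least-squares line: b(1)=slope, b(2)=intercept
yhat=polyval(b,x); %model values at the data weights
figure(20)
plot(x,y,'*')
hold on
plot(x,yhat,'r','LineWidth',2)
title('Scatter Plot & Linear Model of Fuel vs. Weight')
xlabel('Weight (lbs)')
ylabel('gal.')
grid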
%(c):
%=======================================================
%PROBLEM 3: X=temperature(C) & Y=rate(moles/ltr)/sec
%Truth Model Parameters:
muX=100; muY=20;
Mu=[muX muY];
stdX=5; stdY=2;
C=[stdX^2 0 ; 0 stdY^2]; %Assumes rho=0
n=30; %Sample Size
xy=mvnrnd(Mu,C,n);
Rhat=corrcoef(xy);
rhat=Rhat(1,2);
figure(30)
%============================================
%PROBLEM 4
%(c):
p=0:.001:.1; np=length(p);
pwr=binocdf(4,200,p); %power function from 4(c): Pr[Y(p)<=4]
figure(40)
plot(p,pwr)
title('Power Function for p_t_h=0.025')
xlabel('p')
ylabel('power')
grid