AMS 572 Lecture Notes #6
September 24, 2013
<Today’s Topics>
1. Power Calculation (Inference on one population mean)
2. Sample Size Determination
3. How to do it in SAS & R
4. Inference on one population variance
⑧ Sample Size Determination
Sample size determination in the hypothesis test scenario
1. $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$ (with $\mu = \mu_a > \mu_0$)
1st scenario, Normal population, $\sigma^2$ is known.
$$1 - \beta = P\left(Z \ge Z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a\right), \quad Z \sim N(0,1)$$
$$\Rightarrow -Z_\beta = Z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{(Z_\alpha + Z_\beta)\,\sigma}{\mu_a - \mu_0} \;\Rightarrow\; n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}$$
2. $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$ (with $\mu = \mu_a < \mu_0$)
1st scenario, Normal population, $\sigma^2$ is known.
$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$$
(*the same result as in Scenario 1; in summary, the formula is identical for the one-sided tests.)
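3. $H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a \ne \mu_0$
1st scenario, Normal population, $\sigma^2$ is known.
$$1 - \beta = P\left(Z \ge Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a\right) + P\left(Z \le -Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a\right)$$
(Assume $\mu_a > \mu_0$. Then $P\left(Z \le -Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a\right) \approx 0$. So, we can neglect it.)
$$1 - \beta \approx P\left(Z \ge Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a\right)$$
$$\Rightarrow -Z_\beta = Z_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \;\Rightarrow\; \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} = Z_{\alpha/2} + Z_\beta \;\Rightarrow\; n = \frac{(Z_{\alpha/2} + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}$$
(* in summary, this hand-calculated formula for the two-sided test
differs from that for the one-sided tests in two aspects:
(1) α is replaced by α/2;
(2) it is an approximate formula, not an exact formula.)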
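The three sample size formulas above can be checked numerically. The following is a minimal Python sketch (Python is used only for illustration here; the notes themselves work in SAS and R), assuming the standard library's `statistics.NormalDist` for normal quantiles. The function names `sample_size` and `power_two_sided` are made up for this example, and the inputs (μ0 = 16, μa = 16.13, σ = 0.4) are the ones used in Ex4 of these notes. Because exact quantiles are used instead of rounded table values, results can differ slightly from a hand calculation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def sample_size(mu0, mua, sigma, alpha, beta, two_sided=False):
    """Smallest integer n with n >= (Z_a + Z_b)^2 * sigma^2 / (mua - mu0)^2.

    For a two-sided test, alpha is replaced by alpha/2 (approximate formula).
    """
    tail = alpha / 2 if two_sided else alpha
    z_alpha = STD_NORMAL.inv_cdf(1 - tail)  # upper-tail critical value
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    n = ((z_alpha + z_beta) * sigma / (mua - mu0)) ** 2
    return math.ceil(n)


def power_two_sided(mu0, mua, sigma, alpha, n):
    """Exact power of the two-sided z-test, keeping both tail terms."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mua - mu0) / (sigma / math.sqrt(n))
    upper = 1 - STD_NORMAL.cdf(z - shift)
    lower = STD_NORMAL.cdf(-z - shift)  # the term neglected in the derivation
    return upper + lower


n_one = sample_size(16, 16.13, 0.4, alpha=0.05, beta=0.2)
n_two = sample_size(16, 16.13, 0.4, alpha=0.05, beta=0.2, two_sided=True)
print("one-sided n:", n_one, "two-sided n:", n_two)
print("exact two-sided power at that n:", power_two_sided(16, 16.13, 0.4, 0.05, n_two))
```

Evaluating `power_two_sided` at the n returned by the approximate formula is one way to see how little the neglected tail term matters in practice.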
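Sample size determination in the CI scenario
1st scenario, Normal population, $\sigma^2$ is known.
P.Q. $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
100(1 − α)% CI for μ: $\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
The length of the CI is
$$L = \left(\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 2\,Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
$$\Rightarrow n = \left(\frac{2\,Z_{\alpha/2}\,\sigma}{L}\right)^2$$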
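Sample size determination based on the maximum error E
1st scenario, Normal population, $\sigma^2$ is known.
$$P(|\bar{X} - \mu| \le E) = 1 - \alpha$$
$$\Leftrightarrow P(-E \le \bar{X} - \mu \le E) = 1 - \alpha$$
$$\Leftrightarrow P\left(-\frac{E}{\sigma/\sqrt{n}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{E}{\sigma/\sqrt{n}}\right) = 1 - \alpha$$
$$\Rightarrow \frac{E}{\sigma/\sqrt{n}} = Z_{\alpha/2} \;\Rightarrow\; n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$
Compare the above formula to:
$$n = \left(\frac{2\,Z_{\alpha/2}\,\sigma}{L}\right)^2$$
For a given α, L = 2E. One can prove this easily.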
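The two estimation-based formulas can be compared side by side. The sketch below is illustrative Python, not part of the original notes; the function names `n_for_max_error` and `n_for_ci_length` are hypothetical, and the inputs (σ = 150, E = 20, 95% confidence) are borrowed from Ex3 of these notes. It assumes `statistics.NormalDist` from the standard library for the quantile $Z_{\alpha/2}$.

```python
import math
from statistics import NormalDist


def n_for_max_error(sigma, max_error, alpha):
    """n = (Z_{alpha/2} * sigma / E)^2, rounded up to an integer."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / max_error) ** 2)


def n_for_ci_length(sigma, length, alpha):
    """n = (2 * Z_{alpha/2} * sigma / L)^2, rounded up to an integer."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z * sigma / length) ** 2)


n_e = n_for_max_error(sigma=150, max_error=20, alpha=0.05)
n_l = n_for_ci_length(sigma=150, length=40, alpha=0.05)  # L = 2E
print(n_e, n_l)
```

Calling the second function with L = 2E gives the same n as the first, which is the relationship L = 2E stated above.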
⑨ Do it in SAS
For the inference on one population mean, three procedures are most relevant:
Proc means;
Proc univariate; (we studied this in lecture 5)
Proc ttest; (we will study how to use this today, as it is a more recent SAS procedure)
We start, however, by reviewing how to enter data into SAS from the keyboard.
data one ;
input ID $ weight ;
X = weight - 100 ;
datalines ;
P1 100
P2 93
P3 88
…
P37 105
;
run ;
Alternative data entry procedures in SAS:
data two ;
input ID $ weight @@ ;
datalines ;
P1 20 p2 37 p3 47
P4 34 …
…
;
run ;
*** infile ; (used to read data that are already stored in other files, e.g. Excel files)
proc univariate data=one normal ;
var X ;
run ;
Normality test: Shapiro-Wilk Test
H0: population is normal / Ha: population is NOT normal
If normality is not rejected ⇒ t-test / z-test
Otherwise ⇒ non-parametric test: Sign Test / Signed Rank Test
Alternative test procedures for one population mean in SAS:
proc means data=one t prt ;
var X ;
run ;
prt : p-value of the t-test of $H_0: \mu = 0$ versus $H_a: \mu \ne 0$
proc ttest ;
ttest : 1 population t-test / 2 populations t-test (paired and independent)
Proc ttest can directly test:
$H_0: \mu = 100$ versus $H_a: \mu \ne 100$
Ex1) The seven scores listed below are axial loads (in pounds) for a random sample of 7
12-oz aluminum cans manufactured by ALUMCO. An axial load of a can is the
maximum weight supported by its sides, and it must be greater than 165 pounds, because
that is the maximum pressure applied when the top lid is pressed into place.
270, 273, 258, 204, 254, 228, 282
(1) As the quality control manager, please test the claim of the engineering supervisor
that the average axial load is greater than 165 pounds. Use α = 0.05. What assumptions
are needed for your test?
(2) Please write a SAS program to do part (1).
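Sol)
(1) $H_0: \mu = 165$ versus $H_a: \mu > 165$
Assume the distribution is normal.
$$T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$$
At the significance level of α = 0.05, we reject $H_0$ in favor of $H_a$ if $T_0 \ge t_{n-1,\alpha}$.
$$T_0 = \frac{87.7}{27.6/\sqrt{7}} \approx 8.4 > 1.943 = t_{6,\,0.05}$$
We reject $H_0$.
CI: $\bar{X} \pm t_{n-1,\alpha/2}\,\dfrac{S}{\sqrt{n}}$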
(2)
data cans ;
input pressure @@ ;
newvar = pressure - 165;
datalines ;
270 273 258 204 254 228 282
;
run ;
proc univariate data=cans normal ;
var newvar ;
run ;
Alternatively, we can use the proc ttest procedure as follows:
Proc ttest data=cans h0=165 sides=u alpha = 0.05;
Var pressure;
Run;
*** Note: Here we can also perform the one-sided test.***
Please see the following site for more examples and explanations:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_ttest_a0000000115.htm
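The hand calculation in part (1) of Ex1 can be cross-checked outside SAS. This is a minimal Python sketch using only the standard library; since the standard library has no t quantile function, the critical value 1.943 is taken from the solution above rather than computed, and the variable names are illustrative.

```python
import math
import statistics

loads = [270, 273, 258, 204, 254, 228, 282]  # axial loads from Ex1
mu0 = 165
t_crit = 1.943  # t_{6, 0.05}, the table value quoted in the solution

n = len(loads)
xbar = statistics.mean(loads)
s = statistics.stdev(loads)  # sample standard deviation (n - 1 divisor)
t0 = (xbar - mu0) / (s / math.sqrt(n))
reject_h0 = t0 >= t_crit  # upper one-sided rejection rule

print(f"xbar = {xbar:.1f}, s = {s:.1f}, T0 = {t0:.2f}, reject H0: {reject_h0}")
```

With these data the statistic comes out at roughly 8.4, far above the critical value, so the conclusion to reject $H_0$ does not hinge on rounding.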
Ex2) To determine whether glaucoma affects the corneal thickness, measurements were
made in 8 people affected by glaucoma in one eye but not in the other. The corneal
thickness (in microns) were as follows:
Person   Eye affected   Eye not affected
1        488            484
2        478            478
3        480            492
4        426            444
5        440            436
6        410            398
7        458            464
8        460            476
(a) According to the data, can you conclude, at the significance level of 0.10, that the
corneal thickness is not equal for affected versus unaffected eyes? Please first
derive the general formula for the test for the given scenario based on a sample
size of n and a significance level of α.
(b) Calculate a 90% confidence interval for the mean difference in thickness. Please
first derive the general formula for the confidence interval for the given
scenario based on a sample size of n and a confidence level of 1-α.
(c) Please write the entire SAS code to check the assumptions necessary in (a) and to
perform the test asked for in (a). (*Part C was not given as part of the quiz today.)
Note: For the general derivation of (a) and (b), please refer to your lecture notes #4.
Solution: (a) Using $\bar{d} = -4$ and $s_d = 10.744$, the test statistic is
$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-4 - 0}{10.744/\sqrt{8}} = -1.053$$
Since $|t| < t_{8-1,\,0.05} = 1.895$, we can NOT reject $H_0$ at α = 0.10. That is, we do NOT have
enough evidence to support the claim that the average corneal thicknesses are affected by
glaucoma.
(b) A 90% CI for $\mu_1 - \mu_2$ is given by
$$\bar{d} \pm t_{n-1,\alpha/2}\,\frac{s_d}{\sqrt{n}} = -4 \pm 1.895 \times \frac{10.744}{\sqrt{8}}$$
That is, [−11.198, 3.198].
(c) The SAS code is as follows.
Data eyes;
Input bad good;
Diff=bad-good;
Datalines;
488 484
478 478
480 492
426 444
440 436
410 398
458 464
460 476
;
Run;
Proc univariate data = eyes normal;
Var diff;
Run;
Alternatively, we can use the proc ttest procedure as follows:
Proc ttest data=eyes alpha = 0.1;
Paired bad*good;
Run;
*** Note: Here we can also obtain the CI and choose the confidence level.***
Please see the following site for more examples & options (see especially the plots):
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_ttest_sect011.htm
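The paired analysis in Ex2 reduces to a one-sample t procedure on the differences, which can be reproduced in a few lines. This is an illustrative Python sketch using only the standard library; the critical value 1.895 is the table value quoted in the solution, not something the code computes, and the variable names are made up for the example.

```python
import math
import statistics

affected = [488, 478, 480, 426, 440, 410, 458, 460]
unaffected = [484, 478, 492, 444, 436, 398, 464, 476]
t_crit = 1.895  # t_{7, 0.05}, the table value quoted in the solution

diffs = [a - u for a, u in zip(affected, unaffected)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
se = s_d / math.sqrt(n)

t_stat = d_bar / se                   # tests H0: mean difference = 0
reject_h0 = abs(t_stat) >= t_crit     # two-sided test at alpha = 0.10
ci = (d_bar - t_crit * se, d_bar + t_crit * se)  # 90% CI for the mean difference

print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"90% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The interval contains 0, which agrees with the failure to reject $H_0$ at the 0.10 level.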
Ex3) Jerry is planning to purchase a sports goods store. He calculated that in order to
make a profit, the average daily sales must exceed $525. He randomly sampled 36 days and
found X̄ = $565 and S = $150.
1) In order to estimate the average daily sales to within $20 with 95% reliability,
how many days should Jerry sample?
2) If the true average daily sales is $530, what is the power of Jerry’s test at the
significance level of 0.05?
3) Suppose μ = $530. In order to guarantee α = 0.05 and β = 0.2, how many days
should Jerry sample?
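Sol) 1) $P(|\bar{X} - \mu| \le E) = 1 - \alpha$, with $S$ used in place of $\sigma$:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 150}{20}\right)^2 = 216.09, \text{ so take } n = 217.$$
2) $H_0: \mu = \mu_0 = 525$ versus $H_a: \mu = \mu_a = 530 > \mu_0$
$$\text{Power} = P(\text{Reject } H_0 \mid H_a) = P(Z_0 \ge Z_\alpha \mid \mu = \mu_a)$$
$$= P\left(\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge Z_\alpha \,\middle|\, \mu = \mu_a\right)$$
$$= P\left(\frac{\bar{X} - \mu_a}{S/\sqrt{n}} \ge Z_\alpha - \frac{\mu_a - \mu_0}{S/\sqrt{n}} \,\middle|\, \mu = \mu_a\right)$$
$$\approx P\left(Z \ge 1.645 - \frac{530 - 525}{150/\sqrt{36}}\right) = P(Z \ge 1.445) \approx \frac{0.0749 + 0.0735}{2} = 0.0742$$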
3) α = 0.05, β = 0.2, μ = 530 = $\mu_a$
$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 0.845)^2 \times 150^2}{(530 - 525)^2} = 5580.09, \text{ so take } n = 5581.$$
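Parts 2) and 3) of Ex3 are linked: the sample size in part 3) is simply the n at which the power function from part 2) reaches 0.8. The Python sketch below illustrates this; it is not part of the original notes, the function name `power_upper` is hypothetical, and it follows the solution in letting S = 150 stand in for σ. It uses exact normal quantiles from `statistics.NormalDist`, so values differ slightly from the table-based hand calculation.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power_upper(mu0, mua, sd, alpha, n):
    """Power of the upper one-sided z-test:
    P(Z >= Z_alpha - (mua - mu0) / (sd / sqrt(n)))."""
    z_alpha = Z.inv_cdf(1 - alpha)
    shift = (mua - mu0) / (sd / math.sqrt(n))
    return 1 - Z.cdf(z_alpha - shift)


mu0, mua, s, alpha = 525, 530, 150, 0.05

power_36 = power_upper(mu0, mua, s, alpha, n=36)      # part 2: Jerry's 36 days
power_5581 = power_upper(mu0, mua, s, alpha, n=5581)  # part 3: n from the formula
print(round(power_36, 4), round(power_5581, 4))
```

The contrast between the two values shows why so many days are needed: a $5 shift is tiny relative to a standard deviation of $150.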
Ex4) John Pauzke, president of Cereal’s Unlimited Inc, wants to be very certain that the
mean weight μ of packages satisfies the package label weight of 16 ounces. The
packages are filled by a machine that is set to fill each package to a specified weight.
However, the machine has random variability measured by σ². John would like to have
strong evidence that the mean package weight is above 16 oz. George Williams, quality
control manager, advises him to examine a random sample of 25 packages of cereal.
From his past experience, George knew that the weight of the packages follows a normal
distribution with standard deviation 0.4 oz. At the significance level α = 0.05,
1) What is the decision rule (rejection region) in terms of the sample mean X̄?
2) What is the power of the test when μ = 16.13 oz?
3) How many packages of cereal should be sampled if we wish to achieve a power
of 80% when μ = 16.13 oz?
Sol) Let X denote the weight of a randomly selected package of cereal; then
$X \sim N(\mu, \sigma = 0.4)$, with $\mu = 16$ under $H_0$.
$H_0: \mu = 16$ versus $H_a: \mu > 16$
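1) Test Statistic:
$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{H_0}{\sim} N(0,1) \quad \text{if } \mu = \mu_0 = 16$$
$$\alpha = P(Z_0 \ge c \mid H_0) \;\Rightarrow\; c = Z_\alpha$$
We reject $H_0$ at α = 0.05 if
$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge Z_\alpha \;\Leftrightarrow\; \bar{X} \ge \mu_0 + Z_\alpha\,\frac{\sigma}{\sqrt{n}} = 16 + 1.645 \times \frac{0.4}{\sqrt{25}} = 16.1316 \text{ (oz)}$$
2) $H_0: \mu = \mu_0 = 16$ versus $H_a: \mu = \mu_a = 16.13 > \mu_0$ (n = 25)
$$\text{Power} = P(\text{Reject } H_0 \mid H_a) = P(Z_0 \ge Z_\alpha \mid \mu = \mu_a)$$
$$= P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge Z_\alpha \,\middle|\, \mu = \mu_a\right)$$
$$= P\left(\frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \ge Z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_a\right)$$
$$= P\left(Z \ge 1.645 - \frac{16.13 - 16}{0.4/\sqrt{25}}\right) = P(Z \ge 0.02) \approx 0.49$$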
3) α = 0.05, β = 0.2, $\mu_0 = 16$, $\mu_a = 16.13$, σ = 0.4
$$n = \frac{(Z_\alpha + Z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 0.845)^2 (0.4)^2}{(16.13 - 16)^2} \approx 59$$
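Parts 1) and 2) of Ex4 can also be worked directly on the scale of the sample mean, which makes the link between the rejection region and the power explicit. The following Python sketch is illustrative only; it assumes `statistics.NormalDist` from the standard library, and the variable names are chosen for this example.

```python
import math
from statistics import NormalDist

mu0, mua, sigma, n, alpha = 16.0, 16.13, 0.4, 25, 0.05

se = sigma / math.sqrt(n)                  # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645

# 1) Rejection region: reject H0 when the sample mean is at least this value.
xbar_cutoff = mu0 + z_alpha * se

# 2) Power: when mu = mua, the sample mean follows N(mua, se^2),
#    so the power is the probability that it lands above the cutoff.
power = 1 - NormalDist(mu=mua, sigma=se).cdf(xbar_cutoff)

print(round(xbar_cutoff, 4), round(power, 3))
```

Because the alternative 16.13 sits almost exactly at the cutoff, the power is close to one half, which is why part 3) calls for a larger sample.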
⑩ Inference on one population variance
• When the population is normal
Data: $X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$
$$W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
W is the Pivotal Quantity for the inference on $\sigma^2$.
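1. CI for $\sigma^2$
$$P\left(\chi^2_{n-1,\alpha/2,L} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2,U}\right) = 1 - \alpha$$
$$\Rightarrow P\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2,U}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2,L}}\right) = 1 - \alpha$$
$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2,U}},\; \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2,L}}\right) : \; 100(1-\alpha)\% \text{ CI for } \sigma^2$$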
2. Test on $\sigma^2$
$H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 > \sigma_0^2$
$E(S^2) = \sigma^2$
Test statistic:
$$W_0 = \frac{(n-1)S^2}{\sigma_0^2} \overset{H_0}{\sim} \chi^2_{n-1}$$
At the significance level α, we reject $H_0$ if $W_0 \ge \chi^2_{n-1,\alpha,U}$.
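The variance test can be tried on the Ex1 data as a quick illustration. In the Python sketch below, the null value σ0² = 400 is purely hypothetical (it does not come from the notes), and the helper `chi2_sf_even_df` is written for this example. It relies on the closed-form upper-tail probability of a chi-square distribution that holds only when the degrees of freedom are even, which happens to apply here because n = 7 gives 6 degrees of freedom. For general degrees of freedom a statistics package would be needed.

```python
import math
import statistics


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(W >= x) for a chi-square with even df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


loads = [270, 273, 258, 204, 254, 228, 282]  # Ex1 data, n = 7 so df = 6
sigma0_sq = 400.0  # hypothetical null value, chosen only for illustration

n = len(loads)
w0 = (n - 1) * statistics.variance(loads) / sigma0_sq  # test statistic W0
p_value = chi2_sf_even_df(w0, n - 1)                   # upper-tailed p-value
print(round(w0, 3), round(p_value, 4))
```

Rejecting when the p-value is at most α is equivalent to the rule $W_0 \ge \chi^2_{n-1,\alpha,U}$ stated above.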
Question: What if the population is NOT normal?
Answer:
• If you know the population distribution, you can do the LR test (likelihood ratio test).
• If the population distribution is unknown, you can try the Box-Cox normal
transformation, or apply non-parametric procedures such as Bootstrap resampling.