AMS312.01 Lecture notes, April 14, 2008, Prof. Wei Zhu

1. Jerry is planning to purchase a sporting goods store. He calculated that in order to
cover basic expenses average daily sales must be at least $525. He checked the daily sales
of 36 randomly selected business days. And he found that the average daily sale for these
days is $565 with a standard deviation of $150.
(a) At the significance level α = 0.05, can Jerry conclude that the average daily sale is
higher than $525? What is the p-value?
(b) In order to estimate the average daily sale of the store to within $20 with 95%
reliability, how many days should Jerry sample? Please derive the general formula first.
(c) Suppose Jerry could only check the daily sales of 9 randomly selected business days
(instead of 36), and the daily sales for these 9 days are 510, 537, 548, 592, 503, 490,
601, 499 and 640 respectively. At the significance level α = 0.05, can Jerry
conclude that the average daily sale is higher than $525? What assumptions are necessary
for your test? What is the p-value of your test?
Solution. Inference on one population mean, with n = 36. The population variance σ² is unknown.
If the raw data are available, run a normality test (e.g., the Shapiro-Wilk test) to see whether
the sample comes from a normal distribution. If the population is normal, we use the t-distribution.
If it is not normal but the sample size is large (n ≥ 30), the pivotal quantity is
\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0,1) \]
approximately (by the Central Limit Theorem and Slutsky's Theorem).
(a) \( \bar{X} = 565 \), \( S = 150 \).
\[ H_0: \mu = \mu_0 = 525 \quad \text{versus} \quad H_a: \mu > \mu_0 \]
Note: If \( \bar{X} = 505 \) (< 525), then you should notice that \( H_a \) is not suitable.
Test statistic:
\[ Z_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{565 - 525}{150/6} = 1.6 \]
At the significance level α, we reject \( H_0 \) if \( Z_0 \ge z_\alpha \). Here \( z_\alpha = z_{0.05} = 1.645 \). Since
\( z_0 = 1.6 < 1.645 = z_\alpha \), we cannot reject \( H_0 \).
P-value \( = P(Z \ge z_0 \mid H_0) = P(Z \ge 1.6) = 0.0548 > \alpha \). We cannot reject \( H_0 \).
Note: P-value \( = P(\bar{X} \ge 565 \mid H_0) = P(Z_0 \ge 1.6 \mid H_0) \).
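The arithmetic in part (a) can be checked with a short script. This is only a sketch using the Python standard library; the function name `upper_tail_z_test` and its argument names are illustrative and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, s, n, alpha=0.05):
    """Large-sample test of H0: mu = mu0 against Ha: mu > mu0.

    Uses the sample standard deviation s in place of sigma, which is
    justified for large n by the CLT and Slutsky's theorem.
    """
    z0 = (xbar - mu0) / (s / sqrt(n))            # observed test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper-alpha critical value
    p_value = 1 - NormalDist().cdf(z0)           # P(Z >= z0) under H0
    return z0, z_alpha, p_value, z0 >= z_alpha


z0, z_alpha, p_value, reject = upper_tail_z_test(565, 525, 150, 36)
print(f"z0 = {z0:.3f}, z_alpha = {z_alpha:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With Jerry's numbers the statistic is 1.6, the p-value is about 0.0548, and the null hypothesis is not rejected, in agreement with the hand calculation.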
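(b) First we derive the general formula. We want the estimate to be within E of μ with probability 1 − α:
\[ P(|\bar{X} - \mu| \le E) = 1 - \alpha \]
\[ P(-E \le \bar{X} - \mu \le E) = 1 - \alpha \]
\[ P\left( \frac{-E}{\sigma/\sqrt{n}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{E}{\sigma/\sqrt{n}} \right) = 1 - \alpha \]
\[ \frac{E}{\sigma/\sqrt{n}} = z_{\alpha/2} \;\Rightarrow\; n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2 \]
Next we plug in the values to obtain the answer for the given problem, using S = 150 as the estimate of σ:
\[ n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2 = \left( \frac{1.96 \times 150}{20} \right)^2 = 216.09 \]
Rounding up, Jerry should sample 217 days.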
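The sample size formula from part (b) is easy to wrap in a helper. The sketch below assumes σ is replaced by the sample standard deviation (150) as in the notes; `sample_size_for_margin` is a hypothetical name chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    n_exact = (z * sigma / margin) ** 2
    return ceil(n_exact)                      # always round up


print(sample_size_for_margin(sigma=150, margin=20))
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.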
(c) Suppose that, from the Shapiro-Wilk test, we know the sample is from a normal
population.
\[ H_0: \mu = \mu_0 = 525 \quad \text{versus} \quad H_a: \mu > \mu_0 \]
\( \bar{X} = 546.67 \), \( S = 53.09 \).
Test statistic:
\[ T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{546.67 - 525}{53.09/3} = 1.22 \]
At the significance level α, we reject \( H_0 \) if \( T_0 \ge t_{n-1,\alpha} = t_{8,0.05} = 1.860 \). Since
\( T_0 = 1.22 < t_{8,0.05} = 1.860 \), we cannot reject \( H_0 \).
P-value \( = P(T \ge t_0 \mid H_0) = P(T \ge 1.22) > 0.1 \), where \( T \sim t_8 \). Since P-value > α = 0.05, we cannot reject \( H_0 \).
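The summary statistics and the t statistic in part (c) can be recomputed from the nine observations. The sketch below uses only the standard library, which has no t-distribution, so the critical value 1.860 is taken from the t-table as quoted in the notes; `one_sample_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

# Daily sales for the 9 sampled days (from the problem statement).
sales = [510, 537, 548, 592, 503, 490, 601, 499, 640]


def one_sample_t(data, mu0):
    """Return (sample mean, sample sd, t statistic) for H0: mu = mu0."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                      # divisor n - 1
    t0 = (xbar - mu0) / (s / sqrt(n))
    return xbar, s, t0


T_CRIT = 1.860  # t_{8, 0.05} from a t-table, as quoted in the notes

xbar, s, t0 = one_sample_t(sales, mu0=525)
print(f"mean = {xbar:.2f}, sd = {s:.2f}, t0 = {t0:.2f}, reject H0: {t0 >= T_CRIT}")
```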
2. John Pauzke, president of Cereals Unlimited, Inc., wants to be very certain that the
mean weight μ of packages satisfies the package label weight of 16 ounces. The packages
are filled by a machine that is set to fill each package to a specified weight. However, the
machine has random variability measured by σ². John would like to have strong evidence
that the mean package weight is above 16 ounces. George Williams, quality control
manager, advises him to examine a random sample of 25 packages of cereal. From his
past experience, George knew that the weight of the cereal packages follows a normal
distribution with standard deviation 0.4 ounce. At the significance level α =.05,
(a) What is the decision rule (rejection region) in terms of the sample mean x̄? Please
derive the general formula using the concept of Type I error rate.
(b) What is the power of the test when μ =16.2 ounces? Please derive the general formula
for power calculation first.
(c) What is the sample size necessary to ensure a power of 80% when μ =16.2 ounces?
Please derive the general formula for sample size calculation based on the Type I and II
error rates first.
Solution:
(a) Given that the population is normal and the population variance is known, first we derive
the general formula for the hypotheses
\[ H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu > \mu_0 \]
* The pivotal quantity for μ:
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
* Test statistic (plug \( H_0: \mu = \mu_0 \) into Z):
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{under } H_0 \]
* Now we derive the decision threshold (rejection region) for our test based on the
Type I error rate, that is, the significance level α:
\[ \alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0) = P(Z_0 \ge c \mid H_0: \mu = \mu_0) \]
Given \( H_0 \), \( Z_0 \) follows exactly N(0, 1), therefore \( c = z_\alpha \).
Thus we reject \( H_0 \) (\( \mu = \mu_0 \)) in favor of \( H_a \) (\( \mu > \mu_0 \)) if \( Z_0 \ge z_\alpha \); that is, if
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge z_\alpha \;\Longleftrightarrow\; \bar{X} \ge \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}} \]
For the given problem, we have \( \mu_0 = 16 \), \( \sigma = 0.4 \), \( n = 25 \), \( \alpha = 0.05 \). Thus we
reject \( H_0 \) (\( \mu = 16 \)) in favor of \( H_a \) (\( \mu > 16 \)) if
\[ \bar{X} \ge 16 + \frac{0.4}{5}\, z_{0.05} = 16 + \frac{0.4}{5}(1.645) = 16.1316 \]
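The decision rule derived in part (a) translates directly into a one-line computation. The following is a sketch under the problem's assumptions (normal population, known σ); `rejection_threshold` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def rejection_threshold(mu0, sigma, n, alpha=0.05):
    """Smallest sample mean that rejects H0: mu = mu0 in favor of Ha: mu > mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)


threshold = rejection_threshold(mu0=16, sigma=0.4, n=25)
print(f"Reject H0 when the sample mean is at least {threshold:.4f} ounces")
```

For the cereal problem this gives a threshold of about 16.1316 ounces.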
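(b)
\[ H_0: \mu = 16 = \mu_0 \quad \text{versus} \quad H_a: \mu = 16.2 = \mu_a > 16 \]
If \( \mu = \mu_a \), then
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N\left( \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}},\, 1 \right) \]
Therefore
\[ \text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_a) = P(Z_0 \ge z_\alpha \mid \mu = \mu_a) \]
\[ = P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge z_\alpha \,\Big|\, \mu = \mu_a \right) \]
\[ = P\left( \frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \ge z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_a \right) \]
\[ = P\left( Z \ge z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \right), \quad Z \sim N(0,1) \]
Please see Figure 1 for an illustration.
Figure 1. Power calculation and test related sample size calculation.
For the given problem,
\[ z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} = \frac{16.1316 - 16.2}{0.4/5} = -0.855 \]
Thus Power \( = P(Z \ge -0.855) \approx 0.804 \).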
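The power formula above can be evaluated numerically. This is a minimal sketch with the standard library; the function name `power_upper_tail` and its parameter names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu0, mua, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test of H0: mu = mu0 when the true mean is mua."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mua - mu0) / (sigma / sqrt(n))   # mean of Z0 under Ha
    return 1 - NormalDist().cdf(z_alpha - shift)


print(f"{power_upper_tail(mu0=16, mua=16.2, sigma=0.4, n=25):.3f}")
```

For the cereal problem the shift is 2.5 standard errors, so the power is P(Z ≥ 1.645 − 2.5), roughly 0.80.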
(c)
\[ H_0: \mu = 16 = \mu_0 \quad \text{versus} \quad H_a: \mu = 16.2 = \mu_a > 16 \]
Following Figure 1, we derive the sample size necessary by solving the equation:
\[ -z_\beta = z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} \]
The resulting general formula is
\[ n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} \]
For the given problem: power = 0.8, so β = 0.2, σ = 0.4, α = 0.05, with \( z_{0.05} = 1.645 \) and \( z_{0.2} = 0.842 \). Therefore we have
\[ n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 0.842)^2 \times 0.4^2}{(16.2 - 16)^2} = \frac{6.1852 \times 0.16}{0.04} = 24.74 \]
Rounding up gives n = 25. Hence, about 25 packages of cereal should be sampled to achieve a power of 80% when
μ = 16.2 ounces.
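The sample size formula for a target power can be checked the same way. The sketch below is an illustration only; `sample_size_for_power` is a hypothetical name, and the quantiles come from the standard library rather than from a printed table, so intermediate digits may differ slightly from a hand calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_power(mu0, mua, sigma, alpha=0.05, power=0.80):
    """Return (exact n, n rounded up) for an upper-tailed z-test with known sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)      # z_beta with beta = 1 - power
    n_exact = ((z_alpha + z_beta) * sigma / (mua - mu0)) ** 2
    return n_exact, ceil(n_exact)


n_exact, n_needed = sample_size_for_power(mu0=16, mua=16.2, sigma=0.4)
print(f"exact n = {n_exact:.2f}, sample at least {n_needed} packages")
```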
3. Consider a random sample from any population for which the mean and variance exist.
Please prove that
(a) The sample mean and sample variance are unbiased estimators of the population
mean and variance respectively.
(b) When the population is normal, we have learned that the sample mean and the
sample variance are, indeed, independent. Please prove this for n = 2, that is, for
a random sample of size 2 only.
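Solution:
(a)
\[ E[\bar{X}] = E\left[ \frac{\sum X_i}{n} \right] = \frac{1}{n} \sum E[X_i] = \frac{1}{n} \cdot n\mu = \mu \]
\[ E[S^2] = E\left[ \frac{\sum (X_i - \bar{X})^2}{n-1} \right] = \frac{1}{n-1} \left( \sum E[X_i^2] - n E[\bar{X}^2] \right) \]
\[ = \frac{1}{n-1} \left[ n(\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) \right] = \sigma^2 \]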
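The unbiasedness result in part (a) can be verified exactly on a small example by enumerating every equally likely i.i.d. sample. The population `[1, 2, 6]` below is made up for illustration, and `exact_expectations` is a hypothetical helper; exact rational arithmetic avoids any rounding.

```python
from fractions import Fraction
from itertools import product


def exact_expectations(values, n):
    """Exact E[X-bar] and E[S^2] for i.i.d. draws, uniform over `values`."""
    samples = list(product(values, repeat=n))   # all equally likely samples
    total_mean = Fraction(0)
    total_var = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # divisor n - 1
        total_mean += xbar
        total_var += s2
    return total_mean / len(samples), total_var / len(samples)


values = [1, 2, 6]                               # hypothetical population
mu = Fraction(sum(values), len(values))          # population mean
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)  # population variance

print(exact_expectations(values, 3), (mu, sigma2))
```

The two printed pairs coincide, which illustrates (but does not replace) the general proof.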


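(b) When n = 2, \( \bar{X} = \frac{X_1 + X_2}{2} \), and
\[ S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2}{2 - 1} = \left( X_1 - \frac{X_1 + X_2}{2} \right)^2 + \left( X_2 - \frac{X_1 + X_2}{2} \right)^2 \]
\[ = \left( \frac{X_1 - X_2}{2} \right)^2 + \left( \frac{X_2 - X_1}{2} \right)^2 = \frac{(X_1 - X_2)^2}{2} \]
If we can show that \( X_1 + X_2 \) and \( X_1 - X_2 \) are independent, then \( \bar{X} \) and \( S^2 \) are
independent. This can be done easily using the mgf technique, by showing that the joint mgf factors:
\[ M_{(X_1 + X_2),(X_1 - X_2)}(t_1, t_2) = M_{X_1 + X_2}(t_1) \cdot M_{X_1 - X_2}(t_2) \]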
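One way to fill in the mgf step, assuming \( X_1, X_2 \) are i.i.d. \( N(\mu, \sigma^2) \) as in the problem statement, is sketched below. It uses only the independence of \( X_1 \) and \( X_2 \) and the normal mgf \( M_X(t) = \exp(\mu t + \sigma^2 t^2 / 2) \).

```latex
\begin{aligned}
M_{(X_1+X_2),(X_1-X_2)}(t_1,t_2)
  &= E\left[ e^{t_1 (X_1+X_2) + t_2 (X_1-X_2)} \right] \\
  &= E\left[ e^{(t_1+t_2) X_1} \right] \, E\left[ e^{(t_1-t_2) X_2} \right] \\
  &= \exp\left\{ \mu (t_1+t_2) + \tfrac{1}{2}\sigma^2 (t_1+t_2)^2 \right\}
     \exp\left\{ \mu (t_1-t_2) + \tfrac{1}{2}\sigma^2 (t_1-t_2)^2 \right\} \\
  &= \exp\left\{ 2\mu t_1 + \sigma^2 t_1^2 \right\} \cdot \exp\left\{ \sigma^2 t_2^2 \right\}
\end{aligned}
```

The first factor is the mgf of \( N(2\mu, 2\sigma^2) \), the distribution of \( X_1 + X_2 \), and the second is the mgf of \( N(0, 2\sigma^2) \), the distribution of \( X_1 - X_2 \), so the joint mgf factors into the product of the marginal mgfs.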
4. It has been claimed that more than 70 percent of the students attending a large
university are opposed to a plan to increase student fees in order to build new parking
facilities. If 25 out of 30 students selected at random at that university are opposed to the
plan, please
(a) Test the claim at the 0.05 level of significance. Please derive the general formula
for the test at the significance level α first.
(b) Construct a 95% confidence interval for the proportion of students opposed to the
plan. Please derive the general formula for the 100(1-α)% confidence interval first.
Solution:
Inference on one population proportion. \( n = 30 \), \( x = 25 \), \( \hat{p} = \frac{x}{n} = \frac{25}{30} \).
(a) By the central limit theorem and Slutsky's theorem, we have the pivotal quantity:
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0,1) \quad \text{approximately, for large samples} \]
In general: \( H_0: p = p_0 \) versus \( H_a: p > p_0 \).
The test statistic is:
\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0,1) \quad \text{approximately, when the null hypothesis is true} \]
By the definition of Type I error, we have \( \alpha = P(\text{reject } H_0 \mid H_0) = P(Z_0 \ge c \mid H_0: p = p_0) \).
Given \( H_0 \), \( Z_0 \) follows approximately N(0, 1), therefore \( c = z_\alpha \). That is, we reject \( H_0 \) if
\( Z_0 \ge z_\alpha \).
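In the given problem, we have
\[ H_0: p = 0.7 \quad \text{versus} \quad H_a: p > 0.7; \quad \alpha = 0.05 \]
Check: \( np_0 = 21 \ge 5 \) and \( n(1 - p_0) = 9 \ge 5 \).
Test statistic:
\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{25/30 - 0.7}{\sqrt{0.7(1 - 0.7)/30}} = 1.59 \]
Since \( Z_0 = 1.59 < z_{0.05} = 1.645 \), we cannot reject \( H_0 \).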
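(b) By the central limit theorem and Slutsky's theorem, we have the pivotal quantity:
\[ Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \sim N(0,1) \quad \text{approximately, for large samples} \]
From the pivotal quantity and its distribution:
\[ P\left( -z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \le z_{\alpha/2} \right) \approx 1 - \alpha \]
so the 100(1-α)% confidence interval is
\[ \left[ \hat{p} - z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n},\;\; \hat{p} + z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n} \right] \]
Since \( x = 25 \ge 5 \) and \( n - x = 30 - 25 = 5 \ge 5 \), we can use this formula to construct
the CI for p:
\[ \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.833 \pm 0.133 = [0.70,\ 0.97] \]
That is, we are 95% confident that between 70% and 97% of the students are against the plan.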
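Both parts of this problem can be reproduced with a few lines of Python. This is a sketch using only the standard library; `proportion_z_test` and `wald_interval` are hypothetical names, and the interval is the large-sample (Wald) interval derived above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha=0.05):
    """Upper-tailed large-sample test of H0: p = p0 against Ha: p > p0."""
    p_hat = x / n
    z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z0)
    return z0, p_value, z0 >= z_alpha


def wald_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for p, using p-hat in the standard error."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


z0, p_value, reject = proportion_z_test(x=25, n=30, p0=0.7)
low, high = wald_interval(x=25, n=30)
print(f"z0 = {z0:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Note that the test uses \( p_0 \) in the standard error while the interval uses \( \hat{p} \), which is why the test fails to reject even though the interval's lower limit sits almost exactly at 0.70.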
5. To study the effectiveness of wall insulation in saving energy for home heating, the
energy consumption (in MWh) for 5 houses in Bristol, England, was recorded for two
winters; the first winter was before insulation and the second winter was after insulation.
House      1      2      3      4      5
Before   12.1   10.6   13.4   13.8   15.5
After    12.0   11.0   14.1   11.2   15.3
(a) Please provide a 95% confidence interval for the difference between the mean energy
consumption before and after the wall insulation is installed. What assumptions are
necessary for your inference?
(b) Can you conclude that there is a difference in mean energy consumption before and
after the wall insulation is installed at the significance level 0.05? Please test it and
evaluate the p-value of your test. What assumptions are necessary for your inference?
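Solution. Inference on two dependent population means: paired-samples t-test (that is, a
one-sample t-test on the paired differences).
Assumptions for (a) and (b): the paired differences follow a normal distribution.
(a) Taking differences as before minus after, \( \bar{d} = 0.36 \), \( s_d = 1.30 \).
CI:
\[ \bar{d} \pm t_{0.025,4} \frac{s_d}{\sqrt{n}} = 0.36 \pm 2.776 \times \frac{1.30}{\sqrt{5}} = (-1.25,\ 1.97) \]
(b) \( H_0: \mu_d = 0 \) versus \( H_a: \mu_d \ne 0 \).
(1) Test statistic:
\[ t_0 = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{0.36 - 0}{1.30/\sqrt{5}} = 0.619 \]
Since \( |t_0| = 0.619 < t_{\alpha/2,\,n-1} = t_{0.025,4} = 2.776 \), we fail to reject \( H_0 \).
(2) p-value \( = 2 \times P(T \ge 0.619) > 0.2 \), where \( T \sim t_4 \).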
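The paired analysis can be recomputed from the table of energy readings. The sketch below uses only the standard library, so the critical value 2.776 is taken from the t-table as quoted in the notes; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [12.1, 10.6, 13.4, 13.8, 15.5]   # MWh, winter before insulation
after = [12.0, 11.0, 14.1, 11.2, 15.3]    # MWh, winter after insulation

diffs = [b - a for b, a in zip(before, after)]   # paired differences
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                               # divisor n - 1

T_CRIT = 2.776  # t_{4, 0.025} from a t-table, as quoted in the notes

half_width = T_CRIT * s_d / sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)
t0 = d_bar / (s_d / sqrt(n))                     # test statistic for H0: mu_d = 0

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), t0 = {t0:.3f}, reject H0: {abs(t0) >= T_CRIT}")
```

Because the script keeps the unrounded standard deviation (about 1.305) instead of 1.30, its interval and statistic differ from the hand calculation in the second decimal place; the conclusion is unchanged.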