252soln0 2/3/00
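PROBLEM A2. If $n = 64$ and $\bar{x} = 11.50$, find 95% confidence intervals for the mean under the following
circumstances:
a. $\sigma = 6.30$, $N = 3000$
b. $\sigma = 6.30$, $N = 300$
c. $s = 6.30$, $N = 3000$
d. $s = 6.30$, $N = 300$
SOLUTION: Use the formulas from Table 3 of the syllabus supplement or from the outline.
a. $\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = 11.50 \pm 1.960(0.7875) = 11.50 \pm 1.54$ or 9.96 to 13.04, where
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6.30}{\sqrt{64}} = 0.7875$ and $z_{\alpha/2} = z_{.025} = 1.960$.
b. $\mu = \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = 11.50 \pm 1.960(0.6996) = 11.50 \pm 1.37$ or 10.13 to 12.87, where
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{6.30}{\sqrt{64}}\sqrt{\frac{300-64}{300-1}} = 0.7875\sqrt{\frac{236}{299}} = 0.7875(0.8884) = 0.6996$ and $z_{\alpha/2} = z_{.025} = 1.960$.
c. $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2}\,s_{\bar{x}} = 11.50 \pm 1.998(0.7875) = 11.50 \pm 1.57$ or 9.93 to 13.07, where
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{6.30}{\sqrt{64}} = 0.7875$ and $t^{(n-1)}_{\alpha/2} = t^{(63)}_{.025} = 1.998$.
d. $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2}\,s_{\bar{x}} = 11.50 \pm 1.998(0.6996) = 11.50 \pm 1.40$ or 10.10 to 12.90, where
$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{6.30}{\sqrt{64}}\sqrt{\frac{300-64}{300-1}} = 0.6996$ and $t^{(n-1)}_{\alpha/2} = t^{(63)}_{.025} = 1.998$.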
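The four intervals in Problem A2 differ only in the critical value and in whether the finite population correction is applied, so they can be checked with one small function. The sketch below is an illustration only: the name `mean_ci` and its parameters are hypothetical, the t value for 63 degrees of freedom is copied from the solution rather than computed, and the correction is simply left off when no population size is passed.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(xbar, sd, n, crit, N=None):
    """Return (low, high) for xbar +/- crit * standard error.

    If a population size N is given, the standard error is multiplied
    by the finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sd / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    half = crit * se
    return xbar - half, xbar + half


z = NormalDist().inv_cdf(0.975)  # about 1.960
t63 = 1.998                      # table value for 63 df, taken from the solution

ci_a = mean_ci(11.50, 6.30, 64, z)           # part a: sigma known, no correction
ci_b = mean_ci(11.50, 6.30, 64, z, N=300)    # part b: sigma known, N = 300
ci_c = mean_ci(11.50, 6.30, 64, t63)         # part c: s used, no correction
ci_d = mean_ci(11.50, 6.30, 64, t63, N=300)  # part d: s used, N = 300

for label, (lo, hi) in zip("abcd", (ci_a, ci_b, ci_c, ci_d)):
    print(f"{label}. {lo:.2f} to {hi:.2f}")
```

Rounded to two places, the printed limits should agree with the hand computations for parts a through d.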
PROBLEM A3. In a study of a grain market in an African country we want to figure out how large a
sample we must take to find a daily average price for a grain transaction. (Assume a standard deviation of
5 cents.)
a. We want a 99% confidence interval for the mean with an error of ±1 cent.
b. What if the error is to be ±1/2 cent?
SOLUTION: We use the formula $n = \frac{z^2\sigma^2}{e^2}$, where $z = z_{\alpha/2} = z_{.005}$ since $\alpha = .01$.
a. We are told that the maximum error must be $e = 1$ and that $\sigma = 5$ if we work in cents (or $e = .01$ and $\sigma = .05$ if we work in dollars).
From the t table, $z_{.005} = 2.576$ so that $n = \frac{z^2\sigma^2}{e^2} = \frac{2.576^2(5^2)}{1^2} = 165.89$. Since we always
round this quantity up, use a sample size of at least 166. Note that if we use $n = 165$, we find that
(if we assume that $\bar{x} = 90$) $\mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 90 \pm 2.576\frac{5}{\sqrt{165}} = 90 \pm 1.003$. The error term will
be slightly above 1. However, if we use $n = 166$, $\mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 90 \pm 2.576\frac{5}{\sqrt{166}} = 90 \pm 1.000$.
b. This time the maximum allowable error is $e = 0.5$, so $n = \frac{z^2\sigma^2}{e^2} = \frac{2.576^2(5^2)}{0.5^2} = 663.58$ and
we must use a sample size of 664. Note that this sample size is four times the size in part a.
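The sample-size rule in Problem A3 is a one-line computation followed by rounding up. A minimal sketch, assuming the table value 2.576 for $z_{.005}$ and working in cents; the function name `sample_size` is hypothetical.

```python
from math import ceil, sqrt


def sample_size(z, sigma, e):
    """Smallest whole n for which z * sigma / sqrt(n) does not exceed e."""
    return ceil((z * sigma / e) ** 2)


z = 2.576                     # z_.005 as read from the table in the solution
n_a = sample_size(z, 5, 1.0)  # part a: error of 1 cent
n_b = sample_size(z, 5, 0.5)  # part b: error of 1/2 cent
print(n_a, n_b)

# Margin of error just below and at the sample size chosen in part a
for n in (165, 166):
    print(n, round(z * 5 / sqrt(n), 3))
```

Halving the allowable error quadruples the required sample because $e$ enters the formula squared.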
PROBLEM A4. If $s = 15$, find a 95% confidence interval for $\sigma$ if a) $n = 26$, b) $n = 99$.
SOLUTION: Use the formulas from Table 3 of the syllabus supplement or from the outline.
a. This is a small sample since $n < 31$, so use $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$. Since the degrees of
freedom are $n - 1 = 26 - 1 = 25$, $\chi^2_{\alpha/2} = \chi^2_{.025} = 40.6466$ and $\chi^2_{1-\alpha/2} = \chi^2_{.975} = 13.1197$, the interval
becomes $\frac{25(15)^2}{40.6466} \le \sigma^2 \le \frac{25(15)^2}{13.1197}$ or $138.388 \le \sigma^2 \le 428.745$. Since an interval for the
standard deviation was requested, take the square root of both sides: $11.76 \le \sigma \le 20.71$.
b. Since the degrees of freedom are $n - 1 = 99 - 1 = 98$ and are too large for the chi-square table, use
$\frac{s\sqrt{2DF}}{z_{\alpha/2} + \sqrt{2DF}} \le \sigma \le \frac{s\sqrt{2DF}}{-z_{\alpha/2} + \sqrt{2DF}}$. Since $\sqrt{2DF} = \sqrt{2(98)} = \sqrt{196} = 14$ and
$z_{\alpha/2} = z_{.025} = 1.960$, the formula becomes $\frac{15(14)}{1.960 + 14} \le \sigma \le \frac{15(14)}{-1.960 + 14}$ or $13.158 \le \sigma \le 17.442$.
Note that due to the larger sample size, this interval is smaller than the one in a.
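Both interval formulas for the standard deviation in Problem A4 can be verified numerically. In this sketch the chi-square values for 25 degrees of freedom are the table values quoted in the solution (the Python standard library has no chi-square quantile function), and the function names are hypothetical.

```python
from math import sqrt


def sigma_ci_chi2(s, n, chi2_upper, chi2_lower):
    """Interval for sigma from chi-square table values.

    chi2_upper is the alpha/2 point and chi2_lower the 1 - alpha/2 point,
    both for n - 1 degrees of freedom.
    """
    ss = (n - 1) * s ** 2
    return sqrt(ss / chi2_upper), sqrt(ss / chi2_lower)


def sigma_ci_large(s, n, z):
    """Large-sample approximation based on sqrt(2 * DF), as in part b."""
    root = sqrt(2 * (n - 1))
    return s * root / (z + root), s * root / (-z + root)


ci_a = sigma_ci_chi2(15, 26, 40.6466, 13.1197)  # part a, 25 df
ci_b = sigma_ci_large(15, 99, 1.960)            # part b, 98 df
print(tuple(round(v, 2) for v in ci_a))
print(tuple(round(v, 3) for v in ci_b))
```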
PROBLEM A5. a. Find the confidence level for an interval for the median using binomial tables, if from a
sample of 12 we take the third observation from both ends.
b. Do the same for the 19th observation from both ends in a sample of 50.
c. Do the same for an interval using the 10th observation from both ends in a sample of 40, using
the normal approximation to the binomial distribution.
d. In part c, try to find a 95% confidence interval for the median.
SOLUTION: a) If we take the third number from both the bottom and the top of the data, we get the
interval $x_3 \le \eta \le x_{10}$ for the median $\eta$ from the ordered numbers $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}, x_{12}$. For example, if
the numbers are 13.2, 17.1, 18.5, 21.3, 21.4, 22.0, 27.1, 27.7, 28.9, 29.2, 35.4, 35.9, we would say that the
interval is $18.5 \le \eta \le 29.2$.
To find the confidence level, first find the significance level $\alpha$, the probability that the interval is
wrong. The interval will be wrong if (i) $x_3$ through $x_{10}$ are all below the median or (ii) $x_3$ through $x_{10}$
are all above the median. These two events have the same probability, so we can figure out
the probability that $x_3$ through $x_{10}$ are all below the median and double it. The probability of any given
number being below (or above) the median is 0.5, and the probability that $x_3$ through $x_{10}$ are all below the
median is the probability that the first 10 or more numbers are all below the median, which is the same as the
probability of getting ten or more heads in twelve flips of a coin.
From the binomial table for $n = 12$ and $p = .5$, we find that $P(x \ge 10) = 1 - P(x \le 9) = 1 - .98071
= .01929$. But note that the binomial distribution with $p = .5$ is symmetrical so that $P(x \ge 10) = P(x \le 2)$.
Also remember that we stated in the previous paragraph that to get the significance level, we must double
this probability, so that $\alpha = 2(.01929) = .03858$. Thus the confidence level is $1 - \alpha = 1 - 2(.01929) = .96142$.
More generally, if $k$ is the index of the number at the bottom of the confidence interval (in the case we
just did, $k = 3$), the confidence level is $1 - \alpha = 1 - 2P(x \le k - 1)$.
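The general rule $1 - \alpha = 1 - 2P(x \le k - 1)$ can be evaluated exactly without a binomial table. The sketch below sums the Binomial($n$, .5) probabilities directly; the function name `median_ci_level` is hypothetical.

```python
from math import comb


def median_ci_level(n, k):
    """Exact confidence level of the interval running from the k-th smallest
    to the k-th largest of n observations: 1 - 2 * P(x <= k - 1), where x
    has the Binomial(n, 0.5) distribution."""
    tail = sum(comb(n, i) for i in range(k)) / 2 ** n
    return 1 - 2 * tail


level_a = median_ci_level(12, 3)    # third observation from each end, n = 12
level_b = median_ci_level(50, 19)   # 19th observation from each end, n = 50
print(round(level_a, 5), round(level_b, 4))
```

Because the counts are summed as integers before dividing, the only rounding is in the final division.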
b) If we take a sample of $n = 50$, put it in order, and then pick the 19th number ($k = 19$) from both
the top and the bottom, so that the confidence interval is $x_{19} \le \eta \le x_{32}$ for the median $\eta$, the confidence level is
$1 - \alpha = 1 - 2P(x \le k - 1) = 1 - 2P(x \le 19 - 1) = 1 - 2P(x \le 18) = 1 - 2(.03245) = .93510$. This can also be done
using the Normal distribution. If we ignore the continuity correction, and recall that for the binomial
distribution with $p = .5$ and $q = 1 - p = .5$, $\mu = np = .5n$ and $\sigma^2 = npq = n(.5)(.5) = .25n$,
$P(x \le k - 1) \approx P\left(z \le \frac{k - 1 - \mu}{\sigma}\right) = P\left(z \le \frac{k - 1 - .5n}{\sqrt{.25n}}\right) = P\left(z \le \frac{18 - .5(50)}{.5\sqrt{50}}\right) = P(z \le -1.98) = .5 - .4761 = .0239$
and $1 - \alpha = 1 - 2P(x \le k - 1) = 1 - 2(.0239) = .9522$. This looks way off, so try the same problem with a
continuity correction:
$P(x \le k - 1) = P(x \le 18) \approx P\left(z \le \frac{18 + .5 - .5(50)}{.5\sqrt{50}}\right) = P(z \le -1.84) = .5 - .4671 = .0329$
and the confidence level is $1 - \alpha = 1 - 2P(x \le k - 1) = 1 - 2(.0329) = .9342$.
c) If $n = 40$ and $k = 10$ we have no binomial table, so use the normal approximation to the
binomial distribution with a continuity correction.
$P(x \le k - 1) \approx P\left(z \le \frac{k - 1 + .5 - \mu}{\sigma}\right) = P\left(z \le \frac{k - 1 + .5 - .5n}{\sqrt{.25n}}\right) = P\left(z \le \frac{9 + .5 - .5(40)}{.5\sqrt{40}}\right) = P(z \le -3.32) = .5 - .4995 = .0005$
and the confidence level is $1 - \alpha = 1 - 2P(x \le k - 1) = 1 - 2(.0005) = .9990$.
d) If we want a 95% confidence interval and $n = 40$, we require that $1 - \alpha = 1 - 2P(x \le k - 1)
= 1 - 2(.025) = .95$. This means that $P(x \le k - 1) \approx P\left(z \le \frac{k - 1 + .5 - \mu}{\sigma}\right) = P\left(z \le \frac{k - 1 + .5 - .5n}{\sqrt{.25n}}\right) = .025$.
But since $z_{.025} = 1.960$, we know that $P(z \le -1.960) = .025$. So we can say that
$\frac{k - 1 + .5 - .5n}{\sqrt{.25n}} = -1.960$. Solve this with $n = 40$, or note that $k - 1 + .5 - .5n = -1.960\sqrt{.25n}$ and, solving
for $k$, we find $k = .5 + .5n - 1.960\sqrt{.25n}$. If we substitute $n = 40$, $k = .5 + .5(40) - 1.960\sqrt{.25(40)}
= 20.5 - 1.960\sqrt{10} = 20.5 - 6.20 = 14.30$. We could also follow the formula in the outline that says
$k = \frac{n + 1 - z_{\alpha/2}\sqrt{n}}{2} = \frac{40 + 1 - z_{\alpha/2}\sqrt{40}}{2} = 14.30$. Obviously $k$ must be a whole number and the more
conservative choice would be to round it down, so that the interval is $x_{14} \le \eta \le x_{27}$, where $\eta$ is the median.
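The normal-approximation steps in parts b through d of Problem A5 can be reproduced with the standard library's `NormalDist`. This is a sketch under the same assumptions as the solution ($p = .5$, optional continuity correction); the function names are hypothetical, and because the code uses unrounded $z$ values the results can differ from the table-based figures in the third or fourth decimal place.

```python
from math import floor, sqrt
from statistics import NormalDist

std_normal = NormalDist()


def approx_level(n, k, continuity=True):
    """Normal approximation to 1 - 2 * P(x <= k - 1), x ~ Binomial(n, 0.5)."""
    cc = 0.5 if continuity else 0.0
    z = (k - 1 + cc - 0.5 * n) / sqrt(0.25 * n)
    return 1 - 2 * std_normal.cdf(z)


def k_for_level(n, level=0.95):
    """Largest whole k whose approximate level (with continuity
    correction) is at least `level`."""
    z = std_normal.inv_cdf(1 - (1 - level) / 2)
    return floor(0.5 + 0.5 * n - z * sqrt(0.25 * n))


print(round(approx_level(50, 19, continuity=False), 4))  # part b, no correction
print(round(approx_level(50, 19), 4))                    # part b, with correction
print(round(approx_level(40, 10), 4))                    # part c
k = k_for_level(40)                                      # part d
print(k, 40 - k + 1)                                     # lower and upper index
```

Rounding $k$ down is what `floor` does here, matching the conservative choice described in part d.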
PROBLEM B.1 A firm claims that its median wage is $32000. The union claims that the median ($\eta$) is
lower. A random sample of 100 employees shows that 40% are above $32000. Set this up as two
hypotheses and test with a significance level of 5%.
SOLUTION: We always replace a hypothesis about a median with a hypothesis about a proportion in the
sign test. The statement implicit in the above is that the median is at least $32000. Since we have the
number over $32000, let $p$ be the proportion over $32000. Then $H_0: \eta \ge 32000$, $H_1: \eta < 32000$
becomes $H_0: p \ge .5$, $H_1: p < .5$.
Note that $\alpha = .05$, $n = 100$, $x = 40$ and $\bar{p} = \frac{x}{n} = .40$, so that $\sigma_p = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.5(.5)}{100}} = .05$. Note that a
continuity correction has been added to all of these solutions. It has the effect of making the “accept”
region larger by $x \pm 0.5$ or $\bar{p} \pm \frac{0.5}{n}$.
(i) Critical Value Method:
$p_{cv} = p_0 - z_\alpha \sigma_p - \frac{.5}{n} = .5 - 1.645(.05) - \frac{.5}{100} = .5 - .08225 - .005 = .41275$. If $\bar{p}$ is below this
critical value we reject $H_0$. Since .40 is below .41275, reject $H_0$.
(ii) Test Ratio Method: There are three possible versions. In all those below, the rule
frequently used is: where $\pm$ appears, use $+$ if $x < \frac{n}{2}$ and $-$ if $x > \frac{n}{2}$.
$z = \frac{\bar{p} \pm \frac{.5}{n} - p_0}{\sigma_p}$: $P(\bar{p} \le .40) = P\left(z \le \frac{.40 + \frac{0.5}{100} - .5}{.05}\right) = P(z \le -1.90) = .5 - .4713 = .0287$
$z = \frac{x \pm .5 - np_0}{\sqrt{np_0 q_0}}$: $P(x \le 40) = P\left(z \le \frac{40 + 0.5 - 100(.5)}{\sqrt{100(.5)(.5)}}\right) = P(z \le -1.90) = .0287$
$z = \frac{2x \pm 1 - n}{\sqrt{n}}$: $P(x \le 40) = P\left(z \le \frac{2(40) + 1 - 100}{\sqrt{100}}\right) = P(z \le -1.90) = .0287$
In each case, the p-value is .0287. Since $\alpha = .05$, p-value $< \alpha$ and we reject $H_0$.
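The test-ratio version of the sign test in Problem B.1 takes only a few lines to check. A minimal sketch, assuming a lower-tail alternative and the $+0.5$ continuity correction used when $x < n/2$; the function name `sign_test_lower_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_lower_p(x, n, p0=0.5):
    """Return (z, p-value) for the lower-tail test P(X <= x),
    using the normal approximation with a continuity correction."""
    z = (x + 0.5 - n * p0) / sqrt(n * p0 * (1 - p0))
    return z, NormalDist().cdf(z)


alpha = 0.05
z, p_value = sign_test_lower_p(40, 100)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(round(z, 2), round(p_value, 4), decision)
```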
PROBLEM B.2 We are testing that the median ($\eta$) is 14. Let $x$ be the number of items above 14. From a
sample of size $n = 30$, we find $x = 25$. Use $p$ for the proportion of the population over 14 and $\bar{p}$ for the
proportion of the sample over 14.
a) Test $\eta = 14$
b) Test $\eta > 14$
c) Test $\eta < 14$
SOLUTION: Note that $\bar{p} = \frac{25}{30} = .8333$. Assume $\alpha = .05$.
a) $H_0: \eta = 14$, $H_1: \eta \ne 14$ becomes $H_0: p = .5$, $H_1: p \ne .5$. If we use the critical value method,
$p_{cv} = p_0 \pm \left(z_{\alpha/2}\sqrt{\frac{p_0 q_0}{n}} + \frac{0.5}{n}\right) = .5 \pm \left(1.96\sqrt{\frac{.5(.5)}{30}} + \frac{.5}{30}\right) = .5 \pm (.179 + .017) = .5 \pm 0.196$
or .304 to .696. Since .8333 is not in this interval, reject $H_0$. We are probably better off using
the test ratio method, with $z = \frac{x \pm .5 - np_0}{\sqrt{np_0 q_0}}$. Here $np_0 = 30(.5) = 15$ and $np_0 q_0 = 15(.5) = 7.5$.
So p-value $= 2P(x \ge 25) = 2P\left(z \ge \frac{24.5 - 15}{\sqrt{7.5}}\right) = 2P(z \ge 3.47) = 2(.5 - .4997) = 2(.0003) = .0006$.
Since this is below the significance level, reject $H_0$.
b) $H_0: \eta \le 14$, $H_1: \eta > 14$ becomes $H_0: p \le .5$, $H_1: p > .5$. In this case
p-value $= P(x \ge 25) = P\left(z \ge \frac{24.5 - 15}{\sqrt{7.5}}\right) = .0003$. Since this is below the significance level,
reject $H_0$.
c) $H_0: \eta \ge 14$, $H_1: \eta < 14$ becomes $H_0: p \ge .5$, $H_1: p < .5$. In this case, it is possible to have many items
over 14 and for $H_0$ still to be true.
p-value $= P(x \le 25) = P\left(z \le \frac{25.5 - 15}{\sqrt{7.5}}\right) = P(z \le 3.83) = .5 + .4999 = .9999$. Since this is
above the significance level, accept $H_0$.
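The three p-values in Problem B.2 come from the same two continuity-corrected tail areas, which the sketch below computes once and reuses. Variable names are hypothetical, the alternatives are taken to be two-sided, upper-tail and lower-tail respectively, and the unrounded $z$ values give results slightly different from the table-based ones.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
n, x, p0, alpha = 30, 25, 0.5, 0.05
mean, sd = n * p0, sqrt(n * p0 * (1 - p0))

upper = 1 - nd.cdf((x - 0.5 - mean) / sd)  # P(X >= 25), continuity corrected
lower = nd.cdf((x + 0.5 - mean) / sd)      # P(X <= 25), continuity corrected

p_a = 2 * upper  # part a: two-sided alternative
p_b = upper      # part b: alternative that the median is above 14
p_c = lower      # part c: alternative that the median is below 14

for label, p in (("a", p_a), ("b", p_b), ("c", p_c)):
    verdict = "reject H0" if p < alpha else "do not reject H0"
    print(label, round(p, 4), verdict)
```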
PROBLEM B.3 A bank's average default rate on loans is supposedly 6 per month. In the first month
there are 12 defaults. Test the first assertion assuming a Poisson distribution. Use a two-sided test with a
5% significance level.
SOLUTION: $H_0$: Poisson(6), $H_1$: not Poisson(6). Though it is possible to put together a rejection region, the easiest way
to do this is to use the Poisson(6) table and a p-value approach. If we look up the probability that $x$ is 12 or
larger we find: p-value $= 2P(x \ge 12) = 2(1 - P(x \le 11)) = 2(1 - .9799) = 2(.0201) = .0402$. Since p-value $< \alpha$,
reject $H_0$.
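When a Poisson(6) table is not at hand, the cumulative probability in Problem B.3 can be summed directly. A short sketch; the helper name `poisson_cdf` is hypothetical.

```python
from math import exp, factorial


def poisson_cdf(k, m):
    """P(X <= k) for X ~ Poisson(m), summed term by term."""
    return sum(exp(-m) * m ** i / factorial(i) for i in range(k + 1))


alpha = 0.05
upper_tail = 1 - poisson_cdf(11, 6)  # P(X >= 12)
p_value = 2 * upper_tail             # doubled for the two-sided test
print(round(p_value, 4), "reject H0" if p_value < alpha else "do not reject H0")
```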
PROBLEM B.4 a. I claim that $x$ is binomially distributed with $p = .01$. Test this assertion using a 2-sided 5% test if there are 3 successes in 10 trials.
b. Test for a binomial distribution with $p = .10$ when $n = 10$ and $x = 4$.
c. If $n = 100$ and $x = 9$, test to see if $p$ is at least 0.4.
d. Calls coming into a switchboard in an hour presumably have a Poisson distribution with a mean of
144. Test this hypothesis if, in a given hour, 200 calls come in.
SOLUTION:
a. If we assume that $x$ has the Binomial distribution, our hypotheses are
$H_0$: Binomial $p = .01$, $H_1$: not Binomial $p = .01$. If we have a Binomial table for $p = .01$, note that
$\mu = np = 10(.01) = .1$, so that our value of $x$ is too large. p-value $= 2P(x \ge 3) = 2(1 - P(x \le 2))
= 2(1 - .99989) = .00022$. This is below the significance level, so reject $H_0$.
b. If $p = .10$ and $n = 10$, $\mu = np = 1$ so that $x = 4$ is too large. p-value $= 2P(x \ge 4) = 2(1 - P(x \le 3))
= 2(1 - .9872) = 2(.0128) = .0256$. This is below the significance level, so reject $H_0$.
c. Our hypotheses are now $H_0$: Binomial $p \ge .4$, $H_1$: Binomial $p < .4$. Since $n = 100$ and $x = 9$, $\mu = np = 40$ and $x$ is
too small. From the binomial table for $p = .4$, p-value $= P(x \le 9) = .00000$, so reject $H_0$. If a
table with $n = 100$ is unavailable, use the Normal approximation, p-value $= P(x \le 9) = P(\bar{p} \le .09)
\approx P\left(z \le \frac{.09 + \frac{.5}{100} - .4}{\sqrt{\frac{.4(.6)}{100}}}\right) = P\left(z \le \frac{.09 + .005 - .4}{\sqrt{.0024}}\right) = P(z \le -6.23) \approx 0$, so reject $H_0$.
d. This is a Poisson problem, but a table for Poisson(144) is not available. Fortunately, for large
values of $m$, the Poisson mean, $x$ is approximately Normal with mean $m$ and standard deviation $\sqrt{m}$. Since there are no specific requirements, assume
that a 2-sided 95% test is wanted. $H_0$: Poisson(144), $H_1$: not Poisson(144).
Then $z = \frac{x - m}{\sqrt{m}} = \frac{200 - 144}{\sqrt{144}} = 4.67$. Since $z_{\alpha/2} = z_{.025} = 1.96$, and our test ratio is not between $\pm 1.96$, reject
$H_0$.
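The table look-ups in Problem B.4 can be cross-checked by summing binomial probabilities directly and, for part d, by computing the test ratio. This is a sketch only: `binom_cdf` is a hypothetical helper, part b uses $P(x \le 3)$ as the formula in the solution states, and exact sums may differ from table entries in the last digit.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


p_a = 2 * (1 - binom_cdf(2, 10, 0.01))  # part a: 2 * P(x >= 3)
p_b = 2 * (1 - binom_cdf(3, 10, 0.10))  # part b: 2 * P(x >= 4)
p_c = binom_cdf(9, 100, 0.40)           # part c: P(x <= 9), lower tail only

z_d = (200 - 144) / sqrt(144)           # part d: normal approximation to Poisson
p_d = 2 * (1 - NormalDist().cdf(abs(z_d)))

for label, p in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(label, f"{p:.5f}", "reject H0" if p < 0.05 else "do not reject H0")
```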
PROBLEM B.5 If $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2 = 40$ and the confidence level is 95%, test if it is true that
the variance is 2 when a) $n = 10$, b) $n = 20$, c) $n = 40$.
SOLUTION: We are testing $H_0: \sigma^2 = 2$ against $H_1: \sigma^2 \ne 2$. From the outline, $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$. Since
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, $(n - 1)s^2 = \sum (x - \bar{x})^2$ and $\chi^2 = \frac{\sum (x - \bar{x})^2}{\sigma_0^2} = \frac{40}{2} = 20$ in all cases. Since
the confidence level is 95%, all we really need to do is find out whether our value of $\chi^2$ falls between
$\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$, in this case $\chi^2_{.975}$ and $\chi^2_{.025}$.
a. $n = 10$ implies 9 degrees of freedom. $\chi^{2(9)}_{.975} = 2.700$ and $\chi^{2(9)}_{.025} = 19.023$. Since our value of $\chi^2$ does
not fall between them, reject $H_0$.
b. $n = 20$ implies 19 degrees of freedom. $\chi^{2(19)}_{.975} = 8.907$ and $\chi^{2(19)}_{.025} = 32.852$. Since our value of $\chi^2$
falls between them, do not reject $H_0$.
c. $n = 40$ implies 39 degrees of freedom. Because we are beyond the $\chi^2$ table, we must use the
approximation $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$. We already know that $\chi^2 = 20$, so that
$z = \sqrt{2(20)} - \sqrt{2(39) - 1} = 6.32 - 8.77 = -2.45$. For a confidence level of 95%, $z$ must be between
$-1.96$ and $1.96$. Since this value of $z$ is not, reject $H_0$.
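The decision rule in Problem B.5 is the same comparison repeated for three sample sizes, so it can be laid out as a small lookup. In this sketch the chi-square limits for 9 and 19 degrees of freedom are the table values quoted in the solution, and the function name `large_df_z` is hypothetical.

```python
from math import sqrt


def large_df_z(chi2, df):
    """z = sqrt(2 * chi2) - sqrt(2 * df - 1), the large-DF approximation."""
    return sqrt(2 * chi2) - sqrt(2 * df - 1)


chi2 = 40 / 2  # sum of squared deviations over the hypothesized variance

# (lower, upper) chi-square table values from the solution, keyed by df
table = {9: (2.700, 19.023), 19: (8.907, 32.852)}
decisions = {
    df: "do not reject H0" if lo <= chi2 <= hi else "reject H0"
    for df, (lo, hi) in table.items()
}

z_c = large_df_z(chi2, 39)  # part c: 39 df is beyond the table
decisions[39] = "do not reject H0" if -1.96 <= z_c <= 1.96 else "reject H0"

print(round(z_c, 2))
print(decisions)
```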
PROBLEM C.1 Assume that $\sigma = 4$ and $n = 70$. Find the critical values, power function and operating
characteristic curve for:
$H_0: \mu \ge 50$
$H_1: \mu < 50$
Use a significance level of 5 percent.
SOLUTION: a) First, state the problem and find a critical value or values.
$H_0: \mu \ge 50$, $H_1: \mu < 50$, $\sigma = 4$, $n = 70$, $\alpha = .05$, so $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{70}} = 0.47809$. Since this is a one-sided test, the
formula for a two-sided critical value, $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, becomes $\bar{x}_{cv} = \mu_0 - z_\alpha\,\sigma_{\bar{x}}$, so that
$\bar{x}_{cv} = 50 - 1.645(0.47809) = 49.2135$. So we will not reject $H_0$ if the sample mean $\bar{x}$ is greater than or
equal to 49.2135.
b) Decide on what values of $\mu_1$ to use to compute $\beta$, the probability of a type II error. The usual set
of values includes the mean from the null hypothesis, the critical value, a point about midway between
these values and two points, one further out beyond the critical value by a distance equal to the distance
between the null hypothesis mean and the critical value, and another halfway between this point and the
critical value. We thus choose 50, 49.2135, and 49.6, which is about halfway between them. Since the
difference between 50 and 49.2135 is about 0.8, the lowest value of $\mu_1$ that we use is 48.4, and a point
about halfway between 48.4 and 49.2135 is 48.8.
c) Compute $\beta$ for each value of $\mu_1$. Since a type II error is wrongly ‘accepting’ the null hypothesis, we
compute the probability that the sample mean will be above or equal to the critical value for each value of
$\mu_1$. Our computations are below. Note that, in general, for this one-sided hypothesis $\beta = P\left(z \ge \frac{\bar{x}_{cv} - \mu_1}{\sigma_{\bar{x}}}\right)$.
$\mu_1 = 50$: $\beta = P(\bar{x} \ge 49.2135 \mid \mu = 50) = P\left(z \ge \frac{49.2135 - 50}{.47809}\right) = P(z \ge -1.645) = .95$; power $= 1 - \beta = .05 = \alpha$
$\mu_1 = 49.6$: $\beta = P(\bar{x} \ge 49.2135 \mid \mu = 49.6) = P\left(z \ge \frac{49.2135 - 49.6}{.47809}\right) = P(z \ge -0.81) = .2910 + .5 = .7910$; power $= 1 - \beta = .2090$
$\mu_1 = 49.2135$: $\beta = P(\bar{x} \ge 49.2135 \mid \mu = 49.2135) = P\left(z \ge \frac{49.2135 - 49.2135}{.47809}\right) = P(z \ge 0) = .5000$; power $= 1 - \beta = .5000$
$\mu_1 = 48.8$: $\beta = P(\bar{x} \ge 49.2135 \mid \mu = 48.8) = P\left(z \ge \frac{49.2135 - 48.8}{.47809}\right) = P(z \ge 0.86) = .5 - .3051 = .1949$; power $= 1 - \beta = .8051$
$\mu_1 = 48.4$: $\beta = P(\bar{x} \ge 49.2135 \mid \mu = 48.4) = P\left(z \ge \frac{49.2135 - 48.4}{.47809}\right) = P(z \ge 1.70) = .5 - .4554 = .0446$; power $= 1 - \beta = .9554$
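The operating characteristic and power values for this one-sided test can be generated in a loop once the critical value is known. A minimal sketch using `statistics.NormalDist`; the names `beta` and `x_cv` are hypothetical, and since the code carries unrounded $z$ values its output can differ from the table-based figures in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
mu0, sigma, n, alpha = 50, 4, 70, 0.05
se = sigma / sqrt(n)
x_cv = mu0 - nd.inv_cdf(1 - alpha) * se  # lower-tail critical value


def beta(mu1):
    """P(do not reject H0 | true mean mu1) = P(xbar >= x_cv)."""
    return 1 - nd.cdf((x_cv - mu1) / se)


for mu1 in (50, 49.6, x_cv, 48.8, 48.4):
    b = beta(mu1)
    print(f"mu1={mu1:.4f}  beta={b:.4f}  power={1 - b:.4f}")
```

Plotting `beta` against `mu1` gives the operating characteristic curve, and `1 - beta` gives the power curve.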

PROBLEM C.2 A hardware firm charges a flat rate for mailing of small tools based on an average weight
of 20 oz. with a standard deviation of 3.60 oz. A consultant challenges this assumption and a sample of 100
packages is taken. Find critical values for a significance level of 1% and compute the power function and
operating characteristic curve.
SOLUTION: a) First, state the problem and find a critical value or values.
$H_0: \mu = 20$, $H_1: \mu \ne 20$, $\sigma = 3.60$, $n = 100$, $\alpha = .01$, so $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.60}{\sqrt{100}} = 0.360$. Since this is a two-sided test, the
formula for a critical value is $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, so that $\bar{x}_{cv} = 20 \pm 2.576(0.360) = 20 \pm 0.927$. So we will
not reject $H_0$ if the sample mean $\bar{x}$ is between 19.073 and 20.927.
b) Decide on what values of $\mu_1$ to use to compute $\beta$, the probability of a type II error. The usual set
of values includes the mean from the null hypothesis, the critical values, a point about midway between
these values and two points, one further out beyond the critical value by a distance equal to the distance
between the null hypothesis mean and the critical value, and another halfway between this point and the
critical value. We thus choose the null hypothesis mean, 20, and the two critical values, 19.073 and
20.927. The points 19.5 and 20.5 are about halfway between 20 and the critical values. Since the difference between 20
and the critical values is about 1.0, the lowest value of $\mu_1$ that we use is 18.0 and the highest is 22.0. Points
about halfway between these numbers and the critical values are 18.5 and 21.5.
c) Compute $\beta$ for each value of $\mu_1$. Since a type II error is wrongly ‘accepting’ the null hypothesis, we
compute the probability that the sample mean will be between the critical values for each value of $\mu_1$. Our
computations are below. Note that, in general, for a two-sided hypothesis $\beta = P\left(\frac{\bar{x}_{cv1} - \mu_1}{\sigma_{\bar{x}}} \le z \le \frac{\bar{x}_{cv2} - \mu_1}{\sigma_{\bar{x}}}\right)$.
$\mu_1 = 20$: $\beta = P(19.073 \le \bar{x} \le 20.927 \mid \mu = 20) = P\left(\frac{19.073 - 20}{0.360} \le z \le \frac{20.927 - 20}{0.360}\right)
= P(-2.575 \le z \le 2.575) = 2(.4950) = .99 = 1 - \alpha$
$\mu_1 = 20.5$ or 19.5: $\beta = P(19.073 \le \bar{x} \le 20.927 \mid \mu = 20.5) = P\left(\frac{19.073 - 20.5}{0.360} \le z \le \frac{20.927 - 20.5}{0.360}\right)
= P(-3.96 \le z \le 1.19) = .5 + .3830 = .8830$
$\mu_1 = 20.927$ or 19.073: $\beta = P(19.073 \le \bar{x} \le 20.927 \mid \mu = 20.927) = P\left(\frac{19.073 - 20.927}{0.360} \le z \le \frac{20.927 - 20.927}{0.360}\right)
= P(-5.15 \le z \le 0.00) = .5000$
$\mu_1 = 21.5$ or 18.5: $\beta = P(19.073 \le \bar{x} \le 20.927 \mid \mu = 21.5) = P\left(\frac{19.073 - 21.5}{0.360} \le z \le \frac{20.927 - 21.5}{0.360}\right)
= P(-6.74 \le z \le -1.59) = .5 - .4441 = .0559$
$\mu_1 = 22.0$ or 18.0: $\beta = P(19.073 \le \bar{x} \le 20.927 \mid \mu = 22.0) = P\left(\frac{19.073 - 22.0}{0.360} \le z \le \frac{20.927 - 22.0}{0.360}\right)
= P(-8.13 \le z \le -2.98) = .5 - .4986 = .0014$
If we round these results, we get the following values for the operating characteristic and power:
$\mu_1$    $\beta$    power
22.0       .00        1.00
21.5       .06        .94
20.9       .50        .50
20.5       .88        .12
20.0       .99        .01
19.5       .88        .12
19.1       .50        .50
18.5       .06        .94
18.0       .00        1.00
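The two-sided operating characteristic values for Problem C.2 can be regenerated from the critical values with a short loop. This is a sketch with hypothetical names (`beta`, `lo_cv`, `hi_cv`); it evaluates $\beta$ at the unrounded critical values rather than at their one-decimal labels, and small differences from the table-based figures come from carrying unrounded $z$ values.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
mu0, sigma, n, alpha = 20, 3.60, 100, 0.01
se = sigma / sqrt(n)
half = nd.inv_cdf(1 - alpha / 2) * se
lo_cv, hi_cv = mu0 - half, mu0 + half  # about 19.073 and 20.927


def beta(mu1):
    """P(lo_cv <= xbar <= hi_cv | true mean mu1)."""
    return nd.cdf((hi_cv - mu1) / se) - nd.cdf((lo_cv - mu1) / se)


points = (22.0, 21.5, hi_cv, 20.5, 20.0, 19.5, lo_cv, 18.5, 18.0)
oc = {mu1: beta(mu1) for mu1 in points}
for mu1, b in oc.items():
    print(f"{mu1:6.3f}  beta={b:.2f}  power={1 - b:.2f}")
```

The symmetry of the two-sided test shows up as equal $\beta$ values at points equally far above and below 20.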