252y0811 3/7/08
ECO252 QBA2
FIRST EXAM
February 28, 2008
Version 1
Name _KEY____________
Class hour: _____________
Student number: __________
Show your work! Make Diagrams! Include a vertical line in the middle! Exam is normed on 50
points. Answers without reasons are not usually acceptable.
I. (8 points) Do all the following.
x ~ N 7, 11 (But you can’t buy donuts there!)
07

1. Px  0  P  z 
 Pz  0.64   Pz  0  P0.64  z  0  .5  .2389  .2611
11 

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line!
Shade the entire area below -0.64. Because this is entirely on the left side of zero, we must subtract the area
between -0.64 and zero from the larger area below zero. If you wish, make a completely separate diagram
for x. Draw a Normal curve with a mean at 7. Indicate the mean by a vertical line! Shade the entire area
below zero. This area is entirely on the left side of the mean (7), so we subtract the smaller area between
zero and the mean from the larger entire area (.5) below the mean.
2. $P(14 < x < 42) = P\left(\frac{14 - 7}{11} < z < \frac{42 - 7}{11}\right) = P(0.64 < z < 3.18) = P(0 < z < 3.18) - P(0 < z < 0.64) = .4993 - .2389 = .2604$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line!
Shade the area between 0.64 and 3.18. Because this is entirely on the right side of zero, we must subtract
the area between zero and 0.64 from the larger area between zero and 3.18. If you wish, make a completely
separate diagram for x. Draw a Normal curve with a mean at 7. Indicate the mean by a vertical line!
Shade the area between 14 and 42. This area is entirely on the right side of the mean (7), so we subtract the
smaller area between 14 and the mean from the larger area between 42 and the mean.
3. $P(-30 < x < 30) = P\left(\frac{-30 - 7}{11} < z < \frac{30 - 7}{11}\right) = P(-3.36 < z < 2.09) = P(-3.36 < z < 0) + P(0 < z < 2.09) = .4996 + .4817 = .9813$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line!
Shade the entire area between -3.36 and 2.09. Because this is on both sides of zero, we must add the area
between -3.36 and zero to the area between zero and 2.09. If you wish, make a completely separate diagram
for x. Draw a Normal curve with a mean at 7. Indicate the mean by a vertical line! Shade the entire area
between -30 and +30. This area is on both sides of the mean (7), so we add the area between -30 and the
mean to the slightly smaller area between the mean and 30.
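The three probabilities above can be checked numerically. This is a minimal sketch of ours, not part of the original key, assuming Python 3.8+ for the standard-library `statistics.NormalDist`. It works with unrounded z values, so its answers differ slightly from the table answers, which round each z to two decimals.

```python
from statistics import NormalDist

# x ~ N(7, 11): mean 7, standard deviation 11, as in Section I
x_dist = NormalDist(mu=7, sigma=11)

p1 = x_dist.cdf(0)                      # 1. P(x < 0)
p2 = x_dist.cdf(42) - x_dist.cdf(14)    # 2. P(14 < x < 42)
p3 = x_dist.cdf(30) - x_dist.cdf(-30)   # 3. P(-30 < x < 30)

# Table answers from the key, which round each z to two decimals
table = {"P(x < 0)": (p1, .2611),
         "P(14 < x < 42)": (p2, .2604),
         "P(-30 < x < 30)": (p3, .9813)}
for label, (exact, from_table) in table.items():
    print(f"{label}: unrounded z gives {exact:.4f}, table gives {from_table}")
```

The differences stay in the third decimal place, which is the size of error you should expect from rounding z before using the Normal table.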
4. $x_{.0005}$ (Do not try to use the t table to get this.) (I only need one answer; you may find more than one possibility.) For z make a diagram. Draw a Normal curve with a mean at 0. $z_{.0005}$ is the value of z with 0.05% of the distribution above it. Since 100 - 0.05 = 99.95, it is also the .9995 fractile. Since 50% of the standardized Normal distribution is below zero, your diagram should show that the probability between zero and $z_{.0005}$ is 99.95% - 50% = 49.95%, or $P(0 \le z \le z_{.0005}) = .4995$. If we check this against the Normal table, we would usually find the probability closest to .4995, but we actually have a choice this time. We can say $P(0 \le z \le 3.27) = .4995$, but we could also say $P(0 \le z \le 3.32) = .4995$. (Actually the table in the text says $P(z \le 3.29) = .99950$.) Any value between 3.27 and 3.32 is acceptable here, so $3.27 \le z_{.0005} \le 3.32$, with values closer to 3.29 best. This is the value of z that you need for a 99.9% confidence interval. To get from $z_{.0005}$ to $x_{.0005}$, use the formula $x = \mu + z\sigma$, which is the inverse of $z = \frac{x - \mu}{\sigma}$. So $x_{.0005} = 7 + 3.29(11) = 43.19$, but any answer from $7 + 3.27(11) = 42.97$ to $7 + 3.32(11) = 43.52$ is acceptable. If you wish, make a completely separate diagram for x. Draw a Normal curve with a mean at 7. Show that 50% of the distribution is below the mean (7). If 0.05% of the distribution is above $x_{.0005}$, it must be above the mean and have 49.95% of the distribution between it and the mean.

Check: $P(x \ge 43.19) = P\left(z \ge \frac{43.19 - 7}{11}\right) = P(z \ge 3.29) = P(z \ge 0) - P(0 \le z \le 3.29) = .5 - .4995 = .0005$.

This is identical to the way you normally get a p-value for a right-sided test.
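The fractile and the check above can be reproduced with an inverse Normal function. A minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist.inv_cdf`; the variable names are ours.

```python
from statistics import NormalDist

z_0005 = NormalDist().inv_cdf(0.9995)   # .9995 fractile of the standard Normal
x_0005 = 7 + z_0005 * 11                # x = mu + z * sigma for x ~ N(7, 11)

# Check, as in the key: the area above x_.0005 should be .0005
tail = 1 - NormalDist(mu=7, sigma=11).cdf(x_0005)
print(f"z_.0005 = {z_0005:.4f}, x_.0005 = {x_0005:.2f}, area above = {tail:.4f}")
```

The unrounded fractile is about 3.2905, which is why 3.29 is the best table value.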
II. (9 points; 2-point penalty for not trying part a.)
(Langley) A copier has been turning out 45 copies a minute. After a repair, the copier is tested 6 times. In these 6 runs the output per minute is 46, 47, 48, 47, 47, 46.
a. Treating the above numbers as a random sample of size 6 from the Normal distribution, compute the sample standard deviation, s, of the output. Show your work! (2)
b. Compute a 90% confidence interval for the mean output per minute. (2)
c. Redo b) when you find out that there were only 20 runs to pick the sample of 6 from. (2)
d. Assume that the population standard deviation is 0.7 and create a 99.9% two-sided confidence interval for the mean. (2)
e. Use your results in a) to test the hypothesis that the mean is above 45 at the 90% confidence level. (3) State your null and alternative hypotheses clearly!
f. (Extra Credit) Test the hypothesis that the population standard deviation is 0.70 at the 99.9% confidence level ($\alpha = .001$), assuming that a random sample of 50 yielded a sample standard deviation of 0.75.
Solution: a. Treating the above numbers as a random sample of size 6 from the Normal distribution, compute the sample standard deviation, s, of the output. Show your work! (2)

| Row | $x$ | $x^2$ |
|---|---|---|
| 1 | 46 | 2116 |
| 2 | 47 | 2209 |
| 3 | 48 | 2304 |
| 4 | 47 | 2209 |
| 5 | 47 | 2209 |
| 6 | 46 | 2116 |
| Total | 281 | 13163 |

So $\sum x = 281$ and $\sum x^2 = 13163$, and $\bar{x} = \frac{\sum x}{n} = \frac{281}{6} = 46.83333$.

$s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{13163 - 6(46.83333)^2}{5} = \frac{2.8333}{5} = 0.5667$ and $s_x = \sqrt{0.5667} = 0.7528$.
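The computational formula for the sample variance can be checked against a library routine. A minimal sketch of ours, assuming Python's standard `statistics` module, whose `variance` uses the same $n - 1$ divisor as the key.

```python
import statistics

output = [46, 47, 48, 47, 47, 46]   # copies per minute in the 6 test runs
n = len(output)

total = sum(output)                     # sum of x = 281
total_sq = sum(x * x for x in output)   # sum of x^2 = 13163
x_bar = total / n                       # 46.8333...

# Computational formula from the key: s^2 = (sum x^2 - n * xbar^2) / (n - 1)
s2 = (total_sq - n * x_bar ** 2) / (n - 1)
s = s2 ** 0.5

# Cross-check against the standard library's sample variance (n - 1 divisor)
assert abs(s2 - statistics.variance(output)) < 1e-9
print(f"sum x = {total}, sum x^2 = {total_sq}, xbar = {x_bar:.5f}")
print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

The formula subtracts two large, nearly equal numbers (13163 and about 13160.17), so carry enough decimal places in $\bar{x}$ when doing it by hand.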
b. Compute a 90% confidence interval for the mean output per minute. (2)
The table below is part of Table 3.
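
| Interval for | Confidence Interval | Hypotheses | Test Ratio | Critical Value |
|---|---|---|---|---|
| Mean ($\sigma$ known) | $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ | $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$ | $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\sigma_{\bar{x}}$ |
| Mean ($\sigma$ unknown) | $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ and $DF = n - 1$ | $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$ | $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$ |
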
At this point the population standard deviation is unknown, so we must use the formula for the mean with $\sigma$ unknown. $df = n - 1 = 5$ and $\alpha = .10$, so $t^{n-1}_{\alpha/2} = t^{5}_{.05} = 2.015$. $s_{\bar{x}} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{0.5667}{6}} = \sqrt{0.09444} = 0.30732$. So $\mu = \bar{x} \pm t^{5}_{.05} s_{\bar{x}} = 46.8333 \pm 2.015(0.30732) = 46.8333 \pm 0.6192$, or 46.214 to 47.453.
c. Redo b) when you find out that there were only 20 runs to pick the sample of 6 from. (2)
With a finite population of $N = 20$ runs, $s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$, so $s_{\bar{x}}^2 = \frac{s^2}{n}\left(\frac{N - n}{N - 1}\right) = \frac{0.5667}{6}\left(\frac{20 - 6}{20 - 1}\right) = .06959$ and $s_{\bar{x}} = \sqrt{.06959} = 0.26380$. Then $\mu = \bar{x} \pm t^{5}_{.05} s_{\bar{x}} = 46.8333 \pm 2.015(0.26380) = 46.8333 \pm 0.5316$, or 46.302 to 47.365.
Incidentally, $\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{20 - 6}{20 - 1}} = \sqrt{0.7368} = .8584$.
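Parts b and c differ only in the standard error, so one small function covers both. A minimal sketch of ours, assuming the t-table value $t^{5}_{.05} = 2.015$ is hard-coded because Python's standard library has no t distribution; the helper name `t_interval` is ours.

```python
import math
import statistics

output = [46, 47, 48, 47, 47, 46]
n = len(output)
x_bar = statistics.mean(output)
s2 = statistics.variance(output)   # sample variance, n - 1 divisor
t_05 = 2.015                       # t_.05 with df = 5, read from the t table

def t_interval(std_err):
    """Two-sided 90% interval x_bar +/- t_.05 * std_err."""
    margin = t_05 * std_err
    return x_bar - margin, x_bar + margin

# Part b: the population of runs is treated as very large
se_b = math.sqrt(s2 / n)
# Part c: only N = 20 runs, so apply the finite population correction
N = 20
fpc = (N - n) / (N - 1)
se_c = math.sqrt(s2 / n * fpc)

ci_b, ci_c = t_interval(se_b), t_interval(se_c)
print(f"b) s_xbar = {se_b:.5f}, interval {ci_b[0]:.3f} to {ci_b[1]:.3f}")
print(f"c) s_xbar = {se_c:.5f}, interval {ci_c[0]:.3f} to {ci_c[1]:.3f}")
```

The correction factor shrinks the standard error by about 14% here, because the sample is almost a third of the population.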
d. Assume that the population standard deviation is 0.7 and create a 99.9% two-sided confidence interval for the mean. (2)
$\alpha = 1 - .999 = .001$. We found in Section I, part 4, that $z_{\alpha/2} = z_{.0005} = 3.29$ (or something similar). $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{0.7^2}{6}} = \sqrt{0.081667} = 0.2858$. So $\mu = \bar{x} \pm z_{.0005}\sigma_{\bar{x}} = 46.8333 \pm 3.29(0.2858) = 46.833 \pm 0.940$, or 45.893 to 47.773.
e. Use your results in a) to test the hypothesis that the mean is above 45 at the 90% confidence level. (3) State your null and alternative hypotheses clearly!
Since the statement ‘the mean is above 45’ does not contain an equality, it must be an alternative hypothesis. Our hypotheses are $H_0: \mu \le 45$ and $H_1: \mu > 45$, with $\alpha = .10$, $\bar{x} = 46.83333$, $n = 6$, $\mu_0 = 45$ and $s_{\bar{x}} = 0.30732$.
Since we are worrying about the mean being too large, this is a right-sided test and we want a single critical value for the sample mean above 45. $t^{n-1}_{\alpha} = t^{5}_{.10} = 1.476$. The two-sided formula is $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$, and this becomes $\bar{x}_{cv} = \mu_0 + t_{\alpha} s_{\bar{x}} = 45 + 1.476(0.30732) = 45.4536$. Make a diagram. Make a Normal curve centered at $\mu_0 = 45$. (45, of course, is in your ‘do not reject’ zone.) The ‘reject’ zone is the area under the curve above 45.4536. Shade it. Since $\bar{x} = 46.83333$ is in the reject zone, reject the null hypothesis. We can say that the copier’s speed has improved.
You should, of course, only do this one way; I have to do it 3 ways. If you choose to use a test ratio, the formula $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$ gives us $t = \frac{46.8333 - 45}{0.30732} = 5.966$. Make a diagram. Make a Normal curve centered at zero. (Zero, of course, is in your ‘do not reject’ zone.) The ‘reject’ zone is the area under the curve above $t^{5}_{.10} = 1.476$. Shade it. Since $t = 5.966$ is in the reject zone, reject the null hypothesis. If you want a p-value for the null hypothesis, remember $t = 5.966$ and $df = 5$. The highest value on the $df = 5$ row of the t-table is $t^{5}_{.001} = 5.893$. Since the significance level falls as t gets larger, we can say that p-value < .001. Since the p-value is below the .10 significance level, reject the null hypothesis.
If you choose to do a confidence interval for the population mean, the formula $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$ must be replaced by a one-sided interval in the same direction as the alternative hypothesis $H_1: \mu > 45$. This means that the interval is $\mu \ge \bar{x} - t_{\alpha} s_{\bar{x}} = 46.8333 - 1.476(0.30732) = 46.380$. Make a diagram. Make a Normal curve centered at $\bar{x} = 46.83333$. ($\bar{x}$, of course, is in the confidence interval.) The confidence interval is the area under the curve above 46.380. Shade it. Since $\mu_0 = 45$ is not in the confidence interval, reject the null hypothesis.
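Parts d and e reuse the same sample, so a short script can reproduce the known-σ interval and all three versions of the one-sided test. A minimal sketch of ours, assuming the table values $z_{.0005} = 3.29$ and $t^{5}_{.10} = 1.476$ are hard-coded (the standard library has no t distribution).

```python
import math
import statistics

output = [46, 47, 48, 47, 47, 46]
n = len(output)
x_bar = statistics.mean(output)

# Part d: sigma = 0.7 assumed known, two-sided 99.9% interval with z_.0005 = 3.29
sigma_xbar = 0.7 / math.sqrt(n)
d_margin = 3.29 * sigma_xbar
d_interval = (x_bar - d_margin, x_bar + d_margin)

# Part e: H0: mu <= 45 vs H1: mu > 45, sigma unknown, alpha = .10, df = 5
s_xbar = math.sqrt(statistics.variance(output) / n)
mu_0, t_10 = 45, 1.476              # t_.10 for df = 5, read from the t table
x_cv = mu_0 + t_10 * s_xbar         # critical value: reject if x_bar > x_cv
t_ratio = (x_bar - mu_0) / s_xbar   # test ratio: reject if t_ratio > t_10
lower = x_bar - t_10 * s_xbar       # one-sided interval mu >= lower: reject if mu_0 < lower

print(f"d) 99.9% CI: {d_interval[0]:.3f} to {d_interval[1]:.3f}")
print(f"e) x_cv = {x_cv:.4f}, t = {t_ratio:.3f}, mu >= {lower:.3f}")
print("reject H0:", x_bar > x_cv, t_ratio > t_10, mu_0 < lower)
```

All three rules in part e compare the same distance, measured in standard errors, so they always agree.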
f. (Extra Credit) Test the hypothesis that the population standard deviation is 0.70 at the 99.9% confidence level ($\alpha = .001$), assuming that a random sample of 50 yielded a sample standard deviation of 0.75.
The table below is part of Table 3.

| Interval for | Confidence Interval | Hypotheses | Test Ratio | Critical Value |
|---|---|---|---|---|
| Variance, small sample | $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$ | $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$ | $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ | $s^2_{cv} = \frac{\chi^2_{1-\alpha/2}\,\sigma_0^2}{n-1}$ and $\frac{\chi^2_{\alpha/2}\,\sigma_0^2}{n-1}$ |
| Variance, large sample | $\sigma = \frac{s\sqrt{2DF}}{\sqrt{2DF} \pm z_{\alpha/2}}$ | $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$ | $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$ | $s_{cv} = \frac{\sigma_0\sqrt{2DF}}{\sqrt{2DF} \mp z_{\alpha/2}}$ |

Our hypotheses are $H_0: \sigma = 0.70$ and $H_1: \sigma \ne 0.70$, with $\alpha = 1 - .999 = .001$, $s = 0.75$, $n = 50$ and $\sigma_0 = 0.70$. The test ratio, which is what most people use, is $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{49(0.75)^2}{0.70^2} = 56.25$. Because the degrees of freedom, $df = n - 1 = 50 - 1 = 49$, are too high for the chi-squared table, we must use $z = \sqrt{2\chi^2} - \sqrt{2df - 1} = \sqrt{2(56.25)} - \sqrt{2(49) - 1} = \sqrt{112.50} - \sqrt{97} = 10.6066 - 9.8489 = 0.7577$. We could make a diagram showing a Normal curve centered at zero with one ‘reject’ zone below $-z_{\alpha/2} = -z_{.0005} = -3.29$ and a second reject zone above $z_{\alpha/2} = z_{.0005} = 3.29$. Since $z = 0.7577$ does not fall in these zones, do not reject the null hypothesis. However, the easiest way for me is to say
$p\text{-value} = 2P(z \ge 0.7577) = 2[P(z \ge 0) - P(0 \le z \le 0.76)] = 2(.5 - .2764) = .4472$. Since this p-value is above any significance level that we would ever use, do not reject the null hypothesis.
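The large-sample version of the variance test is short enough to script. A minimal sketch of ours, assuming Python 3.8+ for `statistics.NormalDist`; it uses the unrounded z, so its p-value differs slightly from the table-based .4472.

```python
import math
from statistics import NormalDist

n, s, sigma_0, alpha = 50, 0.75, 0.70, 0.001
df = n - 1
chi2 = df * s ** 2 / sigma_0 ** 2             # test ratio (n - 1) s^2 / sigma_0^2
# Large-sample Normal approximation for a chi-squared statistic
z = math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_.0005, about 3.29
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(f"chi2 = {chi2:.2f}, z = {z:.4f}, z_crit = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0:", abs(z) > z_crit)
```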
III. Do as many of the following problems as you can. (2 points each unless marked otherwise, adding to 13+ points.) Show your work except in multiple choice questions. (Actually, it doesn’t hurt there either.) If the answer is ‘None of the above,’ put in the correct answer if possible. ( ) gives points for the question. [ ] gives a running total.
1) If I want to test to see if the population mean of x is smaller than 5 my null hypothesis is:
i)   5
ii)   5
iii) * $\mu \ge 5$
iv)   5
vi) None of the above. (So what is it?)
vii) Any of i)-iv) could be right. We need more information.
Explanation: The statement “The population mean of x is smaller than 5” translates as $\mu < 5$. Since this does not contain an equality, it is an alternative hypothesis. The opposite, $H_0: \mu \ge 5$, must be the null hypothesis.
2) Assuming that you have a sample mean of 100 based on a sample of 36 taken from a population of 900
and you are testing to see if the population mean is 90 with a known population standard deviation of 80,
the 99% critical values for the sample mean are
a) $100 \pm 2.576\left(\frac{80}{\sqrt{36}}\right)$
b) $100 \pm 2.626\left(\frac{80}{\sqrt{36}}\right)$
c) $100 \pm 2.576\left(\frac{80}{\sqrt{900}}\right)$
d) * $90 \pm 2.576\left(\frac{80}{\sqrt{36}}\right)$
e) $90 \pm 2.626\left(\frac{80}{\sqrt{36}}\right)$
f) $90 \pm 2.576\left(\frac{80}{\sqrt{900}}\right)$
Explanation: The statement “The population mean is 90” translates as $H_0: \mu = 90$. So we have $\alpha = .01$, $\mu_0 = 90$, $n = 36$ and $\sigma = 80$. The other two statements, $\bar{x} = 100$ and $N = 900$, are irrelevant since 1) a critical value is based on the null hypothesis and 2) 900 is more than 20 times 36, so no finite population correction is required. In Section II we quote Table 3 to say $\bar{x}_{cv} = \mu_0 \pm z_{\alpha/2}\sigma_{\bar{x}}$. Degrees of freedom are irrelevant since the population standard deviation is known. The t-table says $z_{\alpha/2} = z_{.005} = t^{\infty}_{.005} = 2.576$.
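The reasoning behind option d) can be written out as a few lines of arithmetic. A minimal sketch of ours; the rule "correct only when n exceeds N/20" is the one the explanation uses, and 2.576 is hard-coded from the table.

```python
import math

n, N = 36, 900
mu_0, sigma = 90, 80       # critical values are centred on the null value, not on x_bar = 100
z_005 = 2.576              # z_.005 for a two-sided test with alpha = .01

# Finite population correction is only needed when n is more than N/20
needs_fpc = n > N / 20
sigma_xbar = sigma / math.sqrt(n)
lo, hi = mu_0 - z_005 * sigma_xbar, mu_0 + z_005 * sigma_xbar
print(f"FPC needed: {needs_fpc}; critical values {lo:.3f} and {hi:.3f}")
```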
3) Which of the following is a Type 1 error?
a) Rejecting the null hypothesis when the null hypothesis is false.
b) *Rejecting the null hypothesis when the null hypothesis is true.
c) Not rejecting the null hypothesis when the null hypothesis is true.
d) Not rejecting the null hypothesis when the null hypothesis is false.
e) All of the above
f) None of the above.
4) (Langley) It is generally believed that 15% of white Australians are allergic to penicillin. A doctor
believes that the allergy occurs in a lower proportion of Native Australians. To test that belief a random
sample is gathered of 50 Native Australians and it is found that only one (2% of the sample) is allergic to
penicillin. The doctor creates a p-value to compare against a significance level of 5%. What do we mean by
a p-value? [8]
a) P-value is the 2% proportion.
b)* P-value is $P(\bar{p} \le .02)$.
c) P-value is P p  .02  .
d) P-value is 2 P p  .02  .
e) P-value is 2 P p  .02  .
f) P-value is P p  .15  .
g) P-value is P p  .15  .
h) P-value is 2 P p  .15  .
i) P-value is 2 P p  .15  .
j) P-value is 2 P p  .05  .
Explanation: The statement “The allergy occurs in a lower proportion” translates as $H_1: p < .15$. The p-value is a measure of the credibility of the null hypothesis and is defined as the probability that a test statistic or ratio as extreme as or more extreme than the observed statistic or ratio could occur, assuming that the null hypothesis is true. The observed statistic that we are using is $\bar{p} = \frac{1}{50} = .02$, and the alternative hypothesis says that this is a left-sided test. (We would reject the null hypothesis $H_0: p \ge .15$ if $\bar{p}$ is too far below .15.) Thus the p-value is $P(\bar{p} \le .02)$.
Exhibit 1: (Langley) Langley’s daughter loved to play chutes and ladders. She told her daddy, however,
that the one die she used seemed to come up a six when she didn’t want a six. A fair die is equally likely to
come up with each of its six faces on top. Langley now suspected that the die was coming up a six more
often than it should. As an experiment, he rolled the die 108 times and it came up a six 25 times.
5) We wish to test whether Langley’s suspicion in Exhibit 1 is correct. To do so, we must do which of the following: [10]
a) A z-test of the population mean.
b) *A z-test of a population proportion.
c) A t-test of the population mean.
d) A $\chi^2$-test of the population variance.
e) A test of the population median
f) None of the above (To get full credit propose a test type.)
Explanation: If we are trying to compare the proportion of times that six comes up with $\frac{1}{6}$, we are testing
proportions. This could be a binomial test, but the only test anyone uses for a proportion with a large
sample is a z-test. A number of you said this involved a mean. Ask yourself of what you would take the
mean.
6) In Exhibit 1, what are the null and alternative hypotheses that Langley should be testing? (2)
According to Table 3, the test for a proportion involves the following.
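
| Interval for | Confidence Interval | Hypotheses | Test Ratio | Critical Value |
|---|---|---|---|---|
| Proportion | $p = \bar{p} \pm z_{\alpha/2} s_{\bar{p}}$, where $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$ | $H_0: p = p_0$, $H_1: p \ne p_0$ | $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$, where $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$ | $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\sigma_{\bar{p}}$ |
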
“A fair die is equally likely to come up with each of its six faces on top. Langley now suspected that the die was coming up a six more often than it should,” means that Langley believed $H_1: p > \frac{1}{6}$, which is an alternative hypothesis because it does not contain an equality. So the null hypothesis is the opposite, which is $H_0: p \le \frac{1}{6}$, and we have $H_0: p \le \frac{1}{6}$ and $H_1: p > \frac{1}{6}$, with $\frac{1}{6} = .1667$.

7) In Exhibit 1, what is the value of the test ratio that you would use to test your hypotheses in 6)? Show your work. (Note that this could be right even if the answer to 6) is wrong.) (3)
Solution: $n = 108$ and $x = 25$, so $\bar{p} = \frac{x}{n} = \frac{25}{108} = .2315$. $p_0 = \frac{1}{6}$ from the null hypothesis. $\sigma_{\bar{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{\left(\frac{1}{6}\right)\left(\frac{5}{6}\right)}{108}} = \sqrt{.001286} = .03586$. Thus our test ratio is $z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{.2315 - .1667}{.03586} = 1.807$.
8) Using a 95% confidence level, explain, using your hypotheses, whether the die was fair. (2) [17]
Solution: Since z has the standardized Normal distribution and since our alternative hypothesis is $H_1: p > \frac{1}{6}$, we are worried about $\bar{p}$ being too large, so we have a right-sided test. With a 95% confidence level and a 1-sided test, we use $z_{.05} = 1.645$ and test our computed value of z. Make a diagram. The Normal curve is centered at zero and the ‘reject’ region is all points above 1.645. Shade the ‘reject’ region and note that 1.807 falls in the ‘reject’ region, so we reject our null hypothesis and feel that we have
demonstrated that the null hypothesis is false.
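The proportion test in questions 7) and 8) follows directly from the Table 3 formulas. A minimal sketch of ours; note that the standard error uses $p_0$ from the null hypothesis, not $\bar{p}$, and 1.645 is hard-coded from the table.

```python
import math

n, x = 108, 25                 # rolls and sixes from Exhibit 1
p0 = 1 / 6                     # H0: p <= 1/6 against H1: p > 1/6
q0 = 1 - p0
p_bar = x / n
sigma_pbar = math.sqrt(p0 * q0 / n)   # standard error uses p0 from H0
z = (p_bar - p0) / sigma_pbar
z_05 = 1.645                   # right-sided test at 95% confidence
print(f"p_bar = {p_bar:.4f}, sigma_pbar = {sigma_pbar:.5f}, z = {z:.3f}")
print("reject H0:", z > z_05)
```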
Exhibit 2: (Ng) The manager of the credit department believes that the average balance held by credit card
holders is $75. A random sample of 29 accounts is selected and she finds that the sample mean of the
amount owed is $83.40 and the sample standard deviation is $23.40. It is believed that the distribution of
the population is approximately Normal.
9) We wish to test whether the manager’s belief in exhibit 2 is correct. To do so, we must do which of the
following: (1) [18]
a) A z-test of the population mean.
b) A z-test of a population proportion.
c) *A t-test of the population mean.
d) A $\chi^2$-test of the population variance.
e) A test of the population median.
f) None of the above (To get full credit propose a test type.)
Explanation: If we are trying to compare the sample mean of $83.40 with a population mean of $75 and
the only information we have about the variance is a sample standard deviation, the population variance is
unknown so we must do a t-test.
10) a) State your null and alternative hypotheses to test the manager’s belief in Exhibit 2. (1) b) Give an appropriate critical value or values (for a mean, proportion, variance or median). (2) [21]
Solution: a) $H_0: \mu = 75$ and $H_1: \mu \ne 75$.
b) In Section II we quote Table 3 to say that $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}}$. We are given the following information in the exhibit: $\mu_0 = 75$, $n = 29$, $\bar{x} = 83.40$ and $s = 23.40$. We will assume, in the absence of any other value, that $\alpha = .05$ and use $t^{n-1}_{\alpha/2} = t^{28}_{.025} = 2.048$. The standard error is $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{23.40}{\sqrt{29}} = \sqrt{\frac{23.40^2}{29}} = \sqrt{18.8814} = 4.3453$. If we fill in the critical value formula, $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}} = 75 \pm 2.048(4.3453) = 75 \pm 8.899$. The critical values are 66.101 and 83.899.
Note: You were not asked for a conclusion to this problem, but $\bar{x} = 83.40$ is barely below the upper critical
value, so that we do not reject the null hypothesis.
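The critical values for Exhibit 2 can be checked the same way. A minimal sketch of ours, keeping the key's assumption that $\alpha = .05$ and hard-coding $t^{28}_{.025} = 2.048$ from the t table.

```python
import math

mu_0, n, x_bar, s = 75, 29, 83.40, 23.40
t_025_df28 = 2.048            # t_.025 with df = 28, from the t table (alpha = .05 assumed)
s_xbar = s / math.sqrt(n)
lo, hi = mu_0 - t_025_df28 * s_xbar, mu_0 + t_025_df28 * s_xbar
print(f"s_xbar = {s_xbar:.4f}; critical values {lo:.3f} and {hi:.3f}")
print("reject H0:", not lo <= x_bar <= hi)
```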
11) The manager of the credit department believes that the median balance held by credit card holders is
above $75 and that the population does not have a Normal distribution. A random sample of 100 accounts
is selected and 60 of the accounts have balances above $75. Which of the following is the null hypothesis
that the manager will end up testing? (To protect yourself, you might want to explain what p is. Otherwise
I will use my own assumption.) (2) [23]
a) p  .5
b) p  .5
c) p  .5 .
d) p  .5
e) * $p \le .5$
f) None of the above (To get full credit propose a null hypothesis.)
Solution: $H_0: \eta \le 75$ and $H_1: \eta > 75$, where $\eta$ is the population median. Note that if the distribution were Normal, we could test for the mean. Other things being equal, we can assume that p is the proportion of accounts that have balances above 75. The quick and dirty way to do this is to copy from the outline. The relevant part is starred.

| Hypotheses about a median | Hypotheses about a proportion, if p is the proportion above $\eta_0$ | Hypotheses about a proportion, if p is the proportion below $\eta_0$ |
|---|---|---|
| $H_0: \eta \le \eta_0$, $H_1: \eta > \eta_0$ | * $H_0: p \le .5$, $H_1: p > .5$ | $H_0: p \ge .5$, $H_1: p < .5$ |
| $H_0: \eta \ge \eta_0$, $H_1: \eta < \eta_0$ | $H_0: p \ge .5$, $H_1: p < .5$ | $H_0: p \le .5$, $H_1: p > .5$ |

Note: I assumed that by p you meant the proportion of accounts that had balances above 75. Telling me
that p is a proportion of the accounts doesn’t tell me anything.
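The outline table above is a fixed mapping, so it can be encoded as a tiny lookup. A minimal sketch; `median_to_proportion` is a hypothetical helper name of ours, and the string labels are just one way to represent the hypotheses.

```python
def median_to_proportion(h1_direction, p_counts):
    """Translate a one-sided median alternative into hypotheses about p.

    h1_direction: "greater" for H1: eta > eta_0, "less" for H1: eta < eta_0.
    p_counts: "above" if p is the proportion above eta_0, "below" if below.
    Returns (H0, H1) as strings about p.
    """
    if h1_direction not in ("greater", "less") or p_counts not in ("above", "below"):
        raise ValueError("unexpected argument")
    # p exceeds .5 under H1 exactly when the median shift points the way p counts
    p_greater = (h1_direction == "greater") == (p_counts == "above")
    if p_greater:
        return "p <= .5", "p > .5"
    return "p >= .5", "p < .5"

h0, h1 = median_to_proportion("greater", "above")   # Question 11: H1: eta > 75
p_bar = 60 / 100                                    # share of sampled accounts above 75
print(f"H0: {h0}, H1: {h1}, observed p_bar = {p_bar}")
```

Defining p as the proportion below $\eta_0$ simply flips every inequality, which is why the student must say which proportion p is.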
ECO252 QBA2
FIRST EXAM
February 28, 2008
TAKE HOME SECTION
Name: _________________________
Student Number and class time: _________________________
IV. Do at least 3 problems (at least 7 each) (or do sections adding to at least 20 points; anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where appropriate. You have not done a hypothesis test unless you have
stated your hypotheses, run the numbers and stated your conclusion. (Use a 95% confidence level unless another level is
specified.) Answers without reasons usually are not acceptable. Neatness and clarity of explanation are expected. This must be
turned in when you take the in-class exam. Note that answers without reasons and citation of appropriate statistical tests
receive no credit. Failing to be transparent about which section of which problem you are doing can lose you credit. Many answers
require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value.
If you haven’t done it lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere
Else).
Problem 1: (Doane and Seward) A fast food restaurant has just started serving hot cocoa. The management wishes to serve cocoa of
an average temperature of 142 degrees. 24 measurements of the temperature are taken in each of 10 stores. You are the manager of store a and
will use the corresponding column, where a is the second-to-last digit of your student number. (For example, Seymour Butz’ student
number is 543987 so he uses column x8.) If that number is zero, use column 10. You are testing to see if the mean for your store is
142. There will be a penalty if you do not make it clear what column you are using.
Row x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 140 142 142 143 144 142 143 146 143 143
2 142 143 143 138 139 142 144 145 144 139
3 141 141 141 140 144 142 144 144 141 145
4 142 142 142 140 143 140 145 144 144 144
5 141 139 142 142 141 141 146 145 142 145
6 141 142 140 139 142 141 142 144 140 142
7 145 144 143 142 141 137 145 141 142 141
8 142 145 142 139 138 142 142 141 141 140
9 142 143 141 145 144 139 145 144 146 142
10 142 143 136 139 145 141 143 144 140 142
11 137 141 142 142 139 141 142 139 143 142
12 139 139 138 138 141 143 142 142 144 146
13 139 143 142 142 145 141 141 141 142 142
14 144 144 139 141 142 142 147 142 143 143
15 140 140 140 142 144 140 141 144 142 142
16 141 138 143 141 145 142 137 145 141 140
17 140 141 139 141 142 142 142 139 142 144
18 140 140 139 140 144 142 140 142 144 143
19 140 142 139 142 136 139 144 143 144 141
20 139 140 141 141 138 142 142 146 145 144
21 146 138 143 143 141 143 147 142 145 143
22 138 139 141 141 142 143 146 144 141 141
23 139 142 140 140 140 142 140 144 143 139
24 142 141 143 140 141 140 143 146 142 142
Assume that the Normal distribution applies to the data and use a 98% confidence level.
a. Find the sample mean and sample standard deviation of the temperatures in your data, showing your work. (1) (Your mean
should be between 140 and 146 and your sample standard deviation should be around 2.)
b. State your null and alternative hypotheses (1)
c. Test the hypothesis using a test ratio (1)
d. Test the hypothesis using a critical value for a sample mean. (1)
e. Test the hypothesis using a confidence interval (1)
f. Find an approximate p-value for the null hypothesis. (1)
g. On the basis of your tests, is the mean temperature correct in your restaurant? Why? (1)
h. How do your conclusions change if the random sample of 24 temperatures is taken on a day in which only 48 cups of
cocoa are sold? (2)
i. Assume that the Normal distribution does not apply and test to see if the median is 142. Be careful! What should you do
with numbers that are exactly 142? (2)
[12]
j. (Extra Credit) Do a 98% confidence interval for the median. (2)
Problem 2: Once again assume that the Normal distribution applies to the data in Problem 1, but that we know that the population
standard deviation is 2. Our confidence level remains 98%, but we are now testing the hypothesis that the mean is below 143 degrees.
a. State your null and your alternative hypotheses. (1)
b. Find the value of z that you need for a critical value for a 1-sided test if the confidence level is 98%. (1)
c. Find a critical value for the sample mean to test if the mean is below 143 degrees. (1)
d. Test the hypothesis that the mean is below 143 degrees using an appropriate confidence interval. (2)
e. Using your critical value from 2b, create a power curve for your test. (6)
f. Assume that the population standard deviation is 2. How large a sample do you need to get a two-sided 98% confidence
interval with an error not exceeding 0.5 degrees? (2) [22]
Problem 3: According to Doane and Seward about 13% of goods bought at a department store are returned. An organization called
Return Exchange will sell you a software product called Verify-1, for which it makes the claims below.
Verify-1® is quickly operational. And it authorizes returns even quicker
Verify-1® identifies fraud and abuse at the point of return before they become liabilities to your brand equity or profits. In
stand-alone mode, this easy-to-use, turnkey solution can be operational in 30 days and will reduce your return rate
immediately, without disrupting your business or IT configuration. Verify-1® also integrates easily into your existing POS
platform.
You set the policy, Verify-1® enforces it
With Verify-1®, your returns are dealt with consistently utilizing advanced statistical modeling in combination with state
return laws and your existing return policies. At the point of return, using the customer’s driver’s license or other valid
identification, Verify-1® automatically checks prior return behavior and authorizes or declines the transaction. Customers
identified as risks for presenting fraudulent returns are declined, while legitimate returns are speedily accepted.
You take a sample of n items and find that there were x returns (about 9%). You are the manager of store a, where a is the last digit of
your student number. (For example, Seymour Butz’ student number is 543987, so he manages store 7.) The sample size and number of
returns for your store are given below. On the basis of this sample, can you now say that the return rate is below 13%? Use a
confidence level of 95%.
Store 1 2 3 4 5 6 7 8 9 10
n 275 250 225 200 175 150 125 100 75 50
x 25 22 20 18 16 13 11 9 7 4
a) State your null and alternative hypotheses. (1) Make sure I know which store you manage.
b) Test the hypothesis using a test ratio or a critical value for the observed proportion. (1) Make a diagram showing clearly where your
‘reject’ region is. (Do not round excessively. If you compute proportions carry at least 3 significant figures.)
c) Find a p-value for your null hypothesis. (1)
d) Test your hypothesis using an appropriate confidence interval. (2) [5]
e) Using the 13% proportion as an estimate of the true proportion, find out how large a sample you need to create a 95% confidence
interval with an error of no more than 1% (2)
f) (Extra credit) Remember that the method that you have been using to deal with proportions substitutes the Normal distribution for
the binomial distribution. In general the p-values that you have computed are higher than you would get if you used the binomial
distribution. Verify this by making a continuity correction as described in the outline and repeating your test in b). (2)
g) (Extra credit) Using 13%, your critical value, a point between your critical value and 13% and one or two other points on the side of
the critical value implied by the alternative hypothesis (only one point on this side may give a reasonable value for a proportion) put
together a power curve for your test. Remember that your standard error will change if the true proportion changes. (8)
h) Go back to the test in parts a) b) and c) of this problem. Take your values of n and x and multiply them by 1.6, rounding your
values to the nearest whole number (or numbers) if necessary. Find the new value of the test ratio and get a p-value. What does the
change in p-value between parts c) and h) suggest about the effect of increased sample size on the power of the test? (3) [32]
Problem 4: According to Doane and Seward both the mean and the standard deviation of pH (a measure of acidity) are of interest to
winemakers. Assume that your firm (store from the last problem) has gotten into the wine business. A sample of 16 wine bottles is
taken. Your column has the same number as your store. Minitab has calculated all sorts of sample statistics on your data. These are
listed below. Use them.
Row C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 3.41 3.44 3.61 3.39 3.41 3.43 3.40 3.56 3.53 3.17
2 3.45 3.42 3.59 3.37 3.39 3.41 3.38 3.53 3.56 3.21
3 3.51 3.45 3.63 3.41 3.43 3.45 3.42 3.59 3.63 3.27
4 3.52 3.48 3.65 3.44 3.46 3.47 3.45 3.63 3.65 3.28
5 3.68 3.68 3.87 3.66 3.69 3.69 3.68 3.95 3.82 3.44
6 3.29 3.45 3.62 3.41 3.43 3.44 3.42 3.58 3.39 3.05
7 3.39 3.42 3.59 3.37 3.39 3.41 3.38 3.53 3.50 3.15
8 3.57 3.50 3.67 3.45 3.48 3.49 3.47 3.65 3.70 3.33
9 3.38 3.41 3.58 3.36 3.38 3.40 3.37 3.52 3.49 3.14
10 3.14 3.36 3.52 3.30 3.32 3.34 3.31 3.43 3.23 2.90
11 3.61 3.69 3.87 3.66 3.70 3.70 3.68 3.95 3.75 3.37
12 3.23 3.40 3.57 3.35 3.37 3.39 3.36 3.51 3.32 2.99
13 3.48 3.48 3.66 3.44 3.46 3.48 3.45 3.63 3.59 3.24
14 3.39 3.48 3.65 3.43 3.45 3.47 3.44 3.62 3.51 3.15
15 3.49 3.45 3.62 3.40 3.42 3.44 3.41 3.57 3.61 3.25
16 3.50 3.63 3.81 3.60 3.63 3.64 3.62 3.87 3.62 3.26
Variable  N   N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
C1        16  0   3.4400  0.0347   0.1387  3.1400   3.3825  3.4650  3.5175  3.6800
C2        16  0   3.4837  0.0245   0.0980  3.3600   3.4200  3.4500  3.4950  3.6900
C3        16  0   3.6569  0.0259   0.1037  3.5200   3.5900  3.6250  3.6675  3.8700
C4        16  0   3.4400  0.0268   0.1072  3.3000   3.3700  3.4100  3.4475  3.6600
C5        16  0   3.4631  0.0281   0.1124  3.3200   3.3900  3.4300  3.4750  3.7000
C6        16  0   3.4781  0.0265   0.1061  3.3400   3.4100  3.4450  3.4875  3.7000
C7        16  0   3.4525  0.0278   0.1110  3.3100   3.3800  3.4200  3.4650  3.6800
C8        16  0   3.6325  0.0388   0.1553  3.4300   3.5300  3.5850  3.6450  3.9500
C9        16  0   3.5562  0.0382   0.1528  3.2300   3.4925  3.5750  3.6450  3.8200
C10       16  0   3.2000  0.0347   0.1387  2.9000   3.1425  3.2250  3.2775  3.4400
You must state $H_0$ and $H_1$ where applicable to get credit for any of the tests below. Make sure that I know which column you
are using!
a) The acceptable standard deviation for wine pH is 0.10. Using the data for your store, test the hypothesis that the standard
deviation is 0.10 using a 95% confidence level. (2)
b) Test the hypothesis that the standard deviation is below .14. (1)
c) Repeat a) and b) using the sample (mean and) variance you used in a) and b) but assuming a sample size of 100. Find p-values. (4)
d) Find a 2-sided 95% confidence interval for the standard deviation using data from your store and assuming a sample size
of 16. (2)
e) Repeat d) for a sample size of 100. (1)
[41]
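For parts a) and d), here is a minimal Python sketch of the chi-square test ratio $(n-1)s^2/\sigma_0^2$ and the matching two-sided interval for σ, assuming the Normal distribution applies (the 'Standard' method). The cutoffs 6.262 and 27.488 are standard chi-square table values for df = 15 that this sketch assumes; they are not printed in the exam. Column C1 is used only as an illustration, and the function names are hypothetical.

```python
import math

# Chi-square table values for df = n - 1 = 15 (assumed from a standard table).
CHI2_LOWER_15 = 6.262   # 97.5% of the distribution lies above this value
CHI2_UPPER_15 = 27.488  # 2.5% of the distribution lies above this value


def chi_square_ratio(s, sigma0, n):
    """Test ratio for H0: sigma = sigma0, namely (n - 1) s^2 / sigma0^2."""
    return (n - 1) * s ** 2 / sigma0 ** 2


def sigma_interval(s, n, lower=CHI2_LOWER_15, upper=CHI2_UPPER_15):
    """Two-sided interval sqrt((n-1)s^2/upper) <= sigma <= sqrt((n-1)s^2/lower)."""
    ss = (n - 1) * s ** 2
    return math.sqrt(ss / upper), math.sqrt(ss / lower)


# Illustration with column C1 (s = 0.1387, n = 16) and sigma0 = 0.10.
chi2 = chi_square_ratio(0.1387, 0.10, 16)
reject = not (CHI2_LOWER_15 <= chi2 <= CHI2_UPPER_15)
low, high = sigma_interval(0.1387, 16)
print(f"chi-square = {chi2:.3f}, reject H0: {reject}")
print(f"95% interval for sigma: ({low:.4f}, {high:.4f})")
```

As a check, `sigma_interval(0.100, 16)` comes out at roughly (0.074, 0.155), which matches the 'Standard' interval Minitab prints for C11 in part f). For the sample size of 100 in parts c) and e), df = 99 table values would have to replace the two constants.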
f) Here’s the easiest question on the exam. By now you should have figured out that you don’t have to understand a
statistical test at all if you know i) what it assumes, ii) what the null hypothesis is and iii) what the p-value is associated
with the null hypothesis. So, I am going to do a test that the standard deviation is 0.1 on the following data set.
C11: 3.53 3.51 3.54 3.57 3.78 3.54 3.51 3.59 3.50 3.44 3.78 3.49 3.57 3.57 3.54 3.72
Then I am going to run a Lilliefors test on these data using Minitab. The null hypothesis of the Lilliefors test is that the
sample comes from the Normal distribution. The test makes no assumptions about the mean and standard deviation of the
population and computes these as sample statistics from the data. After it printed ‘Probability plot of C11,’ the computer
printed a graph of the data, but the only thing I looked at was the p-value which was less than .01. After the Lilliefors test,
the computer printed out the results of two versions of a statistical test on the standard deviation. The ‘Standard’ version is
the method that you learned and is only applicable if the data comes from a Normal distribution. The ‘Adjusted’ version is
for all other cases. So explain what p-value I look at and what it tells me.
MTB > NormTest c11;
SUBC>   KSTest.
Probability Plot of C11
MTB > OneVariance c11;
SUBC>   Test .1;
SUBC>   Confidence 95.0;
SUBC>   Alternative 0;
SUBC>   StDeviation.
Test and CI for One Standard Deviation: C11
Method
Null hypothesis         Sigma = 0.1
Alternative hypothesis  Sigma not = 0.1
The standard method is only for the normal distribution.
The adjusted method is for any continuous distribution.
Statistics
Variable   N  StDev  Variance
C11       16  0.100    0.0100
95% Confidence Intervals
Variable  Method    CI for StDev    CI for Variance
C11       Standard  (0.074, 0.155)  (0.0055, 0.0240)
          Adjusted  (0.071, 0.170)  (0.0050, 0.0288)
Tests
Variable  Method    Chi-Square     DF  P-Value
C11       Standard       15.06  15.00    0.895
          Adjusted       11.12  11.07    0.880
Solutions
Problem 1: (Doane and Seward) A fast food restaurant has just started serving hot cocoa. The management
wishes to serve cocoa with an average temperature of 142 degrees. Twenty-four measurements of the temperature
are taken in each of 10 stores. You are the manager of store a and will use the corresponding column, where a is the
second-to-last digit of your student number. (For example, Seymour Butz' student number is 543987 so he
uses column x8.) If that number is zero, use column 10. You are testing to see if the mean for your store is
142. There will be a penalty if you do not make it clear what column you are using.
Row  x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
1    140  142  142  143  144  142  143  146  143  143
2    142  143  143  138  139  142  144  145  144  139
3    141  141  141  140  144  142  144  144  141  145
4    142  142  142  140  143  140  145  144  144  144
5    141  139  142  142  141  141  146  145  142  145
6    141  142  140  139  142  141  142  144  140  142
7    145  144  143  142  141  137  145  141  142  141
8    142  145  142  139  138  142  142  141  141  140
9    142  143  141  145  144  139  145  144  146  142
10   142  143  136  139  145  141  143  144  140  142
11   137  141  142  142  139  141  142  139  143  142
12   139  139  138  138  141  143  142  142  144  146
13   139  143  142  142  145  141  141  141  142  142
14   144  144  139  141  142  142  147  142  143  143
15   140  140  140  142  144  140  141  144  142  142
16   141  138  143  141  145  142  137  145  141  140
17   140  141  139  141  142  142  142  139  142  144
18   140  140  139  140  144  142  140  142  144  143
19   140  142  139  142  136  139  144  143  144  141
20   139  140  141  141  138  142  142  146  145  144
21   146  138  143  143  141  143  147  142  145  143
22   138  139  141  141  142  143  146  144  141  141
23   139  142  140  140  140  142  140  144  143  139
24   142  141  143  140  141  140  143  146  142  142
Assume that the Normal distribution applies to the data and use a 98% confidence level ($\alpha = .02$).
a. Find the sample mean and sample standard deviation of the temperatures in your data, showing
your work. (1) (Your mean should be between 140 and 146 and your sample standard deviation
should be around 2.)
Row    x4    x4²     x8    x8²
1      143   20449   146   21316
2      138   19044   145   21025
3      140   19600   144   20736
4      140   19600   144   20736
5      142   20164   145   21025
6      139   19321   144   20736
7      142   20164   141   19881
8      139   19321   141   19881
9      145   21025   144   20736
10     139   19321   144   20736
11     142   20164   139   19321
12     138   19044   142   20164
13     142   20164   141   19881
14     141   19881   142   20164
15     142   20164   144   20736
16     141   19881   145   21025
17     141   19881   139   19321
18     140   19600   142   20164
19     142   20164   143   20449
20     141   19881   146   21316
21     143   20449   142   20164
22     141   19881   144   20736
23     140   19600   144   20736
24     140   19600   146   21316
Total  3381  476363  3437  492301
Column 4: $\sum x_4 = 3381$, $\sum x_4^2 = 476363$ and $n_4 = 24$.
$\bar x_4 = \frac{\sum x_4}{n} = \frac{3381}{24} = 140.875$
$s_4^2 = \frac{\sum x_4^2 - n\bar x_4^2}{n-1} = \frac{476363 - 24(140.875)^2}{23} = \frac{64.6250}{23} = 2.8098$, so $s_4 = \sqrt{2.8098} = 1.6762$.
Column 8: $\sum x_8 = 3437$, $\sum x_8^2 = 492301$ and $n_8 = 24$.
$\bar x_8 = \frac{\sum x_8}{n} = \frac{3437}{24} = 143.20833$
$s_8^2 = \frac{\sum x_8^2 - n\bar x_8^2}{n-1} = \frac{492301 - 24(143.20833)^2}{23} = \frac{93.9583}{23} = 4.0851$, so $s_8 = \sqrt{4.0851} = 2.0212$.
If you used the definitional formula, your answers should be
identical except for rounding error.
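A short Python sketch of the computational formula above, with the x4 and x8 data copied from the table. `statistics.stdev`, which uses the definitional formula, is included as a cross-check; the helper name `computational_summary` is illustrative only.

```python
import math
import statistics

x4 = [143, 138, 140, 140, 142, 139, 142, 139, 145, 139, 142, 138,
      142, 141, 142, 141, 141, 140, 142, 141, 143, 141, 140, 140]
x8 = [146, 145, 144, 144, 145, 144, 141, 141, 144, 144, 139, 142,
      141, 142, 144, 145, 139, 142, 143, 146, 142, 144, 144, 146]


def computational_summary(xs):
    """Return (sum x, sum x^2, mean, s) using s^2 = (sum x^2 - n xbar^2) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    xbar = sum_x / n
    s = math.sqrt((sum_x2 - n * xbar ** 2) / (n - 1))
    return sum_x, sum_x2, xbar, s


for name, xs in [("x4", x4), ("x8", x8)]:
    sum_x, sum_x2, xbar, s = computational_summary(xs)
    # The definitional formula should agree up to rounding error.
    print(name, sum_x, sum_x2, round(xbar, 5), round(s, 4), round(statistics.stdev(xs), 4))
```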
For later use the standard errors are $s_{\bar x_4} = \frac{s_4}{\sqrt n} = \sqrt{\frac{s_4^2}{n}} = \sqrt{\frac{2.8098}{24}} = 0.3422$ and $s_{\bar x_8} = \frac{s_8}{\sqrt n} = \sqrt{\frac{s_8^2}{n}} = \sqrt{\frac{4.0851}{24}} = 0.4126$.
b. State your null and alternative hypotheses (1)
$H_0: \mu = 142$
$H_1: \mu \ne 142$
Note: $\mu_0 = 142$, $\alpha = .02$ and $df = n - 1 = 23$. For a 2-sided test use $t^{23}_{.01} = 2.500$.
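The table below is part of Table 3.
Interval for: Mean ($\sigma$ unknown), $DF = n - 1$
Confidence interval: $\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$
Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test ratio: $t = \frac{\bar x - \mu_0}{s_{\bar x}}$
Critical value: $\bar x_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar x}$
where $s_{\bar x} = \frac{s}{\sqrt n}$.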
c. Test the hypothesis using a test ratio (1)
Make a diagram. Make a Normal curve with a center at zero. Indicate two 'reject' zones, one below
$-t^{23}_{.01} = -2.500$ and one above $t^{23}_{.01} = 2.500$.
For Column 4 the sample mean is $\bar x_4 = 140.875$ and the standard error is $s_{\bar x_4} = 0.3422$. So the value of
the test ratio is $t = \frac{\bar x - \mu_0}{s_{\bar x}} = \frac{140.875 - 142}{0.3422} = -3.288$. This is in the lower 'reject' zone so reject the null
hypothesis.
For Column 8 the sample mean is $\bar x_8 = 143.20833$ and the standard error is $s_{\bar x_8} = 0.4126$. So the value of
the test ratio is $t = \frac{\bar x - \mu_0}{s_{\bar x}} = \frac{143.20833 - 142}{0.4126} = 2.9286$. This is in the upper 'reject' zone so reject the null
hypothesis.
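The test ratio in c) reduces to one line of arithmetic. This sketch hard-codes the table value $t^{23}_{.01} = 2.500$ and uses the rounded summary statistics from a); `t_ratio` is a hypothetical helper name.

```python
import math

T_01_DF23 = 2.500  # t_.01 for df = 23, read from the t table (alpha = .02, two-sided)


def t_ratio(xbar, mu0, s, n):
    """One-sample test ratio t = (xbar - mu0) / s_xbar with s_xbar = s / sqrt(n)."""
    return (xbar - mu0) / (s / math.sqrt(n))


for label, xbar, s in [("Column 4", 140.875, 1.6762), ("Column 8", 143.20833, 2.0212)]:
    t = t_ratio(xbar, 142, s, 24)
    decision = "reject H0" if abs(t) > T_01_DF23 else "do not reject H0"
    print(f"{label}: t = {t:.3f}, {decision}")
```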
d. Test the hypothesis using a critical value for a sample mean. (1)
$\bar x_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar x}$, where $t^{23}_{.01} = 2.500$ and $\mu_0 = 142$.
For Column 4, the standard error is $s_{\bar x_4} = 0.3422$, so the critical values are
$\bar x_{cv} = 142 \pm 2.500(0.3422) = 142 \pm 0.856$. Make a diagram. Make a Normal curve with a center at
$\mu_0 = 142$. Indicate two 'reject' zones, one below 141.145 and one above 142.856. Since the sample mean
of $\bar x_4 = 140.875$ falls in the lower 'reject' zone, reject the null hypothesis.
For Column 8, the standard error is $s_{\bar x_8} = 0.4126$, so the critical values are
$\bar x_{cv} = 142 \pm 2.500(0.4126) = 142 \pm 1.032$. Make a diagram. Make a Normal curve with a center at
$\mu_0 = 142$. Indicate two 'reject' zones, one below 140.969 and one above 143.032. Since the sample mean
of $\bar x_8 = 143.20833$ falls in the upper 'reject' zone, reject the null hypothesis.
e. Test the hypothesis using a confidence interval (1)
$\mu = \bar x \pm t_{\alpha/2} s_{\bar x}$, where $t^{23}_{.01} = 2.500$.
For Column 4 the sample mean is $\bar x_4 = 140.875$ and the standard error is $s_{\bar x_4} = 0.3422$. So
$\mu = 140.875 \pm 2.500(0.3422) = 140.875 \pm 0.856$. Make a diagram. Make a Normal curve with a center at
$\bar x_4 = 140.875$. Shade the confidence interval between 140.019 and 141.731. Since $\mu_0 = 142$ does not fall in
the confidence interval, reject the null hypothesis.
For Column 8 the sample mean is $\bar x_8 = 143.20833$ and the standard error is $s_{\bar x_8} = 0.4126$. So
$\mu = 143.20833 \pm 2.500(0.4126) = 143.208 \pm 1.032$. Make a diagram. Make a Normal curve with a center at
$\bar x_8 = 143.20833$. Shade the confidence interval between 142.176 and 144.240. Since $\mu_0 = 142$ does not fall
in the confidence interval, reject the null hypothesis.
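Parts d) and e) use the same margin $t_{\alpha/2}s_{\bar x}$, once around $\mu_0$ and once around $\bar x$. A minimal sketch under the same assumptions (rounded standard errors, $t^{23}_{.01} = 2.500$); the helper names are hypothetical.

```python
T_01_DF23 = 2.500   # t_.01, df = 23
MU0 = 142


def critical_values(mu0, se, t_crit=T_01_DF23):
    """Two-sided 'reject' boundaries mu0 -/+ t_crit * se for the sample mean."""
    return mu0 - t_crit * se, mu0 + t_crit * se


def confidence_interval(xbar, se, t_crit=T_01_DF23):
    """Two-sided interval xbar -/+ t_crit * se for the population mean."""
    return xbar - t_crit * se, xbar + t_crit * se


for label, xbar, se in [("Column 4", 140.875, 0.3422), ("Column 8", 143.20833, 0.4126)]:
    lo_cv, hi_cv = critical_values(MU0, se)
    lo_ci, hi_ci = confidence_interval(xbar, se)
    reject_cv = not (lo_cv <= xbar <= hi_cv)   # d): sample mean in a 'reject' zone
    reject_ci = not (lo_ci <= MU0 <= hi_ci)    # e): mu0 outside the interval
    print(f"{label}: reject if xbar < {lo_cv:.3f} or > {hi_cv:.3f}; "
          f"interval ({lo_ci:.3f}, {hi_ci:.3f}); reject H0: {reject_cv and reject_ci}")
```

The two methods always agree, because both compare $|\bar x - \mu_0|$ with the same margin.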
f. Find an approximate p-value for the null hypothesis. (1)
Here is the $df = n - 1 = 23$ line of the t table.
df .45 .40 .35 .30 .25 .20 .15 .10 .05 .025 .01 .005 .001
23 0.127 0.256 0.390 0.532 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.485
For Column 4 the sample mean is $\bar x_4 = 140.875$ and the test ratio is $t = \frac{\bar x - \mu_0}{s_{\bar x}} = -3.288$. Since 3.288 is
between 2.807 and 3.485, we can say $.005 > P(t \le -3.288) > .001$. Since this is a 2-sided test, we double the
tail probability and say $p\text{-value} = 2P(\bar x \le 140.875) = 2P(t \le -3.288) = 2P(t \ge 3.288)$. Thus
$.01 > p\text{-value} > .002$.
For Column 8 the sample mean is $\bar x_8 = 143.20833$ and the test ratio is $t = \frac{\bar x - \mu_0}{s_{\bar x}} = 2.9286$. Since 2.9286 is
between 2.807 and 3.485, we can say $.005 > P(t \ge 2.9286) > .001$. Since this is a 2-sided test, we double
the tail probability and say $p\text{-value} = 2P(\bar x \ge 143.20833) = 2P(t \ge 2.9286)$. Thus $.01 > p\text{-value} > .002$.
It is time to give the p-values for all of the columns as computed by Minitab. Each p-value is the entry in the P column.
MTB > Onet c1;
SUBC>   Test 142.
One-Sample T: x1
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x1        24  140.917  2.104    0.430  (140.028, 141.805)  -2.52  0.019
MTB > Onet c2;
SUBC>   Test 142.
One-Sample T: x2
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x2        24  141.333  1.926    0.393  (140.520, 142.147)  -1.70  0.103
MTB > Onet c3;
SUBC>   Test 142.
One-Sample T: x3
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x3        24  140.875  1.849    0.377  (140.094, 141.656)  -2.98  0.007
MTB > Onet c4;
SUBC>   Test 142.
One-Sample T: x4
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x4        24  140.875  1.676    0.342  (140.167, 141.583)  -3.29  0.003
MTB > Onet c5;
SUBC>   Test 142.
One-Sample T: x5
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x5        24  141.708  2.476    0.505  (140.663, 142.754)  -0.58  0.569
MTB > Onet c6;
SUBC>   Test 142.
One-Sample T: x6
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x6        24  141.208  1.444    0.295  (140.599, 141.818)  -2.69  0.013
MTB > Onet c7;
SUBC>   Test 142.
One-Sample T: x7
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x7        24  143.042  2.404    0.491  (142.026, 144.057)   2.12  0.045
MTB > Onet c8;
SUBC>   Test 142.
One-Sample T: x8
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x8        24  143.208  2.021    0.413  (142.355, 144.062)   2.93  0.008
MTB > Onet c9;
SUBC>   Test 142.
One-Sample T: x9
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x9        24  142.667  1.606    0.328  (141.988, 143.345)   2.03  0.054
MTB > Onet c10;
SUBC>   Test 142.
One-Sample T: x10
Test of mu = 142 vs not = 142
Variable   N     Mean  StDev  SE Mean       95% CI            T      P
x10       24  142.292  1.829    0.373  (141.519, 143.064)   0.78  0.443
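The table-bracketing step in f) can be automated. This sketch assumes the df = 23 row exactly as printed in f), stored as (upper-tail area, t value) pairs, and returns lower and upper bounds on the two-sided p-value. It only illustrates the hand method; Minitab computes exact p-values instead.

```python
# The df = 23 line of the t table: (upper-tail area, t value with that area above it).
T_ROW_DF23 = [(.45, 0.127), (.40, 0.256), (.35, 0.390), (.30, 0.532), (.25, 0.685),
              (.20, 0.858), (.15, 1.060), (.10, 1.319), (.05, 1.714), (.025, 2.069),
              (.01, 2.500), (.005, 2.807), (.001, 3.485)]


def two_sided_p_bounds(t, row=T_ROW_DF23):
    """Bracket a two-sided p-value from one row of a t table: (lower, upper)."""
    a = abs(t)
    tail_low, tail_high = 0.0, 0.5   # bounds on P(t >= |t|)
    for area, crit in row:
        if a >= crit:
            tail_high = area          # |t| is at least as far out as this entry
        else:
            tail_low = area           # |t| falls short of this entry
            break
    return 2 * tail_low, 2 * tail_high


print(two_sided_p_bounds(-3.288))   # Column 4
print(two_sided_p_bounds(2.9286))   # Column 8
```

For Column 6, for example, the Minitab p-value of 0.013 falls inside the bracket `two_sided_p_bounds(-2.69)` returns.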
g. On the basis of your tests, is the mean temperature correct in your restaurant? Why? (1)
Our statistical tests ($\alpha = .02$) show that columns 1, 3, 4, 6 and 8 give p-values below the
significance level, so we reject the hypothesis that the mean temperature is 142 in the corresponding
restaurants. However, the p-values for columns 2, 5, 7, 9 and 10 are above the significance level, so for those
restaurants we cannot reject the hypothesis.
h. How do your conclusions change if the random sample of 24 temperatures is taken on a day on
which only 48 cups of cocoa are sold? (2)
This is pushing things, because you might contend that the samples given are samples from an indefinitely
long string of cocoas. But if we assume that the machine is shut down and recalibrated after the 48 cups are
sold, we need a finite population correction, because the sample of 24 cups is being taken from a
population that is far less than 20 times the sample size.
Let's try this for Column 4: $s_{\bar x_4} = \frac{s_4}{\sqrt n}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{s_4^2}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{2.8098}{24}}\sqrt{\frac{24}{47}} = 0.3422\sqrt{0.51064} = 0.3422(0.71459) = 0.2445$.
If we put this into the t-ratio, the ratio, which was $t = \frac{\bar x - \mu_0}{s_{\bar x}} = -3.288$, is divided by 0.71459, which makes it
larger in absolute value, in this case $-4.60$. This value is off the t table, indicating $p\text{-value} < .002$. Any column that resulted in a rejection of the null hypothesis will still
reject it with a lower p-value. Remember that $t^{23}_{.01} = 2.500$ was the value of t that we used in c) above. If
we check the Minitab output for column 2, $t = -1.70$ becomes $-2.38$, so we still do not reject $H_0$. For
column 5, $t = -0.58$ becomes $-0.81$, so we still do not reject $H_0$. For column 7, $t = 2.12$ becomes 2.97, so
we now reject $H_0$. For column 9, $t = 2.03$ becomes 2.84, so we now reject $H_0$. For column 10, $t = 0.78$ becomes
1.09, so we still do not reject $H_0$.
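A sketch of the finite population correction in h), assuming N = 48 and n = 24 as in the question. Because the correction only rescales the standard error, each corrected t-ratio is the Minitab t-ratio divided by $\sqrt{(N-n)/(N-1)}$; `fpc_factor` is a hypothetical helper name.

```python
import math


def fpc_factor(n, N):
    """Finite population correction sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


n, N = 24, 48
factor = fpc_factor(n, N)
se4 = (1.6762 / math.sqrt(n)) * factor          # corrected standard error, column 4
t4 = (140.875 - 142) / se4
print(f"factor = {factor:.5f}, s_xbar = {se4:.4f}, t = {t4:.2f}")

# Dividing the Minitab t-ratios by the factor gives the corrected ratios.
minitab_t = {2: -1.70, 5: -0.58, 7: 2.12, 9: 2.03, 10: 0.78}
for col, t in minitab_t.items():
    t_fpc = t / factor
    print(col, round(t_fpc, 2), "reject" if abs(t_fpc) > 2.500 else "do not reject")
```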
i. Assume that the Normal distribution does not apply and test to see if the median is 142. Be
careful! What should you do with numbers that are exactly 142? (2)
[12]
The following comes from the outline.
Hypotheses about a median: $H_0: \eta = \eta_0$, $H_1: \eta \ne \eta_0$.
Hypotheses about a proportion: if p is the proportion above $\eta_0$, $H_0: p = .5$, $H_1: p \ne .5$; if p is the proportion below $\eta_0$, $H_0: p = .5$, $H_1: p \ne .5$.
The numbers have been placed in order below and the 142s eliminated. At the end of each column we find
x , the number of items above 142, and n , the total count of the remaining column.
x1:  137 138 139 139 139 139 140 140 140 140 140 141 141 141 141 144 145 146  (x = 3, n = 18)
x2:  138 138 139 139 139 140 140 140 141 141 141 141 143 143 143 143 144 144 145  (x = 7, n = 19)
x3:  136 138 139 139 139 139 140 140 140 141 141 141 141 143 143 143 143 143  (x = 5, n = 18)
x4:  138 138 139 139 139 140 140 140 140 140 141 141 141 141 141 143 143 145  (x = 3, n = 18)
x5:  136 138 138 139 139 140 141 141 141 141 141 143 144 144 144 144 144 145 145 145  (x = 9, n = 20)
x6:  137 139 139 140 140 140 141 141 141 141 141 143 143 143  (x = 3, n = 14)
x7:  137 140 140 141 141 143 143 143 144 144 144 145 145 145 146 146 147 147  (x = 13, n = 18)
x8:  139 139 141 141 141 143 144 144 144 144 144 144 144 144 145 145 145 146 146 146  (x = 15, n = 20)
x9:  140 140 141 141 141 141 143 143 143 143 144 144 144 144 144 145 145 146  (x = 12, n = 18)
x10: 139 139 140 140 141 141 141 143 143 143 143 144 144 144 145 145 146  (x = 10, n = 17)
The outline says that for relatively small values of n, a continuity correction is advisable, so try
$z = \frac{(x \pm .5) - \frac{n}{2}}{\frac{\sqrt n}{2}} = \frac{2x \pm 1 - n}{\sqrt n}$, where the + applies if $x < \frac{n}{2}$, and the − applies if $x > \frac{n}{2}$.
Column 1: $x = 3, n = 18$: $p\text{-value} = 2P\left(z \le \frac{2(3) + 1 - 18}{\sqrt{18}}\right) = 2P(z \le -2.59) = 2(.5 - .4952) = .0096$
Column 2: $x = 7, n = 19$: $p\text{-value} = 2P\left(z \le \frac{2(7) + 1 - 19}{\sqrt{19}}\right) = 2P(z \le -0.92) = 2(.5 - .3212) = .3576$
Column 3: $x = 5, n = 18$: $p\text{-value} = 2P\left(z \le \frac{2(5) + 1 - 18}{\sqrt{18}}\right) = 2P(z \le -1.65) = 2(.5 - .4505) = .0990$
Column 4: $x = 3, n = 18$: $p\text{-value} = 2P\left(z \le \frac{2(3) + 1 - 18}{\sqrt{18}}\right) = 2P(z \le -2.59) = 2(.5 - .4952) = .0096$
Column 5: $x = 9, n = 20$: $p\text{-value} = 2P\left(z \le \frac{2(9) + 1 - 20}{\sqrt{20}}\right) = 2P(z \le -0.22) = 2(.5 - .0871) = .8258$
Column 6: $x = 3, n = 14$: $p\text{-value} = 2P\left(z \le \frac{2(3) + 1 - 14}{\sqrt{14}}\right) = 2P(z \le -1.87) = 2(.5 - .4693) = .0614$
Column 7: $x = 13, n = 18$: $p\text{-value} = 2P\left(z \ge \frac{2(13) - 1 - 18}{\sqrt{18}}\right) = 2P(z \ge 1.65) = 2(.5 - .4505) = .0990$
Column 8: $x = 15, n = 20$: $p\text{-value} = 2P\left(z \ge \frac{2(15) - 1 - 20}{\sqrt{20}}\right) = 2P(z \ge 2.01) = 2(.5 - .4778) = .0444$
Column 9: $x = 12, n = 18$: $p\text{-value} = 2P\left(z \ge \frac{2(12) - 1 - 18}{\sqrt{18}}\right) = 2P(z \ge 1.18) = 2(.5 - .3810) = .2380$
Column 10: $x = 10, n = 17$: $p\text{-value} = 2P\left(z \ge \frac{2(10) - 1 - 17}{\sqrt{17}}\right) = 2P(z \ge 0.49) = 2(.5 - .1879) = .6242$
There are actually 2 ways to do this. Remember $\alpha = .02$ and $z_{\alpha/2} = z_{.01} = 2.327$, so reject the null
hypothesis if z is not between $-2.327$ and $2.327$, or if the p-value is below .02. The only rejections we have here are
for columns 1 and 4.
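The sign-test arithmetic above can be checked with a few lines of Python. This sketch uses the (x, n) counts listed with the ordered data, the continuity-corrected z from the outline, and `statistics.NormalDist` in place of the printed Normal table, so the last digit of a p-value may differ slightly. The exact binomial p-value is added only for comparison; it is not the method used in the text, and all helper names are hypothetical.

```python
import math
from statistics import NormalDist

# (x, n) for each column after dropping the 142s:
# x = number of values above 142, n = number of values remaining.
COUNTS = {1: (3, 18), 2: (7, 19), 3: (5, 18), 4: (3, 18), 5: (9, 20),
          6: (3, 14), 7: (13, 18), 8: (15, 20), 9: (12, 18), 10: (10, 17)}
ALPHA = .02


def sign_test_z(x, n):
    """Continuity-corrected z = (2x +/- 1 - n) / sqrt(n): + if x < n/2, - if x > n/2."""
    if 2 * x < n:
        return (2 * x + 1 - n) / math.sqrt(n)
    if 2 * x > n:
        return (2 * x - 1 - n) / math.sqrt(n)
    return 0.0


def two_sided_p(z):
    """Two-sided Normal p-value 2 P(Z >= |z|)."""
    return 2 * NormalDist().cdf(-abs(z))


def exact_sign_p(x, n):
    """Exact two-sided binomial p-value with p = .5 (comparison only)."""
    k = min(x, n - x)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


for col, (x, n) in COUNTS.items():
    z = sign_test_z(x, n)
    p = two_sided_p(z)
    verdict = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"Column {col}: z = {z:.2f}, p = {p:.4f}, exact p = {exact_sign_p(x, n):.4f}, {verdict}")
```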
j. (Extra Credit) Do a 98% confidence interval for the median. (2)
Actually, this is a lot easier than the last section. If we don't mind being a bit sloppy, the outline
recommends using $k = \frac{n + 1 - z_{\alpha/2}\sqrt n}{2}$. For the original data $n = 24$ and $z_{\alpha/2} = z_{.01} = 2.327$, so
$k = \frac{24 + 1 - 2.327\sqrt{24}}{2} = 6.80$. This must be rounded down, so use the 6th number from the bottom and the
6th number from the top, which is the 19th (24 + 1 − 6) number from the bottom. The ordered columns appear below; in each
column the 6th and 19th numbers are the limits of the interval.
Row  x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
1    137  138  136  138  136  137  137  139  140  139
2    138  138  138  138  138  139  140  139  140  139
3    139  139  139  139  138  139  140  141  141  140
4    139  139  139  139  139  140  141  141  141  140
5    139  139  139  139  139  140  141  141  141  141
6    139  140  139  140  140  140  142  142  141  141
7    140  140  140  140  141  141  142  142  142  141
8    140  140  140  140  141  141  142  142  142  142
9    140  141  140  140  141  141  142  142  142  142
10   140  141  141  140  141  141  142  143  142  142
11   140  141  141  141  141  141  142  144  142  142
12   141  141  141  141  142  142  143  144  142  142
13   141  142  141  141  142  142  143  144  143  142
14   141  142  142  141  142  142  143  144  143  142
15   141  142  142  141  142  142  144  144  143  143
16   142  142  142  142  143  142  144  144  143  143
17   142  142  142  142  144  142  144  144  144  143
18   142  143  142  142  144  142  145  144  144  143
19   142  143  142  142  144  142  145  145  144  144
20   142  143  143  142  144  142  145  145  144  144
21   142  143  143  142  144  142  146  145  144  144
22   144  144  143  143  145  143  146  146  145  145
23   145  144  143  143  145  143  147  146  145  145
24   146  145  143  145  145  143  147  146  146  146
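A minimal sketch of the median interval in j): compute k, round down, and take the k-th smallest and k-th largest observations. It assumes $z_{.01} = 2.327$ and uses columns x4 and x8 as examples; the function names are hypothetical.

```python
import math

x4 = [143, 138, 140, 140, 142, 139, 142, 139, 145, 139, 142, 138,
      142, 141, 142, 141, 141, 140, 142, 141, 143, 141, 140, 140]
x8 = [146, 145, 144, 144, 145, 144, 141, 141, 144, 144, 139, 142,
      141, 142, 144, 145, 139, 142, 143, 146, 142, 144, 144, 146]


def median_rank(n, z):
    """k = (n + 1 - z sqrt(n)) / 2, rounded down."""
    return math.floor((n + 1 - z * math.sqrt(n)) / 2)


def median_interval(values, z=2.327):
    """Return the k-th smallest and k-th largest values as interval limits."""
    xs = sorted(values)
    k = median_rank(len(xs), z)
    return xs[k - 1], xs[len(xs) - k]


print(median_rank(24, 2.327))      # k
print(median_interval(x4))         # column 4
print(median_interval(x8))         # column 8
```

With k = 6, column 4 gives $140 \le \eta \le 142$ and column 8 gives $142 \le \eta \le 145$, the 6th and 19th entries in the ordered table.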
Problem 2: Once again assume that the Normal distribution applies to the data in Problem 1, but that we
know that the population standard deviation is 2. Our confidence level remains 98%, but we are now testing
the hypothesis that the mean is below 143 degrees.
a. State your null and your alternative hypotheses. (1)
Our hypotheses are $H_0: \mu \ge 143$ and $H_1: \mu < 143$, with $\alpha = .02$ and $\mu_0 = 143$. This is a left-sided test because we are worried about
our sample mean being below 143. Recall that $n = 24$, so that our standard error is $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{4}{24}} = \sqrt{0.16667} = 0.40825$.
The table below is part of Table 3.
Interval for: Mean ($\sigma$ known)
Confidence interval: $\mu = \bar x \pm z_{\alpha/2}\sigma_{\bar x}$
Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
Test ratio: $z = \frac{\bar x - \mu_0}{\sigma_{\bar x}}$
Critical value: $\bar x_{cv} = \mu_0 \pm z_{\alpha/2}\sigma_{\bar x}$
where $\sigma_{\bar x} = \frac{\sigma}{\sqrt n}$.
b. Find the value of z that you need for a critical value for a 1-sided test if the confidence level is
98%. (or, for less credit, 99%)
Actually, I should have had you do the test. Since we are doing a left-sided test, we must compute
$z = \frac{\bar x - \mu_0}{\sigma_{\bar x}} = \frac{\bar x - 143}{0.40825}$ and, if $\alpha = .01$, reject our hypothesis if we find that z is below $-z_{.01} = -2.327$. If
$\alpha = .02$, use $-z_{.02} = -2.054$ or something similar. (Find $z_{.02} = 2.054$ by figuring out that
$P(0 \le z \le z_{.02}) = .4800$.)
c. Find a critical value for the sample mean to test if the mean is below 143 degrees. (1)
If $\alpha = .01$ use $\bar x_{cv} = \mu_0 - z_\alpha\sigma_{\bar x} = 143 - 2.327(0.40825) = 142.050$.
If $\alpha = .02$ use $\bar x_{cv} = \mu_0 - z_\alpha\sigma_{\bar x} = 143 - 2.054(0.40825) = 142.161$.
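The z values used in b) and c) can also be taken from the standard Normal quantile function instead of the table. This sketch assumes σ = 2, n = 24 and $\mu_0 = 143$, and uses `statistics.NormalDist().inv_cdf`; its values (about 2.326 and 2.054) differ from the table's 2.327 in the last digit, so the critical values can shift by about 0.001.

```python
import math
from statistics import NormalDist

SIGMA, N, MU0 = 2, 24, 143
se = SIGMA / math.sqrt(N)          # sigma_xbar, about 0.40825

critical = {}
for alpha in (.01, .02):
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # area alpha lies above z_alpha
    critical[alpha] = MU0 - z_alpha * se        # left-sided: reject H0 if xbar < this
    print(f"alpha = {alpha}: z = {z_alpha:.3f}, critical value = {critical[alpha]:.3f}")
```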
d. Test the hypothesis that the mean is below 143 degrees using an appropriate confidence
interval. (2)
Our alternative hypothesis is $H_1: \mu < 143$, so we use $\mu \le \bar x + z_\alpha\sigma_{\bar x} = \bar x + z_\alpha(0.40825)$.
If $\alpha = .01$, the intervals run from $\mu \le 143.21 + 2.327(0.40825) = 144.16$ (for $\bar x = 143.21$) to $\mu \le 140.87 + 2.327(0.40825) = 141.82$ (for $\bar x = 140.87$).
($z_{.005} = 2.576$, so a 2-sided interval would be $\bar x \pm 1.05$.)
If your sample mean was 142.05 or lower, the interval does not include 143, and you will contradict and reject $H_0: \mu \ge 143$.
If $\alpha = .02$, the intervals run from $\mu \le 143.21 + 2.054(0.40825) = 144.05$ to $\mu \le 140.87 + 2.054(0.40825) = 141.71$.
($z_{.01} = 2.327$, so a 2-sided interval would be $\bar x \pm 0.95$.)
If your sample mean was 142.16 or lower, you will contradict and reject $H_0: \mu \ge 143$.
e. Using your critical value from 2b, create a power curve for your test. (6)
If $\alpha = .01$, you are using $\bar x_{cv} = \mu_0 - z_\alpha\sigma_{\bar x} = 142.050$ and you will not reject the null hypothesis if
$\bar x \ge 142.050$. A group of suggested points for computing β are 143, 142.5, 142.050, 141.5 and 141.
$\mu_1 = 143$: $\beta = P\left(z \ge \frac{142.050 - 143}{0.40825}\right) = P(z \ge -2.33) = .5 + .4901 = .9901 \approx 99\%$. This was just a
check. Power $= .01$.
$\mu_1 = 142.5$: $\beta = P\left(z \ge \frac{142.050 - 142.5}{0.40825}\right) = P(z \ge -1.10) = .5 + .3643 = .8643$. Power = 13.6%.
1  141 .952
  Pz 
1  141 .5
  Pz 
1  141
  Pz 





142 .050  141 .050 
  Pz  0  .5. Power  50 %
0.40825

142 .050  141 .5 
  Pz  1.35   .5  .4115  .0885 Power  91.2%
0.40825

142 .050  141 
 Pz  2.57   .5  .4949  .0051 Power  99.5%
0.40825 
If   .02 , you are using xcv  142 .161 and you will not reject the null hypothesis if x  142 .161 . A group
of suggested points for computing  are 143, 142.6, 142.161, 141.8 and 141.4
1  143

  Pz 

142 .161  143 
 Pz  2.06   .5  .4803  .9803  98 % . This was just a
0.40825 
check. Power  .02  

1  142 .6
  Pz 
1  142 .161
  Pz 
1  141 .8
  Pz 





142 .161  142 .6 
  Pz  1.07   .5  .3577  .8577 Power  14.2%
0.40825

142 .161  142 .161 
  Pz  0  .5000 Power  50 %
0.40825

142 .161  141 .8 
  Pz  0.88   .5  .3106  .1894 Power  81.1%
0.40825

142 .161  141 .4 
  Pz  1.86   .5  .4686  .0314 Power  96.9%
0.40825


Graph this neatly, with the means on the x axis and probabilities between zero and one on the y axis.
1  141 .4

  Pz 
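The power calculations in e) can be reproduced with a short Python sketch. It assumes σ = 2, n = 24 and the two critical values computed in c), and uses `statistics.NormalDist` rather than the printed table, so results may differ from the hand calculations in the third decimal. Power here is $P(\bar x < \bar x_{cv})$ under the true mean $\mu_1$, and β is one minus that; `power_left` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

se = 2 / math.sqrt(24)   # sigma_xbar, about 0.40825


def power_left(mu1, x_cv, se):
    """Power = P(xbar < x_cv) when the true mean is mu1 (beta = 1 - power)."""
    return NormalDist(mu1, se).cdf(x_cv)


for x_cv, points in [(142.050, [143, 142.5, 142.050, 141.5, 141]),
                     (142.161, [143, 142.6, 142.161, 141.8, 141.4])]:
    for mu1 in points:
        p = power_left(mu1, x_cv, se)
        print(f"cv = {x_cv}, mu1 = {mu1}: beta = {1 - p:.4f}, power = {p:.4f}")
```

Plotting the printed power values against $\mu_1$ gives the curve the question asks for.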
f. Assume that the population standard deviation is 2. How large a sample do you need to get a
two-sided 98% confidence interval with an error not exceeding 0.5 degrees? (2)
[22]
The formula is $n = \frac{z^2\sigma^2}{e^2}$.
If $\alpha = .01$, $z_{.005} = 2.576$ and $n = \frac{2.576^2(2^2)}{0.5^2} = 106.17$. Use 107 or more.
If $\alpha = .02$, $z_{.01} = 2.327$ and $n = \frac{2.327^2(2^2)}{0.5^2} = 86.64$. Use 87 or more.
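The sample-size formula in Problem 2 f) always rounds up to the next whole observation. A minimal sketch, with a hypothetical helper name and the table z values from the text.

```python
import math


def sample_size_for_mean(z, sigma, e):
    """n = z^2 sigma^2 / e^2, rounded up to a whole number of observations."""
    return math.ceil(z ** 2 * sigma ** 2 / e ** 2)


print(sample_size_for_mean(2.327, 2, 0.5))  # alpha = .02, z_.01
print(sample_size_for_mean(2.576, 2, 0.5))  # alpha = .01, z_.005
```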
Problem 3: According to Doane and Seward about 13% of goods bought at a department store are
returned. An organization called Return Exchange will sell you a software product called Verify-1®, for which
it makes the claims below.
Verify-1® is quickly operational. And it authorizes returns even quicker
Verify-1® identifies fraud and abuse at the point of return before they become liabilities to your brand equity or profits. In
stand-alone mode, this easy-to-use, turnkey solution can be operational in 30 days and will reduce your return rate
immediately, without disrupting your business or IT configuration. Verify-1® also integrates easily into your existing POS
platform.
You set the policy, Verify-1® enforces it
With Verify-1®, your returns are dealt with consistently utilizing advanced statistical modeling in combination with state
return laws and your existing return policies. At the point of return, using the customer’s driver’s license or other valid
identification, Verify-1® automatically checks prior return behavior and authorizes or declines the transaction. Customers
identified as risks for presenting fraudulent returns are declined, while legitimate returns are speedily accepted.
You take a sample of n items and find that there were x returns (about 9%). You are the manager of store
a, where a is the last digit of your student number. (For example, Seymour Butz' student number is 543987 so
he manages store 7.) The sample size and number of returns for your store are given below. On the basis of
this sample, can you now say that the return rate is below 13%? Use a confidence level of 95%.
Store  1    2    3    4    5    6    7    8    9    10
n      275  250  225  200  175  150  125  100  75   50
x      25   22   20   18   16   13   11   9    7    4
a) State your null and alternative hypotheses. (1) Make sure I know which store you manage.
According to Table 3 the test for a proportion involves the following.
Interval for: Proportion
Confidence interval: $p = \bar p \pm z_{\alpha/2} s_{\bar p}$, where $s_{\bar p} = \sqrt{\frac{\bar p\bar q}{n}}$ and $\bar q = 1 - \bar p$
Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$
Test ratio: $z = \frac{\bar p - p_0}{\sigma_p}$, where $\sigma_p = \sqrt{\frac{p_0 q_0}{n}}$ and $q_0 = 1 - p_0$
Critical value: $\bar p_{cv} = p_0 \pm z_{\alpha/2}\sigma_p$
You are trying to find out if your returns are significantly below 13%. This is an alternative hypothesis.
Your hypotheses are $H_0: p \ge .13$ and $H_1: p < .13$. This is a left-sided test, since values of the observed proportion below
.13 are the only values that could lead to a rejection of the null hypothesis. $p_0 = .13$.
b) Test the hypothesis using a test ratio or a critical value for the observed proportion. (1) Make a
diagram showing clearly where your ‘reject’ region is. (Do not round excessively. If you compute
proportions carry at least 3 significant figures.)
(Store 1) The problem states that we are testing $H_0: p \ge .13$, when $n = 275$ and $\alpha = .05$.
$x = 25$, or $\bar p = \frac{25}{275} = .09091$, and we can compute $\sigma_p = \sqrt{\frac{p_0q_0}{n}} = \sqrt{\frac{.13(.87)}{275}} = \sqrt{.000411} = 0.02028$,
where $p_0$ comes from our null hypothesis and $q_0 = 1 - p_0$. Also $s_{\bar p} = \sqrt{\frac{\bar p\bar q}{n}} = \sqrt{\frac{.09091(.90909)}{275}} = \sqrt{.000301} = 0.017336$.
It is most expedient to use a test ratio, $z = \frac{\bar p - p_0}{\sigma_p} = \frac{.09091 - .13}{.02028} = -1.928$. Make a diagram. Show a
normal curve with a mean at zero and an area of 5% below $-z_{.05} = -1.645$. The area below −1.645 is the
'rejection zone.' Since our value of z is below −1.645, it is in the 'rejection zone,' and we reject the null
hypothesis.
If we want a critical value use $\bar p_{cv} = p_0 - z_{.05}\sigma_p = .13 - 1.645(.02028) = .0966$. We reject our null hypothesis if
$\bar p < .0966$. $\bar p = \frac{25}{275} = .09091$ is in the 'rejection zone,' and we reject the null hypothesis.
(Store 10) The problem states that we are testing $H_0: p \ge .13$, when $n = 50$ and $\alpha = .05$.
$x = 4$, or $\bar p = \frac{4}{50} = .08000$, and we can compute $\sigma_p = \sqrt{\frac{p_0q_0}{n}} = \sqrt{\frac{.13(.87)}{50}} = \sqrt{.002262} = 0.04756$, where
$p_0$ comes from our null hypothesis and $q_0 = 1 - p_0$. Also $s_{\bar p} = \sqrt{\frac{\bar p\bar q}{n}} = \sqrt{\frac{.08(.92)}{50}} = \sqrt{.001472} = 0.038367$.
It is most expedient to use a test ratio, $z = \frac{\bar p - p_0}{\sigma_p} = \frac{.08 - .13}{.04756} = -1.051$. Make a diagram. Show a normal
curve with a mean at zero and an area of 5% below $-z_{.05} = -1.645$. The area below −1.645 is the 'rejection
zone.' Since our value of z is not below −1.645, it is not in the 'rejection zone,' and we do not reject the
null hypothesis.
If we want a critical value, use $\bar p_{cv} = p_0 - z_{.05}\sigma_p = .13 - 1.645(.04756) = .05176$. We reject our null hypothesis
if $\bar p < .05176$. $\bar p = \frac{4}{50} = .08000$ is not in the 'rejection zone,' and we do not reject the null hypothesis.
c) Find a p-value for your null hypothesis. (1)
(Store 1) $p\text{-value} = P(x \le 25) = P(\bar p \le .09091) = P(z \le -1.928) = .5 - P(-1.93 \le z \le 0) = .5 - .4732 = .0268$.
Since this is below $\alpha = .05$, we reject the null hypothesis.
(Store 10) $p\text{-value} = P(x \le 4) = P(\bar p \le .08) = P(z \le -1.051) = .5 - P(-1.05 \le z \le 0) = .5 - .3531 = .1469$. Since
this is above $\alpha = .05$, we cannot reject the null hypothesis.
d) Test your hypothesis using an appropriate confidence interval. (2) [5]
Our hypotheses are $H_0: p \ge .13$ and $H_1: p < .13$, and the formula for a two-sided confidence interval is $p = \bar p \pm z_{\alpha/2}s_{\bar p}$. If
we form a one-sided interval by mimicking the alternative hypothesis, we find $p \le \bar p + z_\alpha s_{\bar p}$. Remember
that a confidence interval always includes $\bar p$. $z_{.05} = 1.645$.
(Store 1) $\bar p = \frac{25}{275} = .09091$ and $s_{\bar p} = 0.017336$, so $p \le .09091 + 1.645(0.017336) = .11943$.
(Store 10) $\bar p = \frac{4}{50} = .08000$ and $s_{\bar p} = 0.038367$, so $p \le .0800 + 1.645(0.038367) = .1431$.
Make a diagram. Draw a Normal curve with a mean at your value of $\bar p$. Shade the area to the left of the
value just computed. For store 1, the interval $p \le .11943$ does not include $p_0 = .13$, so reject the null
hypothesis. For store 10, the interval $p \le .1431$ includes $p_0 = .13$, so do not reject the null hypothesis.
e) Using the 13% proportion as an estimate of the true proportion, find out how large a sample you
need to create a 95% confidence interval with an error of no more than 1% (2)
This is a two-sided interval, so we can use $z_{.025} = 1.960$. Then $n = \frac{pqz^2}{e^2} = \frac{.13(.87)(1.960)^2}{.01^2} = 4344.8$. The
sample must have a size of at least 4345.
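The proportion sample-size formula in e), as a minimal sketch (hypothetical helper name) that rounds up.

```python
import math


def sample_size_for_proportion(p, z, e):
    """n = p q z^2 / e^2 with q = 1 - p, rounded up."""
    return math.ceil(p * (1 - p) * z ** 2 / e ** 2)


print(sample_size_for_proportion(.13, 1.960, .01))
```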
f) (Extra credit) Remember that the method that you have been using to deal with proportions
substitutes the Normal distribution for the binomial distribution. In general the p-values that you
have computed are lower than you would get if you used the binomial distribution. Verify this by
making a continuity correction as described in the outline and repeating your test in c). (2)
According to the outline, the continuity-corrected version of z is $z = \frac{\left(\bar p \pm \frac{.5}{n}\right) - p_0}{\sigma_p}$. The rule of thumb with
this is to use + if $\bar p < p_0$ and to use − if $\bar p > p_0$.
(Store 1) $\bar p = \frac{25}{275} = .09091$. We had $p\text{-value} = P(z \le -1.928) = .5 - .4732 = .0268$. Since this was below
$\alpha = .05$, we rejected the null hypothesis. With the continuity correction we get $z = \frac{\left(\bar p + \frac{.5}{275}\right) - p_0}{\sigma_p} = \frac{.092727 - .13}{.02028} = -1.838$.
We now find $p\text{-value} = P(z \le -1.838) = .5 - .4671 = .0329$.
(Store 10) $\bar p = \frac{4}{50} = .08000$. We had $p\text{-value} = P(z \le -1.051) = .5 - .3531 = .1469$. Since this was above
$\alpha = .05$, we could not reject the null hypothesis. With the continuity correction we get $z = \frac{\left(\bar p + \frac{.5}{50}\right) - p_0}{\sigma_p} = \frac{.09 - .13}{.04756} = -0.841$.
We now find $p\text{-value} = P(z \le -0.841) = .5 - .2995 = .2005$. For the values found by
Minitab with and without the correction see the Appendix.
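The continuity correction in f) shifts $\bar p$ by $.5/n$ toward $p_0$ before standardizing. A sketch for stores 1 and 10, again using `statistics.NormalDist` instead of the table; the helper name is hypothetical, and a tie $\bar p = p_0$ is simply treated with the minus sign.

```python
import math
from statistics import NormalDist


def corrected_z(n, x, p0=.13):
    """z = ((pbar +/- .5/n) - p0) / sigma_p: + if pbar < p0, - otherwise."""
    pbar = x / n
    sigma_p = math.sqrt(p0 * (1 - p0) / n)
    shift = .5 / n if pbar < p0 else -.5 / n
    return (pbar + shift - p0) / sigma_p


for store, (n, x) in {1: (275, 25), 10: (50, 4)}.items():
    z = corrected_z(n, x)
    print(f"Store {store}: corrected z = {z:.3f}, p-value = {NormalDist().cdf(z):.4f}")
```

Both corrected p-values come out larger than the uncorrected ones from c), which is the direction the question predicts.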
g) (Extra credit) Using 13%, your critical value, a point between your critical value and 13% and
one or two other points on the side of the critical value implied by the alternative hypothesis (only
one point on this side may give a reasonable value for a proportion), put together a power curve for
your test. Remember that your standard error will change if the true proportion changes. (8)
We are testing $H_0: p \ge .13$ against $H_1: p < .13$ with $\alpha = .05$.
(Store 1) If we want a critical value use $\bar p_{cv} = p_0 - z_{.05}\sigma_p = .13 - 1.645(.02028) = .0966$. We do not reject our
null hypothesis if $\bar p \ge .0966$. For a power curve we will try proportions ($p_1$) of .13, .115, .0966, .085 and
.07. Recall that $\sigma_p = \sqrt{\frac{p_1q_1}{n}}$ and that $P(\bar p \ge \bar p_{cv}) = P\left(z \ge \frac{\bar p_{cv} - p_1}{\sigma_p}\right)$, so $\beta = P(\bar p \ge .0966) = P\left(z \ge \frac{.0966 - p_1}{\sigma_p}\right)$.
If $p_1 = .13$, $\sigma_p = \sqrt{\frac{.13(.87)}{275}} = 0.02028$ and $\beta = P\left(z \ge \frac{.0966 - .13}{0.02028}\right) = P(z \ge -1.65) = .5 + .4505 = .9505 \approx .95$. This is automatic, but serves as a check. Power = .050.
If $p_1 = .115$, $\sigma_p = \sqrt{\frac{.115(.885)}{275}} = 0.01924$ and $\beta = P\left(z \ge \frac{.0966 - .115}{0.01924}\right) = P(z \ge -0.96) = .5 + .3315 = .8315$. Power = .169.
If $p_1 = .0966$, $\sigma_p$ is not needed: $\beta = P\left(z \ge \frac{.0966 - .0966}{\sigma_p}\right) = P(z \ge 0) = .5000$. Power = .500.
If $p_1 = .085$, $\sigma_p = \sqrt{\frac{.085(.915)}{275}} = 0.016817$ and $\beta = P\left(z \ge \frac{.0966 - .085}{0.016817}\right) = P(z \ge 0.69) = .5 - .2549 = .2451$. Power = .755.
If $p_1 = .07$, $\sigma_p = \sqrt{\frac{.07(.93)}{275}} = 0.01539$ and $\beta = P\left(z \ge \frac{.0966 - .07}{0.01539}\right) = P(z \ge 1.73) = .5 - .4582 = .0418$. Power = .958.
(Store 10) If we want a critical value, use $\bar p_{cv} = p_0 - z_{.05}\sigma_p = .13 - 1.645(.04756) = .05176$. We do not reject
our null hypothesis if $\bar p \ge .05176$. For a power curve we will try proportions ($p_1$) of .13, .09, .05176 and
.01. I also tried .02 to verify my results. Recall that $\sigma_p = \sqrt{\frac{p_1q_1}{n}}$ and that $\beta = P(\bar p \ge .05176) = P\left(z \ge \frac{.05176 - p_1}{\sigma_p}\right)$.
Note how much more slowly power rises than in the store 1 version, because this sample is much smaller.
If $p_1 = .13$, $\sigma_p = \sqrt{\frac{.13(.87)}{50}} = 0.04756$ and $\beta = P\left(z \ge \frac{.05176 - .13}{.04756}\right) = P(z \ge -1.645) = .5 + .4500 = .9500$. Power = .050.
If $p_1 = .09$, $\sigma_p = \sqrt{\frac{.09(.91)}{50}} = 0.04047$ and $\beta = P\left(z \ge \frac{.05176 - .09}{.04047}\right) = P(z \ge -0.94) = .5 + .3264 = .8264$. Power = .174.
If $p_1 = .05176$, $\sigma_p$ is not needed: $\beta = P\left(z \ge \frac{.05176 - .05176}{\sigma_p}\right) = P(z \ge 0) = .5000$. Power = .500.
If $p_1 = .02$, $\sigma_p = \sqrt{\frac{.02(.98)}{50}} = 0.01980$ and $\beta = P\left(z \ge \frac{.05176 - .02}{.01980}\right) = P(z \ge 1.60) = .5 - .4452 = .0548$. Power = .945.
If $p_1 = .01$, $\sigma_p = \sqrt{\frac{.01(.99)}{50}} = 0.01407$ and $\beta = P\left(z \ge \frac{.05176 - .01}{.01407}\right) = P(z \ge 2.97) = .5 - .4985 = .0015$. Power = .9985.
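The power curves in g) differ from the one in Problem 2 because $\sigma_p$ is recomputed from each trial proportion $p_1$. This sketch assumes the critical values .0966 (store 1) and .05176 (store 10) from above and uses `statistics.NormalDist`, so small differences from the table-based figures are expected; `power_proportion` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def power_proportion(p1, p_cv, n):
    """Power = P(pbar < p_cv) when the true proportion is p1; sigma_p uses p1, not p0."""
    sigma_p = math.sqrt(p1 * (1 - p1) / n)
    return NormalDist().cdf((p_cv - p1) / sigma_p)


store_1 = [(p1, power_proportion(p1, .0966, 275)) for p1 in (.13, .115, .0966, .085, .07)]
store_10 = [(p1, power_proportion(p1, .05176, 50)) for p1 in (.13, .09, .05176, .02, .01)]
for name, curve in (("Store 1", store_1), ("Store 10", store_10)):
    print(name, [(p1, round(power, 3)) for p1, power in curve])
```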
h) Go back to the test in parts a) b) and c) of this problem. Take your values of n and x and
multiply them by 1.6, rounding your values to the nearest whole number (or numbers) if
necessary. Find the new value of the test ratio and get a p-value. What does the change in p-value
between parts c) and g) suggest about the effect of increased sample size on the power of the test?
(3) [32]
Sorry!!! I multiplied by 1.5. It really doesn’t matter if we are trying to make a point. The point is that
for any given value of the population proportion, raising the sample size will increase the power of the test.
We are testing $H_0: p \ge .13$ against $H_1: p < .13$ with $\alpha = .05$.
(Store 1) $\bar p = \frac{25}{275} = .09091$. We had $pvalue = P(z \le -1.928) = .5 - .4732 = .0268$. Since this was below $\alpha = .05$, we rejected the null hypothesis. If we multiply both the numerator and the denominator by 1.5, we get $\bar p = \frac{38}{413} = .09201$ or $\bar p = \frac{37}{412} = .08981$. If we take the first ratio, $\sigma_{\bar p} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.13(.87)}{413}} = \sqrt{.0002738} = 0.01655$ and $z = \frac{\bar p - p_0}{\sigma_{\bar p}} = \frac{.09201 - .13}{.01655} = -2.295$. We now find $pvalue = P(z \le -2.295) = .5 - .4890 = .0110$.
(Store 10) $\bar p = \frac{4}{50} = .08000$. We had $pvalue = P(z \le -1.051) = .5 - .3531 = .1469$. Since this was above $\alpha = .05$, we could not reject the null hypothesis. If we multiply both the numerator and the denominator by 1.5, we get $\bar p = \frac{6}{75} = .08000$. We can now compute $\sigma_{\bar p} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{.13(.87)}{75}} = \sqrt{.001508} = 0.03883$ and $z = \frac{\bar p - p_0}{\sigma_{\bar p}} = \frac{.0800 - .13}{.03883} = -1.288$. We now find $pvalue = P(z \le -1.288) = .5 - .4015 = .0985$. In both cases, the rise in $n$ makes the standard error smaller. That increases the absolute value of $z$ and thus decreases the p-value. The lower the p-value, the closer we get to rejection at any significance level.
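The part h) p-values can be recomputed with a sketch like the one below. It assumes the same left-sided Normal-approximation test of $H_0: p \ge .13$. The name `left_tailed_pvalue` is hypothetical. Exact Normal probabilities replace the table, so the third decimal can differ a little from the hand values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, used in place of the printed z table


def left_tailed_pvalue(successes, n, p0):
    """Return (z, p-value) for H0: p >= p0 vs H1: p < p0, Normal approximation."""
    p_bar = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5   # standard error under H0 shrinks as n grows
    z = (p_bar - p0) / se
    return z, Z.cdf(z)                # p-value = P(Z <= z)


# Original samples and the versions scaled up by about 1.5
for label, x, n in [("Store 1", 25, 275), ("Store 1 x1.5", 38, 413),
                    ("Store 10", 4, 50), ("Store 10 x1.5", 6, 75)]:
    z, p = left_tailed_pvalue(x, n, 0.13)
    print(f"{label:<14} z = {z:7.3f}  p-value = {p:.4f}")
```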
Problem 4: According to Doane and Seward both the mean and the standard deviation of pH (a measure of
acidity) are of interest to winemakers. Assume that your firm (store from the last problem) has gotten into
the wine business. A sample of 16 wine bottles is taken. Your column has the same number as your store.
Minitab has calculated all sorts of sample statistics on your data. These are listed below. Use them.
Row    C1    C2    C3    C4    C5    C6    C7    C8    C9   C10
  1  3.41  3.44  3.61  3.39  3.41  3.43  3.40  3.56  3.53  3.17
  2  3.45  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.56  3.21
  3  3.51  3.45  3.63  3.41  3.43  3.45  3.42  3.59  3.63  3.27
  4  3.52  3.48  3.65  3.44  3.46  3.47  3.45  3.63  3.65  3.28
  5  3.68  3.68  3.87  3.66  3.69  3.69  3.68  3.95  3.82  3.44
  6  3.29  3.45  3.62  3.41  3.43  3.44  3.42  3.58  3.39  3.05
  7  3.39  3.42  3.59  3.37  3.39  3.41  3.38  3.53  3.50  3.15
  8  3.57  3.50  3.67  3.45  3.48  3.49  3.47  3.65  3.70  3.33
  9  3.38  3.41  3.58  3.36  3.38  3.40  3.37  3.52  3.49  3.14
 10  3.14  3.36  3.52  3.30  3.32  3.34  3.31  3.43  3.23  2.90
 11  3.61  3.69  3.87  3.66  3.70  3.70  3.68  3.95  3.75  3.37
 12  3.23  3.40  3.57  3.35  3.37  3.39  3.36  3.51  3.32  2.99
 13  3.48  3.48  3.66  3.44  3.46  3.48  3.45  3.63  3.59  3.24
 14  3.39  3.48  3.65  3.43  3.45  3.47  3.44  3.62  3.51  3.15
 15  3.49  3.45  3.62  3.40  3.42  3.44  3.41  3.57  3.61  3.25
 16  3.50  3.63  3.81  3.60  3.63  3.64  3.62  3.87  3.62  3.26

Variable   N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3
C1        16   0  3.4400   0.0347  0.1387   3.1400  3.3825  3.4650  3.5175
C2        16   0  3.4837   0.0245  0.0980   3.3600  3.4200  3.4500  3.4950
C3        16   0  3.6569   0.0259  0.1037   3.5200  3.5900  3.6250  3.6675
C4        16   0  3.4400   0.0268  0.1072   3.3000  3.3700  3.4100  3.4475
C5        16   0  3.4631   0.0281  0.1124   3.3200  3.3900  3.4300  3.4750
C6        16   0  3.4781   0.0265  0.1061   3.3400  3.4100  3.4450  3.4875
C7        16   0  3.4525   0.0278  0.1110   3.3100  3.3800  3.4200  3.4650
C8        16   0  3.6325   0.0388  0.1553   3.4300  3.5300  3.5850  3.6450
C9        16   0  3.5562   0.0382  0.1528   3.2300  3.4925  3.5750  3.6450
C10       16   0  3.2000   0.0347  0.1387   2.9000  3.1425  3.2250  3.2775
Maximum is removed since it is irrelevant.
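The sketch below checks the Minitab summary statistics against the raw columns, shown for C1 and C2. It assumes that Minitab's quartiles use the $(n+1)p$ position rule. That rule is what `statistics.quantiles(..., method="exclusive")` computes, and for C1 and C2 it reproduces the Q1, Median and Q3 values in the table above. The helper `describe` is hypothetical.

```python
import statistics

# Columns C1 and C2 from the data table (16 bottles each)
C1 = [3.41, 3.45, 3.51, 3.52, 3.68, 3.29, 3.39, 3.57,
      3.38, 3.14, 3.61, 3.23, 3.48, 3.39, 3.49, 3.50]
C2 = [3.44, 3.42, 3.45, 3.48, 3.68, 3.45, 3.42, 3.50,
      3.41, 3.36, 3.69, 3.40, 3.48, 3.48, 3.45, 3.63]


def describe(col):
    """Minitab-style summary: N, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3."""
    n = len(col)
    s = statistics.stdev(col)                     # sample sd with n - 1 divisor
    q1, med, q3 = statistics.quantiles(col, n=4, method="exclusive")
    return {"N": n, "Mean": statistics.fmean(col), "SE Mean": s / n ** 0.5,
            "StDev": s, "Minimum": min(col), "Q1": q1, "Median": med, "Q3": q3}


for name, col in (("C1", C1), ("C2", C2)):
    print(name, {k: round(v, 4) for k, v in describe(col).items()})
```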
You must state H 0 and H 1 where applicable to get credit for any of the tests below. Make sure that I
know which column you are using!
The usual excerpt from the formula table follows.
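Interval for: Variance, Small Sample
  Confidence Interval: $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
  Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
  Test Ratio: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
  Critical Value: $s^2_{cv} = \frac{\chi^2_{.5 \pm .5\alpha}\,\sigma_0^2}{n-1}$
Interval for: Variance, Large Sample
  Confidence Interval: $\sigma = \frac{s\sqrt{2DF}}{\pm z_{\alpha/2} + \sqrt{2DF}}$
  Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
  Test Ratio: $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$
  Critical Value: $s_{cv} = \frac{\sigma_0\left(\pm z_{\alpha/2} + \sqrt{2DF}\right)}{\sqrt{2DF}}$
I will work out the solutions for Stores 2 and 8 since they have the smallest and largest variances.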
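The small-sample Critical Value column can be turned into limits on $s$ itself. The sketch below assumes the two-sided test of part a), with $\sigma_0 = 0.10$ and $n = 16$. It hardcodes the table values 6.2621 and 27.4884 quoted in part a), because Python's standard library has no chi-square quantile function. The helper name `s_critical_values` is hypothetical.

```python
import math

# Chi-square table values for df = 15 quoted in part a) (hardcoded, not computed)
CHI2_975_15 = 6.2621    # cuts off the bottom 2.5%
CHI2_025_15 = 27.4884   # cuts off the top 2.5%


def s_critical_values(sigma0, n, chi2_low, chi2_high):
    """Critical values for s from s_cv^2 = chi^2 * sigma0^2 / (n - 1)."""
    df = n - 1
    return (math.sqrt(chi2_low * sigma0 ** 2 / df),
            math.sqrt(chi2_high * sigma0 ** 2 / df))


low, high = s_critical_values(0.10, 16, CHI2_975_15, CHI2_025_15)
for store, s in (("Store 2", 0.0980), ("Store 8", 0.1553)):
    decision = "do not reject" if low <= s <= high else "reject"
    print(f"{store}: s = {s}, critical values ({low:.4f}, {high:.4f}): {decision} H0")
```

With these inputs the limits come out near 0.0646 and 0.1354. Store 2 ($s = 0.0980$) falls inside them and Store 8 ($s = 0.1553$) falls outside, which are the same decisions the test ratio method reaches in part a).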
a) The acceptable standard deviation for wine pH is 0.10. Using the data for your store, test the
hypothesis that the standard deviation is 0.10 using a 95% confidence level. (2)
Our hypotheses are $H_0: \sigma = 0.10$ and $H_1: \sigma \ne 0.10$. $\alpha = .05$ and $n = 16$, so $df = n - 1 = 15$. We will use the test ratio method only. Make a diagram! If you are fussy, the diagram should be skewed to the right with a mode at 13, a median at 14.3333 (as computed by Minitab) and a mean at $df = n - 1 = 15$. Look up $\chi^{2(15)}_{.025} = 27.4884$ and $\chi^{2(15)}_{.975} = 6.2621$. Add a vertical line to indicate the mean or the median and shade the 'reject' zones below 6.2621 and above 27.4884 (the area that isn't shaded on the diagram below).
We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.009604)}{.10^2} = 14.406$. Since this value is between $\chi^{2(15)}_{.975} = 6.2621$ and $\chi^{2(15)}_{.025} = 27.4884$, do not reject the null hypothesis.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.024118)}{.10^2} = 36.177$. Since this value is not between $\chi^{2(15)}_{.975} = 6.2621$ and $\chi^{2(15)}_{.025} = 27.4884$, reject the null hypothesis.
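The part a) test ratios can be checked with a minimal sketch. It hardcodes the two table values quoted above rather than computing chi-square quantiles. The helper `chi2_ratio` is a hypothetical name for the Test Ratio formula.

```python
# Part a): two-sided chi-square test of H0: sigma = 0.10 with n = 16 and alpha = .05
CHI2_975_15 = 6.2621    # table value quoted in the key (bottom 2.5% cut-off, df = 15)
CHI2_025_15 = 27.4884   # table value quoted in the key (top 2.5% cut-off, df = 15)


def chi2_ratio(n, s, sigma0):
    """Test ratio chi^2 = (n - 1) s^2 / sigma0^2."""
    return (n - 1) * s ** 2 / sigma0 ** 2


results = {}
for store, s in (("Store 2", 0.0980), ("Store 8", 0.1553)):
    chi2 = chi2_ratio(16, s, 0.10)
    reject = not (CHI2_975_15 <= chi2 <= CHI2_025_15)   # outside the middle 95%
    results[store] = (chi2, reject)
    print(f"{store}: chi2 = {chi2:.3f}, {'reject' if reject else 'do not reject'} H0")
```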
b) Test the hypothesis that the standard deviation is below .14. (1)
Our hypotheses are $H_0: \sigma \ge 0.14$ and $H_1: \sigma < 0.14$. $\alpha = .05$ and $n = 16$, so $df = n - 1 = 15$.
We will use the test ratio method only. This is a left-sided test because we will only reject the null hypothesis if the sample standard deviation is too small. Make a diagram! (If you are fussy, the diagram should be skewed to the right with a mode at 13, a median at 14.3333 (as computed by Minitab) and a mean at $df = n - 1 = 15$.) You want a value of $\chi^{2(15)}$ that will cut off the bottom 5% of the distribution, so look up $\chi^{2(15)}_{.95} = 7.2609$. Add a vertical line to indicate the mean or the median and shade the 'reject' zone below 7.2609.
We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.009604)}{.14^2} = 7.350$. Since this value is not below $\chi^{2(15)}_{.95} = 7.2609$, do not reject the null hypothesis.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{15(.024118)}{.14^2} = 18.458$. Since this value is not below $\chi^{2(15)}_{.95} = 7.2609$, do not reject the null hypothesis.
c) Repeat a) and b) using the sample (mean and) variance you used in a) and b) but assuming a
sample size of 100. Find p-values. (4)
Our hypotheses are $H_0: \sigma = 0.10$ and $H_1: \sigma \ne 0.10$. $\alpha = .05$ and $n = 100$, so $df = n - 1 = 99$. Because of the large number of degrees of freedom, we must use $z$. Make a diagram of a Normal curve with a mean at zero. Indicate two 'reject' zones, one below $-z_{.025} = -1.960$ and one above $z_{.025} = 1.960$.
We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.009604)}{.10^2} = 95.0796$ and
$z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(95.0796)} - \sqrt{2(99) - 1} = \sqrt{190.1592} - \sqrt{197} = 13.7898 - 14.0357 = -0.246$.
This is not in a 'reject' zone, so do not reject the null hypothesis. The p-value is $2P(z \le -0.25) = 2(.5 - .0987) = .8026$.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.024118)}{.10^2} = 238.7682$ and
$z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(238.7682)} - \sqrt{2(99) - 1} = \sqrt{477.5364} - \sqrt{197} = 21.8526 - 14.0357 = 7.817$.
This is in a 'reject' zone, so reject the null hypothesis. The p-value is $2P(z \ge 7.82)$, which is essentially zero.
Our hypotheses are $H_0: \sigma \ge 0.14$ and $H_1: \sigma < 0.14$. $\alpha = .05$ and $n = 100$, so $df = n - 1 = 99$.
This is a left-sided test because we will only reject the null hypothesis if the sample standard deviation is too small. Because of the large number of degrees of freedom, we must use $z$. Make a diagram of a Normal curve with a mean at zero. Indicate one 'reject' zone below $-z_{.05} = -1.645$.
We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.009604)}{.14^2} = 48.5100$ and
$z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(48.5100)} - \sqrt{2(99) - 1} = \sqrt{97.0200} - \sqrt{197} = 9.8499 - 14.0357 = -4.186$.
This is in the 'reject' zone, so reject the null hypothesis. The p-value is $P(z \le -4.19)$, which is essentially zero.
We have for store 8 $s_8 = 0.1553$ or $s_8^2 = 0.024118$. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{99(.024118)}{.14^2} = 121.8205$ and
$z = \sqrt{2\chi^2} - \sqrt{2DF - 1} = \sqrt{2(121.8205)} - \sqrt{2(99) - 1} = \sqrt{243.6410} - \sqrt{197} = 15.6090 - 14.0357 = 1.5733$.
This is not in a 'reject' zone, so do not reject the null hypothesis. The p-value is $P(z \le 1.57) = .5 + .4418 = .9418$.
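The large-sample $z$ values and their p-values can be computed together in a short sketch. It applies the table's large-sample Test Ratio $z = \sqrt{2\chi^2} - \sqrt{2DF - 1}$ and uses `statistics.NormalDist` in place of the z table. Because $z$ is not rounded to two decimals, the p-values differ slightly from table-based ones. The helper `large_sample_z` is hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal, used in place of the printed z table


def large_sample_z(n, s, sigma0):
    """z = sqrt(2 chi^2) - sqrt(2 DF - 1), where chi^2 = (n - 1) s^2 / sigma0^2."""
    df = n - 1
    chi2 = df * s ** 2 / sigma0 ** 2
    return math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)


results = {}
for store, s in (("Store 2", 0.0980), ("Store 8", 0.1553)):
    z_two = large_sample_z(100, s, 0.10)                # H0: sigma = 0.10
    p_two = 2 * min(Z.cdf(z_two), 1 - Z.cdf(z_two))     # two-sided p-value
    z_left = large_sample_z(100, s, 0.14)               # H0: sigma >= 0.14
    p_left = Z.cdf(z_left)                              # left-sided p-value
    results[store] = (z_two, p_two, z_left, p_left)
    print(f"{store}: two-sided z = {z_two:.3f} (p = {p_two:.4f}), "
          f"left-sided z = {z_left:.3f} (p = {p_left:.4f})")
```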
d) Find 2-sided 95% confidence interval for the standard deviation using data from your store and
assuming a sample size of 16. (2)
The formula for a Confidence Interval for the variance is $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$. We have already found $\chi^{2(15)}_{.975} = 6.2621$ and $\chi^{2(15)}_{.025} = 27.4884$, so $\frac{15}{27.4884} = .5457$ and $\frac{15}{6.2621} = 2.3954$. That means $\sqrt{\frac{15}{27.4884}} = 0.7387$ and $\sqrt{\frac{15}{6.2621}} = 1.5477$.
We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. So the interval for the variance is $\frac{15(0.009604)}{27.4884} \le \sigma^2 \le \frac{15(0.009604)}{6.2621}$. If we take square roots, we have $0.7387(0.0980) \le \sigma \le 1.5477(0.0980)$, or $0.07239 \le \sigma \le 0.1517$.
We have for store 8 $s_8 = .1553$ or $s_8^2 = .024118$. So the interval for the variance is $\frac{15(0.024118)}{27.4884} \le \sigma^2 \le \frac{15(0.024118)}{6.2621}$. If we take square roots, we have $0.7387(0.1553) \le \sigma \le 1.5477(0.1553)$, or $0.1147 \le \sigma \le 0.2404$.
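The part d) intervals follow the same steps in code. The sketch takes square roots of the chi-square interval for $\sigma^2$, using the table values quoted above, which are hardcoded. The helper `sd_interval_small` is hypothetical.

```python
import math

CHI2_975_15 = 6.2621    # table value quoted in the key (df = 15)
CHI2_025_15 = 27.4884   # table value quoted in the key (df = 15)


def sd_interval_small(n, s, chi2_upper, chi2_lower):
    """Interval for sigma: square roots of (n-1)s^2/chi2_upper <= sigma^2 <= (n-1)s^2/chi2_lower."""
    df = n - 1
    return s * math.sqrt(df / chi2_upper), s * math.sqrt(df / chi2_lower)


intervals = {store: sd_interval_small(16, s, CHI2_025_15, CHI2_975_15)
             for store, s in (("Store 2", 0.0980), ("Store 8", 0.1553))}
for store, (lo, hi) in intervals.items():
    print(f"{store}: {lo:.4f} <= sigma <= {hi:.4f}")
```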
e) Repeat d) for a sample size of 100. (1) [41]
The large sample formula for a Confidence Interval for the standard deviation is $\frac{s\sqrt{2df}}{z_{\alpha/2} + \sqrt{2df}} \le \sigma \le \frac{s\sqrt{2df}}{-z_{\alpha/2} + \sqrt{2df}}$. We have already found $z_{\alpha/2} = z_{.025} = 1.960$ and $\sqrt{2df} = \sqrt{2(99)} = 14.0712$, so $\frac{14.0712}{1.960 + 14.0712} = 0.8777$ and $\frac{14.0712}{-1.960 + 14.0712} = 1.1618$.
We have for store 2 $s_2 = 0.0980$ or $s_2^2 = 0.009604$. So the interval is $\frac{.0980(14.0712)}{1.960 + 14.0712} \le \sigma \le \frac{.0980(14.0712)}{-1.960 + 14.0712}$ or $.0980(0.8777) \le \sigma \le .0980(1.1618)$ or $0.0860 \le \sigma \le 0.1139$.
We have for store 8 $s_8 = .1553$ or $s_8^2 = .024118$. So the interval is $\frac{.1553(14.0712)}{1.960 + 14.0712} \le \sigma \le \frac{.1553(14.0712)}{-1.960 + 14.0712}$ or $.1553(0.8777) \le \sigma \le .1553(1.1618)$ or $0.1363 \le \sigma \le 0.1804$.
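The large-sample interval is easy to script as well. This sketch assumes the formula $\frac{s\sqrt{2df}}{\pm z_{\alpha/2} + \sqrt{2df}}$ given in the key and takes $z_{\alpha/2}$ from `statistics.NormalDist` rather than from the table. The helper `sd_interval_large` is hypothetical.

```python
import math
from statistics import NormalDist


def sd_interval_large(n, s, conf=0.95):
    """Large-sample interval s*sqrt(2df)/(z + sqrt(2df)) <= sigma <= s*sqrt(2df)/(-z + sqrt(2df))."""
    root = math.sqrt(2 * (n - 1))                  # sqrt(2 df)
    z = NormalDist().inv_cdf(0.5 + conf / 2)       # about 1.960 for 95%
    return s * root / (z + root), s * root / (-z + root)


intervals = {store: sd_interval_large(100, s)
             for store, s in (("Store 2", 0.0980), ("Store 8", 0.1553))}
for store, (lo, hi) in intervals.items():
    print(f"{store}: {lo:.4f} <= sigma <= {hi:.4f}")
```

With $n = 100$ both intervals are much narrower than the $n = 16$ intervals in part d).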
f) Here’s the easiest question on the exam. By now you should have figured out that you don’t
have to understand a statistical test at all if you know i) what it assumes, ii) what the null
hypothesis is and iii) what the p-value is associated with the null hypothesis. So, I am going to do
a test that the standard deviation is 0.1 on the following data set.
C11
3.53
3.44
3.51
3.78
3.54
3.49
3.57
3.57
3.78
3.57
3.54
3.54
3.51
3.72
3.59
3.50
Then I am going to run a Lilliefors test on these data using Minitab. The null hypothesis of the
Lilliefors test is that the sample comes from the Normal distribution. The test makes no
assumptions about the mean and standard deviation of the population and computes these as
sample statistics from the data. After it printed ‘Probability plot of C11,’ the computer printed a
graph of the data, but the only thing I looked at was the p-value, which was less than .01. After the
Lilliefors test, the computer printed out the results of two versions of a statistical test on the
standard deviation. The ‘Standard’ version is the method that you learned and is only applicable if
the data comes from a Normal distribution. The ‘Adjusted’ version is for all other cases. So
explain what p-value I look at and what it tells me.
This is a test for the Normal distribution, and its null hypothesis is that the data are Normal. Since the p-value is less than .01, it is below any significance level we might use, so we can be very sure that the data do not follow a Normal distribution.
MTB > NormTest c11;
SUBC>   KSTest.

[Graph: Probability Plot of C11]

MTB > OneVariance c11;
SUBC>   Test .1;
SUBC>   Confidence 95.0;
SUBC>   Alternative 0;
SUBC>   StDeviation.

Test and CI for One Standard Deviation: C11

Method

Null hypothesis         Sigma = 0.1
Alternative hypothesis  Sigma not = 0.1

The standard method is only for the normal distribution.
The adjusted method is for any continuous distribution.

Statistics

Variable   N  StDev  Variance
C11       16  0.100    0.0100

95% Confidence Intervals

                          CI for            CI for
Variable  Method          StDev             Variance
C11       Standard  (0.074, 0.155)  (0.0055, 0.0240)
          Adjusted  (0.071, 0.170)  (0.0050, 0.0288)

Tests

Variable  Method    Chi-Square     DF  P-Value
C11       Standard       15.06  15.00    0.895
          Adjusted       11.12  11.07    0.880
The output says that it is testing $H_0: \sigma = 0.10$ against $H_1: \sigma \ne 0.10$. The adjusted method works if the data are not Normal and gives us a p-value of .880. Since this p-value is above any significance level we are likely to use, we cannot possibly reject the null hypothesis that the standard deviation is 0.10.
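The 'Standard' row of the Tests table is the chi-square test ratio from part a), $\chi^2 = (n-1)s^2/\sigma_0^2$, applied to C11. The sketch below recomputes that row from the raw data as a rough check. Python's standard library has no chi-square distribution, so the sketch includes a small series implementation of the chi-square CDF (`chi2_cdf`, a hypothetical helper). It does not attempt the 'Adjusted' row, whose exact method is not given in the key.

```python
import math
import statistics

# Column C11 from part f)
C11 = [3.53, 3.44, 3.51, 3.78, 3.54, 3.49, 3.57, 3.57,
       3.78, 3.57, 3.54, 3.54, 3.51, 3.72, 3.59, 3.50]


def chi2_cdf(x, df, max_terms=1000):
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, t = df / 2, x / 2
    term = 1.0 / a          # k = 0 term of the sum of t^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, max_terms):
        term *= t / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))


n = len(C11)
sigma0 = 0.1
chi2 = (n - 1) * statistics.variance(C11) / sigma0 ** 2   # 'Standard' test ratio
cdf = chi2_cdf(chi2, n - 1)
p_value = 2 * min(cdf, 1 - cdf)                            # two-sided p-value
print(f"chi-square = {chi2:.2f}, DF = {n - 1}, standard p-value = {p_value:.3f}")
```

The sketch gives a chi-square of about 15.06 and a p-value close to the 0.895 in the output. The standard p-value assumes Normal data, so the Lilliefors result is what rules it out here.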