252y0421 3/29/04
(Page layout view!)
ECO252 QBA2
Name
SECOND HOUR EXAM Hour of Class Registered
March 24, 2004
Circle 10am 11am
Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not
usually acceptable. Note that some equations have been squashed by Word. If you click on them or
print them out, they should be fine.
I. (8 points) Do all the following.
x ~ N 2,9 - If you are not using the supplement table, make sure that I know it.
0  2
  27 .88  2
z
 P 3.32  z  0.22 
1. P27.88  x  0  P 
9
9 

 P3.32  z  0  P0.22  z  0  .4995  .0871  .4124
Make a diagram! For z draw a Normal curve with zero in the middle. Shade the area between -3.32 and 0.22 and note that it is all on one side of the mean, so that you subtract the area between -0.22 and zero
from the area between -3.32 and zero.
16  2 
1  2
z
 P 0.11  z  1.56 
2. P0  x  16   P 
9
9 

 P0.11  z  0  P0  z  1.56   .0438  .4406  .4844
Make a diagram! For z draw a Normal curve with zero in the middle. Shade the area between -0.11 and
1.56 and note that it is on both sides of the mean, so that you add the area between -0.11 and zero to the
area between zero and 1.56.
16  2 

 Pz  1.56 
3. F 16  (The cumulative probability up to 16) F 16   Px  16   P  z 
9 

 Pz  0  P0  z  1.56   .5  .4406  .9406
Make a diagram! For z draw a Normal curve with zero in the middle. Shade the entire area below 1.56
and note that it is on both sides of the mean, so that you add the area below zero to the area between zero
and 1.56.
4. x.115: First we must find z.115. This is the value of z that has P(z > z.115) = .115, or
   P(0 ≤ z ≤ z.115) = .5 - .115 = .3850. On the Normal table, the closest we can find to .3850 is
   P(0 ≤ z ≤ 1.20) = .3849. So z.115 = 1.20 and x.115 = μ + σ z.115 = 2 + 1.20(9) = 12.8.
   Check: P(x > 12.8) = P[z > (12.8 - 2)/9] = P(z > 1.20) = .5 - .3849 = .1151 ≈ .115
Make a diagram! For z draw a Normal curve with zero in the middle. Divide the area above zero into 11.5% above z.115 and 50% - 11.5% below z.115.
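The four Part I answers can be checked by machine. The sketch below is not part of the original exam; it builds the standard normal CDF from math.erf, and small third-decimal differences from the table answers come from rounding z to two places in the hand solution.

```python
# Sketch: verifying the Part I answers for x ~ N(2, 9), i.e. mu = 2, sigma = 9.
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 2.0, 9.0

p1 = phi((0 - mu) / sigma) - phi((-27.88 - mu) / sigma)  # P(-27.88 <= x <= 0)
p3 = phi((16 - mu) / sigma)                              # F(16) = P(x <= 16)
x_115 = mu + sigma * 1.20                                # x.115 using the table value z.115 = 1.20

print(round(p1, 4), round(p3, 4), x_115)
```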
II. (24+ points) Do all the following. (2 points each unless noted otherwise.)
Note the following:
1. You will be penalized if you do not compute the sample variance of the xL column in question 1.
2. This test is normed on 50 points, but there are more points possible including the take-home. You may not finish the exam and might want to skip some questions.
3. A table identifying methods for comparing 2 samples is at the end of the exam.
4. If you answer ‘None of the above’ in any question, you should provide an alternative
answer and explain why. You may receive credit for this even if you are wrong.
Questions 1-6 refer to Exhibit 1.
Exhibit 1: (Edited from problems presented by Samuel Wathen, with one small error, for Lind et al. 2002)
The first two columns below are evaluations of a sample of five products, first at FIFO and, second, at LIFO. Based on the results shown, is LIFO more effective than FIFO in keeping the value of inventory lower? (Assume that the underlying distribution is Normal.)
d  xF  xL
Product
xF
xL
1
2
3
4
5
225
119
100
212
248
904
221
100
113
200
245
879
4
19
-13
12
3
25
x F2
x L2
d2
50625
14161
10000
44944
61504
181234
48841
10000
12769
40000
60025
171635
16
361
169
144
9
699
Minitab calculated the following sample statistics:
n
Mean
Median
StDev
SE Mean
xF
5
180.8
212.0
66.7
29.8
xL
5
175.8
200.0
____
d
5
5.00
4.00
11.98
Variable
1. Compute the standard deviation of xL. You may use any of the material given in Exhibit 1.
Solution: sL² = (Σ xL² - n x̄L²)/(n - 1) = [171635 - 5(175.8)²]/4 = 4276.7 and sL = √4276.7 = 65.396
Note: If you wasted our time using the definitional formula, see the end of Part II.

2. What is the null hypothesis?
a) μF = μL
b) μF ≥ μL
c) *μF ≤ μL
d) μF > μL
e) None of the above.
Explanation: The question seems to be asking if μL < μF. This is the same as μF > μL, which cannot be a null hypothesis because it does not contain an equality. It must be an alternate hypothesis, so that the null hypothesis is its opposite, μF ≤ μL.
3. What is (are) the degrees of freedom?
a) *4
b) 5
c) 8
d) 15
e) 10
Explanation: Since each line represents one product, this is paired data. Our variable is thus really d, which contains only 5 numbers, so there are 5 - 1 = 4 degrees of freedom.
4. If you used the 5% level of significance, what is the appropriate t or z value from the tables?
a) ±2.571
b) ±2.776
c) ±2.262
d) ±2.228
e) 1.645
f) 1.960
g) *None of the above.
Explanation: This is a one-sided 5% test, and the alternate hypothesis, μF > μL, is the same as D > 0, where D = μF - μL. The test ratio t = (d̄ - 0)/s_d̄ = 5.00/5.36 = 0.93 must be larger than t.05(4) = 2.132 for us to reject the null hypothesis.

5. What is the value of your calculated t or z?
a) *0.933
b) ±2.776
c) 0.477
d) 2.028
e) None of the above.
6. What is your decision at the 5% significance level?
a) Do not reject the null hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.
b) Reject the null hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.
c) Reject the alternative hypothesis and conclude that LIFO is more effective in keeping the value of the inventory lower.
d) *Do not reject the null hypothesis and conclude that LIFO is not more effective in keeping the value of the inventory lower.
e) None of the above.
7. Find an approximate p-value for the null hypothesis that you tested. Please explain your result!
Solution: We need P(t > 0.933). If we check the line of the t table for 4 degrees of freedom, we find t.20(4) = 0.941 and t.25(4) = 0.741, which means that P(t > 0.941) = .20 and P(t > 0.741) = .25. Since 0.933 lies between these values, it must be true that .20 < P(t > 0.933) < .25. There is some flexibility here depending on your answer to Question 5.
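The paired-t arithmetic for Exhibit 1 (questions 1-7) can be reproduced directly from the five differences. This is a sketch for checking the posted numbers, not part of the original exam.

```python
# Sketch: paired-t computations for Exhibit 1 from d = xF - xL.
from math import sqrt

xF = [225, 119, 100, 212, 248]
xL = [221, 100, 113, 200, 245]
d = [f - l for f, l in zip(xF, xL)]          # [4, 19, -13, 12, 3]

n = len(d)
dbar = sum(d) / n                             # 5.00
s2 = (sum(di * di for di in d) - n * dbar**2) / (n - 1)
sd = sqrt(s2)                                 # 11.98
se = sd / sqrt(n)                             # 5.36
t = (dbar - 0) / se                           # 0.933 with df = 4

print(round(dbar, 2), round(sd, 2), round(se, 2), round(t, 3))
```

Since 0.933 is below t.05(4) = 2.132, the machine check agrees with the decision in question 6.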
8.
A manufacturer revises a manufacturing process and finds a fall in the defect rate of 4% ± 5%.
a) The fall in defects is statistically significant because 5% is larger than 4%.
b) The fall in defects is statistically significant because the confidence interval supports H0.
c) *The fall in defects is not statistically significant because 4% is smaller than 5%.
d) The fall in defects is not statistically significant because the confidence interval would
lead us to reject H0.
Questions 9-11 refer to Exhibit 2.
Exhibit 2: (Edited from problems presented by Samuel Wathen) A group of adults and a group of children both tried Wow! Cereal. Was there a difference in how adults and kids responded to it?

Group               Number in    Number who    Fraction of sample    pq/n
                    sample       liked it      who liked it
Adults (Group 1)      250           187           .748               .748(.252)/250 = .000754
Children (Group 2)    100            66           .660               .660(.340)/100 = .002244
Total                 350           253           .723               .723(.277)/350 = .000572
9. What is the null hypothesis?
a) μ1 = μ2 (There is no reasonable way to define a mean here.)
b) μ1 > μ2
c) μ1 < μ2
d) *p1 = p2 (There is no reason to assume that one fraction is larger than the other before we look at the data. Of course b), c), e) and f) do not contain equalities and cannot be null hypotheses.)
e) p1 > p2
f) p1 < p2
g) None of the above.
10. Calculate a 99% confidence interval for the difference between the fraction of adults and fraction of kids that liked Wow! Explain why you reject or do not reject the null hypothesis. (4)
Solution: The outline says the following: Δp = (p̂1 - p̂2) ± z(α/2) s_Δp, where s_Δp = √(p̂1 q̂1/n1 + p̂2 q̂2/n2). The tables say z.005 = 2.576, so the interval is Δp = (.748 - .660) ± 2.576 √(.000754 + .002244), or .088 ± 2.576 √.002998 = .088 ± 2.576(.05475) = .088 ± 0.141. Since this interval includes zero, do not reject the null hypothesis.
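As a machine check of question 10 (a sketch, not part of the original exam), the interval can be rebuilt from the counts in Exhibit 2 and the z.005 = 2.576 table value quoted in the solution:

```python
# Sketch: the 99% confidence interval for p1 - p2 from Exhibit 2.
from math import sqrt

p1, n1 = 187 / 250, 250      # adults:   .748
p2, n2 = 66 / 100, 100       # children: .660

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # sqrt(.000754 + .002244)
z = 2.576                                            # z.005 from the tables
lo = (p1 - p2) - z * se
hi = (p1 - p2) + z * se

print(round(lo, 3), round(hi, 3))   # the interval straddles zero, so do not reject H0
```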
11. (Extra credit) Calculate a 77% confidence interval for the difference between the fraction of adults and fraction of kids that liked Wow! (2)
On page 1, we found z.115 = 1.20. Since the confidence level is 77%, the significance level is 23% and half the significance level is 11.5%. The interval is thus Δp = .088 ± 1.20(.152296) = .088 ± 0.183.
Questions 12-14 refer to Exhibit 3.
Exhibit 3: (Edited from problems presented by Samuel Wathen)
A survey was taken among 100 randomly selected property owners to see if opinion about a street widening was related to the distance of front footage they owned. The results appear below.

Front-Footage     For    Undecided    Against
Under 45 feet      12        4            4
45-120 feet        35        5           30
Over 120 feet       3        2            5
12. How many degrees of freedom are there?
a) 2
b) 3
c) *4 = (r - 1)(c - 1)
d) 5
e) 9
f) None of the above.
13. What is the value of E for people in favor of the project who own less than 45 feet of frontage?
a) *10 = .20(50)
b) 12
c) 35
d) 50
e) None of the above.

Front-Footage     For    Undecided    Against    Total    p_r
Under 45 feet      12        4            4        20     .20
45-120 feet        35        5           30        70     .70
Over 120 feet       3        2            5        10     .10
Total              50       11           39       100
14. Assume that the computed value of chi-square is 8.5.
a) What is the null hypothesis that you are testing? (2)
Solution: Opinion and front footage are independent.
b) What is your conclusion? Why? (3)
Solution: We do not reject the null hypothesis at the 5% level because 8.5 is below χ².05(4) = 9.4877.
15. Turn in your computer output from computer problem 1 only, tucked inside this exam paper. (3 points - 2 point penalty for not handing this in.)
16. The following output is from a computer problem very much like the one you did to compare two sets of data. Two production processes are in use. I wish to compare numbers of defects in Process A and Process B to test the statement "The number of defects in process A is significantly lower than in process B." Three tests are done. Assume that the underlying distribution is Normal.
a) Which of the three tests should we use? b) What is the null hypothesis as we use it? c) Should we reject the null hypothesis? Why?
Test 1:
MTB > twosamplet 'A' 'B'
Two-Sample T-Test and CI: A, B
Two-sample T for A vs B

      N    Mean    StDev    SE Mean
A    90   220.5     34.7        3.7
B   110   300.5     82.7        7.9

Difference = mu A - mu B
Estimate for difference: -79.98
95% CI for difference: (-97.15, -62.81)
T-Test of difference = 0 (vs not =): T-Value = -9.20 P-Value = 0.000 DF = 152

H0: μA = μB    H1: μA ≠ μB
Test 2:
MTB > twosamplet 'A' 'B';
SUBC> alter 1.
Two-Sample T-Test and CI: A, B
Two-sample T for A vs B

      N    Mean    StDev    SE Mean
A    90   220.5     34.7        3.7
B   110   300.5     82.7        7.9

Difference = mu A - mu B
Estimate for difference: -79.98
95% lower bound for difference: -94.36
T-Test of difference = 0 (vs >): T-Value = -9.20 P-Value = 1.000 DF = 152

H0: μA ≤ μB    H1: μA > μB
Test 3:
MTB > Twosamplet 'A' 'B';
SUBC> alter -1.
Two-Sample T-Test and CI: A, B
Two-sample T for A vs B

      N    Mean    StDev    SE Mean
A    90   220.5     34.7        3.7
B   110   300.5     82.7        7.9

Difference = mu A - mu B
Estimate for difference: -79.98
95% upper bound for difference: -65.59
T-Test of difference = 0 (vs <): T-Value = -9.20 P-Value = 0.000 DF = 152

H0: μA ≥ μB    H1: μA < μB

Solution: a), b) Test 3 is appropriate because it tests H0: D ≥ 0 against H1: D < 0, and D < 0 is equivalent to μA < μB.
c) Since the p-value is below any significance level we might use, we reject the null hypothesis.
17. (Extra credit) My boss objects that he thinks that the variances are equal, so that I used the wrong test. I go back to the computer and do the following. (The null hypothesis is equal variances.) Was I right? Why?
MTB > %VarTest c3 c4;
SUBC> Unstacked.
Test for Equal Variances
F-Test (normal distribution)
Test Statistic: 0.176
P-Value: 0.000
Solution: I was right. The null hypothesis is equal variances, and the p-value is below any significance level that I would use, so we reject the null hypothesis of equal variances.
18. (Extra credit) Now my beloved boss says that maybe the underlying distribution is not Normal. I go back to the computer and run the following. Process A results are in C3. Process B results are in C4. Remember that there are 90 data items for process A and 110 for process B. What are our hypotheses and results?
MTB > Stack c3 c4 c5;
SUBC> Subscripts c6;
SUBC> UseNames.
MTB > Rank c5 c7.
This stacks the 2 sets of results together so they can be ranked.
C7 now contains the ranks.
MTB > Unstack (c7);
SUBC> Subscripts c6;
SUBC> After;
SUBC> VarNames.
Ranks for A are now in C7_A. Ranks for B are now in C7_B.
MTB > sum c8
Sum of C7_A
Sum of C7_A = 6008.0
MTB > sum c9
Sum of C7_B
Sum of C7_B = 14092
Solution: We use the Wilcoxon-Mann-Whitney test for two independent samples to compare the medians.
According to the outline, for values of n1 and n2 that are too large for the tables, W has the Normal distribution with mean μ_W = (1/2) n1 (n1 + n2 + 1) and variance σ²_W = (1/6) n2 μ_W. If the significance level is 5% and the test is one-sided, we reject our null hypothesis if z = (W - μ_W)/σ_W lies below -1.645.
So n1 = 90 and n2 = 110. W = 6008 is the smaller of the rank sums.
μ_W = (1/2)(90)(90 + 110 + 1) = 45(201) = 9045 and σ²_W = (1/6)(110)(9045) = 165825.
So z = (W - μ_W)/σ_W = (6008 - 9045)/√165825 = -3037/407.22 = -7.46. Since the p-value would be P(z < -7.46) ≈ .5 - .5 = 0, we would reject the null hypothesis (that the median for A is at least the median for B) at any significance level.
Questions 19-22 refer to Exhibit 4.
Exhibit 4: (Edited from problems presented by Samuel Wathen)
A professor asserts that she uses a Normal curve with a mean of 75 and a standard deviation of 10 to grade students. Last year's grades are below. Test to see if the professor's assertions are correct at the 99% confidence level.

Row     Grade    Interval     E           O      O²/E
1       A        90+            7.6820     15     29.2892
2       B        80-90         27.7955     20     14.3908
3       C        70-80         44.0450     40     36.3265
4       D        60-70         27.7955     30     32.3793
5       F        Below 60       7.6820     10     13.0174
Total                         115.0000    115    125.4032
19. Show the calculations necessary to get the number that were expected to get B's.
Solution: P(80 ≤ x ≤ 90) = P[(80 - 75)/10 ≤ z ≤ (90 - 75)/10] = P(0.5 ≤ z ≤ 1.5) = .4332 - .1915 = .2417 and .2417(115) = 27.7955
20. What table value of chi-square would you use to test the professor's assertion?
Solution: χ².01(4) = 13.2767
21. What is the calculated value of chi-square?
Solution: χ² = Σ(O²/E) - n = 125.4032 - 115 = 10.4032
22. Explain your conclusion.
Solution: Since the calculated chi-square is smaller than the table chi-square, do not reject the null hypothesis that the grades follow a Normal distribution.
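The whole Exhibit 4 computation can be rebuilt by machine (a sketch, not part of the original exam). The E column comes from N(75, 10) cell probabilities times n = 115; the results match the exam's table to the rounding of the printed Normal table.

```python
# Sketch: expected grade counts from N(75, 10) and chi-square = sum(O^2/E) - n.
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 75.0, 10.0, 115
cuts = [60, 70, 80, 90]                 # boundaries F | D | C | B | A

probs, prev = [], 0.0
for c in cuts:
    p = phi((c - mu) / sigma)
    probs.append(p - prev)
    prev = p
probs.append(1.0 - prev)                # cells: F, D, C, B, A

O = [10, 30, 40, 20, 15]                # observed counts, lowest grade first
E = [p * n for p in probs]
chi2 = sum(o * o / e for o, e in zip(O, E)) - n

print([round(e, 2) for e in E], round(chi2, 2))
```

The computed chi-square (about 10.40) stays below χ².01(4) = 13.2767, agreeing with question 22.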
Answer to Question 1 using the definitional formula:

Row     xL     xL - x̄L    (xL - x̄L)²
1       221      45.2       2043.04
2       100     -75.8       5745.64
3       113     -62.8       3943.84
4       200      24.2        585.64
5       245      69.2       4788.64
Total   879       0.0      17106.80

s² = Σ(x - x̄)²/(n - 1) = 17106.80/4 = 4276.7 and sL = √4276.7 = 65.396

Table identifying methods for comparing two samples:

                                     Paired Samples    Independent Samples
Location - Normal distribution.      Method D4         Methods D1-D3
Compare means.
Location - Distribution not          Method D5b        Method D5a
Normal. Compare medians.
Proportions                                            Method D6
Variability - Normal distribution.                     Method D7
Compare variances.
ECO252 QBA2
SECOND EXAM
March 24, 2004
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
III. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly.
(19+ points)
1) Chi-squared and Related Tests (Bassett et. al.) To personalize the data below, change the number
of stations reporting 4 thunderstorms to the second to last digit of your student number. This will
change the total number of stations reporting. For example, Seymour Butz’s student number is 976500,
so he will change the number of stations reporting 4 thunderstorms to zero and the total number of
stations reporting will be 22 + 37 + 20 + 13 + 0 + 2 = 94.
a) 100 weather stations reported the following in August 2003:

Number of thunderstorms (x):            0     1     2     3     4     5
Number of stations reporting
thunderstorms (O):                     22    37    20    13     6     2

In the region in question, the number of thunderstorms per month is believed to have a Poisson distribution with a mean of 1. Test to see if this is appropriate using a chi-squared method. For example, if 5 stations reported 2 thunderstorms and 5 stations reported 3 thunderstorms and there were only 10 stations, the total number of storms reported would be 2(5) + 3(5) = 25, and the average number of storms reported would be 25/10 = 2.5. (4)
b) Repeat the test using the Kolmogorov-Smirnov method. (3)
c) Find the average number of storms per station and use it to generate a Poisson table on Minitab. To do so, follow the example below, replacing 0.732 with your mean (a number like 1.723). In column 1 place the numbers 0 through 10. Head column 1 (C1) k, column 2 P(k) and column 3 P(x le k) or something similar. ('le' stands for '≤'.)
MTB > PDF c1 c2;
SUBC> Poisson 0.732.
MTB > CDF c1 c3;
SUBC> Poisson 0.732.
MTB > print c1 - c3
Data Display

Row     k     P(k)        P(x le k)
1       0     0.480946    0.48095
2       1     0.352053    0.83300
3       2     0.128851    0.96185
4       3     0.031440    0.99329
5       4     0.005753    0.99904
6       5     0.000842    0.99989
7       6     0.000103    0.99999
8       7     0.000011    1.00000
9       8     0.000001    1.00000
10      9     0.000000    1.00000
11     10     0.000000    1.00000
This table tells us that, for a Poisson distribution with a mean of 0.732, P(x = 3) = .031440 and P(x ≤ 3) = .99329.
To keep the numbers correct, you could merge the data for k = 5 to 10 into a category of '5 or more storms.' Decide whether a chi-squared or K-S method is appropriate (only one method is!) and test for a Poisson distribution with your mean, remembering that you estimated the mean from your data. (4)
d) (Extra credit) Two dice were thrown 180 times with the results below. Test the hypothesis that the distribution follows the binomial distribution with n = 2 and p = .15. (2)

Number of sixes (x):     0      1     2
Frequency (O):         105     70     5

e) (Extra extra credit) Test the data in d) for a binomial distribution in general by using
p̂ = (total number of sixes)/(total number of throws). (2)
Solution: I will use Seymour's version here, and try to put the others into an appendix.
a) The weather stations reported the following in August 2003:

Number of thunderstorms (x):            0     1     2     3     4     5
Number of stations reporting
thunderstorms (O):                     22    37    20    13     0     2

In the region in question, the number of thunderstorms per month is believed to have a Poisson distribution with a mean of 1. Test to see if this is appropriate using a chi-squared method. (4)
H0: Poisson(1). This is the data from the Supplementary Materials book.

x     P(x)        F(x)
0     0.367879    0.36788
1     0.367879    0.73576
2     0.183940    0.91970
3     0.061313    0.98101
4     0.015328    0.99634
5     0.003066    0.99941
6     0.000511    0.99992
7     0.000073    0.99999
8     0.000009    1.00000
So we need to put together a version of E. Note that O adds to n = 94. So if we take P(x) and multiply by 94, we get (34.5806, 34.5806, 17.2904, 5.7634, 1.4408, 0.2882, 0.0480, 0.0069, 0.0008). The last three, at least, are too small to use, so we combine the cells for x ≥ 4 to get the table below.

Row     O     E          E - O      (E - O)²    (E - O)²/E    O²/E
1      22     34.5806    12.5806    158.272      4.57690       13.9963
2      37     34.5806    -2.4194      5.853      0.16927       39.5886
3      20     17.2904    -2.7096      7.342      0.42464       23.1343
4      13      5.7634    -7.2366     52.368      9.08628       29.3229
5       2      1.7847    -0.2153      0.046      0.02597        2.2413
Total  94     93.9997    -0.0003                14.2833       108.2831

χ².05(4) = 9.4877. Depending on which method you used, χ² = Σ(E - O)²/E = 14.2833 or χ² = Σ(O²/E) - n = 108.2831 - 94 = 14.2831. These are both above χ².05(4), so reject the null hypothesis.
b) Repeat the test using the Kolmogorov-Smirnov method. (3) First, take O, divide by n = 94 and make the result into a cumulative distribution.

Row     O     O/n         Fo
1      22     0.234043    0.23404
2      37     0.393617    0.62766
3      20     0.212766    0.84043
4      13     0.138298    0.97872
5       0     0.000000    0.97872
6       2     0.021277    1.00000

Copy F(x), label it Fe, compute D = |Fo - Fe|, and find the maximum D, which is .133837. According to the K-S table, this should be compared with 1.36/√n = 1.36/√94 = .1403. Because the maximum deviation is not above .1403, do not reject the null hypothesis.

Row     Fo         Fe         D
1       0.23404    0.36788    0.133837
2       0.62766    0.73576    0.108100
3       0.84043    0.91970    0.079274
4       0.97872    0.98101    0.002287
5       0.97872    0.99634    0.017617
6       1.00000    0.99941    0.000590
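The K-S comparison in part b) reduces to one maximum over six cells, which a short sketch (not part of the original exam) can confirm:

```python
# Sketch: maximum gap between the observed and Poisson(1) cumulative distributions.
from math import exp, factorial, sqrt

O = [22, 37, 20, 13, 0, 2]               # x = 0..5, Seymour's data
n = sum(O)                               # 94

# observed cumulative distribution Fo
Fo, running = [], 0
for o in O:
    running += o
    Fo.append(running / n)

# Poisson(1) cumulative distribution Fe over the same x values
Fe, cum = [], 0.0
for k in range(len(O)):
    cum += exp(-1.0) / factorial(k)      # P(x = k) for lambda = 1
    Fe.append(cum)

D = max(abs(fo - fe) for fo, fe in zip(Fo, Fe))
crit = 1.36 / sqrt(n)                    # large-sample K-S critical value
print(round(D, 6), round(crit, 4), D > crit)
```

D comes out at .133837 against a critical value of .1403, so the K-S test does not reject even though the chi-square test in part a) did.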
c) Find the average number of storms per station and use it to generate a Poisson table on Minitab. Decide whether a chi-squared or K-S method is appropriate (only one method is!) and test for a Poisson distribution with your mean, remembering that you estimated the mean from your data. (4) We multiply O and x and get 22(0) + 37(1) + 20(2) + 13(3) + 0(4) + 2(5) = 0 + 37 + 40 + 39 + 0 + 10 = 126 storms, so our mean is 126/94 = 1.3404. We generate the part of the Poisson table that we need, multiply P(x) by n = 94 and use a chi-square method. We compare our computed chi-square of 5.9364 to χ².05(3) = 7.8147, and do not reject the null hypothesis, H0: Poisson. Note that we have lost a degree of freedom by computing the mean from the data, which is why we can't use the K-S method.
Row     x     P(x)
1       0     0.261846
2       1     0.350873
3       2     0.235085
4       3     0.105005
5       4     0.035177
6       5     0.009427
7       6     0.002105
8       7     0.000403
9       8     0.000068
10      9     0.000010
11     10     0.000001

Row     O     E          E - O       (E - O)²    (E - O)²/E    O²/E
1      22     24.6135     2.61349     6.8303      0.27750       19.6640
2      37     32.9821    -4.01792    16.1437      0.48947       41.5074
3      20     22.0980     2.09799     4.4016      0.19918       18.1012
4      13      9.8704    -3.12956     9.7942      0.99227       17.1218
5       0      3.3066     3.30660    10.9336      3.30660        0.0000
6       2      1.1293    -0.87070     0.7581      0.67132        3.5420
Total  94     93.9999    -0.00009                 5.93644       99.9364

(In the second table the cell for x = 5 includes everything from x = 5 up, so that no expected count is lost.)
d) (Extra credit) Two dice were thrown 180 times with the results below. Test the hypothesis that the distribution follows the binomial distribution with n = 2 and p = .15. (2)

Number of sixes (x):     0      1     2
Frequency (O):         105     70     5

e) (Extra extra credit) Test the data in d) for a binomial distribution in general by using p̂ = (total number of sixes)/(total number of throws). (2)
I did these together. Since the total number of sixes was 1(70) + 2(5) = 80, p̂ = 80/180 = .4444.
My table for .4444 could have been generated by P(x) = C(n, x) p^x q^(n-x), where q = 1 - p, but I used
MTB > cdf c7 c10;
SUBC> binomial 2 .4444.
MTB > pdf c7 c11;
SUBC> binomial 2 .4444.
The table for .15 could have been done the same way or with the formula. I used the table in the Supplement and then took the difference between the numbers. I got the E column by multiplying P(x) by 180.
p  .15
p  .4444
x
Row
F x 
P x 
F x 
P x 
1
2
3
0
1
2
0.7225
0.9775
1.0000
0.7225
0.2550
0.0225
0.30869
0.80251
1.00000
0.308691
0.493817
0.197491
Only one method was needed in each of d) and e). If you used chi-squared, you should have gotten the following. H0: Binomial(n = 2, p = .15)

Row     x     O      E         E - O     (E - O)²    (E - O)²/E    O²/E
1       0    105    130.05     25.05     627.503      4.8251        84.775
2       1     70     45.90    -24.10     580.810     12.6538       106.754
3       2      5      4.05     -0.95       0.903      0.2228         6.173
Total        180    180.00      0.00                 17.7017       197.702

Since χ².05(2) = 5.9915 and our computed chi-square of 17.7017 is larger, we reject the null hypothesis.
The K-S method is probably easier. I got the following.

Row     x     O      E         O/n         Fo         Fe        D
1       0    105    130.05     0.583333    0.58333    0.7225    0.139167
2       1     70     45.90     0.388889    0.97222    0.9775    0.005278
3       2      5      4.05     0.027778    1.00000    1.0000    0.000000

The maximum D is .139167. According to the K-S table, this should be compared with 1.36/√n = 1.36/√180 = .1014. Because the maximum deviation is above .1014, reject the null hypothesis.
1
d) In this section, we have lost a degree of freedom. Since  2 .05  3.84146 is way below our chi-square of
74.248, we reject the null Hypothesis. H 0 : Binomial
Row x
1
2
3
0
1
2
O
105
70
5
180
E
55.5644
88.8871
35.5484
180.000
E O
-49.4356
18.8871
30.5484
-0.0000
E  O2
2443.87
356.72
933.21
E  O  2
E
43.9827
4.0132
26.2517
74.2476
O2
E
198.418
55.126
0.703
254.248
2) (Meyer and Krueger) WEFA compiled the following random samples of single-family home prices in the eastern and western parts of the US (in $ thousands). (Note - in this problem it is OK to use Excel or Minitab as a help - but you must fool me into believing that you did it by hand.)
Row    City - E               x1         City - W             x2
1      Albany NY              108.607    Bakersfield CA       137.171
2      Allentown PA            85.250    Fresno CA            107.627
3      Baltimore MD           112.747    Orange C. CA         204.862
4      Bergen NJ              195.232    Portland OR          123.605
5      Boston MA              180.865    Riverside CA         123.836
6      Buffalo NY              83.122    Sacramento CA        120.232
7      Charlestown SC          92.840    San Diego CA         172.601
8      Charlotte NC           104.433    San Francisco CA     220.067
9      Greensboro NC           97.638    San Jose CA          224.828
10     Greenville SC           88.355    Seattle WA           147.854
11     Harrisburg PA           79.846    Stockton CA           98.440
12     Hartford CT            129.130    Tacoma WA            119.884
13     Middlesex NJ           169.540
14     Monmouth NJ            137.859
15     New Haven CT           134.856
16     New York NY            170.830
17     Newark NJ              187.128
18     Philadelphia PA        114.553
19     Raleigh/Durham NC      119.355
20     Rochester NY            85.043
21     Springfield MA         102.678
22     Syracuse NY             82.372
23     Washington DC          155.176

(The Western cities are numbered 1-12 for the personalization below.)
These are available on the website in Minitab. Minitab reports the following sample statistics.

Variable     n     Mean      Median    StDev
x1          23    122.50    112.75     37.20
x2          12    150.10    130.50     44.50
You may use the statistics given for x1, but personalize the data for Western cities as follows: Use the
fourth digit of your student number to pick the first city to be eliminated and then eliminate the third city
after that. (You may, if you wish, drop the last two digits of the prices in the Western Cities.) For example,
Seymour Butz’s student number is 976500, so he will use the number 5 to eliminate cities 5 (Riverside) and
8 (San Francisco). If the fourth digit of your student number is zero, eliminate cities 10 and 1. You will thus
have only 10 cities in your second sample.
a. Compute a (mean and) standard deviation for your personalized second sample. Show your work! (2)
b. Test to see if there is a significant difference between the mean home prices in the eastern and western US. You may
assume that the samples come from Normal populations with equal variances, though there are 2 points extra credit if you do not
assume equal variances. You may use a test ratio, critical value or a confidence interval (4 points) or all three of these (6 points –
assuming that you get the same conclusion for all of them) .
c. Test the variances to find out if you were or would have been justified to assume equality of variances. Were you? (2)
d. (Extra Credit)Use a Lilliefors test to see if the Western data is Normally distributed. (2)
e. (Extra Credit) Assume that the data is not normally distributed and test to see if there is a significant difference between
the medians. (3)
Solution: Because there is no way that I will find time to do the individual solutions, you will have to make
do with the original data.
a. Compute a (mean and) standard deviation for your personalized second sample. Show your work! (2)

Row      x2          x2²
1        137.171     18815.9
2        107.627     11583.6
3        204.862     41968.4
4        123.605     15278.2
5        123.836     15335.4
6        120.232     14455.7
7        172.601     29791.1
8        220.067     48429.5
9        224.828     50547.6
10       147.854     21860.8
11        98.440      9690.4
12       119.884     14372.2
Total   1801.0      292129

From the formula sheet (Table 20), x̄2 = Σ x2 / n2 = 1801/12 = 150.08 and
s2² = (Σ x2² - n2 x̄2²)/(n2 - 1) = [292129 - 12(150.08)²]/11 = 1985.5384, so s2 = √1985.5384 = 44.559.
b. Test to see if there is a significant difference between the mean home prices in the eastern and western US. You may assume that the samples come from Normal populations with equal variances, though there are 2 points extra credit if you do not assume equal variances. You may use a test ratio, critical value or a confidence interval (4 points) or all three of these (6 points - assuming that you get the same conclusion for all of them).
From the syllabus supplement, the hypotheses are as on the table: H0: μ1 = μ2, H1: μ1 ≠ μ2, i.e., H0: D = 0 and H1: D ≠ 0, where D = μ1 - μ2.

Difference between two means, σ unknown, variances assumed equal (Method D2):
  Confidence interval: D = d̄ ± t(α/2) s_d̄
  Hypotheses: H0: D = D0, H1: D ≠ D0
  Test ratio: t = (d̄ - D0)/s_d̄
  Critical value: d̄_cv = D0 ± t(α/2) s_d̄
  where s_d̄ = ŝ_p √(1/n1 + 1/n2), ŝ_p² = [(n1 - 1)s1² + (n2 - 1)s2²]/(n1 + n2 - 2) and DF = n1 + n2 - 2.

Difference between two means, σ unknown, variances assumed unequal (Method D3):
  Confidence interval: D = d̄ ± t(α/2) s_d̄
  Hypotheses: H0: D = D0, H1: D ≠ D0
  Test ratio: t = (d̄ - D0)/s_d̄
  Critical value: d̄_cv = D0 ± t(α/2) s_d̄
  where s_d̄ = √(s1²/n1 + s2²/n2) and
  DF = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)].

I will use the Minitab sample statistics reported above.
d  x1  x 2  122.5 150.1  27.6 .
n  1  n2  1

s p2  1
n1  n 2  2
s12
s 22

If we assume that the variances are equal
221383 .84   111980 .25 
 1582 .64 , so that
33
 1
1 
1 
1
  1
 1
  200.698  14.1668 .
  1582 .64    200 .698 and s d  s p2  
s d2  sˆ 2p  
 23 12 
 n1 n 2 
 n1 n 2 
d  D0
 27 .6

 1.948 df  n1  n2  2  23  12  2  33 Make a diagram: Show an almost
sd
14 .1668
Normal curve with a center at zero and critical values at t 33  2.035 and  t 33  2.035 . Since the
t
.025
.025
computed value of t is between these, do not reject the null hypothesis.
If we do not assume equal variances, use the following worksheet.
s1²/n1 = 1383.84/23 = 60.1670 and s2²/n2 = 1980.25/12 = 165.021.
s_d̄² = s1²/n1 + s2²/n2 = 60.1670 + 165.021 = 225.188, so s_d̄ = √225.188 = 15.0063.
(s1²/n1)²/(n1 - 1) = (60.1670)²/22 = 164.548 and (s2²/n2)²/(n2 - 1) = (165.021)²/11 = 2475.63.
df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)] = (225.188)²/(164.548 + 2475.63) = 19.2069.
Round this down and use 19 degrees of freedom. t = (d̄ - D0)/s_d̄ = -27.6/15.0063 = -1.839. Make a diagram: Show an almost Normal curve with a center at zero and critical values at t.025(19) = 2.093 and -t.025(19) = -2.093. Since the computed value of t is between these, do not reject the null hypothesis.
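Both versions of part b) can be checked by machine from the Minitab summary statistics alone. This sketch (not part of the original solution) computes the pooled (Method D2) and unpooled (Method D3, Satterthwaite degrees of freedom) test ratios:

```python
# Sketch: pooled and unpooled two-sample t from summary statistics.
from math import sqrt

n1, xbar1, s1 = 23, 122.50, 37.20      # East
n2, xbar2, s2 = 12, 150.10, 44.50      # West
d = xbar1 - xbar2                      # -27.6

# Method D2: variances assumed equal (pooled)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_pooled = d / sqrt(sp2 * (1 / n1 + 1 / n2))             # df = n1 + n2 - 2 = 33

# Method D3: variances not assumed equal (Satterthwaite df)
v1, v2 = s1**2 / n1, s2**2 / n2
t_unpooled = d / sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(round(t_pooled, 3), round(t_unpooled, 3), round(df, 1))
```

Both t values sit inside their two-sided 5% critical values (±2.035 with 33 df, ±2.093 with 19 df), so neither version rejects.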
c. Test the variances to find out if you were or would have been justified to assume equality of variances. Were you? (2) From Table 3 we have the following.

Ratio of variances (DF1 = n1 - 1, DF2 = n2 - 1):
  Hypotheses: H0: σ1² = σ2², H1: σ1² ≠ σ2²
  Test ratios: F(DF1, DF2) = s1²/s2² and F(DF2, DF1) = s2²/s1²
  Confidence interval: (s1²/s2²)(1/F(α/2)(DF1, DF2)) ≤ σ1²/σ2² ≤ (s1²/s2²) F(α/2)(DF2, DF1)

For a two-sided test, compare F(11, 22) = s2²/s1² = 1980.25/1383.84 = 1.431 to F.025(11, 22) = 2.65. We should also compare F(22, 11) = s1²/s2² against F.025(22, 11), but that test ratio is below 1 and cannot possibly be above the critical value. Since both ratios are below their critical values, we cannot reject the null hypothesis.
d. (Extra credit) Use a Lilliefors test to see if the Western data is Normally distributed. (2)
This is a decidedly machine-aided solution. The computer got a mean of 150.084 and a standard deviation of 44.5448. Note that the data has been put in order. For the first row
z = (x - x̄)/s = (98.440 - 150.084)/44.5448 = -1.15937. The computation of Fo is not shown, but comes from the fact that each of the numbers is 1/12 of the data, so that the observed cumulative distribution consists of 1/12, 2/12, 3/12, etc.
Fe = P(z ≤ -1.15937) = .5 - P(-1.15937 ≤ z ≤ 0) ≈ .5 - P(-1.16 ≤ z ≤ 0) = .5 - .3770 = .1230. This value was found from the Normal table, so that it is less accurate than the value below.

Row     x          z           Fo         Fe          D
1        98.440    -1.15937    0.08333    0.123153    0.039819
2       107.627    -0.95313    0.16667    0.170262    0.003596
3       119.884    -0.67797    0.25000    0.248896    0.001104
4       120.232    -0.67016    0.33333    0.251379    0.081954
5       123.605    -0.59443    0.41667    0.276111    0.140556
6       123.836    -0.58925    0.50000    0.277848    0.222152
7       137.171    -0.28989    0.58333    0.385952    0.197382
8       147.854    -0.05006    0.66667    0.480037    0.186629
9       172.601     0.50549    0.75000    0.693394    0.056606
10      204.862     1.22973    0.83333    0.890601    0.057268
11      220.067     1.57107    0.91667    0.941917    0.025250
12      224.828     1.67795    1.00000    0.953322    0.046678

The maximum difference is .222152. Compare this to the 5% value for n = 12 on the Lilliefors table, .242. Since the maximum observed difference is below the critical value, do not reject the null hypothesis.
e. (Extra credit) Assume that the data is not normally distributed and test to see if there is a significant difference between the medians. (3)
If we use a Wilcoxon-Mann-Whitney rank test, we get the following.

Row     x1         r1     x2         r2
1       108.607    13     137.171    23
2        85.250     5     107.627    12
3       112.747    14     204.862    33
4       195.232    32     123.605    19
5       180.865    30     123.836    20
6        83.122     3     120.232    18
7        92.840     7     172.601    29
8       104.433    11     220.067    34
9        97.638     8     224.828    35
10       88.355     6     147.854    25
11       79.846     1      98.440     9
12      129.130    21     119.884    17
13      169.540    27     Total     274
14      137.859    24
15      134.856    22
16      170.830    28
17      187.128    31
18      114.553    15
19      119.355    16
20       85.043     4
21      102.678    10
22       82.372     2
23      155.176    26
Total             356

According to the outline, for values of n1 and n2 that are too large for the tables, W has the Normal distribution with mean μ_W = (1/2) n1 (n1 + n2 + 1) and variance σ²_W = (1/6) n2 μ_W. If the significance level is 5% and the test is two-sided, we reject our null hypothesis if z = (W - μ_W)/σ_W does not lie between ±1.960.
So n1 = 12 and n2 = 23. W = 274 is the smaller of the rank sums.
μ_W = (1/2)(12)(12 + 23 + 1) = 6(36) = 216 and σ²_W = (1/6)(23)(216) = 828.
We have z = (W - μ_W)/σ_W = (274 - 216)/√828 = 2.015. Since this is not between the critical values, reject the null hypothesis of equal medians.
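Part e) can be checked by machine as well (a sketch, not part of the original solution), applying the outline's formulas with the smaller sample (the 12 Western cities) treated as sample 1:

```python
# Sketch: Wilcoxon-Mann-Whitney normal approximation for part e),
# with mu_W = n1(n1 + n2 + 1)/2 and var_W = n2 * mu_W / 6 as in the outline.
from math import sqrt

n1, n2 = 12, 23                  # n1 is the smaller sample (West)
W = 274                          # smaller rank sum (Western cities)

mu_W = 0.5 * n1 * (n1 + n2 + 1)  # 216
var_W = n2 * mu_W / 6            # 828
z = (W - mu_W) / sqrt(var_W)

print(round(z, 3))               # outside +/-1.960, so reject equal medians
```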