251y0815 6/11/08
ECO251 QBA1
FIRST EXAM
June 11, 2008
Name: ___KEY_________________
Student Number: _________________________
Class Hour: _____________________
Remember – Neatness, or at least legibility, counts. In most non-multiple-choice questions an answer
needs a calculation or short explanation to count. Show your work! The exam is normed on 50 points,
so that any grade above 48 is an A+ and grades wrap around.
Part I. (7 points)
The following numbers are to be considered a sample of prices of gasoline taken at five gas stations.
4.14 4.19 4.25 4.01 3.99
x1
Compute the following: Show your work!
a) The Median (1)
b) The Standard Deviation (3)
c) The 3rd quintile (2)
d) The Coefficient of variation (1)
Extra credit beyond this point!
e) The harmonic mean (1.5)
f) The root-mean square (1.5)
g) The geometric mean (3 ways!) (3)
Solution: My calculations are below. $\log x$ is a logarithm to the base 10 and $\ln x$ is a natural logarithm.

Row   x       x in order   x^2       1/x        log x      ln x
1     4.14    3.99         17.1396   0.241546   0.617000   1.42070
2     4.19    4.01         17.5561   0.238663   0.622214   1.43270
3     4.25    4.14         18.0625   0.235294   0.628389   1.44692
4     4.01    4.19         16.0801   0.249377   0.603144   1.38879
5     3.99    4.25         15.9201   0.250627   0.600973   1.38379
Sum   20.58                84.7584   1.21551    3.07172    7.07290

a) The Median (1): the median is the middle number when the data is in order – 4.14
b) The Standard Deviation (3): We have $\sum x = 20.58$, $\sum x^2 = 84.7584$ and $\bar{x} = \frac{\sum x}{n} = \frac{20.58}{5} = 4.116$, so
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{84.7584 - 5(4.116)^2}{4} = \frac{0.05112}{4} = 0.01278$. So that $s = \sqrt{0.01278} = 0.113049$.
If you chose to annoy me by using the definitional formula, you should have gotten the following.

Row   x       $x-\bar{x}$   $(x-\bar{x})^2$
1     4.14     0.024        0.000576
2     4.19     0.074        0.005476
3     4.25     0.134        0.017956
4     4.01    -0.106        0.011236
5     3.99    -0.126        0.015876
Sum   20.58    0.000        0.05112

You would have $\sum x = 20.58$, $\bar{x} = \frac{\sum x}{n} = \frac{20.58}{5} = 4.116$ and $\sum (x-\bar{x})^2 = 0.05112$, so
$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{0.05112}{4} = 0.01278$. So that $s = \sqrt{0.01278} = 0.113049$.
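As a quick check on part b), here is a minimal Python sketch that computes the sample variance both ways. The variable names are illustrative only and are not part of the exam; the two formulas are the computational and definitional ones quoted above.

```python
import math

prices = [4.14, 4.19, 4.25, 4.01, 3.99]
n = len(prices)
mean = sum(prices) / n

# Computational formula: s^2 = (sum of x^2 - n * mean^2) / (n - 1)
sum_sq = sum(x * x for x in prices)
var_computational = (sum_sq - n * mean ** 2) / (n - 1)

# Definitional formula: s^2 = sum of (x - mean)^2 / (n - 1)
var_definitional = sum((x - mean) ** 2 for x in prices) / (n - 1)

s = math.sqrt(var_definitional)
print(round(mean, 3), round(var_computational, 5), round(var_definitional, 5), round(s, 6))
```

Both variance figures should agree up to floating-point rounding, which is the point of the comparison.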
c) The 3rd quintile (2): The third quintile has $\frac{3}{5} = .60$ of the data below it. The data must be in order: 3.99, 4.01, 4.14, 4.19, 4.25.
$\text{location} = p(n+1) = .60(6) = 3.60 = a.b$. $x_{1-p} = x_{.40} = x_3 + 0.60(x_4 - x_3) = 4.14 + 0.60(4.19 - 4.14) = 4.17$
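The location rule used for the quintile can be sketched as a small Python function. The function name `quantile_by_location` is hypothetical, and clamping to the smallest or largest value when the location falls outside 1..n is an assumption of this sketch rather than something stated in the exam.

```python
def quantile_by_location(data, p):
    """Value with fraction p of the data below it, using location = p * (n + 1)."""
    xs = sorted(data)              # the data must be in order
    n = len(xs)
    location = p * (n + 1)
    a = int(location)              # integer part of the location
    b = location - a               # fractional part of the location
    if a < 1:                      # assumption: clamp below the first value
        return xs[0]
    if a >= n:                     # assumption: clamp above the last value
        return xs[-1]
    # interpolate between the a-th and (a+1)-th ordered values (1-based)
    return xs[a - 1] + b * (xs[a] - xs[a - 1])

prices = [4.14, 4.19, 4.25, 4.01, 3.99]
third_quintile = quantile_by_location(prices, 0.60)
median = quantile_by_location(prices, 0.50)
print(round(third_quintile, 2), round(median, 2))
```

With p = 0.50 the same function returns the middle value, which matches the median found in part a).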
d) The Coefficient of variation (1): $C = \frac{s}{\bar{x}} = \frac{0.113049}{4.116} = .0275$ or 2.75%
Extra credit beyond this point!
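e) The harmonic mean (1.5): The formula table says $\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x}$ or
$\frac{1}{x_h} = \frac{1}{5}\left(\frac{1}{4.14} + \frac{1}{4.19} + \frac{1}{4.25} + \frac{1}{4.01} + \frac{1}{3.99}\right) = \frac{1}{5}(1.21551) = 0.243102$. So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.243102} = 4.1135$.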
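Of course some of you decided that
$\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{5}\left(\frac{1}{4.14} + \frac{1}{4.19} + \frac{1}{4.25} + \frac{1}{4.01} + \frac{1}{3.99}\right) \overset{?}{=} \frac{1}{5}\left(\frac{1}{4.14 + 4.19 + 4.25 + 4.01 + 3.99}\right) = \frac{1}{5}\left(\frac{1}{20.58}\right) = 0.009718$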
This is, of course, an easier way to do the problem. It is also wrong and unreasonable (since it is not
between 3.99 and 4.25), and you will get an A for the course if you can prove to me that it is not
wrong! And please don’t try any math if you get on “Are you smarter than a fifth-grader.”
f) The root-mean square (1.5): The formula table says $x_{rms} = \sqrt{\frac{1}{n}\sum x^2}$ or $x_{rms}^2 = \frac{1}{n}\sum x^2$.
$x_{rms}^2 = \frac{84.7584}{5} = 16.95168$. So $x_{rms} = \sqrt{16.95168} = 4.11724$
g) The geometric mean (3): The formula table says $x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x}$. So
$x_g = \sqrt[5]{(4.14)(4.19)(4.25)(4.01)(3.99)} = \sqrt[5]{1179.561428} = (1179.561428)^{0.2} = 4.11476$
The formula table also says $\ln(x_g) = \frac{1}{n}\sum \ln(x)$, but I said in class that this could be either natural logs or logs to the base 10. If we use logarithms to the base 10 we get $\log(x_g) = \frac{1}{n}\sum \log(x) = \frac{3.07172}{5} = 0.614344$ and $x_g = 10^{0.614344} = 4.11476$. If we use natural logarithms we get $\ln(x_g) = \frac{1}{n}\sum \ln(x) = \frac{7.07290}{5} = 1.41458$ and $x_g = e^{1.41458} = 4.11476$
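The three extra-credit means can be checked with a short Python sketch. It assumes Python 3.8 or later for `math.prod`; the variable names are illustrative only.

```python
import math

prices = [4.14, 4.19, 4.25, 4.01, 3.99]
n = len(prices)

# Harmonic mean: reciprocal of the mean of the reciprocals
harmonic = n / sum(1 / x for x in prices)

# Root-mean square: square root of the mean of the squares
rms = math.sqrt(sum(x * x for x in prices) / n)

# Geometric mean, three ways: n-th root of the product, base-10 logs, natural logs
geo_root = math.prod(prices) ** (1 / n)
geo_log10 = 10 ** (sum(math.log10(x) for x in prices) / n)
geo_ln = math.exp(sum(math.log(x) for x in prices) / n)

print(round(harmonic, 4), round(rms, 5), round(geo_root, 5))
```

For positive data that are not all equal, the harmonic mean is below the geometric mean, which is below the arithmetic mean, which is below the root-mean square. Checking that ordering is a cheap way to catch an unreasonable answer.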
Part II. (18 points) According to Anderson, Sweeney and Williams, a bank found the following as a sample of 30 waiting times (in seconds) for service.

Row   Time         Frequency
1     60-119.99    6
2     120-179.99   10
3     180-239.99   8
4     240-299.99   4
5     300-359.99   2

a. Calculate the Cumulative Frequency (1)
b. Calculate the Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and interpret it (3)
i. Make a frequency polygon of the data (Neatness Counts!) (2)
Solution: Note that unreasonable answers are answers where the mean, median, mode, first quartile
and third quartile do not fall between 60 and 360.
If we use the computational method, we get the following. $x$ is the midpoint of the class.

Row   Class        f    F    x     fx     fx^2      fx^3
1     60-119.99    6    6    90    540    48600     4374000
2     120-179.99   10   16   150   1500   225000    33750000
3     180-239.99   8    24   210   1680   352800    74088000
4     240-299.99   4    28   270   1080   291600    78732000
5     300-359.99   2    30   330   660    217800    71874000
Sum                30              5460   1135800   262818000

If we use the definitional method, we get the following. $x$ is the midpoint of the class. I usually tell people that they are wasting their time if they use the definitional method. Because of the large numbers here that may not be true.

Row   Class        f    F    x     fx     $x-\bar{x}$   $f(x-\bar{x})$   $f(x-\bar{x})^2$   $f(x-\bar{x})^3$
1     60-119.99    6    6    90    540    -92           -552             50784              -4672128
2     120-179.99   10   16   150   1500   -32           -320             10240              -327680
3     180-239.99   8    24   210   1680    28            224             6272               175616
4     240-299.99   4    28   270   1080    88            352             30976              2725888
5     300-359.99   2    30   330   660     148           296             43808              6483584
Sum                30              5460                  0               142080             4385280

If you used the computational method, you would have gotten $n = \sum f = 30$ and $\sum fx = 5460$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{5460}{30} = 182.0000$. You would also find $\sum fx^2 = 1135800$ and $\sum fx^3 = 262818000$.
If you used the definitional method, you would have gotten $n = \sum f = 30$ and $\sum fx = 5460$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{5460}{30} = 182.0000$. You would have followed by getting $\sum f(x-\bar{x}) = 0$ (a check), $\sum f(x-\bar{x})^2 = 142080$ and $\sum f(x-\bar{x})^3 = 4385280$.
If you used one of Pearson’s measures of skewness, you would not have bothered with the $f(x-\bar{x})^3$ or the $fx^3$ columns.
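The column totals in both tables can be reproduced with a few lines of Python. This is only a sketch: the list names are illustrative, and the class midpoints are taken from the x column above.

```python
midpoints = [90, 150, 210, 270, 330]   # class midpoints x
freqs = [6, 10, 8, 4, 2]               # class frequencies f

n = sum(freqs)

# Cumulative frequency F: running total of f
cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)

# Computational-method columns
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
sum_fx3 = sum(f * x ** 3 for f, x in zip(freqs, midpoints))
mean = sum_fx / n

# Definitional-method columns (deviations from the mean)
sum_dev = sum(f * (x - mean) for f, x in zip(freqs, midpoints))
sum_dev2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
sum_dev3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))

print(n, cumulative, sum_fx, sum_fx2, sum_fx3, mean, sum_dev, sum_dev2, sum_dev3)
```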
a. Calculate the Cumulative Frequency (1) See the F column above.
b. Calculate the Mean (1): We have already found $\bar{x} = 182.0000$
Remember $n = \sum f = 30$, $\sum fx = 5460$, $\bar{x} = \frac{\sum fx}{n} = \frac{5460}{30} = 182.0000$, $\sum fx^2 = 1135800$, $\sum fx^3 = 262818000$, $\sum f(x-\bar{x}) = 0$, $\sum f(x-\bar{x})^2 = 142080$ and $\sum f(x-\bar{x})^3 = 4385280$.
c. Calculate the Median (2): $\text{position} = p(n+1) = .5(31) = 15.5$. This is above $F = 6$ and below $F = 16$, so the interval is the 2nd, 120 to 180, which has a frequency of 10. Each interval width is 180 – 120 = 60.
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ so $x_{1-.5} = x_{.5} = 120 + \left[\frac{.5(30) - 6}{10}\right] 60 = 120 + 0.9(60) = 174$. Check: this is between 120 and 180.
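The grouped-data interpolation used for the median can be sketched as a Python function. The function name and argument layout are hypothetical; the rule it follows is the one above: locate the class with position $p(n+1)$, then interpolate with $L_p + \frac{pN - F}{f_p} w$, where $F$ is the cumulative frequency below the class.

```python
def grouped_quantile(lower_limits, width, freqs, p):
    """Quantile of grouped data with fraction p of the observations below it."""
    n = sum(freqs)
    position = p * (n + 1)          # used only to pick the class
    cum_before = 0                  # F: cumulative frequency below the class
    for lower, f in zip(lower_limits, freqs):
        if position <= cum_before + f:
            return lower + (p * n - cum_before) / f * width
        cum_before += f
    raise ValueError("p must be between 0 and 1")

lower_limits = [60, 120, 180, 240, 300]
freqs = [6, 10, 8, 4, 2]

median = grouped_quantile(lower_limits, 60, freqs, 0.50)
q1 = grouped_quantile(lower_limits, 60, freqs, 0.25)
q3 = grouped_quantile(lower_limits, 60, freqs, 0.75)
print(median, q1, q3, q3 - q1)
```

The same function gives the two quartiles asked for in part g, so it doubles as a check on the interquartile range.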
d. Calculate the Mode (1): The largest group is 120 to 180, which has a frequency of 10, so by convention the mode is its midpoint, which is $mo = 150$.
e. Calculate the Variance (3): We have $\sum fx^2 = 1135800$ and $\bar{x} = 182.0000$ or $\sum f(x-\bar{x})^2 = 142080$.
$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{1135800 - 30(182.0000)^2}{29} = \frac{142080}{29} = 4899.3103$ or
$s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{142080}{29} = 4899.3103$.
f. Calculate the Standard Deviation (2): $s = \sqrt{4899.3103} = 69.9951$.
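A minimal Python sketch of parts e and f, assuming the class midpoints and frequencies from the tables above (the names are illustrative):

```python
import math

midpoints = [90, 150, 210, 270, 330]
freqs = [6, 10, 8, 4, 2]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Computational form: (sum of f*x^2 - n * mean^2) / (n - 1)
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
var_computational = (sum_fx2 - n * mean ** 2) / (n - 1)

# Definitional form: sum of f*(x - mean)^2 / (n - 1)
var_definitional = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)

s = math.sqrt(var_definitional)
print(round(var_computational, 4), round(var_definitional, 4), round(s, 4))
```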
g. Calculate the Interquartile Range (3): Note that to be reasonable, $Q_1 \le x_{.50} \le Q_3$.
First Quartile: $\text{position} = p(n+1) = .25(31) = 7.75$. This is above $F = 6$ and below $F = 16$, so the interval is the 2nd, 120 to 180, which has a frequency of 10. Each interval width is 180 – 120 = 60.
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ gives us $Q_1 = x_{1-.25} = x_{.75} = 120 + \left[\frac{.25(30) - 6}{10}\right] 60 = 120 + 0.15(60) = 129.00$.
Third Quartile: $\text{position} = p(n+1) = .75(31) = 23.25$. This is above $F = 16$ and below $F = 24$, so the interval is the 3rd, 180 to 240, which has a frequency of 8.
$Q_3 = x_{1-.75} = x_{.25} = 180 + \left[\frac{.75(30) - 16}{8}\right] 60 = 180 + 0.8125(60) = 228.75$. So $IQR = Q_3 - Q_1 = 228.75 - 129.00 = 99.75$.
h. Calculate a Statistic showing Skewness and interpret it (3): Remember $n = \sum f = 30$, $\sum fx^2 = 1135800$, $\sum fx^3 = 262818000$, $\bar{x} = 182.0000$, $x_{.5} = 174$, $mo = 150$, $s = \sqrt{4899.3103} = 69.9951$, $\sum f(x-\bar{x}) = 0$, and $\sum f(x-\bar{x})^3 = 4385280$.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{30}{(29)(28)}\left[262818000 - 3(182.000)(1135800) + 2(30)(182.000)^3\right]$
$= 0.0369458\,[262818000 - 620146800 + 361714080] = 0.0369458(4385280) = 162017.734$
or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x-\bar{x})^3 = \frac{30}{(29)(28)}(4385280) = 162017.734$.
or $g_1 = \frac{k_3}{s^3} = \frac{162017.734}{69.9951^3} = 0.4725$
or Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}} = \frac{182 - 150}{69.9951} = 0.4572$
or $SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}} = \frac{3(182 - 174)}{69.9951} = 0.3429$
Because of the positive sign, the measures all imply (slight) skewness to the right.
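The four skewness measures can be checked with the Python sketch below. The median (174) and mode (150) are taken from parts c and d rather than recomputed, and the variable names are illustrative.

```python
import math

midpoints = [90, 150, 210, 270, 330]
freqs = [6, 10, 8, 4, 2]
median, mode = 174.0, 150.0          # taken from parts c and d

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1))

factor = n / ((n - 1) * (n - 2))

# k3 from the definitional column f*(x - mean)^3
k3_definitional = factor * sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))

# k3 from the computational sums of f*x^2 and f*x^3
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
sum_fx3 = sum(f * x ** 3 for f, x in zip(freqs, midpoints))
k3_computational = factor * (sum_fx3 - 3 * mean * sum_fx2 + 2 * n * mean ** 3)

g1 = k3_definitional / s ** 3        # relative skewness
sk1 = (mean - mode) / s              # Pearson's first measure
sk2 = 3 * (mean - median) / s        # Pearson's second measure

print(round(k3_definitional, 3), round(g1, 4), round(sk1, 4), round(sk2, 4))
```

All four numbers come out positive, which is what the interpretation (skewness to the right) rests on.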
i. Make a frequency polygon of the data (Neatness Counts!) (2)

Row   Class        f    F    x     fx     fx^2     fx^3
0     0-60         0    0    30
1     60-119.99    6    6    90    540    48600    4374000
2     120-179.99   10   16   150   1500   225000   33750000
3     180-239.99   8    24   210   1680   352800   74088000
4     240-299.99   4    28   270   1080   291600   78732000
5     300-359.99   2    30   330   660    217800   71874000
6     360-419.99   0    30   390

The seven points on your graph should be (30, 0), (90, 6), (150, 10), (210, 8), (270, 4), (330, 2) and (390, 0).
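The seven plotting points can be generated from the midpoints and frequencies. This is a small sketch with illustrative names; it adds one empty class on each side so that the polygon closes on the horizontal axis.

```python
midpoints = [90, 150, 210, 270, 330]
freqs = [6, 10, 8, 4, 2]
width = 60                                   # class width

# Pad with an empty class on each side so the polygon returns to zero
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freqs))
    + [(midpoints[-1] + width, 0)]
)
print(points)
```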
Part III. Multiple choice (12 points). Note: If you say ‘None of the above,’ you should supply a correct
answer to get full credit.
1. If a distribution is skewed to the right, the following must be true. (Hint: making a diagram first is a good
way to prevent errors.)
a. Mean < median < mode
b. Median < mean < mode
c. *Mode < median < mean
d. Mode < mean < median
e. Mean = median = mode
f. None of the above.
2. If I have a population described as grouped data and I am using definitional formulas.
f x     0
a.

b.  f x     0
c.  f x     1
d. *None of the above.
Solution: For the same reason that $\sum f(x-\bar{x}) = 0$ in Part II, this sum is zero. To do the mathematics,
$\sum f(x-\bar{x}) = \sum fx - \sum f\bar{x} = \sum fx - \bar{x}\sum f = \sum fx - n\bar{x} = \sum fx - n\frac{\sum fx}{n} = \sum fx - \sum fx = 0$
3. Which of the following does not describe a population?
a. *$\bar{x}$
b. 
c. The coefficient of variation
d. 
e. Pearson’s coefficient of skewness.
f. 
g. All of the above describe a population.
4. Mark the following items N (nominal), O (ordinal), I (interval) or R (ratio) data. If the data is interval or
ratio data, would it be considered C (continuous) or D (discrete)? (4)
a) Likert Scale - The format of a typical five-level Likert item is: 1) Strongly disagree;
2) Disagree; 3) Neither agree nor disagree; 4) Agree; 5) Strongly agree O
b) Next year’s tuition (in dollars and cents) RC
c) Place of residence N
d) Number of credit cards that you hold RD
5. If you make a graph to represent a data set, the following should be plotted at class midpoints.
a. An ogive
b. *A frequency polygon
c. A Pareto diagram
d. All of the above
e. None of the above
Part IV. (13+ points)
Table 1
Given below is a stem-and-leaf display for the amount of gas purchased at a service station.

 9|147
10|02238
11|125566777
12|223489
13|02

Minitab gives the following information. (SE Mean is the standard deviation divided by the square root of $n$.)

MTB > describe c1
Descriptive Statistics: x
Variable   n    Mean     SE Mean   StDev   Minimum   Q1       Median   Q3       Maximum
x          25   11.372   0.232     1.158   9.100     ……………    …………     …………     13.200

1. In Table 1, what is the median purchase? (2)
Solution: The way that I did these problems is to write out the indices of the numbers as below.

 9|147          $x_1$ to $x_3$
10|02238        $x_4$ to $x_8$
11|125566777    $x_9$ to $x_{17}$
12|223489       $x_{18}$ to $x_{23}$
13|02           $x_{24}$ to $x_{25}$

It may be clearer if I actually write out the numbers and their indices.

Index   Value     Index   Value     Index   Value
1       9.1       10      11.2      19      12.2
2       9.4       11      11.5      20      12.3
3       9.7       12      11.5      21      12.4
4       10.0      13      11.6      22      12.8
5       10.2      14      11.6      23      12.9
6       10.2      15      11.7      24      13.0
7       10.3      16      11.7      25      13.2
8       10.8      17      11.7
9       11.1      18      12.2

$\text{location} = p(n+1) = 0.5(26) = 13$.
$\text{median} = x_{13} = 11.6$.
2. Create a 5-number summary from Table 1. (4)
Solution: To find the first quartile, we write $\text{location} = p(n+1) = 0.25(26) = 6.5$. This implies that the first quartile is $x_6 + 0.5(x_7 - x_6) = 10.2 + 0.5(10.3 - 10.2) = 10.25$. For the third quartile, $\text{location} = p(n+1) = 0.75(26) = 19.5$. This implies that the third quartile is $x_{19} + 0.5(x_{20} - x_{19}) = 12.2 + 0.5(12.3 - 12.2) = 12.25$. The five-number summary is thus (9.1, 10.25, 11.6, 12.25, 13.2).
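The five-number summary for Table 1 can be reproduced by expanding the stem-and-leaf display and applying the same $p(n+1)$ location rule. This is a sketch only: the dictionary layout and the helper name `value_at_location` are assumptions made for illustration.

```python
# Stems are whole units, leaves are tenths
stem_and_leaf = {9: "147", 10: "02238", 11: "125566777", 12: "223489", 13: "02"}

values = sorted(
    stem + int(leaf) / 10
    for stem, leaves in stem_and_leaf.items()
    for leaf in leaves
)
n = len(values)

def value_at_location(xs, location):
    """Interpolate between ordered values at a 1-based, possibly fractional, location."""
    a = int(location)
    b = location - a
    if b == 0:
        return xs[a - 1]
    return xs[a - 1] + b * (xs[a] - xs[a - 1])

five_number_summary = (
    values[0],
    value_at_location(values, 0.25 * (n + 1)),
    value_at_location(values, 0.50 * (n + 1)),
    value_at_location(values, 0.75 * (n + 1)),
    values[-1],
)
print([round(v, 2) for v in five_number_summary])
```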
3. In Table 1, assume that you were asked to present the data in 4 classes. Using the method you learned in class, show how you would decide what class interval to use and list the classes below with their frequencies. (4) [10]

    Class                  Frequency
A   ___ to under ___       ___
B   ___ to under ___       ___
C   ___ to under ___       ___
D   ___ to under ___       ___

Solution: The highest number is 13.2 and the lowest is 9.1. We calculate $\frac{13.2 - 9.1}{4} = 1.025$. Use 1.2 or 1.25.

If we use 1.25, we might get the following.

    Class                      Frequency
A   8.75 to under 10.00        3
B   10.00 to under 11.25       7
C   11.25 to under 12.50       11
D   12.50 to under 13.75       4
    Total                      25

You could also start at 8.50.

If we use 1.2, we might get the following.

    Class                      Frequency
A   9.0 to under 10.2          4
B   10.2 to under 11.4         6
C   11.4 to under 12.6         11
D   12.6 to under 13.8         4
    Total                      25

1.5 would work if you start at 8.
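The class counts for either interval can be verified with a short Python sketch. It uses `decimal.Decimal` so that values sitting exactly on a class boundary (10.0 or 10.2, for example) are not pushed into the wrong class by binary floating-point rounding; the function name `class_frequencies` is hypothetical.

```python
from decimal import Decimal

raw = ("9.1 9.4 9.7 10.0 10.2 10.2 10.3 10.8 11.1 11.2 11.5 11.5 11.6 "
       "11.6 11.7 11.7 11.7 12.2 12.2 12.3 12.4 12.8 12.9 13.0 13.2")
values = [Decimal(v) for v in raw.split()]

# Minimum workable interval: (highest - lowest) / number of classes
min_interval = (max(values) - min(values)) / 4

def class_frequencies(data, start, width, k):
    """Count observations in k classes of the form 'lower to under upper'."""
    start, width = Decimal(start), Decimal(width)
    counts = [0] * k
    for x in data:
        idx = int((x - start) // width)
        if not 0 <= idx < k:
            raise ValueError(f"{x} falls outside the classes")
        counts[idx] += 1
    return counts

print(min_interval)
print(class_frequencies(values, "8.75", "1.25", 4))
print(class_frequencies(values, "9.0", "1.2", 4))
```

If the chosen start and interval do not cover the data, the function raises an error instead of silently dropping observations.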
4. In Table 1, according to the Tchebyschev inequality, what is the minimum number of observations that
should be between 9.056 and 13.688? What would the empirical rule say? Should the empirical rule apply
here? Why? (4)
Variable   n    Mean     SE Mean   StDev   Minimum   Q1       Median   Q3       Maximum
x          25   11.372   0.232     1.158   9.100     ……………    …………     …………
The mean is 11.372 and 11.372 – (2)1.158 = 9.056, 11.372 + (2)1.158 = 13.688. These points are thus 2 standard deviations from the mean. The inequality says that at most $\frac{1}{2^2} = \frac{1}{4}$ of the points should be below 9.056 or above 13.688. So at least 75% of the points should be between these numbers. Since 75% of 25 is 18.75, this is at least 19 points. The empirical rule says that about 95% of the observations should be between these two points. This is about 24 points. The stem-and-leaf diagram shows that the data is roughly symmetrical, so we would expect this to be almost true. Actually all the 25 points are in the interval. As usual the inequality gives us an underestimate of the number of the points in the interval.
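A minimal Python sketch of this comparison, using the mean and standard deviation reported by Minitab and the 25 values read off the stem-and-leaf display (variable names are illustrative):

```python
import math

values = [9.1, 9.4, 9.7, 10.0, 10.2, 10.2, 10.3, 10.8, 11.1, 11.2, 11.5, 11.5, 11.6,
          11.6, 11.7, 11.7, 11.7, 12.2, 12.2, 12.3, 12.4, 12.8, 12.9, 13.0, 13.2]
n = len(values)
mean, s = 11.372, 1.158          # from the Minitab output
k = 2                            # number of standard deviations

lower, upper = mean - k * s, mean + k * s

# Tchebyschev: at least 1 - 1/k^2 of the observations lie within k standard deviations
chebyshev_fraction = 1 - 1 / k ** 2
chebyshev_min_count = math.ceil(chebyshev_fraction * n)

# Empirical rule: about 95% within 2 standard deviations for roughly bell-shaped data
empirical_count = 0.95 * n

actual_count = sum(lower <= x <= upper for x in values)
print(round(lower, 3), round(upper, 3), chebyshev_min_count, empirical_count, actual_count)
```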
5. Which of the following are not sensitive to extreme values? (Circle all correct answers.) (3)
a. *The mode
b. The mean
c. The variance
d. The coefficient of variation
e. $k_3$, the third k-statistic.