251y0211 10/07/02
Part I.
ECO251 QBA1
FIRST HOUR EXAM
OCTOBER 1, 2002
Name __________________
SECTION MWF 10 11 TR 11 12:30
(10 points)
1. (Lind et al.) A student takes a survey of heights of 500 college women and divides them into 12 classes.
Her first two class midpoints are 62.5" and 65.5". There are 111 women in the first class and 145 women in
the second class. (5)
a. What is the (width of the) class interval w ?
b. What is the lower limit of the third class?
c. What is the relative frequency of the second class?
d. What is the cumulative relative frequency of the second class?
e. If this distribution is skewed to the left, about what percent of the data is above the median?
Solution: a) Since 65.5 - 62.5 = 3, the interval must be 3.
b) The lower limit must be 67. If the class interval is 3, the class must extend by 1.5 on either side
of the midpoint. The layout for the first three classes is given in the table below, with $n = 500$,
$f_{rel} = f/n$ and $F_{rel} = F/n$.

Class      Midpoint   f      f_rel   F      F_rel
61 - 64    62.5       111    .222    111    .222
64 - 67    65.5       145    .290    256    .512
67 - 70    68.5       ?      ?       ?      ?
c) The relative frequency is 145/500 = .290.
d) The cumulative relative frequency is 256/500 = .512, which can also be found as the sum of .222 and .290.
e) 50%. The median is defined as the point with 50% of the data above it and 50% below it, so skewness does not change the answer.
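The arithmetic in parts (b) through (d) can be checked with a short script. This is only a sketch: the function names `frequency_table` and `class_limits` are made up for illustration, and the only inputs are the numbers given in the problem.

```python
def frequency_table(freqs, n):
    """Return (relative frequency, cumulative relative frequency) per class."""
    rows = []
    running = 0
    for f in freqs:
        running += f                      # cumulative frequency F
        rows.append((f / n, running / n))
    return rows


def class_limits(first_mid, second_mid, class_index):
    """Lower and upper limit of a class (index 0 = first class),
    assuming equal class widths inferred from the first two midpoints."""
    width = second_mid - first_mid
    lower = first_mid - width / 2 + class_index * width
    return lower, lower + width


rows = frequency_table([111, 145], 500)
print(rows)                           # first two classes only
print(class_limits(62.5, 65.5, 2))    # limits of the third class
```

The third row of the table cannot be filled in the same way because its frequency is not given.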
2. Indicate whether the following are: Nominal Data, Ordinal Data, Interval Data, Continuous Ratio Data or
Discrete Ratio Data. (3)
a. A recent cartoon suggested that the numbers on pro football jerseys be replaced by the players'
salaries. What kind of data would the numbers be before the change? Ans: Nominal.
b. What kind of data would the numbers usually be considered after the change? Ans: Continuous
ratio.
c. What kind of data is a sports editor's ranking of the top 5 teams? Ans: Ordinal.
3. All my family doctor's patient files are coded as follows: FS (Adult females who currently smoke); FN
(Adult females who do not currently smoke); MS (Adult males who currently smoke) and MN (Adult males
who do not currently smoke). Are these categories mutually exclusive and collectively exhaustive? If you
say 'no' to either characteristic, explain. (2) Ans: The categories are mutually exclusive (for instance, no
person who is coded FS will be in FN, MS or MN), but not collectively exhaustive, because there is no
reason to assume that all the family doctor's patients are adults.
Part II. Compute an appropriate answer, showing your work (15 Points maximum - if you do more than 15
points, only your right answers will be counted.):
1) A sample of pipe outside diameters gives a mean of 14.0 inches and a standard deviation of 0.1 inches.
a) If the median diameter is 14.1 inches and the mode is unknown
(i) What is the maximum fraction of the pipes that could have a diameter below 13.8
inches? (1) Ans: Get a z-score: $z = \frac{x - \bar{x}}{s} = \frac{13.8 - 14}{0.1} = -2$, so $k = |z| = 2$.
We know from the Chebyshef rule that the fraction in the tail below $\mu - k\sigma$ is less than
$\frac{1}{k^2}$. Since $k = 2$, this fraction is below $\frac{1}{4} = 25\%$.
(ii) Between what two diameters must at least 15/16 of the pipe diameters lie? (1) Ans: If
only $\frac{1}{16} = \frac{1}{k^2}$ are outside the interval, we must have $k^2 = 16$ or $k = 4$. Thus the interval
is $\mu \pm k\sigma = 14.0 \pm 4(0.1) = 14.0 \pm 0.4$, or 13.6 to 14.4.
(iii) Is this distribution skewed to the left or the right, or is it symmetrical? (1) Ans: Since
the median is above the mean, it should be skewed to the left.
b) If, instead, the median diameter is 14.0 inches and the mode is also 14.0 inches, between what
two diameters must almost all the pipe diameters lie? (1) Ans: Since $14 \pm 3(0.1) = 14 \pm 0.3$ is 3
standard deviations from the mean, the Empirical Rule (which applies because the mean, median
and mode are equal) says that there will be almost none outside 13.7 to 14.3.
c) What is the coefficient of variation for this sample of pipes? (1) Ans: $C = \frac{s}{\bar{x}} = \frac{0.1}{14} = 0.0071$.
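A minimal sketch of the Chebyshef calculations in problem 1, assuming only the mean (14.0) and standard deviation (0.1) given in the problem. The helper names are hypothetical and not part of the course materials.

```python
import math


def chebyshev_tail_bound(k):
    """Upper bound on the fraction of data more than k standard
    deviations from the mean (the bound is only informative for k > 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)


def chebyshev_interval(mean, sd, inside_fraction):
    """Interval mean +/- k*sd that must hold at least inside_fraction of the data."""
    k = math.sqrt(1.0 / (1.0 - inside_fraction))
    return mean - k * sd, mean + k * sd


mean, sd = 14.0, 0.1
k = abs((13.8 - mean) / sd)                  # |z| for a 13.8 inch pipe
print(chebyshev_tail_bound(k))               # part (i)
print(chebyshev_interval(mean, sd, 15 / 16)) # part (ii)
print(sd / mean)                             # coefficient of variation, part (c)
```

The Empirical Rule interval in part (b) is not a Chebyshef bound, so it is not computed here.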
2) The newest computer in the headquarters of a firm that you are liquidating is three months old and the
oldest is 97 months old.
a) If the (absolute) frequencies are to be presented in a line graph in seven classes, what intervals
would you use? Explain your reasoning using an appropriate formula and use it to fill in the table below. (3)
Ans: $\frac{97 - 3}{7} = 13.42$, so use 14. This is only a suggestion. Any number, like 15, somewhat above
13.42 will work, as long as you cover the range. Two possibilities are shown below. You should
have shown only one.
         First classification    Second classification
Class    From     to             From     to
A        3        16.9           0        14.9
B        17       30.9           15       29.9
C        31       44.9           30       44.9
D        45       58.9           45       59.9
E        59       72.9           60       74.9
F        73       86.9           75       89.9
G        87       100.9          90       104.9
b) What is the name of the type of graph that you are drawing (Is it a histogram?) and what would
the x and y coordinates be of the last point on the line that you draw to represent the frequencies?
(2) Ans: The graph is a frequency polygon and we must create an empty class to end it. For the
first classification above, the interval is 14 and the midpoint of the last class on the table is 94, so
the last point is $x = 108$, $y = 0$. For the second classification above, the interval is 15 and the
midpoint of the last class on the table is 97.5, so the last point is $x = 112.5$, $y = 0$.
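The class-width suggestion and the closing point of the frequency polygon can be sketched as follows. The function names are hypothetical, and rounding the raw width up with `math.ceil` is just one way to pick a number "somewhat above" the raw value; if the raw width were already a whole number, a slightly larger width would still be needed to cover the range.

```python
import math


def suggested_width(low, high, classes):
    """Raw width (range / number of classes) and a rounded-up suggestion."""
    raw = (high - low) / classes
    return raw, math.ceil(raw)


def polygon_last_point(start, width, classes):
    """Midpoint of the empty class added after the last real class,
    where the frequency polygon returns to y = 0."""
    last_mid = start + (classes - 0.5) * width
    return last_mid + width, 0


print(suggested_width(3, 97, 7))      # raw width and the suggestion of 14
print(polygon_last_point(3, 14, 7))   # first classification
print(polygon_last_point(0, 15, 7))   # second classification
```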
3) For the numbers 50, 250, 450 and 650, compute the a) Geometric mean, b) Harmonic mean, c) Root-mean-square
(2 each). Label each clearly. If you wish, d) Compute the geometric mean using natural or
base 10 logarithms. (3 points if you need it here or two points if you need it in the next section. Doing this
is insurance; you cannot get more than 15 points on part II or 25 points on part III
unless you do the extra credit in part III.)
Solution: Note that $\sum x = 1400$. This is not used in any of the following calculations and there is
no reason why you should have computed it!
(a) The Geometric Mean.
$x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[4]{(50)(250)(450)(650)} = \sqrt[4]{3656250000} = (3656250000)^{1/4} = (3656250000)^{0.25} = 245.9002$.
Do any of you really believe that $(3656250000)^{1/4} = \frac{3656250000}{4}$?
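(b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{4}\left(\frac{1}{50} + \frac{1}{250} + \frac{1}{450} + \frac{1}{650}\right) = \frac{1}{4}(0.0200000 + 0.0040000 + 0.0022222 + 0.0015385) = \frac{1}{4}(0.0277607) = 0.00694017$.
So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.00694017} = 144.089$.
As I explained several times, $\frac{1}{n}\sum\frac{1}{x}$ could not possibly be $\frac{1}{4}\left(\frac{1}{1400}\right)$.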
(c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{4}(50^2 + 250^2 + 450^2 + 650^2) = \frac{1}{4}(2500 + 62500 + 202500 + 422500) = \frac{1}{4}(690000) = 172500$.
So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{172500} = 415.331$.
(d) The geometric mean using logarithms.
Using natural logarithms:
$\ln(x_g) = \frac{1}{n}\sum\ln(x) = \frac{1}{4}(\ln 50 + \ln 250 + \ln 450 + \ln 650) = \frac{1}{4}(3.91202 + 5.52146 + 6.10925 + 6.47692) = \frac{1}{4}(22.0197) = 5.50493$.
So $x_g = e^{5.50493} = 245.900$.
Or using logarithms to the base 10:
$\log(x_g) = \frac{1}{n}\sum\log(x) = \frac{1}{4}(\log 50 + \log 250 + \log 450 + \log 650) = \frac{1}{4}(1.69897 + 2.39794 + 2.65321 + 2.81291) = \frac{1}{4}(9.56304) = 2.39076$.
So $x_g = 10^{2.39076} = 245.900$.
Notice that the original numbers and all the means are between 50 and 650.
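The three means can be verified numerically. The sketch below uses only the Python standard library; the function names are chosen for illustration. The geometric mean is computed through logarithms, which is the method of part (d).

```python
import math


def geometric_mean(xs):
    """n-th root of the product, computed as exp of the mean natural log."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the mean of the reciprocals."""
    return len(xs) / sum(1.0 / x for x in xs)


def root_mean_square(xs):
    """Square root of the mean of the squares."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


data = [50, 250, 450, 650]
print(geometric_mean(data), harmonic_mean(data), root_mean_square(data))
```

For positive data that are not all equal, the harmonic mean is below the geometric mean, which is below the arithmetic mean, which is below the root-mean-square. This gives a quick sanity check on the ordering of the answers.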
Part III. Do the following problems (25+ Points)
1. I have the following data for March electricity bills at a sample of 6 homes of similar sizes.
92 106 212 129 176 191
Compute the following:
a) The Median (1)
b) The Standard Deviation (4)
c) The 2nd Quintile (2)
Solution: Note that x is in order in the table below.

Index    x      x^2       x - xbar   (x - xbar)^2
1        92     8464      -59        3481
2        106    11236     -45        2025
3        129    16641     -22        484
4        176    30976     25         625
5        191    36481     40         1600
6        212    44944     61         3721
Total    906    148742    0          11936
Note that, to be reasonable, the mean, median and 2nd quintile must fall between 92 and 212. You
should have done either the third column or the fourth and fifth columns. If you did both you were wasting
time.
$n = 6$, $\sum x = 906$, $\sum x^2 = 148742$, $\sum(x - \bar{x}) = 0.00$, $\sum(x - \bar{x})^2 = 11936$.
a) Just put the numbers in order and average the middle numbers: $x_{.5} = \frac{x_3 + x_4}{2} = \frac{129 + 176}{2} = 152.5$.
Or formally: position $= p(n+1) = a.b = .5(7) = 3.5$, and
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.5} = x_{.5} = x_3 + 0.5(x_4 - x_3) = 129 + 0.5(176 - 129) = 152.5$.
b) $\bar{x} = \frac{\sum x}{n} = \frac{906}{6} = 151.00$.
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{148742 - 6(151)^2}{5} = 2387.2$ or
$s^2 = \frac{\sum(x - \bar{x})^2}{n-1} = \frac{11936}{5} = 2387.2$. $s = \sqrt{2387.2} = 48.85898$.
c) The 2nd quintile has 40% below it. position $= p(n+1) = a.b = 0.4(7) = 2.8$. $a = 2$, $.b = 0.8$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.4} = x_{.6} = x_2 + 0.8(x_3 - x_2) = 106 + 0.8(129 - 106) = 124.4$.
(New Formula: position $= 1 + p(n-1) = a.b = 1 + 0.4(5) = 1 + 2.0 = 3.0$. $a = 3$, $.b = 0.0$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.4} = x_{.6} = x_3 + 0.0(x_4 - x_3) = 129 + 0.0(176 - 129) = 129$.)
2. A bus line takes a sample of the distances ridden by commuters and gets the following. (Assume that
the numbers are a sample.)

amount               frequency   F
less than 5 miles    4           4
5 - 9.99 miles       9           13
10 - 14.99 miles     10          23
15 - 19.99 miles     10          33
20 - 24.99 miles     15          48
25 - 29.99 miles     20          68
30 - 34.99 miles     12          80
a. Calculate the Cumulative Frequency (1)
b. Calculate The Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and
Interpret it (3)
i. Make an ogive of the Data (Neatness
Counts!)(2)
j. Extra credit: Put a (horizontal) box plot below
the ogive using the same scale.
Solution: x is the midpoint of the class. Our convention is to use x as the midpoint of 0 to 2, not
1.99999. If you did this using computational formulas, you should have the table below.
Row     class       f     x       fx        fx^2       fx^3
1       below 5     4     2.5     10.0      25.0       62
2       5-9.99      9     7.5     67.5      506.2      3797
3       10-14.99    10    12.5    125.0     1562.5     19531
4       15-19.99    10    17.5    175.0     3062.5     53594
5       20-24.99    15    22.5    337.5     7593.8     170859
6       25-29.99    20    27.5    550.0     15125.0    415938
7       30-34.99    12    32.5    390.0     12675.0    411938
Total               80            1655.0    40550.0    1075719
Definitional formulas give you the table below. There is no reason to do both the tables for computational
and definitional formulas.
Row     class       f     x       fx        x - xbar    f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3
1       below 5     4     2.5     10.0      -18.1875    -72.750       1323.14         -24064.6
2       5-9.99      9     7.5     67.5      -13.1875    -118.687      1565.19         -20641.0
3       10-14.99    10    12.5    125.0     -8.1875     -81.875       670.35          -5488.5
4       15-19.99    10    17.5    175.0     -3.1875     -31.875       101.60          -323.9
5       20-24.99    15    22.5    337.5     1.8125      27.187        49.28           89.3
6       25-29.99    20    27.5    550.0     6.8125      136.250       928.20          6323.4
7       30-34.99    12    32.5    390.0     11.8125     141.750       1674.42         19779.1
Total               80            1655.0                0.000         6312.19         -24326.1

$n = \sum f = 80$, $\sum fx = 1655$, $\sum fx^2 = 40550$, $\sum fx^3 = 1075719$, $\sum f(x - \bar{x}) = 0$,
$\sum f(x - \bar{x})^2 = 6312.19$, and $\sum f(x - \bar{x})^3 = -24326.1$. Note that, to be reasonable, the mean, median and
quartiles must fall between 0 and 36.
a. Calculate the Cumulative Frequency (1): The cumulative frequency is the whole
F column of the data table: 4, 13, 23, 33, 48, 68, 80.
b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{1655}{80} = 20.6875$
c. Calculate the Median (2): position $= p(n+1) = .5(81) = 40.5$. This is above $F = 33$ and below $F = 48$,
so the interval is 20-24.99. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$, so
$x_{1-.5} = x_{.5} = 20 + \left[\frac{.5(80) - 33}{15}\right]5 = 22.3333$.
d. Calculate the Mode (1) The mode is the midpoint of the largest group. Since 20 is the largest frequency,
the modal group is 25 to 29.99 and the mode is 27.5.
e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{40550 - 80(20.6875)^2}{79} = \frac{6312.19}{79} = 79.9011$ or
$s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{6312.19}{79} = 79.9011$.
f. Calculate the Standard Deviation (2): $s = \sqrt{79.9011} = 8.93874$
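The grouped-data mean, variance and standard deviation can be reproduced with a few lines. This is a sketch using the class midpoints and frequencies from the problem; the function name `grouped_stats` is hypothetical, and the variance uses the computational formula.

```python
import math

midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
freqs = [4, 9, 10, 10, 15, 20, 12]


def grouped_stats(xs, fs):
    """Mean, sample variance and standard deviation for grouped data,
    using (sum fx^2 - n * mean^2) / (n - 1) for the variance."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    variance = (sum_fx2 - n * mean ** 2) / (n - 1)
    return mean, variance, math.sqrt(variance)


print(grouped_stats(midpoints, freqs))
```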
g. Calculate the Interquartile Range (3): First Quartile: position $= p(n+1) = .25(81) = 20.25$. This is above
$F = 13$ and below $F = 23$, so the interval is 10-14.99. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ gives us
$Q1 = x_{1-.25} = x_{.75} = 10 + \left[\frac{.25(80) - 13}{10}\right]5 = 10 + 3.5 = 13.5$.
Third Quartile: position $= p(n+1) = .75(81) = 60.75$. This is above $F = 48$ and below $F = 68$, so the
interval is 25-29.99. $x_{1-.75} = x_{.25} = 25 + \left[\frac{.75(80) - 48}{20}\right]5 = 25 + 3 = 28.00$.
$IQR = Q3 - Q1 = 28.00 - 13.50 = 14.5$.
(New Formula:
For the median - position $= 1 + p(n-1) = 1 + 0.5(79) = 40.5$. This is the same result as above.
For the first quartile - position $= 1 + p(n-1) = 1 + 0.25(79) = 20.75$. This leads to the same result as above.
For the third quartile - position $= 1 + p(n-1) = 1 + 0.75(79) = 60.25$. This leads to the same result as
above.)
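The interpolation formula $L_p + \left[\frac{pN - F}{f_p}\right]w$ used for the median and quartiles can be sketched as below. One simplification is assumed: the class is located by comparing $pN$ with the cumulative frequencies rather than by the position $p(n+1)$. For these data both rules pick the same class. The function name is hypothetical.

```python
lowers = [0, 5, 10, 15, 20, 25, 30]   # lower class limits
freqs = [4, 9, 10, 10, 15, 20, 12]
width = 5


def grouped_quantile(lowers, freqs, width, p):
    """Interpolated value with a fraction p of the data below it."""
    n = sum(freqs)
    target = p * n
    below = 0                          # cumulative frequency F below the class
    for lower, f in zip(lowers, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    return lowers[-1] + width          # p = 1 falls at the top of the last class


q1 = grouped_quantile(lowers, freqs, width, 0.25)
median = grouped_quantile(lowers, freqs, width, 0.50)
q3 = grouped_quantile(lowers, freqs, width, 0.75)
print(q1, median, q3, q3 - q1)
```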
h. Calculate a Statistic showing Skewness and interpret it (3):
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{80}{(79)(78)}\left[1075719 - 3(20.6875)(40550) + 2(80)(20.6875)^3\right] = 0.0129828(-24325.883) = -315.82$
or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{80}{(79)(78)}(-24326.1) = -315.82$
or $g_1 = \frac{k_3}{s^3} = \frac{-315.82}{(8.93874)^3} = -0.4422$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(20.6875 - 27.5)}{8.93874} = -2.286$.
Because of the negative sign, the measures imply skewness to the left.
i. Make an ogive of the Data (Neatness Counts!)(2) An ogive is a line graph of the cumulative frequency.
It should hit zero on the left at the origin. The next point is 4 at $x = 5$. It continues to rise until $x = 35$,
when the height is 80. It ends with a horizontal line.
j. Your box plot should show the first and third quartiles at the left and right of the box and a vertical line
across the box at the median. It should be immediately below the ogive and use the same x points.
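The skewness measures from part h can be recomputed directly from the class midpoints and frequencies. This is a sketch with a hypothetical function name; the Pearson measure is coded exactly as written in the solution, 3(mean - mode)/s, with the mode of 27.5 taken from part d.

```python
import math

midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
freqs = [4, 9, 10, 10, 15, 20, 12]


def skewness_measures(xs, fs, mode):
    """Return (k3, g1, SK) for grouped sample data."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / (n - 1))
    third = sum(f * (x - mean) ** 3 for x, f in zip(xs, fs))
    k3 = n / ((n - 1) * (n - 2)) * third
    g1 = k3 / s ** 3                   # relative skewness
    sk = 3 * (mean - mode) / s         # Pearson's measure as used in the solution
    return k3, g1, sk


print(skewness_measures(midpoints, freqs, mode=27.5))
```

All three values are negative, in line with the conclusion that the distribution is skewed to the left.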