251y0312 9/26/03
ECO251 QBA1
FIRST HOUR EXAM
October 1, 2003
Name: _____KEY____________
Social Security Number: _____________________
Part I.
(32 points)
1. The process of using sample statistics to draw conclusions about true population parameters is called
a) *statistical inference.
b) the scientific method.
c) sampling.
d) descriptive statistics.
2. A summary measure that is computed to describe a characteristic of an entire population is called
a) *a parameter.
b) a census.
c) a statistic.
d) the scientific method.
3. Which of the following is a discrete quantitative variable?
a) the Dow Jones Industrial Average
b) the volume of water released from a dam
c) the distance you drove yesterday
d) *the number of employees of an insurance company
TABLE 1-1
The manager of the customer service division of a major consumer electronics company is interested in determining
whether the customers who have purchased a videocassette recorder made by the company over the past 12 months are
satisfied with their products.
4. Referring to Table 1-1, consider the possible responses to the question "Are you happy, indifferent, or unhappy with the performance per dollar spent on the videocassette recorder?" If we write down a 1 for 'happy,' a 2 for 'unhappy' and a 3 for 'indifferent,' the responses are the following kind of random variable.
a) ratio
b) *nominal
c) interval
d) ordinal
TABLE 2-2
At a meeting of information systems officers for regional offices of a national company, a survey was taken to
determine the number of employees the officers supervise in the operation of their departments, where X is the
number of employees overseen by each information systems officer.
X     f
1     7
2     5
3    11
4     8
5     9
5. Referring to Table 2-2, how many regional offices are represented in the survey results?
a) 127
b) 5
c) 15
d) *40
Explanation: $n = \sum f = 7 + 5 + 11 + 8 + 9 = 40$.

TABLE 2-5
The following are the durations (in minutes) of a sample of long-distance phone calls made within the continental
United States, reported by one long-distance carrier:
Time (in Minutes)       Relative Frequency
0 but less than 5       0.37
5 but less than 10      0.22
10 but less than 15     0.15
15 but less than 20     0.10
20 but less than 25     0.07
25 but less than 30     0.07
30 but less than 35     0.02
6. Referring to Table 2-5, if 1,000 calls were randomly sampled, how many calls lasted under 10 minutes?
a) 220
b) 370
c) 410
d) *590
Explanation: The answer is the cumulative relative frequency (Frel) for the 2nd class multiplied by 1000.
class                   frel    Frel
0 but less than 5       0.37    0.37
5 but less than 10      0.22    0.59
10 but less than 15     0.15    0.74
15 but less than 20     0.10    0.84
20 but less than 25     0.07    0.91
25 but less than 30     0.07    0.98
30 but less than 35     0.02    1.00
7. If I make a graph of the data in Table 2-5 (assume the table represents a sample of 1000 calls) with the following x and y coordinates for the first five points: {(0, 0), (5, 370), (10, 590), (15, 740), (20, 840)}, a one-word name for this type of graph is _ogive_, and the last point on the line could be (45, _1000_). Explanation: The x values are the upper limits of the classes, starting with the upper limit of the last empty class before the data begin. The y values are the cumulative frequencies, obtained by multiplying the Frel column by 1000. When the graph gets to x = 35, y hits 1000 and stays at 1000 for all subsequent points.
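The cumulative frequencies used in questions 6 and 7 can be checked with a short script. This is only a sketch: the variable names are illustrative, and the sample size of 1000 is the assumption stated in the questions.

```python
from itertools import accumulate

# Relative frequencies and upper class limits from Table 2-5.
rel_freq = [0.37, 0.22, 0.15, 0.10, 0.07, 0.07, 0.02]
upper_limits = [5, 10, 15, 20, 25, 30, 35]
n = 1000  # assumed sample size, as stated in questions 6 and 7

# Cumulative relative frequency (Frel), rounded to remove floating-point noise.
cum_rel = [round(v, 2) for v in accumulate(rel_freq)]

# Cumulative counts: Frel multiplied by the sample size.
cum_counts = [round(n * F) for F in cum_rel]

# Ogive points: start at the upper limit of the last empty class, (0, 0).
ogive_points = [(0, 0)] + list(zip(upper_limits, cum_counts))

print(cum_rel)
print(ogive_points)
```

The second ogive point after the origin gives the answer to question 6, and every point from x = 35 onward has y = 1000.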
8. Referring to Table 2-5, what is Frel for the percentage of calls that lasted under 20 minutes?
a) 0.10
b) 0.76
c) *0.84 (Look at the table.)
d) None of the above - write in the correct answer.
TABLE 2-7
The stem-and-leaf display below contains data on the number of months between the date a civil suit is filed and
when the case is actually adjudicated for 50 cases heard in superior court.
Stem   Leaves
1      234447899
2      22223455678889
3      0011135778
4      02345579
5      112466
6      158
9. Referring to Table 2-7, the civil suit with the fourth shortest waiting time between when the suit was filed and when it was adjudicated had a wait of _14_ months. Explanation: The first four numbers are 12, 13, 14, 14.
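As a check on question 9, the stem-and-leaf display can be expanded back into the 50 individual waiting times. This sketch assumes, as is usual for such displays, that the stem is the tens digit and each leaf is a units digit; the dictionary name is illustrative.

```python
# Table 2-7: stem (tens digit) mapped to its string of leaves (units digits).
stem_and_leaf = {
    1: "234447899",
    2: "22223455678889",
    3: "0011135778",
    4: "02345579",
    5: "112466",
    6: "158",
}

# Rebuild the ordered list of waiting times in months.
waits = sorted(
    10 * stem + int(leaf)
    for stem, leaves in stem_and_leaf.items()
    for leaf in leaves
)

fourth_shortest = waits[3]  # zero-based index 3 is the fourth value
print(len(waits), waits[:4], fourth_shortest)
```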
10. Eunice computes the following statistics from a sample:
$k_3 = \frac{n}{(n-1)(n-2)} \sum (x - \bar{x})^3$,
$s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$,
$\frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$, and
$k_4 = \frac{n^2}{(n-1)(n-2)(n-3)} \left[ \frac{n+1}{n} \sum (x - \bar{x})^4 - \frac{3(n-1)^3 s^4}{n^2} \right]$.
She thinks the sample represents a population that is skewed to the right. Which of the statistics would show skewness and what sign should she expect from them? (No partial credit on this one.)
Answer: Any legitimate measure of skewness would be positive if the population is skewed to the right. From your formula table, the measures of skewness are: (i) $k_3 = \frac{n}{(n-1)(n-2)} \sum (x - \bar{x})^3$ - skewness, (ii) $g_1 = \frac{k_3}{s^3}$ - relative skewness and (iii) $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$ - Pearson's measure of skewness. Of the statistics Eunice computed, $k_3$ and $SK$ show skewness, and both should be positive.
The other two are $s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$ - the sample variance, which is always positive and measures dispersion, and $k_4 = \frac{n^2}{(n-1)(n-2)(n-3)} \left[ \frac{n+1}{n} \sum (x - \bar{x})^4 - \frac{3(n-1)^3 s^4}{n^2} \right]$ - the coefficient of excess (in the outline), which measures kurtosis.
11. In a perfectly symmetrical distribution with one mode,
a) the arithmetic mean equals the median.
b) the median equals the mode.
c) the arithmetic mean equals the mode.
d) *all of the above.
e) none of the above.
12. According to the Bienayme-Chebyshev rule (I called it Chebyshef’s Inequality), at least 93.75% of
all observations in any data set are contained within a distance of how many standard deviations
around the mean?
a) 1
b) 2
c) 3
d) *4
Explanation: If at least 93.75% are 'in,' then at most 6.25% are out in the tails. The rule says that $\frac{1}{k^2}$ is the maximum proportion in the tails, defined as the points below $\mu - k\sigma$ and the points above $\mu + k\sigma$. If you try out the values here, you will find $\frac{1}{4^2} = \frac{1}{16} = .0625$, so k must be 4. More directly, you could solve $1 - \frac{1}{k^2} = .9375$ by trying the four values of k that were given. This is a problem that was done in class.
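Trying the four candidate values of k, as the explanation suggests, can be sketched in a few lines. The function name is illustrative and not part of the course materials.

```python
def chebyshev_min_inside(k):
    """Lower bound on the proportion of observations within k standard
    deviations of the mean, according to the Bienayme-Chebyshev rule."""
    return 1 - 1 / k**2

# Try the four values offered in question 12.
for k in (1, 2, 3, 4):
    print(k, chebyshev_min_inside(k))
```

Only k = 4 gives a bound of 0.9375, which matches the 93.75% in the question.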
13. Evaluate the following statements. (i) The median of the values 3.4, 4.7, 1.9, 7.6, and 6.5 is 4.05.
(ii) In a set of numerical data, the value for Q3 can never be smaller than the value for Q1. (iii) In a
set of numerical data, the value for Q2 is always halfway between Q1 and Q3.
a) (i) and (ii) are false.
b) *(i) and (iii) are false.
c) (ii) and (iii) are false
d) Only one of the statements is false.
e) All of the statements are false.
Explanation: The numbers in order are 1.9, 3.4, 4.7, 6.5, 7.6, so the median is 4.7 and (i) is wrong. The order of the quartiles is Q1, median, Q3. If all the middle numbers are the same, Q3 could equal both the median and Q1, but it could never be smaller than Q1, so (ii) is true. Q2 is the second quartile and it could be any value between Q1 and Q3, depending on what the numbers are. Only its position in the ordered data is halfway between theirs, not necessarily its value, so (iii) is false.
14. Which one of the following statements is false?
a) In a sample of size 40, the sample mean is 15. In this case, the sum of all observations in the sample is $\sum x = 600$.
b) *A population with 200 elements has an arithmetic mean of 10. From this
information, it can be shown that the population standard deviation is 15.
c) The median of a data set with 20 items would be the average of the 10th and 11th items in
the ordered array.
d) The coefficient of variation measures variability in a data set relative to the size of the
arithmetic mean.
e) If every possible group of 10 individuals in the population is equally likely to be chosen
to be in the sample, we must be taking a simple random sample of 10.
f) All of the above statements are false.
15. Which of the following is NOT a measure of central tendency?
a) the arithmetic mean
b) the geometric mean
c) the mode
d) *the interquartile range
16. Which of the following is most sensitive to extreme values?
a) the median
b) the interquartile range
c) *the arithmetic mean
d) the 1st quartile
Part II. (Ng pp 77-79) (8 points)
The data below represent the number of grams of carbohydrates in a serving of breakfast cereal. It is a sample containing 11 numbers. Note: $\sum x = 217$, $\sum x^2 = 4541$.
{11, 15, 23, 29, 19, 22, 21, 20, 15, 25, 17}
Find:
a) The First Quartile (1.5)
b) The Standard Deviation (2)
c) The Coefficient of variation (1.5)
d) The five-number summary (3)
Solution: a) Put the numbers in order.
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11
11  15  15  17  19  20  21  22  23  25   29
$n = 11$, so the first quartile is at position $p(n+1) = .25(12) = 3.0$, and $Q1 = x_3 = 15$. Or, if $a.b = 3.0$, $x_{1-p} = x_{.75} = x_a + .b(x_{a+1} - x_a) = x_3 + 0(x_4 - x_3) = 15 + 0(17 - 15) = 15$.
b) $\bar{x} = \frac{\sum x}{n} = \frac{217}{11} = 19.7273$, so, using the computational formula, $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{4541 - 11(19.7273)^2}{10} = \frac{260.17}{10} = 26.017$. $s = \sqrt{26.017} = 5.101$.
c) $C = \frac{\text{st. deviation}}{\text{mean}} = \frac{s}{\bar{x}} = \frac{5.101}{19.7273} = 0.2586$.
d) For the median, position $= p(n+1) = .5(12) = 6.0$ and for the third quartile, position $= p(n+1) = .75(12) = 9.0$. So $x_{.50} = x_6 = 20$ and $Q3 = x_{.25} = x_9 = 23$. The 5 number summary would be {lower bound, Q1, median, Q3, upper bound} or {11, 15, 20, 23, 29}.
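The Part II answers can be reproduced with a short script. This is a sketch that follows the position rule $p(n+1)$ used in the solution; the helper name `quantile_by_position` is hypothetical, and library quantile routines may use a different convention and give slightly different quartiles.

```python
import math

data = [11, 15, 23, 29, 19, 22, 21, 20, 15, 25, 17]

def quantile_by_position(values, p):
    """Value at position p(n+1) in the ordered data, interpolating between
    neighbours when the position is not a whole number."""
    xs = sorted(values)
    n = len(xs)
    pos = p * (n + 1)
    a = math.floor(pos)   # whole part of the position
    b = pos - a           # fractional part of the position
    if a < 1:
        return xs[0]
    if a >= n:
        return xs[-1]
    return xs[a - 1] + b * (xs[a] - xs[a - 1])

n = len(data)
mean = sum(data) / n
# Computational formula for the sample variance.
variance = (sum(x * x for x in data) - n * mean**2) / (n - 1)
std_dev = math.sqrt(variance)
coeff_of_variation = std_dev / mean

q1 = quantile_by_position(data, 0.25)
median = quantile_by_position(data, 0.50)
q3 = quantile_by_position(data, 0.75)
five_number_summary = (min(data), q1, median, q3, max(data))

print(mean, variance, std_dev, coeff_of_variation)
print(five_number_summary)
```

Small differences in the last decimal place of the variance come from carrying the unrounded mean instead of 19.7273.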
251x0312 9/23/03
ECO251 QBA1
FIRST EXAM
October 1, 2003
TAKE HOME SECTION
Name: _____KEY________________
Social Security Number: _________________________
Throughout this exam show your work! Please indicate clearly what sections of the problem you are
answering and what formulas you are using.
Part III. Do all the Following (11 Points) Show your work!
1. My Social Security Number is 265398248. If I use each digit as a frequency in the intervals below, I get:
Class             Frequency
$0- 5999          2
$6000- 11999      6
$12000- 17999     5
$18000- 23999     3
$24000- 29999     9
$30000- 35999     8
$36000- 41999     2
$42000- 47999     4
$48000- 53999     8
Replace my Social Security number with your own in the frequency column. To make the problem easier, you may replace all zeros in your new frequency column with 10s.
Assume that this data represents a sample of rents paid in Chester County.
a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (It is possible but unlikely that there is more than one) (0.5)
e. Calculate the Variance (1.5)
f. Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and Interpret it (1.5)
i. Make a frequency polygon of the Data (Neatness Counts!) (1)
j. Extra credit: Put a (horizontal) box plot below the histogram using the same scale. (1)
Solution: x is the midpoint of the class. Our convention is to use the midpoint of 0 to 2, not 1.99999. Note
also, that the midpoints and class limits have been divided by 1000. Most numbers should be multiplied by
1000, the variance should be multiplied by 1,000,000 and k 3 by 1,000,000,000.
class  interval     f    F    x     fx    $fx^2$     $fx^3$
A       0- 5.999    2    2    3      6       18         54
B       6-11.999    6    8    9     54      486       4374
C      12-17.999    5   13   15     75     1125      16875
D      18-23.999    3   16   21     63     1323      27783
E      24-29.999    9   25   27    243     6561     177147
F      30-35.999    8   33   33    264     8712     287496
G      36-41.999    2   35   39     78     3042     118638
H      42-47.999    4   39   45    180     8100     364500
I      48-53.999    8   47   51    408    20808    1061208
Total              47             1371    50175    2058075

class  $x-\bar{x}$  $f(x-\bar{x})$  $f(x-\bar{x})^2$  $f(x-\bar{x})^3$
A      -26.1702      -52.340         1369.76          -35846.9
B      -20.1702     -121.021         2441.02          -49236.0
C      -14.1702      -70.851         1003.97          -14226.5
D       -8.1702      -24.511          200.26           -1636.1
E       -2.1702      -19.532           42.39             -92.0
F        3.8298       30.638          117.34             449.4
G        9.8298       19.660          193.25            1899.6
H       15.8298       63.319         1002.33           15866.6
I       21.8298      174.638         3812.32           83222.1
Total                  0.000        10182.64             400.2

$n = \sum f = 47$, $\sum fx = 1371$, $\sum fx^2 = 50175$, $\sum fx^3 = 2058075$, $\sum f(x-\bar{x}) = 0$, $\sum f(x-\bar{x})^2 = 10182.64$, and $\sum f(x-\bar{x})^3 = 400.2$. Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 54.
a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole F column.
b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{1371}{47} = 29.1702$
c. Calculate the Median (2): position $= p(n+1) = .5(48) = 24$. This is above $F = 16$ and below $F = 25$, so the interval is E, 24-29.999 in thousands. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$, so $x_{1-.5} = x_{.5} = 24 + \left[\frac{.5(47) - 16}{9}\right] 6 = 24 + 0.83333(6) = 29.0000$.
d. Calculate the Mode (1) The mode is the midpoint of the largest group. Since 9 is the largest frequency,
the modal group is E, 24 to 29.999 and the mode is 27 (in thousands).
e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{50175 - 47(29.1702)^2}{46} = \frac{10182.673}{46} = 221.3625$ or $s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{10182.64}{46} = 221.3617$. The computer got 221.362. (in millions)
f. Calculate the Standard Deviation (2): $s = \sqrt{221.3625} = 14.8783$ or $s = \sqrt{221.3617} = 14.8782$ (in thousands)
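The grouped mean, variance and standard deviation in parts b, e and f can be checked with the sketch below. It uses the example frequencies and the class midpoints in thousands; the variable names are illustrative.

```python
import math

# Frequencies (digits of the example Social Security Number) and class midpoints in thousands.
freqs = [2, 6, 5, 3, 9, 8, 2, 4, 8]
midpoints = [3, 9, 15, 21, 27, 33, 39, 45, 51]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n

# Computational formula: (sum of f x^2 - n * mean^2) / (n - 1).
var_computational = (sum_fx2 - n * mean**2) / (n - 1)

# Definitional formula: sum of f (x - mean)^2 / (n - 1).
var_definitional = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)

std_dev = math.sqrt(var_definitional)
print(n, sum_fx, sum_fx2, mean, var_computational, var_definitional, std_dev)
```

With the unrounded mean both formulas agree, which suggests that the small gap between the two hand-computed variances comes from rounding the mean to four decimals.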
g. Calculate the Interquartile Range (3): First Quartile: position $= p(n+1) = .25(48) = 12$. This is above $F = 8$ and below $F = 13$, so the interval is C, 12-17.999. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ gives us, in thousands, $Q1 = x_{1-.25} = x_{.75} = 12 + \left[\frac{.25(47) - 8}{5}\right] 6 = 16.500$.
Third Quartile: position $= p(n+1) = .75(48) = 36$. This is above $F = 35$ and below $F = 39$, so the interval is H, 42-47.999. $Q3 = x_{1-.75} = x_{.25} = 42 + \left[\frac{.75(47) - 35}{4}\right] 6 = 42.375$.
$IQR = Q3 - Q1 = 42.375 - 16.500 = 25.875$ (in thousands).
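The interpolation rule used for the median and quartiles of grouped data can be sketched as a small function. It assumes the convention of this solution: the class is located with position $p(n+1)$ and the value is interpolated with $L_p + \left[\frac{pN - F}{f_p}\right] w$. The function name is hypothetical.

```python
def grouped_quantile(p, lower_limits, freqs, width):
    """Interpolated quantile for grouped data.

    The class is found from the position p(n+1); the value is
    L + ((p*n - F) / f) * w, where F is the cumulative frequency
    below the class, f its frequency and w the class width.
    """
    n = sum(freqs)
    position = p * (n + 1)
    cum_before = 0
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= position:
            return lower + (p * n - cum_before) / f * width
        cum_before += f
    raise ValueError("p must lie between 0 and 1")

# Lower class limits (in thousands) and the example frequencies.
lower_limits = [0, 6, 12, 18, 24, 30, 36, 42, 48]
freqs = [2, 6, 5, 3, 9, 8, 2, 4, 8]

q1 = grouped_quantile(0.25, lower_limits, freqs, 6)
median = grouped_quantile(0.50, lower_limits, freqs, 6)
q3 = grouped_quantile(0.75, lower_limits, freqs, 6)
iqr = q3 - q1
print(q1, median, q3, iqr)
```

For the example frequencies this gives Q1 = 16.5, Q3 = 42.375 and an interquartile range of 25.875, all in thousands.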
h. Calculate a Statistic showing Skewness and interpret it (3):
$k_3 = \frac{n}{(n-1)(n-2)} \left[ \sum fx^3 - 3\bar{x} \sum fx^2 + 2n\bar{x}^3 \right] = \frac{47}{(46)(45)} \left[ 2058075 - 3(29.1702)(50175) + 2(47)(29.1702)^3 \right] = 0.0227053 \left[ 2058075 - 4390844.4 + 2333168.3 \right] = 0.0227053(398.9) = 9.057$
or $k_3 = \frac{n}{(n-1)(n-2)} \sum f(x-\bar{x})^3 = \frac{47}{(46)(45)}(400.2) = 9.087$ (The computer gets 9.0849)
or $g_1 = \frac{k_3}{s^3} = \frac{9.085}{(14.8782)^3} = .00276$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(29.1702 - 27)}{14.8782} = 0.4376$.
Because of the positive sign, the measures imply skewness to the right.
i. A frequency polygon is a simple line graph with frequency on the y-axis and the numbers 0-54 (thousand) on the x-axis. Since class A has a frequency of 2 plotted at x = 3 and the class width is 6, it should really start at x = -3 and y = 0. You should at least show the line crossing the y-axis. Since the last nonempty class is 48-53.999, with its frequency plotted at x = 51, there should be a zero at x = 57.
j. The box plot should show the median and the quartiles. (See text)
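The three skewness measures in part h can be recomputed from the grouped data as a check. This sketch carries the unrounded mean, so its $k_3$ should agree with the computer value rather than with the hand calculations; the mode is taken as the midpoint of the class with the largest frequency, as in part d, and the variable names are illustrative.

```python
import math

# Example frequencies and class midpoints in thousands.
freqs = [2, 6, 5, 3, 9, 8, 2, 4, 8]
midpoints = [3, 9, 15, 21, 27, 33, 39, 45, 51]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sums of squared and cubed deviations from the mean.
sum_dev2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
sum_dev3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints))

std_dev = math.sqrt(sum_dev2 / (n - 1))

# k3: skewness in cubed units (here, cubed thousands).
k3 = n / ((n - 1) * (n - 2)) * sum_dev3

# g1: relative skewness, free of units.
g1 = k3 / std_dev**3

# Pearson's measure, using the midpoint of the modal class as the mode.
mode = midpoints[freqs.index(max(freqs))]
pearson_sk = 3 * (mean - mode) / std_dev

print(k3, g1, pearson_sk)
```

All three values come out positive, which is the point of the interpretation in part h.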
2. My Social Security Number is 265398248. If I write it in clumps of 2 numbers and add 100 to the end, I get:
26, 53, 98, 24, 8, 100.
Write your social security number the same way, so that you have a list of six numbers. Note: If any of these numbers is a zero, change it to a one. For these six numbers, compute the a) Geometric Mean, b) Harmonic Mean, c) Root-mean-square (1 point each). Label each clearly. If you wish, d) compute the geometric mean using natural or base 10 logarithms (1 point extra credit each).
Solution: Note that $\sum x = 309$. This is not used in any of the following calculations and there is no reason why you should have computed it!
a) The Geometric Mean.
$x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[6]{(26)(53)(98)(24)(8)(100)} = \sqrt[6]{2592844800} = (2592844800)^{1/6} = (2592844800)^{0.16667} = 37.0648$.
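b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n} \sum \frac{1}{x} = \frac{1}{6} \left( \frac{1}{26} + \frac{1}{53} + \frac{1}{98} + \frac{1}{24} + \frac{1}{8} + \frac{1}{100} \right) = \frac{1}{6}(0.0384615 + 0.0188679 + 0.0102041 + 0.0416667 + 0.125 + 0.01) = \frac{1}{6}(0.2442002) = 0.0407000$.
So $x_h = \frac{1}{\frac{1}{n} \sum \frac{1}{x}} = \frac{1}{0.0407000} = 24.5700$.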
c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n} \sum x^2 = \frac{1}{6}(26^2 + 53^2 + 98^2 + 24^2 + 8^2 + 100^2) = \frac{1}{6}(676 + 2809 + 9604 + 576 + 64 + 10000) = \frac{1}{6}(23729) = 3954.83$. So $x_{rms} = \sqrt{\frac{1}{n} \sum x^2} = \sqrt{3954.83} = 62.8875$.
d) (i) $\ln(x_g) = \frac{1}{n} \sum \ln(x) = \frac{1}{6}[\ln(26) + \ln(53) + \ln(98) + \ln(24) + \ln(8) + \ln(100)] = \frac{1}{6}(3.25809 + 3.97029 + 4.58497 + 3.17805 + 2.07944 + 4.60517) = \frac{1}{6}(21.67601) = 3.61267$.
So $x_g = e^{3.61267} = 37.0649$.
(ii) $\log(x_g) = \frac{1}{n} \sum \log(x) = \frac{1}{6}[\log(26) + \log(53) + \log(98) + \log(24) + \log(8) + \log(100)] = \frac{1}{6}(1.41497 + 1.72428 + 1.99123 + 1.38021 + 0.90309 + 2.00000) = \frac{1}{6}(9.41378) = 1.56896$.
So $x_g = 10^{1.56896} = 37.0649$.
Notice that the original numbers and all the means are between 8 and 100.
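The four calculations in problem 2 can be checked with the sketch below, which uses only the standard `math` module (`math.prod` needs Python 3.8 or later). The variable names are illustrative.

```python
import math

data = [26, 53, 98, 24, 8, 100]
n = len(data)

# a) Geometric mean: n-th root of the product.
geometric = math.prod(data) ** (1 / n)

# b) Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic = n / sum(1 / x for x in data)

# c) Root-mean-square: square root of the mean of the squares.
rms = math.sqrt(sum(x * x for x in data) / n)

# d) Geometric mean through natural and base 10 logarithms.
geometric_ln = math.exp(sum(math.log(x) for x in data) / n)
geometric_log10 = 10 ** (sum(math.log10(x) for x in data) / n)

arithmetic = sum(data) / n
print(harmonic, geometric, arithmetic, rms)
```

For positive data the harmonic mean cannot exceed the geometric mean, which cannot exceed the arithmetic mean, which cannot exceed the root-mean-square, so printing them in that order gives a quick sanity check.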