CHAPTER 1
INTRODUCTION AND DESCRIPTIVE STATISTICS
1-1.
1. quantitative/ratio
2. qualitative/nominal
3. quantitative/ratio
4. qualitative/nominal
5. quantitative/ratio
6. quantitative/interval
7. quantitative/ratio
8. quantitative/ratio
9. quantitative/ratio
10. quantitative/ratio
11. quantitative/ordinal
1-2.
Data are based on numeric measurements of some variable, either from a data set comprising an
entire population of interest, or else obtained from only a sample (subset) of the full population.
Instead of doing the measurements ourselves, we may sometimes obtain data from previous
results in published form.
1-3.
The weakest is the Nominal Scale, in which categories of data are grouped by qualitative
differences and assigned numbers simply as labels, not usable in numeric comparisons. Next in
strength is the Ordinal Scale: data are ordered (ranked) according to relative size or quality, but
the numbers themselves don't imply specific numeric relationships. Stronger than this is the
Interval Scale: the ordered data points have meaningful distances between any two of them,
measured in units. Strongest is the Ratio Scale, which is like an Interval Scale but where the ratio of
any two data values is also meaningful and can be used in comparing values.
1-4.
Fund: Qualitative
Style: Qualitative
US/Foreign: Qualitative
10 yr Return: Quantitative
Expense Ratio: Quantitative
1-5.
Ordinal.
1-6.
A qualitative variable describes different categories or qualities of the members of a data set,
which have no numeric relationships to each other, even when the categories happen to be coded
as numbers for convenience. A quantitative variable gives numerically meaningful information,
in terms of ranking, differences, or ratios between individual values.
1-7.
The people from one particular neighborhood constitute a non-random sample (drawn from the
larger town population). The group of 100 people would be a random sample.
1-8.
A sample is a subset of the full population of interest, from which statistical inferences are drawn
about the population, which is usually too large to permit the variables to be measured for all the
members.
1-9.
A random sample is a sample drawn from a population in a way that is not a priori biased with
respect to the kinds of variables being measured. It attempts to give a representative cross-section
of the population.
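As a minimal sketch of what drawing a random sample can look like in practice, the Python snippet below picks 100 members from a population so that every member is equally likely to be chosen. The population of 5,000 ID numbers and the seed are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers standing in for the residents of a town.
population = list(range(1, 5001))

# A fixed seed is used only so that this sketch gives repeatable output.
rng = random.Random(42)

# Simple random sample without replacement: every resident is equally likely.
sample = rng.sample(population, 100)
print(len(sample), sample[:5])
```

Taking everyone from one neighborhood instead would correspond to slicing a fixed block of the list, which is not random with respect to anything that varies by neighborhood.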
1-10.
Nationality: qualitative. Length of intended stay: quantitative.
1-11.
Ordinal. The colors are ranked, but no units of difference between any two of them are defined.
1-12.
Income: quantitative, ratio
Number of dependents: quantitative, ratio
Filing singly/jointly: qualitative, nominal
Itemized or not: qualitative, nominal
Local taxes: quantitative, ratio
1-13.
Lower quartile = 25th percentile = data point in position (n + 1)(25/100) =
34(25/100) = position 8.5. (Here n = 33.) Let us order our observations: 109, 110,
114, 116, 118, 119, 120, 121, 121, 123, 123, 125, 125, 127, 128, 128, 128, 128, 129, 129, 130,
131, 132, 132, 133, 134, 134, 134, 134, 136, 136, 136, 136.
Lower quartile = 121
Middle quartile is in position: 34(50/100) = 17. Point is 128.
Upper quartile is in position: 34(75/100) = 25.5. Point is 133.5
10th percentile is in position: 34(10/100) = 3.4. Point is 114.8.
15th percentile is in position: 34(15/100) = 5.1. Point is 118.1.
65th percentile is in position: 34(65/100) = 22.1. Point is 131.1.
IQR = 133.5 - 121 = 12.5.
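Percentile and Percentile Rank Calculations
x     x-th Percentile
10    116.4
15    118.8
65    130.8
y       Percentile rank of y
116.4   10
118.8   15
130.8   65
Quartiles
1st Quartile   121
Median         128
3rd Quartile   133
IQR            12
(The template values are consistent with interpolating at position (n - 1)(x/100) + 1 rather than
(n + 1)(x/100), which is why they differ slightly from the hand calculations.)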
1-14.
First, order the data:
-1.2, 3.9, 8.3, 9, 9.5, 10, 11, 11.6, 12.5, 13, 14.8, 15.5, 16.2, 16.7, 18
The median, or 50th percentile, is the point in position 16(50/100) = 8. The point is 11.6.
First quartile is in position 16(25/100) = 4. Point is 9.
Third quartile is in position 16(75/100) = 12. Point is 15.5.
55th percentile is in position 16(55/100) = 8.8. Point is 12.32.
85th percentile is in position 16(85/100) = 13.6. Point is 16.5.
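The position rule used in these solutions, (n + 1)(P/100) with linear interpolation between neighbouring ordered points, can be sketched in Python as follows. The function name `percentile` and the clamping of positions that fall outside 1..n are assumptions made for this illustration, not part of the original solution.

```python
def percentile(data, p):
    """Value at 1-based position (n + 1) * p / 100 in the ordered data,
    interpolating linearly when the position is fractional."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100
    # Assumption: positions outside the data are clamped to the extremes.
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lower = int(pos)      # whole part of the position
    frac = pos - lower    # fractional part used for interpolation
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


# Data of Problem 1-14.
data_1_14 = [-1.2, 3.9, 8.3, 9, 9.5, 10, 11, 11.6, 12.5, 13,
             14.8, 15.5, 16.2, 16.7, 18]

for p in (25, 50, 55, 75, 85):
    print(p, round(percentile(data_1_14, p), 2))
```

Spreadsheet percentile functions often use a different position rule, so their results may not match these values exactly.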
1-15.
Order the data:
38, 41, 44, 45, 45, 52, 54, 56, 60, 64, 69, 71, 76, 77, 78, 79, 80, 81, 87, 88, 90, 98
Median is in position 23(50/100) = 11.5. Point is 70.
20th percentile is in position 23(20/100) = 4.6. Point is 45.
30th percentile is in position 23(30/100) = 6.9. Point is 53.8.
60th percentile is in position 23(60/100) = 13.8. Point is 76.8.
90th percentile is in position 23(90/100) = 20.7. Point is 89.4.
Percentile and Percentile Rank Calculations
x     x-th Percentile
20    46.4
30    54.6
60    76.6
y: 46.4, 54.6, 76.6
Quartiles
1st Quartile   52.5
Median         70
3rd Quartile   79.75
IQR            27.25
1-16.
Order the data: 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7.
Lower quartile is the 25th percentile, in position 16(25/100) = 4. Point is 2.
The median is in position 16(50/100) = 8. The point is 3.
Upper quartile is in position 16(75/100) = 12. Point is 5.
IQR = 5 - 2 = 3.
60th percentile is in position 16(60/100) = 9.6. Point is 4.
Percentile and Percentile Rank Calculations
x     x-th Percentile
60    4
y: 4.0
Quartiles
1st Quartile   2
Median         3
3rd Quartile   5
IQR            3
1-17.
The data are already ordered; there are 16 data points. The median is the point in position
17(50/100) = 8.5. It is 51.
Lower quartile is in position 17(25/100) = 4.25. It is 30.5.
Upper quartile is in position 17(75/100) = 12.75. It is 194.25.
IQR = 194.25 - 30.5 = 163.75.
45th percentile is in position 17(45/100) = 7.65. Point is 42.2.
Percentile and Percentile Rank Calculations
x     x-th Percentile
45    43
y: 43.0
Quartiles
1st Quartile   31.5
Median         51
3rd Quartile   162.75
IQR            131.25
1-18.
The mean is a central point that summarizes all the information in the data. It is sensitive to
extreme observations. The median is a point "in the middle" of the data set and does not contain
all the information in the set. It is resistant to extreme observations. The mode is a value that
occurs most frequently.
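A short Python sketch, using the data of Problem 1-16 and the standard `statistics` module, illustrates these three measures and the sensitivity of the mean to an extreme observation. The added value 100 is hypothetical and is there only to show the effect.

```python
import statistics

# Data of Problem 1-16.
data_1_16 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]

mean = statistics.mean(data_1_16)
median = statistics.median(data_1_16)
modes = statistics.multimode(data_1_16)   # all values tied for most frequent
print(round(mean, 3), median, modes)

# Hypothetical extreme observation appended to the same data.
with_extreme = data_1_16 + [100]
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```

The mean moves a long way when the extreme value is added, while the median barely changes.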
1-19.
Mean, median, mode(s) of the observations in Problem 1-13:
Mean = $\bar{x} = \sum x_i / n = 126.64$
Median = 128
Modes = 128, 134, 136 (all have 4 points)
Measures of Central tendency
Mean 126.63636   Median 128   Mode 128
1-20.
For the data of Problem 1-14:
Mean = 11.2533
Median = 11.6
Mode: none
1-21.
For the data of Problem 1-15:
Mean = 66.955
Median = 70
Mode = 45
Measures of Central tendency
Mean 66.954545   Median 70   Mode 45
1-22.
For the data of Problem 1-16:
Mean = 3.467
Median = 3
Mode = 1 and 2
Measures of Central tendency
Mean 3.4666667   Median 3   Mode 1
1-23.
For the data of Problem 1-17:
Mean = 199.875
Median = 51
Mode: none
Measures of Central tendency
Mean 199.875   Median 51   Mode #N/A
1-24.
For the data of Example 1-1:
Mean = 163,260
Median = 166,800
Mode: none
1-25.
(Using the template: “Basic Statistics.xls”, enter the data in column K.)
Basic Statistics from Raw Data
Measures of Central tendency
Mean 21.75
Median
Mode 12
13
1-26.
(Using the template: “Basic Statistics.xls”)
[Chart: frequency (axis from 0 to 2.5) of each data value; horizontal axis marks -2.6, -1.2, 0.3, 0.6, 3.4, 4.3]
Mean = .0514
Median = 0.3
Outliers: none
1-27.
Mean = 592.93
Median = 566
Std Dev = 117.03
QL = 546
QU = 618.75
Outliers: 940
Suspected Outlier: 399
1-28.
Measures of variability tell us about the spread of our observations.
1-29.
The most important measures of variability are the variance and its square root- the standard
deviation. Both reflect all the information in the data set.
1-30.
For a sample, we divide the sum of squared deviations from the mean by n – 1, rather than by n.
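To make the n versus n - 1 distinction concrete, the following Python sketch computes both versions for the data of Problem 1-16. It uses the standard `statistics` module and also spells out the sum of squared deviations by hand; the variable names are illustrative only.

```python
import statistics

# Data of Problem 1-16.
data_1_16 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]

n = len(data_1_16)
mean = sum(data_1_16) / n
ssd = sum((x - mean) ** 2 for x in data_1_16)   # sum of squared deviations

sample_variance = ssd / (n - 1)       # data treated as a sample
population_variance = ssd / n         # data treated as a population

# The library functions follow the same two conventions.
print(round(sample_variance, 4), round(statistics.variance(data_1_16), 4))
print(round(population_variance, 4), round(statistics.pvariance(data_1_16), 4))
```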
1-31.
For the data of Problem 1-13, assumed a sample: Range = 136 – 109 = 27
Variance = 57.74
Standard deviation = 7.5986
If the data is of a
Sample Population
Variance 57.7386364 55.9889807
St. Dev. 7.59859437 7.48257848
1-32.
For the data of Problem 1-14: Range = 18 – (–1.2) = 19.2
Variance = 25.90 Standard deviation = 5.0896
1-33.
For the data of Problem 1-15: Range = 98 – 38 = 60
Variance = 321.38 Standard deviation = 17.927
If the data is of a
Sample Population
Variance 321.378788 306.770661
St. Dev. 17.9270407 17.5148697
1-34.
For the data of Problem 1-16: Range = 7 – 1 = 6
Variance = 3.98 Standard deviation = 1.995
If the data is of a
Sample Population
Variance 3.98095238 3.71555556
St. Dev. 1.99523241 1.92757764
1-35.
For the data of Problem 1-17: Range = 1,209 – 23 = 1,186
Variance = 110,287.45 Standard deviation = 332.096
If the data is of a
Sample Population
Variance 110287.45 103394.484
St. Dev. 332.095543 321.550127
1-36.
$n = 33$, $\bar{x} = 126.64$, $s = 7.60$, so $\bar{x} \pm 2s = [111.44, 141.84]$; this captures 31/33 of the data points,
so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not
apply.
1-37.
$n = 15$, $\bar{x} = 11.253$, $s = 5.090$, so $\bar{x} \pm 2s = [1.073, 21.433]$; this captures 14/15 of the data points, so
Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not
apply.
1-38.
$n = 22$, $\bar{x} = 66.95$, $s = 17.93$, so $\bar{x} \pm 2s = [31.09, 102.81]$; this captures all the data points, so
Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not
apply.
1-39.
$n = 15$, $\bar{x} = 3.467$, $s = 1.995$, so $\bar{x} \pm 2s = [-0.523, 7.457]$; this captures all the data points, so
Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not
apply.
1-40.
$n = 16$, $\bar{x} = 199.9$, $s = 332.1$, so $\bar{x} \pm 2s = [-464.3, 864.1]$; this captures 15/16 of the data points,
so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not
apply.
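The checks in Problems 1-36 through 1-40 follow one pattern: form the interval of k standard deviations around the mean, count the points inside it, and compare the fraction with Chebyshev's lower bound 1 - 1/k^2. A minimal Python sketch for the data of Problem 1-14 (the case treated in 1-37), with illustrative variable names:

```python
import statistics

# Data of Problem 1-14.
data_1_14 = [-1.2, 3.9, 8.3, 9, 9.5, 10, 11, 11.6, 12.5, 13,
             14.8, 15.5, 16.2, 16.7, 18]

k = 2
mean = statistics.mean(data_1_14)
s = statistics.stdev(data_1_14)        # sample standard deviation

low, high = mean - k * s, mean + k * s
inside = sum(low <= x <= high for x in data_1_14)
fraction = inside / len(data_1_14)
bound = 1 - 1 / k ** 2                 # Chebyshev: at least 3/4 for k = 2

print(round(low, 3), round(high, 3), inside, fraction >= bound)
```

Because this sketch uses the unrounded mean and standard deviation, the interval endpoints differ in the third decimal from the hand calculation, which rounds s first.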
1-41.
[Chart by company: Electrolux, GE, Matsushita, Whirlpool, B-S, Philips, Maytag]
1-42.
[Horizontal bar chart: Stock 1 through Stock 5; axis from 0 to 20]
1-43.
Mean = 0.917
Median = 0.85
Std dev = 0.4569
[Chart "Annual Percentage Yields": yields (axis from 0 to 2.5) by bank: Chase, Citi, Fleet, HSBC, Banco Popular, North Fork, Valley Nat'l, PNC, M&T]
1-44.
1-45.
Mean = $18.53 Median = $15.93
[Pie chart "Average Book Prices": Adult Nonfiction 31%, Adult Fiction 27%, Children's HC 17%, Adult Trade 17%, Adult MM paper 8%]
1-46.
1-47.
Using MINITAB
      Stem  Leaves
4     5     5688
8     6     0123
14    6     677789
(9)   7     002223334
11    7     55667889
3     8     224
1-48.
[Box and whisker plot of C1, 34 cases; axis from 5.5 to 8.5]
There are no outliers. Distribution is skewed to the left.
1-49.
A stem-and-leaf display is a quickly drawn type of histogram useful in analyzing data. A box
plot is a more advanced display useful in identifying outliers and the shape of the distribution of
the data.
1-50.
      Stem  Leaves
1     0     5
1     1
1     2
7     3     234578
(13)  4     2234567788899
11    5     012235678
2     6     3
1     7     8
1-51.
The data are narrowly and symmetrically concentrated near the median (IQR and the whisker
lengths are small), not counting the two extreme outliers.
[Box and whisker plot of C1, 31 cases; axis from 0 to 80]
1-52.
Wider dispersion in data set #2. Not much difference in the lower whiskers or lower hinges of
the two data sets. The high value, 24, in data set #2 has a significant impact on the median,
upper hinge and upper whisker values for data set #2 with respect to data set #1.
1-53.
Mean = 127
Var = 137
sd = 11.705
mode = 127
outliers: TWA, Lufthansa
[Plot of the data; axis from 100 to 160]
1-54.
Stem-and-leaf of C2   N = 45
Leaf Unit = 1.0
f     Stem  Leaves
13    1     0011111223444
18    1     55689
(6)   2     022333
21    2     567789
15    3     0122234
8     3     78
6     4     012
3     4     7
2     5     23
1-55.
Outliers are detected by inspecting the data set or by constructing a box plot or stem-and-leaf display.
An outlier should be analyzed for information content and not merely eliminated.
1-56.
The median is the line inside the box. The hinges are the upper and lower quartiles. The inner
fences are the two points at a distance of 1.5 (IQR) from the upper and lower quartiles. Outer
fences are similar to the inner fences but at a distance of 3 (IQR). The box itself represents 50%
of the data.
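The fence definitions above translate directly into a small calculation. The Python sketch below applies them to the quartiles reported in Problem 1-27 (QL = 546, QU = 618.75); the function names and the three labels returned are illustrative choices, not taken from the text.

```python
def fences(q_lower, q_upper):
    """Inner fences at 1.5 * IQR and outer fences at 3 * IQR beyond the hinges."""
    iqr = q_upper - q_lower
    inner = (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr)
    outer = (q_lower - 3.0 * iqr, q_upper + 3.0 * iqr)
    return inner, outer


def classify(x, q_lower, q_upper):
    """Label a point by where it falls relative to the fences."""
    inner, outer = fences(q_lower, q_upper)
    if x < outer[0] or x > outer[1]:
        return "outlier"
    if x < inner[0] or x > inner[1]:
        return "suspected outlier"
    return "within inner fences"


# Quartiles reported in Problem 1-27.
q_l, q_u = 546, 618.75
print(fences(q_l, q_u))
for value in (940, 399, 566):
    print(value, classify(value, q_l, q_u))
```

With these quartiles, 940 lies beyond the outer fence and 399 lies between the inner and outer fences, which agrees with the answer given for Problem 1-27.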
1-57.
Mine A:
f     Stem  Leaves
2     3     24
4     3     57
7     4     123
(5)   4     55689
7     5     123
4     5
4     6     0
3     7     36
1     8     5
Mine B:
f     Stem  Leaves
2     2     34
4     2     89
6     3     24
9     3     578
(3)   4     034
7     4     789
4     5     012
1     5     9
Judging from the displays, values for Mine A are generally larger than for Mine B; the Mine A distribution is
right-skewed, and there are three outliers (73, 76, 85). Values for Mine B are smaller and the distribution is
almost symmetric. There is larger variance in A.
1-58.
No. One needs to use descriptive statistics and/or statistical inference.
1-59.
Comparing two data sets using Box Plots
               Lower     Lower               Upper     Upper
               Whisker   Hinge     Median    Hinge     Whisker
Shipments      1.3       1.975     2.4       3.4       4.2
Market Share   3.6       5.3       6.55      9.275     11.4
[Box plots: Shipments and Market Share]
1-60.
Mean = 5.785 median = 5.782
The mean is impacted by the high rate of fatalities for the very small car classification.
[Chart "Fatality Rates" by vehicle class: Minivans, Large cars, Very small cars, Large SUVs, Small cars, Compact pickups, Midsize SUVs, Large pickups, Small SUVs, Midsize cars]
1-61.
Answers will vary.
a. If we add the value “5” to all the data points, then the average, median, mode, first quartile,
third quartile and 80th percentile values will each increase by “5”. There is no change in the
variance, standard deviation, skewness, kurtosis, range and interquartile range values.
b. Average: if we add “5” to all the data points, then the sum of all the numbers will increase
by “5*n”, where n is the number of data points. The sum is divided by n to get the average.
So 5*n / n = 5: the average will increase by “5”.
Median: If we add “5” to all the data points, the median value will still be the midway point
in the ordered array. Its value will also increase by “5”.
Mode: Adding “5” to all the data points increases the number that occurs most frequently by
“5”.
First Quartile: adding “5” to all the data points does not change the location of the first
quartile in the ordered array of numbers, which is: (.25)(n+1) where n is the number of data
points. Whether the first quartile falls on a specific data point or between two data points,
the resulting value will have been increased by “5”.
Third Quartile: adding “5” to all the data points does not change the location of the third
quartile in the ordered array of numbers, which is: (.75)(n+1) where n is the number of data
points. Whether the third quartile falls on a specific data point or between two data points,
the resulting value will have been increased by “5”.
80th percentile: adding “5” to all the data points has the same effect as in the calculation of
the first or third quartile. The value will be increased by “5”.
Range: adding “5” to all the data points will have no effect on the calculation of the
range. Since both the highest value and the lowest value have been increased by the same
number, the subtraction of the lowest value from the highest value still yields the same value
for the range.
Variance: adding “5” to all the data points has no effect on the calculation of the variance.
Since each data point is increased by “5” and the average has also been shown to increase by
the same amount, the differences between each individual new data point and the new average
will not change and will not be affected by squaring the difference, summing the squared
differences and dividing by number of data points.
Standard Deviation: since the variance is not affected by adding “5” to each data point,
neither is the standard deviation.
Skewness: Since each data point is increased by “5” and the average has also been shown to
increase by the same amount, the differences between each individual new data point and the
new average will not change. Therefore, the numerator in the formula for skewness is not
affected. Since the standard deviation is not affected as well (the denominator), there is no
change in the value for skewness.
Kurtosis: Since each data point is increased by “5” and the average has also been shown to
increase by the same amount, the differences between each individual new data point and the
new average will not change. Therefore, the numerator in the formula for kurtosis is not
affected. Since the standard deviation is not affected as well (the denominator), there is no
change in the value for kurtosis.
Interquartile Range: given that both the first quartile and the third quartile increased by the
same amount, “5”, the difference between the two values remains the same.
c. Multiplying each data point by a factor “3” results in the following changes. The mean,
median, mode, first quartile, third quartile and 80th percentile values will be multiplied by the
same factor “3”. In addition, the standard deviation and the range will also be multiplied by the
same factor “3”. The variance will be multiplied by the factor squared (9), and the skewness and
kurtosis values will remain unchanged.
d. Multiplying all data points by a factor “3” and adding a value “5” to each data point has the
following results. The order of operation is first to multiply each data point and then add a
value to each data point. Each data point is first multiplied by the factor “3” and then the
value “5” is added to each newly multiplied data point. Multiplying each data point by the
factor “3” yields the results listed in c). Adding a value 5 to the newly multiplied data points
yields the results listed in a).
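The effects described in parts a through d can be checked numerically. The Python sketch below uses the data of Problem 1-16 purely as an illustration (any data set would do) and compares summary measures before and after adding 5, multiplying by 3, and doing both; the helper name `summary` is hypothetical.

```python
import math
import statistics


def summary(xs):
    """A few location and spread measures for a list of numbers."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "range": max(xs) - min(xs),
        "variance": statistics.variance(xs),
        "sd": statistics.stdev(xs),
    }


# Illustration only: data of Problem 1-16.
data = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]

base = summary(data)
shifted = summary([x + 5 for x in data])      # part a: add 5
scaled = summary([3 * x for x in data])       # part c: multiply by 3
both = summary([3 * x + 5 for x in data])     # part d: multiply, then add

for name, result in (("base", base), ("shifted", shifted),
                     ("scaled", scaled), ("both", both)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Location measures follow the transformation itself, while spread measures ignore the added constant and respond only to the multiplier.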
1-62.
$\bar{x} = 74.7$
$s = 13.944$
$s^2 = 194.43$
1-63.
$\mu = 504.688$
$\sigma = 94.547$
Measures of Central tendency
Mean 504.6875   Median 501.5   Mode #N/A
Range 346   IQR 149.5
Measures of Dispersion
If the data is of a
Sample Population
Variance 9227.5121 8939.15234
St. Dev. 96.0599401 94.5470906
1-64.
Step 1: Enter the data from problem 1-63 into cells Y4:Y35 of the template: Histogram.xls from Chapter
1. The template will order the data automatically.
Step 2: We need to select a starting point for the first class, an ending point for the last class, and a class
interval width. The starting point of the first class should be a value less than the smallest value
in the data set. The smallest value in the data set is 344, so you would want to set the first class
to start with a value smaller than 344. Let’s use 320. We also selected 710 as the ending value
of the last class, and selected 50 as the interval width. The data input column and the histogram
output from the template are presented below. The end-point for each class is included in that
class; i.e., the first class of data goes from more than 320 up to and including 370, the second
class starts with more than 370 up to and including 420, etc.
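The class rule described in Step 2, where each class excludes its lower end-point and includes its upper end-point, can be sketched in Python as below. The function name, the short list of values, and the choice of 8 classes (covering 320 up to 720) are assumptions for illustration; only 344 and 690 are taken from the problem as the smallest and largest observations.

```python
import math


def class_counts(values, start, width, n_classes):
    """Count values in right-closed classes (start + k*width, start + (k+1)*width]."""
    counts = [0] * n_classes
    for v in values:
        k = math.ceil((v - start) / width) - 1   # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1
    return counts


# Hypothetical values, apart from the reported minimum 344 and maximum 690.
values = [344, 370, 371, 420, 501, 690]
counts = class_counts(values, start=320, width=50, n_classes=8)

for k, count in enumerate(counts):
    print(f"({320 + 50 * k}, {320 + 50 * (k + 1)}]: {count}")
```

Note that 370 lands in the first class and 371 in the second, matching the end-point convention stated in the solution.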
1-65.
Range: 690 – 344 = 346
90th percentile lies in position: 33(90/100) = 29.7. It is 632.7.
First quartile lies in position: 33(25/100) = 8.25. It is 419.25.
Median lies in position: 33(50/100) = 16.5. It is 501.5.
Third quartile lies in position: 33(75/100) = 24.75. It is 585.75.
1-66.
1-67.
      Stem  Leaves
2     1     24
7     1     56789
(3)   2     023
6     2     55
4     3     24
2     3
2     4     01
1-68.
[Box and whisker plot of C2; axis from 12 to 42]
The data is skewed to the right.
1-69.
      Stem  Leaves
3     1     012
4     1     9
12    2     1122334
(9)   2     556677889
6     3     024
3     3     57
1     4
1     4
1     5
1     5
1     6     2
The data is skewed to the right with one extreme outlier (62) and three suspected outliers
(10, 11, 12).
1-70.
[Box and whisker plot of C1; axis from 0 to 80]
1-71.
Mean = 25.857
sd = 9.651
[Chart "Stock Prices": price (axis from 0 to 40) for media companies: Viacom, Disney, InterActiveCorp, Liberty Media, News Corp, Time Warner, Comcast]
1-72.
Mean = 18.875 var = 38.65 outliers: none
[Box and whisker plot of C1, 16 cases; axis from 10 to 34]
1-73.
Mean = 33.271
sd = 16.945
var = 287.15
QL = 25.41
Med = 26.71
QU = 35
Outliers: Morgan Stanley (91.36%)
[Box and whisker plot of C1, 15 cases; axis from 20 to 100]
1-74.
Mean = 3.18
sd = 1.348
var = 1.817
QL = 1.975
Med = 2.95
QU = 3.675
Outliers: 8.70
[Box and whisker plot of C1, 20 cases; axis from 1 to 9]
1-75.
a. IQR = 3.5
b. data is right-skewed
c. 9.5 is more likely to be the mode, since the data is right-skewed
d. Will not affect the plot.
1-76.
Bar graph showing changes over time. Both the employee’s out-of-pocket and payroll deduction
expenses have increased substantially over the last three years.
1-77.
Mean (billions of tons) = 1.439
Mean (per capita tons) = 9.98
The mathematical computation for both averages is the same; however, they differ in
meaning. On average, the countries listed emit 1.439 billion tons of carbon dioxide each,
whereas emissions per person average 9.98 tons. Dividing the total US emissions (in tons) by the
US per capita rate, we get a population estimate of 256 million people, which is close to the actual
population for 1997.
1-78.
Mean = 2.75
sd = 14.44
var = 208.59
QL = 5.075
Med = 7.9
QU = 13.675
Outliers: –30.2
[Box and whisker plot of C1, 8 cases; axis from -40 to 20]
1-79.
Mean = 10301.05
sd = 16.916
var = 286.155
(Using the template: “Basic Statistics.xls”)
Measures of Central tendency
Mean 10301.05   Median 10300.5   Mode 10300
Range 54   IQR 16.25
Measures of Dispersion
If the data is of a
Sample Population
Variance 286.155263 271.8475
St. Dev. 16.9161244 16.4877985
1-80.
Mean = 99.039
sd = .4366
var = .1907
Median = 99.155
1-81.
Mean = 17.587
sd = .466
var = .2172
Measures of Central tendency
Mean 17.5875   Median 17.5   Mode 18.3
Range 1.4   IQR 0.75
Measures of Dispersion
If the data is of a
Sample Population
Variance 0.21716667 0.20359375
St. Dev. 0.46601144 0.45121364
1-82.
Mean = 29.018
sd = 4.611
(Using the template: “Basic Statistics.xls”)
Measures of Central tendency
Mean 29.018   Median 29.75   Mode #N/A
Range 12.38   IQR 2.92
Measures of Dispersion
If the data is of a
Sample Population
Variance 21.26552 17.012416
St. Dev. 4.6114553 4.12461101
1-83.
Mean = 4.8394
sd = .08
Median = 4.86
1-84.
Stock Prices for period: April, 2001 through June, 2001 [Answers will vary due to dates used.]
a). Mean and Standard Deviation for Wal-Mart
Basic Statistics from Raw Data
Stock Prices: Wal-Mart
Measures of Central tendency
Mean 51.041478   Median 51.1266   Mode 50.158
Range 6.1911   IQR 1.9613
Measures of Dispersion
If the data is of a
Sample Population
Variance 2.25711298 2.22128579
St. Dev. 1.50236912 1.49039786
Higher Moments
If the data is of a
Sample Population
Skewness 0.07083784 0.06913994
(Relative) Kurtosis -0.711512 -0.7500338
b). Mean and Standard Deviation for K-Mart
Basic Statistics from Raw Data
Stock Prices: K-Mart
Measures of Central tendency
Mean 10.450952   Median 10.66   Mode 11.8
Range 3.51   IQR 1.955
Measures of Dispersion
If the data is of a
Sample Population
Variance 0.9852023 0.96956417
St. Dev. 0.99257358 0.9846645
Higher Moments
If the data is of a
Sample Population
Skewness -0.4070262 -0.3972703
(Relative) Kurtosis -1.132009 -1.1378913
c). Coefficient of variation:
CV = std. dev / mean
For Wal-Mart:
considering the data as a population:
CV = 1.49039786 / 51.041478 = 0.0292
considering the data as a sample:
CV = 1.50236912 / 51.041478 = 0.02943
For K-Mart:
considering the data as a population:
CV = 0.9846645 / 10.450952 = 0.0942
considering the data as a sample:
CV = 0.99257358 / 10.450952 = 0.09497
d). There is a greater degree of risk in the stock prices for K-Mart than for Wal-Mart over this
three month period.
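The comparison rests on the coefficient of variation, the standard deviation divided by the mean, which puts series with very different price levels on a common relative scale. A small Python sketch using the population figures reported in this solution (the function and dictionary names are illustrative):

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variability: standard deviation as a fraction of the mean."""
    return std_dev / mean


# (population standard deviation, mean) as reported in this solution.
series = {
    "Wal-Mart": (1.49039786, 51.041478),
    "K-Mart": (0.9846645, 10.450952),
    "DJIA": (427.913791, 10681.11),
}

cv = {name: coefficient_of_variation(sd, mean) for name, (sd, mean) in series.items()}
for name in sorted(cv, key=cv.get):
    print(f"{name}: CV = {cv[name]:.4f}")
```

Sorting by CV orders the three series from least to most relative variability over the period.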
e). For DJIA
considering the data as a population:
CV = 427.913791 / 10681.11 = 0.04006
considering the data as a sample:
CV = 431.350905 / 10681.11 = 0.04038
Wal-Mart stocks provided a less risky return for this time period relative to DJIA and K-Mart.
f). 100 shares of Wal-Mart stock purchased April 2, 2001:
Price = $50.5674 Cost = $5056.74
Mean of holding 100 shares: $5104.15
Std dev of holding 100 shares: $149.04 (rounded: if data considered a population)
$150.24 (rounded: if data considered a sample)
(Multiplying every price by 100 multiplies both the mean and the standard deviation by 100.)
1-85.
a). for a process mean = 2004
$\mathrm{VARP} = \text{Average SSD2004} + \mathrm{offset}^2$
$\mathrm{VARP} = 3.5 + \mathrm{offset}^2$
where offset = target – process
b). if target = process, then offset = 0
substituting: $\mathrm{VARP} = 3.5 + \mathrm{offset}^2 = 3.5 + 0^2 = 3.5$
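The relationship in part a) has the same form as a standard identity: the average squared deviation of the data from any target equals the population variance about the mean plus the squared offset between the target and the mean. The Python sketch below checks that identity numerically. The data values and the target are hypothetical, and reading "Average SSD" as the average squared deviation about the process mean is an assumption of this sketch.

```python
import math
import statistics


def mean_squared_deviation(values, target):
    """Average squared deviation of the values from a chosen target."""
    return sum((v - target) ** 2 for v in values) / len(values)


# Hypothetical measurements whose mean (the "process" value) is 2004.
values = [2001, 2003, 2004, 2005, 2007]
process = statistics.mean(values)
variance_about_mean = statistics.pvariance(values)

target = 2006                      # hypothetical target
offset = target - process

lhs = mean_squared_deviation(values, target)
rhs = variance_about_mean + offset ** 2
print(lhs, rhs)
```

When the target equals the process mean the offset is zero and the two quantities coincide, which is the situation in part b).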
1-86.
a) & b): CPI and gas prices for the period June 97 through May 01. (Non-seasonally adjusted series.)
The CPI index was rescaled (divided by 100) in order to compare both series on the same chart. There is no
seasonal pattern present in the CPI index. There is a steady trend in the CPI and considerable variability in
gas prices. Gas prices increased considerably more than the overall CPI for the same time period.
1-87.
a). Pie Chart: AIDS cases by Age groups
Age Group            No.       %
Under 5:             6812      0.90%
Ages 5 to 12:        1992      0.26%
Ages 13 to 19:       3865      0.51%
Ages 20 to 24:       26518     3.52%
Ages 25 to 29:       99587     13.21%
Ages 30 to 34:       168723    22.38%
Ages 35 to 39:       168778    22.39%
Ages 40 to 44:       124398    16.50%
Ages 45 to 49:       72128     9.57%
Ages 50 to 54:       38118     5.06%
Ages 55 to 59:       20971     2.78%
Ages 60 to 64:       11636     1.54%
Ages 65 or older:    10378     1.38%
[Pie chart: AIDS cases by age, with the percentages listed above]
b). Pie Chart: AIDS cases by Race
Race                             No.       %
White, not Hispanic              324822    43.09%
Black, not Hispanic              282720    37.50%
Hispanic                         137575    18.25%
Asian/Pacific Islander           5546      0.74%
American Indian/Alaska Native    2234      0.30%
Race/ethnicity unknown           1010      0.13%
[Pie chart: AIDS cases by Race, with the percentages listed above]
1-88. (Using the template: “Box Plot 2.xls”)
Comparing two data sets using Box Plots
Salaries 2004
            Lower     Lower               Upper     Upper
            Whisker   Hinge     Median    Hinge     Whisker
Cubs        300000    650000    1550000   5750000   9500000
White Sox   301000    340000    775000    3875000   8000000
[Box plots: Cubs and White Sox]
Outliers: Cubs: Sosa’s salary of $16M
White Sox: Ordonez’s salary of $14M
Furthermore, the median salary of the Cubs is twice the median salary of the White Sox. Some
players on both teams make the league minimum salary.
The salary range is somewhat lower for the White Sox than for the Cubs: only seven (7) Cubs
players were paid $500,000 or less, while eleven (11) White Sox players earned less than that amount.
1-89
[Two plots: the first with legend entries Errors, OT, Type; the second with legend entries Skill, Stress]
Correlation Table:
          Errors     OT         Type       Skill      Stress
Errors    1
OT        0.962672   1
Type      0.036243   0.065654   1
Skill     -0.89162   -0.82627   -0.00447   1
Stress    0.979628   0.926601   0.053555   -0.93428   1
There is high positive correlation between the number of errors and the amount of overtime and
stress, but a high negative correlation between the number of errors and skill level. Skill level
appears to decrease the number of errors, but overtime and stress add to the number of errors.
Overtime is highly correlated with stress and negatively correlated with skill level. Skill level
and stress are negatively correlated. The higher the skill level of the employee the lower the
stress level.