Solutions to Exercises for Chapter 3
3.1. Given:
Y    f(Y)   cf
6    2      14
5    4      12
4    3       8
3    2       5
2    1       3
1    2       2
Mode: The mode is 5, the value that occurs with the highest frequency. That is, f(5) = 4.
Median:
\[ Mdn = Y_{ll} + i\left(\frac{\frac{n}{2} - \sum f_b}{f_i}\right) = 3.5 + 1\left(\frac{\frac{14}{2} - 5}{3}\right) = 4.17 \]
Mean:
\[ \mu = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{54}{14} = 3.86 \]
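The three results for Exercise 3.1 can be checked with a short R sketch. The helper name grouped_median is our own and hypothetical (it is not part of the authors' functions file), and it assumes unit-width intervals whose real limits lie half a point on either side of each score.

```r
# Frequency distribution from Exercise 3.1 (values copied from the table above)
y <- c(6, 5, 4, 3, 2, 1)
f <- c(2, 4, 3, 2, 1, 2)

# Hypothetical helper: interpolated median, interval width i (assumed 1 here)
grouped_median <- function(y, f, i = 1) {
  ord <- order(y)                    # work from the lowest score upward
  y <- y[ord]
  f <- f[ord]
  n <- sum(f)
  cf <- cumsum(f)                    # cumulative frequencies
  k <- which(cf >= n / 2)[1]         # interval containing the (n/2)th case
  f_below <- if (k > 1) cf[k - 1] else 0
  lower_limit <- y[k] - i / 2        # lower real limit of that interval
  lower_limit + i * (n / 2 - f_below) / f[k]
}

mode_value <- y[f == max(f)]         # value(s) with the highest frequency
mdn <- grouped_median(y, f)
mu <- sum(y * f) / sum(f)            # mean computed from the frequency table

print(mode_value)                    # 5
print(round(mdn, 2))                 # 4.17
print(round(mu, 2))                  # 3.86
```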
3.2. First, construct the frequency distribution, and then the cumulative frequency distribution:
Y    Tally   f(Y)   cf
9    |       1      13
8            0      12
7    |       1      12
6    |       1      11
5    |||     3      10
4    |||     3       7
3    ||      2       4
2    |       1       2
1    |       1       1
Mode: Bimodal, 4 and 5. Both the values of 4 and 5 occur with the same frequency, 3.
Median:
\[ Mdn = Y_{ll} + i\left(\frac{\frac{n}{2} - \sum f_b}{f_i}\right) = 3.5 + 1\left(\frac{\frac{13}{2} - 4}{3}\right) = 4.33 \]
Mean:
\[ \mu = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{58}{13} = 4.46 \]
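The frequency and cumulative frequency distributions for Exercise 3.2 can also be built in R. This is only a sketch: the original order of the raw scores is not shown in this key, so the vector below is reconstructed from the tallies and is an assumption in that sense.

```r
# Scores reconstructed from the tally table above (original ordering unknown)
scores <- rep(c(9, 7, 6, 5, 4, 3, 2, 1), times = c(1, 1, 1, 3, 3, 2, 1, 1))

# Use a factor with levels 1 to 9 so the zero-frequency score of 8 keeps its row
freq <- table(factor(scores, levels = 1:9))
cf <- cumsum(freq)                   # cumulative frequency, lowest score upward

dist <- data.frame(Y = 1:9, f = as.vector(freq), cf = as.vector(cf))
dist <- dist[order(-dist$Y), ]       # list the highest score first, as in the key
print(dist, row.names = FALSE)

modes <- sort(dist$Y[dist$f == max(dist$f)])   # all values tied for top frequency
mu <- mean(scores)
print(modes)
print(round(mu, 2))
```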
3.3. Using the key to Exercise 2.3 (Part B),
Modal Class: 65–69, 67 is the midpoint
Median:
\[ Mdn = Y_{ll} + i\left(\frac{\frac{n}{2} - \sum f_b}{f_i}\right) = 59.5 + 5\left(\frac{\frac{52}{2} - 22}{6}\right) = 62.83 \]
Mean: Based on the keyed solution, we do not see the “raw” scores, so we use the midpoints of the
intervals and the frequencies of the intervals as follows:
\[ \mu = \frac{\sum_{j=1}^{k} Y_j f(Y_j)}{n} = \frac{(17 \times 1) + (22 \times 0) + \cdots + (97 \times 1)}{52} = \frac{3224}{52} = 62.0 \]
Calculating the mean in this fashion introduces a bit of error; the mean of the individual scores is 62.096.
This type of error is called “grouping error.”
Looking at the relative magnitudes of the three measures of location, we note that the mode is
greater than the median, which in turn is greater than the mean. This pattern suggests some degree of
negative skewness.
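Grouping error is easy to see in R. The full grouped table for Exercise 2.3 is not reproduced in this key, so the sketch below uses the job satisfaction frequencies tabulated in the solution to Exercise 3.6 as a stand-in; the choice of three-point intervals is our assumption, borrowed from one of the histograms in that solution.

```r
# Job satisfaction scores rebuilt from the frequency table in Exercise 3.6
values <- c(4:19, 21, 22, 23, 25, 29)
counts <- c(2, 5, 6, 5, 3, 6, 3, 4, 4, 3, 1, 2, 2, 3, 2, 1, 2, 3, 1, 1, 1)
jobsat <- rep(values, times = counts)

# Three-point class intervals with real limits 2.5, 5.5, ..., 29.5
breaks <- seq(2.5, 29.5, by = 3)
midpoints <- head(breaks, -1) + 1.5
interval_f <- as.vector(table(cut(jobsat, breaks = breaks)))

# Mean from midpoints and interval frequencies versus mean of the raw scores
grouped_mean <- sum(midpoints * interval_f) / sum(interval_f)
raw_mean <- mean(jobsat)
grouping_error <- grouped_mean - raw_mean

print(c(grouped = grouped_mean, raw = raw_mean, error = grouping_error))
```

The two means differ slightly because every score in an interval is treated as if it sat at the interval midpoint.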
3.4. Using the ungrouped solution for Exercise 2.2,
the range is the largest value minus the smallest value, or 29 – 4 = 25.
\[ Q_1 = Y_{ll} + i\left(\frac{\frac{n}{4} - \sum f_b}{f_i}\right) = 6.5 + 1\left(\frac{\frac{60}{4} - 13}{5}\right) = 6.5 + 1\left(\frac{2}{5}\right) = 6.9 \]
\[ Q_3 = Y_{ll} + i\left(\frac{\frac{3n}{4} - \sum f_b}{f_i}\right) = 15.5 + 1\left(\frac{\frac{3(60)}{4} - 44}{2}\right) = 15.5 + 1\left(\frac{1}{2}\right) = 16.0 \]
IQR = Q3 - Q1 = 16 - 6.9 = 9.10
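The ungrouped quartile calculations can be sketched in R as follows. We assume here that the Exercise 2.2 scores are the job satisfaction data whose frequency table appears in the solution to Exercise 3.6 (same n of 60 and same range). The helper grouped_quantile is hypothetical, written only for this illustration.

```r
# Frequency distribution of the job satisfaction scores (from the 3.6 table)
values <- c(4:19, 21, 22, 23, 25, 29)
counts <- c(2, 5, 6, 5, 3, 6, 3, 4, 4, 3, 1, 2, 2, 3, 2, 1, 2, 3, 1, 1, 1)

# Hypothetical helper: interpolated quantile for proportion p.
# Assumes y is sorted ascending and each score spans y - i/2 to y + i/2.
grouped_quantile <- function(y, f, p, i = 1) {
  n <- sum(f)
  cf <- cumsum(f)
  target <- p * n                      # n/4 for Q1, 3n/4 for Q3
  k <- which(cf >= target)[1]          # interval containing the target case
  f_below <- if (k > 1) cf[k - 1] else 0
  (y[k] - i / 2) + i * (target - f_below) / f[k]
}

q1 <- grouped_quantile(values, counts, 0.25)
q3 <- grouped_quantile(values, counts, 0.75)
iqr <- q3 - q1

print(c(Q1 = q1, Q3 = q3, IQR = iqr))  # 6.9, 16.0, 9.1
```

R's built-in quantile() and IQR() use a different interpolation rule by default, so their results need not match these hand calculations exactly.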
Using the grouped solution for Exercise 2.3, the range could be approximated by taking the difference
between the midpoints of the intervals. There would probably be some degree of grouping error involved,
but we could say that the range is approximately 97 – 17 = 80. The interquartile range would be
calculated as follows:
\[ Q_1 = Y_{ll} + i\left(\frac{\frac{n}{4} - \sum f_b}{f_i}\right) = 49.5 + 5\left(\frac{\frac{52}{4} - 9}{6}\right) = 49.5 + 5\left(\frac{4}{6}\right) = 52.83 \]
\[ Q_3 = Y_{ll} + i\left(\frac{\frac{3n}{4} - \sum f_b}{f_i}\right) = 69.5 + 5\left(\frac{\frac{3(52)}{4} - 36}{7}\right) = 69.5 + 5\left(\frac{3}{7}\right) = 71.64 \]
IQR = Q3 - Q1 = 71.64 - 52.83 = 18.81
3.5. Using the data from Exercise 3.2,
\[ \sum_{i=1}^{n} Y_i = 58 \quad \text{and} \quad \sum_{i=1}^{n} Y_i^2 = 312 \]
We can calculate the standard deviation σ in two different ways. First, there is the definitional formula:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{n}} \]
We do not recommend that you find σ using this formula. First, you have to find μ, which will probably
involve some rounding error. Instead, we recommend that you use the computational formula, which is
algebraically equivalent, as proven in Technical Note 3.2 at the end of this chapter:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}}{n}} \]
Thus, for these data,
\[ \sigma = \sqrt{\frac{312 - \frac{58^2}{13}}{13}} = \sqrt{\frac{312 - 258.77}{13}} = \sqrt{\frac{53.23}{13}} = \sqrt{4.0946} = 2.02 \]
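Both formulas can be checked against each other in R. This is a sketch only; the scores are reconstructed from the Exercise 3.2 frequency distribution, since the raw data are not listed in this key.

```r
# Scores reconstructed from the Exercise 3.2 frequency distribution
y <- rep(c(9, 7, 6, 5, 4, 3, 2, 1), times = c(1, 1, 1, 3, 3, 2, 1, 1))
n <- length(y)

sum_y <- sum(y)        # 58
sum_y2 <- sum(y^2)     # 312

# Definitional formula: root of the mean squared deviation from mu
mu <- mean(y)
sigma_def <- sqrt(sum((y - mu)^2) / n)

# Computational formula: uses only the two sums and n
sigma_comp <- sqrt((sum_y2 - sum_y^2 / n) / n)

print(round(c(definitional = sigma_def, computational = sigma_comp), 2))  # both 2.02
```

Note that R's built-in sd() divides by n - 1 rather than n, so it returns a slightly larger value for the same scores.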
3.6. The condensed version of the R script for the solution follows. The first command (source) loads our
“functions” file, which contains functions for the mode, skewness, and kurtosis, and the standard errors
for skewness and kurtosis, among other functions:
source("C:/R/functions.txt")
data.chap3.ex6 <- read.table("c:/bookdatar/chap2.ex2.txt", header = TRUE)
attach(data.chap3.ex6)
length(jobsat)
table(jobsat)
mode(jobsat)
median(jobsat)
mean(jobsat)
range(jobsat)
IQR(jobsat)
var.pop(jobsat)
var(jobsat)
sd(jobsat)
quantile(jobsat)
summary(jobsat)
fivenum(jobsat)
skewness(jobsat)
SEsk(jobsat)
kurtosis(jobsat)
SEku(jobsat)
What follows is the output that appears in the R Console window, with comments interspersed:
> length(jobsat)
[1] 60
We can see that there are 60 observations in the set of scores:
> table(jobsat)
jobsat
 4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 21 22 23 25 29
 2  5  6  5  3  6  3  4  4  3  1  2  2  3  2  1  2  3  1  1  1
Looking at the output from the table function, we note that the modes of the distribution are 6 and 9. This
result is confirmed by our mode function.
> mode(jobsat)
[1] 6 9
> median(jobsat)
[1] 10.5
> mean(jobsat)
[1] 11.83333
Given that the mean is a little larger than the median, there may be some positive skewness in the set of
scores:
> range(jobsat)
[1]  4 29
> IQR(jobsat)
[1] 9
> var.pop(jobsat)
[1] 35.40556
> var(jobsat)
[1] 36.00565
> sd(jobsat)
[1] 6.000471
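The two variance results above differ only in their divisors. The sketch below rebuilds the scores from the table output and computes both; pop_var is our own hypothetical stand-in for var.pop, whose source is not shown here, and we assume it divides the sum of squares by n.

```r
# Job satisfaction scores rebuilt from the table() output above
values <- c(4:19, 21, 22, 23, 25, 29)
counts <- c(2, 5, 6, 5, 3, 6, 3, 4, 4, 3, 1, 2, 2, 3, 2, 1, 2, 3, 1, 1, 1)
jobsat <- rep(values, times = counts)
n <- length(jobsat)

ss <- sum((jobsat - mean(jobsat))^2)   # sum of squared deviations

pop_var <- ss / n          # assumed definition of var.pop: divide by n
samp_var <- var(jobsat)    # base R var(): divides by n - 1

print(c(pop_var = pop_var, samp_var = samp_var, sd = sqrt(samp_var)))
```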
> quantile(jobsat)
  0%  25%  50%  75% 100%
 4.0  7.0 10.5 16.0 29.0
> summary(jobsat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.00    7.00   10.50   11.83   16.00   29.00
> fivenum(jobsat)
[1]  4.0  7.0 10.5 16.0 29.0
> skewness(jobsat)
[1] 0.8328486
> SEsk(jobsat)
[1] 0.3086939
> kurtosis(jobsat)
[1] -0.02813806
> SEku(jobsat)
[1] 0.608492
The measure of skewness and its standard error suggest that the distribution is more than two standard
errors skewed, confirming what we noted after seeing the median and the mean.
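The "more than two standard errors" statement can be made concrete by dividing each statistic by its standard error. The authors' SEsk and SEku functions are not reproduced in this key; the formulas below are commonly used ones that happen to reproduce the printed values for n = 60, so treat their match to the authors' code as an assumption.

```r
n <- 60

# Commonly used standard errors for skewness and kurtosis (assumed, see above)
se_sk <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_ku <- 2 * se_sk * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))

# Statistics copied from the console output above
skew <- 0.8328486
kurt <- -0.02813806

ratio_sk <- skew / se_sk   # skewness in standard-error units
ratio_ku <- kurt / se_ku   # kurtosis in standard-error units

print(round(c(se_sk = se_sk, se_ku = se_ku,
              ratio_sk = ratio_sk, ratio_ku = ratio_ku), 4))
```

The skewness ratio exceeds 2, while the kurtosis ratio is very close to zero, in line with the interpretation given above.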
The kurtosis statistics suggest that the distribution may be platykurtic, but only to a very small
degree. After examining the descriptive statistics (e.g., sample size and range), it would appear that there
might be two “reasonable” solutions for the histogram. Given a range of 25 for a set of 60 cases, we
might think about a class interval width of two points, which should yield approximately 12–13 intervals,
or a class interval width of three points, which would yield approximately eight or nine intervals. We
could take a look at both solutions:
hist(jobsat,prob=T,breaks=seq(3.5,29.5,2),xlab='Job Satisfaction Scores')
lines(density(jobsat))
rug(jitter(jobsat))
hist(jobsat,prob=T,breaks=seq(2.5,29.5,3),xlab='Job Satisfaction Scores')
lines(density(jobsat))
rug(jitter(jobsat))
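As a quick check on the break points used in the two hist calls above, the number of class intervals is one less than the number of breaks. The sketch below only counts intervals and confirms that the breaks cover the observed range of 4 to 29; the variable names are our own.

```r
score_min <- 4
score_max <- 29

breaks_w2 <- seq(3.5, 29.5, by = 2)    # class interval width of two points
breaks_w3 <- seq(2.5, 29.5, by = 3)    # class interval width of three points

n_intervals_w2 <- length(breaks_w2) - 1
n_intervals_w3 <- length(breaks_w3) - 1

# Both sets of breaks must span every observed score
covers_w2 <- min(breaks_w2) < score_min && max(breaks_w2) > score_max
covers_w3 <- min(breaks_w3) < score_min && max(breaks_w3) > score_max

print(c(width2 = n_intervals_w2, width3 = n_intervals_w3))  # 13 and 9
print(c(covers_w2, covers_w3))
```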
[Figure: two side-by-side histograms, each titled "Histogram of jobsat", with Job Satisfaction Scores on the horizontal axis and Density on the vertical axis.]
Both of the histograms suggest that the scores are skewed to the right, or positively skewed. Looking back
at the output from the table function, we can see that there is a break between 25 and 29. The histogram
on the left does a better job of depicting that feature, but we don’t have very strong opinions about which
choice of histograms is best.
Finally, let’s take a look at a boxplot of the Job Satisfaction scores:
boxplot(jobsat)
f=fivenum(jobsat)
text(rep(1.3,5),f,labels=c("minimum","lower hinge","median",
"upper hinge","maximum"))
[Figure: boxplot of jobsat, with text labels marking the minimum, lower hinge, median, upper hinge, and maximum.]