Solutions to Exercises for Chapter 3

3.1. Given:

    Y    f(Y)    cf
    6     2      14
    5     4      12
    4     3       8
    3     2       5
    2     1       3
    1     2       2

Mode: The mode is 5, the value that occurs with the highest frequency. That is, f(5) = 4.

Median:

\[ \text{Mdn} = Y_{ll} + i\left(\frac{\frac{n}{2} - \sum f_b}{f_i}\right) = 3.5 + 1\left(\frac{\frac{14}{2} - 5}{3}\right) = 4.17 \]

Mean:

\[ \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{54}{14} = 3.86 \]

3.2. First, construct the frequency distribution, and then the cumulative frequency distribution:

    Y    Tally    f(Y)    cf
    9    |         1      13
    8              0      12
    7    |         1      12
    6    |         1      11
    5    |||       3      10
    4    |||       3       7
    3    ||        2       4
    2    |         1       2
    1    |         1       1

Mode: Bimodal, 4 and 5. Both values occur with the same frequency, 3.

Median:

\[ \text{Mdn} = Y_{ll} + i\left(\frac{\frac{n}{2} - \sum f_b}{f_i}\right) = 3.5 + 1\left(\frac{\frac{13}{2} - 4}{3}\right) = 4.33 \]

Mean:

\[ \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{58}{13} = 4.46 \]

3.3. Using the key to Exercise 2.3 (Part B),

Modal Class: 65–69; 67 is the midpoint.

Median:

\[ \text{Mdn} = Y_{ll} + i\left(\frac{\frac{n}{2} - \sum f_b}{f_i}\right) = 59.5 + 5\left(\frac{\frac{52}{2} - 22}{6}\right) = 62.83 \]

Mean: Based on the keyed solution, we do not see the “raw” scores, so we use the midpoints of the intervals and the frequencies of the intervals as follows:

\[ \bar{Y} = \frac{\sum_{j=1}^{k} f_j Y_j}{n} = \frac{(17)(1) + (22)(0) + \cdots + (97)(1)}{52} = \frac{3224}{52} = 62.0 \]

Calculating the mean in this fashion introduces a bit of error; the mean of the individual scores is 62.096. This type of error is called “grouping error.” Looking at the relative magnitudes of the three measures of location, we note that the mode (67) is greater than the median (62.83), which in turn is greater than the mean (62.0). This pattern suggests some degree of negative skewness.

3.4.
Using the ungrouped solution for Exercise 2.2, the range is the largest value minus the smallest value, or 29 – 4 = 25. The quartiles and the interquartile range are:

\[ Q_1 = Y_{ll} + i\left(\frac{\frac{n}{4} - \sum f_b}{f_i}\right) = 6.5 + 1\left(\frac{\frac{60}{4} - 13}{5}\right) = 6.5 + 1\left(\frac{2}{5}\right) = 6.9 \]

\[ Q_3 = Y_{ll} + i\left(\frac{\frac{3n}{4} - \sum f_b}{f_i}\right) = 15.5 + 1\left(\frac{\frac{3(60)}{4} - 44}{2}\right) = 15.5 + 1\left(\frac{1}{2}\right) = 16.0 \]

\[ \text{IQR} = Q_3 - Q_1 = 16.0 - 6.9 = 9.10 \]

Using the grouped solution for Exercise 2.3, the range could be approximated by taking the difference between the midpoints of the highest and lowest intervals. There would probably be some degree of grouping error involved, but we could say that the range is approximately 97 – 17 = 80. The interquartile range would be calculated as follows:

\[ Q_1 = Y_{ll} + i\left(\frac{\frac{n}{4} - \sum f_b}{f_i}\right) = 49.5 + 5\left(\frac{\frac{52}{4} - 9}{6}\right) = 49.5 + 5\left(\frac{4}{6}\right) = 52.83 \]

\[ Q_3 = Y_{ll} + i\left(\frac{\frac{3n}{4} - \sum f_b}{f_i}\right) = 69.5 + 5\left(\frac{\frac{3(52)}{4} - 36}{7}\right) = 69.5 + 5\left(\frac{3}{7}\right) = 71.64 \]

\[ \text{IQR} = Q_3 - Q_1 = 71.64 - 52.83 = 18.81 \]

3.5. Using the data from Exercise 3.2,

\[ \sum_{i=1}^{n} Y_i = 58 \quad \text{and} \quad \sum_{i=1}^{n} Y_i^2 = 312 \]

We can calculate the standard deviation in two different ways. First, there is the definitional formula:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \mu)^2}{n}} \]

We do not recommend that you find σ using this formula. First, you have to find μ, which will probably involve some rounding error. Instead, we recommend that you use the computational formula, which is algebraically equivalent, as proven in Technical Note 3.2 at the end of this chapter:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}}{n}} \]

Thus, for these data,

\[ \sigma = \sqrt{\frac{312 - \frac{58^2}{13}}{13}} = \sqrt{\frac{312 - 258.77}{13}} = \sqrt{\frac{53.23}{13}} = \sqrt{4.0946} = 2.02 \]

3.6. Below is the condensed version of the R script for the solution.
The first command (source) loads our “functions” file, which contains functions for the mode, skewness, and kurtosis, and the standard errors for skewness and kurtosis, among other functions:

    # Load the helper functions file
    source("C:/R/functions.txt")
    # Read the job satisfaction data and make its columns directly accessible
    data.chap3.ex6 <- read.table("c:/bookdatar/chap2.ex2.txt", header = TRUE)
    attach(data.chap3.ex6)
    # Sample size and frequency table
    length(jobsat)
    table(jobsat)
    # Measures of location (mode() here is the version from the functions file)
    mode(jobsat)
    median(jobsat)
    mean(jobsat)
    # Measures of dispersion
    range(jobsat)
    IQR(jobsat)
    var.pop(jobsat)
    var(jobsat)
    sd(jobsat)
    # Quantile-based summaries
    quantile(jobsat)
    summary(jobsat)
    fivenum(jobsat)
    # Shape: skewness and kurtosis with their standard errors
    skewness(jobsat)
    SEsk(jobsat)
    kurtosis(jobsat)
    SEku(jobsat)

What follows is the output that appears in the R Console window, with comments interspersed:

    > length(jobsat)
    [1] 60

We can see that there are 60 observations in the set of scores:

    > table(jobsat)
    jobsat
     4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 21 22 23 25 29
     2  5  6  5  3  6  3  4  4  3  1  2  2  3  2  1  2  3  1  1  1

Looking at the output from the table function, we note that the modes of the distribution are 6 and 9, each of which occurs six times. This result is confirmed by our mode function.

    > mode(jobsat)
    [1] 6 9
    > median(jobsat)
    [1] 10.5
    > mean(jobsat)
    [1] 11.83333

Given that the mean is a little larger than the median, there may be some positive skewness in the set of scores.

    > range(jobsat)
    [1]  4 29
    > IQR(jobsat)
    [1] 9
    > var.pop(jobsat)
    [1] 35.40556
    > var(jobsat)
    [1] 36.00565
    > sd(jobsat)
    [1] 6.000471
    > quantile(jobsat)
      0%  25%  50%  75% 100%
     4.0  7.0 10.5 16.0 29.0
    > summary(jobsat)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       4.00    7.00   10.50   11.83   16.00   29.00
    > fivenum(jobsat)
    [1]  4.0  7.0 10.5 16.0 29.0
    > skewness(jobsat)
    [1] 0.8328486
    > SEsk(jobsat)
    [1] 0.3086939
    > kurtosis(jobsat)
    [1] -0.02813806
    > SEku(jobsat)
    [1] 0.608492

The skewness is about 2.7 times its standard error (0.833/0.309), that is, more than two standard errors above zero, confirming the positive skewness we suspected after comparing the median and the mean. The kurtosis is slightly negative but far smaller than its standard error, so the distribution may be platykurtic, but only to a very small degree.
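The functions file itself is not reproduced in this key, so the following is only a sketch of how three of the helpers might be written. The names var_pop_sketch, se_skew_sketch, and se_kurt_sketch are hypothetical, and the two standard-error formulas are the common ones that depend only on n; using them here is an assumption, although they do reproduce the values printed above for n = 60. The scores are rebuilt from the table output, under the hypothetical name jobsat_rebuilt.

```r
# Rebuild the 60 scores from the table() output shown above.
values <- c(4:19, 21, 22, 23, 25, 29)
freqs  <- c(2, 5, 6, 5, 3, 6, 3, 4, 4, 3, 1, 2, 2, 3, 2, 1, 2, 3, 1, 1, 1)
jobsat_rebuilt <- rep(values, times = freqs)

# Hypothetical stand-in for var.pop(): variance with n, not n - 1,
# in the denominator (rescales R's sample variance).
var_pop_sketch <- function(y) {
  n <- length(y)
  var(y) * (n - 1) / n
}

# Hypothetical stand-in for SEsk(): a commonly used standard error
# of skewness that depends only on the sample size (an assumption).
se_skew_sketch <- function(y) {
  n <- length(y)
  sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
}

# Hypothetical stand-in for SEku(): the matching standard error of
# kurtosis, built from the skewness standard error (an assumption).
se_kurt_sketch <- function(y) {
  n <- length(y)
  2 * se_skew_sketch(y) * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))
}

var_pop_sketch(jobsat_rebuilt)   # compare with var.pop(jobsat) above
se_skew_sketch(jobsat_rebuilt)   # compare with SEsk(jobsat) above
se_kurt_sketch(jobsat_rebuilt)   # compare with SEku(jobsat) above

# Reported skewness expressed in standard-error units.
0.8328486 / se_skew_sketch(jobsat_rebuilt)
```

Run on the rebuilt scores, the three functions return approximately 35.41, 0.309, and 0.608, in line with the console output, and the final ratio is roughly 2.7.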
After examining the descriptive statistics (e.g., sample size and range), it would appear that there might be two “reasonable” solutions for the histogram. Given a range of 25 for a set of 60 cases, we might think about a class interval width of two points, which should yield approximately 12–13 intervals, or a class interval width of three points, which would yield approximately eight or nine intervals. We could take a look at both solutions:

    # Class interval width of two points
    hist(jobsat, prob = TRUE, breaks = seq(3.5, 29.5, 2), xlab = 'Job Satisfaction Scores')
    lines(density(jobsat))
    rug(jitter(jobsat))
    # Class interval width of three points
    hist(jobsat, prob = TRUE, breaks = seq(2.5, 29.5, 3), xlab = 'Job Satisfaction Scores')
    lines(density(jobsat))
    rug(jitter(jobsat))

[Figure: two side-by-side histograms, each titled “Histogram of jobsat”; x-axis: Job Satisfaction Scores; y-axis: Density.]

Both of the histograms suggest that the scores are skewed to the right, or positively skewed. Looking back at the output from the table function, we can see that there is a break between 25 and 29. The histogram on the left does a better job of depicting that feature, but we don’t have a strong opinion about which of the two histograms is better.

Finally, let’s take a look at a boxplot of the Job Satisfaction scores:

    boxplot(jobsat)
    # Label the five-number summary next to the box
    f <- fivenum(jobsat)
    text(rep(1.3, 5), f, labels = c("minimum", "lower hinge", "median", "upper hinge", "maximum"))

[Figure: boxplot of jobsat, with the minimum, lower hinge, median, upper hinge, and maximum labeled.]
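The graphical impressions above can also be checked numerically. The sketch below assumes the scores can be rebuilt from the table output given earlier in this solution (the name jobsat_rebuilt is hypothetical). It counts the cases falling in each class interval for both sets of breaks, and it computes the fence that boxplot uses by default (1.5 times the distance between the hinges) to decide whether a point is drawn separately as an outlier.

```r
# Rebuild the 60 scores from the table() output in this solution.
values <- c(4:19, 21, 22, 23, 25, 29)
freqs  <- c(2, 5, 6, 5, 3, 6, 3, 4, 4, 3, 1, 2, 2, 3, 2, 1, 2, 3, 1, 1, 1)
jobsat_rebuilt <- rep(values, times = freqs)

# Number of class intervals implied by each set of breaks.
breaks_w2 <- seq(3.5, 29.5, by = 2)
breaks_w3 <- seq(2.5, 29.5, by = 3)
n_int_w2 <- length(breaks_w2) - 1
n_int_w3 <- length(breaks_w3) - 1

# Counts per interval, computed without drawing the histograms.
counts_w2 <- hist(jobsat_rebuilt, breaks = breaks_w2, plot = FALSE)$counts
counts_w3 <- hist(jobsat_rebuilt, breaks = breaks_w3, plot = FALSE)$counts
counts_w2
counts_w3

# Boxplot statistics: whisker ends, hinges, median, and any outliers.
bs <- boxplot.stats(jobsat_rebuilt)   # default coef = 1.5
bs$stats
bs$out

# Upper fence: upper hinge plus 1.5 times the hinge spread.
upper_fence <- bs$stats[4] + 1.5 * (bs$stats[4] - bs$stats[2])
upper_fence
```

With breaks of width two there are 13 intervals, and the interval from 25.5 to 27.5 is empty, which is the gap between 25 and 29 noted above; with width three there are nine intervals and none is empty. The upper fence works out to 29.5, so the maximum of 29 is not flagged as an outlier and the upper whisker extends all the way to it.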