BMI 541/699 - Assignment 2 - Solutions
BMI 699 students: please be sure you know how to do all of these problems, but do not turn them in.

1. Whitlock, Chapter 2, problem 22.
(a) Frequency table.
(c) 21.
(d) 265 of 395 (a fraction of 0.67) had no convictions.
(f) Skewed right and unimodal (the mode is 0 convictions). There are no outliers.

2. Whitlock, Chapter 3, problem 16, parts (c) and (e).
(c) Median: approximately 900 yards/minute. The frequency distribution is fairly symmetric, so the median should lie near the middle of the distribution, close to the mean.
(e) Standard deviation (s): approximately 200 yards/minute. For a roughly bell-shaped distribution, about 95% of the observations lie between the mean minus 2s and the mean plus 2s, a span of about 4s. From the histogram, 600 to 1400 yards/minute should include about 95% of the frequency distribution, so s ≈ (1400 − 600)/4 = 200 yards/minute. This is a very rough calculation!

3. The breast cancer data.
Read the breast cancer data set into R Commander. The data set is on the home page as “brca.txt” and “brca.q”. Install (if you have not already) and load the epicalc package. For the variable fatkcal in the breast cancer data set (the amount of fat consumed in a specified period, measured in kilocalories), use R to create the plots in parts (a) through (d) and to calculate the values requested in parts (e) and (f). Answer parts (g) and (h) based on your plots.
(a) Create a histogram of fatkcal.
Answer not shown.
(b) Calculate the mean and median of fat consumed.
The mean fat consumed is approximately 1173 kcal; the median is approximately 1215 kcal.
(c) Is the distribution of fat consumed skewed? If so, in which direction?
Yes, it is skewed left (consistent with the mean lying below the median).

4. In R, the variable rivers gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey. The following table gives the percentiles of river length from 0% to 100% in steps of 5.
Percentile   Value (miles)   Percentile   Value (miles)
0%           135
5%           230             55%          460
10%          255             60%          505
15%          276             65%          545
20%          291             70%          610
25%          310             75%          680
30%          330             80%          735
35%          350             85%          890
40%          375             90%          1054
45%          392             95%          1450
50%          425             100%         3710

So, for instance, the 65th percentile is 545 miles. Using only this information (not R), answer the following questions:
(a) How long is the longest river in North America? 3710 miles.
(b) What is the median river length? 425 miles.
(c) What is the first quartile? 310 miles.
(d) What is the third quartile? 680 miles.
(e) Approximately how many of these rivers (the count, not the percent) have lengths between the first and third quartiles? 0.5 × 141 = 70.5, so 70 or 71.
(f) Find the two values that encompass the center 60 percent of the river lengths. 291 and 735 miles (the 20th and 80th percentiles).
(g) Approximately 90 percent of the river lengths are shorter than what length? 1054 miles.
(h) Approximately what percent of these rivers are between 505 and 890 miles long? 25% (85% − 60%).
(i) Do you think the distribution of the river lengths is skewed? If so, in which direction? Explain how you used the information in the table to answer this question.
Median − 10th percentile = 425 − 255 = 170; 90th percentile − median = 1054 − 425 = 629. The distribution is skewed to the right, because the median is much closer to the 10th percentile than it is to the 90th percentile.
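As a check on problem 4, the table lookups for parts (a) through (i) can be written out in a few lines of Python. This is a minimal sketch that uses only the tabulated percentiles (it does not load R's rivers data); the names PERCENTILES, central_interval and percent_between are hypothetical helpers introduced for illustration, and percent_between assumes both lengths appear exactly in the table.

```python
# Percentiles of river length in miles (percentile -> value), transcribed from the table above.
PERCENTILES = {
    0: 135, 5: 230, 10: 255, 15: 276, 20: 291, 25: 310, 30: 330,
    35: 350, 40: 375, 45: 392, 50: 425, 55: 460, 60: 505, 65: 545,
    70: 610, 75: 680, 80: 735, 85: 890, 90: 1054, 95: 1450, 100: 3710,
}
N_RIVERS = 141  # number of rivers in the data set


def central_interval(pct):
    """Values bounding the central `pct` percent; assumes (100 - pct) / 2 is a tabulated percentile."""
    tail = (100 - pct) // 2
    return PERCENTILES[tail], PERCENTILES[100 - tail]


def percent_between(low_value, high_value):
    """Percent of rivers between two lengths that both appear in the table (hypothetical helper)."""
    percentile_of = {value: p for p, value in PERCENTILES.items()}
    return percentile_of[high_value] - percentile_of[low_value]


longest = PERCENTILES[100]                       # (a)
median = PERCENTILES[50]                         # (b)
q1, q3 = PERCENTILES[25], PERCENTILES[75]        # (c), (d)
count_between_quartiles = 0.50 * N_RIVERS        # (e) about half of the rivers
center_60 = central_interval(60)                 # (f) 20th and 80th percentiles
p90 = PERCENTILES[90]                            # (g)
pct_505_to_890 = percent_between(505, 890)       # (h) 85% - 60%

# (i) Compare how far the 10th and 90th percentiles sit from the median.
lower_gap = median - PERCENTILES[10]
upper_gap = PERCENTILES[90] - median
if upper_gap > lower_gap:
    skew = "right"
elif upper_gap < lower_gap:
    skew = "left"
else:
    skew = "roughly symmetric"

print(longest, median, q1, q3, count_between_quartiles)
print(center_60, p90, pct_505_to_890)
print(lower_gap, upper_gap, skew)
```

Running it prints 3710 425 310 680 70.5, then (291, 735) 1054 25, then 170 629 right: the upper gap is far larger than the lower gap, which is the pattern of a right-skewed distribution.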
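The rough standard deviation in problem 2(e) divides the central-95% span by 4, because roughly 95% of a bell-shaped distribution lies within 2s of the mean. Below is a minimal Python sketch, assuming the distribution is approximately normal and reusing the 600 and 1400 yards/minute endpoints read from the histogram. The function names are hypothetical, and the second function, which uses the exact normal quantile (about 1.96) instead of 2, is an added comparison rather than part of the original solution.

```python
from statistics import NormalDist


def sd_rule_of_thumb(lower, upper):
    """Rough SD: the central ~95% of a roughly normal distribution spans about 4 SDs."""
    return (upper - lower) / 4


def sd_from_central_95(lower, upper):
    """Same idea, using the exact normal 97.5th percentile (about 1.96) instead of 2."""
    z = NormalDist().inv_cdf(0.975)
    return (upper - lower) / (2 * z)


rough = sd_rule_of_thumb(600, 1400)      # yards/minute
refined = sd_from_central_95(600, 1400)  # yards/minute
print(rough, round(refined, 1))
```

The rule of thumb gives 200 yards/minute and the quantile-based version about 204 yards/minute; the difference is small compared with the uncertainty in reading the endpoints off a histogram.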