BMI 541/699 - Assignment 2 - Solutions
BMI 699 students: please be sure you know how to do all of these problems but do not turn them in.
1. Whitlock, Chapter 2, problem 22.
(a) Frequency table.
(c) 21.
(d) 265 of 395 (the fraction 0.67) had no convictions.
(f) Skewed (right) and unimodal (mode is 0 convictions). There are no outliers.
2. Whitlock, Chapter 3, problem 16 parts c and e.
(c) Median: approximately 900 yards/minute. The frequency distribution is fairly symmetric, so the
median should lie near the middle, close to the mean.
(e) Standard deviation (s): approximately 200 yards/minute. This uses the rule of thumb that about
95% of the observations lie between the mean minus 2s and the mean plus 2s, an interval 4s wide.
From the histogram, 600 to 1400 yards/min should include about 95% of the frequency distribution,
so s ≈ (1400 − 600)/4 = 200 yards/min. This is a very rough calculation!
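The rough calculation in (e) can be written out as a short R sketch. The endpoints 600 and 1400 are the values read off the histogram in the answer above; the variable names are illustrative only.

```r
# Rough standard deviation from a histogram, using the "about 95% within 2s" rule.
lower <- 600    # yards/min, approximate lower end of the central 95%
upper <- 1400   # yards/min, approximate upper end of the central 95%

# The interval from mean - 2s to mean + 2s is 4 standard deviations wide.
s_approx <- (upper - lower) / 4
s_approx   # 200
```

The result is only as good as the endpoints read from the plot, so it should be treated as an order-of-magnitude estimate.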
3. The breast cancer data.
Read the breast cancer data set into R Commander. The data set is on the home page as
“brca.txt” and “brca.q”.
Install (if you have not already) and load the epicalc package.
For the variable fatkcal in the breast cancer data set (the amount of fat consumed in a specified
period measured in kilocalories) use R to create the plots in parts (a) through (d) and to calculate the
values requested in parts (e) and (f). Answer parts (g) and (h) based on your plots.
(a) create a histogram of fatkcal
Answer not shown.
(b) calculate the mean and median of fat consumed.
Mean fat consumed is approximately 1173 kcal. Median fat consumed is approximately 1215 kcal.
(c) is the distribution of fat consumed skewed? If so, in which direction?
Yes, to the left. This is consistent with the mean (about 1173) being below the median (about 1215).
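As a quick cross-check on the skew direction, the mean and median reported in part (b) can be compared. This is a minimal sketch: `skew_direction` is a hypothetical helper, not part of the assignment, and the mean-versus-median comparison is a heuristic that should be confirmed against the histogram. The commented-out lines assume `brca.txt` has a header row, which the source does not show.

```r
# Values reported in part (b) for fatkcal (kilocalories).
mean_fat   <- 1173
median_fat <- 1215

# Hypothetical helper: a mean below the median suggests a longer left tail.
skew_direction <- function(m, med) {
  if (m < med) "left" else if (m > med) "right" else "roughly symmetric"
}

skew_direction(mean_fat, median_fat)   # "left"

# With the data loaded, the same quantities could be computed directly.
# The file layout is an assumption; adjust read.table() to match the real file.
# brca <- read.table("brca.txt", header = TRUE)
# hist(brca$fatkcal)
# mean(brca$fatkcal); median(brca$fatkcal)
```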
4. In R, the variable rivers gives the lengths (in miles) of 141 “major” rivers in North America as
compiled by the US Geological Survey.
The following table gives all of the percentiles for the river length variable from 0 to 100 by 5.
Percentile   Value (miles)   Percentile   Value (miles)
0%           135
5%           230             55%          460
10%          255             60%          505
15%          276             65%          545
20%          291             70%          610
25%          310             75%          680
30%          330             80%          735
35%          350             85%          890
40%          375             90%          1054
45%          392             95%          1450
50%          425             100%         3710
So, for instance, the 65th percentile is equal to 545 miles.
Using just this information (and not R) answer the following questions:
(a) how long is the longest river in North America?
3710 miles
(b) what is the median river length?
425 miles
(c) what is the first quartile?
310 miles
(d) what is the third quartile?
680 miles
(e) approximately how many of these rivers (not the percent but the count) have lengths between
the first and third quartile?
0.5 × 141 = 70.5, so about 70 or 71 rivers.
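The problem asks for answers from the table alone, but the count in (e) can be cross-checked afterwards in R. The sketch below uses the built-in `rivers` vector and R's default `quantile()` method. The direct count is shown without an expected value because rivers tied at a quartile value can shift it slightly.

```r
# `rivers` ships with base R (datasets package): lengths in miles of 141 rivers.
n <- length(rivers)

# First and third quartiles, using quantile()'s default method.
q <- quantile(rivers, probs = c(0.25, 0.75))

# Half of the observations lie between the quartiles.
expected_count  <- (0.75 - 0.25) * n
range_of_counts <- c(floor(expected_count), ceiling(expected_count))

# Direct count for comparison; ties at the quartile values can change this a little.
observed_count <- sum(rivers >= q[1] & rivers <= q[2])
```

If the table in the problem was produced with the default method, `quantile(rivers, probs = seq(0, 1, by = 0.05))` should reproduce it.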
(f) find the two values which encompass the center 60 percent of the river lengths.
291 and 735 miles (the 20th and 80th percentiles)
(g) approximately 90 percent of the river lengths are shorter than what length?
1054 miles
(h) approximately what percent of these rivers are between 505 and 890 miles long?
25% (505 is the 60th percentile and 890 is the 85th, so 85% − 60% = 25%)
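Parts (f) and (h) are table lookups, which can be sketched in R by entering the percentile table as a named vector. The values are copied from the table in the problem. The names `values`, `pct_of`, and `pct_between` are illustrative only.

```r
# Percentile table from the problem statement (values in miles).
probs  <- seq(0, 100, by = 5)
values <- c(135, 230, 255, 276, 291, 310, 330, 350, 375, 392, 425,
            460, 505, 545, 610, 680, 735, 890, 1054, 1450, 3710)
names(values) <- paste0(probs, "%")

# (f) Central 60%: leave 20% in each tail, so take the 20th and 80th percentiles.
central_60 <- values[c("20%", "80%")]
central_60   # 291 and 735

# (h) Percent of rivers between two tabulated lengths:
# look up the percentile of each length and subtract.
pct_of <- function(v) probs[match(v, values)]
pct_between <- pct_of(890) - pct_of(505)
pct_between   # 25
```

The lookup in `pct_of` only works for lengths that appear exactly in the table. Other lengths would need interpolation between neighbouring rows.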
(i) do you think the distribution of the river lengths is skewed? If so, in which direction? Explain
how you used the information in the table to answer this question.
median − 10th percentile = 425 − 255 = 170, 90th percentile − median = 1054 − 425 = 629. The
distribution is skewed to the right because the median is much closer to the 10th percentile than
it is to the 90th percentile.
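The comparison in (i) can be written out as a short R sketch, using the 10th, 50th, and 90th percentiles from the table (255, 425, and 1054 miles). The variable names are illustrative only.

```r
# Values read from the percentile table (miles).
p10 <- 255
p50 <- 425
p90 <- 1054

lower_gap <- p50 - p10   # distance from the median down to the 10th percentile: 170
upper_gap <- p90 - p50   # distance from the median up to the 90th percentile: 629

# A much larger upper gap indicates a longer right tail.
skewed_right <- upper_gap > lower_gap
```

Any symmetric pair of percentiles, such as the 25th and 75th, could be used for the same comparison.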