Stat 11
February 5, 2008
Homework #2 - SOLUTIONS

This homework is due at the start of class on the date due. You may work in groups, consult with others, or use any references or tools that seem useful, but you must write up your own solutions.

1. Getting at the truth from a set of measurements. Consider the data for exercise 1.34 (page 37).

a. What is your best estimate of the earth’s density based on these measurements? (You should make a dot plot or histogram, but you don’t have to turn it in. Then you should pick your favorite statistic for this purpose --- mean, median, some kind of trimmed mean, RMS or geometric mean, whatever --- or make up your own approach.)

The median (5.46) and the mean (about 5.45) are both good choices.

b. What statistic did you use for part a?

So would be the midmean, or just about any other measure of location.

c. In class we considered 12 reported measurements of the boiling point of seawater:

218.5, 214.0, 216.5, 218.0, 219.0, 0.0, 221.0, 220.0, 104.0, 218.5, 215.5, 216.0

(The sum of these 12 numbers is 2281.0.) Based on these numbers, what is your best estimate of the true boiling point?

My favorite method is to delete the outliers (0.0 and 104.0) --- in part because we can see that they arose from something other than ordinary measurement errors --- and then to take the mean of what’s left. That gives 217.7. Any value above about 215 is sensible.

d. Did you use the same method for parts a and c? If not, why not?

There’s no absolute rule for how to choose a central value for a variable; it depends on the shape of the distribution, how the values were obtained, and why you need a central value. Here, there’s really no difference in method. We would have excluded outliers in part a, too, if there had been any.

2. Combining medians of groups. In class I remarked that you can’t usually determine the median of a group from the medians of two subgroups. That’s true.
But I also said that it’s possible for the median score of the men in the class to be 50, and the median score of the women in the class to be 50, but the median of the whole class to be something else. Was that right? Do ONE of these:

a. Give an example of actual scores for a group of men and women for which
   --- the median of the men is 50
   --- the median of the women is 50
   --- the median of the entire group is different from 50
OR
b. Explain why the median of the entire group would necessarily be 50.

For problem 2: The median of the combined group would necessarily be 50. If half the women are below 50 and half the men are below 50, then half of everybody must be below 50. Similarly above 50. (Does it matter whether there are an odd or even number in each group? If you analyze the cases carefully, you get the same result in each case.)

3. For one measurement the scores of the men were 70, 50, 50, 90 and the scores of the women were 30, 90, 80, 40, 60, 80, 60, 40.

a. What was the midmean of the men’s scores?
60 (average of the middle two values, 50 and 70)

b. What was the midmean of the women’s scores?
60 (average of the middle 4 values: 40, 60, 60, 80)

c. What was the midmean of all the scores combined?
61 2/3 (average of the middle 6 values: 50, 50, 60, 60, 70, 80)

4. Construct a 5-number summary and a box plot for the combined scores in problem 3.

Five numbers: min, Q1, med, Q3, max = 30, ?, 60, 80, 90. You could get away with anything from 40 to 50 for Q1, since if you count off 3 observations from the bottom you end between these values. The text’s method gives 45.

5. What is the 40th percentile for the combined scores in problem 3?
50

Normal distributions

6. Scores on a typical IQ test have mean 100, standard deviation 16. Assume that they are normally distributed.

a. What fraction of the scores are between 84 and 116?
That’s one standard deviation above and below the mean, so the answer is 68%.

b. An article in Parade Magazine reported that Sharon Stone has an IQ of 160. About what fraction of people taking this test would score at or above 160?
160 is 3.75 standard deviations above the mean [ (160-100)/16 = 3.75 ]. But my table only goes up to 3.00, and the book’s Table A only goes to 3.49, so you need to (a) guess or (b) use Excel or a calculator to find Φ(3.75) = 0.999912. That’s the fraction of scores below 160, so the answer to the question is 0.000088, or about 88 per million.

c. Ginger’s score was at the 80th percentile. What was her score?
In the table, find the value z such that Φ(z) = 0.80. It’s 0.84 (closest approximation in either table), so the 80th percentile is 0.84 standard deviations above the mean. In this case, this means a score of 100 + (0.84 times 16) = about 113.

You should be able to answer the next two problems from pictures and pure thought, without calculation. Of course, you can calculate if you like.

7. Assume that X is normally distributed with mean 0.0 and standard deviation 5.0, and Y is normally distributed with mean 0.0 and standard deviation 10.0. Which variable has a larger fraction of its values above +1?

Y has a larger fraction above +1. For X, we’re asking what fraction is more than 0.2 sd’s above the mean. For Y, we’re asking what fraction is more than 0.1 sd’s above the mean. It’s easier to be 0.1 sd’s above than 0.2 sd’s above, so the second answer must be larger.

8. Assume that X is normally distributed with mean 5.0 and standard deviation 10.0, and Y is normally distributed with mean 10.0 and standard deviation 5.0. Which variable has a larger fraction of its values above +20?

To be above +20, a Y value would have to be two standard deviations above the mean. But an X value would only have to be 1 1/2 standard deviations above the mean, so X has a larger fraction above +20.

Associations and Correlations

9. Draw what you think a scatterplot would look like for each of these three pairs of variables.
Label your axes.

a. Apples: weight in grams, weight in ounces.
The dots are exactly on a line.

b. College freshmen: reported shoe size, grade point average. (Is shoe size bimodal? Does that show in the scatterplot?)
There might be one symmetrical blob or two side-by-side blobs, depending on how much women’s shoe sizes overlap men’s.

c. Gasoline: days since your last fill-up, gallons remaining in your tank.
Negative association, maybe not very linear, with some randomness added.

10. Can you reconstruct the distributions of both variables from a scatterplot?

[Scatterplot: BOATS (vertical axis, ticks from 6 to 7) against CARS (horizontal axis, ticks from 2.75 to 4.25). A horizontal line on the plot is annotated: "Q1 is here because 5 of 20 BOAT values are below this line."]

a. In this scatterplot, what are the minimum and maximum values for the CARS variable?
Minimum --- about 3.05
Maximum --- about 4.00

b. Can you reconstruct the entire 5-number summary for the BOATS variable? (That is --- min, Q1, median, Q3, max.)
(all values approx.) Min = 6.05, Q1 = 6.25, Median = 6.52, Q3 = 6.6, Max = 6.81

(end)
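For anyone who would rather check the arithmetic in problems 1c and 3 through 8 by machine than by table, here is a minimal Python sketch using only the standard library. It is not part of the original solutions. The helper names (midmean, upper_tail) are made up for illustration, and two conventions are assumptions: quartiles are taken as medians of the lower and upper halves, and the 40th percentile uses a nearest-rank rule. Other conventions can give slightly different values.

```python
from math import ceil
from statistics import NormalDist, mean, median

# Problem 1c: the twelve reported boiling points quoted in the solutions.
boiling = [218.5, 214.0, 216.5, 218.0, 219.0, 0.0,
           221.0, 220.0, 104.0, 218.5, 215.5, 216.0]
kept = [x for x in boiling if x not in (0.0, 104.0)]  # drop the two outliers
trimmed_mean = mean(kept)


def midmean(values):
    """Mean of the middle half of the data (assumes len(values) is a multiple of 4)."""
    s = sorted(values)
    q = len(s) // 4
    return mean(s[q:len(s) - q])


# Problem 3: scores as given in the problem statement.
men = [70, 50, 50, 90]
women = [30, 90, 80, 40, 60, 80, 60, 40]
combined = sorted(men + women)

# Problem 4: quartiles as medians of the lower and upper halves
# (an assumption about the text's method; other conventions differ slightly).
half = len(combined) // 2
five_numbers = (combined[0], median(combined[:half]), median(combined),
                median(combined[-half:]), combined[-1])

# Problem 5: nearest-rank percentile (an assumption about the rule used).
p40 = combined[ceil(0.40 * len(combined)) - 1]

# Problem 6: IQ scores modeled as Normal(mean 100, sd 16).
iq = NormalDist(mu=100, sigma=16)
within_one_sd = iq.cdf(116) - iq.cdf(84)
at_or_above_160 = 1 - iq.cdf(160)
percentile_80 = iq.inv_cdf(0.80)


def upper_tail(mu, sigma, cutoff):
    """Fraction of a Normal(mu, sigma) variable lying above cutoff."""
    return 1 - NormalDist(mu, sigma).cdf(cutoff)


# Problems 7 and 8: compare upper-tail fractions for X and Y.
p7_x, p7_y = upper_tail(0, 5, 1), upper_tail(0, 10, 1)
p8_x, p8_y = upper_tail(5, 10, 20), upper_tail(10, 5, 20)

print(trimmed_mean, midmean(men), midmean(women), midmean(combined))
print(five_numbers, p40)
print(round(within_one_sd, 4), at_or_above_160, round(percentile_80, 1))
print(p7_y > p7_x, p8_x > p8_y)
```

NormalDist.cdf and NormalDist.inv_cdf play the role of the printed normal table here, which is why the sketch can handle z = 3.75 even though the tables stop at 3.00 or 3.49.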