HOMEWORK 4
Due: Feb. 5

1. Suppose the scores on an achievement test follow an approximately symmetric, mound-shaped distribution with mean 500, min = 350, and max = 650. Which of the following is the most likely value of the standard deviation: 10, 50, 100, or 150? Justify your answer. A sketch may help.

By the standard deviation rule, nearly all values in a mound-shaped distribution fall within about three standard deviations on each side of the mean. Since (650 – 500) ÷ 3 = 50 and (500 – 350) ÷ 3 = 50, the most likely value for the standard deviation is 50.

2. What if the diameters of a sample of new tires coming off one production line turned out to have a standard deviation of 0? Would the manufacturer be happy or unhappy, assuming the average diameter was correct? Explain.

Very happy. The standard deviation can be 0 only if every tire has exactly the same diameter, which is the consistency of product the manufacturer would hope for.

3. For each of the following cases, indicate which would give a better summary of the data: the five-number summary (min, Q1, median, Q3, max) or the mean and standard deviation. Explain your choice.

a. Verbal SAT scores for 205 students entering a local college in the fall of 2002:

Since the distribution is mound-shaped and roughly symmetric, the mean and standard deviation do a good job of summarizing it. We would not need the five-number summary for this data set.

b. Recent sales prices of homes in a local neighborhood (in thousands of dollars):

[Dot plot of sale prices; horizontal axis from 200 to 2400, in thousands of dollars]

Here the distribution is strongly skewed, so the five-number summary would be better: the median and quartiles are not pulled by the long tail the way the mean and standard deviation are.

5. Extra Credit: I had four sets of data. I have the variables, and I made a histogram and a boxplot for each, and found the summary statistics for each. Somehow they got mixed up.
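The four summaries named in this problem (mean, median, standard deviation, IQR) can be computed with Python's standard `statistics` module. This is only an illustrative sketch: `summarize` is a hypothetical helper, the sample is made-up data rather than any of the four data sets, and the IQR depends on the quartile convention (here the module's default `'exclusive'` method), so other software may report a slightly different IQR.

```python
import statistics


def summarize(data):
    """Return mean, median, sample standard deviation, and IQR of data.

    Hypothetical helper for illustration; quartiles use the default
    'exclusive' method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)  # [Q1, median, Q3]
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample (n - 1) standard deviation
        "iqr": q3 - q1,
    }


# Made-up, right-skewed sample (not data from this homework).
sample = [1, 1, 2, 2, 3, 3, 4, 6, 15, 23]
for name, value in summarize(sample).items():
    print(f"{name}: {value:.2f}")
```

On this made-up sample the mean (6) is twice the median (3), the pattern that signals a right-skewed distribution.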
(Match each variable to the correct histogram, boxplot, and summary statistics.)

Variables:
A. Age at death of a sample of people
B. Heights of a class of college students
C. Number of medals won by medal-winning countries in the 2008 Olympics
D. Random numbers between 0 and 9 generated by a computer

[Figures: histograms a, b, c, d and boxplots i, ii, iii, iv (images not reproduced)]

Summary statistics (each row goes with one set of data):
SS1: Mean: 72.82, Median: 75, Standard deviation: 15.51, IQR: 20
SS2: Mean: 6.07, Median: 3, Standard deviation: 8.91, IQR: 4
SS3: Mean: 4.1, Median: 4, Standard deviation: 2.808, IQR: 4
SS4: Mean: 67.8, Median: 68, Standard deviation: 4.22, IQR: 6.5

First, match the variables with the histograms.

A. Age at death of a sample of people: Most people die at an older age, around 70 or 80; fewer die younger, and very few die at a very young age. Thus we can expect a left-skewed distribution, which is histogram c.

B. Heights of a class of college students: The distribution of heights of college students is usually fairly mound-shaped and symmetric. Most students are close to average height; fewer are taller or shorter, and very few are very tall or very short. Histogram b shows a roughly symmetric, mound-shaped distribution.

C. Number of medals won by medal-winning countries in the 2008 Olympics: Most medal-winning countries got one or two medals, fewer got four or more, and very few got many medals. Thus we can expect a right-skewed distribution, which is histogram d.

D. Random numbers between 0 and 9 generated by a computer: Since each number between 0 and 9 has the same chance to come up, we can expect a fairly uniform distribution, so this is histogram a.

Now match the histograms to the boxplots.

Boxplot i clearly goes with histogram d: both show a distribution that is highly skewed to the right.
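Reading skew and outliers off a boxplot comes down to the five-number summary and the whisker rule. Below is a minimal sketch assuming the common 1.5 × IQR convention for flagging outliers; the helper `boxplot_stats` and its data are hypothetical, and the software that drew boxplots i to iv may use a slightly different quartile or whisker rule.

```python
import statistics


def boxplot_stats(data):
    """Five-number summary, whisker ends, and outliers (1.5 x IQR rule).

    Hypothetical helper; quartiles use statistics.quantiles' default
    'exclusive' method.
    """
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "five_number": (xs[0], q1, median, q3, xs[-1]),
        # Whiskers stop at the most extreme values that are not outliers.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Made-up right-skewed counts, shaped like a "medals won" variable.
counts = [1, 1, 1, 2, 2, 2, 3, 4, 6, 9, 30]
print(boxplot_stats(counts))
```

For this made-up sample the upper half of the box (median 2 to Q3 6) is much longer than the lower half (Q1 1 to median 2), the upper whisker reaches 9, and 30 is flagged as an outlier. This is the kind of picture that ties a boxplot like i to a right-skewed histogram like d.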
Boxplot iii shows a distribution that is skewed to the left, with one outlier, which matches histogram c.

Boxplots ii and iv are a bit trickier, since they are very similar. However, the distribution of heights (histogram b) is slightly skewed to the left, just like boxplot ii. Boxplot iv shows a symmetric distribution, perhaps a tiny bit skewed to the right, which matches histogram a.

Now let’s match the summary statistics with the variables.

SS3 shows that the mean and the median are almost the same, meaning that the distribution is nearly symmetric. The same is true for SS4. We had two fairly symmetric distributions: the random numbers and the heights. It doesn’t make sense to say that the mean height of college students is 4.1, so it must be 67.8. Thus SS4 goes with the heights, and SS3 goes with the random numbers.

We have two variables left, age at death and medals won, and two summaries left, SS1 and SS2. Again, it doesn’t make sense to say that the mean age at death was 6.07, so SS2 must belong to the variable “medals won”, and SS1, with a mean of 72.82, must belong to the variable “age at death” (as this mean makes sense for age at death).

You can also compare the mean and the median again. For SS1, median > mean, which usually means that the distribution is left-skewed, and that is “age at death”. For SS2, mean > median, which means that the distribution is right-skewed, so it must belong to the variable “medals won”.

So, here is the solution (variable/histogram/boxplot/summary statistics):
A/c/iii/SS1
B/b/ii/SS4
C/d/i/SS2
D/a/iv/SS3
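The mean-versus-median comparison used above can be made a little more systematic by scaling the gap by the standard deviation, which gives Pearson's second (median) skewness coefficient, 3(mean - median)/sd. The sketch below applies it to the four summaries given in the problem. The helper names are hypothetical, and the ±0.3 cutoff for calling a distribution "roughly symmetric" is an arbitrary assumption chosen for illustration, not a standard threshold.

```python
# Summary statistics given in the problem: (mean, median, standard deviation).
SUMMARIES = {
    "SS1": (72.82, 75, 15.51),
    "SS2": (6.07, 3, 8.91),
    "SS3": (4.1, 4, 2.808),
    "SS4": (67.8, 68, 4.22),
}


def median_skewness(mean, median, sd):
    """Pearson's second (median) skewness coefficient."""
    return 3 * (mean - median) / sd


def shape(mean, median, sd, threshold=0.3):
    """Label the shape from summary statistics.

    The 0.3 cutoff is an arbitrary illustrative assumption.
    """
    k = median_skewness(mean, median, sd)
    if k > threshold:
        return "right-skewed"
    if k < -threshold:
        return "left-skewed"
    return "roughly symmetric"


for name, (mean, median, sd) in SUMMARIES.items():
    k = median_skewness(mean, median, sd)
    print(f"{name}: {k:+.2f} -> {shape(mean, median, sd)}")
```

With these numbers SS1 comes out left-skewed (about -0.42), SS2 right-skewed (about +1.03), and SS3 and SS4 roughly symmetric (about +0.11 and -0.14), consistent with the matching above. The coefficient only narrows the choice: telling SS3 from SS4 still relies on the context check that a mean height of 4.1 makes no sense.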