Ch. 5-6 Review Exercises

1. Prenatal care. Results of a 1996 American Medical Association report about the infant mortality rate for twins carried for the full term of a normal pregnancy are shown below, broken down by the level of prenatal care the mother had received.

Full-Term Pregnancies,     Infant Mortality Rate among Twins
Level of Prenatal Care     (deaths per thousand live births)
Intensive                  5.4
Adequate                   3.9
Inadequate                 6.1
Overall                    5.1

a) Is the overall rate the average of the other three rates? Should it be? Explain.
b) Do these results indicate that adequate prenatal care is important for pregnant women? Explain.
c) Do these results suggest that a woman pregnant with twins should be wary of seeking too much medical care? Explain.

2. Singers. The boxplots shown display the heights (in inches) of 130 members of a choir. It appears that the median height for sopranos is missing, but actually the median and the upper quartile are equal. How could that happen?

3. Beanstalks. Beanstalk Clubs are social clubs for very tall people. To join, a man must be over 6'2" tall, and a woman must be over 5'10". The National Health Survey suggests that heights of adults may be Normally distributed, with mean heights of 69.1" for men and 64" for women. The respective standard deviations are 2.8" and 2.5".
a) You are probably not surprised to learn that men are generally taller than women, but what does the greater standard deviation for men's heights indicate?
b) Who are more likely to qualify for Beanstalk membership, men or women?

4. Bread. Clarksburg Bakery is trying to predict how many loaves to bake. In the last 100 days, they have sold between 95 and 140 loaves per day. Here is a histogram of the number of loaves they sold for the last 100 days.
a) Describe the distribution.
b) Which should be larger, the mean number of sales or the median? Explain.
c) Here are the summary statistics for Clarksburg Bakery's bread sales.
Use these statistics and the histogram above to create a boxplot. You may approximate the values of any outliers.

Summary of Sales
Median        100
Min           95
Max           140
25th %tile    97
75th %tile    105.5

5. Acid Rain. Based on long-term investigation, researchers have suggested that the acidity (pH) of rainfall in the Shenandoah Mountains can be described by the Normal model N(4.9, 0.6).
a) Draw and carefully label the model.
b) What percent of storms produce rain with pH over 6?
c) What percent of storms produce rain with pH under 4?
d) The lower the pH, the more acidic the rain. What is the pH level for the most acidic 20% of all storms?
e) What is the pH level for the least acidic 5% of all storms?
f) What is the IQR for pH of rainfall?

6. Fraud detection. A credit card bank is investigating the incidence of fraudulent card use. The bank suspects that the type of product bought may provide clues to the fraud. To examine this situation, the bank looks at the Standard Industrial Code (SIC) of the business related to the transaction. This is a code that was used by the U.S. Census Bureau and Statistics Canada to identify the type of business of every registered business in North America. For example, 1011 designates Meat and Meat Products (except Poultry), 1012 is Poultry Products, 1021 is Fish Products, 1031 is Canned and Preserved Fruits and Vegetables, and 1032 is Frozen Fruits and Vegetables. A company intern produces the following histogram of the SIC codes for 1536 transactions:

He also reports that the mean SIC is 5823.13 with a standard deviation of 488.17.
a) Comment on any problems you see with the use of the mean and standard deviation as summary statistics.
b) How well do you think the Normal model will work on these data? Explain.

7. Hard water II. The data set from England and Wales also notes for each town whether it was south or north of Derby. Here are some summary statistics and a comparative boxplot for the two regions.
Summary of Mortality
Group    Count    Mean       Median    StdDev
North    34       1631.59    1631      138.470
South    27       1388.85    1369      151.114

a) What is the overall mean mortality rate for the two regions?
b) Do you see evidence of a difference in mortality rates? Explain.

8. Seasons. Average daily temperatures in January and July for 60 large U.S. cities are graphed in the histograms below.
a) What aspect of these histograms makes it difficult to compare the distributions?
b) What differences do you see between the distributions of January and July average temperatures?
c) Differences in temperatures (July minus January) for each of the cities are displayed in the boxplot above. Write a few sentences describing what you see.

9. Liberty's nose. Is the Statue of Liberty's nose too long? Her nose measures 4'6", but she is a large statue after all. Her arm is 42 feet long. That means her arm is 42/4.5 = 9.3 times as long as her nose. Is that a reasonable ratio? Shown in the table are arm and nose lengths of 18 girls in a Statistics class, and the ratio of arm to nose length for each.

Arm (cm)    Nose (cm)    Arm/Nose Ratio
73.8        5            14.8
74          4.5          16.4
69.5        4.5          15.4
62.5        4.7          13.3
68.6        4.4          15.6
64.5        4.8          13.4
68.2        4.8          14.2
63.5        4.4          14.4
63.5        5.4          11.8
67          4.6          14.6
67.4        4.4          15.3
70.7        4.3          16.4
69.4        4.1          16.9
71.7        4.5          15.9
69          4.4          15.7
69.8        4.5          15.5
71          4.8          14.8
71.3        4.7          15.2

a) Make an appropriate plot and describe the distribution of the ratios.
b) Summarize the ratios numerically, choosing appropriate measures of center and spread.
c) Is the ratio of 9.3 for the Statue of Liberty unrealistically low? Explain.

10. Sluggers. Roger Maris's 1961 home run record stood until Mark McGwire hit 70 in 1998. Listed below are the home run totals for each season McGwire played. Also listed are Babe Ruth's home run totals.

McGwire: 3*, 49, 32, 33, 39, 22, 42, 9*, 9*, 39, 52, 58, 70, 65, 32*, 27*
Ruth: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22

a) Find the five-number summary for McGwire's career.
b) Do any of his seasons appear to be outliers? Explain.
c) McGwire played in only 18 games at the end of his first big league season, and missed major portions of some other seasons because of injuries to his back and knees. Those seasons might not be representative of his abilities. They are marked with asterisks in the list above. Omit these values and make parallel boxplots comparing McGwire's career to Babe Ruth's.
d) Write a few sentences comparing the two sluggers.
e) Create a side-by-side stem-and-leaf display comparing the careers of the two players.
f) What aspects of the distributions are apparent in the stem-and-leaf displays that did not clearly show in the boxplots?

11. Be Quick! Avoiding an accident when driving can depend on reaction time. That time, measured from the moment the driver first sees the danger until he or she gets the foot on the brake pedal, is thought to follow a Normal model N(1.5, 0.18).
a) Use the Empirical Rule to draw the Normal model.
b) What percent of drivers have a reaction time less than 1.25 seconds?
c) What percent of drivers have a reaction time between 1.6 and 1.8 seconds?
d) What is the IQR of reaction times?
e) Describe the reaction time of the slowest 1/3 of all drivers.

12. Music and memory. Is it a good idea to listen to music when studying for a big test? In a study conducted by some Statistics students, 62 people were randomly assigned to listen to rap music, Mozart, or no music while attempting to memorize objects pictured on a page. They were then asked to list all the objects they could remember. Here are the five-number summaries for each group:

          n     Min    Q1     Median    Q3    Max
Rap       29    5      8      10        12    25
Mozart    20    4      7      10        12    27
None      13    8      9.5    13        17    24

b) Name the variables and classify each as categorical or quantitative.
c) Create parallel boxplots as best you can from these summary statistics to display these results.
d) Write a few sentences comparing the performances of the three groups.

13. Mail.
Here is the number of pieces of mail received at a school office on each of 36 days.

123   70   90   80   78   72   52  103  138  112   92   93
118  118  106   95  131   59  151  115   97  100  128  130
 66  135   76  143  100   88  110   75   60  115  105   85

a) Plot these data.
b) Find appropriate summary statistics.
c) Write a brief description of the school's mail deliveries.

14. Pay. According to the 1999 National Occupational Employment and Wage Estimates for Management Occupations, the mean hourly wage for Chief Executives was $48.67 and the median hourly wage was $52.08. By contrast, for General and Operations Managers, the mean hourly wage was $31.69 and the median was $27.23. Are these wage distributions likely to be symmetric, skewed left, or skewed right? Explain. (Bureau of Labor Statistics)

15. Engines. One measure of the size of an automobile engine is its "displacement," the total volume (in liters or cubic inches) of its cylinders. Summary statistics for several models of new cars are shown. These displacements were measured in cubic inches.

Summary of Displacement
Count         38
Mean          177.289
Median        148.500
StdDev        88.8767
Range         275
25th %tile    105
75th %tile    231

a) How many cars were measured?
b) Why might the mean be so much larger than the median?
c) Describe the center and spread of this distribution with appropriate statistics.
d) Your neighbor is bragging about the 227-cubic-inch engine he bought in his new car. Is that engine unusually large? Explain.
e) Are there any engines in this data set that you would consider to be outliers? Explain.

16. Engines, again. Horsepower is another measure commonly used to describe auto engines. Here are the summary statistics and histogram displaying horsepowers of the same group of 38 cars.

Summary of Horsepower
Count         38
Mean          101.737
Median        100
StdDev        26.4449
Range         90
25th %tile    78
75th %tile    125

a) Describe the shape, center, and spread of this distribution.
b) What is the interquartile range?
c) Are any of these engines outliers in terms of horsepower? Explain.

17.
Bike safety. The Massachusetts Governor's Highway Safety Bureau's report on bicycle injuries for the years 1991-2000 included the counts shown in the table.

Year    Bicycle Injuries Reported
1991    1763
1992    1522
1993    1452
1994    1370
1995    1380
1996    1343
1997    1312
1998    1275
1999    1030
2000    1118

a) Display the data in a stem-and-leaf display.
b) Display the data in a timeplot.
c) What is apparent in the stem-and-leaf display that is hard to see in the timeplot?
d) What is apparent in the timeplot that is hard to see in the stem-and-leaf display?
e) Write a few sentences about bicycle injuries in Massachusetts.

18. Profits. Here is a stem-and-leaf display showing profits as a percent of sales for 29 of the Forbes 500 largest U.S. corporations. The stems are split; each stem represents a span of 5%, from a loss of 9% to a profit of 25%.
a) Find the five-number summary.
b) Draw a boxplot for these data.
c) Find the mean and standard deviation.
d) Describe the distribution of profits for these corporations.

19. The ages of people in a college class (to the nearest year) are as follows:

Age              18    19     20     21     22    23    24    25    32
# of students    14    120    200    200    90    30    10    2     1

Determine the mean, median, and interquartile range of the data.

20. The following are parallel boxplots showing the daily fluctuations of a certain common stock over the course of 5 years. What trends do the boxplots show?

21. From September 2, 1977, to May 29, 1987, Edwin Moses set one of the most incredible records in sports history by winning 122 consecutive races in the 400-meter hurdles. The graph below shows a cumulative relative frequency curve (vertical axis is the percent) of the times of those 122 races.
a) Approximately what percent of the times were under 48 seconds?
b) What is the median time recorded by Moses?
c) Approximately what is the fastest time recorded by Moses?
d) What time exceeds approximately 75% of the remaining times?

Multiple Choice

22. The average salary of all female workers is $35,000.
The average salary of all male workers is $41,000. What must be true about the average salary of all workers?
a) It must be $38,000.
b) It must be larger than the median salary.
c) It could be any number between $35,000 and $41,000.
d) It must be larger than $38,000.

23. The mean age of five people in a room is 30 years. One of the people, whose age is 50, leaves the room. The mean age of the remaining four people in the room is
a) 40
b) 30
c) 25
d) cannot be determined from the information.

24. A sample of data contains the numbers 1000, 600, 800, and 1000. The median of the data is
a) 1000
b) 850
c) 700
d) 900

25. The numbers of new projects started each month at an advertising agency for the last six months are 2, 5, 3, 3, 6, 3. The interquartile range for these data is
a) 1
b) 4
c) 5
d) 2

26. This is a standard deviation contest. Which of the following sets of four numbers has the largest possible standard deviation?
a) 7, 8, 9, 10
b) 5, 5, 5, 5
c) 0, 0, 10, 10
d) 0, 1, 2, 3

27. When a set of data has suspected outliers, which of the following are preferred measures of central tendency and of variability?
a) mean and standard deviation
b) mean and variance
c) mean and range
d) median and range
e) median and interquartile range

28. In the second game of the 1989 World Series between Oakland and San Francisco, ten players went hitless, eight players had one hit apiece, and one player had three hits. What were the mean and median numbers of hits?
a) 11/19, 0
b) 11/19, 1
c) 1, 1
d) 0, 1
e) 1, 0

PART I REVIEW ANSWERS

1. a) It is (rounded to 1 decimal place), but there's no reason it should be unless the number of women receiving each type of care was roughly the same.
b) Yes, but they do not prove that adequate prenatal care is important for pregnant women. The mortality rate is quite a bit lower for women with adequate care than for other women, but there may be a lurking variable.
c) Intensive care is given for emergency conditions.
The data do not suggest that the care is the cause of the higher mortality.

2. If enough sopranos have a height of 65 inches, this can happen.

3. a) It means their heights are also more spread out (more variable).
b) The z-score for women to qualify is 2.4 compared with 1.75 for men, so it is harder for women to qualify.

4. a) The distribution is unimodal and skewed to the right, and values range from 95 to 140.
b) The mean will be larger than the median, since the distribution is right skewed.
c) Create a boxplot with quartiles at 97 and 105.5, median at 100. The IQR is 8.5, so the upper fence is at 105.5 + (1.5 x 8.5) = 118.25. There are several outliers to the right. There are no outliers to the left because the minimum at 95 lies well within the left fence at 97 - (1.5 x 8.5) = 84.25.

5. a) [figure: Normal model N(4.9, 0.6)]
b) 3.3%
c) 6.7%
d) pH 4.40
e) pH 5.89
f) Quartiles are 4.5 and 5.3, so IQR is 0.8.

6. a) These are categorical data, so mean and standard deviation are meaningless.
b) Not appropriate. Even if it fits well, the Normal model is meaningless for categorical data.

7. a) Overall mean is (34 x 1631.59 + 27 x 1388.85)/(34 + 27) = 1524.15 deaths per 100,000.
b) Yes. The distribution for the Northern towns is higher than that for the South. Fully half of the towns in the South have mortality rates lower than any of the Northern towns. Some Northern towns have rates higher than all of the Southern towns. It seems clear that mortality is higher, in general, in the North.

8. a) They are on different scales.
b) January's values are lower and more spread out.
c) Roughly symmetric but slightly skewed to the left. There are more low outliers than high ones. Center is around 40 degrees with an IQR of around 7.5 degrees.

9. a) The distribution is left skewed with a center of about 15. It has an outlier near 11.
b) Even though the distribution is somewhat skewed, the mean and median are close. The mean is 15.0 and the SD is 1.25.
c) Yes. 11.8 is already an outlier.
9.3 is more than 4.5 SDs below the mean. It is a very low outlier.

10. a) 3, 25.5, 36, 50.5, 70
b) Because the IQR is so large, none are technically outliers, but the seasons with fewer than 20 home runs stand out as a separate group.
c) [figure: parallel boxplots]
d) Without the injured seasons, McGwire's and Ruth's home run distributions look similar. (Ruth's seasons as a pitcher were also excluded.) Ruth's median is a little higher, and he was a little more consistent (less spread), but McGwire had the two highest season totals.
e) [figure: side-by-side stem-and-leaf display]
f) Now we can see how much more consistent Ruth was. Most of Ruth's seasons had home run totals in the 40s or 50s. McGwire's seasons are much more spread out (not even including the three at the bottom).

11. a) [figure: Normal model N(1.5, 0.18)]
b) 8.2%
c) 24.1%
d) Quartiles are 1.38 and 1.62, so IQR is 0.24.
e) The slowest 1/3 of all drivers have reaction times of 1.58 seconds or more.

12. b) Type of music (categorical) and number of items remembered (quantitative).
c) Because we do not have all the data, we can't know exactly how the boxplots look, but we do know that the minimums are all within the fences and that two groups have at least one outlier on the high side.
d) All three distributions are right skewed. Mozart and rap had very similar distributions. The scores for the None group are, if anything, slightly higher than those for the other two groups. It is clear that groups listening to music did not score higher than those who heard none.

13. a) [figure: plot of the mail data]
b) Mean 100.25, SD 25.54 pieces of mail.
c) The distribution is somewhat symmetric and unimodal, but the center is rather flat, almost uniform.

14. Chief executives have a mean wage less than the median, so the distribution is likely to be skewed to the left. General and operations managers have a mean wage larger than the median, so their distribution is likely to be skewed to the right.

15. a) 38 cars
b) Possibly, the distribution is skewed to the right.
c) Center: median is 148.5 cubic inches. Spread: IQR is 126 cubic inches.
d) No. It's bigger than average, but smaller than more than 25% of cars. The upper quartile is at 231 cubic inches.
e) No. 1.5 IQR is 189, and 105 - 189 is negative, so there can't be any low outliers. 231 + 189 = 420. There aren't any cars with engines bigger than this, since the maximum has to be at most 105 (the lower quartile) + 275 (the range) = 380.

16. a) Fairly symmetric, almost uniform except for the right tail.
b) 47 horsepower
c) IQR is 47, so fences are at 7.5 and 195.5. Since the range is only 90, we know there can't be any observations lower than 35 (upper quartile minus range) or greater than 168 (lower quartile plus range), so no.

17. a)
Stem | Leaf
 10  | 3
 11  | 2
 12  | 8
 13  | 1478
 14  | 5
 15  | 2
 16  |
 17  | 6
10 | 3 = 1030 injuries
b) [figure: timeplot]
c) In the stemplot, it is clear that some years may be unusual.
d) The downward trend is visible in the timeplot.
e) In the decade 1991 to 2000, reported bicycle injuries decreased steadily from about 1800 per year to around 1100 per year.

18. a) -9, 1, 4, 9, 25
b) [figure: boxplot]
c) Mean 4.72% of sales, SD 7.55% of sales
d) Fairly symmetric and unimodal, centered around 4% of sales. 50% of the companies report profits between 1% and 9% of sales. There is one outlier at 25% of sales.

19. Mean = 20.58; Median = 20; IQR = 1

20. The median has increased over time while the spread has decreased.

21. a) 30% b) 48.5 c) 47 d) 49

22. C
23. C
24. D
25. D
26. C
27. E
28. A
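
Several of the numeric answers above can be checked mechanically. The following is a minimal sketch, not part of the original answer key, that recomputes the Normal-model answers for exercises 5 and 11 and the frequency-table answers for exercise 19 using only Python's standard library. The variable names are illustrative, and the quartile rule (medians of the lower and upper halves, excluding the overall median when the count is odd) is an assumption about the method the key uses.

```python
from statistics import NormalDist, mean, median

# Exercise 5: rainfall pH modeled as N(4.9, 0.6).
ph = NormalDist(mu=4.9, sigma=0.6)
ph_over_6 = 100 * (1 - ph.cdf(6))             # percent of storms with pH over 6
ph_under_4 = 100 * ph.cdf(4)                  # percent of storms with pH under 4
ph_most_acidic_20 = ph.inv_cdf(0.20)          # cutoff for the lowest 20% of pH values
ph_least_acidic_5 = ph.inv_cdf(0.95)          # cutoff for the highest 5% of pH values
ph_iqr = ph.inv_cdf(0.75) - ph.inv_cdf(0.25)  # Q3 minus Q1

# Exercise 11: reaction time (seconds) modeled as N(1.5, 0.18).
rt = NormalDist(mu=1.5, sigma=0.18)
rt_under_125 = 100 * rt.cdf(1.25)
rt_16_to_18 = 100 * (rt.cdf(1.8) - rt.cdf(1.6))
rt_iqr = rt.inv_cdf(0.75) - rt.inv_cdf(0.25)
rt_slowest_third = rt.inv_cdf(2 / 3)          # slowest third lies above this time

# Exercise 19: ages given as a frequency table (age -> number of students).
age_counts = {18: 14, 19: 120, 20: 200, 21: 200, 22: 90,
              23: 30, 24: 10, 25: 2, 32: 1}
ages = sorted(age for age, count in age_counts.items() for _ in range(count))
n = len(ages)
# Assumed quartile rule: medians of the two halves, dropping the middle
# value when n is odd.
lower_half = ages[: n // 2]
upper_half = ages[(n + 1) // 2:]
age_mean = mean(ages)
age_median = median(ages)
age_iqr = median(upper_half) - median(lower_half)

print(f"5b) {ph_over_6:.1f}%  5c) {ph_under_4:.1f}%  "
      f"5d) {ph_most_acidic_20:.2f}  5e) {ph_least_acidic_5:.2f}  5f) {ph_iqr:.1f}")
print(f"11b) {rt_under_125:.1f}%  11c) {rt_16_to_18:.1f}%  "
      f"11d) {rt_iqr:.2f}  11e) {rt_slowest_third:.2f}")
print(f"19) mean {age_mean:.2f}, median {age_median}, IQR {age_iqr}")
```

Other software may use a different quartile convention, so the IQR for exercise 19 could differ slightly from the key under another rule; the Normal-model values do not depend on that choice.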