Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Problem Set Section 2.1 Frequency Distributions 1. Multiple Births: The following data represent the number of live multiple births (three or more babies) in 2002 for women 15 to 44 years old. Age 15 - 19 20 - 24 25 - 29 30 - 34 35 - 39 40 - 44 Multiple Births 93 (f) 511 1,628 2,832 1,843 377 a. Identify the class width b. Construct a table to include the frequency distribution, class midpoints, relative frequency distribution, cumulative frequency distribution, and cumulative relative frequency distribution. 2. Serum HDL: A doctor randomly selects 40 of his 20 to 29 year old patient and obtains the following data regarding their serum HDL cholesterol: 70 56 48 48 53 52 66 48 36 49 28 35 58 62 45 60 38 73 45 51 56 51 46 39 56 32 44 60 51 44 63 50 46 69 53 70 33 54 55 52 With the first class having a lower class limit of 20 and a class width of 10, construct a frequency distribution and relative frequency distribution. Would using a class width of 5 instead of 10 be a better summary of the data? Why? 1 3. Head Circumferences: Refer to the HEADS Data Set. Construct a frequency distribution for the head circumferences of baby boys and construct a separate frequency distribution for the head circumferences of baby girls. In both cases, use the classes of 34.0-35.9, 36.0-37.9, and so on. Compare the results and determine whether there appears to be a significant difference between the two genders. 2 Problem Set Section 2.2 Displaying Data 1. Interpreting a frequency histogram: The following histogram represents the ages of students in chemistry classes for a particular college. a. What is the class width? b. What percentage of the 78 students were younger than 30 years? c. What is the approximate value of the center? d. Describe the shape of the distribution as bell shaped or skewed. 25 20 15 10 5 0 0 20 40 60 2. Interpreting Pie Chart: a. What is the approximate percentage of people with Group A blood? b. Assuming that the pie chart is based on a sample of 500 people, approximately how many of those 500 people have Group A blood? Group O Group AB Group A Group B 3 3. Multiple Births: Use the frequency distribution from #1 in Problem Set 2.1. a. Construct a histogram corresponding to the given distribution. b. Add a frequency polygon to the histogram. c. Construct a cumulative frequency histogram. d. Add a cumulative frequency polygon (Ogive) to the cumulative frequency histogram. 4. Government Income: In a recent year the federal government’s income was $1,782.3 billion. The various sources of income are broken down in the following table. Sources Individual income taxes Corporate income taxes Social insurance taxes Excise, estate and gift taxes, customer, and misc. a. b. c. d. Amount $B 793.7 131.8 713 143.8 Construct a Pareto chart that corresponds to the given data. Construct a relative frequency distribution. What percentage of income came from corporate income taxes? Use the results from part b to construct a pie chart that corresponds to the given data. 5. Per capita GNP: Refer to the Per Capita GNP ($US) for Western Africa. 4 a. List the original data represented by the given stem-and-leaf plot. b. Construct the dot plot for the data represented by the stem-and-leaf plot. Hint: use 100, 200, ….., etc. c. Interpret the data by analyzing the distribution. 6. Rates of Return: Refer to the three-year rates of return for mutual funds. 5.37 4.31 4.13 8.58 3.06 14.48 12.5 8.33 2.34 0.97 8.33 8.89 0.05 13.88 3.71 10.07 2.27 11.91 11.69 12.06 5.99 10.1 6.07 9.88 9.84 7.9 8.21 6.5 4.93 7.75 9.11 6.11 6.83 10.94 5.99 9.38 6.38 10.34 2.86 6.68 a. Construct the stem-and-leaf plot for the data. Hint: First round the data to the nearest tenth. Then use the tenths place as the leaves. b. Interpret your results by examining the distribution of data. 7. Oscars: Below are the ages of actors and actresses at the time they won Oscars. a. Construct a back-to-back stem-and-leaf plot for the given data. b. Use the results from part (a), compare the two different sets of data, and interpret any differences in distribution. Actors: 32 37 36 32 51 53 33 61 35 45 55 39 46 76 37 42 40 32 60 38 56 48 48 40 43 40 62 43 42 44 41 56 39 46 31 47 45 60 36 Actresses: 50 44 35 80 26 28 41 21 61 38 49 33 74 30 33 41 31 35 41 42 37 26 34 34 35 26 61 60 34 24 30 37 31 27 39 34 26 25 33 5 Problem Set Section 2.3 Measures of the Center Exercises 1- 5, find the a) mean, b) median, c) mode, and d) midrange. 1. Volume of ACME stock group: The volume of a stock is the number of shares traded on a given day. The following data represent the volume of stock traded for a random sample of 25 trading days in 2007. The data are in millions. 3.78 6.06 5.32 3.04 10.32 8.74 5.75 3.25 5.64 3.38 4.35 5.34 3.78 5.00 7.25 5.02 6.92 7.57 7.16 6.52 8.40 6.23 6.07 4.88 4.43 2. Drunk Driving: The blood alcohol concentrations of a sample of drivers involved in fatal crashes and then convicted with jail sentences are given below (based on data from the U.S. Department of Justice). Given that current state laws prohibit driving with levels above 0.08 or 0.10, does it appear that these levels are significantly above the maximum that is allowed? 0.27 0.12 0.17 0.16 0.17 0.21 0.16 0.17 0.13 0.18 0.24 0.29 0.24 0.14 0.16 3. Customer Waiting Times: Waiting times (in minutes) of customers at the Green Valley Bank and the Bank of Big Spenders. Does there appear to be a difference between the two data sets that is not apparent from a comparison of the measures of center. If so, what is it? GVB Spenders 6.5 4.2 6.6 5.4 6.7 5.8 6.8 6.2 7.1 6.7 7.3 7.7 7.4 7.7 7.7 8.5 7.7 9.3 7.7 10.0 4. Regular Pepsi / Diet Pepsi: Weights (pounds) of samples of the contents in cans of regular Pepsi and Diet Pepsi. Compare the two sets of data. Can you explain the difference? Regular Diet 0.8192 0.7773 0.8150 0.7758 0.8163 0.7896 0.8211 0.7868 0.8181 0.7844 0.8247 0.7861 5. Head Circumferences: Refer to the raw data (not the frequency distribution) from Section 2.1 problem #3. a. Compare the two sets of data. Does there appear to be a difference between the two genders? 6 b. Compare the mean and median. Do you think the distribution is approximately symmetric or is it skewed? Based on your answer, which measure better describes the center of the distribution? Why? 6. Multiple Births: Refer to the data from problem #1 in section 2.1. Calculate the mean. Set up a table. Use a formula. Show set up. 7. Weights of 6th graders: The following data represent the weights of 6th graders at Johnson Middle School. Calculate the mean using a calculator. Copy the table and add a column for the midpoint. Weights 58.3 - 61.3 61.3 - 64.3 64.3 - 67.3 67.3 - 70.3 70.3 - 73.3 73.3 - 76.3 76.3 - 79.3 79.3 - 82.3 82.3 - 85.3 85.3 - 88.3 Frequency 4 8 12 13 21 15 12 9 4 2 7 Problem Set Section 2.4 Measures of Dispersion Exercises 1- 5, find the a) range, b) variance, and c) standard deviation. 1. Volume of ACME stock group: Refer to the data used in problem #1 in section 2.3. Use the mean found in section 2.3 and the standard deviation to find the boundaries for days when the volume was unusual. Are there any days when the volume of stock being traded was unusual? 2. Drunk Driving: Refer to the data used in problem #2 in section 2.3. When the state wages a campaign to “reduce drunk driving”, is the campaign intended to lower the standard deviation? 3. Customer Waiting Times: Refer to the data in problem #3 in section 2.3. Compare the two sets of data. Does the standard deviation validate your conclusion for this problem in section 2.3? Explain. 4. A Fish Story: Fred and Ted went on a 10-day fishing trip. The number of trout caught and released by the two boys each day was as follows. Fred Ted 9 15 24 2 8 3 9 18 5 20 8 1 9 17 10 2 8 19 10 3 Compare the range and standard deviation. Discuss limitations of the range as a measure of dispersion. 5. Head Circumferences: Refer to the data in problem #3 in section 2.1. Does there appear to be a difference between the two genders? 6. Which investment is better? You have $5000 to invest. You decide to invest the money in the stock market and have narrowed you investment options down to two mutual funds. The following data represent the historical quarterly rates of return on each mutual fund for the past 20 quarters (5 years). Which has the best return? Which has the most risk (volatility)? Which would you invest in and why? Fund A Fund B 1.3 5.2 7.3 6.4 -0.3 4.8 8.6 1.9 0.6 2.4 3.4 -0.5 6.8 3.0 3.8 -2.3 5.0 1.8 -1.3 3.1 -5.4 6.7 11.9 3.5 10.5 2.9 -6.7 1.4 8.9 -4.7 -1.1 3.4 4.3 4.3 3.8 5.9 0.3 -2.4 7.7 12.9 7. Multiple Births: Refer to problem #1 in section 2.1. Find the standard deviation of the summarized data. 8. Weights of 6th graders: Refer to problem #7 in section 2.3. Find the standard deviation of the summarized data. 8 9. Drunk Driving: Use the mean from problem #2 in section 2.3 and problem #2 in this section to calculate the boundaries for unusual blood alcohol levels. Based on the data would a blood alcohol level of 0.24 be unusual? 10. Empirical rule: The weights, in grams, of a pair of kidneys in adult males between the ages of 40 and 50 has a bell-shaped distribution with a mean of 325 grams and a standard deviation of 30 grams. Justify your response. a. About 95% of kidneys will be between what weights? b. What percentage of kidneys weights between 235 and 415 grams? c. What percentage of kidneys weights more than 355 grams or less than 295 grams. 11. Chebyshev’s Inequality: In December 2004, the average price of regular unleaded gasoline per gallon excluding taxes in the United States was $1.37 with a standard deviation of $0.05. Using Chebyshev’s Inequality what can you conclude about the percentage of gasoline stations with a price of a gallon of gasoline between a. $1.52 and $1.22? Justify. b. $1.47 and $1.27? Justify. 9 Problem Set Section 2.5 Measures of Position 1. Z - Scores. According to the American Freshman the number of hours per week that college freshman spend studying has a mean of 7.06 hours with a standard deviation of 2.32 hours. Suppose Sally Simplestudent spends 2 hours per week studying. a. What is the difference between the time Sally spends studying and the mean? b. How many standard deviations is that? c. Convert Sally’s time to a z score. d. If we consider an “unusual” amount of time studying to be that which converts to z scores greater than 2 or less than -2, is Sally’s time spent studying unusual? 2. Z Scores. Graduating Math and Science Center students have a mean ACT score of 29. Calculate the z-score for their mean relative to the national mean of 21.0 and standard deviation of 4.7. Is the ACT score of 29 unusual? 3. Z Scores. Graduating Math and Science Center students have a mean SAT score of 1320. Calculate the z-score for their mean relative to the national mean of 1016 and standard deviation of 157. Is the SAT score of 1320 unusual? 4. Find value. A national achievement test is administered annually to 3rd graders. The test has a mean score of 100 and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on the test? 5. Find value. Heights of women have a mean of 63.6 inches with a standard deviation of 2.5 inches. Julia Roberts has a height that converts to a z score of 2.16. How tall is Julia? 6. Volume of ACME stock group: Refer to the data and results from problem #1 in section 2.3 and section 2.4. a. Find the z-score for the day when 10.32 million shares were traded. Was this level unusual? b. Find the z-score for the day when 6.07 million shares were traded. Was this level unusual? 7. SAT Scores: The follow are results from 5 students who took the SAT test. 960 1280 1020 960 1150 a. Find the mean and standard deviation. b. Find the z score for each student. c. Find the mean and standard deviation of the z scores. d. In general, what can you conclude about the mean and standard deviation of a set of z scores? 10 8. Compare Z – Scores. The average 20 to 29-year-old man is 69.6 inches tall with a standard deviation of 2.7 inches, while the average 20 to 29-year-old woman is 64.1 inches tall, with a standard deviation of 2.6 inches. Show z-score calculation in each case. a. Who is relatively taller, a 75-inch man or a 70-inch woman? b. Who is relatively taller, a 68-inch man or a 62-inch woman? 9. Compare Z – Scores. Three students take equivalent statistics tests. Which has the highest relative score? Show z-score calculation in each case. a. A score of 144 on a test with a mean of 128 and a standard deviation of 34. b. A score of 90 on a test with a mean of 86 and a standard deviation of 18. c. A score of 18 on a test with a mean of 15 and a standard deviation of 5. In Exercises 10 – 13, use the eruption duration for Old Faithful (OLDFAITHFUL) listed in Datasets. Find the percentile corresponding to the given eruption duration. Show setup. 10. 134 11. 240 12. 276 13. 120 In Exercises 14 – 21, use the eruption duration for Old Faithful (OLDFAITHFUL) listed in Datasets. Find the value corresponding to the given eruption duration. Show setup. 14. P60 15. P18 16. Q1 17. Q3 18. P85 19. P35 20. P57 21. P93 11 22. Basketball Scores. Suppose the following table represents the outcomes of a college basketball team (Iowa) in one particular season. Use points for. School Home or Away Points for Points Against NORTHWESTERN H 72 55 PENN STATE A 69 57 PURDUE A 59 56 WISCONSIN H 78 53 OHIO STATE H 76 62 MICHIGAN A 71 79 MINNESOTA A 51 66 ILLINOIS H 82 65 INDIANA H 75 67 ILLINOIS A 51 66 MICHIGAN STATE A 67 69 MINNESOTA H 66 68 MICHIGAN H 80 75 OHIO STATE A 69 56 WISCONSIN A 48 49 PURDUE H 84 62 PENN STATE H 81 55 NORTHWESTERN A 75 59 a. Find the percentile for the score of 67. b. Find Q1 - the 25th percentile (or first quartile). c. Find the 50th percentile of this data. 12 Problem Set Section 2.6 Bivariate Data Analysis For each set of bivariate data given below a) b) c) d) e) Plot a scatter diagram (you may use excel) Find the correlation coefficient (r) Find the equation of the line of best fit Plot the line on the scatter diagram (you may use excel) Find the predicted value if asked 1. x 1 2 2 5 6 y 2 5 4 15 15 2. Blood Pressure Measurements: Fourteen different second-year medical students took blood pressure measurements of the same patient and the results are listed below. Systolic (x) 138 130 135 140 120 125 120 130 130 144 143 140 130 150 Diastolic (y) 82 91 100 100 80 90 80 80 80 98 105 85 70 100 Find the best predicted diastolic blood pressure for a person with a systolic reading of 122. 3. Smoking and Nicotine: When nicotine is absorbed by the body, cotinine is produced. A measurement of cotinine in the body is therefore a good indicator of how much a person smokes. Listed below are the reported numbers of cigarettes smoked per day and the measured amounts of cotinine. Cigarettes (x) 60 Cotinine (y) 179 10 4 15 10 1 20 8 7 10 10 20 283 75 174 209 10 350 2 43 25 408 344 Find the best predicted level of cotinine for a person who smokes 40 cigarettes per day. Based on the correlation coefficient, is the number of cigarettes smoked useful in predicting the amount of cotinine in the body? 13 Selected Answers: Section 2.1 Section 2.2 Section 2.3 1a) mean = 5.768 1b) median =5.64 1c) mode = 3.78 1d) midrange = 6.68 2a) mean = 0.0187 2b) median = 2c) mode = 0.17 2d) midrange = 3a) see slides 4) Regular Pepsi: mean = 0.81907; median = 0.81865; mode = none; midrange = 0.81985 Diet Pepsi: mean = 0.78333; median = 0.78525; mode = none; midrange = 0.78270 6) mean = 72.25 Section 2.4 2a) range = 0.17; σ = .051195; σ2= .00262 11b) at least 75% 7) 5.2 9) no 10a) 265 & 385 Section 2.5 1a) 5.06 1b) 2.18 or -2.18 1c) -2.18 1d) yes 2) 1.7, no 4) 118 6a) 2.50 yes 6b) 0.165 no 7a) 1074, 138.9 7b)-0.82, 1.48, -0.39, -0.82, 0.55 7d) µ=0; σ=1 8a) 70” woman 8b) 68” man 9a) 0.47 9b) 0.22 9c) .6 10) P20=134 12) P98=276 14) P60=241 16) Q1=203 18) P85=270 20) P57=241 22a) P28 22b)Q1=66 22c)P50=71 Section 2.6 1b) 0.985 1c) y=2.862x-.957 2b) 0.658 2c) y=0.769x-14.380 2e) 79.438 3b) 0.262 3c) y=2.479x-139.013 3e) no predictions should be made based off of this data. Correlation coefficient is too low. 14