MDM4U1 Data Management and Statistics
Statistics of Two Variables Test 2

Multiple Choice [12]
Identify the choice that best completes the statement or answers the question.

1. The numbers of monitors sold by a computer store on 11 consecutive business days are: 4, 10, 10, 11, 10, 10, 13, 30, 12, 20, and 24. What are the mean, median, and mode for this set of data?
a. mean 14, median 11, mode 10
b. mean 10, median 11, mode 14
c. mean 17, median 14, mode 10
d. mean 11, median 14, mode 10
Answer: a.

2. The numbers of parking violations on a main street during each hour from 9:00 to 17:00 are: 8, 28, 18, 12, 28, 21, 28, and 25. What are the mean, median, and mode for this set of data?
a. mean 28, median 23, mode 21
b. mean 18, median 21, mode 28
c. mean 23, median 21, mode 28
d. mean 21, median 23, mode 28
Answer: d.

3. Which of the following is not a characteristic of the mean?
a. It is the most familiar and widely used measure of central tendency.
b. It is affected by the value of every piece of data.
c. It can be computed when data are grouped, even if the last group is open-ended.
d. Its value can be influenced greatly by outliers.
Answer: c.

4. A box-and-whisker plot does not show the
a. mean
b. first quartile
c. third quartile
d. median
Answer: a.

5. Which set of data would probably show a strong positive linear correlation?
a. Marks on a history test and the heights of the students
b. The number of defective light bulbs produced and the time of the day when they were manufactured
c. The colour of cars sold and the annual income of the car buyers
d. The height of corn in a field and the amount of precipitation during the growing season
Answer: d.

6. Which of the following statements is false?
a. The correlation coefficient is the covariance divided by the product of the standard deviations for X and Y.
b. The correlation coefficient is also called the Pearson product-moment coefficient of correlation.
c.
The value of the correlation coefficient always lies in the range 0 ≤ r ≤ 1.
d. The correlation coefficient is a quantitative measure of the relationship between two variables.
Answer: c.

7. A set of data with a correlation coefficient of −0.55 has a
a. strong negative linear correlation
b. moderate negative linear correlation
c. weak negative linear correlation
d. little or no linear correlation
Answer: b.

8. Using a linear-regression equation to predict values between actual data points is an example of
a. extrapolation
b. residuals
c. least-squares fit
d. interpolation
Answer: d.

9. Using a linear-regression equation to predict values outside the range of the data is an example of
a. extrapolation
b. residuals
c. least-squares fit
d. interpolation
Answer: a.

10. For the line of best fit in the least-squares method,
a. the sum of the squares of the residuals has the greatest possible value
b. the sum of the squares of the residuals has the least possible value
c. the sum of the residuals is equal to one
d. both b) and c)
Answer: b.

11. The coefficient of determination, r², indicates
a. the linear relationship between two variables
b. the slope of the line of best fit
c. how closely the data fit a defined curve
d. the sum of the residuals from each data point
Answer: c.

12. Which of the following statements is false?
a. The coefficient of determination can have values from −1 to 1.
b. The coefficient of determination can be applied to any regression curve.
c. The coefficient of determination is the variation in y explained by variation in x, divided by the total variation in y.
d. Manually calculating the coefficient of determination is not practical for large sets of data.
Answer: a.

Completion [6]
Complete each statement.

13. In a set of data, the sum of the values of a variable divided by the total number of values is the mean.
14. Values that are distant from the majority of values in a set of data are outliers.
15.
Quantities that indicate how closely a set of data clusters around its center are variance, interquartile range, and standard deviation.
16. The difference between an individual value and the mean in a set of data is called deviation.
17. The square root of the mean of the squares of all the deviations in a set of data is standard deviation.
18. The mean of the squares of all of the deviations in a set of data is variance.

Problems

19. The numbers of service calls a heating company made during the first 11 days of October are 6, 28, 28, 11, 30, 21, 17, 25, 28, 28, and 20. Find the mean, median, and mode for this set of data.
Solution
Arrange the data in increasing order: 6, 11, 17, 20, 21, 25, 28, 28, 28, 28, 30. The middle (sixth) number, 25, is the median. The number 28 occurs most often, so it is the mode. Mean = Sum ÷ 11 = 242 ÷ 11 = 22.

20. The ages, in years, of a group of friends are: 29, 33, 36, 48, 50, 51, 53, 53.
a) Find the mean, median, and mode of the ages.
b) Explain what each of these measures tells you about this group of friends.
c) What do the relative values of the mean and median tell you about the group?
Solution
a) Median = (48 + 50) ÷ 2 = 49, Mode = 53, Mean = Sum ÷ 8 = 353 ÷ 8 = 44.125.
b) The mean shows the average age of the 8 friends. The median shows the middle of the ages when they are listed in increasing order. The mode shows the most common age.
c) The mean is lower than the median, which shows that more than half of the friends are older than the mean. The few younger friends are young enough to pull the mean down.

21. Each child in a study of infantile autism was given a behavioral test and graded on a scale from 0 (no symptoms) to 116 (maximum severity). The scores of the 21 children in the study were: 27, 25, 65, 67, 47, 46, 63, 44, 34, 51, 17, 40, 41, 60, 24, 48, 29, 73, 60, 41, 47. Calculate the mean score, the standard deviation, and the variance.
Solution
Use a calculator!
Mean = Sum ÷ 21 = 949 ÷ 21 ≈ 45.19, Sample standard deviation ≈ 15.62, Sample variance ≈ 243.96.

22. A consumer magazine evaluated 39 models of bathroom scales. The table below lists the prices of the models (rounded to the nearest dollar).

Scale Model               Price ($)    Scale Model               Price ($)
EconoHealth A10           50           Superskale 6400           65
EconoHealth A12           50           Superskale 7200           20
EconoHealth B10           50           Superskale 8000           14
EconoHealth E10           28           Superskale 8280           25
EconoHealth Digital-10    65           SvelteCheck 12300         24
EconoHealth E-20          40           SvelteCheck 12300D        48
EconoHealth E-30          50           SvelteCheck 12509         15
HealthSkale 190           22           SvelteCheck 12510         10
HealthSkale 210           32           SvelteCheck Fashion       17
HealthSkale 211           30           SvelteCheck Pro           50
HealthSkale 290 Deluxe    79           SvelteCheck Xtra          25
HealthSkale 310           50           Weighbeter 550            22
HealthSkale 1000          23           Weighbeter 801D           60
HealthSkale 1002          20           Weighbeter 830            30
HealthXact 12573          35           Weighbeter 835            30
HealthXact 12756          24           Weighbeter 950            10
HealthXact 12856          25           Weighbeter 2000           12
Prowt P10A                120          Weighbeter 2100           20
Prowt Value               35           Weighbeter Basic          12
Prowt Value 2             35

What is the z-score for the price of
a) the Weighbeter 801D scale?
b) the Weighbeter 830 scale?
Solution
Mean = 1372 ÷ 39 = 35.1795, Standard deviation (population) = 21.74
a) z-score = (60 − 35.1795) ÷ 21.74 = 1.14
b) z-score = (30 − 35.1795) ÷ 21.74 = −0.238

23. Classify the type of linear correlation you might expect with the following pairs of variables.
a) Hours of lacrosse practice, goals scored in a lacrosse game
b) Students’ average marks, the numbers of siblings in their families
c) Distances from students’ homes to their schools, the time they spend on the school bus each day
d) Amounts of television watched per day, scores on an aerobic fitness test
e) Years without an accident, driving insurance rates
Answers:
a) Moderate positive correlation
b) No correlation
c) Strong positive correlation
d) Moderate negative correlation
e) Strong negative correlation

24. Does the slope of the line of best fit tell you anything about the strength of a linear correlation? Explain why or why not.
Answer: No. The slope of the line of best fit shows the steepness of the relationship. However, it does not take into account how close or far the data points are from the line.

25. When should outliers be excluded from a regression analysis?
Answer: Outliers should be excluded from a regression analysis only when there is reason to believe they do not represent the relationship being studied, for example when they result from measurement or recording errors.

26. Find a regression equation with a coefficient of determination greater than 0.98 for the following set of data.

x    0      1      2       3       4        5
y    2.5    7.5    22.5    67.5    202.5    607.5

Solution
y = 2.5 × 3^x, r² = 1.

27. Find a regression equation with a coefficient of determination greater than 0.98 for the following set of data.

x    0.1     0.2     0.3     0.4     0.5     0.6
y    0.51    0.54    0.59    0.66    0.75    0.86

Solution
Use a calculator! y = x² + (2.7 × 10⁻¹³)x + 0.5, r² = 1. The coefficient of x is effectively zero (calculator rounding), so the data fit y = x² + 0.5 exactly.

28. A student has a theory that left-handed people understand mathematical logic more readily than right-handed people do. This student finds that the average mark for the three left-handed students in his mathematics class is 3.2% higher than the class average. Does his statistic prove the student’s theory?
Answer: No. Three data points are far too few to support a generalization about the millions of left-handed people in the population. In addition, class averages are not a good indication of aptitude for mathematical logic, because a variety of other factors influence them.
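
The arithmetic in problems 19, 21, and 22 can be checked with a short script instead of a calculator. The sketch below is one way to do it in Python with only the standard statistics module. The variable names and the z_score helper are illustrative and not part of the test. It assumes the sample formulas (dividing by n − 1) for problem 21 and the population formula (dividing by n) for problem 22, because those conventions reproduce the standard deviations quoted in the solutions (15.62 and 21.74).

```python
import statistics

# Problem 19: service calls on the first 11 days of October.
calls = [6, 28, 28, 11, 30, 21, 17, 25, 28, 28, 20]
calls_mean = statistics.mean(calls)      # sum of 242 over 11 values
calls_median = statistics.median(calls)  # sixth value of the sorted list
calls_mode = statistics.mode(calls)      # most frequent value

# Problem 21: behavioural test scores of the 21 children.
scores = [27, 25, 65, 67, 47, 46, 63, 44, 34, 51, 17,
          40, 41, 60, 24, 48, 29, 73, 60, 41, 47]
scores_mean = statistics.mean(scores)
scores_sd = statistics.stdev(scores)      # sample standard deviation (n - 1)
scores_var = statistics.variance(scores)  # sample variance (n - 1)

# Problem 22: prices of the 39 scale models, in table order
# (left-hand column first, then right-hand column).
prices = [50, 50, 50, 28, 65, 40, 50, 22, 32, 30,
          79, 50, 23, 20, 35, 24, 25, 120, 35, 35,
          65, 20, 14, 25, 24, 48, 15, 10, 17, 50,
          25, 22, 60, 30, 30, 10, 12, 20, 12]
prices_mean = statistics.mean(prices)
prices_sd = statistics.pstdev(prices)     # population standard deviation (n)


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


z_801d = z_score(60, prices_mean, prices_sd)  # Weighbeter 801D
z_830 = z_score(30, prices_mean, prices_sd)   # Weighbeter 830

print("Problem 19:", calls_mean, calls_median, calls_mode)
print("Problem 21:", round(scores_mean, 2), round(scores_sd, 2), round(scores_var, 2))
print("Problem 22:", round(prices_mean, 4), round(prices_sd, 2),
      round(z_801d, 2), round(z_830, 3))
```

Switching between stdev and pstdev (or variance and pvariance) shows how much the choice of divisor changes each result, which is worth checking against whichever convention the course uses.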