AP STATS REVIEW - PART 1 - DESCRIBING DATA

(2008 #1) 1. To determine the amount of sugar in a typical serving of breakfast cereal, a student randomly selected 60 boxes of different types of cereal from the shelves of a large grocery store. The student noticed that the side panels of some of the cereal boxes showed sugar content based on one-cup servings, while others showed sugar content based on three-quarter-cup servings. Many of the cereal boxes with side panels that showed three-quarter-cup servings were ones that appealed to young children, and the student wondered whether there might be some difference in the sugar content of the cereals that showed different-size servings on their side panels. To investigate the question, the data were separated into two groups. One group consisted of 29 cereals that showed one-cup serving sizes; the other group consisted of 31 cereals that showed three-quarter-cup serving sizes. The boxplots shown below display sugar content (in grams) per serving of the cereals for each of the two serving sizes.

(a) Write a few sentences to compare the distributions of sugar content per serving for the two serving sizes of cereals.
(b) What new information about sugar content do the boxplots above provide?
(c) Based on the boxplots shown above, how would you expect the mean amounts of sugar per cup to compare for the different recommended serving sizes? Explain.

(2008B #2) Four different statistics have been proposed as estimators of a population parameter. To investigate the behavior of these estimators, 500 random samples are selected from a known population and each statistic is calculated for each sample. The true value of the population parameter is 75. The graphs below show the distribution of values for each statistic.

(a) Which of the statistics appear to be unbiased estimators of the population parameter? How can you tell?
(b) Which of statistics A or B would be a better estimator of the population parameter?
Explain your choice.
(c) Which of statistics C or D would be a better estimator of the population parameter? Explain your choice.

(2010B #1) As a part of the United States Department of Agriculture’s Super Dump cleanup efforts in the early 1990s, various sites in the country were targeted for cleanup. Three of the targeted sites (River X, River Y, and River Z) had become contaminated with pesticides because they were located near abandoned pesticide dump sites. Measurements of the concentration of aldrin (a commonly used pesticide) were taken at twenty randomly selected locations in each river near the dump sites. The boxplots shown below display the five-number summaries for the concentrations, in parts per million (ppm) of aldrin, for the twenty locations that were sampled in each of the three rivers.

(a) Compare the distributions of the concentration of aldrin among the three rivers.
(b) The twenty concentrations of aldrin for River X are given below.

3.4  4.0  5.6  3.7  8.0  5.5  5.3  4.2  4.3  7.3
8.6  5.1  8.7  4.6  7.5  5.3  8.2  4.7  4.8  4.6

Construct a stemplot that displays the concentrations of aldrin for River X.
(c) Describe a characteristic of the distribution of aldrin concentrations in River X that can be seen in the stemplot but cannot be seen in the boxplot.

(2007B #1) The Better Business Council of a large city has concluded that students in the city’s schools are not learning enough about economics to function in the modern world. These findings were based on test results from a random sample of 20 twelfth-grade students who completed a 46-question multiple-choice test on basic economic concepts. The data set below shows the number of questions that each of the 20 students in the sample answered correctly.

(a) Display these data in a stemplot.
(b) Use your stemplot from part (a) to describe the main features of this score distribution.
(c) Why would it be misleading to report only a measure of center for this score distribution?
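For checking a hand-drawn stemplot, the construction asked for in the River X question (2010B #1) can be sketched in Python. This is only one possible layout, not an official scoring key: it assumes the ones digit is used as the stem and the tenths digit as the leaf, and the function name `stemplot` is made up for this illustration.

```python
from collections import defaultdict

# River X aldrin concentrations (ppm), as listed in the 2010B #1 question.
river_x = [3.4, 4.0, 5.6, 3.7, 8.0, 5.5, 5.3, 4.2, 4.3, 7.3,
           8.6, 5.1, 8.7, 4.6, 7.5, 5.3, 8.2, 4.7, 4.8, 4.6]


def stemplot(values):
    """Return stemplot rows with the ones digit as stem and the tenths digit as leaf.

    Hypothetical helper for illustration; assumes values have one decimal place.
    """
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # 3.4 -> 34, avoids float truncation
        stem, leaf = divmod(tenths, 10)  # 34 -> stem 3, leaf 4
        leaves[stem].append(leaf)

    rows = []
    # Include every stem between the smallest and largest, even empty ones,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        rows.append(f"{stem} | {row}")
    return rows


plot = stemplot(river_x)
print("\n".join(plot))
print("Key: 3 | 4 means 3.4 ppm")
```

Keeping empty stems in the output is a deliberate choice here: a stemplot that skips them can hide features of the shape that part (c) asks about.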
(2002B #1)

(2003 Exam B – Question 1) A simple random sample of 9 students was selected from a large university. Each of these students reported the number of hours he or she had allocated to studying and the number of hours allocated to work each week. A least squares linear regression was performed and part of the resulting computer output is shown below.

Predictor    Coef      StDev     T       P
Constant     8.107     2.731     2.97    0.021
Work         0.4919    0.1950    2.52    0.040

S = 4.349    R-Sq = 47.6%    R-Sq(adj) = 40.1%

The scatterplot below displays the data that were collected from the 9 students.

(a) After point P, labeled on the scatterplot, was removed from the data, a second linear regression was performed and the computer output is shown below.

Predictor    Coef      StDev     T       P
Constant     11.123    3.986     2.79    0.032
Work         0.1500    0.3834    0.39    0.709

S = 4.327    R-Sq = 2.5%    R-Sq(adj) = 0.0%

Does point P exercise a large influence on the regression line? Explain.
(b) The researcher who conducted the study discovered that the number of hours spent studying reported by the student represented by P was recorded incorrectly. The corrected data point for this student is represented by the letter Q in the scatterplot below. Explain how the least squares regression line for the corrected data (in this part) would differ from the least squares regression line for the original data.

MULTIPLE CHOICE: CHAPTER 1

1. You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of variables you have measured is (a) 1463; all quantitative. (b) four; two categorical and two quantitative. (c) four; one categorical and three quantitative. (d) three; two categorical and one quantitative. (e) three; one categorical and two quantitative.

2. Consumers Union measured the gas mileage in miles per gallon of 38 1978–1979 model automobiles on a special test track. The pie chart below provides information about the country of manufacture of the model cars used by Consumers Union.
Based on the pie chart, we may conclude that: (a) Japanese cars get significantly lower gas mileage than cars of other countries. This is because their slice of the pie is at the bottom of the chart. (b) U.S. cars get significantly higher gas mileage than cars from other countries. (c) Swedish cars get gas mileages that are between those of Japanese and U.S. cars. (d) Mercedes, Audi, Porsche, and BMW represent approximately a quarter of the cars tested. (e) More than half of the cars in the study were from the United States.

3. A researcher reports that, on average, the participants in his study lost 10.4 pounds after two months on his new diet. A friend of yours comments that she tried the diet for two months and lost no weight, so clearly the report was a fraud. Which of the following statements is correct? (a) Your friend must not have followed the diet correctly, since she did not lose weight. (b) Since your friend did not lose weight, the report must not be correct. (c) The report only gives the average. This does not imply that all participants in the study lost 10.4 pounds or even that all lost weight. Your friend’s experience does not necessarily contradict the study results. (d) In order for the study to be correct, we must now add your friend’s results to those of the study and recompute the new average. (e) Your friend is an outlier.

4. The following is an ogive on the number of ounces of alcohol (one ounce is about 30 mL) consumed per week in a sample of 150 students. A study wished to classify the students as “light”, “moderate”, “heavy” and “problem” drinkers by the amount consumed per week. About what percentage of students are moderate drinkers, that is, students who consume between 4 and 8 ounces per week? (a) 60% (b) 20% (c) 40% (d) 80% (e) 50%

5. “Normal” body temperature varies by time of day. A series of readings was taken of the body temperature of a subject. The mean reading was found to be 36.5°C with a standard deviation of 0.3°C.
When converted to °F, the mean and standard deviation are (°F = °C(1.8) + 32). (a) 97.7, 32 (b) 97.7, 0.30 (c) 97.7, 0.54 (d) 97.7, 0.97 (e) 97.7, 1.80

6. The following is a histogram showing the actual frequency of the closing prices on the New York Stock Exchange of a particular stock. Based on the frequency histogram, the class that contains the 80th percentile is: (a) 20-30 (b) 10-20 (c) 40-50 (d) 50-60 (e) 30-40

7. Which of the following is likely to have a mean that is smaller than the median? (a) The salaries of all National Football League players. (b) The scores of students (out of 100 points) on a very easy exam in which most get nearly perfect scores but a few do very poorly. (c) The prices of homes in a large city. (d) The scores of students (out of 100 points) on a very difficult exam in which most get poor scores but a few do very well. (e) Amounts awarded by civil court juries.

8. There are three children in a room, ages three, four, and five. If a four-year-old child enters the room, the (a) mean age will stay the same but the variance will increase. (b) mean age will stay the same but the variance will decrease. (c) mean age and variance will stay the same. (d) mean age and variance will increase. (e) mean age and variance will decrease.

9. The weights of the male and female students in a class are summarized in the following boxplots: Which of the following is NOT correct? (a) About 50% of the male students have weights between 150 and 185 pounds. (b) About 25% of female students have weights more than 130 pounds. (c) The median weight of male students is about 162 pounds. (d) The mean weight of female students is about 120 pounds because of symmetry. (e) The male students have less variability than the female students.

10. When testing water for chemical impurities, results are often reported as bdl, that is, below detection limit.
The following are the measurements of the amount of lead in a series of water samples taken from inner-city households (ppm): 5, 7, 12, bdl, 10, 8, bdl, 20, 6. Which of the following is correct? (a) The mean lead level in the water is about 10 ppm. (b) The mean lead level in the water is about 8 ppm. (c) The median lead level in the water is 7 ppm. (d) The median lead level in the water is 8 ppm. (e) Neither the mean nor the median can be computed because some values are unknown.

CHAPTER 2

1. A company produces packets of soap powder labeled "Giant Size 32 Ounces." The actual weight of soap powder in a box has a normal distribution with a mean of 33 oz. and a standard deviation of 0.8 oz. What proportion of packets are underweight (i.e., weigh less than 32 oz.)? (a) 0.159 (b) 0.212 (c) 0.106 (d) 0.841 (e) 0.115

2. For the density curve shown to the right, what percent of the observations lie above 1.5? (a) 20% (b) 25% (c) 50% (d) 75% (e) 80%

3. For the above density curve, what percent of the observations lie between 0.5 and 1.2? (a) 25% (b) 35% (c) 50% (d) 68% (e) 70%

4. If the heights of 99.7% of American men are between 5'0" and 7'0", what is your estimate of the standard deviation of the height of American men? (a) 1" (b) 3" (c) 4" (d) 6" (e) 12"

5. The figure below is the density curve of a distribution: Five of the seven points marked on the density curve make up the five-number summary for this distribution. Which two points are not part of the five-number summary? (a) B and E. (b) C and F. (c) C and E. (d) B and F. (e) A and G.

6. Suppose that the distribution of math SAT scores from your state this year is normally distributed with mean 480 and standard deviation 100 for males, and mean 440 and standard deviation 120 for females. If someone who scores 780 or higher on math SAT can be considered a genius, what is the proportion of geniuses among the male SAT takers? (a) 28% (b) 14% (c) 3% (d) 1.4% (e) 0.14%

7.
The average yearly snowfall in Chillyville is normally distributed with a mean of 55 inches. If the snowfall in Chillyville exceeds 60 inches in 15% of the years, what is the standard deviation? (a) 4.83 inches (b) 5.18 inches (c) 6.04 inches (d) 8.93 inches (e) The standard deviation cannot be computed from the given information.

8. The following graph is a normal probability plot for the amount of rainfall in acre-feet obtained from 26 randomly selected clouds that were seeded with silver oxide: (a) The data appear to show exponential growth; that is, the amount of rainfall increases exponentially as the amount of silver oxide increases. (b) The pattern suggests that the measurement is not normally distributed. (c) A least squares regression line should be fitted to the rainfall variable. (d) It can be expected that the histogram of rainfall amount will look like the normal curve. (e) The shape of the curve suggests that rainfall is caused by seeding the clouds with silver oxide.

9. The five-number summary of the distribution of scores on a statistics exam is 0, 26, 31, 36, 50. A total of 316 students took the exam. The histogram of all 316 test scores was approximately normal. Thus the variance of test scores must be about (a) 5 (b) 8 (c) 19 (d) 64 (e) 55

10. If the median of a set of data is equal to the mean, then (a) The data are normally distributed. (b) The data are approximately distributed. (c) The distribution is skewed. (d) The distribution is symmetric. (e) One can’t say anything about the shape of the distribution with any certainty.

CHAPTER 3

1. In regression, the residuals are which of the following? (a) Those factors unexplained by the data (b) The difference between the observed responses and the values predicted by the regression line (c) Those data points which were recorded after the formal investigation was completed (d) Possible models unexplored by the investigator (e) None of the above

2. What does the square of the correlation (r²) measure?
(a) The slope of the least squares regression line (b) The intercept of the least squares regression line (c) The extent to which cause and effect is present in the data (d) The fraction of the variation in the values of y that is explained by least-squares regression on the other variable

3. Which of the following statements are true? I. Correlation and regression require explanatory and response variables. II. Scatterplots require that both variables be quantitative. III. Every least-squares regression line passes through (x̄, ȳ). (a) I and II only (b) I and III only (c) II and III only (d) I, II, and III (e) None of the above

4. A local community college announces the correlation between college entrance exam grades and scholastic achievement was found to be -1.08. On the basis of this you would tell the college that (a) The entrance exam is a good predictor of success. (b) The exam is a poor predictor of success. (c) Students who do best on this exam will be poor students. (d) Students at this school are underachieving. (e) The college should hire a new statistician.

5. A researcher finds that the correlation between the personality traits “greed” and “superciliousness” is -0.40. What percentage of the variation in greed can be explained by the relationship with superciliousness? (a) 0% (b) 16% (c) 20% (d) 40% (e) 60%

6. Suppose the following information was collected, where X = diameter of tree trunk in inches, and Y = tree height in feet.

X:   4    2    8    6    10   6
Y:   8    4    18   22   30   8

If the LSRL equation is y = -3.6 + 3.1x, what is your estimate of the average height of all trees having a trunk diameter of 7 inches? (a) 18.1 (b) 19.1 (c) 20.1 (d) 21.1 (e) 22.1

7. Suppose we fit the least squares regression line to a set of data. What is true if a plot of the residuals shows a curved pattern? (a) A straight line is not a good model for the data. (b) The correlation must be 0. (c) The correlation must be positive. (d) Outliers must be present.
(e) The LSRL might or might not be a good model for the data, depending on the extent of the curve.

8. The following are resistant: (a) Least squares regression line (b) Correlation coefficient (c) Both the least squares line and the correlation coefficient (d) Neither the least squares line nor the correlation coefficient (e) It depends

CHAPTER 4

1. There is a positive association between the number of drownings and ice cream sales. This is an example of an association likely caused by: (a) Coincidence (b) Cause and effect relationship (c) Confounding factor (d) Common response (e) None of the above

2. If the correlation between body weight and annual income were high and positive, we could conclude that: (a) High incomes cause people to eat more food. (b) Low incomes cause people to eat less food. (c) High-income people tend to spend a greater proportion of their income on food than low-income people, on average. (d) High-income people tend to be heavier than low-income people, on average. (e) High incomes cause people to gain weight.

3. A study examined the relationship between the sepal length and sepal width for two varieties of an exotic tropical plant. Varieties A and B are represented by x's and o's, respectively, in the following plot: Which of the following statements is FALSE? (a) Considering variety A alone, there is a negative correlation between sepal length and sepal width. (b) Considering variety B alone, the least squares regression line for predicting sepal length from sepal width has a negative slope. (c) Considering both varieties together, there is a positive correlation between sepal length and sepal width. (d) Considering each variety separately, there is a positive correlation between sepal length and sepal width. (e) Considering both varieties together, the least squares regression line for predicting sepal length from sepal width has a positive slope.

4.
From tax records, it is relatively easy to determine the amount of liquor consumed per capita and the number of cigarettes consumed per capita for each of the 10 provinces of Canada. These are plotted on a scatterplot and a high positive correlation is found. Which of the following is correct? (a) This implies that heavy smoking causes people to drink more. (b) This implies that heavy drinking causes people to smoke more. (c) We cannot conclude cause and effect, but this also implies that there is a high positive correlation between cigarette smoking and alcohol consumption for individuals. (d) This could be an example of a correlation caused by a common cause because both activities are highly correlated with average family income and average income varies widely among the provinces. (e) We cannot conclude cause and effect, but this also implies that the same individuals both smoke and consume liquor.