AP STATS REVIEW - PART 1 - DESCRIBING DATA
(2008 #1)
1. To determine the amount of sugar in a typical serving of breakfast cereal, a student randomly selected 60 boxes of
different types of cereal from the shelves of a large grocery store. The student noticed that the side panels of some of the
cereal boxes showed sugar content based on one-cup servings, while others showed sugar content based on three-quarter-cup servings. Many of the cereal boxes with side panels that showed three-quarter-cup servings were ones that
appealed to young children, and the student wondered whether there might be some difference in the sugar content of
the cereals that showed different-size servings on their side panels. To investigate the question, the data were separated
into two groups. One group consisted of 29 cereals that showed one-cup serving sizes; the other group consisted of 31
cereals that showed three-quarter-cup serving sizes. The boxplots shown below display sugar content (in grams) per
serving of the cereals for each of the two serving sizes.
a) Write a few sentences to compare the distributions of sugar content per serving for the two serving sizes of
cereals.
b) What new information about sugar content do the boxplots above provide?
c) Based on the boxplots shown above on this page, how would you expect the mean amounts of sugar per cup to
compare for the different recommended serving sizes? Explain.
(2008B #2)
Four different statistics have been proposed as estimators of a population parameter. To investigate
the behavior of these estimators, 500 random samples are selected from a known population and
each statistic is calculated for each sample. The true value of the population parameter is 75. The
graphs below show the distribution of values for each statistic.
(a) Which of the statistics appear to be unbiased estimators of the population parameter? How
can you tell?
(b) Which of statistics A or B would be a better estimator of the population parameter? Explain
your choice.
(c) Which of statistics C or D would be a better estimator of the population parameter? Explain
your choice.
(2010B #1)
As a part of the United States Department of Agriculture’s Super Dump cleanup efforts in the early 1990s, various sites
in the country were targeted for cleanup. Three of the targeted sites—River X, River Y, and River Z—had become
contaminated with pesticides because they were located near abandoned pesticide dump sites. Measurements of the
concentration of aldrin (a commonly used pesticide) were taken at twenty randomly selected locations in each river near
the dump sites. The boxplots shown below display the five-number summaries for the concentrations, in parts per
million (ppm) of aldrin, for the twenty locations that were sampled in each of the three rivers.
a) Compare the distributions of the concentration of aldrin among the three rivers.
b) The twenty concentrations of aldrin for River X are given below.
3.4   4.0   5.6   3.7   8.0   5.5   5.3   4.2   4.3   7.3
8.6   5.1   8.7   4.6   7.5   5.3   8.2   4.7   4.8   4.6
Construct a stemplot that displays the concentrations of aldrin for River X.
c) Describe a characteristic of the distribution of aldrin concentrations in River X that can be seen in the stemplot
but cannot be seen in the boxplot.
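One way to check a hand-drawn stemplot for River X is the short Python sketch below. It assumes the units digit is the stem and the tenths digit is the leaf; the function name `stemplot` is made up for this illustration. Empty stems are printed so that any gap in the data stays visible.

```python
from collections import defaultdict

# Twenty aldrin concentrations (ppm) for River X, as listed in the question.
river_x = [3.4, 4.0, 5.6, 3.7, 8.0, 5.5, 5.3, 4.2, 4.3, 7.3,
           8.6, 5.1, 8.7, 4.6, 7.5, 5.3, 8.2, 4.7, 4.8, 4.6]


def stemplot(values):
    """Return stemplot lines: units digit as stem, tenths digit as leaf."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)               # e.g. 3.4 -> 34
        leaves[tenths // 10].append(tenths % 10)
    lines = []
    # Walk every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {row}")
    return lines


plot_lines = stemplot(river_x)
print("\n".join(plot_lines))
print("Key: 3 | 4 means 3.4 ppm")
```

A key is printed with the plot because a stemplot without one cannot be read unambiguously.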
(2007B #1)
The Better Business Council of a large city has concluded that students in the city’s schools are
not learning enough about economics to function in the modern world. These findings were based
on test results from a random sample of 20 twelfth-grade students who completed a 46-question
multiple-choice test on basic economic concepts. The data set below shows the number of
questions that each of the 20 students in the sample answered correctly.
(a) Display these data in a stemplot.
(b) Use your stemplot from part (a) to describe the main features of this score distribution.
(c) Why would it be misleading to report only a measure of center for this score
distribution?
(2002B #1)
(2003 Exam B – Question 1)
A simple random sample of 9 students was selected from a large university. Each of these students reported the number
of hours he or she had allocated to studying and the number of hours allocated to work each week. A least squares linear
regression was performed and part of the resulting computer output is shown below.
Predictor    Coef      StDev     T       P
Constant     8.107     2.731     2.97    0.021
Work         0.4919    0.1950    2.52    0.040
S = 4.349    R-Sq = 47.6%    R-Sq(adj) = 40.1%
The scatterplot below displays the data that were collected from the 9 students.
(a) After point P, labeled on the graph on the previous page, was removed from the data, a second linear regression
was performed and the computer output is shown below.
Predictor    Coef      StDev     T       P
Constant     11.123    3.986     2.79    0.032
Work         0.1500    0.3834    0.39    0.709
S = 4.327    R-Sq = 2.5%    R-Sq(adj) = 0.0%
Does point P exercise a large influence on the regression line? Explain.
(b) The researcher who conducted the study discovered that the number of hours spent studying reported by the
student represented by P was recorded incorrectly. The corrected data point for this student is represented by
the letter Q in the scatterplot below.
Explain how the least squares regression line for the corrected data (in this part) would differ from the least
squares regression line for the original data.
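When judging the influence of point P, it can help to put the two computer outputs side by side. The sketch below uses only numbers printed in the outputs; the dictionary layout and labels are made up for this illustration. It also checks that each T value is the coefficient divided by its standard error.

```python
# Slope rows ("Work") copied from the two computer outputs in the question.
fits = {
    "all 9 students": {"slope": 0.4919, "se": 0.1950, "r_sq": 0.476},
    "point P removed": {"slope": 0.1500, "se": 0.3834, "r_sq": 0.025},
}

t_ratios = {}
for label, fit in fits.items():
    # The T column is the coefficient divided by its standard error.
    t_ratios[label] = fit["slope"] / fit["se"]
    print(f"{label}: slope = {fit['slope']:.4f}, "
          f"t = {t_ratios[label]:.2f}, R-Sq = {fit['r_sq']:.1%}")

# How much the slope changes when the single point P is dropped.
slope_drop = fits["all 9 students"]["slope"] - fits["point P removed"]["slope"]
print(f"Slope falls by {slope_drop:.4f} study hours per work hour")
```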
MULTIPLE CHOICE:
CHAPTER 1
1. You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of
variables you have measured is
(a) 1463; all quantitative.
(b) four; two categorical and two quantitative.
(c) four; one categorical and three quantitative.
(d) three; two categorical and one quantitative.
(e) three; one categorical and two quantitative.
2. Consumers Union measured the gas mileage in miles per gallon of 38 1978–1979 model automobiles on a special
test track. The pie chart below provides information about the country of manufacture of the model cars used by
Consumers Union. Based on the pie chart, we may conclude that:
(a) Japanese cars get significantly lower gas mileage than cars of other countries. This is
because their slice of the pie is at the bottom of the chart.
(b) U.S. cars get significantly higher gas mileage than cars from other countries.
(c) Swedish cars get gas mileages that are between those of Japanese and U.S. cars.
(d) Mercedes, Audi, Porsche, and BMW represent approximately a quarter of the cars tested.
(e) More than half of the cars in the study were from the United States.
3. A researcher reports that, on average, the participants in his study lost 10.4 pounds after two months on his new
diet. A friend of yours comments that she tried the diet for two months and lost no weight, so clearly the report was
a fraud. Which of the following statements is correct?
(a) Your friend must not have followed the diet correctly, since she did not lose weight.
(b) Since your friend did not lose weight, the report must not be correct.
(c) The report only gives the average. This does not imply that all participants in the study lost 10.4 pounds or
even that all lost weight. Your friend’s experience does not necessarily contradict the study results.
(d) In order for the study to be correct, we must now add your friend’s results to those of the study and recompute
the new average.
(e) Your friend is an outlier.
4. The following is an ogive on the number of ounces of alcohol (one ounce is about 30 mL) consumed per week in a
sample of 150 students. A study wished to classify the students as “light”, “moderate”, “heavy” and “problem”
drinkers by the amount consumed per week. About what percentage of students are moderate drinkers, that is,
consume between 4 and 8 ounces per week?
(a) 60%
(b) 20%
(c) 40%
(d) 80%
(e) 50%
5. “Normal” body temperature varies by time of day. A series of readings was taken of the body temperature of a
subject. The mean reading was found to be 36.5° C with a standard deviation of 0.3° C. When converted to °F, the
mean and standard deviation are
(°F = °C(1.8) + 32).
(a) 97.7, 32
(b) 97.7, 0.30
(c) 97.7, 0.54
(d) 97.7, 0.97
(e) 97.7, 1.80
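The effect of a linear change of units on the mean and the standard deviation can be checked numerically. The sketch below uses a small set of made-up Celsius readings (not data from the question) to show that the mean follows the full conversion while the standard deviation is only multiplied by 1.8, and then applies the same rules to the summary values given in the question.

```python
import statistics

# Hypothetical Celsius readings, made up for illustration only.
celsius = [36.1, 36.4, 36.5, 36.6, 36.9]
fahrenheit = [1.8 * c + 32 for c in celsius]

mean_c = statistics.mean(celsius)
sd_c = statistics.stdev(celsius)
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.stdev(fahrenheit)

# Adding 32 shifts the mean but leaves the spread alone;
# multiplying by 1.8 rescales both.
print(mean_f, 1.8 * mean_c + 32)
print(sd_f, 1.8 * sd_c)

# The same rules applied to the summary values stated in the question.
mean_f_question = 1.8 * 36.5 + 32
sd_f_question = 1.8 * 0.3
print(round(mean_f_question, 1), round(sd_f_question, 2))
```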
6. The following is a histogram showing the actual frequency of the closing prices on the
New York Stock Exchange of a particular stock. Based on the frequency histogram, the
class that contains the 80th percentile is:
(a) 20-30
(b) 10-20
(c) 40-50
(d) 50-60
(e) 30-40
7. Which of the following is likely to have a mean that is smaller than the median?
(a) The salaries of all National Football League players.
(b) The scores of students (out of 100 points) on a very easy exam in which most get nearly perfect scores but a few
do very poorly.
(c) The prices of homes in a large city.
(d) The scores of students (out of 100 points) on a very difficult exam in which most get poor scores but a few do
very well.
(e) Amounts awarded by civil court juries.
8. There are three children in a room, ages three, four, and five. If a four-year-old child enters the room, the
(a) mean age will stay the same but the variance will increase.
(b) mean age will stay the same but the variance will decrease.
(c) mean age and variance will stay the same.
(d) mean age and variance will increase.
(e) mean age and variance will decrease.
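The effect of adding an observation equal to the mean can be verified directly. The sketch below computes the mean and both the population and sample variances before and after the fourth child enters; the variable names are made up for this illustration.

```python
import statistics

before = [3, 4, 5]
after = before + [4]          # a four-year-old enters the room

mean_before = statistics.mean(before)
mean_after = statistics.mean(after)

# Population variance (divide by n) and sample variance (divide by n - 1).
pvar_before = statistics.pvariance(before)
pvar_after = statistics.pvariance(after)
svar_before = statistics.variance(before)
svar_after = statistics.variance(after)

print(mean_before, mean_after)
print(pvar_before, pvar_after)
print(svar_before, svar_after)
```

The new value adds nothing to the sum of squared deviations but raises the divisor, so the conclusion is the same under either definition of variance.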
9. The weights of the male and female students in a class are summarized in the following boxplots:
Which of the following is NOT correct?
(a) About 50% of the male students have weights between 150 and 185 pounds.
(b) About 25% of female students have weights more than 130 pounds.
(c) The median weight of male students is about 162 pounds.
(d) The mean weight of female students is about 120 pounds because of symmetry.
(e) The male students have less variability than the female students.
10. When testing water for chemical impurities, results are often reported as bdl, that is, below detection limit. The
following are the measurements of the amount of lead in a series of water samples taken from inner-city households
(ppm).
5, 7, 12, bdl, 10, 8, bdl, 20, 6
Which of the following is correct?
(a) The mean lead level in the water is about 10 ppm.
(b) The mean lead level in the water is about 8 ppm.
(c) The median lead level in the water is 7 ppm.
(d) The median lead level in the water is 8 ppm.
(e) Neither the mean nor the median can be computed because some values are unknown.
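A median depends only on the order of the values, which is why censored readings need not block it. The sketch below assumes that every "bdl" reading is smaller than every detected value (the meaning of "below detection limit") and ranks such readings at the low end using negative infinity as a placeholder.

```python
import statistics

readings = [5, 7, 12, "bdl", 10, 8, "bdl", 20, 6]

# Assumption: a bdl value is unknown but lies below all detected values,
# so for ranking purposes it can sit at the bottom of the ordered list.
ranked = sorted(float("-inf") if r == "bdl" else r for r in readings)
median_lead = statistics.median(ranked)

print(ranked)
print("median:", median_lead)
```

The mean is a different matter: it needs the actual numeric value of every observation, and the placeholder used here carries no such information.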
CHAPTER 2
1. A company produces packets of soap powder labeled "Giant Size 32 Ounces." The actual weight of soap powder in
a box has a normal distribution with a mean of 33 oz. and a standard deviation of 0.8 oz. What proportion of
packets are underweight (i.e., weigh less than 32 oz.)?
(a) 0.159
(b) 0.212
(c) 0.106
(d) 0.841
(e) 0.115
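A normal proportion of this kind can be checked with Python's standard library instead of a printed table. The sketch below uses `statistics.NormalDist` with the mean and standard deviation stated in the question.

```python
from statistics import NormalDist

# Weight model stated in the question: normal, mean 33 oz, SD 0.8 oz.
weights = NormalDist(mu=33, sigma=0.8)

z_score = (32 - 33) / 0.8          # standardized value of 32 oz
p_underweight = weights.cdf(32)    # P(weight < 32 oz)

print(f"z = {z_score:.2f}, P(underweight) = {p_underweight:.3f}")
```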
2. For the density curve shown to the right,
what percent of the observations lie above 1.5?
(a) 20%
(b) 25%
(c) 50%
(d) 75%
(e) 80%
3. For the above density curve, what percent of the observations lie between 0.5 and 1.2?
(a) 25%
(b) 35%
(c) 50%
(d) 68%
(e) 70%
4. If the heights of 99.7% of American men are between 5'0" and 7'0", what is your estimate of the standard deviation
of the height of American men?
(a) 1”
(b) 3”
(c) 4”
(d) 6”
(e) 12”
5. The figure below is the density curve of a distribution:
Five of the seven points marked on the density curve make up the
five-number summary for this distribution. Which two points are not
part of the five-number summary?
(a) B and E.
(b) C and F.
(c) C and E.
(d) B and F.
(e) A and G.
6. Suppose that the distribution of math SAT scores from your state this year is normally distributed with mean 480
and standard deviation 100 for males, and mean 440 and standard deviation 120 for females. If someone who scores
780 or higher on math SAT can be considered a genius, what is the proportion of geniuses among the male SAT
takers?
(a) 28%
(b) 14%
(c) 3%
(d) 1.4%
(e) 0.14%
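An upper-tail normal proportion like this one can be computed as one minus the cumulative probability. The sketch below uses only the male distribution, since the question asks about male test takers.

```python
from statistics import NormalDist

# Male math SAT scores as stated in the question: mean 480, SD 100.
males = NormalDist(mu=480, sigma=100)

z_score = (780 - 480) / 100            # 3 standard deviations above the mean
p_genius_male = 1 - males.cdf(780)     # P(score >= 780)

print(f"z = {z_score:.1f}, proportion = {p_genius_male:.4%}")
```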
7. The average yearly snowfall in Chillyville is normally distributed with a mean of 55 inches. If the snowfall in
Chillyville exceeds 60 inches in 15% of the years, what is the standard deviation?
(a) 4.83 inches
(b) 5.18 inches
(c) 6.04 inches
(d) 8.93 inches
(e) The standard deviation cannot be computed from the given information.
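Working backward from a tail probability to a standard deviation uses the inverse normal. The sketch below finds the z-score with 85% of the area below it and solves z = (60 − 55)/σ for σ.

```python
from statistics import NormalDist

# Snowfall exceeds 60 inches in 15% of years, so 60 is the 85th percentile.
z_85 = NormalDist().inv_cdf(0.85)

# Solve z = (x - mean) / sd for sd, with x = 60 and mean = 55.
sd_snowfall = (60 - 55) / z_85

print(f"z = {z_85:.4f}, sd = {sd_snowfall:.2f} inches")
```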
8. The following graph is a normal probability plot for the amount of rainfall in acre-feet obtained from 26 randomly
selected clouds that were seeded with silver oxide:
(a) The data appear to show exponential growth; that is, the amount of
rainfall increases exponentially as the amount of silver oxide increases.
(b) The pattern suggests that the measurement is not normally distributed.
(c) A least squares regression line should be fitted to the rainfall variable.
(d) It can be expected that the histogram of rainfall amount will look like
the normal curve.
(e) The shape of the curve suggests that rainfall is caused by seeding the
clouds with silver oxide.
9. The five-number summary of the distribution of scores on a statistics exam is
0   26   31   36   50
316 students took the exam. The histogram of all 316 test scores was approximately normal. Thus the variance of
test scores must be about
(a) 5
(b) 8
(c) 19
(d) 64
(e) 55
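For an approximately normal distribution, the quartiles sit about 0.674 standard deviations on either side of the mean, which links the interquartile range to the standard deviation. The sketch below applies that relationship to the quartiles in the five-number summary; it assumes the normal model holds well enough for the quartiles.

```python
from statistics import NormalDist

q1, q3 = 26, 36                         # quartiles from the five-number summary

# z-score of the third quartile of a standard normal (about 0.674).
z_75 = NormalDist().inv_cdf(0.75)

# IQR = 2 * z_75 * sigma for a normal distribution.
sigma_est = (q3 - q1) / (2 * z_75)
variance_est = sigma_est ** 2

print(f"sigma = {sigma_est:.2f}, variance = {variance_est:.1f}")
```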
10. If the median of a set of data is equal to the mean, then
(a) The data are normally distributed.
(b) The data are approximately normally distributed.
(c) The distribution is skewed.
(d) The distribution is symmetric.
(e) One can’t say anything about the shape of the distribution with any certainty.
CHAPTER 3
1. In regression, the residuals are which of the following?
(a) Those factors unexplained by the data
(b) The difference between the observed responses and the values predicted by the regression line
(c) Those data points which were recorded after the formal investigation was completed
(d) Possible models unexplored by the investigator
(e) None of the above
2. What does the square of the correlation (r²) measure?
(a) The slope of the least squares regression line
(b) The intercept of the least squares regression line
(c) The extent to which cause and effect is present in the data
(d) The fraction of the variation in the values of y that is explained by least-squares regression on the other variable
3. Which of the following statements are true?
I. Correlation and regression require explanatory and response variables.
II. Scatterplots require that both variables be quantitative.
III. Every least-squares regression line passes through (x̄, ȳ).
(a) I and II only
(b) I and III only
(c) II and III only
(d) I, II, and III
(e) None of the above
4. A local community college announces the correlation between college entrance exam grades and scholastic
achievement was found to be –1.08. On the basis of this you would tell the college that
(a) The entrance exam is a good predictor of success.
(b) The exam is a poor predictor of success.
(c) Students who do best on this exam will be poor students.
(d) Students at this school are underachieving.
(e) The college should hire a new statistician.
5. A researcher finds that the correlation between the personality traits “greed” and “superciliousness” is –0.40. What
percentage of the variation in greed can be explained by the relationship with superciliousness?
(a) 0%
(b) 16%
(c) 20%
(d) 40%
(e) 60%
6. Suppose the following information was collected, where X = diameter of tree trunk in inches, and Y = tree height in
feet.
X    4    2    8    6    10    6
Y    8    4    18   22   30    8
If the LSRL equation is y = –3.6 + 3.1x, what is your estimate of the average height of all trees having a trunk diameter
of 7 inches?
(a) 18.1
(b) 19.1
(c) 20.1
(d) 21.1
(e) 22.1
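The stated LSRL can be reproduced from the six data pairs with the usual formulas: slope = Sxy / Sxx and intercept = ȳ − slope · x̄. The sketch below computes both by hand, without a regression library, and then evaluates the line at a trunk diameter of 7 inches.

```python
# Tree data from the question: X = trunk diameter (in), Y = height (ft).
xs = [4, 2, 8, 6, 10, 6]
ys = [8, 4, 18, 22, 30, 8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of cross-products and of squared deviations in x.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

slope = sxy / sxx
intercept = y_bar - slope * x_bar      # forces the line through (x_bar, y_bar)

predicted_height = intercept + slope * 7
print(f"y-hat = {intercept:.1f} + {slope:.1f}x; at x = 7: {predicted_height:.1f} ft")
```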
7. Suppose we fit the least squares regression line to a set of data. What is true if a plot of the residuals shows a curved
pattern?
(a) A straight line is not a good model for the data.
(b) The correlation must be 0.
(c) The correlation must be positive.
(d) Outliers must be present.
(e) The LSRL might or might not be a good model for the data, depending on the extent of the curve.
8. The following are resistant:
(a) Least squares regression line
(b) Correlation coefficient
(c) Both the least squares line and the correlation coefficient
(d) Neither the least squares line nor the correlation coefficient
(e) It depends
CHAPTER 4
1. There is a positive association between the number of drownings and ice cream sales. This is an example of an
association likely caused by:
(a) Coincidence
(b) Cause and effect relationship
(c) Confounding factor
(d) Common response
(e) None of the above
2. If the correlation between body weight and annual income were high and positive, we could conclude that:
(a) High incomes cause people to eat more food.
(b) Low incomes cause people to eat less food.
(c) High-income people tend to spend a greater proportion of their income on food than low-income people, on
average.
(d) High-income people tend to be heavier than low income people, on average.
(e) High incomes cause people to gain weight.
3. A study examined the relationship between the sepal length and sepal width for two varieties of an exotic tropical
plant. Varieties A and B are represented by x's and o's, respectively, in the following plot:
Which of the following statements is FALSE?
(a) Considering variety A alone, there is a negative correlation between sepal length and sepal width.
(b) Considering variety B alone, the least squares regression line for predicting sepal length from sepal width has a
negative slope.
(c) Considering both varieties together, there is a positive correlation between sepal length and sepal width.
(d) Considering each variety separately, there is a positive correlation between sepal length and sepal width.
(e) Considering both varieties together, the least squares regression line for predicting sepal length from sepal
width has a positive slope.
4. From tax records, it is relatively easy to determine the amount of liquor consumed per capita and the number of
cigarettes consumed per capita for each of the 10 provinces of Canada. These are plotted on a scatterplot and a high
positive correlation is found. Which of the following is correct?
(a) This implies that heavy smoking causes people to drink more.
(b) This implies that heavy drinking causes people to smoke more.
(c) We cannot conclude cause and effect, but this also implies that there is a high positive correlation between
cigarette smoking and alcohol consumption for individuals.
(d) This could be an example of a correlation caused by a common cause because both activities are highly
correlated with average family income and average income varies widely among the provinces.
(e) We cannot conclude cause and effect, but this also implies that the same individuals both smoke and consume
liquor.