C.4 Review - Mrs. McDonald

Name:
C.4 B & C Review Sheets
AP Statistics
1. Suppose the correlation between two variables x and y is due to the fact that both are responding to changes in some unobserved
third variable. What is this due to?
(a) Cause and effect between x and y
(b) The effect of a lurking variable
(c) Extrapolation
(d) Common sense
(e) None of the above. The answer is ________.
2. Which of the following statements are true?
I. Regression outliers, if removed, cause a dramatic change in the slope and orientation of the least squares line.
II. Determining a conditional distribution for a two-way table consists of calculating percents for either the row or column sums.
III. An influential observation can have a strong effect on both the regression line and the correlation between the X and Y
variables.
(a) I and II only
(b) I and III only
(c) II and III only
(d) I, II, and III
(e) None of the above. The answer is ________.
3. Suppose a straight line is fit to data having response variable y and explanatory variable x. Predicting values of y for values of x
outside the range of the observed data is called
(a) Correlation
(b) Causation
(c) Extrapolation
(d) Sampling
(e) None of the above. The answer is ________.
4. Suppose that the scatterplot of log Y on X produces a correlation close to 1. Which of the following is true?
I. The correlation between the variables X and Y will also be close to 1.
II. The residual plot of Y on X will show a clearly curved pattern of points.
III. The difference between consecutive values of y for equal x-intervals is approximately constant.
(a) I and II only
(b) I and III only
(c) II and III only
(d) I, II, and III
(e) None of the above. The answer is ________.
5. Which of the following are true statements?
I. High correlation does not necessarily imply causation.
II. A lurking variable is a name given to variables that cannot be identified or explained.
III. Successful prediction requires a cause and effect relationship.
(a) I only
(b) II only
(c) III only
(d) I and III only
(e) None of the above. The answer is ________.
Part 2: Free Response
A political scientist believes that there is a “gender gap” in American voting with women more likely to vote for the Democratic
candidate. She therefore interviews a random sample of voters and records the gender of the respondents and the political party of the
candidates for whom they voted in the last presidential election. Identify the following variables:
6. Quantitative:
7. Categorical:
8. Explanatory:
9. Response:
Chapter 4
4B & C Review
1
Over the past 30 years in the United States there has been a strong positive correlation between cigarette sales and the number of high
school graduates.
10. Draw a diagram of the relationship and identify all variables.
11. The statement prior to #10 represents (circle the correct answer):
causation
common response
confounding
In a study of the relationship between the amount of violence a person watches on TV and the viewer’s age, 81 regular TV watchers
were randomly selected and classified according to their age group and whether they were a “low-violence” or “high-violence” viewer.
Here is a two-way table of the results.

Amount of Violence              Age Group
Watched               16-34   35-54   55 & over
Low                       8      12          21
High                     18      15           7
12. Compute (in percents) the marginal distribution of age group for all people
surveyed.
13. Construct a bar chart to show your results visually.
14. Compute (in percents) the conditional distributions of age group among “low-violence” viewers. Then do the same for “high-violence” viewers.
15. How do these distributions differ from the marginal distribution of age group?
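One way to check the percents asked for in #12 and #14 is sketched below. It is only a sketch: it assumes the table's rows are the viewing levels (Low, High) and its columns are the three age groups, and the names `counts`, `percents`, `marginal`, and `conditional` are made up for this illustration.

```python
# Counts from the two-way table. The cell layout (rows = viewing level,
# columns = age group) is an assumption about the flattened table.
ages = ["16-34", "35-54", "55 & over"]
counts = {
    "Low": [8, 12, 21],
    "High": [18, 15, 7],
}


def percents(values):
    """Convert a list of counts to percents of their own total."""
    total = sum(values)
    return [100 * v / total for v in values]


# Marginal distribution of age group: column totals over the grand total.
column_totals = [sum(col) for col in zip(*counts.values())]
marginal = dict(zip(ages, percents(column_totals)))

# Conditional distribution of age group within each viewing level:
# each row's counts over that row's total.
conditional = {
    level: dict(zip(ages, percents(row))) for level, row in counts.items()
}

print("Marginal:", {a: round(p, 1) for a, p in marginal.items()})
for level, dist in conditional.items():
    print(level, {a: round(p, 1) for a, p in dist.items()})
```

Comparing each conditional row with the marginal row is the comparison #15 asks about.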
Cell phones, a recent innovation, have become increasingly popular with all segments of our society. According to the Strategis
Group, the number of cellular and personal communications systems subscribers in the United States has increased dramatically
since 1990, as shown in the following table.

Year   No. of Subscribers (millions)
1990    5.3
1991    7.6
1992   11.0
1993   16.0
1994   24.1
1995   33.8
1996   43.4
16. Apply a test to show that the cellular systems are increasing exponentially.
17. Calculate the logarithms of the y-values and extend the table above to show the transformed data.
18. Plot the transformed data on the grid provided. Label the axes completely.
19. You want to construct a model to predict cell phone growth in the near future. Perform linear regression on the transformed data.
Write your LSRL equation. What is the correlation for the transformed data?
20. Now transform your linear equation back to obtain a model for the original data. (It should be in the form y = c•10^(kx).) Write
the equation for this model.
21. The Strategis Group predicts 70.8 million subscribers in 1998, and 99.2 million in the year 2000. How many cellular subscribers
does your model predict for these years?
1. There is a positive association between the number of drownings and ice cream sales. This is an example of an association likely
caused by:
(a) Coincidence
(b) Cause and effect relationship
(c) Confounding factor
(d) Common response
(e) None of the above
2. If the correlation between body weight and annual income were high and positive, we could conclude that:
(a) High incomes cause people to eat more food.
(b) Low incomes cause people to eat less food.
(c) High-income people tend to spend a greater proportion of their income on food than low-income people, on average.
(d) High-income people tend to be heavier than low income people, on average.
(e) High incomes cause people to gain weight.
3. A study examined the relationship between the sepal length and sepal width for two varieties of an exotic tropical plant. Varieties
A and B are represented by x's and o's, respectively, in the following plot:
Which of the following statements is FALSE?
(a) Considering variety A alone, there is a negative correlation between sepal length and sepal width.
(b) Considering variety B alone, the least squares regression line for predicting sepal length from sepal width has a negative
slope.
(c) Considering both varieties together, there is a positive correlation between sepal length and sepal width.
(d) Considering each variety separately, there is a positive correlation between sepal length and sepal width.
(e) Considering both varieties together, the least squares regression line for predicting sepal length from sepal width has a
positive slope.
4. From tax records, it is relatively easy to determine the amount of liquor consumed per capita and the number of cigarettes
consumed per capita for each of the 10 provinces of Canada. These are plotted on a scatterplot and a high positive correlation is
found. Which of the following is correct?
(a) This implies that heavy smoking causes people to drink more.
(b) This implies that heavy drinking causes people to smoke more.
(c) We cannot conclude cause and effect, but this also implies that there is a high positive correlation between cigarette smoking
and alcohol consumption for individuals.
(d) This could be an example of a correlation caused by a common cause because both activities are highly correlated with
average family income and average income varies widely among the provinces.
(e) We cannot conclude cause and effect, but this also implies that the same individuals both smoke and consume liquor.
Part 2: Free Response
Answer completely, but be concise. Write sequentially and show all steps.
5. Suppose that two-variable data has been plotted and that the points show a clearly curved pattern. In this situation, several
methods can be used to transform the data. In each of the following, data have been transformed to obtain a good model. In each
case, what would be the equation that best fits the untransformed data?
(a) d = 22.19 t + 0.12
(b) ln c = 0.105 d + 0.01
(c) log y = –7.43 + 2.49 log x
6. A business school conducted a survey of companies in its state. They mailed a questionnaire to 200 small companies, 200
medium-sized companies, and 200 large companies. The rate of nonresponse is important in deciding how reliable survey results
are. Here are the data on response to this survey:

              Small   Medium   Large
Response        125       81      40
No Response      75      119     160
Total           200      200     200
(a) What was the overall response rate?
(b) Describe how nonresponse is related to the size of the business. (Use percents to make your statements precise.)
(c) Draw a bar graph to compare the nonresponse percents for the three company sizes.
7. According to data from the U.S. Health Care Financing Administration, the national expenditures for drugs and other medical
nondurables (in billions of dollars) for selected years from 1970 to 1997 are as follows: (Note that Year is coded: 1970 is
recorded simply as 70.)

Year     70    80    85    87    89    90    91    92    93    94    95     97
Spent   8.8  21.6  37.1  43.2  50.6  59.9  65.6  71.2    75  77.7  83.4  108.9
(a) Apply a test to show that the national expenditures for drugs and other medical nondurables are increasing exponentially.
(b) Calculate the logarithms of the y-values and extend the table above to show the transformed data.
(c) Plot the transformed data on the grid provided. Label the axes completely.
(d) You want to construct a model to predict the national drug expenditures in the
near future. Perform linear regression on the transformed data and write your
least squares equation.
(e) Now transform your linear equation back to obtain a model for the national drug expenditures data. (It should be in the form
y = (constant)•(10^(bx)).) Write the equation for this model.
(f) Predict the national drug expenditure for this year. Do you have confidence in this result? Why or why not?
8. Foresters are interested in predicting the amount of usable lumber they can harvest from various tree species. The following data
have been collected on the diameter of Ponderosa pine trees, measured at chest height, and the yield in board feet. Note that a
board foot is defined as a piece of lumber 12 inches by 12 inches by 1 inch. Construct an appropriate model for these data. Then
comment on the quality of your model.

Diameter   36   28   28   41   19   32   22   38   25   17   31   20   25   19   39   33   17   37   23   39
Bd Feet   192  113   88  294   28  123   51  252   56   16  141   32   86   21  231  187   22  205   57  265
ANSWERS FOR THE SECOND PORTION OF THE REVIEW
(1) d. (2) d. (3) d. (4) e.
(5a) d = 492.3961 t^2 + 5.3256 t + 0.0144. (5b) c = 1.01 e^(0.105d). (5c) y = (3.71535 × 10^-8) x^2.49.
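The algebra behind the three answers to #5 can be written out as follows. For (5a) this assumes the transformed response was the square root of d, which is what the squared-out answer implies; for (5c) the logarithm is taken to be base 10.

```latex
\begin{align*}
\sqrt{d} = 22.19\,t + 0.12
  &\;\Longrightarrow\; d = (22.19\,t + 0.12)^2 = 492.3961\,t^2 + 5.3256\,t + 0.0144 \\
\ln c = 0.105\,d + 0.01
  &\;\Longrightarrow\; c = e^{0.01}\,e^{0.105 d} \approx 1.01\,e^{0.105 d} \\
\log y = -7.43 + 2.49 \log x
  &\;\Longrightarrow\; y = 10^{-7.43}\,x^{2.49} \approx (3.71535 \times 10^{-8})\,x^{2.49}
\end{align*}
```

In each case the transformation applied to the response is undone on both sides: squaring for a square root, exponentiating base e for ln, and exponentiating base 10 for log.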
(6a) (125 + 81 + 40)/600 = 41% responded; equivalently, (75 + 119 + 160)/600 = 59% did not respond. (6b) 75/200 = 37.5% of small
businesses, 119/200 = 59.5% of medium-sized businesses, and 160/200 = 80% of large businesses did not respond. Generally, the larger
the business, the less likely it is to respond.
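The percents in (6a) and (6b) can be reproduced with a short sketch like the one below. The dictionary names `sent`, `responded`, and `nonresponse` are chosen for this illustration only.

```python
# Questionnaires mailed and returned, by company size, from the table in #6.
sent = {"Small": 200, "Medium": 200, "Large": 200}
responded = {"Small": 125, "Medium": 81, "Large": 40}

# Overall response rate: all responses over all questionnaires mailed.
overall_response = 100 * sum(responded.values()) / sum(sent.values())

# Nonresponse rate within each company size.
nonresponse = {
    size: 100 * (sent[size] - responded[size]) / sent[size] for size in sent
}

print(f"Overall response rate: {overall_response:.1f}%")
for size, pct in nonresponse.items():
    print(f"{size}: {pct:.1f}% did not respond")
```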
(7a) The ratios of each term to the previous term are:
(7b) The logarithms (base 10) are:
(7c) The scatterplot of National expenditures for drugs and other medical nondurables by year is screen shot #1 below. The points are
straightened by plotting log(Expenditures) vs. Year (picture #2). Regression is performed on the transformed data; the correlation is
0.998, and the equation of the least squares line is log(Expenditures) = –1.8662 + 0.0402 (Year). (See pictures #3 and #4.) Finally,
we back-transform to obtain the exponential curve Expenditures = (10^–1.8662)(10^(0.0402 Year)). See picture 5.
8. The scatterplot (picture 1 below) shows a clearly curved pattern. To determine the model to use, we note that diameter is
one-dimensional and board feet are three-dimensional, so board feet should be proportional to the cube of the diameter. We hypothesize a
power function of the form y = a x^b and plot log(BoardFeet) vs. log(Diameter). The plot of the transformed data appears linear
(picture 2), so we perform least squares regression on the transformed data (picture 3). The fitted line appears in picture 4. The
residual plot (picture 5) shows no pattern, so we judge the line to be an acceptable model for the transformed data. Note that the
correlation is 0.988 (r^2 = 97.6%), which indicates a very strong association. Back-transforming, we obtain the power function model
BoardFeet = (10^-2.5691)(x^3.13667), where x is the diameter. Note that the power of x is very close to 3, which affirms our hypothesis.