Download AP Statistics Midterm Review Chapters 1-2

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Probability wikipedia , lookup

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
AP Statistics Midterm Review Chapters 1-2
Use this information for # 1-14
In the paper “Reproduction in Laboratory colonies of Bank Vole,” the authors presented the results of a
study of litter size. (A vole is a small rodent with a stout body, blunt nose, and short ears.) As each new
litter was born, the number of babies was recorded, and the accompanying results were obtained.
1
2
2
3
3
3
3
3
3
4
4
4
4
4
4
4
4
4
4
4
4
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
8
8
8
8
8
8
8
8
8
8
8
9
9
9
10
10
11
The authors also kept track of the color of the first born in each litter. (B = brown, G = gray, W = white,
and T = tan)
B
W
B
G
B
B
G
B
G
B
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)
20)
B
B
T
G
T
B
B
B
W
G
T
W
B
G
B
B
B
B
T
T
W
B
B
B
B
G
B
B
G
W
T
T
B
B
T
T
B
G
T
B
G
T
G
T
W
B
G
G
B
G
G
G
W
B
W
B
W
T
B
T
G
B
B
W
B
T
G
T
G
W
B
T
B
T
G
T
T
W
B
G
B
B
G
T
B
G
G
G
B
W
Which variable is categorical, and which variable is quantitative?
Make a bar chart of the colors.
Make a histogram of the litter sizes.
Make a dot plot of the litter sizes.
Are there any outliers in the histogram or dot plot?
Describe the shape of the histogram (symmetric or skewed).
Find the mean of the litter sizes. Is the mean resistant to outliers?
Find the median of the litter sizes. Is the median resistant to outliers?
Find the 5-number summary of the litter sizes.
What is the interquartile range?
Make a boxplot of the litter sizes.
Are there any outliers? Justify your answer.
What is the difference between a modified boxplot and a regular boxplot? Why is a modified
boxplot usually considered better?
Find the standard deviation of the litter sizes. Is standard deviation resistant to outliers?
What does the area under a density curve represent?
The area under a density curve always equals _____.
If a density curve is skewed to the right, the (mean or median) will be further to the right than the
(mean or median).
Sketch the graph of N(266, 16), the distribution of pregnancy length from conception to birth for
humans.
Explain the empirical rule.
Using the empirical rule (the 68-95-99.7 rule), find the length of the longest 16% of all pregnancies.
Sketch and shade a normal curve for this situation.
21)
22)
23)
24)
25)
26)
27)
28)
29)
30)
31)
32)
33)
34)
35)
Find the length of the middle 99.7% of all pregnancies.
Find the length of the shortest 2.5% of all pregnancies.
What percentile rank is a pregnancy of 218 days?
What is a z-score?
What z-score does a pregnancy of 257 days have?
What percent of humans have a pregnancy lasting less than 257 days?
What percent of humans have a pregnancy lasting longer than 280 days?
What percent of humans have a pregnancy lasting between 260 and 270 days?
Would you say pregnancy length is a continuous or discrete variable? Justify.
How long would a pregnancy have to last to be in the longest 10% of all pregnancies?
How is a Normal Probability Plot used?
Make a back-to-back split stemplot of the following data:
Reading Scores
4th Graders
12
15
18
20
20
22
25
26
28
29
31
32
35
35
35
36
37
39
40
42
7th Graders
1
12
15
18
18
20
23
23
24
25
27
28
30
30
31
33
33
33
35
36
Make a comparison between 4th grade and 7th grade reading scores based on your stemplot.
Give an example of data that is skewed right.
Give an example of data that is skewed left.
AP Statistics Midterm Review Chapter 3
36)
37)
38)
39)
40)
41)
42)
43)
44)
45)
46)
47)
48)
49)
50)
51)
52)
53)
54)
55)
56)
At summer camp one of Laura’s counselors told her that you can determine air temperature from
the number of cricket chirps. Here is the data that Laura collected.
What is the response variable?
What is the explanatory variable?
What is the direction of this scatterplot?
What is the form of this scatterplot?
What is the strength of this scatterplot?
Are there outliers? If so, are they influential?
How are outliers and influential points similar? Different?
What is the correlation?
What two things does correlation tell us about a scatterplot?
If I change the units for temperature to degrees Celsius instead of Fahrenheit, what happens to the
correlation?
What is the highest correlation possible?
What is the lowest correlation possible?
Correlation only applies to what type(s) of relationship(s)?
Correlation only applies to what type(s) of variable(s)?
Is correlation resistant to outliers?
Does a high correlation indicate a strong cause/effect relationship?
What is a least-squares regression line (LSRL)?
Find the LSRL for the chirps-temperature data.
What is the slope of this line, and what does it tell you in this context?
Predict the temperature when there are 161 chirps per minute.
Predict the temperature when there are 225 chirps per minute.
Why is the prediction in problem 55 acceptable but the prediction in problem 56 not?
57)
58)
59)
60)
61)
62)
63)
Find the error in prediction (residual) for 160 chirps per minute.
What point is always on a least squares regression line?
What does r2 measure?
Find the coefficient of determination for this data. What does r2 tell you about this data?
The sum of the residuals is always _____.
How can you use a residual plot to tell if a line is a good model for data?
Develop a LSRL based on the following information. How strong is this model? How do you
know?
AP Statistics Midterm Review Chapters 4-5
64)
65)
If you know a scatterplot has a curved shape, how can you decide whether to use a power model
or an exponential model to fit data?
Graph the following data:
66)
Perform the appropriate logarithmic transformation (power or exponential) on the above data to get
an equation. Write the equation for the transformed data.
67) Make a residual plot to support your choice for problem 66.
68) Use the equation for the transformed data to find the number of mice after 45 days.
69) What is extrapolation, and why shouldn’t we trust predictions using extrapolation?
70) What is a lurking variable?
71) Why should we avoid using averaged data for regression and correlation?
72) What is causation? Give an example.
73) What is confounding? Give an example.
Use this table for questions 74-79:
Education
Did not complete high school
Completed high school
1 to 3 years of college
4 or more years of college
74)
75)
76)
77)
78)
79)
Never smoked
82
97
92
86
Smoking Status
Smoked, but quit
19
25
49
63
Smokes
113
103
59
37
What is the total number of people represented by the table?
What percent of these people smoke?
What percent of never-smokers completed high school?
What percent of those with 4 or more years of college have quit smoking?
What percent of smokers did not finish high school?
What conclusion can be drawn about smoking and education from this table?
80)
81)
82)
83)
84)
85)
86)
87)
What is the difference between an observational study and an experiment?
A cause-and-effect relationship can be established from an _____.
What is a voluntary response sample?
How are a population and a sample related but different?
What is bias?
Why is convenience sampling biased?
SRS stands for what kind of sample? Name and define.
Discuss how to choose a SRS of 4 towns from this list:
Allendale
Detour
Gratiot
Joliet
88)
89)
90)
91)
92)
93)
94)
95)
96)
97)
98)
99)
100)
101)
102)
103)
104)
Bangor
Edmonton
Hillsdale
Kentwood
Chelsea
Fennville
Ionia
Ludington
What is a stratified random sample?
What is a cluster sample?
What is undercoverage?
What is nonresponse?
What is response bias?
Why is the wording of questions important? Give an example.
How are experimental units and subjects similar but different?
Explanatory variables in experiments are often called _____.
If I test a drug at 100 mg, 200 mg, and 300 mg, I am testing one variable at three _____.
What is the placebo effect?
What is the purpose of a control group?
What are the two types of matched pairs used in experiments?
What are the three principles of experimental design?
What does double-blind mean, and why would we want an experiment to be double-blind?
What is block design?
What is the difference between a block design and a stratified random sample?
I want to test the effects of aerobic exercise on resting heart rate. I want to test two different levels
of exercise, 30 minutes 3 times per week and 30 minutes 5 times per week. I have a group of 20
people to test, 10 men and 10 women. I will take heart rates before and after the experiment. Draw
a chart for this experimental design.
AP Statistics Midterm Review Chapters 6-7
105) What is independence?
106) You are going to flip a coin three times. What is the sample space for each flip?
107) You are going to flip a coin three times and note how many heads you get. What is the sample
space?
108) You are going to flip a coin three times and note what you get on each flip. What is the sample
space?
109) Make a tree diagram for the three flips.
110) Any probability is a number between (and including) _____ and _____.
111) All possible outcomes together must have probability of _____.
112) If S is the sample space, P(S) = _____.
113) What are complements? Give an example and draw a Venn diagram.
114) What are disjoint events? Give two examples and draw a Venn diagram.
115) What is the probability that an M & M is blue?
116) What is the probability that an M & M is red or green?
117) What is the probability that an M & M is yellow and orange?
118) Bre can beat Erica in tennis 9% of the time. Erica can swim faster than Bre 8% of the time. What is
the probability that Bre would beat Erica in a tennis match and in a swimming race?
119) What assumption are you making in problem 118?
120) Using two dice, what is the probability that you would roll a sum of seven or eleven?
121) Using two dice, what is the probability that you would roll doubles?
122) Using two dice, what is the probability that you would roll a sum of 7 or 11 on the first roll and
doubles on the second roll?
123) What assumption are you making in problem 122? Do you think this assumption is valid?
124) Using two dice, what is the probability that you would roll a sum of 7 or 11 that is also doubles?
125) What is the union of two events?
126) What is an intersection of two events?
127) How can we test independence?
128) Perform an independence test on the smoking/education chart from problem 110 to show that
smoking status and education are not independent.
129) Make a Venn diagram for the following situation:
45% of kids like Barney
25% of kids like Blue
55% of kids like Pooh
15% of kids like Blue and Pooh
25% of kids like Barney and Pooh
5% of kids Barney, Blue, and Pooh
5% of kids like Blue but not Barney or Pooh
For problems 130-133 consider the process of drawing a card from a standard deck and replacing
it. Let A be drawing a heart, B be drawing a king, and C be drawing a spade.
130) Are the events A and B disjoint? Explain.
131) Are the events A and B independent? Justify.
132) Are the events A and C disjoint? Explain.
133) Are the events A and C independent? Justify.
134) What does the symbol U mean?
135) What does the symbol ∩ mean?
136) Make a probability histogram of the following grades on a four-point scale:
137)
138)
139)
140)
141)
142)
Using problem 136, what is P(X > 2)?
Using problem 136, what is P(X  2)?
What is a uniform distribution? Draw a picture.
In a uniform distribution with 0 < X < 1, what is P(0.2 < X < 0.6)?
In a uniform distribution with 0 < X < 1, what is P(0.2 ≤ X ≤ 0.6)?
How do your answers to problems 137, 138, 140, and 141 demonstrate a difference between
continuous and discrete random variables?
143) Normal distributions are (continuous or discrete).
144) Expected value is another name for _____.
145) Find the expected value of the grades in problem 136.
146) Find the variance of the grades in problem 136.
147) Find the standard deviation of the grades in problem 136.
148) What is the law of large numbers?
149) If I sell an average of 5 books per day and 7 CDs per day, what is the average number of items I
sell per day?
150) If I charge $2 per book and $1.50 per CD in problem 149, what is my average amount of income
per day?
151) Before you can use the rules for variances you must make sure the variables are _____.
For problems 152-160, use the following situation: For Test 1, the class average was 80 with a
standard deviation of 10. For Test 2, the class average was 70 with a standard deviation of 12.
152) What is the average for the two tests added together?
153) What is the standard deviation for the two tests added together?
154) What is the difference in the test averages?
155) What is the standard deviation for the difference in the test averages?
156)
157)
158)
159)
160)
If I cut the test scores on Test 2 in half and add 50, what is the new average?
What is the new standard deviation for Test 2 in problem 156?
If I add 7 points to every Test 1, what is the new standard deviation?
If I multiply every Test 1 by 2 and subtract 80, what is the new mean?
If I multiply every Test 1 by 2 and subtract 80, what is the new standard deviation?
Multiple Choice
1) You measure the age, marital status and earned income of an SRS of 1463 women. The number and
type of variables you have measured is
A) 1463
B) Four; two categorical and two quantitative
C) Four; one categorical and three quantitative
D) Three; two categorical and one quantitative
E) Three; one categorical and two quantitative
2) If your score on a test is at the 60th percentile, you know that your score lies
A) Below the lower quartile
B) Between the lower quartile and the median
C) Between the median and the upper quartile
D) Above the upper quartile
E) Can’t say where it lies relative to the quartiles
3) When dealing with financial data (such as salaries or lawsuits settlements), we often find that the
shape of the distribution is _________. When the distribution has this shape, the _________ is pulled
toward the long tail of the distribution, but the _________ is less affected. The sequence of words to
correctly complete this passage is
A) Right skewed, median, mean.
B) Left skewed, mean, median.
C) Right skewed, mean, standard deviation.
D) Right skewed, mean, median.
E) Roughly symmetric, mode, mean.
4) Items produced by a manufacturing process are supposed to weigh 90 grams. The manufacturing
process is such, however, that there is variability in the items produced and they do not all weigh exactly
90 grams. The distribution of weights can be approximated by a normal distribution with mean 90 grams
and a standard deviation of 1 gram. What percentage of the items will either weigh less than 87 grams or
more than 93 grams?
A) 6%
B) 94%
C) 99.7%
D) 0.3%
E) 0.15%
5) The correlation coefficient measures
A) Whether there is a relationship between two variables.
B) The strength of the relationship between two quantitative variables.
C) Whether or not a scatterplot shows an interesting pattern.
D) Whether a cause and effect relation exists between two variables.
E) The strength of the linear relationship between two quantitative variables.
6) Which of the following is true of the correlation coefficient r ?
A) It is a resistant measure of association.
B) –1 ≤ r ≤ 1
C) If r is the correlation between X and Y, then –r is the correlation between Y and X.
D) The correlation coefficient will be +1.0 only if all the data lie on a perfectly straight-line.
E) All of the above.
7) The least-squares regression line is
A) The line that makes the square of the correlation in the data as large as possible.
B) The line that makes the sum of the squares of the vertical distances of the data points from the
line as small as possible.
C) The line that best splits the data in half, with half of the points above the line and half below
the line.
D) The line that makes the sum of the squares of the residuals 0.
E) All of the above.
8) The fraction of the variation in the values of a response y that is explained by the least-squares
regression of y on x is the
A) Correlation coefficient
B) Slope of the least-squares regression line
C) Square of the correlation coefficient
D) Intercept of the least-squares regression line
E) Sum of the squared residuals
9) Suppose we fit the least-squares regression line to a set of data. If a plot of the residuals shows a
curved pattern,
A) A straight line is not a good summary for the data.
B) The correlation must be 0.
C) The correlation must be positive.
D) Outliers must be present
E) r2 = 0.
10) Which of the following statements concerning residuals is true?
A) The sum of the residuals is always 0.
B) A plot of the residuals is useful for assessing the fit of the least-squares regression line.
C) The value of a residual is the observed value of the response minus the value of the response
that one would predict from the least-squares regression line.
D) An influential point on a scatterplot is not necessarily the point with the largest residual.
E) All of the above.
11) If changes in a response variable are due to the effects of the explanatory variable as well as the
effects of lurking variables, and we cannot distinguish between these effects, we are said to have
A) A cause-and-effect relation between the explanatory and response variable.
B) Common response
C) Confounding
D) Correlation
E) Extrapolated
12) A variable grows exponentially over time if
A) The variable increases by the addition of a fixed amount to the variable as time increases by a
fixed amount.
B) The variable increases by squaring its value whenever time is increased by a certain fixed
amount.
C) The variable increases by multiplication by a fixed amount as time increases by a fixed
amount.
D) The variable increases by the logarithm of its value whenever time is increased by a certain
fixed amount.
E) None of the above.
13) Using least-squares regression, I determined that the logarithm of the population of a country is
approximately described by the equation: log(population) = -13.5 + 0.01x(year)
Based on this equation, the population of the country in the year 2000 should be about
A) 6.5
B) 665
C) 6,665
D) 2,000,000
E) 3,162,278
14) The reason that blocking (as in a randomized block design) is sometimes used in experimentation is
to
A) Prevent the placebo effect
B) Allow double blinding
C) Eliminate confounding with another factor
D) Eliminate sampling variability
15) An experiment compares the taste of a new spaghetti sauce with the taste of a successful sauce.
Each of a number of tasters tastes both sauces (in random order) and says which tastes better. This is
called a
A) Simple Random Sample
B) Stratified Random Sample
C) Completely Randomized Design
D) Matched Pairs Design
E) Double-Blind Design
16) In a certain town, 50% of the households own a cellular phone, 40% own a pager, and 20% own both
a cellular phone and a pager. The proportion of households that own neither a cellular phone nor a pager
is
A) 0%
B) 10%
C) 30%
D) 70%
E) 90%
17) If the knowledge that an event A has occurred implies that a second event B cannot occur, the
events A and B are said to be
A) Independent
B) Disjoint
C) Mutually Exhaustive
D) The Sample Space
E) Complementary
18) A deck of cards contains 52 cards, of which 4 are aces. You are offered the following wager: Draw
one card at random from the deck. You win $10 if the card drawn is an ace. Otherwise you lose $1. If you
make this wager very many times, what will be the mean outcome?
A) About -$1, because you will lose most of the time.
B) About $9, because you win $10 but lose only $1.
C) About -$0.15, that is, on the average you lose about 15 cents.
D) About $0.77, that is, on the average you win about 77 cents.
E) About $0, because the random draw gives you a fair bet.
19) The density curve for a continuous random variable X has which of the following properties?
A) The probability of any event is the area under the density curve and above the values of X that
make up the event.
B) The total area under the density curve for X must be exactly 1.
C) The probability of any event of the form X = constant is 0.
D) The density curve lies completely on or above the horizontal axis.
E) All of the above.
20) All bags entering a research facility are screened. Ninety-seven percent of the bags that contain
forbidden material trigger an alarm. Fifteen percent of the bags that do not contain forbidden material
also trigger the alarm. If 1 out of every 1,000 bags entering the building contains forbidden material, what
is the probability that a bag that triggers the alarm will actually contain forbidden material?
A) 0.00097
B) 0.00640
C) 0.03000
D) 0.14550
E) 0.97000