AP Statistics
Final Exam Review – Data Analysis
1. You measure the age, marital status and earned income of an SRS of 1,463 women. The number and type of variables you have measured is
a.) 1,463 categorical.
b.) four; two categorical and two quantitative.
c.) four; one categorical and three quantitative.
d.) three; two categorical and one quantitative.
e.) three; one categorical and two quantitative.
2. Which of the following statements is NOT true?
a.) In a symmetric distribution, the mean and median are equal.
b.) Fifty percent of the scores in a distribution are between the first and third quartiles.
c.) In a symmetric distribution, the median is halfway between the first and third quartiles.
d.) The median is always greater than the mean.
e.) The range is the difference between the largest and smallest observation in the data set.
3. A set of data has a mean that is much larger than the median. Which of the following statements is most consistent with this information?
a.) The distribution is symmetric.
b.) The distribution is skewed left.
c.) The distribution is skewed right.
d.) The distribution is bimodal.
e.) The data set probably has a few low outliers.
4. Which of the following graphs can be used to summarize the data in a two-way table?
a.) Dot plot
b.) Segmented bar graph
c.) Box plot
d.) Stem and leaf plot
e.) Histogram
5. The mean age of four people in a room is 30 years. A new person whose age is 55 years old enters the room. The mean age of the five people now in the room is
a.) 30.
b.) 35.
c.) 37.5.
d.) 40.
e.) Cannot be determined from the information given.
6. You want to use numerical summaries to describe a distribution that is strongly skewed to the left. Which combination of measure of center and spread would be the best ones to use?
a.) Mean and interquartile range.
b.) Mean and standard deviation.
c.) Median and range.
d.) Median and standard deviation.
e.) Median and interquartile range.
7. A lobster fisherman is keeping track of the productivity of a set of traps he has placed in a favorite location. Below are the numbers of lobsters in these traps over the course of 12 different hauls.
0  3  3  3  4  5  5  6  7  7  12  14
According to the 1.5 x IQR rule, which values in the above distribution are outliers?
a.) 0 only
b.) 14 only
c.) 12 and 14
d.) 0 and 14
e.) 0, 12 and 14
8. You are told that your score on an exam is at the 85th percentile of the distribution of scores. This means that
a.) Your score was lower than approximately 85% of the people who took this exam.
b.) Your score was higher than approximately 85% of the people who took this exam.
c.) You answered 85% of the questions correctly.
d.) If you took this test (or one like it) again, you would score as well as you did this time, 85% of the time.
e.) 85% of the people who took this test earned the same score as you did.
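As a worked check for question 7, the 1.5 x IQR rule can be sketched in Python. This is only a sketch under one assumption: quartiles are taken as the medians of the lower and upper halves of the sorted data, which is the usual AP Statistics convention (other software may compute quartiles differently). The function names are illustrative and not part of the original review.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    With an odd number of observations, the overall median is left
    out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

lobsters = [0, 3, 3, 3, 4, 5, 5, 6, 7, 7, 12, 14]
print(quartiles(lobsters))     # (3.0, 7.0)
print(iqr_outliers(lobsters))  # [14]
```

With Q1 = 3 and Q3 = 7 the fences sit at -3 and 13, so only values outside that interval are flagged.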
9. Here is a list of scores for Mr. William’s calculus class:
60  61  61  65  72  75  75  78  81  81  85  89  91  98
What is the percentile of the person whose score was 85?
a.) 15th
b.) 21st
c.) 29th
d.) 71st
e.) 85th
10. Using the standard Normal distribution tables, the area under the standard Normal curve corresponding to -0.5 < Z < 1.2 is
a.) 0.2815
b.) 0.3085
c.) 0.3661
d.) 0.5764
e.) 0.8849
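The table lookup in question 10 can be cross-checked in software. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed table; the helper name `normal_area` is illustrative only.

```python
from statistics import NormalDist

def normal_area(low, high, mean=0.0, sd=1.0):
    """Area under a Normal curve between low and high."""
    dist = NormalDist(mean, sd)
    # Area to the left of `high` minus area to the left of `low`.
    return dist.cdf(high) - dist.cdf(low)

area = normal_area(-0.5, 1.2)
print(round(area, 4))  # 0.5764
```

This mirrors the table method: look up the area to the left of each z-value and subtract.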
11. An ecologist studying starfish populations collected starfish of the species Pisaster and was interested in the distribution of sizes of starfish on a certain shoreline. One measure of size is “arm length.” Below is a cumulative frequency distribution for the arm length of 102 Pisaster individuals.
The median and interquartile range of this distribution are approximately:
a.) Median is 15.2; Interquartile range is 12.5 to 16.8
b.) Median is 13; Interquartile range is 13 to 16.1
c.) Median is 13; Interquartile range is 3.1
d.) Median is 13; Interquartile range is 4.3
e.) Median is 15.2; Interquartile range is 4.3
12. Ramon is planning on buying a new car. He’s looking at the Ford Escape, a sport-utility vehicle, which gets 28 highway miles per gallon, and the Ford Fusion, a mid-sized sedan, which gets 31 highway miles per gallon. The mean fuel efficiency for all sport-utility vehicles is 23, with a standard deviation of 7.6. The mean for all mid-sized sedans is 27, with a standard deviation of 5.2. Which vehicle has a better standing, relative to others of the same style?
a.) The Ford Fusion sedan has a better relative standing because its z-score is higher.
b.) The Ford Fusion sedan has a better relative standing because its z-score is closer to 0.
c.) The Ford Escape SUV has a better relative standing because its z-score is higher.
d.) The Ford Escape SUV has a better relative standing because its z-score is closer to 0.
e.) We can’t make any comparisons unless we know that the fuel efficiency of each vehicle type is Normally distributed.
13. A data set is Normally distributed with a mean of 25 and a standard deviation of 8. If you standardize every observation in this data set, the resulting values will have a distribution that has
a.) a mean of 100 and a standard deviation of 10.
b.) a mean of 25 and a standard deviation of 10.
c.) a mean of 25 and a standard deviation of 1.
d.) a mean of 1 and a standard deviation of 1.
e.) a mean of 0 and a standard deviation of 1.
14. IQs among undergraduates at Mountain Tech are approximately Normally distributed. The mean undergraduate IQ is 110. About 95% of undergraduates have IQs between 100 and 120. The standard deviation of these IQs is about
a.) 5
b.) 10
c.) 15
d.) 20
e.) 25
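For question 12, relative standing is compared with z-scores, z = (value - mean) / standard deviation. A minimal Python sketch using the figures given in the question (the function and variable names are illustrative, not from the review):

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

# Each vehicle is compared against its own class.
escape_z = z_score(28, mean=23, sd=7.6)   # SUV class
fusion_z = z_score(31, mean=27, sd=5.2)   # mid-sized sedan class

print(round(escape_z, 2), round(fusion_z, 2))  # 0.66 0.77
```

Since higher fuel efficiency is better, the vehicle with the larger z-score stands further above its own class average.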
15. The time to complete a standardized exam is approximately Normal with a mean of 70 minutes and a standard deviation of 10 minutes. How much time should be given to complete the exam so that 80% of the students will complete the exam in the time given?
a.) 61.6 minutes
b.) 78.4 minutes
c.) 79.8 minutes
d.) 84 minutes
e.) 92.8 minutes
16. The correlation coefficient measures
a.) whether there is a relationship between two variables.
b.) the strength of the relationship between two quantitative variables.
c.) whether or not a scatterplot shows an interesting pattern.
d.) whether a cause and effect relation exists between two variables.
e.) the strength of the linear relationship between two quantitative variables.
17. Which of the following is true of the correlation r?
a.) It is a resistant measure of association.
b.) -1 ≤ r ≤ 1
c.) If r is the correlation between X and Y, then –r is the correlation between Y and X.
d.) Whenever all the data lie on a perfectly straight line, the correlation r will always be equal to +1.0.
e.) All of the above.
18. In a statistics course a linear regression equation was computed to predict the final exam score from the score on the first test. The equation of the least-squares regression line was ŷ = 10 + 0.9x where ŷ represents the predicted final exam score and x is the score on the first exam. Suppose Joe scores a 90 on the first exam. What would be the predicted value of his score on the final exam?
a.) 91
b.) 90
c.) 89
d.) 81
e.) Cannot be determined from the information given. We also need to know the correlation.
19. Suppose we fit a least-squares regression line to a set of data. If a plot of the residuals shows a curved pattern,
a.) a straight line is not a good summary for the data.
b.) the correlation must be 0.
c.) the correlation must be positive.
d.) outliers may be present.
e.) r² = 0.
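Question 15 is an inverse Normal calculation: find the time below which 80% of completion times fall. A short sketch, assuming the standard-library `statistics.NormalDist` in place of a table or calculator (the variable names are illustrative):

```python
from statistics import NormalDist

# Completion time modeled as Normal with mean 70 and SD 10 minutes,
# as stated in the question.
exam_time = NormalDist(mu=70, sigma=10)

# Time below which 80% of completion times fall.
cutoff = exam_time.inv_cdf(0.80)
print(round(cutoff, 1))  # 78.4
```

By hand, the same value comes from the z-score with 80% of the area to its left (about 0.84) and the relation x = mean + z * standard deviation.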
20. Suppose a straight line is fit to data having response variable y and explanatory variable x. Predicting values of y for values of x outside the range of the observed data is called
a.) contingency.
b.) extrapolation.
c.) causation.
d.) correlation.
e.) interpolation.
21. The least-squares regression line is fit to a set of data. If one of the data points has a positive residual, then
a.) the correlation between the values of the response and explanatory variables must be positive.
b.) the point must lie above the least-squares regression line.
c.) the point must lie near the right edge of the scatterplot.
d.) the point is probably an influential point.
e.) all of the above.
22. Which of the following statements concerning residuals is true?
a.) The sum of the residuals is always 0.
b.) A plot of the residuals is useful for assessing the fit of the least-squares regression line.
c.) The value of a residual is the observed value of the response minus the value of the response that one would predict from the least-squares regression line.
d.) An influential point on a scatterplot is not necessarily the point with the largest residual.
e.) All of the above.
23. A study of the effects of television measured how many hours of television each of 125 grade school children watched per week during a school year and their reading scores. The study found that children who watch more television tend to have lower reading scores than children who watch fewer hours of television. The study report says that “Hours of television watched explained 9% of the observed variation in the reading scores of the 125 subjects.” The correlation between hours of TV and reading scores must be
a.) r = 0.09
b.) r = -0.09
c.) r = 0.3
d.) r = -0.3
e.) Can’t tell from the information given.
24. Scores on the 1995 SAT verbal aptitude test x among Kentucky high school seniors were Normally distributed with mean 420 and standard deviation 80. Scores on the 1995 SAT quantitative aptitude test y among Kentucky high school seniors were Normally distributed with mean 440 and standard deviation 60. The least-squares regression line has the equation ŷ = 188 + 0.6x. The correlation between verbal scores and math scores is
a.) -0.8
b.) 0
c.) 0.45
d.) 0.8
e.) Cannot be determined from the given information.
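Questions 23 and 24 both recover a correlation from other regression summaries. The sketch below relies on two standard facts: r² is the fraction of variation explained, with r taking the sign of the association, and the least-squares slope is b = r · (s_y / s_x). The function names are illustrative and not part of the review.

```python
import math

def correlation_from_r_squared(r_squared, negative_association):
    """r is the square root of r^2, signed to match the association."""
    r = math.sqrt(r_squared)
    return -r if negative_association else r

def correlation_from_slope(slope, sd_x, sd_y):
    """Rearranges the slope formula b = r * sd_y / sd_x to solve for r."""
    return slope * sd_x / sd_y

# Question 23: 9% of variation explained; more TV goes with lower scores.
r_tv = correlation_from_r_squared(0.09, negative_association=True)

# Question 24: slope 0.6, s_x = 80 (verbal), s_y = 60 (quantitative).
r_sat = correlation_from_slope(0.6, sd_x=80, sd_y=60)

print(round(r_tv, 2), round(r_sat, 2))  # -0.3 0.8
```

As a consistency check on question 24, the line must pass through the point of means: 440 - 0.6 × 420 = 188, which matches the stated intercept.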
25. Which of the following statements describes what the standard deviation of residuals for a regression equation can be used for?
I. It describes the typical vertical distance between an observed data point and the regression line.
II. It evaluates whether a linear model is appropriate for a data set.
III. It measures the overall precision of the predictions made using the regression equation.
a.) I only
b.) II only
c.) III only
d.) Both I and II
e.) Both I and III
26. In the scatterplot below, the point indicated by the open circle
a.) has a negative value for the residual.
b.) has a positive value for the residual.
c.) has a zero value for the residual.
d.) has a zero value for the correlation.
e.) is an outlier.
A Minitab Review…
Predictor   Coef      SE Coef   T      P
Constant    2.08      15.93     0.13   0.899
Calories    0.06297   0.02409   2.61   0.028

S = 3.37648   R-Sq = 43.2%   R-Sq(adj) = 36.9%
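When reading output like this, it helps to remember that each T value is the coefficient divided by its standard error. A small Python sketch that recomputes the T column from the Coef and SE Coef values shown above (the dictionary layout and function name are illustrative only):

```python
def t_statistic(coef, se_coef):
    """Test statistic for H0: coefficient = 0 in regression output."""
    return coef / se_coef

# (Coef, SE Coef) pairs copied from the Minitab output above.
rows = {
    "Constant": (2.08, 15.93),
    "Calories": (0.06297, 0.02409),
}

t_values = {name: round(t_statistic(c, se), 2) for name, (c, se) in rows.items()}
print(t_values)  # {'Constant': 0.13, 'Calories': 2.61}
```

The recomputed values agree with the T column, and the P column gives the corresponding two-sided P-values reported by Minitab.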