```Name ________________________
AP Statistics
Date _____________
Schwimmer
Midterm Review #1
1.
A volunteer for a mayoral candidate’s campaign periodically conducts polls to estimate
the proportion of people in the city who are planning to vote for this candidate in the
upcoming election. Two weeks before the election, the volunteer plans to double the
sample size in the polls. The main purpose of this is to
(1) reduce nonresponse bias
(2) reduce the effects of confounding variables
(3) reduce bias due to the interviewer effect
(4) decrease the variability in the population
(5) decrease the standard deviation of the sampling distribution of the sample proportion
2.
Which one of the following would be a correct interpretation if you have a z-score of
+2.0 on an exam?
(1) It means that you missed two questions on the exam.
(2) It means that you got twice as many questions correct as the average student.
(3) It means that your grade is two standard deviations above the mean for this exam.
(4)
(5)
3.
The statistics below provide a summary of the distribution of heights, in inches, for a
simple random sample of 200 young children.
Mean: 46 inches
Median: 45 inches
Standard Deviation: 3 inches
First Quartile: 43 inches
Third Quartile: 48 inches
About 100 children in the sample have heights that are
(1) Less than 43 inches
(2) Less than 48 inches
(3) Between 43 and 48 inches
(4) Between 40 and 52 inches
(5) More than 46 inches
4.
Jay, James, and Joe each drive different types of cars. The prices they paid for their cars
are in the table below, along with the mean and standard deviation of prices for the type
of car each drives. Show necessary work to determine who paid the most and least for their
cars relative to their type. Assume that prices for each type are normally distributed.
z_Jay = (24857 − 21335) / 2455 = 1.435
z_James = (37925 − 32853) / 4065 = 1.248
z_Joe = (55290 − 48936) / 5654 = 1.124
Jay paid the most relative to his car type and Joe paid the least relative to his car type.
A least squares regression line was fitted to the weights (in pounds) versus age (in
months) of a group of many young children. The equation of the line is
ŷ = 16.6 + 0.65t,
where ŷ is the predicted weight and t is the age of the child. A 20-month-old child in
this group has an actual weight of 25 pounds. Which of the following is the residual
weight, in pounds, for this child?
(1) –7.85
(2) –4.60
(3) 4.60
(4) 5.00
(5) 7.85
ŷ = 16.6 + 0.65(20) = 29.6
residual = observed − predicted
residual = 25 − 29.6 = −4.6
5.
Let X represent a random variable whose distribution is Normal, with a mean of 100 and
a standard deviation of 10. Which of the following is equivalent to P(X > 115)?
(1) P(X < 115)
(2) P(X ≤ 115)
(3) P(X < 85)
(4) P(85 < X < 115)
(5) 1 − P(X < 85)
6.
The following table shows the federal debt for a short period of time from 1980 through 1991.
[Grid for scatterplot: horizontal axis Year, 1980 to 1992; vertical axis marked from .4 to 3.6]
(a) Construct a scatterplot on the grid provided.
(b) Determine if the data is linear or exponential and explain how you reached your conclusion.
After finding the equation for the least squares regression line, I constructed a residual plot.
There was a clear pattern, which means a linear model may not be best. After transforming
the data, a scatterplot of (year, log(federal debt)) looks linear, with an r-value of .9959 and
small residuals.
(c) Perform exponential regression and generate an equation that predicts the federal debt based
on the year.
predicted log(federal debt) = −110.6417 + .0559(year)
(d) Find the residual for this model for the year 1990.
predicted log(federal debt) = −110.6417 + .0559(1990) = .5993
predicted federal debt = 10^.5993 = 3.975
residual = observed – predicted = 3.2 – 3.975 = –.775
(e) Use this model to predict the national debt for the year 2000.
predicted log(federal debt) = −110.6417 + .0559(2000) = 1.1583
predicted federal debt = 10^1.1583 = 14.3979
7.
“You’ll never have as many friends in your life as you do in high school.”
A statistician decides to test the accuracy of this statement so he conducts a long-range study. He
starts with 1,000 high school seniors, and asks them to count how many “friends” they have.
Every year for 15 years, he stays in touch with the students via email and asks them to inform him
how many friends they have. The statistician averages the number of friends per year and 15
years later, plots the points (age, friends).
The least squares regression line is:
predicted friends = 116.14 − 2.94(age)
a.
The direction of the association is
(1) Positive
(2) Negative
(3) Impossible to tell
b.
A 20-year-old is predicted to have
(1) 75 friends
(2) 57 friends
(3) Impossible to tell
c.
How can you interpret the equation?
(1) For every friend that a person loses, [...]
(2) For every 2.94 years a person ages, he/she loses a friend.
(3) For every year a person ages, he/she [...]
d.
The strength of the association is
(1) Very strong
(2) Moderately strong
(3) Moderately weak
e.
r² = .58
What is the correct interpretation?
(1) A person loses 58% of his friends over time.
(2) 58% of the variation in ages of people can be explained by the LSRL.
(3) 58% of the variation in the number of friends can be explained by the
linear relationship of age on friends.
(4) 58% of the time we can predict the number of friends a person will lose.
f.
30-year-olds in the study have, on average, 25 friends. What is the residual
to the nearest whole number?
(1) 3
(2) –3
(3) –5
g.
At what age does the model suggest you will only have 25 friends?
(1) 31
(2) 43
(3) Impossible to tell
h.
At 39 years old, the prediction is
(1) You will only have 1 friend.
(2) You won’t have any friends.
(3) It’s difficult to determine how many friends you’ll have.
8.
The length of time an alkaline AA battery is usable in a CD player is described by a normal
distribution with a mean of 76.3 hours and a standard deviation of 2.1 hours. For each question,
draw a small diagram and answer the question.
a. What percentage of batteries get over 80 hours of use?
P(x > 80) = P(z > (80 − 76.3) / 2.1) = P(z > 1.762) = .0390
b. What percentage of batteries get under 75 hours of use?
P(x < 75) = P(z < (75 − 76.3) / 2.1) = P(z < −.61905) = .2679
c. What percentage of batteries get between 73 and 77 hours of use?
P(73 < x < 77) = P((73 − 76.3) / 2.1 < z < (77 − 76.3) / 2.1) = P(−1.571 < z < .3333) = .5725
d. A battery getting 78 hours of use would be in what percentile?
P(x < 78) = P(z < (78 − 76.3) / 2.1) = P(z < .8095) = .7908
79th percentile
e. How many hours of use would a battery need to be in the top 10 percent?
z = 1.28
1.28 = (x − 76.3) / 2.1
x = 78.988 ≈ 79 hours
f. How many hours of use would a battery need to be in the 99.99th percentile?
z = 3.719
3.719 = (x − 76.3) / 2.1
x = 84.1099 hours
g. Find Q1.
25th percentile
z = −.6745
−.6745 = (x − 76.3) / 2.1
x = 74.88 hours
h. Find Q3.
75th percentile
z = .6745
.6745 = (x − 76.3) / 2.1
x = 77.72 hours
```
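The worked answers in the review can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later so that the standard library's `statistics.NormalDist` is available. The helper names (`z_score`, `residual`, `predicted_debt`, `predicted_friends`) are hypothetical and chosen only for illustration; the numbers come from the worksheet itself. Software uses exact z-values rather than rounded table values, so results may differ from the handwritten answers in the last decimal place.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def residual(observed, predicted):
    """Residual = observed value minus the value the model predicts."""
    return observed - predicted


# Question 4: (price paid, mean price, standard deviation) per car type.
cars = {
    "Jay": (24857, 21335, 2455),
    "James": (37925, 32853, 4065),
    "Joe": (55290, 48936, 5654),
}
car_z = {name: z_score(*values) for name, values in cars.items()}

# Weight question: predicted weight = 16.6 + 0.65 * (age in months).
weight_residual = residual(25, 16.6 + 0.65 * 20)


# Question 6: the fitted line predicts log10(debt), so undo the log with 10 ** value.
def predicted_debt(year):
    return 10 ** (-110.6417 + 0.0559 * year)


debt_residual_1990 = residual(3.2, predicted_debt(1990))


# Question 7: predicted friends = 116.14 - 2.94 * age.
def predicted_friends(age):
    return 116.14 - 2.94 * age


friends_residual_30 = residual(25, predicted_friends(30))

# Question 8: battery life is Normal with mean 76.3 hours and sd 2.1 hours.
battery = NormalDist(mu=76.3, sigma=2.1)
battery_answers = {
    "a: P(x > 80)": 1 - battery.cdf(80),
    "b: P(x < 75)": battery.cdf(75),
    "c: P(73 < x < 77)": battery.cdf(77) - battery.cdf(73),
    "d: P(x < 78)": battery.cdf(78),
    "e: 90th percentile": battery.inv_cdf(0.90),
    "f: 99.99th percentile": battery.inv_cdf(0.9999),
    "g: Q1": battery.inv_cdf(0.25),
    "h: Q3": battery.inv_cdf(0.75),
}

for name, z in car_z.items():
    print(f"z for {name}: {z:.3f}")
print(f"weight residual: {weight_residual:.2f}")
print(f"1990 debt residual: {debt_residual_1990:.3f}")
print(f"2000 predicted debt: {predicted_debt(2000):.4f}")
print(f"friends residual at age 30: {friends_residual_30:.2f}")
for label, value in battery_answers.items():
    print(f"{label}: {value:.4f}")
```

`NormalDist.cdf` gives the area to the left of a value, so "over 80 hours" is computed as one minus the cdf, and `inv_cdf` goes the other way, from a percentile to a number of hours.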