Ch1-26 Review Day2

Names: _______________________________
Free Response Practice 2
Question 1
Mrs. Brown’s Pre-Calculus class is taking an introduction to Calculus during the last six weeks of the school year. Many of the Pre-Cal students
did badly on their first test, but Mrs. Brown has confidence that they will improve in time. Two weeks later, the students take another
test. The scores on the exams are as follows:
Student  1st Test  2nd Test     Student  1st Test  2nd Test
1        54        62           11       53        65
2        63        80           12       72        87
3        69        80           13       59        62
4        61        72           14       70        79
5        66        76           15       58        59
6        69        78           16       68        86
7        51        57           17       51        66
8        37        39           18       60        73
9        61        64           19       93        99
10       47        47           20       48        45
1) Sketch a scatterplot of these data. Are there any points you might consider as outliers or influential points? Why do you consider
them outliers/influential points? How could you check to see if, in fact, they are outliers or influential points? Circle these points
on your scatterplot.
2) What type of hypothesis test would you use to determine the average improvement from test one to test two? Why did you
choose this type of test?
3) Give a 90% confidence interval for the improvement of Mrs. Brown’s students from test one to test two. Assume that, over the
years, the standard deviation of improvement from the first Calculus test to the second Calculus test is 8 points.
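The arithmetic behind the 90% interval can be checked with a short Python sketch. It assumes the scores pair up by student in the order listed (student 1 scored 54, then 62), treats each student's improvement as a paired difference, and uses a z-interval because the standard deviation of improvement is given as 8 points. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean

# Scores in the order listed, paired by student number (assumed reading of the data).
first = [54, 63, 69, 61, 66, 69, 51, 37, 61, 47,
         53, 72, 59, 70, 58, 68, 51, 60, 93, 48]
second = [62, 80, 80, 72, 76, 78, 57, 39, 64, 47,
          65, 87, 62, 79, 59, 86, 66, 73, 99, 45]

# Paired differences: second test minus first test for each student.
diffs = [b - a for a, b in zip(first, second)]
mean_diff = mean(diffs)

sigma = 8                              # stated standard deviation of improvement
z_star = NormalDist().inv_cdf(0.95)    # critical value for 90% confidence
margin = z_star * sigma / sqrt(len(diffs))

ci = (mean_diff - margin, mean_diff + margin)
print(f"mean improvement = {mean_diff:.2f}, 90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is built on the differences rather than on the two sets of scores separately, because each student contributes one score to each test.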
Question 2
A random sample of 100 current couples at O’Connor HS was selected from the entire large population of current couples.
- Heights of male counterparts of the current couples are approximately normally distributed with mean 70 inches and
standard deviation 3 inches.
- Heights of female counterparts of the current couples are approximately normally distributed with mean 65 inches and
standard deviation 2.5 inches.
- There were 20 couples in which the female was taller than her male counterpart, and there were 80 in which the female
was shorter than her male counterpart.
a) Find a 95 percent confidence interval for the proportion of current couples in the population for which the female is taller than the
male. Interpret your interval in the context of this question.
b) Suppose that a male counterpart is selected at random and a female counterpart is selected at random. Find the approximate
probability that the female will be taller than the male.
c) Based on your answers to (a) and (b), are the heights of the females and their male counterparts independent? Explain your reasoning.
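A minimal Python sketch of the calculations in parts (a) and (b), offered as a check rather than a model answer. Part (a) uses the usual large-sample interval for a proportion. Part (b) assumes the two randomly selected heights are independent, so their difference is normal with the variances added.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()

# (a) One-proportion z-interval: 20 of the 100 sampled couples had a taller female.
n, taller = 100, 20
p_hat = taller / n
se = sqrt(p_hat * (1 - p_hat) / n)
z_star = z.inv_cdf(0.975)              # critical value for 95% confidence
ci_a = (p_hat - z_star * se, p_hat + z_star * se)

# (b) Independent random picks: F - M is normal with
#     mean 65 - 70 and variance 2.5^2 + 3^2.
diff = NormalDist(mu=65 - 70, sigma=sqrt(2.5**2 + 3**2))
p_female_taller = 1 - diff.cdf(0)      # P(F - M > 0)

print(f"(a) 95% CI: ({ci_a[0]:.4f}, {ci_a[1]:.4f})")
print(f"(b) P(female taller) = {p_female_taller:.4f}")
```

Comparing the probability from (b) with the interval from (a) is the starting point for part (c).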
Question 3
High cholesterol level in people can be reduced by exercise or by drug treatment. A pharmaceutical company has developed a new
cholesterol-reducing drug. Researchers would like to compare its effects to the effects of the cholesterol-reducing drug that is currently
available on the market. Volunteers who have a history of high cholesterol and who are currently not on medication will be recruited to
participate in a study.
a) Explain how you would carry out a completely randomized experiment for the study.
b) Describe an experimental design that would improve the design in (a) by incorporating blocking.
c)
Can the experimental design in (b) be carried out in a double blind manner? Explain.
Multiple Choice Practice 2:
134. Consider a data set of positive values, at least two of which are not equal. Which of the following sample statistics will be changed when each value
in this data set is multiplied by a constant whose absolute value is greater than 1?
I. The mean
II. The median
III. The standard deviation
(a) I only (b) II only (c) III only (d) I and II only (e) I, II, and III
135. Each person in a simple random sample of 2,000 received a survey, and 317 people returned their survey. How could non-response cause the results
of the survey to be biased?
(a) Those who did not respond reduced the sample size, and small samples have more bias than large samples.
(b) Those who did not respond caused a violation of the assumption of independence.
(c) Those who did not respond were indistinguishable from those who did not receive the survey.
(d) Those who did not respond represent a stratum, changing the simple random sample into a stratified random sample.
(e) Those who did respond may differ in some important way from those who did not respond.
136. In a certain game, a fair die is rolled and a player gains 20 points if the die shows a “6”. If the die does not show a “6”, the player loses 3 points. If
the die were to be rolled 100 times, what would be the expected total gain or loss for the player?
(a) A gain of about 1,700 points
(b) A gain of about 583 points
(c) A gain of about 83 points
(d) A loss of about 250 points
(e) A loss of about 300 points.
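The expected total in question 136 is the expected gain per roll multiplied by the number of rolls. A short Python sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

p_six = Fraction(1, 6)                      # fair die shows a "6"
per_roll = 20 * p_six - 3 * (1 - p_six)     # expected points gained on one roll
total = 100 * per_roll                      # expected total over 100 rolls

print(f"per roll: {per_roll}, over 100 rolls: {float(total):.2f}")
```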
137. The Attila Barbell Company makes bars for weight lifting. The weights of the bars are independent and normally distributed with a mean of 720
ounces (45 pounds) and a standard deviation of 4 ounces. The bars are shipped 10 in a box to the retailers. The weights of the empty boxes are
normally distributed with a mean of 320 ounces and a standard deviation of 8 ounces. The weights of the boxes filled with 10 bars are expected to be
normally distributed with a mean of 7,520 ounces and a standard deviation of ________?
(a) $\sqrt{12}$ ounces (b) $\sqrt{80}$ ounces (c) $\sqrt{224}$ ounces
(d) 48 ounces (e) $\sqrt{1,664}$ ounces
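For question 137, variances of independent weights add, while standard deviations do not. The sketch below assumes the weight of the empty box is independent of the bar weights, which the question implies but does not state outright.

```python
from math import sqrt

bar_mean, bar_sd = 720, 4      # ounces, per bar
box_mean, box_sd = 320, 8      # ounces, empty box
bars_per_box = 10

# Sum of 10 independent bars plus one independent box:
total_mean = bars_per_box * bar_mean + box_mean
total_var = bars_per_box * bar_sd**2 + box_sd**2   # variances add
total_sd = sqrt(total_var)

print(f"mean = {total_mean} oz, variance = {total_var}, sd = {total_sd:.2f} oz")
```

Note that the bar variance is multiplied by 10, not by 10 squared, because the box holds ten separate bars rather than one bar's weight scaled by ten.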
138. Exercise psychologists are investigating the relationship between lean body mass (in kilograms) and the resting metabolic rate (in calories per day)
in sedentary males. Based on the computer print out below, which of the following is the best interpretation of the value of the slope of the regression
line?
Predictor   Coef     St.Dev   T      P
Constant    264.0    276.9    0.95   0.363
Mass        22.563   6.360    3.55   0.005
S = 144.9   R-Sq = 55.7%   R-Sq(adj) = 51.3%
(a) For each additional kilogram of lean body mass, the resting metabolic rate increases on average by 144.9 calories per day.
(b) For each additional kilogram of lean body mass, the resting metabolic rate increases on average by 22.563 calories per day.
(c) For each additional kilogram of lean body mass, the resting metabolic rate increases on average by 264.0 calories per day.
(d) For each additional calorie per day for the resting metabolic rate, the lean body mass increases on average by 22.563 kilograms.
(e) For each additional calorie per day for the resting metabolic rate, the lean body mass increases on average by 264.0 kilograms.



139. Consider n pairs of numbers $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The mean and standard deviation of the x-values are $\bar{x} = 5$ and $s_x = 4$,
respectively. The mean and standard deviation of the y-values are $\bar{y} = 10$ and $s_y = 10$, respectively. Of the following, which could be the least squares
regression line?
(a) $\hat{y} = -5.0 + 3.0x$
(b) $\hat{y} = 3.0x$
(c) $\hat{y} = 5.0 + 2.5x$
(d) $\hat{y} = 8.5 + 0.3x$
(e) $\hat{y} = 10.0 + 0.4x$
140. An investigator was studying a territorial species of Central American termites, Nasutitermes corniger. Forty-nine termite pairs were randomly
selected; both members of each of these pairs were from the same colony. Fifty-five additional termite pairs were randomly selected; the two members in
each of these pairs were from different colonies. The pairs were placed in Petri dishes and observed to see whether they exhibited aggressive behavior.
The results are shown in the table below:
                     Aggressive   Nonaggressive   Total
Same Colony          40 (33.5)    9 (15.5)        49
Different Colonies   31 (37.5)    24 (17.5)       55
Total                71           33              104
A chi-square test for homogeneity was conducted, resulting in $\chi^2 = 7.638$. The expected counts are shown in parentheses in the table. Which of the
following sets of statements follows from these results?
(a) $\chi^2$ is not significant at the 0.05 level.
(b) $\chi^2$ is significant, 0.01 < p < 0.05; the counts in the table suggest that termite pairs from the same colony are less likely to be aggressive than
the termite pairs from different colonies.
(c) $\chi^2$ is significant, 0.01 < p < 0.05; the counts in the table suggest that termite pairs from different colonies are less likely to be aggressive
than the termite pairs from the same colony.
(d) $\chi^2$ is significant, p < 0.01; the counts in the table suggest that termite pairs from the same colony are less likely to be aggressive than the
termite pairs from different colonies.
(e) $\chi^2$ is significant, p < 0.01; the counts in the table suggest that termite pairs from different colonies are less likely to be aggressive
than the termite pairs from the same colony.
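The expected counts and the chi-square statistic quoted in question 140 can be reproduced from the observed counts alone. The sketch below uses only the standard library. For 1 degree of freedom the upper-tail probability of a chi-square value x equals erfc(sqrt(x / 2)), which avoids needing a statistics package. Variable names are illustrative.

```python
from math import erfc, sqrt

# Observed counts: rows = same colony, different colonies;
# columns = aggressive, nonaggressive.
observed = [[40, 9],
            [31, 24]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# A 2 x 2 table has (2 - 1)(2 - 1) = 1 degree of freedom.
p_value = erfc(sqrt(chi2 / 2))

print([[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}")
```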
141. The mayor of a large city will run for governor if he believes that more than 30 percent of the voters in the state already support him. He will
have a survey firm ask a random sample of “n” voters whether or not they support him. He will use a large sample test for proportions to test the null
hypothesis that the proportion of all voters who support him is 30 percent or less against the alternative that the percentage is higher than 30 percent.
Suppose that 35% of all voters in the state actually support him. In which of the following situations would the power for this test be highest?
(a) The mayor uses a significance level of 0.01 and n = 250 voters.
(b) The mayor uses a significance level of 0.01 and n = 500 voters.
(c) The mayor uses a significance level of 0.01 and n = 1,000 voters.
(d) The mayor uses a significance level of 0.05 and n = 500 voters.
(e) The mayor uses a significance level of 0.05 and n = 1,000 voters.
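To see how significance level and sample size affect power in question 141, the sketch below computes approximate power for each option using the normal approximation to the sampling distribution of the sample proportion. The function name `power` and its defaults are hypothetical, chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def power(alpha, n, p0=0.30, p_true=0.35):
    """Approximate power of the one-sided large-sample test of H0: p <= p0."""
    z = NormalDist()
    # Reject H0 when the sample proportion exceeds this cutoff.
    cutoff = p0 + z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Probability of exceeding the cutoff when the true proportion is p_true.
    return 1 - z.cdf((cutoff - p_true) / sqrt(p_true * (1 - p_true) / n))

scenarios = {
    "a": (0.01, 250),
    "b": (0.01, 500),
    "c": (0.01, 1000),
    "d": (0.05, 500),
    "e": (0.05, 1000),
}
powers = {label: power(alpha, n) for label, (alpha, n) in scenarios.items()}
best = max(powers, key=powers.get)

for label, value in powers.items():
    print(f"({label}) power = {value:.3f}")
```

Power rises with a larger sample and with a larger significance level, so the option combining both has the highest value.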
142. George and Michelle each claimed to have the better recipe for chocolate chip cookies. They decided to conduct a study to determine whose cookies
were really better. They each baked a batch of cookies using their own recipe. George asked a random sample of his friends to taste his cookies and
to complete a questionnaire on their quality. Michelle asked a random sample of her friends to complete the same questionnaire for her cookies. They
then compared the results. Which of the following statements about this study is false?
(a) Because George and Michelle have a different population of friends, their sampling procedure makes it difficult to compare recipes.
(b) Because George and Michelle each used only their own respective recipes, their cooking ability is confounded with the recipe quality.
(c) Because George and Michelle each used only the ovens in their houses, the recipe quality is confounded with the characteristic of the oven.
(d) Because George and Michelle used the same questionnaire, the results will generalize to the combined population of their friends.
(e) Because George and Michelle each baked one batch, there is no replication of the cookie recipes.
143. A plant researcher identifies a sample of plots of land to examine the effect of a new fertilizer on the speed of growth of a particular crop. She treats
half the plots of crops with the new fertilizer and the other half with the traditional brand. She observes that crop growth in both sets of plots is
almost identical. The data collection strategy she used is a (an):
a) experiment
b) anecdotal study
c) observational study
d) survey analysis
e) quasi-study
144. Many statisticians say that the U.S. Census is significantly less accurate than a count estimated by random sampling. Why might a count estimated from
random samples be more accurate than a census?
a) Random samples are scientific whereas censuses are not.
b) A census is an old method written into law before anyone knew anything about statistics.
c) A census often can’t find every population member, so some groups (such as homeless) are often under-represented
d) A census is a haphazard sample, and census takers may be bribed by households.
e) None of the above
145. Double-blind is best described as:
a) the use of chance to divide experimental units into groups.
b) a design in which neither the experimenter nor the subject knows who is in the treatment group and who is in the control group
c) a group of subjects that are similar in some way known to affect the response to the treatment
d) the policy of repeating the experiment on different subjects to reduce chance variation and to determine the generalizability of the findings
e) the tendency of subjects to respond favorably to any treatment
146. A primary purpose of blocking is:
a) to minimize the placebo effect
b) to avoid having to use complicated double-blind procedure
c) to eliminate bias between treatment groups
d) to isolate the separate effects of the treatment and another important variable
e) to assist in developing a matched-pairs procedure
147. An experiment was conducted in which a researcher first administered a survey on depression and self-esteem to 100 individuals and then taught
them some proper techniques of aerobic exercise. The 100 individuals were then sent off to exercise at least one hour a day. After two months, the
depression and self-esteem survey was administered again and showed that depression symptoms had declined and self-esteem increased. You’re
skeptical about the results. You believe there are two problems. One, you really don’t know the effects of the treatment without a control group, and
two, you suspect there’s interviewer bias. Which type of experimental design might best reduce these problems?
a) matched pairs design, in which subjects are matched on gender and age before starting the exercise routine
b) randomized block design, in which the interviewers or researchers are randomly assigned to teach different groups of subjects
c) completely randomized design with two treatment conditions: “new exercise” for one hour a day vs. “no exercise” for ½ hour a day
d) completely randomized design comparing those who exercise vs. those who don’t exercise, and a blind procedure
e) completely randomized design comparing those who use the new exercise vs. those who don’t use the new exercise
148. An important advantage of using a randomized block design in an experiment is:
a) it controls for the effects of factors that may confound your results
b) eliminating all possible lurking variables that may confound the effect of the treatment
c) reducing bias associated with using multiple interviewers
d) collecting information from the subjects before and after the administration of a treatment
e) All of the above
149. Which of the following is a key distinction between well designed experiments and observational studies?
a) more subjects are available for experiments than for observational studies
b) ethical constraints prevent large-scale observational studies
c) experiments are less costly to conduct than observational studies
d) an experiment can show direct cause-and-effect relationship, whereas observational studies cannot
e) tests of significance cannot be used on the data collected from an observational study
150. A manufacturer of balloons claims that p, the proportion of its balloons that burst when inflated to a diameter of up to 12 inches, is no more than
0.05. Some customers have complained that the balloons are bursting more frequently. If the customers want to conduct an experiment to test the
manufacturer’s claim, which of the following hypotheses would be appropriate?
a) $H_0$: p [?] 0.05, $H_a$: p [?] 0.05
b) $H_0$: p [?] 0.05, $H_a$: p [?] 0.05
c) $H_0$: p [?] 0.05, $H_a$: p [?] 0.05
d) $H_0$: p [?] 0.05, $H_a$: p [?] 0.05
e) $H_0$: p [?] 0.05, $H_a$: p [?] 0.05
151. Lauren is enrolled in a very large college calculus class. On the first exam, the class average was 75 and the standard deviation was 10. On the
second exam, the class mean was 70 and the standard deviation was 15. Lauren scored 85 on both exams. Assuming the scores on each exam were
normally distributed, on which exam did Lauren score better relative to the rest of the class?
a) she scored much better on the first exam
b) she scored much better on the second exam
c) she scored equally well on both exams
d) it is impossible to tell because the class size is not given
e) it is impossible to tell because correlation between the two sets of exams is not given
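Question 151 compares the same raw score against two different distributions, which is what a z-score is for. A minimal Python sketch (the helper name `z_score` is illustrative):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd

z_first = z_score(85, mean=75, sd=10)
z_second = z_score(85, mean=70, sd=15)

print(f"first exam z = {z_first}, second exam z = {z_second}")
```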
152. Suppose that 30% of the subscribers to a cable television service watch the shopping channel at least once a week. You are to design a simulation to
estimate the probability that none of the five randomly selected subscribers watched the shopping channel at least once a week. Which of the
following assignments of the digits 0 through 9 would be appropriate for modeling an individual subscriber’s behavior in this simulation?
a) assign “0,1,2” as watching the channel and “3-9” as not watching
b) assign “0,1,2,3” as watching the channel and “4-9” as not watching
c) assign “1,2,3,4,5” as watching the channel and “6-9 and 0” as not watching
d) assign “3” as watching the channel and the rest as not watching
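A simulation of the kind described in question 152 can be sketched in Python. This version assumes three of the ten digits (here 0, 1 and 2) stand for "watches", so each simulated subscriber watches with probability 0.3. The seed, trial count and function name are arbitrary choices for the illustration.

```python
import random

def none_watch(rng, watching_digits=frozenset({0, 1, 2}), group_size=5):
    """Draw one random digit per subscriber; True if nobody in the group watches."""
    return all(rng.randrange(10) not in watching_digits
               for _ in range(group_size))

rng = random.Random(2024)    # fixed seed so the run is repeatable
trials = 100_000
estimate = sum(none_watch(rng) for _ in range(trials)) / trials

exact = 0.7 ** 5             # probability that all five do not watch
print(f"simulated = {estimate:.4f}, exact = {exact:.4f}")
```

The simulated proportion should land close to the exact value, which gives a quick check that the digit assignment models a 30% chance correctly.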
153. The number of sweatshirts a vendor sells daily has the following probability distribution.
# of sweatshirts, x    0     1     2     3     4      5
P(x)                   0.3   0.2   0.3   0.1   0.08   0.02
If each sweatshirt sells for $25, what is the expected total dollar amount taken in by the vendor from the sale of sweatshirts?
a) $5.00
b) $7.60
c) $35.50
d) $38.00
e) $75.00
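The expected revenue in question 153 is the price times the expected number of sweatshirts sold. The sketch below assumes the distribution covers x = 0 through 5 and that P(5) = 0.02; that last probability is inferred from the requirement that the probabilities sum to 1, not read directly from the listed values.

```python
price = 25
counts = [0, 1, 2, 3, 4, 5]
# P(5) = 0.02 is an inferred value (probabilities must sum to 1).
probs = [0.3, 0.2, 0.3, 0.1, 0.08, 0.02]

expected_count = sum(x * p for x, p in zip(counts, probs))
expected_revenue = price * expected_count

print(f"E(x) = {expected_count:.2f}, expected revenue = ${expected_revenue:.2f}")
```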
154. The correlation between two scores x and y equals 0.8. If both the x scores and the y scores are converted to z-scores, then the correlation between
the z-scores for x and z-scores for y would be:
a) -0.8
b) -0.2
c) 0.0
d) 0.2
e) 0.8
155. Suppose that the distribution of a set of scores has a mean of 47 and a standard deviation of 14. If 4 is added to each score, what will be the mean and the
standard deviation, respectively, of the distribution of new scores?
a) 51, 14
b) 51, 18
c) 47, 14
d) 47, 16
e) 47, 18
156. A test engineer wants to estimate the mean gas mileage $\mu$ (in miles per gallon) for a particular model of automobile. Eleven of these cars are
subjected to a road test, and the gas mileage is computed for each car. A dot-plot of the 11 gas-mileage values is roughly symmetric and has no
outliers. The mean and the standard deviation are 25.5 and 3.01 respectively. Assuming that these 11 cars can be considered a random sample of the
cars in this model, which of the following is a correct statement?
a) a 95% confidence interval for $\mu$ is $25.5 \pm 2.228(3.01/\sqrt{11})$
b) a 95% confidence interval for $\mu$ is $25.5 \pm 2.201(3.01/\sqrt{11})$
c) a 95% confidence interval for $\mu$ is $25.5 \pm 2.228(3.01/\sqrt{10})$
d) a 95% confidence interval for $\mu$ is $25.5 \pm 2.201(3.01/\sqrt{10})$
e) the results cannot be trusted; the sample is too small
157. A volunteer of a mayoral candidate’s campaign periodically conducts polls to estimate the proportion of the people in the city who are planning to
vote for this candidate in the upcoming election. Two weeks before the election, the volunteer plans to double the sample size in the polls. The main
purpose of this is to:
a) reduce nonresponse bias
b) reduce the effects of confounding variables
c) reduce bias due to the interviewer effect
d) decrease the variability in the population
e) decrease the standard deviation of the sampling distribution of the sample proportion
158. The lengths of individual shellfish in a population of 10,000 shellfish are approximately normally distributed with mean 10cm and standard
deviation 0.2cm. Which of the following is the shortest interval that contains approximately 4,000 shellfish?
a) 0cm to 9.949cm
b) 9.744cm to 10cm
c) 9.744cm to 10.256cm
d) 9.895cm to 10.105cm
e) 9.928cm to 10.080cm
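For question 158, each candidate interval can be checked by converting it to a normal probability and multiplying by the population size. A minimal Python sketch, with illustrative names and an arbitrary tolerance of 100 shellfish for "approximately 4,000":

```python
from statistics import NormalDist

lengths = NormalDist(mu=10, sigma=0.2)   # shellfish lengths in cm
population = 10_000

intervals = {
    "a": (0, 9.949),
    "b": (9.744, 10),
    "c": (9.744, 10.256),
    "d": (9.895, 10.105),
    "e": (9.928, 10.080),
}

# For each option: (interval width, expected number of shellfish inside it).
summary = {
    label: (hi - lo, population * (lengths.cdf(hi) - lengths.cdf(lo)))
    for label, (lo, hi) in intervals.items()
}

# Keep the options holding roughly 4,000 shellfish, then take the narrowest.
near_4000 = [label for label, (_, count) in summary.items()
             if abs(count - 4000) < 100]
shortest = min(near_4000, key=lambda label: summary[label][0])

for label, (width, count) in summary.items():
    print(f"({label}) width = {width:.3f} cm, about {count:.0f} shellfish")
```

Several intervals capture about 40% of the population, but the one centered on the mean does so over the shortest length, because the normal density is highest there.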
159. Based on the table below, if the null hypothesis of no association between level of education and employment
status is true, which of the following expressions gives the expected number who earned at least a high school diploma and
who are employed full time?
                                         Employed full time   Not employed full time
Earned at least a high school diploma    52                   40
Didn’t earn a high school diploma        30                   35
a) $\frac{92 \cdot 52}{157}$
b) $\frac{92 \cdot 82}{157}$
c) $\frac{82 \cdot 52}{92}$
d) $\frac{65 \cdot 52}{92}$
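Under the null hypothesis of no association, the expected count for a cell is (row total × column total) / grand total. The sketch below applies that to question 159, assuming the four observed counts are 52 and 40 for those who earned a diploma and 30 and 35 for those who did not. The dictionary keys are illustrative labels.

```python
observed = {
    "diploma":    {"full_time": 52, "not_full_time": 40},
    "no_diploma": {"full_time": 30, "not_full_time": 35},
}

row_total = sum(observed["diploma"].values())                 # diploma row
col_total = sum(row["full_time"] for row in observed.values())  # full-time column
grand_total = sum(sum(row.values()) for row in observed.values())

expected = row_total * col_total / grand_total
print(f"{row_total} * {col_total} / {grand_total} = {expected:.2f}")
```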
160. The manager of a factory wants to compare the mean number of units assembled per employee in a week for two new assembly
techniques. Two hundred employees from the factory are randomly selected and each is randomly assigned to one of the two
techniques. After teaching 100 employees one of the techniques and 100 employees the other technique, the manager records the
number of units each of the employees assembles in one week. Which of the following would be the most appropriate inferential
statistical test in this situation?
a) one-sample z-test
b) two-sample t-test
c) paired t-test
d) chi-square GOF
e) one-sample t-test
161. A random sample has been taken from a population. A statistician, using this sample, needs to decide whether to construct a 90%
confidence interval for the population mean or a 95% confidence interval. How will these intervals differ?
a) the 90% confidence interval will not be as wide as the 95% confidence interval
b) the 90% confidence interval will be wider than the 95% confidence interval
c) which interval is wider will depend on how large the sample is
d) which interval is wider will depend on whether the sample is unbiased
e) which interval is wider will depend on whether a z-statistic or a t-statistic is used
162. The boxplots shown above summarize two data sets, I and II. Based on the boxplots, which of the following statements about these two
data sets CANNOT be justified?
a) the range of data set I is equal to the range of data set II
b) the IQR of data set I is equal to the IQR of data set II
c) the median of data set I is less than the median of data set II
d) data set I and data set II have the same number of data points
e) about 75% of the values in data set II are greater than or equal to about 50% of the values in data set I
163. A high school statistics class wants to conduct a survey to estimate what percentage of students in the high school would be willing to
pay a fee for participating in after-school activities. 20 students are randomly selected from each grade level to complete the survey.
This plan is an example of which type of sampling?
a) cluster
b) convenience
c) simple random
d) stratified random
e) systematic
164. Jason wants to determine how age and gender are related to political party preference in his town. Voter registration lists are stratified
by gender and age-group. Jason selects a SRS of 50 men from the 20-29 age-group and records their age, gender, and party
registration. He also selects an independent SRS of 60 women from the 40-49 age-group and records the same information. Of the
following, which is the most important observation about Jason’s plan?
a) the plan is well conceived and should serve the intended purpose
b) his samples are too small
c) he should have used equal sample sizes
d) he should have randomly selected the 2 age groups
e) he will be unable to tell whether a difference in party affiliation is related to age or gender
165. A least squares regression line was fitted to the weights (in pounds) versus age (in months) of a group of many young children. The
equation of the line is $\hat{y} = 16.6 + 0.65t$, where y is the weight and t is the age. A 20-month-old child in this group has an actual weight of 25
pounds. Which of the following is the residual weight for this child?
a) -7.85
b) -4.6
c) 4.6
d) 5.00
e) 7.85
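A residual is the observed value minus the value predicted by the regression line. For question 165 this can be checked with a short Python sketch, assuming the fitted line is weight = 16.6 + 0.65 × age (the function name is illustrative):

```python
def predicted_weight(age_months):
    """Weight in pounds predicted by the fitted line, assumed to be 16.6 + 0.65 t."""
    return 16.6 + 0.65 * age_months

actual = 25
residual = actual - predicted_weight(20)   # observed minus predicted

print(f"predicted = {predicted_weight(20):.1f} lb, residual = {residual:.1f} lb")
```

A negative residual means the child weighs less than the line predicts for that age.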
166. Which of the following statements is (are) true about the t distribution with k degrees of freedom?
I. The t distribution is symmetric
II. The t distribution with k degrees of freedom has smaller variance than the t distribution with k+1 df
III. The t distribution has a larger variance than the standard normal z distribution
a) I only
b) II only
c) III only
d) I and II
e) I and III