Review 1 Answers
1.
D.
The mean of the sum of two random variables is the sum of the respective
means, or 45 + 15 = 60. For the standard deviation of the sum, assuming the two variables are
independent, the variance of the sum is the sum of the two variances. Therefore, the variance of
the sum would be 15^2 + 3^2 = 234. The standard deviation would be the square root of 234, which is
approximately 15.3.
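As a quick check of the arithmetic above, the short sketch below recomputes the mean and standard deviation of the sum. It assumes the two variables are independent, which the variance rule requires, and the pairing of each mean with a standard deviation is an assumption that does not affect the result. The variable names are illustrative.

```python
import math

# Values taken from the answer above; names and pairing are illustrative.
mean_x, sd_x = 45, 15
mean_y, sd_y = 15, 3

# Means always add, whether or not the variables are independent.
mean_sum = mean_x + mean_y

# Variances add only when the variables are independent (assumed here).
var_sum = sd_x**2 + sd_y**2
sd_sum = math.sqrt(var_sum)

print(mean_sum, var_sum, round(sd_sum, 1))  # 60 234 15.3
```

Note that the standard deviations themselves are never added directly; only the variances are.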
2.
D.
The boxplot for data set A is wider overall, indicating a wider range. Also,
the “box” portion of the boxplot for data set A is wider, indicating a wider interquartile range, or
IQR. However, boxplots do not indicate the sample sizes for the sets they are based upon.
3.
D.
Cluster sampling is a process whereby a portion of the population is selected, and
all values from that portion are taken as the sample. That is the case here, where a
single county is used, in its entirety, as the sample.
4.
E.
First, since the size of one of the data sets is 10, and the land sizes are likely
not normally distributed, the conditions for using the t-distribution are not met.
Second, since the entire populations are taken into the samples, the means would be population
means and have no variation. Third, since the two nations are contiguous, sharing borders, it can
also be argued that the land sizes of the states and provinces are not independent.
5.
B.
The main purpose of a stratified sample is to divide the population into groups,
called strata, so that each stratum has members represented in the sample. It is the simple
random sample that allows each member of the population to have an equal probability of being
selected. Both types of samples can be done with any sample size, and significance tests can be
done with both. Moreover, a stratified sample is often more difficult to collect.
6.
E.
Since each value in the set is divided by 2, the mean and standard deviation
would both be divided by 2. With the original mean being 170 and standard deviation being 50,
the new mean and standard deviation would be 85 and 25, respectively.
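The scaling rule above can be checked numerically. The sketch below uses a small hypothetical data set (not the one from the question), chosen only so that its mean is 170, and confirms that halving every value halves both the mean and the standard deviation.

```python
import math
import statistics

# Hypothetical data, chosen only to illustrate the scaling rule.
values = [100, 150, 170, 200, 230]
halved = [v / 2 for v in values]

# Ratio of new to old mean and of new to old (population) standard deviation.
mean_ratio = statistics.mean(halved) / statistics.mean(values)
sd_ratio = statistics.pstdev(halved) / statistics.pstdev(values)

print(statistics.mean(values), statistics.mean(halved))
print(math.isclose(mean_ratio, 0.5), math.isclose(sd_ratio, 0.5))
```

The same ratio would appear with the sample standard deviation (`statistics.stdev`), since multiplying every value by a constant scales either measure by the absolute value of that constant.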
7.
E.
First, since no maximum possible score was given, it is not known whether the
outlier was an extremely high value or an extremely low value. If the outlier is a low value, its
removal would cause the mean to increase, but the median could possibly stay the same, or go up
with the removal of the lowest value of the set. If the outlier is a high value, its removal would
cause the mean to decrease, but the median could possibly stay the same, or go down with the
removal of the highest value of the set. Therefore, the exact behavior of the mean and median
cannot be determined from the information given.
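To illustrate why the median's behavior is undetermined, the sketch below removes a low outlier from two hypothetical score sets (invented for this example, not taken from the question). In both sets the mean rises, but the median stays the same in one and goes up in the other.

```python
import statistics

def summarize(scores):
    """Return (mean, median) for a list of scores."""
    return statistics.mean(scores), statistics.median(scores)

# Hypothetical score sets, each sorted with a low outlier (10) first.
set_a = [10, 70, 75, 75, 80]
set_b = [10, 70, 74, 78, 80]

for scores in (set_a, set_b):
    before = summarize(scores)
    after = summarize(scores[1:])  # drop the low outlier
    print("before:", before, "after:", after)
```

A high outlier behaves symmetrically: the mean falls, and the median either stays put or goes down.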
8.
C.
A stratified sample involves dividing the population into a number of subsets
called strata, and taking a simple random sample from each stratum. In this case, the strata are
the counties of the state.
9.
A.
The most likely explanation for the difference between the poll results and the
actual election results is the fact that those who participated in the poll were only those who
chose to be included in the sample. This is the issue of voluntary sampling. Although the other
responses are all possible explanations for the difference in the results, they would be unlikely
to cause great differences. It is possible that the question was slanted, but the actual question is
not given and, therefore, this cannot be assumed. Lying in the responses is possible, but would
likely not account for great differences in the results. Also, respondents changing their minds
might account for some difference, but likely not a great difference in the results. Last, it is
likely, from the context, that the sample size was sufficient to produce publishable results.
10.
D.
It was originally noted that there was an inverse relationship between the rate of
involvement in volunteer activities and the rate of delinquency among this group of children.
Therefore, it was assumed that if children who had a high rate of delinquency were placed in
voluntary activities, their rate of delinquency would be reduced. However, this was based on the
assumption that there was a cause-and-effect relationship between volunteer activities and
delinquency. In reality, since the children placed in the volunteer activities did not have a
reduction in delinquency, it showed that volunteer activities were not the cause of low
delinquency, but merely that the high volunteer rate and low delinquency were only correlated,
being caused by a third factor, namely the type of children who would be involved in both.
Differences between boys and girls would not account for the difference, nor, necessarily, would
examining other factors. Observing other samples could possibly produce differing results but
may not explain the failure of placing children into volunteer activities as an attempt to lower
delinquency rates.
11.
B.
In a study to measure the effect of a drug, the effects are compared to a control
group, which measures the natural behavior of a condition without medication, in this case, that
of migraine headaches. In order to reduce the psychological bias of patients who know they are
receiving medication, the members of the control group are given a placebo, or fake pill. To
further reduce this bias, the study is a blind study, whereby the patients do not know whether the
pill they are given is the real medication or the placebo. Moreover, in a double-blind study,
neither the patient nor the doctor administering the pill knows whether the pill is the real
medication or the placebo. This is meant to eliminate any unintended clues the doctor may give
to the patient to indicate whether the medication is real or not.
12.
C.
With replacement, the possible samples of size 2 are {1, 1}, {1,3}, {1,4}, {1,8},
{3,1}, {3,3}, {3,4}, {3,8}, {4,1}, {4,3}, {4,4}, {4,8}, {8,1}, {8,3}, {8,4}, and {8,8} yielding
possible sample means 1, 2, 2.5, 4.5, 2, 3, 3.5, 5.5, 2.5, 3.5, 4, 6, 4.5, 5.5, 6, and 8. For these
possible samples, μ_x̄ = 4 and σ_x̄ ≈ 1.80.
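The enumeration above can be reproduced programmatically. The sketch below assumes the population is {1, 3, 4, 8}, as inferred from the listed samples, and computes the mean and (population) standard deviation of the 16 possible sample means.

```python
from itertools import product
import statistics

# Population inferred from the samples listed in the answer above.
population = [1, 3, 4, 8]

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16 of them.
samples = list(product(population, repeat=2))
sample_means = [sum(s) / 2 for s in samples]

# Mean and standard deviation of the sampling distribution of the mean.
mu_xbar = statistics.mean(sample_means)
sigma_xbar = statistics.pstdev(sample_means)

print(len(samples), mu_xbar, round(sigma_xbar, 2))  # 16 4.0 1.8
```

The result agrees with the shortcut of dividing the population standard deviation (the square root of 6.5) by the square root of the sample size, 2.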
13.
A.
Plan I is the superior plan, because both the initial selection of the deer, and the
second selection, are based on random samples from the population. Plan II lacks randomness,
because only the one square mile region is taken at random. This process would not take into
consideration factors, such as terrain, that may affect the distribution of the deer over the region,
so the deer would not be randomly chosen for the samples.
14.
E.
The percentile associated with a score represents the whole-number percentage of all
scores that lie below that score. Therefore, if a score is in the 90th percentile, this means that
90% of all scores are below that score.
15.
D.
The center box for Set A is wider than the center box for Set B, indicating a
greater interquartile range for Set A. Other statements concerning the exact mean, standard
deviation, set size, or presence of outliers cannot be justified, since the exact distribution of
actual data points cannot be determined from the boxplot alone.
16.
A.
The definition of “random sample” is that each possible sample of size n is equally
likely to be chosen as the sample. Therefore, if a sample is random, each member of
the population is equally likely to be included in the sample. A random sample does not need to
have any particular size, nor does the random sample guarantee that each stratum of the
population will be represented or that the values will follow any pattern.
17.
C.
The goal in this case was to best represent the typical student in the large class.
The typical student, representing the largest portion of the class, was a freshman. By
assigning values 1, 2, 3, and 4 to the students, the mode, the value which appears most often,
would indicate that freshmen make up the greatest portion of the class.
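A small sketch shows why the mode is the natural summary for coded categories. The class composition below is hypothetical, and the coding 1 = freshman through 4 = senior is an assumption made for illustration.

```python
import statistics

# Hypothetical class: 40 freshmen (1), 25 sophomores (2),
# 20 juniors (3), and 15 seniors (4). The coding is assumed.
codes = [1] * 40 + [2] * 25 + [3] * 20 + [4] * 15

mode_code = statistics.mode(codes)      # most frequent category
mean_code = statistics.mean(codes)      # 2.1, not an actual class level
median_code = statistics.median(codes)  # middle value of the sorted codes

print(mode_code, mean_code, median_code)
```

Here the mean is 2.1, which corresponds to no class level at all, while the mode correctly identifies the largest group.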
18.
E.
Standard deviation gives a measure of the amount of variation in the values in
either a sample or a population. In this case, the standard deviation would give a measure of the
amount of variation in the wattages in the sample of power strips. Low variation would indicate
that the values are, on the whole, close to the mean. Therefore, standard deviation would be the
best indicator of stability.
19.
B.
The only statement that is not true is the claim that the ranges are equal. The
range for Stockton is (38 - 17) = 21, and the range for Waffletown is (37 - 18) = 19. The other
statements are true. Waffletown has 5 players over 30, and Stockton has 4 players over 30. The
ages of the Stockton players, if compared in order, are equal to or lower than each corresponding
age of the Waffletown players, with the exception of only the oldest player. The median age for
Waffletown is 24.5, and the median age for Stockton is 23.5. Stockton has 9 players in their 20s,
and Waffletown has 7.
20.
C.
To produce a simple random sample, all values in the population must be equally
likely to be chosen into the sample. Therefore, a random process should be implemented that
selects values from the population, as one single, nonstratified group. Using random numbers to
select voters from a voter registration list would best produce a random sample from the entire
population. Selecting 25 voters from each of four counties would produce a stratified sample,
selecting every nth voter on a list is a systematic process that is not random, and selecting voters
from street corners eliminates a large number of voters from the possibility of selection.
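Selecting voters with random numbers can be sketched as follows. The registration list, its size, and the sample size of 100 are all hypothetical; the seed is fixed only so the example is reproducible.

```python
import random

# Hypothetical voter registration list of 2,000 names.
voter_list = [f"voter_{i:04d}" for i in range(1, 2001)]

# Seeded generator, used here only for reproducibility.
rng = random.Random(2024)

# random.sample draws without replacement, so every subset of 100
# voters is equally likely to be the sample.
sample = rng.sample(voter_list, k=100)

print(len(sample), len(set(sample)))  # 100 100
```

Because the draw is made from the whole list at once, no voter is excluded in advance and no grouping is imposed on the population.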
21.
C.
In order to simulate a process whereby 20% of students are absent, and 80% are
present, assign two of the random digits to represent “absent” and eight of the random digits to
represent “present.” Assigning 0 and 1 to represent “absent” and the remainder of the digits as
“present” will accomplish this.
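The digit assignment described above can be tried out in a short simulation. The function name, the number of simulated students, and the seed are illustrative choices, not part of the original question.

```python
import random

def simulate_class(n_students, rng):
    """Draw one random digit (0-9) per student; 0 and 1 mean absent."""
    digits = [rng.randint(0, 9) for _ in range(n_students)]
    return ["absent" if d in (0, 1) else "present" for d in digits]

# Seeded generator, used here only for reproducibility.
rng = random.Random(1)
outcomes = simulate_class(10_000, rng)

# With two of ten digits meaning "absent", the rate should be near 0.20.
absent_rate = outcomes.count("absent") / len(outcomes)
print(round(absent_rate, 3))
```

Any two digits would serve equally well as the "absent" pair, since each digit is equally likely to occur.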