Report of ___________ for Chapter 2 pretest
Exam: Chapter 2 pretest
Category: Organizing and Graphing Data
1. "For our study of driving habits, we recorded the speed of every fifth
vehicle on Drury Lane. Nearly every car traveled right at the speed limit
or a little over, but there were some that were 10 mph under, even fewer
at 20 mph under, and one car that crept by at just 15 mph. On the basis
of the central tendency calculation on our data, we drew conclusions
about all drivers on this stretch of road." The proper central tendency
value calculated from the data is the
You Answered
Correct Answer
population median;
sample median;
population mean (μ);
sample mean (X̄).
Explanation:
In this situation, the speed (ratio scale) of every fifth vehicle was recorded. The first
thing to note then is that the researchers have generated data from a systematic
random sample. We would also note that the shape of the distribution was negatively
skewed with a few very slow speeds and most at or slightly over the speed limit. In a
negatively skewed distribution, the median is not sensitive to outlying data and thus
would present a clearer summary than the mean of driver performance on the targeted
stretch of road. Since the data represent a sample, the sample median would be the
correct response to this question.
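As a rough illustration of why the median is the better summary here, the sketch below uses made-up speeds. The `speeds` list and the implied 35 mph limit are assumptions for illustration only, not data from the study.

```python
from statistics import mean, median

# Hypothetical sample: most cars at or just over an assumed 35 mph limit,
# plus a few much slower vehicles forming the low tail.
speeds = [35, 36, 35, 37, 36, 35, 38, 36, 25, 25, 15]

sample_mean = mean(speeds)
sample_median = median(speeds)

# The slow vehicles pull the mean down; the median stays at the typical speed.
print(f"sample mean   = {sample_mean:.1f}")
print(f"sample median = {sample_median}")
```

With a low tail like this one, the mean falls below the median, which is the pattern expected in a negatively skewed sample.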
Help: none
Confident: 36/100
Central tendency measures: statistics - parameters: 67
Mean, Median, and Mode in frequency distributions: 66
Time of problem: 0 minutes 19.858 seconds
2. Which of the following is not an accurate statement of one of the
conventions your text used in establishing class intervals?
Use not fewer than 10 or more than 20 class intervals;
Class intervals should start with an odd number or a multiple of 10;
The largest scores should be at the top of the list of class intervals;
You Answered
Correct Answer
All of the above are correct statements.
Explanation:
All are correct answers. There are 4 general criteria or guidelines for establishing class
intervals.
1. The number of intervals should be between 10 and 20.
2. The class interval should be convenient size. For example,
• Use an odd number such as 3, 5, or 7 for the interval width to make it easy
to compute the midpoint of the interval.
• Use widths of 10, 20, 50, or 100 where appropriate to make it easy for the
reader to "understand" the intervals. When displaying test data based on
a 100 point scale, 10-point intervals like 50-59, 60-69, 70-79, 80-89, and
90-99 make sense to most folks because they correspond in general to
letter grade distributions.
3. Begin each class interval with a multiple of i.
4. The largest scores should go at the top of the distribution.
Help: Text/Notes
Confident: 73/100
Frequency Distributions: 67
Time of problem: 0 minutes 54.469 seconds
3. A frequency distribution with a mean of 100 and a median of 90 is
You Answered
Correct Answer
positively skewed;
negatively skewed;
neither positive nor negative;
cannot be determined from the information given.
Explanation:
Positively skewed is the correct answer. In a symmetrical, unimodal distribution, the
mode=mean=median. If I know that the mean is greater than the median, I know that
there must be some high scores in the distribution that are pulling the mean away from
the center of the distribution. This is a situation where outliers are influencing how the
distribution is shaped. When a few high numbers pull the mean away from the center of
the distribution, the tail of the distribution pointing to the high numbers is stretched in a
positive direction, hence, the distribution is positively skewed.
Help: Text/Notes
Confident: 69/100
Mean, Median, and Mode (Define): 83
Mean, Median, and Mode in frequency distributions: 66
Time of problem: 0 minutes 6.459 seconds
4. Which of the following words could legitimately fit into this sentence:
"That simple frequency distribution has two __________, 13 and 18."
means;
medians;
You Answered
Correct Answer
modes;
all of the above.
Explanation:
Modes is the correct answer. A distribution can have only one mean and only one
median. Means and medians are computed values whereas the mode reflects the most
frequently occurring value(s). In a simple frequency distribution (where scores are
presented in an ascending or descending order accompanied by the number of
occurrences for each score), the mode is the most frequently appearing score or scores in
the distribution. In a group frequency distribution, the mode is the midpoint of the
interval that contains the most scores. Group frequency distributions can also be multimodal if more than one interval has the same high frequency count.
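A minimal sketch of finding more than one mode, assuming Python 3.8 or later for `statistics.multimode`. The `scores` list is hypothetical, chosen only so that 13 and 18 tie for the highest frequency.

```python
from statistics import multimode

# Hypothetical scores in which 13 and 18 each occur three times.
scores = [10, 13, 13, 13, 15, 16, 18, 18, 18, 20]

# multimode returns every value tied for the highest count.
modes = multimode(scores)
print(modes)
```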
Help: Text/Notes
Confident: 85/100
Mean, Median, and Mode (Define): 83
Mean, Median, and Mode in frequency distributions: 66
Time of problem: 0 minutes 6.77 seconds
5. The U.S. Department of Agriculture reported the total number of bushels
harvested of corn, soy beans, wheat, rice, and oats. This is a frequency
distribution of a
You Answered
Correct Answer
nominal variable;
ordinal variable;
interval variable;
ratio variable.
Explanation:
Nominal variable is the correct response. The data in this question are nominal; that is,
categories are set up according to names like corn, wheat, soy beans, etc. There is no
order to the variables, just simple categories and frequencies indicating the total number
of bushels harvested in each category. A bar graph would be used to visually present
this information. Histograms, frequency polygons, and line graphs are used when the
underlying scale of measurement is continuous (interval or ratio). In this case, the
underlying scale of measurement is nominal, so we want a presentation device that
separates the different categories and that vehicle is the bar graph. The bar graph
allows the reader to compare categories without regard to order.
Help: Text/Notes
Confident: 100/100
Frequency graphs: 59
Time of problem: 0 minutes 8.943 seconds
6. Which of the following is not used to present a frequency distribution?
a bar graph;
You Answered
Correct Answer
a scatterplot;
a line graph;
a histogram.
Explanation:
Scatterplot is the correct response. A scatterplot shows the relationship between two
quantitative variables, both measured on either an interval or ratio scale. The bar
graph, histogram, and frequency polygon all indicate frequency values on the Y-axis
and measurement on one variable represented on the X-axis. In a scatterplot, a point
represents the value on the Y variable that goes with the corresponding value on the
X variable. In a scatterplot, measurement for both the X and Y variables requires
underlying scales of measurement that are continuous (interval or ratio).
Help: Text/Notes
Confident: 90/100
Frequency graphs: 59
Time of problem: 0 minutes 7.441 seconds
7. To present a frequency distribution of nominal data you should use
a polygon;
a histogram;
a line graph;
You Answered
Correct Answer
a bar graph.
Explanation:
Bar graph is the correct response. With nominal data, the underlying scale of
measurement is categorical, reflecting no specific order or equal distances between
categories. Of the choices, only the bar graph makes sense for categorical data. The
bars in a bar graph do not touch, emphasizing the separateness of the categories.
Help: Text/Notes
Confident: 100/100
Frequency graphs: 59
Time of problem: 0 minutes 8.292 seconds
8. In a set of scores that ranged from 11 to 50, an acceptable lowest class
interval would be
You Answered
11 - 13;
11 - 14;
9 - 12;
Correct Answer
9 - 11.
Explanation:
9-11 is the correct answer. Remember:
a. the number of intervals should be between 10 and 20;
b. the size for the class interval should be convenient; and
c. each interval should begin with a value that is a multiple of the interval.
Given the guidelines above, the range of values is 40 [(50-11)+1]. Dividing 40 by 10
(the recommended number of intervals) yields an interval width of 4. But, this interval
width is not convenient because of midpoint problems. A better choice would be an
interval width of 3 which would give us more than 10 intervals and a convenient width to
work with when computing midpoints. This eliminates choices b and c. Given an
interval width of 3, only the interval 9 to 11 begins with a multiple of 3.
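The guideline arithmetic above can be sketched as follows. The helper names `lowest_class_interval` and `count_intervals` are hypothetical, introduced only for this illustration.

```python
def lowest_class_interval(min_score, width):
    """Return (lower, upper) of the lowest interval, starting at a multiple of width."""
    lower = (min_score // width) * width  # largest multiple of width <= min_score
    return lower, lower + width - 1


def count_intervals(min_score, max_score, width):
    """Number of intervals of this width needed to cover min_score..max_score."""
    lower, _ = lowest_class_interval(min_score, width)
    return (max_score - lower) // width + 1


print(lowest_class_interval(11, 3))  # (9, 11)
print(count_intervals(11, 50, 3))    # 14
```

A width of 3 gives 14 intervals, which falls inside the recommended 10 to 20, and the lowest interval starts at 9, a multiple of 3.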
Help: Text/Notes
Confident: 26/100
Frequency Distributions: 67
Time of problem: 0 minutes 10.985 seconds
9. In which situation would the mean be an appropriate measure of central
tendency?
Correct Answer
Most of the scores are near the minimum, a few are in the middle range, and
there are almost none near the maximum;
We have frequency data on cows, horses, mules, and goats;
You Answered
The data categories in the soil analysis are: 0-2 ppm, 3-5 ppm, 6-8 ppm, 9-11 ppm,
and over 11 ppm;
A few scores are at the minimum of the range, some scores are in the
middle range, most scores are near the maximum;
None of the above;
The mean would be the appropriate measure of center for a, c, and d.
Explanation:
If I'm looking for a single score to communicate how a group of individuals has
performed, the mean is the best measure of central tendency when the measurement
scale is continuous (interval or ratio) and the distribution of scores is reasonably
symmetric. Because the mean is mathematically derived and represents ΣX ÷ N,
outliers in the data set can artificially raise or lower the mean by increasing or
decreasing ΣX. Thus, in distributions that are skewed, the outlying data could render
the mean too high or too low and make the median a more representative and
informative measure of central tendency. When the distribution is symmetric (most of
the scores are near the minimum, a few are in the middle range, and there are almost
none near the maximum) and the measurement scale is continuous, the mean is the
most appropriate and most commonly used measure of central tendency.
Stems b and c feature nominal data (mode only) and stem d describes a negatively
skewed distribution in which case the median would be a better measure of central
tendency than the mean.
Help: Text/Notes
Confident: 68/100
Central tendency measures: statistics - parameters: 67
Mean, Median, and Mode (Define): 83
Time of problem: 0 minutes 4.797 seconds
10. Following are final examination scores for 40 students in a basic
statistics class. These scores were randomly selected from the records
of all students who have taken the course over the past 10 years and
have taken the standardized final examination.
58 86 70 80 82
88 60 80 72 75
89 61 72 76 80
63 73 82 81 89
75 65 82 86 90
75 63 65 84 82
76 68 82 91 94
68 74 79 84 96
a. Create a simple frequency distribution.
b. Create grouped frequency distributions with interval widths of 3 and
5. Include columns for class intervals, exact limits, midpoints, f, cf, %,
and c% in your tables.
c. Draw histograms, keeping the scale on the Y axis the same for
each of the grouped frequency distributions.
d. Draw a boxplot for this data using the five-number summary [Xmin,
Xmax, Q3, Q1, and the Median]. Note. You may eliminate this
question as we have yet to cover boxplots.
e. Comparing the two histograms, which do you think best represents
the distribution and why?
f. Using the group frequency distribution (i = 5), compute the
percentiles for scores of 63 and 81.
g. Using the group frequency distribution (i = 5), what score
corresponds to the 15th percentile?
h. Compute the mean and standard deviation for this set of scores using
both the deviation and raw score methods. Should you generate
population parameters or sample statistics? Why? Note. Compute
the mean but leave the standard deviation for later; we haven't
covered it yet.
i. From the grouped frequency distribution (i = 5), what are the values
for the median and the mode?
j. Given what you know about distributions and measures of center,
what can you conclude about this data set?
Your Answer:
placeholder
Explanation:
The full explanation for this question, along with tables, histograms, boxplot, summary
statistics, and computations can be viewed and printed by clicking on the following link:
http://www.coe.tamu.edu/~epsy439/prob11-prechp3/prob11.htm
a. The simple frequency distribution can be viewed on the linked page. In
ascending order the data are:
58,60,61,63,63,65,65,68,68,70,72,72,73,74,75,75,75,76,76,79
80,80,80,81,82,82,82,82,82,84,84,86,86,88,89,89,90,91,94,96
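One way to tally the simple frequency distribution is sketched below using `collections.Counter`. The scores are those listed in the question; the printed layout is an assumption, not the format used on the linked page.

```python
from collections import Counter

scores = [
    58, 86, 70, 80, 82, 88, 60, 80, 72, 75,
    89, 61, 72, 76, 80, 63, 73, 82, 81, 89,
    75, 65, 82, 86, 90, 75, 63, 65, 84, 82,
    76, 68, 82, 91, 94, 68, 74, 79, 84, 96,
]

freq = Counter(scores)

# Largest scores at the top, following the convention used for class intervals.
print(" X   f")
for x in sorted(freq, reverse=True):
    print(f"{x:2d}  {freq[x]:2d}")
```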
b.
Interval width of 3:
Class Interval  Exact Limits  Midpoint   f  cf     %     c%
96-98           95.5-98.5     97         1  40   2.5  100.0
93-95           92.5-95.5     94         1  39   2.5   97.5
90-92           89.5-92.5     91         2  38   5.0   95.0
87-89           86.5-89.5     88         3  36   7.5   90.0
84-86           83.5-86.5     85         4  33  10.0   82.5
81-83           80.5-83.5     82         6  29  15.0   72.5
78-80           77.5-80.5     79         4  23  10.0   57.5
75-77           74.5-77.5     76         5  19  12.5   47.5
72-74           71.5-74.5     73         4  14  10.0   35.0
69-71           68.5-71.5     70         1  10   2.5   25.0
66-68           65.5-68.5     67         2   9   5.0   22.5
63-65           62.5-65.5     64         4   7  10.0   17.5
60-62           59.5-62.5     61         2   3   5.0    7.5
57-59           56.5-59.5     58         1   1   2.5    2.5
Interval width of 5:
Class Interval  Exact Limits  Midpoint   f  cf     %     c%
95-99           94.5-99.5     97         1  40   2.5  100.0
90-94           89.5-94.5     92         3  39   7.5   97.5
85-89           84.5-89.5     87         5  36  12.5   90.0
80-84           79.5-84.5     82        11  31  27.5   77.5
75-79           74.5-79.5     77         6  20  15.0   50.0
70-74           69.5-74.5     72         5  14  12.5   35.0
65-69           64.5-69.5     67         4   9  10.0   22.5
60-64           59.5-64.5     62         4   5  10.0   12.5
55-59           54.5-59.5     57         1   1   2.5    2.5
c.
d. See linked page.
e. See linked page.
f. In this distribution, scores range from 58 to 96, a distance of 39 score points.
Generally, you are looking to create between 10 and 20 intervals. The interval
width should be odd and it should make sense in terms of the distribution. The
lower limit of the first interval should be a multiple of the interval width and
include the lowest value in the distribution. Divide 39 by 10 and you get 3.9. The
closest whole number value to 3.9 is 4.0, but this value is an even number which
makes computation of the midpoint somewhat difficult. This leads us to
considering interval widths of 3 and 5. A width of 3 yields 14 class intervals
starting at 57-59 and ending at 96-98. A width of 5 yields 9 class intervals
starting at 55-59 and ending at 95-99. Look over the histograms. With an interval
width of 3, the distribution looks too flat, too sparse. There are not enough cases
to adequately populate the 14 class intervals. An interval width of 5 on the other
hand, leads to a much more cogent picture of the distribution, a slight negative
skew, with bunching in the middle of the distribution. Moreover, an interval width
of 5 makes sense, in that test scores naturally break at points of 90, 80, 70, 60,
etc.
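The interval counts discussed above can be checked with a short sketch. The function `grouped_frequencies` is a hypothetical helper written for this illustration; it starts the lowest interval at a multiple of the width, as the guidelines recommend.

```python
scores = [
    58, 86, 70, 80, 82, 88, 60, 80, 72, 75,
    89, 61, 72, 76, 80, 63, 73, 82, 81, 89,
    75, 65, 82, 86, 90, 75, 63, 65, 84, 82,
    76, 68, 82, 91, 94, 68, 74, 79, 84, 96,
]


def grouped_frequencies(data, width):
    """Return rows (lower, upper, midpoint, f, cf, pct, cpct), lowest interval first."""
    n = len(data)
    lower = (min(data) // width) * width  # start at a multiple of the width
    rows, cf = [], 0
    while lower <= max(data):
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)
        cf += f
        rows.append((lower, upper, (lower + upper) / 2, f, cf,
                     100 * f / n, 100 * cf / n))
        lower += width
    return rows


for width in (3, 5):
    rows = grouped_frequencies(scores, width)
    print(f"i = {width}: {len(rows)} class intervals")
    for lo, hi, mid, f, cf, pct, cpct in reversed(rows):  # largest scores on top
        print(f"{lo}-{hi}  mid={mid:g}  f={f}  cf={cf}  %={pct:.1f}  c%={cpct:.1f}")
```

Comparing the two printed tables shows how thinly 40 scores spread across 14 intervals versus 9.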
g. Percentiles:
h. Percentile Rank:
i. See linked page. The data is a random sample and will be used to describe the
general characteristics of the population, all students who have taken the
introductory statistics class over the past 10 years. Therefore, you should be
computing sample statistics and using n-1 in the denominator of the standard
deviation formula.
j.
In the grouped frequency distribution, the interval from 80 to 84 has the highest
frequency (11), so the mode is the midpoint of that interval, 82.
The median of the distribution is located using the interval from 80-84 (79.5 to 84.5).
The c% associated with the lower limit is 50.0 and with the upper limit is 77.5.
27.5% (11 ÷ 40) of the cases are in this interval. To compute the median, subtract
the c% at the lower limit (50.0) from 50 to get 0. Divide 0 by 27.5 to get 0. Multiply 0 times the
interval width (5) to get 0 and add it to 79.5 to get 79.5, the median.
• Slight negative skew in the exam score distribution
• Scores bunched in the 80-85 area
• Mean=77.4; Median=79.5
• 25% of the students scored at or below 70 and 25% scored at or above
84.
• The score distribution might be interpreted to suggest that students were
generally well prepared for the exam. The slight negative skew bunched
in the 80% to 85% area indicates that the exam was challenging but fair.
A shift in the distribution up would maybe indicate that the exam was too
easy and a shift down, too hard. Overall, good scores were within the
reach of most students in the class.
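The interpolation used for the grouped median can be sketched as a general percentile lookup. The function name `grouped_percentile_score` is hypothetical, and the frequencies are taken from the i = 5 table; treat this as one reasonable reading of the method rather than the text's exact procedure.

```python
def grouped_percentile_score(intervals, p, width):
    """Score at the p-th percentile by linear interpolation within a class interval.

    intervals: list of (lower_apparent_limit, f), lowest interval first.
    """
    n = sum(f for _, f in intervals)
    target = p / 100 * n  # number of cases that must fall below the score
    cf_below = 0
    for lower, f in intervals:
        if f > 0 and cf_below + f >= target:
            exact_lower = lower - 0.5  # exact lower limit of the interval
            return exact_lower + (target - cf_below) / f * width
        cf_below += f
    raise ValueError("p must be between 0 and 100")


# Frequencies from the i = 5 table, lowest interval first.
table_i5 = [(55, 1), (60, 4), (65, 4), (70, 5), (75, 6),
            (80, 11), (85, 5), (90, 3), (95, 1)]

median = grouped_percentile_score(table_i5, 50, 5)

# Mode of a grouped distribution: midpoint of the interval with the largest f.
modal_lower, _ = max(table_i5, key=lambda row: row[1])
mode = modal_lower + (5 - 1) / 2

print(median, mode)
```

Calling the same function with p = 15 gives the score at the 15th percentile asked for in part g.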
Help: Text/Notes
Confident: 30/100
Central tendency measures: statistics - parameters: 67
Frequency Distributions: 67
Frequency graphs: 59
Graphs of distributions: 100
Mean, Median, and Mode (Define): 83
Mean, Median, and Mode in frequency distributions: 66
Skewness: 88
Time of problem: 0 minutes 12.688 seconds
11. Your text noted which of the following as a characteristic of the mean?
Correct Answer
The sum of the results of squaring the difference between each score and
the mean is a minimum;
The sum of the results of squaring the difference between each score and
the mean is zero;
You Answered
Both a and b;
Neither a nor b.
Explanation:
There are two important properties of the mean:
1. the mean is a balance point in a distribution; therefore, the sum of the deviations
about the mean, Σ(X − X̄), equals 0; and
2. if I square each of the deviation scores (to get rid of negative values), the sum of
the squared deviation scores, Σ(X − X̄)², is a minimum.
The properties of the mean can be illustrated by the numbers 1, 3, and 5. The mean of
these numbers is 3. If I create deviation scores for each value of X, I get (1-3) or -2; (3-3)
or 0; and (5-3) or +2. Summing these deviation scores, I get ((-2) + 0 + (+2)) = 0 (1st
property of the mean). If I square and sum each deviation, I get (-2)² + (0)² + (+2)² =
4+0+4=8. If I choose some other number for the mean, other than 3, say for instance 5,
squaring and summing the deviations yields (1-5)² + (3-5)² + (5-5)² = 16 + 4 + 0 = 20.
Note that 8 is the lowest value that can be obtained when summing the squared
deviations for the data values 1, 3, and 5. Try any other value. This is the second
property of the mean.
The second property of the mean is expressed in stem a and represents the correct
answer to this question. Stem b cannot be correct because squaring the deviations and
summing the values will yield 0 only when all scores in the distribution are the same.
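Both properties can be checked numerically for the values 1, 3, and 5. This is a sketch only: the helper names are hypothetical, and the scan over candidate centers (0.0 to 6.0 in steps of 0.1) is an illustration, not a proof.

```python
scores = [1, 3, 5]
mean = sum(scores) / len(scores)


def sum_of_deviations(data, center):
    return sum(x - center for x in data)


def sum_of_squared_deviations(data, center):
    return sum((x - center) ** 2 for x in data)


print(sum_of_deviations(scores, mean))          # 0.0
print(sum_of_squared_deviations(scores, mean))  # 8.0
print(sum_of_squared_deviations(scores, 5))     # 20

# Scan other candidate centers: none gives a smaller sum of squares than the mean.
candidates = [c / 10 for c in range(0, 61)]
best = min(candidates, key=lambda c: sum_of_squared_deviations(scores, c))
print(best)  # 3.0
```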
Help: Text/Notes
Confident: 14/100
Mean, Median, and Mode (Define): 83
Time of problem: 0 minutes 6.209 seconds
12. The mean temperature for January was 30º. In February the mean was
25º and for March the mean was 35º. The overall mean for these
months is
30º
You Answered
Correct Answer
greater than 30º
less than 30º
Cannot be determined from the information presented.
Explanation:
At first reading, this would appear to be a simple problem. Add 30 + 25 + 35 and divide
by 3 to get an average temperature of 30º. Certainly this would give you a rough
estimate of the average temperature, but it would not be an accurate estimate. This is a
weighted mean problem because the three months are not equal in length. January
and March have 31 days but February has only 28 days. This would mean that there
are fewer days with the average temperature at 25º and more days with the average
temperature at 35º. The weighting then would be on the above 30º side meaning that
the average temperature over the three months would be slightly more than 30º.
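A minimal sketch of the day-weighted calculation, assuming a non-leap year as the explanation does. The function `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


monthly_means = [30, 25, 35]   # January, February, March
days_in_month = [31, 28, 31]   # non-leap year, as the explanation assumes

overall = weighted_mean(monthly_means, days_in_month)
print(round(overall, 2))  # 30.17
```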
Help: Text/Notes
Confident: 74/100
Weighted mean: 64
Time of problem: 0 minutes 4.256 seconds
13. Two investigators tested their friends for memory span. The first tested
five people and found a mean of 6.0. The second tested nine people
and found a mean of 7.0. The overall mean for the data gathered is
6.00
6.50
You Answered
Correct Answer
6.64
7.00
Explanation:
This is a weighted mean problem. One group has a mean of 6 based on 5 people and
the other group has a mean of 7 based on 9 people. Because there are unequal
numbers in the two groups, I need to proportionally weight the means. To determine
the overall mean, you need to key in on the formula for the mean, ΣX ÷ N. For the first
group, ΣX is equal to 6*5 or 30. In the second group, ΣX is equal to 7*9 or 63. To
find the combined mean, I would add the two ΣX terms, 30 and 63, and divide by the
total number of people, 5+9. The result is 93 ÷ 14 or 6.64.
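The same steps, recovering each group's sum from its mean and size, can be written out as a short sketch. The variable names are illustrative only.

```python
groups = [(6.0, 5), (7.0, 9)]  # (group mean, group size)

total_sum = sum(m * n for m, n in groups)  # 30 + 63 = 93
total_n = sum(n for _, n in groups)        # 5 + 9 = 14
combined_mean = total_sum / total_n

# For contrast: averaging the two means ignores the unequal group sizes.
simple_average = sum(m for m, _ in groups) / len(groups)

print(round(combined_mean, 2), simple_average)  # 6.64 6.5
```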
Help: Calculator
Confident: 95/100
Weighted mean: 64
Time of problem: 2 minutes 24.54 seconds
14. Describe the distinguishing characteristics of the histogram, line graph,
and frequency polygon. Under what conditions would each be used?
Correct answers:
quantitative; two variables; line-curve
Your Answer:
placeholder (Incorrect)
Explanation:
a. histogram
• Graphing technique appropriate for quantitative data.
• Class intervals are represented on the X-axis, and the frequency of each
class interval is represented by the height of the bar.
• Midpoints of the class intervals are plotted on the X-axis and the width of
the bars extends to the exact limits for each class interval. The bars in a
histogram, then, touch one another.
• Histograms tend to be easy to read and convey a sense for how scores in
the distribution are gathered.
b. line graph
• Line graphs picture the relationship between two variables. That is, for
every value of X there is a corresponding value of Y.
• Line graphs are very useful for indicating trends, in fact, the most common
use is probably with stock market data where transaction averages or
stock values are plotted on the Y-axis and time (days, months, quarters, or
years) is plotted on the X-axis.
c. frequency polygon
• The frequency polygon is often referred to as a smooth-line curve and is a
variation of the histogram.
• Midpoints of the class intervals on the X-axis are connected by straight
lines.
• The frequency polygon allows researchers to easily compare, on one set
of axes, distributions for two or more groups. Histograms comparing
groups on the same axes are generally too cluttered.
• Like the histogram, frequency polygons represent frequencies for values
of quantitative variables. Bar graphs, with spaces between bars, are used
to display frequencies of the categories of a qualitative variable.
Help: Text/Notes
Confident: 31/100
Frequency graphs: 59
Time of problem: 0 minutes 8.873 seconds
15. The fact that the middle of a series of items is more difficult to learn than
the beginning or the end is known as the
series effect;
middling effect;
bimodal effect;
You Answered
Correct Answer
serial position effect.
Explanation:
Plotting this phenomenon, known as the serial-position effect, results in a good
example of a line graph. In trying to learn a list of items, it tends to be the case that
recall of the items is highest for those that occur at the beginning or at the end of the
list. These effects are referred to as primacy and recency effects, respectively. That is,
items that you reviewed most recently tend to have the highest recall followed by those
at the beginning of the list that have been repeated most often in the study process.
Items in the middle of the list are the ones that suffer in terms of memory and thus are
the ones that need careful attention in the study process. Use of mnemonic devices can
really improve interior recall, but the line curve is a good way to illustrate how the serial
position effect impacts recall.
Help: Text/Notes
Confident: 100/100
Frequency graphs: 59
Time of problem: 0 minutes 4.927 seconds
16. Suppose a frequency distribution with a range of 0 to 100 was positively
skewed. The greatest frequency of scores would be expected around
Correct Answer
25;
50;
You Answered
75;
any of the above are possible for such a distribution;
none of the above are reasonable for such a distribution.
Explanation:
Help: Text/Notes
Confident: 34/100
Graphs of distributions: 100
Mean, Median, and Mode in frequency distributions: 66
Time of problem: 0 minutes 6.679 seconds
17. Identify the skew of the two distributions below.
a.
X    f
5   10
4    8
3    6
2    2
1    1
0    1
b.
X      f
103    3
102    3
101   18
100   10
99     8
98     3
97     2
96     1
95     1
Correct answers:
negatively skewed; negatively skewed
Your Answer:
negatively skewed; negatively skewed (Incorrect)
Explanation:
a. Even without sketching the first distribution, you should be able to see
the strong negative skew. Negative skew is present when the tail of
the distribution extends or points to the low numbers and that is
exactly the case here. As scores on the X-axis go from 0 to 5 the
frequencies move from 1 to 1 to 2 to 6 to 8 to 10. Eighteen of the 28
scores are 4's or 5's, that is, the scores tend to be bunched at the
upper end of the distribution. This is characteristic of negatively
skewed distributions.
b. The second distribution is also negatively skewed, but not nearly as
pronounced as the first. The tail points somewhat to the lower
numbers and bunches some at the higher numbers, but at the highest
values, there is a dropoff in scoring and there are no real outlying
scores on either end of the distribution. The distribution is not
symmetric and it is negatively skewed, but the skew is slight.
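The direction and relative strength of the skew can also be checked numerically. The sketch below uses the standardized third central moment, which is a common skewness measure but is an assumption here: the explanation above reasons from the shape of the tail, not from this statistic. The function name `skewness` is hypothetical.

```python
def skewness(freq_table):
    """Standardized third central moment for a {score: frequency} table."""
    n = sum(freq_table.values())
    mean = sum(x * f for x, f in freq_table.items()) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in freq_table.items()) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in freq_table.items()) / n
    return m3 / m2 ** 1.5


dist_a = {5: 10, 4: 8, 3: 6, 2: 2, 1: 1, 0: 1}
dist_b = {103: 3, 102: 3, 101: 18, 100: 10, 99: 8, 98: 3, 97: 2, 96: 1, 95: 1}

for name, table in (("a", dist_a), ("b", dist_b)):
    value = skewness(table)
    direction = "negative" if value < 0 else "positive" if value > 0 else "none"
    print(f"{name}: skewness = {value:.2f} ({direction})")
```

A negative value indicates a tail toward the low scores, and a value further from zero indicates a more pronounced skew.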
Help: Text/Notes
Confident: 80/100
Skewness: 88
Time of problem: 0 minutes 21.381 seconds
18. The appropriate statistic for conveying the central tendency of a
nominal variable is
mean;
median;
You Answered
Correct Answer
mode;
any of the above, but the mean is preferable;
any of the above, but the median is preferable;
any of the above, but the mode is preferable.
Explanation:
The mode is the only appropriate measure of central tendency that can be
used with nominal variables. Remember, nominal variables are
represented by categories or names that have no inherent order to them. A
bar graph is used to summarize the data and the bars are separated by
space to indicate that their location along the X-axis is arbitrary. Means
and medians, the other measures of central tendency, require that the
measurement scale along the X-axis be at least ordered (mode + median)
and at best interval or ratio (mode + median + mean).
Help: Text/Notes
Confident: 22/100
Mean, Median, and Mode (Define): 83
Mean, Median, and Mode in frequency distributions: 66
Time of problem: 0 minutes 5.228 seconds
You answered 10 correct out of 15 computer graded questions.
Add the number of the written answers (if any were given) that you believe
you got correct to the total correct value to determine your score out of 18.