Statistics for Psychology
Homework 2 ~ Descriptive Statistics
Spring 2017
Answers
1. What are the three things to know about any distribution of scores?
Shape, central tendency, and variation.
2. What are the purposes of visual presentation of data?
To organize and summarize information into easily seen patterns and to determine the shape of a
distribution.
3. Scores on a standard test of reading comprehension for a third-grade class of 18 students appear in
Table 1 below. Scores on this particular test are continuous in nature and can range from a low score of
1 to a high score of 5 where a score of 3 is considered typical.
a) Place the scores in a simple frequency distribution.
b) Draw a histogram showing the distribution. State why a histogram, and not a bar chart, is the
correct type of figure.
c) If a score of 3 is considered typical for third-graders, how would you describe the general
reading level for this particular class?
Table 1
Reading Comprehension Scores
5 3 5 4 5 5
4 5 2 4 5 3
5 4 5 5 3 5
The simple frequency distribution & histogram are shown below. Note that the frequency distribution
starts with the number 1 and not 0 (it is a 1 to 5 scale). Also note that both figures have descriptive titles
describing the x (scores) and y (frequency of scores) variables, and that the graph has the axes labeled. A
histogram is the correct figure to use because the variable was stated to be continuous in nature.
Concerning part c, if a score of 3 is considered typical then this particular class looks as though it has
many above-typical reading scores.
Frequency of Reading Scores

Score    f
  5     10
  4      4
  3      3
  2      1
  1      0

[Figure: histogram titled "Frequency of Reading Scores"; x-axis: Reading Score (1 to 5); y-axis: f (0 to 10)]
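The tally for part (a) can be checked with a short Python sketch. The variable names (`scores`, `freq_table`) are illustrative only and not part of the assignment.

```python
from collections import Counter

# Reading comprehension scores from Table 1 (18 students).
scores = [5, 3, 5, 4, 5, 5,
          4, 5, 2, 4, 5, 3,
          5, 4, 5, 5, 3, 5]

counts = Counter(scores)

# List every possible score on the 1-to-5 scale, highest first,
# so that a score nobody earned still appears with f = 0.
freq_table = [(score, counts.get(score, 0)) for score in range(5, 0, -1)]

for score, f in freq_table:
    print(f"{score:>5} {f:>5}")
```

Iterating over the full 1-to-5 range, rather than over the observed scores, is what keeps the empty score of 1 in the table.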
4. Create a grouped frequency table and a stem and leaf plot of data in Table 2 below, where it would
be possible to spend between 0 and 100 minutes eating lunch.
Table 2
Number of Minutes Spent Eating Lunch
20 34 45
16 35 25
81 15 50
75 30 18
100 24 82
50 22 72
50 90 33
The grouped frequency table could have had a number of correct answers aside from that listed below,
though note that there is a zero this time as zero minutes eating lunch was a possibility. There was only
one correct answer for the stem and leaf plot.
Frequency of Lunch Eating Times in Minutes

Score      f
81-100     4
61-80      2
41-60      4
21-40      7
1-20       4
0          0

Number of Minutes Eating Lunch

Stem | Leaf
  10 | 0
   9 | 0
   8 | 12
   7 | 25
   6 |
   5 | 000
   4 | 5
   3 | 0345
   2 | 0245
   1 | 568
   0 |
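Both displays for Question 4 can be rebuilt from the raw data. The sketch below assumes the same class intervals as the answer key (other intervals would also be acceptable), and names such as `grouped` and `stem_and_leaf` are illustrative.

```python
from collections import defaultdict

# Minutes spent eating lunch, from Table 2 (N = 21).
minutes = [20, 34, 45, 16, 35, 25, 81, 15, 50, 75, 30,
           18, 100, 24, 82, 50, 22, 72, 50, 90, 33]

# Class intervals used in the answer key, highest first.
intervals = [(81, 100), (61, 80), (41, 60), (21, 40), (1, 20), (0, 0)]
grouped = [(lo, hi, sum(lo <= m <= hi for m in minutes))
           for lo, hi in intervals]

# Stem = the score without its ones digit; leaf = the ones digit.
leaves = defaultdict(list)
for m in sorted(minutes):
    leaves[m // 10].append(m % 10)

# Keep empty stems (6 and 0) so that gaps in the distribution stay visible.
stem_and_leaf = {stem: "".join(str(d) for d in leaves.get(stem, []))
                 for stem in range(10, -1, -1)}

for stem, leaf in stem_and_leaf.items():
    print(f"{stem:>4} | {leaf}")
```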
5. What are the frequency, cumulative frequency, and percentile rank of each of the following scores
from the data in Table 2? (a) 34 (b) 90 (c) 72 (d) 50
In order to complete this question, you needed to create a simple frequency distribution or use the stem
and leaf plot.
Score     f    cf    %rank
(a) 34    1    10    (10/21)*100 = 48%
(b) 90    1    20    (20/21)*100 = 95%
(c) 72    1    16    (16/21)*100 = 76%
(d) 50    3    15    (15/21)*100 = 71%
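The three quantities in Question 5 follow directly from counting. The sketch below uses the same definition as the answer key, percentile rank = (cf / N) * 100, where cf counts every score at or below the target; the helper name `describe` is hypothetical.

```python
# Minutes spent eating lunch, from Table 2 (N = 21).
minutes = [20, 34, 45, 16, 35, 25, 81, 15, 50, 75, 30,
           18, 100, 24, 82, 50, 22, 72, 50, 90, 33]
N = len(minutes)

def describe(score):
    """Return (f, cf, percentile rank rounded to a whole percent)."""
    f = minutes.count(score)                 # how often the score occurs
    cf = sum(m <= score for m in minutes)    # scores at or below it
    pct_rank = round(cf / N * 100)
    return f, cf, pct_rank

for score in (34, 90, 72, 50):
    print(score, describe(score))
# 34 (1, 10, 48)
# 90 (1, 20, 95)
# 72 (1, 16, 76)
# 50 (3, 15, 71)
```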
6. What four things might any measure of central tendency be used for?
Measures of central tendency are used to find the most typical score, to indicate relative location, to find
out where most of the scores cluster, & to predict new scores.
7. Find the mean, median, and mode for the data in Table 2 above.
Mean = 46.05, Median = 35, Mode = 50. I used a stem and leaf plot to examine the mode and the median.
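These three values can be verified with Python's standard `statistics` module. This is only a cross-check of the hand calculation, not the method the assignment expects.

```python
import statistics

# Minutes spent eating lunch, from Table 2 (N = 21).
minutes = [20, 34, 45, 16, 35, 25, 81, 15, 50, 75, 30,
           18, 100, 24, 82, 50, 22, 72, 50, 90, 33]

mean = statistics.mean(minutes)      # sum of scores (967) divided by N (21)
median = statistics.median(minutes)  # middle (11th) score once sorted
mode = statistics.mode(minutes)      # most frequent score

print(round(mean, 2), median, mode)  # 46.05 35 50
```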
8. What is the relationship between sufficiency & resistance? Use the three measures of central
tendency as examples in your answer.
There is normally an inverse relationship between sufficiency and resistance. This is because as you take
account of more scores, the more likely it is that you will take account of an outlier. The mean is very
sufficient but not very resistant, the mode is usually the opposite, and the median is somewhat sufficient
and resistant.
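The trade-off can be illustrated with the Table 2 data. In the sketch below, the score of 1000 is a made-up data-entry error used purely for illustration; it is not part of the homework data.

```python
import statistics

# Minutes spent eating lunch, from Table 2 (N = 21).
minutes = [20, 34, 45, 16, 35, 25, 81, 15, 50, 75, 30,
           18, 100, 24, 82, 50, 22, 72, 50, 90, 33]

# Hypothetical outlier: the highest score (100) is mistyped as 1000.
with_outlier = [1000 if m == 100 else m for m in minutes]

for label, data in (("original", minutes), ("with outlier", with_outlier)):
    print(label,
          round(statistics.mean(data), 2),
          statistics.median(data),
          statistics.mode(data))
```

The mean uses every score, so the single bad value pulls it up by roughly 43 minutes, while the median and mode do not move at all.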
9. Explain what a deviation score is. What would be the deviation score for the score 72 from the data
in Table 2?
A deviation score is the signed distance from a score to the mean, (x - mean). Deviation scores tell about
relative location, that is, how far and in which direction a score differs, or “deviates”, from the mean.
The deviation score for the raw score 72 is 72 - 46.05 = 25.95.
10. In general, why is the mean the best predictor of future scores for a distribution?
Because the sum of the deviation scores around the mean = 0. That is, if you calculate all the deviation
scores for a group of numbers, and add them all up, the answer will = 0. What this means in more
practical terms is that, since deviation scores can be thought of as errors in prediction, the overall error in
predicting a new score (remember, this is one purpose of a measure of central tendency) will be zero. So,
even though you may be wildly off in individual predictions, “overall” your average error in prediction
will be zero.
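The zero-sum property can be checked on the Table 2 data. The sketch below uses exact fractions so that floating-point rounding does not hide the result; the comparison with the median is an added illustration, not part of the original answer.

```python
from fractions import Fraction

# Minutes spent eating lunch, from Table 2 (N = 21).
minutes = [20, 34, 45, 16, 35, 25, 81, 15, 50, 75, 30,
           18, 100, 24, 82, 50, 22, 72, 50, 90, 33]

mean = Fraction(sum(minutes), len(minutes))   # exactly 967/21

# Deviation scores around the mean: individual errors can be large...
deviations = [m - mean for m in minutes]
# ...but they cancel out exactly when added together.
total_around_mean = sum(deviations)

# Predicting the median (35) instead leaves a net error.
total_around_median = sum(m - 35 for m in minutes)

print(total_around_mean, total_around_median)  # 0 232
```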
11. Define the concept of variation. State what you would be “getting at” if you presented somebody
with a measure of variability for scores on a test of depression level taken by 100 people.
Variation refers to how scores are different from one another. With N=100 on a depression test, you
would be stating how 100 people’s levels of depression differed.
12. What are the formulas, with correct symbols, for all the following:
• Population variance
• Population standard deviation
• Sample variance
• Sample standard deviation
• Estimated population variance
• Estimated population standard deviation

Population variance
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Population standard deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Sample variance
$$S^2 = \frac{\sum (x - \bar{x})^2}{N}$$

Sample standard deviation
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$

Est. pop. variance
$$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$$

Est. pop. standard deviation
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$
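The six formulas differ only in the symbol used for the mean and in whether the sum of squared deviations is divided by N or by N - 1. A minimal sketch follows; the helper `variance` and its `ddof` parameter are hypothetical names, and the Table 2 data are used only as a convenient example.

```python
import math
import statistics

def variance(xs, ddof=0):
    """Sum of squared deviations from the mean, divided by N - ddof."""
    mean = sum(xs) / len(xs)
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (len(xs) - ddof)

# Minutes spent eating lunch, from Table 2 (N = 21).
minutes = [20, 34, 45, 16, 35, 25, 81, 15, 50, 75, 30,
           18, 100, 24, 82, 50, 22, 72, 50, 90, 33]

var_n = variance(minutes)             # divide by N: sigma^2 or S^2
est_var = variance(minutes, ddof=1)   # divide by N - 1: s^2
sd_n = math.sqrt(var_n)               # sigma or S
est_sd = math.sqrt(est_var)           # s

# Cross-check against the standard library's equivalents.
print(math.isclose(var_n, statistics.pvariance(minutes)))
print(math.isclose(est_var, statistics.variance(minutes)))
```

Because N - 1 is smaller than N, the estimated population values are always slightly larger than their divide-by-N counterparts.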
13. Why is the range not necessarily a good estimate of the typical spread of scores in a distribution?
The range only takes into account the two most extreme scores. The range is a rare case of a statistic with
neither sufficiency nor resistance.
14. Calculate the ranges, population standard deviations, and estimated population standard
deviations for the following two distributions: A: 25 15 20 30 10 B: 120 90 95 92 110
(a) Range for A = 20, σ for A = 7.07, s for A = 7.91
$$\sigma = \sqrt{\frac{250}{5}} = 7.07 \qquad s = \sqrt{\frac{250}{4}} = 7.91$$
(b) Range for B = 30, σ for B = 11.66, s for B = 13.03
$$\sigma = \sqrt{\frac{679.2}{5}} = 11.66 \qquad s = \sqrt{\frac{679.2}{4}} = 13.03$$
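The arithmetic for both distributions can be reproduced step by step. In the sketch below, the helper `spread` is a hypothetical name, and `ss` is the sum of squared deviations that appears as 250 and 679.2 in the answers.

```python
import math

def spread(xs):
    """Range, sum of squares, sigma (divide by N), and s (divide by N - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return {
        "range": max(xs) - min(xs),
        "ss": ss,
        "sigma": math.sqrt(ss / n),
        "s": math.sqrt(ss / (n - 1)),
    }

a = spread([25, 15, 20, 30, 10])
b = spread([120, 90, 95, 92, 110])

for name, result in (("A", a), ("B", b)):
    print(name, result["range"], round(result["ss"], 1),
          round(result["sigma"], 2), round(result["s"], 2))
# A 20 250.0 7.07 7.91
# B 30 679.2 11.66 13.03
```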
15. What are the strengths and weaknesses of the three measures of central tendency and the standard
deviation?
Some Advantages
• Mean – Measure of choice for interval/ratio data since the sum of the deviations = 0, so it is also the
best predictor of new scores over the long run, high sufficiency
• Median – Can be used with ordinal, interval, and ratio data, usually medium sufficiency and
resistance
• Mode – Can use with any scale of measurement, usually high in resistance
• Standard Deviation – Measure of choice for interval/ratio data, high sufficiency
Some Disadvantages
• Mean – Can only use with continuous interval/ratio scores, low resistance, may not be an actual
score
• Median – May not be an actual score, usually medium sufficiency and resistance (note that this was
also a strength)
• Mode – May be misleading if most people tend to perform a certain way, low sufficiency, may not
be only one mode (or any) and/or may not be an actual score
• Standard Deviation – Can only use with continuous interval/ratio scores, low resistance, may not
be an actual score