Chapter 2: Descriptive Statistics
This chapter corresponds to chapters 2 (“Means to an End”) and 3 (“Vive la Difference”) of
your book.
What it is: Descriptive statistics are values that describe the characteristics of a sample or
population. This chapter will focus on two types of descriptive statistics. The first type is
measures of central tendency (the mean, median, and mode), which are statistics that describe
the typical value in a sample or population. The second type is measures of variability (the
range, standard deviation, and variance), which are statistics that describe how different the
scores in a sample or population are from each other.
When to use it: You should use descriptive statistics when you wish to describe the average
value and/or the amount of variability in a sample or population.
Questions asked by descriptive statistics: What is the typical value in a set of scores? How
variable is a set of scores?
Examples of research questions that would use descriptive statistics:
o What is the average household income of children diagnosed with Attention Deficit
  Hyperactivity Disorder?
o Do students at Ivy League universities all have the same high school grades and SAT
  scores (e.g., everyone has a 4.0 and a 1600) or is there a large amount of variability in
  the students’ grades and SAT scores (such that some students have very high scores
  and others very low scores)?
Using SPSS to Calculate Descriptive Statistics (dataset: Chapter 2 Example 1.sav)
Stella has noticed that Aggies are always saying “howdy” to her. She wonders if Aggies are just
more extraverted than your average person. So, Stella gives 20 Aggies a questionnaire called
the Extraversion IQ Instrument that provides persons with an extraversion score. Like a normal
IQ measure, the Extraversion IQ Instrument is scaled such that a score of 100 means you have
an average amount of extraversion. Stella wants to know (a) if Aggies are more extraverted than
normal and (b) if all Aggies are extraverted or if Aggies are variable in extraversion.
A note on drawing inferences from descriptive statistics
Stella wants to know if Aggies are “more extraverted than normal.” To address this question she
is using descriptive statistics such as the mean, median, and mode. For instance, if the mean
Extraversion IQ of the Aggie sample is significantly larger than 100, Stella might infer that
Aggies are more extraverted than normal. The problem with relying solely on descriptive
statistics to draw inferences such as “more extraverted than normal” (or “the mean of group 1 is
larger than the mean of group 2”, or “the amount of variability in this sample is large”) is that the
meaning of “significantly larger” is vague. How much higher than 100 would the Aggie sample
mean have to be for Stella to declare it “significantly larger” than 100 (e.g., is 115 high enough;
how about 101)?
There are actually formal statistics called inferential statistics that can be used to quantify
exactly how large something needs to be to call it “significant”. You will learn all about inferential
statistics later in the semester. For now, you will just use your subjective judgment (based on
your knowledge of descriptive statistics) to determine whether an average is larger than another
average or whether a sample has a large or small amount of variability. However, you should
know that this is, at best, a “quick and dirty” way to do this, and that inferential statistics are
much more appropriate.
Selection of the appropriate statistic(s)
Because Stella is interested in describing Aggies’ average amount of extraversion, the mean,
median, and mode are each appropriate descriptive statistics. Additionally, because Stella is
interested in describing the degree to which Aggies vary in extraversion, the range, standard
deviation, and variance are each appropriate statistics (although you don’t normally report the
variance because it’s difficult to interpret squared units such as amount of extraversion
squared).
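As a rough illustration of what these six statistics are, the sketch below computes them with Python's standard `statistics` module instead of SPSS. The scores are invented for illustration and are not the values in the example dataset; the variable names are likewise assumptions. Note that `stdev` and `variance` divide by n - 1, which matches the sample formula used later in this chapter.

```python
import statistics

# Hypothetical Extraversion IQ scores, invented for illustration only.
# These are NOT the values in "Chapter 2 Example 1.sav".
scores = [85, 90, 95, 100, 100, 105, 110, 115]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability
value_range = max(scores) - min(scores)
sd = statistics.stdev(scores)            # sample SD (divides by n - 1)
variance = statistics.variance(scores)   # sample variance (SD squared)

print(f"Mean = {mean}, Median = {median}, Mode = {mode}")
print(f"Range = {value_range}, SD = {sd:.2f}, Variance = {variance:.2f}")
```

With these made-up scores the three measures of central tendency agree, which is what you would expect from a symmetric set of scores with no outliers.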
Computation of the statistic(s)
We will use SPSS to calculate the descriptive statistics for us. Open the dataset “Chapter 2
Example 1.sav”. Take a moment to familiarize yourself with the data. Note how data for this type
of analysis should be entered.
1) Each participant has one row in the data.
2) One column is used to indicate each participant’s identification number, which is just a
number that is assigned to each participant in the study (this variable is “ID” in the
present example).
3) The second column indicates each participant’s score on the variable for which we want
descriptive statistics (this variable is “exiq” in the present example, meaning the
Extraversion IQ score for each participant).
The data should look something like this in SPSS:
If you switch to variable view, you should see that the two variables have labels indicating that
they represent “Participant ID” and “Extraversion IQ”, respectively. If you did not like those
labels, you could change the labels to whatever you want.
To calculate descriptive statistics in SPSS, click on the “Analyze” drop-down menu, highlight
“Descriptive Statistics”, and then click “Frequencies”, as pictured below.
The following pop-up window will appear:
Note that the two variables are listed in the pop-up window by their labels, with their variable
names in parentheses (e.g. “Extraversion IQ [exiq]”).
Highlight the variable(s) for which you wish to calculate descriptive statistics (“Extraversion IQ
[exiq]” in this example) and then click on the arrow to make the variable(s) appear in the
Variable(s): window, as pictured below.
Now click the “Statistics…” button. The following popup window will appear:
We use this window to tell SPSS exactly which descriptive statistics to calculate. In the present
example, we would like the three main measures of central tendency (mean, median, and
mode) and the three main measures of variability (range, standard deviation, and variance). So,
click on the boxes to put checkmarks next to those six descriptive statistics. When calculating
the range it is useful to know the minimum and maximum values that the range is based on, so
it is a good idea to put checkmarks next to “Minimum” and “Maximum” as well. Your screen
should look like this:
Click Continue to return to the original “Frequencies” popup window. Uncheck “Display
Frequency Tables”. For some purposes, the frequency tables can be useful, but for our present
purposes, the frequency tables would just be extra, unneeded output. Your screen should look
like this:
The Frequencies popup window also provides an easy way to create a type of graph/chart
called a histogram. You will learn more about histograms in the class on “Graphing Data”, so we
won’t go into them in too much detail here. In essence, a histogram is a graph that provides a
snapshot of your distribution of data when you have continuous/quantitative data (as is the case
with our Extraversion IQ data).
To create a histogram, click on the “Charts…” button. The following popup window will appear.
Click on the circle next to “Histograms:”. Your window should look like this:
Click “Continue” to return to the original popup window. Finally, click “OK” and navigate to the
Output window to find your results. The output will generate a table and a histogram that look
like this:
Statistics
Extraversion IQ
N Valid            20
N Missing          0
Mean               101.0500
Median             102.5000
Mode               104.00
Std. Deviation     15.32619
Variance           234.892
Range              56.00
Minimum            74.00
Maximum            130.00
The N rows tell you there were 20 participants and none of them were missing Extraversion IQ scores. The remaining rows list the values for each of the descriptive statistics.
[Histogram of Extraversion IQ. Y-axis: frequency of scores. X-axis: the continuous variable (Extraversion IQ).]
Interpreting the Output
Mean – The mean (sum of Extraversion IQs divided by sample size) is 101.05. This suggests
that this sample of Aggies is just about average in their level of extraversion. Remember, a
better way to draw this inference that Aggies are “just about average in their level of
extraversion” would be to use inferential statistics, but it can be handy to make a “quick and
dirty” subjective determination simply based on the descriptive statistics as well.
Median – The median (the middle score in the distribution of Extraversion IQs) is 102.50. This
value is very similar to the mean and means that half of the Aggies had scores less than 102.5
and half had scores greater than 102.5. As it does not differ very much from 100, the median
also suggests that this sample of Aggies is essentially average in extraversion.
Mode – The mode (the most frequent Extraversion IQ) is 104. This means that a score of 104
occurred most frequently in the sample of Aggies. The mode is a little higher than the mean or
median, but all three measures seem to converge on the idea that this sample is about average
(maybe just slightly above average) in extraversion.
Standard Deviation – This is essentially the average deviation of the individual Extraversion IQs
from the mean Extraversion IQ of 101.05. The standard deviation of 15.33 is relatively large (a
subjective judgment) and suggests there is a fair amount of variability in extraversion in this
sample. Some Aggies are quite introverted, some are quite extraverted, and others are about
average.
Variance – This is the standard deviation squared and essentially carries the same information
as the standard deviation (i.e., it tells us there is a fair amount of variability in this sample).
Range, Minimum, and Maximum – The range tells us that there is a difference of 56
Extraversion IQ points between the highest and lowest Extraversion IQs in this sample. The
lowest Extraversion IQ was 74 (which is almost two standard deviations below the mean; this
person is pretty introverted) and the highest Extraversion IQ was 130 (almost two standard
deviations above the mean; this person is pretty extraverted). The wide range substantiates the
idea that Aggies vary a great deal in extraversion.
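The relationships among these variability statistics can be checked with a few lines of arithmetic. The sketch below uses only the values reported in the SPSS output (mean, standard deviation, minimum, and maximum); the variable names are assumptions chosen for readability.

```python
# Values copied from the SPSS output for Extraversion IQ.
mean = 101.05
sd = 15.32619
minimum = 74.0
maximum = 130.0

# The variance is the standard deviation squared.
variance = sd ** 2

# The range is the maximum minus the minimum.
value_range = maximum - minimum

# Distance of the extreme scores from the mean, in standard deviation units.
z_min = (minimum - mean) / sd
z_max = (maximum - mean) / sd

print(f"Variance = {variance:.3f}")
print(f"Range = {value_range:.2f}")
print(f"Minimum is {z_min:.2f} SDs from the mean")
print(f"Maximum is {z_max:.2f} SDs from the mean")
```

Both extreme scores fall a little under two standard deviations from the mean, which is where the "almost two standard deviations" description comes from.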
Histogram – A histogram is a type of bar graph with the continuous variable (Extraversion IQ) on
the X-axis and the frequency of scores (in this case, the number of times Extraversion IQ scores
within each of the ranges on the X-axis occur) on the Y-axis. So, you can see that the first bar
on the left shows that there were two Extraversion IQs that were slightly below 80, the next bar
shows there were two Extraversion IQs that were slightly above 80, the next bar shows there
was one Extraversion IQ slightly below 90, and so on. Although it can be interesting to look at
each of the individual bars, probably the most useful aspect of the histogram is its ability to
capture the entire distribution of scores in pictorial form. By looking at the histogram as a whole
you can see that (a) the Extraversion IQs range from about the upper 70s to around 130, (b) the
majority of scores are in the middle of the distribution clustered around 100 or so, and (c) there
are fewer scores at the extreme ends of the distribution. A histogram is one of the most effective
ways to determine the shape of your distribution of scores, and can act as a useful supplement
to the mean, median, and mode when describing your distribution.
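To make the idea of "frequency of scores within each range" concrete, here is a minimal sketch that groups scores into bins of width 10 and prints a text histogram. The scores and the bin width are invented for illustration; they are not the example dataset, and SPSS chooses its own bins.

```python
from collections import Counter

# Hypothetical scores, invented for illustration only (not the example dataset).
scores = [76, 78, 82, 84, 88, 93, 95, 98, 99, 101,
          102, 103, 104, 104, 107, 109, 112, 116, 121, 130]
bin_width = 10  # assumed bin width


def bin_start(score, width):
    """Return the lower edge of the bin that a score falls into."""
    return (score // width) * width


# Count how many scores fall into each bin.
counts = Counter(bin_start(s, bin_width) for s in scores)

# Print one row per bin, lowest bin first; each '#' is one score.
for start in range(min(counts), max(counts) + bin_width, bin_width):
    label = f"{start:3d}-{start + bin_width - 1:3d}"
    print(f"{label} | {'#' * counts[start]}")
```

Each row plays the role of one bar: the longer the row, the more scores fall in that range.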
Interpretation of the Findings
Based on our descriptive statistic results, it looks like Aggies aren’t really much more
extraverted than your average person. So, Stella’s idea that Aggies greet people with “howdy”
so often because they are just so extraverted might not be true; some Aggies are extraverted,
some are introverted, and some are average.
Now we report our results. When reporting variability results in a journal article, researchers
typically report the standard deviation and range, but not the variance (because the variance is
expressed in squared units and is therefore difficult to interpret). Researchers also typically
report only one measure of central tendency instead of exhaustively reporting all three. The reported
measure of central tendency is the measure that is most appropriate, given the data. See page
29 of your Salkind text for information on when each measure of central tendency is most
appropriate. In the present circumstance, the data are quantitative (as opposed to
qualitative/categorical) and there are no obvious and influential outliers. So, the most precise of
the three measures of central tendency, the mean, is most appropriate.
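To see why outliers matter for this choice, the sketch below compares the mean and median of a small set of scores before and after one extreme value is added. The numbers are invented for illustration and do not come from any dataset in this chapter.

```python
import statistics

# Hypothetical scores, invented for illustration only.
typical = [48, 50, 51, 52, 54]
with_outlier = typical + [400]  # add one extreme score

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(f"Without outlier: mean = {mean_typical:.2f}, median = {median_typical:.2f}")
print(f"With outlier:    mean = {mean_outlier:.2f}, median = {median_outlier:.2f}")
```

A single extreme score pulls the mean far from the bulk of the data while the median barely moves, which is why the mean is preferred only when there are no obvious and influential outliers.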
Here’s an example of how these results might be reported in a journal article:
The sample of 20 Aggies was average to slightly-above-average in Extraversion IQ
scores (M = 101.05). There was also substantial variability between Aggies in
extraversion scores (SD = 15.33), with Extraversion IQs ranging from 74 to 130 (higher
Extraversion IQs mean more extraversion).
For someone unfamiliar with statistics, you might say: “Aggies are about average in extraversion
(maybe slightly above average). Some Aggies are very introverted, some are very extraverted,
and others fall at every point in between.”
Practice Problem #1 for SPSS (answer in Appendix)
Brown University students are tired of hearing that they are just a bunch of rich kids whose
parents bought their way into an Ivy League university. To make their point, they draw a random
sample of 30 Brown students and record these students’ combined parental income. Below are
combined parental incomes of the 30 Brown students. Use SPSS and descriptive statistics to
answer the questions below.
30,000
72,000
61,000
44,000
312,000
59,000
58,000
26,000
225,000
42,000
1,200,000
27,000
77,000
79,000
40,000
59,000
379,000
52,000
55,000
91,000
70,000
145,000
100,000
35,000
63,000
925,000
45,000
60,000
48,000
53,000
A. Calculate the mean, median, and mode of the parental incomes.
B. Do the mean, median, and mode differ from each other? Are the differences large or small? If
the measures of central tendency do differ from each other, why do you think this is? Is one of the
three measures more appropriate in this instance, and why?
C. Calculate the range, standard deviation, and variance of the parental incomes.
D. What do you conclude about whether all Brown students are “just a bunch of rich kids?”
Practice Problem #2 for Hand Calculation (answer in Appendix)
The makers of the hot new weight loss drug, Xylophone, ran a weight loss study to test how well
their drug works. Below is the number of pounds lost (negative numbers) or gained (positive
numbers) by the 10 participants during the 4-week weight loss study. Xylophone has been
featuring Participant #2 in their commercials, pointing toward Participant #2’s 15 pounds lost as
proof that Xylophone is an effective weight loss drug. Use hand calculations and descriptive
statistics to answer the questions below. A table for calculating the standard deviation is
provided.
Participant ID   Weight Lost/Gained (X)   (X − X̄)   (X − X̄)²
1                -3
2                -15
3                0
4                5
5                -2
6                7
7                3
8                0
9                0
10               5
Sum

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
A. Calculate the mean, median, and mode of weight lost/gained.
B. Do the mean, median, and mode differ from each other? Why do you think they do or do not
differ? Is one of the three measures more appropriate in this instance, and why?
C. Based on the measures of central tendency, do you agree that Xylophone is an effective
weight loss drug?
D. Calculate the range, standard deviation, and variance of weight lost/gained.
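The hand-calculation table in this problem follows a fixed sequence of steps: find the mean, subtract it from each score, square each deviation, sum the squares, divide by n - 1, and take the square root. The sketch below walks through those steps on a small set of invented scores (not the Xylophone data), so it shows the procedure without giving away the answer.

```python
import math

# Hypothetical scores, invented for illustration only (not the Xylophone data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

# Step 1: the mean.
mean = sum(scores) / n

# Steps 2 and 3: one table row per score: X, (X - mean), (X - mean) squared.
rows = [(x, x - mean, (x - mean) ** 2) for x in scores]

# Step 4: sum of the squared deviations (the "Sum" row of the table).
sum_sq = sum(sq for _, _, sq in rows)

# Step 5: divide by n - 1 to get the sample variance.
variance = sum_sq / (n - 1)

# Step 6: the standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print("X    X - mean   (X - mean)^2")
for x, dev, sq in rows:
    print(f"{x:<4} {dev:>8.2f}   {sq:>12.2f}")
print(f"Sum of squared deviations = {sum_sq:.2f}")
print(f"Variance = {variance:.3f}, SD = {sd:.3f}")
```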
Practice Problem #3 for Hand Calculation and SPSS (answer in Appendix)
The mean score on the math subtest of the SAT is 500 and the standard deviation is 100. Gabe
believes that persons aren’t reaching their potential when taking the SAT because they are too
tense, so he suggests that people get back massages right before they take the SAT. He gives
ten participants back massages right before they take the math subtest of the SAT. Their scores
are below. Calculate the six major descriptive statistics. Based on those descriptive statistics do
you believe that the back massages helped the participants score better?
Participant ID   SAT Score
1                406
2                582
3                736
4                565
5                378
6                466
7                521
8                435
9                495
10               435