Statistics for Everyone, Student Handout
Part 1: Statistics as a Tool in Scientific Research: Review of Summarizing
and Graphically Representing Data and Introduction to SPSS (v18 PASW)
To choose the appropriate statistical test, you need to consider what type of research question you are
asking and what type of data you are measuring.
A. Types of Research Questions
• Descriptive (What does X look like?)
• Correlational (Is there an association between X and Y? As X increases, what does Y do?)
• Experimental (Do changes in X cause changes in Y?)
B. Types of Data: Measurement Scales
Categorical:
• Nominal (name/label): numbers are arbitrary; e.g., 1 = male, 2 = female; blood type (A, B,
AB, O)
• Ordinal (rank order): numbers have order (i.e., more or less) but you do not know how much
more or less; the 1st place runner was faster, but you do not know how much faster than the 2nd
place runner; e.g., Stage 1, Stage 2, Stage 3 melanoma
Numerical:
• Interval (equal intervals): numbers have order and equal intervals so you know how much
more or less; a temperature of 102 is 2 degrees higher than one of 100; e.g., IQ
• Ratio (equal intervals and absolute zero): same as interval but because there is an absolute
zero you can talk meaningfully about twice as much and half as much; weighing 200 pounds
is twice as heavy as 100 pounds; e.g., # of white blood cells; miles per gallon
Entering Data into SPSS:
You will need to specify the following for each variable:
• Name of the variable
• Type of data: Numerical or String
• Type of measure: Nominal, Ordinal, Scale (Interval or Ratio)
• Labels or units
C. Two Major Types of Statistical Procedures
• Descriptive: Organize and summarize data
• Inferential: Draw inferences about the relations between variables; use samples to generalize
to the population
Materials developed by L. McSweeney and L. Henkel for the Quantitative Reasoning Pathway of the Core Integration Initiative
D. Ways to Summarize and Describe Your Data
The first step is ALWAYS getting to know your data: Summarize and visualize your data!
It is a big mistake to just throw numbers into the computer and look at the output of a statistical test
without any idea what those numbers are trying to tell you or without checking if the assumptions for the
test are met.
Think about what type of data you have (categorical or numerical) so you can determine the best way to
summarize and represent your data.
Assuming that numerical data are collected from different treatments/groups, one would get the following
summaries for each treatment/group:
• Numerical Summaries: Measures of Central Tendency, Measures of Variability, Representing
numerical summaries in tables
• Graphical Summaries: Bar graphs of means/Mean Plots, Histograms, Boxplots
E. Choosing the Appropriate Type of Graph
Type of variable | Example | Type of graph
One categorical variable | Political party | Bar Chart or Pie Graph
Two categorical variables* | Political party vs. Gender | Side-by-side Bar Chart
One numerical variable | Height | Histogram, Boxplot
One numerical variable and one categorical variable* | Height vs. Gender | Side-by-side Histograms, Boxplots, Bar graphs of Means
Two paired numerical variables* | Weight vs. Exercise per week | Scatterplot
One numerical variable over time | Number of Cells vs. Minutes | Time Series Plot
*Note: With 2 variables, one variable may be treated as the dependent variable and one variable may be
treated as the independent variable.
See separate handouts on how to create bar charts, pie graphs, histograms, time series plots, and
scatterplots using SPSS or Excel.
F. Other Issues to Consider in Summarizing and Graphically Representing Data
Shapes of Distribution
Normal = most scores in center, tapering off symmetrically in both tails (bell-shaped curve)
Kurtosis is a measure of the “peakedness” or “flatness” of a distribution; SPSS computes this value
• A kurtosis value near 0 indicates a distribution shape close to normal.
• A negative kurtosis indicates a shape flatter than normal.
• A positive value indicates a shape more peaked than normal.
• An extreme kurtosis (e.g., |k| > 5.0) indicates a distribution where far more of the values are in
the tails than a normal distribution would have.
Skewness measures the extent to which a distribution deviates from symmetry around the mean; SPSS
computes this value
• A value of 0 represents a symmetric or evenly balanced distribution (e.g., a normal distribution).
• A positive/right skewness indicates a greater number of smaller values (peak is to the left, tail is
longer on the high end/right).
• A negative/left skewness indicates a greater number of larger values (peak is to the right, tail
is longer on the low end/left).
Bimodal distribution: two peaks
Rectangular/Uniform: all scores (high and low) occur with equal frequency
Potential Outlier: An observation that is well above or below the overall bulk of the data
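To make the sign conventions for skewness and kurtosis concrete, here is a minimal Python sketch. It assumes the simple moment-based formulas (with 3 subtracted from kurtosis so that a normal curve scores 0); SPSS reports sample-adjusted versions, so its values will differ somewhat, especially for small samples. The function name and the data sets are made up for illustration.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from simple moment formulas."""
    n = len(values)
    mean = sum(values) / n
    # Central moments: the average of (x - mean) raised to a power.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # subtract 3 so a normal curve scores 0
    return skew, excess_kurtosis

# Hypothetical data sets, chosen only to show the sign of the skewness.
symmetric = [1, 2, 3, 4, 5]        # balanced around 3: skewness is 0
right_skewed = [1, 1, 1, 2, 10]    # long tail on the high end: positive skewness
left_skewed = [1, 9, 10, 10, 10]   # long tail on the low end: negative skewness

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    skew, kurt = skewness_and_excess_kurtosis(data)
    print(f"{name}: skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

The flat, evenly spread symmetric set also produces a negative kurtosis, which matches the "flatter than normal" description above.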
G. Assessing Normality
It is important to determine the shape of the distribution (is it normal [bell-shaped] or skewed?) so you
can choose appropriate descriptive statistics (i.e., measures of central tendency and variability) and
appropriate inference methods (i.e., hypothesis tests).
Method 1: Make a histogram of the numerical data and compare it with a normal curve. Check whether the
histogram is unimodal, symmetric, and bell-shaped.
Method 2: Check the kurtosis and skewness values. Values between -1 and +1 are considered excellent, and
values between -2 and +2 are acceptable in many analyses in the life sciences. SPSS will calculate both
kurtosis and skewness.
Method 3: Conduct a hypothesis test for normality
• Shapiro-Wilk (n < 2000)
• Kolmogorov-Smirnov (n > 2000)
Ho: Data come from a population with a normal distribution
Ha: Data do not come from a population with a normal distribution
So if p-value < .05, reject Ho and conclude the distribution is not normal.
More details are given in the “Handout for Getting Descriptive Statistics and Graphing in SPSS”.
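The decision rules in Methods 2 and 3 can be sketched in a few lines of Python. This is only an illustration: it assumes you already have the skewness, kurtosis, and p-value from your statistics software, it treats the cutoffs as inclusive (the handout does not say which), and the function names and the "questionable" label are invented here.

```python
def normality_rule_of_thumb(skewness, kurtosis):
    """Apply the Method 2 guideline to skewness and kurtosis values."""
    worst = max(abs(skewness), abs(kurtosis))
    if worst <= 1:
        return "excellent"
    if worst <= 2:
        return "acceptable"
    return "questionable"  # hypothetical label for values beyond -2 to +2

def normality_test_decision(p_value, alpha=0.05):
    """Apply the Method 3 rule: a small p-value means reject Ho (normality)."""
    if p_value < alpha:
        return "reject Ho: distribution is not normal"
    return "do not reject Ho: no evidence against normality"

print(normality_rule_of_thumb(0.3, -0.5))   # both values within -1 to +1
print(normality_rule_of_thumb(1.4, 0.2))    # skewness between 1 and 2
print(normality_test_decision(0.01))
print(normality_test_decision(0.40))
```

Note that failing to reject Ho does not prove the data are normal; it only means the test found no evidence against normality.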
H. Measures of Central Tendency
Central tendency = Typical or representative value of a group of scores
Measure | Definition | Takes Every Value Into Account? | When to Use
Mean | Average score; M = ΣX/N | Yes | Numerical data, BUT it can be heavily influenced by outliers so can give an inaccurate view if the distribution is not (approximately) symmetric
Median | Middle value; score at 50th percentile; half the scores are at or above, half are at or below | No | Ordinal data or numerical data that are skewed
Mode | Most frequently occurring data value | No | Nominal data
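As a quick illustration of how an outlier affects these measures, the sketch below uses Python's built-in statistics module on made-up data. The values are hypothetical and chosen so that one observation is far above the rest.

```python
import statistics

# Hypothetical salaries in thousands of dollars; 400 is an outlier.
salaries = [30, 32, 35, 35, 38, 40, 400]

mean_with_outlier = statistics.mean(salaries)       # pulled upward by 400
median_with_outlier = statistics.median(salaries)   # middle value, barely affected
mode_with_outlier = statistics.mode(salaries)       # most frequent value

# Drop the outlier and compare.
mean_without_outlier = statistics.mean(salaries[:-1])
median_without_outlier = statistics.median(salaries[:-1])

print(f"With outlier:    mean = {mean_with_outlier:.2f}, median = {median_with_outlier}")
print(f"Without outlier: mean = {mean_without_outlier:.2f}, median = {median_without_outlier}")

# For nominal data only the mode makes sense.
blood_types = ["A", "O", "O", "B", "AB", "O"]
most_common_type = statistics.mode(blood_types)
print(f"Most common blood type: {most_common_type}")
```

The mean moves from 35 to about 87 when the outlier is included, while the median stays at 35, which is why the median is preferred for skewed data.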
I. Quartiles
Quartiles = values that divide a data set into 4 equal parts
First Quartile = Q1 = 25th percentile
Second Quartile = Q2 = Median = 50th percentile
Third Quartile = Q3 = 75th percentile
Fourth Quartile = Q4 = 100th percentile
Interquartile range = IQR = Score at 75th percentile – Score at 25th percentile; the range of the middle
half of the scores
Relation between the Quartiles and the Boxplot
• Box is formed by Q1, Median and Q3
• Whiskers extend to the smallest and largest observations that are not outliers
• Extreme outliers lie outside the interval from Q1 – 3*IQR to Q3 + 3*IQR (denoted by *)
• Mild outliers lie outside the interval from Q1 – 1.5*IQR to Q3 + 1.5*IQR, but are not extreme
outliers (denoted by o)
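The boxplot rules above can be sketched in Python as follows. This is an illustration under assumptions: it uses the "inclusive" quartile method from Python's statistics module, and statistical packages (SPSS included) can use slightly different quartile conventions, so the quartiles may not match exactly. The function name and data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, IQR, whisker ends, and outliers by the 1.5*IQR and 3*IQR rules."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr
    # Extreme outliers fall beyond the 3*IQR limits.
    extreme = [x for x in values if x < extreme_low or x > extreme_high]
    # Mild outliers fall beyond the 1.5*IQR limits but within the 3*IQR limits.
    mild = [x for x in values
            if (x < mild_low or x > mild_high) and extreme_low <= x <= extreme_high]
    # Whiskers reach the smallest and largest values that are not outliers.
    inside = [x for x in values if mild_low <= x <= mild_high]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }

# Hypothetical scores: 24 is a mild outlier and 40 is an extreme outlier.
data = [10, 12, 13, 14, 15, 16, 17, 24, 40]
summary = boxplot_summary(data)
for key, value in summary.items():
    print(f"{key}: {value}")
```

For this data set Q1 = 13, the median is 15, and Q3 = 17, so IQR = 4; the mild limits are 7 and 23 and the extreme limits are 1 and 29.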
J. Measures of Dispersion
Variability = extent to which scores in a distribution differ from each other; are spread out
Measure | Definition | Takes Every Value Into Account? | When to Use
Range | Highest – lowest score | No, only based on two most extreme values | To give crude measure of spread
Standard Deviation | About 68% of the data fall within 1 SD of the mean (M ± SD) when the data are approximately normal | Yes, but describes majority | For numerical data that are approximately symmetric or normal
Interquartile Range | Middle 50% of the data fall within the IQR | No, but describes most | For ordinal data or for when numerical data are skewed
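To see why the table recommends different measures for symmetric and skewed data, the following Python sketch compares the three measures before and after adding one extreme value. The data are hypothetical, the SD is the sample SD (dividing by N - 1), and the quartiles use the "inclusive" method of Python's statistics module, which may differ slightly from other software.

```python
import statistics

def spread_measures(values):
    """Return (range, sample SD, IQR) for a list of numbers."""
    data_range = max(values) - min(values)
    sd = statistics.stdev(values)  # sample SD, divides by N - 1
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return data_range, sd, q3 - q1

# Hypothetical scores, roughly symmetric around 15.
scores = [10, 12, 13, 14, 15, 16, 17, 18, 20]
# The same scores plus one extreme value.
scores_with_outlier = scores + [60]

range_a, sd_a, iqr_a = spread_measures(scores)
range_b, sd_b, iqr_b = spread_measures(scores_with_outlier)

print(f"Without outlier: range = {range_a}, SD = {sd_a:.2f}, IQR = {iqr_a}")
print(f"With outlier:    range = {range_b}, SD = {sd_b:.2f}, IQR = {iqr_b}")
```

One added value moves the range from 10 to 50 and makes the SD several times larger, while the IQR changes only from 4 to 4.5.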
K. Presenting Measures of Central Tendency and Variability in Text
Sentences should always be grammatical and sensible. Do not just list a bunch of numbers. Use the
statistical information to supplement what you are saying.
For example:
The number of fruit flies observed each day ranged from 0 to 57 (M = 25.32, SD = 5.08).
Plants exposed to moderate amounts of sunlight were taller (M = 6.75 cm, SD = 1.32) than plants
exposed to minimal sunlight (M = 3.45 cm, SD = 0.95).
L. Presenting Measures of Central Tendency and Variability in Tables
Be sure to include the unit of measurement. You might also want to include a column for sample size (N).
Symmetric Data
Variable | Range | M | SD
Number of Fruit Flies | 0 to 57 | 25.32 | 5.08
Weight (lbs) | 118 to 208 | 160.31 | 10.97
Response time to patient’s call (mins) | 0 to 8 | 2.1 | 0.8
Skewed Data
Variable | Range | Median | IQR
Number of Fruit Flies | 0 to 57 | 27 | 9
Weight (lbs) | 118 to 208 | 155.6 | 12
Response time to patient’s call (mins) | 0 to 8 | 1.5 | 1
M. What’s the Difference Between SD and SE?
Sometimes instead of standard deviation (SD), people report the standard error of the mean (SE or SEM)
in text, tables, and figures. Use the one that makes sense for your research question. Are you describing
one data set only (SD) or generalizing to the population (SE)?
Standard deviation (SD) = “Average” deviation of individual scores around the mean of the scores;
describes the spread of one sample
Standard error (SE = SD/√N) = How much on average sample means would vary if you sampled
more than once from the same population (we do not expect the particular mean we got to be an
exact reflection of the population mean); used to describe the spread of all possible sample
means and to make inferences about the population mean
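The relation SE = SD/√N can be checked with a short Python sketch. The function names and data below are hypothetical, and the SD is the sample SD (dividing by N - 1).

```python
import math
import statistics

def standard_error(values):
    """SE of the mean: sample SD divided by the square root of N."""
    return statistics.stdev(values) / math.sqrt(len(values))

def se_from_sd(sd, n):
    """SE computed from an already reported SD and sample size N."""
    return sd / math.sqrt(n)

# Hypothetical sample of 8 scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.stdev(scores)
se = standard_error(scores)
print(f"SD = {sd:.2f}, SE = {se:.2f}")

# With the same SD, a larger sample gives a smaller SE.
se_small_sample = se_from_sd(10, 25)    # 10 / 5
se_large_sample = se_from_sd(10, 100)   # 10 / 10
print(f"N = 25: SE = {se_small_sample}, N = 100: SE = {se_large_sample}")
```

Because SE shrinks as N grows while SD does not, the two are not interchangeable: SD describes the spread of the scores, SE describes the precision of the sample mean.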
N. Other Issues to Be Aware Of
• Dangers of low N: With a small sample size, data may not be representative of the population at
large and you should take care in drawing conclusions
• Dangers of Outliers: Be sure you look for outliers (extreme values) in your data and justify your
strategy for dealing with them (e.g., eliminating a data point only because the researcher assumes
it is a mistake, rather than part of the natural variability in the population, is subjective science)