Download Psy301 - Lecture 1 - Outline

Research Methods & Design in Psychology
Lecture 3
Descriptives & Graphing
Lecturer: James Neill
Overview
• Univariate descriptives & graphs
• Non-parametric vs. parametric
• Non-normal distributions
• Properties of normal distributions
• Graphing relations between 2 and 3 variables
Empirical Approach to Research
A positivistic approach ASSUMES:
• the world is made up of bits of data which can
be ‘measured’, ‘recorded’, & ‘analysed’
• Interpretation of data can lead to valid insights
about how people think, feel and behave
What do we want to Describe?
Distributional properties of
variables:
• Central tendency(ies)
• Shape
• Spread / Dispersion
Basic Univariate Descriptive
Statistics
Central tendency
• Mode
• Median
• Mean
Shape
• Skewness
• Kurtosis
Spread
• Interquartile Range
• Range
• Standard Deviation
• Variance
Basic Univariate Graphs
• Bar Graph – Pie Chart
• Stem & Leaf Plot
• Boxplot
• Histogram
Measures of Central Tendency
• Statistics to represent the ‘centre’ of a
distribution
– Mode (most frequent)
– Median (50th percentile)
– Mean (average)
• Choice of measure dependent on
– Type of data
– Shape of distribution (esp. skewness)
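As a minimal sketch of the three measures listed above, the following uses Python's standard `statistics` module. The rating scores and the yes/no answers are hypothetical, made up for illustration only.

```python
import statistics

# Hypothetical scores on a 1-7 rating scale (illustrative data only).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # 50th percentile
mean = statistics.mean(scores)      # arithmetic average

print(mode, median, mean)

# The mode is the only one of the three that also works for nominal data.
answers = ["yes", "no", "yes", "sort of", "yes"]  # hypothetical categories
print(statistics.mode(answers))
```

For these scores the three measures are close together, which is what one would expect from a roughly symmetrical distribution.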
Measures of Central Tendency (by level of measurement)
            Mode   Median   Mean
Nominal     X
Ordinal     X      X
Interval    X      X        X
Ratio       X?     X        X
Measures of Dispersion
• Measures of deviation from the
central tendency
• Non-parametric / non-normal:
range, percentiles, min, max
• Parametric:
SD & properties of the normal
distribution
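The dispersion measures named above can be sketched with the standard library (Python 3.8+ for `statistics.quantiles`). The scores are hypothetical, and the quartile values depend on the interpolation method; the default "exclusive" method is assumed here, which may differ slightly from what a given statistics package reports.

```python
import statistics

# Hypothetical scores (illustrative data only).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]

# Non-parametric measures: min, max, range, percentiles.
minimum, maximum = min(scores), max(scores)
score_range = maximum - minimum
q1, q2, q3 = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                   # interquartile range

# Parametric measure: sample standard deviation (N - 1 denominator).
sd = statistics.stdev(scores)

print(minimum, maximum, score_range, q1, q2, q3, iqr, round(sd, 2))
```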
Measures of Dispersion (by level of measurement)
            Range, Min/Max   Percentiles   SD
Nominal
Ordinal     X
Interval    X                X             X?
Ratio       X                X             X
Describing Nominal Data
• Frequencies
– Most frequent?
– Least frequent?
– Percentages?
• Bar graphs
– Examine comparative heights of bars
– Shape is arbitrary
• Consider whether to use frequencies or percentages
Frequencies
• Number of individuals obtaining
each score on a variable
• Frequency tables
• Can be displayed graphically (bar chart, pie chart)
• Can also present as %
Frequency table for sex
SEX
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   female       14        70.0         70.0              70.0
        male          6        30.0         30.0             100.0
        Total        20       100.0        100.0
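A frequency table like the one for sex (14 females, 6 males, N = 20) can be built with `collections.Counter`. This is only a sketch of the idea, not the statistics package output shown in the slide; the variable names are illustrative.

```python
from collections import Counter

# Raw nominal data matching the counts in the frequency table for sex.
sex = ["female"] * 14 + ["male"] * 6

counts = Counter(sex)
total = sum(counts.values())

rows = []
cumulative = 0.0
for category, freq in counts.most_common():  # most frequent category first
    percent = 100 * freq / total
    cumulative += percent
    rows.append((category, freq, percent, cumulative))
    print(f"{category:<8}{freq:>4}{percent:>8.1f}{cumulative:>8.1f}")

print(f"{'Total':<8}{total:>4}{100.0:>8.1f}")
```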
Bar chart for frequency by sex
[Figure: bar chart of Frequency (y-axis, 0 to 16) by SEX (x-axis: female, male)]
Pie chart for frequency by sex
[Figure: pie chart of SEX (male, female)]
Bar chart: Do you believe in God?
[Figure: bar chart of Count (y-axis, 0 to 60) by response (No, Sort of, Yes)]
Bar chart for cost by state
Bar chart vs. Radar Chart
[Figure: Bar Chart of Sorted Factor Effect Sizes Time 1 to 2. Y-axis: Effect size (0.00 to 0.45). X-axis: Factors (Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence)]
Bar chart vs. Radar Chart
[Figure: Radar Chart of Factor Effect Sizes Time 1 to 2. Radial scale: 0.00 to 0.60. Axes: Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence]
Mode
• Most common score - highest point in
a distribution
• Suitable for all types of data including
nominal (may not be useful for ratio)
• Before using, check frequencies and
bar graph to see whether it is an
accurate and useful statistic.
Describing Ordinal Data
• Conveys order but not distance
(e.g., ranks)
• Descriptives as for nominal
(i.e., frequencies, mode)
• Also maybe median – if accurate/useful
• Maybe IQR, min. & max.
• Bar graphs, pie charts, & stem-&-leaf
plots
Stem & Leaf Plot
• Useful for ordinal, interval and ratio data
• Alternative to histogram
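To show how a stem-and-leaf plot stands in for a histogram while keeping the raw values visible, here is a minimal sketch. The function name `stem_and_leaf` and the weights are hypothetical, and the sketch assumes non-negative integers with tens as stems and units as leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    Stem = tens (and above), leaf = units digit.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)

    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical body weights in kg (illustrative data only).
weights = [45, 52, 58, 61, 63, 67, 70, 72, 72, 85, 101]
lines = stem_and_leaf(weights)
print("\n".join(lines))
```

Read sideways, the rows of leaves have the same outline as a histogram with a bin width of 10.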
Box & whisker
• Useful for interval and ratio data
• Represents min., max., median and quartiles
Describing Interval Data
• Conveys order and distance, but no true
zero (0 pt is arbitrary).
• Interval data is discrete, but is often
treated as ratio/continuous (especially for
> 5 intervals)
• Distribution (shape)
• Central tendency (mode, median)
• Dispersion (min, max, range)
• Can also use M & SD if treating as
continuous
Describing Ratio Data
• Numbers convey order and distance,
true zero point - can talk meaningfully
about ratios.
• Continuous
• Distribution (shape – skewness,
kurtosis)
• Central tendency (median, mean)
• Dispersion (min, max, range, SD)
Univariate data plot for a ratio variable
The Four Moments of a Normal Distribution
[Figure: normal curve annotated with the mean, SD, skew (left and right) and kurtosis]
The Four Moments of a Normal
Distribution
Four mathematical qualities (parameters) allow one to describe a continuous distribution which at least roughly follows a bell curve shape:
• 1st = mean (central tendency)
• 2nd = SD (dispersion)
• 3rd = skewness (lean / tail)
• 4th = kurtosis (peakedness / flatness)
Mean (1st moment)
• Average score
• Mean: $\bar{X} = \sum X / N$
• Use for ratio data or interval (if treating it as continuous).
• Influenced by extreme scores (outliers)
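The influence of an extreme score on the mean can be illustrated with a small worked example. The scores are hypothetical; the median is included only for comparison.

```python
import statistics

# Hypothetical scores (illustrative data only).
scores = [4, 5, 5, 6, 7]
mean = sum(scores) / len(scores)          # sum of X divided by N
median = statistics.median(scores)

# Add a single extreme score (outlier).
with_outlier = scores + [60]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean, median)                        # 5.4 5
print(mean_with_outlier, median_with_outlier)  # 14.5 5.5
```

One outlier moves the mean from 5.4 to 14.5, while the median only moves from 5 to 5.5.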
Standard Deviation (2nd moment)
• $SD = \sqrt{\text{Variance}} = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$
• Standard Error: $SE = SD / \sqrt{N}$
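The SD and SE formulas can be written out step by step as follows. This is a sketch with hypothetical scores; the function names `sample_sd` and `standard_error` are illustrative, and in practice `statistics.stdev` gives the same SD.

```python
import math

def sample_sd(scores):
    """Sample standard deviation: sqrt( sum((X - mean)^2) / (N - 1) )."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    variance = sum_of_squares / (n - 1)
    return math.sqrt(variance)

def standard_error(scores):
    """Standard error of the mean: SD / sqrt(N)."""
    return sample_sd(scores) / math.sqrt(len(scores))

# Hypothetical scores (illustrative data only); mean = 5, sum of squares = 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_sd(scores)
se = standard_error(scores)
print(round(sd, 3), round(se, 3))
```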
Skewness (3rd moment)
• Lean of distribution
• +ve = tail to right
• -ve = tail to left
• Can be caused by an outlier
• Can be caused by ceiling or floor effects
• Can be accurate (e.g., the number of cars owned per person)
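One common way to put a number on skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition; statistics packages often apply a small-sample adjustment, so their values may differ slightly. The data are hypothetical.

```python
def skewness(scores):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # 3rd central moment
    return m3 / m2 ** 1.5

# Hypothetical data: cars owned per person (long tail to the right).
cars_owned = [0, 0, 0, 1, 1, 1, 1, 2, 2, 5]
mirrored = [5 - x for x in cars_owned]   # same shape, tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(cars_owned))  # positive: tail to right
print(skewness(mirrored))    # negative: tail to left
print(skewness(symmetric))   # zero: no lean
```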
Skewness (3rd moment)
• Negative (ceiling effect)
• Positive (floor effect)
Kurtosis (4th moment)
• Flatness or peakedness of distribution
• +ve = peaked
• -ve = flattened
• Be aware that by altering the X and Y axes, any distribution can be made to look more peaked or more flat – so add a normal curve to the histogram to help judge kurtosis
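Because the visual impression of kurtosis depends on the axes, a numeric check can help. The sketch below assumes the "excess kurtosis" definition (fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores 0). Statistics packages may apply a small-sample adjustment. The data are hypothetical.

```python
def excess_kurtosis(scores):
    """Excess kurtosis: m4 / m2**2 - 3 (no small-sample adjustment)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # 2nd central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # 4th central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data (illustrative only).
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 5, 9]  # most scores piled at the centre
flat = list(range(1, 11))                # scores spread evenly from 1 to 10

print(excess_kurtosis(peaked))  # positive (leptokurtic)
print(excess_kurtosis(flat))    # negative (platykurtic)
```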
Kurtosis (4th moment)
• Red = positive (leptokurtic)
• Blue = negative (platykurtic)
Key Areas under the Curve for
Normal Distributions
• For normal distributions, approx.
+/- 1 SD ~ 68%
+/- 2 SD ~ 95%
+/- 3 SD ~ 99.7%
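The areas under the normal curve can be checked with `statistics.NormalDist` (Python 3.8+). The sketch uses the standard normal distribution (mean 0, SD 1), so each area is the probability of falling within k SDs of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

areas = {}
for k in (1, 2, 3):
    # Area between -k SD and +k SD = CDF(k) - CDF(-k).
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"+/- {k} SD: {100 * areas[k]:.1f}%")

# +/- 1 SD: 68.3%
# +/- 2 SD: 95.4%
# +/- 3 SD: 99.7%
```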
Areas under the normal curve
Types of Non-normal Distribution
• Bi-modal
• Multi-modal
• Positively skewed
• Negatively skewed
• Flat (platykurtic)
• Peaked (leptokurtic)
Non-normal distributions
Rules of Thumb in Judging
Severity of Skewness & Kurtosis
• View histogram with normal curve
• Deal with outliers
• Skewness / kurtosis < -1 or > 1 suggests a notable departure from normality
• Skewness / kurtosis significance tests
Histogram of weight
[Figure: histogram of WEIGHT (x-axis, 40.0 to 110.0) by Frequency (y-axis, 0 to 8); Mean = 69.6, Std. Dev = 17.10, N = 20]
Histogram of daily calorie intake
Histogram of fertility
[Figure 1: histogram of Frequency (y-axis, 0 to 60) over an x-axis from 0 to 140; Mean = 81.21, Std. Dev. = 18.228, N = 188]
[Figure 2: bar chart of Count by Femininity-Masculinity (Very feminine, Fairly feminine, Androgynous, Fairly masculine, Very masculine)]
[Figure 3: Gender: male. Bar chart of Count by category (Fairly feminine, Androgynous, Fairly masculine, Very masculine)]
[Figure 4: Gender: female. Bar chart of Count by category (Very feminine, Fairly feminine, Androgynous, Fairly masculine, Very masculine)]
[Figure 5: histogram of Exercise (mins/day) (x-axis, 0 to 250) by Frequency (y-axis, 0 to 60)]
Skewed Distributions
& the Mode, Median & Mean
• +vely skewed
mode < median < mean
• Symmetrical (normal)
mean = median = mode
• -vely skewed
mean < median < mode
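The ordering of mode, median and mean under skew can be seen in a small example. The scores are hypothetical, and the ordering is a typical pattern for unimodal data rather than a guarantee for every data set.

```python
import statistics

# Hypothetical positively skewed scores (tail to the right).
positive = [1, 1, 1, 2, 2, 3, 4, 6, 10]
# Mirror image of the same scores: negatively skewed (tail to the left).
negative = [11 - x for x in positive]

for label, data in (("+vely skewed", positive), ("-vely skewed", negative)):
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{label}: mode={mode}, median={median}, mean={mean:.2f}")
```

The mean is pulled furthest towards the tail, the median less so, and the mode stays at the peak.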
Effects of skew on measures
of central tendency
More on Graphing (Visualising Data)
Edward Tufte
Graphs:
• Reveal data
• Communicate complex ideas with clarity, precision, and efficiency
Tufte's Guidelines 1
• Show the data
• Substance rather than method
• Avoid distortion
• Present many numbers in a small space
• Make large data sets coherent
Tufte's Guidelines 2
• Encourage eye to make comparisons
• Reveal data at several levels
• Purpose: Description, exploration,
tabulation, decoration
• Closely integrated with statistical and
verbal descriptions
Tufte’s Graphical Integrity 1
• Some lapses intentional, some not
• Lie Factor = (size of effect in graph) / (size of effect in data)
• Misleading uses of area
• Misleading uses of perspective
• Leaving out important context
• Lack of taste and aesthetics
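The Lie Factor can be worked through as a short calculation. The slide gives only the ratio; the sketch assumes that "size of effect" means the relative change between two values, and all numbers and function names are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second (assumed definition)."""
    return abs(second - first) / abs(first)

def lie_factor(graph_first, graph_second, data_first, data_second):
    """Size of effect shown in the graph divided by size of effect in the data."""
    return effect_size(graph_first, graph_second) / effect_size(data_first, data_second)

# Hypothetical example: the data rise from 100 to 150 (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
factor = lie_factor(graph_first=2, graph_second=6, data_first=100, data_second=150)
print(factor)  # 4.0
```

A factor close to 1 means the graphic represents the change in the data faithfully; here the graph overstates the change fourfold.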
Tufte's Graphical Integrity 2
• Trade-off between amount of
information, simplicity, and accuracy
• “It is often hard to judge what users will
find intuitive and how [a visualization]
will support a particular task” (Tweedie
et al.)
Chart scale
Cleveland’s Hierarchy
Volume
[Figure: Food Aid Received by Developing Countries. Bar chart of $ million in food aid (1988), scale 0 to 350, for Burkina Faso, Ethiopia, Mozambique, Kenya, Morocco, Bangladesh, India, Pakistan and Egypt]
Percentage of Doctors Devoted Solely to Family
Practice in California 1964-1990
Distortive Variations in Scale
Restricted Scales
Example Graphs Depicting the Relationship between Two Variables (Bivariate)
People Histogram
Separate Graphs
Example Graphs Depicting the Relationship between Three Variables (Multivariate)
Clustered bar chart
19th vs. 20th century causes of death
Demographic distribution of age
Where partners first met
Line graph
Causes of Mortality
Bivariate Normality
Examples of More Complex Graphs
Sea Temperature
Inferential Statistical Analysis Decision Making Tree
Links
• Presenting Data – Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html
• A Periodic Table of Visualisation Methods - http://www.visualliteracy.org/periodic_table/periodic_table.html
• Gallery of Data Visualization
• Univariate Data Analysis – The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm
• Pitfalls of Data Analysis - http://www.vims.edu/~david/pitfalls/pitfalls.htm
• Statistics for the Life Sciences - http://www.math.sfu.ca/~cschwarz/Stat301/Handouts/Handouts.html