Download Descriptive Statistics-Understanding Central Tendency, Shape and

Topics: Descriptive Statistics
• A road map
• Examining data through frequency
distributions
• Measures of central tendency
• Measures of variability
• The normal curve
• Standard scores and the standard normal
distribution
The Role of Description
• Description as a purpose of research
• Choosing the right statistical procedures
Raw Data: Overachievement Study
Frequency Distributions
• A method of summarizing and highlighting
aspects of the data in a data matrix, showing
the frequency with which each value occurs.
• Numerical Representations: a tabular
arrangement of scores
• Graphical Representations: a pictorial
arrangement of scores
Numerical Frequency Distributions
• Ungrouped Frequency Distributions
• Grouped Frequency Distributions
• Relative Frequency Distributions
• Cumulative Frequency Distributions
Tabular Frequency
Distributions
Single-Variable (“Univariate”)
Frequency Distribution: Major
MAJOR
Value Label     Value   Frequency   Percent   Valid Percent   Cum Percent
PHYSICS          1.00           5      12.5            12.5          12.5
CHEMISTRY        2.00           4      10.0            10.0          22.5
BIOLOGY          3.00           7      17.5            17.5          40.0
ENGINEERING      4.00           5      12.5            12.5          52.5
ANTHROPOLOGY     5.00           5      12.5            12.5          65.0
SOCIOLOGY        6.00           4      10.0            10.0          75.0
ENGLISH          7.00           7      17.5            17.5          92.5
DESIGN           8.00           3       7.5             7.5         100.0
                           ------    ------          ------
Total                          40     100.0           100.0
Valid cases 40     Missing cases 0
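The percent and cumulative percent columns follow directly from the frequencies. A minimal sketch, assuming the counts are typed in from the MAJOR table; the helper name `frequency_table` is hypothetical and not part of the original output:

```python
from itertools import accumulate

# Frequencies copied from the MAJOR table (label -> count), in value order.
major_counts = {
    "PHYSICS": 5, "CHEMISTRY": 4, "BIOLOGY": 7, "ENGINEERING": 5,
    "ANTHROPOLOGY": 5, "SOCIOLOGY": 4, "ENGLISH": 7, "DESIGN": 3,
}

def frequency_table(counts):
    """Return rows of (label, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    running_totals = accumulate(counts.values())
    return [
        (label, freq, 100 * freq / total, 100 * running / total)
        for (label, freq), running in zip(counts.items(), running_totals)
    ]

rows = frequency_table(major_counts)
for label, freq, percent, cum_percent in rows:
    print(f"{label:<14}{freq:>4}{percent:>8.1f}{cum_percent:>8.1f}")
```

With no missing cases, Percent and Valid Percent coincide, so the sketch computes a single percent column.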
Frequency Distribution: Major Group
MAJORGRP
Value Label            Value   Frequency   Percent   Valid Percent   Cum Percent
SCIENCE & ENGINEERIN    1.00          21      52.5            52.5          52.5
SOCIAL SCIENCE          2.00           9      22.5            22.5          75.0
HUMANITIES              3.00          10      25.0            25.0         100.0
                                  ------    ------          ------
Total                                 40     100.0           100.0
Frequency Distribution: SAT
SAT
Value      Frequency   Percent   Valid Percent   Cum Percent
1000.00            2       5.0             5.0           5.0
1025.00            1       2.5             2.5           7.5
1050.00            2       5.0             5.0          12.5
1060.00            1       2.5             2.5          15.0
1075.00            1       2.5             2.5          17.5
1080.00            1       2.5             2.5          20.0
1085.00            1       2.5             2.5          22.5
1090.00            2       5.0             5.0          27.5
1100.00            7      17.5            17.5          45.0
1120.00            2       5.0             5.0          50.0
1125.00            3       7.5             7.5          57.5
1130.00            1       2.5             2.5          60.0
1150.00            5      12.5            12.5          72.5
1160.00            2       5.0             5.0          77.5
1175.00            3       7.5             7.5          85.0
1185.00            1       2.5             2.5          87.5
1200.00            5      12.5            12.5         100.0
              ------    ------          ------
Total             40     100.0           100.0
Valid cases 40     Missing cases 0
Grouped Frequency Distribution: SAT
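A grouped distribution collapses the individual SAT values into class intervals. The sketch below is illustrative only: the interval width of 50 and the starting point of 1000 are assumptions, and the helper name `grouped_frequencies` is hypothetical.

```python
from collections import Counter

# Ungrouped SAT frequencies (score -> count) from the SAT frequency distribution.
sat_counts = {
    1000: 2, 1025: 1, 1050: 2, 1060: 1, 1075: 1, 1080: 1, 1085: 1,
    1090: 2, 1100: 7, 1120: 2, 1125: 3, 1130: 1, 1150: 5, 1160: 2,
    1175: 3, 1185: 1, 1200: 5,
}

def grouped_frequencies(counts, start, width):
    """Sum frequencies within class intervals [lower, lower + width)."""
    grouped = Counter()
    for score, freq in counts.items():
        lower = start + ((score - start) // width) * width
        grouped[lower] += freq
    return dict(sorted(grouped.items()))

# ASSUMPTION: intervals of width 50 starting at 1000.
grouped = grouped_frequencies(sat_counts, start=1000, width=50)
for lower, freq in grouped.items():
    print(f"{lower}-{lower + 49}: {freq}")
```

A different width or starting point would give a different, equally valid grouping; wider intervals smooth the distribution but hide more detail.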
Graphical Frequency Distributions
• Bar Graphs
• Histograms
• Stem and Leaf
• Frequency Polygons
• Pie Chart
Graphical Frequency
Distributions:
Single-Variable (“Univariate”)
Bar Chart: Major
[Bar chart: Frequency (vertical axis, 0 to 8) for each MAJOR (horizontal axis): PHYSICS, CHEMISTRY, BIOLOGY, ENGINEERING, ANTHROPOLOGY, SOCIOLOGY, ENGLISH, DESIGN]
Histogram: SAT
(From Grouped Data)
Frequency Polygon Overlay: SAT
(From Grouped Data)
Frequency Polygon: SAT
(From Grouped Data)
Frequency Polygon: SAT Scores
(From Ungrouped Data)
[Frequency polygon: Count (vertical axis, 0 to 8) by SAT (horizontal axis, 1000.00 to 1200.00)]
Cumulative Frequency Polygon: SAT
Scores
[Cumulative frequency polygon: Cumulative Frequency (vertical axis, 0 to 50) by SAT (horizontal axis, 1000.00 to 1200.00)]
Stem and Leaf: SAT
SAT Stem-and-Leaf Plot
Frequency    Stem &  Leaf
     3.00      10 .  002
     8.00      10 .  55678899
    13.00      11 .  0000000222223
    11.00      11 .  55555667778
     5.00      12 .  00000
Stem width:  100.00
Each leaf:   1 case(s)
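A stem-and-leaf plot like this one can be built by splitting each score into a stem (hundreds) and a leaf (the tens digit). The sketch below is one possible construction, not the statistical package's own routine; it assumes two lines per stem (leaves 0 to 4, then 5 to 9) and drops the units digit, and the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# SAT frequencies (score -> count) from the SAT frequency distribution.
sat_counts = {
    1000: 2, 1025: 1, 1050: 2, 1060: 1, 1075: 1, 1080: 1, 1085: 1,
    1090: 2, 1100: 7, 1120: 2, 1125: 3, 1130: 1, 1150: 5, 1160: 2,
    1175: 3, 1185: 1, 1200: 5,
}
scores = [score for score, freq in sat_counts.items() for _ in range(freq)]

def stem_and_leaf(values, stem_width=100, lines_per_stem=2):
    """Return rows of (frequency, stem, leaves) with one leaf per case."""
    leaf_unit = stem_width // 10
    rows = defaultdict(str)
    for value in sorted(values):
        stem = value // stem_width
        leaf = (value % stem_width) // leaf_unit   # tens digit; units are dropped
        line = leaf * lines_per_stem // 10         # 0 = leaves 0-4, 1 = leaves 5-9
        rows[(stem, line)] += str(leaf)
    return [(len(leaves), stem, leaves) for (stem, _), leaves in sorted(rows.items())]

plot = stem_and_leaf(scores)
for frequency, stem, leaves in plot:
    print(f"{frequency:>6.2f}    {stem} .  {leaves}")
```

Unlike a histogram, the plot keeps the leading digits of every score, so the shape and the individual values are visible at the same time.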
Graphical Frequency
Distributions
Two-Variable (“Joint” or “Bivariate”)
Relative Frequency Polygon: GPA
Comparison of Majors
[Relative frequency polygon: Percent (vertical axis, 0 to 40) by GPA (horizontal axis, 2.00 to 3.60), one line per MAJORGRP: SCIENCE & ENGINEERIN, SOCIAL SCIENCE, HUMANITIES]
Relative Frequency Polygon: GPA
Comparison of Gender
[Relative frequency polygon: Percent (vertical axis, 0 to 30) by GPA (horizontal axis, 2.00 to 3.60), one line per SEX: MALE, FEMALE]
What Can Be Seen in Frequency
Distributions
• Shape
• Central Tendency
• Variability
Shapes of Frequency Polygons
Shapes of Distributions
• SYMMETRIC: Bell-shaped. Prototype: Normal Distribution
• LEPTOKURTIC: Very peaked in the center compared to Normal Distribution
• NEGATIVELY SKEWED: Hump in distribution at high score end; tail at low score end
• POSITIVELY SKEWED: Hump in distribution at low score end; tail at high score end
• MESOKURTIC: Peak just like the Normal Distribution
• PLATYKURTIC: Flat in the center compared to Normal Distribution
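These shape labels can also be checked numerically. The sketch below uses the moment-based skewness and excess kurtosis coefficients, a common convention that the slides themselves do not name; all four data sets are made-up illustrations, not study data.

```python
def central_moment(values, k):
    """k-th central moment: mean of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Positive: tail at the high end. Negative: tail at the low end."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """About 0 for a normal curve; above 0 is more peaked, below 0 is flatter."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical scores chosen only to illustrate each shape.
low_hump = [1, 1, 1, 2, 2, 3, 4, 8]      # hump at low end, tail at high end
high_hump = [9 - v for v in low_hump]    # mirror image of low_hump
flat = list(range(1, 11))                # every value equally frequent
peaked = [5] * 8 + [1, 9]                # most cases piled in the center

print("skewness:", skewness(low_hump), skewness(high_hump))
print("excess kurtosis:", excess_kurtosis(flat), excess_kurtosis(peaked))
```

In this sketch the sign of each coefficient matches the verbal labels: positive skewness for a tail at the high score end, negative excess kurtosis for a flat center.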
Descriptive Statistics
• Central Tendency
– Mode
– Median
– Mean
• Variability
– Range
– Standard Deviation
– Variance
Definitions:
Measures of Central Tendency
• Mean:
– “Arithmetic mean”
– “Center of gravity” such that the “weight” of the scores
above the mean exactly balances the “weight” of the
scores below the mean
• Median:
– The number that lies at the midpoint of the distribution
of scores; divides the distribution into two equal halves
• Mode:
– Most frequently occurring score
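The three measures can be computed from the SAT frequency distribution with Python's standard `statistics` module. This is a minimal sketch; the scores are expanded from the frequency counts, and `multimode` is used so that a distribution with several modes reports all of them.

```python
import statistics

# SAT frequencies (score -> count) from the SAT frequency distribution.
sat_counts = {
    1000: 2, 1025: 1, 1050: 2, 1060: 1, 1075: 1, 1080: 1, 1085: 1,
    1090: 2, 1100: 7, 1120: 2, 1125: 3, 1130: 1, 1150: 5, 1160: 2,
    1175: 3, 1185: 1, 1200: 5,
}
# Expand the table back into one entry per case (40 scores).
scores = [score for score, freq in sat_counts.items() for _ in range(freq)]

mean = statistics.mean(scores)        # balance point of the scores
median = statistics.median(scores)    # midpoint: average of the 20th and 21st scores
modes = statistics.multimode(scores)  # every most-frequent score, as a list

print(f"mean={mean:.2f} median={median:.2f} modes={modes}")
```

With an even number of cases, the median is the average of the two middle scores, which is why it can be a value that no student actually obtained.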
Mean, Median, Mode: SAT Scores by Gender
Group     Mode    Median     Mean
Male      1200    1112.50    1112.00
Female    1100    1122.50    1129.50
Total     1100    1122.50    1120.75
Mean, Median, Mode: SAT Scores by Area
Group              Mode         Median     Mean
Humanities         1100         1092.50    1095.00
Social Sciences    1100         1100.00    1108.89
Sciences           1150,1200    1150.00    1138.10
Total              1100         1122.50    1120.75
Relative Position of Mode, Median,
and Mean
Definitions:
Measures of Variability
• Range:
– Difference between highest and lowest score
• Inter-quartile Range:
– The spread of the middle 50% of the scores
– The difference between the upper quartile (Q3, the score below which 75% of
the scores fall) and the lower quartile (Q1, the score below which 25% fall)
• Standard Deviation:
– The typical dispersion or deviation of scores around the mean (measured
in original score units); the square root of the variance
• Variance:
– The average squared deviation of scores from the mean (measured in squared
units of the original scores); the square of the standard deviation
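These measures can be computed from the SAT frequency distribution with the standard `statistics` module. The sketch assumes the sample formulas (dividing by n - 1) and the module's default quartile method; other software may use a slightly different quartile rule and give different quartiles on other data.

```python
import statistics

# SAT frequencies (score -> count) from the SAT frequency distribution.
sat_counts = {
    1000: 2, 1025: 1, 1050: 2, 1060: 1, 1075: 1, 1080: 1, 1085: 1,
    1090: 2, 1100: 7, 1120: 2, 1125: 3, 1130: 1, 1150: 5, 1160: 2,
    1175: 3, 1185: 1, 1200: 5,
}
scores = [score for score, freq in sat_counts.items() for _ in range(freq)]

score_range = max(scores) - min(scores)          # highest minus lowest score
q1, _, q3 = statistics.quantiles(scores, n=4)    # quartile cut points
iqr = q3 - q1                                    # spread of the middle 50%
variance = statistics.variance(scores)           # sample variance (n - 1)
sd = statistics.stdev(scores)                    # square root of the variance

print(f"range={score_range} IQR={iqr} SD={sd:.2f} variance={variance:.2f}")
```

The range depends only on the two most extreme scores, while the inter-quartile range ignores them entirely, which is why the two can tell different stories about the same group.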
Range, Interquartile Range, and Standard Deviation: SAT Scores by Area
Group              Range    IQ Range    Standard Deviation
Humanities         200      35.00       55.88
Social Sciences    95       15.00       28.59
Sciences           200      27.50       57.00
Range, Interquartile Range, and Standard Deviation: SAT Scores by Gender
Group      Range    IQ Range    Standard Deviation
Males      200      100         60.92
Females    175      75          46.02
Total      200      70          54.02
Properties of Normal Distribution
• Bell-shaped (unimodal)
• Symmetric about the mean
• Mode, median, and mean are equal (though this
rarely occurs exactly in real data)
• Asymptotic (curve never touches the
abscissa)
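Normal Curve
Areas Under the Curve
• Mean to ±1s: .3413 on each side (about 68% of scores within ±1s)
• ±1s to ±2s: .1359 on each side (about 95% within ±2s)
• ±2s to ±3s: .0214 on each side (about 99% within ±3s)
• Beyond ±3s: .0013 on each side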
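The areas under the normal curve can be reproduced from the cumulative distribution function of the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_between` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def area_between(low, high):
    """Proportion of the curve lying between two points measured in SD units."""
    return standard_normal.cdf(high) - standard_normal.cdf(low)

mean_to_1s = area_between(0, 1)
from_1s_to_2s = area_between(1, 2)
from_2s_to_3s = area_between(2, 3)
beyond_3s = 1 - standard_normal.cdf(3)

print(f"{mean_to_1s:.4f} {from_1s_to_2s:.4f} {from_2s_to_3s:.4f} {beyond_3s:.4f}")
for k in (1, 2, 3):
    print(f"within ±{k}s: {area_between(-k, k):.4f}")
```

Because the curve is symmetric, each area on the right of the mean has an equal counterpart on the left, so the proportion within ±1s is twice the area from the mean to +1s.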
Definitions: Standard Scores
• Standard Scores: scores expressed as the number of SDs
they lie away from the mean (z-scores)
• Obtained by finding how far a score is
above or below the mean and dividing that
difference by the SD
• Changes mean to 0 and SD to 1, but does
not change the shape (when the original scores are
normally distributed, the result is called the Standard
Normal Distribution)
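The conversion to standard scores can be sketched in a few lines. This example uses the 40 SAT scores from the frequency distribution and assumes the sample SD (n - 1); the function name `z_scores` is hypothetical.

```python
import statistics

# SAT frequencies (score -> count) from the SAT frequency distribution.
sat_counts = {
    1000: 2, 1025: 1, 1050: 2, 1060: 1, 1075: 1, 1080: 1, 1085: 1,
    1090: 2, 1100: 7, 1120: 2, 1125: 3, 1130: 1, 1150: 5, 1160: 2,
    1175: 3, 1185: 1, 1200: 5,
}
scores = [score for score, freq in sat_counts.items() for _ in range(freq)]

def z_scores(values):
    """Express each value as its distance from the mean in SD units."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]

standardized = z_scores(scores)

# After standardizing, the mean is 0 and the SD is 1 (up to rounding error).
print(f"mean of z = {statistics.mean(standardized):.4f}")
print(f"SD of z   = {statistics.stdev(standardized):.4f}")
print(f"z for the highest score = {max(standardized):.2f}")
```

Standardizing only shifts and rescales the scores, so a skewed set of raw scores remains skewed after conversion.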
Uses of Standard Normal
Distribution
• What proportion of scores falls between the mean
and a given raw score
• What proportion of scores falls above or below a
given raw score
• What proportion of scores falls between two raw
scores
• What raw score falls above (or below) a certain
percentage of scores
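Each of these questions reduces to a lookup in the normal distribution. The sketch below uses `statistics.NormalDist`; the mean of 1100 and SD of 50 are illustrative assumptions chosen for round numbers, not values from the study, and the approach only applies when scores are roughly normally distributed.

```python
from statistics import NormalDist

# ASSUMPTION: hypothetical scores modeled as normal with mean 1100 and SD 50.
scores = NormalDist(mu=1100, sigma=50)

# Proportion between the mean and a given raw score (1150 is 1 SD above).
between_mean_and_1150 = scores.cdf(1150) - scores.cdf(scores.mean)

# Proportion above or below a given raw score.
below_1150 = scores.cdf(1150)
above_1150 = 1 - below_1150

# Proportion between two raw scores (1050 is 1 SD below, 1200 is 2 SD above).
between_1050_and_1200 = scores.cdf(1200) - scores.cdf(1050)

# Raw score that falls above 90% of scores (the 90th percentile).
score_at_90th_percentile = scores.inv_cdf(0.90)

print(f"{between_mean_and_1150:.4f} {above_1150:.4f} "
      f"{between_1050_and_1200:.4f} {score_at_90th_percentile:.2f}")
```

The same answers could be read from a printed z table by first converting each raw score to a z-score; `cdf` and `inv_cdf` simply perform that conversion and lookup in one step.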