Measures of Central Tendency and Dispersion

Deviations from Normality
Skewness and Kurtosis
Why do we care if the distribution is not normal?
• It helps you understand how a characteristic
exhibits itself in your sample or in the population.
• It impacts what descriptive statistics you might
use.
• It impacts the inferential statistics you might use.
Skewness
The majority of scores do not fall in the middle of
the distribution.
The distribution is asymmetrical
You label the kind of skew according to the longer
tail of the distribution.
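As a rough sketch of how the direction of skew can be checked numerically, the snippet below computes a population skewness coefficient (the mean cubed z-score). The quiz scores are hypothetical values invented for illustration, and the function name `skewness` is an assumption of this sketch, not something from the slides.

```python
from statistics import fmean, median, pstdev

def skewness(scores):
    """Population skewness: the average cubed z-score of the data."""
    m = fmean(scores)
    sd = pstdev(scores)
    return sum(((x - m) / sd) ** 3 for x in scores) / len(scores)

# Hypothetical quiz scores, invented for illustration only.
hard_quiz = [5, 10, 10, 15, 15, 15, 20, 25, 40, 90]  # long tail toward high scores
easy_quiz = [100 - x for x in hard_quiz]              # mirror image: long tail toward low scores

for name, scores in [("hard quiz", hard_quiz), ("easy quiz", easy_quiz)]:
    label = "positive" if skewness(scores) > 0 else "negative"
    print(name, "->", label, "skew; mean", fmean(scores), "median", median(scores))
```

In these made-up data the mean is pulled toward the longer tail: it sits above the median for the positive skew and below it for the negative skew.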
Normal versus Skewed Distributions
[Figure: frequency curves for a normal and two skewed distributions. Source: en.wikipedia.org/wiki/Image:Standard_deviation_diagram.png]
Positive skew: Long tail is on the positive end
Negative skew: Long tail is on the negative end
Normal Distribution: Mean, Median, and Mode fall at the same point
What Positive Skew Means
If this represented the results of a quiz, the majority of the participants scored very low; almost no one scored in the highest range.
This test must have been very easy or difficult?
[Figure: positively skewed distribution of quiz scores from 0 to 100, with the Mode, Median, and Mean marked]
What Negative Skew Means
If this represented the results of a quiz, the majority of the participants scored high; almost no one scored in the lowest range.
This test must have been very easy or difficult?
[Figure: negatively skewed distribution of quiz scores from 0 to 100, with the Mean, Median, and Mode marked]
Normal vs. Skewed Distributions
Income Distributions
http://www.city-data.com/city/Chicago-Illinois.html
Kurtosis
• Leptokurtic: a "leaping" leptokurtic distribution
• Platykurtic: platykurtic like a platypus
[Figure: leptokurtic, normal (mesokurtic), and platykurtic distributions, each drawn on a 0 to 100 scale]
Sprinthall - Quick Kurtosis Rule
When you have a distribution and know its standard deviation and range, you can estimate its kurtosis.
Fact: For a normal distribution, the standard deviation is about 1/6 of the range.
• If the standard deviation is more than 1/6 of the range, then a distribution is platykurtic.
• If the standard deviation is less than 1/6 of the range, then a distribution is leptokurtic.
How you determine this:
Range / 6 = Standard Value (SV)
Compare actual sd to SV
If sd > SV, platykurtic
If sd < SV, leptokurtic
Dr. Bellini’s MCC Research
47/6 = SV
SV = 7.83
8.98 > 7.83
Platykurtic
Dr. Bellini’s MCC Research
19/6 = SV
SV = 3.17
3.52 > 3.17
Platykurtic
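The two worked examples above can be reproduced with a few lines of Python. This is only a sketch of the quick rule as stated on the slide; the function name `quick_kurtosis` is hypothetical, and returning "mesokurtic" when sd equals SV exactly is an assumption of this sketch, since the slide covers only the greater-than and less-than cases.

```python
def quick_kurtosis(sd, value_range):
    """Compare the standard deviation with range / 6 (the standard value, SV)."""
    sv = value_range / 6
    if sd > sv:
        return sv, "platykurtic"
    if sd < sv:
        return sv, "leptokurtic"
    # Assumption: an exact tie is treated as roughly normal (mesokurtic).
    return sv, "mesokurtic"

# (sd, range) pairs taken from the two examples on the slides.
for sd, value_range in [(8.98, 47), (3.52, 19)]:
    sv, label = quick_kurtosis(sd, value_range)
    print(f"range {value_range}: SV = {sv:.2f}, sd = {sd} -> {label}")
```

Both examples come out platykurtic, matching the slides: 8.98 > 7.83 and 3.52 > 3.17.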
Skew
[Figure: two histograms of Count, one for Beginning Salary ($20,000 to $60,000) and one for Current Salary ($25,000 to $125,000)]
C:\Program Files\SPSS\University of Florida graduate salaries.sav
Statistics
Starting Salary
N                 1100
Mean              26064.20
Median            26000.00
Mode              20000
Std. Deviation    6967.982
Range             58300
[Figure: histogram of Starting Salary, Count by salary from 10000 to 60000]
Measurement Scales
Measurement
• Assigning numbers to observations
following a set of rules.
How are numbers assigned to observations?
What scale is used?
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Nominal Data
• Using numbers to label categories, but the
numbers have no inherent numerical qualities
• Male = 1
Female = 2
• social security number
• jersey numbers
• race/ethnicity
Other examples of nominal scaling
• Whether a participant does or does not have a
driver’s license (0,1)
• Whether the participant belongs to the
experimental group or the control group (0,1)
• The school the participant attends (1,2,3,4,5)
Uses of nominal data
• Generally, the most you can do with nominal data
is count it.
Categorical or Continuous?
• Is a variable that uses a nominal scale of
measurement categorical or continuous?
Ordinal Scaling
• The assigned number provides information about
the rank of an observation.
• Ordinal scales put observations in order.
• Rating scales are often considered ordinal in how
they measure characteristics.
Example
Statistical Thinking is my favorite class
1 Strongly agree
2 Agree (X)
3 Neither agree nor disagree
4 Disagree
5 Strongly disagree
Hospital pain scales (1-10)
Ordinal Scales have two basic rules:
1. Equality/non-equality rule
2. Greater-than-or-less-than rule
Ranking of Tennis Players
http://sports.espn.go.com/sports/tennis/rankings?sport=WOMRANK
Uses of ordinal data
You can express if something is greater than or
less than (but you can’t express how much greater
than or less than).
< >
• Strongly agreeing is more than simply agreeing
• Being tenth in the class is lower than being ninth
How much lower?
Issues
There are limitations on what you can do statistically
with ordinal data
Degree of violation
Consensus
NOIR Interval Scaling
Satisfies all the requirements of an ordinal scale
(there is a high to low structure to the scale) and
• The intervals between the points on the scale
become meaningful because the distances between
successive points on an interval scale are equal.
Examples
• Degrees Fahrenheit or Celsius
• Calendar years
Interval scales do not have a meaningful zero point.
They may contain a zero point, but it is arbitrary.
NOIR Ratio
• Satisfies all the requirements of an interval scale
• There is a real and meaningful zero point on a
ratio scale
• Weight, height, heart rate, breaths per minute,
degrees Kelvin, annual income, miles per hour,
pulse, etc.
HINT: If your scale has negative numbers, like with
temperature, then it is interval but it probably isn’t
ratio.
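A small sketch of why the zero point matters: on an interval scale, differences are meaningful but ratios are not. The two temperatures below are arbitrary illustrative values, and the helper names `c_to_f` and `c_to_k` are assumptions of this sketch.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    """Convert degrees Celsius to kelvins (a ratio scale with a true zero)."""
    return celsius + 273.15

cool, warm, hot = 10.0, 20.0, 30.0  # arbitrary example temperatures in Celsius

# Equal intervals stay equal after changing interval scales.
print(c_to_f(warm) - c_to_f(cool), c_to_f(hot) - c_to_f(warm))

# Ratios do not survive: "twice as hot" in Celsius is not twice in Fahrenheit.
print(warm / cool)
print(c_to_f(warm) / c_to_f(cool))

# Only the ratio computed on the Kelvin scale is physically meaningful.
print(f"{c_to_k(warm) / c_to_k(cool):.3f}")
```

The Celsius ratio of 2.0 depends entirely on where that scale places its arbitrary zero, which is why ratio statements are reserved for ratio scales.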
Our Survey
[Figure: class survey items, each labeled with its scale of measurement: R, N, O, N, N, R, N, O, O, R]
Measures of Central Tendency and Dispersion
Describing data
Describing Data vs. Describing People
Measures of Central Tendency
•Mean (M or X̄)
•Median (Md)
•Mode (Mo)
Issues with Describing Data
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 6 6 6 6
n = 34
Mean = 3.15 ???
What does that mean?
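One way to probe that question is to tabulate the 34 scores and compare the mean with where the scores actually pile up. The sketch below rebuilds the data set from its frequencies and uses only the Python standard library.

```python
from collections import Counter
from statistics import fmean, median, multimode

# The 34 scores from the slide, rebuilt from their frequencies.
scores = [1] * 5 + [2] * 10 + [3] * 4 + [4] * 9 + [5] * 2 + [6] * 4

print("n =", len(scores))
print("mean =", round(fmean(scores), 2))
print("median =", median(scores))
print("mode(s) =", multimode(scores))

# Frequency table: how many times each score occurs.
for value, count in sorted(Counter(scores).items()):
    print(value, "#" * count)
```

The mean of 3.15 lands next to the score 3, which only 4 of the 34 cases share; the scores cluster at 2 and at 4 instead, so the mean alone describes this data set poorly.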
The Mean
Most common synonym is the “average”
But what does the mean “mean”?
Definitions:
• “arithmetic average”
• a descriptor of the center of the data, when data
are distributed “normally”
When a mean value is most useful
• To simply summarize a data set that is normally
distributed
• To summarize data from a sample that can be used
to estimate information about an entire population
When a mean can be less useful as a descriptor
• To summarize data that is skewed (especially
when it is used as the only descriptor)
• To summarize data where there is an outlier
• To summarize data measured using a nominal
scale
Mean and Skew- Find the mean annual income
$7,200
$9,011
$20,074
$24,999
$36,567
$32,145
$54,158
$567,987
Mean = $94,018
How well does this number represent a measure of the center of our data set?
Median
a.k.a. Middle score in a ranked set of scores.
It divides the distribution of scores into equal halves.
When there is an odd number of scores:
1 2 2 3 5 5 7 8 10 15 16 16 21
When there is an even number of scores:
4 5 7 15 16 19 31 32
Average = (15+16)/2 = 15.5 Median Score
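The odd and even cases can be written out as a short function. This is a minimal sketch of the rule described above; the name `median_score` is hypothetical.

```python
def median_score(scores):
    """Return the middle score of a ranked set of scores."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the single middle score.
        return ranked[mid]
    # Even number of scores: average the two middle scores.
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median_score([1, 2, 2, 3, 5, 5, 7, 8, 10, 15, 16, 16, 21]))  # 7
print(median_score([4, 5, 7, 15, 16, 19, 31, 32]))                 # 15.5
```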
Mean and Skew- Find the mean annual income
$7,200
$9,011
$20,074
$24,999
$32,145
$36,567
$54,158
$567,987
Median = $28,572 (compare with the mean of $94,018)
How well does this number represent a measure of the center of our data set?
[Figure: histogram of the incomes (VAR00002), Count by income from 10000.00 to 50000.00]
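The two income slides can be checked side by side. The sketch below uses the eight incomes listed on the slides and the standard `statistics` module; the variable names are illustrative.

```python
from statistics import fmean, median

# The eight annual incomes from the slides.
incomes = [7200, 9011, 20074, 24999, 36567, 32145, 54158, 567987]

mean_income = fmean(incomes)
median_income = median(incomes)

print(f"mean   = ${mean_income:,.0f}")
print(f"median = ${median_income:,.0f}")

# How many incomes fall below the mean?
below_mean = sum(1 for x in incomes if x < mean_income)
print(below_mean, "of", len(incomes), "incomes are below the mean")
```

A single very large income pulls the mean to $94,018, a value that seven of the eight incomes fall below, while the median of $28,572 stays inside the main cluster of values.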
Mode
Most frequently occurring score in a group of scores.
Exam scores:
78 85 92 55 87 85
98 84 71 88 85 78
65 99 100 85 62 100
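Counting how often each exam score occurs makes the mode easy to read off. A minimal sketch with the 18 exam scores from the slide, using `collections.Counter` from the standard library:

```python
from collections import Counter

# The 18 exam scores from the slide.
exam_scores = [78, 85, 92, 55, 87, 85, 98, 84, 71,
               88, 85, 78, 65, 99, 100, 85, 62, 100]

counts = Counter(exam_scores)
mode, frequency = counts.most_common(1)[0]
print("mode =", mode, "occurring", frequency, "times")  # mode = 85 occurring 4 times
```

Note that 78 and 100 each occur twice, so they are repeated scores but not modes.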
Measures of central tendency
Guide to which measures of central tendency are appropriate to use
with each scale of measurement:
            Mean    Median  Mode
Nominal                     X
Ordinal     (X)?    X       X
Interval    X       X       X
Ratio       X       X       X
Mean, Median and/or Mode
Class Survey Age Variable
Describe this dataset in terms of its
• Mean
• Median
• Mode
• Range
[Figure: histogram of age, Count by age from 30 to 50]
Descriptive Statistics
                      N     Range   Mean    Std. Deviation
age                   14    37      36.79   10.714
Valid N (listwise)    14
Class Survey Commute Variable
Describe this dataset in terms of its
• Mean
• Median
• Mode
• Range
What about this data set makes it difficult to describe using measures of central tendency?
[Figure: histogram of Commute (travel), Count by travel from 10.0 to 60.0]
Descriptive Statistics
                      N     Range   Mean     Std. Deviation
travel                14    64.5    10.914   16.3124
Valid N (listwise)    14
Giving more information, such as how skewed this variable is, would be helpful in the description.