Download Measures of Central Tendency, Spread, and Shape

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Transcript
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Nominal Scaling
Measures of Central Tendency, Spread, and
Shape
• The lowest level of scaling
• In Nominal Scaling, each of the values serves as a
representation.
• Each observation belongs to one mutually exclusive category
Dr. J. Kyle Roberts
and has no logical order
• Examples:
• Gender
• Ethnicity
• School
Southern Methodist University
Simmons School of Education and Human Development
Department of Teaching and Learning
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Ordinal Scaling
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Interval Scaling
• In Interval Scaling, each of the values has a specific order that
• In Ordinal Scaling, each of the values is in rank order.
reflects equal differences between observations.
• Each observation belongs to one mutually exclusive category,
• Each observation belongs to one mutually exclusive category,
but we now have logical order.
• Examples:
with logical order, and equal differences between each of the
points and no absolute zero.
• Examples:
• Letter Grades
• Place Finished in a Race
• Likert-type Scaling
• Temperature (Fahrenheit and Celcius)
• IQ Scores
• SAT/GRE Scores
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Ratio Scaling
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Measures of Central Tendency
• In Ratio Scaling, each of the values has a specific order that
• The measures of central tendency try and give us a picture of
reflects equal differences and a ”true” zero.
what is going on at the middle of the distribution
• Each observation belongs to one mutually exclusive category,
with logical order, equal differences between each of the
points, and has a ”true” zero.
• Examples:
• Kelvin Scale
• Height and Weight
• Speed
Levels of Scale
Central Tendency
• There are 3 types of measures of central tendency
• The mode - this is the most frequently appearing score
• The median - this is also called the ”middle” score
• The mean - this is also called the ”average” score
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Outline
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
The Mode
• The mode is the most frequently appearing score
• There can be as many modes as there are pieces of data in a
dataset
• Datasets with two modes are usually referred to as bimodal
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Representing the Mode
• X = 1,2,2,3,3,3,4,4,5
• Y = 1,1,1,2,2,3,4,4,4,5
Histogram of Y Data
30
30
Percent of Total
Percent of Total
25
20
10
20
15
10
5
0
0
1
2
3
4
5
1
2
x
Levels of Scale
Central Tendency
3
4
5
y
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
The Median
• The median is often called the ”middle” score
• To compute the median, you first arrange the scores in
ascending order
• For datasets with an odd number of scores, the median is the
middle score.
• For datasets with an even number of scores, the median is the
score halfway between the two middle scores
• If X = 1,2,3,4,10, the median would be 3
• If X = 1,2,3,10, the median would be 2.5
• If X = 3,2,10,1, the median would still be 2.5
Confidence Intervals
Measures of Shape
Confidence Intervals
Measures of Shape
Outline
• Suppose we have 2 separate datasets
Histogram of X Data
Measures of Spread/Variation
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Levels of Scale
Central Tendency
Measures of Spread/Variation
Outline
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
The Mean
X=
i=1
i=1 (Xi )
n
• Given the dataset X = 1,2,3,4
X=
Levels of Scale
Central Tendency
1+2+3+4
10
=
= 2.5
4
4
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Range
• The range is used to describe the number of units on the scale
of measurement
• It is computed as:
• (Highest score - Lowest score)
• Suppose we have the dataset X = 1,2,3,4, the range would
be:
• (4 - 1) = 3
Measures of Shape
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Pn
(Xi )/n =
Confidence Intervals
Outline
• The mean is also called the ”average” score
n
X
Measures of Spread/Variation
Levels of Scale
Central Tendency
Measures of Spread/Variation
Outline
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Standard Deviation
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Computing the Standard Deviation
• Suppose that we have a dataset where X = 1,2,3,4
r
• The standard deviation is also sometimes referred to as the
(1 − 2.5)2 + (2 − 2.5)2 + (3 − 2.5)2 + (4 − 2.5)2
4−1
r
2
2
(−1.5) + (−0.5) + (0.5)2 + (1.5)2
=
3
r
2.25 + 0.25 + 0.25 + 2.25
=
3
r
5
=
3
√
= 1.67
SDX =
mean deviation
• The SD represents the ”average distance” of each score from
the mean
s
Pn
SDX =
− X)2
n−1
i=1 (Xi
= 1.29
Levels of Scale
Central Tendency
Measures of Spread/Variation
Outline
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Z-Scores
• As opposed to the other measure of variation, the z-score is a
measure of deviation for a single individual, as opposed to a
group of scores.
• The Z-score represents the number of Standard Deviation
units a given piece of data is from the mean.
Zi =
Xi − X
SDX
Raw Score
1
2
3
4
Z-Score
-1.16
-0.39
0.39
1.16
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Outline
Central Tendency
Confidence Intervals
Measures of Shape
Standard Error of the Mean
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Levels of Scale
Measures of Spread/Variation
• The standard error of the mean is really just computed for the
purposes of computing Confidence Intervals
SDX
SEMX = √
n
• For our dataset, X = 1,2,3,4 ,
1.29
SEMX = √
4
= 0.645
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Confidence Intervals
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Area Under the Curve
• Suppose that we have a uniform normal distribution of
0.3
0.3
0.4
95% area at z=1.95
0.4
68.26% area at z=1
population scores between -1 and +1 [U (−1, 1)] and we take
many samples of n=10.
4
0.2
6
dnorm(x, 0, 1)
8
0.2
dnorm(x, 0, 1)
Percent of Total
10
0
[1] 6.570396e-05
> sd(res)
[1] 0.1825853
0.0
0.5
1.0
0.0
> mean(res)
−0.5
0.0
−1.0
0.1
0.1
2
−3
−2
−1
0
x
1
2
3
−3
−2
−1
0
x
1
2
3
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Computing 95% Confidence Intervals
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Graphing 95% Confidence Intervals
8
6
Let’s look back at our previous dataset where X=1,2,3,4. In this
case, we noted that the mean of X = 2.5 and the SD of X = 1.29.
This would make the SEM = 1.29/2 = 0.645. 95% confidence
intervals for any dataset are computed as:
> x <- c(1, 2, 3, 4)
> y <- c(1, 3, 5, 7, 9)
> plotmeans(c(x, y) ~ rep(c("x", "y"), c(4, 5)))
c(x, y)
= 2.5 ± 1.96 ∗ 0.645
= 2.5 ± 1.264
●
4
CI95% = X ± 1.96 ∗ SEM
2
●
n=4
n=5
x
y
rep(c("x", "y"), c(4, 5))
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Thinking Through Confidence Intervals
13
12
11
●
●
Given the properties of the above two studies, which study would
have the LARGER confidence intervals?
7
8
Thought Question
> study1 <- rnorm(30, 10, 5)
> study2 <- rnorm(30, 10, 10)
> plotmeans(c(study1, study2) ~ rep(c("Study 1",
+
"Study 2"), c(30, 30)))
10
Study 2
10
10
30
Confidence Intervals
9
mean
SD
n
Study 1
10
5
30
Measures of Spread/Variation
Thinking Through Confidence Intervals
c(study1, study2)
Suppose that we have two different datasets with the following
properties:
Central Tendency
n=30
n=30
Study 1
Study 2
rep(c("Study 1", "Study 2"), c(30, 30))
Measures of Shape
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Thinking Through Confidence Intervals
Confidence Intervals
Measures of Shape
Thought Question
12
11
●
●
10
c(study1, study2)
13
14
Study 2
10
10
30
> study1 <- rnorm(30, 10, 10)
> study2 <- rnorm(300, 10, 10)
> plotmeans(c(study1, study2) ~ rep(c("Study 1",
+
"Study 2"), c(30, 300)))
9
Study 1
10
10
300
Measures of Spread/Variation
Thinking Through Confidence Intervals
Suppose that we have two different datasets with the following
properties:
mean
SD
n
Central Tendency
8
Given the properties of the above two studies, which study would
have the LARGER confidence intervals?
n=30
n=300
Study 1
Study 2
rep(c("Study 1", "Study 2"), c(30, 300))
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Outline
Confidence Intervals
Specific Normal Distributions
N(10,1)
N(10,4)
N(12,1)
0.4
0.3
f(x)
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Measures of Spread/Variation
0.2
0.1
0.0
6
8
10
x
12
14
Measures of Shape
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
The standard normal distribution
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Evaluating the Normal Distribution
• The standard normal distribution is the one with µ = 0 and
σ = 1. We often use Z to denote it, i.e. Z ∼ N (0, 12 ).
• Its density function is often written φ(z) and
1
2
φ(z) = √ e−z /2
2π
• As given from the previous slide, we assume that the normal
distribution has a structure such that mean meadian mode.
• We can also evaluate the shape of our distribution relative to
it’s deviation from the standard normal distribution through
multiple means.
−∞<z <∞
0.4
• Skewness - a measure of shape determining whether or not the
”left half” looks like the ”right half”
0.3
• Kurtosis - a measure of shape determining relative hieght to
width
0.2
• Gaussian probability density plot
0.1
0.0
−3
Levels of Scale
−2
Central Tendency
−1
0
Measures of Spread/Variation
1
2
Confidence Intervals
3
Measures of Shape
Levels of Scale
Central Tendency
Outline
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Skewness
• Skewness measures typically range from -3 to +3
• A skewness of 0 means that the left half looks exactly like the
right half
• Positively skewed means that the mean is influenced by
outliers making it ”seem” larger than the relative mean of the
population from which the sample was drawn
n n
X
X
n
xi − x̄ 3
n
skew =
=
zi3
(n − 1)(n − 2)
SDx
(n − 1)(n − 2)
i=1
i=1
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Skewness Examples - Normal Distribution
Histogram for Normally Distributed Data
QQ plot for a Normal Distribution
Histogram for Positively Skewed Data
10
5
●●●●
●
●
●
●
●
●
●
●
100
80
60
0
●
50
QQ plot for Positively Skewed Data
100
−2
0
0
2
Measures of Spread/Variation
Measures of Shape
QQ plot for Negatively Skewed Data
●
●●●●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
0
25
rskt(200, 2, −2)
20
15
−5
●
●
−10
10
●
●●
●
5
−15
●
0
●
−15
−10
−5
rskt(200, 2, −2)
0
−3
−2
−1
0
qnorm
20
40
60
80
−3
rskt(200, 2, 2)
Confidence Intervals
30
●
0
qnorm
Histogram for Negatively Skewed Data
●
1
2
3
Levels of Scale
Central Tendency
●
●
10
20
Skewness Examples - Negatively Skewed Distribution
−20
40
0
150
Central Tendency
●
20
●
●●
●●●●
rnorm(1000, 100, 15)
Measures of Shape
30
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
120
Confidence Intervals
60
●
Percent of Total
rnorm(1000, 100, 15)
Percent of Total
Measures of Spread/Variation
●
140
Percent of Total
Central Tendency
Skewness Examples - Positively Skewed Distribution
15
Levels of Scale
Levels of Scale
rskt(200, 2, 2)
Levels of Scale
● ●
●●●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
−2
−1
0
1
2
3
qnorm
Measures of Spread/Variation
Outline
Levels of Scale
Central Tendency
The Mode
The Median
The Mean
Measures of Spread/Variation
Range
Standard Deviation
Z-Scores
Standard Error of the Mean
Confidence Intervals
Measures of Shape
The Normal Distribution
Skewness
Kurtosis
Confidence Intervals
Measures of Shape
Levels of Scale
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Levels of Scale
Kurtosis
Central Tendency
Measures of Spread/Variation
Confidence Intervals
Measures of Shape
Graphical Examples of Kurtosis
• Although all of these theoretically could be normal, they
illustrate leptokurtic and platykurtic distributions.
• Kurtosis measures typically range from -3 to +3
Normal
Platykurtic
Leptokurtic
• A kurtosis of 0 means that your distribution has the same
0.4
relative height to width properties as a normal distribution
• Positive kurtosis means that your distribution is leptokurtic
(taller and skinnier)
0.3
• Negative kurtosis means that your distribution is platykurtic
kurt =
n(n + 1)
(n − 1)(n − 2)(n − 3)
f(x)
(shorter and fatter)
n X
i=1
xi − x̄
SDx
4
−
0.2
1)2
3(n −
(n − 2)(n − 3)
0.1
0.0
6
8
10
x
12
14