Measures of Center and Spread

What to look for in a distribution
1) Look for the overall pattern and for deviations from the pattern
• See if the distribution has a shape we can describe
in a few words
• Describe the center and spread of the distribution
2) One common deviation from the overall pattern in any graph
of data is an outlier, i.e., an observation that falls outside the overall
pattern of the graph
What to look for in a distribution
3) A distribution is symmetric if the right and left sides of the
histogram are approximately mirror images of each other.
4) A distribution is skewed to the right
if the right side of the histogram extends
much further out than the left side.
5) A distribution is skewed to the left
if the left side of the histogram extends
much further out than the right side.
Describing the Center of a Distribution
How to find the mean (average):
1) Add the values together
2) Divide the total by the number of observations
• Example: Test Scores : 56, 65, 54, 55, 57, 54, 61, 62, 60, 55, 57,
56, 57, 61, 62, 60, 49, 66, 59, 80
Step 1 : 56 + 65 + 54 + ... + 59 + 80 = 1186
Step 2 : 1186 / 20 = 59.3, which is the Mean
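The two steps above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are chosen here for illustration.

```python
# Test scores from the example above.
scores = [56, 65, 54, 55, 57, 54, 61, 62, 60, 55,
          57, 56, 57, 61, 62, 60, 49, 66, 59, 80]

total = sum(scores)           # Step 1: add the values together
mean = total / len(scores)    # Step 2: divide by the number of observations

print(total, mean)
```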
Describing the Center of a Distribution
Fancy Schmancy Notation :
To find the mean $\bar{x}$ of a set of observations, add their values and
divide by the number of observations. If the n observations are
$x_1, x_2, x_3, \ldots, x_n$, their mean is :
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
Or, in more compact notation:
$$\bar{x} = \frac{1}{n} \sum x_i$$
Describing the Center of a Distribution
How to find the median M :
1) Arrange the observations in order from smallest to largest.
2) If the number of observations is odd, then the median is
located at the center of the list. So, if there are n observations,
then the median is located in spot (n + 1) / 2
3) If the number of observations is even, then the median is
the average of the two terms in the middle spots. These are
located in spots (n / 2) and (n / 2) + 1
Describing the Center of a Distribution
Example of finding a Median :
List 1 : 2, 4, 6, 3, 5, 2, 6, 8, 10, 11, 1
Step 1: Order the list :
1, 2, 2, 3, 4, 5, 6, 6, 8, 10, 11
Step 2 : Find the middle term : (n + 1) / 2 = (11 + 1) / 2 = 6
The sixth term in the ordered list is 5, so the Median = 5
Describing the Center of a Distribution
Example of finding a Median :
List : 2, 4, 6, 3, 5, 2, 6, 8, 10, 11, 1, 12
Step 1: Order the list :
1, 2, 2, 3, 4, 5, 6, 6, 8, 10, 11, 12
Step 2 : Find the two middle terms :
n / 2 = 12 / 2 = 6
(n / 2) + 1 = (12 / 2) + 1 = 7
Step 3 : Average the sixth and seventh terms (5 and 6) :
Median = (5 + 6) / 2 = 5.5
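The odd and even cases of the median rule can be sketched in Python as follows. The function name `median` and the list names are illustrative choices, not part of the original slides. Note that Python counts positions from 0, so spot k in the slides is index k - 1.

```python
def median(values):
    """Median by the rule above: sort, then take the middle spot(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        # Odd n: spot (n + 1) / 2, counting from 1, is index (n - 1) // 2.
        return ordered[(n - 1) // 2]
    # Even n: average spots n/2 and n/2 + 1 (indices n//2 - 1 and n//2).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_list = [2, 4, 6, 3, 5, 2, 6, 8, 10, 11, 1]
even_list = odd_list + [12]
print(median(odd_list))   # 5
print(median(even_list))  # 5.5
```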
In The Presence Of Outliers
Q: Do outliers affect the Mean and Median?
Consider the list of numbers from 1 through 9 :
1, 2, 3, 4, 5, 6, 7, 8, 9
The Mean is : 5
The Median is : 5
What if we put the number 100 at the end of the list :
1, 2, 3, 4, 5, 6, 7, 8, 9, 100
The Mean is : 14.5
The Median is : 5.5
A: Outliers affect the Mean much more than the Median !
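A short Python check of this comparison, using the standard library's `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

values = list(range(1, 10))          # 1 through 9
with_outlier = values + [100]        # same list with 100 appended

# How far each measure moves when the outlier is added.
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)

print(mean(values), median(values))
print(mean(with_outlier), median(with_outlier))
print(mean_shift, median_shift)
```

The mean moves by 9.5 while the median moves by only 0.5.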
Describing Spread
Consider the following pay distributions:
[Figure: two pay distributions, with axis labels Low, Center, High]
Measuring Spread
The simplest useful numerical description of a distribution
consists of both a measure of center and a measure of spread.
We can begin to describe the spread of a distribution by talking
about percentiles.
Definition: The pth percentile of a distribution is the value such
that p percent of the observations fall at or below it.
Example: The Median is the 50th percentile.
Q: Why isn’t the Mean the 50th percentile?
1, 2, 3, 4, 5, 6, 7, 8, 9, 100
The Mean is : 14.5
The Median is : 5.5
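One way to explore the question is to count what percent of the observations fall at or below each value. The helper `percent_at_or_below` is a hypothetical name introduced here for illustration.

```python
def percent_at_or_below(values, cutoff):
    """Percent of observations that fall at or below `cutoff`."""
    count = sum(1 for v in values if v <= cutoff)
    return 100 * count / len(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(percent_at_or_below(data, 5.5))    # median: 50.0
print(percent_at_or_below(data, 14.5))   # mean: 90.0
```

In this list, 90 percent of the observations fall at or below the mean, so the mean is nowhere near the 50th percentile.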
Describing Spread
The Five Number Summary :
1) The Median
2) First Quartile : 25% of the observations lie
below the First Quartile
3) Third Quartile : 75% of the observations lie
below the third quartile
4) Lowest Individual Observation (Minimum)
5) Highest Individual Observation (Maximum)
Describing Spread
Calculating the Quartiles :
1) Arrange the observations in increasing order and locate the Median
M in the ordered list o’ observations.
2) The First Quartile Q1 is the median of the observations whose
position in the ordered list is to the left of the location of the
overall median.
3) The Third Quartile Q3 is the median of the observations whose
position in the ordered list is to the right of the location of the
overall median.
Describing Spread
Example of calculating First Quartile :
List o’ quiz scores: 10, 8, 9, 4, 6, 6, 8, 9, 2, 7
1) Order the list: 2, 4, 6, 6, 7, 8, 8, 9, 9, 10
Find the median: (7 + 8) / 2 = 7.5
2) Find all the observations whose position in the list is to the
left of the median : 2, 4, 6, 6, 7
Find the median of these values : 6, so Q1 = 6
Describing Spread
Example of calculating Third Quartile :
List o’ quiz scores: 10, 8, 9, 4, 6, 6, 8, 9, 2, 7, 11
1) Order the list: 2, 4, 6, 6, 7, 8, 8, 9, 9, 10, 11
Find the median: 8
2) Find all the observations whose position in the list is to the
right of the median : 8, 9, 9, 10, 11
Find the median of these values : 9, so Q3 = 9
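The quartile procedure above can be sketched in Python. The function names are illustrative, and the sketch assumes at least two observations. Statistical software often uses slightly different quartile rules, so library results may not match this method exactly.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the observations left and right of the
    overall median's location (the median itself is excluded when n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # positions left of the median
    upper = ordered[half + (n % 2):]    # positions right of the median
    return median(lower), median(upper)

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum."""
    ordered = sorted(values)
    q1, q3 = quartiles(ordered)
    return ordered[0], q1, median(ordered), q3, ordered[-1]

ten_scores = [10, 8, 9, 4, 6, 6, 8, 9, 2, 7]
eleven_scores = ten_scores + [11]
print(quartiles(ten_scores)[0])       # first quartile: 6
print(quartiles(eleven_scores)[1])    # third quartile: 9
print(five_number_summary(ten_scores))
```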
Interquartile Range
The interquartile range, IQR, is the distance between the first
quartile and the third quartile : IQR = Q3 - Q1.
Determining Outliers
Call an observation a suspected outlier if it falls more than
1.5 * IQR above the third quartile or below the first quartile.
Example : Imagine we have a bunch of test scores with Q1 = 50 and
Q3 = 80.
The IQR = 80 - 50 = 30
So, 1.5 * IQR = 1.5 * 30 = 45
This means that if there are any scores above Q3 + 45 = 125
or any scores below Q1 - 45 = 5, then these scores are suspected outliers.
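The 1.5 * IQR rule can be written as a small Python sketch. The function names and the sample scores are made up for illustration; only Q1 = 50 and Q3 = 80 come from the example above.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def suspected_outliers(values, q1, q3):
    """Values that fall below the lower cutoff or above the upper cutoff."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

print(outlier_fences(50, 80))   # (5.0, 125.0)
# The scores below are invented for illustration only.
print(suspected_outliers([3, 40, 55, 72, 90, 130], 50, 80))  # [3, 130]
```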
Boxplot
A Boxplot is a graph of the five number summary. A central box
spans the quartiles, with a line marking the median. Whiskers
extend out from the box to the extremes.
• Example: Low = 47, High = 98, Median = 77, Q1 = 65, Q3 = 85
[Figure: boxplot on a vertical axis marking the Lowest Observation (47), Q1 (65), Median (77), Q3 (85), and Highest Observation (98)]
Describing Spread
The Standard Deviation
• Variance: The variance of a set of observations is an “average” of the
squared deviations of the observations from the mean.
• Note: You divide by (n - 1) instead of n.
• Standard Deviation: The SD is the square root of the variance.
Describing Spread
The Standard Deviation
Example : Test Scores : 65, 77, 83, 80, 95
1) Find the average : 80
2) Find the deviations from the mean, and their squares
Obs    Deviation from Mean    Deviations Squared
65     -15                    225
77     -3                     9
83     3                      9
80     0                      0
95     15                     225
Describing Spread
The Standard Deviation
3) Determine the mean of the squares, dividing by n - 1 :
Variance = (225 + 9 + 9 + 0 + 225) / (5 - 1) = 468 / 4 = 117
4) Determine the Standard Deviation :
Standard Deviation = √117 ≈ 10.8
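The four steps can be followed line by line in Python. This is a sketch of the calculation with illustrative variable names.

```python
import math

scores = [65, 77, 83, 80, 95]
n = len(scores)

mean = sum(scores) / n                          # 1) the average
deviations = [x - mean for x in scores]         # 2) deviations from the mean
squares = [d ** 2 for d in deviations]          #    and their squares
variance = sum(squares) / (n - 1)               # 3) divide by n - 1
std_dev = math.sqrt(variance)                   # 4) square root

print(deviations)
print(variance, round(std_dev, 1))
```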
More Fancy Schmancy Notation
The variance $s^2$ of a set of observations is the average of the squares
of the deviations of the observations from their mean. In symbols,
the variance of n observations $x_1, x_2, \ldots, x_n$ is :
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
or, in more compact notation :
$$s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$$
The standard deviation s is the square root of the variance $s^2$ :
$$s = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$$
Another Example of Standard Deviation
Consider the following years in our past :
1792, 1666, 1362, 1614, 1460, 1867, 1439
Find the standard deviation of these years.
The Mean = 1600
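$x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
1792      192                 36864
1666      66                  4356
1362      -238                56644
1614      14                  196
1460      -140                19600
1867      267                 71289
1439      -161                25921
The squared deviations add up to 214870, so :
$$s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2 = \frac{1}{6} (214870) \approx 35811.67$$
$$s = \sqrt{35811.67} \approx 189.2$$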
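This example can be checked against Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1 as in the definition used here. The variable name `years` is an illustrative choice.

```python
from statistics import mean, variance, stdev

years = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

# variance() and stdev() in the standard library divide by n - 1,
# matching the definition of the sample variance used here.
print(mean(years))
print(variance(years))
print(round(stdev(years), 1))   # about 189.2
```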
Why Do We Square The Deviations ?
1) The sum of the squared deviations of any set of observations from their
mean is the smallest that the sum of squared deviations from any number
can possibly be.
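A small numerical illustration of this property, using the test scores 65, 77, 83, 80, 95 from the earlier example. The function name `sum_of_squares` and the trial centers are chosen here for illustration. This checks a few candidates; it is not a proof.

```python
def sum_of_squares(values, center):
    """Sum of squared deviations of `values` from `center`."""
    return sum((x - center) ** 2 for x in values)

scores = [65, 77, 83, 80, 95]
mean = sum(scores) / len(scores)   # 80.0

# Try the mean and a few other candidate centers.
for center in (75, 79, mean, 81, 85):
    print(center, sum_of_squares(scores, center))
```

The sum is 468 at the mean and larger at every other center tried.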
Why use the Standard Deviation and not the Variance ?
1) The standard deviation is the natural measure of spread for an important
class of symmetric unimodal distributions, the normal distributions.
2) The variance uses squared deviations, so its units are the square of the
original units. The standard deviation is in the same units as the data.
Why use n - 1 ?
1) The sum of the deviations is *always* zero. So, if we know n-1 of the
deviations, then the last deviation can be calculated. So, only n-1 of the
deviations can vary freely. These are called degrees of freedom.
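The degrees-of-freedom argument can be seen with the test scores 65, 77, 83, 80, 95 from the earlier example. This is a sketch with illustrative variable names. With other data, floating-point rounding may leave a sum that is extremely close to zero rather than exactly zero.

```python
scores = [65, 77, 83, 80, 95]
mean = sum(scores) / len(scores)

deviations = [x - mean for x in scores]
print(sum(deviations))               # 0.0

# Knowing the first n - 1 deviations pins down the last one.
known = deviations[:-1]
last = -sum(known)
print(last, deviations[-1])          # both 15.0
```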
Properties of Standard Deviations
1) The standard deviation measures spread about the mean and should be
used only when the mean is chosen as the measure of center.
2) s = 0 only when there is no spread. This happens only when all
observations have the same value. Otherwise, s > 0. As the observations
get more spread out from the mean, then s gets larger.
3) s, like the mean, is not resistant. A few outliers can make s very large.
Which Measure To Use ?
Q: When is the mean better than median? When is the five number
summary better than the standard deviation?
Rules O’ Thumb
A1: If outliers appear, or if your distribution is skewed, then the mean
could be affected, so use the median and the five number summary.
A2: If the distribution is reasonably symmetric and is free of outliers,
then the mean and standard deviation should be used.
Changing Units
Consider the following values : 30, 40, 50, 60, 70
The mean is 50 and the standard deviation is 15.8
What happens to these if we take every score, multiply it by 2 and add 10 ?
We get these values : 70, 90, 110, 130, 150
The mean is 110 and the standard deviation is 31.6
Linear Transformations
A linear transformation changes the original variable x into the new
variable $x_{new}$ given by an equation of the form :
$$x_{new} = bx + a$$
Note: The constant a shifts all values of x either up or down by the value
a. The constant b changes the size of the unit of the distribution.
Effects of Linear Transformations
1) To get the new spread, multiply the old spread by b (by |b| if b is
negative, since a spread is never negative).
2) To get the new mean, multiply the old mean by b and add the
constant a.
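Both effects can be checked on the values 30, 40, 50, 60, 70 with b = 2 and a = 10. The helper `transform` is a hypothetical name introduced here. `stdev` is the standard library's sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

def transform(values, a, b):
    """Apply x_new = b * x + a to every value."""
    return [b * x + a for x in values]

old = [30, 40, 50, 60, 70]
a, b = 10, 2
new = transform(old, a, b)

print(new)                                                # [70, 90, 110, 130, 150]
print(mean(new), b * mean(old) + a)                       # new mean vs. b * old mean + a
print(round(stdev(new), 1), round(abs(b) * stdev(old), 1))  # new spread vs. |b| * old spread
```

Adding the constant a shifts the mean but leaves the standard deviation unchanged; only the multiplier b rescales the spread.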
Homework
43, 45, 49, 50, 54, 59, 63, 64, 65, 72, 73, 75