Means & Medians
Chapter 4
Parameter
• Fixed value about a population
• Typically unknown
Statistic
• Value calculated from a sample
Measures of Central Tendency
• Median - the middle of the data; 50th percentile
  – Observations must be in numerical order
  – It is the single middle value if n is odd
  – It is the average of the middle two values if n is even
NOTE: n denotes the sample size
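As a rough illustration (not part of the original slides), the median rule can be written as a small Python function. The name `sample_median` is hypothetical; the function sorts a copy of the data and then applies the odd/even rule above.

```python
def sample_median(values):
    """Median: the middle value if n is odd, the mean of the middle two if n is even."""
    data = sorted(values)          # observations must be in numerical order
    n = len(data)                  # n is the sample size
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:                 # n odd: the single middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # n even: average the middle two


print(sample_median([2, 3, 4, 8, 12]))      # → 4
print(sample_median([2, 3, 4, 6, 8, 12]))   # → 5.0
```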
Measures of Central Tendency
• Mean - the arithmetic average
  – Use \( \mu \) to represent a population mean (a parameter)
  – Use \( \bar{x} \) to represent a sample mean (a statistic)
Formula: \( \bar{x} = \frac{\sum x}{n} \)
\( \Sigma \) is the capital Greek letter sigma; it means to sum the values that follow.
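A minimal Python sketch of the sample-mean formula, assuming the data are held in a plain list; `sample_mean` is a hypothetical name. Python's built-in `sum` plays the role of the capital sigma, and `len` gives n.

```python
def sample_mean(values):
    """Arithmetic average: add up the observations (sigma) and divide by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


print(round(sample_mean([2, 3, 4, 6, 8, 12]), 3))  # → 5.833
```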
Measures of Central Tendency
• Mode - the observation that occurs most often
  – There can be more than one mode
  – If every value occurs only once, there is no mode
  – Not used as often as the mean & median
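The mode rules can be sketched with `collections.Counter`. The helper `modes` is hypothetical: it returns every tied value (so more than one mode is possible) and returns an empty list when every value occurs only once, following the slide's convention that such data have no mode. Note that Python's own `statistics.multimode` treats that last case differently and returns all of the values.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # every value occurs only once, so by the slide's rule there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 4, 6, 8, 12]))    # → []  (all values occur once)
print(modes([25, 25, 26, 27, 27]))   # → [25, 27]  (two modes)
```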
Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.
2 3 4 8 12
The numbers are in order & n is odd, so find the middle observation.
The median is 4 lollipops!
Suppose we have a sample of 6 customers who buy the following number of lollipops. Find the median.
2 3 4 6 8 12
The numbers are in order & n is even, so find the middle two observations (4 and 6).
Now, average these two values: \( \frac{4 + 6}{2} = 5 \)
The median is 5 lollipops!
Suppose we have a sample of 6 customers who buy the following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops, add the observations and divide by n.
\( \bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = \frac{35}{6} \approx 5.833 \)
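The hand calculations for the lollipop samples can be double-checked with Python's standard `statistics` module (`statistics.median` and `statistics.mean` are standard-library functions); the variable names are only illustrative.

```python
import statistics

odd_sample = [2, 3, 4, 8, 12]        # 5 customers (n odd)
even_sample = [2, 3, 4, 6, 8, 12]    # 6 customers (n even)

print(statistics.median(odd_sample))            # → 4
print(statistics.median(even_sample))           # → 5.0
print(round(statistics.mean(even_sample), 3))   # → 5.833
```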
Using the calculator . . .
What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20
The median is . . . 5
The mean is . . . \( \bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 20}{6} = \frac{43}{6} \approx 7.17 \)
What happened?
What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50
The median is . . . 5
The mean is . . . \( \bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 50}{6} = \frac{73}{6} \approx 12.17 \)
What happened?
Resistant
• Statistics that are not strongly affected by outliers
• Is the median resistant? YES
• Is the mean resistant? NO
IMPORTANT: The median is resistant to outliers.
The mean is NOT resistant to outliers.
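A small sketch that reproduces the three lollipop scenarios: the five smaller purchases stay fixed while the largest one changes from 12 to 20 to 50. It uses Python's `statistics` module and shows the median staying at 5 while the mean climbs.

```python
import statistics

base = [2, 3, 4, 6, 8]           # the five smaller purchases stay fixed
results = {}
for largest in (12, 20, 50):     # only the largest purchase changes
    sample = base + [largest]
    results[largest] = (statistics.median(sample), round(statistics.mean(sample), 2))
    med, avg = results[largest]
    print(f"largest={largest}: median={med}, mean={avg}")
```

The printed means are 5.83, 7.17, and 12.17, while every median is 5.0: one extreme value drags the mean but leaves the median alone.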
Look at the following data set. Find the mean.
22 23 24 25 25 26 29 30
\( \bar{x} = 25.5 \)
Now find how each observation deviates from the mean. This is the deviation from the mean.
What is the sum of the deviations from the mean?
\( \sum (x - \bar{x}) = 0 \)
Will this sum always equal zero? YES
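Why the answer is always YES: \( \sum (x - \bar{x}) = \sum x - n\bar{x} = \sum x - \sum x = 0 \), because \( n\bar{x} = \sum x \) by the definition of the mean. The sketch below checks this for the slide's data. It uses `fractions.Fraction` (a choice made for this example) so that no floating-point rounding hides the exact zero.

```python
from fractions import Fraction

data = [22, 23, 24, 25, 25, 26, 29, 30]
mean = Fraction(sum(data), len(data))     # exact mean: 204/8 = 51/2
deviations = [x - mean for x in data]     # x - x-bar for each observation

print(float(mean))                        # → 25.5
print([float(d) for d in deviations])     # → [-3.5, -2.5, -1.5, -0.5, -0.5, 0.5, 3.5, 4.5]
print(sum(deviations))                    # → 0
```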
Look at the following data set. Find the mean & median.
21 27 23 23 24 25 25 27 27 28 30 30 26 26 26 27 30 31 32 32
Create a histogram with the data (use an x-scale of 2). Then find the mean and median.
Mean = 27
Median = 27
Look at the placement of the mean and median in this symmetrical distribution.
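A rough text-only stand-in for the calculator histogram, assuming that an "x-scale of 2" means bins of width 2 and that the first bin starts at the smallest observation (the slide does not state the starting point, so that part is an assumption). `text_histogram` is a hypothetical helper.

```python
import statistics
from collections import Counter

# Symmetric data set from the slide (20 observations)
data = [21, 27, 23, 23, 24, 25, 25, 27, 27, 28,
        30, 30, 26, 26, 26, 27, 30, 31, 32, 32]


def text_histogram(values, width, start=None):
    """Print a text histogram; each bin covers [left, left + width)."""
    if start is None:
        start = min(values)  # assumption: the first bin starts at the smallest value
    counts = Counter((v - start) // width for v in values)  # bin index -> count
    for b in range(max(counts) + 1):
        left = start + b * width
        print(f"[{left}, {left + width})  {'*' * counts[b]}")
    return counts


counts = text_histogram(data, width=2)
print("mean =", statistics.mean(data), "median =", statistics.median(data))
```

For this data set the bar heights come out as 1, 3, 5, 5, 3, 3, and the mean and median both sit at 27, right between the two tallest bars.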
Look at the following data set. Find the mean & median.
22 29 28 22 24 25 28 21 23 62 23 24 23 26 36 38 25
Create a histogram with the data (use an x-scale of 8). Then find the mean and median.
Mean = 28.176
Median = 25
Look at the placement of the mean and median in this right skewed distribution.
Look at the following data set. Find the mean & median.
21 46 54 47 53 60 55 55 56 63 64 58 58 58 58 62 60
Create a histogram with the data. Then find the mean and median.
Mean = 54.588
Median = 58
Look at the placement of the mean and median in this skewed left distribution.
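The three data sets can be summarized side by side to see the pattern described in the recap. This sketch uses Python's `statistics` module; the labels in `datasets` simply repeat the slide titles.

```python
import statistics

datasets = {
    "symmetric":    [21, 27, 23, 23, 24, 25, 25, 27, 27, 28,
                     30, 30, 26, 26, 26, 27, 30, 31, 32, 32],
    "right skewed": [22, 29, 28, 22, 24, 25, 28, 21, 23, 62,
                     23, 24, 23, 26, 36, 38, 25],
    "left skewed":  [21, 46, 54, 47, 53, 60, 55, 55, 56, 63,
                     64, 58, 58, 58, 58, 62, 60],
}

summary = {}
for name, values in datasets.items():
    mean = statistics.mean(values)
    median = statistics.median(values)
    summary[name] = (round(mean, 3), median)
    if mean > median:
        relation = "mean > median (pulled right)"
    elif mean < median:
        relation = "mean < median (pulled left)"
    else:
        relation = "mean = median"
    print(f"{name:<13} mean={mean:.3f}  median={median}  {relation}")
```

The symmetric set gives mean = median = 27, the right-skewed set gives a mean (about 28.176) above its median (25), and the left-skewed set gives a mean (about 54.588) below its median (58).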
Recap:
• In a symmetrical distribution, the mean and median are equal.
• In a skewed distribution, the mean is pulled in the direction of the skewness.
• In a symmetrical distribution, you should report the mean!
• In a skewed distribution, the median should be reported as the measure of center!
Example calculations
• During a two-week period, 10 houses were sold in Fancytown.
House Price in Fancytown (x)
231,000
313,000
299,000
312,000
285,000
317,000
294,000
297,000
315,000
287,000
\( \sum x = 2{,}950{,}000 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{2{,}950{,}000}{10} = 295{,}000 \)
The “average” or mean price for this sample of 10 houses in Fancytown is $295,000.
• During a two-week period, 10 houses were sold in Lowtown.
House Price in Lowtown (x)
97,000
93,000
110,000
121,000
113,000
95,000
100,000
122,000
99,000
2,000,000 (outlier)
\( \sum x = 2{,}950{,}000 \)
\( \bar{x} = \frac{\sum x}{n} = \frac{2{,}950{,}000}{10} = 295{,}000 \)
The “average” or mean price for this sample of 10 houses in Lowtown is $295,000.
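The two sums and means can be verified with a few lines of plain Python (the lists are typed in from the tables above; no libraries are needed). The last lines also compute the mean of the nine Lowtown houses once the $2,000,000 sale is set aside, which comes out to about $105,556.

```python
# House prices from the two samples (n = 10 each)
fancytown = [231_000, 313_000, 299_000, 312_000, 285_000,
             317_000, 294_000, 297_000, 315_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]

for name, prices in (("Fancytown", fancytown), ("Lowtown", lowtown)):
    total = sum(prices)              # sum of x
    mean = total / len(prices)       # x-bar = (sum of x) / n
    print(f"{name}: sum = {total:,}  mean = {mean:,.0f}")

# Mean of the nine Lowtown houses once the largest price (the outlier) is set aside
others = sorted(lowtown)[:-1]
mean_without_outlier = sum(others) / len(others)
print(f"Lowtown without the outlier: mean = {mean_without_outlier:,.0f}")
```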
• Looking at the dotplots of the samples for Fancytown and Lowtown, we can see that the mean, $295,000, appears to accurately represent the “center” of the data for Fancytown, but it is not representative of the Lowtown data.
• Clearly, the mean can be greatly affected by the presence of even a single outlier.
[Figure: Dotplots for Fancytown and Lowtown house prices on a common price axis, with the Lowtown outlier labeled]
1. In the previous example of the house prices in the sample of 10 houses from Lowtown, the mean was affected very strongly by the one house with the extremely high price.
2. The other 9 houses had selling prices around $100,000.
3. This illustrates that the mean can be very sensitive to a few extreme values.
SOOOO……
Describing the Center of a Data Set with the Median
The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then
\[ \text{sample median} = \begin{cases} \text{the single middle value} & \text{if } n \text{ is odd} \\ \text{the mean of the middle two values} & \text{if } n \text{ is even} \end{cases} \]
Example of Median Calculation
Consider the Fancytown data. First, we put the data in increasing numerical order to get
231,000 285,000 287,000 294,000 297,000 299,000 312,000 313,000 315,000 317,000
Since there are 10 (an even number of) data values, the median is the mean of the two values in the middle.
\( \text{median} = \frac{297{,}000 + 299{,}000}{2} = \$298{,}000 \)
Consider the Lowtown data. We put the data in increasing numerical order to get
93,000 95,000 97,000 99,000 100,000 110,000 113,000 121,000 122,000 2,000,000
Since there are 10 (an even number of) data values, the median is the mean of the two values in the middle.
\( \text{median} = \frac{100{,}000 + 110{,}000}{2} = \$105{,}000 \)
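The median calculation for both towns can be sketched in Python. Because both samples have n = 10 (even), this sketch takes the mean of the two middle values directly and cross-checks the result against `statistics.median` from the standard library.

```python
import statistics

fancytown = [231_000, 313_000, 299_000, 312_000, 285_000,
             317_000, 294_000, 297_000, 315_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]

medians = {}
for name, prices in (("Fancytown", fancytown), ("Lowtown", lowtown)):
    ordered = sorted(prices)                 # increasing numerical order
    n = len(ordered)                         # n = 10 here, so n is even
    middle_two = ordered[n // 2 - 1 : n // 2 + 1]
    medians[name] = sum(middle_two) / 2      # mean of the two middle values
    # cross-check against the standard-library median
    assert medians[name] == statistics.median(prices)
    print(f"{name}: middle two = {middle_two}, median = {medians[name]:,.0f}")
```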
• Typically,
1. when a distribution is skewed positively, the mean is larger than the median,
2. when a distribution is skewed negatively, the mean is smaller than the median, and
3. when a distribution is symmetric, the mean and the median are equal.
Trimmed mean:
Purpose: to reduce the effect of outliers by removing the most extreme values from both ends of a data set
To calculate a trimmed mean:
• Multiply the % to trim by n
• Remove (trim) that many observations from BOTH ends of the distribution (when listed in order)
• Calculate the mean of the shortened data set
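The three steps above can be sketched as a Python function. `trimmed_mean` is a hypothetical name, and the percentage is passed as a whole number (10 for 10%). When the % to trim times n is not a whole number, this sketch rounds down; that rule is an assumption, since the slide does not cover that case and textbooks differ.

```python
def trimmed_mean(values, percent):
    """Trimmed mean following the three steps on the slide.

    percent is a whole number (e.g. 10 for 10%). Assumption: if
    percent * n / 100 is not a whole number, it is rounded down.
    """
    data = sorted(values)                 # list the observations in order
    n = len(data)
    k = percent * n // 100                # number of observations to trim from EACH end
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = data[k:n - k]                  # drop k observations from both ends
    return sum(kept) / len(kept)          # mean of the shortened data set


print(trimmed_mean([12, 14, 19, 20, 22, 24, 25, 26, 26, 35], 10))  # → 22.0
```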
Find a 10% trimmed mean with the following data.
12 14 19 20 22 24 25 26 26 35
10%(10) = 1
So remove one observation from each side!
\( \frac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = \frac{176}{8} = 22 \)
Example of Trimmed Mean
House Price in Fancytown (in order)
231,000
285,000
287,000
294,000
297,000
299,000
312,000
313,000
315,000
317,000
Sum of the eight middle values is 2,402,000. Divide this value by 8 to obtain the 10% trimmed mean.
\( \sum x = 2{,}950{,}000 \)
\( \bar{x} = 295{,}000 \)
\( \text{median} = 298{,}000 \)
\( \text{10\% trimmed mean} = \frac{2{,}402{,}000}{8} = 300{,}250 \)
Example of Trimmed Mean
House Price in Lowtown (in order)
93,000
95,000
97,000
99,000
100,000
110,000
113,000
121,000
122,000
2,000,000
Sum of the eight middle values is 857,000. Divide this value by 8 to obtain the 10% trimmed mean.
\( \sum x = 2{,}950{,}000 \)
\( \bar{x} = 295{,}000 \)
\( \text{median} = 105{,}000 \)
\( \text{10\% trimmed mean} = \frac{857{,}000}{8} = 107{,}125 \)
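Putting the pieces together, this sketch computes the mean, median, and 10% trimmed mean for both towns. It defines a hypothetical `trimmed_mean` helper (rounding the trim count down, which is an assumption) and uses Python's `statistics` module for the mean and median.

```python
import statistics

fancytown = [231_000, 313_000, 299_000, 312_000, 285_000,
             317_000, 294_000, 297_000, 315_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]


def trimmed_mean(values, percent):
    """Drop percent% of n (rounded down) from each end of the ordered data, then average."""
    data = sorted(values)
    k = percent * len(data) // 100        # assumes 2k < n
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


summary = {}
for name, prices in (("Fancytown", fancytown), ("Lowtown", lowtown)):
    summary[name] = (statistics.mean(prices),
                     statistics.median(prices),
                     trimmed_mean(prices, 10))
    mean, median, trim = summary[name]
    print(f"{name:<9}  mean={mean:>9,.0f}  median={median:>9,.0f}  "
          f"10% trimmed mean={trim:>9,.0f}")
```

In Lowtown the trimmed mean (107,125) lands close to the median (105,000), while the untrimmed mean is pulled up to 295,000 by the single outlier.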