Central Tendency and Variability
The two most essential features of a
distribution
Questions
• Define
– Mean
– Median
– Mode
• What is the effect of distribution shape
on measures of central tendency?
• When might we prefer one measure of
central tendency to another?
Questions (2)
• Define
– Range
– Average Deviation
– Variance
– Standard Deviation
• When might we prefer one measure of
variability to another?
• What is a z score?
• What is the point of Tchebycheff’s
inequality?
Variables have distributions
• A variable is something that changes or
has different values (e.g., anger).
• A distribution is a collection of
measures, usually across people.
• Distributions of numbers can be
summarized with numbers (called
statistics or parameters).
Central Tendency refers to the
Middle of the Distribution
Variability is about the Spread
1. Central Tendency: Mode,
Median, & Mean
• The mode – the most frequently
occurring score. Midpoint of most
populous class interval. Can have
bimodal and multimodal distributions.
Median
• Score that separates top 50% from
bottom 50%
• Even number of scores, median is half
way between two middle scores.
– 1 2 3 4 | 5 6 7 8 – Median is 4.5
• Odd number of scores, median is the
middle number
– 1 2 3 4 5 6 7 – Median is 4
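The mode and median rules above can be checked with a short sketch. It uses Python's standard `statistics` module; the two median lists are the ones on the slide, while `bimodal_scores` is a made-up list added only to illustrate a bimodal case.

```python
import statistics

even_scores = [1, 2, 3, 4, 5, 6, 7, 8]  # from the slide
odd_scores = [1, 2, 3, 4, 5, 6, 7]      # from the slide

# Even N: halfway between the two middle scores (4 and 5)
median_even = statistics.median(even_scores)
# Odd N: the middle score
median_odd = statistics.median(odd_scores)

# Hypothetical scores in which two values tie for most frequent (bimodal)
bimodal_scores = [1, 2, 2, 3, 4, 4, 5]
modes = statistics.multimode(bimodal_scores)

print(median_even, median_odd, modes)
```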
Mean
• Sum of scores divided by the number of
people. Population mean is $\mu$ (mu)
and sample mean is $\bar{X}$ (X-bar).
• We calculate the sample mean by:
$\bar{X} = \frac{\sum X}{N}$
• We calculate the population mean by:
$\mu = \frac{\sum X}{N}$
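As a minimal sketch of the formula, the mean is just the sum of scores divided by N. The five scores below are borrowed from the variance example later in these slides; the variable names are illustrative only.

```python
scores = [5, 10, 15, 20, 25]  # scores reused from the variance example

n = len(scores)
# Sum of scores divided by the number of scores
sample_mean = sum(scores) / n

print(sample_mean)
```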
Deviation from the mean
• $x = X - \bar{X}$. Deviations sum to zero.
• Deviation score – deviation from the
mean
• Raw scores: 7 8 8 9 9 9 10 10 11
• Deviation scores: -2 -1 -1 0 0 0 1 1 2
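A quick sketch of the claim that deviations sum to zero. The nine raw scores are read off the slide's stacked plot as 7, 8, 8, 9, 9, 9, 10, 10, 11; that reading is an assumption about the original figure.

```python
# Raw scores as read from the slide's plot (assumed reading)
raw_scores = [7, 8, 8, 9, 9, 9, 10, 10, 11]

mean = sum(raw_scores) / len(raw_scores)

# Deviation score: raw score minus the mean
deviations = [x - mean for x in raw_scores]

print(mean, deviations, sum(deviations))
```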
Comparison of mean, median
and mode
• Mode
– Good for nominal variables
– Good if you need to know most frequent
observation
– Quick and easy
• Median
– Good for “bad” distributions
– Good for distributions with arbitrary
ceiling or floor
Comparison of mean, median
& mode
• Mean
– Used for inference as well as description;
best estimator of the parameter
– Based on all data in the distribution
– Generally preferred except for “bad”
distribution. Most commonly used statistic
for central tendency.
Best Guess interpretations
• Mean – average of signed error will be
zero.
• Mode – will be absolutely right with
greatest frequency
• Median – smallest absolute error
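The three "best guess" interpretations can be illustrated with a small sketch. The scores are hypothetical (a positively skewed list chosen so that mean, median, and mode differ), and `summarize` is a helper made up for this example.

```python
import statistics

scores = [1, 2, 2, 3, 5, 6, 16]  # hypothetical, positively skewed

guesses = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}

def summarize(guess, data):
    """Score a single guess against every value in data."""
    errors = [x - guess for x in data]
    return {
        "mean_signed_error": sum(errors) / len(data),
        "total_abs_error": sum(abs(e) for e in errors),
        "exact_hits": sum(1 for e in errors if e == 0),
    }

results = {name: summarize(g, scores) for name, g in guesses.items()}
for name, r in results.items():
    print(name, guesses[name], r)
```

In this list the mean has zero average signed error, the median has the smallest total absolute error, and the mode is exactly right most often.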
Expectation
• Discrete and continuous variables
• Mean is expected value either way
• Discrete: $E(X) = \sum x\,p(x)$ = mean of X
• Continuous: $E(X) = \int x f(x)\,dx$ = mean of X
• (The integral looks bad but just means
take the average)
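A sketch of both forms of the expected value. The discrete distribution and the uniform density on [0, 2] are hypothetical examples, and the integral is approximated with a simple midpoint sum rather than computed exactly.

```python
# Discrete case: hypothetical values with their probabilities
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
expected_discrete = sum(x * p for x, p in zip(values, probs))

# Continuous case: hypothetical uniform density on [0, 2]
def uniform_pdf(x):
    return 0.5 if 0 <= x <= 2 else 0.0

# Approximate the integral of x * f(x) dx with a midpoint sum
n_slices = 10_000
width = 2.0 / n_slices
expected_continuous = sum(
    ((i + 0.5) * width) * uniform_pdf((i + 0.5) * width) * width
    for i in range(n_slices)
)

print(expected_discrete, expected_continuous)
```

Both come out at (approximately) 1, the middle of each distribution, which is the sense in which the integral "just means take the average".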
Influence of Distribution
Shape
Review
• What is central tendency?
• Mode
• Median
• Mean
2. Variability aka Dispersion
• 4 Statistics: Range, Average Deviation,
Variance, & Standard Deviation
• Range = high score minus low score.
– 12 14 14 16 16 18 20 – range=20-12=8
• Average Deviation – mean of absolute
deviations from the median:
$AD = \frac{\sum |X - Md|}{N}$
Note the difference between this definition and the
undergrad text: deviation from median vs. mean.
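A minimal sketch of the range and the average deviation, using the seven scores from the range example. The average deviation here is taken around the median, as in the definition on this slide.

```python
import statistics

scores = [12, 14, 14, 16, 16, 18, 20]  # from the range example

# Range: high score minus low score
score_range = max(scores) - min(scores)

# Average deviation: mean absolute deviation from the median
median = statistics.median(scores)
average_deviation = sum(abs(x - median) for x in scores) / len(scores)

print(score_range, median, average_deviation)
```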
Variance
• Population Variance:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Where $\sigma^2$ means population variance,
• $\mu$ means population mean, and the other
terms have their usual meaning.
• The variance is equal to the average squared
deviation from the mean.
• To compute, take each score and subtract the
mean. Square the result. Find the average
over scores. Ta da! The variance.
Computing the Variance
(N=5)
          $X$    $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
          5      15           -10              100
          10     15           -5               25
          15     15           0                0
          20     15           5                25
          25     15           10               100
Total:    75                  0                250
Mean:                                          50
Variance is $\sigma^2 = 50$
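The table can be reproduced step by step. This is a sketch of the population formula (divide by N) using the five scores from the table; `statistics.pvariance` from the standard library is included only as a cross-check.

```python
import statistics

scores = [5, 10, 15, 20, 25]  # scores from the table
n = len(scores)

mean = sum(scores) / n

# Subtract the mean from each score, then square the result
deviations = [x - mean for x in scores]
squared_deviations = [d ** 2 for d in deviations]

# Average the squared deviations over N (population formula)
variance = sum(squared_deviations) / n

print(sum(deviations), sum(squared_deviations), variance)
print(statistics.pvariance(scores))  # cross-check
```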
Standard Deviation
• Variance is average squared deviation
from the mean.
• To return to original, unsquared units,
we just take the square root of the
variance. This is the standard
deviation.
• Population formula:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Standard Deviation
• Sometimes called the root-mean-square
deviation from the mean. This name
says how to compute it from the inside
out.
• Find the deviation (difference between
the score and the mean).
• Find the deviations squared.
• Find their mean.
• Take the square root.
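The four steps above can be sketched from the inside out, reading "root-mean-square deviation" right to left. The scores are the five used in the variance example; `statistics.pstdev` is used only to confirm the result.

```python
import math
import statistics

scores = [5, 10, 15, 20, 25]
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]        # deviation
squared = [d ** 2 for d in deviations]         # square
mean_square = sum(squared) / n                 # mean
standard_deviation = math.sqrt(mean_square)    # root

print(round(standard_deviation, 2))
print(round(statistics.pstdev(scores), 2))  # cross-check
```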
Computing the Standard
Deviation
(N=5)
          $X$    $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
          5      15           -10              100
          10     15           -5               25
          15     15           0                0
          20     15           5                25
          25     15           10               100
Total:    75                  0                250
Mean:                                          50
Sqrt:                                          7.07
Variance is $\sigma^2 = 50$
SD is $\sigma = \sqrt{50} = 7.07$
Example: Age Distribution
[Figure: "Distribution of Age: Central Tendency, Variability, and Shape." Histogram of frequency (0 to 16) by age (10 to 50), annotated with Mode = 21, Median = 23, Mean = 25.73, and SD = 6.47 (labeled "Average Distance from Mean").]
Review
• Range
• Average deviation
• Variance
• Standard Deviation
Standard or z score
• A z score indicates distance from the
mean in standard deviation units.
Formula:
Sample: $z = \frac{X - \bar{X}}{S}$
Population: $z = \frac{X - \mu}{\sigma}$
• Converting to standard or z scores does
not change the shape of the distribution.
Z-scores are not normalized.
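A short sketch of the conversion, using the five scores from the variance example and the population formula. After conversion the z scores have mean 0 and SD 1, but the spacing between scores stays proportional to the original spacing, which is why the shape does not change.

```python
import statistics

scores = [5, 10, 15, 20, 25]

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population SD, about 7.07

# z score: distance from the mean in SD units
z_scores = [(x - mu) / sigma for x in scores]

print([round(z, 2) for z in z_scores])
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```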
Tchebycheff’s Inequality (1)
• General form
$p(|X - \mu| \ge b) \le \frac{\sigma^2}{b^2}$
Suppose we know mean height in inches is 66 and SD
is 4 inches. We assume nothing about the shape of the
distribution of height. What is the probability of
finding people taller than 74 inches? (Note that b is a
deviation from the mean; in this case 74 - 66 = 8.) Also,
74 inches is 2 SDs above the mean; therefore, z = 2.
$p \le \frac{\sigma^2}{b^2} = \frac{4^2}{8^2} = \frac{16}{64} = .25$
So the probability is at most .25.
[If we assume height is normally distributed, p is much
smaller. But we will get to that later.]
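The height example can be sketched as below. The bound uses only the mean and SD from the slide. The normal-distribution tail is included purely as a comparison, using `statistics.NormalDist` from the standard library (Python 3.8+), and it depends on the extra normality assumption.

```python
from statistics import NormalDist

mean_height = 66.0  # inches, from the slide
sd_height = 4.0     # inches, from the slide
cutoff = 74.0

# b is the deviation from the mean
b = cutoff - mean_height

# Tchebycheff bound: p(|X - mu| >= b) <= sigma^2 / b^2
tchebycheff_bound = sd_height ** 2 / b ** 2

# For comparison only: upper tail if height were normally distributed
normal_tail = 1.0 - NormalDist(mean_height, sd_height).cdf(cutoff)

print(tchebycheff_bound, round(normal_tail, 4))
```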
Tchebycheff (2)
• Z-score form
$p\left(\frac{|X - \mu|}{\sigma} \ge k\right) \le \frac{1}{k^2}$
• Probability of z score from any distribution
being more than k SDs from mean is at most
$1/k^2$.
For the problem in the previous slide:
$p(|z| \ge 2) \le \frac{1}{k^2} = \frac{1}{2^2} = .25$
• Z-scores from the worst distributions are rarely
more than 5 or less than -5.
• For symmetric, unimodal distributions,
|z| is rarely more than 3.
$p(|z| \ge k) \le \frac{4}{9} \cdot \frac{1}{k^2}$
$p(|z| \ge 3) \le \frac{4}{9} \cdot \frac{1}{3^2} \approx .05$
Review
• Z-score in words
• Z-score in symbols
• Meaning of Tchebycheff’s theorem
Median House Price Data
• Find data
• Show Univariate
• Show plots