Central Tendency and Variability
The two most essential features of a
distribution
Questions
• Define
– Mean
– Median
– Mode
• What is the effect of distribution shape
on measures of central tendency?
• When might we prefer one measure of
central tendency to another?
Questions (2)
• Define
– Range
– Average Deviation
– Variance
– Standard Deviation
• When might we prefer one measure of
variability to another?
• What is a z score?
• What is the point of Tchebycheff’s
inequality?
Variables have distributions
• A variable is something that changes or
has different values (e.g., anger).
• A distribution is a collection of
measures, usually across people.
• Distributions of numbers can be
summarized with numbers (called
statistics or parameters).
Central Tendency refers to the
Middle of the Distribution
Variability is about the Spread
1. Central Tendency: Mode,
Median, & Mean
• The mode – the most frequently
occurring score. Midpoint of most
populous class interval. Can have
bimodal and multimodal distributions.
Median
• Score that separates top 50% from
bottom 50%
• Even number of scores, median is half
way between two middle scores.
– 1 2 3 4 | 5 6 7 8 – Median is 4.5
• Odd number of scores, median is the
middle number
– 1 2 3 4 5 6 7 – Median is 4
Mean
• Sum of scores divided by the number of
people. Population mean is $\mu$ (mu)
and sample mean is $\bar{X}$ (X-bar).
• We calculate the sample mean by:
$\bar{X} = \frac{\sum X}{N}$
• We calculate the population mean by:
$\mu = \frac{\sum X}{N}$
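As a quick check of the three definitions, here is a minimal Python sketch using the standard library's `statistics` module. The score lists are illustrative only; the first two reuse the even-count and odd-count median examples, and the third is a small made-up set with one clear mode.

```python
import statistics

even_scores = [1, 2, 3, 4, 5, 6, 7, 8]
odd_scores = [1, 2, 3, 4, 5, 6, 7]
repeated_scores = [7, 8, 8, 9, 9, 9, 10, 10, 11]  # illustrative scores with one clear mode

median_even = statistics.median(even_scores)   # halfway between the two middle scores
median_odd = statistics.median(odd_scores)     # the middle score
mean = sum(even_scores) / len(even_scores)     # sum of scores divided by N
modes = statistics.multimode(repeated_scores)  # list of the most frequent score(s)

print(median_even, median_odd, mean, modes)
```

`multimode` returns a list, so bimodal or multimodal data would show up as more than one value.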
Deviation from the mean
• $x = X - \bar{X}$. Deviations sum to zero.
• Deviation score – deviation from the
mean
• Raw scores: 7 8 8 9 9 9 10 10 11 (mean = 9)
• Deviation scores: -2 -1 -1 0 0 0 1 1 2
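The claim that deviations sum to zero can be checked directly. This small Python sketch uses the nine raw scores from the slide (7 through 11); the variable names are just for illustration.

```python
raw_scores = [7, 8, 8, 9, 9, 9, 10, 10, 11]

mean = sum(raw_scores) / len(raw_scores)
deviations = [x - mean for x in raw_scores]  # x = X - X-bar for each score

print(mean)
print(deviations)
print(sum(deviations))
```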
Comparison of mean, median
and mode
• Mode
– Good for nominal variables
– Good if you need to know most frequent
observation
– Quick and easy
• Median
– Good for “bad” distributions
– Good for distributions with arbitrary
ceiling or floor
Comparison of mean, median
& mode
• Mean
– Used for inference as well as description;
best estimator of the parameter
– Based on all data in the distribution
– Generally preferred except for “bad”
distribution. Most commonly used statistic
for central tendency.
Best Guess interpretations
• Mean – average of signed error will be
zero.
• Mode – will be absolutely right with
greatest frequency
• Median – smallest absolute error
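The three "best guess" interpretations can be illustrated with a short Python sketch. The scores below are hypothetical and deliberately skewed so that the mean, median, and mode all differ; the helper function names are made up for this example.

```python
import statistics

scores = [1, 1, 2, 3, 13]  # hypothetical, positively skewed scores

mean = statistics.mean(scores)      # 4
median = statistics.median(scores)  # 2
mode = statistics.mode(scores)      # 1

def mean_signed_error(guess):
    """Average of (score - guess), keeping the sign."""
    return sum(x - guess for x in scores) / len(scores)

def mean_absolute_error(guess):
    """Average of |score - guess|."""
    return sum(abs(x - guess) for x in scores) / len(scores)

def hit_rate(guess):
    """Proportion of scores that the guess matches exactly."""
    return sum(x == guess for x in scores) / len(scores)

for name, guess in [("mean", mean), ("median", median), ("mode", mode)]:
    print(name, guess, mean_signed_error(guess),
          mean_absolute_error(guess), hit_rate(guess))
```

In this small example, guessing the mean gives zero average signed error, guessing the median gives the smallest average absolute error, and guessing the mode is exactly right most often.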
Expectation
• Discrete and continuous variables
• Mean is expected value either way
• Discrete: $E(X) = \sum x\,p(x)$ = mean of X
• Continuous: $E(X) = \int x f(x)\,dx$ = mean of X
• (The integral looks bad but just means
take the average)
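To make the two expectation formulas concrete, here is a minimal Python sketch. Both distributions are assumptions chosen for illustration (a fair six-sided die and a uniform density on 0 to 10); neither comes from the lecture. The integral is approximated with a simple midpoint sum.

```python
from fractions import Fraction

# Discrete case: E(X) = sum of x * p(x), for a fair six-sided die (assumed example).
die = {face: Fraction(1, 6) for face in range(1, 7)}
expected_die = sum(x * p for x, p in die.items())

# Continuous case: E(X) = integral of x * f(x) dx, for a uniform density
# on [0, 10] (assumed example), approximated by a midpoint Riemann sum.
def f(x):
    return 0.1  # density is 1/10 everywhere on [0, 10]

n = 10_000
width = 10 / n
midpoints = [(i + 0.5) * width for i in range(n)]
expected_uniform = sum(x * f(x) * width for x in midpoints)

print(expected_die)       # exact fraction
print(expected_uniform)   # numerical approximation
```

In both cases the result is a probability-weighted average of the possible values, which is all the integral is doing.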
Influence of Distribution
Shape
Review
• What is central tendency?
• Mode
• Median
• Mean
2. Variability aka Dispersion
• 4 Statistics: Range, Average Deviation,
Variance, & Standard Deviation
• Range = high score minus low score.
– 12 14 14 16 16 18 20 – range=20-12=8
• Average Deviation – mean of absolute
deviations from the median:
$AD = \frac{\sum |X - Md|}{N}$
Note: this definition uses deviations from the median,
whereas the undergrad text uses deviations from the mean.
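A short Python sketch of the range and the average deviation, using the seven scores from the range example. The `average_deviation` helper is a made-up name; it takes the center as an argument so the median-based and mean-based versions can be compared.

```python
import statistics

scores = [12, 14, 14, 16, 16, 18, 20]  # scores from the range example

score_range = max(scores) - min(scores)  # high score minus low score

def average_deviation(values, center):
    """Mean of absolute deviations from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)

ad_median = average_deviation(scores, statistics.median(scores))
ad_mean = average_deviation(scores, statistics.mean(scores))

print(score_range)
print(ad_median)
print(ad_mean)
```

The median-based version is never larger than the mean-based one, because the median is the center that minimizes the sum of absolute deviations.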
Variance
• Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Where $\sigma^2$ means population variance,
• $\mu$ means population mean, and the other
terms have their usual meaning.
• The variance is equal to the average squared
deviation from the mean.
• To compute, take each score and subtract the
mean. Square the result. Find the average
over scores. Ta da! The variance.
Computing the Variance
(N=5)
X        $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
5        15       -10      100
10       15       -5       25
15       15       0        25
20       15       5        25
25       15       10       100
Total:   75       0        250
Mean: 250 / 5 = 50, so the variance is $\sigma^2 = 50$
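The worked table above can be reproduced in a few lines of Python. This is only a sketch of the population formula (dividing by N), with `statistics.pvariance` from the standard library used as a cross-check.

```python
import statistics

scores = [5, 10, 15, 20, 25]
n = len(scores)

mean = sum(scores) / n
deviations = [x - mean for x in scores]   # X minus the mean
squared = [d ** 2 for d in deviations]    # squared deviations
variance = sum(squared) / n               # average squared deviation

# Print one row per score, then the totals and the result.
for x, d, sq in zip(scores, deviations, squared):
    print(f"{x:>3} {mean:>6} {d:>6} {sq:>6}")
print("Total:", sum(scores), sum(deviations), sum(squared))
print("Variance:", variance)
print("Library check:", statistics.pvariance(scores))
```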
Standard Deviation
• Variance is average squared deviation
from the mean.
• To return to original, unsquared units,
we just take the square root of the
variance. This is the standard
deviation.
• Population formula:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Standard Deviation
• Sometimes called the root-mean-square
deviation from the mean. This name
says how to compute it from the inside
out.
• Find the deviation (difference between
the score and the mean).
• Find the deviations squared.
• Find their mean.
• Take the square root.
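The "inside out" reading of root-mean-square deviation maps onto code one step per line. The function name below is made up for this sketch, and the scores are the N=5 example values (5, 10, 15, 20, 25); `statistics.pstdev` is the standard library's population standard deviation, used here only as a check.

```python
import math
import statistics

def root_mean_square_deviation(values):
    mean = sum(values) / len(values)
    deviations = [x - mean for x in values]    # 1. deviation from the mean
    squared = [d ** 2 for d in deviations]     # 2. square each deviation
    mean_square = sum(squared) / len(squared)  # 3. mean of the squares
    return math.sqrt(mean_square)              # 4. square root

scores = [5, 10, 15, 20, 25]
sd = root_mean_square_deviation(scores)

print(round(sd, 2))
print(math.isclose(sd, statistics.pstdev(scores)))
```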
Computing the Standard
Deviation
(N=5)
X        $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
5        15       -10      100
10       15       -5       25
15       15       0        0
20       15       5        25
25       15       10       100
Total:   75       0        250
Mean: Variance is $\sigma^2 = 50$
Sqrt: SD is $\sigma = \sqrt{50} = 7.07$
Example: Age Distribution
[Figure: Distribution of Age (Central Tendency, Variability, and Shape). Histogram of frequency by age, annotated with Mode = 21, Median = 23, Mean = 25.73, and SD = 6.47 (average distance from mean).]
Review
• Range
• Average deviation
• Variance
• Standard Deviation
Standard or z score
• A z score indicates distance from the
mean in standard deviation units.
Formula:
Sample: $z = \frac{X - \bar{X}}{S}$
Population: $z = \frac{X - \mu}{\sigma}$
• Converting to standard or z scores does
not change the shape of the distribution.
Z-scores are not normalized.
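One way to see that standardizing leaves the shape alone is to compare skewness before and after. The sketch below uses hypothetical, positively skewed ages (not the data behind the age example) and a simple moment-based skewness, the mean of the cubed z scores; both helper names are assumptions for illustration.

```python
import math
import statistics

def z_scores(values):
    """Distance of each value from the mean, in population SD units."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

def skewness(values):
    """Moment-based skewness: the mean of the cubed z scores."""
    return statistics.fmean([z ** 3 for z in z_scores(values)])

ages = [19, 20, 21, 21, 22, 23, 25, 30, 45]  # hypothetical, skewed ages
z = z_scores(ages)

print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
print(round(skewness(ages), 4), round(skewness(z), 4))
```

The z scores have mean 0 and SD 1, but the skewness is the same as for the raw ages: a skewed distribution stays skewed after conversion.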
Tchebycheff’s Inequality (1)
• General form: $p(|X - \mu| \ge b) \le \frac{\sigma^2}{b^2}$
Suppose we know mean height in inches is 66 and SD
is 4 inches. We assume nothing about the shape of the
distribution of height. What is the probability of
finding people taller than 74 inches? (Note that b is a
deviation from the mean; in this case 74-66=8.) Also,
74 inches is 2 SDs above the mean; therefore, z = 2.
The inequality gives an upper bound on this probability:
$p \le \frac{\sigma^2}{b^2} = \frac{4^2}{8^2} = \frac{16}{64} = .25$
[If we assume height is normally distributed, p is much
smaller. But we will get to that later.]
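The height example can be sketched in Python. The `chebyshev_bound` function name is made up for illustration; `statistics.NormalDist` is the standard library's normal distribution and is used only to show how much smaller the tail probability becomes if normality is assumed.

```python
from statistics import NormalDist

def chebyshev_bound(sigma, b):
    """Upper bound on p(|X - mu| >= b); assumes only a finite variance."""
    return min(1.0, sigma ** 2 / b ** 2)

mean_height, sd_height, cutoff = 66, 4, 74
b = cutoff - mean_height  # deviation from the mean, here 8 inches

bound = chebyshev_bound(sd_height, b)
# Upper-tail probability if height is assumed to be normally distributed.
normal_tail = 1 - NormalDist(mu=mean_height, sigma=sd_height).cdf(cutoff)

print(bound)
print(round(normal_tail, 4))
```

The bound of .25 holds whatever the shape of the distribution; under the normal assumption the probability of exceeding 74 inches is closer to .02.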
Tchebycheff (2)
• Z-score form: $p\left(\frac{|X - \mu|}{\sigma} \ge k\right) \le \frac{1}{k^2}$
• Probability of z score from any distribution
being more than k SDs from mean is at most $1/k^2$.
For the problem in the previous slide:
$p(|z| \ge 2) \le \frac{1}{k^2} = \frac{1}{2^2} = .25$
• Z-scores from the worst distributions are rarely
more than 5 or less than -5.
• For symmetric, unimodal distributions,
|z| is rarely more than 3.
$p(|z| \ge k) \le \frac{4}{9} \cdot \frac{1}{k^2}$
$p(|z| \ge 3) \le \frac{4}{9} \cdot \frac{1}{3^2} \approx .05$
Review
• Z-score in words
• Z-score in symbols
• Meaning of Tchebycheff’s theorem
Median House Price Data
• Find data
• Show Univariate
• Show plots