Download lecture 05 slides

Central Tendency & Scale Types
Outline
• Central Tendency
– Mean
– Median
– Mode
• Scale Types
– Nominal
– Ordinal
– Interval
– Ratio
• Different statistics for different variables
Central Tendency
• Statistics – simplify a large set of data to a single (meaningful) number
• Central Tendency
– One useful kind of summary information
– Intuitively: typical, average, normal value
• Three statistics for central tendency
– Mean
– Median
– Mode
Mean
• Sum of scores divided by number of scores
• Sample mean: $M = \frac{\sum_{\text{sample}} X}{n}$
IQ: X = [94, 108, 145, 121, 88, 133]
ΣX = 94 + 108 + 145 + 121 + 88 + 133 = 689
M = ΣX/n = 689/6 = 114.83
• Equal apportionment
– If everyone had mean score, total would be the same
• Balance point, seesaw analogy (Fig 3.3)
• Equal upward and downward distances
[Figure: the scores 88, 94, 108, 121, 133, 145 on a number line, with M as the balance point between 108 and 121]
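The three views of the mean above (formula, equal apportionment, balance point) can be checked numerically. This is a minimal sketch using the IQ scores from the slide; the variable names are illustrative only.

```python
# IQ scores from the slide
scores = [94, 108, 145, 121, 88, 133]

n = len(scores)
total = sum(scores)
mean = total / n

# Equal apportionment: if everyone had the mean score, the total is unchanged
apportioned_total = mean * n

# Balance point: total distance above the mean equals total distance below it
deviations = [x - mean for x in scores]
upward = sum(d for d in deviations if d > 0)
downward = -sum(d for d in deviations if d < 0)

print(f"sum = {total}, n = {n}, M = {mean:.2f}")
print(f"apportioned total = {apportioned_total:.2f}")
print(f"upward = {upward:.2f}, downward = {downward:.2f}")
```

The upward and downward distances agree (up to floating-point rounding), which is the seesaw analogy in numbers.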
Population Mean
• Finite population: $\mu = \frac{\sum_{\text{pop}} X}{N}$
• Infinite population: $\mu = \sum_x x \cdot p(x)$
X = [1,1,2,2,2,2,3,3,3,3,3,4,4,4,5,5,6]
x = 1: f(x) = 2, ΣX = 1*2
x = 2: f(x) = 4, ΣX = 2*4
x = 3: f(x) = 5, ΣX = 3*5
$\frac{\sum X}{N} = \frac{\sum_x x \cdot f(x)}{N} = \sum_x x \cdot p(x)$
Probability: p(x) = fraction of population with value x
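The chain of equalities from ΣX/N to Σ x·p(x) can be verified on the slide's data. A minimal sketch: `f` and `p` mirror the slide's f(x) and p(x), and the other names are illustrative.

```python
from collections import Counter

# Population from the slide
X = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6]
N = len(X)

f = Counter(X)                    # f[x]: how many members have value x
p = {x: f[x] / N for x in f}      # p[x]: fraction of the population with value x

mean_direct = sum(X) / N                       # sum of X over N
mean_from_f = sum(x * f[x] for x in f) / N     # sum of x * f(x) over N
mean_from_p = sum(x * p[x] for x in p)         # sum of x * p(x)

print(dict(sorted(f.items())))
print(mean_direct, mean_from_f, mean_from_p)
```

All three expressions give the same mean (up to floating-point rounding), because grouping equal scores only reorders the sum.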
Mean of Infinite Population
• Half of all leprechauns have 1 pot of gold. The other half have 2. Mean?
$\mu = \sum_x x \cdot p(x) = 1 \cdot p(1) + 2 \cdot p(2) = 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{2} = 1.5$
• 92% of dogs have 4 legs. 5% have 3 legs. 2% have 2 legs. 1% have 5 legs.
$\mu = \sum_x x \cdot p(x) = 2 \cdot p(2) + 3 \cdot p(3) + 4 \cdot p(4) + 5 \cdot p(5)$
$= 2 \cdot 2\% + 3 \cdot 5\% + 4 \cdot 92\% + 5 \cdot 1\% = 392\% = 3.92$
• 100 dogs: 92 with 4 legs, 5 with 3 legs, 2 with 2 legs, 1 with 5 legs.
$\mu = \frac{\sum_{\text{pop}} X}{N}$
4+…+4+3+3+3+3+3+2+2+5 = 392
392/100 = 3.92
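Both examples above apply the same rule: multiply each value by its probability and add. A minimal sketch, assuming the distribution is given as a value-to-probability mapping; `population_mean` is a hypothetical helper name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def population_mean(pmf):
    """Return the sum of x * p(x) for a dict mapping value -> probability."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Pots of gold per leprechaun
leprechauns = {1: Fraction(1, 2), 2: Fraction(1, 2)}

# Legs per dog
dogs = {
    2: Fraction(2, 100),
    3: Fraction(5, 100),
    4: Fraction(92, 100),
    5: Fraction(1, 100),
}

print(float(population_mean(leprechauns)))  # 1.5
print(float(population_mean(dogs)))         # 3.92
```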
Median
• Middle value
– Higher than half the scores, lower than other half
• Not average of minimum and maximum
• Sorting approach
Unsorted: X = [4,7,5,8,6,2,1,4,3,5,6,8,7,4,3,6,9]
Sorted: X = [1,2,3,3,4,4,4,5,5,6,6,6,7,7,8,8,9]
Middle value (9th of 17): median = 5
• Same as 50th percentile
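The sorting approach can be sketched in a few lines. The slide's data has an odd number of scores, so the middle value is unambiguous. For an even count this sketch averages the two middle values, a common convention that the slide does not state. `median_by_sorting` is an illustrative name.

```python
import statistics

# Scores from the slide
X = [4, 7, 5, 8, 6, 2, 1, 4, 3, 5, 6, 8, 7, 4, 3, 6, 9]

def median_by_sorting(values):
    """Sort the scores and return the middle one."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Assumption: with an even count, average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sorted(X))
print(median_by_sorting(X))      # 5
print(statistics.median(X))      # standard-library result, for comparison
```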
Mean vs. Median
[Figure: three frequency histograms. Height (inches), frequency in students. Household income, frequency in units of .5 M households, illustrating skew: mean ($63k) vs. median ($46k). Digit span, frequency in subjects, illustrating outliers: mean vs. mean excluding outlier.]
• Both based on a notion of balance
• Mean sensitive to each datum's distance from middle
• Median better for irregular distributions
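The mean's sensitivity to distance shows up when a single extreme score is added. A minimal sketch; the digit-span values below are hypothetical and are not taken from the lecture's figure.

```python
import statistics

# Hypothetical digit-span scores (not the lecture's data)
digit_spans = [5, 6, 6, 7, 7, 7, 8, 8, 9]
with_outlier = digit_spans + [40]   # one hypothetical extreme score

mean_before = statistics.mean(digit_spans)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(digit_spans)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

In this made-up sample the outlier pulls the mean from 7 to 10.3 while the median stays at 7, which is why the median is often preferred for skewed data or data with outliers.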
Mode
• Most common value
• Peak in the distribution for continuous variables
• Simple and insensitive
• Most useful when mean, median not definable
– College majors, sex, favorite color
[Figure: histogram of height (inches, 58 to 74) vs. frequency (students)]
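For variables like college major, the mode needs only counting. A minimal sketch; the list of majors is hypothetical.

```python
from collections import Counter

# Hypothetical majors for six students (labels only, no numeric meaning)
majors = ["psychology", "biology", "psychology", "math", "biology", "psychology"]

counts = Counter(majors)
mode_value, mode_count = counts.most_common(1)[0]

print(counts)
print(f"mode = {mode_value} ({mode_count} students)")
```

No sorting or arithmetic on the values is involved, so the mode is still defined when the mean and median are not.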
Scale types
• We usually use numbers to represent values of variables
– Numbers are just a model or analogy for real world
– Some properties relevant, some superfluous
• Sex
– Males = 1; females = 2
– Females not twice males
• Analogy still limited for more “numerical” variables
– Height, reaction time
– Can’t multiply together
• Numbers have many properties
– Which are relevant for a given variable?
– Determines what kinds of statistics make sense
• Scale of a variable
– Summarizes what numerical properties are meaningful
– 4 types of scales: Nominal, Ordinal, Interval, Ratio
Nominal Scale
• Values are just labels
– Sex: {male, female}
– Color: {red, green, blue, …}
• No structure or relationships between values
• Essentially non-numeric
– Can use numbers for “coding” but just as placeholders
– Red = 1; green = 2; blue = 3
• Only mathematical notion is equality (=)
– Two scores are equal, or they’re not
• Few meaningful statistics
– Frequencies: Number of scores of a given value
– Mode: Value with greatest frequency
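One way to see that nominal codes are only placeholders is to recode the same data and check which statistics survive. A minimal sketch: the color sample and the second coding are hypothetical, and the first coding is the one from the slide.

```python
from collections import Counter
import statistics

# Hypothetical sample of favorite colors
colors = ["red", "blue", "red", "green", "red", "blue"]

coding_a = {"red": 1, "green": 2, "blue": 3}   # coding from the slide
coding_b = {"red": 3, "green": 1, "blue": 2}   # arbitrary alternative coding

def summarize(coding):
    """Return (label of the modal code, mean of the codes) under a coding."""
    codes = [coding[c] for c in colors]
    decode = {number: label for label, number in coding.items()}
    modal_code = Counter(codes).most_common(1)[0][0]
    return decode[modal_code], statistics.mean(codes)

mode_a, mean_a = summarize(coding_a)
mode_b, mean_b = summarize(coding_b)

print(mode_a, mode_b)    # same label under both codings
print(mean_a, mean_b)    # different numbers, neither meaningful
```

The mode names the same color under either coding, while the "mean color" changes with the arbitrary choice of numbers.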
Ordinal Scale
• Values are ordered, but differences aren’t meaningful
– Preferences, contest placings, years of education
– 1st – 2nd ≠ 2nd – 3rd
• Mathematical notion of greater-than (>, =)
• Additional meaningful statistics
– Median, quantiles
– Range, interquartile range
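Because only order matters on an ordinal scale, any order-preserving relabeling of the values is equally valid. The median follows such a relabeling, while the mean does not. A minimal sketch with hypothetical ratings; `stretch` is an arbitrary order-preserving function chosen for illustration.

```python
import statistics

# Hypothetical ordinal codes (e.g. preference ranks); odd count, so one middle value
ratings = [1, 2, 2, 3, 5]

def stretch(x):
    """Order-preserving relabeling for positive codes: changes spacing, keeps order."""
    return x ** 2

rescaled = [stretch(x) for x in ratings]

# Does the statistic of the relabeled data equal the relabeled statistic?
median_commutes = statistics.median(rescaled) == stretch(statistics.median(ratings))
mean_commutes = statistics.mean(rescaled) == stretch(statistics.mean(ratings))

print(median_commutes)  # True
print(mean_commutes)    # False
```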
Interval Scale
• Differences between scores are meaningful
– Today 4° warmer than yesterday
• Ratios of scores not meaningful
– 2° not twice as hot as 1°
– No real zero point
– E.g. Fahrenheit vs. Celsius; IQ
• Mathematical notion of subtraction (–, >, =)
• Additional meaningful statistics
– Mean
– Variance, standard deviation
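Converting the same temperatures between Celsius and Fahrenheit shows why differences are meaningful on an interval scale but ratios are not. A minimal sketch; the three temperatures are hypothetical and `c_to_f` is the standard conversion formula.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Hypothetical temperatures on three days, in Celsius
day1_c, day2_c, day3_c = 1.0, 2.0, 6.0
day1_f, day2_f, day3_f = c_to_f(day1_c), c_to_f(day2_c), c_to_f(day3_c)

# Ratio of scores: depends on the unit, so "twice as hot" is not meaningful
ratio_c = day2_c / day1_c
ratio_f = day2_f / day1_f

# Comparison of differences: the same in either unit
diff_ratio_c = (day3_c - day2_c) / (day2_c - day1_c)
diff_ratio_f = (day3_f - day2_f) / (day2_f - day1_f)

print(f"score ratio:      {ratio_c:.3f} (C) vs {ratio_f:.3f} (F)")
print(f"difference ratio: {diff_ratio_c:.3f} (C) vs {diff_ratio_f:.3f} (F)")
```

The score ratio changes with the unit because the zero point is arbitrary. The comparison of differences does not.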
Ratio Scale
• Zero is meaningful
– Weight, time, etc.
• Ratios between scores make sense
– Twice as heavy, twice as long
• Mathematical notion of division (/, –, >, =)
• No notable new statistics
Summary of Scale Types
Scale      Meaningful Operations   New Statistic
Nominal    =                       Mode
Ordinal    > =                     Median
Interval   – > =                   Mean
Ratio      / – > =                 (none)
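The cumulative structure of the scale types (each scale keeps the statistics of the scales before it) can be written as a small lookup. A minimal sketch: the statistic lists come from the per-scale slides, while `SCALES`, `NEW_STATS`, and `meaningful_statistics` are hypothetical names.

```python
# Scales in order of increasing structure
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each scale (from the slides)
NEW_STATS = {
    "nominal": ["frequencies", "mode"],
    "ordinal": ["median", "quantiles", "range", "interquartile range"],
    "interval": ["mean", "variance", "standard deviation"],
    "ratio": [],  # no notable new statistics
}

def meaningful_statistics(scale):
    """Return every statistic meaningful at this scale, including inherited ones."""
    position = SCALES.index(scale)
    result = []
    for name in SCALES[: position + 1]:
        result.extend(NEW_STATS[name])
    return result

for scale in SCALES:
    print(f"{scale}: {', '.join(meaningful_statistics(scale))}")
```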