Descriptive Statistics II: Variability
9/3
Attendance Question
Really, how many courses?
A: 4
B: 5
C: 6
D: 7
E: 8
Variability
• Central tendency locates the middle of a distribution
• How are scores distributed around that point?
• Low variability vs. high variability
[Figure: two histograms of maze attempts (x-axis: Maze Attempts, 1 to 11; y-axis: Frequency (rats)), one with low and one with high variability]
• Ways to measure variability
– Range
– Interquartile range
– Variance
– Standard deviation
Why Variability Is Important
• Inference
– Reliability of estimators
• For its own sake
– Consistency (manufacturing, sports, etc.)
– Diversity (attitudes, strategies)
[Figure: samples drawn from populations with μ = 100 but different variability, with the mean M of each sample marked]
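A small simulation can make the "reliability of estimators" point concrete. This is a sketch under assumed settings, not the data from the slide: normal populations with μ = 100 and hypothetical SDs of 5 and 50, samples of n = 10, and 2000 repetitions. It measures how much the sample mean M varies from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(pop_sd, n=10, reps=2000, mu=100.0, seed=0):
    """SD of the sample mean M across `reps` samples of size n drawn from a
    normal population with mean mu and SD pop_sd (assumed for illustration)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, pop_sd) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

low = spread_of_sample_means(pop_sd=5)    # hypothetical low-variability population
high = spread_of_sample_means(pop_sd=50)  # hypothetical high-variability population
print(f"SD of M, low variability:  {low:.2f}")
print(f"SD of M, high variability: {high:.2f}")
```

With high variability, individual sample means land much farther from μ = 100, so any single M is a less reliable estimate of μ.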
Range
• Distance from minimum to maximum, plus the measurement unit (or precision):
$\text{range} = \max(X) - \min(X) + \text{unit}$
– Maze attempts (whole numbers, 1 to 11): (11 – 1) + 1 = 11, i.e., 11 possible values
– Heights in inches, X = [66.2, 78.6, 69.6, 65.3, 62.7] (precision .1): 78.6 – 62.7 + .1 = 16.0
• Sample range depends on n: larger samples tend to include more extreme scores, so their range tends to be larger
• More useful as population parameter
– Theoretical property of measurement variable
– E.g. memory test: min and max possible
– Rough guidelines, e.g. height
[Figure: histogram of maze attempts (x-axis: Maze Attempts; y-axis: Frequency (rats)) and histogram of heights (Height (Inches); y-axis: Frequency (students))]
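A minimal sketch of the range rule above, plus a simulation of why the sample range depends on n. The maze list is hypothetical (only its min of 1 and max of 11 come from the slide); `measured_range` and `mean_sample_range` are illustrative helper names, and the standard normal population in the simulation is an assumption.

```python
import random

def measured_range(xs, unit):
    """Range as max - min plus one measurement unit (precision)."""
    return max(xs) - min(xs) + unit

# Hypothetical maze data: only the min (1) and max (11) matter for the range.
maze_attempts = [1, 3, 4, 4, 5, 6, 6, 7, 9, 11]
# Height sample (inches) from the slide, recorded to the nearest .1 inch.
heights = [66.2, 78.6, 69.6, 65.3, 62.7]

print(measured_range(maze_attempts, 1))        # 11
print(round(measured_range(heights, 0.1), 1))  # 16.0 (rounded to hide float error)

def mean_sample_range(n, reps=2000, seed=1):
    """Average max - min over many samples of size n from a standard normal
    population (an assumption chosen for illustration)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / reps

# The average sample range grows with n, although the population never changes.
for n in (5, 20, 100):
    print(n, round(mean_sample_range(n), 2))
```

This is why the range is more useful as a population property than as a sample statistic.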
Interquartile range
• Quartiles
– Values of X based on dividing data into quarters
– 1st quartile: greater than 1/4 of data
– 3rd quartile: greater than 3/4 of data
– 2nd quartile = median
• Interquartile range
– Difference between 1st and 3rd quartiles
– Like range, but for middle half of distribution
– Not sensitive to n, so more stable
Example: X = [1,1,2,2,2,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7,7,7,8]
1st quartile = 3
3rd quartile = 6
IQR = 6 – 3 = 3
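The quartiles in the example can be checked with Python's standard `statistics.quantiles`. Note that several quartile conventions exist; the default `method="exclusive"` used here is one of them, and other conventions (or other software) can give slightly different cut points on other data sets.

```python
import statistics

X = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 8]

# With n=4, quantiles() returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(X, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 5.0 6.0
print(iqr)         # 3.0
```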
Variance
• Most sophisticated statistic for variability
• Based on distance of each datum to the mean: |X – μ|
[Figure: number line with scores 88, 94, 108, 122, 133, 145 and their mean μ = 115]
• Could compute average of distances
• Instead use squared distance, (X – μ)²:
27² = 729, 21² = 441, 7² = 49, 7² = 49, 18² = 324, 30² = 900
• Variance: average squared difference from mean
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{729 + 441 + 49 + 49 + 324 + 900}{6} = \frac{2492}{6} \approx 415.3$
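The variance calculation for the six example scores, written out step by step as a sketch. The mean absolute distance is included only to contrast with the "average of distances" option the slide mentions; `statistics.pvariance` is the standard-library population variance (divides by N), used as a cross-check.

```python
import statistics

X = [88, 94, 108, 122, 133, 145]
N = len(X)
mu = sum(X) / N                                  # 115.0
sq_dev = [(x - mu) ** 2 for x in X]              # [729.0, 441.0, 49.0, 49.0, 324.0, 900.0]
variance = sum(sq_dev) / N                       # 2492 / 6
mean_abs_dev = sum(abs(x - mu) for x in X) / N   # the "average of distances" alternative

print(round(variance, 1))      # 415.3
print(round(mean_abs_dev, 1))  # 18.3

# statistics.pvariance uses the same population formula (divides by N).
assert abs(variance - statistics.pvariance(X)) < 1e-9
```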
Why squared difference?
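• Special property of mean
– Given population X and some single value X̂, define the Mean Squared Error:
$\text{MSE} = \frac{\sum (X - \hat{X})^2}{N}$
– What X̂ minimizes MSE?
• Mean minimizes error: X̂ = μ
– Expanding the square gives $\text{MSE} = \sigma^2 + (\hat{X} - \mu)^2$, which is smallest at X̂ = μ, where MSE = σ²
• Variance is intrinsic, unavoidable error
[Figure: the scores 88, 94, 108, 122, 133, 145 with candidate values of X̂ marked; MSE plotted against X̂ (y-axis from 0 to 2000), reaching its minimum, σ², at X̂ = μ]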
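A quick numerical check of the "mean minimizes MSE" property, using the same six scores. The grid of whole-number candidates from 80 to 150 is an arbitrary choice for illustration; `mse` is a hypothetical helper name.

```python
X = [88, 94, 108, 122, 133, 145]
N = len(X)

def mse(x_hat):
    """Mean squared error of predicting every score in X with the single value x_hat."""
    return sum((x - x_hat) ** 2 for x in X) / N

mu = sum(X) / N

# Coarse grid search over candidate predictions (whole numbers 80..150).
candidates = range(80, 151)
best = min(candidates, key=mse)

print(best, round(mse(best), 1))  # 115 415.3  (the minimum MSE is the variance)
print(round(mse(108), 1))         # 464.3, i.e. 415.3 + (115 - 108)**2
```

Any prediction other than μ adds (X̂ – μ)² on top of σ², so σ² is the error that cannot be avoided.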
Alternative formula for Variance
$\sigma^2 = \frac{\sum X^2}{N} - \mu^2$
• Mean of squares minus square of mean
• If every score equals the mean, X = [μ, μ, μ, …]:
X² = [μ², μ², μ², …]
Mean(X²) = μ²
σ² = μ² – μ² = 0
• As scores move away from μ (some up, some down), Σ(X²) increases but μ stays the same
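A sketch checking that the definitional formula and the "mean of squares minus square of mean" formula agree on the six example scores, and that a constant list (a hypothetical example) gives zero under the alternative formula.

```python
X = [88, 94, 108, 122, 133, 145]
N = len(X)
mu = sum(X) / N

definitional = sum((x - mu) ** 2 for x in X) / N     # average squared deviation
computational = sum(x * x for x in X) / N - mu ** 2  # mean of squares minus square of mean

print(round(definitional, 1), round(computational, 1))  # 415.3 415.3

# Hypothetical constant data: every score equals the mean, so the variance is 0.
const = [7.0] * 5
mu_c = sum(const) / len(const)
computational_const = sum(x * x for x in const) / len(const) - mu_c ** 2
print(computational_const)  # 0.0
```

In floating point, the mean-of-squares form subtracts two large, nearly equal numbers when the mean is large relative to the spread, which can lose precision; numerical software therefore usually works from deviations.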
Standard deviation
• Typical difference between X and μ
• Again, based on (X – μ)²
• Variance is average squared deviation, so sqrt(variance) is standard deviation:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Example: X = [5, 3, 7, 6, 4, 6, 8, 7, 4, 2, 3, 5], N = 12, μ = 5
X – μ = [0, -2, 2, 1, -1, 1, 3, 2, -1, -3, -2, 0]
Square: (X – μ)² = [0, 4, 4, 1, 1, 1, 9, 4, 1, 9, 4, 0]
Average: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{38}{12} \approx 3.2$
Square-root: $\sigma = \sqrt{38/12} \approx 1.8$
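The square, average, square-root steps for the example can be traced in a short sketch; `statistics.pstdev` (population standard deviation) serves as a cross-check.

```python
import math
import statistics

X = [5, 3, 7, 6, 4, 6, 8, 7, 4, 2, 3, 5]
N = len(X)
mu = sum(X) / N                # 5.0
dev = [x - mu for x in X]      # deviations from the mean
sq = [d ** 2 for d in dev]     # step 1: square
variance = sum(sq) / N         # step 2: average -> 38 / 12
sd = math.sqrt(variance)       # step 3: square root

print(round(variance, 1), round(sd, 1))  # 3.2 1.8
assert math.isclose(sd, statistics.pstdev(X))
```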