The Mean
There are several TYPES of variables
that reflect characteristics of the data
Ratio
Interval
Ordinal
Nominal
Ratio scale
--> constant size interval between adjacent values on the measurement scale
--> existence of a meaningful zero point
Interval scale
--> constant size interval between adjacent values on the measurement scale
--> no true zero value
[Figure: a compass (N, W, E, S) and a scale reading 10, 0, -10]
Ordinal scale
--> data that convey only relative magnitude
Eg Dark, Medium, Light
Eg Tall, Medium, Short
Nominal scale
--> data in which there is no meaningful numerical information
Eg Single, Married, Divorced, Widowed
Another useful classification
Continuous
--> data can take on any value
Eg height, 150 to 210 cm range
Bill - 174.25 cm
Discrete
--> data can take on only certain values
Eg # of hands, 0 to 3 range
Bill - 2 hands
2 more important issues with data
Accuracy --> how close a measured value is to the real value
Precision --> how close repeated measurements are to one another
Let’s say Bill’s real height is 174.25 cm.
Repeated measurements (cm):

Accurate, Precise   Not Accurate, Not Precise   Not Accurate, Precise
174.25              172                         170.25
174.25              178                         170.25
174.25              171                         170.25
174.25              174                         170.25
174.25              182                         170.25
174.25              168                         170.25
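The three sets of measurements above can be summarised numerically. The sketch below is one possible way to do it, not something taken from the slides: it assumes accuracy is scored as the mean absolute error against the real value, and precision as the standard deviation of the repeated measurements. The function name `accuracy_and_precision` is made up for illustration.

```python
from statistics import mean, pstdev

TRUE_HEIGHT = 174.25  # Bill's real height (cm), from the slide


def accuracy_and_precision(measurements, true_value):
    """Return (mean absolute error, standard deviation) for repeated measurements.

    Assumption: a small mean absolute error means "accurate",
    a small standard deviation means "precise".
    """
    mae = mean([abs(m - true_value) for m in measurements])
    spread = pstdev(measurements)
    return mae, spread


accurate_precise = [174.25] * 6
not_accurate_not_precise = [172, 178, 171, 174, 182, 168]
not_accurate_precise = [170.25] * 6

for label, data in [
    ("Accurate, Precise", accurate_precise),
    ("Not Accurate, Not Precise", not_accurate_not_precise),
    ("Not Accurate, Precise", not_accurate_precise),
]:
    mae, spread = accuracy_and_precision(data, TRUE_HEIGHT)
    print(f"{label}: mean abs error = {mae:.2f} cm, std dev = {spread:.2f} cm")
```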
Frequency Distribution
--> occurrence of the various values observed for the variable
--> raw frequency: counts
--> relative frequency: counts divided by total number of observations
Name      Height (cm)   Hair Colour
Anne      168           Brown
Rishi     178           Black
Bill      183           Brown
Cristin   172           Brown
Rich      175           Black
Variable: Hair Colour
Sample size = 5
Frequency of Black Hair = 2
Frequency of Brown Hair = 3
Must add to 5
Relative Frequency of Black Hair = 2/5 = 0.4
Relative Frequency of Brown Hair = 3/5 = 0.6
Must add to 1
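As a minimal sketch of the counting above, the hair colour column can be tallied with Python's standard `collections.Counter`. The helper name `frequencies` is an illustrative choice, not part of the original slides.

```python
from collections import Counter

# Hair colour for Anne, Rishi, Bill, Cristin, Rich (from the table above).
hair = ["Brown", "Black", "Brown", "Brown", "Black"]


def frequencies(values):
    """Return (raw, relative) frequency dictionaries for a list of observations."""
    raw = Counter(values)
    n = len(values)
    relative = {value: count / n for value, count in raw.items()}
    return dict(raw), relative


raw, relative = frequencies(hair)
print(raw)       # raw counts, which must add to the sample size
print(relative)  # relative frequencies, which must add to 1
```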
Variable: Height
Sample size = 5
Frequency of 168 cm = 1
Frequency of 172 cm = 1
Frequency of 175 cm = 1
Frequency of 178 cm = 1
Frequency of 183 cm = 1
Relative Frequency of 168 cm = 1/5 = 0.2
Relative Frequency of 172 cm = 1/5 = 0.2
Relative Frequency of 175 cm = 1/5 = 0.2
Relative Frequency of 178 cm = 1/5 = 0.2
Relative Frequency of 183 cm = 1/5 = 0.2
Make categories
Eg. Number above and number below midpoint of range
Range: Maximum - Minimum
183 cm - 168 cm = 15 cm
Mid-point: half way between Min and Max
= Min + (Range / 2)
= 168 cm + 7.5 cm
= 175.5 cm
Frequency of Heights Below 175.5 cm = 3
Frequency of Heights Above 175.5 cm = 2
Relative Frequency of Heights Below 175.5 cm =
3/5 = 0.6
Relative Frequency of Heights Above 175.5 cm =
2/5 = 0.4
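The two-category split can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function name `split_at_midpoint` is hypothetical.

```python
heights = [168, 178, 183, 172, 175]  # cm, from the table of five people


def split_at_midpoint(values):
    """Count observations below and above the mid-point of the range."""
    lo, hi = min(values), max(values)
    midpoint = lo + (hi - lo) / 2          # Min + (Range / 2)
    below = sum(1 for v in values if v < midpoint)
    above = sum(1 for v in values if v > midpoint)
    return midpoint, below, above


midpoint, below, above = split_at_midpoint(heights)
n = len(heights)
print(f"Mid-point: {midpoint} cm")
print(f"Below: {below} (relative {below / n}), Above: {above} (relative {above / n})")
```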
Could make THREE categories
Divide range by 3: 15 cm / 3 = 5 cm
Category 1: 168 cm to 168 cm + 5 cm
--> 168 cm to 173 cm
Category 2: 174 cm to 174 cm + 5 cm
--> 174 cm to 179 cm
Category 3: 180 cm to 180 cm + 5 cm
--> 180 cm to 185 cm
Frequency of Heights in 168 cm to 173 cm = 2
Frequency of Heights in 174 cm to 179 cm = 2
Frequency of Heights in 180 cm to 185 cm = 1
Relative Frequency of Heights in 168 cm to 173 cm = 2/5 = 0.4
Relative Frequency of Heights in 174 cm to 179 cm = 2/5 = 0.4
Relative Frequency of Heights in 180 cm to 185 cm = 1/5 = 0.2
Mother’s age and babies’ birth weight data from Massachusetts
(within each block, ages and birth weights pair up by position)

Observations 1-33
Age (years): 19, 33, 20, 21, 18, 21, 22, 17, 29, 26, 19, 19, 22, 30, 18, 18, 15, 25, 20, 28, 32, 31, 36, 28, 25, 28, 17, 29, 26, 17, 17, 24, 35
Birth weight (g): 2523, 2551, 2557, 2594, 2600, 2622, 2637, 2637, 2663, 2665, 2722, 2733, 2750, 2750, 2769, 2769, 2778, 2782, 2807, 2821, 2835, 2835, 2836, 2863, 2877, 2877, 2906, 2920, 2920, 2920, 2920, 2948, 2948

Observations 34-66
Age (years): 25, 25, 29, 19, 27, 31, 33, 21, 19, 23, 21, 18, 18, 32, 19, 24, 22, 22, 23, 22, 30, 19, 16, 21, 30, 20, 17, 17, 23, 24, 28, 26, 20
Birth weight (g): 2977, 2977, 2977, 2977, 2992, 3005, 3033, 3042, 3062, 3062, 3062, 3076, 3076, 3080, 3090, 3090, 3090, 3100, 3104, 3132, 3147, 3175, 3175, 3203, 3203, 3203, 3225, 3225, 3232, 3232, 3234, 3260, 3274

Observations 67-99
Age (years): 24, 28, 20, 22, 22, 31, 23, 16, 16, 18, 25, 32, 20, 23, 22, 32, 30, 20, 23, 17, 19, 23, 36, 22, 24, 21, 19, 25, 16, 29, 29, 19, 19
Birth weight (g): 3274, 3303, 3317, 3317, 3317, 3321, 3331, 3374, 3374, 3402, 3416, 3430, 3444, 3459, 3460, 3473, 3475, 3487, 3544, 3572, 3572, 3586, 3600, 3614, 3614, 3629, 3629, 3637, 3643, 3651, 3651, 3651, 3651

Observations 100-132
Age (years): 30, 24, 19, 24, 23, 20, 25, 30, 22, 18, 16, 32, 18, 29, 33, 20, 28, 14, 28, 25, 16, 20, 26, 21, 22, 25, 31, 35, 19, 24, 45, 28, 29
Birth weight (g): 3699, 3728, 3756, 3770, 3770, 3770, 3790, 3799, 3827, 3856, 3860, 3860, 3884, 3884, 3912, 3940, 3941, 3941, 3969, 3983, 3997, 3997, 4054, 4054, 4111, 4153, 4167, 4174, 4238, 4593, 4990, 709, 1021

Observations 133-165
Age (years): 34, 25, 25, 27, 23, 24, 24, 21, 32, 19, 25, 16, 25, 20, 21, 24, 21, 20, 25, 19, 19, 26, 24, 17, 20, 22, 27, 20, 17, 25, 20, 18, 18
Birth weight (g): 1135, 1330, 1474, 1588, 1588, 1701, 1729, 1790, 1818, 1885, 1893, 1899, 1928, 1928, 1928, 1936, 1970, 2055, 2055, 2082, 2084, 2084, 2100, 2125, 2126, 2187, 2187, 2211, 2225, 2240, 2240, 2282, 2296

Observations 166-189
Age (years): 20, 21, 26, 31, 15, 23, 20, 24, 15, 23, 30, 22, 17, 23, 17, 26, 20, 26, 14, 28, 14, 23, 17, 21
Birth weight (g): 2296, 2301, 2325, 2353, 2353, 2367, 2381, 2381, 2381, 2395, 2410, 2410, 2414, 2424, 2438, 2442, 2450, 2466, 2466, 2466, 2495, 2495, 2495, 2495
Range of the Birth Weight data:
Minimum: 709 g
Maximum: 4990 g
Difference: 4281 g
Let’s say we want to look at the distribution of data across 10 categories.
Each category would span 428.1 g, but for convenience we’ll round to 430 g.
Also, instead of starting our first category at 709 g we’ll use 700 g.

Category   Range (g)    Freq.   Rel. Freq.
1          700-1130     3       0.015873016
2          1131-1560    3       0.015873016
3          1561-1990    14      0.074074074
4          1991-2420    29      0.153439153
5          2421-2850    34      0.17989418
6          2851-3280    44      0.232804233
7          3281-3710    33      0.174603175
8          3711-4140    23      0.121693122
9          4141-4570    4       0.021164021
10         4571-5000    2       0.010582011
Previous breakdown ok as long as I have measured weight to the nearest gram.
BUT, if I’ve measured to the nearest 0.1 gram
--> my categories may miss some observations (eg 1130.5 g falls between 700-1130 and 1131-1560)
So need to adjust…
Category   Range (measured to the nearest gram)   Range (measured to the nearest 0.1 gram)
1          700-1130                               700-1130.9
2          1131-1560                              1131-1560.9
3          1561-1990                              1561-1990.9
4          1991-2420                              1991-2420.9
5          2421-2850                              2421-2850.9
6          2851-3280                              2851-3280.9
7          3281-3710                              3281-3710.9
8          3711-4140                              3711-4140.9
9          4141-4570                              4141-4570.9
10         4571-5000                              4571-5000.9
Histogram - graphical representation of a frequency distribution
[Figure: histogram of Hair colour (Brown Hair, Black Hair); frequency axis 0 to 3]
[Figure: Frequency distribution of neonatal birth weight; frequency (0 to 50) by Birth Weight Category (1 to 10)]
[Figure: Frequency distribution of neonatal birth weight; relative frequency (0 to 0.25) by Birth Weight Category (1 to 10)]
Category   Range (g)    Mid-point
1          700-1130     915
2          1131-1560    1346
3          1561-1990    1776
4          1991-2420    2206
5          2421-2850    2636
6          2851-3280    3066
7          3281-3710    3496
8          3711-4140    3926
9          4141-4570    4356
10         4571-5000    4786
[Figure: Frequency distribution of neonatal birth weight; frequency (0 to 50) by Birth Weight Category Mid-point]
Cumulative Frequency - Cum. Freq. at any
category is equal to the frequency at that
category plus the frequency in each previous
category.
Category   Range (g)    Freq.   Rel. Freq.   Cum. Freq. (relative)
1          700-1130     3       0.0158       0.0158
2          1131-1560    3       0.0158       0.0317
3          1561-1990    14      0.07407      0.1058
4          1991-2420    29      0.15343      0.2592
5          2421-2850    34      0.17989      0.4391
6          2851-3280    44      0.23280      0.6719
7          3281-3710    33      0.17460      0.8465
8          3711-4140    23      0.12169      0.9682
9          4141-4570    4       0.02116      0.9894
10         4571-5000    2       0.01058      1.0
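A short sketch of how the relative and cumulative columns can be derived from the raw counts, using `itertools.accumulate` from the Python standard library. The variable names are illustrative only, and the frequencies are taken from the table rather than recomputed from the raw data.

```python
from itertools import accumulate

# Frequencies for birth weight categories 1 to 10, copied from the table.
freq = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]

n = sum(freq)                             # total number of observations
rel_freq = [f / n for f in freq]          # counts divided by total
cum_freq = list(accumulate(freq))         # frequency plus all previous frequencies
cum_rel_freq = [c / n for c in cum_freq]  # cumulative frequency as a proportion

for cat, (f, r, c, cr) in enumerate(
    zip(freq, rel_freq, cum_freq, cum_rel_freq), start=1
):
    print(f"{cat:>2}  {f:>3}  {r:.4f}  {c:>3}  {cr:.4f}")
```

The last cumulative count equals n and the last cumulative proportion equals 1, which is a quick check that no category was missed.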
[Figure: Frequency distribution of neonatal birth weight; cumulative frequency axis (0 to 1.2) by Birth Weight Category (1 to 10)]
Measures of Central Tendency
--> These generally tell you where the majority of the observations lie
--> Each one tells something slightly different
Mean - Average
Median - Middle Value
Mode - Most Frequent Value
The Mean:
The mean is calculated by summing the observed values and dividing the sum by the total number of observations.
Population Mean = $\mu$
Sample Mean = $\bar{X}$
A die has 6 sides: 1 dot, 2, 3, 4, 5, and 6
$\mu = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$ dots
$\bar{X} = \frac{2 + 3 + 4}{3} = 3$ dots
$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$
Name      Observation i   Height X_i
Rishi     1               172
Anne      2               185
Bill      3               132
Cristin   4               191
Rich      5               205
n = 5
$\sum X_i = 885$
$\bar{X} = \frac{\sum X_i}{n} = \frac{885}{5} = 177$
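The same calculation as a small Python sketch. The function name `mean` is our own; the standard library's `statistics.mean` would do the same job.

```python
def mean(values):
    """Sum the observed values and divide by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


heights = [172, 185, 132, 191, 205]  # Rishi, Anne, Bill, Cristin, Rich
die_faces = [1, 2, 3, 4, 5, 6]       # the whole population of die faces

print(sum(heights), len(heights), mean(heights))
print(mean(die_faces))
```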
For the birth weight data: n = 189
$\sum_{i=1}^{189} X_i = 556540$
$\bar{X} = \frac{\sum X_i}{n} = \frac{556540}{189} = 2944.656$
Another way to calculate the mean
Suppose you had a frequency distribution for the number of
cancerous moles on people who regularly visit Club Med
# cancerous moles (X)   Frequency (f)   f*X
0                       8               0
1                       4               4
2                       8               16
3                       10              30
4                       2               8
5                       1               5
n = $\sum f$ = 33
$\sum X$ = $\sum f \cdot X$ = 63
$\bar{X} = \frac{\sum f \cdot X}{\sum f} = \frac{63}{33} = 1.909$
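A minimal sketch of the frequency-weighted mean, assuming the distribution is stored as a dictionary mapping each value X to its frequency f. The name `mean_from_frequencies` is illustrative.

```python
# Number of cancerous moles (X) -> frequency (f), from the table above.
moles = {0: 8, 1: 4, 2: 8, 3: 10, 4: 2, 5: 1}


def mean_from_frequencies(dist):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    n = sum(dist.values())                       # n = sum of the f's
    total = sum(x * f for x, f in dist.items())  # sum of X's = sum of f * x
    return total / n


n = sum(moles.values())
total = sum(x * f for x, f in moles.items())
print(n, total, round(mean_from_frequencies(moles), 3))
```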
The Mode: the most frequently occurring value in a set of
measurements
[Figure: Frequency distribution of neonatal birth weight; frequency (0 to 50) by Birth Weight Category (1 to 10)]

Category   Range (g)    Freq.   Rel. Freq.
1          700-1130     3       0.015873016
2          1131-1560    3       0.015873016
3          1561-1990    14      0.074074074
4          1991-2420    29      0.153439153
5          2421-2850    34      0.17989418
6          2851-3280    44      0.232804233
7          3281-3710    33      0.174603175
8          3711-4140    23      0.121693122
9          4141-4570    4       0.021164021
10         4571-5000    2       0.010582011
Category 6 (2851-3280) has the highest frequency (44).
Mid-point is 3065.5 --> report the MODE as 3065.5
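For grouped data, the mode can be sketched as "find the category with the largest frequency and report its mid-point". The class limits below are generated from the 430 g interval described earlier, which is an assumption of this sketch rather than something read off the table row by row.

```python
# Lower and upper class limits implied by a 430 g interval starting at 700 g.
lower = [700] + [1131 + 430 * k for k in range(9)]
upper = [1130 + 430 * k for k in range(10)]
freq = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]

# Index of the category with the highest frequency (the modal class).
modal = max(range(len(freq)), key=lambda k: freq[k])
mode_midpoint = (lower[modal] + upper[modal]) / 2

print(f"Modal category: {modal + 1} ({lower[modal]}-{upper[modal]} g)")
print(f"Mode reported as mid-point: {mode_midpoint} g")
```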
The Median: the middle measurement of a set of data
--> data must be ordered
Observation   Heights (cm)   Ordered Heights (cm)
1             178            123
2             143            143
3             123            168
4             189            173
5             187            178
6             205            187
7             168            189
8             173            198
9             198            205
Median is 178 cm
Observation   Heights (cm)   Ordered Heights (cm)
1             178            123
2             143            143
3             123            162
4             189            168
5             187            173
6             205            178
7             168            187
8             173            189
9             198            198
10            162            205
Middle observation is 5.5 --> median is midway between observation 5 and observation 6
Median is (173+178)/2 = 175.5
General formula for Median:
If n is an odd number:
$M = X_{(n+1)/2} = X_{(9+1)/2} = X_{5} = 178$
General formula for Median:
If n is an even number:
$M = X_{(n+1)/2} = X_{(10+1)/2} = X_{5.5} = \frac{X_5 + X_6}{2} = \frac{173 + 178}{2} = 175.5$
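Both cases can be handled by one small function. This is a sketch for illustration (the standard library's `statistics.median` behaves the same way); note that Python lists are 0-indexed, so ordered observation 5 sits at index 4.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)  # data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # observation (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # midway between the two middle ones


odd_heights = [178, 143, 123, 189, 187, 205, 168, 173, 198]
even_heights = odd_heights + [162]

print(median(odd_heights))
print(median(even_heights))
```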
# cancerous moles (X)   Frequency (f)   Cumulative Frequency
0                       8               8
1                       4               12
2                       8               20
3                       10              30
4                       2               32
5                       1               33
Ordered observations:
0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 5
$M = X_{(n+1)/2} = X_{(33+1)/2} = X_{17} = 2$
Category   Range (g)    Freq.   Cum. Freq.
1          700-1130     3       3
2          1131-1560    3       6
3          1561-1990    14      20
4          1991-2420    29      49
5          2421-2850    34      83
6          2851-3280    44      127
7          3281-3710    33      160
8          3711-4140    23      183
9          4141-4570    4       187
10         4571-5000    2       189
$M = X_{(n+1)/2} = X_{190/2} = X_{95}$
--> observation 95 falls in category 6 (cum. freq. 83 before it, 127 after it)
Median =
(lower limit of class) + ((0.5*n - cum. freq. of the previous class) / # obs in interval) * (interval size)
= 2851 + ((0.5*189 - 83)/44) * 430
= 2851 + ((94.5 - 83)/44) * 430
= 2963.4
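The interpolation formula can be sketched as a function that walks through the categories until the running total reaches half the observations. The name `grouped_median` and its parameters are hypothetical, and the lower limits are generated from the 430 g interval as an assumption of this sketch.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data by linear interpolation within the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    half = 0.5 * n
    cum_before = 0  # cumulative frequency of the previous classes
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= half:
            # lower limit + ((0.5*n - cum. freq.) / # obs in interval) * interval size
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("frequencies do not reach half of n")


# Lower class limits implied by a 430 g interval (700, 1131, 1561, ...).
lower_limits = [700] + [1131 + 430 * k for k in range(9)]
freq = [3, 3, 14, 29, 34, 44, 33, 23, 4, 2]

print(round(grouped_median(lower_limits, freq, 430), 1))
```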
[Figure: Frequency distribution of neonatal birth weight; frequency (0 to 50) by Birth Weight Category (1 to 10)]
Symmetrical, unimodal distribution
[Figure: histogram with Mean, Mode and Median labelled together]
Symmetrical, bimodal distribution
[Figure: histogram with Mean and Median labelled together, and two Modes]
Asymmetric distribution
[Figure: histogram with labels, in order: Mode, Median, Mean]
Asymmetric distribution
[Figure: histogram with labels, in order: Mean, Median, Mode]
Measures of Dispersion and Variability
[Figure: Frequency distribution of neonatal birth weight; frequency (0 to 50) by Birth Weight Category (1 to 10)]
[Figure: birth weight scale (0 to 5500) with Maximum, Mean and Minimum marked]
[Figure: same scale with the Deviation of Observation i from the Mean marked]
Average Deviation from the Mean
--> on average, how much do the individual observations differ from the mean?
$\frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n}$

i     X_i    X_i - \bar{X}
1     1.2    1.2 - 1.8 = -0.6
2     1.4    -0.4
3     1.6    -0.2
4     1.8    0.0
5     2.0    0.2
6     2.2    0.4
7     2.4    0.6
$\sum X_i = 12.6$, n = 7
$\bar{X} = \frac{12.6}{7} = 1.8$
$\sum_{i=1}^{7} (X_i - \bar{X}) = 0$
--> deviations above and below the mean cancel, so this sum is always 0
[Figure: observations plotted with their deviations from the mean]
Average Absolute Deviation from the Mean
--> on average, how much do the individual observations differ from the mean?
$\frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$

i     X_i    X_i - \bar{X}       |X_i - \bar{X}|
1     1.2    1.2 - 1.8 = -0.6    |1.2 - 1.8| = 0.6
2     1.4    -0.4                0.4
3     1.6    -0.2                0.2
4     1.8    0.0                 0.0
5     2.0    0.2                 0.2
6     2.2    0.4                 0.4
7     2.4    0.6                 0.6
Sum   12.6   0.0                 2.4
n = 7
$\bar{X} = \frac{12.6}{7} = 1.8$
$\frac{\sum_{i=1}^{7} |X_i - \bar{X}|}{7} = \frac{2.4}{7} = 0.34$
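A brief sketch comparing the plain average deviation (which is always zero apart from floating-point rounding) with the average absolute deviation, using the seven observations from the table. Variable names are illustrative.

```python
x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(x)
x_bar = sum(x) / n                      # mean of the observations
deviations = [xi - x_bar for xi in x]   # X_i minus the mean

avg_deviation = sum(deviations) / n                       # cancels out to about 0
avg_abs_deviation = sum(abs(d) for d in deviations) / n   # about 0.34

print(f"mean = {x_bar:.1f}")
print(f"average deviation = {avg_deviation:.2f}")
print(f"average absolute deviation = {avg_abs_deviation:.2f}")
```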
Sum of Squared Deviations
“Sum of Squares”
$SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$

i     X_i    X_i - \bar{X}    |X_i - \bar{X}|    (X_i - \bar{X})^2
1     1.2    -0.6             0.6                (-0.6)^2 = 0.36
2     1.4    -0.4             0.4                0.16
3     1.6    -0.2             0.2                0.04
4     1.8    0.0              0.0                0
5     2.0    0.2              0.2                0.04
6     2.2    0.4              0.4                0.16
7     2.4    0.6              0.6                0.36
Sum   12.6   0.0              2.4                1.12
n = 7
$\bar{X} = \frac{12.6}{7} = 1.8$
Average absolute deviation = 0.34
$\sum_{i=1}^{n} (X_i - \bar{X})^2 = 1.12$
Variance
--> mean sum of squares
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

i     X_i    X_i - \bar{X}    |X_i - \bar{X}|    (X_i - \bar{X})^2
1     1.2    -0.6             0.6                (-0.6)^2 = 0.36
2     1.4    -0.4             0.4                0.16
3     1.6    -0.2             0.2                0.04
4     1.8    0.0              0.0                0
5     2.0    0.2              0.2                0.04
6     2.2    0.4              0.4                0.16
7     2.4    0.6              0.6                0.36
Sum   12.6   0.0              2.4                1.12
n = 7
$\bar{X} = \frac{12.6}{7} = 1.8$
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{1.12}{6} = 0.1867$
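The sum of squares and the sample variance can be sketched as below, with `statistics.variance` from the Python standard library used as a cross-check (it also divides by n - 1). Variable names are illustrative.

```python
from statistics import variance

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(x)
x_bar = sum(x) / n
ss = sum((xi - x_bar) ** 2 for xi in x)   # "Sum of Squares"
sample_variance = ss / (n - 1)            # sample: divide by n - 1
population_style = ss / n                 # dividing by N instead, for comparison only

print(f"SS = {ss:.2f}")
print(f"s^2 = {sample_variance:.4f}")
print(f"statistics.variance = {variance(x):.4f}")
```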
Standard Deviation
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$
Coefficient of Variation
$V = \frac{s}{\bar{X}}$
--> s expressed as a % of the mean
--> allows comparison of variability among samples measured in different units or scales.
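A small sketch of the standard deviation and coefficient of variation for the seven example observations, using `statistics.stdev` (the sample standard deviation) from the Python standard library.

```python
from statistics import mean, stdev

x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

x_bar = mean(x)
s = stdev(x)      # square root of the sample variance
cv = s / x_bar    # coefficient of variation, as a proportion of the mean

print(f"s = {s:.2f}")
print(f"V = {cv:.2f} ({cv * 100:.0f}% of the mean)")
```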
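[Figure: two data sets plotted on a 0 to 3 scale, each annotated with its measures of variability]

Measure              First data set   Second data set
Mean Deviation       0.34             0.26
Variance             0.1867           0.1367
Standard deviation   0.43             0.37
CV                   0.24             0.21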
Standard Error of the Mean
--> Recall: $\bar{X}$ and s are estimates of μ and σ
--> How good are these measures??
--> Need level of uncertainty (due to sampling error) in the mean:
$SE_{\bar{X}} = s / \sqrt{n}$
Confidence Intervals
--> SE = measure of how far $\bar{X}$ is likely to be from μ
--> 2 * SE = 95% confidence
--> I.e. μ is inside $\bar{X} \pm 2 \cdot SE$ about 95% of the time
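As a sketch of the rule of thumb above, the standard error and an approximate 95% interval can be computed for the seven example observations used earlier. The multiplier 2 follows the slide; it is an approximation, and the function name `mean_with_ci` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(values, multiplier=2):
    """Return (mean, standard error, lower bound, upper bound) using mean +/- multiplier * SE."""
    n = len(values)
    x_bar = mean(values)
    se = stdev(values) / sqrt(n)   # SE = s / sqrt(n)
    half_width = multiplier * se
    return x_bar, se, x_bar - half_width, x_bar + half_width


x = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
x_bar, se, lower, upper = mean_with_ci(x)
print(f"mean = {x_bar:.2f}, SE = {se:.3f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```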
Reporting variability about the mean.
Text
In a table as in previous slide.
Or, for example, in a manuscript, I might write:
The mean (± 95% CI) for the random samples of 100, 50,
25 and 10 was 24.84079 (±0.1816), 24.91241(±0.31996),
24.86719 (±0.40142) and 25.16212 (±0.859) respectively.
You are not restricted to using the confidence intervals when
reporting variability about the mean, ie I could have used
mean ± std dev, or mean ± std error
Graphically: Box Plot or Box and Whisker Plot
[Figure: Neonate Weight (g), 2550 to 3250, by Type of Mother (Non-smokers, Smokers); legend: Mean, Standard Error, 95% CI]
[Figure: Neonate Weight (g), 2550 to 3250, by Type of Mother (Non-smokers, Smokers); legend: Mean, 95% CI]
[Figure: Neonate Weight (g), 500 to 4000, by Type of Mother (Non-smokers, Smokers); legend: Mean, 95% CI]