Measures of Variability
OBJECTIVES
• To understand the different measures of variability
• To determine the range, variance, quartile deviation, mean deviation, and standard deviation for ungrouped and grouped data
Measures of dispersion (variability or spread)
• consider the extent to which the observations vary
MEASURES OF VARIATION
• RANGE
• QUARTILE DEVIATION
• MEAN DEVIATION
• VARIANCE
• STANDARD DEVIATION
1. Range, R
• The difference between the highest value, H, and the lowest value, L
• R = H – L
Example: 3, 3, 5, 6, 8
R = H – L = 8 – 3 = 5
2. Quartile Deviation, QD
• or semi-interquartile range
• obtained by taking one half of the difference between the third and the first quartiles
\[ QD = \frac{Q_3 - Q_1}{2} \]
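SOLVE FOR Q1 and Q3 (grouped data)
\[ Q_1 = ll + \left( \frac{\frac{N}{4} - cf}{f_i} \right) w \]
\[ Q_3 = ll + \left( \frac{\frac{3N}{4} - cf}{f_i} \right) w \]
where
$ll$ = lower boundary of the quantile class
$cf$ = less-than cumulative frequency before the quantile class
$f_i$ = frequency of the quantile class
$w$ = class width, or size of the class interval
$N$ = total number of observations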
Problem
The examination scores of 50 students in a statistics class yielded the following values:
Q3 = 75.43
Q1 = 54.24
Determine the value of the quartile deviation or semi-interquartile range.
Solution
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{75.43 - 54.24}{2} \approx 10.60 \]
Problem
Compute the value of the semi-interquartile range or quartile deviation.
The performance ratings of 100 faculty members of a certain college are presented in a frequency distribution as follows:

Classes    f     <cf
71-74      3     3
75-78      10    13
79-82      13    26     (1st quartile class)
83-86      18    44
87-90      25    69
91-94      19    88     (3rd quartile class)
95-98      12    100
Solution (grouped data)
\[ QD = \frac{Q_3 - Q_1}{2} \]
\[ Q_1 = ll + \left( \frac{\frac{N}{4} - cf}{f_{Q_1}} \right) w = 78.5 + \left( \frac{25 - 13}{13} \right) 4 = 82.19 \]
\[ Q_3 = ll + \left( \frac{\frac{3N}{4} - cf}{f_{Q_3}} \right) w = 90.5 + \left( \frac{75 - 69}{19} \right) 4 = 91.76 \]
Solution cont’d…
Substitute
\[ QD = \frac{91.76 - 82.19}{2} = 4.785 \approx 4.79 \]
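The interpolation used for Q1 and Q3 in this solution can be sketched in Python. This is only an illustration: the function name `grouped_quantile` and its parameters are hypothetical, and it assumes integer class limits so that the lower class boundary is the stated lower limit minus 0.5.

```python
def grouped_quantile(lower_limits, freqs, width, fraction):
    """Interpolated quantile of a frequency distribution.

    lower_limits: stated lower class limits in ascending order
    freqs: class frequencies in the same order
    width: class width w
    fraction: 0.25 for Q1, 0.75 for Q3
    """
    n = sum(freqs)
    target = fraction * n      # N/4 or 3N/4
    cf = 0                     # cumulative frequency before the current class
    for limit, f in zip(lower_limits, freqs):
        if cf + f >= target:
            boundary = limit - 0.5   # lower class boundary ll (integer limits assumed)
            return boundary + (target - cf) / f * width
        cf += f
    raise ValueError("fraction must be between 0 and 1")


# Faculty performance ratings from the problem
limits = [71, 75, 79, 83, 87, 91, 95]
freqs = [3, 10, 13, 18, 25, 19, 12]

q1 = grouped_quantile(limits, freqs, 4, 0.25)
q3 = grouped_quantile(limits, freqs, 4, 0.75)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2))
```

Working from unrounded quartiles avoids carrying rounding error into QD.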
3. Mean Deviation, MD
– based on all items in a distribution
For ungrouped data
\[ MD = \frac{\sum |x_i|}{n} \]
For grouped data
\[ MD = \frac{\sum f\,|x_i|}{n} \]
where
$MD$ = mean deviation
$f$ = frequency
$x_i = x - \bar{x}$
$x$ = the individual values (class midpoints for grouped data)
$\bar{x}$ = mean of the distribution
$n$ = sample size
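As a rough sketch of both formulas, the hypothetical helper below treats ungrouped data as the special case in which every frequency is 1. The sample values 3, 3, 5, 6, 8 are the ones used in the range example.

```python
def mean_deviation(values, freqs=None):
    """Mean absolute deviation about the mean, dividing by n."""
    if freqs is None:
        freqs = [1] * len(values)   # ungrouped data: each value counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


md_ungrouped = mean_deviation([3, 3, 5, 6, 8])
# The same data written as distinct values with frequencies
md_grouped = mean_deviation([3, 5, 6, 8], [2, 1, 1, 1])
print(md_ungrouped, md_grouped)
```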
4. Variance, $s^2$
- most commonly used measure of variability
- the square of the standard deviation
For ungrouped data
\[ s^2 = \frac{\sum x_i^2}{n} \quad \text{or} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n} \]
where
$s^2$ = variance of a set of observations
$x_i = x - \bar{x}$ = the deviation of a score from the mean
$\bar{x}$ = the mean
$x$ = a score
$n$ = sample size
Note:
The greater the variability of the observations in a data set, the greater the variance. If there is no variability, that is, if all observations are equal and hence all equal to the mean, then $s^2 = 0$.
For grouped data
\[ s^2 = \frac{\sum f x_i^2}{n} \quad \text{or} \quad s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n} \]
where
$s^2$ = variance of a set of observations
$x_i = x - \bar{x}$ = the deviation of a score from the mean
$\bar{x}$ = the mean
$x$ = individual values in the distribution
$n$ = sample size
5. Standard Deviation, s
- the positive square root of the variance
\[ s = \sqrt{s^2} \]
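The definitional and computational forms of the variance are algebraically equivalent. A minimal Python check, using the values 3, 3, 5, 6, 8 from the range example and hypothetical function names, might look like this. Both forms divide by n, as in the formulas above.

```python
import math


def variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_computational(xs):
    """(sum of x^2 minus (sum of x)^2 / n), divided by n."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n


data = [3, 3, 5, 6, 8]
s2 = variance_definitional(data)
s2_alt = variance_computational(data)
s = math.sqrt(s2)   # standard deviation is the positive square root
print(s2, s2_alt, round(s, 3))
```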
Problem:
Find the (a) range, (b) quartile deviation, (c) mean deviation, (d) variance and (e) standard deviation.

Student    Score
1          50
2          48    (lowest value)
3          72
4          67
5          71
6          65
7          73    (highest value)
8          62
9          64
10         60
(a) Range, R


R=H–L
R = 73 – 48 = 25
(b) Quartile Deviation, QD
Arrangement in ascending order:
48 50 60 62 64 65 67 71 72 73
Using method 3 for finding Qn (ungrouped data):
Q1 is located at n/4 = 10/4 = 2.5, midway between the 2nd and 3rd values
Q1 = (50 + 60)/2 = 55
Q3 is located at 3n/4 = 3(10)/4 = 7.5, midway between the 7th and 8th values
Q3 = (67 + 71)/2 = 69
QD cont’d…
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{69 - 55}{2} = 7 \]
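The location rule used here can be sketched in Python. The transcript does not spell out "method 3", so the helper below is an assumption: it reads a fractional position by linear interpolation between the two neighbouring ordered values, which reproduces the midpoints taken at positions 2.5 and 7.5. The name `value_at_position` is hypothetical.

```python
import math


def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position in an ordered list."""
    lower = math.floor(position)
    frac = position - lower
    if frac == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + frac * (above - below)


scores = sorted([50, 48, 72, 67, 71, 65, 73, 62, 64, 60])
n = len(scores)
q1 = value_at_position(scores, n / 4)       # position 2.5
q3 = value_at_position(scores, 3 * n / 4)   # position 7.5
qd = (q3 - q1) / 2
print(q1, q3, qd)
```

Other textbooks and software packages locate quartiles differently, so results for small samples can vary between methods.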
(c) Mean Deviation, MD
\[ MD = \frac{\sum |x_i|}{n} = \frac{65.6}{10} = 6.56 \]
First, solve for the mean (ungrouped data):
\[ \bar{x} = \frac{73 + 72 + 71 + 67 + 65 + 64 + 62 + 60 + 50 + 48}{10} = \frac{632}{10} = 63.20 \]
Data for mean deviation, MD

Score, x    x_i = x - x̄    x_i^2
73          9.8            96.04
72          8.8            77.44
71          7.8            60.84
67          3.8            14.44
65          1.8            3.24
64          0.8            0.64
62          -1.2           1.44
60          -3.2           10.24
50          -13.2          174.24
48          -15.2          231.04
TOTAL       Σ|x_i| = 65.6  669.60
(d) Variance, $s^2$
\[ s^2 = \frac{\sum x_i^2}{n} = \frac{669.6}{10} = 66.96 \]
(e) Standard Deviation, s
\[ s = \sqrt{s^2} = \sqrt{66.96} = 8.18 \]
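Because the formulas in this transcript divide by n, they correspond to what Python's standard `statistics` module calls the population variance and population standard deviation. The short check below is only an illustration of that correspondence; `statistics.variance` divides by n - 1 instead and therefore gives a larger value.

```python
import statistics

scores = [50, 48, 72, 67, 71, 65, 73, 62, 64, 60]

population_variance = statistics.pvariance(scores)   # divides by n
population_sd = statistics.pstdev(scores)            # square root of the above
sample_variance = statistics.variance(scores)        # divides by n - 1

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2))
```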
Problem:
The following are marks obtained by a group of 40 university students on an English examination:

42 88 37 75 98 93 73 62 96 80
52 76 66 73 69 54 83 62 53 79
69 56 81 75 52 65 49 80 67 59
88 80 44 71 87 82 89 79 72 91
Find the following:
a. range
b. quartile deviation
c. mean deviation
d. variance
e. standard deviation
Solution
a. Range, R = H – L
= 98 – 37
= 61
b. Quartile Deviation, QD
\[ QD = \frac{Q_3 - Q_1}{2} \]
where
\[ Q_3 = ll + \left( \frac{\frac{3n}{4} - cf}{f_{Q_3}} \right) w = 79.5 + \left( \frac{30 - 26}{6} \right) 5 = 82.83 \]
English scores of 40 university students

Classes    f    <cf
95-99      2    40
90-94      2    38
85-89      4    36
80-84      6    32
75-79      5    26
70-74      4    21
65-69      5    17
60-64      2    12
55-59      2    10
50-54      4    8
45-49      1    4
40-44      2    3
35-39      1    1
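The tally behind this frequency table can be reproduced with a few lines of Python. This is a sketch that assumes classes of width 5 starting at a lowest limit of 35, as in the table; the variable names are illustrative. The rows come out in ascending order, the reverse of the table.

```python
from collections import Counter

marks = [
    42, 88, 37, 75, 98, 93, 73, 62, 96, 80,
    52, 76, 66, 73, 69, 54, 83, 62, 53, 79,
    69, 56, 81, 75, 52, 65, 49, 80, 67, 59,
    88, 80, 44, 71, 87, 82, 89, 79, 72, 91,
]
width = 5
lowest_limit = 35

# Class index 0 is 35-39, index 1 is 40-44, and so on
counts = Counter((m - lowest_limit) // width for m in marks)

table = []
cumulative = 0
for k in sorted(counts):
    lower = lowest_limit + k * width
    cumulative += counts[k]
    table.append((f"{lower}-{lower + width - 1}", counts[k], cumulative))

for row in table:
    print(row)
```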
Solve for Q1
\[ Q_1 = ll + \left( \frac{\frac{n}{4} - cf}{f_{Q_1}} \right) w = 54.5 + \left( \frac{10 - 8}{2} \right) 5 = 59.50 \]
Substitute
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{82.83 - 59.50}{2} \approx 11.67 \]
c. Mean Deviation, MD
\[ \bar{x} = \frac{\sum f x}{n} = \frac{2840}{40} = 71 \]
$x_i = x - \bar{x}$ (refer to the table)
\[ MD = \frac{\sum f\,|x_i|}{n} = \frac{516}{40} = 12.9 \]
Data for mean deviation, MD

Class interval    x     f     fx      |x_i|    f|x_i|
95-99             97    2     194     26       52
90-94             92    2     184     21       42
85-89             87    4     348     16       64
80-84             82    6     492     11       66
75-79             77    5     385     6        30
70-74             72    4     288     1        4
65-69             67    5     335     4        20
60-64             62    2     124     9        18
55-59             57    2     114     14       28
50-54             52    4     208     19       76
45-49             47    1     47      24       24
40-44             42    2     84      29       58
35-39             37    1     37      34       34
Total                   40    2840             516
d. Variance, $s^2$
\[ s^2 = \frac{\sum f x_i^2}{n} = \frac{9660}{40} = 241.5 \]
Data for the variance, $s^2$

Class interval    x     f     fx      x_i     f x_i^2
95-99             97    2     194     26      1352
90-94             92    2     184     21      882
85-89             87    4     348     16      1024
80-84             82    6     492     11      726
75-79             77    5     385     6       180
70-74             72    4     288     1       4
65-69             67    5     335     -4      80
60-64             62    2     124     -9      162
55-59             57    2     114     -14     392
50-54             52    4     208     -19     1444
45-49             47    1     47      -24     576
40-44             42    2     84      -29     1682
35-39             37    1     37      -34     1156
Total                   40                    9660
e. Standard Deviation, s
The standard deviation, s, is the positive square root of the variance, $s^2$.
\[ s = \sqrt{241.5} = 15.54 \]
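The grouped-data results for the 40 English marks (mean, mean deviation, variance and standard deviation) can be checked with a short script. It is a sketch that represents each class by its midpoint x and divides by n throughout, following the formulas in this transcript; the variable names are illustrative.

```python
import math

# Class midpoints and frequencies from the table, 95-99 down to 35-39
midpoints = [97, 92, 87, 82, 77, 72, 67, 62, 57, 52, 47, 42, 37]
freqs = [2, 2, 4, 6, 5, 4, 5, 2, 2, 4, 1, 2, 1]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
mean_dev = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
std_dev = math.sqrt(variance)

print(n, mean, mean_dev, variance, round(std_dev, 2))
```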
New Topic…
Objectives
• To know the measures of skewness and kurtosis
• To find the Pearsonian coefficient of skewness
Measures of Skewness
• summarize the extent to which the observations are symmetrically distributed
Skewness
• the degree to which a distribution departs from symmetry about its mean value
• or refers to asymmetry (or "tapering") in the distribution of sample data
Positive skew
• the right tail is longer
• the mass of the distribution is concentrated on the left of the figure
• has a few relatively high values
• the distribution is said to be right-skewed
• mean > median > mode
• the skewness is greater than zero
Negative skew
• the left tail is longer
• the mass of the distribution is concentrated on the right of the figure
• has a few relatively low values
• the distribution is said to be left-skewed
• mean < median < mode
• the skewness is lower than zero
No skew
• the distribution is symmetric like the bell-shaped normal curve
• mean = median = mode, or $\bar{x} = \tilde{x} = \hat{x}$
Exercise
Pearsonian coefficient of skewness
\[ Sk = \frac{\bar{x} - \hat{x}}{s} \quad \text{or} \quad Sk = \frac{3(\bar{x} - \tilde{x})}{s} \]
where
$Sk$ = Pearsonian coefficient of skewness
$\bar{x}$ = mean
$\tilde{x}$ = median
$\hat{x}$ = mode
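Both forms of the coefficient are one-line computations. The sketch below uses hypothetical function names and, as sample input, the mean 20.5, mode 18.6 and standard deviation 5 from one of the problems in this transcript.

```python
def pearson_mode_skewness(mean, mode, sd):
    """(mean - mode) / standard deviation."""
    return (mean - mode) / sd


def pearson_median_skewness(mean, median, sd):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


sk = pearson_mode_skewness(20.5, 18.6, 5)
print(round(sk, 2))
```

The median form is useful when the mode is poorly defined, for example in small samples where no value repeats.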
Skewness based on quartiles
\[ Sk = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} \]
where
$Q_1$ = 1st quartile
$Q_2$ = 2nd quartile
$Q_3$ = 3rd quartile
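A minimal sketch of the quartile-based coefficient follows. The quartiles Q1 = 59.50 and Q3 = 82.83 are the values found for the 40 English marks. The median Q2 = 73.25 is not given in the transcript; it is an assumption obtained by applying the same grouped interpolation at n/2 (class 70-74, lower boundary 69.5, cf = 17, f = 4, w = 5).

```python
def quartile_skewness(q1, q2, q3):
    """((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1); always lies between -1 and 1."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


# Q2 = 73.25 is an assumed value, see the note above
sk_q = quartile_skewness(59.50, 73.25, 82.83)
print(round(sk_q, 3))
```

A negative result means the median sits closer to Q3 than to Q1, so the lower half of the middle 50% is more spread out.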
Interpretation

If skewness is positive, the data are
positively skewed or skewed right,
meaning that the right tail of the
distribution is longer than the left. If
skewness is negative, the data are
negatively skewed or skewed left,
meaning that the left tail is longer.
Interpretation cont’d…

If skewness = 0, the data are perfectly
symmetrical. But a skewness of exactly
zero is quite unlikely for real-world
data, so how can you interpret the
skewness number? In the classic
Principles of Statistics (1965), M.G.
Bulmer suggests this rule of thumb:
Interpretation cont’d…


If skewness is less than −1 or greater
than +1, the distribution is highly
skewed.
If skewness is between −1 and −½ or
between +½ and +1, the distribution is
moderately skewed.
Interpretation cont’d…


If skewness is between −½ and +½,
the distribution is approximately
symmetric.
Example:

With a skewness of −0.1082, the sample
data are approximately symmetric.
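The rule of thumb can be written as a small classifier. This is a sketch with a hypothetical function name. The rule as quoted does not say which category the exact boundary values ±½ and ±1 belong to, so placing them in the "moderately skewed" band is an assumption.

```python
def describe_skewness(sk):
    """Classify a skewness value using the rule of thumb quoted above."""
    size = abs(sk)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:          # boundary handling is an assumption
        return "moderately skewed"
    return "approximately symmetric"


label = describe_skewness(-0.1082)
print(label)
```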
Problem
Find the Pearsonian coefficient of skewness of the set of data shown in the following table:

Scores of ten students in a mathematics ability test

Student    Score
1          50
2          48
3          72
4          67
5          71
6          65
7          73
8          62
9          64
10         60
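Computed values
Refer to the previous computations:
$\bar{x} = 63.20$
$s = 8.18$
The median is the mean of the two middle values of the ordered scores (48 50 60 62 64 65 67 71 72 73):
\[ \tilde{x} = Mdn = \frac{64 + 65}{2} = 64.5 \]
\[ Sk = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(63.2 - 64.5)}{8.18} \approx -0.48 \]
Interpretation
Negative sign means
• the tail extends to the left
• the mean is less than the median
Computed value means
• the magnitude, 0.48, is below ½, so by the rule of thumb the distribution is approximately symmetric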
Problem
Find the Pearsonian coefficient of skewness for the following set of data (English scores of 40 university students):
$\bar{x} = 71$
$s = 15.54$

Class interval    f
95-99             2
90-94             2
85-89             4
80-84             6
75-79             5
70-74             4
65-69             5
60-64             2
55-59             2
50-54             4
45-49             1
40-44             2
35-39             1
Total             40

The modal class is 80-84 (f = 6), so ll = 79.5 and w = 5.
\[ \hat{x} = Mo = ll + \left( \frac{d_1}{d_1 + d_2} \right) w \]
where $d_1$ is the modal-class frequency minus the frequency of the next lower class (6 - 5 = 1) and $d_2$ is the modal-class frequency minus the frequency of the next higher class (6 - 4 = 2).
\[ \hat{x} = 79.5 + \left( \frac{1}{1 + 2} \right)(5) = 81.17 \]
\[ Sk = \frac{\bar{x} - \hat{x}}{s} = \frac{71 - 81.17}{15.54} \approx -0.654 \]
Interpretation
Negative (-) computed value means
• the mean is less than the mode
• the tail extends to the left
• the magnitude, 0.654, lies between ½ and 1, so by the rule of thumb the distribution is moderately skewed
Problem
Find the Pearsonian coefficient of skewness for the distribution whose mean $\bar{x} = 20.5$, mode $\hat{x} = 18.6$, and standard deviation $s = 5$.
Solution
\[ Sk = \frac{\bar{x} - \hat{x}}{s} = \frac{20.5 - 18.6}{5} = 0.38 \]
Interpretation
Positive sign indicates
• the tail of the distribution extends to the right
Computed value means
• the mean is greater than the mode by 0.38 standard deviations
• considered negligible skewness
Measures of Kurtosis
Kurtosis - the degree of peakedness (or flatness) of a distribution
Standardized kurtosis measure
\[ m_4' = \frac{m_4}{s^4} \]
where
\[ m_4 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n - 1} \]
$s$ = standard deviation = $\sqrt{s^2}$
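The standardized measure can be sketched in Python as follows. Two assumptions are made: the fourth moment is divided by n - 1, as the formula reads, and $s^2$ is the variance with divisor n defined earlier in this transcript. Statistical packages often use other divisors or report excess kurtosis (the value minus 3), so their output may differ. The function name is hypothetical, and the data are the ten scores from the earlier ungrouped problem.

```python
def standardized_kurtosis(xs):
    """m4 / s^4 with m4 divided by n - 1 and s^2 divided by n (assumptions)."""
    n = len(xs)
    mean = sum(xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / (n - 1)
    s2 = sum((x - mean) ** 2 for x in xs) / n
    return m4 / s2 ** 2


scores = [50, 48, 72, 67, 71, 65, 73, 62, 64, 60]
ku = standardized_kurtosis(scores)
print(round(ku, 3))
```

The result is compared against 3, the value for a normal distribution.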
Types of Kurtosis
Mesokurtic distribution
• a normal distribution, neither too peaked nor too flat
• its kurtosis (Ku) is equal to 3
Leptokurtic distribution
• has a higher peak than the normal distribution
• with narrow humps and heavier tails
• its kurtosis (Ku) is higher than 3
Platykurtic distribution
• has a lower peak than a normal distribution
• flat distributions with values evenly distributed about the center with broad humps and short tails
• its kurtosis (Ku) is less than 3