Download the mean - Eportfolio@UTM

GROUPED DATA
SUBTOPICS
8.3: Measures of Location
8.4: Measures of Dispersion
LEARNING OUTCOMES
8.3(b) Find and interpret the mean, mode, median, quartiles and percentiles for grouped data
8.3(c) Describe the symmetry and skewness of a data distribution
8.4(b) Find and interpret the variance, standard deviation and coefficient of variation for grouped data
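Sketch of median, quartiles, interquartile range, deciles and percentiles from an ogive
[Figure: ogive with cumulative frequency on the vertical axis and class boundaries on the horizontal axis, with points X1, X2 and X3 marked on the horizontal axis.]
P25 = Q1 = D2.5
Median = P50 = Q2 = D5
P75 = Q3 = D7.5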
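Reading a value off an ogive drawn with straight-line segments amounts to linear interpolation inside the class that contains the required position. The sketch below assumes that convention, takes the position as p/100 × n (the same n/2 and n/4 rule used in Example 1), and borrows the class table of Example 2 later in this section as test data. The function name `grouped_percentile` is hypothetical.

```python
def grouped_percentile(boundaries, freqs, p):
    """Return the p-th percentile (0 < p <= 100) of grouped data.

    boundaries: increasing class boundaries, one more entry than freqs.
    freqs: class frequencies.
    Assumption: the ogive joins the points (class boundary, cumulative
    frequency) with straight lines, so reading it is linear interpolation.
    """
    n = sum(freqs)
    position = p / 100 * n      # median: n/2, Q1: n/4, P70: 0.7n, ...
    cum = 0                     # cumulative frequency below the current class
    for i, f in enumerate(freqs):
        if f > 0 and cum + f >= position:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (position - cum) / f * width
        cum += f
    return boundaries[-1]


# Class boundaries and frequencies taken from Example 2 (classes 1-3, ..., 16-18)
boundaries = [0.5, 3.5, 6.5, 9.5, 12.5, 15.5, 18.5]
freqs = [5, 3, 2, 1, 6, 4]

for name, p in [("Q1", 25), ("Median", 50), ("Q3", 75)]:
    print(name, round(grouped_percentile(boundaries, freqs, p), 3))
```

With these data the sketch gives Q1 = 3.75, median = 11 and Q3 = 14.875, so the interquartile range Q3 - Q1 would be 11.125.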
Example 1
Using the ogive drawn below, determine the
(a) median
(b) first quartile
(c) third decile
(d) seventieth percentile
[Figure: ogive for Example 1, with class boundaries from 5 to 40 on the horizontal axis.]
Solution
(a) Median: position = 60/2 = 30th observation.
From the ogive, the median = 20.
(b) First quartile: position = 60/4 = 15th observation.
From the ogive, the first quartile = 12.5.
(c) Third decile: position = (3/10) × 60 = 18th observation.
From the ogive, the third decile = 14.
(d) Seventieth percentile: position = (70/100) × 60 = 42nd observation.
From the ogive, the seventieth percentile = 24.5.
[Figure: the ogive of Example 1 annotated with the first quartile (12.5), third decile (14), median (20) and seventieth percentile (24.5) on the horizontal axis.]
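The four positions in the solution follow one rule: the k-th of m equal divisions of n observations sits at observation k × n / m. A small sketch that reproduces the positions for n = 60; the helper name `position` is hypothetical, and the values themselves still have to be read from the ogive.

```python
def position(k, m, n):
    """Observation number of the k-th of m equal divisions of n observations.

    Median: k=1, m=2; quartile Qk: m=4; decile Dk: m=10; percentile Pk: m=100.
    """
    return k * n / m


n = 60  # total frequency at the top of the ogive in Example 1
targets = [
    ("Median", 1, 2),
    ("First quartile", 1, 4),
    ("Third decile", 3, 10),
    ("Seventieth percentile", 70, 100),
]
for name, k, m in targets:
    print(f"{name}: observation number {position(k, m, n):g}")
```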
Shape of data distribution
Symmetry and Skewness
• The general shape of a data distribution can be determined from the mean, median and mode, as illustrated in a histogram or frequency curve.
• For a highly skewed distribution, the median is the more appropriate measure of central tendency.
• For a symmetrical or almost symmetrical distribution, the mean is the appropriate measure of central tendency.
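To see why the median is preferred for skewed data, compare both measures on two small hypothetical data sets that differ only in one extreme value. The numbers are made up for illustration; the sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data: identical except for one extreme value in the right tail
symmetric = [20, 22, 24, 26, 28]
skewed = [20, 22, 24, 26, 280]

for label, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

The single value 280 drags the mean from 24 to 74.4 while the median stays at 24, which is why the median better represents the center of a highly skewed distribution.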
• Three important shapes:
  i. Symmetrical distribution
  ii. Positively skewed or right-skewed distribution
  iii. Negatively skewed or left-skewed distribution
(i) Symmetrical
~The values of the mean, median and mode are identical.
~They lie at the center.
Mean = Median = Mode
[Figure: symmetrical frequency curve, frequency against variable, with the mean, median and mode at the center.]
A set of observations is symmetrically distributed
if its graphical representation (histogram, bar
chart) is symmetric with respect to a vertical axis
passing through the mean. For a symmetrically
distributed unimodal population or sample, the mean,
median and mode have the same value. Half of all
measurements are greater than the mean, while
half are less than the mean.
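A quick numerical check of these properties on a small hypothetical data set that is symmetric about 4 (values chosen for illustration only), using Python's `statistics` module:

```python
from statistics import mean, median, mode

# Hypothetical data, symmetric about 4
data = [2, 3, 3, 4, 4, 4, 5, 5, 6]

m = mean(data)
print(m, median(data), mode(data))
above = sum(1 for x in data if x > m)   # observations greater than the mean
below = sum(1 for x in data if x < m)   # observations less than the mean
print(above, below)                     # 3 3
```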
(ii) Positively skewed or skewed to the right
~The value of the mean is the largest.
~The mode is the smallest.
~The median lies between these two values.
Mean > Median > Mode
[Figure: positively skewed frequency curve, frequency against variable, with a long right tail; from left to right: mode, median, mean.]
A set of observations that is not symmetrically
distributed is said to be skewed. It is positively skewed
if a greater proportion of the observations are less than
or equal to (as opposed to greater than or equal to) the
mean; this indicates that the mean is larger than the
median. The histogram of a positively skewed
distribution will generally have a long right tail; thus,
this distribution is also known as being skewed to the right.
(iii) Negatively skewed or skewed to the left
~The value of the mean is the smallest.
~The mode is the largest.
~The median lies between these two values.
Mean < Median < Mode
[Figure: negatively skewed frequency curve, frequency against variable, with a long left tail; from left to right: mean, median, mode.]
A negatively skewed distribution has more
observations that are greater than or equal to the
mean. Such a distribution has a mean that is less than
the median. The histogram of a negatively skewed
distribution will generally have a long left tail; thus,
the phrase skewed to the left is applied here.
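The mean and median comparison above can be written as a small classifier. This is a sketch of the rule of thumb in these notes, not a formal test: the comparison can mislead for some multimodal or discrete distributions. The function name `describe_shape` and the data sets are hypothetical; `left` is the mirror image of `right`.

```python
from statistics import mean, median, mode


def describe_shape(data, tol=1e-9):
    """Rule-of-thumb shape label based on comparing the mean and median."""
    m, md = mean(data), median(data)
    if abs(m - md) <= tol:
        return "symmetrical"
    return "positively skewed" if m > md else "negatively skewed"


# Hypothetical data sets
right = [1, 2, 2, 2, 3, 3, 4, 5, 9]
left = [10 - x for x in right]          # mirror image, long left tail
sym = [2, 3, 3, 4, 4, 4, 5, 5, 6]

for data in (right, left, sym):
    # label, mean, median, mode
    print(describe_shape(data), round(mean(data), 2), median(data), mode(data))
```

For `right` the output shows mean > median > mode, and for `left` the order reverses, matching the two skewed cases above.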
RANGE
Range = upper boundary of the last class - lower boundary of the first class
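The range formula needs class boundaries rather than class limits. The sketch below converts integer class limits into boundaries by placing each boundary halfway across the gap between consecutive classes, assuming that gap is the same throughout; the function names are hypothetical, and the limits are those of Example 2.

```python
def class_boundaries(limits):
    """Convert (lower, upper) class limits into class boundaries.

    Assumes the same gap between consecutive classes (e.g. upper limit 3,
    next lower limit 4 gives a gap of 1 and a boundary of 3.5).
    A single class is assumed to have a gap of 1.
    """
    gap = limits[1][0] - limits[0][1] if len(limits) > 1 else 1
    half = gap / 2
    return [limits[0][0] - half] + [upper + half for _, upper in limits]


def grouped_range(limits):
    """Range = upper boundary of the last class - lower boundary of the first class."""
    bounds = class_boundaries(limits)
    return bounds[-1] - bounds[0]


# Class limits from Example 2
limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
print(class_boundaries(limits))
print(grouped_range(limits))
```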
INTERQUARTILE RANGE
• Defined as the difference between the
third quartile and the first quartile
Interquartile range = Q3 - Q1
Variance, $$S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{\sum f}}{\sum f - 1}$$
Standard deviation, $$S = \sqrt{\text{Variance}} = \sqrt{S^2}$$
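The variance formula is a computational shortcut for the definition S² = Σf(x - x̄)² / (Σf - 1), where x̄ = Σfx / Σf. The sketch below computes both forms on a small hypothetical frequency table and checks that they agree; the function names and data are illustrative assumptions.

```python
from math import sqrt


def variance_shortcut(x, f):
    """S^2 = (sum f x^2 - (sum f x)^2 / sum f) / (sum f - 1)."""
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


def variance_definition(x, f):
    """S^2 = sum f (x - mean)^2 / (sum f - 1), with mean = sum f x / sum f."""
    n = sum(f)
    xbar = sum(fi * xi for xi, fi in zip(x, f)) / n
    return sum(fi * (xi - xbar) ** 2 for xi, fi in zip(x, f)) / (n - 1)


# Hypothetical class marks and frequencies
x = [2, 5, 8]
f = [1, 2, 1]
s2 = variance_shortcut(x, f)
print(s2, variance_definition(x, f), sqrt(s2))
```

The shortcut avoids subtracting the mean from every class mark, which is why it is convenient when working from the fx and fx² columns of a table.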
Example 2:
Find the range, variance and standard deviation of the following grouped data.

Class interval | Frequency, f | Class mark, x | fx        | fx²
1-3            | 5            | 2             | 10        | 20
4-6            | 3            | 5             | 15        | 75
7-9            | 2            | 8             | 16        | 128
10-12          | 1            | 11            | 11        | 121
13-15          | 6            | 14            | 84        | 1176
16-18          | 4            | 17            | 68        | 1156
Total          | Σf = 21      |               | Σfx = 204 | Σfx² = 2676
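Solution:
Range = upper boundary of the last class - lower boundary of the first class
= 18.5 - 0.5 = 18
(The first class, 1-3, has lower boundary 0.5 and the last class, 16-18, has upper boundary 18.5.)
$$S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{\sum f}}{\sum f - 1} = \frac{2676 - \frac{204^2}{21}}{21 - 1} = 34.71$$
$$S = \sqrt{34.71} = 5.892$$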
REMARK
Sometimes we would like to compare the
variability of two different data sets that
have different units of measurement.
Standard deviation is not suitable since it is a
measure of absolute variability and not of
relative variability.
The most appropriate measure is the
coefficient of variation (CV), which expresses
standard deviation as a percentage of the
mean.
Coefficient of variation
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
• Note:
A larger coefficient of variation means that
the data are more dispersed relative to the mean
and less consistent.
Example:
Suppose we want to compare two production processes that fill containers with products.
• Process A is filling fertilizer bags, which have a nominal weight of 80 pounds.
• For process A: x̄ = 80.2 pounds, s = 1.2 pounds
• Process B is filling cornflakes boxes, which have a nominal weight of 24 ounces.
• For process B: x̄ = 24.6 ounces, s = 0.4 ounces
For process A,
$$CV = \frac{1.2}{80.2} \times 100\% = 1.50\%$$
For process B,
$$CV = \frac{0.4}{24.6} \times 100\% = 1.63\%$$
Is process A much more variable than process B because 1.2 is three times as large as 0.4?
No. Relative to the size of their means, the two processes have very similar variability (CV = 1.50% for A and 1.63% for B).
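A short sketch of the CV comparison for the two processes, using the means and standard deviations given in the example. The function name `coefficient_of_variation` is hypothetical, and the sketch assumes a positive mean, since the CV is not meaningful when the mean is zero or negative.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation / mean x 100% (assumes mean > 0)."""
    return sd / mean * 100


processes = {
    "A (fertilizer, pounds)": (1.2, 80.2),
    "B (cornflakes, ounces)": (0.4, 24.6),
}
for name, (s, xbar) in processes.items():
    print(f"Process {name}: CV = {coefficient_of_variation(s, xbar):.2f}%")
```

The standard deviations differ by a factor of three, but because each is divided by its own mean, the units cancel and the two CVs land close together.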