CHAPTER 3 : DESCRIPTIVE STATISTICS : NUMERICAL MEASURES (STATISTICS)
3.1 Measures of Central Tendency/ Location
There are three popular measures of central tendency: mean, median and mode.
1) Mean
• The mean of a sample is the sum of the measurements divided by the number of measurements in the set. The mean is denoted by $\bar{x}$.
Mean = Sum of all values / Number of values
• The mean can be obtained as below:
- For raw data, the mean is defined by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example 3.1:
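Table 3.1

| MLB Team             | 2002 Total Payroll (millions of dollars) |
|----------------------|------------------------------------------|
| Anaheim Angels       | 62                                       |
| Atlanta Braves       | 93                                       |
| New York Yankees     | 126                                      |
| St. Louis Cardinals  | 75                                       |
| Tampa Bay Devil Rays | 34                                       |
| Total                | 390                                      |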
The sample mean of the 2002 total payroll (raw/ungrouped data) is:
$$\bar{x} = \frac{\sum x}{n} = \frac{62 + 93 + 126 + 75 + 34}{5} = \frac{390}{5} = 78$$
that is, 78 million dollars.
- For tabular/grouped data, the mean is defined by:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{\sum f}$$
where
f = class frequency;
x = class mark (midpoint).
Example 3.2 :
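The sample mean for Table 3.2:

| CGPA (Class) | Frequency, f | Class Mark (Midpoint), x | fx      |
|--------------|--------------|--------------------------|---------|
| 2.50 - 2.75  | 2            | 2.625                    | 5.250   |
| 2.75 - 3.00  | 10           | 2.875                    | 28.750  |
| 3.00 - 3.25  | 15           | 3.125                    | 46.875  |
| 3.25 - 3.50  | 13           | 3.375                    | 43.875  |
| 3.50 - 3.75  | 7            | 3.625                    | 25.375  |
| 3.75 - 4.00  | 3            | 3.875                    | 11.625  |
| Total        | 50           |                          | 161.750 |

Table 3.2

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$$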

2) Median
• The median is the middle value of a set of observations arranged in order of magnitude and is normally denoted by $\tilde{x}$.
i) The median for ungrouped data.
- The median depends on the number of observations in the data, n.
- If n is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation of the ordered observations.
- If n is even, then the median is the arithmetic mean of the $\left(\frac{n}{2}\right)$th observation and the $\left(\frac{n}{2} + 1\right)$th observation.
ii) The median of grouped data / frequency distribution.
The median of a frequency distribution is defined by:
$$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right)$$
where,
• L = the lower class boundary of the median class;
• c = the size of the median class interval;
• $F_{j-1}$ = the sum of frequencies of all classes lower than the median class;
• $f_j$ = the frequency of the median class.
Example 3.3 for ungrouped data: The median of the data 4, 6, 3, 1, 2, 5, 7, 3 is 3.5.
Proof: Arranged in order of magnitude, the data become 1, 2, 3, 3, 4, 5, 6, 7. As n = 8 (even), the median is the mean of the 4th and 5th observations, that is (3 + 4)/2 = 3.5.
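The odd/even rule for ungrouped data can be sketched in Python as follows. The function name `median_ungrouped` is an illustrative choice; the standard library's `statistics.median` applies the same rule.

```python
# Median of ungrouped data following the odd/even rule stated above.
# `median_ungrouped` is an illustrative name chosen for this sketch.
def median_ungrouped(data):
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th ordered observation
        return ordered[mid]
    # n even: mean of the (n / 2)-th and (n / 2 + 1)-th ordered observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([4, 6, 3, 1, 2, 5, 7, 3]))  # 3.5
```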
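Example 3.4 for grouped data:

| CGPA (Class) | Frequency, f | Cum. frequency |
|--------------|--------------|----------------|
| 2.50 - 2.75  | 2            | 2              |
| 2.75 - 3.00  | 10           | 12             |
| 3.00 - 3.25  | 15           | 27             |
| 3.25 - 3.50  | 13           | 40             |
| 3.50 - 3.75  | 7            | 47             |
| 3.75 - 4.00  | 3            | 50             |
| Total        | 50           |                |

Since $\frac{\sum f}{2} = 25$, the median class is 3.00 - 3.25 (the first class whose cumulative frequency reaches 25), so L = 3.00, c = 0.25, $F_{j-1}$ = 12 and $f_j$ = 15.
$$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right) = 3.00 + 0.25\left(\frac{25 - 12}{15}\right) = 3.217$$
Table 3.3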
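The grouped-median formula can be traced step by step in code. This is a minimal sketch under two assumptions: classes are given as `(lower, upper, frequency)` tuples in ascending order, and the stated class limits are used as the class boundaries, as in the worked example. The name `grouped_median` is illustrative.

```python
# Grouped median: L + c * ((sum(f) / 2 - F_prev) / f_median).
# Assumes ascending (lower, upper, frequency) tuples and treats the class
# limits as class boundaries, as the worked example does.
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_median(classes):
    half = sum(f for _, _, f in classes) / 2
    cumulative = 0  # sum of frequencies of the classes below the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            # First class whose cumulative frequency reaches half the total
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")


print(round(grouped_median(classes), 3))  # 3.217
```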
3) Mode
• The mode of a set of observations is the observation with the highest frequency and is usually denoted by $\hat{x}$. Sometimes the mode can also be used to describe qualitative data.
i) Mode of ungrouped data:
- Defined as the value which occurs most frequently.
- The mode has the advantage that it is easy to calculate and is not affected by extreme values.
- However, the mode may not exist, and even if it does exist, it may not be unique.
*Note:
• If two measurements in a set of data share the highest frequency, both measurements are taken as modes and the data are known as bimodal.
• If more than two measurements in a set of data share the highest frequency, the data can be assumed to have no mode.
ii) The mode for grouped data / frequency distribution data.
- When data have been grouped in classes and a frequency curve is drawn to fit the data, the mode is the value of x corresponding to the maximum point on the curve.
- Determining the mode using the formula:
$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
where
L = the lower class boundary of the modal class;
c = the size of the modal class interval;
$\Delta_1$ = the difference between the modal class frequency and the frequency of the class before it; and
$\Delta_2$ = the difference between the modal class frequency and the frequency of the class after it.
*Note:
- The class which has the highest frequency is called the modal class.
Example 3.5 for ungrouped data :
The mode for the observations 4,6,3,1,2,5,7,3 is 3.
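A short sketch of finding the mode of ungrouped data by counting frequencies is given below. The function name `modes` is illustrative; it returns every value that attains the highest frequency, so a two-element result corresponds to bimodal data.

```python
# Mode of ungrouped data: the value(s) with the highest frequency.
# `modes` is an illustrative name; it returns all values tied for the
# highest frequency, in ascending order.
from collections import Counter


def modes(data):
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([4, 6, 3, 1, 2, 5, 7, 3]))  # [3]
```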
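Example 3.6 for grouped data based on Table 3.4:

| CGPA (Class)              | Frequency |
|---------------------------|-----------|
| 2.50 - 2.75               | 2         |
| 2.75 - 3.00               | 10        |
| 3.00 - 3.25 (modal class) | 15        |
| 3.25 - 3.50               | 13        |
| 3.50 - 3.75               | 7         |
| 3.75 - 4.00               | 3         |
| Total                     | 50        |

Table 3.4

Proof:
$$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$$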

3.2 Measure of Dispersion
• The measure of dispersion/spread is the degree to which a set of data tends to spread around the average value.
• It shows whether the data set is concentrated around the mean or scattered.
• The common measures of dispersion are:
1) range
2) variance
3) standard deviation
• The standard deviation is the square root of the variance.
• The sample variance is denoted by $s^2$ and the sample standard deviation is denoted by $s$.
1) Range
The range is the simplest measure of dispersion to calculate.
Range = Largest value – Smallest value
Example 3.7:
Table 3.5 gives the total areas in square miles of the four West South Central states of the United States.

| State     | Total Area (square miles) |
|-----------|---------------------------|
| Arkansas  | 53,182                    |
| Louisiana | 49,651                    |
| Oklahoma  | 69,903                    |
| Texas     | 267,277                   |

Table 3.5
Solution:
Range = Largest value – Smallest value
= 267,277 – 49,651 = 217,626 square miles.
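The range calculation is a one-liner in Python. The dictionary name `areas` below is an illustrative choice for this sketch.

```python
# Range = largest value - smallest value.
# `areas` holds the total areas in square miles from Example 3.7.
areas = {
    "Arkansas": 53182,
    "Louisiana": 49651,
    "Oklahoma": 69903,
    "Texas": 267277,
}

data_range = max(areas.values()) - min(areas.values())
print(data_range)  # 217626
```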
2) Variance
i) Variance for ungrouped data
• The variance of a sample (also known as mean square) for the raw (ungrouped) data is denoted by $s^2$ and defined by:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
ii) Variance for grouped data
• The variance for the frequency distribution is defined by:
$$s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f - 1} = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}}{n - 1}$$
where $n = \sum f$.
Example 3.8 for ungrouped data:
7, 6, 8, 5, 9, 4, 7, 7, 6, 6
• Range = 9 - 4 = 5
• Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{65}{10} = 6.5$$
• Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(4 - 6.5)^2 + (5 - 6.5)^2 + (6 - 6.5)^2 + (6 - 6.5)^2 + (6 - 6.5)^2 + (7 - 6.5)^2 + (7 - 6.5)^2 + (7 - 6.5)^2 + (8 - 6.5)^2 + (9 - 6.5)^2}{10 - 1} = \frac{18.5}{9} = 2.0556$$
• Standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.0556} = 1.4337$$
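The ungrouped variance and standard deviation above can be reproduced with a short script. The helper name `sample_variance` is illustrative; the final assertion cross-checks it against `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
# Sample variance (divisor n - 1) and standard deviation of ungrouped data.
# `sample_variance` is an illustrative name chosen for this sketch.
import math
import statistics

data = [7, 6, 8, 5, 9, 4, 7, 7, 6, 6]


def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


s2 = sample_variance(data)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4))  # 2.0556 1.4337

# Cross-check against the standard library's sample variance.
assert math.isclose(s2, statistics.variance(data))
```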
Example 3.9 for grouped data:

| CGPA (Class) | Frequency, f | Class Mark, x | fx      | fx²     |
|--------------|--------------|---------------|---------|---------|
| 2.50 - 2.75  | 2            | 2.625         | 5.250   | 13.781  |
| 2.75 - 3.00  | 10           | 2.875         | 28.750  | 82.656  |
| 3.00 - 3.25  | 15           | 3.125         | 46.875  | 146.484 |
| 3.25 - 3.50  | 13           | 3.375         | 43.875  | 148.078 |
| 3.50 - 3.75  | 7            | 3.625         | 25.375  | 91.984  |
| 3.75 - 4.00  | 3            | 3.875         | 11.625  | 45.047  |
| Total        | 50           |               | 161.750 | 528.031 |

Table 3.6
The variance for the frequency distribution in Table 3.6 is:
$$s^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}}{n - 1} = \frac{528.031 - \frac{(161.75)^2}{50}}{49} = 0.0973$$
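The grouped variance can be checked numerically using the computational formula. The sketch assumes classes stored as `(lower, upper, frequency)` tuples; `classes` and `grouped_variance` are illustrative names.

```python
# Grouped sample variance: (sum(f x^2) - (sum(f x))^2 / n) / (n - 1),
# with x the class mark (midpoint) and n = sum(f).
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_variance(classes):
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    sum_fx2 = sum(f * ((lower + upper) / 2) ** 2 for lower, upper, f in classes)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


s2_grouped = grouped_variance(classes)
print(round(s2_grouped, 4))  # 0.0973
```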
3) Standard Deviation
i) Standard deviation for ungrouped data:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
ii) Standard deviation for grouped data:
$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}} = \sqrt{\frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}}{n - 1}}$$
Example 3.10 (based on Example 3.8) for ungrouped data:
Refer to Example 3.8: $s = \sqrt{2.0556} = 1.4337$
Example 3.11 (based on Example 3.9) for grouped data:
$$s = \sqrt{\frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}}{n - 1}} = \sqrt{\frac{528.031 - \frac{(161.75)^2}{50}}{49}} = \sqrt{0.0973} = 0.3119$$
3.3 Rules of Data Dispersion
By using the mean $\bar{x}$ and the standard deviation $s$, we can find the percentage of total observations that fall within a given interval about the mean.
i) Chebyshev's Theorem
At least $\left(1 - \frac{1}{k^2}\right)$ of the observations will be within k standard deviations of the mean, where k is any number greater than 1 (k > 1).
Applicable for any distribution, including distributions that are not normal.
Steps:
1) Determine the interval $\bar{x} \pm ks$
2) Find the value of $1 - \frac{1}{k^2}$
3) Change the value in step 2 to a percent
4) Write the statement: at least the percent of data found in step 3 is in the interval found in step 1
Example 3.12:
Consider a distribution of test scores that is badly skewed to the right, with a sample mean of 80 and a sample standard deviation of 5. If k = 2, what percentage of the data falls in the corresponding interval about the mean?
Solution:
1) Determine the interval: $\bar{x} \pm ks = 80 \pm (2)(5) = (70, 90)$
2) Find $1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = \frac{3}{4}$
3) Convert into percentage: $\frac{3}{4} = 75\%$
4) Conclusion: At least 75% of the data is found in the interval from 70 to 90
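The four steps of the Chebyshev calculation can be sketched as a small function. The name `chebyshev_bound` and its return layout (interval, proportion) are illustrative choices, not part of the chapter.

```python
# Chebyshev's theorem: at least 1 - 1/k^2 of the observations lie within
# k standard deviations of the mean, for any k > 1.
# `chebyshev_bound` is an illustrative name chosen for this sketch.
def chebyshev_bound(mean, sd, k):
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    interval = (mean - k * sd, mean + k * sd)  # step 1
    proportion = 1 - 1 / k ** 2                # step 2
    return interval, proportion


interval, proportion = chebyshev_bound(mean=80, sd=5, k=2)
# Steps 3 and 4: express the proportion as a percent and state the result.
print(f"At least {proportion:.0%} of the data lie in {interval}")
# At least 75% of the data lie in (70, 90)
```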
ii) Empirical Rule
Applicable for a symmetric bell-shaped distribution / normal distribution.
k is a constant; for the Empirical Rule, k is 1, 2 or 3.
There are 3 rules:
i. About 68% of the observations lie in the interval $(\bar{x} - s, \bar{x} + s)$
ii. About 95% of the observations lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$
iii. About 99.7% of the observations lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$
If k is not given, then:
$$k = \frac{\text{distance between the mean and each end point of the interval}}{\text{standard deviation}}$$
Example
The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 yrs and a standard deviation of 12 yrs. Determine the approximate percentage of people who are 16 to 64 yrs old.
Solution:
$$k = \frac{40 - 16}{12} = \frac{24}{12} = 2$$
The interval from 16 to 64 is $\bar{x} \pm 2s$, so approximately 95% of the people in the sample are 16 to 64 yrs old.
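The lookup of k and the matching Empirical Rule percentage can be sketched as follows. The names `EMPIRICAL` and `empirical_percentage` are illustrative, and the function assumes the interval is symmetric about the mean with k equal to 1, 2 or 3; otherwise it raises an error rather than guessing.

```python
# Empirical Rule lookup for a bell-shaped distribution.
# Assumes the interval is symmetric about the mean and k is 1, 2 or 3.
import math

EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_percentage(mean, sd, low, high):
    k = (mean - low) / sd
    if not math.isclose(k, (high - mean) / sd):
        raise ValueError("interval is not symmetric about the mean")
    k_int = round(k)
    if not math.isclose(k, k_int) or k_int not in EMPIRICAL:
        raise ValueError("k must be 1, 2 or 3 for the Empirical Rule")
    return k_int, EMPIRICAL[k_int]


k, share = empirical_percentage(mean=40, sd=12, low=16, high=64)
print(k, share)  # 2 0.95
```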
Exercise for summarizing data
The following data give the total number of iPods sold by a mail order company on each of 30 days. Construct a frequency table.
23 22 8 14 13 25 19 26 11 23
16 15 20 18 28 16 12 22 27 9
10 9 26 5 21 20 17 14 16 21
• Find the mean, variance, standard deviation, mode and median.
Institut Matematik Kejuruteraan,
UniMAP