CHAPTER 3 : DESCRIPTIVE STATISTICS : NUMERICAL MEASURES (STATISTICS)
3.1 Measures of Central Tendency
• A measure of central tendency gives the center of a histogram or a frequency distribution curve.
3.1.1 Different measures of central tendency
i. Mean
• The mean of a sample is the sum of the measurements divided by the number of measurements in the set. The mean is denoted by $\bar{x}$.
Mean = Sum of all values / Number of values
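• The mean $\bar{x}$ can be obtained as below :
For ungrouped data, the mean is defined by
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$, for $i = 1, 2, \dots, n$, or $\bar{x} = \frac{\sum x}{n}$
For grouped data, the mean is defined by
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$ or $\bar{x} = \frac{\sum fx}{\sum f}$
Where f = class frequency;
x = class mark (mid point)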
Example 3.1 : 2002 Total Payroll

MLB Team                 Total Payroll (millions of dollars)
Anaheim Angels           62
Atlanta Braves           93
New York Yankees         126
St. Louis Cardinals      75
Tampa Bay Devil Rays     34
Total                    390
Table 3.1

The sample mean of the payrolls (raw data) is
$\bar{x} = \frac{\sum x}{n} = \frac{390}{5} = 78$ million dollars
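The calculation in Example 3.1 can be checked with a minimal Python sketch. The helper name `sample_mean` is an illustrative choice, not part of the original notes; the payroll figures are copied from Table 3.1.

```python
# Payroll figures (millions of dollars) copied from Table 3.1.
payrolls = {
    "Anaheim Angels": 62,
    "Atlanta Braves": 93,
    "New York Yankees": 126,
    "St. Louis Cardinals": 75,
    "Tampa Bay Devil Rays": 34,
}


def sample_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    values = list(values)
    return sum(values) / len(values)


mean_payroll = sample_mean(payrolls.values())
print(mean_payroll)  # 78.0
```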
Example 3.2 : The sample mean for Table 3.2

CGPA (Class)     Frequency, f     Class Mark, x     fx
2.50 - 2.75      2                2.625             5.250
2.75 - 3.00      10               2.875             28.750
3.00 - 3.25      15               3.125             46.875
3.25 - 3.50      13               3.375             43.875
3.50 - 3.75      7                3.625             25.375
3.75 - 4.00      3                3.875             11.625
Total            50                                 161.750
Table 3.2

$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$
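As a rough check of Example 3.2, the grouped mean can be sketched in Python. The function name `grouped_mean` and the `(lower, upper, frequency)` tuple layout are assumptions made for this illustration; the class boundaries and frequencies are those of Table 3.2.

```python
# (lower boundary, upper boundary, frequency) for each class in Table 3.2.
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_mean(classes):
    """Return sum(f * x) / sum(f), where x is the class mark (mid point)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f


print(round(grouped_mean(classes), 3))  # 3.235
```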
ii. Median
• The median is the middle value of a set of observations arranged in order of magnitude and is usually denoted by $\tilde{x}$.
1. The median for ungrouped data.
- The median depends on the number of observations in the
data, n.
- If n is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation of the ordered observations.
- If n is even, then the median is the arithmetic mean of the $\left(\frac{n}{2}\right)$th observation and the $\left(\frac{n}{2} + 1\right)$th observation.
2. The median of grouped data / frequency distribution.
The median of a frequency distribution is defined by:
$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right)$
where,
• $L$ = the lower class boundary of the median class;
• $c$ = the size of the median class interval;
• $F_{j-1}$ = the sum of frequencies of all classes lower than the median class; and
• $f_j$ = the frequency of the median class.
Example 3.3 for ungrouped data :
The median of the data 4, 6, 3, 1, 2, 5, 7, 3 is 3.5.
- Rearranged in order of magnitude, the data become 1, 2, 3, 3, 4, 5, 6, 7. As n = 8 (even), the median is the mean of the 4th and 5th observations, that is (3 + 4) / 2 = 3.5.
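The odd/even rule for ungrouped data can be sketched as below. The function name `median_ungrouped` is hypothetical; the data are those of Example 3.3.

```python
def median_ungrouped(values):
    """Median of raw data using the odd/even rule for n observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, which is index mid (0-based).
        return ordered[mid]
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([4, 6, 3, 1, 2, 5, 7, 3]))  # 3.5
```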
Example 3.4 for grouped data :
$\tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right) = 3.217$

Proof :

CGPA (Class)     Frequency     Cumulative Frequency
2.50 - 2.75      2             2
2.75 - 3.00      10            12
3.00 - 3.25      15            27   (median class)
3.25 - 3.50      13            40
3.50 - 3.75      7             47
3.75 - 4.00      3             50
Total            50
Table 3.3

Median, $\tilde{x} = 3.00 + 0.25\left(\frac{25 - 12}{15}\right) = 3.217$
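A minimal sketch of the grouped-median formula follows, assuming the classes are listed in ascending order. The name `grouped_median` and the tuple layout are illustrative choices; the figures are those of Table 3.3.

```python
# (lower boundary, upper boundary, frequency) for each class in Table 3.3.
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_median(classes):
    """Return L + c * ((sum(f) / 2 - F_prev) / f_j) for the median class."""
    half = sum(f for _, _, f in classes) / 2
    cumulative = 0  # F_{j-1}: frequencies of all classes below the current one
    for lo, hi, f in classes:
        # The median class is the first class whose cumulative frequency
        # reaches half of the total frequency.
        if f > 0 and cumulative + f >= half:
            return lo + (hi - lo) * (half - cumulative) / f
        cumulative += f
    raise ValueError("frequency table has no observations")


print(round(grouped_median(classes), 3))  # 3.217
```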
iii. Mode
• The mode of a set of observations is the observation with the highest frequency and is usually denoted by $\hat{x}$. The mode can also be used to describe qualitative data.
1. Mode of ungrouped data :
- Defined as the value which occurs most frequently.
- The mode has the advantage that it is easy to calculate and is not affected by extreme values.
- However, the mode may not exist, and even if it does exist, it may not be unique.
*Note:
• If two measurements in a set of data share the highest frequency, both are taken as modes and the data are known as bimodal data.
• If more than two measurements share the highest frequency, the data are treated as having no mode.
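The convention in the note can be sketched in Python as follows. The function name `modes_ungrouped` is hypothetical, and returning an empty list to stand for "no mode" is a choice made only for this sketch.

```python
from collections import Counter


def modes_ungrouped(values):
    """Return the mode(s) of raw data under the convention in the note.

    One value  -> a single mode.
    Two values -> bimodal data.
    Empty list -> no mode (more than two values tie for the top frequency).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)
    return modes if len(modes) <= 2 else []


print(modes_ungrouped([4, 6, 3, 1, 2, 5, 7, 3]))  # [3]
print(modes_ungrouped([1, 1, 2, 2, 3]))           # [1, 2]
print(modes_ungrouped([1, 1, 2, 2, 3, 3]))        # []
```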
2. The mode for grouped data / frequency distribution data.
- When data have been grouped in classes and a frequency curve is drawn to fit the data, the mode is the value corresponding to the maximum point on the curve.
- Determining the mode using the formula:
$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$
where
$L$ = the lower class boundary of the modal class;
$c$ = the size of the modal class interval;
$\Delta_1$ = the difference between the modal class frequency and the frequency of the class before it; and
$\Delta_2$ = the difference between the modal class frequency and the frequency of the class after it.
*Note:
- The class which has the highest frequency is called the
modal class.
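The grouped-mode formula can be sketched as below, applied to the CGPA frequency distribution of Table 3.2. The name `grouped_mode` is hypothetical, and two assumptions are made for the sketch: the modal class is unique, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
# (lower boundary, upper boundary, frequency) for each CGPA class.
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_mode(classes):
    """Return L + c * (d1 / (d1 + d2)) for the modal class."""
    freqs = [f for _, _, f in classes]
    j = freqs.index(max(freqs))  # position of the modal class
    lo, hi, f = classes[j]
    # Assumption: a neighbour that does not exist counts as frequency 0.
    before = freqs[j - 1] if j > 0 else 0
    after = freqs[j + 1] if j + 1 < len(freqs) else 0
    d1 = f - before  # modal frequency minus the frequency of the class before
    d2 = f - after   # modal frequency minus the frequency of the class after
    return lo + (hi - lo) * d1 / (d1 + d2)


print(round(grouped_mode(classes), 3))  # 3.179
```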
Example 3.5 for ungrouped data :
The mode for the observations 4, 6, 3, 1, 2, 5, 7, 3 is 3.
Example 3.6 for grouped data based on Table 3.4 :
$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.179$

Proof :

CGPA (Class)     Frequency
2.50 - 2.75      2
2.75 - 3.00      10
3.00 - 3.25      15
3.25 - 3.50      13
3.50 - 3.75      7
3.75 - 4.00      3
Total            50
Table 3.4

$\hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$
3.2 Measure of Dispersion
• The measure of dispersion or spread is the degree to which a
set of data tends to spread around the average value.
• It shows whether the data set is concentrated around the mean or scattered.
• The common measures of dispersion are variance and standard
deviation.
• The standard deviation actually is the square root of the
variance.
• The sample variance is denoted by $s^2$ and the sample standard deviation is denoted by $s$.
3.2.1 Range
• The range is the simplest measure of dispersion to calculate.
Range = Largest value – Smallest value
Example 3.7 :
Table 3.5 gives the total areas in square miles of the four West South Central states of the United States.

State         Total Area (square miles)
Arkansas      53,182
Louisiana     49,651
Oklahoma      69,903
Texas         267,277
Table 3.4

Range = Largest value - Smallest value
= 267,277 - 49,651 = 217,626 square miles.
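The range in Example 3.7 can be reproduced with a short Python sketch. The variable names are illustrative; the areas are copied from the table above.

```python
# Total areas in square miles, copied from Example 3.7.
areas = {
    "Arkansas": 53182,
    "Louisiana": 49651,
    "Oklahoma": 69903,
    "Texas": 267277,
}

# Range = largest value - smallest value.
data_range = max(areas.values()) - min(areas.values())
print(data_range)  # 217626
```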
3.2.2 Variance
i. Variance for ungrouped data
- The variance of a sample (also known as the mean square) for the raw (ungrouped) data is denoted by $s^2$ and defined by:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
ii. Variance for grouped data
- The variance for a frequency distribution is defined by:
$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1} = \frac{\sum fx^2 - n\bar{x}^2}{n - 1}$, where $n = \sum f$
Example 3.8 for ungrouped data :
Variance for the Students’ CGPA for Data 1 is 0.105.
Example 3.9 for grouped data :
The variance for the frequency distribution in Table 3.5 is:

CGPA (Class)     Frequency, f     Class Mark, x     fx          fx^2
2.50 - 2.75      2                2.625             5.250       13.781
2.75 - 3.00      10               2.875             28.750      82.656
3.00 - 3.25      15               3.125             46.875      146.484
3.25 - 3.50      13               3.375             43.875      148.078
3.50 - 3.75      7                3.625             25.375      91.984
3.75 - 4.00      3                3.875             11.625      45.047
Total            50                                 161.750     528.031
Table 3.5

$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{528.031 - 50(3.235)^2}{49} = 0.0973$
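The two forms of the grouped-variance formula can be compared with a minimal Python sketch. The function names are hypothetical; the class boundaries and frequencies are those of the CGPA frequency distribution used in Example 3.9.

```python
# (lower boundary, upper boundary, frequency) for each CGPA class.
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_variance_definition(classes):
    """s^2 = sum(f * (x - mean)^2) / (n - 1), with x the class mark."""
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]
    n = sum(f for _, f in marks)
    mean = sum(f * x for x, f in marks) / n
    return sum(f * (x - mean) ** 2 for x, f in marks) / (n - 1)


def grouped_variance_shortcut(classes):
    """s^2 = (sum(f * x^2) - n * mean^2) / (n - 1), the computing form."""
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]
    n = sum(f for _, f in marks)
    mean = sum(f * x for x, f in marks) / n
    sum_fx2 = sum(f * x * x for x, f in marks)
    return (sum_fx2 - n * mean ** 2) / (n - 1)


print(round(grouped_variance_definition(classes), 4))  # 0.0973
print(round(grouped_variance_shortcut(classes), 4))    # 0.0973
```

Both forms agree up to floating-point rounding; the shortcut form only needs the column totals of fx and fx^2.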
3.2.3 Standard Deviation
i. Standard deviation for ungrouped data :
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
ii. Standard deviation for grouped data :
$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f - 1}} = \sqrt{\frac{\sum fx^2 - n\bar{x}^2}{n - 1}}$
Example 3.10 (Based on Example 3.8) for ungrouped data:
$s = \sqrt{0.105} = 0.3240$
Example 3.11 (Based on Example 3.9) for grouped data:
$s = \sqrt{\frac{\sum fx^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{528.031 - 50(3.235)^2}{49}} = \sqrt{0.0973} = 0.3119$
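A short Python sketch of both standard-deviation calculations follows. The name `grouped_std` is hypothetical. Note that the sketch takes the square root of the unrounded grouped variance, so it prints 0.3120, whereas the value 0.3119 above comes from the variance rounded to 0.0973.

```python
import math

# (lower boundary, upper boundary, frequency) for each CGPA class.
classes = [
    (2.50, 2.75, 2),
    (2.75, 3.00, 10),
    (3.00, 3.25, 15),
    (3.25, 3.50, 13),
    (3.50, 3.75, 7),
    (3.75, 4.00, 3),
]


def grouped_std(classes):
    """Square root of the grouped sample variance."""
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]
    n = sum(f for _, f in marks)
    mean = sum(f * x for x, f in marks) / n
    variance = sum(f * (x - mean) ** 2 for x, f in marks) / (n - 1)
    return math.sqrt(variance)


# Ungrouped case: the variance 0.105 is taken as given in the notes.
print(f"{math.sqrt(0.105):.4f}")      # 0.3240
# Grouped case, from the unrounded variance.
print(f"{grouped_std(classes):.4f}")  # 0.3120
# Grouped case, from the variance rounded to four decimals.
print(f"{math.sqrt(0.0973):.4f}")     # 0.3119
```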
3.2.4 Rules of Data Dispersion
• Chebyshev’s Theorem
- At least $\left(1 - \frac{1}{k^2}\right)$ of the observations will lie within $k$ standard deviations of the mean, where $k$ is any number greater than 1.
Steps:
1) Determine the interval $\bar{x} \pm ks$
2) Find the value of $\left(1 - \frac{1}{k^2}\right)$
3) Change the value in step 2 to a percent
4) Write the statement: at least the percent of data found in step 3 is in the interval found in step 1
- Example 3.12 : For $k = 2$,
$\left(1 - \frac{1}{k^2}\right) = \left(1 - \frac{1}{2^2}\right) = \left(1 - \frac{1}{4}\right) = \frac{3}{4} = 0.75$
Hence, according to Chebyshev's Theorem, at least 75% of the values of a data set lie within two standard deviations of the mean.
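The four steps can be sketched in Python as below. The function names are hypothetical, and applying the theorem to the CGPA values $\bar{x} = 3.235$ and $s = 0.3119$ (from Examples 3.2 and 3.11) is only an illustration, not part of the original example.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, s, k):
    """Step 1: the interval mean - k*s to mean + k*s."""
    return mean - k * s, mean + k * s


# Steps 2 and 3 for k = 2, as in Example 3.12.
print(f"{chebyshev_bound(2):.0%}")  # 75%

# Steps 1 and 4 for the CGPA data (illustrative values from earlier examples).
low, high = chebyshev_interval(3.235, 0.3119, 2)
print(f"at least {chebyshev_bound(2):.0%} of the data lie in "
      f"({low:.4f}, {high:.4f})")
```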
• Empirical Rule
- For data that are normally distributed, approximately
i. 68% of the observations lie in the interval $(\bar{x} - s, \bar{x} + s)$
ii. 95% of the observations lie in the interval $(\bar{x} - 2s, \bar{x} + 2s)$
iii. 99.7% of the observations lie in the interval $(\bar{x} - 3s, \bar{x} + 3s)$
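The percentages in the Empirical Rule can be checked against the standard normal distribution with a small sketch using Python's `statistics.NormalDist` (available from Python 3.8). The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

The rule's figures of 68% and 95% are rounded versions of these values.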