```Math 211
Introduction to Statistics
Chapter III The Measures of Central Tendency
Index (subscript) Notation: Let the symbol $X_i$ (read "X sub i") denote any of the N values
$X_1, X_2, \ldots, X_N$ assumed by a variable X. The letter i in $X_i$ ($i = 1, 2, \ldots, N$) is called an index
or subscript. The letters j, k, p, q or s can also be used.
Summation Notation:
$$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N$$
Example :
$$\sum_{i=1}^{N} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \cdots + X_N Y_N$$
$$\sum_{i=1}^{N} a X_i = a X_1 + a X_2 + \cdots + a X_N = a (X_1 + X_2 + \cdots + X_N) = a \sum_{i=1}^{N} X_i, \quad a \in \mathbb{R}$$
Averages (Measures of Central Tendency)
The average of a set of numbers is the value which best represents it. There are three different
types of averages. Each has advantages and disadvantages depending on the data and intended
purpose.
Mean
This is also known as the arithmetic mean. It is found by dividing the sum of the set of
numbers by the number of values, and is defined as
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{\sum X}{N}$$
Example : Find the mean of 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
Sum of values: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55
Number of values = 10
Mean of values = $\bar{X}$ = 55 / 10 = 5.5
Note: If the numbers $X_1, X_2, \ldots, X_k$ occur $f_1, f_2, \ldots, f_k$ times respectively (that is, occur with
frequencies $f_1, f_2, \ldots, f_k$), the arithmetic mean is
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{N}$$
where $N = \sum_{i=1}^{k} f_i$ is the total frequency.
Sonuç Zorlu
Lecture Notes
1
Example : The grades of a student on six examinations were 84,91,72,68,91, and 72. Find the
arithmetic mean.
The arithmetic mean is
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{N} = \frac{1(84) + 2(91) + 2(72) + 1(68)}{1 + 2 + 2 + 1} = \frac{478}{6} \approx 79.67$$
Example : If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1 respectively, the arithmetic mean
is
$$\bar{X} = \frac{3(5) + 2(8) + 4(6) + 1(2)}{3 + 2 + 4 + 1} = \frac{57}{10} = 5.7$$
The Weighted Arithmetic Mean
The weighted arithmetic mean of a set of N numbers $X_1, X_2, \ldots, X_N$ is defined as
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_N X_N}{w_1 + w_2 + \cdots + w_N} = \frac{\sum_{i=1}^{N} w_i X_i}{\sum_{i=1}^{N} w_i}$$
where $w_i$ represents the weight of the i-th value.
Example: If a final examination is weighted 4 times as much as a quiz, a midterm
examination 3 times as much as a quiz, and a student has a final examination grade of 80, a
midterm examination grade of 95 and quiz grades of 90, 65 and 70, the mean grade is
$$\bar{X} = \frac{1(90) + 1(65) + 1(70) + 3(95) + 4(80)}{1 + 1 + 1 + 3 + 4} = \frac{830}{10} = 83.$$
Properties of the Arithmetic Mean
(1) The algebraic sum of the deviations of a set of numbers from their arithmetic
mean is zero, that is, $\sum_{i=1}^{N} (X_i - \bar{X}) = 0$.
(2) $\sum_{i=1}^{N} (X_i - a)^2$ is a minimum if and only if $a = \bar{X}$.
(3) If $f_1$ numbers have mean $m_1$, $f_2$ numbers have mean $m_2$, ..., $f_k$ numbers
have mean $m_k$, then the mean of all the numbers is
$$\bar{X} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k}$$
(the weighted arithmetic mean of all the means).
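(4) If A is any guessed or assumed arithmetic mean and if $d_j = X_j - A$ are the
deviations of $X_j$ from A, then
$$\bar{X} = A + \frac{\sum_{j=1}^{N} d_j}{N} = A + \frac{\sum d}{N} = A + \bar{d} \quad \text{(for raw data)}$$
$$\bar{X} = A + \frac{\sum_{j=1}^{k} f_j d_j}{\sum_{j=1}^{k} f_j} = A + \frac{\sum f d}{N} = A + \bar{d} \quad \text{(for grouped data)}$$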
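A short derivation, offered here as a sketch rather than as part of the original notes, shows why properties (1) and (4) hold. Both follow directly from the definition of the mean, which gives $\sum X_j = N \bar{X}$:
$$\sum_{j=1}^{N} (X_j - \bar{X}) = \sum_{j=1}^{N} X_j - N \bar{X} = N \bar{X} - N \bar{X} = 0$$
$$\bar{d} = \frac{1}{N} \sum_{j=1}^{N} (X_j - A) = \frac{1}{N} \sum_{j=1}^{N} X_j - A = \bar{X} - A \quad \Rightarrow \quad \bar{X} = A + \bar{d}$$
For instance, take the illustrative values 2, 4 and 9 (chosen here, not taken from the notes), whose mean is 5. The deviations from the mean are -3, -1 and 4, which sum to 0. With the assumed mean A = 4, the deviations are -2, 0 and 5, so $\bar{d} = 3 / 3 = 1$ and $A + \bar{d} = 4 + 1 = 5$.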
Arithmetic Mean Computed from Grouped Data
Formula 1. $\bar{X} = A + \frac{\sum_{j=1}^{k} f_j d_j}{\sum_{j=1}^{k} f_j} = A + \frac{\sum f d}{N}$, where A is any guessed or assumed
class mark and $d_j = X_j - A$ are the deviations of $X_j$ from A.
Formula 2. $\bar{X} = \frac{\sum_{j=1}^{k} X_j f_j}{\sum_{j=1}^{k} f_j}$, where $X_j$ is the class mark of the corresponding class.
Median
The median of a set of numbers arranged in an array (in order of magnitude) is either the middle value or the
arithmetic mean of the two middle values. That is, for n ordered values,
$$\tilde{X} = \begin{cases} X_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{X_{n/2} + X_{n/2+1}}{2} & \text{if } n \text{ is even} \end{cases}$$
A disadvantage of the median is that it is not sensitive to changes in the data.
Example: Find the median of 2, 4, 8, 7, 4, 6, 10, 8, and 5.
Array: 2, 4, 4, 5, 6, 7, 8, 8, 10
Middle value = ((9 + 1) / 2)th value = 5th value = $X_5$
Median = 6
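The example above has an odd number of values. As an illustrative sketch of the even case, consider the ordered values 5, 5, 7, 9, 11, 12, 15, 18 (numbers chosen here for illustration, not taken from the notes). With n = 8 there is no single middle value, so the median is the arithmetic mean of the 4th and 5th values:
$$\text{Median} = \frac{X_{n/2} + X_{n/2+1}}{2} = \frac{X_4 + X_5}{2} = \frac{9 + 11}{2} = 10$$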
The Median for Grouped Data
$$\text{Median} = \tilde{X} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{\text{median}}} \right) c$$
where
$L_1$ = lower class boundary of the median class
N = number of items in the data (total frequency)
$(\sum f)_1$ = sum of frequencies of all classes lower than the median class
$f_{\text{median}}$ = frequency of the median class
c = size of the median class interval
Mode
The mode is the value which occurs most frequently in the set of values. The mode of
the set of values is also known as the modal value. The mode may be unique, may not
exist or may be more than one.
Example: Find the mode of 1, 2, 2, 3, 4, 4, 5, 5, 5, 5, 7, 8, 8 and 9.
Modal value = 5, since it has the highest frequency.
In the case of grouped data where a frequency curve has been constructed to fit the data, the mode
will be the value (or values) of X corresponding to the maximum point (or points) on the
curve. This value of X is sometimes denoted by $\hat{X}$.
From a frequency distribution or histogram the mode can be obtained by
$$\text{Mode} = \hat{X} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c$$
where
$L_1$ = lower class boundary of the modal class
$\Delta_1$ = excess of modal frequency over frequency of the next lower class
$\Delta_2$ = excess of modal frequency over frequency of the next higher class
c = size of the modal class interval
The Empirical Relation between the Mean, Median and Mode
Mean - Mode = 3 (Mean - Median)
The above relation holds approximately for unimodal frequency curves which are moderately asymmetrical.
The Geometric Mean G
Let $X_1, X_2, \ldots, X_N$ be the sample values; the geometric mean is
$$G = \sqrt[N]{X_1 X_2 \cdots X_N}.$$
Example: The geometric mean of the numbers 2, 4 and 8 is
$$G = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4.$$
The Harmonic Mean H
Let $X_1, X_2, \ldots, X_N$ be the sample values; the harmonic mean is
$$H = \frac{1}{\frac{1}{N} \sum_{i=1}^{N} \frac{1}{X_i}} = \frac{N}{\sum \frac{1}{X}}$$
Example: The harmonic mean of the numbers 2,4 and 8 is
$$H = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{0.875} \approx 3.43$$
The Relation between the Arithmetic, Geometric and Harmonic Means:
$$H \le G \le \bar{X}$$
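As a quick check of the relation $H \le G \le \bar{X}$, the three means can be compared for the numbers 2, 4 and 8 used in the examples above. The arithmetic mean is computed here only for the comparison; the other two values come from the worked examples.
$$\bar{X} = \frac{2 + 4 + 8}{3} = \frac{14}{3} \approx 4.67, \qquad G = \sqrt[3]{64} = 4, \qquad H = \frac{3}{0.875} \approx 3.43$$
so that 3.43 < 4 < 4.67, that is, $H < G < \bar{X}$. For positive numbers, the three means are equal only when all the numbers are equal.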
Quartiles, Deciles and Percentiles
These three measures divide the data set into four, ten or one hundred parts, respectively.
Quartiles, Deciles and Percentiles are measures of position useful for comparing scores within
one set of data. You probably all took some type of college placement exam at some point. If
your composite math score was say 28, it might have been reported that this score was in the
94th percentile. What does this mean? This does not mean you received a 94% on the test. It
does mean that of all the students who took that exam, 94% of them scored lower than you
did (and 6% higher). For a set of data you can divide the data into three quartiles ($Q_1, Q_2, Q_3$),
nine deciles ($D_1, D_2, \ldots, D_9$) and 99 percentiles ($P_1, P_2, \ldots, P_{99}$). The quartile $Q_1$ separates the
bottom 25% from the top 75%, $Q_2$ is the median and $Q_3$ separates the top 25% from the
bottom 75%. To work with percentiles, deciles and quartiles, you need to learn to do two
different tasks. First you should learn how to find the percentile that corresponds to a
particular score and then how to find the score in a set of data that corresponds to a given
percentile.
Exercise 1: The table shows the speed distribution of vehicles on Magusa-Lefkosa Road on a
typical day.

Speed (km/hr)   No. of vehicles f_j   Class mark X_j   d_j = X_j - A   f_j d_j
60-69                  138                 64.5             -40          -5520
70-79                  163                 74.5             -30          -4890
80-89                  325                 84.5             -20          -6500
90-99                  541                 94.5             -10          -5410
100-109                427                104.5               0              0
110-119                214                114.5              10           2140
120-129                110                124.5              20           2200
130-139                 52                134.5              30           1560
140-149                 30                144.5              40           1200
Total               N = 2000                                            -15220

(a) Find the mean speed.
(b) Find the median speed.
(c) Find the modal speed.
(d) Find $Q_1$, $D_3$ and $P_{95}$.
Solution. (a) Let A = 104.5, then
$$\bar{X} = A + \frac{\sum_{j=1}^{k} f_j d_j}{\sum_{j=1}^{k} f_j} = 104.5 - \frac{15220}{2000} = 96.89 \text{ km/hr}.$$
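(b) Since N = 2000, the 1000th value will be the median. It lies in the class 90-99, since the classes below it contain 138 + 163 + 325 = 626 values. This value can be found as
follows:
$$\tilde{X} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{\text{median}}} \right) c = 89.5 + \left( \frac{1000 - 626}{541} \right) 10 = 89.5 + (0.69) 10 = 96.4 \text{ km/hr}$$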
(c) The modal speed can be found as follows:
$$\hat{X} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c = 89.5 + \left( \frac{216}{216 + 114} \right) 10 = 89.5 + 6.55 = 96.05 \text{ km/hr}.$$
(d)
$$Q_1 = L_1 + \left( \frac{\frac{N}{4} - (\sum f)_1}{f_{Q_1}} \right) c = 79.5 + \left( \frac{500 - 301}{325} \right) 10 = 85.62 \text{ km/hr}$$
$$D_3 = P_{30} = L_1 + \left( \frac{\frac{3N}{10} - (\sum f)_1}{f_{D_3}} \right) c = 79.5 + \left( \frac{600 - 301}{325} \right) 10 = 88.7 \text{ km/hr}$$
$$P_{95} = L_1 + \left( \frac{\frac{95N}{100} - (\sum f)_1}{f_{P_{95}}} \right) c = 119.5 + \left( \frac{1900 - 1808}{110} \right) 10 = 127.86 \text{ km/hr}.$$
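As a cross-check on part (a) of Exercise 1, the mean speed can also be computed with Formula 2, using the class marks and frequencies from the table directly. This is a sketch added for verification, not part of the original solution:
$$\sum_{j=1}^{k} f_j X_j = 138(64.5) + 163(74.5) + 325(84.5) + 541(94.5) + 427(104.5) + 214(114.5) + 110(124.5) + 52(134.5) + 30(144.5) = 193780$$
$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{\sum_{j=1}^{k} f_j} = \frac{193780}{2000} = 96.89 \text{ km/hr}$$
Formula 1 with A = 104.5 and $\sum f_j d_j = -15220$ gives the same value, since 104.5 - 15220 / 2000 = 96.89.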
Exercise 2: The following table shows a frequency distribution of the weekly wages of 65
employees at the P&R Company.
Wages               No. of employees
\$250.00-259.99             8
\$260.00-269.99            10
\$270.00-279.99            16
\$280.00-289.99            14
\$290.00-299.99            10
\$300.00-309.99             5
\$310.00-319.99             2
Total                   N = 65

(a) Find the mean wage.
(b) Find the modal wage.
(c) Find the median wage.
(d) Find $Q_3$ and $D_8$.
Exercise 3: Consider the following frequency distribution.
Classes   Frequency f_i   X_i   f_i X_i   d_i = X_i - A   f_i d_i
10-14           7          12      84          -10          -70
15-19          11          17     187           -5          -55
20-24          14          22     308            0            0
25-29          13          27     351            5           65
30-34           5          32     160           10           50
Total          50                1090                       -10
(a) Find the (approximate) mean using formula 1 and formula 2. Compare the results.
Formula 1. Let A = 22, then
$$\bar{X} = A + \frac{\sum_{j=1}^{k} f_j d_j}{\sum_{j=1}^{k} f_j} = 22 - \frac{10}{50} = 21.8$$
Formula 2.
$$\bar{X} = \frac{\sum_{j=1}^{k} X_j f_j}{\sum_{j=1}^{k} f_j} = \frac{1090}{50} = 21.8$$
(b) Find the mode.
The modal class is the third class with frequency 14. $\Delta_1 = 14 - 11 = 3$, $\Delta_2 = 14 - 13 = 1$.
Thus,
$$\hat{X} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c = 19.5 + \left( \frac{3}{3 + 1} \right) 5 = 23.25$$
(c) Find $P_{90}$ and $P_{10}$.
$$P_{90} = L_1 + \left( \frac{\frac{90N}{100} - (\sum f)_1}{f_{P_{90}}} \right) c = 24.5 + \left( \frac{45 - 32}{13} \right) 5 = 29.5.$$
$$P_{10} = L_1 + \left( \frac{\frac{10N}{100} - (\sum f)_1}{f_{P_{10}}} \right) c = 9.5 + \left( \frac{5 - 0}{7} \right) 5 = 13.07$$
Exercise 4: A student’s grades in laboratory, lecture, and recitation parts of a computer
course were 71, 78, and 89, respectively.
(a) If the weights accorded these grades are 2,4, and 5, respectively, what is an average