Download stat slides - measurements (mean, st.dev., etc.)

MEASURES OF CENTRAL TENDENCY
A NOTATION FOR ADDING A PATTERNED SEQUENCE

This is the SUMMATION SYMBOL $\sum$ (uppercase Greek letter ‘sigma’).
It tells us to: “compute all the values of the expression next to it, then add!”
For example, $\sum X = X_1 + X_2 + X_3 + X_4 + \dots$ (all values of X)

EXAMPLE.
Suppose we have the following data values for the variables X and Y.
X :  12    4   10    6    8    8
Y :   5    9    9    8   10    8
A. $\sum Y = 5 + 9 + 9 + 8 + 10 + 8 = 49$
B. $\sum XY = (12)(5) + (4)(9) + (10)(9) + (6)(8) + (8)(10) + (8)(8) = 378$
C. $\sum XY^2 = (12)(5^2) + (4)(9^2) + (10)(9^2) + (6)(8^2) + (8)(10^2) + (8)(8^2) = 3130$
D. $\sum X^2Y^3 = (12^2)(5^3) + (4^2)(9^3) + (10^2)(9^3) + (6^2)(8^3) + (8^2)(10^3) + (8^2)(8^3) = 217764$
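The four sums in this example can be checked with a few lines of Python. This is only a sketch: the list names `X` and `Y` and the result names are illustrative and do not come from the slides.

```python
# Paired data values from the example (X and Y have the same length).
X = [12, 4, 10, 6, 8, 8]
Y = [5, 9, 9, 8, 10, 8]

# Each summation: evaluate the expression for every (x, y) pair, then add.
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_xy2 = sum(x * y**2 for x, y in zip(X, Y))
sum_x2y3 = sum(x**2 * y**3 for x, y in zip(X, Y))

print(sum_y, sum_xy, sum_xy2, sum_x2y3)  # 49 378 3130 217764
```

Note that only the factor carrying the exponent is raised to a power: in C, Y is squared before multiplying by X.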
THE MODE OF A DATA SET
TAKE NOTE! IMPORTANT REMINDER!
The formulas given here apply to sample data only. From here onwards,
the data we will be dealing with are (mostly) sample data, and the formulas
give sample measurements.
As for population data and formulas, our goal in Stat is actually to provide
good estimates of population measurements using the sample data.
If a data set has certain values occurring more than once, the most frequently occurring
among them is called the MODE. If the data set has only one such value, it is described as
UNIMODAL; if it has two, BIMODAL; and if it has more than two, MULTIMODAL.
A data set may have no modal value (i.e., if each data value is distinct). And it’s fine!
EXAMPLE.
Find the mode. X : 2.3  4.8  9.0  5.4  6.8  3.5  4.8  5.4  7.2
MODE = 5.4, 4.8 (The data set X is bimodal)
EXAMPLE.
Find the mode. Y : 4.2  1.4  3.8  4.7  6.3  5.5  7.8  6.5  9.2  8.3
MODE = N.A. (The data set Y has no mode)
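The definition of the mode can be sketched in Python as follows. The function name `modes` is hypothetical. The sketch follows the convention used here that a data set with all distinct values has no mode, so it returns an empty list in that case.

```python
from collections import Counter

def modes(data):
    """Return the most frequently occurring values, or [] if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []                      # empty data set: nothing to report
    top = max(counts.values())
    if top == 1:
        return []                      # every value is distinct: no mode
    # Keep every value that reaches the highest count (handles bimodal data).
    return [value for value, count in counts.items() if count == top]

X = [2.3, 4.8, 9.0, 5.4, 6.8, 3.5, 4.8, 5.4, 7.2]
Y = [4.2, 1.4, 3.8, 4.7, 6.3, 5.5, 7.8, 6.5, 9.2, 8.3]

print(modes(X))  # [4.8, 5.4]
print(modes(Y))  # []
```

The length of the returned list tells whether the data set is unimodal (1), bimodal (2), or multimodal (more than 2).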
THE MEDIAN OF A DATA SET
The MEDIAN of a data set is the data value in the middle position when the data set is
arranged in order. The MEDIAN divides the data set into two parts of equal size.
For a data set with N elements, the location
of the MEDIAN is given by the following formula:
$\text{MEDIAN LOCATION} = \dfrac{N + 1}{2}$
EXAMPLE.
Find the median. X : 2.3  4.8  9.0  5.4  6.8  3.5  4.8  5.4  7.2
Arrange: X : 2.3  3.5  4.8  4.8  5.4  5.4  6.8  7.2  9.0
Compute: MEDIAN LOCATION = (9+1)/2 = 5th
Locate: MEDIAN = 5.4
TAKE NOTE! If the calculated median location is not a whole number (has a 0.5), the
median is the midpoint of the two values around the calculated location.
EXAMPLE.
Find the median. X : 4.2  1.4  3.8  4.7  6.3  5.5  7.8  6.5  9.2  8.3
Arrange: X : 1.4  3.8  4.2  4.7  5.5  6.3  6.5  7.8  8.3  9.2
Compute: MEDIAN LOCATION = (10+1)/2 = 5.5th
Locate: MEDIAN = (5.5+6.3)/2 = 5.9
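The arrange, compute, locate steps can be sketched in Python as below. The function name `median_by_location` is made up for this illustration. Note that the median location counts from 1, while Python lists count from 0.

```python
def median_by_location(data):
    """Median via the (N + 1) / 2 location rule on the sorted data."""
    ordered = sorted(data)               # Arrange
    n = len(ordered)
    location = (n + 1) / 2               # Compute (1-based position)
    if location.is_integer():
        return ordered[int(location) - 1]    # Locate: a single middle value
    # Location ends in .5: take the midpoint of the two neighbouring values.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2

X = [2.3, 4.8, 9.0, 5.4, 6.8, 3.5, 4.8, 5.4, 7.2]
Y = [4.2, 1.4, 3.8, 4.7, 6.3, 5.5, 7.8, 6.5, 9.2, 8.3]

print(median_by_location(X))            # 5.4
print(round(median_by_location(Y), 2))  # 5.9
```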
THE MEAN OF A DATA SET
The MEAN of a data set is the data value that you can expect to find at the center, based
on the pattern of values in the data set. (Hence, it is also called the EXPECTED VALUE.)
The formula for the mean of a data set is:
MEAN $\bar{X} = \dfrac{\sum X}{N}$
where N is the number of data values in the set
and the X’s are the individual data values.
EXAMPLE.
Find the mean for the sample data. X : 2.3  4.8  9.0  5.4  6.8  3.5  4.8  5.4  7.2
MEAN $\bar{X} = \dfrac{\sum X}{N} = \dfrac{2.3 + 4.8 + 9.0 + 5.4 + 6.8 + 3.5 + 4.8 + 5.4 + 7.2}{9} = \dfrac{49.2}{9} = 5.4666\ldots \approx 5.47$
(Always round off results to two decimal places!)
EXAMPLE.
Find the mean for the sample data. X : 4.2  1.4  3.8  4.7  6.3  5.5  7.8  6.5  9.2  8.3
MEAN $\bar{X} = \dfrac{\sum X}{N} = \dfrac{4.2 + 1.4 + 3.8 + 4.7 + 6.3 + 5.5 + 7.8 + 6.5 + 9.2 + 8.3}{10} = \dfrac{57.7}{10} = 5.77$
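The mean formula (add all the data values, then divide by how many there are) can be sketched in Python like this. The function name `sample_mean` is illustrative only. The two data sets are the ones from the mean examples.

```python
def sample_mean(data):
    """Sum of the data values divided by the number of values N."""
    return sum(data) / len(data)

X = [2.3, 4.8, 9.0, 5.4, 6.8, 3.5, 4.8, 5.4, 7.2]        # N = 9
Y = [4.2, 1.4, 3.8, 4.7, 6.3, 5.5, 7.8, 6.5, 9.2, 8.3]   # N = 10

# Results rounded to two decimal places, as in the examples.
print(round(sample_mean(X), 2))  # 5.47
print(round(sample_mean(Y), 2))  # 5.77
```

The divisor is always the number of values in that data set, so it is 9 for the first set and 10 for the second.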
MEASURES OF VARIABILITY
EXAMPLE.
Which of the following data sets is more “scattered” (or “far apart”)?
Data Set #1:  10  12  13  13  15  17  19  19  20
Data Set #2:  10  15  19  23  27  32  36  40  45
By simple observation, Data Set #2 is more “scattered”.
EXAMPLE.
Which of the following data sets is more “scattered” (or “far apart”)?
Data Set #3:  2.3  4.8  9.0  5.4  6.8  3.5  4.8  5.4  7.2
Data Set #4:  4.2  1.4  3.8  4.7  6.3  5.5  7.8  6.5  9.2  8.3
I don’t know! I can’t decide anymore!
THE VARIANCE AND STANDARD DEVIATION OF A DATA SET
The VARIANCE and STANDARD DEVIATION of a data set measure how scattered the data values
are around the mean value. The formulas for the variance and standard deviation of a data set are:
VARIANCE $s^2 = \dfrac{N(\sum X^2) - (\sum X)^2}{N(N - 1)}$
STANDARD DEVIATION $s = \sqrt{\text{VARIANCE}}$
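A minimal Python sketch of the two formulas follows. The function names `sample_variance` and `sample_std_dev` are hypothetical, and the data set is one of the sample data sets used in these slides. The last line cross-checks the result against `statistics.variance` from the Python standard library, which also computes the sample variance.

```python
import math
import statistics

def sample_variance(data):
    """Variance from N, the sum of X, and the sum of X squared."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (n * sum_x2 - sum_x**2) / (n * (n - 1))

def sample_std_dev(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))

X = [4.2, 1.4, 3.8, 4.7, 6.3, 5.5, 7.8, 6.5, 9.2, 8.3]

print(round(sample_variance(X), 2))  # 5.53
print(round(sample_std_dev(X), 2))   # 2.35

# Cross-check with the standard library's sample variance.
print(math.isclose(sample_variance(X), statistics.variance(X)))  # True
```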
EXAMPLE.
Find the variance and standard deviation of the following sample data.
X : 2.3  4.8  9.0  5.4  6.8  3.5  4.8  5.4  7.2

   X        X^2
  2.3       5.29
  4.8      23.04
  9.0      81.00
  5.4      29.16
  6.8      46.24
  3.5      12.25
  4.8      23.04
  5.4      29.16
  7.2      51.84
$\sum X = 49.2$     $\sum X^2 = 301.02$

VARIANCE $s^2 = \dfrac{N(\sum X^2) - (\sum X)^2}{N(N - 1)} = \dfrac{9(301.02) - (49.2)^2}{9(9 - 1)} = \dfrac{288.54}{72} \approx 4.01$
STANDARD DEVIATION $s = \sqrt{\text{VARIANCE}} = \sqrt{4.01} \approx 2.00$
EXAMPLE.
Find the variance and standard deviation of the following sample data.
X : 4.2  1.4  3.8  4.7  6.3  5.5  7.8  6.5  9.2  8.3

   X        X^2
  4.2      17.64
  1.4       1.96
  3.8      14.44
  4.7      22.09
  6.3      39.69
  5.5      30.25
  7.8      60.84
  6.5      42.25
  9.2      84.64
  8.3      68.89
$\sum X = 57.7$     $\sum X^2 = 382.69$

VARIANCE $s^2 = \dfrac{N(\sum X^2) - (\sum X)^2}{N(N - 1)} = \dfrac{10(382.69) - (57.7)^2}{10(10 - 1)} = \dfrac{497.61}{90} \approx 5.53$
STANDARD DEVIATION $s = \sqrt{\text{VARIANCE}} = \sqrt{5.53} \approx 2.35$
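The variance formula used in these examples is a computational shortcut. The sketch below assumes the usual definition of sample variance, which is not stated on the slides: the sum of squared distances from the mean, divided by N - 1. It checks that both forms give the same value for the second sample data set. Both function names are made up for this illustration.

```python
import math

def variance_shortcut(data):
    """Shortcut form: uses only N, the sum of X, and the sum of X squared."""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))

def variance_from_deviations(data):
    """Definition form (assumed): squared distances from the mean over N - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

X = [4.2, 1.4, 3.8, 4.7, 6.3, 5.5, 7.8, 6.5, 9.2, 8.3]

shortcut = variance_shortcut(X)
from_deviations = variance_from_deviations(X)
print(round(shortcut, 2), round(from_deviations, 2))  # 5.53 5.53
```

The second form shows directly why the variance measures scatter around the mean: values far from the mean contribute large squared distances.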
MEASURES OF RELATIVE POSITION
THE Z-SCORE OR THE STANDARD SCORE
The Z-SCORE or STANDARD SCORE of a data value measures
how far this data value is from the MEAN of the data set in terms
of the STANDARD DEVIATION of the data set.
Z-SCORE $Z = \dfrac{X - \bar{X}}{s}$
Given the data set: 49.5  57  59  ...
with mean $\bar{X} = 52$ and standard deviation $s = 2.5$
[Figure: number line showing 49.5, the mean $\bar{X} = 52$, and 57, marked off in intervals of s = 2.5]
Data value 49.5 is one s.d. away
to the left of the mean. So Z = -1.
Data value 57 is two s.d.’s away
to the right of the mean. So Z = +2.
EXAMPLE.
We have the following sample data. X : 2.3  4.8  9.0  5.4  6.8  3.5  4.8  5.4  7.2
We have also calculated: MEAN $\bar{X} = 5.47$ and STAND. DEV. $s = 2.00$
X = 2.3: $Z = \dfrac{2.3 - 5.47}{2.00} = -1.585 \approx -1.59$
X = 5.4: $Z = \dfrac{5.4 - 5.47}{2.00} = -0.035 \approx -0.04$
X = 9.0: $Z = \dfrac{9.0 - 5.47}{2.00} = 1.765 \approx 1.77$
NOTE!
If z<0 (z is negative), it means that the data value lies on the left side of the MEAN.
If z>0 (z is positive), it means that the data value lies on the right side of the MEAN.
If z=0 (z is zero), it means that the data value is equal to the MEAN.
ALSO!
If we convert an entire data set into z-scores, we will come up with a new data set
(mostly small values, typically between -3 and 3) with MEAN = 0 and STANDARD DEVIATION = 1.
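The z-score formula and the claim about converting a whole data set can be checked with a short Python sketch. The function names `z_score` and `standardize` are hypothetical. `statistics.stdev` is the standard library's sample standard deviation (divisor N - 1), which matches the sample formulas used here.

```python
import math
import statistics

def z_score(x, mean, s):
    """How many standard deviations x lies from the mean (sign gives the side)."""
    return (x - mean) / s

def standardize(data):
    """Convert every value in a data set to its z-score."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation
    return [z_score(x, mean, s) for x in data]

# The two values from the number-line illustration (mean 52, s = 2.5).
print(z_score(49.5, 52, 2.5))  # -1.0
print(z_score(57, 52, 2.5))    # 2.0

# Converting an entire data set: the new mean is 0 and the new s is 1,
# up to floating-point rounding.
X = [2.3, 4.8, 9.0, 5.4, 6.8, 3.5, 4.8, 5.4, 7.2]
z = standardize(X)
print(math.isclose(statistics.mean(z), 0, abs_tol=1e-9))   # True
print(math.isclose(statistics.stdev(z), 1, abs_tol=1e-9))  # True
```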
RANKING DATA VALUES BY PERCENTILES