MEASURES OF THE LOCATION OR CENTRE OF A SAMPLE
Sample Mean: Let a sample have n observations $x_1, x_2, \ldots, x_n$.
The sample mean is denoted by the symbol $\bar{x}$ and is defined as
follows:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
$\Sigma$ is the upper case Greek letter “sigma”. In statistics it means to sum. Thus
$\sum x_i$ means the sum of the sample values $x_i$.
Example: Stop watches are tested for reliability by counting the number of
cycles (on-off-restart) until some part of the mechanism fails. The data below
give the failure times (in thousands of cycles) for a random sample of stop
watches of a certain type.
22, 2, 12, 18, 16
$\bar{x} = \sum x_i / n = 70/5 = 14$
[Figure: number line with axis marks at 0, 5, 10, 15, 20 and 25.]
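The arithmetic in this example can be checked with a few lines of Python. This is only a sketch; the function name `sample_mean` is an illustrative choice and does not come from the notes.

```python
# Failure times (in thousands of cycles) from the stop watch example.
failure_times = [22, 2, 12, 18, 16]


def sample_mean(values):
    """Return the sum of the observations divided by the sample size n."""
    n = len(values)
    return sum(values) / n


x_bar = sample_mean(failure_times)
print(x_bar)  # 14.0
```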
Sample Median : The value above and below which approximately 50% of
the sample falls is called the median. It is denoted by the symbol m.
Steps in the calculation of the sample median
(1) Order the sample values from smallest to largest.
(2) Calculate the approximate position of m: AP(m)=(50/100)n, where n is
the sample size.
(3) From (2) calculate the exact position of m as follows:
(i) If AP(m) is not a whole number:
Pos(m) = AP(m) rounded up to the next whole number.
(ii) If AP(m) is a whole number:
Pos(m) = AP(m) + .5
(4) RESULT: m is the value at Pos(m).
If Pos(m) is not a whole number, there is no observation in that position.
In that case m is the average of the values in the two positions on either
side of Pos(m).
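Steps (1) to (4) can be sketched in Python as follows. The function name `sample_median` is hypothetical, and the second sample in the usage lines is made up purely to exercise the whole-number case.

```python
import math


def sample_median(values):
    """Median by the position rule in steps (1)-(4)."""
    if not values:
        raise ValueError("sample must contain at least one observation")
    ordered = sorted(values)        # step (1): order the sample
    n = len(ordered)
    ap = 0.5 * n                    # step (2): approximate position AP(m)
    if ap != int(ap):
        pos = math.ceil(ap)         # step (3)(i): round up
        return ordered[pos - 1]     # step (4): positions count from 1
    # step (3)(ii): Pos(m) = AP(m) + .5 falls between two observations,
    # so step (4) averages the values at positions AP(m) and AP(m) + 1.
    k = int(ap)
    return (ordered[k - 1] + ordered[k]) / 2


print(sample_median([22, 2, 12, 18, 16]))  # 16
print(sample_median([4, 1, 3, 2]))         # 2.5 (made-up even-sized sample)
```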
Example: Consider the stop watch data of the previous example.
For this sample,
(1) Ordered Sample: 2 12 16 18 22
(2) AP(m)= (.50)n = (.50)5 = 2.5
(3) Pos(m) = 3
(4) m=16
Interpretation: Approximately 50% of the failure times in the sample fall
below m=16.
Example: Incomes (in $1000) of a sample of 8 residents of a
hypothetical town are:
29, 16, 17, 19, 964, 26, 10, 29
$\bar{x} = \sum x_i / n = 1110/8 = 138.75$
Now we calculate the sample median for this sample:
(1) Ordered Sample: 10 16 17 19 26 29 29 964
(2) AP(m) = (.50)n = (.50)8 = 4
(3) Pos(m) = 4 + .5 = 4.5
(4) m = (19 + 26)/2 = 22.5
Interpretation: Approximately 50% of the incomes in the sample are below
22.5 thousand dollars.
Note: Data values like “964” in the above example are called OUTLIERS.
These are extreme data values which are either
(a) infrequently occurring members of the population, or
(b) sample values that do not belong to the population such as measurement
errors, recording errors, or measurements from the wrong population.
In case (a) outliers should not be removed from the sample. In case (b) they
should be removed.
Questions: 1. How sensitive to outliers is
(a) the sample mean $\bar{x}$?
(b) the sample median m?
2. In general, which is a better measure of the centre (or location) of a
sample when the sample is
(a) skewed or has outliers ______________________
(b) symmetric __________________________
3. What does each of (a)-(c) below indicate about the shape of the sample
(left skewed, right skewed or symmetric)?
(a) $\bar{x}$ is well above m? ________________
(b) $\bar{x}$ is well below m? ___________________
(c) $\bar{x}$ is close to m? ___________________
The Sample Mean and the Sample Median on Minitab
Consider the data from our previous example: Incomes (in $1000) of a
sample of 8 residents of a hypothetical town.
C1: 10 16 17 19 26 29 29 964
MTB> mean C1
Mean = 138.75
MTB> medi C1
Median = 22.5
MTB> desc C1
        N   MEAN  MEDIAN  TRMEAN  STDEV  SEMEAN
C1      8    139      22     139    334     118
      MIN    MAX      Q1      Q3
C1     10    964      16      29
Note: To see how the outlier “964” affects the sample mean $\bar{x}$, consider
the sample without the outlier.
C2: 10 16 17 19 26 29 29
MTB> mean c2
Mean = 20.857
MTB> medi c2
Median = 19.00
Notice that the removal of the outlier has changed the sample mean from
138.75 to 20.857, while the sample median has changed little (from 22.5 to
19).
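For readers without Minitab, the same comparison can be sketched with Python's standard `statistics` module. The variable names are illustrative, and the lists simply repeat columns C1 and C2.

```python
import statistics

incomes = [10, 16, 17, 19, 26, 29, 29, 964]          # column C1
without_outlier = [x for x in incomes if x != 964]   # column C2

mean_c1 = statistics.mean(incomes)
median_c1 = statistics.median(incomes)
mean_c2 = statistics.mean(without_outlier)
median_c2 = statistics.median(without_outlier)

print(mean_c1, median_c1)             # 138.75 22.5
print(round(mean_c2, 3), median_c2)   # 20.857 19
```

Dropping the single value 964 moves the mean by about 118 thousand dollars, while the median moves by only 3.5 thousand dollars.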
MEASURES OF VARIATION (SPREAD) OF A SAMPLE
VARIATION: a measure of how far the sample values are from their central
value.
There are many such measures. One convenient way is as follows:
Total Variation:
SSTO = $\sum_{i=1}^{n} (x_i - \bar{x})^2$.
The average value of the total variation for a sample is known as the
Sample Variance:
$s^2$ = SSTO/(n-1)
Note that in calculating the average here we divide by n-1 rather than n. It
turns out that a sample variance defined this way has better statistical
properties.
Since the total variation and the sample variance are calculated from squared
deviations of the observations, their unit of measurement is the square of the
unit of measurement of the data (for example, if the data are measured in
centimeters (cm), then SSTO and $s^2$ are measured in square centimeters (cm^2)).
To obtain a measure of variation which has the same units of measurement
as the data we take the square root of $s^2$ to get the
Sample Standard Deviation: $s = \sqrt{s^2}$
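The three definitions above translate directly into code. The sketch below uses hypothetical function names and a small made-up sample (2, 4, 4, 6) chosen so that the arithmetic is easy to follow by hand.

```python
import math


def total_variation(values):
    """SSTO: sum of squared deviations from the sample mean."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)


def sample_variance(values):
    """s^2 = SSTO / (n - 1); needs at least two observations."""
    return total_variation(values) / (len(values) - 1)


def sample_std(values):
    """s = square root of s^2, in the same units as the data."""
    return math.sqrt(sample_variance(values))


toy_sample = [2, 4, 4, 6]                     # made-up data, mean is 4
print(total_variation(toy_sample))            # 8.0
print(round(sample_variance(toy_sample), 3))  # 2.667
print(round(sample_std(toy_sample), 3))       # 1.633
```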
Example: The stopwatch data from a previous example is given below:
22, 2, 12, 18, 16
We have calculated $\bar{x}$ = 14. We now calculate SSTO, $s^2$ and s.
SSTO = ______________ = _____________, the total squared deviation of the
sample values from the sample mean.
$s^2$ = SSTO/(n-1) = _______ = _______, roughly the average squared distance of
the sample values from the sample mean.
s = _______ = _______, some sort of “average distance” of the sample values
from the sample mean.
Computational Formula for Total Variation
The computation of total variation can actually be done without calculating
$\bar{x}$. An alternative formula is
SSTO = $\sum x_i^2 - (\sum x_i)^2 / n$
For the stopwatch data
$\sum x_i$ = __________ , $\sum x_i^2$ = _______________
SSTO =
$s^2$ = SSTO/(n-1) =
s =
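A quick numerical check that the computational formula agrees with the definition of SSTO, sketched in Python on the stopwatch data. The variable names are illustrative.

```python
failure_times = [22, 2, 12, 18, 16]
n = len(failure_times)

# Definition: sum of squared deviations from the sample mean.
x_bar = sum(failure_times) / n
ssto_definition = sum((x - x_bar) ** 2 for x in failure_times)

# Computational formula: only the sum and the sum of squares are needed.
sum_x = sum(failure_times)
sum_x2 = sum(x * x for x in failure_times)
ssto_shortcut = sum_x2 - sum_x ** 2 / n

print(sum_x, sum_x2)                    # 70 1212
print(ssto_definition, ssto_shortcut)   # 232.0 232.0
```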
Example: Below is an SRS (simple random sample) of 36 grades of students in a mathematics course
57 83 75 79 60 75 60 51 56 65 52 78
57 47 89 50 62 54 96 66 73 62 68 64
60 55 75 78 59 62 57 77 64 68 57 61
For this sample n = 36, $\sum x_i$ = 2352, $\sum x_i^2$ = 158,210
Sample mean $\bar{x}$ =
SSTO = ________ = ________ = ________ = ________
$s^2$ = SSTO/(n-1) = ________ ; s = ________
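The summary values quoted for the grades can be verified from the raw data, and the computational formula then gives the remaining quantities. This is a sketch for checking hand calculations; the variable names are illustrative.

```python
import math

grades = [
    57, 83, 75, 79, 60, 75, 60, 51, 56, 65, 52, 78,
    57, 47, 89, 50, 62, 54, 96, 66, 73, 62, 68, 64,
    60, 55, 75, 78, 59, 62, 57, 77, 64, 68, 57, 61,
]

n = len(grades)
sum_x = sum(grades)
sum_x2 = sum(x * x for x in grades)

x_bar = sum_x / n
ssto = sum_x2 - sum_x ** 2 / n    # computational formula for SSTO
s2 = ssto / (n - 1)               # sample variance
s = math.sqrt(s2)                 # sample standard deviation

print(n, sum_x, sum_x2)            # 36 2352 158210
print(round(x_bar, 2), ssto)       # 65.33 4546.0
print(round(s2, 2), round(s, 2))   # 129.89 11.4
```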
SAMPLE PERCENTILES
The median splits the sample evenly into two halves. We can define
measures that divide the sample into parts of different size.
The rth Percentile of a sample is a value Pr such that (approximately) r% of
the sample falls below Pr.
Example: If a student’s score on a university entrance exam is the 84th
percentile, this means that approximately 84% of all the scores in the exam
are below this student’s score.
STEPS IN THE CALCULATION OF Pr FOR A SAMPLE OF SIZE n
(1) Order the sample values from smallest to largest.
(2) Calculate the approximate position of Pr: AP(Pr)=(r/100)n
(3) From (2) calculate the exact position of Pr as follows:
(i) If AP(Pr) is not a whole number: Pos(Pr) = AP(Pr) rounded up to the next
whole number.
(ii) If AP(Pr) is a whole number: Pos(Pr) = AP(Pr) + .5
(4) RESULT: Pr is the value at Pos(Pr).
If Pos(Pr) is not a whole number, there is no observation in that position. In
that case, Pr is the average of the values in the two positions on either side of
Pos(Pr).
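The percentile steps generalise the median rule, and a minimal Python sketch follows. The function name `sample_percentile` is hypothetical, and the sketch assumes r is a whole number strictly between 0 and 100 so that the whole-number test can be done with exact integer arithmetic.

```python
def sample_percentile(values, r):
    """r-th percentile by the position rule; r is an integer, 0 < r < 100."""
    if not values:
        raise ValueError("sample must contain at least one observation")
    if not 0 < r < 100:
        raise ValueError("r must lie strictly between 0 and 100")
    ordered = sorted(values)                 # step (1)
    n = len(ordered)
    # step (2): AP(Pr) = r*n/100, split into whole part and remainder
    whole, remainder = divmod(r * n, 100)
    if remainder != 0:
        # step (3)(i): round up to position whole + 1 (list index `whole`)
        return ordered[whole]
    # step (3)(ii): Pos(Pr) = AP(Pr) + .5, so step (4) averages the values
    # at positions AP(Pr) and AP(Pr) + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


# With r = 50 the rule reproduces the sample medians found earlier.
print(sample_percentile([22, 2, 12, 18, 16], 50))               # 16
print(sample_percentile([29, 16, 17, 19, 964, 26, 10, 29], 50)) # 22.5
```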
Example: Below are SAT (Scholastic Aptitude Test) mathematics scores for an
SRS of 32 university applicants. [The sample has been ordered for
convenience.]
484 490 506 509 523 532 539 539 544 545 550
558 578 580 591 593 610 610 630 634 641 647
648 655 662 673 682 688 693 726 745 780
(i) 60th Percentile: AP(P60) = ________ ; Pos(P60) = ________ ; P60 = ________
Interpretation:
(ii) 25th Percentile: AP(P25) = ________ ; Pos(P25) = ________ ; P25 = ________
Interpretation:
SPECIAL PERCENTILES
First Quartile: Q1 = P25
Second Quartile: Q2 = P50 = m
Third Quartile: Q3 = P75
Example: Below are the scores of a profile test given by a psychologist to
a random sample of 30 convicted felons. The scores have been ordered from
smallest to largest.
146 165 171 179 181 184 190 191 192 192
192 193 195 196 196 197 198 199 200 200
201 203 204 205 206 213 215 221 232 247
First Quartile:
Second Quartile:
Third Quartile:
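Because the quartiles are just P25, P50 and P75, they can be checked by applying the position rule three times. The sketch below separates the rule into two hypothetical helpers, `position` and `value_at`, which are illustrative names and not part of the notes.

```python
import math

scores = [
    146, 165, 171, 179, 181, 184, 190, 191, 192, 192,
    192, 193, 195, 196, 196, 197, 198, 199, 200, 200,
    201, 203, 204, 205, 206, 213, 215, 221, 232, 247,
]


def position(r, n):
    """Pos(Pr): AP + .5 if AP = r*n/100 is whole, otherwise AP rounded up."""
    ap = r * n / 100
    if ap == int(ap):
        return ap + 0.5
    return math.ceil(ap)


def value_at(ordered, pos):
    """Value at a 1-based position; average the neighbours if pos ends in .5."""
    if pos == int(pos):
        return ordered[int(pos) - 1]
    lower = ordered[math.floor(pos) - 1]
    upper = ordered[math.ceil(pos) - 1]
    return (lower + upper) / 2


n = len(scores)
q1 = value_at(scores, position(25, n))
q2 = value_at(scores, position(50, n))
q3 = value_at(scores, position(75, n))
print(q1, q2, q3)  # 191 196.5 204
```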